Frequent Itemset Mining - Lab Solutions.
Set A Q. 1:
Create the following dataset in Python.
1. bread, milk.
2. bread, diaper, beer, eggs.
3. milk, diaper, beer, coke.
4. bread, milk, diaper, beer.
5. bread, milk, diaper, coke.
Convert the categorical values into numeric format. Apply the apriori algorithm on the above dataset to generate the frequent itemsets and association rules. Repeat the process with different min_sup values.
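Before running a library, it can help to see what min_sup is compared against. The following is a minimal sketch in plain Python, assuming support is defined as the fraction of transactions that contain every item of an itemset. The helper name `support` is introduced here for illustration and is not part of mlxtend.

```python
# The five transactions from the exercise, stored as sets for easy subset tests.
transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "coke"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "coke"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset` (illustrative helper)."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({"bread"}, transactions))          # bread is in 4 of 5 transactions
print(support({"bread", "milk"}, transactions))  # bread and milk together in 3 of 5
print(support({"beer", "coke"}, transactions))   # beer and coke together in 1 of 5
```

An itemset counts as frequent when this value is at least min_sup, so raising min_sup shrinks the list of frequent itemsets.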
Python Implementation.
Step 1: Install Required Libraries
If you don't have the mlxtend library installed, you can install it (together with pandas) using pip:
pip install pandas mlxtend
Step 2: Create the Dataset and Convert to Numeric Format
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Create the dataset: one list of items per transaction
dataset = [
    ['bread', 'milk'],
    ['bread', 'diaper', 'beer', 'eggs'],
    ['milk', 'diaper', 'beer', 'coke'],
    ['bread', 'milk', 'diaper', 'beer'],
    ['bread', 'milk', 'diaper', 'coke'],
]

# Convert categorical values into a one-hot (True/False) table using TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

print("Encoded Dataset:")
print(df)
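To make the encoding step less of a black box, here is a rough sketch of the same idea in plain Python: one column per distinct item, one row per transaction, and True where the transaction contains the item. It illustrates the concept only and is not how mlxtend is implemented internally; the variable names `columns` and `encoded` are chosen here for illustration.

```python
dataset = [
    ['bread', 'milk'],
    ['bread', 'diaper', 'beer', 'eggs'],
    ['milk', 'diaper', 'beer', 'coke'],
    ['bread', 'milk', 'diaper', 'beer'],
    ['bread', 'milk', 'diaper', 'coke'],
]

# Collect every distinct item and sort so the column order is deterministic.
columns = sorted({item for transaction in dataset for item in transaction})

# One row per transaction; True where the transaction contains that column's item.
encoded = [[item in transaction for item in columns] for transaction in dataset]

print(columns)
for row in encoded:
    print(row)
```

Each row has exactly as many True values as the transaction has distinct items, which is a quick sanity check on any encoded table.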
Step 3: Apply the Apriori Algorithm
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# Different min_sup values to test
min_sup_values = [0.2, 0.4, 0.6]

for min_sup in min_sup_values:
    # Generate frequent itemsets with the current minimum support threshold
    print(f"\nFrequent Itemsets with min_sup = {min_sup}:")
    frequent_itemsets = apriori(df, min_support=min_sup, use_colnames=True)
    print(frequent_itemsets)

    # Generate association rules with a confidence threshold of 0.7
    print(f"\nAssociation Rules with min_sup = {min_sup}:")
    rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
    print(rules)
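The library output can be cross-checked by hand on a dataset this small. The sketch below enumerates every possible itemset by brute force (it does not use Apriori's candidate pruning, so it is only suitable for tiny examples) and computes confidence as support(antecedent and consequent together) divided by support(antecedent). The function names are hypothetical helpers, not mlxtend calls.

```python
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "coke"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "coke"},
]


def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(min_sup):
    """Brute-force search: test every combination of items against min_sup."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate)
            if s >= min_sup:
                result[frozenset(candidate)] = s
    return result


def confidence(antecedent, consequent):
    """Share of transactions with the antecedent that also contain the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


for itemset, s in frequent_itemsets(0.6).items():
    print(sorted(itemset), s)

# Every transaction with beer also contains diaper, so this rule has full confidence.
print(confidence({"beer"}, {"diaper"}))
# Only 3 of the 4 diaper transactions contain beer, so the reverse rule is weaker.
print(confidence({"diaper"}, {"beer"}))
```

The asymmetry between the two printed confidences shows why a rule and its reverse have to be evaluated separately.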
Set A Q. 2:
Create your own transactions dataset and apply the above process on your dataset.
Example Transactions
['apple', 'banana', 'orange']: A transaction containing apple, banana, and orange.
['apple', 'banana', 'mango']: A transaction containing apple, banana, and mango.
['banana', 'orange', 'mango']: A transaction containing banana, orange, and mango.
['apple', 'orange', 'mango']: A transaction containing apple, orange, and mango.
['apple', 'banana', 'orange', 'mango']: A transaction containing all four items.
['banana', 'orange']: A transaction containing banana and orange.
['apple', 'mango']: A transaction containing apple and mango.
['apple', 'banana']: A transaction containing apple and banana.
Step 1: Define Your Custom Dataset
my_dataset = [
    ['apple', 'banana', 'orange'],
    ['apple', 'banana', 'mango'],
    ['banana', 'orange', 'mango'],
    ['apple', 'orange', 'mango'],
    ['apple', 'banana', 'orange', 'mango'],
    ['banana', 'orange'],
    ['apple', 'mango'],
    ['apple', 'banana'],
]
Step 2: Encode the Dataset
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Convert categorical values into a one-hot (True/False) table
te = TransactionEncoder()
te_ary = te.fit(my_dataset).transform(my_dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

print("Encoded Dataset:")
print(df)
Step 3: Apply the Apriori Algorithm
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# Define different min_sup values to test
min_sup_values = [0.3, 0.5, 0.7]

for min_sup in min_sup_values:
    print(f"\nFrequent Itemsets with min_sup = {min_sup}:")
    frequent_itemsets = apriori(df, min_support=min_sup, use_colnames=True)
    print(frequent_itemsets)

    # Generate association rules with a confidence threshold of 0.7
    print(f"\nAssociation Rules with min_sup = {min_sup}:")
    rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
    print(rules)
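When choosing min_sup values for your own dataset, a quick hand count shows which thresholds can produce multi-item itemsets at all. The sketch below counts single items and pairs in the fruit dataset using only the standard library; the names `item_counts` and `pair_counts` are illustrative. In this dataset no pair occurs in more than 4 of the 8 transactions, so a min_sup of 0.7 can only yield single-item itemsets and therefore no rules.

```python
from collections import Counter
from itertools import combinations

my_dataset = [
    ['apple', 'banana', 'orange'],
    ['apple', 'banana', 'mango'],
    ['banana', 'orange', 'mango'],
    ['apple', 'orange', 'mango'],
    ['apple', 'banana', 'orange', 'mango'],
    ['banana', 'orange'],
    ['apple', 'mango'],
    ['apple', 'banana'],
]
n = len(my_dataset)

# Count in how many transactions each item appears (set() guards against repeats).
item_counts = Counter(item for t in my_dataset for item in set(t))

# Count in how many transactions each unordered pair of items appears together.
pair_counts = Counter(
    pair for t in my_dataset for pair in combinations(sorted(set(t)), 2)
)

for item, count in sorted(item_counts.items()):
    print(item, count, count / n)
for pair, count in sorted(pair_counts.items()):
    print(pair, count, count / n)
```

Dividing a pair's count by the count of one of its items gives the confidence of the rule from that item to the other, which is a handy check against the library's rule table.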
Set B Q. 1/2:
Download the Market basket dataset.
Write a Python program to read the dataset and display its information. Preprocess the data (drop null values, etc.). Convert the categorical values into numeric format. Apply the apriori algorithm on the above dataset to generate the frequent itemsets and association rules.
Step 1: Install Required Libraries
Open a terminal or command prompt.
Install the necessary Python libraries using pip:
pip install pandas mlxtend
Step 2: Prepare the Dataset
Ensure you have the Groceries_data.csv file.
Place the file in the same directory as your Python script or notebook.
Verify the dataset has the following columns:
Member_number
Date
itemDescription
Step 3: Write the Python Program
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# Step 1: Read the dataset and display its information
file_path = "Groceries_data.csv"  # Replace with the correct path to your file
df = pd.read_csv(file_path)
df.info()
print(df.head())

# Step 2: Preprocess the data
# Drop rows with missing values
df = df.dropna()

# Group by 'Member_number' and 'Date' to create transactions
transactions = df.groupby(['Member_number', 'Date'])['itemDescription'].apply(list).reset_index()

# Extract the list of transactions
transaction_list = transactions['itemDescription'].tolist()

# Encode the transactions into a one-hot (True/False) table
te = TransactionEncoder()
te_ary = te.fit(transaction_list).transform(transaction_list)
encoded_df = pd.DataFrame(te_ary, columns=te.columns_)

# Step 3: Generate frequent itemsets
min_sup = 0.02  # Minimum support threshold (lower it if only single-item itemsets appear)
frequent_itemsets = apriori(encoded_df, min_support=min_sup, use_colnames=True)
print("Frequent Itemsets:")
print(frequent_itemsets)

# Step 4: Generate association rules
min_confidence = 0.5  # Minimum confidence threshold (adjust as needed)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)
print("\nAssociation Rules:")
print(rules)
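The grouping step is the part most specific to this dataset, so here is a small standard-library sketch of the same idea: rows that share a Member_number and Date are collected into one transaction. The CSV rows below are made up purely for illustration (they are not taken from Groceries_data.csv); only the three column names come from the exercise.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the same three-column layout as the exercise file.
sample_csv = """Member_number,Date,itemDescription
1001,01-01-2015,whole milk
1001,01-01-2015,rolls/buns
1001,05-01-2015,soda
1002,01-01-2015,whole milk
1002,01-01-2015,
"""

baskets = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample_csv)):
    item = row["itemDescription"].strip()
    if not item:
        continue  # skip rows with a missing item, similar in spirit to dropna()
    key = (row["Member_number"], row["Date"])
    if item not in baskets[key]:  # keep each item once per basket
        baskets[key].append(item)

# One list of items per (member, date) pair, ready for one-hot encoding.
transaction_list = list(baskets.values())
print(transaction_list)
```

Treating each (member, date) pair as one basket is a modelling choice; grouping by Member_number alone would instead merge all of a customer's purchases into a single, much larger transaction.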