Frequent Itemset Mining - Lab Solutions.

Set A Q. 1:

Create the following dataset in Python.

1. bread, milk.

2. bread, diaper, beer, eggs.

3. milk, diaper, beer, coke.

4. bread, milk, diaper, beer.

5. bread, milk, diaper, coke.

Convert the categorical values into numeric format. Apply the apriori algorithm on the above dataset to generate the frequent itemsets and association rules. Repeat the process with different min_sup values.

Python Implementation.

Step 1: Install Required Libraries

If you don’t have the mlxtend library installed, you can install it using pip:

pip install mlxtend

Step 2: Create the Dataset and Convert to Numeric Format

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Create the dataset
dataset = [
    ['bread', 'milk'],
    ['bread', 'diaper', 'beer', 'eggs'],
    ['milk', 'diaper', 'beer', 'coke'],
    ['bread', 'milk', 'diaper', 'beer'],
    ['bread', 'milk', 'diaper', 'coke']
]

# Convert categorical values into numeric format using TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

print("Encoded Dataset:")
print(df)
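
To see what the encoding step produces, the same table can be built by hand. This is a minimal sketch using only built-in Python, not the TransactionEncoder implementation; the names items and encoded_rows are made up for this illustration. The columns are the distinct items in alphabetical order, and each row holds one True/False flag per item.

```python
# Hand-rolled one-hot encoding of the five transactions (illustration only).
dataset = [
    ['bread', 'milk'],
    ['bread', 'diaper', 'beer', 'eggs'],
    ['milk', 'diaper', 'beer', 'coke'],
    ['bread', 'milk', 'diaper', 'beer'],
    ['bread', 'milk', 'diaper', 'coke'],
]

# Distinct items in alphabetical order; these act as the column names.
items = sorted({item for transaction in dataset for item in transaction})

# One row per transaction, one True/False flag per item.
encoded_rows = [[item in transaction for item in items] for transaction in dataset]

print(items)
for row in encoded_rows:
    print([int(flag) for flag in row])
```

Each row should match the corresponding row of the Encoded Dataset printed above, with 1 in place of True and 0 in place of False.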

Step 3: Apply the Apriori Algorithm

from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# Generate frequent itemsets with a minimum support threshold
min_sup_values = [0.2, 0.4, 0.6]  # Different min_sup values to test

for min_sup in min_sup_values:
    print(f"\nFrequent Itemsets with min_sup = {min_sup}:")
    frequent_itemsets = apriori(df, min_support=min_sup, use_colnames=True)
    print(frequent_itemsets)

    # Generate association rules
    print(f"\nAssociation Rules with min_sup = {min_sup}:")
    rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
    print(rules)
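
The numbers in the rules table can be checked by hand for a single rule. The sketch below computes support, confidence, and lift for the rule beer -> diaper with plain Python; the helper support() is a hypothetical name written for this check and is not part of mlxtend.

```python
# Manual check of one rule, {beer} -> {diaper}, on the Set A Q. 1 dataset.
dataset = [
    ['bread', 'milk'],
    ['bread', 'diaper', 'beer', 'eggs'],
    ['milk', 'diaper', 'beer', 'coke'],
    ['bread', 'milk', 'diaper', 'beer'],
    ['bread', 'milk', 'diaper', 'coke'],
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for transaction in dataset if itemset <= set(transaction))
    return hits / len(dataset)

antecedent, consequent = {'beer'}, {'diaper'}

# support(A and C together), confidence = support(A and C) / support(A),
# lift = confidence / support(C)
rule_support = support(antecedent | consequent)
confidence = rule_support / support(antecedent)
lift = confidence / support(consequent)

print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Beer appears in 3 of the 5 transactions and diaper appears in all 3 of them, so the confidence is 3/3 = 1.0. Diaper on its own has support 4/5, which gives a lift of 1.25.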

Set A Q. 2:

Create your own transactions dataset and apply the above process on your dataset.

Example Transactions
  1. ['apple', 'banana', 'orange']: A transaction containing apple, banana, and orange.

  2. ['apple', 'banana', 'mango']: A transaction containing apple, banana, and mango.

  3. ['banana', 'orange', 'mango']: A transaction containing banana, orange, and mango.

  4. ['apple', 'orange', 'mango']: A transaction containing apple, orange, and mango.

  5. ['apple', 'banana', 'orange', 'mango']: A transaction containing all four items.

  6. ['banana', 'orange']: A transaction containing banana and orange.

  7. ['apple', 'mango']: A transaction containing apple and mango.

  8. ['apple', 'banana']: A transaction containing apple and banana.

Step 1: Define Your Custom Dataset

# Custom transaction dataset
my_dataset = [
    ['apple', 'banana', 'orange'],
    ['apple', 'banana', 'mango'],
    ['banana', 'orange', 'mango'],
    ['apple', 'orange', 'mango'],
    ['apple', 'banana', 'orange', 'mango'],
    ['banana', 'orange'],
    ['apple', 'mango'],
    ['apple', 'banana']
]

Step 2: Encode the Dataset

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Convert categorical values into numeric format
te = TransactionEncoder()
te_ary = te.fit(my_dataset).transform(my_dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

print("Encoded Dataset:")
print(df)

Step 3: Apply the Apriori Algorithm

from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# Define different min_sup values to test
min_sup_values = [0.3, 0.5, 0.7]

for min_sup in min_sup_values:
    print(f"\nFrequent Itemsets with min_sup = {min_sup}:")
    frequent_itemsets = apriori(df, min_support=min_sup, use_colnames=True)
    print(frequent_itemsets)

    # Generate association rules with a confidence threshold of 0.7
    print(f"\nAssociation Rules with min_sup = {min_sup}:")
    rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
    print(rules)
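
For a dataset this small, the effect of min_sup can be cross-checked by counting every possible itemset directly. The sketch below is a brute-force enumeration, not the apriori algorithm itself (it skips the candidate pruning that makes apriori efficient), and the function name frequent_itemsets_bruteforce is made up for this illustration.

```python
from itertools import combinations

my_dataset = [
    ['apple', 'banana', 'orange'],
    ['apple', 'banana', 'mango'],
    ['banana', 'orange', 'mango'],
    ['apple', 'orange', 'mango'],
    ['apple', 'banana', 'orange', 'mango'],
    ['banana', 'orange'],
    ['apple', 'mango'],
    ['apple', 'banana'],
]

def frequent_itemsets_bruteforce(transactions, min_sup):
    """Return {itemset tuple: support} for every itemset with support >= min_sup."""
    baskets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*baskets))
    result = {}
    # Try every combination of every size; only feasible for a handful of items.
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for basket in baskets if basket.issuperset(candidate))
            support = count / len(baskets)
            if support >= min_sup:
                result[candidate] = support
    return result

for min_sup in [0.3, 0.5, 0.7]:
    found = frequent_itemsets_bruteforce(my_dataset, min_sup)
    print(f"min_sup = {min_sup}: {len(found)} frequent itemsets")
```

Raising min_sup shrinks the result: only single items and pairs pass at 0.3, and at 0.7 only apple and banana remain, so no rule can be formed at that threshold.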

Set B Q. 1/2:

Download the Market Basket dataset.

Write a Python program to read the dataset and display its information. Preprocess the data (drop null values, etc.). Convert the categorical values into numeric format. Apply the apriori algorithm on the above dataset to generate the frequent itemsets and association rules.

Step 1: Install Required Libraries

  1. Open a terminal or command prompt.

  2. Install the necessary Python libraries using pip:

    pip install pandas mlxtend

Step 2: Prepare the Dataset

Ensure you have the Groceries_data.csv file.

    1. Place the file in the same directory as your Python script or notebook.

    2. Verify the dataset has the following columns:

      • Member_number

      • Date

      • itemDescription

Step 3: Write the Python Program

import pandas as pd

from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

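# Step 1: Read the dataset and display its information
file_path = "Groceries_data.csv"  # Replace with the correct path to your file
df = pd.read_csv(file_path)

print("Dataset Information:")
df.info()  # info() prints column types and non-null counts directly
print(df.head())

# Step 2: Preprocess the data
# Drop rows with missing values so every transaction contains only item names
df = df.dropna()

# Group by 'Member_number' and 'Date' to create transactions
transactions = df.groupby(['Member_number', 'Date'])['itemDescription'].apply(list).reset_index()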

# Extract the list of transactions
transaction_list = transactions['itemDescription'].tolist()

# Encode the transactions into a numeric format
te = TransactionEncoder()
te_ary = te.fit(transaction_list).transform(transaction_list)
encoded_df = pd.DataFrame(te_ary, columns=te.columns_)

# Step 3: Generate frequent itemsets
min_sup = 0.02  # Minimum support threshold (adjust as needed)
frequent_itemsets = apriori(encoded_df, min_support=min_sup, use_colnames=True)

print("Frequent Itemsets:")
print(frequent_itemsets)

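# Step 4: Generate association rules
min_confidence = 0.5  # Minimum confidence threshold (adjust as needed)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)

print("\nAssociation Rules:")
if rules.empty:
    # No rule met the thresholds; lowering min_sup or min_confidence may help
    print("No rules found. Try lowering min_sup or min_confidence.")
else:
    print(rules)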
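
The grouping step is the part of this program that turns one-item-per-row data into transactions. The sketch below shows the same idea with the standard library only, so it can be run without the CSV file. The rows in sample_csv are hypothetical values invented for this illustration (they are not taken from Groceries_data.csv); only the three column names come from the dataset description above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the same three-column layout; not real data.
sample_csv = """Member_number,Date,itemDescription
1001,01-01-2015,whole milk
1001,01-01-2015,rolls/buns
1001,05-01-2015,soda
1002,01-01-2015,whole milk
1002,01-01-2015,
"""

# All items bought by the same member on the same date form one transaction.
baskets = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample_csv)):
    item = row['itemDescription'].strip()
    if not item:
        # Skip rows with a missing item, similar in spirit to dropping nulls.
        continue
    baskets[(row['Member_number'], row['Date'])].append(item)

transaction_list = list(baskets.values())
print(transaction_list)
```

The resulting transaction_list has the same shape as the list passed to TransactionEncoder in the program above: one list of item names per (member, date) pair.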