Frequent Itemset and Association Rule Mining Lab: A Practical Guide
Objective:
- To understand and implement frequent itemset and association rule mining techniques.
Theory:
Frequent pattern mining is a fundamental data mining task that searches for recurring regularities in large datasets. Association rules are formed by analyzing data for frequent if/then patterns and using the criteria of support and confidence to identify the most important relationships. Support indicates how frequently the items appear in the database; confidence indicates how often the if/then statements are found to be true.
Dataset:
For this lab, we’ll use a sample dataset representing transactions in a retail store. Each transaction includes a unique identifier and a set of items bought by a customer.
| TID  | Items Purchased     |
|------|---------------------|
| T100 | Milk, Bread, Eggs   |
| T200 | Milk, Bread         |
| T300 | Milk, Cheese        |
| T400 | Bread, Cheese       |
| T500 | Milk, Bread, Cheese |
Pre-Lab Questions:
What is a frequent itemset?
A frequent itemset is a set of items, attributes, or events that occur frequently together in a dataset. “Frequent” means that the itemset appears in a significant number of transactions, exceeding a predefined minimum support threshold. For example, in market basket analysis, {milk, bread} being purchased together frequently would be a frequent itemset.
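This check can be sketched directly on the lab's five transactions (a minimal illustration; the function name `is_frequent` is ours, not from any library):

```python
# The five transactions from this lab's sample dataset, as sets.
transactions = [
    {"Milk", "Bread", "Eggs"},
    {"Milk", "Bread"},
    {"Milk", "Cheese"},
    {"Bread", "Cheese"},
    {"Milk", "Bread", "Cheese"},
]

def is_frequent(itemset, transactions, min_support=0.3):
    # Count transactions that contain every item in the itemset,
    # then compare the fraction against the minimum support threshold.
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions) >= min_support

print(is_frequent({"Milk", "Bread"}, transactions))  # appears in 3 of 5 → True
print(is_frequent({"Eggs"}, transactions))           # appears in 1 of 5 → False
```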
Explain the concepts of support and confidence in association rule mining.
Support:
Support measures the proportion of transactions that contain both the antecedent and consequent items. It indicates how frequently the itemset appears in the dataset. A high support value suggests that the itemset is common and potentially important.
For a rule X → Y, the support is calculated as:
Support(X → Y) = (Number of transactions containing X and Y) / (Total number of transactions)
Confidence:
Confidence measures the reliability of the association rule. It is the likelihood that a customer who buys the antecedent item will also buy the consequent item. A high confidence value indicates a strong association between the items.
For a rule X → Y, the confidence is calculated as:
Confidence(X → Y) = (Number of transactions containing X and Y) / (Number of transactions containing X)
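As a worked illustration of both formulas on this lab's five transactions (a sketch; the helper names `support` and `confidence` are ours), consider the rule {Milk} → {Bread}:

```python
transactions = [
    {"Milk", "Bread", "Eggs"},
    {"Milk", "Bread"},
    {"Milk", "Cheese"},
    {"Bread", "Cheese"},
    {"Milk", "Bread", "Cheese"},
]

def support(X, Y, transactions):
    # Fraction of all transactions containing every item of X and Y.
    both = sum(1 for t in transactions if X | Y <= t)
    return both / len(transactions)

def confidence(X, Y, transactions):
    # Among transactions containing X, the fraction that also contain Y.
    has_x = sum(1 for t in transactions if X <= t)
    both = sum(1 for t in transactions if X | Y <= t)
    return both / has_x

# 3 of 5 transactions contain both Milk and Bread → support 0.6;
# of the 4 transactions containing Milk, 3 also contain Bread → confidence 0.75.
print(support({"Milk"}, {"Bread"}, transactions))     # 0.6
print(confidence({"Milk"}, {"Bread"}, transactions))  # 0.75
```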
Describe the Apriori algorithm and its purpose.
The Apriori algorithm is a classic algorithm used for frequent itemset mining and association rule learning. Its purpose is to identify frequent itemsets in a transaction database, which can then be used to generate association rules. The Apriori algorithm leverages the Apriori property: all subsets of a frequent itemset must also be frequent.
- Key Steps:
- The algorithm starts by scanning the database to count the occurrences of each item, identifying frequent items (1-itemsets) that meet the minimum support threshold.
- It iteratively generates candidate itemsets of length k+1 from the frequent itemsets of length k.
- The algorithm prunes candidate itemsets that do not meet the minimum support threshold.
- The process continues until no new frequent itemsets are found.
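The steps above can be sketched in plain Python (an illustrative sketch, not mlxtend's implementation; `apriori_sketch` and the transaction sets are names invented here, and a full implementation would also discard candidates with any infrequent (k-1)-subset before counting):

```python
def apriori_sketch(transactions, min_support):
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    # Level 1: single items meeting the minimum support threshold.
    current = [frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # Candidate generation: unions of frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: keep only candidates whose support meets the threshold.
        current = [c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

transactions = [
    {"Milk", "Bread", "Eggs"},
    {"Milk", "Bread"},
    {"Milk", "Cheese"},
    {"Bread", "Cheese"},
    {"Milk", "Bread", "Cheese"},
]
# With min_support=0.3 this yields 3 frequent items and 3 frequent pairs;
# {Milk, Bread, Cheese} is pruned (support 0.2).
print(apriori_sketch(transactions, 0.3))
```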
By identifying these frequent patterns and associations, businesses can make informed decisions about product placement, cross-selling strategies, and recommendation systems.
Procedure:
- Data Preparation:
- Represent the dataset in a suitable format for analysis (e.g., a list of transactions).
- Frequent Itemset Generation:
- Use the Apriori algorithm to generate frequent itemsets.
- Set a minimum support threshold (e.g., 30%).
- Identify itemsets that meet this threshold.
- Association Rule Mining:
- Generate association rules from the frequent itemsets.
- Set a minimum confidence threshold (e.g., 60%).
- Evaluate the rules based on support and confidence.
- Analysis and Interpretation:
- Interpret the generated association rules.
- Discuss the implications of these rules for business decisions.
Implementation:
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
import pandas as pd

# Sample dataset
dataset = [
    ['Milk', 'Bread', 'Eggs'],
    ['Milk', 'Bread'],
    ['Milk', 'Cheese'],
    ['Bread', 'Cheese'],
    ['Milk', 'Bread', 'Cheese']
]

# Data preparation: one-hot encode the transactions
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)  # boolean columns, as apriori expects

# Print the transformed DataFrame
print("Transaction DataFrame:\n", df)

# Frequent itemset generation
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)

# Check whether any frequent itemsets were found
if frequent_itemsets.empty:
    print("\nNo frequent itemsets found with the given min_support.")
else:
    print("\nFrequent Itemsets:\n", frequent_itemsets)

    # Association rule mining (only meaningful if frequent itemsets exist)
    rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
    if rules.empty:
        print("\nNo association rules found with the given min_threshold.")
    else:
        print("\nAssociation Rules:\n", rules)