Like Apriori, the FP-Growth (Frequent Pattern Growth) algorithm helps us perform Market Basket Analysis on transaction data. FP-Growth is preferred to Apriori because Apriori takes more execution time, repeatedly scanning the transaction dataset to mine the frequent itemsets.
FP-Growth builds a compact tree structure (the FP-tree) and uses the tree for frequent itemset mining and rule generation. Given below is a Python implementation of FP-Growth. A Jupyter notebook is used for the work.
pyfpgrowth is a Python package. To install pyfpgrowth, go to your command prompt and type the following:
pip install pyfpgrowth
import pandas as pd
import numpy as np
Import pandas and numpy for data cleaning and preprocessing purposes.
Read your transaction dataset,
df = pd.read_csv("transaction_data.csv")
A data frame df is created and the transaction data is stored in this data frame.
Do the necessary data cleaning and preprocessing. Before feeding the transaction data into the algorithm, bring it to the following format: a list of transactions, where each transaction is the list of items bought together.
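The preprocessing step above can be sketched as follows. This is a minimal example, not the article's actual dataset: the column names (`TRANSACTION_ID`, `ITEM`) and the sample rows are assumptions for illustration; adapt them to your own CSV.

```python
import pandas as pd

# Hypothetical raw data: one row per purchased item, tagged with a
# transaction id. Column names and values are made up for illustration.
df = pd.DataFrame({
    "TRANSACTION_ID": [1, 1, 2, 2, 2, 3],
    "ITEM": ["bread", "milk", "bread", "butter", "milk", "milk"],
})

# Group the rows by transaction so each transaction becomes one list of
# items -- the list-of-lists shape that pyfpgrowth expects.
transactions = df.groupby("TRANSACTION_ID")["ITEM"].apply(list).tolist()
print(transactions)
# [['bread', 'milk'], ['bread', 'butter', 'milk'], ['milk']]
```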
Step 4: Generating the frequent patterns and the rules
patterns = pyfpgrowth.find_frequent_patterns(transactions, 10)
# Patterns are generated based on the parameters passed to
# find_frequent_patterns(), where "transactions" is the list of items bought
# in each transaction (refer to the ITEMS column of the table) and 10 is the
# minimum threshold set for the support count.
rules = pyfpgrowth.generate_association_rules(patterns, 0.8)
# Rules are generated from the patterns; 0.8 is the minimum threshold set
# for confidence.
Then, we store the rules in a data frame named rules_df. The rules_df initially consists of the Antecedent, Consequent and Confidence values. Subsequently, we calculate the lift and conviction values and add them to the data frame.
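One way to build rules_df is sketched below. pyfpgrowth's `generate_association_rules` returns a dict that maps each antecedent tuple to a `(consequent tuple, confidence)` pair; the particular rules and confidence values here are made up for illustration, standing in for the real output.

```python
import pandas as pd

# Stand-in for pyfpgrowth's output: {antecedent: (consequent, confidence)}.
# The itemsets and confidences below are illustrative, not real results.
rules = {
    ("bread",): (("milk",), 0.9),
    ("butter",): (("bread", "milk"), 0.85),
}

# Flatten the dict into the rules_df described in the text, with one row
# per rule and Antecedent / Consequent / Confidence columns.
rules_df = pd.DataFrame(
    [(ante, cons, conf) for ante, (cons, conf) in rules.items()],
    columns=["Antecedent", "Consequent", "Confidence"],
)
print(rules_df)
```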
CALCULATING LIFT AND CONVICTION USING PYTHON:
No readily available library functions were found to derive lift and conviction, so the code for them is given below.
Lift(A → B) = Confidence(A → B) / Support(B)
Lift is a useful measure to determine the strength of the association between the items in a rule.
First, we calculate the support of the Consequent or R.H.S of the rules.
Calculating the support of the Consequent
Then, we divide the confidence values by the R.H.S support values.
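The two steps above (computing the support of each Consequent, then dividing the confidence by it) can be sketched as follows. The tiny `transactions` list and the single rule are assumptions for illustration; in the article's flow they would come from the earlier steps.

```python
import pandas as pd

# Assumed inputs, standing in for the earlier steps' output.
transactions = [["bread", "milk"], ["bread", "butter", "milk"], ["milk"]]
rules_df = pd.DataFrame({
    "Antecedent": [("bread",)],
    "Consequent": [("butter",)],
    "Confidence": [0.5],
})

n = len(transactions)

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(set(itemset) <= set(t) for t in transactions) / n

# Lift(A -> B) = Confidence(A -> B) / Support(B): divide each rule's
# confidence by the support of its Consequent (the R.H.S.).
rules_df["Lift"] = rules_df.apply(
    lambda r: r["Confidence"] / support(r["Consequent"]), axis=1
)
print(rules_df["Lift"].tolist())
# support(butter) = 1/3, so lift = 0.5 / (1/3) = 1.5
```

A lift above 1 means the antecedent and consequent appear together more often than if they were independent.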
Conviction: It determines which part of the association rule has the upper hand; whether the L.H.S drives the R.H.S or vice versa.
Conviction(A → B) = (1 − Support(B)) / (1 − Confidence(A → B))
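The conviction formula can be applied to rules_df in the same way as lift. This is a sketch with the same illustrative inputs as before; note that a rule with confidence 1 makes the denominator zero, which is conventionally reported as infinite conviction.

```python
import numpy as np
import pandas as pd

# Assumed inputs, standing in for the earlier steps' output.
transactions = [["bread", "milk"], ["bread", "butter", "milk"], ["milk"]]
rules_df = pd.DataFrame({
    "Antecedent": [("bread",)],
    "Consequent": [("butter",)],
    "Confidence": [0.5],
})

n = len(transactions)

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(set(itemset) <= set(t) for t in transactions) / n

# Conviction(A -> B) = (1 - Support(B)) / (1 - Confidence(A -> B)).
# A confidence of exactly 1 would divide by zero; report infinity instead.
def conviction(row):
    if row["Confidence"] == 1:
        return np.inf
    return (1 - support(row["Consequent"])) / (1 - row["Confidence"])

rules_df["Conviction"] = rules_df.apply(conviction, axis=1)
print(rules_df["Conviction"].tolist())
# (1 - 1/3) / (1 - 0.5) = 4/3
```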
Author: Pushkhalla Chandramoulli