Implementing FP-Growth in Python

Like Apriori, the FP-Growth (Frequent Pattern Growth) algorithm helps us perform Market Basket Analysis on transaction data. FP-Growth is preferred over Apriori because Apriori takes more execution time, as it repeatedly scans the transaction dataset to mine the frequent itemsets.

FP-Growth builds a compact tree structure (the FP-tree) and uses it for mining frequent itemsets and generating rules. Given below is a Python implementation of FP-Growth; a Jupyter notebook is used for the work.

pyfpgrowth is a Python package that implements FP-Growth. To install it, go to your command prompt and type the following,

pip install pyfpgrowth

Step 1:

import pandas as pd
import numpy as np

import pyfpgrowth

Import pandas and numpy for data cleaning and preprocessing purposes.

Step 2:
Read your transaction dataset,

df = pd.read_csv("transaction_data.csv")

A data frame df is created and the transaction data is stored in this data frame.

Step 3:
Do the necessary data cleaning and preprocessing. Before feeding the transaction data into the algorithm, bring it to the format pyfpgrowth expects: a list of transactions, where each transaction is the list of items bought together (the ITEMS column of the table).
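One way this preprocessing can be sketched, assuming the raw data arrives in long format with one row per (transaction, item) pair; the column names TID and ITEM and the sample items are illustrative, not taken from the original dataset:

```python
import pandas as pd

# Hypothetical long-format data: one row per (transaction id, item) pair.
# Column names TID/ITEM and the items themselves are illustrative only.
df = pd.DataFrame({
    "TID":  [1, 1, 2, 2, 2, 3],
    "ITEM": ["bread", "milk", "bread", "butter", "milk", "milk"],
})

# Collapse the rows into one list of items per transaction id --
# the list-of-lists shape that pyfpgrowth's functions expect.
transactions = df.groupby("TID")["ITEM"].apply(list).tolist()
# transactions is now [['bread', 'milk'], ['bread', 'butter', 'milk'], ['milk']]
```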

Step 4: Generating the frequent patterns and the rules

patterns = pyfpgrowth.find_frequent_patterns(transactions, 10)

Patterns are generated based on the parameters passed to find_frequent_patterns(), where transactions is the list of items bought in each transaction (refer to the ITEMS column of the table) and 10 is the minimum threshold set for the support count.

rules = pyfpgrowth.generate_association_rules(patterns, 0.8)

Rules are generated based on the patterns, and 0.8 is the minimum threshold set for confidence. We then store the rules in a data frame named rules_df, which initially consists of the Antecedent, the Consequent and the Confidence value. Subsequently, we calculate the lift and conviction values and add them to the data frame.
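For context, pyfpgrowth returns the rules as a dictionary that maps each antecedent tuple to a (consequent, confidence) pair. Building rules_df from that dictionary might look like the sketch below; the two rules are invented placeholders, not real output:

```python
import pandas as pd

# Shape of pyfpgrowth.generate_association_rules() output:
# {antecedent_tuple: (consequent_tuple, confidence)}.
# The values below are invented placeholders for illustration.
rules = {
    ("bread",):  (("milk",), 1.0),
    ("butter",): (("bread", "milk"), 0.85),
}

# Flatten the dictionary into the Antecedent/Consequent/Confidence columns.
rules_df = pd.DataFrame(
    [(ante, cons, conf) for ante, (cons, conf) in rules.items()],
    columns=["Antecedent", "Consequent", "Confidence"],
)
```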


No readily available library function was found to derive lift and conviction, so the code for them is given below.


Lift is a useful measure to determine the strength of the association between the items in a rule.

First, we calculate the support of the Consequent (the R.H.S) of each rule. Then, we divide the confidence value by the R.H.S support value:

Lift(A → B) = Confidence(A → B) / Support(B)
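A sketch of this lift computation, with a small support() helper; the toy transactions and the rule values stand in for the real rules_df and are illustrative assumptions:

```python
import pandas as pd

# Toy data standing in for the real transactions and rules_df.
transactions = [["bread", "milk"], ["bread", "butter", "milk"], ["milk"]]
n = len(transactions)

rules_df = pd.DataFrame({
    "Antecedent": [("bread",), ("butter",)],
    "Consequent": [("milk",), ("bread",)],
    "Confidence": [1.0, 1.0],
})

def support(itemset):
    # Fraction of transactions that contain every item of the itemset.
    return sum(all(i in t for i in itemset) for t in transactions) / n

# Lift(A -> B) = Confidence(A -> B) / Support(B)
rules_df["Lift"] = rules_df.apply(
    lambda row: row["Confidence"] / support(row["Consequent"]), axis=1
)
```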

Conviction: It measures how strongly the L.H.S drives the R.H.S of a rule, i.e., how much more often the Antecedent would occur without the Consequent if the two sides were independent. A conviction of 1 means the two sides are independent; higher values indicate a stronger rule.

Conviction(A → B) = (1 − Support(B)) / (1 − Confidence(A → B))
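This formula can be sketched as a small helper; the function name conviction and the guard for a confidence of 1 (which standard references treat as infinite conviction) are my assumptions:

```python
import math

def conviction(support_b, confidence):
    # Conviction(A -> B) = (1 - Support(B)) / (1 - Confidence(A -> B)).
    # A confidence of 1 makes the denominator zero: the rule is never
    # wrong, which is conventionally reported as infinite conviction.
    if confidence == 1:
        return math.inf
    return (1 - support_b) / (1 - confidence)

# e.g. Support(B) = 0.4 and Confidence(A -> B) = 0.8 gives 0.6 / 0.2 = 3
value = conviction(0.4, 0.8)
```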


Author: Pushkhalla Chandramoulli

