Clustering: Apriori

Association Rule realized via inference

2020.02.11 · 3 min read · by Zhenlin Wang · updated 2021-09-20

Association Rule

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It aims to identify strong rules in the data using measures of interestingness. For example, we may want to find a one-to-one association rule between product categories: product category 1 -> product category 2.

This is often used for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. Because the data contain no known associations to begin with, this is an unsupervised learning problem; the discovered rules can support marketing activities such as promotional pricing or product placement. In contrast with sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions. [Wikipedia]

Evaluation Metrics

  1. Support
  2. Confidence
  3. Lift
  4. Conviction
  5. Leverage
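
The post lists these metrics without defining them, so here is a minimal sketch that computes all five for a single rule using the commonly used textbook definitions. The toy `transactions` data and the helper names `support` and `rule_metrics` are assumptions made for illustration; they do not come from the original post or from any particular library.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Metrics for the rule antecedent -> consequent (standard definitions)."""
    a, c = set(antecedent), set(consequent)
    s_a = support(a, transactions)
    s_c = support(c, transactions)
    s_ac = support(a | c, transactions)
    if s_a == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    confidence = s_ac / s_a
    # Conviction diverges when the rule never fails (confidence == 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - s_c) / (1 - confidence)
    return {
        "support": s_ac,                  # P(A and C)
        "confidence": confidence,         # P(C | A)
        "lift": confidence / s_c,         # P(C | A) / P(C)
        "leverage": s_ac - s_a * s_c,     # P(A and C) - P(A) * P(C)
        "conviction": conviction,         # (1 - P(C)) / (1 - confidence)
    }


metrics = rule_metrics({"bread"}, {"milk"}, transactions)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On this toy data the rule bread -> milk has a lift just below 1 and a slightly negative leverage, meaning the two items co-occur a little less often than they would if they were independent.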

Apriori Property

All subsets of a frequent itemset must be frequent (the Apriori property). Equivalently, if an itemset is infrequent, all of its supersets are infrequent as well.
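
The property follows from the fact that adding items to an itemset can never increase its support. The sketch below checks this on a small example; the `transactions` data, the 0.5 threshold, and the helper names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def all_subsets_frequent(itemset, transactions, min_support):
    """True if every non-empty proper subset of `itemset` meets min_support."""
    items = sorted(itemset)
    return all(
        support(subset, transactions) >= min_support
        for k in range(1, len(items))
        for subset in combinations(items, k)
    )


min_support = 0.5  # assumed threshold

# {bread, milk} is frequent, so each of its subsets must be frequent too.
frequent_pair = {"bread", "milk"}
print(support(frequent_pair, transactions) >= min_support)
print(all_subsets_frequent(frequent_pair, transactions, min_support))

# {butter} is infrequent, so any superset such as {bread, butter}
# cannot have higher support and is infrequent as well.
print(support({"butter"}, transactions) < min_support)
print(support({"bread", "butter"}, transactions) <= support({"butter"}, transactions))
```

This is what lets the algorithm skip candidates: once an itemset falls below the threshold, none of its supersets needs to be counted.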

Applying the apriori property, we get the following algorithm.

Algorithm

  1. Compute the support of every itemset containing a single item (One Itemsets).
  2. Using a pre-defined support threshold, identify the One Itemsets worth exploring.
  3. From the shortlisted One Itemsets that are above the support threshold, generate itemsets containing two items (Two Itemsets).
  4. Using the same pre-defined support threshold, identify the Two Itemsets worth exploring.
  5. For each shortlisted Two Itemset, generate the association rules between its two items.
  6. Compute the confidence of each association rule.
  7. Using a pre-defined confidence threshold, shortlist the association rules.
  8. Compute the lift of each shortlisted association rule.
  9. Only association rules with lift > 1 are considered meaningful associations.
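
The nine steps above can be sketched in plain Python as follows. This is a minimal illustration limited to rules between two single items, as in the steps listed; the function name `apriori_pair_rules`, the toy data, and the threshold values are assumptions for the example, not the author's implementation.

```python
from itertools import combinations


def apriori_pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, confidence, lift) for 1 -> 1 rules."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Steps 1-2: support of single items, keep those meeting the threshold.
    items = sorted({item for t in transactions for item in t})
    frequent_1 = [i for i in items if support({i}) >= min_support]

    # Steps 3-4: build pairs only from frequent items (Apriori property),
    # then keep the pairs meeting the same support threshold.
    frequent_2 = [
        pair for pair in combinations(frequent_1, 2)
        if support(set(pair)) >= min_support
    ]

    rules = []
    for a, b in frequent_2:
        # Step 5: each frequent pair yields two candidate rules.
        for x, y in ((a, b), (b, a)):
            # Steps 6-7: confidence, filtered by the confidence threshold.
            confidence = support({x, y}) / support({x})
            if confidence < min_confidence:
                continue
            # Steps 8-9: lift, keeping only rules with lift > 1.
            lift = confidence / support({y})
            if lift > 1:
                rules.append((x, y, confidence, lift))
    return rules


# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

for x, y, confidence, lift in apriori_pair_rules(transactions, 0.4, 0.7):
    print(f"{x} -> {y}: confidence={confidence:.2f}, lift={lift:.2f}")
```

With these assumed thresholds, only bread and butter form a frequent pair, so the sketch reports the two rules bread -> butter and butter -> bread. A full Apriori implementation would keep extending frequent itemsets to three or more items in the same level-by-level way.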