Introduction
Apriori is not a clustering algorithm; it is an association rule mining algorithm. It finds itemsets that frequently appear together in transaction data and derives if-then rules from them.
Classic use case: market-basket analysis.
Customers who buy bread and peanut butter also often buy jam.
Frequent Itemsets
An itemset is a set of items, such as:
{bread, peanut butter}
Support measures how often an itemset appears:
$$ support(A) = \frac{\text{number of transactions containing } A}{\text{total number of transactions}} $$
Apriori finds all itemsets whose support meets or exceeds a user-chosen minimum threshold. These are called frequent itemsets.
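As a minimal sketch of the support formula, the snippet below counts how many transactions contain a given itemset. The `transactions` list and the `support` helper are made up for illustration and do not come from any particular library.

```python
# Toy transactions invented for illustration (an assumption, not real data).
transactions = [
    {"bread", "peanut butter", "jam"},
    {"bread", "peanut butter"},
    {"bread", "milk"},
    {"milk", "jam"},
    {"bread", "peanut butter", "jam", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    # `itemset <= t` is Python's subset test for sets.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


# {bread, peanut butter} appears in 3 of the 5 transactions.
print(support({"bread", "peanut butter"}, transactions))  # 0.6
```

With a minimum support of 0.5, this itemset would count as frequent in the toy data.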
Apriori Principle
The key property:
If an itemset is frequent, all of its subsets must also be frequent.
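This lets the algorithm prune the search space. By the contrapositive, if an itemset is not frequent, no superset of it can be frequent, so those supersets never need to be counted. For example, if {bread, jam} is not frequent, then {bread, jam, milk} cannot be frequent.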
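The pruning idea can be sketched as a level-wise search: build candidate k-itemsets from the frequent (k-1)-itemsets, discard any candidate with an infrequent subset, and only then count support. This is a simplified illustration that assumes small in-memory data; the function name `apriori`, its parameters, and the toy transactions are hypothetical, and real implementations use a more efficient join step.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = keep_frequent(frozenset([item]) for item in items)
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Toy transactions invented for illustration (an assumption, not real data).
transactions = [
    {"bread", "peanut butter", "jam"},
    {"bread", "peanut butter"},
    {"bread", "milk"},
    {"milk", "jam"},
    {"bread", "peanut butter", "jam", "milk"},
]

result = apriori(transactions, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

In this toy data {peanut butter, milk} falls below the threshold, so {bread, peanut butter, milk} is discarded in the prune step without its support ever being counted.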
Association Rules
A rule has the form:
A -> B
Here A (the antecedent) and B (the consequent) are disjoint itemsets. Confidence measures the fraction of transactions containing A that also contain B:
$$ confidence(A \to B) = \frac{support(A \cup B)}{support(A)} $$
Lift compares the rule to chance:
$$ lift(A \to B) = \frac{confidence(A \to B)}{support(B)} $$
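Lift greater than 1 suggests A and B appear together more often than expected if they were independent. Lift close to 1 suggests no association, and lift below 1 suggests they appear together less often than expected.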
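The two formulas translate directly into code. The sketch below evaluates the rule {bread, peanut butter} -> {jam} on toy data; the transactions and the helper names `support`, `confidence`, and `lift` are assumptions made for illustration.

```python
# Toy transactions invented for illustration (an assumption, not real data).
transactions = [
    {"bread", "peanut butter", "jam"},
    {"bread", "peanut butter"},
    {"bread", "milk"},
    {"milk", "jam"},
    {"bread", "peanut butter", "jam", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(a, b, transactions):
    """support(A ∪ B) / support(A); assumes A occurs at least once."""
    return support(set(a) | set(b), transactions) / support(a, transactions)


def lift(a, b, transactions):
    """confidence(A -> B) / support(B); assumes B occurs at least once."""
    return confidence(a, b, transactions) / support(b, transactions)


a, b = {"bread", "peanut butter"}, {"jam"}
print(round(confidence(a, b, transactions), 3))  # 0.667
print(round(lift(a, b, transactions), 3))        # 1.111
```

Here the lift is slightly above 1, so in this toy data jam shows up a little more often alongside bread and peanut butter than its overall frequency alone would predict.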
Practical Use
Apriori is useful for:
- Basket analysis.
- Product bundling.
- Recommendation rules.
- Event co-occurrence.
- Pattern discovery in transaction data.
Watch out:
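- A support threshold set too low produces too many itemsets and rules, and slows the search.
- High confidence can be misleading when the consequent B is very common: the rule scores high simply because B appears in most transactions. Check lift as well.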
- Rules show association, not causation.
- Business usefulness matters more than rule count.
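To see how high confidence can mislead, consider a short worked example with hypothetical numbers chosen only for illustration. Suppose support(A) = 0.2, support(B) = 0.9, and support(A ∪ B) = 0.16. Then:

$$ confidence(A \to B) = \frac{support(A \cup B)}{support(A)} = \frac{0.16}{0.2} = 0.8 $$

$$ lift(A \to B) = \frac{confidence(A \to B)}{support(B)} = \frac{0.8}{0.9} \approx 0.89 $$

A confidence of 0.8 looks strong, but the lift is below 1: under these assumed numbers, B is actually less likely in transactions that contain A than in transactions overall.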
Closing
Apriori is a clear method for discovering frequent co-occurrence patterns. It is best used as exploratory analysis or as a simple rule-based recommendation ingredient.