Clustering: Affinity Propagation
1. Introduction
Developed in 2007, Affinity Propagation is a centroid-based clustering algorithm similar to k-Means or k-Medoids.
Affinity Propagation finds "exemplars", i.e. members of the input set that are representative of clusters.
It uses a graph-based approach to let points 'vote' on their preferred 'exemplar'. The end result is a set of cluster exemplars, from which we derive clusters by essentially doing what k-Means does: assigning each point to the cluster of its nearest exemplar.
We need to calculate the following matrices:
Here we must specify the notation: $i$ = row, $k$ = column.
- Similarity matrix $S$
- Responsibility matrix $R$
- Availability matrix $A$
- Criterion matrix $C$
Similarity matrix
- Rationale: the similarity matrix carries information about the similarity between any two instances. For an element $i$ we look for another element $k$ for which $s(i, k)$ is the highest (least negative).
- Barring those on the diagonal, every cell in the similarity matrix is the negative sum of squared differences between the two points: $s(i, k) = -\lVert x_i - x_k \rVert^2$.
- Note that the diagonal values will not just be 0: they are set to the most negative value in the matrix (the minimum off-diagonal similarity), to exclude the case where $i$ picks $i$ itself.
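As a rough sketch (the helper name `similarity_matrix` and the use of NumPy are my own choices, not from the original post), the similarity matrix described above, including the diagonal rule, can be computed as follows:

```python
import numpy as np

def similarity_matrix(X):
    """s(i, k) = -||x_i - x_k||^2, with the diagonal set to the most
    negative value in the matrix so that no point prefers itself."""
    diff = X[:, None, :] - X[None, :, :]
    S = -np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(S, S.min())   # S.min() is evaluated before the diagonal changes
    return S
```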
Responsibility matrix
- Rationale: $r(i, k)$ quantifies how well-suited element $k$ is to be an exemplar for element $i$, taking into account the nearest contender $k'$ to be an exemplar for $i$. We initialize the $R$ matrix with zeros.
- Then every cell in the responsibility matrix is calculated using the following update:
$$r(i, k) \leftarrow s(i, k) - \max_{k' \ne k} \{ a(i, k') + s(i, k') \}$$
- Interpretation: $r(i, k)$ can be thought of as the relative similarity between $i$ and $k$. It quantifies how similar $i$ is to $k$, compared to some $k'$, taking into account the availability of $k'$. The responsibility of $k$ towards $i$ decreases as the availability of some other $k'$ to $i$ increases.
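Below is a minimal vectorized sketch of this responsibility update (function and variable names are illustrative, not from the post):

```python
import numpy as np

def update_responsibility(S, A):
    """r(i, k) = s(i, k) - max_{k' != k} [ a(i, k') + s(i, k') ]"""
    AS = A + S
    n = S.shape[0]
    rows = np.arange(n)
    # Largest and second-largest value of a(i, k') + s(i, k') in each row
    best_k = np.argmax(AS, axis=1)
    best_val = AS[rows, best_k]
    AS_wo_best = AS.copy()
    AS_wo_best[rows, best_k] = -np.inf
    second_val = AS_wo_best.max(axis=1)
    # Where column k is the row's argmax, the strongest competitor is the runner-up
    competitor = np.where(np.arange(n)[None, :] == best_k[:, None],
                          second_val[:, None], best_val[:, None])
    return S - competitor
```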
Availability matrix
- Rationale: it quantifies how appropriate it is for $i$ to choose $k$ as its exemplar, taking into account the support from other elements that $k$ should be an exemplar.
- The availability formula for different instances ($i \ne k$) is
$$a(i, k) \leftarrow \min\left\{ 0, \; r(k, k) + \sum_{i' \notin \{i, k\}} \max\{0, r(i', k)\} \right\}$$
- The self-availability is
$$a(k, k) \leftarrow \sum_{i' \ne k} \max\{0, r(i', k)\}$$
- Interpretation of the formulas:
- Availability is the self-responsibility of $k$ plus the positive responsibilities of $k$ towards elements other than $i$.
- We include only positive responsibilities, as an exemplar should explain at least some data points well (be positively responsible for them), regardless of how poorly it explains other data points.
- If the self-responsibility $r(k, k)$ is negative, it means that $k$ is more suitable to belong to another exemplar rather than to be an exemplar itself. The maximum value of $a(i, k)$ is 0.
- The self-availability $a(k, k)$ reflects accumulated evidence that point $k$ is suitable to be an exemplar, based on the positive responsibilities of $k$ towards other elements.
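A matching NumPy sketch of the two availability updates (again, the names are my own):

```python
import numpy as np

def update_availability(R):
    """a(i, k) = min(0, r(k, k) + sum_{i' not in {i, k}} max(0, r(i', k)))  for i != k
       a(k, k) = sum_{i' != k} max(0, r(i', k))"""
    Rp = np.maximum(R, 0)            # only positive responsibilities count
    np.fill_diagonal(Rp, 0)          # r(k, k) is added separately below
    col_sums = Rp.sum(axis=0)        # sum_{i' != k} max(0, r(i', k))
    # Off-diagonal cells: remove row i's own contribution from the column sum
    A = np.minimum(0, R.diagonal()[None, :] + col_sums[None, :] - Rp)
    # Self-availability overwrites the diagonal
    np.fill_diagonal(A, col_sums)
    return A
```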
The $R$ and $A$ matrices are iteratively updated. This procedure may be terminated after a fixed number of iterations, after changes in the values fall below a threshold, or after the values stay constant for some number of iterations.
Criterion matrix
- The criterion matrix is calculated after the updating is terminated.
- The criterion matrix $C$ is the sum of $R$ and $A$: $c(i, k) = r(i, k) + a(i, k)$.
- An element $i$ will be assigned to an exemplar $k$ which is not only highly responsible but also highly available to $i$.
- The column with the highest criterion value in each row is designated as that row's exemplar. Rows that share the same exemplar are in the same cluster.
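As a sketch of this final step (the helper name is hypothetical):

```python
import numpy as np

def extract_clusters(R, A):
    """Criterion matrix C = R + A; the argmax of each row is that point's
    exemplar, and points sharing an exemplar form one cluster."""
    C = R + A
    exemplars = np.argmax(C, axis=1)                  # exemplar index chosen by each point
    exemplar_ids, labels = np.unique(exemplars, return_inverse=True)
    return exemplar_ids, labels                        # distinct exemplars, cluster label per point
```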
Sample run
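The original post's sample run is not reproduced here; as a stand-in, the sketch below wires together the helper functions from the previous sections on a small synthetic dataset (the points, damping value, and iteration count are all arbitrary assumptions):

```python
import numpy as np

# Two loose groups of 2-D points (made-up data)
X = np.array([[1.0, 1.2], [1.1, 0.9], [0.9, 1.0],
              [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]])

S = similarity_matrix(X)
R = np.zeros_like(S)
A = np.zeros_like(S)

damping = 0.5                 # damping is standard practice to avoid oscillations
for _ in range(100):          # simple stopping rule: fixed number of iterations
    R = damping * R + (1 - damping) * update_responsibility(S, A)
    A = damping * A + (1 - damping) * update_availability(R)

exemplar_ids, labels = extract_clusters(R, A)
print("Exemplar indices:", exemplar_ids)
print("Cluster labels:  ", labels)
```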
2. Pros & Cons
Pros
- Does not require specifying the number of clusters in advance
- Allows for non-metric dissimilarities (i.e. we can have dissimilarities that don't obey the triangle inequality, or aren't symmetric)
- Provides better stability across runs than methods like k-Means that depend on random initialization
Cons
- Similar issue to k-Means: susceptible to outliers
- Affinity Propagation tends to be very slow: the message passing is roughly $O(N^2 T)$ in time for $N$ points and $T$ iterations, and it needs $O(N^2)$ memory for the full similarity matrix. In practice, running it on large datasets is essentially impossible without a carefully crafted and optimized implementation.
3. Code Implementation
```python
from sklearn.cluster import AffinityPropagation
```
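A minimal usage sketch with scikit-learn (the synthetic data and the chosen parameter values are illustrative assumptions, not from the original post):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Made-up 2-D data: two loose groups
X = np.array([[1.0, 1.2], [1.1, 0.9], [0.9, 1.0],
              [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]])

# damping controls how much of the previous messages is kept each iteration;
# preference (left at its default, the median similarity) influences how many
# exemplars emerge.
model = AffinityPropagation(damping=0.5, random_state=0).fit(X)

print("Exemplar indices:", model.cluster_centers_indices_)
print("Cluster labels:  ", model.labels_)
```

Note that the number of clusters is not specified anywhere; it falls out of the preference value and the message passing itself.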