E - the type of objects being clusteredpublic class CompleteLinkClusterer<E> extends AbstractHierarchicalClusterer<E>
CompleteLinkClusterer implements complete link
agglomerative clustering. Complete link clustering is a greedy
algorithm in which the two closest clusters are always merged up to
a specified distance threshold. Distance between clusters for
complete link clustering is defined be the maximum of the distances
between the members of the clusters. See SingleLinkClusterer for a clusterer that takes the minimum rather
than the maximum in making clustering decisions.
For example, consider the following distance matrix (which is
also analyzed by way of example in SingleLinkClusterer):
The result of complete-link clustering is the following dendrogram:
A B C D E A 0 1 2 7 5 B 0 3 8 6 C 0 5 9 D 0 4 E 0
A B C D E
| | | | |
1 ----- | | |
2 | | | |
3 ------- | |
4 | -----
5 | |
6 | |
7 | |
8 | |
9 ----------
First, the objects A and B are merged at
distance one. At distance 3, the object C is merged
with the cluster {A,B} because
max(distance(C,A),distance(C,B))=3, and that is
smaller than any other pair of distances. Next, D
and E are merged at distance 4, because that is the
distance between them, and they are both further than 4 away from
the cluster {A,B,C}. This continues with one more
step, which happens at distance 9, the distance between C
and E, which is the maximum distance between pairs drawn from
{A,B,C} and {D,E}.
The various clusters at each distance bound threshold are:
The intervals show the clusters returned for thresholds within the specified interval. As usual, square brackets denote inclusive range bounds and round brackets exclusive bounds. Although this example has the same single-link clustering, see the class documentation in
Threshold Range Clusters [Double.NEGATIVE_INFINITY,1) {A}, {B}, {C}, {D}, {E} [1,3) {A,B}, {C}, {D}, {E} [3,4) {A,B,C}, {D}, {E} [4,9) {A,B,C}, {D,E} [9,Double.POSITIVE_INFINITY] {A,B,C,D,E}
SingleLinkClusterer for an example that does not.
Note that results may not be well-defined in the case of ties between proximities. If there are ties, there is no guarantee as to how this implementation will break the tie.
Implementation Note: This algorithm requires worst-case
O(n3) running time to cluster
n elements. The initialization loop considers all pairs
of elements, thus requiring n2
running time. The main loop then walks over these pairs, requiring
up to n steps each to compute new distances. If the
initial array of pairs is heavily pruned, the outer loop is smaller
and the updates are actually smaller, too.
| Constructor and Description |
|---|
CompleteLinkClusterer(Distance<? super E> distance)
Construct a complete link clusterer with no distance bound.
|
CompleteLinkClusterer(double maxDistance,
Distance<? super E> distance)
Construct a complete link clusterer with the specified distance
bound.
|
| Modifier and Type | Method and Description |
|---|---|
Dendrogram<E> |
hierarchicalCluster(Set<? extends E> elementSet)
Returns the array of clusters derived from performing
clustering with this class's specified maximum distance.
|
cluster, distance, getMaxDistance, setMaxDistancepublic CompleteLinkClusterer(double maxDistance,
Distance<? super E> distance)
maxDistance - Maximum distance between pairs of clusters
to allow clustering.distance - Distance measure to use among instances.public CompleteLinkClusterer(Distance<? super E> distance)
Double.POSITIVE_INFINITY as the
bound value, which effectively makes clustering unbounded.public Dendrogram<E> hierarchicalCluster(Set<? extends E> elementSet)
AbstractHierarchicalClustererDouble.POSITIVE_INFINITY should result in a complete
clustering.hierarchicalCluster in interface HierarchicalClusterer<E>hierarchicalCluster in class AbstractHierarchicalClusterer<E>elementSet - Set of objects to cluster.Copyright © 2016 Alias-i, Inc.. All rights reserved.