E - the type of object being clusteredpublic class SingleLinkClusterer<E> extends AbstractHierarchicalClusterer<E>
SingleLinkClusterer implements standard single-link
agglomerative clustering. Single link clustering is a greedy
algorithm in which the two closest clusters are always merged
up to a specified distance threshold. Distance between clustes for
single link clustering is defined to be the minimum of the distances
between the members of the clusters. See CompleteLinkClusterer
for a clusterer that takes the maximum rather than the minimum in
making clustering decisions.
For example, consider the following proximity matrix
representing distances between pairs of elements (the same example
is used in CompleteLinkClusterer):
The result of single-link clustering is the following dendrogram (pardon the ASCII art):
A B C D E A 0 1 2 7 5 B 0 3 8 6 C 0 5 9 D 0 4 E 0
A B C D E
| | | | |
1 ----- | | |
2 ------- | |
3 | | |
4 | -----
5 ----------
First, the objects A and B are merged
at distance one. Then object C is merged into the
cluster at distance 2, because that is its distance from
A. Objects D and E are
merged next at distance 4, and finally, the two big clusters are
merged at distance 5, the distance between A and
E.
The various clusters at each proximity bound threshold are:
The intervals show the clusters returned for thresholds within the specified interval. As usual, square brackets denote inclusive range bounds and round brackets exclusive bounds. For instance, if the distance threshold is 1.5, four clusters are returned, whereas if the threshold is 4.0, two clusters are returned.
Threshold Range Clusters [Double.NEGATIVE_INFINITY,1) {A}, {B}, {C}, {D}, {E} [1,2) {A,B}, {C}, {D}, {E} [2,4) {A,B,C}, {D}, {E} [4,5) {A,B,C}, {D,E} [5,Double.POSITIVE_INFINITY] {A,B,C,D,E}
Note that this example has the same clusters as that in Implementation Note: This clusterer is implemented using
the minimum-spanning-tree-based algorithm described in:
In brief, the algorithm sorts all pairs of objects in order of
increasing distance (with CompleteLinkClusterer. A simple example where the results are
different is the result of clustering four points on the x-axis,
x1=1, x2=3,
x3=6, x4=10.
The distance is computed by subtraction, e.g. d(x2,
x4) = x4- x2.
This leads to the following distance matrix:
For this example, the complete link clustering is
x1 x2 x3 x4 x1 0 2 5 9 x2 0 3 7 x3 0 4 x4 0 {{x1,x2},{x3,x4}}
whereas the single-link clustering is:
{{{x1,x2},x3},x4}
Jain, Anil K. and Richard C. Dubes. 1988. Algorithms for Clustering
Data. Prentice-Hall.
The minimum spanning tree is constructed using Kruskal's algorithm,
which is described in:
Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.
2001.
Introduction to Algorithms. MIT Press.
This algorithm requires an efficient lookup of whether two
nodes are already in the same connected component, which
is implemented by the standard disjoint set algorithm with
path compression (also described in Cormen et al.).
n objects that involves
sorting n2 pairs of objects,
leading to an initial step with worst case time complexity bound
O(n2 log n). The pairs are then
merged in order of decreasing distance. The sets involved in
Kruskal's algorithm translate directly into dendrograms with their
dereferencing.
| Constructor and Description |
|---|
SingleLinkClusterer(Distance<? super E> distance)
Construct a single-link clusterer with no distance bound.
|
SingleLinkClusterer(double maxDistance,
Distance<? super E> distance)
Construct a single-link clusterer with the specified distance
bound.
|
| Modifier and Type | Method and Description |
|---|---|
Dendrogram<E> |
hierarchicalCluster(Set<? extends E> elementSet)
Return the array of clusters derived from the specified
distance matrix by performing single-link clustering up to the
specified maximum distance bound.
|
cluster, distance, getMaxDistance, setMaxDistancepublic SingleLinkClusterer(double maxDistance,
Distance<? super E> distance)
maxDistance - Maximum distance for clusters.distance - Distance measure between objects to cluster.IllegalArgumentException - If the specified bound is not
a non-negative number.public SingleLinkClusterer(Distance<? super E> distance)
Double.POSITIVE_INFINITY,
which effectively removes the distance bound.distance - Distance measure between objects to cluster.public Dendrogram<E> hierarchicalCluster(Set<? extends E> elementSet)
Double.POSITIVE_INFINITY results in a complete
clustering.hierarchicalCluster in interface HierarchicalClusterer<E>hierarchicalCluster in class AbstractHierarchicalClusterer<E>elementSet - Set of elements to cluster.IllegalArgumentException - If the set of elements is empty.Copyright © 2016 Alias-i, Inc.. All rights reserved.