E - the type of objects being clustered
public abstract class AbstractHierarchicalClusterer<E> extends Object implements HierarchicalClusterer<E>
An AbstractHierarchicalClusterer provides an adapter that
implements basic clustering in terms of hierarchical clustering.
The abstract method hierarchicalCluster(Set) defines hierarchical
clustering for the specified input set, returning a dendrogram.
The basic clustering method cluster(Set) is then defined
by applying a cutoff, specified as a distance, to that dendrogram.
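As a rough illustration of how a flat clustering can be derived from a dendrogram by a distance cutoff, the following sketch uses a hypothetical `Node` tree and `cut` method. These names are assumptions made for the example and are not the LingPipe `Dendrogram` API. It also assumes merge distances never decrease toward the root, as is the case for single-link and complete-link clustering.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative only: hypothetical types, not the LingPipe Dendrogram API. */
final class CutSketch {

    /** Binary merge tree; internal nodes store the distance at which their children were merged. */
    static final class Node<E> {
        final E leaf;          // element for a leaf, null for a merge node
        final Node<E> left;    // null for a leaf
        final Node<E> right;   // null for a leaf
        final double score;    // merge distance; 0.0 for a leaf

        Node(E leaf) {
            this.leaf = leaf;
            this.left = null;
            this.right = null;
            this.score = 0.0;
        }

        Node(Node<E> left, Node<E> right, double score) {
            this.leaf = null;
            this.left = left;
            this.right = right;
            this.score = score;
        }

        boolean isLeaf() {
            return left == null;
        }

        /** Adds every element under this node to the given set. */
        void collect(Set<E> out) {
            if (isLeaf()) {
                out.add(leaf);
            } else {
                left.collect(out);
                right.collect(out);
            }
        }
    }

    /** Cuts the tree so that no returned cluster was merged at a distance above maxDistance. */
    static <E> Set<Set<E>> cut(Node<E> root, double maxDistance) {
        Set<Set<E>> clusters = new LinkedHashSet<>();
        cutInto(root, maxDistance, clusters);
        return clusters;
    }

    private static <E> void cutInto(Node<E> node, double maxDistance, Set<Set<E>> out) {
        if (node.isLeaf() || node.score <= maxDistance) {
            // The whole subtree was merged within the cutoff: emit it as one cluster.
            Set<E> members = new HashSet<>();
            node.collect(members);
            out.add(members);
        } else {
            // The merge at this node exceeds the cutoff: split and recurse.
            cutInto(node.left, maxDistance, out);
            cutInto(node.right, maxDistance, out);
        }
    }

    public static void main(String[] args) {
        Node<String> ab = new Node<String>(new Node<String>("a"), new Node<String>("b"), 1.0);
        Node<String> root = new Node<String>(ab, new Node<String>("c"), 5.0);
        System.out.println(cut(root, 2.0).size());                       // 2: {a, b} and {c}
        System.out.println(cut(root, Double.POSITIVE_INFINITY).size());  // 1: {a, b, c}
    }
}
```

With an infinite cutoff every merge is accepted, so the sketch returns a single cluster containing all elements. With a cutoff below the smallest merge distance it returns singletons.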
Distance measures between elements provide measures of
dissimilarity: the larger the distance, the more dissimilar
the elements. A value of zero indicates perfect similarity, and
larger values indicate less similarity. Closer objects are
clustered more readily. A typical distance metric is Euclidean
distance between vector objects. Other Minkowski metrics are also
common, such as the Manhattan metric, which reduces to Hamming
distance for binary vectors. Edit distance, as implemented in the
com.aliasi.spell package, is another popular dissimilarity
metric for text.
Two texts, text1 and text2, may be compared by sample
cross-entropy. If Mi is the result of training a language model
on texti (for i = 1, 2), then a symmetric measure of
dissimilarity is
M1.crossEntropy(text2) + M2.crossEntropy(text1).
The average, minimum, or maximum of the two cross-entropies may
be used instead of the sum.
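To make the relationship between these vector metrics concrete, here is a minimal sketch of Euclidean, Manhattan, and Hamming distance over `double[]` vectors. The class `MinkowskiSketch` and its methods are hypothetical helpers written for this example; they do not implement the LingPipe `Distance` interface.

```java
/** Illustrative only: hypothetical helper, not part of LingPipe. */
final class MinkowskiSketch {

    /** Euclidean distance (Minkowski with p = 2). */
    static double euclidean(double[] x, double[] y) {
        checkLengths(x, y);
        double sum = 0.0;
        for (int i = 0; i < x.length; ++i) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Manhattan distance (Minkowski with p = 1). */
    static double manhattan(double[] x, double[] y) {
        checkLengths(x, y);
        double sum = 0.0;
        for (int i = 0; i < x.length; ++i) {
            sum += Math.abs(x[i] - y[i]);
        }
        return sum;
    }

    /** Hamming distance: the number of positions at which the vectors differ. */
    static int hamming(double[] x, double[] y) {
        checkLengths(x, y);
        int count = 0;
        for (int i = 0; i < x.length; ++i) {
            if (x[i] != y[i]) {
                ++count;
            }
        }
        return count;
    }

    private static void checkLengths(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Vectors must have the same length.");
        }
    }

    public static void main(String[] args) {
        // Binary vectors that differ in positions 0, 3 and 4.
        double[] a = {1, 0, 1, 1, 0};
        double[] b = {0, 0, 1, 0, 1};
        System.out.println(manhattan(a, b)); // 3.0
        System.out.println(hamming(a, b));   // 3
        System.out.println(euclidean(a, b)); // square root of 3
    }
}
```

For vectors whose entries are all 0 or 1, each differing position contributes exactly 1 to the Manhattan sum, which is why the Manhattan and Hamming values coincide.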
| Constructor and Description |
|---|
| AbstractHierarchicalClusterer(double maxDistance, Distance<? super E> distance): Construct an abstract hierarchical clusterer with the specified maximum distance. |
| Modifier and Type | Method and Description |
|---|---|
| Set<Set<E>> | cluster(Set<? extends E> elements): Returns the clustering of the specified elements. |
| Distance<? super E> | distance(): Returns the distance function for this hierarchical clusterer. |
| double | getMaxDistance(): Returns the maximum distance for clusters in a dendrogram. |
| abstract Dendrogram<E> | hierarchicalCluster(Set<? extends E> elements): Returns the dendrogram derived from performing clustering with this class's specified maximum distance. |
| void | setMaxDistance(double maxDistance): Sets the maximum distance at which two clusters may be merged. |
public AbstractHierarchicalClusterer(double maxDistance,
Distance<? super E> distance)
maxDistance - Maximum distance between clusters that can
be linked.
IllegalArgumentException - If the specified maximum distance is not
a non-negative number.
public Distance<? super E> distance()
public abstract Dendrogram<E> hierarchicalCluster(Set<? extends E> elements)
A maximum distance of Double.POSITIVE_INFINITY should result in a complete
clustering.
hierarchicalCluster in interface HierarchicalClusterer<E>
elements - Set of objects to cluster.
public Set<Set<E>> cluster(Set<? extends E> elements)
public double getMaxDistance()
public final void setMaxDistance(double maxDistance)
maxDistance - New value for maximum distance.
Copyright © 2016 Alias-i, Inc. All rights reserved.