public final class MinHash extends Object
| Modifier and Type | Class and Description |
|---|---|
| `static class` | `MinHash.HashType` |
| Modifier and Type | Method and Description |
|---|---|
| `static MinHash` | `create(int numHashes)` Creates a MinHash instance with the given number of hash functions and a linear hashing function. |
| `static MinHash` | `create(int numHashes, long seed)` Creates a MinHash instance with the given number of hash functions and a seed to be used in parallel systems. |
| `static MinHash` | `create(int numHashes, MinHash.HashType type)` Creates a MinHash instance with the given number of hash functions. |
| `static MinHash` | `create(int numHashes, MinHash.HashType type, long seed)` Creates a MinHash instance with the given number of hash functions and a seed to be used in parallel systems. |
| `Set<String>` | `createClusterKeys(int[] minHashes, int keyGroups)` Generates cluster keys from the minhashes. |
| `double` | `measureSimilarity(int[] left, int[] right)` Measures the similarity between two minhash arrays by comparing the hashes at the same index. |
| `int[]` | `minHashVector(de.jungblut.math.DoubleVector vector)` Minhashes the given vector by iterating over all non-zero items and hashing each byte in its value (as an integer). |
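
The summary above mentions a linear hashing function and a seed for parallel systems without showing what either looks like. The following is a minimal sketch of one common construction, a family of hashes of the form `(a * x + b) mod p` whose coefficients are drawn from a seeded `java.util.Random`. The class name `LinearMinHashSketch`, the prime modulus, and the `signature` method are assumptions made for illustration; they are not taken from the `MinHash` source, and the real class hashes a `DoubleVector` rather than an `int[]`.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only; not the implementation behind MinHash.create(...).
public final class LinearMinHashSketch {

    // Modulus 2^31 - 1 (a Mersenne prime); an assumption of this sketch.
    private static final long PRIME = 2147483647L;

    private final long[] a; // multipliers, one per hash function
    private final long[] b; // offsets, one per hash function

    public LinearMinHashSketch(int numHashes, long seed) {
        // The same seed always yields the same coefficients.
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // in [1, PRIME - 1]
            b[i] = rnd.nextInt(Integer.MAX_VALUE);         // in [0, PRIME - 1]
        }
    }

    // Returns, for every hash function, the minimum hash over all items.
    public int[] signature(int[] items) {
        int[] mins = new int[a.length];
        Arrays.fill(mins, Integer.MAX_VALUE);
        for (int item : items) {
            long x = Math.floorMod((long) item, PRIME); // map into [0, PRIME - 1]
            for (int i = 0; i < a.length; i++) {
                int h = (int) ((a[i] * x + b[i]) % PRIME);
                mins[i] = Math.min(mins[i], h);
            }
        }
        return mins;
    }
}
```

In this sketch, two instances built with the same seed produce identical signatures for the same items regardless of item order, which is the property that makes a shared seed useful when signatures are computed on separate workers and compared later.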
public int[] minHashVector(de.jungblut.math.DoubleVector vector)
vector - an arbitrary vector.

public double measureSimilarity(int[] left, int[] right)
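
The summary describes `measureSimilarity` as comparing the hashes at the same index. A minimal sketch of that idea follows, assuming the result is the fraction of positions at which both arrays hold the same value (the usual MinHash estimate of Jaccard similarity). The class `MinHashSimilaritySketch` and its handling of empty or unequal-length arrays are hypothetical and may differ from the library's behavior.

```java
// Illustrative sketch only; the library's measureSimilarity may differ in detail.
public final class MinHashSimilaritySketch {

    // Fraction of indices at which both signatures hold the same minimum hash.
    public static double measureSimilarity(int[] left, int[] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException("signatures must have equal length");
        }
        if (left.length == 0) {
            return 0.0; // assumption: empty signatures count as dissimilar
        }
        int matches = 0;
        for (int i = 0; i < left.length; i++) {
            if (left[i] == right[i]) {
                matches++;
            }
        }
        return (double) matches / left.length;
    }

    public static void main(String[] args) {
        int[] a = {3, 17, 42, 8};
        int[] b = {3, 99, 42, 8};
        // Three of the four positions match.
        System.out.println(measureSimilarity(a, b)); // prints 0.75
    }
}
```

Under this reading, using more hash functions lengthens the arrays and makes the estimate less noisy, at the cost of more hashing work per vector.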

public Set<String> createClusterKeys(int[] minHashes, int keyGroups)
keyGroups - the number of key groups to create; normally a single one per hash.

public static MinHash create(int numHashes)
Creates a MinHash instance with the given number of hash functions and a linear hashing function.

public static MinHash create(int numHashes, long seed)
Creates a MinHash instance with the given number of hash functions and a seed to be used in parallel systems. This method uses a linear hash function.

public static MinHash create(int numHashes, MinHash.HashType type)
Creates a MinHash instance with the given number of hash functions.

public static MinHash create(int numHashes, MinHash.HashType type, long seed)
Creates a MinHash instance with the given number of hash functions and a seed to be used in parallel systems.

Copyright © 2016. All rights reserved.