| Interface | Description |
|---|---|
| Tokenizer | Standard tokenizer interface. |

| Class | Description |
|---|---|
| BigramTokenizer | Advanced tokenizer that lowercases, adds start and end tags, deduplicates tokens, and builds bigrams. |
| DocumentSimilarity | Simple distance-measure wrapper for measuring string similarity while debugging. |
| HMM | Hidden Markov Model implementation for multiple observations that covers all three types of problems an HMM aims to solve (decoding, likelihood estimation, and unsupervised/supervised learning). |
| MarkovChain | Markov chain that can "learn" the state transition probabilities from a given input and return the probability of a given sequence of states. |
| MinHash | Linear MinHash algorithm to find near duplicates faster or to speed up nearest-neighbour searches. |
| SparseVectorDocumentMapper | Mapper that maps sparse vectors into a set of their indices so they can be used in the InvertedIndex for fast lookup. |
| StandardTokenizer | Basic tokenizer that tokenizes by certain attributes and applies normalization. |
| TokenizerUtils | Nifty text utility, mainly for tokenizing tasks. |
| VectorizerUtils | Vectorizing utility for basic tf-idf and word-count vectorizing of tokens/strings. |

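The MarkovChain entry above describes learning state transition probabilities from input and scoring a sequence of states. The following is a minimal sketch of that idea, not the library's actual class: the name `MarkovChainSketch`, the methods `train` and `probability`, and the encoding of states as integers are all assumptions made for illustration. It estimates probabilities by counting observed transitions and normalizing each row.

```java
import java.util.Arrays;

/** Hypothetical illustration only; names and signatures are not taken from the library. */
class MarkovChainSketch {

    private final int numStates;
    private final double[][] transitionProbabilities;

    MarkovChainSketch(int numStates) {
        this.numStates = numStates;
        this.transitionProbabilities = new double[numStates][numStates];
    }

    /** Estimates transition probabilities by counting transitions in the given state sequences. */
    void train(int[][] sequences) {
        double[][] counts = new double[numStates][numStates];
        for (int[] sequence : sequences) {
            for (int i = 0; i + 1 < sequence.length; i++) {
                counts[sequence[i]][sequence[i + 1]]++;
            }
        }
        for (int from = 0; from < numStates; from++) {
            double rowSum = Arrays.stream(counts[from]).sum();
            for (int to = 0; to < numStates; to++) {
                // A state never observed as a source keeps probability 0 for all targets.
                transitionProbabilities[from][to] = rowSum == 0 ? 0.0 : counts[from][to] / rowSum;
            }
        }
    }

    /** Returns the product of the transition probabilities along the sequence, given its first state. */
    double probability(int[] sequence) {
        double p = 1.0;
        for (int i = 0; i + 1 < sequence.length; i++) {
            p *= transitionProbabilities[sequence[i]][sequence[i + 1]];
        }
        return p;
    }
}
```

Under these assumptions, training on the sequences `{0, 1, 0, 1, 1}` and `{0, 0, 1}` gives transition estimates of 0.75 for 0→1 and 0.5 for 1→1, so `probability(new int[] {0, 1, 1})` evaluates to 0.375. The sketch applies no smoothing, so any sequence containing an unseen transition scores 0.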

| Enum | Description |
|---|---|
| MinHash.HashType | |

Copyright © 2016. All rights reserved.