public class NGramModel extends Object implements Iterable<StringList>
NGramModel can be used to create ngrams and character ngrams.
See Also: StringList

| Constructor and Description |
|---|
| NGramModel()<br>Initializes an empty instance. |
| NGramModel(InputStream in)<br>Initializes the current instance from the given InputStream. |

| Modifier and Type | Method and Description |
|---|---|
| void | add(String chars, int minLength, int maxLength)<br>Adds character NGrams to the current instance. |
| void | add(StringList ngram)<br>Adds one NGram; if it already exists, its count is increased by one. |
| void | add(StringList ngram, int minLength, int maxLength)<br>Adds NGrams up to the specified length to the current instance. |
| boolean | contains(StringList tokens)<br>Checks if the given tokens are contained by the current instance. |
| void | cutoff(int cutoffUnder, int cutoffOver)<br>Deletes all ngrams which appear less often than the cutoffUnder value or more often than the cutoffOver value. |
| boolean | equals(Object obj) |
| int | getCount(StringList ngram)<br>Retrieves the count of the given ngram. |
| int | hashCode() |
| Iterator<StringList> | iterator()<br>Retrieves an Iterator over all StringList entries. |
| int | numberOfGrams()<br>Retrieves the total count of all Ngrams. |
| void | remove(StringList tokens)<br>Removes the specified tokens from the NGram model; they are just dropped. |
| void | serialize(OutputStream out)<br>Writes the ngram instance to the given OutputStream. |
| void | setCount(StringList ngram, int count)<br>Sets the count of an existing ngram. |
| int | size()<br>Retrieves the number of StringList entries in the current instance. |
| Dictionary | toDictionary()<br>Creates a dictionary which contains all StringLists which are in the current NGramModel. |
| Dictionary | toDictionary(boolean caseSensitive)<br>Creates a dictionary which contains all StringLists which are in the current NGramModel. |
| String | toString() |

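As a rough illustration of the counting behavior summarized in the table above, the following sketch reimplements add, getCount, cutoff, and size on plain JDK collections. It is not the OpenNLP implementation: the class name TokenNGramSketch, the use of List<String> in place of StringList, the return value of 0 for an unknown ngram, and the argument check are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in only: this is NOT the OpenNLP NGramModel implementation.
public class TokenNGramSketch {

    // Maps each ngram (a list of tokens) to the number of times it was added.
    private final Map<List<String>, Integer> counts = new LinkedHashMap<>();

    // Analogue of add(StringList ngram): a repeated ngram increases its count by one.
    public void add(List<String> ngram) {
        counts.merge(new ArrayList<>(ngram), 1, Integer::sum);
    }

    // Analogue of add(StringList ngram, int minLength, int maxLength):
    // adds every contiguous token window whose length lies in [minLength, maxLength].
    public void add(List<String> tokens, int minLength, int maxLength) {
        if (minLength < 1 || maxLength < minLength) {
            // Assumed validation; the library's exact checks are not shown in this page.
            throw new IllegalArgumentException("expected 1 <= minLength <= maxLength");
        }
        for (int length = minLength; length <= maxLength; length++) {
            for (int start = 0; start + length <= tokens.size(); start++) {
                add(tokens.subList(start, start + length));
            }
        }
    }

    // Analogue of getCount(StringList ngram); returning 0 for an unknown ngram is assumed.
    public int getCount(List<String> ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    // Analogue of cutoff(int cutoffUnder, int cutoffOver) as read here:
    // drops entries whose count is below cutoffUnder or above cutoffOver.
    public void cutoff(int cutoffUnder, int cutoffOver) {
        counts.values().removeIf(c -> c < cutoffUnder || c > cutoffOver);
    }

    // Analogue of size(): the number of distinct ngram entries.
    public int size() {
        return counts.size();
    }

    public static void main(String[] args) {
        TokenNGramSketch model = new TokenNGramSketch();
        // Adds all uni-grams and bi-grams of the token sequence.
        model.add(Arrays.asList("to", "be", "or", "not", "to", "be"), 1, 2);
        System.out.println(model.getCount(Arrays.asList("to", "be")));
        System.out.println(model.size());
        // Keeps only ngrams that occurred at least twice.
        model.cutoff(2, Integer.MAX_VALUE);
        System.out.println(model.size());
    }
}
```

For the six tokens used in main, the sketch holds 8 distinct entries before the cutoff and 3 afterwards (to, be, and the bi-gram to be).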
public NGramModel()
public NGramModel(InputStream in) throws IOException, InvalidFormatException
in -
IOException
InvalidFormatException
public int getCount(StringList ngram)
ngram -
public void setCount(StringList ngram, int count)
ngram -
count -
public void add(StringList ngram)
ngram -
public void add(StringList ngram, int minLength, int maxLength)
ngram - the tokens to build the uni-grams, bi-grams, tri-grams, ... from
minLength - minimal length
maxLength - maximal length
public void add(String chars, int minLength, int maxLength)
chars -
minLength -
maxLength -
public void remove(StringList tokens)
tokens -
public boolean contains(StringList tokens)
tokens -
public int size()
Retrieves the number of StringList entries in the current instance.
public Iterator<StringList> iterator()
Retrieves an Iterator over all StringList entries.
iterator in interface Iterable<StringList>
public int numberOfGrams()
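The difference between size() and numberOfGrams(), and the effect of add(String chars, int minLength, int maxLength), can be sketched with plain JDK collections. This is a hedged illustration rather than the library's code: the class CharNGramSketch and its static helpers are hypothetical, and any normalization the library may apply to character ngrams (for example lowercasing) is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in only: this is NOT the OpenNLP NGramModel implementation.
public class CharNGramSketch {

    // Counts every substring of chars whose length lies in [minLength, maxLength].
    public static Map<String, Integer> charNGrams(String chars, int minLength, int maxLength) {
        if (minLength < 1 || maxLength < minLength) {
            // Assumed validation for the sketch.
            throw new IllegalArgumentException("expected 1 <= minLength <= maxLength");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int length = minLength; length <= maxLength; length++) {
            for (int start = 0; start + length <= chars.length(); start++) {
                counts.merge(chars.substring(start, start + length), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Analogue of size(): the number of distinct entries.
    public static int size(Map<String, Integer> counts) {
        return counts.size();
    }

    // Analogue of numberOfGrams(): the sum of the counts of all entries.
    public static int numberOfGrams(Map<String, Integer> counts) {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = charNGrams("banana", 2, 3);
        System.out.println(counts);
        System.out.println(size(counts));
        System.out.println(numberOfGrams(counts));
    }
}
```

For the input "banana" with lengths 2 to 3, the sketch yields 6 distinct entries but a total count of 9, because an, na, and ana each occur twice.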
public void cutoff(int cutoffUnder, int cutoffOver)
cutoffUnder -
cutoffOver -
public Dictionary toDictionary()
Creates a dictionary which contains all StringLists which are in the current NGramModel.
Entries which differ only in case are merged into one.
Calling this method is the same as calling toDictionary(boolean) with false.
public Dictionary toDictionary(boolean caseSensitive)
Creates a dictionary which contains all StringLists which are in the current NGramModel.
caseSensitive - Specifies whether case distinctions should be kept in the creation of the dictionary.
public void serialize(OutputStream out) throws IOException
Writes the ngram instance to the given OutputStream.
out -
IOException - if an I/O error occurs during writing
Copyright © 2015 The Apache Software Foundation. All rights reserved.