Package opennlp.tools.ngram
Class NGramModel
java.lang.Object
    opennlp.tools.ngram.NGramModel

All Implemented Interfaces:
Iterable<StringList>
Direct Known Subclasses:
NGramLanguageModel
public class NGramModel extends Object implements Iterable<StringList>
The NGramModel can be used to create ngrams and character ngrams.
See Also:
StringList
-
Constructor Summary
Constructor and Description
NGramModel()
    Initializes an empty instance.
NGramModel(InputStream in)
    Initializes the current instance.
-
Method Summary
Modifier and Type, Method, and Description
void add(CharSequence chars, int minLength, int maxLength)
    Adds character NGrams to the current instance.
void add(StringList ngram)
    Adds one NGram. If it already exists, its count is increased by one.
void add(StringList ngram, int minLength, int maxLength)
    Adds NGrams up to the specified length to the current instance.
boolean contains(StringList tokens)
    Checks if the given tokens are contained by the current instance.
void cutoff(int cutoffUnder, int cutoffOver)
    Deletes all ngrams which appear less often than the cutoffUnder value or more often than the cutoffOver value.
boolean equals(Object obj)
int getCount(StringList ngram)
    Retrieves the count of the given ngram.
int hashCode()
Iterator<StringList> iterator()
    Retrieves an Iterator over all StringList entries.
int numberOfGrams()
    Retrieves the total count of all Ngrams.
void remove(StringList tokens)
    Removes the specified tokens from the NGram model; they are simply dropped.
void serialize(OutputStream out)
    Writes the ngram instance to the given OutputStream.
void setCount(StringList ngram, int count)
    Sets the count of an existing ngram.
int size()
    Retrieves the number of StringList entries in the current instance.
Dictionary toDictionary()
    Creates a dictionary which contains all StringLists which are in the current NGramModel.
Dictionary toDictionary(boolean caseSensitive)
    Creates a dictionary which contains all StringLists which are in the current NGramModel.
String toString()
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Constructor Detail
-
NGramModel
public NGramModel()
Initializes an empty instance.
-
NGramModel
public NGramModel(InputStream in) throws IOException
Initializes the current instance from a serialized model stream.
Parameters:
in - the serialized model stream
Throws:
IOException
-
-
Method Detail
-
getCount
public int getCount(StringList ngram)
Retrieves the count of the given ngram.
Parameters:
ngram - an ngram
Returns:
count of the ngram or 0 if it is not contained
-
setCount
public void setCount(StringList ngram, int count)
Sets the count of an existing ngram.
Parameters:
ngram - the ngram whose count is set
count - the new count
-
add
public void add(StringList ngram)
Adds one NGram. If it already exists, its count is increased by one.
Parameters:
ngram - the ngram to add
-
add
public void add(StringList ngram, int minLength, int maxLength)
Adds NGrams up to the specified length to the current instance.
Parameters:
ngram - the tokens to build the uni-grams, bi-grams, tri-grams, ... from
minLength - minimal length
maxLength - maximal length
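The expansion of a token list into uni-grams, bi-grams, and so on can be pictured with a small standalone sketch. It does not call OpenNLP; the class WordNGramSketch and the method expand are hypothetical names, plain lists of strings stand in for StringList, and the argument check is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: lists every contiguous run of
// tokens whose length lies between minLength and maxLength (inclusive).
final class WordNGramSketch {

    static List<List<String>> expand(List<String> tokens, int minLength, int maxLength) {
        // Assumed validation; the real method may treat bad arguments differently.
        if (minLength < 1 || minLength > maxLength) {
            throw new IllegalArgumentException("need 1 <= minLength <= maxLength");
        }
        List<List<String>> grams = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            // Slide a window of the current length over the tokens.
            for (int start = 0; start + length <= tokens.size(); start++) {
                grams.add(new ArrayList<>(tokens.subList(start, start + length)));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Uni-grams and bi-grams of a three-token input.
        System.out.println(expand(Arrays.asList("new", "york", "city"), 1, 2));
    }
}
```

For the three tokens above with minLength 1 and maxLength 2, the sketch yields three uni-grams followed by two bi-grams. In a count-based model each of these would then be added as its own entry.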
-
add
public void add(CharSequence chars, int minLength, int maxLength)
Adds character NGrams to the current instance.
Parameters:
chars - the character sequence to build the ngrams from
minLength - minimal length
maxLength - maximal length
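Character ngrams follow the same windowing idea, applied to single characters instead of tokens. The following is a standalone sketch under that assumption; CharNGramSketch and expand are hypothetical names, and any normalization the library may apply to the characters is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: lists every substring of chars
// whose length lies between minLength and maxLength (inclusive).
final class CharNGramSketch {

    static List<String> expand(CharSequence chars, int minLength, int maxLength) {
        // Assumed validation; the real method may treat bad arguments differently.
        if (minLength < 1 || minLength > maxLength) {
            throw new IllegalArgumentException("need 1 <= minLength <= maxLength");
        }
        List<String> grams = new ArrayList<>();
        for (int length = minLength; length <= maxLength; length++) {
            // Slide a window of the current length over the characters.
            for (int start = 0; start + length <= chars.length(); start++) {
                grams.add(chars.subSequence(start, start + length).toString());
            }
        }
        return grams;
    }
}
```

Calling CharNGramSketch.expand("abc", 2, 3) gives the two character bi-grams "ab" and "bc" followed by the single tri-gram "abc".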
-
remove
public void remove(StringList tokens)
Removes the specified tokens from the NGram model; they are simply dropped.
Parameters:
tokens - the ngram to remove
-
contains
public boolean contains(StringList tokens)
Checks if the given tokens are contained by the current instance.
Parameters:
tokens - the ngram to look up
Returns:
true if the ngram is contained
-
size
public int size()
Retrieves the number of StringList entries in the current instance.
Returns:
number of different grams
-
iterator
public Iterator<StringList> iterator()
Retrieves an Iterator over all StringList entries.
Specified by:
iterator in interface Iterable<StringList>
Returns:
iterator over all grams
-
numberOfGrams
public int numberOfGrams()
Retrieves the total count of all Ngrams.
Returns:
total count of all ngrams
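The difference between size() (number of different grams) and numberOfGrams() (total count of all ngrams) is easiest to see on a tiny count table. The sketch below is a standalone illustration built on a HashMap, not the library's implementation; NGramCountSketch is a hypothetical class that merely borrows the method names from this page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a count-based ngram table, not part of OpenNLP.
final class NGramCountSketch {

    private final Map<List<String>, Integer> counts = new HashMap<>();

    // Adds one ngram; an ngram that is already present has its count raised by one.
    void add(List<String> ngram) {
        counts.merge(new ArrayList<>(ngram), 1, Integer::sum);
    }

    // Count of the ngram, or 0 if it is not contained.
    int getCount(List<String> ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    // Number of different ngrams.
    int size() {
        return counts.size();
    }

    // Sum of the counts of all ngrams.
    int numberOfGrams() {
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        return total;
    }
}
```

Adding the ngram ("a", "b") twice and ("c") once leaves two different entries, so size() is 2, while numberOfGrams() is 3 because the repeated ngram contributes its full count.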
-
cutoff
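public void cutoff(int cutoffUnder, int cutoffOver)
Deletes all ngrams which appear less often than the cutoffUnder value or more often than the cutoffOver value.
Parameters:
cutoffUnder - ngrams with a count below this value are deleted
cutoffOver - ngrams with a count above this value are deleted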
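A frequency cutoff of this kind can be sketched on a plain count map. The example assumes that an entry is deleted when its count is below cutoffUnder or above cutoffOver, so counts equal to either bound are kept; CutoffSketch is a hypothetical class and strings stand in for ngrams.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP: prunes a count table in place.
final class CutoffSketch {

    static void cutoff(Map<String, Integer> counts, int cutoffUnder, int cutoffOver) {
        // Removing from the values view also removes the matching map entries.
        counts.values().removeIf(count -> count < cutoffUnder || count > cutoffOver);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("rare", 1);
        counts.put("common", 5);
        counts.put("very frequent", 100);
        cutoff(counts, 2, 50);
        System.out.println(counts); // only the entry with count 5 remains
    }
}
```

Under this reading, passing 0 for cutoffUnder or Integer.MAX_VALUE for cutoffOver leaves that side of the range unrestricted, since no count can fall outside it.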
-
toDictionary
public Dictionary toDictionary()
Creates a dictionary which contains all StringLists which are in the current NGramModel. Entries which differ only in case are merged into one. Calling this method is the same as calling toDictionary(boolean) with false.
Returns:
a dictionary of the ngrams
-
toDictionary
public Dictionary toDictionary(boolean caseSensitive)
Creates a dictionary which contains all StringLists which are in the current NGramModel.
Parameters:
caseSensitive - Specifies whether case distinctions should be kept in the creation of the dictionary.
Returns:
a dictionary of the ngrams
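The effect of the caseSensitive flag on which entries survive can be illustrated without the Dictionary class. The sketch below is standalone and rests on assumptions: CaseMergeSketch and toEntries are hypothetical names, lists of strings stand in for StringList, and keeping the first-seen spelling of merged entries is a choice made for the example, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP: de-duplicates ngrams, optionally
// treating entries that differ only in case as the same entry.
final class CaseMergeSketch {

    static List<List<String>> toEntries(List<List<String>> ngrams, boolean caseSensitive) {
        Map<List<String>, List<String>> unique = new LinkedHashMap<>();
        for (List<String> ngram : ngrams) {
            // Build the comparison key: lower-cased unless case distinctions are kept.
            List<String> key = new ArrayList<>();
            for (String token : ngram) {
                key.add(caseSensitive ? token : token.toLowerCase(Locale.ROOT));
            }
            // Assumption: the first spelling seen represents the merged entry.
            unique.putIfAbsent(key, ngram);
        }
        return new ArrayList<>(unique.values());
    }
}
```

Given the two ngrams ("New", "York") and ("new", "york"), the sketch returns both when caseSensitive is true and a single entry when it is false.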
-
serialize
public void serialize(OutputStream out) throws IOException
Writes the ngram instance to the given OutputStream.
Parameters:
out - the stream to write to
Throws:
IOException - if an I/O error occurs during writing
-
-