All Classes and Interfaces
Class
Description
Abstract class which contains code to tag and chunk parses for bottom-up parsing and leaves the implementation of advancing and completing parses to extending classes.
Abstract class containing many of the methods used to generate contexts for parsing.
Abstract DataIndexer implementation for collecting event and context counts used in training.
A basic EventModelSequenceTrainer implementation that processes events.
A base ObjectStream implementation for events.
A basic EventTrainer implementation.
A basic MaxentModel implementation.
An abstract, basic implementation of a model reader.
An abstract, basic implementation of a model writer.
A base ObjectStream implementation.
Abstract class extended by parser event streams which perform tagging and chunking.
Base class for sample stream factories.
An interface for generating features for named-entity identification and for updating document-level contexts.
Parser for the Floresta Sintá(c)tica Arvores Deitadas corpus, producing output for Portuguese chunker training.
A factory to create an Arvores Deitadas ChunkStream from the command-line utility.
The AdditionalContextFeatureGenerator generates the context from the passed-in additional context.
Parser for the Floresta Sintá(c)tica Arvores Deitadas corpus, producing output for Portuguese NER training.
A factory to create an Arvores Deitadas NameSampleDataStream from the command-line utility.
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Stream filter which merges text lines into sentences, following the Arvores Deitadas syntax.
Parses a sample of the AD corpus.
Represents an AD leaf.
Represents an AD node.
Represents a tree element, either a Node or a Leaf.
Note: Do not use this class, internal use only!
A CharSequenceNormalizer implementation that aggregates the functionality of other normalizers.
The AggregatedFeatureGenerator aggregates a set of AdaptiveFeatureGenerators and calls them to generate the features.
Class for storing the Ancora Spanish head rules associated with parsing.
This class implements the stemming algorithm defined by a snowball script.
Utility class for simple vector arithmetic.
Provides access to persisted model artifacts.
Responsible for creating an artifact from an InputStream.
Generates predictive contexts for deciding how constituents should be attached.
The Attributes class stores name-value pairs.
Generates a feature for each word in a document.
Represents a minimal tuple of information.
This is a common base model which can be used by the components' specific model classes.
Base class for all tool factories.
A ContextGenerator implementation for maxent decisions, assuming that the input given to the BasicContextGenerator.getContext(String) method is a String containing contextual predicates separated by spaces.
Common format parameters.
Common training parameters.
Performs k-best search over a sequence.
Interface for context generators used with a sequence beam search.
The default SequenceCodec implementation according to the BILOU scheme.
A SequenceValidator implementation for the BilouCodec.
A DataReader that reads files from a binary format.
A GISModelReader that reads models from a binary format.
A GISModelWriter that writes models in a binary format.
A NaiveBayesModelReader that reads models from a binary format.
A NaiveBayesModelWriter that writes models in a binary format.
A PerceptronModelReader that reads models from a binary format.
A PerceptronModelWriter that writes models in a binary format.
A QNModelReader that reads models from a binary format.
A QNModelWriter that writes models in a binary format.
The default SequenceCodec implementation according to the BIO scheme:
B: 'beginning' of a NE
I: 'inside', the word is inside a NE
O: 'outside', the word is a regular word outside a NE
See also the paper by D. Roth and L. Ratinov: Design Challenges and Misconceptions in Named Entity Recognition.
A sample stream for the training files of the BioNLP/NLPBA 2004 shared task.
Reads the annotations from the brat .ann annotation file.
Brat (brat rapid annotation tool) is based on the stav visualiser, which was originally made in order to visualise BioNLP'11 Shared Task data.
Generates NameSample objects for a Brat Document object.
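The BIO scheme listed above can be illustrated with a minimal, stand-alone sketch that tags each token of a sentence given a set of name spans. The class BioSketch, its Span record, and the "B-type" / "I-type" / "O" tag spelling are assumptions made for this example; this is not the OpenNLP BioCodec API, which may spell its tags differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of BIO encoding; not the OpenNLP BioCodec API. */
final class BioSketch {

    /** A name span over token indexes: start inclusive, end exclusive. */
    record Span(int start, int end, String type) {}

    /** Tags every token with B-type, I-type, or O according to the given spans. */
    static List<String> encode(int tokenCount, List<Span> names) {
        String[] tags = new String[tokenCount];
        Arrays.fill(tags, "O");                         // outside any name by default
        for (Span s : names) {
            tags[s.start()] = "B-" + s.type();          // first token of the name
            for (int i = s.start() + 1; i < s.end(); i++) {
                tags[i] = "I-" + s.type();              // remaining tokens of the name
            }
        }
        return new ArrayList<>(Arrays.asList(tags));
    }

    public static void main(String[] args) {
        String[] tokens = {"Pierre", "Vinken", "joined", "IBM", "."};
        List<String> tags = encode(tokens.length,
                List.of(new Span(0, 2, "person"), new Span(3, 4, "organization")));
        System.out.println(tags);
    }
}
```

For the example sentence this yields B-person, I-person, O, B-organization, O; a single-token name receives only a B tag, which is exactly where BILOU differs by using a U (unit) tag instead.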
Generates Brown cluster features for token bigrams.
Class to load a Brown cluster document: word\tword_class\tprob
Generates Brown clustering features for token bigrams.
Generates Brown clustering features for token classes.
Generates Brown clustering features for current token.
Obtains the paths listed in the pathLengths array from the Brown class.
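A Brown cluster ID is a bit-string path in the cluster hierarchy, so coarser classes can be read off as prefixes of that path. The sketch below shows that idea under stated assumptions: the class name BrownPathSketch is hypothetical, and the lengths {4, 6, 10, 20} used in main are illustrative only, not necessarily the values held by OpenNLP's pathLengths array.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: derive coarser Brown classes as prefixes of a cluster path. */
final class BrownPathSketch {

    /** Returns the prefix of the bit-string path for each requested length that fits. */
    static List<String> prefixes(String path, int[] pathLengths) {
        List<String> result = new ArrayList<>();
        for (int len : pathLengths) {
            if (path.length() >= len) {           // skip lengths longer than the path
                result.add(path.substring(0, len));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "0010110101" is a made-up cluster path for some word
        System.out.println(prefixes("0010110101", new int[] {4, 6, 10, 20}));
    }
}
```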
Generates BrownCluster features for the current token and token class.
Generates BrownCluster features for the current token.
Generates predictive contexts for deciding how constituents should be combined.
Creates the features or contexts for the building phase of parsing.
An ArtifactSerializer implementation for binary data, kept in byte[].
Provides a fixed-size, pre-allocated, least-recently-used replacement cache.
Caches features of the aggregated generators.
This class implements the stemming algorithm defined by a snowball script.
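The least-recently-used replacement policy mentioned in the cache entry above can be sketched with java.util.LinkedHashMap in access order, evicting the eldest entry once a fixed capacity is exceeded. This is a hypothetical LruCache class for illustration; unlike the cache described above it does not pre-allocate storage, and it is not OpenNLP's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a fixed-size LRU cache; not OpenNLP's cache class. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: iteration runs from least to most recently accessed
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // called after each put: evict the least recently used entry once over capacity
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");       // "a" becomes the most recently used entry
        cache.put("c", 3);    // capacity exceeded: "b" is evicted
        System.out.println(cache.keySet());
    }
}
```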
This tool helps create a loadable dictionary for the NameFinder from US Census data.
The CharacterNgramFeatureGenerator uses character ngrams to generate features about each token.
A char sequence normalizer, used to adjust (prune, substitute, add, etc.) character sequences.
Generates predictive context for deciding when a constituent is complete.
Generates predictive context for deciding when a constituent is complete.
Trains a new check model.
Creates predictive context for the pre-chunking phases of parsing.
The interface for chunkers which provide chunk tags for a sequence of tokens.
Interface for a BeamSearchContextGenerator used in syntactic chunking.
Tool to convert multiple data formats into native OpenNLP chunker training format.
Cross validator for Chunker.
A marker interface for evaluating chunkers.
The ChunkerEvaluator measures the performance of the given Chunker with the provided reference samples.
A default ChunkSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
Class for creating an event stream out of data files for training a Chunker.
The class represents a maximum-entropy-based Chunker.
The ChunkerModel is the model used by a learnable Chunker.
Loads a ChunkerModel for the command-line tools.
An ArtifactSerializer implementation for models.
Factory producing OpenNLP ChunkSampleStreams.
A default implementation of EvaluationMonitor that prints to an output stream.
Class for holding chunks for a single unit of text.
A SequenceStream implementation encapsulating samples.
Parses the CoNLL 2000 shared task shallow parser training data.
An ObjectStream implementation that uses a Collection as the source of its elements.
A maxent event representation which can be used to sort events based on the predicate indexes they contain.
A maxent predicate representation which can be used to sort predicates based on their outcomes.
A configurable context generator for a POSTagger.
Parser for the Dutch and Spanish NER training files of the CoNLL 2002 shared task.
Note: Do not use this class, internal use only!
An import stream which can parse the CoNLL-03 data.
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
The CoNLL-U format is specified here.
Note: Do not use this class, internal use only!
Parses the data from the CoNLL 06 shared task into POS samples.
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Holds feature information about a specific Parse node.
Note: Do not use this class, internal use only!
Holds constituents when reading parses.
Class which associates a real-valued parameter or expected value with a particular contextual predicate or feature.
Represents a generator of contexts for maxent decisions.
Provides access to training and test partitions for n-fold cross validation.
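The n-fold partitioning described above can be sketched as follows. The sketch assumes a round-robin assignment in which element i is a test element of fold i mod n and a training element of every other fold; OpenNLP's partitioner may distribute elements differently, and FoldSketch and its Fold record are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of n-fold partitioning; not the OpenNLP partitioner API. */
final class FoldSketch {

    /** Holds the training and test elements of one fold. */
    record Fold<T>(List<T> training, List<T> test) {}

    /** Element i belongs to the test part of fold (i % n) and to the training part otherwise. */
    static <T> Fold<T> fold(List<T> samples, int n, int foldIndex) {
        List<T> training = new ArrayList<>();
        List<T> test = new ArrayList<>();
        for (int i = 0; i < samples.size(); i++) {
            if (i % n == foldIndex) {
                test.add(samples.get(i));
            } else {
                training.add(samples.get(i));
            }
        }
        return new Fold<>(training, test);
    }

    public static void main(String[] args) {
        List<Integer> samples = List.of(0, 1, 2, 3, 4, 5, 6);
        for (int k = 0; k < 3; k++) {
            System.out.println("fold " + k + ": " + fold(samples, 3, k));
        }
    }
}
```

Across the n folds every element is tested exactly once, which is the property that makes averaging the per-fold scores meaningful.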
The CrossValidationPartitioner.TrainingSampleStream, which iterates over all training elements.
Common cross validator parameters.
This class implements the stemming algorithm defined by a snowball script.
Represents an indexer which compresses events in memory and performs feature selection.
A factory that produces DataIndexer instances.
Describes generic ways to read data from a DataInputStream.
An interface for objects which can deliver a stream of training data to be supplied to an EventStream.
Features based on the chunking model described by Fei Sha and Fernando Pereira.
The default chunker SequenceValidator implementation.
Default implementation of the EndOfSentenceScanner.
A context generator for the language detector.
Simple feature generator for learning statistical lemmatizers.
The default lemmatizer SequenceValidator implementation.
A NameContextGenerator implementation for determining contextual features for a tag-chunk style named-entity recognizer.
A default context generator for a POSTagger.
The default POS tagger SequenceValidator implementation.
Generates event contexts for maxent decisions for sentence detection.
A default TokenContextGenerator which produces events for maxent decisions for tokenization.
A default implementation of EvaluationMonitor that prints to an output stream.
A Detokenizer merges tokens back to their detokenized representation.
This enum contains an operation for every token to merge the tokens together to their detokenized form.
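The per-token merge operations described by the enum entry above can be sketched as a small join routine: tokens are separated by single spaces unless the previous token merges to the right or the current token merges to the left. The enum Op and its constant names are assumptions chosen for this illustration, not a copy of OpenNLP's enum.

```java
import java.util.List;

/** Hypothetical sketch of operation-driven detokenization; names are illustrative only. */
final class DetokenizeSketch {

    /** How a token attaches to its neighbours (illustrative names). */
    enum Op { MERGE_TO_LEFT, MERGE_TO_RIGHT, MERGE_BOTH, NO_OPERATION }

    /** Joins tokens with single spaces, except where an operation asks to merge. */
    static String detokenize(List<String> tokens, List<Op> ops) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                Op prev = ops.get(i - 1);
                Op curr = ops.get(i);
                boolean glue = prev == Op.MERGE_TO_RIGHT || prev == Op.MERGE_BOTH
                        || curr == Op.MERGE_TO_LEFT || curr == Op.MERGE_BOTH;
                if (!glue) {
                    text.append(' ');                 // ordinary token boundary
                }
            }
            text.append(tokens.get(i));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Hello", ",", "world", "!");
        List<Op> ops = List.of(Op.NO_OPERATION, Op.MERGE_TO_LEFT, Op.NO_OPERATION, Op.MERGE_TO_LEFT);
        System.out.println(detokenize(tokens, ops));  // Hello, world!
    }
}
```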
The DetokenizerEvaluator measures the performance of the given Detokenizer with the provided reference samples.
Base class for factories which need a Detokenizer.
An iterable and serializable dictionary implementation.
A rule-based detokenizer.
A persistor used for reading and writing dictionaries of all kinds.
The DictionaryFeatureGenerator uses the DictionaryNameFinder to generate features for detected names based on the InSpanGenerator.
A Lemmatizer implementation that works by simple dictionary lookup into a Map built from a dictionary file with one entry per line.
This is a Dictionary-based name finder.
An ArtifactSerializer implementation for dictionaries.
The directory sample stream allows for creating an ObjectStream<File> from a directory listing of files.
Tool to convert multiple data formats into native OpenNLP doccat training format.
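The dictionary-lookup Lemmatizer entry above can be sketched as a map keyed by word and POS tag. The sketch assumes each dictionary line has the tab-separated form word, POS tag, lemma, and that unknown pairs map to "O"; both the line format and the fallback value are assumptions here, and DictLemmaSketch is a hypothetical class, not OpenNLP's lemmatizer.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of dictionary-lookup lemmatization; not OpenNLP's API. */
final class DictLemmaSketch {
    private final Map<String, String> lemmas = new HashMap<>();

    /** Reads lines of the assumed form word<TAB>postag<TAB>lemma; malformed lines are skipped. */
    DictLemmaSketch(List<String> lines) {
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length == 3) {
                lemmas.put(fields[0] + "\t" + fields[1], fields[2]);
            }
        }
    }

    /** Returns the lemma for the word/tag pair, or "O" (assumed fallback) when the pair is unknown. */
    String lemmatize(String word, String postag) {
        return lemmas.getOrDefault(word + "\t" + postag, "O");
    }

    public static void main(String[] args) {
        DictLemmaSketch lemmatizer = new DictLemmaSketch(List.of("ran\tVBD\trun", "mice\tNNS\tmouse"));
        System.out.println(lemmatizer.lemmatize("mice", "NNS"));  // mouse
        System.out.println(lemmatizer.lemmatize("mice", "VB"));   // O
    }
}
```

Keying on the word and the tag together lets the same surface form map to different lemmas depending on its part of speech.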
Cross validator for DocumentCategorizer.
A default implementation of EvaluationMonitor that prints to an output stream.
A marker interface for evaluating document categorizers.
A default DocumentSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides Doccat default implementations and resources.
Generates a detailed report for the document categorizer.
A model for document categorization.
Loads a DoccatModel for the command-line tools.
Interface for classes which categorize documents.
The DocumentCategorizerEvaluator measures the performance of the given DocumentCategorizer with the provided reference samples.
Iterator-like class for modeling document classification events.
A maxent-based implementation of DocumentCategorizer.
Interface for processing an entire document, allowing a TokenNameFinder to use context from the entire document.
Class which holds a classified document and its category.
Reads in string-encoded training samples, parses them, and outputs DocumentSample objects.
Factory producing OpenNLP DocumentSampleStreams.
Reads a plain text file and returns each line as a String object.
This class facilitates the downloading of pretrained OpenNLP models.
The type of model.
This class implements the stemming algorithm defined by a snowball script.
An EmojiCharSequenceNormalizer implementation that normalizes text in terms of emojis.
ObjectStream to clean up empty lines for empty-line-separated document streams:
- Skips empty lines at the start of the training data
- Transforms multiple empty lines in a row into one
- Replaces whitespace-only lines with empty lines
- TODO: Terminates the last document with an empty line if it is missing
This stream should be used by the components that use empty lines to mark document boundaries.
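The clean-up rules listed above can be sketched over an in-memory list of lines. This hypothetical EmptyLineSketch applies the first three rules in a single pass and, like the description, leaves the TODO rule (terminating the last document) unimplemented; it is not the OpenNLP stream itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the empty-line clean-up rules, applied to a list of lines. */
final class EmptyLineSketch {

    static List<String> clean(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String normalized = line.isBlank() ? "" : line;   // whitespace-only line -> empty line
            boolean empty = normalized.isEmpty();
            if (empty && out.isEmpty()) {
                continue;                                     // skip empty lines at the start
            }
            if (empty && out.get(out.size() - 1).isEmpty()) {
                continue;                                     // collapse runs of empty lines
            }
            out.add(normalized);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(clean(List.of("", "  ", "doc1 line", "", "\t", "", "doc2 line")));
    }
}
```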
Encoding parameter.
This class implements the stemming algorithm defined by a snowball script.
EntityLinkers establish connections with external data to enrich extracted entities.
Generates EntityLinker instances via a properties file configuration.
Properties wrapper for EntityLinker implementations.
An Entry is a StringList which can optionally be mapped to attributes.
Parser for the Italian NER training files of the Evalita 2007 and 2009 NER shared tasks.
Note: Do not use this class, internal use only!
This class encapsulates the variables used in producing probabilities from a model and facilitates passing these variables to the eval method.
An abstract base class for evaluators.
Common evaluation parameters.
The context of a decision point during training.
A specialized Trainer that is based on an 'EventModelSequence' approach.
Indicates that a certain API feature is not stable and might change with a new release.
The ExtensionLoader is responsible for loading extensions to the OpenNLP library.
Exception indicating that an OpenNLP extension could not be loaded.
Interface for generating features for document categorization.
The FeatureGeneratorResourceProvider provides access to the resources available in the model.
This class provides common utilities for feature generation.
Class for using a file of events as an event stream.
Note: Do not use this class, internal use only!
Provides the ability to read the contents of files contained in an object stream of files.
Abstract base class for filtering streams.
Common evaluation parameters.
This class implements the stemming algorithm defined by a snowball script.
The FMeasure is a utility class for evaluators which measures precision, recall and the resulting F-measure.
This class implements the stemming algorithm defined by a snowball script.
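The quantities named in the FMeasure entry above follow the usual definitions: precision is the share of predicted items that are correct, recall is the share of reference items that were found, and F1 is their harmonic mean, 2PR / (P + R). The hypothetical FMeasureSketch below computes them over sets of items (for instance encoded spans); returning 0 for empty inputs is a choice made here, and OpenNLP's FMeasure may treat those edge cases differently.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of precision, recall and F1 over predicted and reference items. */
final class FMeasureSketch {

    /** Number of predicted items that also occur in the reference. */
    private static int truePositives(Set<String> reference, Set<String> predicted) {
        Set<String> hits = new HashSet<>(predicted);
        hits.retainAll(reference);
        return hits.size();
    }

    /** Fraction of predicted items that are correct; 0 when nothing was predicted. */
    static double precision(Set<String> reference, Set<String> predicted) {
        return predicted.isEmpty() ? 0.0 : (double) truePositives(reference, predicted) / predicted.size();
    }

    /** Fraction of reference items that were found; 0 when the reference is empty. */
    static double recall(Set<String> reference, Set<String> predicted) {
        return reference.isEmpty() ? 0.0 : (double) truePositives(reference, predicted) / reference.size();
    }

    /** Harmonic mean of precision and recall; 0 when both are 0. */
    static double f1(Set<String> reference, Set<String> predicted) {
        double p = precision(reference, predicted);
        double r = recall(reference, predicted);
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // items could be encoded spans such as "0-2:person"
        Set<String> reference = Set.of("a", "b", "c", "d");
        Set<String> predicted = Set.of("a", "b", "e");
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n",
                precision(reference, predicted), recall(reference, predicted), f1(reference, predicted));
    }
}
```

In the example two of three predictions are correct and two of four reference items are found, so precision is 2/3, recall is 1/2, and F1 is 4/7 (about 0.571).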
Interface for a function.
Represents a labeler for nodes which contain traces so that these traces can be predicted by a Parser.
Creates a set of feature generators based on a provided XML descriptor.
A generic AbstractModelReader implementation.
An ArtifactSerializer implementation for models.
A generic AbstractModelWriter implementation.
This class implements the stemming algorithm defined by a snowball script.
A maximum entropy model which has been trained using the Generalized Iterative Scaling (GIS) procedure.
The base class for readers of GIS models.
The base class for writers of GIS models.
An implementation of Generalized Iterative Scaling (GIS).
GloVe is an unsupervised learning algorithm for obtaining vector representations for words.
This class implements the stemming algorithm defined by a snowball script.
A hash-sum-based AbstractObjectStream implementation.
Encoder for head rules associated with parsing.
Class for storing the English HeadRules associated with parsing.
This class implements the stemming algorithm defined by a snowball script.
This class indexes string lists.
This class implements the stemming algorithm defined by a snowball script.
Allows repeated reads through a stream for certain model building types.
Generates features if the tokens are recognized by the provided TokenNameFinder.
This exception indicates that the provided training data is insufficient to train a desired model.
Classes, fields, or methods annotated @Internal are for OpenNLP internal use only.
This exception indicates that a resource violates the expected data format.
A structure to hold an Irish Sentence Bank document, which is a collection of tokenized sentences.
This class implements the stemming algorithm defined by a snowball script.
This class implements the stemming algorithm defined by a snowball script.
Class for holding the document language and its confidence.
The interface for LanguageDetector, which predicts the Language for a context.
A context generator interface for LanguageDetector.
Tool to convert multiple data formats into native OpenNLP language detection training format.
Cross validator for LanguageDetector.
A default implementation of EvaluationMonitor that prints to an output stream.
A marker interface for evaluating language detectors.
The LanguageDetectorEvaluator measures the performance of the given LanguageDetector with the provided reference LanguageSamples.
A default LanguageSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
Iterator-like class for modeling an event stream of samples.
Default factory used by LanguageDetector.
Generates a detailed report for the language detector.
Implements a learnable LanguageDetector.
The LanguageDetectorModel is the model used by a learnable LanguageDetector.
Loads a LanguageDetectorModel for the command-line tools.
This class reads in string-encoded training samples, parses them, and outputs LanguageSample objects.
Factory producing OpenNLP DocumentSampleStreams.
A language model can calculate the probability p (between 0 and 1) of a certain sequence of tokens, given its underlying vocabulary.
Holds a classified document and its Language.
Stream factory for those streams which carry language.
Note: Do not use this class, internal use only!
A default implementation of EvaluationMonitor that prints to an output stream.
Represents a lemmatized sentence.
Class for creating an event stream out of data files for training a probabilistic Lemmatizer.
A SequenceStream implementation encapsulating samples.
Reads data for training and testing the Lemmatizer.
The common interface for lemmatizers.
Interface for the context generator used for the probabilistic Lemmatizer.
A marker interface for evaluating lemmatizers.
The LemmatizerEvaluator measures the performance of the given Lemmatizer with the provided reference samples.
A default LemmaSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides Lemmatizer default implementations and resources.
Generates a detailed report for the Lemmatizer.
A probabilistic Lemmatizer implementation.
The LemmatizerModel is the model used by a learnable Lemmatizer.
Loads a LemmatizerModel for the command-line tools.
Factory producing OpenNLP LemmaSampleStreams.
A structure to hold the letsmt document.
A content handler to receive and process SAX events.
Class that performs a line search to find a minimum.
Represents a LineSearch result.
This class serves as an adapter for a Logger used within a PrintStream.
Class implementing the probability distribution over labels returned by a classifier as a log of probabilities.
A class implementing the logarithmic Probability for a label.
A factory that creates a MarkableFileInputStream from a File.
A class to process the MASC named entity stand-off annotation file.
A class for parsing MASC's Penn tagging/tokenization stand-off annotation.
Interface for maximum entropy models.
Calculates the arithmetic mean of values added with the Mean.add(double) method.
A helper class that handles Strings with more than 64k (65535 bytes) in length.
Enumeration of supported model types.
Utility class for handling models.
Factory producing OpenNLP MosesSentenceSampleStream objects.
An extension of Context used to store parameters or expected values associated with this context which can be updated or assigned.
This is a non-thread-safe mutable int.
Interface that allows TagDictionary entries to be added and removed.
Specialized parameters for the evaluation of a naive Bayes classifier.
A MaxentModel implementation of the multinomial Naive Bayes classifier model.
The base class for readers of models.
The base class for NaiveBayesModel writers.
Trains models using a combination of the EM algorithm and a Naive Bayes classifier.
Interface for generating the context for a name finder by specifying a set of feature generators.
A default implementation of EvaluationMonitor that prints to an output stream.
This class helps to read the US Census data from the files to build a StringList for each dictionary entry in the name-finder dictionary.
Class for creating an event stream out of data files for training a TokenNameFinder.
A maximum-entropy-based name finder implementation.
The default name finder SequenceValidator implementation.
Encapsulates names for a single unit of text.
Counts tokens, sentences and names by type.
The NameSampleDataStream class converts tagged strings provided by a DataStream to NameSample objects.
Factory producing OpenNLP NameSampleDataStreams.
A SequenceStream implementation encapsulating samples.
A stream which removes name samples which do not have a certain type.
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Evaluates the negative log-likelihood and its gradient from a DataIndexer.
The Newline SentenceDetector assumes that sentences are line-delimited and recognizes one sentence per non-empty line.
The NGramCharModel can be used to create character ngrams.
Generates ngram features for a document.
Generates an ngram, via an optional separator, and returns the grams as a list of strings.
A LanguageModel based on an NGramModel using Stupid Backoff to get the probabilities of the ngrams.
Command-line tool for NGramLanguageModel.
The NGramModel can be used to create ngrams and character ngrams.
Utility class for ngrams.
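The Stupid Backoff scoring used by the ngram language model entry above (introduced by Brants et al., 2007) can be sketched as follows: a seen ngram scores count(ngram) / count(context), an unseen one backs off to 0.4 times the score of the ngram without its oldest token, and unigrams score their relative frequency. The scores are not normalized probabilities. StupidBackoffSketch is hypothetical, joins tokens with a single space (so tokens are assumed to contain no spaces), and OpenNLP's model may differ in details.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of ngram counting plus Stupid Backoff scoring; not OpenNLP's API. */
final class StupidBackoffSketch {
    private static final double ALPHA = 0.4;      // back-off factor suggested by Brants et al. (2007)
    private final Map<String, Integer> counts = new HashMap<>();
    private final int totalTokens;

    /** Counts every ngram of order 1..maxOrder, joining tokens with a single space. */
    StupidBackoffSketch(List<String> tokens, int maxOrder) {
        totalTokens = tokens.size();
        for (int n = 1; n <= maxOrder; n++) {
            for (int i = 0; i + n <= tokens.size(); i++) {
                counts.merge(String.join(" ", tokens.subList(i, i + n)), 1, Integer::sum);
            }
        }
    }

    /** Scores the last token of the ngram given the preceding tokens. */
    double score(List<String> ngram) {
        int count = counts.getOrDefault(String.join(" ", ngram), 0);
        if (ngram.size() == 1) {
            return count / (double) totalTokens;              // unigram relative frequency
        }
        if (count > 0) {
            String context = String.join(" ", ngram.subList(0, ngram.size() - 1));
            return count / (double) counts.get(context);      // context was counted with the ngram
        }
        return ALPHA * score(ngram.subList(1, ngram.size()));  // drop the oldest context token
    }

    public static void main(String[] args) {
        StupidBackoffSketch lm = new StupidBackoffSketch(List.of("the", "cat", "sat", "on", "the", "mat"), 2);
        System.out.println(lm.score(List.of("the", "cat")));  // seen bigram: count(the cat) / count(the)
        System.out.println(lm.score(List.of("cat", "the")));  // unseen bigram: 0.4 * unigram score of "the"
    }
}
```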
The National corpus of Polish (NKJP) format.
This class implements the stemming algorithm defined by a snowball script.
A NumberCharSequenceNormalizer implementation that normalizes text in terms of numbers.
A DataReader implementation based on ObjectInputStream.
Reads objects from a stream.
A DataIndexer for maxent model data which handles cutoffs for uncommon contextual predicates and provides a unique integer index for each of the predicates.
A DataIndexer for maxent model data which handles cutoffs for uncommon contextual predicates, provides a unique integer index for each of the predicates, and maintains event values.
Name Sample Stream parser for the OntoNotes 4.0 corpus.
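The cutoff handling described by the two DataIndexer entries above can be sketched in two steps: count how often each contextual predicate occurs, then give a consecutive integer index only to predicates seen at least cutoff times. CutoffIndexSketch is a hypothetical name; counting raw occurrences and indexing in sorted order are choices made here, and OpenNLP's indexers may count or order differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: index contextual predicates that occur at least `cutoff` times. */
final class CutoffIndexSketch {

    static Map<String, Integer> index(List<String[]> contexts, int cutoff) {
        Map<String, Integer> counts = new TreeMap<>();    // sorted, so indexes are deterministic
        for (String[] context : contexts) {
            for (String predicate : context) {
                counts.merge(predicate, 1, Integer::sum);
            }
        }
        Map<String, Integer> indexes = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {                 // drop uncommon predicates
                indexes.put(e.getKey(), indexes.size());  // next free integer index
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        List<String[]> contexts = List.of(
                new String[] {"w=the", "suf=he"},
                new String[] {"w=cat", "suf=at"},
                new String[] {"w=the", "suf=at"});
        System.out.println(index(contexts, 2));
    }
}
```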
The definition feature maps the underlying distribution of outcomes.
A FilterObjectStream which merges text lines into paragraphs.
Evaluates the negative log-likelihood and its gradient in parallel.
Data structure for holding parse constituents.
A shift-reduce style Parser implementation based on Adwait Ratnaparkhi's 1998 thesis.
Defines common methods for full-syntactic parsers.
A built-attach Parser implementation.
The parser chunker SequenceValidator implementation.
Tool to convert multiple data formats into native OpenNLP parser format.
Cross validator for a Parser.
A marker interface for evaluating parsers.
This implementation of Evaluator<Parse> behaves like EVALB with no exceptions, i.e., it neither removes punctuation tags nor treats ADVP and PRT as equal, as the COLLINS convention would.
A default Parse-centric implementation of AbstractEvaluatorTool that prints to an output stream.
Wrapper class for one of four shift-reduce parser event streams.
Wrapper class for one of four built-attach parser event streams.
Enumeration of event types for a Parser.
This is the default ParserModel implementation.
Loads a ParserModel for the command-line tools.
Enumeration of supported Parser types.
Factory producing OpenNLP ParseSampleStreams.
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
A model implementation based on the perceptron algorithm.
The base class for readers of models.
The base class for PerceptronModel writers.
Trains models using the perceptron algorithm.
Reads a plain text file and returns each line as a String object.
A generic DataReader implementation for plain text files.
A NaiveBayesModelReader that reads models from a plain text format.
A NaiveBayesModelWriter that writes models in a plain text format.
This class implements the stemming algorithm defined by a snowball script.
A Stemmer implementing the Porter Stemming Algorithm.
Utility class to handle Portuguese contractions.
This class implements the stemming algorithm defined by a snowball script.
Interface for a BeamSearchContextGenerator used in POS tagging.
Provides a means of determining which tags are valid for a particular word based on a TagDictionary read from a file.
A default implementation of EvaluationMonitor that prints to an output stream.
The POSEvaluator measures the performance of the given POSTagger with the provided reference samples.
Loads a POSModel for the command-line tools.
An ArtifactSerializer implementation for models.
Represents a POS-tagged sentence.
A SequenceStream implementation encapsulating samples.
The interface for part-of-speech taggers.
Tool to convert multiple data formats into native OpenNLP part-of-speech tagging training format.
A marker interface for evaluating POS taggers.
A default POSSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides POSTagger default implementations and resources.
Generates a detailed report for the POS Tagger.
A part-of-speech tagger that uses maximum entropy.
Adds the token POS tag as a feature.
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
Note: Do not use this class, internal use only!
This AdaptiveFeatureGenerator generates features indicating the outcome associated with a previously occurring word.
This AdaptiveFeatureGenerator generates features indicating the outcome associated with two previously occurring words.
This interface allows one to implement a prior distribution for use in maximum entropy model training.
Class implementing the probability distribution over labels returned by a classifier.
Class implementing the probability for a label.
A data container encapsulating language detection results.
Implementation of L-BFGS which supports L1 and L2 regularization and Elastic Net for solving convex optimization problems.
Evaluates the quality of training parameters.
L2-regularized objective Function.
A maximum entropy model which has been trained using the Quasi-Newton (QN) algorithm.
The base class for readers of QN models.
The base class for writers of models.
A Maxent model Trainer using the L-BFGS algorithm.
Class for real-valued events as an event stream.
Class for using a file of real-valued events as an event stream.
A TokenNameFinder implementation based on a series of regular expressions.
Returns a RegexNameFinder based on a selection of defaults or a configuration and a selection of defaults.
Enumeration of typical regex expressions available in OpenNLP.
This interface makes an Iterator resettable.
An iterator for a list which returns values in the opposite order from the typical list iterator.
This class implements the stemming algorithm defined by a snowball script.
This class implements the stemming algorithm defined by a snowball script.
Represents a generic type of processable elements.
Interface for SentenceDetectorME context generators.
A cross validator for sentence detectors.
Creates contexts/features for end-of-sentence detection in Thai text.
The interface for sentence detectors, which find the sentence boundaries in a text.
Tool to convert multiple data formats into native OpenNLP sentence detector training format.
The SentenceDetectorEvaluator measures the performance of the given SentenceDetector with the provided reference SentenceSamples.
A default SentenceSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides SentenceDetector default implementations and resources.
A sentence detector for splitting up raw text into sentences.
A sentence detector which uses a maxent model to predict the sentences.
A default implementation of EvaluationMonitor that prints to an output stream.
This feature generator creates sentence begin and end features.
The SentenceModel is the model used by a learnable SentenceDetector.
A SentenceSample contains a document with the begin indexes of the individual sentences.
This class is a stream filter which reads sentence-per-line samples from an ObjectStream and converts them into SentenceSample objects.
Factory producing OpenNLP SentenceSampleStreams.
Class which models a sequence.
Represents a weighted sequence of outcomes.
A classification model that can label an input Sequence.
A codec for sequences of a generic element type.
Interface for streams of sequences used to train sequence models.
Class which turns a SequenceStream into an event stream.
A marker interface so that implementing classes can refer to the corresponding ArtifactSerializer implementation.
SAX-style SGML parser.
A ShrinkCharSequenceNormalizer implementation that shrinks repeated spaces / chars in text.
Trains models with sequences using the perceptron algorithm.
A basic Tokenizer implementation which performs tokenization using character classes.
Class for storing start and end integer offsets.
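The shrinking described by the ShrinkCharSequenceNormalizer entry above can be sketched with two regular expressions. The exact rules are assumptions made for this illustration: runs of two or more whitespace characters become a single space, and any character repeated three or more times is cut down to two copies. ShrinkSketch is a hypothetical class and OpenNLP's normalizer may use different thresholds.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of shrinking repeated whitespace and characters; the rules are assumptions. */
final class ShrinkSketch {
    // two or more whitespace characters in a row
    private static final Pattern REPEATED_SPACE = Pattern.compile("\\s{2,}");
    // any character followed by at least two more copies of itself
    private static final Pattern REPEATED_CHAR = Pattern.compile("(.)\\1{2,}");

    static String normalize(CharSequence text) {
        String spaced = REPEATED_SPACE.matcher(text).replaceAll(" ");
        return REPEATED_CHAR.matcher(spaced).replaceAll("$1$1").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("sooooo   good!!!"));  // soo good!!
    }
}
```

Whitespace is collapsed first so that a long run of spaces ends up as one space rather than two.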
This class implements the stemming algorithm defined by a snowball script.
A stemmer reduces a word to its stem.
A marker interface for a String interner implementation.
Provides string interning utility methods.
A StringList is an immutable list of Strings.
Recognizes predefined patterns in strings.
This class implements the stemming algorithm defined by a snowball script.
Interface to determine which tags are valid for a particular word based on a tag dictionary.
Classes, fields, or methods annotated @ThreadSafe are safe to use in multithreading contexts.
Generates features for the class of the token.
Interface for context generators required for TokenizerME.
A default implementation of EvaluationMonitor that prints to an output stream.
Generates a feature which contains the token itself.
The interface for tokenizers, which segment a string into its tokens.
Tool to convert multiple data formats into native OpenNLP tokenizer training format.
A cross validator for tokenizers.
A marker interface for evaluating tokenizers.
The TokenizerEvaluator measures the performance of the given Tokenizer with the provided reference samples.
The factory that provides Tokenizer default implementations and resources.
A Tokenizer for converting raw text into separated tokens.
A default TokenSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The TokenizerModel is the model used by a learnable Tokenizer.
Loads a TokenizerModel for the command-line tools.
The interface for name finders which provide name tags for a sequence of tokens.
Tool to convert multiple data formats into native OpenNLP name finder training format.
Cross validator for TokenNameFinder.
A marker interface for evaluating name finders.
The TokenNameFinderEvaluator measures the performance of the given TokenNameFinder with the provided reference samples.
A default NameSample-centric implementation of AbstractEvaluatorTool that prints to an output stream.
The factory that provides TokenNameFinder default implementations and resources.
Generates a detailed report for the NameFinder.
The TokenNameFinderModel is the model used by a learnable TokenNameFinder.
Loads a TokenNameFinderModel for the command-line tools.
Partitions tokens into sub-tokens based on character classes and generates class features for each of the sub-tokens and combinations of those sub-tokens.
A TokenSample is text with token spans.
Class which produces an Iterator<TokenSample> from a file of space-delimited tokens.
This class is a stream filter which reads in string-encoded samples and creates samples out of them.
Factory producing OpenNLP TokenSampleStreams.
Represents a common base for training implementations.
A factory to initialize Trainer instances depending on a trainer type configured via TrainingParameters.
Declares and handles default parameters used for or during training models.
Common training parameters.
Adds trigram features based on tokens and token classes.
This class implements the stemming algorithm defined by a snowball script.
A TwitterCharSequenceNormalizer implementation that normalizes text in terms of Twitter character patterns.
Collects event and context counts by making two passes over the events.
An InputStream which cannot be closed.
Provides a maximum entropy model with a uniform Prior.
A UrlCharSequenceNormalizer implementation that normalizes text in terms of URLs and email addresses.
The Version class represents the OpenNLP Tools library version.
A basic Tokenizer implementation which performs tokenization using whitespace.
This stream formats an ObjectStream of samples into whitespace-separated token strings.
Generates previous and next features for a given AdaptiveFeatureGenerator.
Defines a word cluster generator factory; it reads an element containing 'w2vwordcluster' as a tag name; these clusters are typically produced by word2vec or Clark POS induction systems.
A Tokenizer implementation which performs tokenization using word pieces.
A stream filter which reads one sentence per line, containing words and tags in word_tag format, and outputs POSSample objects.
Note: Do not use this class, internal use only!
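Word-piece tokenization, mentioned in the Tokenizer entry above, is commonly done with greedy longest-match-first splitting against a fixed vocabulary, as in BERT's WordPiece. The sketch below assumes that algorithm together with the BERT conventions of a "##" prefix for word-internal pieces and a single unknown token for words that cannot be split; WordPieceSketch is hypothetical and not OpenNLP's tokenizer API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of greedy longest-match-first WordPiece splitting; not OpenNLP's API. */
final class WordPieceSketch {

    /** Splits one word into vocabulary pieces; continuation pieces carry the "##" prefix. */
    static List<String> split(String word, Set<String> vocab, String unknown) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            while (start < end) {                     // try the longest candidate first
                String candidate = word.substring(start, end);
                if (start > 0) {
                    candidate = "##" + candidate;     // mark word-internal pieces
                }
                if (vocab.contains(candidate)) {
                    match = candidate;
                    break;
                }
                end--;
            }
            if (match == null) {
                return List.of(unknown);              // no piece fits: the whole word is unknown
            }
            pieces.add(match);
            start = end;
        }
        return pieces;
    }

    public static void main(String[] args) {
        Set<String> vocab = Set.of("un", "aff", "##aff", "##able");
        System.out.println(split("unaffable", vocab, "[UNK]"));  // [un, ##aff, ##able]
    }
}
```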
A word vector.
A table that maps tokens to word vectors.