public class ClassifierTagger<E> extends Object implements Tagger<E>, Compilable, Serializable

E - Type of the tokens being tagged.
ClassifierTagger implements the first-best tagger
interface with a classifier that operates left-to-right
over the tokens, classifying one token at a time.
The current state of the tagging up to the current
position being tagged is represented using the static
nested class ClassifierTagger.State. The state
contains all of the input tokens, an integer input position,
and the tags for all of the tokens earlier in the sequence.
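The state just described can be pictured as a small immutable value. The sketch below is a hypothetical stand-in named SketchState, not LingPipe's ClassifierTagger.State. It assumes tags are plain strings and that the input position is simply the number of tags assigned so far.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for ClassifierTagger.State (not the LingPipe class):
// it holds every input token plus the tags assigned to positions before the
// one currently being tagged.
final class SketchState<F> {
    private final List<F> tokens;
    private final List<String> previousTags;

    SketchState(List<F> tokens, List<String> previousTags) {
        // A state must point at a token that still needs a tag.
        if (previousTags.size() >= tokens.size())
            throw new IllegalArgumentException("no token left to tag");
        this.tokens = Collections.unmodifiableList(new ArrayList<>(tokens));
        this.previousTags = Collections.unmodifiableList(new ArrayList<>(previousTags));
    }

    // The input position equals the number of tags assigned so far.
    int position() { return previousTags.size(); }

    List<F> tokens() { return tokens; }

    // The token at the position currently being tagged.
    F currentToken() { return tokens.get(position()); }

    // Tag assigned to an earlier position i, where 0 <= i < position().
    String tag(int i) { return previousTags.get(i); }
}
```

Deriving the position from the tag count keeps the three pieces of the state consistent by construction: there is no separate integer that could disagree with the tag list.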
An advantage of the classifier tagger over a more complex tagger such as conditional random fields (CRF) is that it is able to use longer-distance information about tags that have already been assigned. Another advantage is that the classifier tagger will use much less memory and tag much more quickly. Depending on the base classifier used, a classifier tagger will likely be more efficient to train in terms of time and space than a CRF.
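To make the longer-distance point concrete, here is a minimal feature-extraction sketch. The class LongRangeFeatures, its string feature names, and the choice of features are all hypothetical illustrations. The last feature looks back at the earliest earlier occurrence of the same token and reuses its tag. A first-order linear-chain CRF, by contrast, cannot condition on an arbitrary earlier tag in this way because its features see only the previous tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical feature extractor for a left-to-right classifier tagger.
// Every earlier tag is available, so features may look arbitrarily far back.
final class LongRangeFeatures {
    static List<String> features(List<String> tokens, List<String> previousTags) {
        int i = previousTags.size();              // position being tagged
        List<String> feats = new ArrayList<>();
        feats.add("TOK=" + tokens.get(i));
        feats.add("PREV_TAG=" + (i > 0 ? previousTags.get(i - 1) : "BOS"));
        feats.add("PREV2_TAG=" + (i > 1 ? previousTags.get(i - 2) : "BOS"));
        // Long-distance feature: the tag given to the first earlier
        // occurrence of the same token, if there is one.
        for (int j = 0; j < i; ++j) {
            if (tokens.get(j).equals(tokens.get(i))) {
                feats.add("EARLIER_TAG_OF_SAME_TOKEN=" + previousTags.get(j));
                break;
            }
        }
        return feats;
    }
}
```

For example, when tagging the second "Smith" in "Smith said Smith left" after the tags PER and O, the extractor emits EARLIER_TAG_OF_SAME_TOKEN=PER, which is information two positions back.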
The decoder is implemented in the obvious greedy way. It walks left to right along the input tokens, constructing the state at the current position, then feeds the state to the classifier; the best category returned is taken as the tag for that position, which in turn determines the next state.
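The decoding loop just described can be sketched as follows, assuming the classifier is modeled as a plain Java function from the token list and the tags assigned so far to the single best tag. GreedyDecoder and its signature are hypothetical illustrations, not LingPipe's BaseClassifier API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;

// Minimal sketch of greedy left-to-right decoding. The classifier is a
// hypothetical function (tokens, tags so far) -> best tag for the next position.
final class GreedyDecoder {
    static <E> List<String> tag(List<E> tokens,
                                BiFunction<List<E>, List<String>, String> classifier) {
        List<String> tags = new ArrayList<>(tokens.size());
        for (int i = 0; i < tokens.size(); ++i) {
            // The state at position i is the token list plus tags 0..i-1;
            // an unmodifiable view keeps the classifier from altering it.
            String best = classifier.apply(tokens, Collections.unmodifiableList(tags));
            tags.add(best);  // the chosen tag extends the state for position i+1
        }
        return tags;
    }
}
```

Each position is decided once and never revisited, so decoding makes exactly one classification per token. The price is that an early mistake is carried forward into later states rather than corrected, the usual trade-off of greedy decoding against Viterbi-style search.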
toClassifiedCorpus() converts
a tagging corpus to a corpus of classified states, which may then be
used to train a classifier. The resulting trained classifier
may then be plugged into a classifier tagger, which may be
serialized or compiled, depending on the serializability and
compilability of the underlying classifier.
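The conversion from taggings to classifier training data can be sketched per tagging: a tagging of n tokens yields n instances, one per position, each pairing a state with the reference tag at that position. TaggingToClassified and its Instance class are hypothetical names used only for illustration. The sketch assumes tags are strings and that the history in each state comes from the reference tagging itself, since a tagging corpus supplies no predicted tags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not LingPipe's implementation) of turning one tagging
// into classifier training data: n tokens yield n instances, one per position.
final class TaggingToClassified {

    static final class Instance<E> {
        final List<E> tokens;             // full input, as in the tagger state
        final List<String> previousTags;  // reference tags for positions 0..i-1
        final String category;            // reference tag at position i

        Instance(List<E> tokens, List<String> previousTags, String category) {
            this.tokens = tokens;
            this.previousTags = previousTags;
            this.category = category;
        }
    }

    static <E> List<Instance<E>> toInstances(List<E> tokens, List<String> tags) {
        if (tokens.size() != tags.size())
            throw new IllegalArgumentException("token and tag counts differ");
        List<Instance<E>> instances = new ArrayList<>(tokens.size());
        for (int i = 0; i < tokens.size(); ++i) {
            // The history is copied from the reference tagging.
            instances.add(new Instance<>(tokens, List.copyOf(tags.subList(0, i)), tags.get(i)));
        }
        return instances;
    }
}
```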
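A classifier tagger is serializable if the underlying
classifier is serializable. The deserialized classifier tagger
will be an instance of ClassifierTagger<E> with the
deserialized classifier as its base classifier.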
A classifier tagger is compilable if the underlying
classifier is compilable. The deserialized classifier tagger
will be an instance of ClassifierTagger<E> with the
deserialized compiled classifier as its base classifier.
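The serialization half of this rule follows directly from how Java serialization treats fields. The minimal sketch below uses a hypothetical WrapperTagger whose only state is its base classifier; it is not LingPipe code, and it does not model compilation via compileTo(). Serializing the wrapper succeeds when the base object is Serializable and fails with java.io.NotSerializableException otherwise.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical wrapper whose only state is its base classifier, illustrating
// "serializable if the base classifier is serializable".
final class WrapperTagger implements Serializable {
    private static final long serialVersionUID = 1L;
    final Object baseClassifier;  // written by default serialization

    WrapperTagger(Object baseClassifier) { this.baseClassifier = baseClassifier; }

    // Serialize then deserialize an object in memory. I/O failures, including
    // NotSerializableException for a non-serializable base, are rethrown unchecked.
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical base classifiers: one serializable, one not.
final class SerializableBase implements Serializable {
    private static final long serialVersionUID = 1L;
    final String model;
    SerializableBase(String model) { this.model = model; }
}

final class NonSerializableBase { }
```

The round trip of a wrapper around SerializableBase yields a new wrapper holding a deserialized copy of the base, mirroring the description above; the same call on a wrapper around NonSerializableBase fails.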
| Modifier and Type | Class and Description |
|---|---|
| static class | ClassifierTagger.State<F>: A ClassifierTagger.State represents the full list of input tokens and a list of the tags assigned so far. |
| Constructor and Description |
|---|
| ClassifierTagger(BaseClassifier<ClassifierTagger.State<E>> classifier): Construct a classifier tagger based on the specified base classifier over states. |
| Modifier and Type | Method and Description |
|---|---|
| BaseClassifier<ClassifierTagger.State<E>> | classifier(): Returns the underlying classifier for this classifier tagger. |
| void | compileTo(ObjectOutput out): Compile this classifier tagger to the specified object output stream. |
| Tagging<E> | tag(List<E> tokens): Return the tagging for the specified list of tokens. |
| static <F> Corpus<ObjectHandler<Classified<ClassifierTagger.State<F>>>> | toClassifiedCorpus(Corpus<ObjectHandler<Tagging<F>>> taggingCorpus): Return a corpus consisting of classified tagger states derived from the specified corpus of taggings. |
public ClassifierTagger(BaseClassifier<ClassifierTagger.State<E>> classifier)
classifier - Base classifier over partial tagging results.

public BaseClassifier<ClassifierTagger.State<E>> classifier()

public void compileTo(ObjectOutput out) throws IOException
Specified by: compileTo in interface Compilable
out - Object output stream to which this classifier tagger is compiled.
NotSerializableException - If the base classifier is not compilable.
IOException - If there is an underlying error during I/O.

public static <F> Corpus<ObjectHandler<Classified<ClassifierTagger.State<F>>>> toClassifiedCorpus(Corpus<ObjectHandler<Tagging<F>>> taggingCorpus)
The resulting corpus is implemented as a lightweight wrapper around the tagging corpus. This makes iterating over it slightly slower than iterating over an explicitly converted corpus, but it takes much less memory.
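The wrapper design can be sketched as a lazy view: rather than materializing one training instance per token, it keeps references to the taggings and builds each instance only when iteration reaches it. LazyInstances is a hypothetical class for illustration, assuming each tagging is given as parallel lists of tokens and string tags; it is not LingPipe's Corpus implementation.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical lazy view over taggings: instances are built on demand,
// so memory holds only the original taggings, not one copy per token.
final class LazyInstances<E> implements Iterable<LazyInstances.Instance<E>> {

    // One training instance, built only when the iterator reaches it.
    static final class Instance<T> {
        final List<T> tokens;        // shared with the tagging, not copied
        final List<String> history;  // subList view of gold tags 0..position-1
        final String category;       // gold tag at the current position

        Instance(List<T> tokens, List<String> history, String category) {
            this.tokens = tokens;
            this.history = history;
            this.category = category;
        }
    }

    private final List<List<E>> tokenLists;      // one token list per tagging
    private final List<List<String>> tagLists;   // parallel gold tag lists

    LazyInstances(List<List<E>> tokenLists, List<List<String>> tagLists) {
        if (tokenLists.size() != tagLists.size())
            throw new IllegalArgumentException("parallel lists differ in size");
        this.tokenLists = tokenLists;
        this.tagLists = tagLists;
    }

    @Override
    public Iterator<Instance<E>> iterator() {
        return new Iterator<Instance<E>>() {
            private int doc = 0;  // index of the current tagging
            private int pos = 0;  // position within that tagging

            // Skip taggings whose positions are used up, including empty ones.
            private void skipExhausted() {
                while (doc < tokenLists.size() && pos >= tokenLists.get(doc).size()) {
                    ++doc;
                    pos = 0;
                }
            }

            @Override
            public boolean hasNext() {
                skipExhausted();
                return doc < tokenLists.size();
            }

            @Override
            public Instance<E> next() {
                if (!hasNext()) throw new NoSuchElementException();
                List<String> tags = tagLists.get(doc);
                Instance<E> instance =
                    new Instance<>(tokenLists.get(doc), tags.subList(0, pos), tags.get(pos));
                ++pos;
                return instance;
            }
        };
    }
}
```

The per-instance cost is a small wrapper object and a subList view created during iteration, which is the source of the slight slowdown; in exchange, nothing proportional to the total token count is stored beyond the taggings themselves.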
The returned corpus will be serializable if the specified corpus is serializable.
F - Type of the tokens being tagged.
taggingCorpus - Corpus of taggings.

Copyright © 2019 Alias-i, Inc. All rights reserved.