E - Type of tokens in the tagging.public class ChainCrf<E> extends Object implements Tagger<E>, NBestTagger<E>, MarginalTagger<E>, Serializable
ChainCrf<E> class implements linear chain conditional
random field decoding and estimation for input token sequences of
type E.
The cliques of the graphical model are typically called nodes and edges in a chain CRF. A node consists of a token position, the rest of the input tokens, whereas an edge adds the previous tag. A CRF feature extractor converts nodes and edges to feature maps, which are then converted to vectors for use in CRFs.
First, we suppose a fixed, finite set of K tags tags[0],...,tags[K-1] which may be assigned to tokens as part of a
tagging. The coeffcient matrix β consists of
K coefficient vectors,
β[0],...,β[K-1], one for each tag.
Second, we assume a pair of feature extractors, one for nodes in
the graphical model and one for edges. The node feature extractor
φ0 maps sequences xs of input
tokens and a position n in that sequence (0 <= n <
xs.length) into a feature map φ0(xs,n).
The edge feature extractor The combined features extracted for input token sequence The probability model assumes a fixed sequence of input tokens
Note that this presentation of CRFs is a bit non-standard. In
particular, LingPipe has factored what is usually a single large
coefficient vector into one vector per output tag. This somewhat
restricts the kinds of features that may be extracted so that each
feature depends on a specific output tag. The reason this is done
is for efficiency. Note that the node feature extractor
The more verbose constructor for chain CRFs takes three array
arguments which indicate which tags are legal in which positions.
The shorter convenience constructor implicitly assumes any tag may
occur anywhere. The legal start tag array indicates if a tag may
start a tagging. The legal end tag array indicates if a tag may
be the last tag in a tagging. The legal transition array indicates
for a pair of tags if one may follow the other.
These are typically used when chunkers are encoded as taggers,
and some sequences of tags are structurally illegal. These are
called structural zeros in statistical modeling terms.
During estimation, these may be computed by specifying
N-best functionality is available through the method
N-best functionality with scores normalized to conditional
probabilities may be computed using The forward-backward algorithm efficiently computes unnormalized
forward, backward, and normalizing probabilities. As defined
in the The unnormalized total probability Z may be used with
n-best output to convert n-best sequence scores into conditional
probabilities of the tagging given the input tokens.
Training is carried out for conditional random fields the same
way as for logistic regression classifiers (see See the logistic regression classes for a description of the
prior (also see Reporting is exactly the same for conditional random fields as
for logistic regression, with each epoch reporting learning rate
(lr), log likelihood (ll), log prior probability (lp), and the
combined objective being optimized consisting of the sum of log
likelihood and log prior probability (llp), and the best combined
objective so far (llp*).
Unlike in logistic regression, the prior is applied according
to a blocking strategy rather than through lazy updates. The block
size is a parameter that determines how often the prior gradient
is applied. In effect, it treats updates as blocks. The weight of
the prior is adjusted accordingly. After all of the instances in
an epoch, the prior is applied one more time to ensure that the
coefficients are in the right state after each epoch.
Significant speedups can be achieved by setting the feature
caching flag in the estimator method to The CRF read back in will be identical to the one that was
serialized to the extent that the parts are equivalent: feature
extractors, symbol table, and coefficients.
For CRFs estimated using the gradient descent estimator method,
the symbol table and vector coefficients are serializable in such a
way as to reproduce equivalent objects to the ones being
serialized.
φ1 maps
input token sequences xs, a position n in that
sequence, and previous tag tags[k] to a feature
map φ1(xs,n,tags[k]).
xs, position n in that sequence, and tag tags[k] are:
defined by:
φ(xs,0,tags[k]) = φ0(xs,0)
The special case for position 0 is because there is no previous
tag. Note that boundary features (begin-of-sequence, end-of-sequence)
may be defined in φ(xs,n,tags[k]) = φ0(xs,n) + φ1(xs,n,tags[k]) for n > 0
φ0 because it knows
the position and length of input.
Tag Sequence Probability
xs[0],...,xs[N-1] of length N. The model assigns
a conditional probability to sequences of tag indices
ys[0],...,ys[N-1] (where 0 <= ys[n] < tags.length),
by:
where the sum is over the dot product of the feature vector
p(ys|ws) ∝ exp(Σn < N φ(xs,n,tags[ys[n-1]])⋅β[ys[n]])
φ(xs,n,tags[ys[n-1]]) extracted for the input
sequence of tokens xs at position n and
previous tag tags[ys[n-1]] with index ys[n-1],
and the coefficient for output tag tags[ys[n]],
namely β[tags[ys[n]].
Note on Factored Coefficients
φ0(xs,n) could be rolled into the more
general edge feature extractor
φ1(xs,n,t); the node feature extractor
is only for efficiency, allowing features shared for all previous
tags t to be extracted only once.
Legal Tags: Structural Zeros
First-Best Output: Viterbi
First-best output is calculated very efficiently in linear time
in the size of the inputs using the Viterbi algorithm, a form
of dynamic programming applicable to sequence tagging. Note that
the normalizing constant does not need to be computed. This
functionality is avaiable through the method tag(List)
of the Tagger interface.
N-Best Sequence Output: Viterbi Forward, A* Backward
N-best output may be calculated in one linear forward pass and an
efficient backward pass. The forward pass computes and stores the
full Viterbi lattice consisting of the best analysis and score up
to a given token for a given tag. The backward pass uses the
A* algorithm with the Viterbi scores as exact completion
estimates. This allows n-best analyses to be returned in
order, with at most a constant amount of work per result.
tagNBest(List,int) of the NBestTagger
interface. This returns an iterator over scored tagging
instances. The scores are the unnormalized products defined
above.
tagNBestConditional(List,int). This requires computing the
normalizing constant using the forward-backward algorithm described
in the next section.
Marginal Tag Probabiilty Output: Forward-Backward Algorithm
The forward-backward algorithm may be applied to conditional random
fields. The generalized undirected graphical model form of this
algorithm is known as the product-sum algorithm. The result is
returned as an instance of TagLattice.
TagLattice class documentation, these
may be used to define the probability of a tag or tag sequence
staring at a specified position given the input tokens.
Stochastic Gradient Descent Training
LogisticRegressionClassifier and LogisticRegression. It's mathematically
identical to think of CRFs as multi-way classifiers where the
categories are sequences of tags. Under that analogy, training for
conditional random fields exactly mirrors that for logistic
regression. The only difference for conditional random fields is
that the expectations at each step are computed using the
forward-backward algorithm and efficiently applied over the
lattice.
RegressionPrior), and the learning and
annealing rate (also see AnnealingSchedule).
true. This may
require a large amount of memory, because it stores the entire
lattice of vectors for each training instance.
Serialization
A chain CRF may be serialized if its parts are serializable: the
node and edge feature extractor, the feature symbol table, and the
vector coefficients.
Thread Safety
A CRF is thread safe to the extent that its symbol table and
feature extractors are thread safe. Calling the tagging methods
concurrently will result in concurrent calls to feature extractors
and symbol tables. The standard symbol table implementations are
all thread safe, and almost all well-implemented feature extractors
will be thread safe.
| Constructor and Description |
|---|
ChainCrf(String[] tags,
boolean[] legalTagStarts,
boolean[] legalTagEnds,
boolean[][] legalTagTransitions,
Vector[] coefficients,
SymbolTable featureSymbolTable,
ChainCrfFeatureExtractor<E> featureExtractor,
boolean addInterceptFeature)
Construct a conditional random field from the specified tags,
feature vector coefficients, symbol table for feature, feature
extractors and flag indicating whether to add intercepts or
not.
|
ChainCrf(String[] tags,
Vector[] coefficients,
SymbolTable featureSymbolTable,
ChainCrfFeatureExtractor<E> featureExtractor,
boolean addInterceptFeature)
Construct a conditional random field from the specified tags,
feature vector coefficients, symbol table for feature, feature
extractors and flag indicating whether to add intercepts or
not.
|
| Modifier and Type | Method and Description |
|---|---|
boolean |
addInterceptFeature()
Returns
true if this CRF adds an intercept feature
with value 1.0 at index 0 to all feature vectors. |
Vector[] |
coefficients()
Return the coefficient vectors for this CRF.
|
static <F> ChainCrf<F> |
estimate(Corpus<ObjectHandler<Tagging<F>>> corpus,
ChainCrfFeatureExtractor<F> featureExtractor,
boolean addInterceptFeature,
int minFeatureCount,
boolean cacheFeatureVectors,
boolean allowUnseenTransitions,
RegressionPrior prior,
int priorBlockSize,
AnnealingSchedule annealingSchedule,
double minImprovement,
int minEpochs,
int maxEpochs,
Reporter reporter)
Return the CRF estimated using stochastic gradient descent with
the specified prior from the specified corpus of taggings of
type
F pruned to the specified minimum feature count,
using the specified feature extractor, automatically adding an
intercept feature if the flag is true, allow unseen tag
transitions as specified, using the specified training
parameters for annealing, measuring convergence, and reporting
the incremental results to the specified reporter. |
ChainCrfFeatureExtractor<E> |
featureExtractor()
Return the feature extractor for this CRF.
|
SymbolTable |
featureSymbolTable()
Returns an unmodifiable view of the symbol table for features for
this CRF.
|
String |
tag(int k)
Returns the tag for the specified tag index.
|
Tagging<E> |
tag(List<E> tokens)
Return the tagging for the specified list of tokens.
|
TagLattice<E> |
tagMarginal(List<E> tokens)
Return the marginal tagging for the specified list of
input tokens.
|
Iterator<ScoredTagging<E>> |
tagNBest(List<E> tokens,
int maxResults)
Return an iterator over the
n-best scored taggings for
the specified input tokens up to a specified maximum n. |
Iterator<ScoredTagging<E>> |
tagNBestConditional(List<E> tokens,
int maxResults)
Return an iterator over the
n-best scored taggings for
the specified input tokens up to a specified maximum n,
with scores normalized to conditional probabilities. |
List<String> |
tags()
Returns an unmodifiable view of the array of tags underlying this CRF.
|
String |
toString()
Return a string-based representation of this chain CRF.
|
public ChainCrf(String[] tags, Vector[] coefficients, SymbolTable featureSymbolTable, ChainCrfFeatureExtractor<E> featureExtractor, boolean addInterceptFeature)
tags - Array of output tags.coefficients - Array of coefficient vectors parallel to tags.featureSymbolTable - Symbol table for feature extraction
to vectors.featureExtractor - CRF feature extractor.addInterceptFeature - true if an intercept feature
at position 0 with value 1 is added to all feature vectors.IllegalArgumentException - If the tag and coefficient
vector arrays are not non-empty and the same length, or if the
coefficient vectors are not all of the same number of dimensions.public ChainCrf(String[] tags, boolean[] legalTagStarts, boolean[] legalTagEnds, boolean[][] legalTagTransitions, Vector[] coefficients, SymbolTable featureSymbolTable, ChainCrfFeatureExtractor<E> featureExtractor, boolean addInterceptFeature)
tags - Array of output tagslegalTagStarts - Array of flags indicating if tag may be
first tag for a tagging.legalTagEnds - Array of flags indicating if tag may be
last tag for a tagging.legalTagTransitions - Two dimensional array of flags indicating
if the first tag may be followed by the second tag.coefficients - Array of coefficient vectors parallel to tags.featureSymbolTable - Symbol table for feature extraction
to vectors.featureExtractor - CRF feature extractor.addInterceptFeature - true if an intercept feature
at position 0 with value 1 is added to all feature vectors.IllegalArgumentException - If the tag and coefficient
vector arrays are not non-empty and the same length, or if the
coefficient vectors are not all of the same number of dimensions.public List<String> tags()
The array of coefficient vectors is parallel to the array of
tags returned by tags()k, so the coefficient vector
coefficients()[n] is for output tag tags()[n].
public String tag(int k)
tag(k) == tags()[k].k - Position of tag.ArrayIndexOutOfBoundsException - If the specified index
is out of bounds for the tag array (k < 0 or k
>= tags().length).public Vector[] coefficients()
The array of coefficient vectors is parallel to the array of
tags returned by tags()k, so the coefficient vector
coefficients()[n] is for output tag tags()[n].
public SymbolTable featureSymbolTable()
public ChainCrfFeatureExtractor<E> featureExtractor()
public boolean addInterceptFeature()
true if this CRF adds an intercept feature
with value 1.0 at index 0 to all feature vectors.public Tagging<E> tag(List<E> tokens)
Taggerpublic Iterator<ScoredTagging<E>> tagNBest(List<E> tokens, int maxResults)
NBestTaggern-best scored taggings for
the specified input tokens up to a specified maximum n.tagNBest in interface NBestTagger<E>tokens - Input tokens to tag.maxResults - Maximum number of results to return.public Iterator<ScoredTagging<E>> tagNBestConditional(List<E> tokens, int maxResults)
NBestTaggern-best scored taggings for
the specified input tokens up to a specified maximum n,
with scores normalized to conditional probabilities.
Optional operation.
tagNBestConditional in interface NBestTagger<E>tokens - Input tokens to tag.maxResults - Maximum number of results to return.public TagLattice<E> tagMarginal(List<E> tokens)
MarginalTaggertagMarginal in interface MarginalTagger<E>tokens - Input tokens to tag.public String toString()
Warning: The output is very verbose, including symbolic representations of all the coefficients.
public static <F> ChainCrf<F> estimate(Corpus<ObjectHandler<Tagging<F>>> corpus, ChainCrfFeatureExtractor<F> featureExtractor, boolean addInterceptFeature, int minFeatureCount, boolean cacheFeatureVectors, boolean allowUnseenTransitions, RegressionPrior prior, int priorBlockSize, AnnealingSchedule annealingSchedule, double minImprovement, int minEpochs, int maxEpochs, Reporter reporter) throws IOException
F pruned to the specified minimum feature count,
using the specified feature extractor, automatically adding an
intercept feature if the flag is true, allow unseen tag
transitions as specified, using the specified training
parameters for annealing, measuring convergence, and reporting
the incremental results to the specified reporter.
Reporting at the info level provides parameter and epoch level. At the debug level, it reports epoch-by-epoch likelihoods.
corpus - Corpus from which to estimate.featureExtractor - Feature extractor for the CRF.addInterceptFeature - Set to true if an intercept
feature with index 0 is automatically added to all feature
vectors with value 1.0.minFeatureCount - Minimum number of instances of a feature
to keep it.cacheFeatureVectors - Flag indicating whether or not to
keep the computed feature vectors in memory.allowUnseenTransitions - Flag indicating whether to allow
tags to start a tagging, end a tagging, or follow another tag
if there was not an example of that in the corpus.prior - Prior for coefficients to use during estimation.annealingSchedule - Schedule for annealing the learning
rate during gradient descent.minImprovement - Minimum relative improvement objective
(log likelihood plus log prior) computed as a 10-epoch rolling
average to signal convergence.minEpochs - Minimum number of epochs for which to run
gradient descent estimation.maxEpochs - Maximum number of epochs for which to run
gradient descent estimation.reporter - Reporter to which results are written, or
null for no reporting of intermediate results.IOExceptionCopyright © 2019 Alias-i, Inc.. All rights reserved.