E - Type of tokens in the lattice.
public abstract class TagLattice<E> extends Object
TagLattice provides an interface
for the output of a marginal tagger, based on forward, backward,
and transition log potentials. The allowance of general potentials
makes this interface suitable for the output of either hidden
Markov models (HMM) or linear chain conditional random fields
(CRF).
The forward, backward, and normalizing terms are used to define the marginal probability of a single tag at a specified position given the inputs:
p(tag[n]=k | toks[0],...,toks[N-1]) = fwd(n,k) * bk(n,k) / Z
To avoid problems with overflow, natural log values are used. On the log scale,
log p(tag[n]=k | toks[0],...,toks[N-1]) = log fwd(n,k) + log bk(n,k) - log Z
Warning: No checks are made to ensure that values supplied to the constructor form a coherent probability estimate.
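As an illustration of the log-scale marginal formula, the following is a minimal sketch and not part of this class. It assumes the log forward and log backward values are already available as plain arrays indexed by [position][tag]; the class name MarginalSketch and its method are hypothetical.

```java
final class MarginalSketch {

    // log p(tag[n]=k | toks) = log fwd(n,k) + log bk(n,k) - log Z
    // logFwd and logBk are assumed to be indexed as [position][tag].
    static double logMarginal(double[][] logFwd, double[][] logBk,
                              double logZ, int n, int k) {
        return logFwd[n][k] + logBk[n][k] - logZ;
    }

    public static void main(String[] args) {
        // One token, two tags: fwd = {0.25, 0.75}, bk = {1, 1}, Z = 1.
        double[][] logFwd = { { Math.log(0.25), Math.log(0.75) } };
        double[][] logBk = { { 0.0, 0.0 } };
        double logZ = 0.0;
        System.out.println(Math.exp(logMarginal(logFwd, logBk, logZ, 0, 1)));
    }
}
```

Working with sums of logs rather than products keeps the intermediate values in a representable range even for long inputs.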
This allows us to compute the probability of a sequence of tags, for instance for a phrase, by:
p(tag[n]=k0, tag[n+1]=k1, ..., tag[n+m]=km | toks) = fwd(n,k0) * bk(n+m,km) * Π_{i<m} trans(n+i,tag[n+i],tag[n+i+1]) / Z
On the log scale, this becomes:
log p(tag[n]=k0, tag[n+1]=k1, ..., tag[n+m]=km | toks) = log fwd(n,k0) + log bk(n+m,km) + Σ_{i<m} log trans(n+i,tag[n+i],tag[n+i+1]) - log Z
This log value is returned by logProbability(int,int[]), where the first argument is n and the second argument is {k0,k1,...,km}.
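The sequence formula can be sketched over plain arrays as follows. This is an illustrative sketch only: the class SequenceSketch is hypothetical, and it assumes logTrans[n][k1][k2] holds the log potential for moving from tag k1 at position n to tag k2 at position n+1, matching the indexing trans(n+i, ...) used in the formula.

```java
final class SequenceSketch {

    // log fwd(n,k0) + log bk(n+m,km) + sum_{i<m} log trans(n+i,k_i,k_{i+1}) - log Z
    static double logSequence(double[][] logFwd, double[][] logBk,
                              double[][][] logTrans, double logZ,
                              int n, int[] tags) {
        int m = tags.length - 1;
        double sum = logFwd[n][tags[0]] + logBk[n + m][tags[m]] - logZ;
        for (int i = 0; i < m; i++) {
            sum += logTrans[n + i][tags[i]][tags[i + 1]];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two tokens, two tags; values chosen so that Z = 1.
        double[][] logFwd = {
            { Math.log(0.6), Math.log(0.4) },
            { Math.log(0.5), Math.log(0.5) } };
        double[][] logBk = { { 0.0, 0.0 }, { 0.0, 0.0 } };
        double[][][] logTrans = { {
            { Math.log(0.7), Math.log(0.3) },
            { Math.log(0.2), Math.log(0.8) } } };
        double lp = logSequence(logFwd, logBk, logTrans, 0.0, 0, new int[] { 0, 1 });
        System.out.println(Math.exp(lp));
    }
}
```

With a tag array of length one the loop body never runs, so the result reduces to the single-tag marginal.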
The transitions are defined by:
trans(n,k1,k2) = φ(tag[n]=k2|tag[n-1]=k1)
where φ is the transition potential from the tag k1 at position n-1 to tag k2 at position n.
The log of this value is returned by logTransition(int,int,int),
where the first argument is n and the second two k1
and k2.
The forward values are defined by:
fwd(0,k) = init(k)
where init(k) is the start potential, defined on an application-specific basis. The recursive step defines the forward values for subsequent positions 0 < n < N and tags k:
fwd(n,k) = Σ_{k'} fwd(n-1,k') * trans(n-1,k',k)
This is typically computed for all n and k using the forward algorithm, which takes time proportional to N*K*K for N tokens and K tags.
The log of this value is returned by logForward(int,int)
where the first argument is n and the second k.
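The forward recursion can be sketched on the log scale as below. This is a sketch under stated assumptions, not the implementation used by any subclass: the class ForwardSketch is hypothetical, logInit[k] is assumed to hold the log of the start potential, and logTrans[n][kPrev][k] is assumed to hold the log potential for moving from tag kPrev at position n to tag k at position n+1. Sums of potentials are taken with the usual log-sum-exp trick.

```java
final class ForwardSketch {

    // Computes log(sum_i exp(xs[i])) without overflowing exp().
    static double logSumExp(double[] xs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // every term is log(0)
        }
        double sum = 0.0;
        for (double x : xs) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    // Returns logFwd[n][k] for all positions n and tags k.
    static double[][] logForward(double[] logInit, double[][][] logTrans) {
        int numTokens = logTrans.length + 1;
        int numTags = logInit.length;
        double[][] logFwd = new double[numTokens][numTags];
        logFwd[0] = logInit.clone();
        double[] terms = new double[numTags];
        for (int n = 1; n < numTokens; n++) {
            for (int k = 0; k < numTags; k++) {
                for (int kPrev = 0; kPrev < numTags; kPrev++) {
                    terms[kPrev] = logFwd[n - 1][kPrev] + logTrans[n - 1][kPrev][k];
                }
                logFwd[n][k] = logSumExp(terms);
            }
        }
        return logFwd;
    }

    public static void main(String[] args) {
        double[] logInit = { Math.log(0.6), Math.log(0.4) };
        double[][][] logTrans = { {
            { Math.log(0.7), Math.log(0.3) },
            { Math.log(0.2), Math.log(0.8) } } };
        double[][] logFwd = logForward(logInit, logTrans);
        System.out.println(Math.exp(logFwd[1][0]) + " " + Math.exp(logFwd[1][1]));
    }
}
```

The three nested loops make the cost visible: each of the N*K cells sums over K predecessor tags.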
The backward potentials are similar, but condition on rather than predict the label. This simplifies the basis of the recursion to
bk(N-1,k) = 1.0
where N is the length of the input (number of tokens). The recursive step for 0 <= n < N-1 is
bk(n,k) = Σ_{k'} trans(n,k,k') * bk(n+1,k')
The log of this value is returned by
logBackward(int,int)
where the first argument is n and the second k.
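The backward recursion runs from the last position toward the first. The following is a minimal sketch, assuming the same hypothetical array layout as the formula, with logTrans[n][k][kNext] holding the log potential for moving from tag k at position n to tag kNext at position n+1. The class BackwardSketch is illustrative and not part of this API.

```java
final class BackwardSketch {

    // Computes log(sum_i exp(xs[i])) without overflowing exp().
    static double logSumExp(double[] xs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max;
        }
        double sum = 0.0;
        for (double x : xs) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    // Returns logBk[n][k] for all positions n and tags k.
    static double[][] logBackward(double[][][] logTrans, int numTags) {
        int numTokens = logTrans.length + 1;
        // Java zero-fills new arrays, so the last row is already log(1.0) = 0.0.
        double[][] logBk = new double[numTokens][numTags];
        double[] terms = new double[numTags];
        for (int n = numTokens - 2; n >= 0; n--) {
            for (int k = 0; k < numTags; k++) {
                for (int kNext = 0; kNext < numTags; kNext++) {
                    terms[kNext] = logTrans[n][k][kNext] + logBk[n + 1][kNext];
                }
                logBk[n][k] = logSumExp(terms);
            }
        }
        return logBk;
    }

    public static void main(String[] args) {
        // Unnormalized potentials, as might arise in a CRF.
        double[][][] logTrans = { {
            { Math.log(2.0), Math.log(1.0) },
            { Math.log(1.0), Math.log(3.0) } } };
        double[][] logBk = logBackward(logTrans, 2);
        System.out.println(Math.exp(logBk[0][0]) + " " + Math.exp(logBk[0][1]));
    }
}
```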
The normalizer is the sum over all paths, and acts to normalize sequences to probabilities. It may be computed by:
Z = Σ_k fwd(N-1,k)
The log of this value is returned by
logZ().
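On the log scale the sum over the final forward values is again a log-sum-exp. A small sketch, assuming a hypothetical logFwd array indexed by [position][tag]; the class NormalizerSketch is illustrative only.

```java
final class NormalizerSketch {

    // log Z = log sum_k exp(log fwd(N-1,k)), shifted by the max for stability.
    static double logZ(double[][] logFwd) {
        double[] last = logFwd[logFwd.length - 1];
        double max = Double.NEGATIVE_INFINITY;
        for (double x : last) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // no path has non-zero potential
        }
        double sum = 0.0;
        for (double x : last) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    public static void main(String[] args) {
        double[][] logFwd = {
            { Math.log(0.6), Math.log(0.4) },
            { Math.log(2.0), Math.log(3.0) } };
        System.out.println(Math.exp(logZ(logFwd)));
    }
}
```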
In some settings, such as hidden Markov models (HMMs), the forward, backward, transition and normalizer may be interpreted as probabilities:
trans(n,k1,k2) = p(tag[n]=k2|tag[n-1]=k1)
fwd(n,k) = p(tag[n]=k, toks[0], ..., toks[n])
bk(n,k) = p(toks[n+1],...,toks[N-1] | tag[n]=k)
Z = p(toks[0],...,toks[N-1])
| Constructor | Description |
|---|---|
| TagLattice() | Construct an empty tag lattice. |
| Modifier and Type | Method | Description |
|---|---|---|
| abstract double | logBackward(int token, int tag) | Returns the log of the backward probability to the specified token and tag. |
| abstract double | logForward(int token, int tag) | Return the log of the forward probability of the specified tag at the specified position. |
| abstract double | logProbability(int n, int tag) | Convenience method returning the log of the conditional probability that the specified token has the specified tag, given the complete list of input tokens. |
| abstract double | logProbability(int nFrom, int[] tags) | Return the log conditional probability that the tokens starting with the specified token position have the specified tags given the complete sequence of input tokens. |
| abstract double | logProbability(int nTo, int tagFrom, int tagTo) | Convenience method returning the log of the conditional probability that the specified two tokens have the specified tags given the complete list of input tokens. |
| abstract double | logTransition(int tokenFrom, int tagFrom, int tagTo) | Returns the log of the transition probability from the specified input token position with the specified previous tag to the specified target tag. |
| abstract double | logZ() | Return the log of the normalizing constant for the lattice. |
| abstract int | numTags() | Return the number of tags in this tag lattice. |
| abstract int | numTokens() | Returns the length of this tag lattice as measured by number of tokens. |
| abstract String | tag(int id) | Return the tag with the specified symbol identifier. |
| abstract List<String> | tagList() | Returns an unmodifiable view of the list of tags used in this lattice, indexed by identifier. |
| abstract SymbolTable | tagSymbolTable() | Returns a symbol table which converts tags to identifiers and vice-versa. |
| abstract E | token(int n) | Return the token at the specified position in the input. |
| ConditionalClassification | tokenClassification(int tokenIndex) | Returns the classification of the token at the specified position in this tag lattice. |
| abstract List<E> | tokenList() | Return an unmodifiable view of the underlying tokens for this tag lattice. |
public abstract List<E> tokenList()
public abstract List<String> tagList()
public abstract String tag(int id)
id - Identifier for tag.
IndexOutOfBoundsException - If the specified identifier is not in range for the list of tags.
public abstract int numTags()
public abstract E token(int n)
n - Input position.
IndexOutOfBoundsException - If the specified index is not in range for the list of tokens.
public abstract int numTokens()
public abstract SymbolTable tagSymbolTable()
A new symbol table is constructed for each call, so it should be saved and reused if possible. Changing the returned symbol table will not affect this lattice.
public abstract double logProbability(int n, int tag)
This method returns results defined by
logProbability(n,tag) == logProbability(n,new int[] { tag })
n - Position of input token.
tag - Identifier of tag.
ArrayIndexOutOfBoundsException - If the token or tag identifiers are not in range.
public abstract double logProbability(int nTo, int tagFrom, int tagTo)
This method returns results defined by
logProbability(nTo,tagFrom,tagTo) == logProbability(nTo-1,new int[] { tagFrom, tagTo })
nTo - Position of second token.
tagFrom - First tag, from which the transition is made.
tagTo - Second tag, to which the transition is made.
public abstract double logProbability(int nFrom, int[] tags)
nFrom - Starting position of sequence.
tags - Tag identifiers for sequence.
IllegalArgumentException - If the token is out of range or the token plus the length of the tag sequence is out of range of tokens, or if any of the tags is not a known identifier.
public abstract double logForward(int token, int tag)
token - Token position.
tag - Tag identifier.
ArrayIndexOutOfBoundsException - If the token or tag index is out of bounds for this lattice's tokens or tags.
public abstract double logBackward(int token, int tag)
token - Input token position.
tag - Tag identifier.
ArrayIndexOutOfBoundsException - If the token or tag index is out of bounds for this lattice's tokens or tags.
public abstract double logTransition(int tokenFrom, int tagFrom, int tagTo)
tokenFrom - Token position from which the transition is made.
tagFrom - Identifier for the previous tag from which the transition is made.
tagTo - Tag identifier for the target tag to which the transition is made.
ArrayIndexOutOfBoundsException - If the token index or either of the tag indexes is out of bounds for this lattice's tokens or tags.
public abstract double logZ()
public ConditionalClassification tokenClassification(int tokenIndex)
tokenIndex - Position of token to classify.
Copyright © 2019 Alias-i, Inc. All rights reserved.