public class StringTagging extends Tagging<String>
StringTagging is a tagging over string-based tokens
that indexes each token to a position in an underlying character
sequence.
Because tokenizers may normalize inputs, the underlying
characters between a token's start and end are not necessarily
equivalent to the token itself. That is, token(n) does not
need to be equal to characters().substring(tokenStart(n),tokenEnd(n)).
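The distinction between a normalized token and its underlying characters can be illustrated with a small, self-contained sketch that does not use the library itself. The class `RawTokenSketch`, its helper `rawSpan`, and the lowercasing normalization are assumptions made for illustration only; the substring call mirrors the expression `characters().substring(tokenStart(n),tokenEnd(n))` given above.

```java
import java.util.Arrays;
import java.util.List;

public class RawTokenSketch {

    // Hypothetical helper: the span of the underlying characters
    // that a token was derived from.
    public static String rawSpan(String characters, int start, int end) {
        return characters.substring(start, end);
    }

    public static void main(String[] args) {
        String characters = "John  RAN home";

        // Tokens as an assumed lowercasing tokenizer might emit them,
        // with parallel start (inclusive) and end (exclusive) offsets.
        List<String> tokens = Arrays.asList("john", "ran", "home");
        int[] starts = {0, 6, 10};
        int[] ends = {4, 9, 14};

        for (int n = 0; n < tokens.size(); ++n) {
            String raw = rawSpan(characters, starts[n], ends[n]);
            // The normalized token and the raw span may differ.
            System.out.println(tokens.get(n) + " <- \"" + raw + "\""
                               + " equal=" + tokens.get(n).equals(raw));
        }
    }
}
```

Here every token differs from its raw span in case, except the last one, which the assumed normalization leaves unchanged.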
| Constructor and Description |
|---|
| StringTagging(List<String> tokens, List<String> tags, CharSequence cs, int[] tokenStarts, int[] tokenEnds): Construct a string tagging from the specified string-based tokens and tags, an underlying character sequence, and arrays representing the position at which each token starts and ends. |
| StringTagging(List<String> tokens, List<String> tags, CharSequence cs, List<Integer> tokenStarts, List<Integer> tokenEnds): Construct a string tagging from the specified string-based tokens and tags, an underlying character sequence, and lists representing the position at which each token starts and ends. |
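
As a rough sketch of how the parallel constructor arguments might be prepared, the following self-contained example computes tokens together with their start and end offsets. The class `OffsetSketch`, the whitespace tokenizer, and the example tags are assumptions for illustration and are not part of the library; the final comment shows where the list-based constructor would be called if the library were on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OffsetSketch {

    // Hypothetical whitespace tokenizer that fills three parallel lists:
    // token text, start offset (inclusive), and end offset (exclusive).
    public static void tokenize(CharSequence cs, List<String> tokens,
                                List<Integer> starts, List<Integer> ends) {
        int n = cs.length();
        int i = 0;
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(cs.charAt(i))) ++i;
            if (i >= n) break;
            int start = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(cs.charAt(i))) ++i;
            tokens.add(cs.subSequence(start, i).toString());
            starts.add(start);
            ends.add(i);
        }
    }

    public static void main(String[] args) {
        String cs = "The  cat sat";
        List<String> tokens = new ArrayList<>();
        List<Integer> starts = new ArrayList<>();
        List<Integer> ends = new ArrayList<>();
        tokenize(cs, tokens, starts, ends);

        // Illustrative tags, one per token.
        List<String> tags = Arrays.asList("DT", "NN", "VBD");

        System.out.println(tokens + " " + tags + " " + starts + " " + ends);
        // With the library available, these would be passed as:
        // new StringTagging(tokens, tags, cs, starts, ends)
    }
}
```

All four collections have the same length, and every offset lies within the character sequence, which matches the argument shape the constructor summaries describe.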
| Modifier and Type | Method and Description |
|---|---|
| String | characters(): Returns the characters underlying this string tagging. |
| boolean | equals(Object that): Returns true if the specified object is a string tagging that's structurally identical to this tagging. |
| int | hashCode(): Returns a hash code computed from the underlying string and tags. |
| String | rawToken(int n): Returns the string underlying the token in the specified position. |
| int | tokenEnd(int n): Returns the character offset of the end of the token in the specified input position in the underlying characters. |
| int | tokenStart(int n): Returns the character offset of the start of the token in the specified input position in the underlying characters. |
| String | toString(): Returns a string-based representation of this tagging. |
public StringTagging(List<String> tokens, List<String> tags, CharSequence cs, int[] tokenStarts, int[] tokenEnds)
The lists and arrays are copied, and the character sequence converted to a string. Subsequent changes to these arguments will not affect the constructed tagging.
Parameters:
tokens - List of strings representing token inputs.
tags - List of strings representing tag outputs, parallel to tokens.
cs - Underlying character sequence.
tokenStarts - Starting positions of tokens, parallel to tokens.
tokenEnds - Ending positions of tokens, parallel to tokens.
Throws:
IllegalArgumentException - If the list of tokens, list of tags, token starts, and token ends are not all the same length, or if a token start/end index is not possible for the underlying characters.

public StringTagging(List<String> tokens, List<String> tags, CharSequence cs, List<Integer> tokenStarts, List<Integer> tokenEnds)
The lists are copied, and the character sequence converted to a string. Subsequent changes to these arguments will not affect the constructed tagging.
Parameters:
tokens - List of strings representing token inputs.
tags - List of strings representing tag outputs, parallel to tokens.
cs - Underlying character sequence.
tokenStarts - Starting positions of tokens, parallel to tokens.
tokenEnds - Ending positions of tokens, parallel to tokens.
Throws:
IllegalArgumentException - If the list of tokens, list of tags, token starts, and token ends are not all the same length, or if a token start/end index is not possible for the underlying characters.

public int tokenStart(int n)
Parameters:
n - Position of token in input token list.
Throws:
IndexOutOfBoundsException - If the position is not between 0 (inclusive) and the number of tokens (exclusive).

public int tokenEnd(int n)
Parameters:
n - Position of token in input token list.
Throws:
IndexOutOfBoundsException - If the position is not between 0 (inclusive) and the number of tokens (exclusive).

public String rawToken(int n)
Parameters:
n - Token input position.
Throws:
IndexOutOfBoundsException - If the position is not between 0 (inclusive) and the number of tokens (exclusive).

public String characters()
public String toString()
Overrides:
toString in class Tagging<String>

public boolean equals(Object that)
Returns true if the specified object is a string tagging that's structurally identical to this tagging. For taggings to be identical, their underlying strings must be equal, all tags and tokens must be equal, and all token starts and ends must be equal.

public int hashCode()
31**N * characters().hashCode()
+ 31**(N-1) * token(N-1).hashCode()
+ 31**(N-2) * token(N-2).hashCode()
+ ...
+ 31**1 * token(1).hashCode()
+ 31**0 * token(0).hashCode()
This hash code is consistent with equality.
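
The sum above can be evaluated term by term or, equivalently, with Horner's rule. The following self-contained sketch shows both, taking N to be the number of tokens. The class `HashSketch` and its methods are hypothetical and only reproduce the documented formula; this is not necessarily how the library computes the value internally.

```java
import java.util.Arrays;
import java.util.List;

public class HashSketch {

    // Term-by-term evaluation of the documented sum.
    // Java int arithmetic wraps on overflow, as it does in String.hashCode().
    public static int directHash(String characters, List<String> tokens) {
        int sum = 0;
        int power = 1; // holds 31**i at the top of iteration i
        for (int i = 0; i < tokens.size(); ++i) {
            sum += power * tokens.get(i).hashCode();
            power *= 31;
        }
        // After the loop, power is 31**N.
        return sum + power * characters.hashCode();
    }

    // The same value via Horner's rule, from the last token to the first.
    public static int hornerHash(String characters, List<String> tokens) {
        int h = characters.hashCode();
        for (int i = tokens.size() - 1; i >= 0; --i)
            h = 31 * h + tokens.get(i).hashCode();
        return h;
    }

    public static void main(String[] args) {
        String characters = "John  RAN home";
        List<String> tokens = Arrays.asList("john", "ran", "home");
        // Both evaluations agree because wrapping int arithmetic
        // preserves addition and multiplication identities.
        System.out.println(directHash(characters, tokens)
                           == hornerHash(characters, tokens)); // prints true
    }
}
```

Because the value depends only on the underlying string and the token sequence, two taggings that are equal in the sense described for `equals` necessarily produce the same result under this formula.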
Copyright © 2016 Alias-i, Inc. All rights reserved.