public interface SentenceModel
The SentenceModel interface specifies a means of segmenting
text into sentences from arrays of tokens and whitespaces.
The sentence model operates over aligned arrays of tokens and
whitespaces, as derived from a Tokenizer. There are two methods in the
interface. The standard external interface is boundaryIndices(String[],String[]), which returns an array of
the indices of the sentence-final tokens. For instance, with tokens
{"John", "ran", ".", "He", "also", "jumped", "!"} and
whitespaces {"", " ", "", " ", " ", " ", "", ""},
the result returned by the Indo-European model would be
{2,6}, because the token at index 2 is a period
(.) and the token at index 6 is an exclamation point
(!). The result will often depend on the
whitespaces as well as the tokens.
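The contract described above can be illustrated with a minimal sketch. This is not LingPipe's Indo-European model: the class name BoundaryIndicesSketch and its punctuation rule are hypothetical stand-ins, and the sketch assumes that whitespaces[i + 1] holds the whitespace following tokens[i]. It only shows the aligned-array input, the length check, and a result that depends on whitespace as well as tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BoundaryIndicesSketch {

    // Toy rule (an assumption of this sketch, not LingPipe's heuristic):
    // a token is sentence-final if it is ".", "!" or "?" and is either
    // the last token or followed by non-empty whitespace.
    public static int[] boundaryIndices(String[] tokens, String[] whitespaces) {
        if (whitespaces.length != tokens.length + 1) {
            throw new IllegalArgumentException(
                "Whitespaces array must be one longer than tokens array.");
        }
        List<Integer> found = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            boolean stop = t.equals(".") || t.equals("!") || t.equals("?");
            // Assumed alignment: whitespaces[i + 1] follows tokens[i].
            boolean followedBySpace = !whitespaces[i + 1].isEmpty();
            if (stop && (i == tokens.length - 1 || followedBySpace)) {
                found.add(i);
            }
        }
        // Copy the boxed indices into a primitive array.
        int[] result = new int[found.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = found.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "ran", ".", "He", "also", "jumped", "!"};
        String[] whitespaces = {"", " ", "", " ", " ", " ", "", ""};
        // prints [2, 6]
        System.out.println(Arrays.toString(boundaryIndices(tokens, whitespaces)));
    }
}
```

Under this toy rule, a period with no following whitespace and more tokens after it is not reported, which is one simple way a result can depend on the whitespaces.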
The second method is boundaryIndices(String[],String[],int,int,Collection), which adds
the boundary indices as Integer instances to the specified
collection, considering only the slice of tokens from index start
up to, but not including, index end.
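The slice-based variant can be sketched in the same spirit. The class SliceBoundarySketch below is hypothetical, its punctuation rule is a toy, and its bounds check is a simplification based on reading end as one beyond the last token considered. It also assumes that the indices written to the collection are positions in the full token array rather than offsets into the slice.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class SliceBoundarySketch {

    // Hypothetical stand-in for the five-argument method. The punctuation
    // rule and the bounds check are assumptions of this sketch.
    public static void boundaryIndices(String[] tokens, String[] whitespaces,
                                       int start, int end,
                                       Collection<Integer> indices) {
        if (start < 0 || end < start
                || tokens.length < end || whitespaces.length < end + 1) {
            throw new IllegalArgumentException("Slice out of range.");
        }
        // Visit tokens start..end-1; end is one beyond the last token.
        for (int i = start; i < end; i++) {
            String t = tokens[i];
            if (t.equals(".") || t.equals("!") || t.equals("?")) {
                // Autoboxed to Integer; assumed to be a position in the full array.
                indices.add(i);
            }
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "ran", ".", "He", "also", "jumped", "!"};
        String[] whitespaces = {"", " ", "", " ", " ", " ", "", ""};
        List<Integer> indices = new ArrayList<>();
        // Only tokens 3..6 ("He", "also", "jumped", "!") are considered.
        boundaryIndices(tokens, whitespaces, 3, 7, indices);
        System.out.println(indices); // prints [6]
    }
}
```

Because the caller supplies the collection, repeated calls over different slices can accumulate into the same list or set without intermediate arrays.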
| Modifier and Type | Method and Description |
|---|---|
| int[] | boundaryIndices(String[] tokens, String[] whitespaces): Returns an array of indices of sentence-final tokens. |
| void | boundaryIndices(String[] tokens, String[] whitespaces, int start, int end, Collection<Integer> indices): Adds the sentence-final token indices as Integer instances to the specified collection, only considering tokens between index start and end-1 inclusive. |

int[] boundaryIndices(String[] tokens, String[] whitespaces)
Returns an array of indices of sentence-final tokens.
Parameters:
tokens - Array of tokens to annotate.
whitespaces - Array of whitespaces to annotate.
Throws:
IllegalArgumentException - If the array of whitespaces is not one longer than the array of tokens.

void boundaryIndices(String[] tokens, String[] whitespaces, int start, int end, Collection<Integer> indices)
Adds the sentence-final token indices as Integer instances to the specified collection, only considering tokens between index start and end-1 inclusive.
Parameters:
tokens - Array of tokens to annotate.
whitespaces - Array of whitespaces to annotate.
start - Index of first token to annotate.
end - Index one beyond the last token to annotate.
indices - Collection into which to write the boundary indices.
Throws:
IllegalArgumentException - If the array of tokens is not at least as long as start+end and the array of whitespaces at least as long as start+end+1.

Copyright © 2016 Alias-i, Inc. All rights reserved.