public final class IndoEuropeanTokenCategorizer extends Object implements Compilable, TokenCategorizer
IndoEuropeanTokenCategorizer is a generic token
categorizer for Indo-European languages that is based on character
"shape".
The token categories returned by categorize(String) are
as follows. To find the category for a given token, the first
category that matches in the following list is chosen.
| Category | Description |
|---|---|
| NULL-TOK | Zero-length string. |
| 1-DIG | A single digit. |
| 2-DIG | A two-digit string. |
| 3-DIG | A three-digit string. |
| 4-DIG | A four-digit string. |
| 5+-DIG | A string of five or more digits. |
| DIG-LET | Contains digits and letters. |
| DIG-- | Contains digits and hyphens. |
| DIG-/ | Contains digits and slashes. |
| DIG-, | Contains digits and commas. |
| DIG-. | Contains digits and periods. |
| 1-LET-UP | A single uppercase letter. |
| 1-LET-LOW | A single lowercase letter. |
| LET-UP | Uppercase letters only. |
| LET-LOW | Lowercase letters only. |
| LET-CAP | Uppercase letter followed by one or more lowercase letters. |
| LET-MIX | Letters only, containing both uppercase and lowercase. |
| PUNC- | A sequence of punctuation characters. |
| OTHER | Anything else. |
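The first-match rule over the table above can be sketched in plain Java. This is a hypothetical re-implementation for illustration only, not the library's source: the class name `ShapeCategorizerSketch` is invented, "contains digits and X" is assumed to mean at least one digit plus at least one X, and punctuation is assumed to mean the Unicode punctuation general categories.

```java
// Hypothetical sketch of the category table; not LingPipe's implementation.
final class ShapeCategorizerSketch {

    // Returns the first category in table order that matches the token.
    static String categorize(String token) {
        int n = token.length();
        if (n == 0) return "NULL-TOK";

        boolean hasDigit = false, hasLetter = false;
        boolean allDigits = true, allLetters = true;
        boolean allUpper = true, allLower = true, allPunct = true;
        for (int i = 0; i < n; i++) {
            char c = token.charAt(i);
            boolean digit = Character.isDigit(c);
            boolean letter = Character.isLetter(c);
            hasDigit |= digit;
            hasLetter |= letter;
            allDigits &= digit;
            allLetters &= letter;
            allUpper &= Character.isUpperCase(c);
            allLower &= Character.isLowerCase(c);
            allPunct &= isPunctuation(c);
        }

        // Digit-only tokens: 1-DIG through 4-DIG, then 5+-DIG.
        if (allDigits) return n <= 4 ? n + "-DIG" : "5+-DIG";

        // Tokens with at least one digit (assumed reading of "contains").
        if (hasDigit) {
            if (hasLetter) return "DIG-LET";
            if (token.indexOf('-') >= 0) return "DIG--";
            if (token.indexOf('/') >= 0) return "DIG-/";
            if (token.indexOf(',') >= 0) return "DIG-,";
            if (token.indexOf('.') >= 0) return "DIG-.";
        }

        // Letter-only tokens.
        if (allUpper) return n == 1 ? "1-LET-UP" : "LET-UP";
        if (allLower) return n == 1 ? "1-LET-LOW" : "LET-LOW";
        if (allLetters) {
            boolean capitalized = Character.isUpperCase(token.charAt(0))
                && token.substring(1).chars().allMatch(Character::isLowerCase);
            return capitalized ? "LET-CAP" : "LET-MIX";
        }

        if (allPunct) return "PUNC-";
        return "OTHER";
    }

    // Assumption: punctuation means the Unicode punctuation general categories.
    private static boolean isPunctuation(char c) {
        switch (Character.getType(c)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {"", "7", "2019", "3.14", "B2B", "USA", "Hello", "iPhone", "...", "$"};
        for (String t : samples) {
            System.out.println("\"" + t + "\" -> " + categorize(t));
        }
    }
}
```

Because the checks run in table order, a token such as `1-2/3` would fall into `DIG--` rather than `DIG-/` in this sketch. The real categorizer may resolve such edge cases differently.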
| Modifier and Type | Field and Description |
|---|---|
| static IndoEuropeanTokenCategorizer | CATEGORIZER - This is a constant Indo-European token categorizer. |
| Modifier and Type | Method and Description |
|---|---|
| String[] | categories() - Returns a copy of the array of strings representing all the categories produced by this categorizer. |
| String | categorize(String token) - Returns the type of a token, based on its structure or other information. |
| void | compileTo(ObjectOutput objOut) - Compiles this token categorizer to the specified object output. |
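Since categories() is documented as returning a copy of the category array, callers can modify the result without affecting the categorizer. A minimal sketch of that defensive-copy pattern follows. The class name `CategoriesCopySketch` and the array ordering are assumptions made for illustration; the category names are taken from the table in the class description.

```java
// Hypothetical sketch of the defensive-copy pattern; not LingPipe's source.
final class CategoriesCopySketch {

    // Assumed ordering: the order of the table in the class description.
    private static final String[] CATEGORY_ARRAY = {
        "NULL-TOK", "1-DIG", "2-DIG", "3-DIG", "4-DIG", "5+-DIG",
        "DIG-LET", "DIG--", "DIG-/", "DIG-,", "DIG-.",
        "1-LET-UP", "1-LET-LOW", "LET-UP", "LET-LOW", "LET-CAP", "LET-MIX",
        "PUNC-", "OTHER"
    };

    // Returns a fresh copy so callers cannot alter the internal array.
    static String[] categories() {
        return CATEGORY_ARRAY.clone();
    }

    public static void main(String[] args) {
        String[] copy = categories();
        copy[0] = "CHANGED";                  // modifies only the caller's copy
        System.out.println(categories()[0]);  // still the original first entry
    }
}
```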
public static final IndoEuropeanTokenCategorizer CATEGORIZER

public String categorize(String token)
Specified by: categorize in interface TokenCategorizer
Parameters: token - Token whose class is returned.

public String[] categories()
Specified by: categories in interface TokenCategorizer

public void compileTo(ObjectOutput objOut) throws IOException
Specified by: compileTo in interface Compilable
Parameters: objOut - Object output to which this categorizer is written.
Throws: IOException - If there is an underlying I/O exception during the write.

Copyright © 2019 Alias-i, Inc. All rights reserved.