public class CharacterTokenCategorizer extends Object implements TokenCategorizer
Categorizes each token as one of LETTER, DIGIT, PUNCTUATION, OTHER, or
UNKNOWN. The last category, UNKNOWN, is for tokens that are not single
characters.

| Modifier and Type | Field and Description |
|---|---|
| static String | DIGIT_CAT: The digit category. |
| static String | LETTER_CAT: The letter category. |
| static String | OTHER_CAT: The other category, for single-character tokens that are not digits, letters, or punctuation. |
| static String | PUNCTUATION_CAT: The punctuation category. |
| static String | UNKNOWN_CAT: The unknown category, for tokens that are not one character long. |

| Modifier and Type | Method and Description |
|---|---|
| String[] | categories(): Returns a copy of the array of categories used by this categorizer. |
| String | categorize(String token): Returns the category of the specified token. |
| String | toString(): Returns the name of this class. |
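
The categorization rules described on this page can be illustrated with a small standalone sketch. It is not the LingPipe implementation: the class name `CharCategorizerSketch`, the category string values, and the punctuation test based on `Character.getType` are all assumptions made for illustration, since this page does not state the constant values or how punctuation is detected. The sketch also measures token length in UTF-16 `char` units, which is another assumption.

```java
/** Hypothetical stand-in for illustration only; not the LingPipe class. */
final class CharCategorizerSketch {

    // Placeholder values: the actual constant values are not given on this page.
    static final String UNKNOWN_CAT = "UNKNOWN";
    static final String DIGIT_CAT = "DIGIT";
    static final String LETTER_CAT = "LETTER";
    static final String PUNCTUATION_CAT = "PUNCTUATION";
    static final String OTHER_CAT = "OTHER";

    private static final String[] CATEGORIES = {
        UNKNOWN_CAT, DIGIT_CAT, LETTER_CAT, PUNCTUATION_CAT, OTHER_CAT
    };

    private CharCategorizerSketch() { }

    /** Returns the category of the token, following the rules described above. */
    static String categorize(String token) {
        // Anything that is not exactly one char long is UNKNOWN.
        if (token.length() != 1) {
            return UNKNOWN_CAT;
        }
        char c = token.charAt(0);
        if (Character.isDigit(c)) {
            return DIGIT_CAT;
        }
        if (Character.isLetter(c)) {
            return LETTER_CAT;
        }
        if (isPunctuation(c)) {
            return PUNCTUATION_CAT;
        }
        return OTHER_CAT;
    }

    /** Assumed punctuation test: the Unicode general categories Pc, Pd, Ps, Pe, Pi, Pf, Po. */
    static boolean isPunctuation(char c) {
        switch (Character.getType(c)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    /** Returns a copy, so callers cannot modify the internal array. */
    static String[] categories() {
        return CATEGORIES.clone();
    }

    public static void main(String[] args) {
        for (String token : new String[] {"7", "a", "!", " ", "ab", ""}) {
            System.out.println("[" + token + "] -> " + categorize(token));
        }
    }
}
```

Under these assumptions, symbols such as `$` or `+` fall into the other category, because Unicode classifies them as symbols rather than punctuation. The real class may draw that line differently.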
public static final String UNKNOWN_CAT
public static final String DIGIT_CAT
public static final String LETTER_CAT
public static final String PUNCTUATION_CAT
public static final String OTHER_CAT

public String categorize(String token)
Returns UNKNOWN for tokens that are not a single character long. A token
that is a single digit returns DIGIT, a single letter returns LETTER, and
a single punctuation character returns PUNCTUATION. All other
single-character tokens return OTHER.
Specified by: categorize in interface TokenCategorizer
Parameters: token - Token to categorize.

public String[] categories()
Returns a copy of the array of categories used by this categorizer.
Specified by: categories in interface TokenCategorizer

Copyright © 2016 Alias-i, Inc. All rights reserved.