public class WordTokenizer extends Object implements WordTokenizerConstants
This should be a good tokenizer for most European-language documents.
Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.
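The usual way to consume this tokenizer is to call nextToken() until it returns null, which the method summary documents as the end-of-stream signal. The package of WordTokenizer is not shown on this page, so the sketch below does not import it. Instead it uses a hypothetical stand-in interface, NullTerminatedSource, and a hypothetical list-backed source, fixedSource, to illustrate only the loop shape. The real nextToken() also declares ParseException and IOException, which a caller would need to handle; the stand-in omits them for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TokenLoopSketch {

    /**
     * Hypothetical stand-in for the one method of WordTokenizer used here.
     * Assumption: like nextToken(), it returns null at end of stream.
     */
    interface NullTerminatedSource<T> {
        T nextToken();
    }

    /** Pulls tokens until the source returns null, collecting them in order. */
    static <T> List<T> drain(NullTerminatedSource<T> source) {
        List<T> tokens = new ArrayList<>();
        for (T t = source.nextToken(); t != null; t = source.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }

    /** Hypothetical source backed by a fixed list, used only for this demo. */
    static NullTerminatedSource<String> fixedSource(String... words) {
        Iterator<String> it = Arrays.asList(words).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        // With the real class, the lambda would wrap a WordTokenizer instance.
        System.out.println(drain(fixedSource("state", "of", "the", "art")));
    }
}
```

Because null is the only end marker, the loop tests the return value before using it; there is no separate hasNext-style method in the summary above.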

| Modifier and Type | Field and Description |
|---|---|
| Token | jj_nt: Next token. |
| Token | token: Current token. |
| WordTokenizerTokenManager | token_source: Generated Token Manager. |


Fields inherited from interface WordTokenizerConstants: ACRONYM, ALPHA, ALPHANUM, APOSTROPHE, COMPANY, DEFAULT, DIGIT, EMAIL, EOF, HAS_DIGIT, HOST, HYPHENATED, LETTER, NOISE, NUM, P, tokenImage

| Constructor and Description |
|---|
| WordTokenizer(InputStream stream): Constructor with InputStream. |
| WordTokenizer(InputStream stream, String encoding): Constructor with InputStream and supplied encoding. |
| WordTokenizer(Reader stream): Constructor. |
| WordTokenizer(WordTokenizerTokenManager tm): Constructor with generated Token Manager. |


| Modifier and Type | Method and Description |
|---|---|
| void | disable_tracing(): Disable tracing. |
| void | enable_tracing(): Enable tracing. |
| ParseException | generateParseException(): Generate ParseException. |
| Token | getNextToken(): Get the next Token. |
| Token | getToken(int index): Get the specific Token. |
| Token | nextToken(): Returns the next token in the stream, or null at EOS. |
| void | ReInit(InputStream stream): Reinitialise. |
| void | ReInit(InputStream stream, String encoding): Reinitialise. |
| void | ReInit(Reader stream): Reinitialise. |
| void | ReInit(WordTokenizerTokenManager tm): Reinitialise. |

public WordTokenizerTokenManager token_source
public Token token
public Token jj_nt
public WordTokenizer(InputStream stream)
public WordTokenizer(InputStream stream, String encoding)
public WordTokenizer(Reader stream)
public WordTokenizer(WordTokenizerTokenManager tm)
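The constructors accept either raw bytes (an InputStream, optionally with an encoding name) or already-decoded characters (a Reader). This page does not say which encoding the single-argument InputStream constructor assumes, so one cautious option is to decode the bytes explicitly and hand the resulting Reader to WordTokenizer(Reader). The sketch below uses only JDK classes; utf8Reader and readAll are hypothetical helper names, and the choice of UTF-8 is an assumption about the input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReaderSetupSketch {

    /** Wraps a byte stream in a Reader that decodes it as UTF-8. */
    static Reader utf8Reader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    /** Reads a Reader to the end; used here only to show the decoded text. */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "café naïve".getBytes(StandardCharsets.UTF_8);
        Reader reader = utf8Reader(new ByteArrayInputStream(bytes));
        // With the real class, this Reader would be passed to WordTokenizer(Reader)
        // instead of being read directly.
        System.out.println(readAll(reader));
    }
}
```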
public final Token nextToken() throws ParseException, IOException
The returned token's type is set to an element of WordTokenizerConstants.tokenImage.
Throws: ParseException, IOException
public void ReInit(InputStream stream)
public void ReInit(InputStream stream, String encoding)
public void ReInit(Reader stream)
public void ReInit(WordTokenizerTokenManager tm)
public final Token getNextToken()
public final Token getToken(int index)
public ParseException generateParseException()
public final void enable_tracing()
public final void disable_tracing()
Copyright © 2021. All rights reserved.