| Package | Description |
|---|---|
| com.aliasi.tokenizer | Classes for tokenizing character sequences. |

| Modifier and Type | Class | Description |
|---|---|---|
| class | EnglishStopTokenizerFactory | An EnglishStopTokenizerFactory applies an English stop list to a contained base tokenizer factory. |
| class | LowerCaseTokenizerFactory | A LowerCaseTokenizerFactory filters the tokenizers produced by a base tokenizer factory to produce lowercase output. |
| class | PorterStemmerTokenizerFactory | A PorterStemmerTokenizerFactory applies Porter's stemmer to the tokenizers produced by a base tokenizer factory. |
| class | RegExFilteredTokenizerFactory | A RegExFilteredTokenizerFactory modifies the tokens returned by a base tokenizer factory's tokenizer by removing those that do not match a regular expression pattern. |
| class | SoundexTokenizerFactory | A SoundexTokenizerFactory modifies the output of a base tokenizer factory to produce tokens in Soundex representation. |
| class | StopTokenizerFactory | A StopTokenizerFactory modifies a base tokenizer factory by removing tokens in a specified stop set. |
| class | TokenLengthTokenizerFactory | A TokenLengthTokenizerFactory filters the tokenizers produced by a base tokenizer factory to return only tokens between specified lower and upper length limits. |
| class | WhitespaceNormTokenizerFactory | A WhitespaceNormTokenizerFactory filters the tokenizers produced by a base tokenizer factory to convert each non-empty whitespace sequence to a single space and leave empty (zero-length) whitespace unchanged. |

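
Every class listed above wraps a base tokenizer factory and either rewrites or removes the tokens it produces. The following is a minimal, self-contained sketch of that wrapping pattern in plain Java. It does not use the LingPipe API: `SimpleTokenizerFactory`, `whitespaceBase`, `modify`, `keepIf`, and `pipeline` are hypothetical names introduced only for illustration, and the stop set and length limits are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

class TokenFilterSketch {

    /** Hypothetical stand-in for a tokenizer factory: text in, tokens out. */
    interface SimpleTokenizerFactory {
        List<String> tokenize(String text);
    }

    /** Base factory: splits the input on runs of whitespace. */
    static SimpleTokenizerFactory whitespaceBase() {
        return text -> {
            List<String> tokens = new ArrayList<>();
            for (String t : text.trim().split("\\s+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            return tokens;
        };
    }

    /** Wraps a base factory and rewrites each token (compare lowercasing or stemming). */
    static SimpleTokenizerFactory modify(SimpleTokenizerFactory base, UnaryOperator<String> op) {
        return text -> {
            List<String> out = new ArrayList<>();
            for (String t : base.tokenize(text)) {
                out.add(op.apply(t));
            }
            return out;
        };
    }

    /** Wraps a base factory and drops tokens that fail the predicate (compare stop lists). */
    static SimpleTokenizerFactory keepIf(SimpleTokenizerFactory base, Predicate<String> keep) {
        return text -> {
            List<String> out = new ArrayList<>();
            for (String t : base.tokenize(text)) {
                if (keep.test(t)) {
                    out.add(t);
                }
            }
            return out;
        };
    }

    /** Chains the wrappers: lowercase, then stop-set removal, then a length filter. */
    static SimpleTokenizerFactory pipeline() {
        Set<String> stopSet = Set.of("the", "of", "a"); // illustrative stop set, an assumption
        SimpleTokenizerFactory f = whitespaceBase();
        f = modify(f, t -> t.toLowerCase(Locale.ENGLISH));
        f = keepIf(f, t -> !stopSet.contains(t));
        f = keepIf(f, t -> t.length() >= 2 && t.length() <= 10); // illustrative limits
        return f;
    }

    public static void main(String[] args) {
        // prints [quick, brown, fox, dover]
        System.out.println(pipeline().tokenize("The  Quick brown fox of Dover"));
    }
}
```

The order of wrapping matters in this sketch: lowercasing runs before the stop-set check, so "The" is removed even though the stop set contains only lowercase entries.
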
Copyright © 2016 Alias-i, Inc. All rights reserved.