public class PorterStemmerTokenizerFactory extends ModifyTokenTokenizerFactory implements Serializable
PorterStemmerTokenizerFactory applies Porter's stemmer
to the tokens produced by the tokenizers of a base tokenizer factory.
Porter's stemmer approximates the reduction of words
to their morphological base form. The class also provides a single
top-level static method, stem(String), which returns the
stemmed form of an input string.
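To make "stemming" concrete, here is a minimal sketch of the first rule group of Porter's algorithm (step 1a, which handles plural suffixes), assuming lower-case input. It is not the LingPipe code and not the full stemmer: the complete algorithm applies further steps, several of them conditioned on the shape of the word, so stem(String) can change words that this sketch leaves alone. The names PorterStep1a and step1a are hypothetical.

```java
public class PorterStep1a {
    // Step 1a of Porter (1980): plural suffix rules, tried longest suffix first.
    // Assumes a lower-case word; this is one step, not the whole stemmer.
    public static String step1a(String w) {
        if (w.endsWith("sses")) {
            return w.substring(0, w.length() - 2);  // sses -> ss
        }
        if (w.endsWith("ies")) {
            return w.substring(0, w.length() - 2);  // ies -> i
        }
        if (w.endsWith("ss")) {
            return w;                               // ss -> ss (unchanged)
        }
        if (w.endsWith("s")) {
            return w.substring(0, w.length() - 1);  // s -> (removed)
        }
        return w;                                   // no rule applies
    }

    public static void main(String[] args) {
        for (String w : new String[] {"caresses", "ponies", "ties", "caress", "cats"}) {
            System.out.println(w + " -> " + step1a(w));
        }
    }
}
```

The example words are the ones Porter uses to illustrate step 1a. Because the ss rule is checked before the bare s rule, caress keeps its final ss while cats loses its s.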
The underlying stemming code is Martin Porter's own public-domain Java port of his original C implementation. More information is available at:
Porter Stemmer Home Page
The original paper describing Porter's stemmer is:
Porter, Martin. 1980. An algorithm for suffix stripping. Program 14(3): 130–137.
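The paper cited above conditions most of its rules on a word's measure m: writing C for a run of consonants and V for a run of vowels, every word has the form [C](VC)^m[V]. The sketch below computes m for a lower-case word under the paper's definition of a consonant (any letter other than a, e, i, o, u, and other than y when y follows a consonant). It only illustrates that definition; PorterMeasure, isConsonant and measure are hypothetical names, not part of this class's API.

```java
public class PorterMeasure {
    // Porter (1980): a consonant is a letter other than a, e, i, o, u,
    // and other than y when y is preceded by a consonant.
    // Assumes a lower-case word.
    public static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                // y at the start of a word, or after a vowel, is a consonant
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    // Measure m in [C](VC)^m[V]: each vowel-to-consonant transition
    // marks one VC pair once runs of identical types are collapsed.
    public static int measure(String w) {
        int m = 0;
        for (int i = 1; i < w.length(); i++) {
            if (!isConsonant(w, i - 1) && isConsonant(w, i)) {
                m++;
            }
        }
        return m;
    }

    public static void main(String[] args) {
        String[] words = {"tr", "ee", "tree", "y", "by", "trouble", "oats",
                          "trees", "ivy", "troubles", "private", "oaten", "orrery"};
        for (String w : words) {
            System.out.println(w + " -> m=" + measure(w));
        }
    }
}
```

For the paper's own examples this gives m = 0 for tr, ee, tree, y and by; m = 1 for trouble, oats, trees and ivy; and m = 2 for troubles, private, oaten and orrery.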
| Constructor and Description |
|---|
| PorterStemmerTokenizerFactory(TokenizerFactory factory): Construct a tokenizer factory that applies Porter stemming to the tokenizers produced by the specified base factory. |
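The constructor follows a decorator pattern: the new factory delegates tokenization to the base factory and rewrites each token on the way out. A minimal self-contained sketch of that pattern is below. SimpleTokenizerFactory and modifying are hypothetical stand-ins rather than LingPipe types, and upper-casing stands in for Porter stemming so the example runs without the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

public class ModifyTokenSketch {
    // Hypothetical stand-in for a tokenizer factory: text in, tokens out.
    public interface SimpleTokenizerFactory extends Function<String, List<String>> { }

    // Hypothetical decorator: wraps a base factory and rewrites every token it produces.
    public static SimpleTokenizerFactory modifying(SimpleTokenizerFactory base,
                                                   UnaryOperator<String> modifyToken) {
        return text -> {
            List<String> out = new ArrayList<>();
            for (String token : base.apply(text)) {
                out.add(modifyToken.apply(token));
            }
            return out;
        };
    }

    public static void main(String[] args) {
        // Base factory: split on runs of whitespace.
        SimpleTokenizerFactory whitespace =
            text -> Arrays.asList(text.trim().split("\\s+"));
        // Placeholder modifier; a stemmer would be plugged in here instead.
        SimpleTokenizerFactory upper = modifying(whitespace, String::toUpperCase);
        System.out.println(upper.apply("the cats sat"));  // prints [THE, CATS, SAT]
    }
}
```

Taking the base factory as a parameter means the same token-level step can sit on top of any tokenization scheme without the stemming code knowing how tokens were found.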
| Modifier and Type | Method and Description |
|---|---|
| String | modifyToken(String token): Returns the Porter stemmed version of the specified token. |
| static String | stem(String in): Return the stem of the specified input string using the Porter stemmer. |
| String | toString() |
Inherited methods: modify, modifyWhitespace, baseTokenizerFactory, tokenizer

public PorterStemmerTokenizerFactory(TokenizerFactory factory)
Parameters: factory - Base tokenizer factory.

public String modifyToken(String token)
Overrides: modifyToken in class ModifyTokenTokenizerFactory
Parameters: token - Token to stem.

public static String stem(String in)
Parameters: in - String to stem.

public String toString()
Overrides: toString in class ModifyTokenTokenizerFactory

Copyright © 2016 Alias-i, Inc. All rights reserved.