public class EnglishStopTokenizerFactory extends StopTokenizerFactory implements Serializable
EnglishStopTokenizerFactory applies an English stop
list to a contained base tokenizer factory.
The built-in stoplist consists of the following words:
a, be, had, it, only, she, was, about, because, has, its, of, some, we, after, been, have, last, on, such, were, all, but, he, more, one, than, when, also, by, her, most, or, that, which, an, can, his, mr, other, the, who, any, co, if, mrs, out, their, will, and, corp, in, ms, over, there, with, are, could, inc, mz, s, they, would, as, for, into, no, so, this, up, at, from, is, not, says, to

Note that the stoplist entries are all lowercase. Thus the input should probably first be filtered by a
LowerCaseTokenizerFactory.
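The effect of the lowercase note can be seen in a minimal, self-contained sketch. It does not use the LingPipe classes: the class StopListSketch, its methods removeStops and lowerCase, and the five-word subset of the stoplist are all hypothetical names chosen for illustration. It only assumes that stop filtering is an exact, case-sensitive match against the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-in only; this is not the LingPipe implementation.
public class StopListSketch {

    // A small subset of the built-in stoplist, all lowercase as in the list above.
    static final Set<String> STOP_SET =
        new HashSet<>(Arrays.asList("a", "the", "of", "was", "mr"));

    // Drop every token that exactly matches a stoplist entry (case-sensitive).
    static List<String> removeStops(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_SET.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Lowercase every token, standing in for a lowercasing filter applied first.
    static List<String> lowerCase(List<String> tokens) {
        List<String> lowered = new ArrayList<>();
        for (String token : tokens) {
            lowered.add(token.toLowerCase(Locale.ENGLISH));
        }
        return lowered;
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("The", "cat", "was", "a", "friend", "of", "Mr", "Smith");

        // Without lowercasing, "The" and "Mr" do not match the lowercase entries.
        System.out.println(removeStops(tokens));            // [The, cat, friend, Mr, Smith]

        // Lowercasing first lets every stop word match.
        System.out.println(removeStops(lowerCase(tokens))); // [cat, friend, smith]
    }
}
```

In the sketch, capitalized stop words such as "The" and "Mr" survive the filter unless the tokens are lowercased first, which is the situation the note warns about.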
An EnglishStopTokenizerFactory is serializable if its
base tokenizer factory is serializable.
| Constructor and Description |
|---|
| EnglishStopTokenizerFactory(TokenizerFactory factory): Construct an English stop tokenizer factory with the specified base factory. |
| Modifier and Type | Method and Description |
|---|---|
| String | toString() |
Inherited methods:
modifyToken, stopSet
modify, modifyWhitespace
baseTokenizerFactory, tokenizer

public EnglishStopTokenizerFactory(TokenizerFactory factory)
Parameters:
factory - Base tokenizer factory.

public String toString()
Overrides:
toString in class StopTokenizerFactory

Copyright © 2016 Alias-i, Inc. All rights reserved.