public class TokenFeatureExtractor extends Object implements FeatureExtractor<CharSequence>, Serializable
A TokenFeatureExtractor produces feature vectors of token counts
from character sequences.
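
To make "feature vectors of token counts" concrete, here is a minimal sketch using only the Java standard library. It is not the LingPipe implementation: the class `TokenCountSketch`, its whitespace `tokenize` helper, and the use of `Integer` values in place of LingPipe's `Counter` are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TokenCountSketch {

    // Hypothetical stand-in for a tokenizer: splits on runs of whitespace.
    static String[] tokenize(CharSequence in) {
        String trimmed = in.toString().trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Builds a feature vector that maps each distinct token to its count.
    // Assumption: Integer stands in for LingPipe's Counter type.
    static Map<String, Integer> features(CharSequence in) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokenize(in)) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "to" and "be" each occur twice; "or" and "not" occur once.
        System.out.println(features("to be or not to be"));
    }
}
```

With a real tokenizer factory the token boundaries would differ (for example around punctuation), but the resulting map would have the same shape: one entry per distinct token, valued by how often it occurs.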
This class implements the Serializable interface. An instance is
actually serializable only if the underlying tokenizer factory is
serializable, that is, if the factory implements either the
Serializable interface or the Compilable interface. If it does not,
attempting to serialize the feature extractor will throw an exception.
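
The general Java behavior behind this caveat can be sketched as follows. Every type here (`Factory`, `PlainFactory`, `SerializableFactory`, `Extractor`) is a hypothetical stand-in, not a LingPipe class. The sketch relies on default Java serialization and does not model the Compilable route, and the exception type the real class throws is not stated in this documentation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {

    // Hypothetical factory types for illustration only.
    interface Factory { }

    static class PlainFactory implements Factory { }

    static class SerializableFactory implements Factory, Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Hypothetical extractor: declares Serializable but holds a factory
    // that may or may not be serializable itself.
    static class Extractor implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Factory factory;

        Extractor(Factory factory) {
            this.factory = factory;
        }
    }

    // Returns true if default Java serialization can write the object.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            // Thrown when some object in the graph is not Serializable.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Extractor(new SerializableFactory())));
        System.out.println(canSerialize(new Extractor(new PlainFactory())));
    }
}
```

The practical consequence is that the failure surfaces only when serialization is attempted, not when the extractor is constructed, so it is worth checking the factory's type up front if the extractor needs to be persisted.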
| Constructor and Description |
|---|
| TokenFeatureExtractor(TokenizerFactory factory): Construct a token-based feature extractor from the specified tokenizer factory. |
| Modifier and Type | Method and Description |
|---|---|
| Map<String,Counter> | features(CharSequence in): Return the feature vector for the specified character sequence. |
| TokenizerFactory | tokenizerFactory(): Return the tokenizer factory underlying this token feature extractor. |
| String | toString(): Returns a description of this token feature extractor including its contained tokenizer factory. |
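public TokenFeatureExtractor(TokenizerFactory factory)
Construct a token-based feature extractor from the specified tokenizer factory.
factory - Tokenizer factory to use for tokenization.

public TokenizerFactory tokenizerFactory()
Return the tokenizer factory underlying this token feature extractor.
Warning: This is the actual tokenizer factory, not a copy, so changes to it will affect this feature extractor.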
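
The effect of returning the actual object rather than a copy can be sketched with hypothetical stand-ins. `MutableFactory` and `Extractor` below are illustrative only and are not LingPipe classes; whether a given LingPipe tokenizer factory has mutable state depends on the implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;

public class SharedFactorySketch {

    // Hypothetical factory with one mutable setting.
    static class MutableFactory {
        boolean lowerCase = false;

        String[] tokenize(CharSequence in) {
            String s = in.toString().trim();
            if (lowerCase) {
                s = s.toLowerCase(Locale.ROOT);
            }
            return s.isEmpty() ? new String[0] : s.split("\\s+");
        }
    }

    // Hypothetical extractor that exposes its factory without copying it.
    static class Extractor {
        private final MutableFactory factory;

        Extractor(MutableFactory factory) {
            this.factory = factory;
        }

        // Returns the same instance the extractor uses internally.
        MutableFactory factory() {
            return factory;
        }

        // Number of distinct tokens, i.e. the size of the feature vector.
        int distinctTokens(CharSequence in) {
            return new HashSet<>(Arrays.asList(factory.tokenize(in))).size();
        }
    }

    public static void main(String[] args) {
        Extractor extractor = new Extractor(new MutableFactory());
        System.out.println(extractor.distinctTokens("The the")); // case-sensitive
        extractor.factory().lowerCase = true;                    // mutate via getter
        System.out.println(extractor.distinctTokens("The the")); // now case-folded
    }
}
```

Because the getter hands back the shared instance, the second call sees the changed setting and produces a different feature vector for the same input.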
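
public Map<String,Counter> features(CharSequence in)
Return the feature vector for the specified character sequence.
Specified by: features in interface FeatureExtractor<CharSequence>
in - Character sequence from which to extract features.

public String toString()
Returns a description of this token feature extractor, including the result of the toString() method of the contained tokenizer factory.

Copyright © 2019 Alias-i, Inc. All rights reserved.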