Package opennlp.tools.tokenize
Class TokenizerFactory
- java.lang.Object
  - opennlp.tools.util.BaseToolFactory
    - opennlp.tools.tokenize.TokenizerFactory
public class TokenizerFactory extends BaseToolFactory
The factory that provides the default Tokenizer implementation and resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
-
-
Constructor Summary
Constructors (Constructor - Description)
- TokenizerFactory() - Instantiates a TokenizerFactory that provides the default implementation of the resources.
- TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) - Instantiates a TokenizerFactory.
-
Method Summary
Methods (Modifier and Type, Method - Description)
- static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) - Factory method the framework uses to instantiate a new TokenizerFactory.
- Map<String,Object> createArtifactMap() - A model's implementation should call this method from the constructor that creates a model programmatically.
- Map<String,String> createManifestEntries()
- Dictionary getAbbreviationDictionary()
- Pattern getAlphaNumericPattern()
- TokenContextGenerator getContextGenerator()
- String getLanguageCode()
- boolean isUseAlphaNumericOptimization()
- void validateArtifactMap() - Validates the parsed artifacts.
-
Methods inherited from class opennlp.tools.util.BaseToolFactory
create, create, createArtifactSerializersMap
-
Constructor Detail
-
TokenizerFactory
public TokenizerFactory()
Instantiates a TokenizerFactory that provides the default implementation of the resources.
-
TokenizerFactory
public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Instantiates a TokenizerFactory. Use this constructor to programmatically create a factory.
- Parameters:
- languageCode - The ISO language code to be used for this factory.
- abbreviationDictionary - The Dictionary which holds abbreviations.
- useAlphaNumericOptimization - Whether alphanumeric tokens are skipped or not.
- alphaNumericPattern - null or a custom alphanumeric Pattern (default is "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC).
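To illustrate the alphaNumericPattern parameter, here is a minimal sketch that uses only java.util.regex and the default expression quoted in the parameter description. The class AlphaNumericPatternDemo and its helpers resolve and isAlphaNumeric are hypothetical names, not part of OpenNLP. The null fallback mirrors the parameter description and is an assumption about how a factory might select the default.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of OpenNLP.
class AlphaNumericPatternDemo {

    // The default expression quoted in the alphaNumericPattern description.
    static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Assumed fallback: use the custom pattern, or the default when null is passed.
    static Pattern resolve(Pattern custom) {
        return custom != null ? custom : DEFAULT_ALPHANUMERIC;
    }

    // True if the whole token matches the resolved pattern.
    static boolean isAlphaNumeric(String token, Pattern custom) {
        return resolve(custom).matcher(token).matches();
    }

    public static void main(String[] args) {
        // With the default pattern, only ASCII letters and digits qualify.
        System.out.println(isAlphaNumeric("OpenNLP2", null));   // true
        System.out.println(isAlphaNumeric("e.g.", null));       // false

        // A custom pattern (illustrative only) that also accepts underscores.
        Pattern withUnderscore = Pattern.compile("^[A-Za-z0-9_]+$");
        System.out.println(isAlphaNumeric("snake_case", null));           // false
        System.out.println(isAlphaNumeric("snake_case", withUnderscore)); // true
    }
}
```

In real code, the custom Pattern would be passed as the fourth constructor argument, and null would be passed to keep the default.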
-
-
Method Detail
-
validateArtifactMap
public void validateArtifactMap() throws InvalidFormatException
Description copied from class: BaseToolFactory
Validates the parsed artifacts.
Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
- Specified by:
- validateArtifactMap in class BaseToolFactory
- Throws:
- InvalidFormatException - Thrown if validation found invalid states.
-
createArtifactMap
public Map<String,Object> createArtifactMap()
Description copied from class: BaseToolFactory
A model's implementation should call this method from the constructor that creates a model programmatically. The base implementation will return a HashMap that should be populated by subclasses.
- Overrides:
- createArtifactMap in class BaseToolFactory
- Returns:
- Retrieves a Map with pairs of keys and objects.
-
createManifestEntries
public Map<String,String> createManifestEntries()
- Overrides:
- createManifestEntries in class BaseToolFactory
- Returns:
- Retrieves the manifest entries to be added to the model manifest.
-
create
public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
Factory method the framework uses to instantiate a new TokenizerFactory.
- Parameters:
- subclassName - The name of the class implementing the TokenizerFactory.
- languageCode - The ISO language code the Tokenizer should use.
- abbreviationDictionary - An optional Dictionary containing abbreviations, or null if not present.
- useAlphaNumericOptimization - Whether the alphanumeric optimization is enabled or not.
- alphaNumericPattern - The Pattern the alphanumeric optimization should use, if enabled.
- Returns:
- A valid TokenizerFactory instance.
- Throws:
- InvalidFormatException - Thrown if one of the input parameters does not comply with the expected format.
-
getAlphaNumericPattern
public Pattern getAlphaNumericPattern()
- Returns:
- Retrieves the (user-)specified alphanumeric Pattern or a default.
-
isUseAlphaNumericOptimization
public boolean isUseAlphaNumericOptimization()
- Returns:
- true if the alphanumeric optimization is enabled, otherwise false.
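As a rough illustration of what this flag implies, the sketch below splits text on whitespace and reports which chunks would still need closer inspection when purely alphanumeric chunks are skipped. It rests on an assumption drawn from the constructor's parameter description (alphanumerics are skipped when the optimization is on) and is not OpenNLP's implementation. The class AlphaNumericOptimizationSketch and the method chunksNeedingAnalysis are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; not OpenNLP's tokenizer logic.
class AlphaNumericOptimizationSketch {

    // Default alphanumeric expression as documented for the constructor.
    static final Pattern ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

    // Returns the whitespace-separated chunks that are not skipped.
    // With the optimization on, chunks matching ALPHANUMERIC are assumed
    // to be accepted as tokens as-is and therefore left out of the result.
    static List<String> chunksNeedingAnalysis(String text, boolean useAlphaNumericOptimization) {
        List<String> result = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) {
                continue; // blank input yields a single empty chunk
            }
            boolean skip = useAlphaNumericOptimization
                    && ALPHANUMERIC.matcher(chunk).matches();
            if (!skip) {
                result.add(chunk);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Mr. Smith paid 42 dollars, didn't he?";
        // Optimization on: "Smith", "paid" and "42" are skipped.
        System.out.println(chunksNeedingAnalysis(text, true));
        // Optimization off: every chunk is kept for analysis.
        System.out.println(chunksNeedingAnalysis(text, false));
    }
}
```

Under this reading, the flag trades a cheap regular-expression check for the more expensive per-chunk analysis, which is why it is exposed as an option on the factory.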
-
getAbbreviationDictionary
public Dictionary getAbbreviationDictionary()
- Returns:
- The abbreviation Dictionary or null if none is active.
-
getLanguageCode
public String getLanguageCode()
- Returns:
- Retrieves the ISO language code in use.
-
getContextGenerator
public TokenContextGenerator getContextGenerator()
- Returns:
- Retrieves a TokenContextGenerator instance.
-
-