Package banner.tagging.dictionary
Class DictionaryTagger
- java.lang.Object
-
- banner.tagging.dictionary.DictionaryTagger
-
- All Implemented Interfaces:
Tagger
- Direct Known Subclasses:
UMLSMetathesaurusDictionaryTagger
public class DictionaryTagger extends Object implements Tagger
This class represents a very simple dictionary-based tagger. All text subsequences which match an entry will be tagged, without regard to the context.
- Author:
- Bob
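The matching behaviour described above (every subsequence that equals an entry is tagged, regardless of context) can be illustrated with a minimal, self-contained sketch. This is not the BANNER implementation: the class name DictionaryMatchSketch, the Match type, and the use of a plain Map in place of the Trie field are assumptions made for illustration, and entity types are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the banner.tagging.dictionary package.
public class DictionaryMatchSketch {

    // Hypothetical result type: token span [start, end) plus the entry's type label.
    public static final class Match {
        public final int start;
        public final int end;
        public final String type;

        Match(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")=" + type;
        }
    }

    // Returns every contiguous token subsequence that equals a dictionary entry.
    // No surrounding context is consulted, so overlapping and nested matches
    // are all reported.
    public static List<Match> findMatches(List<String> tokens,
                                          Map<List<String>, String> dictionary) {
        List<Match> matches = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int end = start + 1; end <= tokens.size(); end++) {
                // List equality and hashing are element-wise, so a subList
                // view can be used directly as a lookup key.
                String type = dictionary.get(tokens.subList(start, end));
                if (type != null) {
                    matches.add(new Match(start, end, type));
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<List<String>, String> dictionary = new HashMap<>();
        dictionary.put(Arrays.asList("breast", "cancer"), "DISEASE");
        dictionary.put(Arrays.asList("cancer"), "DISEASE");

        List<String> tokens = Arrays.asList("early", "breast", "cancer", "screening");
        // Prints: [[1,3)=DISEASE, [2,3)=DISEASE]
        System.out.println(findMatches(tokens, dictionary));
    }
}
```

In this sketch the nested entry "cancer" is reported alongside "breast cancer". The filterContainedMentions property of the real class presumably controls whether such contained matches are kept, but its exact semantics are not documented on this page.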
-
-
Field Summary
Fields (Modifier and Type, Field):
protected Trie<String,Set<EntityType>> entities
protected Trie<String,Boolean> notInclude
-
Constructor Summary
Constructors (Constructor, Description):
DictionaryTagger()
    Creates a new DictionaryTagger.
-
Method Summary
Methods (Modifier and Type, Method, Description):
void add(String text, EntityType type)
    Adds a single entry to the dictionary.
void add(String text, Collection<EntityType> types)
boolean add(List<String> tokens, Collection<EntityType> types)
void configure(org.apache.commons.configuration.HierarchicalConfiguration config, Tokenizer tokenizer)
Tokenizer getTokenizer()
boolean isDropEndParentheticals()
boolean isFilterContainedMentions()
boolean isGenerate2PartVariations()
boolean isNormalizeDigits()
boolean isNormalizeMixedCase()
void load(org.apache.commons.configuration.HierarchicalConfiguration config)
protected List<String> process(String input)
void setDropEndParentheticals(boolean dropEndParentheticals)
void setFilterContainedMentions(boolean filterContainedMentions)
void setGenerate2PartVariations(boolean generate2PartVariations)
void setNormalizeDigits(boolean normalizeDigits)
void setNormalizeMixedCase(boolean normalizeMixedCase)
void setTokenizer(Tokenizer tokenizer)
int size()
void suppress(String text)
void tag(Sentence sentence)
protected String transform(String str)
-
-
-
Constructor Detail
-
DictionaryTagger
public DictionaryTagger()
Creates a new DictionaryTagger.
-
-
Method Detail
-
configure
public void configure(org.apache.commons.configuration.HierarchicalConfiguration config, Tokenizer tokenizer)
-
load
public void load(org.apache.commons.configuration.HierarchicalConfiguration config) throws IOException
- Throws:
IOException
-
add
public void add(String text, EntityType type)
Adds a single entry to the dictionary. The text is processed by the tokenizer and the resulting tokens are stored.
- Parameters:
text - The text to find
type - The EntityType to tag the text with
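As a rough illustration of "the text is processed by the tokenizer and the resulting tokens are stored", the following self-contained sketch keeps entries in a token-level trie. It is an assumption-laden stand-in, not BANNER code: the class TokenTrieSketch, its Node type, the lookup method, and the whitespace tokenizer are all hypothetical, and entity types are plain strings rather than EntityType instances.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of the banner.tagging.dictionary package.
public class TokenTrieSketch {

    // One trie node per token; types is non-empty only where an entry ends.
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        final Set<String> types = new HashSet<>();
    }

    private final Node root = new Node();
    private int size = 0;

    // Assumption: whitespace splitting stands in for a configurable tokenizer.
    static String[] tokenize(String text) {
        return text.trim().split("\\s+");
    }

    // Tokenizes the text, walks or creates one node per token, and records
    // the type on the final node.
    public void add(String text, String type) {
        Node node = root;
        for (String token : tokenize(text)) {
            node = node.children.computeIfAbsent(token, k -> new Node());
        }
        if (node.types.isEmpty()) {
            size++; // first type stored for this token sequence
        }
        node.types.add(type);
    }

    // Returns the types stored for exactly this token sequence (empty if none).
    public Set<String> lookup(String text) {
        Node node = root;
        for (String token : tokenize(text)) {
            node = node.children.get(token);
            if (node == null) {
                return new HashSet<>();
            }
        }
        return new HashSet<>(node.types);
    }

    // Number of distinct token sequences stored.
    public int size() {
        return size;
    }

    public static void main(String[] args) {
        TokenTrieSketch dictionary = new TokenTrieSketch();
        dictionary.add("breast cancer", "DISEASE");
        dictionary.add("breast   cancer", "PHENOTYPE"); // same tokens, second type
        System.out.println(dictionary.size());
        System.out.println(dictionary.lookup("breast cancer"));
    }
}
```

Storing tokens rather than raw strings means that two spellings which tokenize identically share one entry, which is why the second add call above does not increase the size.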
-
add
public void add(String text, Collection<EntityType> types)
-
add
public boolean add(List<String> tokens, Collection<EntityType> types)
-
suppress
public void suppress(String text)
-
size
public int size()
- Returns:
- The number of entries in this dictionary
-
getTokenizer
public Tokenizer getTokenizer()
-
setTokenizer
public void setTokenizer(Tokenizer tokenizer)
-
isFilterContainedMentions
public boolean isFilterContainedMentions()
-
setFilterContainedMentions
public void setFilterContainedMentions(boolean filterContainedMentions)
-
isNormalizeMixedCase
public boolean isNormalizeMixedCase()
-
setNormalizeMixedCase
public void setNormalizeMixedCase(boolean normalizeMixedCase)
-
isNormalizeDigits
public boolean isNormalizeDigits()
-
setNormalizeDigits
public void setNormalizeDigits(boolean normalizeDigits)
-
isGenerate2PartVariations
public boolean isGenerate2PartVariations()
-
setGenerate2PartVariations
public void setGenerate2PartVariations(boolean generate2PartVariations)
-
isDropEndParentheticals
public boolean isDropEndParentheticals()
-
setDropEndParentheticals
public void setDropEndParentheticals(boolean dropEndParentheticals)
-
-