Class DictionaryTagger

  • All Implemented Interfaces:
    Tagger
    Direct Known Subclasses:
    UMLSMetathesaurusDictionaryTagger

    public class DictionaryTagger
    extends Object
    implements Tagger
    This class represents a very simple dictionary-based tagger. Every text subsequence that matches a dictionary entry is tagged, regardless of its context.
    Author:
    Bob
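
    The context-free matching described above can be illustrated with a minimal, self-contained sketch. It is not the DictionaryTagger implementation: the class name ContextFreeMatchSketch, its methods, and the use of a plain String in place of EntityType are all assumptions made for illustration. It stores each entry as a token sequence and reports every span of the sentence whose tokens equal an entry.

    ```java
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical illustration only; not the DictionaryTagger implementation.
    class ContextFreeMatchSketch {
        // Maps a token sequence to a type label (a String stands in for EntityType).
        private final Map<List<String>, String> entries = new HashMap<>();
        // Length of the longest entry, used to bound the scan.
        private int maxLength = 0;

        void add(List<String> tokens, String type) {
            entries.put(new ArrayList<>(tokens), type);
            maxLength = Math.max(maxLength, tokens.size());
        }

        // Returns one "start-end:type" string per matching token span (end exclusive).
        List<String> match(List<String> sentenceTokens) {
            List<String> found = new ArrayList<>();
            for (int start = 0; start < sentenceTokens.size(); start++) {
                int limit = Math.min(sentenceTokens.size(), start + maxLength);
                for (int end = start + 1; end <= limit; end++) {
                    // List equality is element-wise, so a subList view works as a lookup key.
                    String type = entries.get(sentenceTokens.subList(start, end));
                    if (type != null) {
                        found.add(start + "-" + end + ":" + type);
                    }
                }
            }
            return found;
        }
    }
    ```

    In this sketch, a longer match and a shorter match contained inside it are both reported, because no surrounding context is consulted when deciding whether a span matches.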
    • Constructor Detail

    • Method Detail

      • configure

        public void configure(org.apache.commons.configuration.HierarchicalConfiguration config,
                              Tokenizer tokenizer)
      • load

        public void load(org.apache.commons.configuration.HierarchicalConfiguration config)
                  throws IOException
        Throws:
        IOException
      • add

        public void add(String text,
                        EntityType type)
        Adds a single entry to the dictionary. The text is processed by the tokenizer and the resulting tokens are stored.
        Parameters:
        text - The text to find
        type - The EntityType to tag the text with
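
        As a rough sketch of why entries are stored as tokens rather than raw strings, the example below tokenizes the text on insertion and uses the token list as the key. Everything in it is hypothetical: TokenizedEntrySketch, its regex-based tokenize method, and the String type label are stand-ins, and the real Tokenizer may split text differently.

        ```java
        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        // Hypothetical illustration; the real Tokenizer and EntityType are not used here.
        class TokenizedEntrySketch {
            // Assumed tokenization: runs of letters/digits, or single punctuation characters.
            private static final Pattern TOKEN = Pattern.compile("[A-Za-z0-9]+|[^A-Za-z0-9\\s]");

            private final Map<List<String>, String> entries = new HashMap<>();

            static List<String> tokenize(String text) {
                List<String> tokens = new ArrayList<>();
                Matcher m = TOKEN.matcher(text);
                while (m.find()) {
                    tokens.add(m.group());
                }
                return tokens;
            }

            // Tokenizes the text and stores the resulting tokens with the type label.
            void add(String text, String type) {
                entries.put(tokenize(text), type);
            }

            // Number of distinct token sequences stored.
            int size() {
                return entries.size();
            }

            // Looks up text after applying the same tokenization used on insertion.
            String lookup(String text) {
                return entries.get(tokenize(text));
            }
        }
        ```

        Because the key is the token sequence, spacing differences in the raw text (for example "IL-2 receptor" versus "IL - 2 receptor") map to the same entry under this assumed tokenizer.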
      • tag

        public void tag(Sentence sentence)
        Description copied from interface: Tagger
        Adds Mentions to the Sentence. The Sentence must already have been tokenized.
        Specified by:
        tag in interface Tagger
        Parameters:
        sentence - The sentence to which Mentions should be added
      • suppress

        public void suppress(String text)
      • size

        public int size()
        Returns:
        The number of entries in this dictionary
      • getTokenizer

        public Tokenizer getTokenizer()
      • setTokenizer

        public void setTokenizer(Tokenizer tokenizer)
      • isFilterContainedMentions

        public boolean isFilterContainedMentions()
      • setFilterContainedMentions

        public void setFilterContainedMentions(boolean filterContainedMentions)
      • isNormalizeMixedCase

        public boolean isNormalizeMixedCase()
      • setNormalizeMixedCase

        public void setNormalizeMixedCase(boolean normalizeMixedCase)
      • isNormalizeDigits

        public boolean isNormalizeDigits()
      • setNormalizeDigits

        public void setNormalizeDigits(boolean normalizeDigits)
      • isGenerate2PartVariations

        public boolean isGenerate2PartVariations()
      • setGenerate2PartVariations

        public void setGenerate2PartVariations(boolean generate2PartVariations)
      • isDropEndParentheticals

        public boolean isDropEndParentheticals()
      • setDropEndParentheticals

        public void setDropEndParentheticals(boolean dropEndParentheticals)