Class CRFTagger

    • Field Detail

      • model

        protected cc.mallet.fst.CRF model
    • Constructor Detail

      • CRFTagger

        protected CRFTagger(cc.mallet.fst.CRF model,
                            FeatureSet featureSet,
                            int order)
    • Method Detail

      • load

        public static CRFTagger load(InputStream f,
                                     dragon.nlp.tool.Lemmatiser lemmatiser,
                                     dragon.nlp.tool.Tagger posTagger,
                                     Tagger preTagger)
                              throws IOException
        Loads a CRFTagger from the specified input stream. Because the lemmatiser and part-of-speech tagger both depend on external data, they are not written to disk with the tagger and must be passed in anew when loading.
        Parameters:
        f - The input stream to load the CRFTagger from, containing data as written by the write() method.
        lemmatiser - The Lemmatiser to use
        posTagger - The part-of-speech Tagger to use
        Returns:
        A new instance of the CRFTagger contained in the specified file
        Throws:
        IOException
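
The reason load() takes the lemmatiser and part-of-speech tagger as arguments can be illustrated with a minimal, self-contained sketch. It does not use the real CRFTagger, MALLET, or Dragon classes: `SketchTagger`, `Model`, and `Helper` are hypothetical stand-ins, and plain Java serialization is assumed here only for illustration. The point is the pattern: the trained state goes to the stream, while the resource-backed helper is supplied again by the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Locale;

// Hypothetical stand-in for a resource-backed helper such as a lemmatiser.
interface Helper {
    String normalise(String token);
}

// Hypothetical stand-in for a tagger; this is NOT the real CRFTagger.
class SketchTagger {
    // The serializable part, standing in for the trained model.
    static final class Model implements Serializable {
        private static final long serialVersionUID = 1L;
        final int order;
        Model(int order) { this.order = order; }
    }

    private final Model model;
    private final Helper helper; // never written to the stream

    SketchTagger(Model model, Helper helper) {
        this.model = model;
        this.helper = helper;
    }

    int getOrder() { return model.order; }

    String normalise(String token) { return helper.normalise(token); }

    // Writes only the model; the helper is deliberately left out.
    void write(OutputStream out) throws IOException {
        ObjectOutputStream oos = new ObjectOutputStream(out);
        oos.writeObject(model);
        oos.flush();
    }

    // Reads the model back and attaches a helper supplied by the caller.
    static SketchTagger load(InputStream in, Helper helper) throws IOException {
        try {
            ObjectInputStream ois = new ObjectInputStream(in);
            return new SketchTagger((Model) ois.readObject(), helper);
        } catch (ClassNotFoundException e) {
            throw new IOException("Unknown class in stream", e);
        }
    }
}

class LoadSketchDemo {
    // Writes a tagger to an in-memory buffer and loads it back.
    static SketchTagger roundTrip(int order, Helper helper) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            new SketchTagger(new SketchTagger.Model(order), helper).write(buffer);
            return SketchTagger.load(new ByteArrayInputStream(buffer.toByteArray()), helper);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchTagger restored = roundTrip(2, token -> token.toLowerCase(Locale.ROOT));
        System.out.println(restored.getOrder() + " " + restored.normalise("Genes"));
    }
}
```

In this sketch the order survives the round trip because it is part of the serialized model, whereas the helper works only because the caller passed it to load() again.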
      • train

        public static CRFTagger train(Set<Sentence> sentences,
                                      int order,
                                      TagFormat format,
                                      FeatureSet featureSet)
        Trains and returns a CRFTagger on the specified Sentences. This method may take hours or even days to complete. Training will likely need more memory than the Java virtual machine allows by default; try raising the maximum heap size, for example by adding "-Xmx1024m" to the command line.
        Parameters:
        sentences - The Sentences to train the tagger on
        order - The CRF order to use
        format - The TagFormat to use
        featureSet - The FeatureSet to use
        Returns:
        A trained CRFTagger, ready to tag unseen sentences or be written to disk
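
Since training is memory-hungry, it can help to check the heap limit before starting a long run. The following is a minimal sketch using only the standard `Runtime` API; the class name `HeapCheck` and the 900 MiB threshold are assumptions chosen for illustration, not values taken from this library. The JVM's maximum heap is set with the `-Xmx` option.

```java
class HeapCheck {
    // Maximum heap, in MiB, that this JVM will attempt to use.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // True if the heap limit is at least the given number of MiB.
    static boolean hasAtLeast(long requiredMiB) {
        return maxHeapMiB() >= requiredMiB;
    }

    public static void main(String[] args) {
        // Hypothetical threshold; the value reported by the JVM can be
        // slightly below the -Xmx setting, so it is set under 1024.
        long requiredMiB = 900;
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        if (!hasAtLeast(requiredMiB)) {
            System.out.println("Consider restarting with a larger -Xmx value before training.");
        }
    }
}
```

Running the check first avoids discovering an OutOfMemoryError hours into a training run.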
      • write

        public void write(File f)
        Serializes and writes this CRFTagger to the specified file.
        Parameters:
        f - The file to write this CRFTagger to
      • tag

        public void tag(Sentence sentence)
        Description copied from interface: Tagger
        Add Mentions to the Sentence. The Sentence must have been tokenized previously.
        Specified by:
        tag in interface Tagger
        Parameters:
        sentence - The sentence to which Mentions should be added
      • getInstance

        protected cc.mallet.types.Instance getInstance(Sentence sentence)
      • getTagList

        protected static List<String> getTagList(cc.mallet.types.Sequence<Object> tags)
      • getOrder

        public int getOrder()
        Returns:
        The CRF order used by this tagger. Order 1 means that only the previous state is used, and order 2 means that the previous 2 states are used.
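
What the order means for a label sequence can be shown with a small sketch. It is not the MALLET implementation: `OrderSketch` and `histories` are hypothetical names, and the labels are made up. For each position it lists the preceding labels that a model of the given order would condition on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OrderSketch {
    // For each position i, returns the (at most `order`) labels preceding i.
    static List<List<String>> histories(List<String> labels, int order) {
        if (order < 1) {
            throw new IllegalArgumentException("order must be >= 1");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < labels.size(); i++) {
            int from = Math.max(0, i - order); // fewer labels are available near the start
            result.add(new ArrayList<>(labels.subList(from, i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("O", "B", "I", "O");
        System.out.println(histories(labels, 1)); // [[], [O], [B], [I]]
        System.out.println(histories(labels, 2)); // [[], [O], [O, B], [B, I]]
    }
}
```

A higher order gives the model more context per decision, at the cost of a larger state space during training.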
      • getFeatureNames

        public Set<String> getFeatureNames()