Class SentenceSplitter


  • public class SentenceSplitter
    extends java.lang.Object
    • Method Summary

      Modifier and Type Method Description
      java.util.ArrayList<java.lang.String> getLabelsFromLabelSequence(cc.mallet.types.LabelSequence ls)
      cc.mallet.fst.CRF getModel()
      cc.mallet.types.InstanceList makePredictionData(java.io.File[] predictFiles, cc.mallet.pipe.Pipe myPipe)
      Creates a list of instances from the given array of files, using the given pipe.
      cc.mallet.types.Instance makePredictionData(java.io.File predictFile, cc.mallet.pipe.Pipe myPipe)
      Creates a single instance from the given file, using the given pipe.
      cc.mallet.types.Instance makePredictionData(java.util.ArrayList<java.lang.String> lines, cc.mallet.pipe.Pipe myPipe)
      Creates a single instance from the given list of lines, using the given pipe.
      cc.mallet.types.InstanceList makeTrainingData(java.io.File[] trainFiles, boolean useTokenOffset, boolean splitUnitsAfterPunctuation)
      java.util.List<Unit> predict(cc.mallet.types.Instance inst, java.lang.String filterName)
      Predicts units for a single instance.
      java.util.List<Unit> predict(java.util.List<java.lang.String> lines, java.lang.String postprocessingFilter)
      Predicts units for the given lines.
      java.util.ArrayList<java.lang.String> readFile(java.io.File myFile)
      void readModel(java.io.File file)
      Loads a previously trained FeatureSubsetModel (CRF4 + Properties) that was stored on disk as a serialized object.
      void readModel(java.io.InputStream is)
      void train(cc.mallet.types.InstanceList instList, cc.mallet.pipe.Pipe dataPipe)
      void writeModel(java.lang.String filename)
      Saves the learned model to disk.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • SentenceSplitter

        public SentenceSplitter()
    • Method Detail

      • makePredictionData

        public cc.mallet.types.Instance makePredictionData(java.util.ArrayList<java.lang.String> lines,
                                                           cc.mallet.pipe.Pipe myPipe)
        Creates a single instance from the given list of lines, using the given pipe.
      • makePredictionData

        public cc.mallet.types.Instance makePredictionData(java.io.File predictFile,
                                                           cc.mallet.pipe.Pipe myPipe)
        Creates a single instance from the given file, using the given pipe.
      • makePredictionData

        public cc.mallet.types.InstanceList makePredictionData(java.io.File[] predictFiles,
                                                               cc.mallet.pipe.Pipe myPipe)
        Creates a list of instances from the given array of files, using the given pipe.
      • makeTrainingData

        public cc.mallet.types.InstanceList makeTrainingData(java.io.File[] trainFiles,
                                                             boolean useTokenOffset,
                                                             boolean splitUnitsAfterPunctuation)
        Parameters:
        trainFiles - the files to read the training data from
        useTokenOffset - if true, the token's offset rather than its string representation is stored in the instance source
        Returns:
        InstanceList with training data
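The useTokenOffset flag is described in a single line above. As a rough, stdlib-only illustration of the difference it makes, the sketch below assumes (this is not confirmed by this page) that tokens come from whitespace splitting and that a token's "offset" is the character index where it starts. TokenOffsetSketch, Token, tokenize and sourceFor are hypothetical names, not part of SentenceSplitter.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenOffsetSketch {
    // Hypothetical token: its surface string plus the character offset where it starts.
    static final class Token {
        final String text;
        final int offset;

        Token(String text, int offset) {
            this.text = text;
            this.offset = offset;
        }
    }

    // Whitespace tokenizer (an assumption) that remembers each token's start offset.
    static List<Token> tokenize(String line) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            // Skip whitespace before the next token.
            while (i < line.length() && Character.isWhitespace(line.charAt(i))) i++;
            int start = i;
            while (i < line.length() && !Character.isWhitespace(line.charAt(i))) i++;
            if (i > start) tokens.add(new Token(line.substring(start, i), start));
        }
        return tokens;
    }

    // Mimics the flag: keep each token's offset if useTokenOffset is true, otherwise its string.
    static List<Object> sourceFor(List<Token> tokens, boolean useTokenOffset) {
        List<Object> source = new ArrayList<>();
        for (Token t : tokens) {
            if (useTokenOffset) {
                source.add(t.offset);
            } else {
                source.add(t.text);
            }
        }
        return source;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize("It works. Really.");
        System.out.println(sourceFor(tokens, false)); // [It, works., Really.]
        System.out.println(sourceFor(tokens, true));  // [0, 3, 10]
    }
}
```

Storing offsets instead of strings keeps a link back to the original text, which is presumably why the option exists. Whether SentenceSplitter uses character or token offsets is not stated here.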
      • train

        public void train(cc.mallet.types.InstanceList instList,
                          cc.mallet.pipe.Pipe dataPipe)
      • predict

        public java.util.List<Unit> predict(java.util.List<java.lang.String> lines,
                                            java.lang.String postprocessingFilter)
        Predicts units for the given lines.
        Parameters:
        lines - the lines of text to process
        postprocessingFilter - the name of the postprocessing filter to apply
        Returns:
        a list of Unit objects
      • predict

        public java.util.List<Unit> predict(cc.mallet.types.Instance inst,
                                            java.lang.String filterName)
        Predicts units for a single instance.
        Parameters:
        inst - the instance to predict units for
        filterName - the name of the postprocessing filter to apply
        Returns:
        a list of Unit objects
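Both predict overloads return Unit objects, one per predicted segment. As a hedged illustration of how a per-token label sequence (the kind of strings getLabelsFromLabelSequence returns) could be turned into sentences, here is a stdlib-only sketch. It assumes a hypothetical labeling scheme in which "B" marks the first token of a sentence and any other label continues the current one. The real label names used by the trained CRF and the structure of Unit are not documented on this page, and LabelGroupingSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LabelGroupingSketch {
    // Joins tokens into sentences. Assumes the hypothetical label "B" starts a
    // new sentence and any other label continues the current one.
    static List<String> groupSentences(List<String> tokens, List<String> labels) {
        if (tokens.size() != labels.size()) {
            throw new IllegalArgumentException("tokens and labels must have the same length");
        }
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            // A "B" label closes the sentence collected so far.
            if ("B".equals(labels.get(i)) && current.length() > 0) {
                sentences.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(tokens.get(i));
        }
        // Flush the final sentence.
        if (current.length() > 0) {
            sentences.add(current.toString());
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("It", "works.", "Really?");
        List<String> labels = Arrays.asList("B", "I", "B");
        System.out.println(groupSentences(tokens, labels)); // [It works., Really?]
    }
}
```

A real implementation would more likely keep token offsets in each unit rather than rejoining tokens with single spaces, which loses the original whitespace.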
      • getLabelsFromLabelSequence

        public java.util.ArrayList<java.lang.String> getLabelsFromLabelSequence(cc.mallet.types.LabelSequence ls)
      • writeModel

        public void writeModel(java.lang.String filename)
        Saves the learned model to disk using Java object serialization.
        Parameters:
        filename - the full path of the file to write the model to
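writeModel relies on Java object serialization. The sketch below shows that mechanism with the standard ObjectOutputStream. A plain ArrayList<String> stands in for the trained model, and WriteModelSketch is a hypothetical name; this is an illustration of the technique, not the class's actual code.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;

final class WriteModelSketch {
    // Serializes the given object to the file at `filename` (a full path).
    static void writeModel(Serializable model, String filename) throws IOException {
        // Both streams are closed (and flushed) even if the header or writeObject fails.
        try (FileOutputStream file = new FileOutputStream(filename);
             ObjectOutputStream out = new ObjectOutputStream(file)) {
            out.writeObject(model);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the trained model; any Serializable object is written the same way.
        ArrayList<String> model = new ArrayList<>(Arrays.asList("B", "I"));
        String path = Files.createTempFile("model", ".ser").toString();
        writeModel(model, path);
        System.out.println("Model written to " + path);
    }
}
```

The documented writeModel declares no checked exceptions, so it presumably handles IOException internally. The sketch propagates it instead, which lets the caller decide how to report a failed save.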
      • readModel

        public void readModel(java.io.File file)
                       throws java.io.IOException,
                              java.io.FileNotFoundException,
                              java.lang.ClassNotFoundException
        Loads a previously trained FeatureSubsetModel (CRF4 + Properties) that was stored on disk as a serialized object.
        Parameters:
        file - the serialized FeatureSubsetModel to load (full path)
        Throws:
        java.io.IOException
        java.io.FileNotFoundException
        java.lang.ClassNotFoundException
      • readModel

        public void readModel(java.io.InputStream is)
                       throws java.io.IOException,
                              java.lang.ClassNotFoundException
        Throws:
        java.io.IOException
        java.lang.ClassNotFoundException
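Both readModel overloads deserialize a previously written model. One plausible structure, sketched here as an assumption rather than taken from the actual source, is for the File overload to open the file and delegate to the InputStream overload. The stream overload then also covers models bundled as classpath resources, for example via Class.getResourceAsStream. The sketch returns the deserialized object instead of storing it in the splitter, a List<String> stands in for the model, and ReadModelSketch is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;

final class ReadModelSketch {
    // Reads one serialized object from the stream. The caller owns the stream
    // (a file, a classpath resource, ...) and is responsible for closing it.
    static Object readModel(InputStream is) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(is);
        return in.readObject();
    }

    // Opens the file (FileNotFoundException, a subclass of IOException, if it is
    // missing) and delegates to the stream overload; the file is closed afterwards.
    static Object readModel(File file) throws IOException, ClassNotFoundException {
        try (InputStream is = new FileInputStream(file)) {
            return readModel(is);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Serialize a stand-in model into memory, then read it back via the stream overload.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(Arrays.asList("B", "I")));
        }
        Object model = readModel(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(model); // [B, I]
    }
}
```

ClassNotFoundException surfaces when the serialized classes (for example the MALLET classes behind the model) are missing from the classpath at load time, which is why both overloads declare it.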
      • readFile

        public java.util.ArrayList<java.lang.String> readFile(java.io.File myFile)
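readFile is undocumented. Because predict(List<String>, String) consumes lines of text, one plausible reading, and it is only an assumption, is that readFile returns the file's lines. A stdlib sketch using Files.readAllLines follows. It assumes UTF-8 encoding, and ReadFileSketch is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;

final class ReadFileSketch {
    // Returns the file's lines without line terminators; UTF-8 is an assumption.
    static ArrayList<String> readFile(File myFile) throws IOException {
        return new ArrayList<>(Files.readAllLines(myFile.toPath(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("input", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "First line.\nSecond line.\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFile(tmp)); // [First line., Second line.]
    }
}
```

The documented readFile declares no checked exceptions, so the real method presumably handles IOException internally. The sketch propagates it instead.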
      • getModel

        public cc.mallet.fst.CRF getModel()