Class SentenceSplitter


  • public class SentenceSplitter
    extends Object
    • Constructor Detail

      • SentenceSplitter

        public SentenceSplitter()
    • Method Detail

      • makePredictionData

        public cc.mallet.types.Instance makePredictionData​(ArrayList<String> lines,
                                                           cc.mallet.pipe.Pipe myPipe)
        Creates a single instance from the given list of lines, using the given pipe.
      • makePredictionData

        public cc.mallet.types.Instance makePredictionData​(File predictFile,
                                                           cc.mallet.pipe.Pipe myPipe)
        Creates a single instance from the given file, using the given pipe.
      • makePredictionData

        public cc.mallet.types.InstanceList makePredictionData​(File[] predictFiles,
                                                               cc.mallet.pipe.Pipe myPipe)
        Creates a list of instances from the given array of files, using the given pipe.
      • makeTrainingData

        public cc.mallet.types.InstanceList makeTrainingData​(File[] trainFiles,
                                                             boolean useTokenOffset,
                                                             boolean splitUnitsAfterPunctuation)
        Parameters:
        trainFiles -
        useTokenOffset - if true, the token's offset, rather than its string representation, is stored in the instance source
        Returns:
        InstanceList with training data
      • train

        public void train​(cc.mallet.types.InstanceList instList,
                          cc.mallet.pipe.Pipe dataPipe)
      • predict

        public List<Unit> predict​(List<String> lines,
                                  String postprocessingFilter)
        Runs prediction on a list of lines.
        Parameters:
        lines -
        postprocessingFilter -
        Returns:
        ArrayList of Unit objects
      • predict

        public List<Unit> predict​(cc.mallet.types.Instance inst,
                                  String filterName)
        Runs prediction on a single Instance.
        Parameters:
        inst -
        filterName -
        Returns:
        ArrayList of Unit objects
      • getLabelsFromLabelSequence

        public ArrayList<String> getLabelsFromLabelSequence​(cc.mallet.types.LabelSequence ls)
      • writeModel

        public void writeModel​(String filename)
        Saves the learned model to disk using Java's object serialization.
        Parameters:
        filename - the file to write the model to (full path)
      • getModel

        public cc.mallet.fst.CRF getModel()
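
The writeModel description says the model is saved through Java's object serialization. The sketch below illustrates that general mechanism only; it is not the implementation used by SentenceSplitter. ToyModel, ModelSerializationSketch, and readModel are hypothetical names introduced for illustration, and the sketch assumes the object being saved implements java.io.Serializable.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

public class ModelSerializationSketch {

    // Hypothetical stand-in for a learned model; not part of the documented API.
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final double[] weights;

        ToyModel(String name, double[] weights) {
            this.name = name;
            this.weights = weights;
        }
    }

    // Writes any serializable object to the given path.
    static void writeModel(Serializable model, String filename) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(filename)))) {
            out.writeObject(model);
        }
    }

    // Reads back whatever object was stored at the given path.
    static Object readModel(String filename) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(filename)))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A temporary file gives us a full path, as the filename parameter expects.
        Path tmp = Files.createTempFile("toy-model", ".ser");
        try {
            writeModel(new ToyModel("demo", new double[] {0.5, -1.25}), tmp.toString());
            ToyModel restored = (ToyModel) readModel(tmp.toString());
            System.out.println(restored.name + " has " + restored.weights.length + " weights");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Files written this way can generally only be read back by code that has compatible versions of the serialized classes on its classpath, which is worth keeping in mind when a saved model is moved between library versions.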