public class SentenceSplitter extends Object
| Constructor and Description |
|---|
| SentenceSplitter() |
| Modifier and Type | Method and Description |
|---|---|
| ArrayList<String> | getLabelsFromLabelSequence(cc.mallet.types.LabelSequence ls) |
| cc.mallet.fst.CRF | getModel() |
| cc.mallet.types.Instance | makePredictionData(ArrayList<String> lines, cc.mallet.pipe.Pipe myPipe): Creates a single instance from the given list of lines and the given pipe. |
| cc.mallet.types.InstanceList | makePredictionData(File[] predictFiles, cc.mallet.pipe.Pipe myPipe): Creates a list of instances from the given array of files and the given pipe. |
| cc.mallet.types.Instance | makePredictionData(File predictFile, cc.mallet.pipe.Pipe myPipe): Creates a single instance from the given file and the given pipe. |
| cc.mallet.types.InstanceList | makeTrainingData(File[] trainFiles, boolean useTokenOffset) |
| ArrayList<String> | postprocessingFilter(ArrayList<String> predLabels, ArrayList<Unit> units): A postprocessing filter (to be used after prediction) that can correct known errors. |
| ArrayList<Unit> | predict(ArrayList<String> lines, boolean doPostprocessing): Predicts a couple of lines. |
| ArrayList<Unit> | predict(cc.mallet.types.Instance inst, boolean doPostProcessing): Predicts a single Instance. |
| ArrayList<String> | readFile(File myFile) |
| void | readModel(File file): Loads a previously trained FeatureSubsetModel (CRF4+Properties) that was stored on disk as a serialized object. |
| void | readModel(InputStream is) |
| void | train(cc.mallet.types.InstanceList instList, cc.mallet.pipe.Pipe dataPipe) |
| void | writeModel(String filename): Saves the learned model to disk. |
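The predict(ArrayList<String> lines, boolean doPostprocessing) overload takes its input as a list of lines. This page does not describe how readFile(File) builds such a list, so the following is only a sketch under assumptions: one list entry per line, UTF-8 encoding, and the hypothetical names PredictionInput, readLines, and demo, none of which belong to SentenceSplitter.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper class; not part of the documented API.
class PredictionInput {

    // Reads a text file into one list entry per line (UTF-8 is an assumption).
    static ArrayList<String> readLines(Path file) {
        try {
            return new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes two sample lines to a temporary file and reads them back.
    static ArrayList<String> demo() {
        try {
            Path tmp = Files.createTempFile("prediction-input", ".txt");
            try {
                Files.write(tmp,
                        Arrays.asList("Dr. Smith arrived. He was late.", "The talk had started."),
                        StandardCharsets.UTF_8);
                return readLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo().size() + " lines read");
    }
}
```

A list built this way has the shape that predict(lines, doPostprocessing) accepts; a model would presumably have to be trained or loaded with readModel before the call.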
public cc.mallet.types.Instance makePredictionData(ArrayList<String> lines, cc.mallet.pipe.Pipe myPipe)

public cc.mallet.types.Instance makePredictionData(File predictFile, cc.mallet.pipe.Pipe myPipe)

public cc.mallet.types.InstanceList makePredictionData(File[] predictFiles, cc.mallet.pipe.Pipe myPipe)

public cc.mallet.types.InstanceList makeTrainingData(File[] trainFiles, boolean useTokenOffset)
Parameters:
trainFiles -
useTokenOffset - if true, the token's offset rather than its string representation is stored in the instance source

public void train(cc.mallet.types.InstanceList instList, cc.mallet.pipe.Pipe dataPipe)

public ArrayList<Unit> predict(ArrayList<String> lines, boolean doPostprocessing)
Parameters:
lines -
doPostprocessing -

public ArrayList<Unit> predict(cc.mallet.types.Instance inst, boolean doPostProcessing)
Parameters:
inst -
doPostProcessing -

public ArrayList<String> postprocessingFilter(ArrayList<String> predLabels, ArrayList<Unit> units)
Parameters:
predLabels -
units -
abbrList -

public ArrayList<String> getLabelsFromLabelSequence(cc.mallet.types.LabelSequence ls)

public void writeModel(String filename)
Parameters:
filename - where to write the model (full path!)

public void readModel(File file) throws IOException, FileNotFoundException, ClassNotFoundException
Parameters:
filename - where to find the serialized FeatureSubsetModel (full path!)
Throws:
IOException
FileNotFoundException
ClassNotFoundException

public void readModel(InputStream is) throws IOException, ClassNotFoundException
Throws:
IOException
ClassNotFoundException

public cc.mallet.fst.CRF getModel()
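readModel is described as loading a model that was "stored as serialized object", and its declared exceptions match what java.io.ObjectInputStream.readObject() can raise (IOException and ClassNotFoundException; FileNotFoundException is a subclass of IOException). The sketch below shows a plain Java serialization round trip, assuming writeModel and readModel work along these lines. ToyModel, write, read, and roundTrip are hypothetical stand-ins; the real FeatureSubsetModel is not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Hypothetical demo class; not part of the documented API.
class ModelRoundTrip {

    // Stand-in for a trained model; only Serializable matters for this sketch.
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final double[] weights;

        ToyModel(String name, double[] weights) {
            this.name = name;
            this.weights = weights;
        }
    }

    // Serializes the model to the stream, analogous in spirit to writeModel.
    static void write(Serializable model, OutputStream out) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(model);
        }
    }

    // Deserializes an object from the stream, analogous in spirit to readModel(InputStream).
    static Object read(InputStream in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return ois.readObject();
        }
    }

    // Writes the model to an in-memory buffer and reads it back.
    static ToyModel roundTrip(ToyModel model) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(model, buffer);
            return (ToyModel) read(new ByteArrayInputStream(buffer.toByteArray()));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ToyModel restored = roundTrip(new ToyModel("demo", new double[] {0.5, -1.25}));
        System.out.println(restored.name + " with " + restored.weights.length + " weights");
    }
}
```

With Java serialization, a ClassNotFoundException typically means the class of the stored object is missing from the classpath at load time, so a model file is usually read with the same library version that wrote it.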
Copyright © 2015 JULIE Lab Jena, Germany. All rights reserved.