Package de.julielab.jcore.ae.jsbd
Class SentenceSplitter
- java.lang.Object
-
- de.julielab.jcore.ae.jsbd.SentenceSplitter
-
public class SentenceSplitter extends Object
-
Constructor Summary
SentenceSplitter()
-
Method Summary
ArrayList<String> getLabelsFromLabelSequence(cc.mallet.types.LabelSequence ls)
cc.mallet.fst.CRF getModel()
cc.mallet.types.InstanceList makePredictionData(File[] predictFiles, cc.mallet.pipe.Pipe myPipe)
    Creates a list of instances from the given array of files using the provided pipe.
cc.mallet.types.Instance makePredictionData(File predictFile, cc.mallet.pipe.Pipe myPipe)
    Creates a single instance from the provided file using the given pipe.
cc.mallet.types.Instance makePredictionData(ArrayList<String> lines, cc.mallet.pipe.Pipe myPipe)
    Creates a single instance from the provided list of lines using the given pipe.
cc.mallet.types.InstanceList makeTrainingData(File[] trainFiles, boolean useTokenOffset, boolean splitUnitsAfterPunctuation)
List<Unit> predict(cc.mallet.types.Instance inst, String filterName)
    Runs prediction on a single Instance.
List<Unit> predict(List<String> lines, String postprocessingFilter)
    Runs prediction on a list of lines.
ArrayList<String> readFile(File myFile)
void readModel(File file)
    Loads a previously trained FeatureSubsetModel (CRF4+Properties) that was stored on disk as a serialized object.
void readModel(InputStream is)
void train(cc.mallet.types.InstanceList instList, cc.mallet.pipe.Pipe dataPipe)
void writeModel(String filename)
    Saves the learned model to disk.
-
Method Detail
-
makePredictionData
public cc.mallet.types.Instance makePredictionData(ArrayList<String> lines, cc.mallet.pipe.Pipe myPipe)
Creates a single instance from the provided list of lines using the given pipe.
-
makePredictionData
public cc.mallet.types.Instance makePredictionData(File predictFile, cc.mallet.pipe.Pipe myPipe)
Creates a single instance from the provided file using the given pipe.
-
makePredictionData
public cc.mallet.types.InstanceList makePredictionData(File[] predictFiles, cc.mallet.pipe.Pipe myPipe)
Creates a list of instances from the given array of files using the provided pipe.
-
makeTrainingData
public cc.mallet.types.InstanceList makeTrainingData(File[] trainFiles, boolean useTokenOffset, boolean splitUnitsAfterPunctuation)
- Parameters:
trainFiles -
useTokenOffset - if true, the token's offset rather than its string representation is stored in the instance source
- Returns:
- InstanceList with training data
-
train
public void train(cc.mallet.types.InstanceList instList, cc.mallet.pipe.Pipe dataPipe)
-
predict
public List<Unit> predict(List<String> lines, String postprocessingFilter)
Runs prediction on a list of lines.
- Parameters:
lines -
postprocessingFilter -
- Returns:
- ArrayList of Unit objects
-
predict
public List<Unit> predict(cc.mallet.types.Instance inst, String filterName)
Runs prediction on a single Instance.
- Parameters:
inst -
filterName -
- Returns:
- ArrayList of Unit objects
-
getLabelsFromLabelSequence
public ArrayList<String> getLabelsFromLabelSequence(cc.mallet.types.LabelSequence ls)
-
writeModel
public void writeModel(String filename)
Saves the learned model to disk. This is done via Java's object serialization.
- Parameters:
filename - where to write the model (full path)
-
readModel
public void readModel(File file) throws IOException, FileNotFoundException, ClassNotFoundException
Loads a previously trained FeatureSubsetModel (CRF4+Properties) that was stored on disk as a serialized object.
- Parameters:
file - where to find the serialized FeatureSubsetModel (full path)
- Throws:
IOException
FileNotFoundException
ClassNotFoundException
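Since writeModel and readModel(File) rely on Java's object serialization, the round trip can be sketched with the standard library alone. This is a minimal sketch, not the class's actual implementation: ToyModel, ModelSerializationSketch and roundTrip are hypothetical stand-ins, and the real model wraps a MALLET CRF rather than a plain array of weights.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class ModelSerializationSketch {

    // Hypothetical stand-in for the trained model (assumption: the real
    // class serializes a MALLET-based model object instead).
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final double[] weights;

        ToyModel(String name, double[] weights) {
            this.name = name;
            this.weights = weights;
        }
    }

    // Writes the object to the given path using Java object serialization.
    static void writeModel(ToyModel model, String filename) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(filename))) {
            out.writeObject(model);
        }
    }

    // Reads the object back; declares the same kinds of checked exceptions
    // as the documented readModel(File) (FileNotFoundException is an IOException).
    static ToyModel readModel(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (ToyModel) in.readObject();
        }
    }

    // Convenience helper for the demo: write to a temp file, then read it back.
    static ToyModel roundTrip(ToyModel model) {
        try {
            File tmp = File.createTempFile("toy-model", ".ser");
            tmp.deleteOnExit();
            writeModel(model, tmp.getAbsolutePath());
            return readModel(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToyModel restored = roundTrip(new ToyModel("demo", new double[] {0.5, -1.25}));
        System.out.println(restored.name + " with " + restored.weights.length + " weights");
    }
}
```

One consequence of this storage format is that a serialized model can only be loaded when compatible versions of the serialized classes are on the classpath, which is why ClassNotFoundException appears in the throws list.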
-
readModel
public void readModel(InputStream is) throws IOException, ClassNotFoundException
- Throws:
IOException
ClassNotFoundException
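A stream-based overload is useful when the model does not live in a plain file, for example when it is held in memory or bundled as a resource. The sketch below shows, under stated assumptions, how a caller might handle the two declared exceptions. StreamModelLoadingSketch, serialize and tryLoad are hypothetical names, and an ArrayList stands in for the real model object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class StreamModelLoadingSketch {

    // Hypothetical loader with the same declared exceptions as readModel(InputStream).
    static Object readModel(InputStream is) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(is)) {
            return in.readObject();
        }
    }

    // Serializes any Serializable value into a byte array (demo helper).
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Attempts to load from raw bytes and reports which outcome occurred.
    static String tryLoad(byte[] bytes) {
        try {
            Object model = readModel(new ByteArrayInputStream(bytes));
            return "loaded " + model.getClass().getSimpleName();
        } catch (IOException e) {
            // Covers truncated or corrupted streams (e.g. StreamCorruptedException).
            return "io-error";
        } catch (ClassNotFoundException e) {
            // The stream names a class that is not on the classpath.
            return "class-missing";
        }
    }

    public static void main(String[] args) {
        ArrayList<String> payload = new ArrayList<>(List.of("a", "b"));
        System.out.println(tryLoad(serialize(payload)));
        System.out.println(tryLoad("not a model".getBytes(StandardCharsets.UTF_8)));
    }
}
```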
-
getModel
public cc.mallet.fst.CRF getModel()
-