Package de.julielab.jcore.ae.jsbd
Class SentenceSplitter
java.lang.Object
    de.julielab.jcore.ae.jsbd.SentenceSplitter
public class SentenceSplitter extends java.lang.Object
-
Constructor Summary
Constructor | Description
SentenceSplitter() |
-
Method Summary
Modifier and Type | Method | Description
java.util.ArrayList<java.lang.String> | getLabelsFromLabelSequence(cc.mallet.types.LabelSequence ls) |
cc.mallet.fst.CRF | getModel() |
cc.mallet.types.InstanceList | makePredictionData(java.io.File[] predictFiles, cc.mallet.pipe.Pipe myPipe) | Creates a list of instances with the pipe provided from the given array of files.
cc.mallet.types.Instance | makePredictionData(java.io.File predictFile, cc.mallet.pipe.Pipe myPipe) | Creates a single instance from the file provided and the given pipe.
cc.mallet.types.Instance | makePredictionData(java.util.ArrayList<java.lang.String> lines, cc.mallet.pipe.Pipe myPipe) | Creates a single instance from the ArrayList of lines provided and the given pipe.
cc.mallet.types.InstanceList | makeTrainingData(java.io.File[] trainFiles, boolean useTokenOffset, boolean splitUnitsAfterPunctuation) |
java.util.List<Unit> | predict(cc.mallet.types.Instance inst, java.lang.String filterName) | Predict a single Instance.
java.util.List<Unit> | predict(java.util.List<java.lang.String> lines, java.lang.String postprocessingFilter) | Predict a couple of lines.
java.util.ArrayList<java.lang.String> | readFile(java.io.File myFile) |
void | readModel(java.io.File file) | Load a previously trained FeatureSubsetModel (CRF4+Properties) which was stored as serialized object to disk.
void | readModel(java.io.InputStream is) |
void | train(cc.mallet.types.InstanceList instList, cc.mallet.pipe.Pipe dataPipe) |
void | writeModel(java.lang.String filename) | Save the model learned to disk.
-
Method Detail
-
makePredictionData
public cc.mallet.types.Instance makePredictionData(java.util.ArrayList<java.lang.String> lines, cc.mallet.pipe.Pipe myPipe)
Creates a single instance from the given ArrayList of lines and the given pipe.
-
makePredictionData
public cc.mallet.types.Instance makePredictionData(java.io.File predictFile, cc.mallet.pipe.Pipe myPipe)
Creates a single instance from the given file and the given pipe.
-
makePredictionData
public cc.mallet.types.InstanceList makePredictionData(java.io.File[] predictFiles, cc.mallet.pipe.Pipe myPipe)
Creates a list of instances from the given array of files, using the given pipe.
-
makeTrainingData
public cc.mallet.types.InstanceList makeTrainingData(java.io.File[] trainFiles, boolean useTokenOffset, boolean splitUnitsAfterPunctuation)
Parameters:
    trainFiles
    useTokenOffset - if true, the token's offset rather than its string representation is stored in the instance source
Returns:
    InstanceList with training data
-
train
public void train(cc.mallet.types.InstanceList instList, cc.mallet.pipe.Pipe dataPipe)
-
predict
public java.util.List<Unit> predict(java.util.List<java.lang.String> lines, java.lang.String postprocessingFilter)
Predicts a list of lines.
Parameters:
    lines
    postprocessingFilter
Returns:
    ArrayList of Unit objects
-
predict
public java.util.List<Unit> predict(cc.mallet.types.Instance inst, java.lang.String filterName)
Predicts a single Instance.
Parameters:
    inst
    filterName
Returns:
    ArrayList of Unit objects
-
getLabelsFromLabelSequence
public java.util.ArrayList<java.lang.String> getLabelsFromLabelSequence(cc.mallet.types.LabelSequence ls)
-
writeModel
public void writeModel(java.lang.String filename)
Saves the learned model to disk using Java's object serialization.
Parameters:
    filename - where to write it (full path!)
-
readModel
public void readModel(java.io.File file) throws java.io.IOException, java.io.FileNotFoundException, java.lang.ClassNotFoundException
Loads a previously trained FeatureSubsetModel (CRF4+Properties) that was stored on disk as a serialized object.
Parameters:
    file - where to find the serialized FeatureSubsetModel (full path!)
Throws:
    java.io.IOException
    java.io.FileNotFoundException
    java.lang.ClassNotFoundException
-
readModel
public void readModel(java.io.InputStream is) throws java.io.IOException, java.lang.ClassNotFoundException
Throws:
    java.io.IOException
    java.lang.ClassNotFoundException
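writeModel and the readModel overloads are described as relying on Java's object serialization, which also explains why reading can throw ClassNotFoundException. The following is a minimal sketch of such a round trip using only the JDK. It is not the JSBD implementation: ToyModel, SerializationSketch, and the label strings are hypothetical stand-ins, and an in-memory buffer replaces the model file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SerializationSketch {

    /** Hypothetical stand-in for a trained model; not the real JSBD model class. */
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<String> labels;

        ToyModel(String name, List<String> labels) {
            this.name = name;
            this.labels = new ArrayList<>(labels);
        }
    }

    /** Writes the model to the stream as a serialized object and closes the stream. */
    static void write(ToyModel model, OutputStream out) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(model);
        }
    }

    /**
     * Reads a model back from the stream. ClassNotFoundException is thrown
     * if the serialized class is not on the classpath of the reader.
     */
    static ToyModel read(InputStream in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return (ToyModel) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Arbitrary example values, chosen only for illustration.
        ToyModel original = new ToyModel("toy", Arrays.asList("label-a", "label-b"));

        // An in-memory buffer stands in for the file on disk.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(original, buffer);

        ToyModel restored = read(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(restored.name + " " + restored.labels); // prints "toy [label-a, label-b]"
    }
}
```

Replacing the buffer with a FileOutputStream and FileInputStream gives the on-disk variant. In that case FileNotFoundException, a subclass of IOException, is raised when the path cannot be opened.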
-
readFile
public java.util.ArrayList<java.lang.String> readFile(java.io.File myFile)
-
getModel
public cc.mallet.fst.CRF getModel()
-