Package net.sf.okapi.steps.tokenization
Class TokenizationStep
java.lang.Object
  net.sf.okapi.common.pipeline.BasePipelineStep
    net.sf.okapi.steps.tokenization.TokenizationStep
-
- All Implemented Interfaces:
AutoCloseable, Function<Stream<Event>,Stream<Event>>, IPipelineStep
public class TokenizationStep extends BasePipelineStep
-
-
Constructor Summary
Constructor          Description
TokenizationStep()
-
Method Summary
Modifier and Type            Method                                      Description
List<Token>                  apostrophe(Token token, LocaleId locale)    Break French and Italian words with apostrophe into three tokens WORD, PUNCTUATION, WORD
String                       getDescription()
String                       getName()
LocaleId                     getSourceLocale()
LocaleId                     getTargetLocale()
protected Event              handleStartDocument(Event event)
protected Event              handleTextUnit(Event event)
Collection<? extends Token>  postProcess(Token t, LocaleId language)     Various rules to make corrections to RbbiTokenizer
void                         setSourceLocale(LocaleId sourceLocale)
void                         setTargetLocale(LocaleId targetLocale)
-
Methods inherited from class net.sf.okapi.common.pipeline.BasePipelineStep
cancel, destroy, getHelpLocation, getParameters, handleCustom, handleDocumentPart, handleEndBatch, handleEndBatchItem, handleEndDocument, handleEndGroup, handleEndSubDocument, handleEndSubfilter, handleEvent, handleMultiEvent, handlePipelineParameters, handleRawDocument, handleStartBatch, handleStartBatchItem, handleStartGroup, handleStartSubDocument, handleStartSubfilter, isDone, isLastOutputStep, setLastOutputStep, setParameters
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface net.sf.okapi.common.pipeline.IPipelineStep
apply, close, handleStream
-
Method Detail
-
handleStartDocument
protected Event handleStartDocument(Event event)
- Overrides:
handleStartDocument in class BasePipelineStep
-
handleTextUnit
protected Event handleTextUnit(Event event)
- Overrides:
handleTextUnit in class BasePipelineStep
-
getSourceLocale
public LocaleId getSourceLocale()
- Specified by:
getSourceLocale in interface IPipelineStep
- Overrides:
getSourceLocale in class BasePipelineStep
-
setSourceLocale
public void setSourceLocale(LocaleId sourceLocale)
- Specified by:
setSourceLocale in interface IPipelineStep
- Overrides:
setSourceLocale in class BasePipelineStep
-
getTargetLocale
public LocaleId getTargetLocale()
- Specified by:
getTargetLocale in interface IPipelineStep
- Overrides:
getTargetLocale in class BasePipelineStep
-
setTargetLocale
public void setTargetLocale(LocaleId targetLocale)
- Specified by:
setTargetLocale in interface IPipelineStep
- Overrides:
setTargetLocale in class BasePipelineStep
-
postProcess
public Collection<? extends Token> postProcess(Token t, LocaleId language)
Various rules to make corrections to RbbiTokenizer
- Parameters:
t - the Token
- Returns:
list of corrected tokens, or the original token if no changes were made
-
apostrophe
public List<Token> apostrophe(Token token, LocaleId locale)
Breaks French and Italian words containing an apostrophe into three tokens: WORD, PUNCTUATION, WORD
- Parameters:
token -
- Returns:
list of transformed tokens, if any
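The WORD, PUNCTUATION, WORD split described above can be illustrated with a small standalone sketch. It is not the Okapi implementation: the class ApostropheSplitSketch, the SimpleToken type, and the splitOnApostrophe method are hypothetical names introduced only for this example, and the sketch ignores the locale argument (the real method takes a LocaleId and is documented for French and Italian). Treating both the straight and the typographic apostrophe as split points is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class ApostropheSplitSketch {

    /** Hypothetical stand-in for a token: a type label plus the text it covers. */
    static final class SimpleToken {
        final String type;
        final String text;

        SimpleToken(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    /**
     * Splits a word at its first apostrophe into WORD, PUNCTUATION, WORD.
     * A word with no apostrophe, or with one at either edge, is returned
     * unchanged as a single WORD token.
     */
    static List<SimpleToken> splitOnApostrophe(String word) {
        List<SimpleToken> result = new ArrayList<>();
        int pos = indexOfApostrophe(word);
        if (pos <= 0 || pos == word.length() - 1) {
            result.add(new SimpleToken("WORD", word));
            return result;
        }
        result.add(new SimpleToken("WORD", word.substring(0, pos)));
        result.add(new SimpleToken("PUNCTUATION", word.substring(pos, pos + 1)));
        result.add(new SimpleToken("WORD", word.substring(pos + 1)));
        return result;
    }

    /** Returns the index of the first straight or typographic apostrophe, or -1. */
    private static int indexOfApostrophe(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\'' || c == '\u2019') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Italian elision example; a word without an apostrophe stays whole.
        System.out.println(splitOnApostrophe("l'amico"));
        System.out.println(splitOnApostrophe("amico"));
    }
}
```

Returning the original word as a single token when nothing is split mirrors the "original token if no changes were made" wording used for postProcess; whether apostrophe behaves the same way is not stated in this page.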
-
getName
public String getName()
-
getDescription
public String getDescription()
-
-