public class SRXSegmenter extends Object implements ISegmenter
Implements the ISegmenter interface for SRX rules.

| Constructor and Description |
|---|
| `SRXSegmenter()` Creates a new SRXSegmenter object. |
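SRX describes segmentation with an ordered list of break and no-break rules. Each rule pairs a "before break" and an "after break" regular expression, and at each candidate position the first rule whose two patterns both match decides whether the text is split there. The sketch below illustrates that first-match-wins evaluation with `java.util.regex`. The classes `SrxRule` and `SrxSketch` are hypothetical and this is not Okapi's implementation, which works on compiled rules and also handles inline codes, mask rules, and ICU patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for one SRX <rule>: a break or no-break decision plus its
// beforebreak and afterbreak regular expressions. Not an Okapi class.
final class SrxRule {
    final boolean isBreak;
    final Pattern before; // must match text that ends exactly at the candidate position
    final Pattern after;  // must match text that starts exactly at the candidate position

    SrxRule(boolean isBreak, String beforeRegex, String afterRegex) {
        this.isBreak = isBreak;
        this.before = Pattern.compile("(?:" + beforeRegex + ")\\z");
        this.after = Pattern.compile(afterRegex);
    }

    boolean matchesAt(String text, int pos) {
        return before.matcher(text.substring(0, pos)).find()
            && after.matcher(text.substring(pos)).lookingAt();
    }
}

final class SrxSketch {
    // Returns the offsets where the text is split. At each candidate position the
    // first matching rule decides; positions that no rule matches are not breaks.
    static List<Integer> splitPositions(String text, List<SrxRule> rules) {
        List<Integer> splits = new ArrayList<>();
        for (int pos = 1; pos < text.length(); pos++) {
            for (SrxRule rule : rules) {
                if (rule.matchesAt(text, pos)) {
                    if (rule.isBreak) {
                        splits.add(pos);
                    }
                    break; // later rules are not consulted once one has matched
                }
            }
        }
        return splits;
    }
}
```

With a no-break rule for `\bMr\.` followed by a break rule for `[.?!]` (both requiring whitespace after the break), `"Mr. Smith came. He left."` is split only at offset 15, just before the space that follows "came.". The abbreviation is protected because the no-break rule is listed first, which is why SRX rule sets usually place exceptions ahead of the general break rules.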
| Modifier and Type | Method and Description |
|---|---|
| `protected void` | `addRule(net.sf.okapi.lib.segmentation.CompiledRule compiledRule)` Adds a compiled rule to this segmenter. |
| `boolean` | `cascade()` Indicates if cascading must be applied when selecting the rules for a given language pattern. |
| `int` | `computeSegments(String text)` |
| `int` | `computeSegments(TextContainer container)` |
| `LocaleId` | `getLanguage()` |
| `Range` | `getNextSegmentRange(TextContainer container)` |
| `List<Range>` | `getRanges()` |
| `List<Integer>` | `getSplitPositions()` |
| `boolean` | `includeEndCodes()` |
| `boolean` | `includeIsolatedCodes()` |
| `boolean` | `includeStartCodes()` |
| `boolean` | `oneSegmentIncludesAll()` |
| `void` | `reset()` |
| `boolean` | `segmentSubFlows()` |
| `protected void` | `setCascade(boolean value)` Sets the flag indicating if cascading must be applied when selecting the rules for a given language pattern. |
| `void` | `setIncludeEndCodes(boolean includeEndCodes)` |
| `void` | `setIncludeIsolatedCodes(boolean includeIsolatedCodes)` |
| `void` | `setIncludeStartCodes(boolean includeStartCodes)` |
| `void` | `setLanguage(LocaleId languageCode)` |
| `protected void` | `setMaskRule(String pattern)` Sets the pattern for the mask rule. |
| `void` | `setOneSegmentIncludesAll(boolean oneSegmentIncludesAll)` |
| `void` | `setOptions(boolean segmentSubFlows, boolean includeStartCodes, boolean includeEndCodes, boolean includeIsolatedCodes, boolean oneSegmentIncludesAll, boolean trimLeadingWS, boolean trimTrailingWS)` |
| `void` | `setOptions(boolean segmentSubFlows, boolean includeStartCodes, boolean includeEndCodes, boolean includeIsolatedCodes, boolean oneSegmentIncludesAll, boolean trimLeadingWS, boolean trimTrailingWS, boolean useJavaRegex, boolean useIcu4JBreakRules, boolean treatIsolatedCodesAsWhitespace)` Sets the options for this segmenter. |
| `void` | `setSegmentSubFlows(boolean segmentSubFlows)` |
| `void` | `setTreatIsolatedCodesAsWhitespace(boolean treatIsolatedCodesAsWhitespace)` |
| `void` | `setTrimCodes(boolean trimCodes)` |
| `void` | `setTrimLeadingWS(boolean trimLeadingWS)` |
| `void` | `setTrimTrailingWS(boolean trimTrailingWS)` |
| `void` | `setUseJavaRegex(boolean useJavaRegex)` Sets the indicator that tells whether the rules are defined for the Java regular expression engine (rather than ICU). |
| `boolean` | `treatIsolatedCodesAsWhitespace()` |
| `boolean` | `trimLeadingWhitespaces()` |
| `boolean` | `trimTrailingWhitespaces()` |
| `boolean` | `useJavaRegex()` Indicates whether the rules are defined for the Java regular expression engine (rather than ICU). |
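The summary lists both `getSplitPositions()` and `getRanges()`, and the trimming options affect what ends up in each segment. The sketch below shows one plausible relationship, under the assumption (not stated in this reference) that each split position ends one segment and starts the next, and that trimming simply narrows each span past `Character.isWhitespace` characters. `SegSpan` and `SplitsToSpans` are hypothetical stand-ins, not Okapi's `Range`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical segment span with [start, end) offsets; a stand-in, not Okapi's Range.
final class SegSpan {
    final int start;
    final int end;

    SegSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + start + ", " + end + ")";
    }
}

final class SplitsToSpans {
    // Assumes each split offset ends one segment and starts the next one.
    static List<SegSpan> toSpans(String text, List<Integer> splits,
                                 boolean trimLeadingWS, boolean trimTrailingWS) {
        List<Integer> ends = new ArrayList<>(splits);
        ends.add(text.length()); // the last segment runs to the end of the text
        List<SegSpan> spans = new ArrayList<>();
        int start = 0;
        for (int end : ends) {
            int s = start;
            int e = end;
            if (trimLeadingWS) {
                while (s < e && Character.isWhitespace(text.charAt(s))) s++;
            }
            if (trimTrailingWS) {
                while (e > s && Character.isWhitespace(text.charAt(e - 1))) e--;
            }
            spans.add(new SegSpan(s, e));
            start = end;
        }
        return spans;
    }
}
```

For `"Mr. Smith came. He left."` split at offset 15, the untrimmed spans are `[0, 15)` and `[15, 24)`; the second one starts with the space that an SRX break placed before whitespace leaves behind. With leading trimming on, the second span becomes `[16, 24)`, which is exactly "He left.".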
`public void reset()`

Specified by: reset in interface ISegmenter

`public void setOptions(boolean segmentSubFlows, boolean includeStartCodes, boolean includeEndCodes, boolean includeIsolatedCodes, boolean oneSegmentIncludesAll, boolean trimLeadingWS, boolean trimTrailingWS, boolean useJavaRegex, boolean useIcu4JBreakRules, boolean treatIsolatedCodesAsWhitespace)`

Sets the options for this segmenter.

Parameters:
- segmentSubFlows - true to segment sub-flows, false to not segment them.
- includeStartCodes - true to include start codes just before a break in the 'left' segment, false to put them in the next segment.
- includeEndCodes - true to include end codes just before a break in the 'left' segment, false to put them in the next segment.
- includeIsolatedCodes - true to include isolated codes just before a break in the 'left' segment, false to put them in the next segment.
- oneSegmentIncludesAll - true to include everything in segments that are alone.
- trimLeadingWS - true to trim leading whitespace from the segments, false to keep it.
- trimTrailingWS - true to trim trailing whitespace from the segments, false to keep it.
- useJavaRegex - true if the rules are for the Java regular expression engine, false if they are for ICU.
- treatIsolatedCodesAsWhitespace - if true, the isolated code markers in the coded text are converted to spaces so that they do not get in the way of the rules; if false, the codes are simply removed.

`public void setOptions(boolean segmentSubFlows, boolean includeStartCodes, boolean includeEndCodes, boolean includeIsolatedCodes, boolean oneSegmentIncludesAll, boolean trimLeadingWS, boolean trimTrailingWS)`
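Specified by: setOptions in interface ISegmenter

`public boolean oneSegmentIncludesAll()`

Specified by: oneSegmentIncludesAll in interface ISegmenter

`public boolean segmentSubFlows()`

Specified by: segmentSubFlows in interface ISegmenter

`public boolean cascade()`

Indicates if cascading must be applied when selecting the rules for a given language pattern.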
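In SRX 2.0, each language map associates a regular expression on the language code with a set of language rules. Without cascading, only the rules of the first matching map are used; with cascading, the rules of every matching map are applied, in map order. The sketch below models only that selection step. `LanguageMap` and `CascadeSketch` are hypothetical names, rule sets are reduced to plain strings, and requiring the pattern to match the whole code is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical SRX language map: a pattern on the language code and the rule sets it contributes.
final class LanguageMap {
    final Pattern languagePattern;
    final List<String> ruleNames;

    LanguageMap(String languagePattern, List<String> ruleNames) {
        this.languagePattern = Pattern.compile(languagePattern);
        this.ruleNames = ruleNames;
    }
}

final class CascadeSketch {
    // Collects rule sets from the maps whose pattern matches the language code, in map order.
    static List<String> selectRules(String languageCode, List<LanguageMap> maps, boolean cascade) {
        List<String> selected = new ArrayList<>();
        for (LanguageMap map : maps) {
            if (map.languagePattern.matcher(languageCode).matches()) {
                selected.addAll(map.ruleNames);
                if (!cascade) {
                    break; // without cascading, only the first matching map applies
                }
            }
        }
        return selected;
    }
}
```

With a map `en.*` for English abbreviations followed by a catch-all `.*` map, `en-US` gets only the English rules when cascading is off, and the English rules followed by the default rules when it is on; `fr-FR` falls through to the default rules either way.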
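`public boolean trimLeadingWhitespaces()`

Specified by: trimLeadingWhitespaces in interface ISegmenter

`public boolean trimTrailingWhitespaces()`

Specified by: trimTrailingWhitespaces in interface ISegmenter

`public boolean useJavaRegex()`

Indicates whether the rules are defined for the Java regular expression engine (rather than ICU).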
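The treatIsolatedCodesAsWhitespace option of `setOptions()` decides what happens to isolated code markers before the rules run: they become spaces, or they are removed. The sketch below shows the difference on a plain string. It assumes each isolated code is stored as two characters, a marker followed by an index character; the marker value `\uE103` and the class `IsolatedCodeSketch` are assumptions for illustration, not a documented Okapi API.

```java
// Sketch only: assumes each isolated code in the coded text is a two-character
// sequence, a marker character followed by one index character. The marker value
// below is an assumption for illustration.
final class IsolatedCodeSketch {
    static final char ISOLATED_MARKER = '\uE103';

    static String prepareForRules(String codedText, boolean treatAsWhitespace) {
        StringBuilder out = new StringBuilder(codedText.length());
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            if (c == ISOLATED_MARKER && i + 1 < codedText.length()) {
                if (treatAsWhitespace) {
                    out.append("  "); // one space per character keeps later offsets aligned
                }
                i++; // skip the index character that follows the marker
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

For the text "End." followed by an isolated code and then "Next", the whitespace variant yields `"End.  Next"` and the removal variant yields `"End.Next"`. A break rule that requires whitespace after a period can still fire on the first form but not on the second, which is the sense in which converted codes "do not get in the way of the rules".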
`public boolean treatIsolatedCodesAsWhitespace()`

Specified by: treatIsolatedCodesAsWhitespace in interface ISegmenter

`public void setUseJavaRegex(boolean useJavaRegex)`

Sets the indicator that tells whether the rules are defined for the Java regular expression engine (rather than ICU).

Parameters:
- useJavaRegex - true if the rules should be treated as Java regular expressions, false for ICU.

`public boolean includeStartCodes()`

Specified by: includeStartCodes in interface ISegmenter

`public boolean includeEndCodes()`

Specified by: includeEndCodes in interface ISegmenter

`public boolean includeIsolatedCodes()`

Specified by: includeIsolatedCodes in interface ISegmenter

`public int computeSegments(String text)`
Specified by: computeSegments in interface ISegmenter

`public int computeSegments(TextContainer container)`

Specified by: computeSegments in interface ISegmenter

`public Range getNextSegmentRange(TextContainer container)`

Specified by: getNextSegmentRange in interface ISegmenter

`public List<Integer> getSplitPositions()`

Specified by: getSplitPositions in interface ISegmenter

`public List<Range> getRanges()`

Specified by: getRanges in interface ISegmenter

`public LocaleId getLanguage()`

Specified by: getLanguage in interface ISegmenter

`public void setLanguage(LocaleId languageCode)`

Specified by: setLanguage in interface ISegmenter

`protected void setCascade(boolean value)`
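Sets the flag indicating if cascading must be applied when selecting the rules for a given language pattern.

Parameters:
- value - true if cascading must be applied, false otherwise.

`protected void addRule(net.sf.okapi.lib.segmentation.CompiledRule compiledRule)`

Adds a compiled rule to this segmenter.

Parameters:
- compiledRule - the compiled rule to add.

`protected void setMaskRule(String pattern)`

Sets the pattern for the mask rule.

Parameters:
- pattern - the new pattern to use for the mask rule.

`public void setSegmentSubFlows(boolean segmentSubFlows)`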
Specified by: setSegmentSubFlows in interface ISegmenter

`public void setIncludeStartCodes(boolean includeStartCodes)`

Specified by: setIncludeStartCodes in interface ISegmenter

`public void setIncludeEndCodes(boolean includeEndCodes)`

Specified by: setIncludeEndCodes in interface ISegmenter

`public void setIncludeIsolatedCodes(boolean includeIsolatedCodes)`

Specified by: setIncludeIsolatedCodes in interface ISegmenter

`public void setOneSegmentIncludesAll(boolean oneSegmentIncludesAll)`

Specified by: setOneSegmentIncludesAll in interface ISegmenter

`public void setTrimLeadingWS(boolean trimLeadingWS)`

Specified by: setTrimLeadingWS in interface ISegmenter

`public void setTrimTrailingWS(boolean trimTrailingWS)`

Specified by: setTrimTrailingWS in interface ISegmenter

`public void setTrimCodes(boolean trimCodes)`

Specified by: setTrimCodes in interface ISegmenter

`public void setTreatIsolatedCodesAsWhitespace(boolean treatIsolatedCodesAsWhitespace)`

Specified by: setTreatIsolatedCodesAsWhitespace in interface ISegmenter

Copyright © 2019. All rights reserved.