Class SRXSegmenter

    • Constructor Detail

      • SRXSegmenter

        public SRXSegmenter()
        Creates a new SRXSegmenter object.
    • Method Detail

      • reset

        public void reset()
        Specified by:
        reset in interface ISegmenter
      • setOptions

        public void setOptions​(boolean segmentSubFlows,
                               boolean includeStartCodes,
                               boolean includeEndCodes,
                               boolean includeIsolatedCodes,
                               boolean oneSegmentIncludesAll,
                               boolean trimLeadingWS,
                               boolean trimTrailingWS,
                               boolean useJavaRegex,
                               boolean useIcu4JBreakRules,
                               boolean treatIsolatedCodesAsWhitespace)
        Sets the options for this segmenter.
        Parameters:
        segmentSubFlows - true to segment sub-flows, false to not segment them.
        includeStartCodes - true to include start codes just before a break in the 'left' segment, false to put them in the next segment.
        includeEndCodes - true to include end codes just before a break in the 'left' segment, false to put them in the next segment.
        includeIsolatedCodes - true to include isolated codes just before a break in the 'left' segment, false to put them in the next segment.
        oneSegmentIncludesAll - true to have a segment include everything when it is the only segment in the text.
        trimLeadingWS - true to trim leading whitespace from the segments, false to keep it.
        trimTrailingWS - true to trim trailing whitespace from the segments, false to keep it.
        useJavaRegex - true if the rules are for the Java regular expression engine, false if they are for ICU.
        treatIsolatedCodesAsWhitespace - true to convert the isolated code markers in codedText to spaces, so that they do not get in the way of the rules; false to simply remove the codes.
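        The effect of treatIsolatedCodesAsWhitespace can be pictured with a minimal sketch. It is not the library's implementation: it assumes, for illustration only, that an isolated code appears in the coded text as a marker character (U+E103 here) followed by one index character, and it replaces each of the two characters with a space so that offsets are preserved. The class IsolatedCodeSketch and the method prepare are hypothetical names.

```java
class IsolatedCodeSketch {
    // Assumed marker character for an isolated code (an assumption of this sketch).
    static final char MARKER_ISOLATED = '\uE103';

    // Returns the text the rules would see: isolated codes are either
    // replaced by spaces (treatAsWhitespace == true) or removed (false).
    static String prepare(String codedText, boolean treatAsWhitespace) {
        StringBuilder sb = new StringBuilder(codedText.length());
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            if (c == MARKER_ISOLATED && i + 1 < codedText.length()) {
                if (treatAsWhitespace) {
                    sb.append("  "); // one space per character of the two-character code
                }
                i++; // skip the index character that follows the marker
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String coded = "A.\uE103\uE110B";
        System.out.println("[" + prepare(coded, true) + "]");
        System.out.println("[" + prepare(coded, false) + "]");
    }
}
```

        With the codes turned into spaces, a rule that expects whitespace after a period can still fire at the position of the code; with the codes removed, "A." and "B" become adjacent in the text the rules see.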
      • setOptions

        public void setOptions​(boolean segmentSubFlows,
                               boolean includeStartCodes,
                               boolean includeEndCodes,
                               boolean includeIsolatedCodes,
                               boolean oneSegmentIncludesAll,
                               boolean trimLeadingWS,
                               boolean trimTrailingWS)
        Specified by:
        setOptions in interface ISegmenter
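        As an illustration of what trimLeadingWS and trimTrailingWS mean for a segment, the following sketch shrinks a segment range so that it excludes surrounding whitespace. It is a stand-alone example under stated assumptions, not the segmenter's code: the class TrimSketch, the method trim, and the representation of a segment as a [start, end) pair of offsets are all hypothetical.

```java
class TrimSketch {
    // Returns {start, end} for the segment after optional trimming.
    // The range is half-open: start is inclusive, end is exclusive.
    static int[] trim(String text, int start, int end,
                      boolean trimLeadingWS, boolean trimTrailingWS) {
        if (trimLeadingWS) {
            while (start < end && Character.isWhitespace(text.charAt(start))) {
                start++;
            }
        }
        if (trimTrailingWS) {
            while (end > start && Character.isWhitespace(text.charAt(end - 1))) {
                end--;
            }
        }
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        String text = "One.  Two. ";
        int[] r = trim(text, 4, 11, true, true);
        System.out.println("[" + text.substring(r[0], r[1]) + "]");
    }
}
```

        In this sketch the trimmed whitespace stays in the text and simply falls outside the segment boundaries.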
      • cascade

        public boolean cascade()
        Indicates if cascading must be applied when selecting the rules for a given language pattern.
        Returns:
        true if cascading must be applied, false otherwise.
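        The cascading behavior can be sketched as follows, based on how SRX describes language maps: each map pairs a regular expression over language codes with a named set of rules. With cascading, every map whose pattern matches contributes its rules, in order; without it, only the first matching map is used. The class CascadeSketch, the method select, and the map contents are hypothetical and only illustrate the selection logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class CascadeSketch {
    // languageMaps: language pattern -> rule set name, in document order.
    static List<String> select(Map<String, String> languageMaps,
                               String languageCode, boolean cascade) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, String> entry : languageMaps.entrySet()) {
            if (Pattern.matches(entry.getKey(), languageCode)) {
                selected.add(entry.getValue());
                if (!cascade) {
                    break; // no cascading: stop at the first matching map
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, String> maps = new LinkedHashMap<>();
        maps.put("[Ee][Nn].*", "English");
        maps.put(".*", "Default");
        System.out.println(select(maps, "en-US", true));
        System.out.println(select(maps, "en-US", false));
    }
}
```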
      • useJavaRegex

        public boolean useJavaRegex()
        Indicates if this segmenter has rules that are defined for the Java regular expression engine (vs ICU).
        Returns:
        true if the rules are for the Java regular expression engine, false if they are for ICU.
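        To make concrete what a rule written for the Java regular expression engine looks like, here is a minimal sketch that finds break positions from an SRX-style pair of patterns: one that must match just before the break and one that must match just after it. It uses only java.util.regex and is not the segmenter's implementation; the class BreakRuleSketch, the method findBreaks, and the sample patterns are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BreakRuleSketch {
    // Returns the offsets at which the text would be split: each offset is
    // the end of a beforeBreak match that is immediately followed by a
    // match of afterBreak (checked with a lookahead, so it is not consumed).
    static List<Integer> findBreaks(String text, String beforeBreak, String afterBreak) {
        Pattern rule = Pattern.compile("(?:" + beforeBreak + ")(?=" + afterBreak + ")");
        Matcher m = rule.matcher(text);
        List<Integer> breaks = new ArrayList<>();
        while (m.find()) {
            breaks.add(m.end());
        }
        return breaks;
    }

    public static void main(String[] args) {
        // Break after sentence-final punctuation that is followed by whitespace.
        System.out.println(findBreaks("Hi. There! Ok", "[.?!]+", "\\s"));
    }
}
```

        Patterns written for ICU can use syntax that java.util.regex does not accept, and the reverse is also true, which is why the segmenter needs to know which engine the rules target.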
      • setUseJavaRegex

        public void setUseJavaRegex​(boolean useJavaRegex)
        Sets the indicator that tells if this segmenter has rules that are defined for the Java regular expression engine (vs ICU).
        Parameters:
        useJavaRegex - true if the rules should be treated as Java regular expressions, false for ICU.
      • setCascade

        protected void setCascade​(boolean value)
        Sets the flag indicating if cascading must be applied when selecting the rules for a given language pattern.
        Parameters:
        value - true if cascading must be applied, false otherwise.
      • addRule

        protected void addRule​(net.sf.okapi.lib.segmentation.CompiledRule compiledRule)
        Adds a compiled rule to this segmenter.
        Parameters:
        compiledRule - the compiled rule to add.
      • setMaskRule

        protected void setMaskRule​(String pattern)
        Sets the pattern for the mask rule.
        Parameters:
        pattern - the new pattern to use for the mask rule.
      • setSegmentSubFlows

        public void setSegmentSubFlows​(boolean segmentSubFlows)
        Specified by:
        setSegmentSubFlows in interface ISegmenter
      • setIncludeEndCodes

        public void setIncludeEndCodes​(boolean includeEndCodes)
        Specified by:
        setIncludeEndCodes in interface ISegmenter
      • setTrimLeadingWS

        public void setTrimLeadingWS​(boolean trimLeadingWS)
        Specified by:
        setTrimLeadingWS in interface ISegmenter
      • setTrimTrailingWS

        public void setTrimTrailingWS​(boolean trimTrailingWS)
        Specified by:
        setTrimTrailingWS in interface ISegmenter
      • setTrimCodes

        public void setTrimCodes​(boolean trimCodes)
        Specified by:
        setTrimCodes in interface ISegmenter