Class HTMLHighlighter


  • public final class HTMLHighlighter
    extends java.lang.Object
    Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.
    • Method Summary

      Modifier and Type Method Description
      java.lang.String getExtraStyleSheet()
      Returns the extra stylesheet definition that will be inserted in the HEAD element.
      java.lang.String getPostHighlight()
      Returns the string that will be inserted after any highlighted HTML block.
      java.lang.String getPreHighlight()
      Returns the string that will be inserted before any highlighted HTML block.
      boolean isOutputHighlightOnly()
      If true, only HTML enclosed within highlighted content will be returned.
      static HTMLHighlighter newExtractingInstance()
      Creates a new HTMLHighlighter, which is set up to return only the extracted HTML text, including enclosed markup.
      static HTMLHighlighter newHighlightingInstance()
      Creates a new HTMLHighlighter, which is set up to return the full HTML text, with the extracted text portion highlighted.
      java.lang.String process(TextDocument doc, java.lang.String origHTML)
      Processes the given TextDocument and the original HTML text (as a String).
      java.lang.String process(TextDocument doc, org.xml.sax.InputSource is)
      Processes the given TextDocument and the original HTML text (as an InputSource).
      java.lang.String process(java.net.URL url, BoilerpipeExtractor extractor)
      void setExtraStyleSheet(java.lang.String extraStyleSheet)
      Sets the extra stylesheet definition that will be inserted in the HEAD element.
      void setOutputHighlightOnly(boolean outputHighlightOnly)
      Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.
      void setPostHighlight(java.lang.String postHighlight)
      Sets the string that will be inserted after any highlighted HTML block.
      void setPreHighlight(java.lang.String preHighlight)
      Sets the string that will be inserted prior to any highlighted HTML block.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • newHighlightingInstance

        public static HTMLHighlighter newHighlightingInstance()
        Creates a new HTMLHighlighter, which is set up to return the full HTML text, with the extracted text portion highlighted.
      • newExtractingInstance

        public static HTMLHighlighter newExtractingInstance()
        Creates a new HTMLHighlighter, which is set up to return only the extracted HTML text, including enclosed markup.
      • isOutputHighlightOnly

        public boolean isOutputHighlightOnly()
        If true, only HTML enclosed within highlighted content will be returned.
      • setOutputHighlightOnly

        public void setOutputHighlightOnly(boolean outputHighlightOnly)
        Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.
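The two factory methods and the outputHighlightOnly flag describe two output shapes: the full document with the content marked, or only the content together with its enclosed markup. Below is a minimal standalone model of that difference, not boilerpipe's implementation. It assumes a document can be represented as a list of HTML fragments, each flagged as content or not (a stand-in for the content marks in a TextDocument); the class HighlightModeSketch, its Fragment type and its render method are hypothetical. Treating the extracting instance as outputHighlightOnly == true is an inference from the descriptions above.

```java
import java.util.List;

// Standalone sketch, not boilerpipe's implementation: a document is modeled as a
// list of HTML fragments, each flagged as content or not (a hypothetical stand-in
// for the content marks of a TextDocument mapped back onto the original HTML).
final class HighlightModeSketch {

    // Hypothetical fragment type: a piece of the original HTML and its content flag.
    static final class Fragment {
        final String html;
        final boolean isContent;

        Fragment(String html, boolean isContent) {
            this.html = html;
            this.isContent = isContent;
        }
    }

    // outputHighlightOnly == false: the whole document, content wrapped in pre/post.
    // outputHighlightOnly == true: only the content fragments, with their enclosed markup.
    static String render(List<Fragment> fragments, String pre, String post,
                         boolean outputHighlightOnly) {
        StringBuilder out = new StringBuilder();
        for (Fragment f : fragments) {
            if (f.isContent) {
                out.append(pre).append(f.html).append(post);
            } else if (!outputHighlightOnly) {
                out.append(f.html); // non-content markup is kept only in full-document mode
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Fragment> doc = List.of(
                new Fragment("<nav>Home | About</nav>", false),
                new Fragment("<p>", false),
                new Fragment("Article <b>body</b>.", true),
                new Fragment("</p>", false));

        // Roughly what a highlighting setup returns: full HTML, content marked.
        System.out.println(render(doc, "<span class=\"x-boilerpipe-mark1\">", "</span>", false));
        // Roughly what an extracting setup returns: content only (markers disabled here).
        System.out.println(render(doc, "", "", true));
    }
}
```

Running main first prints the full document with the article text inside the span, then only the article text. In this model, output-only mode drops the navigation and the surrounding <p> tags but keeps the <b> element, because it sits inside the content fragment.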
      • getExtraStyleSheet

        public java.lang.String getExtraStyleSheet()
        Returns the extra stylesheet definition that will be inserted in the HEAD element. By default, this corresponds to a simple definition that marks text in class "x-boilerpipe-mark1" as inline text with a yellow background.
      • setExtraStyleSheet

        public void setExtraStyleSheet(java.lang.String extraStyleSheet)
        Sets the extra stylesheet definition that will be inserted in the HEAD element. To disable, set it to the empty string: ""
        Parameters:
        extraStyleSheet - Plain HTML
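A minimal sketch of what inserting an extra stylesheet into the HEAD element involves, written as standalone Java rather than against the library. The placement immediately before the closing </head> tag, the class StyleSheetSketch, and the CSS in EXAMPLE_STYLE are all assumptions for illustration; the library's actual default string is not reproduced here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch, not boilerpipe's code: inserting an extra stylesheet
// definition into the HEAD element of an HTML string.
final class StyleSheetSketch {

    // Illustrative rule in the spirit of the documented default (class
    // "x-boilerpipe-mark1" shown inline with a yellow background); the
    // library's actual default string may differ.
    static final String EXAMPLE_STYLE = "<style type=\"text/css\">"
            + ".x-boilerpipe-mark1 { display: inline; background-color: yellow; }"
            + "</style>";

    // Matches a closing HEAD tag regardless of letter case.
    private static final Pattern HEAD_CLOSE =
            Pattern.compile("</head\\s*>", Pattern.CASE_INSENSITIVE);

    // Inserts extraStyleSheet immediately before the first closing HEAD tag
    // (an assumed placement). The empty string disables the insertion, as in
    // the documented setter; HTML without a closing HEAD tag is returned as-is.
    static String insertIntoHead(String html, String extraStyleSheet) {
        if (extraStyleSheet.isEmpty()) {
            return html;
        }
        Matcher m = HEAD_CLOSE.matcher(html);
        if (!m.find()) {
            return html;
        }
        return html.substring(0, m.start()) + extraStyleSheet + html.substring(m.start());
    }

    public static void main(String[] args) {
        String html = "<html><head><title>t</title></head><body>x</body></html>";
        System.out.println(insertIntoHead(html, EXAMPLE_STYLE));
        System.out.println(insertIntoHead(html, "")); // unchanged
    }
}
```

Because the parameter is plain HTML, the value passed is normally a complete <style> element rather than bare CSS rules; passing the empty string leaves the document untouched, matching the documented way to disable the feature.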
      • getPreHighlight

        public java.lang.String getPreHighlight()
        Returns the string that will be inserted before any highlighted HTML block. By default, this corresponds to <span class="x-boilerpipe-mark1">
      • setPreHighlight

        public void setPreHighlight(java.lang.String preHighlight)
        Sets the string that will be inserted prior to any highlighted HTML block. To disable, set it to the empty string: ""
      • getPostHighlight

        public java.lang.String getPostHighlight()
        Returns the string that will be inserted after any highlighted HTML block. By default, this corresponds to </span>
      • setPostHighlight

        public void setPostHighlight(java.lang.String postHighlight)
        Sets the string that will be inserted after any highlighted HTML block. To disable, set it to the empty string: ""
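The pre/post pair frames each highlighted block. This standalone sketch (the class MarkerSketch and its wrap method are hypothetical, not part of boilerpipe) shows the defaults documented above, a custom pair, and the empty strings that disable highlighting.

```java
// Standalone sketch, not boilerpipe's code: how the pre/post strings frame a
// highlighted HTML block, using the defaults documented for the getters.
final class MarkerSketch {

    static final String DEFAULT_PRE = "<span class=\"x-boilerpipe-mark1\">";
    static final String DEFAULT_POST = "</span>";

    // Frames one highlighted block. With "" for both strings (the documented
    // way to disable them) the block comes back unchanged.
    static String wrap(String block, String pre, String post) {
        return pre + block + post;
    }

    public static void main(String[] args) {
        String block = "Article <b>body</b>.";
        System.out.println(wrap(block, DEFAULT_PRE, DEFAULT_POST)); // default markers
        System.out.println(wrap(block, "<mark>", "</mark>"));       // custom markers
        System.out.println(wrap(block, "", ""));                    // markers disabled
    }
}
```

Against the real class, the equivalent customization would be calling setPreHighlight("<mark>") and setPostHighlight("</mark>") on an instance obtained from newHighlightingInstance(). Since the default pre string opens a <span> that the default post string closes, a custom pair should likewise open and close matching markup so the output stays well-formed; the span's class, x-boilerpipe-mark1, is also what the default extra stylesheet targets.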