Class FormattingVisitor

  • All Implemented Interfaces:
    org.jsoup.select.NodeVisitor

    public class FormattingVisitor
    extends java.lang.Object
    implements org.jsoup.select.NodeVisitor
    Converts HTML to plain text. This example program demonstrates using jsoup to convert HTML input to lightly formatted plain text. This differs from the general goal of jsoup's .text() methods, which is to get clean data from a scrape.

    Note that this is a fairly simplistic formatter; for real-world use you'll want to embrace and extend it.

    To invoke from the command line, assuming you've downloaded the jsoup jar to your current directory:

    java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText url [selector]

    where url is the URL to fetch, and selector is an optional CSS selector.
    • Method Summary

      Modifier and Type    Method                                        Description
      void                 head(org.jsoup.nodes.Node node, int depth)
      void                 tail(org.jsoup.nodes.Node node, int depth)
      java.lang.String     toString()
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • FormattingVisitor

        public FormattingVisitor()
    • Method Detail

      • head

        public void head(org.jsoup.nodes.Node node,
                         int depth)
        Specified by:
        head in interface org.jsoup.select.NodeVisitor
      • tail

        public void tail(org.jsoup.nodes.Node node,
                         int depth)
        Specified by:
        tail in interface org.jsoup.select.NodeVisitor
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
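
    The head/tail/toString contract above can be illustrated with a minimal, self-contained sketch. It does not use jsoup: SimpleNode, SimpleVisitor, traverse, PlainTextVisitor and render are hypothetical stand-ins invented for this example, and the formatting rules (a bullet before each li, a newline after each p or br) are assumptions rather than the actual rules of FormattingVisitor. The sketch only shows the general pattern: head is called when a node is first reached, tail after all of its children have been visited, and toString returns the text accumulated along the way.

```java
import java.util.ArrayList;
import java.util.List;

public class VisitorSketch {
    /** Hypothetical stand-in for a DOM node; NOT org.jsoup.nodes.Node. */
    static class SimpleNode {
        final String name;  // tag name, or "#text" for a text node
        final String text;  // text content; empty for element nodes
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name, String text) {
            this.name = name;
            this.text = text;
        }

        static SimpleNode element(String name, SimpleNode... kids) {
            SimpleNode node = new SimpleNode(name, "");
            for (SimpleNode kid : kids) {
                node.children.add(kid);
            }
            return node;
        }

        static SimpleNode text(String text) {
            return new SimpleNode("#text", text);
        }
    }

    /** Hypothetical visitor with the same two-callback shape as NodeVisitor. */
    interface SimpleVisitor {
        void head(SimpleNode node, int depth);
        void tail(SimpleNode node, int depth);
    }

    /** Depth-first walk: head on entry, then the children, then tail on exit. */
    static void traverse(SimpleVisitor visitor, SimpleNode node, int depth) {
        visitor.head(node, depth);
        for (SimpleNode child : node.children) {
            traverse(visitor, child, depth + 1);
        }
        visitor.tail(node, depth);
    }

    /** Accumulates lightly formatted text; the rules here are assumptions. */
    static class PlainTextVisitor implements SimpleVisitor {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void head(SimpleNode node, int depth) {
            if (node.name.equals("#text")) {
                out.append(node.text);      // text nodes carry the content
            } else if (node.name.equals("li")) {
                out.append("\n * ");        // bullet before each list item
            }
        }

        @Override
        public void tail(SimpleNode node, int depth) {
            if (node.name.equals("p") || node.name.equals("br")) {
                out.append("\n");           // line break once the node is closed
            }
        }

        @Override
        public String toString() {
            return out.toString();
        }
    }

    /** Convenience wrapper: walk the tree and return the accumulated text. */
    static String render(SimpleNode root) {
        PlainTextVisitor visitor = new PlainTextVisitor();
        traverse(visitor, root, 0);
        return visitor.toString();
    }

    public static void main(String[] args) {
        SimpleNode doc = SimpleNode.element("body",
                SimpleNode.element("p", SimpleNode.text("Hello")),
                SimpleNode.element("ul",
                        SimpleNode.element("li", SimpleNode.text("one")),
                        SimpleNode.element("li", SimpleNode.text("two"))));
        System.out.println(render(doc));
    }
}
```

    Keeping the output in a StringBuilder owned by the visitor means the traversal code needs no knowledge of formatting; a caller walks the tree once and then reads the result through toString, which mirrors the method set listed for this class.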