Package net.dankito.utils.html
Class FormattingVisitor
- java.lang.Object
  - net.dankito.utils.html.FormattingVisitor
- All Implemented Interfaces:
org.jsoup.select.NodeVisitor
public class FormattingVisitor extends java.lang.Object implements org.jsoup.select.NodeVisitor
HTML to plain-text. This example program demonstrates the use of jsoup to convert HTML input to lightly formatted plain text. That differs from the general goal of jsoup's .text() methods, which is to get clean data from a scrape.
Note that this is a fairly simplistic formatter; for real-world use you'll want to embrace and extend.
To invoke from the command line, assuming you've downloaded the jsoup jar to your current directory:
java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText url [selector]
where url is the URL to fetch, and selector is an optional CSS selector.
Constructor Summary
Constructor            Description
FormattingVisitor()
-
Method Summary
Modifier and Type    Method
void                 head(org.jsoup.nodes.Node node, int depth)
void                 tail(org.jsoup.nodes.Node node, int depth)
java.lang.String     toString()
Method Detail
-
head
public void head(org.jsoup.nodes.Node node, int depth)
- Specified by:
head in interface org.jsoup.select.NodeVisitor
-
tail
public void tail(org.jsoup.nodes.Node node, int depth)
- Specified by:
tail in interface org.jsoup.select.NodeVisitor
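The head and tail callbacks follow a depth-first visitor contract: head is called when a node is first reached, and tail after all of its children have been visited. The sketch below illustrates that contract without depending on jsoup. SimpleNode, SimpleVisitor, PlainTextVisitor and the formatting rules are hypothetical stand-ins chosen for illustration; they are not jsoup's types and are not taken from the FormattingVisitor source.

```java
import java.util.ArrayList;
import java.util.List;

public class VisitorSketch {

    /** Hypothetical stand-in for a DOM node; NOT org.jsoup.nodes.Node. */
    static final class SimpleNode {
        final String name;  // tag name, or "#text" for a text node
        final String text;  // text content, used only by "#text" nodes
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name, String text) {
            this.name = name;
            this.text = text;
        }

        /** Appends a child and returns this node so calls can be chained. */
        SimpleNode add(SimpleNode child) {
            children.add(child);
            return this;
        }
    }

    /** Mirrors the two callbacks documented above (assumed shape). */
    interface SimpleVisitor {
        void head(SimpleNode node, int depth);
        void tail(SimpleNode node, int depth);
    }

    /** Depth-first walk: head before the children, tail after them. */
    static void traverse(SimpleVisitor visitor, SimpleNode node, int depth) {
        visitor.head(node, depth);
        for (SimpleNode child : node.children) {
            traverse(visitor, child, depth + 1);
        }
        visitor.tail(node, depth);
    }

    /** Toy formatter: accumulates text and exposes it through toString(). */
    static final class PlainTextVisitor implements SimpleVisitor {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void head(SimpleNode node, int depth) {
            if (node.name.equals("#text")) {
                out.append(node.text);      // emit text as it is encountered
            } else if (node.name.equals("li")) {
                out.append("\n * ");        // illustrative rule: bullet for list items
            }
        }

        @Override
        public void tail(SimpleNode node, int depth) {
            if (node.name.equals("p") || node.name.equals("h1")) {
                out.append("\n");           // illustrative rule: newline after a block
            }
        }

        @Override
        public String toString() {
            return out.toString();
        }
    }

    /** Runs the toy formatter over a tree and returns the accumulated text. */
    static String render(SimpleNode root) {
        PlainTextVisitor visitor = new PlainTextVisitor();
        traverse(visitor, root, 0);
        return visitor.toString();
    }

    public static void main(String[] args) {
        SimpleNode body = new SimpleNode("body", null)
            .add(new SimpleNode("h1", null).add(new SimpleNode("#text", "Title")))
            .add(new SimpleNode("p", null).add(new SimpleNode("#text", "Hello")))
            .add(new SimpleNode("ul", null)
                .add(new SimpleNode("li", null).add(new SimpleNode("#text", "one"))));
        System.out.print(render(body));
    }
}
```

With the real classes, the same pattern would presumably be: parse a document with jsoup, pass a FormattingVisitor to the node's traverse method, then read the result from toString().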
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object