Package de.l3s.boilerpipe.sax
Class BoilerpipeHTMLContentHandler
java.lang.Object
  de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
- All Implemented Interfaces:
org.xml.sax.ContentHandler
public class BoilerpipeHTMLContentHandler extends java.lang.Object implements org.xml.sax.ContentHandler
A simple SAX ContentHandler, used by BoilerpipeSAXInput. Can be used by different parser implementations, e.g. NekoHTML and TagSoup.
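Because the class only implements org.xml.sax.ContentHandler, it does not depend on which parser drives it. The sketch below illustrates that decoupling using only the JDK: the JDK's built-in XMLReader stands in for NekoHTML or TagSoup, and a hypothetical CountingHandler stands in for BoilerpipeHTMLContentHandler. The names SaxDriverSketch, CountingHandler, parseWith and jdkReader are assumptions made for this illustration and are not part of boilerpipe.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class SaxDriverSketch {

    /** Hypothetical stand-in handler: counts elements and collects character data. */
    static final class CountingHandler extends DefaultHandler {
        int elements;
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver text in several chunks, so always append.
            text.append(ch, start, length);
        }
    }

    /** Drives any ContentHandler with any XMLReader; the handler never sees the parser type. */
    static void parseWith(XMLReader reader, ContentHandler handler, String markup) {
        try {
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(markup)));
        } catch (IOException | SAXException e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    /** The JDK's XML parser, used here only as a placeholder for an HTML-tolerant parser. */
    static XMLReader jdkReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        CountingHandler handler = new CountingHandler();
        parseWith(jdkReader(), handler, "<html><body><p>Hello</p><p>World</p></body></html>");
        System.out.println(handler.elements + " elements, text=" + handler.text);
    }
}
```

In real use, the handler argument would be a BoilerpipeHTMLContentHandler and the reader would come from an HTML-tolerant parser, since the JDK parser accepts only well-formed XML.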
-
-
Constructor Summary
Constructor | Description
BoilerpipeHTMLContentHandler() | Constructs a BoilerpipeHTMLContentHandler using the DefaultTagActionMap.
BoilerpipeHTMLContentHandler(TagActionMap tagActions) | Constructs a BoilerpipeHTMLContentHandler using the given TagActionMap.
-
Method Summary
Modifier and Type | Method | Description
void | addLabelAction(LabelAction la) |
protected void | addTextBlock(TextBlock tb) |
void | addWhitespaceIfNecessary() |
void | characters(char[] ch, int start, int length) |
void | endDocument() |
void | endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) |
void | endPrefixMapping(java.lang.String prefix) |
void | flushBlock() |
java.lang.String | getTitle() |
void | ignorableWhitespace(char[] ch, int start, int length) |
void | processingInstruction(java.lang.String target, java.lang.String data) |
void | recycle() | Recycles this instance.
void | setDocumentLocator(org.xml.sax.Locator locator) |
void | setTitle(java.lang.String s) |
void | skippedEntity(java.lang.String name) |
void | startDocument() |
void | startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) |
void | startPrefixMapping(java.lang.String prefix, java.lang.String uri) |
TextDocument | toTextDocument() | Returns a TextDocument containing the extracted TextBlocks.
-
-
-
Constructor Detail
-
BoilerpipeHTMLContentHandler
public BoilerpipeHTMLContentHandler()
Constructs a BoilerpipeHTMLContentHandler using the DefaultTagActionMap.
-
BoilerpipeHTMLContentHandler
public BoilerpipeHTMLContentHandler(TagActionMap tagActions)
Constructs a BoilerpipeHTMLContentHandler using the given TagActionMap.
- Parameters:
tagActions - The TagActionMap to use, e.g. DefaultTagActionMap.
-
-
Method Detail
-
recycle
public void recycle()
Recycles this instance.
-
endDocument
public void endDocument() throws org.xml.sax.SAXException
- Specified by:
endDocument in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
endPrefixMapping
public void endPrefixMapping(java.lang.String prefix) throws org.xml.sax.SAXException
- Specified by:
endPrefixMapping in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws org.xml.sax.SAXException
- Specified by:
ignorableWhitespace in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
processingInstruction
public void processingInstruction(java.lang.String target, java.lang.String data) throws org.xml.sax.SAXException
- Specified by:
processingInstruction in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
setDocumentLocator
public void setDocumentLocator(org.xml.sax.Locator locator)
- Specified by:
setDocumentLocator in interface org.xml.sax.ContentHandler
-
skippedEntity
public void skippedEntity(java.lang.String name) throws org.xml.sax.SAXException
- Specified by:
skippedEntity in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
startDocument
public void startDocument() throws org.xml.sax.SAXException
- Specified by:
startDocument in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
startPrefixMapping
public void startPrefixMapping(java.lang.String prefix, java.lang.String uri) throws org.xml.sax.SAXException
- Specified by:
startPrefixMapping in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
startElement
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) throws org.xml.sax.SAXException
- Specified by:
startElement in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
endElement
public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) throws org.xml.sax.SAXException
- Specified by:
endElement in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
characters
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException
- Specified by:
characters in interface org.xml.sax.ContentHandler
- Throws:
org.xml.sax.SAXException
-
flushBlock
public void flushBlock()
-
addTextBlock
protected void addTextBlock(TextBlock tb)
-
getTitle
public java.lang.String getTitle()
-
setTitle
public void setTitle(java.lang.String s)
-
toTextDocument
public TextDocument toTextDocument()
Returns a TextDocument containing the extracted TextBlocks. NOTE: Only call this after parsing.
- Returns:
The TextDocument
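The "only call this after parsing" rule reflects the usual life cycle of a SAX handler: callbacks accumulate state, and the result is complete only once endDocument has fired. The following is a simplified stand-in, not boilerpipe's implementation. BlockCollectorSketch, its BLOCK_TAGS set, the blocks() accessor and the IllegalStateException guard are all assumptions made for illustration; only the method names flushBlock, getTitle and recycle echo the class documented here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that splits text into blocks at a few block-level tags. */
final class BlockCollectorSketch extends DefaultHandler {
    // Assumed tag set for this sketch; the real class consults a TagActionMap instead.
    private static final Set<String> BLOCK_TAGS =
            new HashSet<>(Arrays.asList("p", "div", "h1", "li"));

    private final List<String> blocks = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private String title;
    private boolean finished;

    /** Resets all state so the same instance can handle another document. */
    public void recycle() {
        blocks.clear();
        buffer.setLength(0);
        title = null;
        finished = false;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("title")) {
            buffer.setLength(0);          // start collecting the title text
        } else if (BLOCK_TAGS.contains(qName)) {
            flushBlock();                 // a new block begins: close the pending one
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            title = buffer.toString().trim();
            buffer.setLength(0);
        } else if (BLOCK_TAGS.contains(qName)) {
            flushBlock();
        }
    }

    @Override
    public void endDocument() {
        flushBlock();
        finished = true;
    }

    /** Moves buffered text into the block list, skipping whitespace-only text. */
    public void flushBlock() {
        String text = buffer.toString().trim();
        buffer.setLength(0);
        if (!text.isEmpty()) {
            blocks.add(text);
        }
    }

    public String getTitle() {
        return title;
    }

    /** Result accessor; the guard makes the "after parsing" rule explicit in this sketch. */
    public List<String> blocks() {
        if (!finished) {
            throw new IllegalStateException("blocks() called before parsing finished");
        }
        return Collections.unmodifiableList(new ArrayList<>(blocks));
    }

    /** Parses well-formed XHTML with the JDK parser and feeds the given handler. */
    static void parse(BlockCollectorSketch handler, String xhtml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }
}
```

A caller would construct the handler once, call parse, read getTitle() and blocks(), and then call recycle() before the next document. Whether the real toTextDocument() fails fast when called early is not stated in this documentation, so the guard here should be read as one possible design rather than documented behavior.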
-
addWhitespaceIfNecessary
public void addWhitespaceIfNecessary()
-
addLabelAction
public void addLabelAction(LabelAction la) throws java.lang.IllegalStateException
- Throws:
java.lang.IllegalStateException
-
-