Package de.l3s.boilerpipe
Interface BoilerpipeExtractor
- All Superinterfaces:
BoilerpipeFilter
- All Known Implementing Classes:
ArticleExtractor, ArticleSentencesExtractor, CanolaExtractor, DefaultExtractor, ExtractorBase, KeepEverythingExtractor, KeepEverythingWithMinKWordsExtractor, LargestContentExtractor, NumWordsRulesExtractor
public interface BoilerpipeExtractor extends BoilerpipeFilter
Describes a complete filter pipeline.
Method Summary
Modifier and Type   Method                               Description
java.lang.String    getText(TextDocument doc)            Extracts text from the given TextDocument object.
java.lang.String    getText(java.io.Reader r)            Extracts text from the HTML code available from the given Reader.
java.lang.String    getText(java.lang.String html)       Extracts text from the HTML code given as a String.
java.lang.String    getText(org.xml.sax.InputSource is)  Extracts text from the HTML code available from the given InputSource.
Methods inherited from interface de.l3s.boilerpipe.BoilerpipeFilter
process
Method Detail
getText
java.lang.String getText(java.lang.String html) throws BoilerpipeProcessingException
Extracts text from the HTML code given as a String.
- Parameters:
html - The HTML code as a String.
- Returns:
The extracted text.
- Throws:
BoilerpipeProcessingException
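The call pattern for getText(String) can be sketched as follows. This is a minimal illustration, not the library's code: SketchExtractor, SketchProcessingException, TagStrippingExtractor and ExtractionCaller are hypothetical stand-ins so the example compiles without boilerpipe on the classpath, and it assumes that BoilerpipeProcessingException is a checked exception that callers must catch or declare. The tag-stripping logic is a toy placeholder and does not reflect how any real extractor works.

```java
// Hypothetical stand-in for BoilerpipeProcessingException (assumed to be checked).
class SketchProcessingException extends Exception {
    SketchProcessingException(String message) {
        super(message);
    }
}

// Hypothetical stand-in mirroring only the getText(String) overload.
interface SketchExtractor {
    String getText(String html) throws SketchProcessingException;
}

// Toy implementation: removes tags with a regular expression.
// A real extractor would run its filter pipeline instead.
class TagStrippingExtractor implements SketchExtractor {
    @Override
    public String getText(String html) throws SketchProcessingException {
        if (html == null) {
            throw new SketchProcessingException("no HTML given");
        }
        return html.replaceAll("<[^>]*>", " ")   // replace each tag with a space
                   .replaceAll("\\s+", " ")      // collapse runs of whitespace
                   .trim();
    }
}

class ExtractionCaller {
    // Returns the extracted text, or the fallback if extraction fails.
    static String extractOrDefault(SketchExtractor extractor, String html, String fallback) {
        try {
            return extractor.getText(html);
        } catch (SketchProcessingException e) {
            return fallback;
        }
    }
}
```

With the real library, the stand-in interface would be replaced by BoilerpipeExtractor and the toy class by one of the implementing classes listed above, such as ArticleExtractor.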
-
getText
java.lang.String getText(org.xml.sax.InputSource is) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given InputSource.
- Parameters:
is - The InputSource containing the HTML.
- Returns:
The extracted text.
- Throws:
BoilerpipeProcessingException
-
getText
java.lang.String getText(java.io.Reader r) throws BoilerpipeProcessingException
Extracts text from the HTML code available from the given Reader.
- Parameters:
r - The Reader containing the HTML.
- Returns:
The extracted text.
- Throws:
BoilerpipeProcessingException
-
getText
java.lang.String getText(TextDocument doc) throws BoilerpipeProcessingException
Extracts text from the given TextDocument object.
- Parameters:
doc - The TextDocument.
- Returns:
The extracted text.
- Throws:
BoilerpipeProcessingException
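One way an implementation could support several getText overloads is to funnel the convenience variants into the document-based one. The sketch below illustrates that idea under stated assumptions: DemoDocument, DemoProcessingException, DemoExtractorBase and UpperCaseExtractor are hypothetical names, the "parsing" step simply reads the whole input into a string, and the InputSource overload is omitted for brevity. It is not a description of how the library's own classes are written.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Locale;

// Hypothetical stand-in for TextDocument: just wraps the raw content.
class DemoDocument {
    private final String content;

    DemoDocument(String content) {
        this.content = content;
    }

    String getContent() {
        return content;
    }
}

// Hypothetical stand-in for BoilerpipeProcessingException.
class DemoProcessingException extends Exception {
    DemoProcessingException(Throwable cause) {
        super(cause);
    }
}

abstract class DemoExtractorBase {
    // The only overload a concrete extractor has to supply.
    abstract String getText(DemoDocument doc) throws DemoProcessingException;

    // String input is wrapped in a Reader and delegated.
    String getText(String html) throws DemoProcessingException {
        return getText(new StringReader(html));
    }

    // Reader input is read fully, turned into a document, and delegated.
    String getText(Reader r) throws DemoProcessingException {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return getText(new DemoDocument(sb.toString()));
        } catch (IOException e) {
            // I/O failures surface as the processing exception.
            throw new DemoProcessingException(e);
        }
    }
}

// Toy extractor: "extraction" here is just upper-casing the content.
class UpperCaseExtractor extends DemoExtractorBase {
    @Override
    String getText(DemoDocument doc) {
        return doc.getContent().toUpperCase(Locale.ROOT);
    }
}
```

With this layout, every entry point yields the same result for the same input, because all of them end in the single document-based method.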