Package de.l3s.boilerpipe.extractors
Class CommonExtractors
java.lang.Object
    de.l3s.boilerpipe.extractors.CommonExtractors
public final class CommonExtractors
extends java.lang.Object

Provides quick access to common BoilerpipeExtractors.
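CommonExtractors is a holder class. Each field is a ready-made, shared extractor instance, so a caller can write, for example, CommonExtractors.ARTICLE_EXTRACTOR.getText(html) instead of constructing an extractor first. The sketch below shows the same holder-class idiom in plain Java, assuming the strategies are stateless and therefore safe to share across calls. Everything in it (the Extractor interface, the toy KeepEverything and TrimOnly strategies, the Extractors holder and its private constructor) is hypothetical and is not boilerpipe code.

```java
// Hypothetical stand-in for a text extractor; NOT boilerpipe's API.
interface Extractor {
    String extract(String text);
}

// Toy strategy: returns its input unchanged.
final class KeepEverything implements Extractor {
    @Override
    public String extract(String text) {
        return text;
    }
}

// Toy strategy: trims leading and trailing whitespace; stands in for a real filter.
final class TrimOnly implements Extractor {
    @Override
    public String extract(String text) {
        return text.trim();
    }
}

// Holder class in the style of CommonExtractors: final, not instantiable
// (the private constructor is part of this sketch), one shared instance per field.
final class Extractors {
    static final Extractor KEEP_EVERYTHING = new KeepEverything();
    static final Extractor TRIM_ONLY = new TrimOnly();

    private Extractors() {
    }
}

final class ExtractorsDemo {
    public static void main(String[] args) {
        String input = "  Hello, world  ";
        // The same shared instances serve every call; this is only safe
        // because neither toy strategy keeps per-call state.
        System.out.println("[" + Extractors.KEEP_EVERYTHING.extract(input) + "]");
        System.out.println("[" + Extractors.TRIM_ONLY.extract(input) + "]");
    }
}
```

Running ExtractorsDemo prints "[  Hello, world  ]" followed by "[Hello, world]". Sharing one instance per field avoids repeated construction, but it is only correct for extractors that hold no mutable per-document state.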
Field Summary
Fields
Modifier and Type                 Field                        Description
static ArticleExtractor           ARTICLE_EXTRACTOR            Works very well for most types of Article-like HTML.
static CanolaExtractor            CANOLA_EXTRACTOR             Trained on krdwrd Canola (different definition of "boilerplate").
static DefaultExtractor           DEFAULT_EXTRACTOR            Usually worse than ArticleExtractor, but simpler (no heuristics).
static KeepEverythingExtractor    KEEP_EVERYTHING_EXTRACTOR    Dummy extractor; should return the input text.
static LargestContentExtractor    LARGEST_CONTENT_EXTRACTOR    Like DefaultExtractor, but keeps the largest text block only.
Field Detail
ARTICLE_EXTRACTOR
public static final ArticleExtractor ARTICLE_EXTRACTOR
Works very well for most types of Article-like HTML.
DEFAULT_EXTRACTOR
public static final DefaultExtractor DEFAULT_EXTRACTOR
Usually worse than ArticleExtractor, but simpler (no heuristics).
LARGEST_CONTENT_EXTRACTOR
public static final LargestContentExtractor LARGEST_CONTENT_EXTRACTOR
Like DefaultExtractor, but keeps the largest text block only.
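The "keep the largest block" step can be sketched as a small filter. This is a toy model, assuming an earlier DefaultExtractor-like pass has already labelled each text block as content or boilerplate (hard-coded below) and that "largest" means the block with the most whitespace-separated words. ToyBlock and KeepLargestBlock are hypothetical names, not boilerpipe classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy text block; NOT boilerpipe's TextBlock class.
final class ToyBlock {
    final String text;
    boolean isContent;

    ToyBlock(String text, boolean isContent) {
        this.text = text;
        this.isContent = isContent;
    }

    // Block size measured as a whitespace-separated word count (an assumption).
    int numWords() {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }
}

final class KeepLargestBlock {
    // Among blocks already labelled as content, keep only the one with the most
    // words and relabel every other block as boilerplate. Ties go to the
    // earliest block. Returns the kept block's text, or "" if none is content.
    static String apply(List<ToyBlock> blocks) {
        ToyBlock largest = null;
        for (ToyBlock b : blocks) {
            if (b.isContent && (largest == null || b.numWords() > largest.numWords())) {
                largest = b;
            }
        }
        for (ToyBlock b : blocks) {
            b.isContent = (b == largest);
        }
        return largest == null ? "" : largest.text;
    }

    public static void main(String[] args) {
        List<ToyBlock> blocks = new ArrayList<>();
        blocks.add(new ToyBlock("Home | News | Contact", true));
        blocks.add(new ToyBlock("The main story runs for several sentences of real text.", true));
        blocks.add(new ToyBlock("Related: short teaser", true));
        System.out.println(apply(blocks));
    }
}
```

Run as a program, it prints the ten-word story sentence, and the navigation and teaser blocks lose their content label. Breaking ties in favour of the first block is a choice made for this sketch only.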
CANOLA_EXTRACTOR
public static final CanolaExtractor CANOLA_EXTRACTOR
Trained on krdwrd Canola, which uses a different definition of "boilerplate". It may be worth a try.
KEEP_EVERYTHING_EXTRACTOR
public static final KeepEverythingExtractor KEEP_EVERYTHING_EXTRACTOR
Dummy extractor; should return the input text. Use it to check whether a problem lies within a particular BoilerpipeExtractor or somewhere else.
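The double-check described above is a differential test: run the same input through a pass-through baseline and through the extractor under test, then see where the text you expected disappears. Below is a minimal sketch in which both stages are modelled as hypothetical UnaryOperator&lt;String&gt; functions rather than real boilerpipe calls. In practice the baseline would be KEEP_EVERYTHING_EXTRACTOR and the other stage the extractor you are debugging. The ExtractorCheck class and its Verdict values are illustrative names only.

```java
import java.util.function.UnaryOperator;

final class ExtractorCheck {
    // Where an expected phrase was (or was not) lost.
    enum Verdict { FOUND_BY_EXTRACTOR, LOST_IN_EXTRACTOR, LOST_BEFORE_EXTRACTION }

    // Runs the same input through a pass-through baseline and through the
    // extractor under test, then checks which outputs contain the phrase.
    static Verdict diagnose(String input, UnaryOperator<String> keepEverything,
                            UnaryOperator<String> extractor, String expectedPhrase) {
        if (extractor.apply(input).contains(expectedPhrase)) {
            return Verdict.FOUND_BY_EXTRACTOR;
        }
        // If even the baseline lacks the phrase, the problem is upstream
        // (fetching, decoding, parsing), not in the extractor's filtering.
        return keepEverything.apply(input).contains(expectedPhrase)
                ? Verdict.LOST_IN_EXTRACTOR
                : Verdict.LOST_BEFORE_EXTRACTION;
    }

    public static void main(String[] args) {
        String page = "Menu\nImportant paragraph\nFooter";
        UnaryOperator<String> keepAll = s -> s;                       // toy baseline
        UnaryOperator<String> firstLineOnly = s -> s.split("\n")[0];  // toy, deliberately too aggressive
        System.out.println(diagnose(page, keepAll, firstLineOnly, "Important"));
    }
}
```

With these toy inputs, main prints LOST_IN_EXTRACTOR: the baseline still contains "Important", but the first-line-only stage dropped it. A LOST_BEFORE_EXTRACTION verdict would instead point at whatever produced the input.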