Class CommonExtractors


  • public final class CommonExtractors
    extends java.lang.Object
    Provides quick access to common BoilerpipeExtractors.
    • Field Detail

      • ARTICLE_EXTRACTOR

        public static final ArticleExtractor ARTICLE_EXTRACTOR
        Works very well for most types of article-like HTML.
      • CANOLA_EXTRACTOR

        public static final CanolaExtractor CANOLA_EXTRACTOR
        Trained on the krdwrd Canola corpus, which uses a different definition of "boilerplate". You may give it a try.
      • KEEP_EVERYTHING_EXTRACTOR

        public static final KeepEverythingExtractor KEEP_EVERYTHING_EXTRACTOR
        Dummy extractor; it should return all of the input text. Use it to check whether a problem lies within a particular BoilerpipeExtractor or somewhere else.
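The KEEP_EVERYTHING_EXTRACTOR check described above can be sketched as follows. This is a self-contained illustration, not boilerpipe code: so that it runs without the boilerpipe jar, it defines a hypothetical `TextExtractor` interface, a hypothetical `KEEP_EVERYTHING` stand-in that only strips tags, and a toy `PARAGRAPHS_ONLY` extractor in place of a real article extractor. With boilerpipe on the classpath, the corresponding calls would be `CommonExtractors.KEEP_EVERYTHING_EXTRACTOR.getText(html)` and, for example, `CommonExtractors.ARTICLE_EXTRACTOR.getText(html)`, both of which may throw `BoilerpipeProcessingException`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractorCheck {
    // Hypothetical stand-in for a boilerpipe extractor: HTML in, plain text out.
    interface TextExtractor {
        String getText(String html);
    }

    // Hypothetical stand-in for KEEP_EVERYTHING_EXTRACTOR: drops tags, keeps all text.
    static final TextExtractor KEEP_EVERYTHING = html ->
            html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();

    // Matches <p> elements (with or without attributes) and captures their content.
    static final Pattern PARAGRAPH = Pattern.compile(
            "<p(?:\\s[^>]*)?>(.*?)</p>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Toy "article" extractor (hypothetical): keeps only the text of <p> elements.
    static final TextExtractor PARAGRAPHS_ONLY = html -> {
        Matcher m = PARAGRAPH.matcher(html);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(m.group(1).trim());
        }
        return sb.toString();
    };

    // Locates where the expected text gets lost:
    //   "input"     -> even the keep-everything baseline lacks it (problem is upstream)
    //   "extractor" -> the baseline has it but the chosen extractor dropped it
    //   "ok"        -> the chosen extractor kept it
    static String diagnose(String html, String expected, TextExtractor extractor) {
        if (!KEEP_EVERYTHING.getText(html).contains(expected)) {
            return "input";
        }
        if (!extractor.getText(html).contains(expected)) {
            return "extractor";
        }
        return "ok";
    }

    public static void main(String[] args) {
        String html = "<html><body><div>Menu</div><p>Hello world.</p>"
                + "<span>Footer note</span></body></html>";
        System.out.println(diagnose(html, "Hello world", PARAGRAPHS_ONLY));  // ok
        System.out.println(diagnose(html, "Footer note", PARAGRAPHS_ONLY));  // extractor
        System.out.println(diagnose(html, "Missing text", PARAGRAPHS_ONLY)); // input
    }
}
```

If the baseline already lacks the text, switching between ARTICLE_EXTRACTOR and CANOLA_EXTRACTOR will not help; the problem lies in how the HTML was fetched or parsed.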