Class NumWordsRulesClassifier

  • All Implemented Interfaces:
    BoilerpipeFilter

    public class NumWordsRulesClassifier
    extends java.lang.Object
    implements BoilerpipeFilter
    Classifies TextBlocks as content or not content using rules derived with the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010). The rules rely mainly on two features: the number of words per block and the link density per block.
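
    The following is a minimal, self-contained sketch of how a rule-based classifier over these two features might look. It is not the library's implementation: the Block class, the NumWordsRulesSketch class, the use of the previous and next blocks as context, and all three thresholds are illustrative assumptions, not the values learned in the paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NumWordsRulesSketch {

    /** Hypothetical stand-in for a text block, holding only the two features used here. */
    static final class Block {
        final int numWords;
        final double linkDensity; // fraction of the block's words that sit inside links

        Block(int numWords, double linkDensity) {
            this.numWords = numWords;
            this.linkDensity = linkDensity;
        }
    }

    // Illustrative thresholds (assumptions), not the values learned in the paper.
    static final double MAX_LINK_DENSITY = 0.33;
    static final int LONG_BLOCK_WORDS = 16;
    static final int NEIGHBOUR_WORDS = 15;

    /** Neutral placeholder used where a block has no previous or next neighbour. */
    static final Block EMPTY = new Block(0, 0.0);

    /** Hand-written decision rules in the style of a small decision tree. */
    static boolean isContent(Block prev, Block curr, Block next) {
        if (curr.linkDensity > MAX_LINK_DENSITY) {
            return false; // link-heavy blocks are treated as navigation
        }
        if (curr.numWords > LONG_BLOCK_WORDS) {
            return true; // long, link-poor blocks are treated as content
        }
        // Short blocks count as content only when a neighbour is long.
        return prev.numWords > NEIGHBOUR_WORDS || next.numWords > NEIGHBOUR_WORDS;
    }

    /** Classifies every block, using its neighbours as context. */
    static List<Boolean> classify(List<Block> blocks) {
        List<Boolean> result = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            Block prev = i > 0 ? blocks.get(i - 1) : EMPTY;
            Block next = i < blocks.size() - 1 ? blocks.get(i + 1) : EMPTY;
            result.add(isContent(prev, blocks.get(i), next));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block(3, 0.9),   // navigation bar
                new Block(40, 0.05), // article paragraph
                new Block(5, 0.0),   // short caption after the paragraph
                new Block(4, 0.8));  // footer links
        System.out.println(classify(blocks)); // prints [false, true, true, false]
    }
}
```

    In this sketch the short caption is kept only because the block before it is long, which shows why rules of this kind look at neighbouring blocks as well as the current one.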