Class DensityRulesClassifier

  • All Implemented Interfaces:
    BoilerpipeFilter

    public class DensityRulesClassifier
    extends java.lang.Object
    implements BoilerpipeFilter
    Classifies TextBlocks as content or not-content using rules based mainly on text density and link density. The rules were determined with the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features".
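
    As a rough illustration of what a density-based rule can look like, the following self-contained sketch classifies each block from its own text and link density and the text density of its neighbours. The names DensityRulesSketch, Block, MAX_LINK_DENSITY and MIN_TEXT_DENSITY are hypothetical, and the thresholds are made up for illustration; they are not the rules or values this class actually applies.

```java
import java.util.ArrayList;
import java.util.List;

public class DensityRulesSketch {

    /** Hypothetical stand-in for a text block, holding only the two shallow features. */
    static final class Block {
        final double textDensity;
        final double linkDensity;

        Block(double textDensity, double linkDensity) {
            this.textDensity = textDensity;
            this.linkDensity = linkDensity;
        }
    }

    // Illustrative thresholds only (assumptions, not the library's learned values).
    static final double MAX_LINK_DENSITY = 0.33;
    static final double MIN_TEXT_DENSITY = 10.0;

    /** Decides whether curr is content, looking at its previous and next neighbours. */
    static boolean isContent(Block prev, Block curr, Block next) {
        if (curr.linkDensity > MAX_LINK_DENSITY) {
            return false; // mostly anchor text: treat as navigation or boilerplate
        }
        if (curr.textDensity >= MIN_TEXT_DENSITY) {
            return true; // dense running text with few links
        }
        // A sparse block is kept only when a neighbour is dense text.
        return prev.textDensity >= MIN_TEXT_DENSITY
                || next.textDensity >= MIN_TEXT_DENSITY;
    }

    /** Applies the rule to every block; missing neighbours count as empty blocks. */
    static List<Boolean> classify(List<Block> blocks) {
        Block empty = new Block(0.0, 0.0);
        List<Boolean> labels = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            Block prev = i > 0 ? blocks.get(i - 1) : empty;
            Block next = i < blocks.size() - 1 ? blocks.get(i + 1) : empty;
            labels.add(isContent(prev, blocks.get(i), next));
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block(2.0, 0.9));   // link-heavy menu line
        blocks.add(new Block(25.0, 0.05)); // article paragraph
        blocks.add(new Block(3.0, 0.0));   // short line next to the paragraph
        System.out.println(classify(blocks));
    }
}
```

    A learned decision tree generally nests more conditions than this, but the shape is the same: a chain of threshold tests over the densities of the previous, current and next block.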