Class NumWordsRulesClassifier
java.lang.Object
    de.l3s.boilerpipe.filters.english.NumWordsRulesClassifier
All Implemented Interfaces:
    BoilerpipeFilter
public class NumWordsRulesClassifier extends java.lang.Object implements BoilerpipeFilter
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), particularly using the number of words per block and the link density per block.
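To illustrate the kind of rule this description refers to, here is a minimal, self-contained sketch of a decision rule over word count and link density for a block and its two neighbours. The `Block` type and all three thresholds are hypothetical and chosen only for illustration; they are not boilerpipe's `TextBlock` and not the trained rules from the paper.

```java
// Illustrative sketch only: Block and the thresholds below are hypothetical,
// not boilerpipe's TextBlock or the rules learned in the paper.
final class NumWordsRuleSketch {

    /** Hypothetical stand-in for the shallow features of one text block. */
    static final class Block {
        final int numWords;
        final double linkDensity; // fraction of words inside link text, 0.0 to 1.0

        Block(int numWords, double linkDensity) {
            this.numWords = numWords;
            this.linkDensity = linkDensity;
        }
    }

    // Assumed thresholds, picked for illustration.
    static final double MAX_LINK_DENSITY = 0.33;
    static final int LONG_BLOCK_WORDS = 16;
    static final int SHORT_NEIGHBOUR_WORDS = 4;

    /** Returns true if curr looks like content, given its neighbours. */
    static boolean classify(Block prev, Block curr, Block next) {
        if (curr.linkDensity > MAX_LINK_DENSITY) {
            return false; // link-heavy blocks are treated as navigation
        }
        if (curr.numWords > LONG_BLOCK_WORDS) {
            return true; // long, link-poor blocks are treated as body text
        }
        // Short block: accept it only if a neighbour suggests running text.
        return prev.numWords > SHORT_NEIGHBOUR_WORDS
                || next.numWords > LONG_BLOCK_WORDS;
    }

    public static void main(String[] args) {
        Block nav = new Block(3, 0.9);
        Block para = new Block(40, 0.05);
        Block caption = new Block(6, 0.0);
        System.out.println(classify(nav, para, caption)); // true
        System.out.println(classify(para, nav, para));    // false
        System.out.println(classify(para, caption, nav)); // true
    }
}
```

The nested if-structure mirrors how a learned decision tree is usually written out by hand: each branch tests one feature of the previous, current, or next block.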
Field Summary
Modifier and Type               Field     Description
static NumWordsRulesClassifier  INSTANCE
Constructor Summary
Constructor                Description
NumWordsRulesClassifier()
Method Summary
Modifier and Type               Method                                                    Description
protected boolean               classify(TextBlock prev, TextBlock curr, TextBlock next)
static NumWordsRulesClassifier  getInstance()                                             Returns the singleton instance of NumWordsRulesClassifier.
boolean                         process(TextDocument doc)                                 Processes the given document doc.
Field Detail
INSTANCE
public static final NumWordsRulesClassifier INSTANCE
Method Detail
getInstance
public static NumWordsRulesClassifier getInstance()
Returns the singleton instance of NumWordsRulesClassifier.
process
public boolean process(TextDocument doc) throws BoilerpipeProcessingException
Description copied from interface: BoilerpipeFilter
Processes the given document doc.
Specified by:
    process in interface BoilerpipeFilter
Parameters:
    doc - The TextDocument that is to be processed.
Returns:
    true if changes have been made to the TextDocument.
Throws:
    BoilerpipeProcessingException
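The return-value contract above (true only if the document was changed) can be sketched as follows. This is a self-contained illustration, not the library's code: `SketchBlock`, the `EMPTY` sentinel, and the ten-word placeholder rule are all hypothetical, and the loop that hands each block to `classify` together with its neighbours is an assumption about how such a filter might be driven. With the real library, the corresponding call would be `NumWordsRulesClassifier.getInstance().process(doc)`.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: SketchBlock and EMPTY are hypothetical types,
// not boilerpipe's TextBlock, TextDocument, or BoilerpipeFilter.
final class ProcessContractSketch {

    static final class SketchBlock {
        final int numWords;
        boolean content;

        SketchBlock(int numWords, boolean content) {
            this.numWords = numWords;
            this.content = content;
        }
    }

    // Sentinel passed where a block has no neighbour (an assumption of this sketch).
    static final SketchBlock EMPTY = new SketchBlock(0, false);

    /**
     * Placeholder rule: a block is content if it has more than 10 words.
     * prev and next are unused here but kept to mirror the three-block signature.
     * Returns true if the block's label changed.
     */
    static boolean classify(SketchBlock prev, SketchBlock curr, SketchBlock next) {
        boolean isContent = curr.numWords > 10;
        boolean changed = curr.content != isContent;
        curr.content = isContent;
        return changed;
    }

    /** Classifies every block in order; returns true if any label changed. */
    static boolean process(List<SketchBlock> blocks) {
        boolean changed = false;
        for (int i = 0; i < blocks.size(); i++) {
            SketchBlock prev = i > 0 ? blocks.get(i - 1) : EMPTY;
            SketchBlock next = i + 1 < blocks.size() ? blocks.get(i + 1) : EMPTY;
            changed |= classify(prev, blocks.get(i), next);
        }
        return changed;
    }

    public static void main(String[] args) {
        List<SketchBlock> doc = Arrays.asList(
                new SketchBlock(3, true),
                new SketchBlock(25, false));
        System.out.println(process(doc)); // true: both labels were flipped
        System.out.println(process(doc)); // false: a second pass changes nothing
    }
}
```

A caller can use the boolean result to decide whether later filters in a pipeline need to run again.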