Class IgnoreBlocksAfterContentFilter
java.lang.Object
    de.l3s.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
All Implemented Interfaces:
BoilerpipeFilter
public final class IgnoreBlocksAfterContentFilter extends java.lang.Object implements BoilerpipeFilter
Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT. These marks are ignored unless a minimum number of words in content blocks occur before this mark (default: 60). This can be used in conjunction with an upstream TerminatingBlocksFinder.
See Also:
TerminatingBlocksFinder
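The rule described above can be modelled with a small self-contained sketch. This is not the boilerpipe source: SketchBlock and IgnoreAfterContentSketch are hypothetical stand-ins for TextBlock and this filter, and the endOfTextMark flag stands in for the DefaultLabels.INDICATES_END_OF_TEXT label. Two details are assumptions rather than documented behaviour: only words in content blocks strictly before the mark are counted, and the marked block itself is demoted together with the blocks that follow it.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a text block; not the boilerpipe TextBlock class. */
final class SketchBlock {
    final int numWords;
    final boolean endOfTextMark; // models the INDICATES_END_OF_TEXT label
    boolean content;

    SketchBlock(int numWords, boolean content, boolean endOfTextMark) {
        this.numWords = numWords;
        this.content = content;
        this.endOfTextMark = endOfTextMark;
    }
}

/** Hypothetical model of the filter rule; not the library implementation. */
final class IgnoreAfterContentSketch {

    /** Returns true if at least one block was switched to non-content. */
    static boolean process(List<SketchBlock> blocks, int minNumWords) {
        boolean changed = false;
        boolean cutoffReached = false;
        int contentWords = 0; // words seen so far in content blocks

        for (SketchBlock b : blocks) {
            // A mark only takes effect once enough content words precede it.
            if (!cutoffReached && b.endOfTextMark && contentWords >= minNumWords) {
                cutoffReached = true;
            }
            if (cutoffReached) {
                // Assumption: the marked block is demoted as well.
                if (b.content) {
                    b.content = false;
                    changed = true;
                }
            } else if (b.content) {
                contentWords += b.numWords;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<SketchBlock> blocks = Arrays.asList(
                new SketchBlock(70, true, false),  // article text
                new SketchBlock(5, false, true),   // marked as end of text
                new SketchBlock(30, true, false)); // trailing text after the mark
        boolean changed = process(blocks, 60);
        System.out.println(changed + " " + blocks.get(2).content); // prints "true false"
    }
}
```

With a threshold of 60, a mark that follows only 20 content words would be ignored in this model, so the trailing block would stay content and process would return false.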
-
Field Summary
Fields
Modifier and Type                       Field               Description
static IgnoreBlocksAfterContentFilter   DEFAULT_INSTANCE
static IgnoreBlocksAfterContentFilter   INSTANCE_200
-
Constructor Summary
Constructors
Constructor                                       Description
IgnoreBlocksAfterContentFilter(int minNumWords)
-
Method Summary
Modifier and Type                       Method                                                    Description
static IgnoreBlocksAfterContentFilter   getDefaultInstance()                                      Returns the singleton instance for IgnoreBlocksAfterContentFilter.
protected static int                    getNumFullTextWords(TextBlock tb)
protected static int                    getNumFullTextWords(TextBlock tb, float minTextDensity)
boolean                                 process(TextDocument doc)                                 Processes the given document doc.
-
Field Detail
-
DEFAULT_INSTANCE
public static final IgnoreBlocksAfterContentFilter DEFAULT_INSTANCE
-
INSTANCE_200
public static final IgnoreBlocksAfterContentFilter INSTANCE_200
-
Method Detail
-
getDefaultInstance
public static IgnoreBlocksAfterContentFilter getDefaultInstance()
Returns the singleton instance for IgnoreBlocksAfterContentFilter.
-
process
public boolean process(TextDocument doc) throws BoilerpipeProcessingException
Description copied from interface: BoilerpipeFilter
Processes the given document doc.
Specified by:
process in interface BoilerpipeFilter
Parameters:
doc - The TextDocument that is to be processed.
Returns:
true if changes have been made to the TextDocument.
Throws:
BoilerpipeProcessingException
-
getNumFullTextWords
protected static int getNumFullTextWords(TextBlock tb)
-
getNumFullTextWords
protected static int getNumFullTextWords(TextBlock tb, float minTextDensity)
-