| BoilerplateBlockFilter |
Removes TextBlocks which have explicitly been marked as "not content".
|
| InvertedFilter |
Inverts the "isContent" flag for all TextBlocks.
|
| LabelToBoilerplateFilter |
Marks all blocks that contain a given label as "boilerplate".
|
| LabelToContentFilter |
Marks all blocks that contain a given label as "content".
|
| MarkEverythingContentFilter |
Marks all blocks as content.
|
| MinClauseWordsFilter |
Keeps only blocks that have at least one segment fragment ("clause") with at
least k words (default: 5).
|
| MinWordsFilter |
Keeps only those content blocks which contain at least k words.
|
| SplitParagraphBlocksFilter |
Splits TextBlocks at paragraph boundaries.
|
| SurroundingToContentFilter |
|
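
The behaviour described for a few of these filters can be illustrated with a small, self-contained sketch. It does not use the library's classes: `Block` and the four functions below are hypothetical stand-ins written only for illustration, and they assume that a filter "marks" a block by setting a boolean content flag and that MinWordsFilter demotes short content blocks rather than deleting them.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for a TextBlock (not the library's class)."""
    text: str
    is_content: bool = False
    labels: set = field(default_factory=set)


def min_words_filter(blocks, k):
    """Assumed behaviour: content blocks with fewer than k words lose the content flag."""
    for block in blocks:
        if block.is_content and len(block.text.split()) < k:
            block.is_content = False
    return blocks


def label_to_content_filter(blocks, label):
    """Mark every block carrying the given label as content."""
    for block in blocks:
        if label in block.labels:
            block.is_content = True
    return blocks


def inverted_filter(blocks):
    """Flip the content flag of every block."""
    for block in blocks:
        block.is_content = not block.is_content
    return blocks


def boilerplate_block_filter(blocks):
    """Drop all blocks that are not marked as content."""
    return [block for block in blocks if block.is_content]


def sample_blocks():
    """Build a fresh list of example blocks for each demonstration."""
    return [
        Block("Read the full story on our site today", is_content=True),
        Block("Share this", is_content=True),
        Block("Copyright notice", labels={"footer"}),
    ]


# Chain two filters: demote short blocks, then remove everything non-content.
kept = boilerplate_block_filter(min_words_filter(sample_blocks(), 5))
print([block.text for block in kept])
```

In this sketch the filters are plain functions applied in sequence; the order matters, because BoilerplateBlockFilter-style removal is only meaningful after the marking filters have run.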