Package de.l3s.boilerpipe.filters.heuristics
Class Summary

AddPrecedingLabelsFilter
    Adds the labels of the preceding block to the current block, optionally adding a prefix.
ArticleMetadataFilter
BlockProximityFusion
    Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit.
ContentFusion
DocumentTitleMatchClassifier
    Marks TextBlocks which contain parts of the HTML <TITLE> tag, using some heuristics which are quite specific to the news domain.
ExpandTitleToContentFilter
    Marks all TextBlocks "content" which are between the headline and the part that has already been marked content, if they are marked DefaultLabels.MIGHT_BE_CONTENT.
KeepLargestBlockFilter
    Keeps the largest TextBlock only (by the number of words).
LabelFusion
    Fuses adjacent blocks if their labels are equal.
SimpleBlockFusionProcessor
    Merges two subsequent blocks if their text densities are equal.
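To make two of the heuristics above concrete, here is a minimal, self-contained sketch of the ideas behind LabelFusion ("fuse adjacent blocks if their labels are equal") and KeepLargestBlockFilter ("keep the largest block by number of words"). It is not boilerpipe's implementation and does not use the library's classes. The names HeuristicsSketch, Block, fuseEqualLabels and keepLargestBlock are hypothetical, each block carries a single label for simplicity, and the choice to compare only blocks already flagged as content (with ties going to the first block) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of boilerpipe.
final class HeuristicsSketch {

    /** Simplified stand-in for a text block: text, a single label, a content flag. */
    static final class Block {
        final String text;
        final String label;
        boolean isContent;

        Block(String text, String label, boolean isContent) {
            this.text = text;
            this.label = label;
            this.isContent = isContent;
        }

        /** Number of whitespace-separated words in the block. */
        int wordCount() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    /** Fuses each run of adjacent blocks whose labels are equal into one block. */
    static List<Block> fuseEqualLabels(List<Block> blocks) {
        List<Block> fused = new ArrayList<>();
        for (Block b : blocks) {
            Block last = fused.isEmpty() ? null : fused.get(fused.size() - 1);
            if (last != null && last.label.equals(b.label)) {
                // Same label as the previous block: merge text, keep content if either had it.
                fused.set(fused.size() - 1,
                        new Block(last.text + "\n" + b.text, last.label,
                                last.isContent || b.isContent));
            } else {
                // Copy so the input list is left untouched.
                fused.add(new Block(b.text, b.label, b.isContent));
            }
        }
        return fused;
    }

    /** Leaves the content flag set only on the content block with the most words. */
    static void keepLargestBlock(List<Block> blocks) {
        Block largest = null;
        for (Block b : blocks) {
            // Strict '>' means the first block wins a tie (an assumption of this sketch).
            if (b.isContent && (largest == null || b.wordCount() > largest.wordCount())) {
                largest = b;
            }
        }
        for (Block b : blocks) {
            b.isContent = (b == largest);
        }
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block("Home About Contact", "nav", true),
                new Block("First paragraph of the story.", "body", true),
                new Block("Second paragraph with a few more words.", "body", true),
                new Block("Copyright notice", "footer", true));

        List<Block> fused = fuseEqualLabels(blocks);
        keepLargestBlock(fused);
        for (Block b : fused) {
            System.out.println(b.label + " content=" + b.isContent + " words=" + b.wordCount());
        }
    }
}
```

In this sketch the fusion step runs first, so the two "body" blocks are compared as a single block against the navigation and footer blocks. Running the steps in the opposite order would keep only one of the two paragraphs, which is why the order of filters in a pipeline matters.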