public class DocumentBlockCleaner extends Object
| Modifier and Type | Field and Description |
|---|---|
| static double | REMOVETHRESHOLD |
| static int | SMALLBLOCKSIZE |
| Constructor and Description |
|---|
| DocumentBlockCleaner() |
| Modifier and Type | Method and Description |
|---|---|
| void | blockCleanup(Document doc) The cleanup is done using a greedy heuristic as follows: start with the short text blocks on the first page, then iterate over all other pages and try to build, for each of them, a sequence of the most similar TextBlocks. |
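
The greedy heuristic described for blockCleanup can be illustrated with a minimal, self-contained sketch. It is not the library's implementation. The class `GreedyBlockChainSketch`, the `similarity` measure, the use of plain strings in place of `Document` and `TextBlock`, and both constant values are assumptions made for illustration. The sketch also assumes that a chain whose average similarity reaches the threshold is removed from every page, which the description above does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names or values come from the documented API.
class GreedyBlockChainSketch {

    // Assumed values for illustration only; the real constants' values are not given.
    static final int SMALL_BLOCK_SIZE = 40;      // max characters for a "short" block
    static final double REMOVE_THRESHOLD = 0.8;  // min average similarity of a chain

    // Assumed similarity in [0, 1]: share of aligned positions with the same
    // character, treating all digits as equal so changing page numbers still match.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0;
        }
        int shortest = Math.min(a.length(), b.length());
        int same = 0;
        for (int i = 0; i < shortest; i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            if (x == y || (Character.isDigit(x) && Character.isDigit(y))) {
                same++;
            }
        }
        return (double) same / longest;
    }

    // pages.get(p) holds the text blocks of page p; a cleaned copy is returned.
    static List<List<String>> cleanup(List<List<String>> pages) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> page : pages) {
            result.add(new ArrayList<>(page));
        }
        if (pages.size() < 2) {
            return result;
        }
        for (String seed : pages.get(0)) {
            if (seed.length() > SMALL_BLOCK_SIZE) {
                continue; // only short first-page blocks start a chain
            }
            List<String> chain = new ArrayList<>();
            chain.add(seed);
            double total = 0.0;
            boolean complete = true;
            // Greedy step: on each later page take the block most similar to the seed.
            for (int p = 1; p < pages.size(); p++) {
                String best = null;
                double bestScore = -1.0;
                for (String candidate : pages.get(p)) {
                    double score = similarity(seed, candidate);
                    if (score > bestScore) {
                        bestScore = score;
                        best = candidate;
                    }
                }
                if (best == null) { // page without blocks: give up on this chain
                    complete = false;
                    break;
                }
                chain.add(best);
                total += bestScore;
            }
            if (complete && total / (pages.size() - 1) >= REMOVE_THRESHOLD) {
                for (int p = 0; p < result.size(); p++) {
                    result.get(p).remove(chain.get(p)); // drop the repeated block
                }
            }
        }
        return result;
    }
}
```

Under these assumptions, a running header such as "Journal of Tests 1", "Journal of Tests 2", ... is removed from every page, while a short block that appears only on the first page is kept because its chain scores below the threshold.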
public static final int SMALLBLOCKSIZE
public static final double REMOVETHRESHOLD
public void blockCleanup(Document doc)
doc - a document.

Copyright © 2014. All rights reserved.