| Package | Description |
|---|---|
| de.citec.scie.pdf | |

| Modifier and Type | Method and Description |
|---|---|
| static Document | PDFStructuredTextExtractor.importAsDocument(InputStream input)<br>Assumes that the given InputStream contains PDF data and parses it. |

| Modifier and Type | Method and Description |
|---|---|
| void | DocumentBlockCleaner.blockCleanup(Document doc)<br>The cleanup is done using a greedy heuristic as follows: start with the short text blocks on the first page, then iterate over all other pages and, for each of these blocks, try to build a sequence of the TextBlocks most similar to it. |

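
The greedy heuristic described for blockCleanup can be sketched as follows. This is a hypothetical, self-contained illustration, not the de.citec.scie.pdf implementation: the class GreedyBlockChains, the page model (each page as a list of block strings instead of TextBlock objects), the normalized Levenshtein similarity, and the thresholds maxLen and minSim are all assumptions. It also assumes that a sequence is kept only when every later page contributes a sufficiently similar block; what DocumentBlockCleaner then does with such sequences (for example, dropping them as repeated headers or footers) is not stated in the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the greedy block-matching heuristic; not the library's code. */
public class GreedyBlockChains {

    /** Normalized Levenshtein similarity in [0, 1]; 1.0 means identical strings. */
    static double similarity(String a, String b) {
        int n = a.length(), m = b.length();
        if (n == 0 && m == 0) return 1.0;
        int[] prev = new int[m + 1];
        int[] cur = new int[m + 1];
        for (int j = 0; j <= m; j++) prev[j] = j;
        for (int i = 1; i <= n; i++) {
            cur[0] = i;
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // insertion, deletion, substitution
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // prev now holds row i
        }
        return 1.0 - (double) prev[m] / Math.max(n, m);
    }

    /**
     * For every short block on the first page, greedily picks the most similar
     * block on each following page. A chain is returned only if every page
     * contributes a block with similarity >= minSim (assumed acceptance rule).
     */
    static List<List<String>> findChains(List<List<String>> pages, int maxLen, double minSim) {
        List<List<String>> chains = new ArrayList<>();
        if (pages.isEmpty()) return chains;
        for (String seed : pages.get(0)) {
            if (seed.length() > maxLen) continue; // only short blocks start a chain
            List<String> chain = new ArrayList<>();
            chain.add(seed);
            boolean complete = true;
            for (int p = 1; p < pages.size() && complete; p++) {
                String best = null;
                double bestSim = -1.0;
                for (String candidate : pages.get(p)) {
                    double s = similarity(seed, candidate); // compare with the starting block
                    if (s > bestSim) { bestSim = s; best = candidate; }
                }
                if (best == null || bestSim < minSim) complete = false;
                else chain.add(best);
            }
            if (complete) chains.add(chain);
        }
        return chains;
    }

    public static void main(String[] args) {
        List<List<String>> pages = List.of(
            List.of("Journal of Examples 2014",
                    "This introduction paragraph is far longer than thirty characters."),
            List.of("Journal of Examples 2014", "The body text on page two is also quite long."),
            List.of("Journal of Examples 2014", "Page three continues with more body text."));
        // prints [[Journal of Examples 2014, Journal of Examples 2014, Journal of Examples 2014]]
        System.out.println(findChains(pages, 30, 0.8));
    }
}
```

Comparing every candidate with the starting block keeps the sketch close to the wording above; comparing with the previous link of the chain instead would tolerate gradual drift between pages, at the cost of possibly wandering away from the original block.
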
Copyright © 2014. All rights reserved.