| Class | Description |
|---|---|
| DocumentBlockCleaner | |
| Histogramm<H> | A convenience implementation for histograms. |
| ParagraphEstimator | Estimates whether a line break also indicates a new paragraph. |
| PDFStructuredTextExtractor | Takes a PDF file as input and extracts its text into an HTML-like hierarchical object structure (see the package "structure" for the classes themselves). |
| PreTextBlock | Represents a ThreadBead with some additional information. |
| PreTextLine | Aggregates all TextPosition objects that are part of one line. |
| StringSimilarity | Implements an algorithm that determines the similarity between Strings using an alignment/edit distance approach. |
| TextBlockRankEstimator | Determines whether the usual font size of a TextBlock is larger than, equal to, or smaller than the usual font size of the whole page. |
| VerticalAlignmentEstimator | Determines the vertical alignment of a given glyph in relation to the line it is part of. |
| WhiteSpaceEstimator | Based on the work of Ben Litchfield in the PDFTextStripper of Apache PDFBox. |
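
The StringSimilarity entry mentions an alignment/edit distance approach without giving details. As a rough illustration of that general idea only, the sketch below computes a Levenshtein distance and normalizes it to a score between 0 and 1. The class name `EditDistanceSketch`, both method names, and the normalization by the longer string's length are assumptions made for this example; they are not taken from this package, whose actual algorithm may differ.

```java
/**
 * Hypothetical sketch of an edit-distance-based string similarity.
 * This is not the StringSimilarity class of this package.
 */
final class EditDistanceSketch {

    private EditDistanceSketch() {}

    /**
     * Levenshtein distance: the minimum number of single-character
     * insertions, deletions, and substitutions that turn a into b.
     */
    static int levenshtein(String a, String b) {
        // Only two rows of the dynamic-programming table are kept in memory.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning the first i chars of a into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so that prev always holds the most recently completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Similarity in [0, 1]: 1.0 for identical strings, 0.0 when every
     * position of the longer string has to be edited (assumed normalization).
     */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshtein(a, b) / longest;
    }
}
```

A score of this kind could, for example, be compared against a threshold to decide whether two text lines are near-duplicates; any such threshold would have to be tuned for the documents at hand.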
Copyright © 2014. All rights reserved.