- getAverage() - Method in class de.citec.scie.pdf.Histogramm
-
Returns the average; this only works if the histogram's element type is a number.
- getBackingMap() - Method in class de.citec.scie.pdf.Histogramm
-
Returns the backing HashMap.
- getBegin() - Method in interface de.citec.scie.pdf.structure.AbstractLineSegment
-
Returns the start index of the word in the text.
- getBegin() - Method in class de.citec.scie.pdf.structure.LineSegment
-
Returns the begin position of the line.
- getEnd() - Method in interface de.citec.scie.pdf.structure.AbstractLineSegment
-
Returns the end index of the word in the text.
- getEnd() - Method in class de.citec.scie.pdf.structure.LineSegment
-
Returns the end position of the line.
- getFontName() - Method in class de.citec.scie.pdf.structure.Text
-
Get the value of fontName
- getFontSize() - Method in class de.citec.scie.pdf.structure.Text
-
Get the value of fontSize
- getMaxElement() - Method in class de.citec.scie.pdf.Histogramm
-
Returns the element that was counted the most.
- getNumber(H) - Method in class de.citec.scie.pdf.Histogramm
-
Returns the current count for a given datapoint/bin.
- getPageNumber() - Method in class de.citec.scie.pdf.structure.Page
-
Get the value of pageNumber
- getRelativeFontSize() - Method in class de.citec.scie.pdf.structure.TextBlock
-
The font size of this TextBlock's content relative to the page-wide
average.
- getRelativeFontSize(TextBlock) - Method in class de.citec.scie.pdf.TextBlockRankEstimator
-
Returns the relative font size of this block in relation to the whole
page.
- getSize() - Method in class de.citec.scie.pdf.PreTextBlock
-
- getText() - Method in class de.citec.scie.pdf.structure.Text
-
Get the value of text
- getVerticalAlignment() - Method in class de.citec.scie.pdf.structure.Text
-
Get the value of verticalAlignment
- getX_end() - Method in class de.citec.scie.pdf.PreTextLine
-
- getX_start() - Method in class de.citec.scie.pdf.PreTextLine
-
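The Histogramm entries above (getNumber, getMaxElement, getAverage, getBackingMap) describe a counting histogram backed by a HashMap. The following is a minimal sketch of that idea, not the library's implementation: the class name `HistogramSketch`, the `add` method, and the choice of a count-weighted mean for getAverage are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a counting histogram; NOT de.citec.scie.pdf.Histogramm. */
final class HistogramSketch<H> {
    private final Map<H, Integer> counts = new HashMap<>();

    /** Increments the count of the given bin by one (assumed method name). */
    void add(H element) {
        counts.merge(element, 1, Integer::sum);
    }

    /** Returns the current count for a bin, or 0 if it was never counted. */
    int getNumber(H element) {
        return counts.getOrDefault(element, 0);
    }

    /** Returns the element that was counted the most, or null if nothing was counted. */
    H getMaxElement() {
        H best = null;
        int bestCount = -1;
        for (Map.Entry<H, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    /** Returns the count-weighted mean of all bins; fails if a bin is not a Number. */
    double getAverage() {
        double sum = 0.0;
        long total = 0;
        for (Map.Entry<H, Integer> entry : counts.entrySet()) {
            if (!(entry.getKey() instanceof Number)) {
                throw new IllegalStateException("average requires numeric elements");
            }
            sum += ((Number) entry.getKey()).doubleValue() * entry.getValue();
            total += entry.getValue();
        }
        if (total == 0) {
            throw new IllegalStateException("histogram is empty");
        }
        return sum / total;
    }

    /** Returns the backing map itself, so changes by the caller affect the histogram. */
    Map<H, Integer> getBackingMap() {
        return counts;
    }
}
```

With font sizes 10, 10, 12 and 14 added, this sketch reports a count of 2 for the bin 10, returns 10 as the most frequent element, and averages to 11.5.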
- importAsDocument(InputStream) - Static method in class de.citec.scie.pdf.PDFStructuredTextExtractor
-
Assumes the given InputStream to contain PDF data and parses it.
- importAsInputStream(InputStream) - Static method in class de.citec.scie.pdf.PDFStructuredTextExtractor
-
Assumes the given InputStream to contain PDF data and parses it.
- importAsString(InputStream) - Static method in class de.citec.scie.pdf.PDFStructuredTextExtractor
-
Assumes the given InputStream to contain PDF data and parses it.
- indexedToString(int) - Method in class de.citec.scie.pdf.structure.Document
-
Does the same as toString but also stores the begin and end index of
each object's text representation in that object's attributes
(retrievable via getBegin and getEnd).
- indexedToString(int) - Method in class de.citec.scie.pdf.structure.Page
-
Does the same as toString but also stores the begin and end index of
each object's text representation in that object's attributes
(retrievable via getBegin and getEnd).
- indexedToString(int) - Method in class de.citec.scie.pdf.structure.Paragraph
-
Does the same as toString but also stores the begin and end index of
each object's text representation in that object's attributes
(retrievable via getBegin and getEnd).
- indexedToString(int) - Method in class de.citec.scie.pdf.structure.Text
-
Does the same as toString but also stores the begin and end index of
each object's text representation in that object's attributes
(retrievable via getBegin and getEnd).
- indexedToString(int) - Method in class de.citec.scie.pdf.structure.TextBlock
-
Does the same as toString but also stores the begin and end index of
each object's text representation in that object's attributes
(retrievable via getBegin and getEnd).
- intersection(AbstractLineSegment, AbstractLineSegment) - Static method in class de.citec.scie.pdf.structure.LineSegment
-
Returns a new line which is the intersection of the two lines.
- isNewParagraph(PreTextLine) - Method in class de.citec.scie.pdf.ParagraphEstimator
-
- isPartOfLine(TextPosition) - Method in class de.citec.scie.pdf.PreTextLine
-
- isValid(AbstractLineSegment) - Static method in class de.citec.scie.pdf.structure.LineSegment
-
Returns true if the given line is valid (its begin is less than or equal to
its end).
- isValid() - Method in class de.citec.scie.pdf.structure.LineSegment
-
Returns true if this line is valid (its begin is less than or equal to its
end).
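The intersection and isValid entries for LineSegment fit together naturally: intersecting two segments that do not overlap produces a segment whose begin lies after its end, which the validity check then rejects. The sketch below illustrates this under assumptions: `SegmentSketch` is a hypothetical class, and integer coordinates are chosen only for simplicity, since the index does not state the coordinate type.

```java
/** Hypothetical one-dimensional segment; NOT de.citec.scie.pdf.structure.LineSegment. */
final class SegmentSketch {
    private final int begin;
    private final int end;

    SegmentSketch(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    int getBegin() {
        return begin;
    }

    int getEnd() {
        return end;
    }

    /** A segment is valid if its begin is less than or equal to its end. */
    boolean isValid() {
        return begin <= end;
    }

    /**
     * Returns the overlap of two segments: the larger begin and the smaller end.
     * If the inputs do not overlap, the result is an invalid segment.
     */
    static SegmentSketch intersection(SegmentSketch a, SegmentSketch b) {
        return new SegmentSketch(
                Math.max(a.getBegin(), b.getBegin()),
                Math.min(a.getEnd(), b.getEnd()));
    }
}
```

For example, intersecting [0, 10] with [5, 20] gives the valid segment [5, 10], while intersecting [0, 3] with [5, 8] gives begin 5 and end 3, which isValid reports as invalid.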
- Text - Class in de.citec.scie.pdf.structure
-
This is a wrapper class for text itself with additional information about the
style of the text.
- Text() - Constructor for class de.citec.scie.pdf.structure.Text
-
- Text.VerticalAlignment - Enum in de.citec.scie.pdf.structure
-
- TextBlock - Class in de.citec.scie.pdf.structure
-
This represents a syntactic block of text, which can be a column on a page, a
header or something similar.
- TextBlock() - Constructor for class de.citec.scie.pdf.structure.TextBlock
-
- TextBlockRankEstimator - Class in de.citec.scie.pdf
-
This estimator determines whether the usual font size of a TextBlock is
larger than, equal to, or smaller than the usual font size of the whole
page.
- TextBlockRankEstimator() - Constructor for class de.citec.scie.pdf.TextBlockRankEstimator
-
- toString() - Method in class de.citec.scie.pdf.structure.Document
-
Converts this object to a string by going recursively through the
underlying page structure and calling their respective toString
methods.
- toString() - Method in class de.citec.scie.pdf.structure.Page
-
Converts this object to a string by going recursively through the
underlying block structure and calling their respective toString
methods.
- toString() - Method in class de.citec.scie.pdf.structure.Paragraph
-
Converts this object to a string by going recursively through the
underlying text objects and calling their respective toString
methods.
- toString() - Method in class de.citec.scie.pdf.structure.Text
-
Returns the text content of this Text object.
- toString() - Method in class de.citec.scie.pdf.structure.TextBlock
-
Converts this object to a string by going recursively through the
underlying paragraph structure and calling their respective toString
methods.
- toXML() - Method in class de.citec.scie.pdf.structure.Document
-
Returns an XML representation of this document by going recursively
through the underlying page structure and calling their respective toXML
methods.
- toXML() - Method in class de.citec.scie.pdf.structure.Page
-
Returns an XML representation of this page by going recursively
through the underlying block structure and calling their respective toXML
methods.
- toXML() - Method in class de.citec.scie.pdf.structure.Paragraph
-
Returns an XML representation of this paragraph by going recursively
through the underlying text objects and calling their respective toXML
methods.
- toXML() - Method in class de.citec.scie.pdf.structure.Text
-
Returns an XML representation of this text object including its
font size, font name and vertical alignment as XML attributes.
- toXML() - Method in class de.citec.scie.pdf.structure.TextBlock
-
Returns an XML representation of this block by going recursively
through the underlying paragraph structure and calling their respective
toXML methods.
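The recursive toString and indexedToString entries in this index share one pattern: a container concatenates the strings of its children, and the indexed variant additionally records where each child's text lands in the result. The sketch below shows that pattern with two hypothetical classes, `IndexedTextSketch` and `IndexedParagraphSketch`. It assumes that the int parameter is the start offset of the object within the overall string and that the end index is exclusive; the index itself does not confirm either point.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical leaf node; NOT de.citec.scie.pdf.structure.Text. */
final class IndexedTextSketch {
    private final String text;
    private int begin = -1;
    private int end = -1;

    IndexedTextSketch(String text) {
        this.text = text;
    }

    /** Returns the text and records its position, assuming it starts at the given offset. */
    String indexedToString(int offset) {
        begin = offset;
        end = offset + text.length(); // exclusive end (assumption)
        return text;
    }

    int getBegin() {
        return begin;
    }

    int getEnd() {
        return end;
    }
}

/** Hypothetical container node; NOT de.citec.scie.pdf.structure.Paragraph. */
final class IndexedParagraphSketch {
    private final List<IndexedTextSketch> texts = new ArrayList<>();
    private int begin = -1;
    private int end = -1;

    void add(IndexedTextSketch text) {
        texts.add(text);
    }

    /** Concatenates all children, passing each one the offset at which its text begins. */
    String indexedToString(int offset) {
        StringBuilder builder = new StringBuilder();
        begin = offset;
        for (IndexedTextSketch text : texts) {
            builder.append(text.indexedToString(offset + builder.length()));
        }
        end = offset + builder.length();
        return builder.toString();
    }

    int getBegin() {
        return begin;
    }

    int getEnd() {
        return end;
    }
}
```

Calling indexedToString(100) on a paragraph holding "Hello " and "world" returns "Hello world" and, under the assumptions above, records the ranges 100 to 106 and 106 to 111 for the two text objects and 100 to 111 for the paragraph.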