Class TextBlock

  • All Implemented Interfaces:
    java.lang.Cloneable

    public class TextBlock
    extends java.lang.Object
    implements java.lang.Cloneable
    Describes a block of text. A block can be an "atomic" text element (i.e., a sequence of text that is not interrupted by any HTML markup) or a compound of such atomic elements.
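The relationship between atomic and compound blocks can be illustrated with a small stand-alone sketch. `BlockSketch` is a hypothetical stand-in, not the library's `TextBlock`. The merge rule (join the texts with a newline, sum the word counters) and the link-density formula (anchor-text words divided by all words) are assumptions made for illustration; this page does not document either. The only grounded detail is that `mergeNext` returns `void`, which suggests it updates the receiving block in place, so the sketch does the same.

```java
// Hypothetical stand-in for illustration only; NOT the library's TextBlock.
final class BlockSketch {
    String text;
    int numWords;
    int numWordsInAnchorText;

    BlockSketch(String text, int numWords, int numWordsInAnchorText) {
        this.text = text;
        this.numWords = numWords;
        this.numWordsInAnchorText = numWordsInAnchorText;
    }

    // Assumption: merging appends the next block's text on a new line
    // and sums the word counters, turning this block into a compound block.
    void mergeNext(BlockSketch other) {
        text = text + "\n" + other.text;
        numWords += other.numWords;
        numWordsInAnchorText += other.numWordsInAnchorText;
    }

    // Assumption: link density is the share of words that sit inside anchor text.
    float getLinkDensity() {
        return numWords == 0 ? 0f : numWordsInAnchorText / (float) numWords;
    }

    public static void main(String[] args) {
        BlockSketch headline = new BlockSketch("Breaking news", 2, 0); // atomic block
        BlockSketch teaser = new BlockSketch("Read more", 2, 2);       // atomic block, all link text
        headline.mergeNext(teaser);                                    // headline is now a compound block
        System.out.println(headline.numWords);
        System.out.println(headline.getLinkDensity());
    }
}
```

Under these assumptions the merged block holds four words, two of which are anchor text, so a link-heavy teaser raises the link density of the compound block it is merged into.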
    • Field Detail

      • EMPTY_START

        public static final TextBlock EMPTY_START
      • EMPTY_END

        public static final TextBlock EMPTY_END
    • Constructor Detail

      • TextBlock

        public TextBlock(java.lang.String text)
      • TextBlock

        public TextBlock(java.lang.String text,
                         java.util.BitSet containedTextElements,
                         int numWords,
                         int numWordsInAnchorText,
                         int numWordsInWrappedLines,
                         int numWrappedLines,
                         int offsetBlocks)
    • Method Detail

      • isContent

        public boolean isContent()
      • setIsContent

        public boolean setIsContent(boolean isContent)
      • getText

        public java.lang.String getText()
      • getNumWords

        public int getNumWords()
      • getNumWordsInAnchorText

        public int getNumWordsInAnchorText()
      • getTextDensity

        public float getTextDensity()
      • getLinkDensity

        public float getLinkDensity()
      • mergeNext

        public void mergeNext(TextBlock other)
      • getOffsetBlocksStart

        public int getOffsetBlocksStart()
      • getOffsetBlocksEnd

        public int getOffsetBlocksEnd()
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • addLabel

        public void addLabel(java.lang.String label)
        Adds an arbitrary String label to this TextBlock.
        Parameters:
        label - The label
        See Also:
        DefaultLabels
      • hasLabel

        public boolean hasLabel(java.lang.String label)
        Checks whether this TextBlock has the given label.
        Parameters:
        label - The label
        Returns:
        true if this block is marked by the given label.
      • removeLabel

        public boolean removeLabel(java.lang.String label)
      • getLabels

        public java.util.Set<java.lang.String> getLabels()
        Returns the labels associated with this TextBlock, or null if no labels have been added. NOTE: The returned set is the instance that TextBlock uses internally, so any change made to it directly changes this block's labels. Prefer the label-specific methods in TextBlock whenever possible.
        Returns:
        the set of labels, or null if no labels have been added yet.
      • addLabels

        public void addLabels(java.util.Set<java.lang.String> l)
        Adds a set of labels to this TextBlock. null-references are silently ignored.
        Parameters:
        l - The labels to be added.
      • addLabels

        public void addLabels(java.lang.String... l)
        Adds one or more labels to this TextBlock. null-references are silently ignored.
        Parameters:
        l - The labels to be added.
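The label behavior described above (getLabels returning null until a label exists, the returned set being the live internal instance, and addLabels ignoring null references) can be sketched with a small stand-alone class. `LabelSketch` is a hypothetical stand-in, not the library's implementation. The lazily created `HashSet` is an assumption, as is the reading that "null-references" means a null argument rather than null elements. The label strings are made up for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in that mirrors only the documented label behavior.
final class LabelSketch {
    // Assumption: the set is created lazily, so it stays null until the first label is added.
    private Set<String> labels;

    void addLabel(String label) {
        if (labels == null) {
            labels = new HashSet<>();
        }
        labels.add(label);
    }

    boolean hasLabel(String label) {
        return labels != null && labels.contains(label);
    }

    boolean removeLabel(String label) {
        return labels != null && labels.remove(label);
    }

    // Returns the live internal set (or null), not a defensive copy.
    Set<String> getLabels() {
        return labels;
    }

    void addLabels(Set<String> l) {
        if (l == null) {
            return; // a null argument is silently ignored
        }
        if (labels == null) {
            labels = new HashSet<>();
        }
        labels.addAll(l);
    }

    void addLabels(String... l) {
        if (l == null) {
            return; // a null argument is silently ignored
        }
        for (String label : l) {
            addLabel(label);
        }
    }

    public static void main(String[] args) {
        LabelSketch block = new LabelSketch();
        block.addLabels((Set<String>) null);              // ignored; the cast picks the Set overload
        System.out.println(block.getLabels() == null);    // no label has been added yet
        block.addLabels("example-title", "example-nav");  // made-up label names
        System.out.println(block.hasLabel("example-title"));
        block.getLabels().remove("example-nav");          // edits the block's own set directly
        System.out.println(block.hasLabel("example-nav"));
    }
}
```

Because getLabels may return null, callers that only need a membership test are usually better served by hasLabel, which avoids the null check. Note also that a bare `addLabels(null)` call does not compile with both overloads present, since the argument matches either one; a cast is needed to choose.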
      • getContainedTextElements

        public java.util.BitSet getContainedTextElements()
        Returns the containedTextElements BitSet, or null.
        Returns:
        the containedTextElements BitSet, or null.
      • clone

        protected java.lang.Object clone()
        Overrides:
        clone in class java.lang.Object
      • getTagLevel

        public int getTagLevel()
      • setTagLevel

        public void setTagLevel(int tagLevel)