Package de.l3s.boilerpipe.document
Class TextBlock
- java.lang.Object
  - de.l3s.boilerpipe.document.TextBlock
- All Implemented Interfaces:
java.lang.Cloneable
public class TextBlock extends java.lang.Object implements java.lang.Cloneable
Describes a block of text. A block can be an "atomic" text element (i.e., a sequence of text that is not interrupted by any HTML markup) or a compound of such atomic elements.
-
-
Field Summary
| Modifier and Type | Field | Description |
|---|---|---|
| static TextBlock | EMPTY_END | |
| static TextBlock | EMPTY_START | |
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| void | addLabel(java.lang.String label) | Adds an arbitrary String label to this TextBlock. |
| void | addLabels(java.lang.String... l) | Adds a set of labels to this TextBlock. |
| void | addLabels(java.util.Set<java.lang.String> l) | Adds a set of labels to this TextBlock. |
| protected java.lang.Object | clone() | |
| java.util.BitSet | getContainedTextElements() | Returns the containedTextElements BitSet, or null. |
| java.util.Set<java.lang.String> | getLabels() | Returns the labels associated with this TextBlock, or null if no such labels exist. |
| float | getLinkDensity() | |
| int | getNumWords() | |
| int | getNumWordsInAnchorText() | |
| int | getOffsetBlocksEnd() | |
| int | getOffsetBlocksStart() | |
| int | getTagLevel() | |
| java.lang.String | getText() | |
| float | getTextDensity() | |
| boolean | hasLabel(java.lang.String label) | Checks whether this TextBlock has the given label. |
| boolean | isContent() | |
| void | mergeNext(TextBlock other) | |
| boolean | removeLabel(java.lang.String label) | |
| boolean | setIsContent(boolean isContent) | |
| void | setTagLevel(int tagLevel) | |
| java.lang.String | toString() | |
-
-
-
Method Detail
-
isContent
public boolean isContent()
-
setIsContent
public boolean setIsContent(boolean isContent)
-
getText
public java.lang.String getText()
-
getNumWords
public int getNumWords()
-
getNumWordsInAnchorText
public int getNumWordsInAnchorText()
-
getTextDensity
public float getTextDensity()
-
getLinkDensity
public float getLinkDensity()
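This page gives no description for getLinkDensity. One plausible reading, consistent with the two word counts exposed above, is the share of a block's words that sit inside anchor text. The sketch below assumes that definition; LinkDensitySketch and its linkDensity method are hypothetical names, not part of boilerpipe, and the zero-word guard is likewise an assumption.

```java
// Hypothetical helper, NOT part of boilerpipe.
class LinkDensitySketch {

    // Assumed definition: words inside anchor text divided by all words.
    // A block with no words is given a density of 0 to avoid dividing by zero.
    static float linkDensity(int numWords, int numWordsInAnchorText) {
        if (numWords == 0) {
            return 0f;
        }
        return numWordsInAnchorText / (float) numWords;
    }

    public static void main(String[] args) {
        System.out.println(linkDensity(40, 10)); // prints 0.25
        System.out.println(linkDensity(0, 0));   // prints 0.0
    }
}
```

Under this reading, a value near 1 would mean the block is mostly link text, while a value near 0 would mean it contains few or no linked words.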
-
mergeNext
public void mergeNext(TextBlock other)
-
getOffsetBlocksStart
public int getOffsetBlocksStart()
-
getOffsetBlocksEnd
public int getOffsetBlocksEnd()
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
addLabel
public void addLabel(java.lang.String label)
Adds an arbitrary String label to this TextBlock.
- Parameters:
label - The label
- See Also:
DefaultLabels
-
hasLabel
public boolean hasLabel(java.lang.String label)
Checks whether this TextBlock has the given label.
- Parameters:
label - The label
- Returns:
true if this block is marked by the given label.
-
removeLabel
public boolean removeLabel(java.lang.String label)
-
getLabels
public java.util.Set<java.lang.String> getLabels()
Returns the labels associated with this TextBlock, or null if no such labels exist. NOTE: The returned instance is the one used directly in TextBlock. You have full access to the data structure. However, it is recommended to use the label-specific methods in TextBlock whenever possible.
- Returns:
the set of labels, or null if no labels have been added yet.
-
addLabels
public void addLabels(java.util.Set<java.lang.String> l)
Adds a set of labels to this TextBlock. null references are silently ignored.
- Parameters:
l - The labels to be added.
-
addLabels
public void addLabels(java.lang.String... l)
Adds a set of labels to this TextBlock. null references are silently ignored.
- Parameters:
l - The labels to be added.
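The label contract described in this section (a label set that stays null until the first label is added, a getter that hands back the live set, and addLabels ignoring null references) can be illustrated with a small stand-in class. LabelHolderSketch is hypothetical and is not the boilerpipe implementation; it only mirrors the method names and the behaviour stated on this page. Two points are assumptions: that "null references" means a null argument, and that the boolean returned by removeLabel reports whether the label was present.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in, NOT the boilerpipe class. It mirrors only the
// label-related contract documented on this page.
class LabelHolderSketch {

    // Stays null until the first label is added.
    private Set<String> labels = null;

    void addLabel(String label) {
        if (labels == null) {
            labels = new HashSet<>();
        }
        labels.add(label);
    }

    // Assumption: "null references are ignored" means a null argument is a no-op.
    void addLabels(Set<String> l) {
        if (l == null) {
            return;
        }
        if (labels == null) {
            labels = new HashSet<>(l);
        } else {
            labels.addAll(l);
        }
    }

    void addLabels(String... l) {
        if (l == null) {
            return;
        }
        if (labels == null) {
            labels = new HashSet<>();
        }
        for (String label : l) {
            labels.add(label);
        }
    }

    boolean hasLabel(String label) {
        return labels != null && labels.contains(label);
    }

    // Assumption: the result reports whether the label was present.
    boolean removeLabel(String label) {
        return labels != null && labels.remove(label);
    }

    // Returns the live set (no defensive copy), or null if nothing was added.
    Set<String> getLabels() {
        return labels;
    }

    public static void main(String[] args) {
        LabelHolderSketch block = new LabelHolderSketch();
        System.out.println(block.getLabels());       // prints null: no labels yet
        block.addLabels((Set<String>) null);         // ignored
        block.addLabels("title", "heading");
        System.out.println(block.hasLabel("title")); // prints true
        block.getLabels().remove("title");           // mutates the block's own set
        System.out.println(block.hasLabel("title")); // prints false
    }
}
```

Because getLabels exposes the live set, callers need a null check before iterating, and any change made through the returned set is visible to hasLabel. That is the reason the page recommends the label-specific methods where possible.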
-
getContainedTextElements
public java.util.BitSet getContainedTextElements()
Returns the containedTextElements BitSet, or null.
-
clone
protected java.lang.Object clone()
- Overrides:
clone in class java.lang.Object
-
getTagLevel
public int getTagLevel()
-
setTagLevel
public void setTagLevel(int tagLevel)
-
-