public abstract class OcrPassageFormatter
extends org.apache.lucene.search.uhighlight.PassageFormatter
Formats passages into OcrSnippet instances.

| Modifier and Type | Field and Description |
|---|---|
| protected boolean | absoluteHighlights |
| protected String | endHlTag |
| protected String | startHlTag |

| Modifier | Constructor and Description |
|---|---|
| protected | OcrPassageFormatter(String startHlTag, String endHlTag, boolean absoluteHighlights) |

| Modifier and Type | Method and Description |
|---|---|
| protected void | addHighlightsToSnippet(List<List<OcrBox>> hlBoxes, OcrSnippet snippet) |
| abstract String | determineStartPage(String ocrFragment, int startOffset, IterableCharSequence content) - Determine the id of the page an OCR fragment resides on. |
| OcrSnippet[] | format(org.apache.lucene.search.uhighlight.Passage[] passages, IterableCharSequence content) - Format the passages that point to subsequences of the document text into OcrSnippet instances. |
| Object | format(org.apache.lucene.search.uhighlight.Passage[] passages, String content) - Convenience implementation to format document text that is available as a String. |
| protected String | getTextFromXml(String xmlFragment) - Helper method to get plaintext from XML/HTML-like fragments. |
| protected List<OcrBox> | mergeBoxes(List<OcrBox> boxes) - Merge adjacent OCR boxes into a single one, taking line breaks into account. |
| protected OcrSnippet | parseFragment(String ocrFragment, String pageId) - Parse an OcrSnippet from an OCR fragment. |
| protected abstract List<OcrBox> | parseWords(String ocrFragment, String startPage) - Parse word boxes from an OCR fragment. |

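The String overload of format is listed above as a convenience implementation. As a rough illustration of that kind of delegation, the sketch below uses hypothetical stand-ins: `SketchFormatter`, `SubstringFormatter`, and passages modelled as `{start, end}` offset pairs are all assumptions made for this example, and a plain `CharSequence` takes the place of IterableCharSequence. It is not the library's code.

```java
// Hypothetical stand-in for a formatter with two format overloads.
abstract class SketchFormatter {
    // Main overload: works on any CharSequence (stands in for IterableCharSequence).
    abstract String[] format(int[][] passages, CharSequence content);

    // Convenience overload: accept a String and delegate to the main overload.
    Object format(int[][] passages, String content) {
        // A String already implements CharSequence, so no real wrapper is needed here;
        // the typed local variable forces the call below to pick the other overload.
        CharSequence wrapped = content;
        return format(passages, wrapped);
    }
}

// Hypothetical subclass: each passage is a {start, end} pair of character offsets.
class SubstringFormatter extends SketchFormatter {
    @Override
    String[] format(int[][] passages, CharSequence content) {
        String[] snippets = new String[passages.length];
        for (int i = 0; i < passages.length; i++) {
            snippets[i] = content.subSequence(passages[i][0], passages[i][1]).toString();
        }
        return snippets;
    }
}
```

Callers that only hold a String can use the convenience overload, while all formatting logic stays in one place in the main overload.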
protected final String startHlTag
protected final String endHlTag
protected final boolean absoluteHighlights
public OcrSnippet[] format(org.apache.lucene.search.uhighlight.Passage[] passages, IterableCharSequence content)
Format the passages that point to subsequences of the document text into OcrSnippet instances.
Parameters:
passages - in the document text that contain highlighted text
content - of the OCR field, implemented as an IterableCharSequence

protected String getTextFromXml(String xmlFragment)
Helper method to get plaintext from XML/HTML-like fragments.
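This page does not show how getTextFromXml is implemented. One simple way to get plaintext from an XML/HTML-like fragment is to drop everything between angle brackets and decode the five predefined XML entities. The sketch below does that under those assumptions; the class name `PlainTextSketch` is hypothetical, and tags cut off at the fragment boundary are not handled.

```java
class PlainTextSketch {
    // Strip complete tags, then decode the five predefined XML entities.
    static String getTextFromXml(String xmlFragment) {
        String text = xmlFragment.replaceAll("<[^>]*>", "");
        return text
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&apos;", "'")
            // Decode &amp; last so that "&amp;lt;" becomes "&lt;" and not "<".
            .replace("&amp;", "&");
    }
}
```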

public abstract String determineStartPage(String ocrFragment, int startOffset, IterableCharSequence content)
Determine the id of the page an OCR fragment resides on.
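Because determineStartPage is abstract, each OCR format has to supply its own lookup. As a loose illustration, the sketch below scans the content before the fragment's start offset and returns the id of the last page marker it finds. The `<page id="...">` marker, the class name `StartPageSketch`, and the use of a plain `CharSequence` instead of IterableCharSequence are all assumptions for this example; real OCR formats use their own element and attribute names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartPageSketch {
    // Hypothetical page marker, chosen only for this example.
    private static final Pattern PAGE = Pattern.compile("<page id=\"([^\"]+)\"");

    // Return the id of the last complete page marker found before startOffset,
    // or null if there is none. ocrFragment is unused in this simplified version.
    static String determineStartPage(String ocrFragment, int startOffset, CharSequence content) {
        int end = Math.max(0, Math.min(startOffset, content.length()));
        Matcher m = PAGE.matcher(content).region(0, end);
        String pageId = null;
        while (m.find()) {
            pageId = m.group(1); // keep overwriting so the last match wins
        }
        return pageId;
    }
}
```

Scanning from the beginning of the content is linear in the offset; an implementation for large documents would more likely search backwards from startOffset.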

protected OcrSnippet parseFragment(String ocrFragment, String pageId)
Parse an OcrSnippet from an OCR fragment.

protected abstract List<OcrBox> parseWords(String ocrFragment, String startPage)
Parse word boxes from an OCR fragment.

protected void addHighlightsToSnippet(List<List<OcrBox>> hlBoxes, OcrSnippet snippet)

protected List<OcrBox> mergeBoxes(List<OcrBox> boxes)
Merge adjacent OCR boxes into a single one, taking line breaks into account.
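The exact merging rules of mergeBoxes are not documented on this page. The sketch below shows one plausible reading: consecutive boxes in reading order are combined into their bounding box, and a new box is started whenever the next box begins at or below the bottom edge of the current one, which is treated here as a line break. The `Box` type with corner coordinates, the class name `MergeBoxesSketch`, and the line-break heuristic are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class MergeBoxesSketch {
    // Hypothetical box type: upper-left (ulx, uly) and lower-right (lrx, lry) corners.
    static final class Box {
        final int ulx, uly, lrx, lry;

        Box(int ulx, int uly, int lrx, int lry) {
            this.ulx = ulx;
            this.uly = uly;
            this.lrx = lrx;
            this.lry = lry;
        }
    }

    // Merge consecutive boxes in reading order; start a new box at each line break.
    static List<Box> mergeBoxes(List<Box> boxes) {
        List<Box> merged = new ArrayList<>();
        Box current = null;
        for (Box next : boxes) {
            if (current == null) {
                current = next;
            } else if (next.uly >= current.lry) {
                // Assumed line break: the next box starts at or below the current bottom edge.
                merged.add(current);
                current = next;
            } else {
                // Same line: grow the current box to the bounding box of both.
                current = new Box(
                    Math.min(current.ulx, next.ulx), Math.min(current.uly, next.uly),
                    Math.max(current.lrx, next.lrx), Math.max(current.lry, next.lry));
            }
        }
        if (current != null) {
            merged.add(current);
        }
        return merged;
    }
}
```

With this heuristic, a highlight that spans two lines yields two boxes, one per line, rather than a single box covering unrelated text between them.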

public Object format(org.apache.lucene.search.uhighlight.Passage[] passages, String content)
Convenience implementation to format document text that is available as a String.
Wraps the String in an IterableCharSequence implementation and calls format(Passage[], IterableCharSequence).
Specified by:
format in class org.apache.lucene.search.uhighlight.PassageFormatter
Parameters:
passages - in the document text that contain highlighted text
content - of the OCR field, implemented as an IterableCharSequence

Copyright © 2019. All rights reserved.