
A

ABSOLUTE_HIGHLIGHTS - Static variable in interface de.digitalcollections.solrocr.solr.OcrHighlightParams
 
absoluteHighlights - Variable in class de.digitalcollections.solrocr.formats.OcrPassageFormatter
 
addHighlightRegion(List<OcrBox>) - Method in class de.digitalcollections.solrocr.formats.OcrSnippet
Add a new highlighted region in the snippet.
addHighlightsToSnippet(List<List<OcrBox>>, OcrSnippet) - Method in class de.digitalcollections.solrocr.formats.mini.MiniOcrPassageFormatter
 
addHighlightsToSnippet(List<List<OcrBox>>, OcrSnippet) - Method in class de.digitalcollections.solrocr.formats.OcrPassageFormatter
 
addSnippetCountForField(String, int) - Method in class de.digitalcollections.solrocr.util.OcrHighlightResult
 
addSnippetsForField(String, OcrSnippet[]) - Method in class de.digitalcollections.solrocr.util.OcrHighlightResult
 
AltoByteOffsetsParser - Class in de.digitalcollections.solrocr.formats.alto
 
AltoByteOffsetsParser() - Constructor for class de.digitalcollections.solrocr.formats.alto.AltoByteOffsetsParser
 
AltoCharFilterFactory - Class in de.digitalcollections.solrocr.formats.alto
CharFilter to convert ALTO to plaintext while keeping track of the token offsets.
AltoCharFilterFactory(Map<String, String>) - Constructor for class de.digitalcollections.solrocr.formats.alto.AltoCharFilterFactory
 
AltoFormat - Class in de.digitalcollections.solrocr.formats.alto
 
AltoFormat() - Constructor for class de.digitalcollections.solrocr.formats.alto.AltoFormat
 
AltoPassageFormatter - Class in de.digitalcollections.solrocr.formats.alto
 
AltoPassageFormatter(String, String, boolean) - Constructor for class de.digitalcollections.solrocr.formats.alto.AltoPassageFormatter
 

B

byteOffset() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum
 
byteOffset() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.MultiByteOffsetsEnum
 
byteOffset() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.OfPostings
 
ByteOffsetEncoder - Class in de.digitalcollections.solrocr.lucene.byteoffset
 
ByteOffsetEncoder() - Constructor for class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetEncoder
 
ByteOffsetPhraseHelper - Class in de.digitalcollections.solrocr.lucene.byteoffset
Customization of PhraseHelper to add support for byte offsets from payloads. About 80% of this code is copied straight from the original class.
ByteOffsetPhraseHelper(Query, String, Predicate<String>, Function<SpanQuery, Boolean>, Function<Query, Collection<Query>>, boolean) - Constructor for class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetPhraseHelper
 
ByteOffsetsEnum - Class in de.digitalcollections.solrocr.lucene.byteoffset
Customization of OffsetsEnum to load the byte offset from payloads.
ByteOffsetsEnum() - Constructor for class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum
 
ByteOffsetsEnum.MultiByteOffsetsEnum - Class in de.digitalcollections.solrocr.lucene.byteoffset
 
ByteOffsetsEnum.OfPostings - Class in de.digitalcollections.solrocr.lucene.byteoffset
 

C

charAt(int) - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
Get character at the given byte offset.
charAt(int) - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
charAt(int) - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
chars() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
clone() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
clone() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
clone() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
close() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum
 
close() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.MultiByteOffsetsEnum
 
codePoints() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
compareTo(ByteOffsetsEnum) - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum
 
compareTo(OcrBox) - Method in class de.digitalcollections.solrocr.util.OcrBox
 
components - Variable in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy
 
CONTEXT_BLOCK - Static variable in interface de.digitalcollections.solrocr.solr.OcrHighlightParams
 
CONTEXT_SIZE - Static variable in interface de.digitalcollections.solrocr.solr.OcrHighlightParams
 
ContextBreakIterator - Class in de.digitalcollections.solrocr.util
A meta break iterator that wraps other BreakIterators and aggregates their breaks to form larger contexts.
ContextBreakIterator(BreakIterator, BreakIterator, int) - Constructor for class de.digitalcollections.solrocr.util.ContextBreakIterator
Wrap another BreakIterator and configure the output context size
create(Reader) - Method in class de.digitalcollections.solrocr.formats.alto.AltoCharFilterFactory
 
create(Reader) - Method in class de.digitalcollections.solrocr.formats.hocr.HocrCharFilterFactory
 
create(TokenStream) - Method in class de.digitalcollections.solrocr.lucene.NonAlphaTrimFilterFactory
 
createByteOffsetsEnumFromReader(LeafReader, int) - Method in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy
 
createByteOffsetsEnumsForAutomata(Terms, int, List<ByteOffsetsEnum>) - Method in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy
 
createByteOffsetsEnumsForSpans(LeafReader, int, List<ByteOffsetsEnum>) - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetPhraseHelper
 
createByteOffsetsEnumsForTerms(BytesRef[], Terms, int, List<ByteOffsetsEnum>) - Method in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy
 
current() - Method in class de.digitalcollections.solrocr.formats.hocr.HocrClassBreakIterator
 
current() - Method in class de.digitalcollections.solrocr.util.ContextBreakIterator
 
current() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
current() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
current() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
current() - Method in class de.digitalcollections.solrocr.util.TagBreakIterator
 

D

de.digitalcollections.solrocr.formats - package de.digitalcollections.solrocr.formats
 
de.digitalcollections.solrocr.formats.alto - package de.digitalcollections.solrocr.formats.alto
 
de.digitalcollections.solrocr.formats.hocr - package de.digitalcollections.solrocr.formats.hocr
 
de.digitalcollections.solrocr.formats.mini - package de.digitalcollections.solrocr.formats.mini
 
de.digitalcollections.solrocr.lucene - package de.digitalcollections.solrocr.lucene
 
de.digitalcollections.solrocr.lucene.byteoffset - package de.digitalcollections.solrocr.lucene.byteoffset
 
de.digitalcollections.solrocr.lucene.fieldloader - package de.digitalcollections.solrocr.lucene.fieldloader
 
de.digitalcollections.solrocr.lucene.vendor - package de.digitalcollections.solrocr.lucene.vendor
 
de.digitalcollections.solrocr.solr - package de.digitalcollections.solrocr.solr
 
de.digitalcollections.solrocr.util - package de.digitalcollections.solrocr.util
 
decode(BytesRef) - Static method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetEncoder
 
determineStartPage(String, int, IterableCharSequence) - Method in class de.digitalcollections.solrocr.formats.alto.AltoPassageFormatter
 
determineStartPage(String, int, IterableCharSequence) - Method in class de.digitalcollections.solrocr.formats.hocr.HocrPassageFormatter
 
determineStartPage(String, int, IterableCharSequence) - Method in class de.digitalcollections.solrocr.formats.mini.MiniOcrPassageFormatter
 
determineStartPage(String, int, IterableCharSequence) - Method in class de.digitalcollections.solrocr.formats.OcrPassageFormatter
Determine the id of the page an OCR fragment resides on.
doHighlighting(DocList, Query, SolrQueryRequest, String[]) - Method in class de.digitalcollections.solrocr.solr.SolrOcrHighlighter
 

E

EMPTY - Static variable in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum
 
encode(int) - Static method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetEncoder
 
encode(char[], int, int) - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetEncoder
 
encodedLength(CharSequence) - Static method in class de.digitalcollections.solrocr.util.Utf8
Returns the number of bytes in the UTF-8-encoded form of sequence.
endHlTag - Variable in class de.digitalcollections.solrocr.formats.OcrPassageFormatter
 
ExternalFieldLoader - Interface in de.digitalcollections.solrocr.lucene.fieldloader
Allows loading field values from arbitrary sources outside of Solr/Lucene
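
The `encodedLength(CharSequence)` entry above describes counting the bytes of the UTF-8 form of a sequence. As a rough illustration of what such a count involves, here is a minimal sketch in plain Java. It is not the code of `de.digitalcollections.solrocr.util.Utf8`; the class name `Utf8LengthSketch` is hypothetical, and rejecting unpaired surrogates is an assumption made for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the library.
final class Utf8LengthSketch {

  /** Counts the bytes the sequence would occupy in UTF-8 without building a byte array. */
  static int encodedLength(CharSequence seq) {
    int bytes = 0;
    for (int i = 0; i < seq.length(); i++) {
      char c = seq.charAt(i);
      if (c < 0x80) {
        bytes += 1; // ASCII
      } else if (c < 0x800) {
        bytes += 2; // e.g. Latin letters with diacritics
      } else if (!Character.isSurrogate(c)) {
        bytes += 3; // remaining characters of the Basic Multilingual Plane
      } else if (Character.isHighSurrogate(c)
          && i + 1 < seq.length()
          && Character.isLowSurrogate(seq.charAt(i + 1))) {
        bytes += 4; // a surrogate pair encodes one supplementary code point
        i++;        // skip the low surrogate that was just consumed
      } else {
        // Assumption for this sketch: unpaired surrogates are rejected.
        throw new IllegalArgumentException("Unpaired surrogate at index " + i);
      }
    }
    return bytes;
  }

  public static void main(String[] args) {
    String sample = "Gr\u00f6\u00dfe \u20ac";
    // Both lines should print the same number.
    System.out.println(encodedLength(sample));
    System.out.println(sample.getBytes(StandardCharsets.UTF_8).length);
  }
}
```

Counting this way avoids allocating the encoded byte array, which matters when only the length is needed, for example to advance a byte offset.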

F

FieldByteOffsetStrategy - Class in de.digitalcollections.solrocr.lucene.byteoffset
Customization of FieldOffsetStrategy to load byte offsets from payloads.
FieldByteOffsetStrategy(OcrHComponents) - Constructor for class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy
 
fieldByteOffsetStrategy - Variable in class de.digitalcollections.solrocr.lucene.OcrFieldHighlighter
 
FieldByteOffsetStrategy.PostingsByteOffsetStrategy - Class in de.digitalcollections.solrocr.lucene.byteoffset
 
FieldByteOffsetStrategy.PostingsWithTermVectorsByteOffsetStrategy - Class in de.digitalcollections.solrocr.lucene.byteoffset
 
FieldByteOffsetStrategy.TermVectorByteOffsetStrategy - Class in de.digitalcollections.solrocr.lucene.byteoffset
 
FileBytesCharIterator - Class in de.digitalcollections.solrocr.util
ATTENTION: This breaks the semantics of CharacterIterator and CharSequence since all indices are byte offsets into the underlying file, not character indices.
FileBytesCharIterator(Path) - Constructor for class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
FileBytesCharIterator(Path, Charset) - Constructor for class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
FileBytesCharIterator(FileBytesCharIterator) - Constructor for class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
first() - Method in class de.digitalcollections.solrocr.formats.hocr.HocrClassBreakIterator
 
first() - Method in class de.digitalcollections.solrocr.util.ContextBreakIterator
 
first() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
first() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
first() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
first() - Method in class de.digitalcollections.solrocr.util.TagBreakIterator
 
following(int) - Method in class de.digitalcollections.solrocr.formats.hocr.HocrClassBreakIterator
 
following(int) - Method in class de.digitalcollections.solrocr.util.ContextBreakIterator
 
following(int) - Method in class de.digitalcollections.solrocr.util.TagBreakIterator
 
format(Passage[], String) - Method in class de.digitalcollections.solrocr.formats.mini.MiniOcrPassageFormatter
 
format(Passage[], IterableCharSequence) - Method in class de.digitalcollections.solrocr.formats.OcrPassageFormatter
Format the passages that point to subsequences of the document text into OcrSnippet instances
format(Passage[], String) - Method in class de.digitalcollections.solrocr.formats.OcrPassageFormatter
Convenience implementation to format document text that is available as a String.
freq() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum
 
freq() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.MultiByteOffsetsEnum
 
freq() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.OfPostings
 
fromString(String) - Static method in interface de.digitalcollections.solrocr.util.IterableCharSequence
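
The `FileBytesCharIterator` entry above warns that its indices are byte offsets into the underlying file rather than character indices. The following sketch shows, in plain Java, why the two differ as soon as the text contains non-ASCII characters. It does not use or reproduce `FileBytesCharIterator`; the class `ByteOffsetSketch` and its helper methods are hypothetical, and UTF-8 is assumed as the encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the library.
final class ByteOffsetSketch {

  /** Returns the UTF-8 byte offset that corresponds to a character index in the text. */
  static int byteOffsetOf(String text, int charIndex) {
    return text.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
  }

  /** Decodes the bytes in [byteStart, byteEnd) of the UTF-8 form of the text. */
  static String sliceByBytes(String text, int byteStart, int byteEnd) {
    byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
    return new String(bytes, byteStart, byteEnd - byteStart, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String text = "Gr\u00f6\u00dfe 12";
    int charIndex = text.indexOf("12");
    int byteOffset = byteOffsetOf(text, charIndex);
    // The two indices differ because two characters before "12" need two bytes each.
    System.out.println("char index " + charIndex + ", byte offset " + byteOffset);
    System.out.println(sliceByBytes(text, byteOffset, byteOffset + 2));
  }
}
```

Code that passes such an object to APIs expecting ordinary `CharSequence` semantics would therefore read the wrong positions, which is presumably what the warning is about.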
 

G

getBeginIndex() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
getBeginIndex() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
getBeginIndex() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
getBreakIterator(OcrBlock, OcrBlock, int) - Method in class de.digitalcollections.solrocr.formats.alto.AltoFormat
 
getBreakIterator(OcrBlock, OcrBlock, int) - Method in class de.digitalcollections.solrocr.formats.hocr.HocrFormat
 
getBreakIterator(OcrBlock, OcrBlock, int) - Method in class de.digitalcollections.solrocr.formats.mini.MiniOcrFormat
 
getBreakIterator(OcrBlock, OcrBlock, int) - Method in interface de.digitalcollections.solrocr.formats.OcrFormat
Get a BreakIterator that splits the content according to the break parameters
getByteOffsetPhraseHelper() - Method in class de.digitalcollections.solrocr.lucene.OcrHComponents
 
getByteOffsetPhraseHelper(String, Query, Set<UnifiedHighlighter.HighlightFlag>) - Method in class de.digitalcollections.solrocr.lucene.OcrHighlighter
 
getByteOffsetsEnum(LeafReader, int) - Method in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy
 
getByteOffsetsEnum(LeafReader, int) - Method in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy.PostingsByteOffsetStrategy
 
getByteOffsetsEnum(LeafReader, int) - Method in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy.PostingsWithTermVectorsByteOffsetStrategy
 
getByteOffsetsEnum(LeafReader, int) - Method in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy.TermVectorByteOffsetStrategy
 
getByteOffsetsEnum(LeafReader, int) - Method in class de.digitalcollections.solrocr.lucene.byteoffset.NoOpByteOffsetStrategy
 
getByteOffsetStrategy(UnifiedHighlighter.OffsetSource, OcrHComponents) - Method in class de.digitalcollections.solrocr.lucene.OcrHighlighter
 
getCharset() - Method in interface de.digitalcollections.solrocr.lucene.fieldloader.ExternalFieldLoader
Get the charset that field values will be encoded in
getCharset() - Method in class de.digitalcollections.solrocr.lucene.fieldloader.PathFieldLoader
 
getCharset() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
getCharset() - Method in interface de.digitalcollections.solrocr.util.IterableCharSequence
 
getCharset() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
getCharset() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
getCoreCacheHelper() - Method in class de.digitalcollections.solrocr.lucene.vendor.TermVectorFilteredLeafReader
 
getDescription() - Method in class de.digitalcollections.solrocr.solr.HighlightComponent
 
getEndIndex() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
getEndIndex() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
getEndIndex() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
getField() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy
 
getFieldSnippets(String) - Method in class de.digitalcollections.solrocr.util.OcrHighlightResult
 
getFlags(String) - Method in class de.digitalcollections.solrocr.lucene.OcrHighlighter
 
getHighlightRegions() - Method in class de.digitalcollections.solrocr.formats.OcrSnippet
Get the highlighted regions within the snippet region.
getIdentifier() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
getIdentifier() - Method in interface de.digitalcollections.solrocr.util.IterableCharSequence
 
getIdentifier() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
getIdentifier() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
getIndex() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
getIndex() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
getIndex() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
getLrx() - Method in class de.digitalcollections.solrocr.util.OcrBox
 
getLry() - Method in class de.digitalcollections.solrocr.util.OcrBox
 
getNumMatches(int) - Method in class de.digitalcollections.solrocr.lucene.OcrFieldHighlighter
 
getOffsetSource() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy
 
getOffsetSource() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy.PostingsByteOffsetStrategy
 
getOffsetSource() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy.PostingsWithTermVectorsByteOffsetStrategy
 
getOffsetSource() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy.TermVectorByteOffsetStrategy
 
getOffsetSource() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.NoOpByteOffsetStrategy
 
getOffsetType() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
getOffsetType() - Method in interface de.digitalcollections.solrocr.util.IterableCharSequence
 
getOffsetType() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
getOffsetType() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
getPageId() - Method in class de.digitalcollections.solrocr.util.OcrBox
 
getPassageFormatter(String, String, boolean) - Method in class de.digitalcollections.solrocr.formats.alto.AltoFormat
 
getPassageFormatter(String, String, boolean) - Method in class de.digitalcollections.solrocr.formats.hocr.HocrFormat
 
getPassageFormatter(String, String, boolean) - Method in class de.digitalcollections.solrocr.formats.mini.MiniOcrFormat
 
getPassageFormatter(String, String, boolean) - Method in interface de.digitalcollections.solrocr.formats.OcrFormat
Get a PassageFormatter that builds OCR snippets from passages
getPostingsEnum() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.OfPostings
 
getReaderCacheHelper() - Method in class de.digitalcollections.solrocr.lucene.vendor.TermVectorFilteredLeafReader
 
getRequiredFields() - Method in interface de.digitalcollections.solrocr.lucene.fieldloader.ExternalFieldLoader
Get the names of the fields that are required for ExternalFieldLoader.loadField(Map, String)
getRequiredFields() - Method in class de.digitalcollections.solrocr.lucene.fieldloader.PathFieldLoader
 
getScore() - Method in class de.digitalcollections.solrocr.formats.OcrSnippet
Get the score of the passage, compared to all other passages in the document
getScorer(String) - Method in class de.digitalcollections.solrocr.lucene.OcrHighlighter
 
getSnippetCount(String) - Method in class de.digitalcollections.solrocr.util.OcrHighlightResult
 
getSnippetRegions() - Method in class de.digitalcollections.solrocr.formats.OcrSnippet
Get the region of the page that the snippet is located in
getSummaryPassagesNoHighlight(int) - Method in class de.digitalcollections.solrocr.lucene.OcrFieldHighlighter
We don't provide summaries if there is no highlighting, i.e.
getTerm() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum
 
getTerm() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.MultiByteOffsetsEnum
 
getTerm() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.OfPostings
 
getText() - Method in class de.digitalcollections.solrocr.formats.hocr.HocrClassBreakIterator
 
getText() - Method in class de.digitalcollections.solrocr.formats.OcrSnippet
Get the plaintext version of the highlighted page text with highlighting tags
getText() - Method in class de.digitalcollections.solrocr.util.ContextBreakIterator
 
getText() - Method in class de.digitalcollections.solrocr.util.OcrBox
 
getText() - Method in class de.digitalcollections.solrocr.util.TagBreakIterator
 
getTextFromXml(String) - Method in class de.digitalcollections.solrocr.formats.alto.AltoPassageFormatter
 
getTextFromXml(String) - Method in class de.digitalcollections.solrocr.formats.OcrPassageFormatter
Helper method to get plaintext from XML/HTML-like fragments
getUlx() - Method in class de.digitalcollections.solrocr.util.OcrBox
 
getUly() - Method in class de.digitalcollections.solrocr.util.OcrBox
 

H

highlightByteOffsetsEnums(ByteOffsetsEnum, int, String) - Method in class de.digitalcollections.solrocr.lucene.OcrFieldHighlighter
Highlight passages from the document using the byte offsets in the payloads of the matching terms.
HighlightComponent - Class in de.digitalcollections.solrocr.solr
 
HighlightComponent() - Constructor for class de.digitalcollections.solrocr.solr.HighlightComponent
 
highlightFieldForDoc(LeafReader, int, IterableCharSequence, String) - Method in class de.digitalcollections.solrocr.lucene.OcrFieldHighlighter
The primary method -- highlight this doc, assuming a specific field and given this content.
highlightingResponseField() - Method in class de.digitalcollections.solrocr.solr.HighlightComponent
 
highlightOcrFields(String[], Query, int[], int[], BreakIterator, OcrPassageFormatter, String) - Method in class de.digitalcollections.solrocr.lucene.OcrHighlighter
 
highlightOffsetsEnums(OffsetsEnum) - Method in class de.digitalcollections.solrocr.lucene.OcrFieldHighlighter
 
highlightOffsetsEnums(OffsetsEnum, int, String) - Method in class de.digitalcollections.solrocr.lucene.OcrFieldHighlighter
 
HocrByteOffsetsParser - Class in de.digitalcollections.solrocr.formats.hocr
 
HocrByteOffsetsParser() - Constructor for class de.digitalcollections.solrocr.formats.hocr.HocrByteOffsetsParser
 
HocrCharFilterFactory - Class in de.digitalcollections.solrocr.formats.hocr
CharFilter to convert hOCR to plaintext while resolving hyphenation.
HocrCharFilterFactory(Map<String, String>) - Constructor for class de.digitalcollections.solrocr.formats.hocr.HocrCharFilterFactory
 
HocrClassBreakIterator - Class in de.digitalcollections.solrocr.formats.hocr
 
HocrClassBreakIterator(String) - Constructor for class de.digitalcollections.solrocr.formats.hocr.HocrClassBreakIterator
 
HocrClassBreakIterator(Set<String>) - Constructor for class de.digitalcollections.solrocr.formats.hocr.HocrClassBreakIterator
 
HocrFormat - Class in de.digitalcollections.solrocr.formats.hocr
 
HocrFormat() - Constructor for class de.digitalcollections.solrocr.formats.hocr.HocrFormat
 
HocrPassageFormatter - Class in de.digitalcollections.solrocr.formats.hocr
 
HocrPassageFormatter(String, String, boolean) - Constructor for class de.digitalcollections.solrocr.formats.hocr.HocrPassageFormatter
 

I

incrementToken() - Method in class de.digitalcollections.solrocr.lucene.NonAlphaTrimFilterFactory.NonAlphaTrimFilter
 
inform(SolrCore) - Method in class de.digitalcollections.solrocr.solr.HighlightComponent
 
init(PluginInfo) - Method in class de.digitalcollections.solrocr.lucene.fieldloader.PathFieldLoader
 
init(PluginInfo) - Method in class de.digitalcollections.solrocr.solr.HighlightComponent
 
INSTANCE - Static variable in class de.digitalcollections.solrocr.lucene.byteoffset.NoOpByteOffsetStrategy
 
isExternalField(String) - Method in interface de.digitalcollections.solrocr.lucene.fieldloader.ExternalFieldLoader
Check if the field content is located in an external source
isExternalField(String) - Method in class de.digitalcollections.solrocr.lucene.fieldloader.PathFieldLoader
 
isHighlight() - Method in class de.digitalcollections.solrocr.util.OcrBox
 
isWellFormed(byte[]) - Static method in class de.digitalcollections.solrocr.util.Utf8
Returns true if bytes is a well-formed UTF-8 byte sequence according to Unicode 6.0.
isWellFormed(byte[], int, int) - Static method in class de.digitalcollections.solrocr.util.Utf8
Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by isWellFormed(byte[]).
IterableCharSequence - Interface in de.digitalcollections.solrocr.util
A combination interface of CharSequence and CharacterIterator.
IterableCharSequence.IterableStringCharSequence - Class in de.digitalcollections.solrocr.util
 
IterableCharSequence.OffsetType - Enum in de.digitalcollections.solrocr.util
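
The two `isWellFormed` entries above describe checking whether a byte array, or a slice of one, is well-formed UTF-8. A minimal sketch of such a check using only the JDK's strict `CharsetDecoder` follows. It is not the implementation in `de.digitalcollections.solrocr.util.Utf8`; the class name `Utf8CheckSketch` is hypothetical, and the JDK decoder's rules may differ in detail from the Unicode 6.0 definition the entry refers to.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of the library.
final class Utf8CheckSketch {

  /** Checks the whole array. */
  static boolean isWellFormed(byte[] bytes) {
    return isWellFormed(bytes, 0, bytes.length);
  }

  /** Checks the slice of len bytes starting at off. */
  static boolean isWellFormed(byte[] bytes, int off, int len) {
    try {
      // REPORT makes the decoder throw instead of substituting a replacement character.
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(bytes, off, len));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    byte[] eAcute = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9
    System.out.println(isWellFormed(eAcute));        // complete two-byte sequence
    System.out.println(isWellFormed(eAcute, 0, 1));  // slice ends in the middle of the sequence
  }
}
```

A check like this is useful before trusting byte offsets into a file, since a slice that starts or ends inside a multi-byte sequence cannot be decoded cleanly.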
 

L

last() - Method in class de.digitalcollections.solrocr.formats.hocr.HocrClassBreakIterator
 
last() - Method in class de.digitalcollections.solrocr.util.ContextBreakIterator
 
last() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
last() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
last() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
last() - Method in class de.digitalcollections.solrocr.util.TagBreakIterator
 
length() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
length() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
length() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
LIMIT_BLOCK - Static variable in interface de.digitalcollections.solrocr.solr.OcrHighlightParams
 
loadField(Map<String, String>, String) - Method in interface de.digitalcollections.solrocr.lucene.fieldloader.ExternalFieldLoader
Load the field content from an external source
loadField(Map<String, String>, String) - Method in class de.digitalcollections.solrocr.lucene.fieldloader.PathFieldLoader
 
loadFieldValues(String[], DocIdSetIterator, int) - Method in class de.digitalcollections.solrocr.lucene.OcrHighlighter
 
loadOcrFieldValues(String[], DocIdSetIterator) - Method in class de.digitalcollections.solrocr.lucene.OcrHighlighter
 

M

main(String[]) - Static method in class de.digitalcollections.solrocr.formats.alto.AltoByteOffsetsParser
 
main(String[]) - Static method in class de.digitalcollections.solrocr.formats.hocr.HocrByteOffsetsParser
 
main(String[]) - Static method in class de.digitalcollections.solrocr.formats.mini.MiniOcrByteOffsetsParser
 
mergeBoxes(List<OcrBox>) - Method in class de.digitalcollections.solrocr.formats.OcrPassageFormatter
Merge adjacent OCR boxes into a single one, taking line breaks into account
MiniOcrByteOffsetsParser - Class in de.digitalcollections.solrocr.formats.mini
 
MiniOcrByteOffsetsParser() - Constructor for class de.digitalcollections.solrocr.formats.mini.MiniOcrByteOffsetsParser
 
MiniOcrFormat - Class in de.digitalcollections.solrocr.formats.mini
 
MiniOcrFormat() - Constructor for class de.digitalcollections.solrocr.formats.mini.MiniOcrFormat
 
MiniOcrPassageFormatter - Class in de.digitalcollections.solrocr.formats.mini
 
MiniOcrPassageFormatter(String, String, boolean) - Constructor for class de.digitalcollections.solrocr.formats.mini.MiniOcrPassageFormatter
 
MultiByteOffsetsEnum(List<ByteOffsetsEnum>) - Constructor for class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.MultiByteOffsetsEnum
 
MultiFileBytesCharIterator - Class in de.digitalcollections.solrocr.util
 
MultiFileBytesCharIterator(List<Path>, Charset) - Constructor for class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
MultiFileBytesCharIterator(MultiFileBytesCharIterator) - Constructor for class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 

N

next(int) - Method in class de.digitalcollections.solrocr.formats.hocr.HocrClassBreakIterator
 
next() - Method in class de.digitalcollections.solrocr.formats.hocr.HocrClassBreakIterator
 
next(int) - Method in class de.digitalcollections.solrocr.util.ContextBreakIterator
 
next() - Method in class de.digitalcollections.solrocr.util.ContextBreakIterator
 
next() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
next() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
next() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
next(int) - Method in class de.digitalcollections.solrocr.util.TagBreakIterator
 
next() - Method in class de.digitalcollections.solrocr.util.TagBreakIterator
 
nextPosition() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.MultiByteOffsetsEnum
 
nextPosition() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum
 
nextPosition() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.OfPostings
 
NO_WEIGHT_MATCHES_SUPPORT_MSG - Static variable in class de.digitalcollections.solrocr.solr.SolrOcrHighlighter
 
NonAlphaTrimFilter(TokenStream) - Constructor for class de.digitalcollections.solrocr.lucene.NonAlphaTrimFilterFactory.NonAlphaTrimFilter
Construct a token stream filtering the given input.
NonAlphaTrimFilterFactory - Class in de.digitalcollections.solrocr.lucene
This filter trims leading and/or trailing non-letter characters from tokens.
NonAlphaTrimFilterFactory(Map<String, String>) - Constructor for class de.digitalcollections.solrocr.lucene.NonAlphaTrimFilterFactory
Initialize this factory via a set of key-value pairs.
NonAlphaTrimFilterFactory.NonAlphaTrimFilter - Class in de.digitalcollections.solrocr.lucene
 
NONE - Static variable in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetPhraseHelper
 
NoOpByteOffsetStrategy - Class in de.digitalcollections.solrocr.lucene.byteoffset
A variant of NoOpOffsetStrategy for byte offsets from payloads
norm(int) - Method in class de.digitalcollections.solrocr.lucene.OcrPassageScorer
If enabled with `hl.score.boostEarly`, normalize the passage start so that earlier starts are given more weight.
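
The `NonAlphaTrimFilterFactory` entry above describes trimming leading and trailing non-letter characters from tokens. The core string operation can be sketched in plain Java as below. This is not the Lucene `TokenFilter` itself; the class `NonLetterTrimSketch` is hypothetical, and treating "letter" as `Character.isLetter` is an assumption (so digits count as non-letters here), which may not match the actual filter.

```java
// Hypothetical illustration class, not part of the library.
final class NonLetterTrimSketch {

  /** Removes non-letter characters from both ends of the token; the interior is untouched. */
  static String trimNonLetters(String token) {
    int start = 0;
    int end = token.length();
    // Advance past leading non-letters.
    while (start < end && !Character.isLetter(token.charAt(start))) {
      start++;
    }
    // Back up over trailing non-letters.
    while (end > start && !Character.isLetter(token.charAt(end - 1))) {
      end--;
    }
    return token.substring(start, end);
  }

  public static void main(String[] args) {
    System.out.println(trimNonLetters("(hello),"));
    System.out.println(trimNonLetters("it's"));
  }
}
```

Trimming of this kind is useful for OCR text, where tokens often carry stray punctuation that would otherwise prevent a query term from matching.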

O

OcrBlock - Enum in de.digitalcollections.solrocr.formats
 
OcrBox - Class in de.digitalcollections.solrocr.util
 
OcrBox(String, String, float, float, float, float, boolean) - Constructor for class de.digitalcollections.solrocr.util.OcrBox
 
OcrFieldHighlighter - Class in de.digitalcollections.solrocr.lucene
A customization of FieldHighlighter to support lazy-loaded field values and byte offsets from payloads.
OcrFieldHighlighter(String, FieldOffsetStrategy, FieldByteOffsetStrategy, PassageScorer, BreakIterator, OcrPassageFormatter, int, int) - Constructor for class de.digitalcollections.solrocr.lucene.OcrFieldHighlighter
 
OcrFormat - Interface in de.digitalcollections.solrocr.formats
Provides access to format-specific BreakIterator and OcrPassageFormatter instances.
OcrHComponents - Class in de.digitalcollections.solrocr.lucene
Components for the OcrHighlighter, with support for loading byte offsets from payloads.
OcrHComponents(String, Predicate<String>, Query, BytesRef[], PhraseHelper, CharacterRunAutomaton[], Set<UnifiedHighlighter.HighlightFlag>) - Constructor for class de.digitalcollections.solrocr.lucene.OcrHComponents
 
OcrHComponents(String, Predicate<String>, Query, BytesRef[], PhraseHelper, ByteOffsetPhraseHelper, CharacterRunAutomaton[], Set<UnifiedHighlighter.HighlightFlag>) - Constructor for class de.digitalcollections.solrocr.lucene.OcrHComponents
 
OcrHighlighter - Class in de.digitalcollections.solrocr.lucene
A UnifiedHighlighter variant to support lazy-loading field values from arbitrary storage and using byte offsets from term payloads for highlighting instead of character offsets.
OcrHighlighter(IndexSearcher, Analyzer, ExternalFieldLoader, SolrParams) - Constructor for class de.digitalcollections.solrocr.lucene.OcrHighlighter
 
OcrHighlightParams - Interface in de.digitalcollections.solrocr.solr
 
OcrHighlightResult - Class in de.digitalcollections.solrocr.util
 
OcrHighlightResult() - Constructor for class de.digitalcollections.solrocr.util.OcrHighlightResult
 
OcrPassageFormatter - Class in de.digitalcollections.solrocr.formats
Takes care of formatting fragments of the OCR format into OcrSnippet instances.
OcrPassageFormatter(String, String, boolean) - Constructor for class de.digitalcollections.solrocr.formats.OcrPassageFormatter
 
OcrPassageScorer - Class in de.digitalcollections.solrocr.lucene
 
OcrPassageScorer(float, float, float, boolean) - Constructor for class de.digitalcollections.solrocr.lucene.OcrPassageScorer
 
OcrSnippet - Class in de.digitalcollections.solrocr.formats
A structured representation of a highlighted OCR snippet.
OcrSnippet(String, List<OcrBox>) - Constructor for class de.digitalcollections.solrocr.formats.OcrSnippet
Create a new snippet for the given region on the page, along with its plaintext.
OfPostings(BytesRef, int, PostingsEnum) - Constructor for class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.OfPostings
 
OfPostings(BytesRef, PostingsEnum) - Constructor for class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum.OfPostings
 

P

PAGE_ID - Static variable in interface de.digitalcollections.solrocr.solr.OcrHighlightParams
 
parse(byte[], OutputStream) - Static method in class de.digitalcollections.solrocr.formats.alto.AltoByteOffsetsParser
 
parse(byte[], OutputStream) - Static method in class de.digitalcollections.solrocr.formats.hocr.HocrByteOffsetsParser
 
parse(byte[], OutputStream, String, String) - Static method in class de.digitalcollections.solrocr.formats.hocr.HocrByteOffsetsParser
Convert the hOCR document, starting at startPage and ending at, but not including, endPage.
parse(byte[], int, String, String) - Static method in class de.digitalcollections.solrocr.formats.mini.MiniOcrByteOffsetsParser
 
parse(byte[], OutputStream) - Static method in class de.digitalcollections.solrocr.formats.mini.MiniOcrByteOffsetsParser
 
parse(byte[], OutputStream, String) - Static method in class de.digitalcollections.solrocr.formats.mini.MiniOcrByteOffsetsParser
 
parse(byte[], OutputStream, String, String) - Static method in class de.digitalcollections.solrocr.formats.mini.MiniOcrByteOffsetsParser
 
parseFragment(String, String) - Method in class de.digitalcollections.solrocr.formats.OcrPassageFormatter
Parse an OcrSnippet from an OCR fragment.
parseWords(String, String) - Method in class de.digitalcollections.solrocr.formats.alto.AltoPassageFormatter
 
parseWords(String, String) - Method in class de.digitalcollections.solrocr.formats.hocr.HocrPassageFormatter
 
parseWords(String, String) - Method in class de.digitalcollections.solrocr.formats.mini.MiniOcrPassageFormatter
 
parseWords(String, String) - Method in class de.digitalcollections.solrocr.formats.OcrPassageFormatter
Parse word boxes from an OCR fragment.
PathFieldLoader - Class in de.digitalcollections.solrocr.lucene.fieldloader
Load field values from filesystem paths.
PathFieldLoader() - Constructor for class de.digitalcollections.solrocr.lucene.fieldloader.PathFieldLoader
 
PostingsByteOffsetStrategy(OcrHComponents) - Constructor for class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy.PostingsByteOffsetStrategy
 
PostingsWithTermVectorsByteOffsetStrategy(OcrHComponents) - Constructor for class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy.PostingsWithTermVectorsByteOffsetStrategy
 
preceding(int) - Method in class de.digitalcollections.solrocr.formats.hocr.HocrClassBreakIterator
 
preceding(int) - Method in class de.digitalcollections.solrocr.util.ContextBreakIterator
 
preceding(int) - Method in class de.digitalcollections.solrocr.util.TagBreakIterator
 
previous() - Method in class de.digitalcollections.solrocr.formats.hocr.HocrClassBreakIterator
 
previous() - Method in class de.digitalcollections.solrocr.util.ContextBreakIterator
 
previous() - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
previous() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
previous() - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
previous() - Method in class de.digitalcollections.solrocr.util.TagBreakIterator
 
process(ResponseBuilder) - Method in class de.digitalcollections.solrocr.solr.HighlightComponent
 

S

SCORE_BOOST_EARLY - Static variable in interface de.digitalcollections.solrocr.solr.OcrHighlightParams
 
setHighlight(boolean) - Method in class de.digitalcollections.solrocr.util.OcrBox
 
setIndex(int) - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
setIndex(int) - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
setIndex(int) - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 
setLrx(float) - Method in class de.digitalcollections.solrocr.util.OcrBox
 
setLry(float) - Method in class de.digitalcollections.solrocr.util.OcrBox
 
setPageId(String) - Method in class de.digitalcollections.solrocr.util.OcrBox
 
setScore(float) - Method in class de.digitalcollections.solrocr.formats.OcrSnippet
Set the score of the passage, compared to all other passages in the document.
setText(CharacterIterator) - Method in class de.digitalcollections.solrocr.formats.hocr.HocrClassBreakIterator
 
setText(CharacterIterator) - Method in class de.digitalcollections.solrocr.util.ContextBreakIterator
 
setText(String) - Method in class de.digitalcollections.solrocr.util.OcrBox
 
setText(CharacterIterator) - Method in class de.digitalcollections.solrocr.util.TagBreakIterator
 
setUlx(float) - Method in class de.digitalcollections.solrocr.util.OcrBox
 
setUly(float) - Method in class de.digitalcollections.solrocr.util.OcrBox
 
SolrOcrHighlighter - Class in de.digitalcollections.solrocr.solr
 
SolrOcrHighlighter(ExternalFieldLoader, OcrFormat, List<String>) - Constructor for class de.digitalcollections.solrocr.solr.SolrOcrHighlighter
 
startHlTag - Variable in class de.digitalcollections.solrocr.formats.OcrPassageFormatter
 
stream(Iterator<T>) - Static method in class de.digitalcollections.solrocr.util.Streams
 
Streams - Class in de.digitalcollections.solrocr.util
Stream helpers.
Streams() - Constructor for class de.digitalcollections.solrocr.util.Streams
 
subSequence(int, int) - Method in class de.digitalcollections.solrocr.util.FileBytesCharIterator
 
subSequence(int, int) - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
subSequence(int, int) - Method in class de.digitalcollections.solrocr.util.MultiFileBytesCharIterator
 

T

TagBreakIterator - Class in de.digitalcollections.solrocr.util
A BreakIterator that splits an XML-like document on a specific opening or closing tag.
TagBreakIterator(String) - Constructor for class de.digitalcollections.solrocr.util.TagBreakIterator
 
TagBreakIterator(String, boolean) - Constructor for class de.digitalcollections.solrocr.util.TagBreakIterator
 
terms(String) - Method in class de.digitalcollections.solrocr.lucene.vendor.TermVectorFilteredLeafReader
 
TermVectorByteOffsetStrategy(OcrHComponents) - Constructor for class de.digitalcollections.solrocr.lucene.byteoffset.FieldByteOffsetStrategy.TermVectorByteOffsetStrategy
 
TermVectorFilteredLeafReader - Class in de.digitalcollections.solrocr.lucene.vendor
A filtered LeafReader that only includes the terms that are also in a provided set of terms.
TermVectorFilteredLeafReader(LeafReader, Terms, String) - Constructor for class de.digitalcollections.solrocr.lucene.vendor.TermVectorFilteredLeafReader
Construct a FilterLeafReader based on the specified base reader.
toNamedList() - Method in class de.digitalcollections.solrocr.formats.OcrSnippet
Convert the snippet to a NamedList that is used by Solr to populate the response.
toNamedList() - Method in class de.digitalcollections.solrocr.util.OcrBox
 
toNamedList() - Method in class de.digitalcollections.solrocr.util.OcrHighlightResult
 
toString() - Method in class de.digitalcollections.solrocr.lucene.byteoffset.ByteOffsetsEnum
 
toString() - Method in class de.digitalcollections.solrocr.util.IterableCharSequence.IterableStringCharSequence
 
toString() - Method in class de.digitalcollections.solrocr.util.OcrBox
 

U

Utf8 - Class in de.digitalcollections.solrocr.util
Low-level, high-performance utility methods related to the UTF-8 character encoding.

V

valueOf(String) - Static method in enum de.digitalcollections.solrocr.formats.OcrBlock
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum de.digitalcollections.solrocr.util.IterableCharSequence.OffsetType
Returns the enum constant of this type with the specified name.
values() - Static method in enum de.digitalcollections.solrocr.formats.OcrBlock
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum de.digitalcollections.solrocr.util.IterableCharSequence.OffsetType
Returns an array containing the constants of this enum type, in the order they are declared.

Z

zip(Stream<? extends A>, Stream<? extends B>, BiFunction<? super A, ? super B, ? extends C>) - Static method in class de.digitalcollections.solrocr.util.Streams
Implementation from https://stackoverflow.com/a/23529010/487903
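
For readers unfamiliar with stream zipping, the following is a minimal sketch of a helper with the same signature as the `zip` entry above. It is not the library's code: the class name `ZipSketch` is hypothetical, and the iterator-based approach is an assumption about how such a helper is commonly written.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.BiFunction;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical sketch; the real Streams.zip may differ in its details.
final class ZipSketch {
  static <A, B, C> Stream<C> zip(
      Stream<? extends A> a,
      Stream<? extends B> b,
      BiFunction<? super A, ? super B, ? extends C> zipper) {
    Iterator<? extends A> itA = a.iterator();
    Iterator<? extends B> itB = b.iterator();
    // Pair elements lazily; stop as soon as either input is exhausted.
    Iterator<C> pairs = new Iterator<C>() {
      @Override
      public boolean hasNext() {
        return itA.hasNext() && itB.hasNext();
      }

      @Override
      public C next() {
        return zipper.apply(itA.next(), itB.next());
      }
    };
    return StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(pairs, Spliterator.ORDERED), false);
  }

  public static void main(String[] args) {
    Stream<String> zipped = zip(Stream.of("a", "b", "c"), Stream.of(1, 2), (s, n) -> s + n);
    zipped.forEach(System.out::println); // prints "a1" then "b2"
  }
}
```

The result is a sequential stream whose length equals that of the shorter input; the unmatched trailing element `"c"` is dropped.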

Copyright © 2019. All rights reserved.