A B C D E F G H I J K L M N O P Q R S T U V W X Z
A
- AbstractListManager - Class in org.apache.tika.parser.microsoft
- AbstractListManager() - Constructor for class org.apache.tika.parser.microsoft.AbstractListManager
- AbstractListManager.LevelTuple - Class in org.apache.tika.parser.microsoft
- AbstractListManager.ParagraphLevelCounter - Class in org.apache.tika.parser.microsoft
- AbstractOfficeParser - Class in org.apache.tika.parser.microsoft
-
Intermediate layer to set OfficeParserConfig uniformly.
- AbstractOfficeParser() - Constructor for class org.apache.tika.parser.microsoft.AbstractOfficeParser
- AbstractOOXMLExtractor - Class in org.apache.tika.parser.microsoft.ooxml
-
Base class for all Tika OOXML extractors.
- AbstractOOXMLExtractor(ParseContext, POIXMLTextExtractor) - Constructor for class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
- AbstractXML2003Parser - Class in org.apache.tika.parser.microsoft.xml
- AbstractXML2003Parser() - Constructor for class org.apache.tika.parser.microsoft.xml.AbstractXML2003Parser
- AccessChecker - Class in org.apache.tika.parser.pdf
-
Checks whether or not a document allows extraction generally or extraction for accessibility only.
- AccessChecker() - Constructor for class org.apache.tika.parser.pdf.AccessChecker
-
This constructs an AccessChecker that will not perform any checking and will always return without throwing an exception.
- AccessChecker(boolean) - Constructor for class org.apache.tika.parser.pdf.AccessChecker
-
This constructs an AccessChecker that will check for whether or not content should be extracted from a document.
- Activator - Class in org.apache.tika.parser.internal
- Activator() - Constructor for class org.apache.tika.parser.internal.Activator
- addAlternative(GeoTag) - Method in class org.apache.tika.parser.geo.topic.GeoTag
- addDrawingHyperLinks(PackagePart) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- addEvenIfNull(Property, String, Metadata) - Static method in class org.apache.tika.parser.microsoft.OutlookExtractor
- addMetadata(String) - Method in class org.apache.tika.parser.xml.AttributeDependantMetadataHandler
- addMetadata(String) - Method in class org.apache.tika.parser.xml.ElementMetadataHandler
- addMetadata(String) - Method in class org.apache.tika.parser.xml.MetadataHandler
-
Deprecated.
- addMulti(Metadata, Property, String) - Static method in class org.apache.tika.parser.microsoft.SummaryExtractor
- addOtherTesseractConfig(String, String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Add a key-value pair to pass to Tesseract using its -c command line option.
- addPersonAndEmail(String, Property, Property, Metadata) - Static method in class org.apache.tika.parser.mail.MailUtil
-
This tries to split a "from" or "to" value into a person field and an email field.
- AdobeFontMetricParser - Class in org.apache.tika.parser.font
-
Parser for AFM Font Files
- AdobeFontMetricParser() - Constructor for class org.apache.tika.parser.font.AdobeFontMetricParser
- ALIGNED_OFFSET - Static variable in class org.apache.tika.parser.chm.core.ChmCommons
- alignedLenTable - Variable in class org.apache.tika.parser.chm.lzx.ChmLzxState
- alignedTreeTable - Variable in class org.apache.tika.parser.chm.lzx.ChmLzxState
- apiBaseUri - Variable in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- apiUri - Variable in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- AppleSingleFileParser - Class in org.apache.tika.parser.apple
-
Parser that strips the header off of AppleSingle and AppleDouble files.
- AppleSingleFileParser() - Constructor for class org.apache.tika.parser.apple.AppleSingleFileParser
- ARCHITECTURE_BITS - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- assertByteArrayNotNull(byte[]) - Static method in class org.apache.tika.parser.chm.assertion.ChmAssert
-
Checks if byte[] is not null
- assertByteArrayNotNull(byte[]) - Static method in class org.apache.tika.parser.chm.core.ChmCommons
- assertChmAccessorNotNull(ChmAccessor<?>) - Static method in class org.apache.tika.parser.chm.assertion.ChmAssert
-
Checks if ChmAccessor is not null. In case of null, throws an exception.
- assertChmAccessorParameters(byte[], ChmAccessor<?>, int) - Static method in class org.apache.tika.parser.chm.assertion.ChmAssert
-
Checks validity of ChmAccessor parameters
- assertChmBlockSegment(byte[], ChmLzxcResetTable, int, int, int) - Static method in class org.apache.tika.parser.chm.assertion.ChmAssert
-
Checks the validity of the chmBlockSegment parameters
- assertCopyingDataIndex(int, int) - Static method in class org.apache.tika.parser.chm.assertion.ChmAssert
- assertDirectoryListingEntry(int, String, ChmCommons.EntryType, int, int) - Static method in class org.apache.tika.parser.chm.assertion.ChmAssert
-
Checks the validity of the DirectoryListingEntry's parameters. In case of invalid parameter(s), throws an exception.
- assertInputStreamNotNull(InputStream) - Static method in class org.apache.tika.parser.chm.assertion.ChmAssert
-
Checks if InputStream is not null
- assertPositiveInt(int) - Static method in class org.apache.tika.parser.chm.assertion.ChmAssert
-
Checks if the int param is greater than zero. In case param <= 0, throws an exception.
- AttributeDependantMetadataHandler - Class in org.apache.tika.parser.xml
-
This adds a Metadata entry for a given node.
- AttributeDependantMetadataHandler(Metadata, String, String) - Constructor for class org.apache.tika.parser.xml.AttributeDependantMetadataHandler
- AttributeMetadataHandler - Class in org.apache.tika.parser.xml
-
SAX event handler that maps the contents of an XML attribute into a metadata field.
- AttributeMetadataHandler(String, String, Metadata, String) - Constructor for class org.apache.tika.parser.xml.AttributeMetadataHandler
- AttributeMetadataHandler(String, String, Metadata, Property) - Constructor for class org.apache.tika.parser.xml.AttributeMetadataHandler
- AudioFrame - Class in org.apache.tika.parser.mp3
-
An Audio Frame in an MP3 file.
- AudioFrame(int, int, int, int, int, int, float) - Constructor for class org.apache.tika.parser.mp3.AudioFrame
-
Creates a new instance of AudioFrame and initializes all properties.
- AudioFrame(int, int, int, int, InputStream) - Constructor for class org.apache.tika.parser.mp3.AudioFrame
-
Deprecated. Use the constructor which is passed all values directly.
- AudioFrame(InputStream, ContentHandler) - Constructor for class org.apache.tika.parser.mp3.AudioFrame
-
Deprecated. Use the constructor which is passed all values directly.
- AudioParser - Class in org.apache.tika.parser.audio
- AudioParser() - Constructor for class org.apache.tika.parser.audio.AudioParser
- AUTO - org.apache.tika.parser.pdf.PDFParserConfig.OCR_STRATEGY
- available - Variable in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- available() - Method in class org.apache.tika.parser.hwp.HwpStreamReader
-
More data to read?
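The addPersonAndEmail entry above says a "from" or "to" value is split into a person field and an email field. The following is only a rough sketch of that idea using the standard library; the class PersonEmailSplitSketch, its split method, and the regular expression are hypothetical illustrations and are not taken from MailUtil, whose actual rules may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not Tika's MailUtil implementation.
final class PersonEmailSplitSketch {
    // Matches "Display Name <user@host>", with optional double quotes around the name.
    private static final Pattern NAME_AND_ADDRESS =
            Pattern.compile("^\\s*\"?([^\"<]*?)\"?\\s*<([^<>\\s]+@[^<>\\s]+)>\\s*$");

    /** Returns {person, email}; either element is null when it cannot be determined. */
    static String[] split(String headerValue) {
        if (headerValue == null) {
            return new String[] {null, null};
        }
        Matcher m = NAME_AND_ADDRESS.matcher(headerValue);
        if (m.matches()) {
            String person = m.group(1).trim();
            return new String[] {person.isEmpty() ? null : person, m.group(2)};
        }
        String trimmed = headerValue.trim();
        // A bare address carries no display name.
        if (trimmed.matches("[^\\s<>]+@[^\\s<>]+")) {
            return new String[] {null, trimmed};
        }
        // Anything else is treated as a name without an address.
        return new String[] {trimmed.isEmpty() ? null : trimmed, null};
    }

    public static void main(String[] args) {
        String[] parts = split("\"Jane Doe\" <jane@example.org>");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

Returning nulls for the missing half keeps the caller free to decide whether to store an empty field or skip it; a real mail parser would also need to handle encoded words and address lists, which this sketch ignores.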
B
- B - org.apache.tika.parser.microsoft.FormattingUtils.Tag
- BCC - org.apache.tika.parser.microsoft.OutlookExtractor.RECIPIENT_TYPE
- BEGIN - org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
- BIG - Static variable in class org.apache.tika.parser.executable.MachineMetadata.Endian
- BIGENDIAN_16_BIT - org.apache.tika.parser.strings.StringsEncoding
- BIGENDIAN_32_BIT - org.apache.tika.parser.strings.StringsEncoding
- BoilerpipeContentHandler - Class in org.apache.tika.parser.html
-
Uses the boilerpipe library to automatically extract the main content from a web page.
- BoilerpipeContentHandler(Writer) - Constructor for class org.apache.tika.parser.html.BoilerpipeContentHandler
-
Creates a content handler that writes XHTML body character events to the given writer.
- BoilerpipeContentHandler(ContentHandler) - Constructor for class org.apache.tika.parser.html.BoilerpipeContentHandler
-
Creates a new boilerpipe-based content extractor, using the DefaultExtractor extraction rules and "delegate" as the content handler.
- BoilerpipeContentHandler(ContentHandler, BoilerpipeExtractor) - Constructor for class org.apache.tika.parser.html.BoilerpipeContentHandler
-
Creates a new boilerpipe-based content extractor, using the given extraction rules.
- BouncyCastleDigester - Class in org.apache.tika.parser.utils
-
Digester that relies on BouncyCastle for MessageDigest implementations.
- BouncyCastleDigester(int, String) - Constructor for class org.apache.tika.parser.utils.BouncyCastleDigester
-
Include a string representing the comma-separated algorithms to run: e.g.
- BPGParser - Class in org.apache.tika.parser.image
-
Parser for the Better Portable Graphics (BPG) File Format.
- BPGParser() - Constructor for class org.apache.tika.parser.image.BPGParser
- buildParagraphTagAndStyle(String, boolean) - Static method in class org.apache.tika.parser.microsoft.WordExtractor
-
Given a style name, return what tag should be used, and what style should be applied to it.
- buildXHTML(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
-
Populates the XHTMLContentHandler object received as parameter.
- buildXHTML(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.POIXMLTextExtractorDecorator
- buildXHTML(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.SXSLFPowerPointExtractorDecorator
- buildXHTML(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.SXWPFWordExtractorDecorator
- buildXHTML(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.xps.XPSExtractorDecorator
- buildXHTML(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator
- buildXHTML(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFBExcelExtractorDecorator
- buildXHTML(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- buildXHTML(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator
- BYTE_ARRAY_LENGHT - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
C
- canRun() - Static method in class org.apache.tika.parser.journal.GrobidRESTParser
- CaptionObject - Class in org.apache.tika.parser.captioning
-
A model for caption objects from graphics and text; it typically includes a human-readable sentence, the language of the sentence, and a confidence score.
- CaptionObject(String, String, double) - Constructor for class org.apache.tika.parser.captioning.CaptionObject
- CC - org.apache.tika.parser.microsoft.OutlookExtractor.RECIPIENT_TYPE
- cell(String, String, XSSFComment) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.SheetTextAsHTML
- Cell - Interface in org.apache.tika.parser.microsoft
-
Cell of content.
- CellDecorator - Class in org.apache.tika.parser.microsoft
-
Cell decorator.
- CellDecorator(Cell) - Constructor for class org.apache.tika.parser.microsoft.CellDecorator
- characters(char[], int, int) - Method in class org.apache.tika.parser.ctakes.CTAKESContentHandler
- characters(char[], int, int) - Method in class org.apache.tika.parser.dif.DIFContentHandler
- characters(char[], int, int) - Method in class org.apache.tika.parser.html.BoilerpipeContentHandler
- characters(char[], int, int) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
- characters(char[], int, int) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
- characters(char[], int, int) - Method in class org.apache.tika.parser.xliff.XLIFF12ContentHandler
- characters(char[], int, int) - Method in class org.apache.tika.parser.xml.AttributeDependantMetadataHandler
- characters(char[], int, int) - Method in class org.apache.tika.parser.xml.ElementMetadataHandler
- characters(char[], int, int) - Method in class org.apache.tika.parser.xml.MetadataHandler
-
Deprecated.
- CharsetDetector - Class in org.apache.tika.parser.txt
-
CharsetDetector provides a facility for detecting the charset or encoding of character data in an unknown format.
- CharsetDetector() - Constructor for class org.apache.tika.parser.txt.CharsetDetector
-
Constructor
- CharsetDetector(int) - Constructor for class org.apache.tika.parser.txt.CharsetDetector
- CharsetMatch - Class in org.apache.tika.parser.txt
-
This class represents a charset that has been identified by a CharsetDetector as a possible encoding for a set of input data.
- check(Metadata) - Method in class org.apache.tika.parser.pdf.AccessChecker
-
Checks to see if a document's content should be extracted based on metadata values and the value of AccessChecker.allowAccessibility in the constructor.
- checkAvail() - Method in class org.apache.tika.parser.geo.topic.gazetteer.GeoGazetteerClient
-
Ping lucene-geo-gazetteer API
- checkBit(int) - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- checkInitialization(InitializableProblemHandler) - Method in class org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
- checkInitialization(InitializableProblemHandler) - Method in class org.apache.tika.parser.jdbc.SQLite3Parser
- checkInitialization(InitializableProblemHandler) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- checkInitialization(InitializableProblemHandler) - Method in class org.apache.tika.parser.pdf.PDFParser
- checkInitialization(InitializableProblemHandler) - Method in class org.apache.tika.parser.recognition.ObjectRecognitionParser
- checkInitialization(InitializableProblemHandler) - Method in class org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
- checkInitialization(InitializableProblemHandler) - Method in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- checkInitialization(InitializableProblemHandler) - Method in class org.apache.tika.parser.sentiment.SentimentAnalysisParser
- CHM_ITSF_V2_LEN - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_ITSF_V3_LEN - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_ITSP_V1_LEN - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_LZXC_MIN_LEN - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_LZXC_RESETTABLE_V1_LEN - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_LZXC_V2_LEN - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_PMGI_LEN - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_PMGI_MARKER - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_PMGL_LEN - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_SIGNATURE_LEN - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_VER_1 - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_VER_2 - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_VER_3 - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CHM_WINDOW_SIZE_BLOCK - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- ChmAccessor<T> - Interface in org.apache.tika.parser.chm.accessor
-
Defines an accessor interface
- ChmAssert - Class in org.apache.tika.parser.chm.assertion
-
Contains chm extractor assertions
- ChmAssert() - Constructor for class org.apache.tika.parser.chm.assertion.ChmAssert
- ChmBlockInfo - Class in org.apache.tika.parser.chm.lzx
-
A container that contains chm block information such as: i.
- ChmCommons - Class in org.apache.tika.parser.chm.core
- ChmCommons.EntryType - Enum in org.apache.tika.parser.chm.core
-
Represents entry types: uncompressed, compressed
- ChmCommons.IntelState - Enum in org.apache.tika.parser.chm.core
-
Represents intel file states during decompression
- ChmCommons.LzxState - Enum in org.apache.tika.parser.chm.core
-
Represents lzx states: started decoding, not started decoding
- ChmConstants - Class in org.apache.tika.parser.chm.core
- ChmDirectoryListingSet - Class in org.apache.tika.parser.chm.accessor
-
Holds chm listing entries
- ChmDirectoryListingSet(byte[], ChmItsfHeader, ChmItspHeader) - Constructor for class org.apache.tika.parser.chm.accessor.ChmDirectoryListingSet
-
Constructs chm directory listing set
- ChmExtractor - Class in org.apache.tika.parser.chm.core
-
Extracts text from chm file.
- ChmExtractor(InputStream) - Constructor for class org.apache.tika.parser.chm.core.ChmExtractor
- ChmItsfHeader - Class in org.apache.tika.parser.chm.accessor
-
The header:
0000: char[4] 'ITSF'
0004: DWORD 3 (Version number)
0008: DWORD Total header length, including header section table and following data.
- ChmItsfHeader() - Constructor for class org.apache.tika.parser.chm.accessor.ChmItsfHeader
- ChmItspHeader - Class in org.apache.tika.parser.chm.accessor
-
Directory header. The directory starts with a header; its format is as follows:
0000: char[4] 'ITSP'
0004: DWORD Version number 1
0008: DWORD Length of the directory header
000C: DWORD $0a (unknown)
0010: DWORD $1000 Directory chunk size
0014: DWORD "Density" of quickref section, usually 2
0018: DWORD Depth of the index tree: 1 if there is no index, 2 if there is one level of PMGI chunks
001C: DWORD Chunk number of root index chunk, -1 if there is none (though at least one file has 0 despite there being no index chunk, probably a bug)
0020: DWORD Chunk number of first PMGL (listing) chunk
0024: DWORD Chunk number of last PMGL (listing) chunk
0028: DWORD -1 (unknown)
002C: DWORD Number of directory chunks (total)
0030: DWORD Windows language ID
0034: GUID {5D02926A-212E-11D0-9DF9-00A0C922E6EC}
0044: DWORD $54 (This is the length again)
0048: DWORD -1 (unknown)
004C: DWORD -1 (unknown)
0050: DWORD -1 (unknown)
- ChmItspHeader() - Constructor for class org.apache.tika.parser.chm.accessor.ChmItspHeader
- ChmLzxBlock - Class in org.apache.tika.parser.chm.lzx
-
Decompresses a chm block.
- ChmLzxBlock(int, byte[], long, ChmLzxBlock) - Constructor for class org.apache.tika.parser.chm.lzx.ChmLzxBlock
- ChmLzxcControlData - Class in org.apache.tika.parser.chm.accessor
-
::DataSpace/Storage/
/ControlData This file contains $20 bytes of information on the compression.
- ChmLzxcControlData() - Constructor for class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
- ChmLzxcResetTable - Class in org.apache.tika.parser.chm.accessor
-
LZXC reset table For ensuring a decompression.
- ChmLzxcResetTable() - Constructor for class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
- ChmLzxState - Class in org.apache.tika.parser.chm.lzx
- ChmLzxState(int) - Constructor for class org.apache.tika.parser.chm.lzx.ChmLzxState
- ChmParser - Class in org.apache.tika.parser.chm
- ChmParser() - Constructor for class org.apache.tika.parser.chm.ChmParser
- ChmParsingException - Exception in org.apache.tika.parser.chm.exception
- ChmParsingException(String) - Constructor for exception org.apache.tika.parser.chm.exception.ChmParsingException
- ChmPmgiHeader - Class in org.apache.tika.parser.chm.accessor
-
Description. Note: does not always exist. An index chunk has the following format:
0000: char[4] 'PMGI'
0004: DWORD Length of quickref/free area at end of directory chunk
0008: Directory index entries (to quickref/free area)
The quickref area in a PMGI is the same as in a PMGL. The format of a directory index entry is as follows:
BYTE: length of name
BYTEs: name (UTF-8 encoded)
ENCINT: directory listing chunk which starts with name
Encoded Integers, aka ENCINT: an ENCINT is a variable-length integer.
- ChmPmgiHeader() - Constructor for class org.apache.tika.parser.chm.accessor.ChmPmgiHeader
- ChmPmglHeader - Class in org.apache.tika.parser.chm.accessor
-
Description There are two types of directory chunks -- index chunks, and listing chunks.
- ChmPmglHeader() - Constructor for class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- ChmSection - Class in org.apache.tika.parser.chm.lzx
- ChmSection(byte[]) - Constructor for class org.apache.tika.parser.chm.lzx.ChmSection
- ChmSection(byte[], byte[]) - Constructor for class org.apache.tika.parser.chm.lzx.ChmSection
- ChmWrapper - Class in org.apache.tika.parser.chm.core
- ChmWrapper() - Constructor for class org.apache.tika.parser.chm.core.ChmWrapper
- ClassParser - Class in org.apache.tika.parser.asm
-
Parser for Java .class files.
- ClassParser() - Constructor for class org.apache.tika.parser.asm.ClassParser
- clone() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- close() - Method in class org.apache.tika.parser.mp4.DirectFileReadDataSource
- closeStyleTags(XHTMLContentHandler, Deque<FormattingUtils.Tag>) - Static method in class org.apache.tika.parser.microsoft.FormattingUtils
-
Closes all formatting tags.
- CommonsDigester - Class in org.apache.tika.parser.utils
-
Implementation of DigestingParser.Digester that relies on commons.codec.digest.DigestUtils to calculate digest hashes.
- CommonsDigester(int, String) - Constructor for class org.apache.tika.parser.utils.CommonsDigester
-
Include a string representing the comma-separated algorithms to run: e.g.
- CommonsDigester(int, CommonsDigester.DigestAlgorithm...) - Constructor for class org.apache.tika.parser.utils.CommonsDigester
-
Deprecated.
- CommonsDigester.DigestAlgorithm - Enum in org.apache.tika.parser.utils
- COMP_OBJ - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- COMP_OBJ - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Some other kind of embedded document, in a CompObj container within another OLE2 document
- compareTo(CSVResult) - Method in class org.apache.tika.parser.csv.CSVResult
-
Sorts in descending order of confidence
- compareTo(CharsetMatch) - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Compare to other CharsetMatch objects.
- CompositeTagHandler - Class in org.apache.tika.parser.mp3
- CompositeTagHandler(ID3Tags[]) - Constructor for class org.apache.tika.parser.mp3.CompositeTagHandler
- COMPRESSED - org.apache.tika.parser.chm.core.ChmCommons.EntryType
- CompressorParser - Class in org.apache.tika.parser.pkg
-
Parser for various compression formats.
- CompressorParser() - Constructor for class org.apache.tika.parser.pkg.CompressorParser
- CompressorParserOptions - Interface in org.apache.tika.parser.pkg
-
Interface for setting options for the CompressorParser by passing via the ParseContext.
- CONDITIONAL - org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
- confidence - Variable in class org.apache.tika.parser.recognition.RecognisedObject
-
Confidence score
- CONFIDENCE - org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
- config - Variable in class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
- configure(ParseContext) - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
-
Checks to see if the user has specified an OfficeParserConfig.
- configure(PDF2XHTML) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Configures the given pdf2XHTML.
- configureExtractor(POIXMLTextExtractor, Locale) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFBExcelExtractorDecorator
- configureExtractor(POIXMLTextExtractor, Locale) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- contains(Charset) - Method in class org.apache.tika.parser.html.charsetdetector.charsets.ReplacementCharset
- contains(Charset) - Method in class org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset
- containsEmail(String) - Static method in class org.apache.tika.parser.mail.MailUtil
-
If the chunk looks like it contains an email
- CONTENT - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- CONTROL_DATA - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- converttoInt(byte[]) - Static method in class org.apache.tika.parser.image.ICNSType
- convertToJSONArray(JSONObject, String) - Method in class org.apache.tika.parser.ner.grobid.GrobidNERecogniser
-
Converts JSON Object to JSON Array
- convertToJSONObject(String) - Method in class org.apache.tika.parser.ner.grobid.GrobidNERecogniser
-
Parses a JSON String and converts it to a JSON Object
- copyOfRange(byte[], int, int) - Static method in class org.apache.tika.parser.chm.core.ChmCommons
- CoreNLPNERecogniser - Class in org.apache.tika.parser.ner.corenlp
-
This class offers an implementation of NERecogniser based on CRF classifiers from Stanford CoreNLP.
- CoreNLPNERecogniser() - Constructor for class org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser
- CoreNLPNERecogniser(String) - Constructor for class org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser
-
Creates a NERecogniser by loading a model from the given path
- createDecryptStream(InputStream, Key) - Method in class org.apache.tika.parser.hwp.HwpTextExtractorV5
- createFrameIfPresent(InputStream) - Static method in class org.apache.tika.parser.mp3.ID3v2Frame
-
Returns the next ID3v2 Frame in the file, or null if the next batch of data doesn't correspond to an ID3v2 header.
- CSVParams - Class in org.apache.tika.parser.csv
- CSVResult - Class in org.apache.tika.parser.csv
- CSVResult(double, MediaType, Character) - Constructor for class org.apache.tika.parser.csv.CSVResult
- CTAKES_META_PREFIX - Static variable in class org.apache.tika.parser.ctakes.CTAKESContentHandler
- CTAKESAnnotationProperty - Enum in org.apache.tika.parser.ctakes
-
This enumeration includes the properties that an IdentifiedAnnotation object can provide.
- CTAKESConfig - Class in org.apache.tika.parser.ctakes
-
Configuration for CTAKESContentHandler.
- CTAKESConfig() - Constructor for class org.apache.tika.parser.ctakes.CTAKESConfig
-
Default constructor.
- CTAKESConfig(InputStream) - Constructor for class org.apache.tika.parser.ctakes.CTAKESConfig
-
Loads properties from InputStream and then tries to close InputStream.
- CTAKESContentHandler - Class in org.apache.tika.parser.ctakes
-
Class used to extract biomedical information while parsing.
- CTAKESContentHandler() - Constructor for class org.apache.tika.parser.ctakes.CTAKESContentHandler
-
Default constructor.
- CTAKESContentHandler(ContentHandler, Metadata) - Constructor for class org.apache.tika.parser.ctakes.CTAKESContentHandler
-
Creates a new CTAKESContentHandler for the given ContentHandler and Metadata objects.
- CTAKESContentHandler(ContentHandler, Metadata, CTAKESConfig) - Constructor for class org.apache.tika.parser.ctakes.CTAKESContentHandler
-
Creates a new CTAKESContentHandler for the given ContentHandler and Metadata objects.
- CTAKESParser - Class in org.apache.tika.parser.ctakes
-
CTAKESParser decorates a Parser and leverages CTAKESContentHandler to extract biomedical information from clinical text using Apache cTAKES.
- CTAKESParser() - Constructor for class org.apache.tika.parser.ctakes.CTAKESParser
-
Wraps the default Parser
- CTAKESParser(TikaConfig) - Constructor for class org.apache.tika.parser.ctakes.CTAKESParser
-
Wraps the default Parser for this Config
- CTAKESParser(Parser) - Constructor for class org.apache.tika.parser.ctakes.CTAKESParser
-
Wraps the specified Parser
- CTAKESSerializer - Enum in org.apache.tika.parser.ctakes
-
Enumeration for types of cTAKES (UIMA) CAS serializer supported by cTAKES.
- CTAKESUtils - Class in org.apache.tika.parser.ctakes
-
This class provides methods to extract biomedical information from plain text using CTAKESContentHandler that relies on Apache cTAKES.
- CTAKESUtils() - Constructor for class org.apache.tika.parser.ctakes.CTAKESUtils
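The ChmPmgiHeader entry above describes a directory index entry as a length byte, a UTF-8 name, and an ENCINT chunk number. A minimal sketch of reading one such entry follows. The index text only states that an ENCINT is a variable-length integer; the byte layout used here (7 data bits per byte, most significant group first, high bit set on every byte except the last) is an assumption taken from commonly cited unofficial CHM format notes. The class ChmIndexEntrySketch and its members are hypothetical and are not the code behind ChmPmgiHeader.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative reader for one directory index entry; names are hypothetical.
final class ChmIndexEntrySketch {
    final String name;
    final long listingChunk;

    ChmIndexEntrySketch(String name, long listingChunk) {
        this.name = name;
        this.listingChunk = listingChunk;
    }

    /**
     * Reads an ENCINT under the assumed layout: 7 data bits per byte,
     * most significant group first, high bit set while more bytes follow.
     */
    static long readEncInt(ByteBuffer in) {
        long value = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            value = (value << 7) | (b & 0x7F);
        } while ((b & 0x80) != 0);
        return value;
    }

    /** Reads BYTE length of name, the UTF-8 name bytes, then the ENCINT chunk number. */
    static ChmIndexEntrySketch read(ByteBuffer in) {
        int nameLength = in.get() & 0xFF;
        byte[] nameBytes = new byte[nameLength];
        in.get(nameBytes);
        String name = new String(nameBytes, StandardCharsets.UTF_8);
        long chunk = readEncInt(in);
        return new ChmIndexEntrySketch(name, chunk);
    }

    public static void main(String[] args) {
        // Name "/ab" followed by the two-byte ENCINT EA 15.
        byte[] raw = {0x03, '/', 'a', 'b', (byte) 0xEA, 0x15};
        ChmIndexEntrySketch entry = read(ByteBuffer.wrap(raw));
        System.out.println(entry.name + " -> chunk " + entry.listingChunk);
    }
}
```

A ByteBuffer is used so the read position advances naturally from one field to the next; malformed input surfaces as a BufferUnderflowException rather than being checked explicitly.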
D
- data - Variable in class org.apache.tika.parser.mp3.ID3v2Frame.RawTag
- DataURIScheme - Class in org.apache.tika.parser.utils
- DataURISchemeParseException - Exception in org.apache.tika.parser.utils
- DataURISchemeParseException(String) - Constructor for exception org.apache.tika.parser.utils.DataURISchemeParseException
- DataURISchemeUtil - Class in org.apache.tika.parser.utils
-
Not thread safe.
- DataURISchemeUtil() - Constructor for class org.apache.tika.parser.utils.DataURISchemeUtil
- DATE - Static variable in interface org.apache.tika.parser.ner.NERecogniser
- DATE_FILE - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- DBFParser - Class in org.apache.tika.parser.dbf
-
This is a Tika wrapper around the DBFReader.
- DBFParser() - Constructor for class org.apache.tika.parser.dbf.DBFParser
- DcXMLParser - Class in org.apache.tika.parser.xml
-
Dublin Core metadata parser
- DcXMLParser() - Constructor for class org.apache.tika.parser.xml.DcXMLParser
- decompressConcatenated(Metadata) - Method in interface org.apache.tika.parser.pkg.CompressorParserOptions
- DEF_MODEL - Static variable in class org.apache.tika.parser.sentiment.SentimentAnalysisParser
- DEFAULT_CHARSET - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- DEFAULT_MODEL_PATH - Static variable in class org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser
-
default Model path
- DEFAULT_MODELS - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- DEFAULT_NER_IMPL - Static variable in class org.apache.tika.parser.ner.NamedEntityParser
- DefaultHtmlMapper - Class in org.apache.tika.parser.html
-
The default HTML mapping rules in Tika.
- DefaultHtmlMapper() - Constructor for class org.apache.tika.parser.html.DefaultHtmlMapper
- DELETE - org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.EditType
- DELIMITER_PROPERTY - Static variable in class org.apache.tika.parser.csv.TextAndCSVParser
- detect() - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Return the charset that best matches the supplied input data.
- detect(InputStream, Metadata) - Method in class org.apache.tika.parser.html.charsetdetector.StandardHtmlEncodingDetector
- detect(InputStream, Metadata) - Method in class org.apache.tika.parser.html.HtmlEncodingDetector
- detect(InputStream, Metadata) - Method in class org.apache.tika.parser.microsoft.POIFSContainerDetector
- detect(InputStream, Metadata) - Method in class org.apache.tika.parser.pkg.StreamingZipContainerDetector
- detect(InputStream, Metadata) - Method in class org.apache.tika.parser.pkg.ZipContainerDetector
- detect(InputStream, Metadata) - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
- detect(InputStream, Metadata) - Method in class org.apache.tika.parser.txt.UniversalEncodingDetector
- detect(Set<String>) - Static method in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Deprecated. Use POIFSContainerDetector.detect(Set, DirectoryEntry) and pass the root entry of the filesystem whose type is to be detected, as a second argument.
- detect(Set<String>, DirectoryEntry) - Static method in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Internal detection of the specific kind of OLE2 document, based on the names of the top-level streams within the file.
- detect(ZipFile) - Static method in enum org.apache.tika.parser.iwork.iwana.IWork13PackageParser.IWork13DocumentType
- detectAll() - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Return an array of all charsets that appear to be plausible matches with the input data.
- detectIfPossible(ZipEntry) - Static method in enum org.apache.tika.parser.iwork.iwana.IWork13PackageParser.IWork13DocumentType
- detectOfficeOpenXML(OPCPackage) - Static method in class org.apache.tika.parser.pkg.ZipContainerDetector
-
Detects the type of an OfficeOpenXML (OOXML) file from an opened Package
- detectType(InputStream) - Static method in enum org.apache.tika.parser.iwork.IWorkPackageParser.IWORKDocumentType
- detectType(ZipArchiveEntry, ZipArchiveInputStream) - Static method in enum org.apache.tika.parser.iwork.IWorkPackageParser.IWORKDocumentType
- detectType(ZipArchiveEntry, ZipFile) - Static method in enum org.apache.tika.parser.iwork.IWorkPackageParser.IWORKDocumentType
- detectType(DirectoryEntry) - Static method in enum org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- detectType(POIFSFileSystem) - Static method in enum org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- DIFContentHandler - Class in org.apache.tika.parser.dif
- DIFContentHandler(ContentHandler, Metadata) - Constructor for class org.apache.tika.parser.dif.DIFContentHandler
- DIFParser - Class in org.apache.tika.parser.dif
- DIFParser() - Constructor for class org.apache.tika.parser.dif.DIFParser
- DirectFileReadDataSource - Class in org.apache.tika.parser.mp4
-
A DataSource implementation that relies on direct reads from a RandomAccessFile.
- DirectFileReadDataSource(File) - Constructor for class org.apache.tika.parser.mp4.DirectFileReadDataSource
- DirectoryListingEntry - Class in org.apache.tika.parser.chm.accessor
-
The format of a directory listing entry is as follows: BYTE: length of name; BYTEs: name (UTF-8 encoded); ENCINT: content section; ENCINT: offset; ENCINT: length. The offset is from the beginning of the content section the file is in, after the section has been decompressed (if appropriate).
- DirectoryListingEntry() - Constructor for class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
- DirectoryListingEntry(int, String, ChmCommons.EntryType, int, int) - Constructor for class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
-
Constructs a DirectoryListingEntry.
- DISCOVERY_TECNIQUE - org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
- DOC - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Microsoft Word
- doubleByte - Variable in class org.apache.tika.parser.mp3.ID3v2Frame.TextEncoding
- DRAW_NS - Static variable in class org.apache.tika.parser.odf.OpenDocumentContentParser
- drawingHyperlinks - Variable in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- DWGParser - Class in org.apache.tika.parser.dwg
-
DWG (CAD Drawing) parser.
- DWGParser() - Constructor for class org.apache.tika.parser.dwg.DWGParser
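The directory listing entry layout given for DirectoryListingEntry above (a length byte, a UTF-8 name, then three ENCINT fields) can be sketched as a small reader. This is a minimal illustration, not Tika's implementation: the class DirectoryEntrySketch and its Entry holder are hypothetical names, and the ENCINT decoding assumes the commonly documented CHM encoding (big-endian, 7 payload bits per byte, high bit set when more bytes follow).

```java
import java.nio.charset.StandardCharsets;

public class DirectoryEntrySketch {

    /** Hypothetical holder for the decoded fields; not Tika's DirectoryListingEntry. */
    public static final class Entry {
        public final String name;
        public final long section;
        public final long offset;
        public final long length;

        Entry(String name, long section, long offset, long length) {
            this.name = name;
            this.section = section;
            this.offset = offset;
            this.length = length;
        }
    }

    private final byte[] data;
    private int pos;

    public DirectoryEntrySketch(byte[] data) {
        this.data = data;
    }

    /**
     * Reads one ENCINT. Assumption: 7 bits per byte, most significant
     * group first, high bit set on every byte except the last.
     */
    public long readEncInt() {
        long value = 0;
        int b;
        do {
            b = data[pos++] & 0xFF;
            value = (value << 7) | (b & 0x7F);
        } while ((b & 0x80) != 0);
        return value;
    }

    /** Reads one entry: BYTE name length, name bytes, then three ENCINTs. */
    public Entry readEntry() {
        int nameLength = data[pos++] & 0xFF;
        String name = new String(data, pos, nameLength, StandardCharsets.UTF_8);
        pos += nameLength;
        long section = readEncInt();
        long offset = readEncInt();
        long length = readEncInt();
        return new Entry(name, section, offset, length);
    }

    public static void main(String[] args) {
        // Name "/a", section 0, offset encoded in two bytes, length 5.
        byte[] raw = {2, '/', 'a', 0x00, (byte) 0x81, 0x00, 0x05};
        Entry e = new DirectoryEntrySketch(raw).readEntry();
        System.out.println(e.name + " section=" + e.section
                + " offset=" + e.offset + " length=" + e.length);
    }
}
```

The reader keeps a single cursor so that the variable-width ENCINT fields can be consumed in order without knowing their sizes in advance.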
E
- ElementMetadataHandler - Class in org.apache.tika.parser.xml
-
SAX event handler that maps the contents of an XML element into a metadata field.
- ElementMetadataHandler(String, String, Metadata, String) - Constructor for class org.apache.tika.parser.xml.ElementMetadataHandler
-
Constructor for string metadata keys.
- ElementMetadataHandler(String, String, Metadata, String, boolean, boolean) - Constructor for class org.apache.tika.parser.xml.ElementMetadataHandler
-
Constructor for string metadata keys which allows change of behavior for duplicate and empty entry values.
- ElementMetadataHandler(String, String, Metadata, Property) - Constructor for class org.apache.tika.parser.xml.ElementMetadataHandler
-
Constructor for Property metadata keys.
- ElementMetadataHandler(String, String, Metadata, Property, boolean, boolean) - Constructor for class org.apache.tika.parser.xml.ElementMetadataHandler
-
Constructor for Property metadata keys which allows change of behavior for duplicate and empty entry values.
- EMBEDDED_RELATIONSHIPS - Static variable in class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
- embeddedOLERef(String) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- embeddedOLERef(String) - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- embeddedPicRef(String, String) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- embeddedPicRef(String, String) - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- EMFParser - Class in org.apache.tika.parser.microsoft
-
Extracts files embedded in EMF and offers a very rough capability to extract text if there is text stored in the EMF.
- EMFParser() - Constructor for class org.apache.tika.parser.microsoft.EMFParser
- EMPTY_LIST - Static variable in class org.apache.tika.parser.microsoft.ooxml.XWPFListManager
-
Empty singleton to be used when there is no list manager.
- EMPTY_STYLES - Static variable in class org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFStylesShim
-
Empty singleton to be used when there is no style info
- enableInputFilter(boolean) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Enable filtering of input text.
- encoding - Variable in class org.apache.tika.parser.mp3.ID3v2Frame.TextEncoding
- encodings - Static variable in class org.apache.tika.parser.mp3.ID3v2Frame
- ENCRYPTED - org.apache.tika.parser.iwork.IWorkPackageParser.IWORKDocumentType
- ENCRYPTED - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- END - org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
- endBookmark(String) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- endBookmark(String) - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- endDocument() - Method in class org.apache.tika.parser.ctakes.CTAKESContentHandler
- endDocument() - Method in class org.apache.tika.parser.dif.DIFContentHandler
- endDocument() - Method in class org.apache.tika.parser.html.BoilerpipeContentHandler
- endDocument() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
- endDocument() - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
- endDocument() - Method in class org.apache.tika.parser.xliff.XLIFF12ContentHandler
- endEditedSection() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- endEditedSection() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- endElement(String, String, String) - Method in class org.apache.tika.parser.dif.DIFContentHandler
- endElement(String, String, String) - Method in class org.apache.tika.parser.html.BoilerpipeContentHandler
- endElement(String, String, String) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
- endElement(String, String, String) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
- endElement(String, String, String) - Method in class org.apache.tika.parser.odf.NSNormalizerContentHandler
- endElement(String, String, String) - Method in class org.apache.tika.parser.xliff.XLIFF12ContentHandler
- endElement(String, String, String) - Method in class org.apache.tika.parser.xml.AttributeDependantMetadataHandler
- endElement(String, String, String) - Method in class org.apache.tika.parser.xml.ElementMetadataHandler
- endElement(String, String, String) - Method in class org.apache.tika.parser.xml.MetadataHandler
-
Deprecated.
- ENDIAN - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- endnoteReference(String) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- endnoteReference(String) - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- endParagraph() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- endParagraph() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- endPrefixMapping(String) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
- endPrefixMapping(String) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
- endRow(int) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.SheetTextAsHTML
- endSDT() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- endSDT() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- endTable() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- endTable() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- endTableCell() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- endTableCell() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- endTableRow() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- endTableRow() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- ensureFormattingState(XHTMLContentHandler, EnumSet<FormattingUtils.Tag>, Deque<FormattingUtils.Tag>) - Static method in class org.apache.tika.parser.microsoft.FormattingUtils
-
Closes all tags until currentState contains only tags from the desired set, then opens all required tags to reach the desired state.
- ensureSkip(long) - Method in class org.apache.tika.parser.hwp.HwpStreamReader
-
Ensures that n bytes are skipped.
- ENTITY_TYPES - Static variable in class org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser
- ENTITY_TYPES - Static variable in class org.apache.tika.parser.ner.grobid.GrobidNERecogniser
- ENTITY_TYPES - Static variable in class org.apache.tika.parser.ner.mitie.MITIENERecogniser
- ENTITY_TYPES - Static variable in class org.apache.tika.parser.ner.nltk.NLTKNERecogniser
-
Some common entities identified by NLTK.
- entityTypes - Variable in class org.apache.tika.parser.ner.regex.RegexNERecogniser
- enumerateChm() - Method in class org.apache.tika.parser.chm.core.ChmExtractor
-
Enumerates chm entities
- ENVI_MIME_TYPE - Static variable in class org.apache.tika.parser.envi.EnviHeaderParser
- EnviHeaderParser - Class in org.apache.tika.parser.envi
- EnviHeaderParser() - Constructor for class org.apache.tika.parser.envi.EnviHeaderParser
- EnviHeaderParser(EncodingDetector) - Constructor for class org.apache.tika.parser.envi.EnviHeaderParser
- EpubContentParser - Class in org.apache.tika.parser.epub
-
Parser for EPUB OPS *.html files.
- EpubContentParser() - Constructor for class org.apache.tika.parser.epub.EpubContentParser
- EpubParser - Class in org.apache.tika.parser.epub
-
Epub parser
- EpubParser() - Constructor for class org.apache.tika.parser.epub.EpubParser
- equals(Object) - Method in class org.apache.tika.parser.csv.CSVResult
- equals(Object) - Method in class org.apache.tika.parser.pdf.AccessChecker
- equals(Object) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- equals(Object) - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Compares this CharsetMatch to another based on confidence value.
- equals(Object) - Method in class org.apache.tika.parser.utils.DataURIScheme
- ExcelExtractor - Class in org.apache.tika.parser.microsoft
-
Excel parser implementation which uses POI's Event API to handle the contents of a Workbook.
- ExcelExtractor(ParseContext, Metadata) - Constructor for class org.apache.tika.parser.microsoft.ExcelExtractor
- ExecutableParser - Class in org.apache.tika.parser.executable
-
Parser for executable files.
- ExecutableParser() - Constructor for class org.apache.tika.parser.executable.ExecutableParser
- EXTENSION_TAG_EXIF - Static variable in class org.apache.tika.parser.image.BPGParser
- EXTENSION_TAG_ICC_PROFILE - Static variable in class org.apache.tika.parser.image.BPGParser
- EXTENSION_TAG_THUMBNAIL - Static variable in class org.apache.tika.parser.image.BPGParser
- EXTENSION_TAG_XMP - Static variable in class org.apache.tika.parser.image.BPGParser
- EXTRA_BITS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- extract(InputStream, Metadata, XHTMLContentHandler) - Method in class org.apache.tika.parser.hwp.HwpTextExtractorV5
-
Extracts text from an HWP stream.
- extract(String) - Method in class org.apache.tika.parser.utils.DataURISchemeUtil
-
Extracts DataURISchemes from free text, as in JavaScript.
- extract(Metadata) - Method in class org.apache.tika.parser.microsoft.ooxml.MetadataExtractor
- extractChmEntry(DirectoryListingEntry) - Method in class org.apache.tika.parser.chm.core.ChmExtractor
-
Decompresses a chm entry
- extractDublinCore(XMPMetadata, Metadata) - Static method in class org.apache.tika.parser.image.xmp.JempboxExtractor
-
Tries to extract Dublin Core schema from XMP.
- extractGenre(String) - Static method in class org.apache.tika.parser.mp3.ID3v22Handler
- extractHeaderFooter(String, XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFBExcelExtractorDecorator
- extractHeaderFooter(String, XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- extractHyperLinks(PackagePart, XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- extractMacros(POIFSFileSystem, ContentHandler, EmbeddedDocumentExtractor) - Static method in class org.apache.tika.parser.microsoft.OfficeParser
-
Helper to extract macros from an NPOIFS/vbaProject.bin. As of POI-3.15-final, there are still some bugs in VBAMacroReader.
- extractor - Variable in class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
- extractXMPMM(XMPMetadata, Metadata) - Static method in class org.apache.tika.parser.image.xmp.JempboxExtractor
-
Extracts Media Management metadata from XMP.
F
- FeedParser - Class in org.apache.tika.parser.feed
-
Feed parser.
- FeedParser() - Constructor for class org.apache.tika.parser.feed.FeedParser
- FictionBookParser - Class in org.apache.tika.parser.xml
- FictionBookParser() - Constructor for class org.apache.tika.parser.xml.FictionBookParser
- FileConfig - Class in org.apache.tika.parser.strings
-
Configuration for the "file" (or file-alternative) command.
- FileConfig() - Constructor for class org.apache.tika.parser.strings.FileConfig
-
Default constructor.
- findIconType(byte[]) - Static method in class org.apache.tika.parser.image.ICNSType
- findMatches(String, Pattern) - Method in class org.apache.tika.parser.ner.regex.RegexNERecogniser
-
Finds matching subgroups in text.
- findNames(String[]) - Method in class org.apache.tika.parser.ner.opennlp.OpenNLPNameFinder
-
Finds names in the given array of tokens.
- flag - Variable in class org.apache.tika.parser.mp3.ID3v2Frame.RawTag
- FLVParser - Class in org.apache.tika.parser.video
-
Parser for metadata contained in Flash Videos (.flv).
- FLVParser() - Constructor for class org.apache.tika.parser.video.FLVParser
- footers - Variable in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.SheetTextAsHTML
- footnoteReference(String) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- footnoteReference(String) - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- format(Object, StringBuffer, FieldPosition) - Method in class org.apache.tika.parser.microsoft.TikaExcelGeneralFormat
- formatter - Variable in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- FORMATTING_OBJECTS_NS - Static variable in class org.apache.tika.parser.odf.OpenDocumentContentParser
- FormattingUtils - Class in org.apache.tika.parser.microsoft
- FormattingUtils.Tag - Enum in org.apache.tika.parser.microsoft
G
- GDALParser - Class in org.apache.tika.parser.gdal
-
Wraps execution of the Geospatial Data Abstraction Library (GDAL) gdalinfo tool used to extract geospatial information out of hundreds of geo file formats.
- GDALParser() - Constructor for class org.apache.tika.parser.gdal.GDALParser
- GENERAL_EMBEDDED - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
General embedded document type within an OLE2 container
- GENERIC - org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
- GENRES - Static variable in interface org.apache.tika.parser.mp3.ID3Tags
-
List of predefined genres.
- GeoGazetteerClient - Class in org.apache.tika.parser.geo.topic.gazetteer
- GeoGazetteerClient(String) - Constructor for class org.apache.tika.parser.geo.topic.gazetteer.GeoGazetteerClient
-
Pass URL on which lucene-geo-gazetteer is available - eg.
- GeoGazetteerClient(GeoParserConfig) - Constructor for class org.apache.tika.parser.geo.topic.gazetteer.GeoGazetteerClient
- GeographicInformationParser - Class in org.apache.tika.parser.geoinfo
- GeographicInformationParser() - Constructor for class org.apache.tika.parser.geoinfo.GeographicInformationParser
- geoInfoType - Static variable in class org.apache.tika.parser.geoinfo.GeographicInformationParser
- GeoParser - Class in org.apache.tika.parser.geo.topic
- GeoParser() - Constructor for class org.apache.tika.parser.geo.topic.GeoParser
- GeoParserConfig - Class in org.apache.tika.parser.geo.topic
- GeoParserConfig() - Constructor for class org.apache.tika.parser.geo.topic.GeoParserConfig
- GeoTag - Class in org.apache.tika.parser.geo.topic
- GeoTag() - Constructor for class org.apache.tika.parser.geo.topic.GeoTag
- get() - Method in enum org.apache.tika.parser.strings.StringsEncoding
- get7BitsInt(byte[], int) - Static method in class org.apache.tika.parser.mp3.ID3v2Frame
-
AKA a Synchsafe integer.
- getAccessChecker() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getAdmin1Code() - Method in class org.apache.tika.parser.geo.topic.gazetteer.Location
- getAdmin2Code() - Method in class org.apache.tika.parser.geo.topic.gazetteer.Location
- getAeDescriptorPath() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns the path to XML descriptor for AnalysisEngine.
- getAlbum() - Method in class org.apache.tika.parser.mp3.CompositeTagHandler
- getAlbum() - Method in interface org.apache.tika.parser.mp3.ID3Tags
- getAlbum() - Method in class org.apache.tika.parser.mp3.ID3v1Handler
- getAlbum() - Method in class org.apache.tika.parser.mp3.ID3v22Handler
- getAlbum() - Method in class org.apache.tika.parser.mp3.ID3v23Handler
- getAlbum() - Method in class org.apache.tika.parser.mp3.ID3v24Handler
- getAlbumArtist() - Method in class org.apache.tika.parser.mp3.CompositeTagHandler
- getAlbumArtist() - Method in interface org.apache.tika.parser.mp3.ID3Tags
-
The Artist for the overall album / compilation of albums
- getAlbumArtist() - Method in class org.apache.tika.parser.mp3.ID3v1Handler
-
ID3v1 doesn't have album-wide artists, so returns null.
- getAlbumArtist() - Method in class org.apache.tika.parser.mp3.ID3v22Handler
- getAlbumArtist() - Method in class org.apache.tika.parser.mp3.ID3v23Handler
- getAlbumArtist() - Method in class org.apache.tika.parser.mp3.ID3v24Handler
- getAlignedLenTable() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getAlignedTreeTable() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getAllDetectableCharsets() - Static method in class org.apache.tika.parser.txt.CharsetDetector
-
Get the names of all charsets supported by the CharsetDetector class.
- getAllNameEntitiesfromInput(InputStream) - Method in class org.apache.tika.parser.geo.topic.NameEntityExtractor
- getAllTagHandlers(InputStream, ContentHandler) - Static method in class org.apache.tika.parser.mp3.Mp3Parser
-
Scans the MP3 frames for ID3 tags, and creates ID3Tag Handlers for each supported set of tags.
- getAnalysisEngine(String, String, String) - Static method in class org.apache.tika.parser.ctakes.CTAKESUtils
-
Returns a new UIMA Analysis Engine (AE).
- getAnnotationProperty(IdentifiedAnnotation, CTAKESAnnotationProperty) - Static method in class org.apache.tika.parser.ctakes.CTAKESUtils
-
Returns the annotation value based on the given annotation type.
- getAnnotationProps() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns an array of CTAKESAnnotationProperty values that will be included in the cTAKES metadata.
- getAnnotationPropsAsString() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns a string containing a comma-separated list of CTAKESAnnotationProperty names that will be included in the cTAKES metadata.
- getApiUri(Metadata) - Method in class org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
- getApiUri(Metadata) - Method in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- getApiUri(Metadata) - Method in class org.apache.tika.parser.recognition.tf.TensorflowRESTVideoRecogniser
- getApplyRotation() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getArtist() - Method in class org.apache.tika.parser.mp3.CompositeTagHandler
- getArtist() - Method in interface org.apache.tika.parser.mp3.ID3Tags
-
The Artist for the track
- getArtist() - Method in class org.apache.tika.parser.mp3.ID3v1Handler
- getArtist() - Method in class org.apache.tika.parser.mp3.ID3v22Handler
- getArtist() - Method in class org.apache.tika.parser.mp3.ID3v23Handler
- getArtist() - Method in class org.apache.tika.parser.mp3.ID3v24Handler
- getAverageCharTolerance() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getBestNameEntity() - Method in class org.apache.tika.parser.geo.topic.NameEntityExtractor
- getBigInteger(int) - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- getBitRate() - Method in class org.apache.tika.parser.mp3.AudioFrame
-
Gets the bit rate in bits per second.
- getBitsPerPixel() - Method in class org.apache.tika.parser.image.ICNSType
- getBlock_len() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns block's length
- getBlockAddress() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Returns block addresses
- getBlockCount() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Gets a block count
- getBlockidx_intvl() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns block index interval
- getBlockLen() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Gets a block length
- getBlockLength() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getBlockNext() - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- getBlockNumber() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxBlock
- getBlockPrev() - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- getBlockRemaining() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getBlockType() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getByte() - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- getCatchIntermediateIOExceptions() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getCenter() - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.HeaderFooterFromString
- getChannels() - Method in class org.apache.tika.parser.mp3.AudioFrame
-
Get the number of channels (1=mono, 2=stereo)
- getCharset() - Method in class org.apache.tika.parser.csv.CSVParams
- getChmBlockInfoInstance(DirectoryListingEntry, int, ChmLzxcControlData) - Static method in class org.apache.tika.parser.chm.lzx.ChmBlockInfo
-
Deprecated.
- getChmBlockInfoInstance(DirectoryListingEntry, int, ChmLzxcControlData, ChmBlockInfo) - Static method in class org.apache.tika.parser.chm.lzx.ChmBlockInfo
- getChmBlockSegment(byte[], ChmLzxcResetTable, int, int, int) - Static method in class org.apache.tika.parser.chm.core.ChmCommons
- getChmDirList() - Method in class org.apache.tika.parser.chm.core.ChmExtractor
- getChmDirList() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getChmItsfHeader() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getChmItspHeader() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getChmLzxcControlData() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getChmLzxcResetTable() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getClassName() - Method in enum org.apache.tika.parser.ctakes.CTAKESSerializer
- getColorspace() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getCommand() - Method in class org.apache.tika.parser.gdal.GDALParser
- getComment(byte[], int, int) - Static method in class org.apache.tika.parser.mp3.ID3v2Frame
-
Builds up the ID3 comment, by parsing and extracting the comment string parts from the given data.
- getComments() - Method in class org.apache.tika.parser.mp3.CompositeTagHandler
- getComments() - Method in interface org.apache.tika.parser.mp3.ID3Tags
-
Retrieves the comments, if any.
- getComments() - Method in class org.apache.tika.parser.mp3.ID3v1Handler
- getComments() - Method in class org.apache.tika.parser.mp3.ID3v22Handler
- getComments() - Method in class org.apache.tika.parser.mp3.ID3v23Handler
- getComments() - Method in class org.apache.tika.parser.mp3.ID3v24Handler
- getCompilation() - Method in class org.apache.tika.parser.mp3.CompositeTagHandler
- getCompilation() - Method in interface org.apache.tika.parser.mp3.ID3Tags
- getCompilation() - Method in class org.apache.tika.parser.mp3.ID3v1Handler
-
ID3v1 doesn't have compilations, so returns null.
- getCompilation() - Method in class org.apache.tika.parser.mp3.ID3v22Handler
-
ID3v22 doesn't have compilations, so returns null.
- getCompilation() - Method in class org.apache.tika.parser.mp3.ID3v23Handler
- getCompilation() - Method in class org.apache.tika.parser.mp3.ID3v24Handler
- getComposer() - Method in class org.apache.tika.parser.mp3.CompositeTagHandler
- getComposer() - Method in interface org.apache.tika.parser.mp3.ID3Tags
- getComposer() - Method in class org.apache.tika.parser.mp3.ID3v1Handler
-
ID3v1 doesn't have composers, so returns null.
- getComposer() - Method in class org.apache.tika.parser.mp3.ID3v22Handler
- getComposer() - Method in class org.apache.tika.parser.mp3.ID3v23Handler
- getComposer() - Method in class org.apache.tika.parser.mp3.ID3v24Handler
- getCompressedLen() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Gets compressed length
- getConcatenatePhoneticRuns() - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
- getConfidence() - Method in class org.apache.tika.parser.csv.CSVResult
- getConfidence() - Method in class org.apache.tika.parser.recognition.RecognisedObject
- getConfidence() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Get an indication of the confidence in the charset detected.
- getContent() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxBlock
- getContent(int) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxBlock
- getContent(int, int) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxBlock
- getContentHandler(ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.dif.DIFParser
- getContentHandler(ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.xml.AbstractXML2003Parser
- getContentHandler(ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.xml.SpreadsheetMLParser
- getContentHandler(ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.xml.WordMLParser
- getContentHandler(ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.odf.OpenDocumentMetaParser
- getContentHandler(ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.xml.DcXMLParser
- getContentHandler(ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.xml.FictionBookParser
- getContentHandler(ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.xml.XMLParser
- getContentLength() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxBlock
- getContentParser() - Method in class org.apache.tika.parser.epub.EpubParser
- getContentParser() - Method in class org.apache.tika.parser.odf.OpenDocumentParser
- getControlDataIndex() - Method in class org.apache.tika.parser.chm.accessor.ChmDirectoryListingSet
-
Returns the index at which the control data is located in the list.
- getCoreProperties() - Method in class org.apache.tika.parser.microsoft.ooxml.xps.XPSTextExtractor
- getCoreProperties() - Method in class org.apache.tika.parser.microsoft.ooxml.xslf.XSLFEventBasedPowerPointExtractor
- getCoreProperties() - Method in class org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFEventBasedWordExtractor
- getCountryCode() - Method in class org.apache.tika.parser.geo.topic.gazetteer.Location
- getCustomProperties() - Method in class org.apache.tika.parser.microsoft.ooxml.xps.XPSTextExtractor
- getCustomProperties() - Method in class org.apache.tika.parser.microsoft.ooxml.xslf.XSLFEventBasedPowerPointExtractor
- getCustomProperties() - Method in class org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFEventBasedWordExtractor
- getData() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getData() - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- getData() - Method in class org.apache.tika.parser.mp3.ID3v2Frame
- getDataOffset() - Method in class org.apache.tika.parser.chm.accessor.ChmDirectoryListingSet
-
Returns data offset
- getDataOffset() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns data offset
- getDecorationName() - Method in class org.apache.tika.parser.ctakes.CTAKESParser
- getDefaultConfig() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getDelimiter() - Method in class org.apache.tika.parser.csv.CSVParams
- getDelimiter() - Method in class org.apache.tika.parser.csv.CSVResult
- getDensity() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getDepth() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getDescription() - Method in class org.apache.tika.parser.mp3.ID3Tags.ID3Comment
-
Gets the description, if present
- getDetectableCharsets() - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Deprecated. This API is ICU internal only.
- getDetectAngles() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getDir_uuid() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns directory uuid
- getDirectoryListingEntryList() - Method in class org.apache.tika.parser.chm.accessor.ChmDirectoryListingSet
-
Returns chm directory listing entry list
- getDirLen() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns directory length
- getDirOffset() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns directory offset
- getDisc() - Method in class org.apache.tika.parser.mp3.CompositeTagHandler
- getDisc() - Method in interface org.apache.tika.parser.mp3.ID3Tags
-
The number of the disc this belongs to, within the set
- getDisc() - Method in class org.apache.tika.parser.mp3.ID3v1Handler
-
ID3v1 doesn't have disc numbers, so returns null.
- getDisc() - Method in class org.apache.tika.parser.mp3.ID3v22Handler
- getDisc() - Method in class org.apache.tika.parser.mp3.ID3v23Handler
- getDisc() - Method in class org.apache.tika.parser.mp3.ID3v24Handler
- getDocument() - Method in class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
- getDocument() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLExtractor
-
Returns the opened document.
- getDocument() - Method in class org.apache.tika.parser.microsoft.ooxml.xps.XPSExtractorDecorator
- getDuration() - Method in class org.apache.tika.parser.mp3.AudioFrame
-
Returns the duration in milliseconds.
- getEnableAutoSpace() - Method in class org.apache.tika.parser.pdf.PDFParser
-
Deprecated.
- getEnableAutoSpace() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getEncint() - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- getEncoding() - Method in class org.apache.tika.parser.strings.StringsConfig
-
Returns the character encoding of the strings that are to be found.
- getEndBlock() - Method in class org.apache.tika.parser.chm.lzx.ChmBlockInfo
-
Returns the end block index
- getEndOffset() - Method in class org.apache.tika.parser.chm.lzx.ChmBlockInfo
-
Returns the end offset index
- getEntityTypes() - Method in class org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser
-
Gets set of entity types recognised by this recogniser
- getEntityTypes() - Method in class org.apache.tika.parser.ner.grobid.GrobidNERecogniser
-
Gets set of entity types recognised by this recogniser
- getEntityTypes() - Method in class org.apache.tika.parser.ner.mitie.MITIENERecogniser
-
Gets set of entity types recognised by this recogniser
- getEntityTypes() - Method in interface org.apache.tika.parser.ner.NERecogniser
-
Gets a set of entity types whose names are recognisable by this
- getEntityTypes() - Method in class org.apache.tika.parser.ner.nltk.NLTKNERecogniser
-
Gets set of entity types recognised by this recogniser
- getEntityTypes() - Method in class org.apache.tika.parser.ner.opennlp.OpenNLPNameFinder
- getEntityTypes() - Method in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- getEntityTypes() - Method in class org.apache.tika.parser.ner.regex.RegexNERecogniser
- getEntryType() - Method in class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
-
Returns ChmCommons.EntryType (COMPRESSED or UNCOMPRESSED)
- getExtendedHeader() - Method in class org.apache.tika.parser.mp3.ID3v2Frame
- getExtendedProperties() - Method in class org.apache.tika.parser.microsoft.ooxml.xps.XPSTextExtractor
- getExtendedProperties() - Method in class org.apache.tika.parser.microsoft.ooxml.xslf.XSLFEventBasedPowerPointExtractor
- getExtendedProperties() - Method in class org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFEventBasedWordExtractor
- getExtension() - Method in enum org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- getExtractAcroFormContent() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getExtractActions() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getExtractAllAlternativesFromMSG() - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
- getExtractAllAlternativesFromMSG() - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
- getExtractAnnotationText() - Method in class org.apache.tika.parser.pdf.PDFParser
-
Deprecated.
- getExtractAnnotationText() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getExtractBookmarksText() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getExtractFontNames() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getExtractInlineImages() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getExtractMacros() - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
- getExtractMacros() - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
- getExtractScripts() - Method in class org.apache.tika.parser.html.HtmlParser
- getExtractUniqueInlineImagesOnly() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getFilePath() - Method in class org.apache.tika.parser.strings.FileConfig
-
Returns the "file" installation folder.
- getFileProg() - Static method in class org.apache.tika.parser.strings.StringsParser
- getFilter() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getFlags() - Method in class org.apache.tika.parser.mp3.ID3v2Frame
- getFormattedNumber(BigInteger, int) - Method in class org.apache.tika.parser.microsoft.ooxml.XWPFListManager
- getFormattedNumber(Paragraph) - Method in class org.apache.tika.parser.microsoft.ListManager
-
Get the formatted number for a given paragraph
- getFormattedNumber(XWPFParagraph) - Method in class org.apache.tika.parser.microsoft.ooxml.XWPFListManager
- getFramesRead() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getFreeSpace() - Method in class org.apache.tika.parser.chm.accessor.ChmPmgiHeader
-
Returns pmgi free space
- getFreeSpace() - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- getGazetteerRestEndpoint() - Method in class org.apache.tika.parser.geo.topic.GeoParserConfig
- getGenre() - Method in class org.apache.tika.parser.mp3.CompositeTagHandler
- getGenre() - Method in interface org.apache.tika.parser.mp3.ID3Tags
- getGenre() - Method in class org.apache.tika.parser.mp3.ID3v1Handler
- getGenre() - Method in class org.apache.tika.parser.mp3.ID3v22Handler
- getGenre() - Method in class org.apache.tika.parser.mp3.ID3v23Handler
- getGenre() - Method in class org.apache.tika.parser.mp3.ID3v24Handler
- getHadStarted() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getHeader_len() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns header length
- getHeaderLen() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns itsf header length
- getHeight() - Method in class org.apache.tika.parser.image.ICNSType
- getId() - Method in class org.apache.tika.parser.recognition.RecognisedObject
- getIfXFAExtractOnlyXFA() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getIlvl() - Method in class org.apache.tika.parser.microsoft.ooxml.ParagraphProperties
- getImageMagickPath() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getIncludeDeletedContent() - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
- getIncludeDeletedContent() - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
- getIncludeDeletedText() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- getIncludeDeletedText() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- getIncludeHeadersAndFooters() - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
- getIncludeMissingRows() - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
- getIncludeMoveFromContent() - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
- getIncludeMoveFromContent() - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
- getIncludeMoveFromText() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- getIncludeMoveFromText() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- getIncludeShapeBasedContent() - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
- getIncludeSlideMasterContent() - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
- getIncludeSlideNotes() - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
- getIndex_depth() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns an index depth
- getIndex_head() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns an index head
- getIndex_root() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns index root
- getIndexOfContent() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getIndexOfResetData() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getIndexOfResetTable() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getIniBlock() - Method in class org.apache.tika.parser.chm.lzx.ChmBlockInfo
-
Returns an initial block index
- getInputStream() - Method in class org.apache.tika.parser.utils.DataURIScheme
- getInstance() - Static method in class org.apache.tika.parser.ner.regex.RegexNERecogniser
- getInt(byte[]) - Static method in class org.apache.tika.parser.mp3.ID3v2Frame
- getInt(byte[], int) - Static method in class org.apache.tika.parser.mp3.ID3v2Frame
- getInt2(byte[], int) - Static method in class org.apache.tika.parser.mp3.ID3v2Frame
- getInt3(byte[], int) - Static method in class org.apache.tika.parser.mp3.ID3v2Frame
- getIntelCurrentPossition() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getIntelFileSize() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getIntelState() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getJCas(AnalysisEngine) - Static method in class org.apache.tika.parser.ctakes.CTAKESUtils
-
Returns a new JCas appropriate for the given Analysis Engine.
- getJustFileName(String) - Method in class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
- getLabel() - Method in class org.apache.tika.parser.recognition.RecognisedObject
- getLabelLang() - Method in class org.apache.tika.parser.recognition.RecognisedObject
- getLang_id() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns language id
- getLangId() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns language ID
- getLanguage() - Method in class org.apache.tika.parser.mp3.ID3Tags.ID3Comment
-
Gets the language, if present
- getLanguage() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getLanguage() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Get the ISO code for the language of the detected charset.
- getLanguage(long) - Static method in class org.apache.tika.parser.chm.core.ChmCommons
-
Returns textual representation of LangID
- getLastModified() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns last modified date of the chm file
- getLatitude() - Method in class org.apache.tika.parser.geo.topic.gazetteer.Location
- getLayer() - Method in class org.apache.tika.parser.mp3.AudioFrame
-
Get the audio layer code.
- getLeft() - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- getLeft() - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.HeaderFooterFromString
- getLength() - Method in class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
- getLength() - Method in class org.apache.tika.parser.mp3.AudioFrame
-
Returns the frame length in bytes.
- getLength() - Method in class org.apache.tika.parser.mp3.ID3v2Frame
- getLengthTreeLengtsTable() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getLengthTreeTable() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getLocations(List<String>) - Method in class org.apache.tika.parser.geo.topic.gazetteer.GeoGazetteerClient
-
Calls API of lucene-geo-gazetteer to search location name in gazetteer.
- getLongitude() - Method in class org.apache.tika.parser.geo.topic.gazetteer.Location
- getLzxBlockLength() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getLzxBlockOffset() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getLzxBlocksCache() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getMainDocumentParts() - Method in class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
-
Return a list of the main parts of the document, used when searching for embedded resources.
- getMainDocumentParts() - Method in class org.apache.tika.parser.microsoft.ooxml.POIXMLTextExtractorDecorator
- getMainDocumentParts() - Method in class org.apache.tika.parser.microsoft.ooxml.SXSLFPowerPointExtractorDecorator
-
In PowerPoint files, slides have things embedded in them, and slide drawings which have the images
- getMainDocumentParts() - Method in class org.apache.tika.parser.microsoft.ooxml.SXWPFWordExtractorDecorator
-
This returns all items that might contain embedded objects: main document, headers, footers, comments, etc.
- getMainDocumentParts() - Method in class org.apache.tika.parser.microsoft.ooxml.xps.XPSExtractorDecorator
- getMainDocumentParts() - Method in class org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator
-
In PowerPoint files, slides have things embedded in them, and slide drawings which have the images
- getMainDocumentParts() - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
-
In Excel files, sheets have things embedded in them, and sheet drawings which have the images
- getMainDocumentParts() - Method in class org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator
-
Include main body and anything else that can have an attachment/embedded object
- getMainTreeElements() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getMainTreeLengtsTable() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getMainTreeTable() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getMajorVersion() - Method in class org.apache.tika.parser.mp3.ID3v2Frame
- getMarkLimit() - Method in class org.apache.tika.parser.html.charsetdetector.StandardHtmlEncodingDetector
- getMarkLimit() - Method in class org.apache.tika.parser.html.HtmlEncodingDetector
- getMarkLimit() - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
- getMarkLimit() - Method in class org.apache.tika.parser.txt.UniversalEncodingDetector
- getMaxBytesForEmbeddedObject() - Static method in class org.apache.tika.parser.rtf.RTFParser
-
Deprecated.
- getMaxFileSizeToOcr() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getMaxMainMemoryBytes() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
The maximum amount of memory to use when loading a pdf into a PDDocument.
- getMaxXMPMMHistory() - Static method in class org.apache.tika.parser.image.xmp.JempboxExtractor
- getMediaType() - Method in class org.apache.tika.parser.csv.CSVParams
- getMediaType() - Method in class org.apache.tika.parser.csv.CSVResult
- getMediaType() - Method in class org.apache.tika.parser.utils.DataURIScheme
- getMessageClass(String) - Static method in class org.apache.tika.parser.microsoft.OutlookExtractor
- getMetadata() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns an array of metadata whose values will be analyzed using cTAKES.
- getMetadata() - Method in class org.apache.tika.parser.ctakes.CTAKESContentHandler
-
Returns metadata that includes cTAKES annotations.
- getMetadataAsString() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns a string containing a comma-separated list of metadata whose values will be analyzed using cTAKES.
- getMetadataExtractor() - Method in class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
- getMetadataExtractor() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLExtractor
-
POIXMLTextExtractor.getMetadataTextExtractor() not yet supported for OOXML by POI.
- getMetaParser() - Method in class org.apache.tika.parser.epub.EpubParser
- getMetaParser() - Method in class org.apache.tika.parser.odf.OpenDocumentParser
- getMinFileSizeToOcr() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getMinLength() - Method in class org.apache.tika.parser.strings.StringsConfig
-
Returns the minimum sequence length (characters) to print.
- getMinorVersion() - Method in class org.apache.tika.parser.mp3.ID3v2Frame
- getMinSize() - Method in class org.apache.tika.parser.strings.Latin1StringsParser
-
Returns the minimum size of a character sequence to be extracted.
- getMSB() - Method in class org.apache.tika.parser.executable.MachineMetadata.Endian
- getName() - Method in class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
-
Returns an entry name
- getName() - Method in enum org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
- getName() - Method in class org.apache.tika.parser.executable.MachineMetadata.Endian
- getName() - Method in class org.apache.tika.parser.geo.topic.gazetteer.Location
- getName() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Get the name of the detected charset.
- getNameLength() - Method in class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
-
Returns an entry name length
- getNamespace() - Method in enum org.apache.tika.parser.iwork.IWorkPackageParser.IWORKDocumentType
- getNerModelUrl() - Method in class org.apache.tika.parser.geo.topic.GeoParserConfig
- getNum_blocks() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns number of blocks
- getNumberOfLevels() - Method in class org.apache.tika.parser.microsoft.AbstractListManager.ParagraphLevelCounter
- getNumId() - Method in class org.apache.tika.parser.microsoft.ooxml.ParagraphProperties
- getOcrDPI() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Dots per inch used to render the page image for OCR
- getOcrImageFormatName() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
String representation of the image format used to render the page image for OCR (examples: png, tiff, jpeg)
- getOcrImageQuality() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Image quality used to render the page image for OCR.
- getOcrImageScale() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Deprecated. As of Tika 1.23, this is no longer used in rendering page images; use PDFParserConfig.setOcrDPI(int)
- getOcrImageType() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Image type used to render the page image for OCR.
- getOcrStrategy() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getOffset() - Method in class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
- getOtherTesseractConfig() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getOutputStream() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns an OutputStream object used to write the CAS.
- getOutputType() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getPackage() - Method in class org.apache.tika.parser.microsoft.ooxml.xps.XPSTextExtractor
- getPackage() - Method in class org.apache.tika.parser.microsoft.ooxml.xslf.XSLFEventBasedPowerPointExtractor
- getPackage() - Method in class org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFEventBasedWordExtractor
- getPageSegMode() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getPageSeparator() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getPart() - Method in enum org.apache.tika.parser.iwork.IWorkPackageParser.IWORKDocumentType
- getPDFParserConfig() - Method in class org.apache.tika.parser.pdf.PDFParser
- getPreserveInterwordSpacing() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getPrevContent() - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- getR0() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getR1() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getR2() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getReader() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Create a java.io.Reader for reading the Unicode character data corresponding to the original byte data supplied to the Charset detect operation.
- getReader(InputStream, String) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Autodetect the charset of an inputStream, and return a Java Reader to access the converted input data.
- getResetInterval() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Returns reset interval
- getResetTableIndex() - Method in class org.apache.tika.parser.chm.accessor.ChmDirectoryListingSet
-
Return index of reset table
- getResize() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getRight() - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.HeaderFooterFromString
- getSampleRate() - Method in class org.apache.tika.parser.mp3.AudioFrame
-
Get the sampling rate, in Hz
- getSeparatorChar() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns the separator character used for annotation properties.
- getSerializerType() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns the type of cTAKES (UIMA) serializer used to write the CAS.
- getSetKCMS() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getSignature() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns a signature of itsf header
- getSignature() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns a signature of the header
- getSignature() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Returns a signature of control data block
- getSignature() - Method in class org.apache.tika.parser.chm.accessor.ChmPmgiHeader
-
Returns pmgi signature if exists
- getSignature() - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- getSize() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Returns a size of control data
- getSize() - Method in class org.apache.tika.parser.mp3.ID3v2Frame.RawTag
- getSortByPosition() - Method in class org.apache.tika.parser.pdf.PDFParser
-
Deprecated.
- getSortByPosition() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getSpacingTolerance() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getStartBlock() - Method in class org.apache.tika.parser.chm.lzx.ChmBlockInfo
-
Returns the start block index
- getStartIndex() - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- getStartOffset() - Method in class org.apache.tika.parser.chm.lzx.ChmBlockInfo
-
Returns the start offset index
- getState() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxBlock
- getStream_uuid() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns stream uuid
- getString() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Create a Java String from Unicode character data corresponding to the original byte data supplied to the Charset detect operation.
- getString(byte[], int, int) - Static method in class org.apache.tika.parser.mp3.ID3v2Frame
-
Returns the String at the given offset and length.
- getString(byte[], String) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Autodetect the charset of an inputStream, and return a String containing the converted input data.
- getString(int) - Method in class org.apache.tika.parser.txt.CharsetMatch
-
Create a Java String from Unicode character data corresponding to the original byte data supplied to the Charset detect operation.
- getStringsPath() - Method in class org.apache.tika.parser.strings.StringsConfig
-
Returns the "strings" installation folder.
- getStringsProg() - Static method in class org.apache.tika.parser.strings.StringsParser
- getStripMarkup() - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
- getStyleClass() - Method in class org.apache.tika.parser.microsoft.WordExtractor.TagAndStyle
- getStyleID() - Method in class org.apache.tika.parser.microsoft.ooxml.ParagraphProperties
- getStyleName(String) - Method in class org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFStylesShim
- getSuffix(InputStream, int) - Static method in class org.apache.tika.parser.mp3.LyricsHandler
-
Reads and returns the last length bytes from the given stream.
- getSupportedMimes() - Method in class org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
- getSupportedMimes() - Method in interface org.apache.tika.parser.recognition.ObjectRecogniser
-
The mimes supported by this recogniser
- getSupportedMimes() - Method in class org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
- getSupportedMimes() - Method in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.apple.AppleSingleFileParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.asm.ClassParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.audio.AudioParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.audio.MidiParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.chm.ChmParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.code.SourceCodeParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.crypto.Pkcs7Parser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.crypto.TSDParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.csv.TextAndCSVParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.dbf.DBFParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.dif.DIFParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.dwg.DWGParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.envi.EnviHeaderParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.epub.EpubContentParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.epub.EpubParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.executable.ExecutableParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.feed.FeedParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.font.AdobeFontMetricParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.font.TrueTypeParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.gdal.GDALParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.geo.topic.GeoParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.geoinfo.GeographicInformationParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.grib.GribParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.hdf.HDFParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.html.HtmlParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.hwp.HwpV5Parser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.image.BPGParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.image.ICNSParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.image.ImageParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.image.PSDParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.image.TiffParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.image.WebPParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.iptc.IptcAnpaParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.isatab.ISArchiveParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.iwork.iwana.IWork13PackageParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.iwork.IWorkPackageParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.jdbc.SQLite3Parser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.journal.JournalParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.jpeg.JpegParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.mail.RFC822Parser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.mat.MatParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.mbox.MboxParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.mbox.OutlookPSTParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.EMFParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.JackcessParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.MSOwnerFileParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.OfficeParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.OldExcelParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.ooxml.xwpf.ml2006.Word2006MLParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.TNEFParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.WMFParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.xml.SpreadsheetMLParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.microsoft.xml.WordMLParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.mp3.Mp3Parser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.mp4.MP4Parser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.ner.NamedEntityParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.netcdf.NetCDFParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.odf.OpenDocumentContentParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.odf.OpenDocumentParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.pdf.PDFParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.pkg.CompressorParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.pkg.PackageParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.pkg.RarParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.pot.PooledTimeSeriesParser
-
Returns the set of media types supported by this parser when used with the given parse context.
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.prt.PRTParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.recognition.ObjectRecognitionParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.rtf.RTFParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.sas.SAS7BDATParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.sentiment.SentimentAnalysisParser
-
Returns the types supported
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.strings.Latin1StringsParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.strings.StringsParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.txt.TXTParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.video.FLVParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.wordperfect.QuattroProParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.wordperfect.WordPerfectParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.xliff.XLIFF12Parser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.xliff.XLZParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.xml.FictionBookParser
- getSupportedTypes(ParseContext) - Method in class org.apache.tika.parser.xml.XMLParser
- getSuppressDuplicateOverlappingText() - Method in class org.apache.tika.parser.pdf.PDFParser
-
Deprecated.
- getSuppressDuplicateOverlappingText() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- getSwath() - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- getSyncBits(int) - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- getSystem_uuid() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns system uuid
- getTableOffset() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Gets a table offset
- getTag() - Method in class org.apache.tika.parser.microsoft.WordExtractor.TagAndStyle
- getTagsPresent() - Method in class org.apache.tika.parser.mp3.CompositeTagHandler
- getTagsPresent() - Method in interface org.apache.tika.parser.mp3.ID3Tags
-
Does the file contain this kind of tags?
- getTagsPresent() - Method in class org.apache.tika.parser.mp3.ID3v1Handler
- getTagsPresent() - Method in class org.apache.tika.parser.mp3.ID3v22Handler
- getTagsPresent() - Method in class org.apache.tika.parser.mp3.ID3v23Handler
- getTagsPresent() - Method in class org.apache.tika.parser.mp3.ID3v24Handler
- getTagString(byte[], int, int) - Static method in class org.apache.tika.parser.mp3.ID3v2Frame
-
Returns the (possibly null padded) String at the given offset and length.
- getTessdataPath() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getTesseractPath() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getText() - Method in class org.apache.tika.parser.microsoft.ooxml.xps.XPSTextExtractor
- getText() - Method in class org.apache.tika.parser.microsoft.ooxml.xslf.XSLFEventBasedPowerPointExtractor
- getText() - Method in class org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFEventBasedWordExtractor
- getText() - Method in class org.apache.tika.parser.mp3.ID3Tags.ID3Comment
-
Gets the text, if present
- getTextDocument() - Method in class org.apache.tika.parser.html.BoilerpipeContentHandler
-
Retrieves the built TextDocument
- getTimeout() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- getTimeout() - Method in class org.apache.tika.parser.strings.StringsConfig
-
Returns the maximum time (in seconds) to wait for the "strings" command to terminate.
- getTitle() - Method in class org.apache.tika.parser.mp3.CompositeTagHandler
- getTitle() - Method in interface org.apache.tika.parser.mp3.ID3Tags
- getTitle() - Method in class org.apache.tika.parser.mp3.ID3v1Handler
- getTitle() - Method in class org.apache.tika.parser.mp3.ID3v22Handler
- getTitle() - Method in class org.apache.tika.parser.mp3.ID3v23Handler
- getTitle() - Method in class org.apache.tika.parser.mp3.ID3v24Handler
- getTotal() - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- getTrackingMetadata() - Method in class org.apache.tika.parser.mbox.MboxParser
- getTrackNumber() - Method in class org.apache.tika.parser.mp3.CompositeTagHandler
- getTrackNumber() - Method in interface org.apache.tika.parser.mp3.ID3Tags
-
The number of the track within the album / recording
- getTrackNumber() - Method in class org.apache.tika.parser.mp3.ID3v1Handler
- getTrackNumber() - Method in class org.apache.tika.parser.mp3.ID3v22Handler
- getTrackNumber() - Method in class org.apache.tika.parser.mp3.ID3v23Handler
- getTrackNumber() - Method in class org.apache.tika.parser.mp3.ID3v24Handler
- getType() - Method in class org.apache.tika.parser.image.ICNSType
- getType() - Method in enum org.apache.tika.parser.iwork.iwana.IWork13PackageParser.IWork13DocumentType
- getType() - Method in enum org.apache.tika.parser.iwork.IWorkPackageParser.IWORKDocumentType
- getType() - Method in enum org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- getTypeFromVal(int) - Static method in enum org.apache.tika.parser.microsoft.OutlookExtractor.RECIPIENT_TYPE
- getUMLSPass() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns the UMLS password.
- getUMLSUser() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns the UMLS username.
- getUncompressedLen() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Gets uncompressed length
- getUnderline() - Method in class org.apache.tika.parser.microsoft.ooxml.RunProperties
- getUnknown() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Gets unknown
- getUnknown_000c() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns unknown_000c value
- getUnknown_000c() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns 000c unknown bytes
- getUnknown_0024() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns 0024 unknown bytes
- getUnknown_002c() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns 002c unknown bytes
- getUnknown_0044() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns 0044 unknown bytes
- getUnknown_18() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Returns unknown 18 bytes
- getUnknown0008() - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- getUnknownLen() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns unknown length
- getUnknownOffset() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns unknown offset
- getUseSAXDocxExtractor() - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
- getUseSAXDocxExtractor() - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
- getUseSAXPptxExtractor() - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
- getVersion() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Returns itsf header version
- getVersion() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Returns version of itsp header
- getVersion() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Returns a version of control data block
- getVersion() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Returns the version
- getVersion() - Method in class org.apache.tika.parser.mp3.AudioFrame
- getVersionCode() - Method in class org.apache.tika.parser.mp3.AudioFrame
-
Get the version code.
- getWidth() - Method in class org.apache.tika.parser.image.ICNSType
- getWindow() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getWindowPosition() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getWindowSize() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Returns a window size
- getWindowSize() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- getWindowSize(int) - Static method in class org.apache.tika.parser.chm.core.ChmCommons
-
LZX supports window sizes of 2^15 (32 KB) through 2^21 (2 MB). Returns X, i.e. the exponent in 2^X.
- getWindowsPerReset() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Returns windows per reset
- getXHTML(ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
- getXHTML(ContentHandler, Metadata, ParseContext) - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLExtractor
-
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
- getXHTML(ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFBExcelExtractorDecorator
- getXHTML(ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- getYear() - Method in class org.apache.tika.parser.mp3.CompositeTagHandler
- getYear() - Method in interface org.apache.tika.parser.mp3.ID3Tags
- getYear() - Method in class org.apache.tika.parser.mp3.ID3v1Handler
- getYear() - Method in class org.apache.tika.parser.mp3.ID3v22Handler
- getYear() - Method in class org.apache.tika.parser.mp3.ID3v23Handler
- getYear() - Method in class org.apache.tika.parser.mp3.ID3v24Handler
- GRAPH - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- GRIB_MIME_TYPE - Static variable in class org.apache.tika.parser.grib.GribParser
- GribParser - Class in org.apache.tika.parser.grib
- GribParser() - Constructor for class org.apache.tika.parser.grib.GribParser
- GrobidNERecogniser - Class in org.apache.tika.parser.ner.grobid
- GrobidNERecogniser() - Constructor for class org.apache.tika.parser.ner.grobid.GrobidNERecogniser
- GrobidRESTParser - Class in org.apache.tika.parser.journal
- GrobidRESTParser() - Constructor for class org.apache.tika.parser.journal.GrobidRESTParser
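The getWindowSize(int) entry for ChmCommons above notes that LZX window sizes run from 2^15 to 2^21 and that the method returns the exponent X. The following is a minimal sketch of that relationship, not Tika's implementation: the class name LzxWindowSketch, the method name windowBits, taking the window size in bytes, and rejecting out-of-range values with an exception are all assumptions made for illustration.

```java
final class LzxWindowSketch {
    // Bounds taken from the index entry: 2^15 (32 KB) through 2^21 (2 MB).
    static final int MIN_BITS = 15;
    static final int MAX_BITS = 21;

    /**
     * Hypothetical helper: maps a window size in bytes to the exponent X
     * such that 2^X == windowSize, and rejects sizes outside the LZX range.
     */
    static int windowBits(int windowSize) {
        // A valid window size is a positive power of two (exactly one bit set).
        if (windowSize <= 0 || Integer.bitCount(windowSize) != 1) {
            throw new IllegalArgumentException("Not a power of two: " + windowSize);
        }
        // For a power of two, the exponent equals the number of trailing zero bits.
        int bits = Integer.numberOfTrailingZeros(windowSize);
        if (bits < MIN_BITS || bits > MAX_BITS) {
            throw new IllegalArgumentException("Window size out of LZX range: " + windowSize);
        }
        return bits;
    }
}
```

Under these assumptions, LzxWindowSketch.windowBits(32768) yields 15 and LzxWindowSketch.windowBits(2097152) yields 21. The real method may accept a different argument or handle invalid input differently.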
H
- handle(Metadata) - Method in class org.apache.tika.parser.image.ImageMetadataExtractor
-
Copies extracted tags to tika metadata using registered handlers.
- handle(Iterator<Directory>) - Method in class org.apache.tika.parser.image.ImageMetadataExtractor
-
Copies extracted tags to tika metadata using registered handlers.
- handleEmbeddedFile(PackagePart, ContentHandler, String) - Method in class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
-
Handles an embedded file in the document
- handleEntryMetadata(String, Date, Date, Long, XHTMLContentHandler) - Static method in class org.apache.tika.parser.pkg.PackageParser
- handleXMP(InputStream, int, ImageMetadataExtractor) - Method in class org.apache.tika.parser.image.BPGParser
- hashCode() - Method in class org.apache.tika.parser.csv.CSVResult
- hashCode() - Method in class org.apache.tika.parser.pdf.AccessChecker
- hashCode() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- hashCode() - Method in class org.apache.tika.parser.txt.CharsetMatch
-
generates a hashCode based on the confidence value
- hashCode() - Method in class org.apache.tika.parser.utils.DataURIScheme
- hasID3v1() - Method in class org.apache.tika.parser.mp3.LyricsHandler
- hasLyrics() - Method in class org.apache.tika.parser.mp3.LyricsHandler
- hasMask() - Method in class org.apache.tika.parser.image.ICNSType
- hasNext() - Method in class org.apache.tika.parser.mp3.ID3v2Frame.RawTagIterator
- hasRetinaDisplay() - Method in class org.apache.tika.parser.image.ICNSType
- hasSkip(DirectoryListingEntry) - Static method in class org.apache.tika.parser.chm.core.ChmCommons
-
Checks skippable patterns
- hasTesseract(TesseractOCRConfig) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- hasWarned() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- HDFParser - Class in org.apache.tika.parser.hdf
-
Since the NetCDFParser depends on the NetCDF-Java API, we are able to use it to parse HDF files as well.
- HDFParser() - Constructor for class org.apache.tika.parser.hdf.HDFParser
- headerFooter(String, boolean, String) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.SheetTextAsHTML
- HeaderFooterFromString(String) - Constructor for class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.HeaderFooterFromString
- headers - Variable in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.SheetTextAsHTML
- healthUri - Variable in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- hfHelper - Static variable in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
-
Allows access to headers/footers from raw xml strings
- HISTORY_OF - org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
- HOCR - org.apache.tika.parser.ocr.TesseractOCRConfig.OUTPUT_TYPE
- HSLFExtractor - Class in org.apache.tika.parser.microsoft
- HSLFExtractor(ParseContext, Metadata) - Constructor for class org.apache.tika.parser.microsoft.HSLFExtractor
- HtmlEncodingDetector - Class in org.apache.tika.parser.html
-
Character encoding detector for determining the character encoding of a HTML document based on the potential charset parameter found in a Content-Type http-equiv meta tag somewhere near the beginning.
- HtmlEncodingDetector() - Constructor for class org.apache.tika.parser.html.HtmlEncodingDetector
- HtmlMapper - Interface in org.apache.tika.parser.html
-
HTML mapper used to make incoming HTML documents easier to handle by Tika clients.
- HtmlParser - Class in org.apache.tika.parser.html
-
HTML parser.
- HtmlParser() - Constructor for class org.apache.tika.parser.html.HtmlParser
- HtmlParser(EncodingDetector) - Constructor for class org.apache.tika.parser.html.HtmlParser
- HWP - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Hangul Word Processor (Korean)
- HWP_MIME_TYPE - Static variable in class org.apache.tika.parser.hwp.HwpV5Parser
- HwpStreamReader - Class in org.apache.tika.parser.hwp
- HwpStreamReader(InputStream) - Constructor for class org.apache.tika.parser.hwp.HwpStreamReader
- HwpTextExtractorV5 - Class in org.apache.tika.parser.hwp
- HwpTextExtractorV5() - Constructor for class org.apache.tika.parser.hwp.HwpTextExtractorV5
- HwpV5Parser - Class in org.apache.tika.parser.hwp
- HwpV5Parser() - Constructor for class org.apache.tika.parser.hwp.HwpV5Parser
- hyperlinkEnd() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- hyperlinkEnd() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- hyperlinkStart(String) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- hyperlinkStart(String) - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
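The HtmlEncodingDetector entry above describes detection based on the charset parameter of a Content-Type http-equiv meta tag near the start of the document. The idea can be sketched with plain JDK classes as shown below. This is an illustration only, not the Tika class: the name MetaCharsetSketch, the 8192-byte prefix limit, and the regular expression are assumptions, and the sketch only handles the case where http-equiv appears before the content attribute.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MetaCharsetSketch {
    // Assumption: only the first few kilobytes are inspected.
    private static final int PREFIX_LIMIT = 8192;

    // Matches <meta http-equiv="Content-Type" ... charset=NAME ...> and captures NAME.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s+[^>]*http-equiv\\s*=\\s*[\"']?content-type[\"']?"
                    + "[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the charset declared in the meta tag, if one is found and supported. */
    static Optional<Charset> detect(byte[] html) {
        int len = Math.min(html.length, PREFIX_LIMIT);
        // ISO-8859-1 maps every byte to one char, so ASCII markup survives decoding.
        String head = new String(html, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Charset.forName(m.group(1)));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return Optional.empty();
        }
    }
}
```

A caller would pass the leading bytes of the document and fall back to another detector when the result is empty.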
I
- I - org.apache.tika.parser.microsoft.FormattingUtils.Tag
- ICNS_1024x1024_2X_JPEG_PNG_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_128x128_24BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_128x128_2X_JPEG_PNG_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_128x128_8BIT_MASK - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_128x128_JPEG_PNG_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_16x12_1BIT_IMAGE_AND_MASK - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_16x12_4BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_16x12_8BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_16x16_1BIT_IMAGE_AND_MASK - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_16x16_24BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_16x16_2X_JPEG_PNG_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_16x16_4BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_16x16_8BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_16x16_8BIT_MASK - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_16x16_JPEG_PNG_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_256x256_2X_JPEG_PNG_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_256x256_JPEG_PNG_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_32x32_1BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_32x32_1BIT_IMAGE_AND_MASK - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_32x32_24BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_32x32_2X_JPEG_PNG_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_32x32_4BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_32x32_8BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_32x32_8BIT_MASK - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_32x32_JPEG_PNG_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_48x48_1BIT_IMAGE_AND_MASK - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_48x48_24BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_48x48_4BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_48x48_8BIT_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_48x48_8BIT_MASK - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_512x512_JPEG_PNG_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_64x64_JPEG_PNG_IMAGE - Static variable in class org.apache.tika.parser.image.ICNSType
- ICNS_MIME_TYPE - Static variable in class org.apache.tika.parser.image.ICNSParser
- ICNSParser - Class in org.apache.tika.parser.image
-
A basic parser class for Apple ICNS icon files
- ICNSParser() - Constructor for class org.apache.tika.parser.image.ICNSParser
- ICNSType - Class in org.apache.tika.parser.image
-
Holds details on Apple ICNS icons
- Icu4jEncodingDetector - Class in org.apache.tika.parser.txt
- Icu4jEncodingDetector() - Constructor for class org.apache.tika.parser.txt.Icu4jEncodingDetector
- id - Variable in class org.apache.tika.parser.recognition.RecognisedObject
-
Identifier for this object
- id - Variable in class org.apache.tika.parser.rtf.ListDescriptor
- ID - org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
- ID3Comment(String) - Constructor for class org.apache.tika.parser.mp3.ID3Tags.ID3Comment
-
Creates an ID3 v1 style comment tag
- ID3Comment(String, String, String) - Constructor for class org.apache.tika.parser.mp3.ID3Tags.ID3Comment
-
Creates an ID3 v2 style comment tag
- ID3Tags - Interface in org.apache.tika.parser.mp3
-
Interface that defines the common interface for ID3 tag parsers, such as ID3v1 and ID3v2.3.
- ID3Tags.ID3Comment - Class in org.apache.tika.parser.mp3
-
Represents a comment in ID3 (especially ID3 v2), which is made up of several parts
- ID3TagsAndAudio() - Constructor for class org.apache.tika.parser.mp3.Mp3Parser.ID3TagsAndAudio
- ID3v1Handler - Class in org.apache.tika.parser.mp3
-
This is used to parse ID3 Version 1 Tag information from an MP3 file, if available.
- ID3v1Handler(byte[]) - Constructor for class org.apache.tika.parser.mp3.ID3v1Handler
-
Creates from the last 128 bytes of a stream.
- ID3v1Handler(InputStream, ContentHandler) - Constructor for class org.apache.tika.parser.mp3.ID3v1Handler
- ID3v22Handler - Class in org.apache.tika.parser.mp3
-
This is used to parse ID3 Version 2.2 Tag information from an MP3 file, if available.
- ID3v22Handler(ID3v2Frame) - Constructor for class org.apache.tika.parser.mp3.ID3v22Handler
- ID3v23Handler - Class in org.apache.tika.parser.mp3
-
This is used to parse ID3 Version 2.3 Tag information from an MP3 file, if available.
- ID3v23Handler(ID3v2Frame) - Constructor for class org.apache.tika.parser.mp3.ID3v23Handler
- ID3v24Handler - Class in org.apache.tika.parser.mp3
-
This is used to parse ID3 Version 2.4 Tag information from an MP3 file, if available.
- ID3v24Handler(ID3v2Frame) - Constructor for class org.apache.tika.parser.mp3.ID3v24Handler
- ID3v2Frame - Class in org.apache.tika.parser.mp3
-
A frame of ID3v2 data, which is then passed to a handler to be turned into useful data.
- ID3v2Frame.RawTag - Class in org.apache.tika.parser.mp3
- ID3v2Frame.RawTagIterator - Class in org.apache.tika.parser.mp3
-
Iterates over id3v2 raw tags.
- ID3v2Frame.TextEncoding - Class in org.apache.tika.parser.mp3
- IdentityHtmlMapper - Class in org.apache.tika.parser.html
-
Alternative HTML mapping rules that pass the input HTML as-is without any modifications.
- IdentityHtmlMapper() - Constructor for class org.apache.tika.parser.html.IdentityHtmlMapper
- ignorableWhitespace(char[], int, int) - Method in class org.apache.tika.parser.dif.DIFContentHandler
- ignorableWhitespace(char[], int, int) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
- ignorableWhitespace(char[], int, int) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
- ignorableWhitespace(char[], int, int) - Method in class org.apache.tika.parser.xml.ElementMetadataHandler
- ImageMetadataExtractor - Class in org.apache.tika.parser.image
-
Uses the Metadata Extractor library to read EXIF and IPTC image metadata and map to Tika fields.
- ImageMetadataExtractor(Metadata) - Constructor for class org.apache.tika.parser.image.ImageMetadataExtractor
- ImageMetadataExtractor(Metadata, ImageMetadataExtractor.DirectoryHandler...) - Constructor for class org.apache.tika.parser.image.ImageMetadataExtractor
- ImageParser - Class in org.apache.tika.parser.image
- ImageParser() - Constructor for class org.apache.tika.parser.image.ImageParser
- increaseFramesRead() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- incrementLevel(int, AbstractListManager.LevelTuple[]) - Method in class org.apache.tika.parser.microsoft.AbstractListManager.ParagraphLevelCounter
-
Apply this to every numbered paragraph in order.
- indexOf(byte[], byte[]) - Static method in class org.apache.tika.parser.chm.core.ChmCommons
-
Searches some pattern in byte[]
- indexOf(List<DirectoryListingEntry>, String) - Static method in class org.apache.tika.parser.chm.core.ChmCommons
-
Searches for some pattern in the directory listing entry list
- indexOfResetTableBlock(byte[], byte[]) - Static method in class org.apache.tika.parser.chm.core.ChmCommons
-
Returns an index of the reset table
- initialize(URL) - Method in class org.apache.tika.parser.geo.topic.GeoParser
-
Initializes this parser
- initialize(Map<String, Param>) - Method in class org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
- initialize(Map<String, Param>) - Method in class org.apache.tika.parser.jdbc.SQLite3Parser
-
No-op
- initialize(Map<String, Param>) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
-
no-op
- initialize(Map<String, Param>) - Method in class org.apache.tika.parser.pdf.PDFParser
-
This is a no-op.
- initialize(Map<String, Param>) - Method in interface org.apache.tika.parser.recognition.ObjectRecogniser
-
This is the hook for configuring the recogniser
- initialize(Map<String, Param>) - Method in class org.apache.tika.parser.recognition.ObjectRecognitionParser
- initialize(Map<String, Param>) - Method in class org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
- initialize(Map<String, Param>) - Method in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- initialize(Map<String, Param>) - Method in class org.apache.tika.parser.recognition.tf.TensorflowRESTVideoRecogniser
- initialize(Map<String, Param>) - Method in class org.apache.tika.parser.sentiment.SentimentAnalysisParser
- inputFilterEnabled() - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Test whether or not input filtering is enabled.
- INSERT - org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.EditType
- INSTANCE - Static variable in class org.apache.tika.parser.html.DefaultHtmlMapper
- INSTANCE - Static variable in class org.apache.tika.parser.html.IdentityHtmlMapper
- intelE8Decoding() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxBlock
- IptcAnpaParser - Class in org.apache.tika.parser.iptc
-
Parser for IPTC ANPA News Wire Feeds
- IptcAnpaParser() - Constructor for class org.apache.tika.parser.iptc.IptcAnpaParser
- ISArchiveParser - Class in org.apache.tika.parser.isatab
- ISArchiveParser() - Constructor for class org.apache.tika.parser.isatab.ISArchiveParser
-
Default constructor.
- ISArchiveParser(String) - Constructor for class org.apache.tika.parser.isatab.ISArchiveParser
-
Constructor that accepts the pathname of ISArchive folder.
- ISATabUtils - Class in org.apache.tika.parser.isatab
- ISATabUtils() - Constructor for class org.apache.tika.parser.isatab.ISATabUtils
- isAudioHeader(int, int, int, int) - Static method in class org.apache.tika.parser.mp3.AudioFrame
-
Does this appear to be a 4 byte audio frame header?
- isAvailable() - Method in class org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
- isAvailable() - Method in class org.apache.tika.parser.geo.topic.GeoParser
- isAvailable() - Method in class org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser
- isAvailable() - Method in class org.apache.tika.parser.ner.grobid.GrobidNERecogniser
- isAvailable() - Method in class org.apache.tika.parser.ner.mitie.MITIENERecogniser
- isAvailable() - Method in interface org.apache.tika.parser.ner.NERecogniser
-
checks if this Named Entity recogniser is available for service
- isAvailable() - Method in class org.apache.tika.parser.ner.nltk.NLTKNERecogniser
- isAvailable() - Method in class org.apache.tika.parser.ner.opennlp.OpenNLPNameFinder
- isAvailable() - Method in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- isAvailable() - Method in class org.apache.tika.parser.ner.regex.RegexNERecogniser
- isAvailable() - Method in interface org.apache.tika.parser.recognition.ObjectRecogniser
-
Is this service available
- isAvailable() - Method in class org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
- isAvailable() - Method in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- isBase64() - Method in class org.apache.tika.parser.utils.DataURIScheme
- isBold() - Method in class org.apache.tika.parser.microsoft.ooxml.RunProperties
- isCatchIntermediateIOExceptions() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Deprecated.
- isComplete() - Method in class org.apache.tika.parser.csv.CSVParams
- isDiscardElement(String) - Method in class org.apache.tika.parser.html.DefaultHtmlMapper
- isDiscardElement(String) - Method in interface org.apache.tika.parser.html.HtmlMapper
-
Checks whether all content within the given HTML element should be discarded instead of including it in the parse output.
- isDiscardElement(String) - Method in class org.apache.tika.parser.html.HtmlParser
-
Deprecated. Use the HtmlMapper mechanism to customize the HTML mapping. This method will be removed in Tika 1.0.
- isDiscardElement(String) - Method in class org.apache.tika.parser.html.IdentityHtmlMapper
- isEmpty() - Method in class org.apache.tika.parser.csv.CSVParams
- isEmpty(String) - Static method in class org.apache.tika.parser.chm.core.ChmCommons
- isEnableImageProcessing() - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- isHeading() - Method in class org.apache.tika.parser.microsoft.WordExtractor.TagAndStyle
- isIncludeMarkup() - Method in class org.apache.tika.parser.html.BoilerpipeContentHandler
- isItalics() - Method in class org.apache.tika.parser.microsoft.ooxml.RunProperties
- isListenForAllRecords() - Method in class org.apache.tika.parser.microsoft.ExcelExtractor
-
Returns true if this parser is configured to listen for all records instead of just the specified few.
- isMatchingElement(String, String) - Method in class org.apache.tika.parser.xml.ElementMetadataHandler
- isMatchingParentElement(String, String) - Method in class org.apache.tika.parser.xml.ElementMetadataHandler
- isMetadataField(String) - Static method in class org.apache.tika.parser.image.MetadataFields
- isMetadataField(Property) - Static method in class org.apache.tika.parser.image.MetadataFields
- isMimetype() - Method in class org.apache.tika.parser.strings.FileConfig
-
Returns true if the mime option is enabled.
- isMSB() - Method in class org.apache.tika.parser.executable.MachineMetadata.Endian
- isPrettyPrint() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns true if formatted output is enabled, false otherwise.
- isSerialize() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns true if CAS serialization is enabled, false otherwise.
- isStrikeThrough() - Method in class org.apache.tika.parser.microsoft.ooxml.RunProperties
- isStyle - Variable in class org.apache.tika.parser.rtf.ListDescriptor
- isText() - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Returns true if content text analysis is enabled, false otherwise.
- isTracking() - Method in class org.apache.tika.parser.mbox.MboxParser
- isUnordered(int) - Method in class org.apache.tika.parser.rtf.ListDescriptor
- ITSF - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- ITSP - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- IWORK_COMMON_ENTRY - Static variable in class org.apache.tika.parser.iwork.IWorkPackageParser
-
All iWork files contain one of these, so we can detect based on it
- IWORK_CONTENT_ENTRIES - Static variable in class org.apache.tika.parser.iwork.IWorkPackageParser
-
Which files within an iWork file contain the actual content?
- IWORK13_COMMON_ENTRY - Static variable in class org.apache.tika.parser.iwork.iwana.IWork13PackageParser
-
All iWork 13 files contain this, so we can detect based on it
- IWork13PackageParser - Class in org.apache.tika.parser.iwork.iwana
- IWork13PackageParser() - Constructor for class org.apache.tika.parser.iwork.iwana.IWork13PackageParser
- IWork13PackageParser.IWork13DocumentType - Enum in org.apache.tika.parser.iwork.iwana
- IWorkPackageParser - Class in org.apache.tika.parser.iwork
-
A parser for the IWork container files.
- IWorkPackageParser() - Constructor for class org.apache.tika.parser.iwork.IWorkPackageParser
- IWorkPackageParser.IWORKDocumentType - Enum in org.apache.tika.parser.iwork
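The ID3v1Handler(byte[]) entry above says the handler is created from the last 128 bytes of a stream. The sketch below illustrates reading such a block using the ID3v1 layout ("TAG" marker, then 30-byte title, artist and album fields and a 4-byte year). It is not the Tika handler: the class Id3v1Sketch, its field names, the ISO-8859-1 decoding, and the null-padding handling are assumptions for illustration, and the comment and genre fields are omitted for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class Id3v1Sketch {
    static final int TAG_SIZE = 128;

    final String title;
    final String artist;
    final String album;
    final String year;

    private Id3v1Sketch(String title, String artist, String album, String year) {
        this.title = title;
        this.artist = artist;
        this.album = album;
        this.year = year;
    }

    /** Parses the final 128 bytes of the buffer; empty if no "TAG" marker is present. */
    static Optional<Id3v1Sketch> parse(byte[] tail) {
        if (tail == null || tail.length < TAG_SIZE) {
            return Optional.empty();
        }
        int base = tail.length - TAG_SIZE;
        if (tail[base] != 'T' || tail[base + 1] != 'A' || tail[base + 2] != 'G') {
            return Optional.empty();
        }
        return Optional.of(new Id3v1Sketch(
                field(tail, base + 3, 30),    // title
                field(tail, base + 33, 30),   // artist
                field(tail, base + 63, 30),   // album
                field(tail, base + 93, 4)));  // year
    }

    /** Reads a fixed-width field, stopping at the first null byte and trimming spaces. */
    static String field(byte[] buf, int offset, int length) {
        int end = offset;
        int limit = offset + length;
        while (end < limit && buf[end] != 0) {
            end++;
        }
        return new String(buf, offset, end - offset, StandardCharsets.ISO_8859_1).trim();
    }
}
```

The field helper also hints at what the getTagString(byte[], int, int) entry describes as a possibly null padded String, although the real method may treat encodings and padding differently.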
J
- JackcessParser - Class in org.apache.tika.parser.microsoft
-
Parser that handles Microsoft Access files via Jackcess
- JackcessParser() - Constructor for class org.apache.tika.parser.microsoft.JackcessParser
- JempboxExtractor - Class in org.apache.tika.parser.image.xmp
- JempboxExtractor(Metadata) - Constructor for class org.apache.tika.parser.image.xmp.JempboxExtractor
- joinCreators(List<String>) - Static method in class org.apache.tika.parser.image.xmp.JempboxExtractor
- JournalParser - Class in org.apache.tika.parser.journal
- JournalParser() - Constructor for class org.apache.tika.parser.journal.JournalParser
- JpegParser - Class in org.apache.tika.parser.jpeg
- JpegParser() - Constructor for class org.apache.tika.parser.jpeg.JpegParser
K
- KEYNOTE - org.apache.tika.parser.iwork.IWorkPackageParser.IWORKDocumentType
- KEYNOTE13 - org.apache.tika.parser.iwork.iwana.IWork13PackageParser.IWork13DocumentType
L
- label - Variable in class org.apache.tika.parser.recognition.RecognisedObject
-
Label of this object.
- LABEL_LANG - Static variable in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- labelLang - Variable in class org.apache.tika.parser.recognition.RecognisedObject
-
Language of the label, for example: english
- Latin1StringsParser - Class in org.apache.tika.parser.strings
-
Parser to extract printable Latin1 strings from arbitrary files with pure java without running any external process.
- Latin1StringsParser() - Constructor for class org.apache.tika.parser.strings.Latin1StringsParser
- LAYER_1 - Static variable in class org.apache.tika.parser.mp3.AudioFrame
-
Constant for audio layer 1.
- LAYER_2 - Static variable in class org.apache.tika.parser.mp3.AudioFrame
-
Constant for audio layer 2.
- LAYER_3 - Static variable in class org.apache.tika.parser.mp3.AudioFrame
-
Constant for audio layer 3.
- lengthTreeLengtsTable - Variable in class org.apache.tika.parser.chm.lzx.ChmLzxState
- lengthTreeTable - Variable in class org.apache.tika.parser.chm.lzx.ChmLzxState
- LevelTuple(int, int, String, String, boolean) - Constructor for class org.apache.tika.parser.microsoft.AbstractListManager.LevelTuple
- LevelTuple(String) - Constructor for class org.apache.tika.parser.microsoft.AbstractListManager.LevelTuple
- LinkedCell - Class in org.apache.tika.parser.microsoft
-
Linked cell.
- LinkedCell(Cell, String) - Constructor for class org.apache.tika.parser.microsoft.LinkedCell
- ListDescriptor - Class in org.apache.tika.parser.rtf
-
Contains the information for a single list in the list or list override tables.
- ListDescriptor() - Constructor for class org.apache.tika.parser.rtf.ListDescriptor
- listLevelMap - Variable in class org.apache.tika.parser.microsoft.AbstractListManager
- ListManager - Class in org.apache.tika.parser.microsoft
-
Computes the number text which goes at the beginning of each list paragraph
- ListManager(HWPFDocument) - Constructor for class org.apache.tika.parser.microsoft.ListManager
-
Ordinary constructor for a new list reader
- LITTLE - Static variable in class org.apache.tika.parser.executable.MachineMetadata.Endian
- LITTLEENDIAN_16_BIT - org.apache.tika.parser.strings.StringsEncoding
- LITTLEENDIAN_32_BIT - org.apache.tika.parser.strings.StringsEncoding
- loadLinkedRelationships(PackagePart, boolean, Metadata) - Method in class org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
-
This is used by the SAX docx and pptx decorators to load hyperlinks and other linked objects
- Location - Class in org.apache.tika.parser.geo.topic.gazetteer
- Location() - Constructor for class org.apache.tika.parser.geo.topic.gazetteer.Location
- LOCATION - Static variable in interface org.apache.tika.parser.ner.NERecogniser
- LOCATION_FILE - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- LOG - Static variable in class org.apache.tika.parser.hwp.HwpTextExtractorV5
- LOG - Static variable in class org.apache.tika.parser.ner.NamedEntityParser
- LyricsHandler - Class in org.apache.tika.parser.mp3
-
This is used to parse Lyrics3 tag information from an MP3 file, if available.
- LyricsHandler(byte[]) - Constructor for class org.apache.tika.parser.mp3.LyricsHandler
-
Looks for the Lyrics data, which will be just before the ID3v1 data (if present), and process it.
- LyricsHandler(InputStream, ContentHandler) - Constructor for class org.apache.tika.parser.mp3.LyricsHandler
- LZX_ALIGNED_MAXSYMBOLS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_ALIGNED_NUM_ELEMENTS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_ALIGNED_TABLEBITS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_BLOCKTYPE_ALIGNED - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_BLOCKTYPE_INVALID - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_BLOCKTYPE_UNCOMPRESSED - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_BLOCKTYPE_VERBATIM - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_LENGTH_MAXSYMBOLS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_LENGTH_TABLEBITS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_LENTABLE_SAFETY - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_MAIN_MAXSYMBOLS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_MAINTREE_MAXSYMBOLS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_MAINTREE_TABLEBITS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_MAX_MATCH - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_MIN_MATCH - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_NUM_CHARS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_NUM_PRIMARY_LENGTHS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_NUM_SECONDARY_LENGTHS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_PRETREE_MAXSYMBOLS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_PRETREE_NUM_ELEMENTS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_PRETREE_NUM_ELEMENTS_BITS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZX_PRETREE_TABLEBITS - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- LZXC - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
M
- MACHINE_ALPHA - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_ARM - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_EFI - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_IA_64 - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_M32R - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_M68K - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_M88K - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_MIPS - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_PPC - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_S370 - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_S390 - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_SH3 - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_SH4 - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_SH5 - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_SPARC - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_TYPE - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_UNKNOWN - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_VAX - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_x86_32 - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MACHINE_x86_64 - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- MachineMetadata - Interface in org.apache.tika.parser.executable
-
Metadata for describing machines, such as their architecture, type and endian-ness
- MachineMetadata.Endian - Class in org.apache.tika.parser.executable
- MAIL_MAX_SIZE - Static variable in class org.apache.tika.parser.mbox.MboxParser
- MailUtil - Class in org.apache.tika.parser.mail
- MailUtil() - Constructor for class org.apache.tika.parser.mail.MailUtil
- main(String[]) - Static method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
- main(String[]) - Static method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
- main(String[]) - Static method in class org.apache.tika.parser.chm.lzx.ChmBlockInfo
- main(String[]) - Static method in class org.apache.tika.parser.chm.lzx.ChmSection
- main(String[]) - Static method in class org.apache.tika.parser.microsoft.ooxml.xslf.XSLFEventBasedPowerPointExtractor
- main(String[]) - Static method in class org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFEventBasedWordExtractor
- main(String[]) - Static method in class org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser
- mainTreeLengtsTable - Variable in class org.apache.tika.parser.chm.lzx.ChmLzxState
- mainTreeTable - Variable in class org.apache.tika.parser.chm.lzx.ChmLzxState
- map(long, long) - Method in class org.apache.tika.parser.mp4.DirectFileReadDataSource
- mapSafeAttribute(String, String) - Method in class org.apache.tika.parser.html.DefaultHtmlMapper
-
Normalizes an attribute name.
- mapSafeAttribute(String, String) - Method in interface org.apache.tika.parser.html.HtmlMapper
-
Maps "safe" HTML attribute names to semantic XHTML equivalents.
- mapSafeAttribute(String, String) - Method in class org.apache.tika.parser.html.HtmlParser
-
Deprecated. Use the HtmlMapper mechanism to customize the HTML mapping. This method will be removed in Tika 1.0.
- mapSafeAttribute(String, String) - Method in class org.apache.tika.parser.html.IdentityHtmlMapper
- mapSafeElement(String) - Method in class org.apache.tika.parser.html.DefaultHtmlMapper
- mapSafeElement(String) - Method in interface org.apache.tika.parser.html.HtmlMapper
-
Maps "safe" HTML element names to semantic XHTML equivalents.
- mapSafeElement(String) - Method in class org.apache.tika.parser.html.HtmlParser
-
Deprecated. Use the HtmlMapper mechanism to customize the HTML mapping. This method will be removed in Tika 1.0.
- mapSafeElement(String) - Method in class org.apache.tika.parser.html.IdentityHtmlMapper
- MATLAB_MIME_TYPE - Static variable in class org.apache.tika.parser.mat.MatParser
- MatParser - Class in org.apache.tika.parser.mat
- MatParser() - Constructor for class org.apache.tika.parser.mat.MatParser
- MBOX_MIME_TYPE - Static variable in class org.apache.tika.parser.mbox.MboxParser
- MBOX_RECORD_DIVIDER - Static variable in class org.apache.tika.parser.mbox.MboxParser
- MboxParser - Class in org.apache.tika.parser.mbox
-
Mbox (mailbox) parser.
- MboxParser() - Constructor for class org.apache.tika.parser.mbox.MboxParser
- MD_KEY_IMG_CAP - Static variable in class org.apache.tika.parser.recognition.ObjectRecognitionParser
- MD_KEY_OBJ_REC - Static variable in class org.apache.tika.parser.recognition.ObjectRecognitionParser
- MD_KEY_PREFIX - Static variable in class org.apache.tika.parser.ner.NamedEntityParser
- MD_REC_IMPL_KEY - Static variable in class org.apache.tika.parser.recognition.ObjectRecognitionParser
- MD2 - org.apache.tika.parser.utils.CommonsDigester.DigestAlgorithm
- MD5 - org.apache.tika.parser.utils.CommonsDigester.DigestAlgorithm
- MDB_PROPERTY_PREFIX - Static variable in class org.apache.tika.parser.microsoft.JackcessParser
- MDB_PW - Static variable in class org.apache.tika.parser.microsoft.JackcessParser
- MEDIA_TYPES - Static variable in class org.apache.tika.parser.ner.NamedEntityParser
- metadata - Variable in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- MetadataExtractor - Class in org.apache.tika.parser.microsoft.ooxml
-
OOXML metadata extractor.
- MetadataExtractor(POIXMLTextExtractor) - Constructor for class org.apache.tika.parser.microsoft.ooxml.MetadataExtractor
- MetadataFields - Class in org.apache.tika.parser.image
-
Knows about all declared Metadata fields.
- MetadataFields() - Constructor for class org.apache.tika.parser.image.MetadataFields
- MetadataHandler - Class in org.apache.tika.parser.xml
-
Deprecated. Use the AttributeMetadataHandler and ElementMetadataHandler classes instead.
- MetadataHandler(Metadata, String) - Constructor for class org.apache.tika.parser.xml.MetadataHandler
-
Deprecated.
- MetadataHandler(Metadata, Property) - Constructor for class org.apache.tika.parser.xml.MetadataHandler
-
Deprecated.
- MidiParser - Class in org.apache.tika.parser.audio
- MidiParser() - Constructor for class org.apache.tika.parser.audio.MidiParser
- minConfidence - Variable in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- MISCELLANEOUS - Static variable in interface org.apache.tika.parser.ner.NERecogniser
- MITIENERecogniser - Class in org.apache.tika.parser.ner.mitie
-
This class offers an implementation of NERecogniser based on trained models using state-of-the-art information extraction tools.
- MITIENERecogniser() - Constructor for class org.apache.tika.parser.ner.mitie.MITIENERecogniser
- MITIENERecogniser(String) - Constructor for class org.apache.tika.parser.ner.mitie.MITIENERecogniser
-
Creates a NERecogniser by loading model from given path
- MODEL_PROP_NAME - Static variable in class org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser
- MODEL_PROP_NAME - Static variable in class org.apache.tika.parser.ner.mitie.MITIENERecogniser
- MODELS_DIR - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- MONEY - Static variable in interface org.apache.tika.parser.ner.NERecogniser
- MONEY_FILE - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- MOVE_FROM - org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.EditType
- MOVE_TO - org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.EditType
- MP3Frame - Interface in org.apache.tika.parser.mp3
-
A frame in an MP3 file, such as ID3v2 Tags or some audio.
- Mp3Parser - Class in org.apache.tika.parser.mp3
-
The Mp3Parser is used to parse ID3 Version 1 Tag information from an MP3 file, if available.
- Mp3Parser() - Constructor for class org.apache.tika.parser.mp3.Mp3Parser
- Mp3Parser.ID3TagsAndAudio - Class in org.apache.tika.parser.mp3
- MP4Parser - Class in org.apache.tika.parser.mp4
-
Parser for the MP4 media container format, as well as the older QuickTime format that MP4 is based on.
- MP4Parser() - Constructor for class org.apache.tika.parser.mp4.MP4Parser
- MPEG_V1 - Static variable in class org.apache.tika.parser.mp3.AudioFrame
-
Constant for the MPEG version 1.
- MPEG_V2 - Static variable in class org.apache.tika.parser.mp3.AudioFrame
-
Constant for the MPEG version 2.
- MPEG_V2_5 - Static variable in class org.apache.tika.parser.mp3.AudioFrame
-
Constant for the MPEG version 2.5.
- MPP - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Microsoft Project
- MS_EQUATION - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Equation embedded in Office docs
- MS_GRAPH_CHART - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Graph/Charts embedded in PowerPoint and Excel
- MS_OUTLOOK_PST_MIMETYPE - Static variable in class org.apache.tika.parser.mbox.OutlookPSTParser
- MSG - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Microsoft Outlook
- MSOwnerFileParser - Class in org.apache.tika.parser.microsoft
-
Parser for temporary MSOffice files.
- MSOwnerFileParser() - Constructor for class org.apache.tika.parser.microsoft.MSOwnerFileParser
N
- name - Variable in class org.apache.tika.parser.mp3.ID3v2Frame.RawTag
- NamedEntityParser - Class in org.apache.tika.parser.ner
-
This implementation of Parser extracts entity names from text content and adds them to the metadata.
- NamedEntityParser() - Constructor for class org.apache.tika.parser.ner.NamedEntityParser
- NameEntityExtractor - Class in org.apache.tika.parser.geo.topic
- NameEntityExtractor(NameFinderME) - Constructor for class org.apache.tika.parser.geo.topic.NameEntityExtractor
- NER_3CLASS_MODEL - Static variable in class org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser
- NER_4CLASS_MODEL - Static variable in class org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser
- NER_7CLASS_MODEL - Static variable in class org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser
- NER_DATE_MODEL - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- NER_LOCATION_MODEL - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- NER_MONEY_MODEL - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- NER_ORGANIZATION_MODEL - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- NER_PERCENT_MODEL - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- NER_PERSON_MODEL - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- NER_REGEX_FILE - Static variable in class org.apache.tika.parser.ner.regex.RegexNERecogniser
- NER_TIME_MODEL - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- NERecogniser - Interface in org.apache.tika.parser.ner
-
Defines a contract for named entity recogniser.
- NetCDFParser - Class in org.apache.tika.parser.netcdf
- NetCDFParser() - Constructor for class org.apache.tika.parser.netcdf.NetCDFParser
- newDecoder() - Method in class org.apache.tika.parser.html.charsetdetector.charsets.ReplacementCharset
- newDecoder() - Method in class org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset
- newEncoder() - Method in class org.apache.tika.parser.html.charsetdetector.charsets.ReplacementCharset
- newEncoder() - Method in class org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset
- next() - Method in class org.apache.tika.parser.mp3.ID3v2Frame.RawTagIterator
- NLTKNERecogniser - Class in org.apache.tika.parser.ner.nltk
-
This class offers an implementation of NERecogniser based on the ne_chunk() module of NLTK.
- NLTKNERecogniser() - Constructor for class org.apache.tika.parser.ner.nltk.NLTKNERecogniser
- NO_OCR - org.apache.tika.parser.pdf.PDFParserConfig.OCR_STRATEGY
- NONE - org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.EditType
- NOT_STARTED - org.apache.tika.parser.chm.core.ChmCommons.IntelState
- NOT_STARTED_DECODING - org.apache.tika.parser.chm.core.ChmCommons.LzxState
- NSNormalizerContentHandler - Class in org.apache.tika.parser.odf
-
Content handler decorator that maps old OpenOffice 1.0 namespaces to the OpenDocument ones and returns a fake DTD when the parser requests the OpenOffice DTD.
- NSNormalizerContentHandler(ContentHandler) - Constructor for class org.apache.tika.parser.odf.NSNormalizerContentHandler
- NUMBER_TYPE_BULLET - Static variable in class org.apache.tika.parser.rtf.ListDescriptor
- NumberCell - Class in org.apache.tika.parser.microsoft
-
Number cell.
- NumberCell(double, NumberFormat) - Constructor for class org.apache.tika.parser.microsoft.NumberCell
- NUMBERS - org.apache.tika.parser.iwork.IWorkPackageParser.IWORKDocumentType
- NUMBERS13 - org.apache.tika.parser.iwork.iwana.IWork13PackageParser.IWork13DocumentType
- numberType - Variable in class org.apache.tika.parser.rtf.ListDescriptor
O
- ObjectRecogniser - Interface in org.apache.tika.parser.recognition
-
This is a contract for object recognisers used by ObjectRecognitionParser.
- ObjectRecognitionParser - Class in org.apache.tika.parser.recognition
-
This parser recognises objects from Images.
- ObjectRecognitionParser() - Constructor for class org.apache.tika.parser.recognition.ObjectRecognitionParser
- OCR_AND_TEXT_EXTRACTION - org.apache.tika.parser.pdf.PDFParserConfig.OCR_STRATEGY
- OCR_ONLY - org.apache.tika.parser.pdf.PDFParserConfig.OCR_STRATEGY
- OFFICE_NS - Static variable in class org.apache.tika.parser.odf.OpenDocumentContentParser
- OfficeParser - Class in org.apache.tika.parser.microsoft
-
Defines a Microsoft document content extractor.
- OfficeParser() - Constructor for class org.apache.tika.parser.microsoft.OfficeParser
- OfficeParser.POIFSDocumentType - Enum in org.apache.tika.parser.microsoft
- OfficeParserConfig - Class in org.apache.tika.parser.microsoft
- OfficeParserConfig() - Constructor for class org.apache.tika.parser.microsoft.OfficeParserConfig
- OldExcelParser - Class in org.apache.tika.parser.microsoft
-
A POI-powered Tika Parser for very old versions of Excel, from pre-OLE2 days, such as Excel 4.
- OldExcelParser() - Constructor for class org.apache.tika.parser.microsoft.OldExcelParser
- OLE - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
The OLE base file format
- OLE10_NATIVE - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- OLE10_NATIVE - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
An OLE10 Native embedded document within another OLE2 document
- ONTOLOGY_CONCEPT_ARR - org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
- OOXML_PROTECTED - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
The protected OOXML base file format
- OOXMLExtractor - Interface in org.apache.tika.parser.microsoft.ooxml
-
Interface implemented by all Tika OOXML extractors.
- OOXMLExtractorFactory - Class in org.apache.tika.parser.microsoft.ooxml
-
Figures out the correct OOXMLExtractor for the supplied document and returns it.
- OOXMLExtractorFactory() - Constructor for class org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory
- OOXMLParser - Class in org.apache.tika.parser.microsoft.ooxml
-
Office Open XML (OOXML) parser.
- OOXMLParser() - Constructor for class org.apache.tika.parser.microsoft.ooxml.OOXMLParser
- OOXMLTikaBodyPartHandler - Class in org.apache.tika.parser.microsoft.ooxml
- OOXMLTikaBodyPartHandler(XHTMLContentHandler) - Constructor for class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- OOXMLTikaBodyPartHandler(XHTMLContentHandler, XWPFStylesShim, XWPFListManager, OfficeParserConfig) - Constructor for class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- OOXMLWordAndPowerPointTextHandler - Class in org.apache.tika.parser.microsoft.ooxml
-
This class is intended to handle anything that might contain IBodyElements: main document, headers, footers, notes, slides, etc.
- OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler, Map<String, String>) - Constructor for class org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
- OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler, Map<String, String>, boolean, boolean) - Constructor for class org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
- OOXMLWordAndPowerPointTextHandler.EditType - Enum in org.apache.tika.parser.microsoft.ooxml
- OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler - Interface in org.apache.tika.parser.microsoft.ooxml
- OpenDocumentContentParser - Class in org.apache.tika.parser.odf
-
Parser for ODF content.xml files.
- OpenDocumentContentParser() - Constructor for class org.apache.tika.parser.odf.OpenDocumentContentParser
- OpenDocumentMetaParser - Class in org.apache.tika.parser.odf
-
Parser for OpenDocument meta.xml files.
- OpenDocumentMetaParser() - Constructor for class org.apache.tika.parser.odf.OpenDocumentMetaParser
- OpenDocumentParser - Class in org.apache.tika.parser.odf
-
OpenOffice parser
- OpenDocumentParser() - Constructor for class org.apache.tika.parser.odf.OpenDocumentParser
- OpenNLPNameFinder - Class in org.apache.tika.parser.ner.opennlp
-
An implementation of NERecogniser that finds names in text using an OpenNLP model.
- OpenNLPNameFinder(String, String) - Constructor for class org.apache.tika.parser.ner.opennlp.OpenNLPNameFinder
-
Creates OpenNLP name finder
- OpenNLPNERecogniser - Class in org.apache.tika.parser.ner.opennlp
-
This implementation of NERecogniser chains an array of OpenNLPNameFinders for which NER models are available on the classpath.
- OpenNLPNERecogniser() - Constructor for class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
-
Creates a default chain of Name finders using default OpenNLP recognizers
- OpenNLPNERecogniser(Map<String, String>) - Constructor for class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
-
Creates a chain of Named Entity recognisers
- OpenOfficeParser - Class in org.apache.tika.parser.opendocument
-
Deprecated. Use the OpenDocumentParser class instead. This class will be removed in Apache Tika 1.0.
- OpenOfficeParser() - Constructor for class org.apache.tika.parser.opendocument.OpenOfficeParser
-
Deprecated.
- org.apache.tika.parser.apple - package org.apache.tika.parser.apple
- org.apache.tika.parser.asm - package org.apache.tika.parser.asm
- org.apache.tika.parser.audio - package org.apache.tika.parser.audio
- org.apache.tika.parser.captioning - package org.apache.tika.parser.captioning
- org.apache.tika.parser.captioning.tf - package org.apache.tika.parser.captioning.tf
- org.apache.tika.parser.chm - package org.apache.tika.parser.chm
- org.apache.tika.parser.chm.accessor - package org.apache.tika.parser.chm.accessor
- org.apache.tika.parser.chm.assertion - package org.apache.tika.parser.chm.assertion
- org.apache.tika.parser.chm.core - package org.apache.tika.parser.chm.core
- org.apache.tika.parser.chm.exception - package org.apache.tika.parser.chm.exception
- org.apache.tika.parser.chm.lzx - package org.apache.tika.parser.chm.lzx
- org.apache.tika.parser.code - package org.apache.tika.parser.code
- org.apache.tika.parser.crypto - package org.apache.tika.parser.crypto
- org.apache.tika.parser.csv - package org.apache.tika.parser.csv
- org.apache.tika.parser.ctakes - package org.apache.tika.parser.ctakes
- org.apache.tika.parser.dbf - package org.apache.tika.parser.dbf
- org.apache.tika.parser.dif - package org.apache.tika.parser.dif
- org.apache.tika.parser.dwg - package org.apache.tika.parser.dwg
- org.apache.tika.parser.envi - package org.apache.tika.parser.envi
- org.apache.tika.parser.epub - package org.apache.tika.parser.epub
- org.apache.tika.parser.executable - package org.apache.tika.parser.executable
- org.apache.tika.parser.feed - package org.apache.tika.parser.feed
- org.apache.tika.parser.font - package org.apache.tika.parser.font
- org.apache.tika.parser.gdal - package org.apache.tika.parser.gdal
- org.apache.tika.parser.geo.topic - package org.apache.tika.parser.geo.topic
- org.apache.tika.parser.geo.topic.gazetteer - package org.apache.tika.parser.geo.topic.gazetteer
- org.apache.tika.parser.geoinfo - package org.apache.tika.parser.geoinfo
- org.apache.tika.parser.grib - package org.apache.tika.parser.grib
- org.apache.tika.parser.hdf - package org.apache.tika.parser.hdf
- org.apache.tika.parser.html - package org.apache.tika.parser.html
- org.apache.tika.parser.html.charsetdetector - package org.apache.tika.parser.html.charsetdetector
- org.apache.tika.parser.html.charsetdetector.charsets - package org.apache.tika.parser.html.charsetdetector.charsets
- org.apache.tika.parser.hwp - package org.apache.tika.parser.hwp
- org.apache.tika.parser.image - package org.apache.tika.parser.image
- org.apache.tika.parser.image.xmp - package org.apache.tika.parser.image.xmp
- org.apache.tika.parser.internal - package org.apache.tika.parser.internal
- org.apache.tika.parser.iptc - package org.apache.tika.parser.iptc
- org.apache.tika.parser.isatab - package org.apache.tika.parser.isatab
- org.apache.tika.parser.iwork - package org.apache.tika.parser.iwork
- org.apache.tika.parser.iwork.iwana - package org.apache.tika.parser.iwork.iwana
- org.apache.tika.parser.jdbc - package org.apache.tika.parser.jdbc
- org.apache.tika.parser.journal - package org.apache.tika.parser.journal
- org.apache.tika.parser.jpeg - package org.apache.tika.parser.jpeg
- org.apache.tika.parser.mail - package org.apache.tika.parser.mail
- org.apache.tika.parser.mat - package org.apache.tika.parser.mat
- org.apache.tika.parser.mbox - package org.apache.tika.parser.mbox
- org.apache.tika.parser.microsoft - package org.apache.tika.parser.microsoft
- org.apache.tika.parser.microsoft.ooxml - package org.apache.tika.parser.microsoft.ooxml
- org.apache.tika.parser.microsoft.ooxml.xps - package org.apache.tika.parser.microsoft.ooxml.xps
- org.apache.tika.parser.microsoft.ooxml.xslf - package org.apache.tika.parser.microsoft.ooxml.xslf
- org.apache.tika.parser.microsoft.ooxml.xwpf - package org.apache.tika.parser.microsoft.ooxml.xwpf
- org.apache.tika.parser.microsoft.ooxml.xwpf.ml2006 - package org.apache.tika.parser.microsoft.ooxml.xwpf.ml2006
- org.apache.tika.parser.microsoft.xml - package org.apache.tika.parser.microsoft.xml
- org.apache.tika.parser.mp3 - package org.apache.tika.parser.mp3
- org.apache.tika.parser.mp4 - package org.apache.tika.parser.mp4
- org.apache.tika.parser.ner - package org.apache.tika.parser.ner
- org.apache.tika.parser.ner.corenlp - package org.apache.tika.parser.ner.corenlp
- org.apache.tika.parser.ner.grobid - package org.apache.tika.parser.ner.grobid
- org.apache.tika.parser.ner.mitie - package org.apache.tika.parser.ner.mitie
- org.apache.tika.parser.ner.nltk - package org.apache.tika.parser.ner.nltk
- org.apache.tika.parser.ner.opennlp - package org.apache.tika.parser.ner.opennlp
- org.apache.tika.parser.ner.regex - package org.apache.tika.parser.ner.regex
- org.apache.tika.parser.netcdf - package org.apache.tika.parser.netcdf
- org.apache.tika.parser.ocr - package org.apache.tika.parser.ocr
- org.apache.tika.parser.odf - package org.apache.tika.parser.odf
- org.apache.tika.parser.opendocument - package org.apache.tika.parser.opendocument
- org.apache.tika.parser.pdf - package org.apache.tika.parser.pdf
- org.apache.tika.parser.pkg - package org.apache.tika.parser.pkg
- org.apache.tika.parser.pot - package org.apache.tika.parser.pot
- org.apache.tika.parser.prt - package org.apache.tika.parser.prt
- org.apache.tika.parser.recognition - package org.apache.tika.parser.recognition
- org.apache.tika.parser.recognition.tf - package org.apache.tika.parser.recognition.tf
- org.apache.tika.parser.rtf - package org.apache.tika.parser.rtf
- org.apache.tika.parser.sas - package org.apache.tika.parser.sas
- org.apache.tika.parser.sentiment - package org.apache.tika.parser.sentiment
- org.apache.tika.parser.strings - package org.apache.tika.parser.strings
- org.apache.tika.parser.txt - package org.apache.tika.parser.txt
- org.apache.tika.parser.utils - package org.apache.tika.parser.utils
- org.apache.tika.parser.video - package org.apache.tika.parser.video
- org.apache.tika.parser.wordperfect - package org.apache.tika.parser.wordperfect
- org.apache.tika.parser.xliff - package org.apache.tika.parser.xliff
- org.apache.tika.parser.xml - package org.apache.tika.parser.xml
- ORGANIZATION - Static variable in interface org.apache.tika.parser.ner.NERecogniser
- ORGANIZATION_FILE - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- OUTLOOK - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- OutlookExtractor - Class in org.apache.tika.parser.microsoft
-
Outlook Message Parser.
- OutlookExtractor(DirectoryNode, ParseContext) - Constructor for class org.apache.tika.parser.microsoft.OutlookExtractor
- OutlookExtractor(POIFSFileSystem, ParseContext) - Constructor for class org.apache.tika.parser.microsoft.OutlookExtractor
- OutlookExtractor.RECIPIENT_TYPE - Enum in org.apache.tika.parser.microsoft
- OutlookPSTParser - Class in org.apache.tika.parser.mbox
-
Parser for MS Outlook PST email storage files
- OutlookPSTParser() - Constructor for class org.apache.tika.parser.mbox.OutlookPSTParser
- overrideTupleMap - Variable in class org.apache.tika.parser.microsoft.AbstractListManager
P
- PackageParser - Class in org.apache.tika.parser.pkg
-
Parser for various packaging formats.
- PackageParser() - Constructor for class org.apache.tika.parser.pkg.PackageParser
- PAGES - org.apache.tika.parser.iwork.IWorkPackageParser.IWORKDocumentType
- PAGES13 - org.apache.tika.parser.iwork.iwana.IWork13PackageParser.IWork13DocumentType
- ParagraphLevelCounter(AbstractListManager.LevelTuple[]) - Constructor for class org.apache.tika.parser.microsoft.AbstractListManager.ParagraphLevelCounter
- ParagraphProperties - Class in org.apache.tika.parser.microsoft.ooxml
- ParagraphProperties() - Constructor for class org.apache.tika.parser.microsoft.ooxml.ParagraphProperties
- parse(byte[], ChmItsfHeader) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
- parse(byte[], ChmItspHeader) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
- parse(byte[], ChmLzxcControlData) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
- parse(byte[], ChmLzxcResetTable) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
- parse(byte[], ChmPmgiHeader) - Method in class org.apache.tika.parser.chm.accessor.ChmPmgiHeader
- parse(byte[], ChmPmglHeader) - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- parse(byte[], T) - Method in interface org.apache.tika.parser.chm.accessor.ChmAccessor
-
Parses chm accessor
- parse(Image, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- parse(InputStream) - Method in class org.apache.tika.parser.image.xmp.JempboxExtractor
- parse(InputStream, OutputStream) - Method in class org.apache.tika.parser.image.xmp.XMPPacketScanner
-
Locates an XMP packet in a stream, parses it and returns the XMP metadata.
- parse(InputStream, ContentHandler, Metadata) - Method in class org.apache.tika.parser.iptc.IptcAnpaParser
-
Deprecated. This method will be removed in Apache Tika 1.0.
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.apple.AppleSingleFileParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.asm.ClassParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.audio.AudioParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.audio.MidiParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.chm.ChmParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.code.SourceCodeParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.crypto.Pkcs7Parser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.crypto.TSDParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.csv.TextAndCSVParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.ctakes.CTAKESParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.dbf.DBFParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.dif.DIFParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.dwg.DWGParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.envi.EnviHeaderParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.epub.EpubContentParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.epub.EpubParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.executable.ExecutableParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.feed.FeedParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.font.AdobeFontMetricParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.font.TrueTypeParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.gdal.GDALParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.geo.topic.GeoParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.geoinfo.GeographicInformationParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.grib.GribParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.hdf.HDFParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.html.HtmlParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.hwp.HwpV5Parser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.image.BPGParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.image.ICNSParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.image.ImageParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.image.PSDParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.image.TiffParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.image.WebPParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.iptc.IptcAnpaParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.isatab.ISArchiveParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.iwork.iwana.IWork13PackageParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.iwork.IWorkPackageParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.jdbc.SQLite3Parser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.journal.JournalParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.jpeg.JpegParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.mail.RFC822Parser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.mat.MatParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.mbox.MboxParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.mbox.OutlookPSTParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.EMFParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.JackcessParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.MSOwnerFileParser
-
Extracts owner from MS temp file
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.OfficeParser
-
Extracts properties and text from an MS Document input stream
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.OldExcelParser
-
Extracts properties and text from an MS Document input stream
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Static method in class org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.ooxml.xwpf.ml2006.Word2006MLParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.TNEFParser
-
Extracts properties and text from an MS Document input stream
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.WMFParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.microsoft.xml.AbstractXML2003Parser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.mp3.Mp3Parser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.mp4.MP4Parser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.ner.NamedEntityParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.netcdf.NetCDFParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.odf.OpenDocumentContentParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.odf.OpenDocumentMetaParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.odf.OpenDocumentParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.pdf.PDFParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.pkg.CompressorParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.pkg.PackageParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.pkg.RarParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.pot.PooledTimeSeriesParser
-
Parses a document stream into a sequence of XHTML SAX events.
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.prt.PRTParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.recognition.ObjectRecognitionParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.rtf.RTFParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.sas.SAS7BDATParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.sentiment.SentimentAnalysisParser
-
Performs the parse
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.strings.Latin1StringsParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.strings.StringsParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.txt.TXTParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.video.FLVParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.wordperfect.QuattroProParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.wordperfect.WordPerfectParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.xliff.XLIFF12Parser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.xliff.XLZParser
- parse(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.xml.XMLParser
- parse(String) - Static method in class org.apache.tika.parser.utils.CommonsDigester
-
Deprecated. Use the CommonsDigester(int, String) instead
- parse(String) - Method in class org.apache.tika.parser.utils.DataURISchemeUtil
- parse(String, ParseContext) - Method in class org.apache.tika.parser.journal.TEIDOMParser
- parse(String, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.journal.GrobidRESTParser
- parse(OldExcelExtractor, XHTMLContentHandler) - Static method in class org.apache.tika.parser.microsoft.OldExcelParser
- parse(DirectoryNode, ParseContext, Metadata, XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.OfficeParser
- parse(DirectoryNode, XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.HSLFExtractor
- parse(DirectoryNode, XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.WordExtractor
- parse(DirectoryNode, XHTMLContentHandler, Locale) - Method in class org.apache.tika.parser.microsoft.ExcelExtractor
- parse(POIFSFileSystem, XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.HSLFExtractor
- parse(POIFSFileSystem, XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.WordExtractor
- parse(POIFSFileSystem, XHTMLContentHandler, Locale) - Method in class org.apache.tika.parser.microsoft.ExcelExtractor
-
Extracts text from an Excel Workbook writing the extracted content to the specified Appendable.
- parse(XHTMLContentHandler, Metadata) - Method in class org.apache.tika.parser.microsoft.OutlookExtractor
- parseAssay(InputStream, XHTMLContentHandler, Metadata, ParseContext) - Static method in class org.apache.tika.parser.isatab.ISATabUtils
- parseContext - Variable in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- parseDate(String) - Static method in class org.apache.tika.parser.mbox.MboxParser
- parseELF(XHTMLContentHandler, Metadata, InputStream, byte[]) - Method in class org.apache.tika.parser.executable.ExecutableParser
-
Parses a Unix ELF file
- parseInline(InputStream, XHTMLContentHandler, TesseractOCRConfig) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- parseInline(InputStream, XHTMLContentHandler, ParseContext, TesseractOCRConfig) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
-
Use this to parse content without starting a new document.
- parseInvestigation(InputStream, XHTMLContentHandler, Metadata, ParseContext) - Static method in class org.apache.tika.parser.isatab.ISATabUtils
- parseInvestigation(InputStream, XHTMLContentHandler, Metadata, ParseContext, String) - Static method in class org.apache.tika.parser.isatab.ISATabUtils
- parseJpeg(File) - Method in class org.apache.tika.parser.image.ImageMetadataExtractor
- parseObject(String, ParsePosition) - Method in class org.apache.tika.parser.microsoft.TikaExcelGeneralFormat
- parseOOXMLContentTypes(InputStream) - Static method in class org.apache.tika.parser.pkg.StreamingZipContainerDetector
- parseOOXMLRels(InputStream) - Static method in class org.apache.tika.parser.pkg.StreamingZipContainerDetector
- parsePE(XHTMLContentHandler, Metadata, InputStream, byte[]) - Method in class org.apache.tika.parser.executable.ExecutableParser
-
Parses a DOS or Windows PE file
- parseRawExif(byte[]) - Method in class org.apache.tika.parser.image.ImageMetadataExtractor
- parseRawExif(InputStream, int, boolean) - Method in class org.apache.tika.parser.image.ImageMetadataExtractor
- parseRawXMP(byte[]) - Method in class org.apache.tika.parser.image.ImageMetadataExtractor
- parseStudy(InputStream, XHTMLContentHandler, Metadata, ParseContext) - Static method in class org.apache.tika.parser.isatab.ISATabUtils
- parseSummaries(DirectoryNode) - Method in class org.apache.tika.parser.microsoft.SummaryExtractor
- parseSummaries(POIFSFileSystem) - Method in class org.apache.tika.parser.microsoft.SummaryExtractor
- parseTiff(File) - Method in class org.apache.tika.parser.image.ImageMetadataExtractor
- parseWebP(File) - Method in class org.apache.tika.parser.image.ImageMetadataExtractor
- parseWord6(DirectoryNode, XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.WordExtractor
- parseWord6(POIFSFileSystem, XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.WordExtractor
- PASSWORD - Static variable in class org.apache.tika.parser.pdf.PDFParser
-
Deprecated. Supply a PasswordProvider on the ParseContext instead
- patterns - Variable in class org.apache.tika.parser.ner.regex.RegexNERecogniser
- PDFParser - Class in org.apache.tika.parser.pdf
-
PDF parser.
- PDFParser() - Constructor for class org.apache.tika.parser.pdf.PDFParser
- PDFParserConfig - Class in org.apache.tika.parser.pdf
-
Config for PDFParser.
- PDFParserConfig() - Constructor for class org.apache.tika.parser.pdf.PDFParserConfig
- PDFParserConfig(InputStream) - Constructor for class org.apache.tika.parser.pdf.PDFParserConfig
-
Loads properties from InputStream and then tries to close InputStream.
- PDFParserConfig.OCR_STRATEGY - Enum in org.apache.tika.parser.pdf
- peekBits(int) - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- PERCENT - Static variable in interface org.apache.tika.parser.ner.NERecogniser
- PERCENT_FILE - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- PERSON - Static variable in interface org.apache.tika.parser.ner.NERecogniser
- PERSON_FILE - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- Pkcs7Parser - Class in org.apache.tika.parser.crypto
-
Basic parser for PKCS7 data.
- Pkcs7Parser() - Constructor for class org.apache.tika.parser.crypto.Pkcs7Parser
- PLATFORM - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PLATFORM_AIX - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PLATFORM_ARM - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PLATFORM_EMBEDDED - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PLATFORM_FREEBSD - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PLATFORM_HPUX - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PLATFORM_IRIX - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PLATFORM_LINUX - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PLATFORM_NETBSD - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PLATFORM_SOLARIS - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PLATFORM_SYSV - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PLATFORM_TRU64 - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PLATFORM_WINDOWS - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PMGL - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- POIFSContainerDetector - Class in org.apache.tika.parser.microsoft
-
A detector that works on a POIFS OLE2 document to figure out exactly what the file is.
- POIFSContainerDetector() - Constructor for class org.apache.tika.parser.microsoft.POIFSContainerDetector
- POIXMLTextExtractorDecorator - Class in org.apache.tika.parser.microsoft.ooxml
- POIXMLTextExtractorDecorator(ParseContext, POIXMLTextExtractor) - Constructor for class org.apache.tika.parser.microsoft.ooxml.POIXMLTextExtractorDecorator
- POLARITY - org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
- PooledTimeSeriesParser - Class in org.apache.tika.parser.pot
-
Uses the Pooled Time Series algorithm and command line tool to generate a numeric representation of the video suitable for similarity searches.
- PooledTimeSeriesParser() - Constructor for class org.apache.tika.parser.pot.PooledTimeSeriesParser
- position() - Method in class org.apache.tika.parser.mp4.DirectFileReadDataSource
- position(long) - Method in class org.apache.tika.parser.mp4.DirectFileReadDataSource
- POSITION_BASE - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- POWERPOINT - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- PPT - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Microsoft PowerPoint
- PREFIX - Static variable in interface org.apache.tika.parser.executable.MachineMetadata
- PRESENTATION_NS - Static variable in class org.apache.tika.parser.odf.OpenDocumentContentParser
- processCommand(InputStream) - Method in class org.apache.tika.parser.gdal.GDALParser
- processingInstruction(String, String) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
- processShapes(List<XSSFShape>, XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- processSheet(XSSFSheetXMLHandler.SheetContentsHandler, CommentsTable, StylesTable, ReadOnlySharedStringsTable, InputStream) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- PROJECT - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- PRT_MIME_TYPE - Static variable in class org.apache.tika.parser.prt.PRTParser
- PRTParser - Class in org.apache.tika.parser.prt
-
A basic text extracting parser for the CADKey PRT (CAD Drawing) format.
- PRTParser() - Constructor for class org.apache.tika.parser.prt.PRTParser
- PSDParser - Class in org.apache.tika.parser.image
-
Parser for the Adobe Photoshop PSD File Format.
- PSDParser() - Constructor for class org.apache.tika.parser.image.PSDParser
- PUB - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Microsoft Publisher
- PUBLISHER - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
Q
- QP_7_8 - Static variable in class org.apache.tika.parser.wordperfect.QuattroProParser
- QP_9 - Static variable in class org.apache.tika.parser.wordperfect.QuattroProParser
- QUATTROPRO - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Base QuattroPro mime
- QuattroProParser - Class in org.apache.tika.parser.wordperfect
-
Parser for Corel QuattroPro documents (part of Corel WordPerfect Office Suite).
- QuattroProParser() - Constructor for class org.apache.tika.parser.wordperfect.QuattroProParser
R
- RarParser - Class in org.apache.tika.parser.pkg
-
Parser for Rar files.
- RarParser() - Constructor for class org.apache.tika.parser.pkg.RarParser
- RawTagIterator(int, int, int, int) - Constructor for class org.apache.tika.parser.mp3.ID3v2Frame.RawTagIterator
- read(ByteBuffer) - Method in class org.apache.tika.parser.mp4.DirectFileReadDataSource
- readAllInOnce(ByteBuffer) - Method in class org.apache.tika.parser.mp4.DirectFileReadDataSource
- readFully(InputStream, int) - Static method in class org.apache.tika.parser.mp3.ID3v2Frame
- readFully(InputStream, int, boolean) - Static method in class org.apache.tika.parser.mp3.ID3v2Frame
- recognise(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
- recognise(InputStream, ContentHandler, Metadata, ParseContext) - Method in interface org.apache.tika.parser.recognition.ObjectRecogniser
-
Recognise the objects in the stream
- recognise(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
- recognise(InputStream, ContentHandler, Metadata, ParseContext) - Method in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- recognise(String) - Method in class org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser
-
Recognises names of entities in the text
- recognise(String) - Method in class org.apache.tika.parser.ner.grobid.GrobidNERecogniser
-
Recognises names of entities in the text
- recognise(String) - Method in class org.apache.tika.parser.ner.mitie.MITIENERecogniser
-
Recognises names of entities in the text
- recognise(String) - Method in interface org.apache.tika.parser.ner.NERecogniser
-
Call for name recognition action from text
- recognise(String) - Method in class org.apache.tika.parser.ner.nltk.NLTKNERecogniser
-
Recognises names of entities in the text
- recognise(String) - Method in class org.apache.tika.parser.ner.opennlp.OpenNLPNameFinder
- recognise(String) - Method in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- recognise(String) - Method in class org.apache.tika.parser.ner.regex.RegexNERecogniser
- RecognisedObject - Class in org.apache.tika.parser.recognition
-
A model for recognised objects from graphics and texts; it typically includes a human-readable label for the object, the language of the label, an id and a confidence score.
- RecognisedObject(String, String, String, double) - Constructor for class org.apache.tika.parser.recognition.RecognisedObject
- RegexNERecogniser - Class in org.apache.tika.parser.ner.regex
-
This class offers an implementation of NERecogniser based on Regular Expressions.
- RegexNERecogniser() - Constructor for class org.apache.tika.parser.ner.regex.RegexNERecogniser
- RegexNERecogniser(InputStream) - Constructor for class org.apache.tika.parser.ner.regex.RegexNERecogniser
- remove() - Method in class org.apache.tika.parser.mp3.ID3v2Frame.RawTagIterator
- render(XHTMLContentHandler) - Method in interface org.apache.tika.parser.microsoft.Cell
-
Renders the content to the given XHTML SAX event stream.
- render(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.CellDecorator
- render(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.LinkedCell
- render(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.NumberCell
- render(XHTMLContentHandler) - Method in class org.apache.tika.parser.microsoft.TextCell
- ReplacementCharset - Class in org.apache.tika.parser.html.charsetdetector.charsets
-
An implementation of the standard "replacement" charset defined by the W3C.
- ReplacementCharset() - Constructor for class org.apache.tika.parser.html.charsetdetector.charsets.ReplacementCharset
- reset() - Method in class org.apache.tika.parser.microsoft.ooxml.ParagraphProperties
- reset(AnalysisEngine, JCas) - Static method in class org.apache.tika.parser.ctakes.CTAKESUtils
-
Resets cTAKES objects, if created.
- RESET_TABLE - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- resetAE(AnalysisEngine) - Static method in class org.apache.tika.parser.ctakes.CTAKESUtils
-
Resets the AE (AnalysisEngine), releasing all resources held by the current AE.
- resetCAS(JCas) - Static method in class org.apache.tika.parser.ctakes.CTAKESUtils
-
Resets the CAS (Common Analysis System), emptying it of all content.
- resolveEntity(String, String) - Method in class org.apache.tika.parser.odf.NSNormalizerContentHandler
-
Do not load any DTDs (may be requested by parser).
- reverse(byte[]) - Static method in class org.apache.tika.parser.chm.core.ChmCommons
-
Reverses the order of the given array
- reverseByteOrder(byte[]) - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- RFC822Parser - Class in org.apache.tika.parser.mail
-
Uses apache-mime4j to parse emails.
- RFC822Parser() - Constructor for class org.apache.tika.parser.mail.RFC822Parser
- RTFParser - Class in org.apache.tika.parser.rtf
-
RTF parser
- RTFParser() - Constructor for class org.apache.tika.parser.rtf.RTFParser
- run(RunProperties, String) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- run(RunProperties, String) - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- RunProperties - Class in org.apache.tika.parser.microsoft.ooxml
-
WARNING: This class is mutable.
- RunProperties() - Constructor for class org.apache.tika.parser.microsoft.ooxml.RunProperties
S
- S - org.apache.tika.parser.microsoft.FormattingUtils.Tag
- salvageCopy(File, File) - Static method in class org.apache.tika.parser.utils.ZipSalvager
- salvageCopy(InputStream, File) - Static method in class org.apache.tika.parser.utils.ZipSalvager
-
This streams the broken zip and rebuilds a new zip that is at least a valid zip file.
- SAS7BDATParser - Class in org.apache.tika.parser.sas
-
Processes the SAS7BDAT data columnar database file used by SAS and other similar languages.
- SAS7BDATParser() - Constructor for class org.apache.tika.parser.sas.SAS7BDATParser
- SDA - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
StarOffice Draw
- SDC - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
StarOffice Calc
- SDD - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
StarOffice Impress
- SDW - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
StarOffice Writer
- searchGeoNames(ArrayList<String>) - Method in class org.apache.tika.parser.geo.topic.GeoParser
- secondaryParser - Variable in class org.apache.tika.parser.ner.NamedEntityParser
- SentimentAnalysisParser - Class in org.apache.tika.parser.sentiment
-
This parser classifies documents based on the sentiment of the document.
- SentimentAnalysisParser() - Constructor for class org.apache.tika.parser.sentiment.SentimentAnalysisParser
- serialize(JCas, CTAKESSerializer, boolean, OutputStream) - Static method in class org.apache.tika.parser.ctakes.CTAKESUtils
-
Serializes a CAS in the given format.
- setAccessChecker(AccessChecker) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- setAdmin1Code(String) - Method in class org.apache.tika.parser.geo.topic.gazetteer.Location
- setAdmin2Code(String) - Method in class org.apache.tika.parser.geo.topic.gazetteer.Location
- setAeDescriptorPath(String) - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Sets the path to XML descriptor for AnalysisEngine.
- setAlignedLenTable(short[]) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setAlignedTreeTable(short[]) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setAnnotationProps(String[]) - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Sets the CTAKESAnnotationProperty's that will be included into cTAKES metadata.
- setAnnotationProps(CTAKESAnnotationProperty[]) - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Sets the CTAKESAnnotationProperty's that will be included into cTAKES metadata.
- setApplyRotation(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Sets whether or not a rotation value should be calculated and passed to ImageMagick.
- setApplyRotation(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setAverageCharTolerance(Float) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
See PDFTextStripper.setAverageCharTolerance(float)
- setBlock_len(long) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets block length
- setBlockAddress(long[]) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Sets block addresses
- setBlockCount(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Sets a block count
- setBlockidx_intvl(int) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets block index interval
- setBlockLength(int) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setBlockLlen(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Sets a block length
- setBlockNext(int) - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- setBlockPrev(int) - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- setBlockRemaining(int) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setBlockType(int) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setBold(boolean) - Method in class org.apache.tika.parser.microsoft.ooxml.RunProperties
- setByteArrayMaxOverride(int) - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
-
WARNING: this sets a static variable in POI.
- setCatchIntermediateIOExceptions(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
The PDFBox parser will throw an IOException if there is a problem with a stream.
- setCenter(String) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.HeaderFooterFromString
- setCharset(Charset) - Method in class org.apache.tika.parser.csv.CSVParams
- setChmDirList(ChmDirectoryListingSet) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setChmItsfHeader(ChmItsfHeader) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setChmItspHeader(ChmItspHeader) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setChmLzxcControlData(ChmLzxcControlData) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setChmLzxcResetTable(ChmLzxcResetTable) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setColorspace(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- setColorspace(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setCommand(String) - Method in class org.apache.tika.parser.gdal.GDALParser
- setCompressedLen(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Sets compressed length
- setConcatenatePhoneticRuns(boolean) - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
- setConcatenatePhoneticRuns(boolean) - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
-
Microsoft Excel files can sometimes contain phonetic (furigana) strings.
- setConfidence(double) - Method in class org.apache.tika.parser.recognition.RecognisedObject
- setContentLength(int) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxBlock
- setContentParser(Parser) - Method in class org.apache.tika.parser.epub.EpubParser
- setContentParser(Parser) - Method in class org.apache.tika.parser.odf.OpenDocumentParser
- setContentType(Metadata) - Method in class org.apache.tika.parser.microsoft.xml.AbstractXML2003Parser
- setContentType(Metadata) - Method in class org.apache.tika.parser.microsoft.xml.SpreadsheetMLParser
- setContentType(Metadata) - Method in class org.apache.tika.parser.microsoft.xml.WordMLParser
- setControlDataIndex(int) - Method in class org.apache.tika.parser.chm.accessor.ChmDirectoryListingSet
-
Sets control data index
- setCountryCode(String) - Method in class org.apache.tika.parser.geo.topic.gazetteer.Location
- setData(byte[]) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setDataOffset(long) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets data offset
- setDeclaredEncoding(String) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Set the declared encoding for charset detection.
- setDelimiter(Character) - Method in class org.apache.tika.parser.csv.CSVParams
- setDensity(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- setDensity(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setDepth(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- setDepth(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setDetectableCharset(String, boolean) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Deprecated. This API is ICU internal only.
- setDetectAngles(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- setDir_uuid(byte[]) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets directory uuid
- setDirectoryListingEntryList(List<DirectoryListingEntry>) - Method in class org.apache.tika.parser.chm.accessor.ChmDirectoryListingSet
-
Sets chm directory listing entry list
- setDirLen(long) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets directory length
- setDirOffset(long) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets directory offset
- setDocumentLocator(Locator) - Method in class org.apache.tika.parser.dif.DIFContentHandler
- setDocumentLocator(Locator) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
- setEnableAutoSpace(boolean) - Method in class org.apache.tika.parser.pdf.PDFParser
-
Deprecated.
- setEnableAutoSpace(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
If true (the default), the parser should estimate where spaces should be inserted between words.
- setEnableImageProcessing(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Set the value to true if processing is to be enabled.
- setEnableImageProcessing(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setEncoding(StringsEncoding) - Method in class org.apache.tika.parser.strings.StringsConfig
-
Sets the character encoding of the strings that are to be found.
- setEntryType(ChmCommons.EntryType) - Method in class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
- setExtractAcroFormContent(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
If true (the default), extract content from AcroForms at the end of the document.
- setExtractActions(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Whether or not to extract PDActions from the file.
- setExtractAllAlternatives(boolean) - Method in class org.apache.tika.parser.mail.RFC822Parser
-
Until version 1.17, Tika handled all body parts as embedded objects (see TIKA-2478).
- setExtractAllAlternativesFromMSG(boolean) - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
-
Some .msg files can contain body content in html, rtf and/or text.
- setExtractAllAlternativesFromMSG(boolean) - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
-
Some .msg files can contain body content in html, rtf and/or text.
- setExtractAnnotationText(boolean) - Method in class org.apache.tika.parser.pdf.PDFParser
-
Deprecated.
- setExtractAnnotationText(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
If true (the default), text in annotations will be extracted.
- setExtractBookmarksText(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
If true, extract bookmarks (document outline) text.
- setExtractFontNames(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Extract font names into a metadata field
- setExtractInlineImages(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
If true, extract inline embedded OBXImages.
- setExtractMacros(boolean) - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
- setExtractMacros(boolean) - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
-
Sets whether or not MSOffice parsers should extract macros.
- setExtractScripts(boolean) - Method in class org.apache.tika.parser.html.HtmlParser
-
Whether or not to extract contents in script entities.
- setExtractUniqueInlineImagesOnly(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Multiple pages within a PDF file might refer to the same underlying image.
- setFilePath(String) - Method in class org.apache.tika.parser.strings.FileConfig
-
Sets the "file" installation folder.
- setFilter(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- setFilter(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setFramesRead(int) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setFreeSpace(long) - Method in class org.apache.tika.parser.chm.accessor.ChmPmgiHeader
-
Sets pmgi free space
- setFreeSpace(long) - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- setGazetteerRestEndpoint(String) - Method in class org.apache.tika.parser.geo.topic.GeoParserConfig
-
Configure REST endpoint for lucene-geo-gazetteer
- setHadStarted(ChmCommons.LzxState) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setHeader_len(int) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets itsp header length
- setHeaderLen(int) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets itsf header length
- setId(String) - Method in class org.apache.tika.parser.recognition.RecognisedObject
- setIfXFAExtractOnlyXFA(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
If false (the default), extract content from the full PDF as well as the XFA form.
- setIlvl(int) - Method in class org.apache.tika.parser.microsoft.ooxml.ParagraphProperties
- setImageMagickPath(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Set the path to the ImageMagick executable directory, needed if it is not on system path.
- setImageMagickPath(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setIncludeDeletedContent(boolean) - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
- setIncludeDeletedContent(boolean) - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
-
Sets whether or not the parser should include deleted content.
- setIncludeDeletedContent(boolean) - Method in class org.apache.tika.parser.wordperfect.WordPerfectParser
-
Whether or not to include deleted content.
- setIncludeHeadersAndFooters(boolean) - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
-
Whether or not to include headers and footers.
- setIncludeMarkup(boolean) - Method in class org.apache.tika.parser.html.BoilerpipeContentHandler
- setIncludeMissingRows(boolean) - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
-
For table-like formats, and tables within other formats, should missing rows in sparse tables be output where detected? The default is to output only the rows defined within the file, which avoids many blank lines but means the layout isn't preserved.
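The trade-off described for setIncludeMissingRows can be illustrated with a small, library-independent sketch. The class `MissingRowsSketch` and its `render` method are hypothetical names introduced here for illustration; they are not part of Tika and do not reproduce how the Office parsers emit rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only; not Tika code.
final class MissingRowsSketch {

    /**
     * Renders rows keyed by their zero-based index. When includeMissingRows is
     * true, every gap in the index sequence is emitted as an empty row, so the
     * output keeps the sheet layout. When false, only defined rows are emitted.
     */
    static List<String> render(SortedMap<Integer, String> rows, boolean includeMissingRows) {
        List<String> out = new ArrayList<>();
        int next = 0; // index of the next row we expect to see
        for (Map.Entry<Integer, String> row : rows.entrySet()) {
            if (includeMissingRows) {
                for (; next < row.getKey(); next++) {
                    out.add(""); // placeholder for a row that the file does not define
                }
            }
            out.add(row.getValue());
            next = row.getKey() + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> rows = new TreeMap<>();
        rows.put(0, "header");
        rows.put(3, "total");
        System.out.println(render(rows, false)); // compact output, layout lost
        System.out.println(render(rows, true));  // blank rows keep "total" on the fourth line
    }
}
```

With the flag off, a consumer cannot tell that "total" sat three rows below "header"; with it on, the output grows by one blank entry per undefined row.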
- setIncludeMoveFromContent(boolean) - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
- setIncludeMoveFromContent(boolean) - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
-
With track changes on, when a section is moved, the content is stored in both the "moveFrom" section and in the "moveTo" section.
- setIncludeShapeBasedContent(boolean) - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
- setIncludeShapeBasedContent(boolean) - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
-
In Excel and Word, there can be text stored within drawing shapes.
- setIncludeSlideMasterContent(boolean) - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
-
Whether or not to include contents from any of the three types of masters -- slide, notes, handout -- in a .ppt or ppt[xm] file.
- setIncludeSlideNotes(boolean) - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
-
Whether or not to process slide notes content.
- setIndex_depth(int) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets an index depth
- setIndex_head(int) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets an index head
- setIndex_root(int) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets an index root
- setIndexOfContent(int) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setIndexOfResetData(int) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setIndexOfResetTable(int) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setInitializableProblemHandler(InitializableProblemHandler) - Method in class org.apache.tika.parser.pdf.PDFParser
- setIntelCurrentPossition(long) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setIntelFileSize(int) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setIntelState(ChmCommons.IntelState) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setItalics(boolean) - Method in class org.apache.tika.parser.microsoft.ooxml.RunProperties
- setLabel(String) - Method in class org.apache.tika.parser.recognition.RecognisedObject
- setLabelLang(String) - Method in class org.apache.tika.parser.recognition.RecognisedObject
- setLang_id(long) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets language id
- setLangId(long) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets language_id
- setLanguage(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Set tesseract language dictionary to be used.
- setLanguage(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setLastModified(long) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets last modified date of the chm file
- setLatitude(String) - Method in class org.apache.tika.parser.geo.topic.gazetteer.Location
- setLeft(String) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.HeaderFooterFromString
- setLength(int) - Method in class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
- setLengthTreeLengtsTable(short[]) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setLengthTreeTable(short[]) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setListenForAllRecords(boolean) - Method in class org.apache.tika.parser.microsoft.ExcelExtractor
-
Specifies whether this parser should listen for all records or just for the specified few.
- setLongitude(String) - Method in class org.apache.tika.parser.geo.topic.gazetteer.Location
- setLzxBlockLength(long) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setLzxBlockOffset(long) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setLzxBlocksCache(List<ChmLzxBlock>) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setMain(String, String, String) - Method in class org.apache.tika.parser.geo.topic.GeoTag
- setMainTreeElements(int) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setMainTreeLengtsTable(short[]) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setMainTreeTable(short[]) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setMarkLimit(int) - Method in class org.apache.tika.parser.html.charsetdetector.StandardHtmlEncodingDetector
-
How far into the stream to read for charset detection.
- setMarkLimit(int) - Method in class org.apache.tika.parser.html.HtmlEncodingDetector
-
How far into the stream to read for charset detection.
- setMarkLimit(int) - Method in class org.apache.tika.parser.microsoft.POIFSContainerDetector
- setMarkLimit(int) - Method in class org.apache.tika.parser.pkg.ZipContainerDetector
-
If this is less than 0, the file will be spooled to disk, and detection will run on the full file.
- setMarkLimit(int) - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
-
How far into the stream to read for charset detection.
- setMarkLimit(int) - Method in class org.apache.tika.parser.txt.UniversalEncodingDetector
-
How far into the stream to read for charset detection.
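The idea behind a mark limit for charset detection can be sketched with the standard `InputStream.mark`/`reset` contract. This is a minimal sketch that assumes the detectors inspect a bounded prefix and then rewind, as the setter name suggests; `MarkLimitSketch` and `peek` are hypothetical names, not Tika API, and the example needs Java 11 or later for `readNBytes(int)`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Tika code.
final class MarkLimitSketch {

    /**
     * Reads at most markLimit bytes for inspection, then rewinds the stream so
     * that a later consumer still sees it from the beginning.
     */
    static byte[] peek(InputStream in, int markLimit) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(markLimit); // remember the current position
        try {
            byte[] head = in.readNBytes(markLimit); // never reads past the limit
            in.reset(); // rewind to the marked position
            return head;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "<html><meta charset=\"utf-8\"><body>hi</body></html>"
                .getBytes(StandardCharsets.UTF_8);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(doc));
        byte[] head = peek(in, 16);
        System.out.println(new String(head, StandardCharsets.US_ASCII));
        // The full document is still available after peeking.
        System.out.println(in.readAllBytes().length == doc.length);
    }
}
```

A larger limit lets a detector see more evidence (for example a meta tag that appears late in the head) at the cost of buffering more bytes.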
- setMaxBytesForEmbeddedObject(int) - Static method in class org.apache.tika.parser.rtf.RTFParser
-
Deprecated. Use RTFParser.setMemoryLimitInKb(int) instead.
- setMaxFileSizeToOcr(long) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Set maximum file size to submit file to ocr.
- setMaxFileSizeToOcr(long) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setMaxMainMemoryBytes(int) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Deprecated.
- setMaxMainMemoryBytes(long) - Method in class org.apache.tika.parser.pdf.PDFParser
- setMaxMainMemoryBytes(long) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- setMaxXMPMMHistory(int) - Static method in class org.apache.tika.parser.image.xmp.JempboxExtractor
-
Maximum number of events to extract from the event history in the XMP Media Management (XMPMM) section.
- setMediaType(MediaType) - Method in class org.apache.tika.parser.csv.CSVParams
- setMemoryLimitInKb(int) - Method in class org.apache.tika.parser.pkg.CompressorParser
- setMemoryLimitInKb(int) - Method in class org.apache.tika.parser.rtf.RTFParser
- setMetadata(String[]) - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Sets the metadata whose values will be analyzed using cTAKES.
- setMetaParser(Parser) - Method in class org.apache.tika.parser.epub.EpubParser
- setMetaParser(Parser) - Method in class org.apache.tika.parser.odf.OpenDocumentParser
- setMimetype(boolean) - Method in class org.apache.tika.parser.strings.FileConfig
-
Sets the mime option.
- setMinFileSizeToOcr(long) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Set minimum file size to submit file to ocr.
- setMinFileSizeToOcr(long) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setMinLength(int) - Method in class org.apache.tika.parser.strings.StringsConfig
-
Sets the minimum sequence length (characters) to print.
- setMinSize(int) - Method in class org.apache.tika.parser.strings.Latin1StringsParser
-
Sets the minimum size of a character sequence to be extracted.
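What a minimum sequence length means for strings-style extraction can be shown with a short sketch. It is an assumption-laden illustration: `MinLengthStringsSketch` and `extract` are hypothetical names, the sketch only treats printable ASCII as text (a narrower set than Latin-1), and it does not reproduce the logic of Latin1StringsParser or of the external "strings" command.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Tika code.
final class MinLengthStringsSketch {

    /** Collects runs of printable ASCII bytes that are at least minSize long. */
    static List<String> extract(byte[] data, int minSize) {
        List<String> found = new ArrayList<>();
        int runStart = -1; // start index of the current printable run, or -1 if none
        // Iterate one step past the end so that a run reaching the last byte is flushed.
        for (int i = 0; i <= data.length; i++) {
            boolean printable = i < data.length && data[i] >= 0x20 && data[i] < 0x7f;
            if (printable) {
                if (runStart < 0) {
                    runStart = i;
                }
            } else if (runStart >= 0) {
                if (i - runStart >= minSize) {
                    found.add(new String(data, runStart, i - runStart, StandardCharsets.US_ASCII));
                }
                runStart = -1;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        byte[] data = {0, 'a', 'b', 0, 'h', 'e', 'l', 'l', 'o', 1, 'x', 'y', 'z', 'w'};
        // The two-byte run "ab" is shorter than the minimum and is dropped.
        System.out.println(extract(data, 4));
    }
}
```

Raising the minimum filters out short accidental runs in binary data; lowering it recovers short identifiers at the cost of more noise.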
- setName(String) - Method in class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
-
Sets entry name
- setName(String) - Method in class org.apache.tika.parser.geo.topic.gazetteer.Location
- setNameLength(int) - Method in class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
-
Sets an entry name length
- setNERModelPath(String) - Method in class org.apache.tika.parser.geo.topic.GeoParserConfig
- setNerModelUrl(URL) - Method in class org.apache.tika.parser.geo.topic.GeoParserConfig
- setNum_blocks(long) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets the number of blocks contained in the chm file
- setNumId(int) - Method in class org.apache.tika.parser.microsoft.ooxml.ParagraphProperties
- setOcrDPI(int) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Dots per inch used to render the page image for OCR.
- setOcrImageFormatName(String) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- setOcrImageQuality(float) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Image quality used to render the page image for OCR.
- setOcrImageScale(float) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Deprecated. (as of Tika 1.23, this is no longer used in rendering page images)
- setOcrImageType(String) - Method in class org.apache.tika.parser.pdf.PDFParser
- setOcrImageType(String) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Image type used to render the page image for OCR.
- setOcrImageType(ImageType) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Image type used to render the page image for OCR.
- setOcrStrategy(String) - Method in class org.apache.tika.parser.pdf.PDFParser
- setOcrStrategy(String) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Which strategy to use for OCR
- setOcrStrategy(PDFParserConfig.OCR_STRATEGY) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Which strategy to use for OCR
- setOffset(int) - Method in class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
- setOutputStream(OutputStream) - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Sets the OutputStream object used to write the CAS.
- setOutputType(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- setOutputType(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setOutputType(TesseractOCRConfig.OUTPUT_TYPE) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Set output type from ocr process.
- setPageSegMode(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Set tesseract page segmentation mode.
- setPageSegMode(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setPageSeparator(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
The page separator to use in plain text output.
- setPDFParserConfig(PDFParserConfig) - Method in class org.apache.tika.parser.pdf.PDFParser
- setPersonAndEmail(String, Property, Property, Metadata) - Static method in class org.apache.tika.parser.mail.MailUtil
-
This tries to split a "from" or "to" value into a person field and an email field.
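Splitting a "from" or "to" value into a person and an email address can be approximated with a regular expression. The sketch below is one possible approach under simplifying assumptions (a single mailbox, an optional quoted display name, an address in angle brackets); `PersonAndEmailSketch` and `split` are hypothetical names, and the code does not reproduce MailUtil's actual rules or write to a Metadata object.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Tika code.
final class PersonAndEmailSketch {

    // Optional display name (possibly quoted) followed by an address in angle brackets,
    // e.g.  Jane Doe <jane@example.org>  or  "Doe, Jane" <jane@example.org>
    private static final Pattern NAME_AND_ADDRESS =
            Pattern.compile("^\\s*\"?([^\"<]*?)\"?\\s*<([^<>\\s]+@[^<>\\s]+)>\\s*$");

    /** Returns {person, email}; either element is null when it cannot be determined. */
    static String[] split(String value) {
        if (value == null) {
            return new String[] {null, null};
        }
        Matcher m = NAME_AND_ADDRESS.matcher(value);
        if (m.matches()) {
            String person = m.group(1).trim();
            return new String[] {person.isEmpty() ? null : person, m.group(2)};
        }
        String trimmed = value.trim();
        if (trimmed.indexOf('@') > 0 && !trimmed.contains(" ")) {
            return new String[] {null, trimmed}; // bare address without a display name
        }
        // No recognizable address: treat the whole value as a person name.
        return new String[] {trimmed.isEmpty() ? null : trimmed, null};
    }

    public static void main(String[] args) {
        String[] parts = split("\"Doe, Jane\" <jane@example.org>");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

Real mail headers allow comments, groups, and encoded words, so a production splitter would need a proper address parser rather than a single pattern.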
- setPreserveInterwordSpacing(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Whether or not to maintain interword spacing.
- setPreserveInterwordSpacing(boolean) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setPrettyPrint(boolean) - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Enables the formatted output for serializer.
- setR0(long) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setR1(long) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setR2(long) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setRecogniser(String) - Method in class org.apache.tika.parser.recognition.ObjectRecognitionParser
- setResetInterval(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Sets a reset interval
- setResetTableIndex(int) - Method in class org.apache.tika.parser.chm.accessor.ChmDirectoryListingSet
-
Sets reset table index
- setResize(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
- setResize(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setRight(String) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.HeaderFooterFromString
- setSeparatorChar(char) - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Sets the separator character used for annotation properties.
- setSerialize(boolean) - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Enables CAS serialization.
- setSerializerType(CTAKESSerializer) - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Sets the type of cTAKES (UIMA) serializer used to write CAS.
- setSetKCMS(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
Whether to call System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider").
- setSignature(byte[]) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets itsf header signature
- setSignature(byte[]) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets itsp signature
- setSignature(byte[]) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Sets a signature of control data block
- setSignature(byte[]) - Method in class org.apache.tika.parser.chm.accessor.ChmPmgiHeader
-
Sets pmgi signature
- setSignature(byte[]) - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- setSize(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Sets a size of control data
- setSortByPosition(boolean) - Method in class org.apache.tika.parser.pdf.PDFParser
-
Deprecated.
- setSortByPosition(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
If true, sort text tokens by their x/y position before extracting text.
- setSpacingTolerance(Float) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
See PDFTextStripper.setSpacingTolerance(float)
- setStartIndex(int) - Method in class org.apache.tika.parser.chm.core.ChmWrapper
- setStream_uuid(byte[]) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets stream uuid
- setStrike(boolean) - Method in class org.apache.tika.parser.microsoft.ooxml.RunProperties
- setStringsPath(String) - Method in class org.apache.tika.parser.strings.StringsConfig
-
Sets the "strings" installation folder.
- setStripMarkup(boolean) - Method in class org.apache.tika.parser.txt.Icu4jEncodingDetector
-
Whether or not to attempt to strip html-ish markup from the stream before sending it to the underlying detector.
- setStyleID(String) - Method in class org.apache.tika.parser.microsoft.ooxml.ParagraphProperties
- setSuppressDuplicateOverlappingText(boolean) - Method in class org.apache.tika.parser.pdf.PDFParser
-
Deprecated.
- setSuppressDuplicateOverlappingText(boolean) - Method in class org.apache.tika.parser.pdf.PDFParserConfig
-
If true, the parser should try to remove duplicated text over the same region.
- setSwath(int) - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- setSystem_uuid(byte[]) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets system uuid
- setTableOffset(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Sets a table offset
- setTessdataPath(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Set the path to the 'tessdata' folder, which contains language files and config files.
- setTessdataPath(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setTesseractPath(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Set the path to the Tesseract executable's directory, needed if it is not on system path.
- setTesseractPath(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setText(boolean) - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Enables content text analysis using cTAKES.
- setText(byte[]) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Set the input text (byte) data whose charset is to be detected.
- setText(InputStream) - Method in class org.apache.tika.parser.txt.CharsetDetector
-
Set the input text (byte) data whose charset is to be detected.
- setTimeout(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Set maximum time (seconds) to wait for the OCR process to terminate.
- setTimeout(int) - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- setTimeout(int) - Method in class org.apache.tika.parser.strings.StringsConfig
-
Sets the maximum time (in seconds) to wait for the "strings" command to terminate.
- setTotal(int) - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- setTracking(boolean) - Method in class org.apache.tika.parser.mbox.MboxParser
- setTrustedPageSeparator(String) - Method in class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Same as TesseractOCRConfig.setPageSeparator(String) but does not perform any checks on the string.
- setUMLSPass(String) - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Sets the UMLS password.
- setUMLSUser(String) - Method in class org.apache.tika.parser.ctakes.CTAKESConfig
-
Sets the UMLS username.
- setUncompressedLen(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Sets uncompressed length
- setUnderline(String) - Method in class org.apache.tika.parser.microsoft.ooxml.RunProperties
- setUnknown(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Sets an unknown
- setUnknown_000c(int) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets unknown_000c
- setUnknown_000c(int) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets the 000c unknown bytes. "Unknown" here means that those who reverse-engineered the chm format do not know what these bytes are for.
- setUnknown_0024(int) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets 0024 unknown bytes
- setUnknown_002c(int) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets 002c unknown bytes
- setUnknown_0044(byte[]) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets 0044 unknown bytes
- setUnknown_18(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Sets unknown 18 bytes
- setUnknown0008(long) - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- setUnknownLen(long) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets unknown length
- setUnknownOffset(long) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets unknown offset
- setUseSAXDocxExtractor(boolean) - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
- setUseSAXDocxExtractor(boolean) - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
-
Use the experimental SAX-based streaming DOCX parser? If set to false, the classic parser will be used; if true, the new experimental parser will be used.
- setUseSAXPptxExtractor(boolean) - Method in class org.apache.tika.parser.microsoft.AbstractOfficeParser
- setUseSAXPptxExtractor(boolean) - Method in class org.apache.tika.parser.microsoft.OfficeParserConfig
-
Use the experimental SAX-based streaming PPTX parser? If set to false, the classic parser will be used; if true, the new experimental parser will be used.
- setVersion(int) - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Sets itsf version
- setVersion(int) - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
-
Sets a version of itsp header
- setVersion(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Sets version of control data block
- setVersion(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
-
Sets the version
- setWindow(int) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setWindowPosition(int) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setWindowSize(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Sets a window size
- setWindowSize(long) - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
- setWindowsPerReset(long) - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Sets windows per reset
- SHA1 - org.apache.tika.parser.utils.CommonsDigester.DigestAlgorithm
- SHA256 - org.apache.tika.parser.utils.CommonsDigester.DigestAlgorithm
- SHA384 - org.apache.tika.parser.utils.CommonsDigester.DigestAlgorithm
- SHA512 - org.apache.tika.parser.utils.CommonsDigester.DigestAlgorithm
- sheetParts - Variable in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- SheetTextAsHTML(OfficeParserConfig, XHTMLContentHandler) - Constructor for class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.SheetTextAsHTML
- SIGNATURE_RELATIONSHIP - Static variable in class org.apache.tika.parser.microsoft.ooxml.OOXMLParser
- SINGLE_7_BIT - org.apache.tika.parser.strings.StringsEncoding
- SINGLE_8_BIT - org.apache.tika.parser.strings.StringsEncoding
- size() - Method in class org.apache.tika.parser.mp4.DirectFileReadDataSource
- skippedEntity(String) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
- SLDWORKS - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
SolidWorks CAD file
- SOLIDWORKS_ASSEMBLY - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- SOLIDWORKS_DRAWING - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- SOLIDWORKS_PART - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- SourceCodeParser - Class in org.apache.tika.parser.code
-
Generic Source code parser for Java, Groovy, C++.
- SourceCodeParser() - Constructor for class org.apache.tika.parser.code.SourceCodeParser
- SourceCodeParser(EncodingDetector) - Constructor for class org.apache.tika.parser.code.SourceCodeParser
- SpreadsheetMLParser - Class in org.apache.tika.parser.microsoft.xml
-
Parses SpreadsheetML 2003 format Excel files.
- SpreadsheetMLParser() - Constructor for class org.apache.tika.parser.microsoft.xml.SpreadsheetMLParser
- SQLite3Parser - Class in org.apache.tika.parser.jdbc
-
This is the main class for parsing SQLite3 files.
- SQLite3Parser() - Constructor for class org.apache.tika.parser.jdbc.SQLite3Parser
-
Checks to see if class is available for org.sqlite.JDBC.
- StandardHtmlEncodingDetector - Class in org.apache.tika.parser.html.charsetdetector
-
An encoding detector that tries to respect the spirit of the HTML spec part 12.2.3 "The input byte stream", or at least the part that is compatible with the implementation of Tika.
- StandardHtmlEncodingDetector() - Constructor for class org.apache.tika.parser.html.charsetdetector.StandardHtmlEncodingDetector
- start(BundleContext) - Method in class org.apache.tika.parser.internal.Activator
- START_PMGL - Static variable in class org.apache.tika.parser.chm.core.ChmConstants
- startBookmark(String, String) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- startBookmark(String, String) - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- startDocument() - Method in class org.apache.tika.parser.dif.DIFContentHandler
- startDocument() - Method in class org.apache.tika.parser.html.BoilerpipeContentHandler
- startDocument() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
- startDocument() - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
- STARTED - org.apache.tika.parser.chm.core.ChmCommons.IntelState
- STARTED_DECODING - org.apache.tika.parser.chm.core.ChmCommons.LzxState
- startEditedSection(String, Date, OOXMLWordAndPowerPointTextHandler.EditType) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- startEditedSection(String, Date, OOXMLWordAndPowerPointTextHandler.EditType) - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- startElement(String, String, String, Attributes) - Method in class org.apache.tika.parser.dif.DIFContentHandler
- startElement(String, String, String, Attributes) - Method in class org.apache.tika.parser.html.BoilerpipeContentHandler
- startElement(String, String, String, Attributes) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
- startElement(String, String, String, Attributes) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
- startElement(String, String, String, Attributes) - Method in class org.apache.tika.parser.odf.NSNormalizerContentHandler
- startElement(String, String, String, Attributes) - Method in class org.apache.tika.parser.xliff.XLIFF12ContentHandler
- startElement(String, String, String, Attributes) - Method in class org.apache.tika.parser.xml.AttributeDependantMetadataHandler
- startElement(String, String, String, Attributes) - Method in class org.apache.tika.parser.xml.AttributeMetadataHandler
- startElement(String, String, String, Attributes) - Method in class org.apache.tika.parser.xml.ElementMetadataHandler
- startElement(String, String, String, Attributes) - Method in class org.apache.tika.parser.xml.MetadataHandler
-
Deprecated.
- startParagraph(ParagraphProperties) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- startParagraph(ParagraphProperties) - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- startPrefixMapping(String, String) - Method in class org.apache.tika.parser.html.BoilerpipeContentHandler
- startPrefixMapping(String, String) - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
- startPrefixMapping(String, String) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
- startPrefixMapping(String, String) - Method in class org.apache.tika.parser.odf.NSNormalizerContentHandler
- startRow(int) - Method in class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.SheetTextAsHTML
- startSDT() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- startSDT() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- startsWith(byte[], String) - Static method in class org.apache.tika.parser.chm.accessor.ChmDirectoryListingSet
- startTable() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- startTable() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- startTableCell() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- startTableCell() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- startTableRow() - Method in class org.apache.tika.parser.microsoft.ooxml.OOXMLTikaBodyPartHandler
- startTableRow() - Method in interface org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
- stop(BundleContext) - Method in class org.apache.tika.parser.internal.Activator
- StreamingZipContainerDetector - Class in org.apache.tika.parser.pkg
- StreamingZipContainerDetector() - Constructor for class org.apache.tika.parser.pkg.StreamingZipContainerDetector
- StringsConfig - Class in org.apache.tika.parser.strings
-
Configuration for the "strings" (or strings-alternative) command.
- StringsConfig() - Constructor for class org.apache.tika.parser.strings.StringsConfig
-
Default constructor.
- StringsConfig(InputStream) - Constructor for class org.apache.tika.parser.strings.StringsConfig
-
Loads properties from InputStream and then tries to close InputStream.
- StringsEncoding - Enum in org.apache.tika.parser.strings
-
Character encoding of the strings that are to be found using the "strings" command.
- StringsParser - Class in org.apache.tika.parser.strings
-
Parser that uses the "strings" (or strings-alternative) command to find the printable strings in an object, or other binary, file (application/octet-stream).
- StringsParser() - Constructor for class org.apache.tika.parser.strings.StringsParser
- stringToAsciiBytes(String) - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- STYLE_NS - Static variable in class org.apache.tika.parser.odf.OpenDocumentContentParser
- SUMMARY_PROPERTY_PREFIX - Static variable in class org.apache.tika.parser.microsoft.JackcessParser
- SummaryExtractor - Class in org.apache.tika.parser.microsoft
-
Extractor for Common OLE2 (HPSF) metadata
- SummaryExtractor(Metadata) - Constructor for class org.apache.tika.parser.microsoft.SummaryExtractor
- SUPPORTED_TYPES - Static variable in class org.apache.tika.parser.microsoft.ooxml.OOXMLParser
- SUPPORTED_TYPES - Static variable in class org.apache.tika.parser.microsoft.ooxml.xwpf.ml2006.Word2006MLParser
- SVG_NS - Static variable in class org.apache.tika.parser.odf.OpenDocumentContentParser
- SXSLFPowerPointExtractorDecorator - Class in org.apache.tika.parser.microsoft.ooxml
-
SAX/Streaming pptx extractor
- SXSLFPowerPointExtractorDecorator(Metadata, ParseContext, XSLFEventBasedPowerPointExtractor) - Constructor for class org.apache.tika.parser.microsoft.ooxml.SXSLFPowerPointExtractorDecorator
- SXWPFWordExtractorDecorator - Class in org.apache.tika.parser.microsoft.ooxml
-
This is an experimental, alternative extractor for docx files.
- SXWPFWordExtractorDecorator(Metadata, ParseContext, XWPFEventBasedWordExtractor) - Constructor for class org.apache.tika.parser.microsoft.ooxml.SXWPFWordExtractorDecorator
- SYS_PROP_NER_IMPL - Static variable in class org.apache.tika.parser.ner.NamedEntityParser
T
- TAB - Static variable in class org.apache.tika.parser.odf.OpenDocumentContentParser
- TABLE_NS - Static variable in class org.apache.tika.parser.odf.OpenDocumentContentParser
- TagAndStyle(String, String) - Constructor for class org.apache.tika.parser.microsoft.WordExtractor.TagAndStyle
- tagName() - Method in enum org.apache.tika.parser.microsoft.FormattingUtils.Tag
- TEIDOMParser - Class in org.apache.tika.parser.journal
- TEIDOMParser() - Constructor for class org.apache.tika.parser.journal.TEIDOMParser
- templateID - Variable in class org.apache.tika.parser.rtf.ListDescriptor
- TensorflowImageRecParser - Class in org.apache.tika.parser.recognition.tf
-
This is an implementation of ObjectRecogniser powered by Tensorflow convolutional neural network (CNN).
- TensorflowImageRecParser() - Constructor for class org.apache.tika.parser.recognition.tf.TensorflowImageRecParser
- TensorflowRESTCaptioner - Class in org.apache.tika.parser.captioning.tf
-
Tensorflow image captioner.
- TensorflowRESTCaptioner() - Constructor for class org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner
- TensorflowRESTRecogniser - Class in org.apache.tika.parser.recognition.tf
-
Tensor Flow image recogniser which has high performance.
- TensorflowRESTRecogniser() - Constructor for class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- TensorflowRESTVideoRecogniser - Class in org.apache.tika.parser.recognition.tf
-
Tensor Flow video recogniser which has high performance.
- TensorflowRESTVideoRecogniser() - Constructor for class org.apache.tika.parser.recognition.tf.TensorflowRESTVideoRecogniser
- TesseractOCRConfig - Class in org.apache.tika.parser.ocr
-
Configuration for TesseractOCRParser.
- TesseractOCRConfig() - Constructor for class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Default constructor.
- TesseractOCRConfig(InputStream) - Constructor for class org.apache.tika.parser.ocr.TesseractOCRConfig
-
Loads properties from InputStream and then tries to close InputStream.
- TesseractOCRConfig.OUTPUT_TYPE - Enum in org.apache.tika.parser.ocr
- TesseractOCRParser - Class in org.apache.tika.parser.ocr
-
TesseractOCRParser powered by tesseract-ocr engine.
- TesseractOCRParser() - Constructor for class org.apache.tika.parser.ocr.TesseractOCRParser
- TEXT_NS - Static variable in class org.apache.tika.parser.odf.OpenDocumentContentParser
- TextAndCSVParser - Class in org.apache.tika.parser.csv
-
Unless the TikaCoreProperties.CONTENT_TYPE_OVERRIDE is set, this parser tries to assess whether the file is a text file, csv or tsv.
- TextAndCSVParser() - Constructor for class org.apache.tika.parser.csv.TextAndCSVParser
- TextAndCSVParser(EncodingDetector) - Constructor for class org.apache.tika.parser.csv.TextAndCSVParser
- TextCell - Class in org.apache.tika.parser.microsoft
-
Text cell.
- TextCell(String) - Constructor for class org.apache.tika.parser.microsoft.TextCell
- TiffParser - Class in org.apache.tika.parser.image
- TiffParser() - Constructor for class org.apache.tika.parser.image.TiffParser
- TikaExcelDataFormatter - Class in org.apache.tika.parser.microsoft
-
Overrides Excel's General format to include more significant digits than the MS Spec allows.
- TikaExcelDataFormatter() - Constructor for class org.apache.tika.parser.microsoft.TikaExcelDataFormatter
- TikaExcelDataFormatter(Locale) - Constructor for class org.apache.tika.parser.microsoft.TikaExcelDataFormatter
- TikaExcelGeneralFormat - Class in org.apache.tika.parser.microsoft
-
A Format that allows up to 15 significant digits for integers.
- TikaExcelGeneralFormat(Locale) - Constructor for class org.apache.tika.parser.microsoft.TikaExcelGeneralFormat
- TIME - Static variable in interface org.apache.tika.parser.ner.NERecogniser
- TIME_FILE - Static variable in class org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser
- TNEFParser - Class in org.apache.tika.parser.microsoft
-
A POI-powered Tika Parser for TNEF (Transport Neutral Encoding Format) messages, aka winmail.dat
- TNEFParser() - Constructor for class org.apache.tika.parser.microsoft.TNEFParser
- TO - org.apache.tika.parser.microsoft.OutlookExtractor.RECIPIENT_TYPE
- toGeoTag(Map<String, List<Location>>, String) - Method in class org.apache.tika.parser.geo.topic.GeoTag
- tokenize(String) - Static method in class org.apache.tika.parser.ner.opennlp.OpenNLPNameFinder
- topN - Variable in class org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
- toString() - Method in class org.apache.tika.parser.captioning.CaptionObject
- toString() - Method in class org.apache.tika.parser.chm.accessor.ChmDirectoryListingSet
- toString() - Method in class org.apache.tika.parser.chm.accessor.ChmItsfHeader
-
Prints the values of ChmfHeader
- toString() - Method in class org.apache.tika.parser.chm.accessor.ChmItspHeader
- toString() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcControlData
-
Returns textual representation of ChmLzxcControlData
- toString() - Method in class org.apache.tika.parser.chm.accessor.ChmLzxcResetTable
- toString() - Method in class org.apache.tika.parser.chm.accessor.ChmPmgiHeader
-
Returns textual representation of the pmgi header
- toString() - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- toString() - Method in class org.apache.tika.parser.chm.accessor.DirectoryListingEntry
- toString() - Method in class org.apache.tika.parser.chm.lzx.ChmBlockInfo
-
Returns textual representation of ChmBlockInfo
- toString() - Method in class org.apache.tika.parser.chm.lzx.ChmLzxState
-
Suitable for informative output
- toString() - Method in class org.apache.tika.parser.csv.CSVResult
- toString() - Method in class org.apache.tika.parser.dif.DIFContentHandler
- toString() - Method in class org.apache.tika.parser.microsoft.NumberCell
- toString() - Method in class org.apache.tika.parser.microsoft.TextCell
- toString() - Method in class org.apache.tika.parser.pdf.PDFParserConfig
- toString() - Method in class org.apache.tika.parser.recognition.RecognisedObject
- toString() - Method in enum org.apache.tika.parser.strings.StringsEncoding
- toString() - Method in class org.apache.tika.parser.txt.CharsetMatch
- toTags(CharacterRun) - Static method in class org.apache.tika.parser.microsoft.FormattingUtils
- transferTo(long, long, WritableByteChannel) - Method in class org.apache.tika.parser.mp4.DirectFileReadDataSource
- TrueTypeParser - Class in org.apache.tika.parser.font
-
Parser for TrueType font files (TTF).
- TrueTypeParser() - Constructor for class org.apache.tika.parser.font.TrueTypeParser
- TSD_MIME_TYPE - Static variable in class org.apache.tika.parser.crypto.TSDParser
- TSDParser - Class in org.apache.tika.parser.crypto
-
Tika parser for Time Stamped Data Envelope (application/timestamped-data)
- TSDParser() - Constructor for class org.apache.tika.parser.crypto.TSDParser
- TXT - org.apache.tika.parser.ocr.TesseractOCRConfig.OUTPUT_TYPE
- TXTParser - Class in org.apache.tika.parser.txt
-
Plain text parser.
- TXTParser() - Constructor for class org.apache.tika.parser.txt.TXTParser
- TXTParser(EncodingDetector) - Constructor for class org.apache.tika.parser.txt.TXTParser
U
- U - org.apache.tika.parser.microsoft.FormattingUtils.Tag
- uint16() - Method in class org.apache.tika.parser.hwp.HwpStreamReader
-
unsigned 2 byte
- uint16(int) - Method in class org.apache.tika.parser.hwp.HwpStreamReader
-
unsigned 2 byte array
- uint32() - Method in class org.apache.tika.parser.hwp.HwpStreamReader
-
unsigned 4 byte
- uint8() - Method in class org.apache.tika.parser.hwp.HwpStreamReader
-
unsigned 1 byte
- UNCOMPRESSED - org.apache.tika.parser.chm.core.ChmCommons.EntryType
- UNCOMPRESSED - Static variable in class org.apache.tika.parser.chm.core.ChmCommons
- UNDEFINED - Static variable in class org.apache.tika.parser.chm.core.ChmCommons
-
Represents lzx block types in order to decompress differently
- UniversalEncodingDetector - Class in org.apache.tika.parser.txt
- UniversalEncodingDetector() - Constructor for class org.apache.tika.parser.txt.UniversalEncodingDetector
- UNKNOWN - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- UNKNOWN13 - org.apache.tika.parser.iwork.iwana.IWork13PackageParser.IWork13DocumentType
- unmarshalBytes(int) - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- unmarshalCharArray(byte[], ChmPmglHeader, int) - Method in class org.apache.tika.parser.chm.accessor.ChmPmglHeader
- unmarshalInt() - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- unmarshalUByte() - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- unmarshalUInt() - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- unmarshalUlong() - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- unmarshalUtfChar() - Method in class org.apache.tika.parser.chm.lzx.ChmSection
- unravelStringMet(NetcdfFile, Group, Metadata) - Method in class org.apache.tika.parser.hdf.HDFParser
- UNRECOGNIZED - org.apache.tika.parser.microsoft.OutlookExtractor.RECIPIENT_TYPE
- UNSPECIFIED - org.apache.tika.parser.microsoft.OutlookExtractor.RECIPIENT_TYPE
- UNSPECIFIED_MEDIA_TYPE - Static variable in class org.apache.tika.parser.utils.DataURISchemeUtil
- UNSUPPORTED_OOXML_TYPES - Static variable in class org.apache.tika.parser.microsoft.ooxml.OOXMLParser
-
We claim to support all OOXML files, but we actually don't support a small number of them.
- USER_DEFINED_PROPERTY_PREFIX - Static variable in class org.apache.tika.parser.microsoft.JackcessParser
V
- valueOf(String) - Static method in enum org.apache.tika.parser.chm.core.ChmCommons.EntryType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.chm.core.ChmCommons.IntelState
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.chm.core.ChmCommons.LzxState
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.ctakes.CTAKESSerializer
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.iwork.iwana.IWork13PackageParser.IWork13DocumentType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.iwork.IWorkPackageParser.IWORKDocumentType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.microsoft.FormattingUtils.Tag
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.EditType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.microsoft.OutlookExtractor.RECIPIENT_TYPE
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.ocr.TesseractOCRConfig.OUTPUT_TYPE
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.pdf.PDFParserConfig.OCR_STRATEGY
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.strings.StringsEncoding
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.apache.tika.parser.utils.CommonsDigester.DigestAlgorithm
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum org.apache.tika.parser.chm.core.ChmCommons.EntryType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.chm.core.ChmCommons.IntelState
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.chm.core.ChmCommons.LzxState
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.ctakes.CTAKESAnnotationProperty
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.ctakes.CTAKESSerializer
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.iwork.iwana.IWork13PackageParser.IWork13DocumentType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.iwork.IWorkPackageParser.IWORKDocumentType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.microsoft.FormattingUtils.Tag
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler.EditType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.microsoft.OutlookExtractor.RECIPIENT_TYPE
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.ocr.TesseractOCRConfig.OUTPUT_TYPE
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.pdf.PDFParserConfig.OCR_STRATEGY
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.strings.StringsEncoding
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum org.apache.tika.parser.utils.CommonsDigester.DigestAlgorithm
-
Returns an array containing the constants of this enum type, in the order they are declared.
- VERBATIM - Static variable in class org.apache.tika.parser.chm.core.ChmCommons
- VISIO - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- VSD - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Microsoft Visio
W
- W_NS - Static variable in class org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
- warn() - Method in class org.apache.tika.parser.ocr.TesseractOCRParser
- WebPParser - Class in org.apache.tika.parser.image
- WebPParser() - Constructor for class org.apache.tika.parser.image.WebPParser
- WMFParser - Class in org.apache.tika.parser.microsoft
-
This parser offers a very rough capability to extract text if there is text stored in the WMF files.
- WMFParser() - Constructor for class org.apache.tika.parser.microsoft.WMFParser
- Word2006MLParser - Class in org.apache.tika.parser.microsoft.ooxml.xwpf.ml2006
- Word2006MLParser() - Constructor for class org.apache.tika.parser.microsoft.ooxml.xwpf.ml2006.Word2006MLParser
- WORDDOCUMENT - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- WordExtractor - Class in org.apache.tika.parser.microsoft
- WordExtractor(ParseContext, Metadata) - Constructor for class org.apache.tika.parser.microsoft.WordExtractor
- WordExtractor.TagAndStyle - Class in org.apache.tika.parser.microsoft
- WordMLParser - Class in org.apache.tika.parser.microsoft.xml
-
Parses WordML 2003 format Word files.
- WordMLParser() - Constructor for class org.apache.tika.parser.microsoft.xml.WordMLParser
- WordPerfectParser - Class in org.apache.tika.parser.wordperfect
-
Parser for Corel WordPerfect documents.
- WordPerfectParser() - Constructor for class org.apache.tika.parser.wordperfect.WordPerfectParser
- WORKBOOK - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- WORKS - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- WPS - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Microsoft Works
- writeFile(byte[][], String) - Static method in class org.apache.tika.parser.chm.core.ChmCommons
-
Writes byte[][] to the file
X
- XCAS - org.apache.tika.parser.ctakes.CTAKESSerializer
- XLIFF12ContentHandler - Class in org.apache.tika.parser.xliff
-
Content Handler for XLIFF 1.2 documents.
- XLIFF12Parser - Class in org.apache.tika.parser.xliff
-
Parser for XLIFF 1.2 files.
- XLIFF12Parser() - Constructor for class org.apache.tika.parser.xliff.XLIFF12Parser
- XLINK_NS - Static variable in class org.apache.tika.parser.odf.OpenDocumentContentParser
- XLR - org.apache.tika.parser.microsoft.OfficeParser.POIFSDocumentType
- XLR - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Microsoft Works Spreadsheet 7.0
- XLS - Static variable in class org.apache.tika.parser.microsoft.POIFSContainerDetector
-
Microsoft Excel
- XLZParser - Class in org.apache.tika.parser.xliff
-
Parser for XLZ Archives.
- XLZParser() - Constructor for class org.apache.tika.parser.xliff.XLZParser
- XMI - org.apache.tika.parser.ctakes.CTAKESSerializer
- XML - org.apache.tika.parser.ctakes.CTAKESSerializer
- XMLParser - Class in org.apache.tika.parser.xml
-
XML parser.
- XMLParser() - Constructor for class org.apache.tika.parser.xml.XMLParser
- XMPPacketScanner - Class in org.apache.tika.parser.image.xmp
-
This class is a parser for XMP packets.
- XMPPacketScanner() - Constructor for class org.apache.tika.parser.image.xmp.XMPPacketScanner
- XPS - Static variable in class org.apache.tika.parser.microsoft.ooxml.OOXMLParser
- XPSExtractorDecorator - Class in org.apache.tika.parser.microsoft.ooxml.xps
- XPSExtractorDecorator(ParseContext, POIXMLTextExtractor) - Constructor for class org.apache.tika.parser.microsoft.ooxml.xps.XPSExtractorDecorator
- XPSTextExtractor - Class in org.apache.tika.parser.microsoft.ooxml.xps
-
Currently, mostly a pass-through class to hold pkg and properties and keep the general framework similar to our other POI-integrated extractors.
- XPSTextExtractor(OPCPackage) - Constructor for class org.apache.tika.parser.microsoft.ooxml.xps.XPSTextExtractor
- XSLFEventBasedPowerPointExtractor - Class in org.apache.tika.parser.microsoft.ooxml.xslf
- XSLFEventBasedPowerPointExtractor(String) - Constructor for class org.apache.tika.parser.microsoft.ooxml.xslf.XSLFEventBasedPowerPointExtractor
- XSLFEventBasedPowerPointExtractor(OPCPackage) - Constructor for class org.apache.tika.parser.microsoft.ooxml.xslf.XSLFEventBasedPowerPointExtractor
- XSLFPowerPointExtractorDecorator - Class in org.apache.tika.parser.microsoft.ooxml
- XSLFPowerPointExtractorDecorator(Metadata, ParseContext, XSLFPowerPointExtractor) - Constructor for class org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator
- XSLFPowerPointExtractorDecorator(ParseContext, XSLFPowerPointExtractor) - Constructor for class org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator
-
Deprecated.
- XSSFBExcelExtractorDecorator - Class in org.apache.tika.parser.microsoft.ooxml
- XSSFBExcelExtractorDecorator(ParseContext, POIXMLTextExtractor, Locale) - Constructor for class org.apache.tika.parser.microsoft.ooxml.XSSFBExcelExtractorDecorator
- XSSFExcelExtractorDecorator - Class in org.apache.tika.parser.microsoft.ooxml
- XSSFExcelExtractorDecorator(ParseContext, POIXMLTextExtractor, Locale) - Constructor for class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator
- XSSFExcelExtractorDecorator.HeaderFooterFromString - Class in org.apache.tika.parser.microsoft.ooxml
- XSSFExcelExtractorDecorator.SheetTextAsHTML - Class in org.apache.tika.parser.microsoft.ooxml
-
Turns formatted sheet events into HTML
- XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer - Class in org.apache.tika.parser.microsoft.ooxml
-
Captures information on interesting tags, whilst delegating the main work to the formatting handler
- XSSFSheetInterestingPartsCapturer(ContentHandler) - Constructor for class org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer
- XUserDefinedCharset - Class in org.apache.tika.parser.html.charsetdetector.charsets
- XUserDefinedCharset() - Constructor for class org.apache.tika.parser.html.charsetdetector.charsets.XUserDefinedCharset
- XWPFEventBasedWordExtractor - Class in org.apache.tika.parser.microsoft.ooxml.xwpf
-
Experimental class that is based on POI's XSSFEventBasedExcelExtractor
- XWPFEventBasedWordExtractor(String) - Constructor for class org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFEventBasedWordExtractor
- XWPFEventBasedWordExtractor(OPCPackage) - Constructor for class org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFEventBasedWordExtractor
- XWPFListManager - Class in org.apache.tika.parser.microsoft.ooxml
- XWPFListManager(XWPFNumbering) - Constructor for class org.apache.tika.parser.microsoft.ooxml.XWPFListManager
- XWPFNumberingShim - Class in org.apache.tika.parser.microsoft.ooxml.xwpf
-
Stub class of POI's XWPFNumbering because onDocumentRead() is protected
- XWPFNumberingShim(PackagePart) - Constructor for class org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFNumberingShim
- XWPFStylesShim - Class in org.apache.tika.parser.microsoft.ooxml.xwpf
-
For Tika, all we need (so far) is a mapping between styleId and a style's name.
- XWPFStylesShim(PackagePart, ParseContext) - Constructor for class org.apache.tika.parser.microsoft.ooxml.xwpf.XWPFStylesShim
- XWPFWordExtractorDecorator - Class in org.apache.tika.parser.microsoft.ooxml
- XWPFWordExtractorDecorator(Metadata, ParseContext, XWPFWordExtractor) - Constructor for class org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator
- XWPFWordExtractorDecorator(ParseContext, XWPFWordExtractor) - Constructor for class org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator
Z
- ZipContainerDetector - Class in org.apache.tika.parser.pkg
-
A detector that works on Zip documents and other archive and compression formats to figure out exactly what the file is.
- ZipContainerDetector() - Constructor for class org.apache.tika.parser.pkg.ZipContainerDetector
- ZipSalvager - Class in org.apache.tika.parser.utils
- ZipSalvager() - Constructor for class org.apache.tika.parser.utils.ZipSalvager