Class TikaLuceneContentExtractor
- java.lang.Object
-
- org.apache.cxf.jaxrs.ext.search.tika.TikaLuceneContentExtractor
-
public class TikaLuceneContentExtractor extends Object
-
-
Constructor Summary
Constructor and Description
TikaLuceneContentExtractor(List<org.apache.tika.parser.Parser> parsers, LuceneDocumentMetadata documentMetadata)
    Create a new Tika-based content extractor using the provided parser instances and document metadata.
TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser)
    Create a new Tika-based content extractor using the provided parser instance.
TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser, boolean validateMediaType)
    Create a new Tika-based content extractor using the provided parser instance and optional media type validation.
TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser, boolean validateMediaType, LuceneDocumentMetadata documentMetadata)
    Create a new Tika-based content extractor using the provided parser instance, optional media type validation, and document metadata.
TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser, LuceneDocumentMetadata documentMetadata)
    Create a new Tika-based content extractor using the provided parser instance and document metadata.
-
Method Summary
Modifier and Type, Method and Description
org.apache.lucene.document.Document extract(InputStream in)
    Extract the content and metadata from the input stream.
org.apache.lucene.document.Document extract(InputStream in, LuceneDocumentMetadata documentMetadata)
    Extract the content and metadata from the input stream.
org.apache.lucene.document.Document extractContent(InputStream in)
    Extract the content only from the input stream.
org.apache.lucene.document.Document extractMetadata(InputStream in)
    Extract the metadata only from the input stream.
org.apache.lucene.document.Document extractMetadata(InputStream in, LuceneDocumentMetadata documentMetadata)
    Extract the metadata only from the input stream.
-
-
-
Constructor Detail
-
TikaLuceneContentExtractor
public TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser)
Create a new Tika-based content extractor using the provided parser instance.
- Parameters:
parser - parser instance
-
TikaLuceneContentExtractor
public TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser, boolean validateMediaType)
Create a new Tika-based content extractor using the provided parser instance and optional media type validation. If validation is enabled, the implementation will try to detect the media type of the input and validate it against the media types supported by the parser.
- Parameters:
parser - parser instance
validateMediaType - enable or disable media type validation
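The validation step described above can be illustrated with a minimal, standard-library-only sketch. It is not the CXF or Tika implementation: the class `MediaTypeValidationSketch`, its methods, and the magic-byte check are hypothetical stand-ins for a real detector, and the set of supported types is an assumed example value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Set;

// Hypothetical illustration only; a real detector is far more thorough.
final class MediaTypeValidationSketch {

    private MediaTypeValidationSketch() { }

    // Peek at the first bytes, then rewind so the parser can still read the whole stream.
    static String detect(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(5);
            byte[] head = in.readNBytes(5); // requires Java 11+
            in.reset();
            String prefix = new String(head, StandardCharsets.US_ASCII);
            return "%PDF-".equals(prefix) ? "application/pdf" : "application/octet-stream";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // With validation disabled, every input is handed to the parser unchecked.
    static boolean accepts(Set<String> supportedTypes, boolean validateMediaType, InputStream in) {
        return !validateMediaType || supportedTypes.contains(detect(in));
    }

    public static void main(String[] args) {
        Set<String> supported = Set.of("application/pdf"); // assumed parser capabilities
        byte[] pdf = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        byte[] text = "hello world".getBytes(StandardCharsets.US_ASCII);

        System.out.println(accepts(supported, true, new ByteArrayInputStream(pdf)));   // true
        System.out.println(accepts(supported, true, new ByteArrayInputStream(text)));  // false
        System.out.println(accepts(supported, false, new ByteArrayInputStream(text))); // true
    }
}
```

The mark/reset pair matters in a design like this: detection must not consume bytes that the parser needs afterwards.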
-
TikaLuceneContentExtractor
public TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser, LuceneDocumentMetadata documentMetadata)
Create a new Tika-based content extractor using the provided parser instance and document metadata. If validation is enabled, the implementation will try to detect the media type of the input and validate it against the media types supported by the parser.
- Parameters:
parser - parser instance
documentMetadata - document metadata
-
TikaLuceneContentExtractor
public TikaLuceneContentExtractor(org.apache.tika.parser.Parser parser, boolean validateMediaType, LuceneDocumentMetadata documentMetadata)
Create a new Tika-based content extractor using the provided parser instance, optional media type validation, and document metadata. If validation is enabled, the implementation will try to detect the media type of the input and validate it against the media types supported by the parser.
- Parameters:
parser - parser instance
validateMediaType - enable or disable media type validation
documentMetadata - document metadata
-
TikaLuceneContentExtractor
public TikaLuceneContentExtractor(List<org.apache.tika.parser.Parser> parsers, LuceneDocumentMetadata documentMetadata)
Create a new Tika-based content extractor using the provided parser instances and document metadata. If validation is enabled, the implementation will try to detect the media type of the input and validate it against the media types supported by the parsers.
- Parameters:
parsers - parser instances
documentMetadata - document metadata
-
-
Method Detail
-
extract
public org.apache.lucene.document.Document extract(InputStream in)
Extract the content and metadata from the input stream. If media type validation is enabled, a detector may be run against the input stream to ensure that the parser supports this type of content.
- Parameters:
in - input stream to extract the content and metadata from
- Returns:
the extracted document, or null if extraction is not possible or was unsuccessful
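Because the extraction methods return null rather than throwing when extraction fails, callers need a null check before using the result. The sketch below shows one way to make that explicit using only the standard library. `NullSafeExtraction`, `tryExtract`, and `standInExtract` are hypothetical names, and the stand-in extractor merely imitates the documented null-on-failure contract.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of CXF: wraps any extractor that returns null on failure.
final class NullSafeExtraction {

    private NullSafeExtraction() { }

    // Turn the "null means no document" convention into an Optional.
    static <D> Optional<D> tryExtract(Function<InputStream, D> extractor, InputStream in) {
        return Optional.ofNullable(extractor.apply(in));
    }

    // Stand-in for a real extractor: returns null for empty or unreadable input.
    // With CXF on the classpath, one would presumably pass `extractor::extract` instead.
    static String standInExtract(InputStream in) {
        try {
            byte[] bytes = in.readAllBytes(); // requires Java 9+
            return bytes.length == 0 ? null : new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        InputStream full = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        InputStream empty = new ByteArrayInputStream(new byte[0]);

        System.out.println(tryExtract(NullSafeExtraction::standInExtract, full));  // Optional[hello]
        System.out.println(tryExtract(NullSafeExtraction::standInExtract, empty)); // Optional.empty
    }
}
```

Wrapping the result this way keeps the null check in one place, so indexing code can skip documents that could not be extracted without scattered null tests.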
-
extract
public org.apache.lucene.document.Document extract(InputStream in, LuceneDocumentMetadata documentMetadata)
Extract the content and metadata from the input stream. If media type validation is enabled, a detector may be run against the input stream to ensure that the parser supports this type of content.
- Parameters:
in - input stream to extract the content and metadata from
documentMetadata - document metadata
- Returns:
the extracted document, or null if extraction is not possible or was unsuccessful
-
extractContent
public org.apache.lucene.document.Document extractContent(InputStream in)
Extract the content only from the input stream. If media type validation is enabled, a detector may be run against the input stream to ensure that the parser supports this type of content.
- Parameters:
in - input stream to extract the content from
- Returns:
the extracted document, or null if extraction is not possible or was unsuccessful
-
extractMetadata
public org.apache.lucene.document.Document extractMetadata(InputStream in)
Extract the metadata only from the input stream. If media type validation is enabled, a detector may be run against the input stream to ensure that the parser supports this type of content.
- Parameters:
in - input stream to extract the metadata from
- Returns:
the extracted document, or null if extraction is not possible or was unsuccessful
-
extractMetadata
public org.apache.lucene.document.Document extractMetadata(InputStream in, LuceneDocumentMetadata documentMetadata)
Extract the metadata only from the input stream. If media type validation is enabled, a detector may be run against the input stream to ensure that the parser supports this type of content.
- Parameters:
in - input stream to extract the metadata from
documentMetadata - document metadata
- Returns:
the extracted document, or null if extraction is not possible or was unsuccessful
-
-