Class TikaContentExtractor
- java.lang.Object
-
- org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor
-
public class TikaContentExtractor extends Object
-
-
Nested Class Summary
Nested Classes
Modifier and Type: static class
Class: TikaContentExtractor.TikaContent
Description: Extracted content, metadata and media type container
-
Constructor Summary
Constructors
TikaContentExtractor()
    Create new Tika-based content extractor using AutoDetectParser.
TikaContentExtractor(List<org.apache.tika.parser.Parser> parsers)
    Create new Tika-based content extractor using the provided parser instances.
TikaContentExtractor(List<org.apache.tika.parser.Parser> parsers, org.apache.tika.detect.Detector detector)
    Create new Tika-based content extractor using the provided parser instances.
TikaContentExtractor(org.apache.tika.parser.Parser parser)
    Create new Tika-based content extractor using the provided parser instance.
TikaContentExtractor(org.apache.tika.parser.Parser parser, boolean validateMediaType)
    Create new Tika-based content extractor using the provided parser instance and optional media type validation.
-
Method Summary
Methods
TikaContentExtractor.TikaContent extract(InputStream in)
    Extract the content and metadata from the input stream.
TikaContentExtractor.TikaContent extract(InputStream in, javax.ws.rs.core.MediaType mt)
    Extract the content and metadata from the input stream with a media type hint.
TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler)
    Extract the content and metadata from the input stream.
TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mt)
    Extract the content and metadata from the input stream with a media type hint.
TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mtHint, org.apache.tika.parser.ParseContext context)
    Extract the content and metadata from the input stream with a media type hint.
TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, org.apache.tika.parser.ParseContext context)
    Extract the content and metadata from the input stream.
TikaContentExtractor.TikaContent extractMetadata(InputStream in)
    Extract the metadata only from the input stream.
SearchBean extractMetadataToSearchBean(InputStream in)
    Extract the metadata only from the input stream.
-
-
-
Constructor Detail
-
TikaContentExtractor
public TikaContentExtractor()
Create new Tika-based content extractor using AutoDetectParser.
-
TikaContentExtractor
public TikaContentExtractor(org.apache.tika.parser.Parser parser)
Create new Tika-based content extractor using the provided parser instance.
Parameters:
parser - parser instance
-
TikaContentExtractor
public TikaContentExtractor(List<org.apache.tika.parser.Parser> parsers)
Create new Tika-based content extractor using the provided parser instances.
Parameters:
parsers - parser instances
-
TikaContentExtractor
public TikaContentExtractor(List<org.apache.tika.parser.Parser> parsers, org.apache.tika.detect.Detector detector)
Create new Tika-based content extractor using the provided parser instances and detector.
Parameters:
parsers - parser instances
detector - detector instance
-
TikaContentExtractor
public TikaContentExtractor(org.apache.tika.parser.Parser parser, boolean validateMediaType)
Create new Tika-based content extractor using the provided parser instance and optional media type validation. If validation is enabled, the implementation will try to detect the media type of the input and validate it against the media types supported by the parser.
Parameters:
parser - parser instance
validateMediaType - enable or disable media type validation
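The validation described above can be pictured as a detect-then-check step. The following is a conceptual sketch only, not the CXF or Tika implementation: the class `MediaTypeValidationSketch`, its methods, and the single-signature "detector" are hypothetical stand-ins, and only JDK classes are used. It peeks at the first bytes of a mark-capable stream, rewinds it, and declines (returns null) when the detected type is not in the supported set.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Set;

// Conceptual sketch; NOT the CXF/Tika implementation. All names are hypothetical.
final class MediaTypeValidationSketch {
    private static final byte[] PDF_SIGNATURE = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Toy detector: peeks at the leading bytes, then rewinds the stream. */
    static String detect(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(PDF_SIGNATURE.length);
            byte[] head = in.readNBytes(PDF_SIGNATURE.length);
            in.reset(); // leave the stream untouched for the parsing step
            return Arrays.equals(head, PDF_SIGNATURE)
                    ? "application/pdf"
                    : "application/octet-stream";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the raw stream text if the detected type is supported, otherwise null. */
    static String extractIfSupported(InputStream in, Set<String> supportedTypes) {
        if (!supportedTypes.contains(detect(in))) {
            return null; // declined: the stand-in "parser" does not support this type
        }
        try {
            return new String(in.readAllBytes(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Set<String> supported = Set.of("application/pdf");
        InputStream pdf = new ByteArrayInputStream("%PDF-1.7 demo".getBytes(StandardCharsets.US_ASCII));
        InputStream text = new ByteArrayInputStream("plain text".getBytes(StandardCharsets.US_ASCII));
        System.out.println(extractIfSupported(pdf, supported));  // prints "%PDF-1.7 demo"
        System.out.println(extractIfSupported(text, supported)); // prints "null"
    }
}
```

Tika's `Detector` contract likewise expects a stream that supports mark/reset, so wrapping a raw stream in a `BufferedInputStream` before extraction is a reasonable precaution.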
-
-
Method Detail
-
extract
public TikaContentExtractor.TikaContent extract(InputStream in)
Extract the content and metadata from the input stream. Depending on media type validation, the detector may be run against the input stream to ensure that the parser supports this type of content.
Parameters:
in - input stream to extract the content and metadata from
Returns:
the extracted content and metadata, or null if extraction is not possible or was unsuccessful
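Because the method is documented to return null when extraction is not possible or fails, callers need a null check. One way to make that explicit is to wrap the call in an `Optional`. The sketch below uses only JDK types; `NullSafeExtraction` and `tryExtract` are hypothetical helper names, and the lambdas in `main` stand in for a real extractor call.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper; not part of CXF.
final class NullSafeExtraction {
    /** Applies an extraction function and maps a null result to Optional.empty(). */
    static <T> Optional<T> tryExtract(InputStream in, Function<InputStream, T> extractFn) {
        return Optional.ofNullable(extractFn.apply(in));
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[0]);
        // Stand-ins for an extractor call that succeeds and one that fails.
        Optional<String> ok = tryExtract(in, stream -> "extracted text");
        Optional<String> failed = tryExtract(in, stream -> null);
        System.out.println(ok.orElse("<nothing extracted>"));     // prints "extracted text"
        System.out.println(failed.orElse("<nothing extracted>")); // prints "<nothing extracted>"
    }
}
```

With CXF on the classpath, the function argument would typically be a method reference such as `extractor::extract`, where `extractor` is a `TikaContentExtractor` instance.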
-
extract
public TikaContentExtractor.TikaContent extract(InputStream in, javax.ws.rs.core.MediaType mt)
Extract the content and metadata from the input stream with a media type hint.
Parameters:
in - input stream to extract the content and metadata from
mt - JAX-RS MediaType of the stream content
Returns:
the extracted content and metadata, or null if extraction is not possible or was unsuccessful
-
extract
public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler)
Extract the content and metadata from the input stream. Depending on media type validation, the detector may be run against the input stream to ensure that the parser supports this type of content.
Parameters:
in - input stream to extract the content and metadata from
handler - custom ContentHandler
Returns:
the extracted content and metadata, or null if extraction is not possible or was unsuccessful
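As an illustration of the `handler` parameter: assuming `ContentHandler` here is the SAX interface `org.xml.sax.ContentHandler` (the unqualified name suggests a JDK type, but this page does not say so explicitly), a custom handler can extend the JDK's `DefaultHandler` and override only the callbacks it needs. The class name `TextCollectingHandler` and its `getText` accessor are hypothetical, and `main` drives the handler by hand rather than through a parser.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical custom handler: collects character data, skipping whitespace-only chunks.
final class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        String chunk = new String(ch, start, length).trim();
        if (!chunk.isEmpty()) {
            if (text.length() > 0) {
                text.append(' '); // separate consecutive chunks with a single space
            }
            text.append(chunk);
        }
    }

    /** Returns the text collected so far. */
    String getText() {
        return text.toString();
    }

    public static void main(String[] args) {
        TextCollectingHandler handler = new TextCollectingHandler();
        char[] first = "  Hello ".toCharArray();
        char[] second = "world".toCharArray();
        // Simulate the SAX events a parser would emit while reading a document.
        handler.characters(first, 0, first.length);
        handler.characters(second, 0, second.length);
        System.out.println(handler.getText()); // prints "Hello world"
    }
}
```

An instance of such a handler could then be passed as the `handler` argument of `extract(InputStream in, ContentHandler handler)`.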
-
extract
public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mt)
Extract the content and metadata from the input stream with a media type hint.
Parameters:
in - input stream to extract the content and metadata from
handler - custom ContentHandler
mt - JAX-RS MediaType of the stream content
Returns:
the extracted content and metadata, or null if extraction is not possible or was unsuccessful
-
extract
public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, org.apache.tika.parser.ParseContext context)
Extract the content and metadata from the input stream. Depending on media type validation, the detector may be run against the input stream to ensure that the parser supports this type of content.
Parameters:
in - input stream to extract the content and metadata from
handler - custom ContentHandler
context - custom context
Returns:
the extracted content and metadata, or null if extraction is not possible or was unsuccessful
-
extract
public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, javax.ws.rs.core.MediaType mtHint, org.apache.tika.parser.ParseContext context)
Extract the content and metadata from the input stream with a media type hint.
Parameters:
in - input stream to extract the content and metadata from
handler - custom ContentHandler
mtHint - JAX-RS MediaType of the stream content
context - custom context
Returns:
the extracted content and metadata, or null if extraction is not possible or was unsuccessful
-
extractMetadata
public TikaContentExtractor.TikaContent extractMetadata(InputStream in)
Extract the metadata only from the input stream. Depending on media type validation, the detector may be run against the input stream to ensure that the parser supports this type of content.
Parameters:
in - input stream to extract the metadata from
Returns:
the extracted metadata, or null if extraction is not possible or was unsuccessful
-
extractMetadataToSearchBean
public SearchBean extractMetadataToSearchBean(InputStream in)
Extract the metadata only from the input stream. Depending on media type validation, the detector may be run against the input stream to ensure that the parser supports this type of content.
Parameters:
in - input stream to extract the metadata from
Returns:
the extracted metadata converted to SearchBean, or null if extraction is not possible or was unsuccessful
-
-