public class XMLProfiler
extends org.apache.tika.parser.AbstractParser
This parser enables profiling of XML. It captures the root entity, as well as the entity URIs (namespaces) and entity local names, which are stored in parallel arrays.
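To make "root entity" and "parallel arrays" concrete, here is a minimal JDK-only sketch of the same idea using a namespace-aware SAX parser. It is not Tika's implementation. The class `XmlProfileSketch`, its `Profile` holder, recording the root by its local name, and de-duplicating (URI, local name) pairs in first-seen order are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical JDK-only sketch of XML profiling; not Tika's XMLProfiler code. */
public class XmlProfileSketch {

    /** Root element plus index-aligned ("parallel") lists of namespace URIs and local names. */
    public static final class Profile {
        public String rootEntity;
        public final List<String> entityUris = new ArrayList<>();
        public final List<String> entityLocalNames = new ArrayList<>();
    }

    public static Profile profile(InputStream in) throws IOException, SAXException {
        SAXParser parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Namespace awareness makes SAX report (uri, localName) for each element.
            factory.setNamespaceAware(true);
            // Refuse DOCTYPEs so untrusted input cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            parser = factory.newSAXParser();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No suitable SAX parser available", e);
        }

        Profile profile = new Profile();
        // Distinct (uri, localName) pairs, kept in first-seen order.
        Set<List<String>> seen = new LinkedHashSet<>();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (profile.rootEntity == null) {
                    profile.rootEntity = localName; // the first start tag is the document root
                }
                seen.add(List.of(uri, localName));
            }
        });

        // Split the pairs into two parallel lists: index i describes one element name.
        for (List<String> pair : seen) {
            profile.entityUris.add(pair.get(0));
            profile.entityLocalNames.add(pair.get(1));
        }
        return profile;
    }

    /** Convenience overload for in-memory XML; wraps parse errors as unchecked exceptions. */
    public static Profile profile(String xml) {
        try {
            return profile(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException | SAXException e) {
            throw new IllegalArgumentException("Could not profile XML", e);
        }
    }
}
```

For an XMP packet such as `<x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:RDF .../></x:xmpmeta>`, the sketch reports `xmpmeta` as the root and yields index-aligned lists, so `entityUris.get(i)` and `entityLocalNames.get(i)` describe the same element name. Keeping two aligned lists of plain strings, rather than one list of pairs, presumably suits a metadata store whose values are flat string arrays.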
This parser is not part of the default set of parsers and must be "turned on" via a Tika config:

    <properties>
      <parsers>
        <parser class="org.apache.tika.parser.DefaultParser"/>
        <parser class="org.apache.tika.parser.xml.XMLProfiler"/>
      </parsers>
    </properties>
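Because the profiler only runs when a config lists it, a quick pre-flight check can catch a config that is missing the entry. The sketch below is a hypothetical helper (`TikaConfigCheck`, `parserClasses` and `enablesXmlProfiler` are illustrative names, not Tika API) that reads the `class` attribute of every `<parser>` element using only the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical pre-flight check for a Tika config file; not part of Tika. */
public class TikaConfigCheck {

    static final String XML_PROFILER = "org.apache.tika.parser.xml.XMLProfiler";

    /** Returns the class attribute of every <parser> element, in document order. */
    public static List<String> parserClasses(String configXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPEs so the check cannot be used to fetch external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
            NodeList parsers = doc.getElementsByTagName("parser");
            List<String> classes = new ArrayList<>();
            for (int i = 0; i < parsers.getLength(); i++) {
                // getAttribute returns "" when a <parser> has no class attribute.
                classes.add(((Element) parsers.item(i)).getAttribute("class"));
            }
            return classes;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not read config XML", e);
        }
    }

    /** True if some <parser> entry names the XMLProfiler class. */
    public static boolean enablesXmlProfiler(String configXml) {
        return parserClasses(configXml).contains(XML_PROFILER);
    }
}
```

The check only inspects the XML text, so it cannot tell whether Tika will actually instantiate the parser; loading the file with Tika itself remains the authoritative test.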
This parser was initially designed to profile XMP and XFA in PDFs. Further work would be needed to extract other types of XML, or XMP, from other file formats; please open a ticket if you need this.
| Modifier and Type | Field and Description |
|---|---|
| static org.apache.tika.metadata.Property | ENTITY_LOCAL_NAMES |
| static org.apache.tika.metadata.Property | ENTITY_URIS |
| static org.apache.tika.metadata.Property | ROOT_ENTITY |
| Constructor and Description |
|---|
| XMLProfiler() |
| Modifier and Type | Method and Description |
|---|---|
| Set<org.apache.tika.mime.MediaType> | getSupportedTypes(org.apache.tika.parser.ParseContext context) |
| void | parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) |
public static org.apache.tika.metadata.Property ROOT_ENTITY
public static org.apache.tika.metadata.Property ENTITY_URIS
public static org.apache.tika.metadata.Property ENTITY_LOCAL_NAMES
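These three fields are Tika `Property` keys, so after a parse their values can be read back from the `Metadata` object, for example with `metadata.getValues(XMLProfiler.ENTITY_URIS)` and `metadata.getValues(XMLProfiler.ENTITY_LOCAL_NAMES)`. Assuming those two arrays are the same length and index-aligned, as "parallel arrays" in the class description implies, a small hypothetical helper (`EntityPairs.zip` is an illustrative name) can combine them into `javax.xml.namespace.QName` values. How Tika records an element with no namespace is not stated here, so the sketch simply treats a null or empty URI as "no namespace".

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;

/** Hypothetical helper: zips parallel ENTITY_URIS / ENTITY_LOCAL_NAMES values into QNames. */
public class EntityPairs {

    public static List<QName> zip(String[] uris, String[] localNames) {
        // Parallel arrays must line up index by index; refuse mismatched input.
        if (uris.length != localNames.length) {
            throw new IllegalArgumentException(
                    "Arrays are not parallel: " + uris.length + " URIs vs "
                            + localNames.length + " local names");
        }
        List<QName> names = new ArrayList<>(uris.length);
        for (int i = 0; i < uris.length; i++) {
            // QName treats a null URI as "" (no namespace);
            // toString() prints "{uri}local", or just "local" when there is no namespace.
            names.add(new QName(uris[i], localNames[i]));
        }
        return names;
    }
}
```

Converting to `QName` gives each entity a single comparable value, which is convenient for counting or grouping element names across many profiled documents.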
public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
public void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) throws IOException, SAXException, org.apache.tika.exception.TikaException
Throws: IOException, SAXException, org.apache.tika.exception.TikaException

Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.