Package org.apache.uima.tools.components
Class XmlDetagger
java.lang.Object
org.apache.uima.analysis_component.AnalysisComponent_ImplBase
org.apache.uima.analysis_component.Annotator_ImplBase
org.apache.uima.analysis_component.CasAnnotator_ImplBase
org.apache.uima.tools.components.XmlDetagger
- All Implemented Interfaces:
org.apache.uima.analysis_component.AnalysisComponent
public class XmlDetagger
extends org.apache.uima.analysis_component.CasAnnotator_ImplBase
A multi-sofa annotator that does XML detagging. Reads XML data from the input Sofa (named
"xmlDocument"); this data can be stored in the CAS as a string or array, or it can be a URI to a
remote file. The XML is parsed using the JVM's default parser, and the plain-text content is
written to a new sofa called "plainTextDocument".
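The detagging step described above can be illustrated with a minimal sketch that uses only the JDK's default SAX parser. This is not the XmlDetagger source: the class name DetagSketch and the method detag are hypothetical, and the sketch works on a plain String instead of reading from and writing to CAS sofas.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of UIMA.
class DetagSketch {

    // Returns the character content of the XML string with all tags removed.
    static String detag(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks; append each one.
                text.append(ch, start, length);
            }
        };
        try {
            // SAXParserFactory.newInstance() selects the JVM's default parser.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // prints "Hello world"
        System.out.println(detag("<doc><p>Hello</p> <p>world</p></doc>"));
    }
}
```

In the annotator itself, the input would come from the "xmlDocument" sofa and the result would be set as the text of the "plainTextDocument" sofa.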
-
Field Summary
Fields
Modifier and Type | Field | Description
static final String | PARAM_TEXT_TAG | Name of optional configuration parameter that contains the name of an XML tag that appears in the input file.
-
Constructor Summary
Constructors
XmlDetagger()
-
Method Summary
Modifier and Type | Method | Description
static org.apache.uima.analysis_engine.AnalysisEngineDescription | getDescription() | Parses and returns the descriptor for this Analysis Engine.
static URL | getDescriptorURL() |
void | initialize(org.apache.uima.UimaContext aContext) |
void | process(org.apache.uima.cas.CAS aCAS) |
void | typeSystemInit(org.apache.uima.cas.TypeSystem aTypeSystem) |
Methods inherited from class org.apache.uima.analysis_component.CasAnnotator_ImplBase
getRequiredCasInterface, process
Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase
getCasInstancesRequired, hasNext, next
Methods inherited from class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
batchProcessComplete, collectionProcessComplete, destroy, getContext, getLogger, getResultSpecification, reconfigure, setResultSpecification
-
Field Details
-
PARAM_TEXT_TAG
Name of optional configuration parameter that contains the name of an XML tag that appears in the input file. Only text that falls within this XML tag will be considered part of the "document" that is added to the CAS by this annotator. If not specified, the entire file will be considered the document.
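The effect of this parameter can be sketched with a SAX handler that keeps character data only while the parser is inside the chosen tag. This is an illustration under stated assumptions, not the XmlDetagger source: the class TextTagSketch, the method detag, and its textTag argument are hypothetical stand-ins for the configured parameter value, and passing null models the "not specified" case.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of UIMA.
class TextTagSketch {

    // Keeps only text inside elements named textTag; keeps all text if textTag is null.
    static String detag(String xml, String textTag) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            // Nesting depth inside the chosen tag; text is kept while depth > 0.
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(textTag)) {
                    depth++;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(textTag)) {
                    depth--;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (textTag == null || depth > 0) {
                    text.append(ch, start, length);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<doc><title>Report</title><TEXT>Body <b>text</b></TEXT></doc>";
        System.out.println(detag(xml, "TEXT")); // prints "Body text"
        System.out.println(detag(xml, null));   // prints "ReportBody text"
    }
}
```

Markup nested inside the chosen tag is still stripped, so only its character content reaches the output.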
-
-
Constructor Details
-
XmlDetagger
public XmlDetagger()
-
-
Method Details
-
initialize
public void initialize(org.apache.uima.UimaContext aContext) throws org.apache.uima.resource.ResourceInitializationException
- Specified by:
initialize in interface org.apache.uima.analysis_component.AnalysisComponent
- Overrides:
initialize in class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
- Throws:
org.apache.uima.resource.ResourceInitializationException
-
typeSystemInit
public void typeSystemInit(org.apache.uima.cas.TypeSystem aTypeSystem) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
- Overrides:
typeSystemInit in class org.apache.uima.analysis_component.CasAnnotator_ImplBase
- Throws:
org.apache.uima.analysis_engine.AnalysisEngineProcessException
-
process
public void process(org.apache.uima.cas.CAS aCAS) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
- Specified by:
process in class org.apache.uima.analysis_component.CasAnnotator_ImplBase
- Throws:
org.apache.uima.analysis_engine.AnalysisEngineProcessException
-
getDescription
public static org.apache.uima.analysis_engine.AnalysisEngineDescription getDescription() throws org.apache.uima.util.InvalidXMLException
Parses and returns the descriptor for this Analysis Engine. The descriptor is stored in the uima-core.jar file and located using the ClassLoader.
- Returns:
an object containing all of the information parsed from the descriptor.
- Throws:
org.apache.uima.util.InvalidXMLException - if the descriptor is invalid or missing
-
getDescriptorURL
-