Class XmlDetagger

java.lang.Object
org.apache.uima.analysis_component.AnalysisComponent_ImplBase
org.apache.uima.analysis_component.Annotator_ImplBase
org.apache.uima.analysis_component.CasAnnotator_ImplBase
org.apache.uima.tools.components.XmlDetagger
All Implemented Interfaces:
org.apache.uima.analysis_component.AnalysisComponent

public class XmlDetagger extends org.apache.uima.analysis_component.CasAnnotator_ImplBase
A multi-sofa annotator that does XML detagging. Reads XML data from the input Sofa (named "xmlDocument"); this data can be stored in the CAS as a string or array, or it can be a URI to a remote file. The XML is parsed using the JVM's default parser, and the plain-text content is written to a new sofa called "plainTextDocument".
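The detagging step described above can be illustrated with a minimal sketch that uses only the JDK's default SAX parser. This is not the source of XmlDetagger: the class name SaxDetagSketch and the method detag are hypothetical, and the sketch works on a plain String instead of reading from and writing to CAS sofas.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of the UIMA API.
public class SaxDetagSketch {

  /** Returns the character content of the given XML with all markup removed. */
  public static String detag(String xml) throws Exception {
    StringBuilder text = new StringBuilder();
    // SAXParserFactory.newInstance() selects the JVM's default parser implementation.
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
      @Override
      public void characters(char[] ch, int start, int length) {
        // Keep character data only; element tags and attributes are ignored.
        text.append(ch, start, length);
      }
    });
    return text.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(detag("<doc><title>Hello</title> <body>plain <b>text</b></body></doc>"));
  }
}
```

In the annotator itself, the input would come from the "xmlDocument" sofa and the resulting string would become the document text of the "plainTextDocument" sofa.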
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
    PARAM_TEXT_TAG
    Name of optional configuration parameter that contains the name of an XML tag that appears in the input file.
  • Constructor Summary

    Constructors
    Constructor
    Description
    XmlDetagger()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static org.apache.uima.analysis_engine.AnalysisEngineDescription
    getDescription()
    Parses and returns the descriptor for this Analysis Engine.
    static URL
    getDescriptorURL()
     
    void
    initialize(org.apache.uima.UimaContext aContext)
     
    void
    process(org.apache.uima.cas.CAS aCAS)
     
    void
    typeSystemInit(org.apache.uima.cas.TypeSystem aTypeSystem)
     

    Methods inherited from class org.apache.uima.analysis_component.CasAnnotator_ImplBase

    getRequiredCasInterface, process

    Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase

    getCasInstancesRequired, hasNext, next

    Methods inherited from class org.apache.uima.analysis_component.AnalysisComponent_ImplBase

    batchProcessComplete, collectionProcessComplete, destroy, getContext, getLogger, getResultSpecification, reconfigure, setResultSpecification

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • PARAM_TEXT_TAG

      public static final String PARAM_TEXT_TAG
      Name of optional configuration parameter that contains the name of an XML tag that appears in the input file. Only text that falls within this XML tag is considered part of the "document" that this annotator adds to the CAS. If not specified, the entire file is considered the document.
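The effect of restricting the document to a single tag can be sketched with a SAX handler that collects character data only while the named element is open. This is an assumption about how such a filter might work, not the annotator's actual code: TextTagFilterSketch, TagHandler, and detag are hypothetical names, and passing null as the tag stands in for the parameter being unset.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of the UIMA API.
public class TextTagFilterSketch {

  /** Collects character data found inside elements named textTag (null means the whole document). */
  static class TagHandler extends DefaultHandler {
    private final String textTag;
    private final StringBuilder text = new StringBuilder();
    private int depth; // number of currently open textTag elements

    TagHandler(String textTag) {
      this.textTag = textTag;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
      if (qName.equals(textTag)) {
        depth++;
      }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      if (qName.equals(textTag)) {
        depth--;
      }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      // With no tag configured, every character run belongs to the document.
      if (textTag == null || depth > 0) {
        text.append(ch, start, length);
      }
    }

    String text() {
      return text.toString();
    }
  }

  /** Returns the plain text of xml, limited to the content of textTag when it is non-null. */
  public static String detag(String xml, String textTag) throws Exception {
    TagHandler handler = new TagHandler(textTag);
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler.text();
  }
}
```

For example, calling detag with the tag name "TEXT" on a file whose body is wrapped in a TEXT element keeps the body and drops sibling elements such as metadata headers.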
  • Constructor Details

    • XmlDetagger

      public XmlDetagger()
  • Method Details

    • initialize

      public void initialize(org.apache.uima.UimaContext aContext) throws org.apache.uima.resource.ResourceInitializationException
      Specified by:
      initialize in interface org.apache.uima.analysis_component.AnalysisComponent
      Overrides:
      initialize in class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
      Throws:
      org.apache.uima.resource.ResourceInitializationException
    • typeSystemInit

      public void typeSystemInit(org.apache.uima.cas.TypeSystem aTypeSystem) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
      Overrides:
      typeSystemInit in class org.apache.uima.analysis_component.CasAnnotator_ImplBase
      Throws:
      org.apache.uima.analysis_engine.AnalysisEngineProcessException
    • process

      public void process(org.apache.uima.cas.CAS aCAS) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
      Specified by:
      process in class org.apache.uima.analysis_component.CasAnnotator_ImplBase
      Throws:
      org.apache.uima.analysis_engine.AnalysisEngineProcessException
    • getDescription

      public static org.apache.uima.analysis_engine.AnalysisEngineDescription getDescription() throws org.apache.uima.util.InvalidXMLException
      Parses and returns the descriptor for this Analysis Engine. The descriptor is stored in the uima-core.jar file and located using the ClassLoader.
      Returns:
      an object containing all of the information parsed from the descriptor.
      Throws:
      org.apache.uima.util.InvalidXMLException - if the descriptor is invalid or missing
    • getDescriptorURL

      public static URL getDescriptorURL()