public class RegExAnnotator
extends org.apache.uima.analysis_component.CasAnnotator_ImplBase
There are two ways to specify the regular expressions: via configuration parameters or via an external resource file.
This annotator takes the following optional configuration parameters:
- Patterns - an array of Strings indicating the regular expressions to match. The pattern language is described at http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html.
- TypeNames - an array of Strings indicating the names of the Types to be created from the patterns.
- ContainingAnnotationTypes - an array of input annotation types. This annotator will only produce new annotations that are contained within existing annotations of these types. (This is optional.)
- AnnotateEntireContainedAnnotation - When the ContainingAnnotationTypes parameter is specified, a value of true for this parameter causes the entire containing annotation to be used as the span of the new annotation, rather than just the span of the regular expression match. This can be used to "classify" previously created annotations according to whether or not they contain text matching a regular expression.
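The containment behaviour described above can be illustrated with a minimal plain-Java sketch that does not use the UIMA API. It is not RegExAnnotator's implementation: the `Span` record is a hypothetical stand-in for existing annotations of a containing type, and restricting the search with `java.util.regex.Matcher.region` is an assumed technique chosen for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContainmentSketch {
    // Hypothetical stand-in for an existing (containing) annotation: a [begin, end) span.
    record Span(int begin, int end) {}

    // Searches for the pattern only inside each containing span.
    // annotateEntire == false: one result per regex match (the match span).
    // annotateEntire == true: the whole containing span, once, if it holds at least one match.
    static List<Span> annotate(String text, Pattern pattern,
                               List<Span> containing, boolean annotateEntire) {
        List<Span> result = new ArrayList<>();
        for (Span c : containing) {
            Matcher m = pattern.matcher(text);
            m.region(c.begin(), c.end()); // restrict matching to the containing span
            if (annotateEntire) {
                if (m.find()) {
                    result.add(c); // "classify" the containing annotation
                }
            } else {
                while (m.find()) {
                    result.add(new Span(m.start(), m.end()));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Call 555-1234 today. No number here.";
        // Two hypothetical sentence annotations acting as the containing type.
        List<Span> sentences = List.of(new Span(0, 20), new Span(21, 36));
        Pattern phone = Pattern.compile("\\d{3}-\\d{4}");
        System.out.println(annotate(text, phone, sentences, false));
        System.out.println(annotate(text, phone, sentences, true));
    }
}
```

With this sample text, the first call reports only the span of "555-1234" inside the first sentence, while the second call reports the entire first sentence and skips the second one, mirroring the "classify" use case.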
The indices of the Patterns and TypeNames arrays correspond, so
that a substring that matches Patterns[i] will result in an annotation of type
TypeNames[i].
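To make the index correspondence concrete, here is a minimal plain-Java sketch (not the annotator's actual code) that pairs Patterns[i] with TypeNames[i] using `java.util.regex`. The `Match` record, the `example.*` type names, and the length check are illustrative assumptions; this page does not say how RegExAnnotator handles arrays of different lengths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTypeSketch {
    // Hypothetical result: the type name chosen for a match plus the matched span.
    record Match(String typeName, int begin, int end) {}

    // patterns[i] produces results of type typeNames[i]; the arrays are treated as parallel.
    static List<Match> findAll(String text, String[] patterns, String[] typeNames) {
        // Assumption: reject mismatched arrays rather than guessing a pairing.
        if (patterns.length != typeNames.length) {
            throw new IllegalArgumentException("Patterns and TypeNames must have the same length");
        }
        List<Match> result = new ArrayList<>();
        for (int i = 0; i < patterns.length; i++) {
            Matcher m = Pattern.compile(patterns[i]).matcher(text);
            while (m.find()) {
                result.add(new Match(typeNames[i], m.start(), m.end()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] patterns = {"\\b[A-Z][a-z]+\\b", "\\d+"};
        String[] typeNames = {"example.CapitalizedWord", "example.Number"};
        for (Match m : findAll("Room 42 is open", patterns, typeNames)) {
            System.out.println(m);
        }
    }
}
```

Here "Room" is reported with the first type name and "42" with the second, because each match inherits the type name at the same index as the pattern that produced it.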
It is also possible to provide an external resource file that declares the annotation type names and the regular expressions to match. The annotator will look for this file under the resource key "PatternFile". The file format is as follows:
Patterns configuration parameter.

| Modifier and Type | Field and Description |
|---|---|
| static String | MESSAGE_DIGEST |

| Constructor and Description |
|---|
| RegExAnnotator() |

| Modifier and Type | Method and Description |
|---|---|
| protected int[] | getRangesToAnnotate(org.apache.uima.cas.CAS aCAS): Utility method that determines which subranges of the document text should be annotated by this annotator. |
| void | initialize(org.apache.uima.UimaContext aContext): Performs any startup tasks required by this annotator. |
| void | process(org.apache.uima.cas.CAS aCAS): Invokes this annotator's analysis logic. |
| void | typeSystemInit(org.apache.uima.cas.TypeSystem aTypeSystem): Acquires references to CAS Type and Feature objects that are later used during the process(CAS) method. |
Methods inherited from superclasses include: getRequiredCasInterface, process, getCasInstancesRequired, hasNext, next

public static final String MESSAGE_DIGEST

public void initialize(org.apache.uima.UimaContext aContext)
throws org.apache.uima.resource.ResourceInitializationException
Specified by: initialize in interface org.apache.uima.analysis_component.AnalysisComponent
Overrides: initialize in class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
Throws: org.apache.uima.resource.ResourceInitializationException
See Also: AnalysisComponent_ImplBase.initialize(UimaContext)

public void typeSystemInit(org.apache.uima.cas.TypeSystem aTypeSystem)
throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Acquires references to CAS Type and Feature objects that are later used during the process(CAS) method.
Overrides: typeSystemInit in class org.apache.uima.analysis_component.CasAnnotator_ImplBase
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException
See Also: CasAnnotator_ImplBase.typeSystemInit(TypeSystem)

public void process(org.apache.uima.cas.CAS aCAS)
throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by: process in class org.apache.uima.analysis_component.CasAnnotator_ImplBase
Parameters: aCAS - the CAS to process
Throws: org.apache.uima.analysis_engine.AnalysisEngineProcessException - if a failure occurs during processing.
See Also: CasAnnotator_ImplBase.process(CAS)

protected int[] getRangesToAnnotate(org.apache.uima.cas.CAS aCAS)
Utility method that determines which subranges of the document text should be annotated by this annotator. If mContainingAnnotationTypes is null, the entire document is eligible for annotation. If mContainingAnnotationTypes is not null, then each of its elements is expected to be an Annotation Type name. The CAS is queried for existing annotations of any of these Types, and the only subranges of the document eligible for annotation are those contained within such annotations.
Parameters: aCAS - CAS currently being processed

Copyright © 2006–2018 The Apache Software Foundation. All rights reserved.