Class MUC7Reader

  • All Implemented Interfaces:
    org.apache.uima.collection.base_cpm.BaseCollectionReader, org.apache.uima.collection.CollectionReader, org.apache.uima.resource.ConfigurableResource, org.apache.uima.resource.Resource

    public class MUC7Reader
    extends org.apache.uima.collection.CollectionReader_ImplBase
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static String[] ELEMENT_TEXT_TO_BE_PROCESSED
      Names of the XML elements whose text is to be processed.
      static String PARAM_INPUTDIR
      Name of configuration parameter that must be set to the path of a directory containing input files.
      • Fields inherited from interface org.apache.uima.resource.Resource

        PARAM_AGGREGATE_SOFA_MAPPINGS, PARAM_CONFIG_MANAGER, PARAM_CONFIG_PARAM_SETTINGS, PARAM_EXTERNAL_OVERRIDE_SETTINGS, PARAM_PERFORMANCE_TUNING_SETTINGS, PARAM_RESOURCE_MANAGER, PARAM_UIMA_CONTEXT
    • Constructor Summary

      Constructors 
      Constructor Description
      MUC7Reader()  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void close()  
      int[] getBeginEndOfSequence(String sequenceString, String inputString, int startOfSequence)
      Given a sequence, a string in which the sequence occurs, and a starting point, this method retrieves the begin and end positions of the sequence.
      int[] getBeginEndOfToken(String tokenString, String inputString, int startOfToken)
      Given a token, a string in which the token occurs, and a starting point, this method retrieves the begin and end positions of the token.
      void getNext(org.apache.uima.cas.CAS cas)  
      org.apache.uima.util.Progress[] getProgress()  
      boolean hasNext()  
      void initialize()  
      String normalizeString(String stringToBeNormalized)
      Normalizes a string by replacing newlines with whitespace, by removing sequences of more than one whitespace character, and by removing newlines at the beginning of a line; also removes sequences such as "A;N;D;R;LR;".
      • Methods inherited from class org.apache.uima.collection.CollectionReader_ImplBase

        destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit
      • Methods inherited from class org.apache.uima.resource.ConfigurableResource_ImplBase

        getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
      • Methods inherited from class org.apache.uima.resource.Resource_ImplBase

        getCasManager, getLogger, getMetaData, getRelativePathResolver, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger, setMetaData
      • Methods inherited from interface org.apache.uima.resource.ConfigurableResource

        getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
      • Methods inherited from interface org.apache.uima.resource.Resource

        getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger
    • Field Detail

      • ELEMENT_TEXT_TO_BE_PROCESSED

        public static final String[] ELEMENT_TEXT_TO_BE_PROCESSED
        Names of the XML elements whose text is to be processed.
      • PARAM_INPUTDIR

        public static final String PARAM_INPUTDIR
        Name of configuration parameter that must be set to the path of a directory containing input files.
        See Also:
        Constant Field Values
    • Constructor Detail

      • MUC7Reader

        public MUC7Reader()
    • Method Detail

      • getBeginEndOfToken

        public int[] getBeginEndOfToken(String tokenString,
                                        String inputString,
                                        int startOfToken)
        Given a token, a string in which the token occurs, and a starting point, this method retrieves the begin and end positions of the token.
        Parameters:
        tokenString - the token to search for
        inputString - the string in which to search for the token
        startOfToken - the position in inputString at which the search for the token begins
        Returns:
        the begin and end positions of the token as an int array {begin, end}
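        The contract described above can be illustrated with a minimal, self-contained sketch. It is not the MUC7Reader source: the class name TokenOffsetSketch, the helper beginEndOf, the exclusive end offset, and the {-1, -1} result for a missing token are all assumptions made for illustration.

```java
// Hypothetical stand-in for the lookup described above; not the actual MUC7Reader code.
class TokenOffsetSketch {

    // Searches for token in input, starting at index start.
    // Assumption: end is exclusive (begin + token length).
    // Assumption: {-1, -1} signals that the token was not found.
    static int[] beginEndOf(String token, String input, int start) {
        int begin = input.indexOf(token, start);
        if (begin < 0) {
            return new int[] {-1, -1};
        }
        return new int[] {begin, begin + token.length()};
    }

    public static void main(String[] args) {
        String text = "Mr. Smith met Mr. Jones.";
        // Passing the previous end as the next start finds repeated tokens in order.
        int[] first = beginEndOf("Mr.", text, 0);
        int[] second = beginEndOf("Mr.", text, first[1]);
        System.out.println(first[0] + "-" + first[1]);
        System.out.println(second[0] + "-" + second[1]);
    }
}
```

        Feeding the previous end position back in as the next starting point is what lets a caller locate each occurrence of a repeated token in document order.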
      • getBeginEndOfSequence

        public int[] getBeginEndOfSequence(String sequenceString,
                                           String inputString,
                                           int startOfSequence)
        Given a sequence, a string in which the sequence occurs, and a starting point, this method retrieves the begin and end positions of the sequence.
        Parameters:
        sequenceString - the sequence to search for
        inputString - the string in which to search for the sequence
        startOfSequence - the position in inputString at which the search for the sequence begins
        Returns:
        the begin and end positions of the sequence as an int array {begin, end}
      • normalizeString

        public String normalizeString(String stringToBeNormalized)
        Normalizes a string by replacing newlines with whitespace, by removing sequences of more than one whitespace character, and by removing newlines at the beginning of a line; also removes sequences such as "A;N;D;R;LR;".
        Parameters:
        stringToBeNormalized - the string to be normalized
        Returns:
        the normalized string
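        The whitespace part of this normalization might look roughly like the sketch below. The class NormalizeSketch and the method normalizeWhitespace are hypothetical names, the regular expressions are assumptions, runs of whitespace are assumed to collapse to a single space, and the removal of sequences such as "A;N;D;R;LR;" is deliberately left out because its exact rules are not documented here.

```java
// Hypothetical approximation of the whitespace handling described above;
// the real normalizeString may differ in detail.
class NormalizeSketch {

    static String normalizeWhitespace(String raw) {
        // 1. Replace each run of line breaks with a single space.
        String noNewlines = raw.replaceAll("[\\r\\n]+", " ");
        // 2. Collapse runs of two or more whitespace characters (assumption: to one space).
        String collapsed = noNewlines.replaceAll("\\s{2,}", " ");
        // 3. Drop whitespace left at the very beginning by a leading newline.
        return collapsed.replaceFirst("^\\s+", "");
    }

    public static void main(String[] args) {
        String raw = "\nJohn  Smith\nvisited\n\n  Paris.";
        System.out.println(normalizeWhitespace(raw));
    }
}
```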
      • initialize

        public void initialize()
                        throws org.apache.uima.resource.ResourceInitializationException
        Overrides:
        initialize in class org.apache.uima.collection.CollectionReader_ImplBase
        Throws:
        org.apache.uima.resource.ResourceInitializationException
      • getNext

        public void getNext(org.apache.uima.cas.CAS cas)
                     throws IOException,
                            org.apache.uima.collection.CollectionException
        Throws:
        IOException
        org.apache.uima.collection.CollectionException
      • getProgress

        public org.apache.uima.util.Progress[] getProgress()
      • hasNext

        public boolean hasNext()
                        throws IOException,
                               org.apache.uima.collection.CollectionException
        Throws:
        IOException
        org.apache.uima.collection.CollectionException
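
        For orientation, the hasNext/getNext protocol of a directory-backed reader can be sketched without any UIMA dependency. Everything below is an assumption made for illustration: DirectoryReaderSketch is a hypothetical class, it returns the file content as a String where the real getNext fills a CAS, and the sorted file order is a choice of this sketch rather than documented behavior of MUC7Reader.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical, UIMA-free illustration of a reader that walks an input directory.
class DirectoryReaderSketch {
    private final List<Path> files = new ArrayList<>();
    private int current = 0;

    // Plays the role of initialize(): collect the regular files of the input directory.
    DirectoryReaderSketch(Path inputDir) throws IOException {
        try (Stream<Path> entries = Files.list(inputDir)) {
            entries.filter(Files::isRegularFile).forEach(files::add);
        }
        Collections.sort(files); // assumption: deterministic, name-sorted order
    }

    // True while unread files remain.
    boolean hasNext() {
        return current < files.size();
    }

    // Returns the next file's text; a real collection reader would populate a CAS instead.
    String getNext() throws IOException {
        Path file = files.get(current++);
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Counterparts of the values a getProgress() implementation could report.
    int completed() {
        return current;
    }

    int total() {
        return files.size();
    }
}
```

        A caller would loop with while (reader.hasNext()) and call getNext() once per iteration, which mirrors how a framework drives a collection reader.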