Class JCoReCondensedDocumentText


  • public class JCoReCondensedDocumentText
    extends Object
    This class is helpful when some parts of the CAS document text should be cut out according to a set of specific annotations. The class then represents the document text that results from cutting out said text passages. It offers a method to return the condensed text string and methods to map character offsets between the condensed string and the original CAS document text.
    Author:
    faessler
    • Constructor Detail

      • JCoReCondensedDocumentText

        public JCoReCondensedDocumentText(org.apache.uima.jcas.JCas cas,
                                          Set<String> cutAwayTypes)
                                   throws ClassNotFoundException

        Cuts away the covered text of all annotations whose type is listed in cutAwayTypes from the CAS document text. If cutAwayTypes is null or empty, the methods of this class return the original CAS data.

        Parameters:
        cas - The CAS for which the document text should be cut.
        cutAwayTypes - The qualified type names of the annotations whose covered text should be cut away. May be null.
        Throws:
        ClassNotFoundException - If cutAwayTypes contains non-existing type names.
    • Method Detail

      • getCas

        public org.apache.uima.jcas.JCas getCas()
      • buildMap

        public void buildMap(org.apache.uima.jcas.JCas cas,
                             Set<String> cutAwayTypes)
                      throws ClassNotFoundException

        Creates a map from those positions in the condensed text that directly follow a cut-away annotation in the original text to the total length of all text cut away up to the corresponding original offset.

        If cutAwayTypes is empty, no work is done and the methods of this class return the original text and offsets of the CAS.

        Parameters:
        cas - The CAS to create a cut-away document text for.
        cutAwayTypes - The qualified type names of the annotations whose covered text should be cut away.
        Throws:
        ClassNotFoundException - If cutAwayTypes contains type identifiers to non-existing types.
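        The mapping described for buildMap can be illustrated with a minimal, standalone Java sketch. It is not the JCoRe implementation and does not use UIMA: the hypothetical record Span stands in for the [begin, end) offsets of a cut-away annotation, and the names CondensedMapSketch, merge, buildMap and condense are illustrative only. Merging overlapping or touching spans is an assumption about how overlapping annotations might be handled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class CondensedMapSketch {

    /** Hypothetical stand-in for a cut-away annotation: [begin, end) in the original text. */
    record Span(int begin, int end) {}

    /** Sorts the spans and merges overlapping or touching ones so no character is counted twice. */
    static List<Span> merge(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt(Span::begin));
        List<Span> merged = new ArrayList<>();
        for (Span s : sorted) {
            if (!merged.isEmpty() && s.begin() <= merged.get(merged.size() - 1).end()) {
                Span last = merged.remove(merged.size() - 1);
                merged.add(new Span(last.begin(), Math.max(last.end(), s.end())));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }

    /**
     * Maps each condensed-text position that directly follows a cut to the total
     * number of characters cut away up to that point in the original text.
     */
    static TreeMap<Integer, Integer> buildMap(List<Span> spans) {
        TreeMap<Integer, Integer> map = new TreeMap<>();
        int cutSoFar = 0;
        for (Span s : merge(spans)) {
            cutSoFar += s.end() - s.begin();
            // Original position s.end() lands at s.end() - cutSoFar in the condensed text.
            map.put(s.end() - cutSoFar, cutSoFar);
        }
        return map;
    }

    /** Builds the condensed text by copying everything outside the cut spans. */
    static String condense(String text, List<Span> spans) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (Span s : merge(spans)) {
            sb.append(text, pos, s.begin());
            pos = s.end();
        }
        sb.append(text, pos, text.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "abc[ref]def(x)gh";
        List<Span> cuts = List.of(new Span(3, 8), new Span(11, 14));
        System.out.println(condense(text, cuts)); // abcdefgh
        System.out.println(buildMap(cuts));       // {3=5, 6=8}
    }
}
```

        Only positions right after a cut are stored, so the map stays as small as the number of cut-away regions; every other offset is resolved with a floor lookup.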
      • getOriginalOffsetForCondensedOffset

        public int getOriginalOffsetForCondensedOffset(int condensedOffset)
        Given a character offset relative to the condensed document text, this method returns the corresponding offset in the original CAS document text.
        Parameters:
        condensedOffset - The character offset in the condensed document text string.
        Returns:
        The character offset relative to the original CAS document text associated with condensedOffset.
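        Given a map of the kind buildMap describes, the condensed-to-original lookup can be sketched with TreeMap.floorEntry: find the last cut at or before the condensed offset and add back the characters removed up to it. This is a hypothetical standalone illustration (CondensedToOriginalSketch and toOriginal are not JCoRe names), using the example text "abc[ref]def(x)gh" with "[ref]" and "(x)" cut away. Note that under this scheme an exclusive end offset that coincides with a cut maps to the position after the cut; whether the real class treats end offsets differently is not stated here.

```java
import java.util.Map;
import java.util.TreeMap;

public class CondensedToOriginalSketch {

    /**
     * Looks up the closest cut at or before the condensed offset and adds back the
     * number of characters removed up to that cut. Offsets before the first cut are unchanged.
     */
    static int toOriginal(TreeMap<Integer, Integer> cutMap, int condensedOffset) {
        Map.Entry<Integer, Integer> e = cutMap.floorEntry(condensedOffset);
        return e == null ? condensedOffset : condensedOffset + e.getValue();
    }

    public static void main(String[] args) {
        // Map for "abc[ref]def(x)gh" with "[ref]" and "(x)" cut away; condensed text "abcdefgh".
        TreeMap<Integer, Integer> cutMap = new TreeMap<>(Map.of(3, 5, 6, 8));
        System.out.println(toOriginal(cutMap, 2)); // 2  ('c')
        System.out.println(toOriginal(cutMap, 3)); // 8  ('d')
        System.out.println(toOriginal(cutMap, 7)); // 15 ('h')
    }
}
```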
      • getCondensedOffsetForOriginalOffset

        public int getCondensedOffsetForOriginalOffset(int originalOffset)
        Given a character offset relative to the original CAS document text, this method returns the corresponding offset in the condensed document text.
        Parameters:
        originalOffset - The character offset in the original CAS document text string.
        Returns:
        The character offset relative to the condensed document text associated with originalOffset.
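        The reverse direction can be sketched by subtracting the length of every cut that ends at or before the original offset. This hypothetical standalone example (OriginalToCondensedSketch, Span and toCondensed are illustrative names, not JCoRe API) assumes the cuts are sorted and non-overlapping. How an original offset that falls inside a cut-away span should be mapped is not specified above; here it is assumed to clamp to the condensed position where that span was removed.

```java
import java.util.List;

public class OriginalToCondensedSketch {

    /** Hypothetical cut span [begin, end) in the original text; spans are sorted and non-overlapping. */
    record Span(int begin, int end) {}

    /**
     * Subtracts the length of every cut that ends at or before the original offset.
     * An offset inside a cut is clamped to the condensed position where that cut was removed.
     */
    static int toCondensed(List<Span> sortedCuts, int originalOffset) {
        int removed = 0;
        for (Span s : sortedCuts) {
            if (originalOffset >= s.end()) {
                removed += s.end() - s.begin();
            } else if (originalOffset > s.begin()) {
                return s.begin() - removed; // inside a cut-away span
            } else {
                break; // this and all later cuts start after the offset
            }
        }
        return originalOffset - removed;
    }

    public static void main(String[] args) {
        // "abc[ref]def(x)gh" with "[ref]" and "(x)" cut away -> "abcdefgh"
        List<Span> cuts = List.of(new Span(3, 8), new Span(11, 14));
        System.out.println(toCondensed(cuts, 8));  // 3 ('d')
        System.out.println(toCondensed(cuts, 5));  // 3 (inside "[ref]")
        System.out.println(toCondensed(cuts, 15)); // 7 ('h')
    }
}
```

        The linear scan keeps the sketch short; a TreeMap keyed by cut begin offsets would give logarithmic lookups for documents with many cuts.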
      • getCodensedText

        public String getCodensedText()