Package de.julielab.jcore.utility
Class JCoReCondensedDocumentText
- java.lang.Object
-
- de.julielab.jcore.utility.JCoReCondensedDocumentText
-
public class JCoReCondensedDocumentText extends Object
This class is helpful when some parts of the CAS document text should be cut out according to a set of specific annotations. The class then represents the document text that results from cutting out those text passages. It offers a method to return the condensed text string and methods to map character offsets between the condensed string and the original CAS document text.
- Author:
- faessler
-
-
Constructor Summary
Constructors
Constructor and Description
JCoReCondensedDocumentText(org.apache.uima.jcas.JCas cas, Set<String> cutAwayTypes)
Cuts away the covered text of annotations of a type in cutAwayTypes from the CAS document text.
-
Method Summary
All Methods / Instance Methods / Concrete Methods
Modifier and Type, Method, and Description
void buildMap(org.apache.uima.jcas.JCas cas, Set<String> cutAwayTypes)
Creates a map from each position in the condensed text that immediately follows a cut-away annotation in the original text to the total length of the ranges covered by cut-away annotations up to the corresponding original offset.
org.apache.uima.jcas.JCas getCas()
String getCodensedText()
int getCondensedOffsetForOriginalOffset(int originalOffset)
Given a character offset relative to the original CAS document text, this method returns the corresponding offset in the condensed document text.
int getOriginalOffsetForCondensedOffset(int condensedOffset)
Given a character offset relative to the condensed document text, this method returns the corresponding offset in the original CAS document text.
-
-
-
Constructor Detail
-
JCoReCondensedDocumentText
public JCoReCondensedDocumentText(org.apache.uima.jcas.JCas cas, Set<String> cutAwayTypes) throws ClassNotFoundException
Cuts away the covered text of annotations of a type in cutAwayTypes from the CAS document text. If cutAwayTypes is null or empty, this class's methods will return the original CAS data.
- Parameters:
cas - The CAS for which the document text should be cut.
cutAwayTypes - The types for cutting. May be null.
- Throws:
ClassNotFoundException - If cutAwayTypes contains non-existing type names.
-
-
Method Detail
-
getCas
public org.apache.uima.jcas.JCas getCas()
-
buildMap
public void buildMap(org.apache.uima.jcas.JCas cas, Set<String> cutAwayTypes) throws ClassNotFoundException
Creates a map from each position in the condensed text that immediately follows a cut-away annotation in the original text to the total length of the ranges covered by cut-away annotations up to the corresponding original offset.
If cutAwayTypes is empty, no work will be done and the methods of this class will return the original text and offsets of the CAS.
- Parameters:
cas - The CAS to create a cut-away document text for.
cutAwayTypes - The qualified type names of the annotations whose covered text should be cut away.
- Throws:
ClassNotFoundException - If cutAwayTypes contains type identifiers of non-existing types.
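The map described for buildMap can be illustrated without UIMA. The following is a minimal sketch, not the JCoRe implementation: it assumes the cut-away annotations are already available as sorted, non-overlapping {begin, end} character ranges (end exclusive) rather than being read from a CAS. The class name CutMapSketch and its fields are hypothetical.

```java
import java.util.TreeMap;

public class CutMapSketch {
    /** The text that remains after all cut ranges were removed. */
    public final String condensedText;
    /** Condensed offset right after a cut -> total number of characters cut up to there. */
    public final TreeMap<Integer, Integer> cutSumByCondensedOffset = new TreeMap<>();

    /** Assumption: cutRanges holds sorted, non-overlapping {begin, end} pairs, end exclusive. */
    public CutMapSketch(String originalText, int[][] cutRanges) {
        StringBuilder condensed = new StringBuilder();
        int cutSum = 0;   // characters removed so far
        int copyFrom = 0; // start of the next passage to keep
        cutSumByCondensedOffset.put(0, 0);
        for (int[] range : cutRanges) {
            // Keep the passage before the cut, then skip the cut itself.
            condensed.append(originalText, copyFrom, range[0]);
            cutSum += range[1] - range[0];
            // The next kept character lands at condensed.length().
            cutSumByCondensedOffset.put(condensed.length(), cutSum);
            copyFrom = range[1];
        }
        condensed.append(originalText, copyFrom, originalText.length());
        condensedText = condensed.toString();
    }

    public static void main(String[] args) {
        CutMapSketch sketch =
                new CutMapSketch("The gene (see Fig. 1) is active.", new int[][] {{9, 22}});
        System.out.println(sketch.condensedText);           // The gene is active.
        System.out.println(sketch.cutSumByCondensedOffset); // {0=0, 9=13}
    }
}
```

In this sketch, the entry 9=13 records that the character at condensed offset 9 sits 13 characters further to the right in the original text.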
-
getOriginalOffsetForCondensedOffset
public int getOriginalOffsetForCondensedOffset(int condensedOffset)
Given a character offset relative to the condensed document text, this method returns the corresponding offset in the original CAS document text.
- Parameters:
condensedOffset - The character offset in the condensed document text string.
- Returns:
- The character offset relative to the original CAS document text associated with condensedOffset.
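One way such a lookup could work is a floor search on a sorted map. The sketch below is an assumption about the technique, not the library's code: it takes a hypothetical map from condensed offsets that directly follow a cut to the total number of characters cut up to that point, and adds that total back to the condensed offset. The names CondensedToOriginalSketch and toOriginal are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class CondensedToOriginalSketch {
    /**
     * Finds the closest map entry at or before the condensed offset and adds
     * the total cut length recorded there. Offsets before the first cut are unchanged.
     */
    public static int toOriginal(TreeMap<Integer, Integer> cutSumByCondensedOffset,
                                 int condensedOffset) {
        Map.Entry<Integer, Integer> entry = cutSumByCondensedOffset.floorEntry(condensedOffset);
        int cutSum = entry == null ? 0 : entry.getValue();
        return condensedOffset + cutSum;
    }

    public static void main(String[] args) {
        // Hypothetical data: "The gene (see Fig. 1) is active." with the range [9, 22) cut,
        // giving "The gene is active." and one map entry: condensed offset 9 -> 13 cut characters.
        TreeMap<Integer, Integer> map = new TreeMap<>();
        map.put(9, 13);
        System.out.println(toOriginal(map, 4));  // 4  (before the cut)
        System.out.println(toOriginal(map, 9));  // 22 (first character after the cut)
        System.out.println(toOriginal(map, 12)); // 25
    }
}
```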
-
getCondensedOffsetForOriginalOffset
public int getCondensedOffsetForOriginalOffset(int originalOffset)
Given a character offset relative to the original CAS document text, this method returns the corresponding offset in the condensed document text.
- Parameters:
originalOffset - The character offset in the original CAS document text string.
- Returns:
- The character offset relative to the condensed document text associated with originalOffset.
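The reverse direction can be sketched by subtracting the length of every cut that ends at or before the original offset. This is an illustrative sketch under stated assumptions, not the documented behavior: cuts are given as sorted, non-overlapping {begin, end} ranges (end exclusive), and an offset that falls inside a cut is mapped to the condensed position where the cut was made. How the actual class treats offsets inside a cut-away annotation is not stated in this documentation. The names OriginalToCondensedSketch and toCondensed are hypothetical.

```java
public class OriginalToCondensedSketch {
    /** Assumption: cutRanges holds sorted, non-overlapping {begin, end} pairs, end exclusive. */
    public static int toCondensed(int[][] cutRanges, int originalOffset) {
        int cutSum = 0; // total length of the cuts lying completely before the offset
        for (int[] range : cutRanges) {
            if (originalOffset < range[0]) {
                break; // this cut and all later ones start after the offset
            }
            if (originalOffset < range[1]) {
                // Assumption: an offset inside a cut collapses to the cut position.
                return range[0] - cutSum;
            }
            cutSum += range[1] - range[0];
        }
        return originalOffset - cutSum;
    }

    public static void main(String[] args) {
        // Hypothetical data: "The gene (see Fig. 1) is active." with the range [9, 22) cut.
        int[][] cuts = {{9, 22}};
        System.out.println(toCondensed(cuts, 4));  // 4  (before the cut)
        System.out.println(toCondensed(cuts, 15)); // 9  (inside the cut)
        System.out.println(toCondensed(cuts, 22)); // 9  (first character after the cut)
        System.out.println(toCondensed(cuts, 25)); // 12
    }
}
```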
-
getCodensedText
public String getCodensedText()
-
-