Class JCoReTools


  • public class JCoReTools
    extends Object
    • The binarySearch methods work specifically on lists of Annotation objects that are sorted by the value of a given function.
    • The addToFSArray methods are useful for adding elements to FSArrays, which are rather awkward to use and, especially, to extend.
    • The addToStringArray methods serve a similar purpose.
    • One of the most used methods from this list is getDocId(JCas), which looks for an annotation of type de.julielab.jcore.types.Header and returns its docId feature value.
    • The deserializeXmi(CAS, InputStream, int) method is used in UIMA 2.x to fix issues with special Unicode characters. For more information, refer to the JavaDoc of the method.
    Author:
    faessler
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int DEFAULT_ADDITION_SIZE
      Number of elements to be added if an FSArray needs to be resized, effectively creating a new, larger FSArray.
    • Constructor Summary

      Constructors 
      Constructor Description
      JCoReTools()  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static org.apache.uima.jcas.cas.FSArray addToFSArray​(org.apache.uima.jcas.cas.FSArray inputArray, Collection<? extends org.apache.uima.cas.FeatureStructure> newElements)
      Returns an FSArray that contains all elements of the given array and newElements.
      static org.apache.uima.jcas.cas.FSArray addToFSArray​(org.apache.uima.jcas.cas.FSArray array, org.apache.uima.cas.FeatureStructure newElement)
      Returns an FSArray that contains all elements of the given array and newElement.
      static org.apache.uima.jcas.cas.FSArray addToFSArray​(org.apache.uima.jcas.cas.FSArray array, org.apache.uima.cas.FeatureStructure newElement, int additionSize)
      Returns an FSArray that contains all elements of the given array and newElement.
      static org.apache.uima.jcas.cas.StringArray addToStringArray​(org.apache.uima.jcas.cas.StringArray array, String element)
      Creates a new string array, copies the values of array into it and adds element.
      static org.apache.uima.jcas.cas.StringArray addToStringArray​(org.apache.uima.jcas.cas.StringArray array, String[] elements)
      Creates a new string array, copies the values of array into it and adds elements.
      static <T extends org.apache.uima.jcas.tcas.Annotation,​R extends Comparable<R>>
      int
      binarySearch​(List<T> annotations, Function<T,​R> comparisonValueFunction, R searchValue)  
      static <T extends org.apache.uima.jcas.tcas.Annotation,​R extends Comparable<R>>
      int
      binarySearch​(List<T> annotations, Function<T,​R> comparisonValueFunction, R searchValue, int from, int to)  
      static org.apache.uima.jcas.cas.FSArray copyFSArray​(org.apache.uima.jcas.cas.FSArray array)
      Returns a new FSArray with the exact size and contents of array.
      static void deserializeXmi​(org.apache.uima.cas.CAS cas, InputStream is, int attributeBufferSize)
      Deserializes a UTF-8 encoded XMI input stream into the given CAS.
      static String getDocId​(org.apache.uima.jcas.JCas aJCas)
      Returns the document ID of the document in the JCas.
      static org.apache.uima.jcas.cas.StringArray newStringArray​(org.apache.uima.jcas.JCas jCas, String... elements)
      Creates a new StringArray from the given string elements.
      static void printAnnotationIndex​(org.apache.uima.jcas.JCas jCas, int type)  
      static void printFSArray​(org.apache.uima.jcas.cas.FSArray array)
      Prints the content of the FSArray to System.out.
      static InputStream resolveExternalResourceGzipInputStream​(org.apache.uima.resource.DataResource resource)
      Helper method to transparently handle GZIPPed external resource files.
    • Field Detail

      • DEFAULT_ADDITION_SIZE

        public static final int DEFAULT_ADDITION_SIZE
        Number of elements to be added if an FSArray needs to be resized, effectively creating a new, larger FSArray.
        See Also:
        Constant Field Values
    • Constructor Detail

      • JCoReTools

        public JCoReTools()
    • Method Detail

      • addToFSArray

        public static org.apache.uima.jcas.cas.FSArray addToFSArray​(org.apache.uima.jcas.cas.FSArray array,
                                                                    org.apache.uima.cas.FeatureStructure newElement)

        Returns an FSArray that contains all elements of the given array and newElement.

        If array has trailing null entries, the new element is set in place at the first position of array that is null and followed only by null entries until the end of the array. If array is full, i.e. there are no trailing null entries, a new FSArray of size array.size()+DEFAULT_ADDITION_SIZE is created. All elements of array are copied into the new FSArray and newElement is added after the last element of array. Depending on DEFAULT_ADDITION_SIZE, there might be trailing null entries left in the new FSArray; they can be used to set further elements without the need to create a new FSArray.

        In any case, it should be assumed that the return value is a new FSArray. Do not rely on the possible in-place change of the passed array; instead, replace the variable holding array with the return value of this method.

        Parameters:
        array - The array to which the feature structure should be added
        newElement - The feature structure that should be added to the array
        Returns:
        An FSArray containing all entries from array plus newElement. The returned FSArray will be array if there was enough space to add newElement or a new FSArray otherwise.
      • addToFSArray

        public static org.apache.uima.jcas.cas.FSArray addToFSArray​(org.apache.uima.jcas.cas.FSArray array,
                                                                    org.apache.uima.cas.FeatureStructure newElement,
                                                                    int additionSize)

        Returns an FSArray that contains all elements of the given array and newElement.

        If array has trailing null entries, the new element is set in place at the first position of array that is null and followed only by null entries until the end of the array. If array is full, i.e. there are no trailing null entries, a new FSArray of size array.size()+additionSize is created. All elements of array are copied into the new FSArray and newElement is added after the last element of array. Depending on additionSize, there might be trailing null entries left in the new FSArray; they can be used to set further elements without the need to create a new FSArray.

        In any case, it should be assumed that the return value is a new FSArray. Do not rely on the possible in-place change of the passed array; instead, replace the variable holding array with the return value of this method.

        Parameters:
        array - The array to which the feature structure should be added
        newElement - The feature structure that should be added to the array
        additionSize - The number of entries by which the array is enlarged if a new FSArray has to be created
        Returns:
        An FSArray containing all entries from array plus newElement. The returned FSArray will be array if there was enough space to add newElement or a new FSArray otherwise.
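        The trailing-null behaviour described above can be sketched without UIMA on the classpath. The following is only an illustration under stated assumptions: it uses a plain Object[] in place of an FSArray, the class and method names are hypothetical, and it is not the JCoReTools implementation.

```java
import java.util.Arrays;

class TrailingNullAppendSketch {

    /** Hypothetical stand-in for addToFSArray, working on Object[] instead of FSArray. */
    static Object[] add(Object[] array, Object newElement, int additionSize) {
        if (additionSize < 1) {
            // Assumption of this sketch: growing by less than one slot makes no sense.
            throw new IllegalArgumentException("additionSize must be at least 1");
        }
        // Find the first position that is null and only followed by nulls.
        int firstFree = array.length;
        while (firstFree > 0 && array[firstFree - 1] == null) {
            firstFree--;
        }
        if (firstFree < array.length) {
            // There is a trailing null slot: modify in place, return the same array.
            array[firstFree] = newElement;
            return array;
        }
        // The array is full: create a larger copy (copyOf pads with nulls).
        Object[] larger = Arrays.copyOf(array, array.length + additionSize);
        larger[array.length] = newElement;
        return larger;
    }

    public static void main(String[] args) {
        Object[] array = {"a", "b"};
        // Always reassign: the result may or may not be the array passed in.
        array = add(array, "c", 3); // full, so a new array of length 5 is returned
        array = add(array, "d", 3); // a trailing null slot is reused in place
        System.out.println(Arrays.toString(array)); // [a, b, c, d, null]
    }
}
```

        Reassigning the variable on every call, as in main, is the usage pattern the method description recommends, since the caller cannot know which of the two branches was taken.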
      • addToFSArray

        public static org.apache.uima.jcas.cas.FSArray addToFSArray​(org.apache.uima.jcas.cas.FSArray inputArray,
                                                                    Collection<? extends org.apache.uima.cas.FeatureStructure> newElements)

        Returns an FSArray that contains all elements of the given array and newElements.

        If inputArray has enough trailing null entries, the new elements are set in place, starting at the first position of inputArray that is null and followed only by null entries until the end of the array. If inputArray is too small, i.e. there are not enough trailing null entries, a new FSArray of size inputArray.size()+newElements.size() is created. All elements of inputArray are copied into the new FSArray and newElements are added after the last element of inputArray.

        In any case, it should be assumed that the return value is a new FSArray. Do not rely on the possible in-place change of the passed array; instead, replace the variable holding inputArray with the return value of this method.

        Parameters:
        inputArray - The array to which the feature structures should be added, may be null.
        newElements - The feature structures that should be added to the array
        Returns:
        An FSArray containing all entries from inputArray plus newElements. The returned FSArray will be inputArray if there was enough space to add newElements or a new FSArray otherwise.
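        A rough sketch of the collection variant, again on a plain Object[] rather than an FSArray. The names are hypothetical, and treating a null input array as an empty array is an assumption of this sketch; the description above only states that inputArray may be null.

```java
import java.util.Arrays;
import java.util.Collection;

class TrailingNullAppendAllSketch {

    /** Hypothetical stand-in for addToFSArray(FSArray, Collection), using Object[]. */
    static Object[] addAll(Object[] inputArray, Collection<?> newElements) {
        // Assumption: a null input array behaves like an empty one.
        Object[] source = inputArray == null ? new Object[0] : inputArray;

        // First position that is null and only followed by nulls.
        int firstFree = source.length;
        while (firstFree > 0 && source[firstFree - 1] == null) {
            firstFree--;
        }

        Object[] target = source;
        if (source.length - firstFree < newElements.size()) {
            // Not enough trailing nulls: new array of size old size + number of new elements.
            target = Arrays.copyOf(source, source.length + newElements.size());
        }

        // Write the new elements after the last non-null element.
        int i = firstFree;
        for (Object element : newElements) {
            target[i++] = element;
        }
        return target;
    }

    public static void main(String[] args) {
        Object[] array = {"a", null, null};
        array = addAll(array, Arrays.asList("b", "c")); // fits into the two trailing nulls
        array = addAll(array, Arrays.asList("d", "e")); // no room left, length becomes 3 + 2
        System.out.println(Arrays.toString(array));     // [a, b, c, d, e]
    }
}
```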
      • copyFSArray

        public static org.apache.uima.jcas.cas.FSArray copyFSArray​(org.apache.uima.jcas.cas.FSArray array)
        Returns a new FSArray with the exact size and contents of array. This is a shallow copy: the array entries are copied by reference.
        Parameters:
        array - The FSArray to copy.
        Returns:
        A new FSArray with the size and contents of array.
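        What "shallow copy" means in practice can be illustrated with ordinary Java arrays. This sketch uses a hypothetical StringBuilder[] in place of an FSArray and is not taken from JCoReTools: the copy is a distinct array, but both arrays point to the same element objects.

```java
import java.util.Arrays;

class ShallowCopySketch {

    /** Hypothetical stand-in for copyFSArray: new array, same element references. */
    static StringBuilder[] shallowCopy(StringBuilder[] array) {
        return Arrays.copyOf(array, array.length);
    }

    public static void main(String[] args) {
        StringBuilder[] original = {new StringBuilder("token")};
        StringBuilder[] copy = shallowCopy(original);

        copy[0].append("-modified");          // changes the shared element object
        System.out.println(original[0]);      // token-modified

        copy[0] = new StringBuilder("other"); // replaces a slot in the copy only
        System.out.println(original[0]);      // token-modified
    }
}
```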
      • newStringArray

        public static org.apache.uima.jcas.cas.StringArray newStringArray​(org.apache.uima.jcas.JCas jCas,
                                                                          String... elements)
        Creates a new StringArray from the given string elements.
        Parameters:
        jCas - The jCas to associate the new StringArray with.
        elements - The strings to put into the StringArray.
        Returns:
        The new, filled StringArray.
      • addToStringArray

        public static org.apache.uima.jcas.cas.StringArray addToStringArray​(org.apache.uima.jcas.cas.StringArray array,
                                                                            String element)

        Creates a new string array, copies the values of array into it and adds element.

        This method does not handle null values as addToFSArray(FSArray, FeatureStructure, int) does. To add multiple elements at once, avoiding excessive copying, refer to addToStringArray(StringArray, String[]).

        Parameters:
        array - The source array to extend.
        element - The element to add.
        Returns:
        A new StringArray with the same contents as array extended by element.
      • addToStringArray

        public static org.apache.uima.jcas.cas.StringArray addToStringArray​(org.apache.uima.jcas.cas.StringArray array,
                                                                            String[] elements)

        Creates a new string array, copies the values of array into it and adds elements.

        Parameters:
        array - The array to extend.
        elements - The elements to add into a new array.
        Returns:
        A new StringArray containing all values of array plus elements.
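        The difference between the two addToStringArray variants can be sketched with plain String[] arrays. The names below are hypothetical and the code is not the JCoReTools implementation; it only shows why adding several elements in one call avoids the repeated copying that element-wise addition causes.

```java
import java.util.Arrays;

class StringArrayAppendSketch {

    /** Copies the whole array for a single new element. */
    static String[] add(String[] array, String element) {
        String[] result = Arrays.copyOf(array, array.length + 1);
        result[array.length] = element;
        return result;
    }

    /** Copies the array once, no matter how many elements are added. */
    static String[] addAll(String[] array, String[] elements) {
        String[] result = Arrays.copyOf(array, array.length + elements.length);
        System.arraycopy(elements, 0, result, array.length, elements.length);
        return result;
    }

    public static void main(String[] args) {
        String[] base = {"a"};
        // Two full copies of the growing array:
        String[] oneByOne = add(add(base, "b"), "c");
        // A single copy:
        String[] atOnce = addAll(base, new String[] {"b", "c"});
        System.out.println(Arrays.equals(oneByOne, atOnce)); // true
    }
}
```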
      • printFSArray

        public static void printFSArray​(org.apache.uima.jcas.cas.FSArray array)
        Prints the content of the FSArray to System.out.
        Parameters:
        array - The array to be printed
      • printAnnotationIndex

        public static void printAnnotationIndex​(org.apache.uima.jcas.JCas jCas,
                                                int type)
      • getDocId

        public static String getDocId​(org.apache.uima.jcas.JCas aJCas)

        Returns the document ID of the document in the JCas.

        This can only be done when an annotation of type de.julielab.jcore.types.Header (or a subtype) is present and its feature docId is set.

        Parameters:
        aJCas - The JCas to read the document ID from.
        Returns:
        The value of Header.getDocId()
      • deserializeXmi

        public static void deserializeXmi​(org.apache.uima.cas.CAS cas,
                                          InputStream is,
                                          int attributeBufferSize)
                                   throws SAXException,
                                          IOException

        Deserializes a UTF-8 encoded XMI input stream into the given CAS.

        This method has largely been taken directly from XmiCasDeserializer.deserialize(InputStream, CAS). However, the given input stream is explicitly transformed into a UTF-8 encoded InputSource for the XML parsing process. This is necessary because the Xerces internal UTF-8 handling is faulty with Unicode characters above the BMP (see https://issues.apache.org/jira/browse/XERCESJ-1257). Thus, this method explicitly uses UTF-8 encoding. For other encodings, use the default UIMA deserialization mechanism.

        The attributeBufferSize parameter only has an effect if the julielab Xerces version is on the classpath. In that case, the initial size of the XMLStringBuffer is set via a system property. This can be very helpful for large documents because UIMA stores the document text as an attribute of the sofa element in the XMI format. Xerces does not expect such long attribute values and initializes its attribute buffers with a size of 32 chars. Reading a large sofa (= document text) string then results in a very long process of repeatedly resizing the buffer array and copying the old buffer contents into the larger array. By setting a larger size from the beginning, a lot of time can be saved.

        Parameters:
        cas - The CAS to populate.
        is - The XMI data stream to populate the CAS with.
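        attributeBufferSize - The initial size of the Xerces attribute buffer. Only has an effect if the julielab Xerces version is on the classpath.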
        Throws:
        SAXException
        IOException
        See Also:
        https://issues.apache.org/jira/browse/XERCESJ-1257
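        The key step described above, handing the parser a character stream that has already been decoded as UTF-8, can be sketched with the JDK's SAX API alone. This is a minimal illustration with hypothetical names, not the code of deserializeXmi: it does not involve a CAS, and it does not set the attribute buffer system property, whose name is not given here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class Utf8InputSourceSketch {

    /** Wraps the stream so that Java's UTF-8 decoder, not the XML parser, decodes the bytes. */
    static InputSource utf8Source(InputStream is) {
        Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
        return new InputSource(reader);
    }

    /** Parses the stream and returns the first value found for the given attribute name. */
    static String readAttribute(InputStream is, String attributeName)
            throws SAXException, IOException, ParserConfigurationException {
        String[] found = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (found[0] == null) {
                    found[0] = atts.getValue(attributeName);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(utf8Source(is), handler);
        return found[0];
    }

    public static void main(String[] args) throws Exception {
        // U+1D11E (musical G clef) lies above the BMP and needs a surrogate pair in Java.
        String text = "clef \uD834\uDD1E";
        byte[] xml = ("<sofa sofaString=\"" + text + "\"/>").getBytes(StandardCharsets.UTF_8);
        String value = readAttribute(new ByteArrayInputStream(xml), "sofaString");
        System.out.println(text.equals(value));
    }
}
```

        When an InputSource carries a character stream, a SAX parser reads characters from it directly instead of decoding the byte stream itself, which is why this wrapping sidesteps the parser's own UTF-8 handling.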
      • binarySearch

        public static <T extends org.apache.uima.jcas.tcas.Annotation,​R extends Comparable<R>> int binarySearch​(List<T> annotations,
                                                                                                                      Function<T,​R> comparisonValueFunction,
                                                                                                                      R searchValue)
      • binarySearch

        public static <T extends org.apache.uima.jcas.tcas.Annotation,​R extends Comparable<R>> int binarySearch​(List<T> annotations,
                                                                                                                      Function<T,​R> comparisonValueFunction,
                                                                                                                      R searchValue,
                                                                                                                      int from,
                                                                                                                      int to)
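        The binarySearch methods carry no description here, so the following is only a sketch of how a binary search over a list sorted by a key function can look. It makes several assumptions that this page does not confirm: that to is an inclusive index, that the return value follows the java.util.Collections.binarySearch convention (the index of a match, otherwise -(insertion point) - 1), and that the list is sorted in ascending order of the function value. The sketch works on any element type rather than on Annotation, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class KeyedBinarySearchSketch {

    /** Searches items[from..to] (both inclusive, an assumption) for an element whose key equals searchValue. */
    static <T, R extends Comparable<R>> int binarySearch(
            List<T> items, Function<T, R> keyFunction, R searchValue, int from, int to) {
        int low = from;
        int high = to;
        while (low <= high) {
            int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
            int cmp = keyFunction.apply(items.get(mid)).compareTo(searchValue);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        // Not found: encode the insertion point as a negative number (assumed convention).
        return -(low + 1);
    }

    static <T, R extends Comparable<R>> int binarySearch(
            List<T> items, Function<T, R> keyFunction, R searchValue) {
        return binarySearch(items, keyFunction, searchValue, 0, items.size() - 1);
    }

    public static void main(String[] args) {
        // Hypothetical {begin, end} spans, sorted by begin offset.
        List<int[]> spans = Arrays.asList(new int[] {0, 4}, new int[] {5, 9}, new int[] {12, 15});
        Function<int[], Integer> begin = span -> span[0];
        System.out.println(binarySearch(spans, begin, Integer.valueOf(5))); // 1
        System.out.println(binarySearch(spans, begin, Integer.valueOf(7))); // -3
    }
}
```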
      • resolveExternalResourceGzipInputStream

        public static InputStream resolveExternalResourceGzipInputStream​(org.apache.uima.resource.DataResource resource)
                                                                  throws IOException

        Helper method to transparently handle GZIPPed external resource files.

        When using external resources for analysis engines in UIMA, typically a custom object implementing SharedResourceObject is created as the resource provider. Since the effort of handling external resources mostly pays off when the resource is rather large, file resources are commonly compressed using GZIP. This method takes the input stream of the DataResource object passed by UIMA to SharedResourceObject.load(DataResource) and checks if its URI ends with .gzip or .gz. If so, the input stream is wrapped in a GZIPInputStream. This way, a gzipped or plain resource file can be used without further code adaptations.

        Parameters:
        resource - The DataResource object passed to SharedResourceObject.load(DataResource).
        Returns:
        The original input stream if the resource URI does not end in .gz or .gzip; a GZIP input stream otherwise.
        Throws:
        IOException - If reading the resource file fails.
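        The suffix check described above can be sketched without the UIMA DataResource type. In this illustration the URI is passed as a plain string next to the stream; the class and method names are hypothetical and the code is not the JCoReTools implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipAwareStreamSketch {

    /** Wraps the stream in a GZIPInputStream if the URI ends with .gz or .gzip. */
    static InputStream wrapIfGzipped(String uri, InputStream is) throws IOException {
        if (uri.endsWith(".gz") || uri.endsWith(".gzip")) {
            return new GZIPInputStream(is);
        }
        return is;
    }

    /** Helper for the demo: gzip-compresses a string. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Helper for the demo: reads a stream fully as UTF-8 text. */
    static String readAll(InputStream is) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = is.read(buffer)) != -1) {
            bytes.write(buffer, 0, n);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InputStream plain = new ByteArrayInputStream("entry".getBytes(StandardCharsets.UTF_8));
        InputStream zipped = new ByteArrayInputStream(gzip("entry"));
        // The calling code is identical for both resources; both lines print "entry".
        System.out.println(readAll(wrapIfGzipped("dict.txt", plain)));
        System.out.println(readAll(wrapIfGzipped("dict.txt.gz", zipped)));
    }
}
```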