Class JCoReTools
- java.lang.Object
-
- de.julielab.jcore.utility.JCoReTools
-
public class JCoReTools extends Object
- The binarySearch methods work specifically on Annotation objects that are sorted by a given function.
- The addToFSArray methods are useful for adding elements to FSArrays which are rather awkward to use and, especially, to extend.
- The addToStringArray methods serve a similar purpose.
- One of the most used methods from this list is getDocId(JCas), which looks for an annotation of type de.julielab.jcore.types.Header and returns its docId feature value.
- The deserializeXmi(CAS, InputStream, int) method is used in UIMA 2.x to work around issues with Unicode characters above the BMP. For more information, refer to the JavaDoc of the method.
- Author:
- faessler
-
-
Field Summary
- static int DEFAULT_ADDITION_SIZE: Number of elements to be added if an FSArray needs to be resized, effectively creating a new, larger FSArray.
-
Constructor Summary
- JCoReTools()
-
Method Summary
- static org.apache.uima.jcas.cas.FSArray addToFSArray(org.apache.uima.jcas.cas.FSArray inputArray, Collection<? extends org.apache.uima.cas.FeatureStructure> newElements): Returns an FSArray that contains all elements of the given array and newElements.
- static org.apache.uima.jcas.cas.FSArray addToFSArray(org.apache.uima.jcas.cas.FSArray array, org.apache.uima.cas.FeatureStructure newElement): Returns an FSArray that contains all elements of the given array and newElement.
- static org.apache.uima.jcas.cas.FSArray addToFSArray(org.apache.uima.jcas.cas.FSArray array, org.apache.uima.cas.FeatureStructure newElement, int additionSize): Returns an FSArray that contains all elements of the given array and newElement.
- static org.apache.uima.jcas.cas.StringArray addToStringArray(org.apache.uima.jcas.cas.StringArray array, String element): Creates a new string array, copies the values of array into it and adds element.
- static org.apache.uima.jcas.cas.StringArray addToStringArray(org.apache.uima.jcas.cas.StringArray array, String[] elements): Creates a new string array, copies the values of array into it and adds elements.
- static <T extends org.apache.uima.jcas.tcas.Annotation,R extends Comparable<R>> int binarySearch(List<T> annotations, Function<T,R> comparisonValueFunction, R searchValue)
- static <T extends org.apache.uima.jcas.tcas.Annotation,R extends Comparable<R>> int binarySearch(List<T> annotations, Function<T,R> comparisonValueFunction, R searchValue, int from, int to)
- static org.apache.uima.jcas.cas.FSArray copyFSArray(org.apache.uima.jcas.cas.FSArray array): Returns a new FSArray with the exact size and contents of array.
- static void deserializeXmi(org.apache.uima.cas.CAS cas, InputStream is, int attributeBufferSize): Deserializes a UTF-8 encoded XMI input stream into the given CAS.
- static String getDocId(org.apache.uima.jcas.JCas aJCas): Returns the document ID of the document in the JCas.
- static org.apache.uima.jcas.cas.StringArray newStringArray(org.apache.uima.jcas.JCas jCas, String... elements): Creates a new StringArray from the given string elements.
- static void printAnnotationIndex(org.apache.uima.jcas.JCas jCas, int type)
- static void printFSArray(org.apache.uima.jcas.cas.FSArray array): Prints the content of the FSArray to System.out.
- static InputStream resolveExternalResourceGzipInputStream(org.apache.uima.resource.DataResource resource): Helper method to transparently handle GZIPped external resource files.
-
-
-
Field Detail
-
DEFAULT_ADDITION_SIZE
public static final int DEFAULT_ADDITION_SIZE
Number of elements to be added if an FSArray needs to be resized, effectively creating a new, larger FSArray.
- See Also:
- Constant Field Values
-
-
Method Detail
-
addToFSArray
public static org.apache.uima.jcas.cas.FSArray addToFSArray(org.apache.uima.jcas.cas.FSArray array, org.apache.uima.cas.FeatureStructure newElement)
Returns an FSArray that contains all elements of the given array and newElement.
If array has trailing null entries, the new element is set into array itself, at the first null position that is followed only by null entries up to the end of the array. If array is full, i.e. there are no trailing null entries, a new FSArray of size array.size() + DEFAULT_ADDITION_SIZE is created. All elements of array are copied into the new FSArray and newElement is added after the last element of array. Depending on DEFAULT_ADDITION_SIZE, there might be trailing null entries left in the new FSArray; they can be used to set further elements without the need to create a new FSArray.
In any case, it should be assumed that the return value is a new FSArray. Thus, do not rely on the possible in-place change of the passed array; instead, replace the variable holding array with the return value of this method.
- Parameters:
array - The array to which the feature structure should be added
newElement - The feature structure that should be added to the array
- Returns:
- An FSArray containing all entries from array plus newElement. The returned FSArray will be array if there was enough space to add newElement or a new FSArray otherwise.
-
addToFSArray
public static org.apache.uima.jcas.cas.FSArray addToFSArray(org.apache.uima.jcas.cas.FSArray array, org.apache.uima.cas.FeatureStructure newElement, int additionSize)
Returns an FSArray that contains all elements of the given array and newElement.
If array has trailing null entries, the new element is set into array itself, at the first null position that is followed only by null entries up to the end of the array. If array is full, i.e. there are no trailing null entries, a new FSArray of size array.size() + additionSize is created. All elements of array are copied into the new FSArray and newElement is added after the last element of array. Depending on additionSize, there might be trailing null entries left in the new FSArray; they can be used to set further elements without the need to create a new FSArray.
In any case, it should be assumed that the return value is a new FSArray. Thus, do not rely on the possible in-place change of the passed array; instead, replace the variable holding array with the return value of this method.
- Parameters:
array - The array to which the feature structure should be added
newElement - The feature structure that should be added to the array
additionSize - The number of entries by which the array should be expanded
- Returns:
- An FSArray containing all entries from array plus newElement. The returned FSArray will be array if there was enough space to add newElement or a new FSArray otherwise.
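The append behavior described above can be illustrated without UIMA. The following is a minimal sketch on a plain Object[], not the library's implementation; the class TrailingNullAppendSketch and its append method are hypothetical names introduced for illustration only.

```java
import java.util.Arrays;

// Hypothetical illustration class; it mirrors the documented behavior on a plain array.
final class TrailingNullAppendSketch {

    // Puts newElement into the first slot of the trailing run of nulls,
    // or grows the array by additionSize if no such slot exists.
    static Object[] append(Object[] array, Object newElement, int additionSize) {
        if (additionSize < 1) {
            throw new IllegalArgumentException("additionSize must be at least 1");
        }
        // Walk backwards over the trailing nulls to find the first free position.
        int firstFree = array.length;
        while (firstFree > 0 && array[firstFree - 1] == null) {
            firstFree--;
        }
        if (firstFree < array.length) {
            // Enough space: change the passed array in place and return it.
            array[firstFree] = newElement;
            return array;
        }
        // Array is full: copy into a larger array and add after the last element.
        Object[] grown = Arrays.copyOf(array, array.length + additionSize);
        grown[array.length] = newElement;
        return grown;
    }
}
```

Because the result may or may not be the passed array, callers should always continue with the return value, for example items = TrailingNullAppendSketch.append(items, element, 10).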
-
addToFSArray
public static org.apache.uima.jcas.cas.FSArray addToFSArray(org.apache.uima.jcas.cas.FSArray inputArray, Collection<? extends org.apache.uima.cas.FeatureStructure> newElements)
Returns an FSArray that contains all elements of the given array and newElements.
If inputArray has enough trailing null entries, the new elements are set into inputArray itself, at the first null positions that are followed only by null entries up to the end of the array. If inputArray is too small, i.e. there are not enough trailing null entries, a new FSArray of size inputArray.size() + newElements.size() is created. All elements of inputArray are copied into the new FSArray and newElements are added after the last element of inputArray.
In any case, it should be assumed that the return value is a new FSArray. Thus, do not rely on the possible in-place change of the passed inputArray; instead, replace the variable holding inputArray with the return value of this method.
- Parameters:
inputArray - The array to which the feature structures should be added, may be null.
newElements - The feature structures that should be added to the array
- Returns:
- An FSArray containing all entries from inputArray plus newElements. The returned FSArray will be inputArray if there was enough space to add newElements or a new FSArray otherwise.
-
copyFSArray
public static org.apache.uima.jcas.cas.FSArray copyFSArray(org.apache.uima.jcas.cas.FSArray array)
Returns a new FSArray with the exact size and contents of array. This is a shallow copy: the array entries are copied by reference.
- Parameters:
array - The FSArray to copy.
- Returns:
- A new FSArray with the size and contents of array.
-
newStringArray
public static org.apache.uima.jcas.cas.StringArray newStringArray(org.apache.uima.jcas.JCas jCas, String... elements)
Creates a new StringArray from the given string elements.
- Parameters:
jCas - The jCas to associate the new StringArray with.
elements - The strings to put into the StringArray.
- Returns:
- The new, filled StringArray.
-
addToStringArray
public static org.apache.uima.jcas.cas.StringArray addToStringArray(org.apache.uima.jcas.cas.StringArray array, String element)
Creates a new string array, copies the values of array into it and adds element.
This method does not handle null values the way addToFSArray(FSArray, FeatureStructure, int) does. To add multiple elements at once and avoid excessive copying, refer to addToStringArray(StringArray, String[]).
- Parameters:
array - The source array to extend.
element - The element to add.
- Returns:
- A new StringArray with the same contents as array extended by element.
-
addToStringArray
public static org.apache.uima.jcas.cas.StringArray addToStringArray(org.apache.uima.jcas.cas.StringArray array, String[] elements)
Creates a new string array, copies the values of array into it and adds elements.
- Parameters:
array - The array to extend.
elements - The elements to add into a new array.
- Returns:
- A new StringArray containing all values of array plus elements.
-
printFSArray
public static void printFSArray(org.apache.uima.jcas.cas.FSArray array)
Prints the content of the FSArray to System.out.
- Parameters:
array - The array to be printed
-
printAnnotationIndex
public static void printAnnotationIndex(org.apache.uima.jcas.JCas jCas, int type)
-
getDocId
public static String getDocId(org.apache.uima.jcas.JCas aJCas)
Returns the document ID of the document in the JCas.
This can only be done when an annotation of type de.julielab.jcore.types.Header (or a subtype) is present and its feature docId is set.
- Parameters:
aJCas -
- Returns:
- The value of Header.getDocId()
-
deserializeXmi
public static void deserializeXmi(org.apache.uima.cas.CAS cas, InputStream is, int attributeBufferSize) throws SAXException, IOException
Deserializes a UTF-8 encoded XMI input stream into the given CAS.
This method has largely been taken directly from XmiCasDeserializer.deserialize(InputStream, CAS). However, the given input stream is explicitly transformed into a UTF-8 encoded InputSource for the XML parsing process. This is necessary because the Xerces internal UTF-8 handling is faulty with Unicode characters above the BMP (see https://issues.apache.org/jira/browse/XERCESJ-1257). Thus, this method explicitly uses UTF-8 encoding. For other encodings, use the default UIMA deserialization mechanism.
The attributeBufferSize parameter only has an effect if the julielab Xerces version is on the classpath. In that case, the initial size of the XMLStringBuffer is set via a system property. This can be very helpful for large documents because UIMA stores the document text as an attribute of the sofa element in the XMI format. Such long attribute values are not expected by Xerces, which initializes its attribute buffers with a size of 32 chars. Reading a large sofa (= document text) string then results in a very long process of repeatedly resizing the buffer array and copying the old buffer contents into the larger array. By setting a larger size from the beginning, a lot of time can be saved.
- Parameters:
cas - The CAS to populate.
is - The XMI data stream to populate the CAS with.
attributeBufferSize -
- Throws:
SAXException
IOException
- See Also:
- https://issues.apache.org/jira/browse/XERCESJ-1257
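The idea of handing the XML parser an explicitly UTF-8 decoded InputSource can be sketched with JDK classes alone. This is an assumption about the general approach, not the code of deserializeXmi: the class Utf8InputSourceSketch, its methods, and the use of an InputStreamReader are hypothetical, and no CAS is populated here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; it does not use UIMA.
final class Utf8InputSourceSketch {

    // Decodes the bytes as UTF-8 in Java so the parser receives characters, not raw bytes.
    static InputSource utf8Source(InputStream is) {
        Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8);
        InputSource source = new InputSource(reader);
        source.setEncoding("UTF-8");
        return source;
    }

    // Parses the stream and returns the first value found for the given attribute name, or null.
    static String readAttribute(InputStream is, String attributeName)
            throws IOException, SAXException, ParserConfigurationException {
        final String[] result = new String[1];
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.newSAXParser().parse(utf8Source(is), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (result[0] == null) {
                    result[0] = attributes.getValue(attributeName);
                }
            }
        });
        return result[0];
    }
}
```

In the sketch, a character above the BMP in an attribute value reaches the handler as a regular surrogate pair because the decoding has already happened in the Reader.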
-
binarySearch
public static <T extends org.apache.uima.jcas.tcas.Annotation,R extends Comparable<R>> int binarySearch(List<T> annotations, Function<T,R> comparisonValueFunction, R searchValue)
-
binarySearch
public static <T extends org.apache.uima.jcas.tcas.Annotation,R extends Comparable<R>> int binarySearch(List<T> annotations, Function<T,R> comparisonValueFunction, R searchValue, int from, int to)
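No description is given for the binarySearch methods beyond the note that they operate on annotations sorted by a given function. The following sketch shows a binary search driven by a key function on a plain List; it is not the library's implementation. The class name, the inclusive treatment of from and to, and the negative return value for a missing key (borrowed from java.util.Collections.binarySearch) are assumptions.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration class; the list must already be sorted by keyFunction.
final class KeyedBinarySearchSketch {

    // Searches the index range [from, to] (both inclusive, an assumption) for searchValue.
    // Returns the index of a match, or -(insertionPoint) - 1 if there is none (an assumption).
    static <T, R extends Comparable<R>> int binarySearch(
            List<T> items, Function<T, R> keyFunction, R searchValue, int from, int to) {
        int low = from;
        int high = to;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = keyFunction.apply(items.get(mid)).compareTo(searchValue);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }

    // Convenience variant that searches the whole list.
    static <T, R extends Comparable<R>> int binarySearch(
            List<T> items, Function<T, R> keyFunction, R searchValue) {
        return binarySearch(items, keyFunction, searchValue, 0, items.size() - 1);
    }
}
```

With annotations, a key function such as Annotation::getBegin could be passed for a list that is sorted by begin offset.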
-
resolveExternalResourceGzipInputStream
public static InputStream resolveExternalResourceGzipInputStream(org.apache.uima.resource.DataResource resource) throws IOException
Helper method to transparently handle GZIPPed external resource files.
When using external resources for analysis engines in UIMA, typically a custom object implementing SharedResourceObject is created as the resource provider. Since the effort of setting up an external resource is mostly made when the resource is rather large, file resources are commonly compressed using GZIP. This method takes the input stream of the DataResource object passed by UIMA to SharedResourceObject.load(DataResource) and checks if its URI ends with .gzip or .gz. If so, the input stream is wrapped into a GZIPInputStream. This way, a gzipped or plain resource file can be used without further code adaptations.
- Parameters:
resource - The DataResource object passed to SharedResourceObject.load(DataResource).
- Returns:
- The original input stream if the resource URI does not end in .gz or .gzip; a GZIP input stream otherwise.
- Throws:
IOException- If reading the resource file fails.
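The suffix check described above can be sketched with JDK classes only. This is a minimal illustration under the assumption that the URI and the raw input stream are already at hand; it does not use DataResource, and the class GzipAwareStreamSketch and its method wrapIfGzipped are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.zip.GZIPInputStream;

// Hypothetical illustration class; it does not use UIMA.
final class GzipAwareStreamSketch {

    // Returns a GZIPInputStream around rawStream if the URI ends with .gz or .gzip,
    // and rawStream itself otherwise.
    static InputStream wrapIfGzipped(URI resourceUri, InputStream rawStream) throws IOException {
        String location = resourceUri.toString();
        if (location.endsWith(".gz") || location.endsWith(".gzip")) {
            return new GZIPInputStream(rawStream);
        }
        return rawStream;
    }
}
```

Code that reads the returned stream does not need to know whether the underlying file was compressed.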
-
-