public class JCoReTools extends Object
getDocId(JCas) looks for an annotation of type de.julielab.jcore.types.Header and returns its docId feature value.
The deserializeXmi(CAS, InputStream, int) method is used in UIMA 2.x to fix issues with special Unicode characters. For more information, refer to the JavaDoc of the method.
| Modifier and Type | Field and Description |
|---|---|
| static int | DEFAULT_ADDITION_SIZE<br>Number of elements to be added if an FSArray needs to be resized, effectively creating a new, larger FSArray. |
| Constructor and Description |
|---|
| JCoReTools() |
| Modifier and Type | Method and Description |
|---|---|
| static org.apache.uima.jcas.cas.FSArray | addToFSArray(org.apache.uima.jcas.cas.FSArray inputArray, Collection<? extends org.apache.uima.cas.FeatureStructure> newElements)<br>Returns an FSArray that contains all elements of the given array and newElements. |
| static org.apache.uima.jcas.cas.FSArray | addToFSArray(org.apache.uima.jcas.cas.FSArray array, org.apache.uima.cas.FeatureStructure newElement)<br>Returns an FSArray that contains all elements of the given array and newElement. |
| static org.apache.uima.jcas.cas.FSArray | addToFSArray(org.apache.uima.jcas.cas.FSArray array, org.apache.uima.cas.FeatureStructure newElement, int additionSize)<br>Returns an FSArray that contains all elements of the given array and newElement. |
| static org.apache.uima.jcas.cas.StringArray | addToStringArray(org.apache.uima.jcas.cas.StringArray array, String element)<br>Creates a new string array, copies the values of array into it and adds element. |
| static org.apache.uima.jcas.cas.StringArray | addToStringArray(org.apache.uima.jcas.cas.StringArray array, String[] elements)<br>Creates a new string array, copies the values of array into it and adds elements. |
| static <T extends org.apache.uima.jcas.tcas.Annotation,R extends Comparable<R>> int | binarySearch(List<T> annotations, java.util.function.Function<T,R> comparisonValueFunction, R searchValue) |
| static <T extends org.apache.uima.jcas.tcas.Annotation,R extends Comparable<R>> int | binarySearch(List<T> annotations, java.util.function.Function<T,R> comparisonValueFunction, R searchValue, int from, int to) |
| static org.apache.uima.jcas.cas.FSArray | copyFSArray(org.apache.uima.jcas.cas.FSArray array)<br>Returns a new FSArray with the exact size and contents of array. |
| static void | deserializeXmi(org.apache.uima.cas.CAS cas, InputStream is, int attributeBufferSize)<br>Deserializes a UTF-8 encoded XMI input stream into the given CAS. |
| static String | getDocId(org.apache.uima.jcas.JCas aJCas)<br>Returns the document ID of the document in the JCas. |
| static void | printAnnotationIndex(org.apache.uima.jcas.JCas jCas, int type) |
| static void | printFSArray(org.apache.uima.jcas.cas.FSArray array)<br>Prints the content of the FSArray to System.out |
| static InputStream | resolveExternalResourceGzipInputStream(org.apache.uima.resource.DataResource resource)<br>Helper method to transparently handle GZIPPed external resource files. |
public static final int DEFAULT_ADDITION_SIZE
public static org.apache.uima.jcas.cas.FSArray addToFSArray(org.apache.uima.jcas.cas.FSArray array,
org.apache.uima.cas.FeatureStructure newElement)
Returns an FSArray that contains all elements of the given array and newElement.
The new element
is set into array if it has trailing null entries. In that case, the new element is set to
the first position of array that is null and only followed by null entries until the end of the
array. If array is full, i.e. there are no trailing null entries, a new FSArray
of size array.size()+DEFAULT_ADDITION_SIZE is created. All elements of array are copied into
the new FSArray and newElement is added after the last element of array.
Depending on DEFAULT_ADDITION_SIZE, there might be trailing null entries left in the new
FSArray; they can be used to set further elements without the need to create a new FSArray.
In any case, it should be assumed that the return value is a new FSArray. Thus, one should not rely
on the possible in-place change of the passed array but instead replace the variable holding array
with the return value of this method.
Parameters:
array - The array to which the feature structure should be added
newElement - The feature structure that should be added to the array
Returns:
An FSArray containing all entries from array plus newElement. The
returned FSArray will be array if there was enough space to add newElement
or a new FSArray otherwise.
public static org.apache.uima.jcas.cas.FSArray addToFSArray(org.apache.uima.jcas.cas.FSArray array,
org.apache.uima.cas.FeatureStructure newElement,
int additionSize)
Returns an FSArray that contains all elements of the given array and newElement.
The new element
is set into array if it has trailing null entries. In that case, the new element is set to
the first position of array that is null and only followed by null entries until the end of the
array. If array is full, i.e. there are no trailing null entries, a new FSArray
of size array.size()+additionSize is created. All elements of array are copied into
the new FSArray and newElement is added after the last element of array.
Depending on additionSize, there might be trailing null entries left in the new
FSArray; they can be used to set further elements without the need to create a new FSArray.
In any case, it should be assumed that the return value is a new FSArray. Thus, one should not rely
on the possible in-place change of the passed array but instead replace the variable holding array
with the return value of this method.
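The trailing-null strategy described above can be illustrated without UIMA on the classpath. The following is only a minimal sketch, assuming a plain Java array as a stand-in for FSArray; the class name TrailingNullAppend and the method add are hypothetical and are not part of JCoReTools.

```java
import java.util.Arrays;

class TrailingNullAppend {
    /**
     * Stand-in for the described behavior: reuse a trailing null slot if one
     * exists, otherwise copy into a new array that is additionSize entries larger.
     */
    static <T> T[] add(T[] array, T newElement, int additionSize) {
        if (additionSize < 1) {
            // Assumption of this sketch: growing by zero could not hold the new element.
            throw new IllegalArgumentException("additionSize must be at least 1");
        }
        // Walk backwards to find the start of the trailing run of null entries.
        int firstFree = array.length;
        while (firstFree > 0 && array[firstFree - 1] == null) {
            firstFree--;
        }
        if (firstFree < array.length) {
            // Enough room: the passed array is changed in place and returned.
            array[firstFree] = newElement;
            return array;
        }
        // Array is full: copy into a larger array and append after the last element.
        T[] grown = Arrays.copyOf(array, array.length + additionSize);
        grown[array.length] = newElement;
        return grown;
    }

    public static void main(String[] args) {
        String[] tokens = {"a", "b"};
        tokens = add(tokens, "c", 3); // full, so a new array of length 5 is returned
        tokens = add(tokens, "d", 3); // fits into a trailing null slot
        System.out.println(Arrays.toString(tokens)); // [a, b, c, d, null]
    }
}
```

Note that the caller always reassigns the variable to the return value, which is the usage pattern the description recommends regardless of whether the array was changed in place.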
Parameters:
array - The array to which the feature structure should be added
newElement - The feature structure that should be added to the array
additionSize - The number of entries by which the array should be expanded
Returns:
An FSArray containing all entries from array plus newElement. The
returned FSArray will be array if there was enough space to add newElement
or a new FSArray otherwise.
public static org.apache.uima.jcas.cas.FSArray addToFSArray(org.apache.uima.jcas.cas.FSArray inputArray,
Collection<? extends org.apache.uima.cas.FeatureStructure> newElements)
Returns an FSArray that contains all elements of the given array and newElements.
The new elements
are set into inputArray if it has enough trailing null entries. In that case, the new elements are set to
the first positions of inputArray that are null and only followed by null entries until the end of the
array. If inputArray is too small, i.e. there are not enough trailing null entries, a new FSArray
of size inputArray.size()+newElements.size() is created. All elements of inputArray are copied into
the new FSArray and newElements are added after the last element of inputArray.
In any case, it should be assumed that the return value is a new FSArray. Thus, one should not rely
on the possible in-place change of the passed array but instead replace the variable holding inputArray
with the return value of this method.
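The collection variant can be sketched in the same way. This is an illustration only, again assuming a plain Java array instead of FSArray; the names TrailingNullAppendAll and addAll are hypothetical. The sketch reads "after the last element" as "after the last non-null entry", which is an assumption about the original behavior.

```java
import java.util.Arrays;
import java.util.Collection;

class TrailingNullAppendAll {
    /**
     * Stand-in for the described behavior: fill trailing null slots if all new
     * elements fit, otherwise copy into an array enlarged by newElements.size().
     */
    static <T> T[] addAll(T[] inputArray, Collection<? extends T> newElements) {
        // Find the start of the trailing run of null entries.
        int firstFree = inputArray.length;
        while (firstFree > 0 && inputArray[firstFree - 1] == null) {
            firstFree--;
        }
        T[] target = inputArray;
        int freeSlots = inputArray.length - firstFree;
        if (freeSlots < newElements.size()) {
            // Not enough room: the enlarged copy keeps the old entries at their positions.
            target = Arrays.copyOf(inputArray, inputArray.length + newElements.size());
        }
        // Write the new elements directly after the last non-null entry (assumption).
        int position = firstFree;
        for (T element : newElements) {
            target[position++] = element;
        }
        return target;
    }

    public static void main(String[] args) {
        String[] tokens = {"a", null};
        tokens = addAll(tokens, Arrays.asList("b", "c"));
        System.out.println(Arrays.toString(tokens)); // [a, b, c, null]
    }
}
```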
Parameters:
inputArray - The array to which the feature structures should be added
newElements - The feature structures that should be added to the array
Returns:
An FSArray containing all entries from inputArray plus newElements. The
returned FSArray will be inputArray if there was enough space to add newElements
or a new FSArray otherwise.
public static org.apache.uima.jcas.cas.FSArray copyFSArray(org.apache.uima.jcas.cas.FSArray array)
Returns a new FSArray with the exact size and contents of array. This is a shallow
copy; the array entries are copied by reference.
Parameters:
array - The FSArray to copy.
Returns:
A new FSArray with the size and contents of array.
public static org.apache.uima.jcas.cas.StringArray addToStringArray(org.apache.uima.jcas.cas.StringArray array,
String element)
Creates a new string array, copies the values of array into it and adds element.
This method does not handle null values as addToFSArray(FSArray, FeatureStructure, int) does.
To add multiple elements at once, avoiding excessive copying, refer to addToStringArray(StringArray, String[]).
Parameters:
array - The source array to extend.
element - The element to add.
Returns:
A new StringArray with the same contents as array extended by element.
public static org.apache.uima.jcas.cas.StringArray addToStringArray(org.apache.uima.jcas.cas.StringArray array,
String[] elements)
Creates a new string array, copies the values of array into it and adds elements.
Parameters:
array - The array to extend.
elements - The elements to add into a new array.
Returns:
A new StringArray containing all values of array plus elements.
public static void printFSArray(org.apache.uima.jcas.cas.FSArray array)
Prints the content of the FSArray to System.out.
Parameters:
array - The array to be printed
public static void printAnnotationIndex(org.apache.uima.jcas.JCas jCas,
int type)
public static String getDocId(org.apache.uima.jcas.JCas aJCas)
Returns the document ID of the document in the JCas.
This can only be done when an annotation of type
de.julielab.jcore.types.Header (or a subtype) is present and
its feature docId is set.
aJCas - Header.getDocId()
public static void deserializeXmi(org.apache.uima.cas.CAS cas,
InputStream is,
int attributeBufferSize)
throws SAXException,
IOException
Deserializes a UTF-8 encoded XMI input stream into the given CAS.
This method has largely been taken directly from
XmiCasDeserializer.deserialize(InputStream, CAS). However, the
given input stream is explicitly transformed into a UTF-8 encoded
InputSource for the XML parsing process. This is necessary
because the Xerces internal UTF-8 handling is faulty with Unicode
characters above the BMP (see
https://issues.apache.org/jira/browse/XERCESJ-1257). Thus, this method
explicitly uses UTF-8 encoding. For other encodings, use the default UIMA
deserialization mechanism.
The attributeBufferSize parameter only has an effect if the
julielab Xerces version is on the classpath. Then, the XMLStringBuffer
initial size is set via a system property. This can be very helpful for
large documents because UIMA stores the document text as an attribute of the
sofa element in the XMI format. Such long attribute values are not
expected by Xerces which initializes its attribute buffers with a size of
32 chars. Then, reading a large sofa (= document text) string results in
a very long process of resizing the buffer array and copying the old
buffer contents into the larger array. By setting a larger size from the
beginning, a lot of time can be saved.
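The explicit UTF-8 handling described above amounts to decoding the bytes before the parser sees them. The following is only a minimal sketch using the JDK's built-in SAX parser; it does not use the UIMA XMI handler or the julielab Xerces version, and the class name Utf8InputSourceSketch, the method readTextAttribute and the attribute name text are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class Utf8InputSourceSketch {
    /** Parses the stream as UTF-8 XML and returns the "text" attribute of the first element. */
    static String readTextAttribute(InputStream is) throws Exception {
        // Decode the bytes with an explicit UTF-8 reader so that the parser
        // receives characters and does not apply its own byte decoding.
        InputSource source = new InputSource(new InputStreamReader(is, StandardCharsets.UTF_8));
        final String[] result = new String[1];
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (result[0] == null) {
                    result[0] = atts.getValue("text");
                }
            }
        });
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        // U+1D11E (musical G clef) lies above the BMP and needs a surrogate pair in Java.
        String xml = "<doc text=\"clef \uD834\uDD1E\"/>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readTextAttribute(in).codePointAt(5) == 0x1D11E); // true
    }
}
```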
Parameters:
cas - The CAS to populate.
is - The XMI data stream to populate the CAS with.
attributeBufferSize -
Throws:
SAXException
IOException
public static <T extends org.apache.uima.jcas.tcas.Annotation,R extends Comparable<R>> int binarySearch(List<T> annotations, java.util.function.Function<T,R> comparisonValueFunction, R searchValue)
public static <T extends org.apache.uima.jcas.tcas.Annotation,R extends Comparable<R>> int binarySearch(List<T> annotations, java.util.function.Function<T,R> comparisonValueFunction, R searchValue, int from, int to)
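The two binarySearch signatures come without a description. The sketch below shows one plausible reading and is not the documented behavior: it assumes that the list is sorted in ascending order by the value that comparisonValueFunction extracts, that to is an inclusive index, and that the return value follows the convention of java.util.Collections.binarySearch (the index if found, otherwise -(insertion point) - 1). The class name KeyedBinarySearch is hypothetical, and plain objects stand in for Annotation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class KeyedBinarySearch {
    /** Searches the whole list; assumes ascending order of the extracted keys. */
    static <T, R extends Comparable<R>> int binarySearch(
            List<T> items, Function<T, R> key, R searchValue) {
        return binarySearch(items, key, searchValue, 0, items.size() - 1);
    }

    /** Searches items[from..to], both bounds inclusive (an assumption of this sketch). */
    static <T, R extends Comparable<R>> int binarySearch(
            List<T> items, Function<T, R> key, R searchValue, int from, int to) {
        int low = from;
        int high = to;
        while (low <= high) {
            int mid = (low + high) >>> 1; // unsigned shift avoids int overflow
            int cmp = key.apply(items.get(mid)).compareTo(searchValue);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        // Not found: encode the insertion point as a negative number.
        return -(low + 1);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cccc");
        System.out.println(binarySearch(words, String::length, 2)); // 1
        System.out.println(binarySearch(words, String::length, 3)); // -3
    }
}
```

With annotations, the key function would typically be something like the begin offset, so that a list sorted by offset can be searched without a linear scan.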
public static InputStream resolveExternalResourceGzipInputStream(org.apache.uima.resource.DataResource resource) throws IOException
Helper method to transparently handle GZIPPed external resource files.
When using external resources for analysis engines in UIMA, typically a custom object implementing SharedResourceObject
is created as the resource provider. Since the overhead of handling external resources is mostly accepted when the resource is rather large, file
resources are commonly compressed using GZIP. This method takes the input stream of the DataResource object
passed by UIMA to SharedResourceObject.load(DataResource) and checks if its URI
ends with .gzip or .gz. If so, the input stream is wrapped into a GZIPInputStream. This way, a gzipped or
plain resource file can be used without further code adaptations.
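The suffix check described above can be sketched with the JDK alone. This is an illustration under stated assumptions, not the JCoReTools implementation: a URI and an InputStream are passed directly instead of a UIMA DataResource, and the names GzipAwareStream, resolve, readUtf8 and gzip are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipAwareStream {
    /** Wraps raw in a GZIPInputStream if the URI ends with .gz or .gzip. */
    static InputStream resolve(URI uri, InputStream raw) throws IOException {
        String location = uri.toString();
        if (location.endsWith(".gz") || location.endsWith(".gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    /** Helper for the demo: reads a stream fully and decodes it as UTF-8. */
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Helper for the demo: compresses a string in memory. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The same reading code works for the compressed and the plain resource.
        InputStream packed = new ByteArrayInputStream(gzip("dictionary entry"));
        System.out.println(readUtf8(resolve(URI.create("file:/data/dict.txt.gz"), packed)));
        InputStream plain = new ByteArrayInputStream("dictionary entry".getBytes(StandardCharsets.UTF_8));
        System.out.println(readUtf8(resolve(URI.create("file:/data/dict.txt"), plain)));
    }
}
```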
Parameters:
resource - The DataResource object passed to SharedResourceObject.load(DataResource).
Throws:
IOException - If reading the resource file fails.
Copyright © 2018 JULIE Lab Jena, Germany. All rights reserved.