Class JulieXMLTools
- java.lang.Object
-
- de.julielab.xml.JulieXMLTools
-
public class JulieXMLTools extends Object
Utility class offering convenience methods.
- Author:
- faessler
-
-
Field Summary
Fields
- static int CONTENT_FRAGMENT
- static int ELEMENT_FRAGMENT
-
Constructor Summary
Constructors
- JulieXMLTools()
-
Method Summary
Methods (all static)
- static Map<String,String> buildNamespaceMap(com.ximpleware.VTDNav vn)
  Reads the namespace axis of the XML document associated with vn and returns a map connecting the namespace prefixes with their URI.
- static Iterator<Map<String,Object>> constructRowIterator(byte[] data, int bufferSize, String forEachXpath, List<Map<String,String>> fields, String identifier)
  Convenience method for quick construction of a row iterator over an XML document.
- static Iterator<Map<String,Object>> constructRowIterator(com.ximpleware.VTDNav vn, String forEachXpath, List<Map<String,String>> fields, String identifier)
  The VTDNav vn is a VTD navigator over the XML file to return data records from.
- static Iterator<Map<String,Object>> constructRowIterator(String fileName, int bufferSize, String forEachXpath, List<Map<String,String>> fields, boolean largeFileSize)
  Convenience method for quick construction of a row iterator over an XML document.
- static Map<String,String> createField(String... configuration)
- static void declareNamespaces(com.ximpleware.AutoPilot ap, Map<String,String> namespaceMap)
  Declares the given namespaces to the passed auto pilot.
- static <T> String[] expandArrayEntries(List<T> list, String fmtStr)
- static <T> String[] expandArrayEntries(T[] array, String fmtStr)
- static <T> String[] expandArrayEntries(T[] array, String[] fmtStrs)
- static String getElementText(com.ximpleware.VTDNav vn)
- static String getFragment(com.ximpleware.VTDNav vn, int fragmentType, boolean returnRawString)
  Returns the fragment of XML, where vn currently points to, as a string.
- static URL getSolrServerURL(String urlStr, boolean calledByCLI, org.slf4j.Logger LOG)
- static com.ximpleware.VTDNav getVTDNav(InputStream is, int bufferSize)
- static String getXpathValue(String xpath, com.ximpleware.AutoPilot ap)
- static String getXpathValue(String xpath, com.ximpleware.VTDNav vn)
- static String getXpathValue(String xpath, InputStream is)
- static byte[] gzipData(byte[] data)
- static byte[] readStream(InputStream is, int bufferSize)
  Reads an InputStream buffer wise, concatenates all buffers and returns one byte[] of exact length of the read data.
- static int setElementText(com.ximpleware.VTDNav vn, com.ximpleware.AutoPilot ap, com.ximpleware.XMLModifier xm, String xpath, String text)
  Sets the text content of an XML element pointed to by xpath to text.
- static byte[] unGzipData(byte[] gzipData)
-
-
-
Field Detail
-
ELEMENT_FRAGMENT
public static final int ELEMENT_FRAGMENT
- See Also:
- Constant Field Values
-
CONTENT_FRAGMENT
public static final int CONTENT_FRAGMENT
- See Also:
- Constant Field Values
-
-
Method Detail
-
constructRowIterator
public static Iterator<Map<String,Object>> constructRowIterator(String fileName, int bufferSize, String forEachXpath, List<Map<String,String>> fields, boolean largeFileSize)
Convenience method for quick construction of a row iterator over an XML document. The fileName determines the location of the XML file to return data records from. For more detailed information see constructRowIterator(VTDNav, String, List, String).
- Parameters:
fileName - XML file to return data rows from.
bufferSize - Size of the buffers used while reading the file at fileName.
forEachXpath - An XPath expression determining the XML elements to retrieve data records from.
fields - List of attribute-value pairs determining the record fields returned by the iterator.
- Returns:
- An iterator over all rows extracted from the XML document pointed to by fileName.
-
constructRowIterator
public static Iterator<Map<String,Object>> constructRowIterator(byte[] data, int bufferSize, String forEachXpath, List<Map<String,String>> fields, String identifier)
Convenience method for quick construction of a row iterator over an XML document. data contains the XML data to return data records from. For more detailed information see constructRowIterator(VTDNav, String, List, String).
- Parameters:
data - Byte array containing an XML document.
bufferSize - Size of the buffers used while reading the XML data.
forEachXpath - An XPath expression determining the XML elements to retrieve data records from.
fields - List of attribute-value pairs determining the record fields returned by the iterator.
identifier - A string identifying the XML document in data, needed for error messages.
- Returns:
- An iterator over all rows extracted from the XML document contained in data.
-
constructRowIterator
public static Iterator<Map<String,Object>> constructRowIterator(com.ximpleware.VTDNav vn, String forEachXpath, List<Map<String,String>> fields, String identifier)
The VTDNav vn is a VTD navigator over the XML file to return data records from. For each evaluation of the forEachXpath expression, one data row is created. Such a row consists of the fields given by the list fields.
The list contains Maps of attribute-value pairs. All fields are required to have a JulieXMLConstants.XPATH attribute, which specifies the XPath pointing to the information in the XML document to retrieve. Likewise, a JulieXMLConstants.NAME attribute is required. This attribute determines the name of the field in the resulting row that contains the information retrieved by the field's XPATH attribute.
Example:
A field with the following attribute-value pairs
<field name="pmid" xpath="/MedlineCitationSet/MedlineCitation/PMID" >
will create one field named "pmid" in each returned data row, and its value will be the character data at the XPath "/MedlineCitationSet/MedlineCitation/PMID".
- Parameters:
vn - The VTDNav object which navigates over the XML document to retrieve records from.
forEachXpath - An XPath expression determining the XML elements for each of which one row should be created.
fields - The fields to be returned with each data row.
- Returns:
- An iterator over all rows extracted from the XML document navigated by vn.
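The row model described above can be illustrated with a minimal sketch that uses only the JDK's javax.xml.xpath API. It is not the VTD-XML based implementation behind constructRowIterator. The class RowIteratorSketch and its methods are hypothetical, the attribute keys "name" and "xpath" are assumed to be the values of JulieXMLConstants.NAME and JulieXMLConstants.XPATH, and evaluating each field XPath relative to the matched element is a choice made for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration class; not part of de.julielab.xml.
final class RowIteratorSketch {

    // Builds one field definition. Assumption: the keys "name" and "xpath"
    // stand in for JulieXMLConstants.NAME and JulieXMLConstants.XPATH.
    static Map<String, String> field(String name, String xpath) {
        Map<String, String> f = new LinkedHashMap<>();
        f.put("name", name);
        f.put("xpath", xpath);
        return f;
    }

    // Creates one row per node matched by forEachXpath. Each row maps a
    // field name to the string value of that field's XPath, evaluated
    // relative to the matched node (an assumption of this sketch).
    static List<Map<String, Object>> rows(byte[] xml, String forEachXpath,
                                          List<Map<String, String>> fields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xp.evaluate(forEachXpath, doc, XPathConstants.NODESET);
            List<Map<String, Object>> rows = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                Node context = hits.item(i);
                Map<String, Object> row = new LinkedHashMap<>();
                for (Map<String, String> f : fields) {
                    row.put(f.get("name"), xp.evaluate(f.get("xpath"), context));
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract rows", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = ("<MedlineCitationSet>"
                + "<MedlineCitation><PMID>1</PMID></MedlineCitation>"
                + "<MedlineCitation><PMID>2</PMID></MedlineCitation>"
                + "</MedlineCitationSet>").getBytes(StandardCharsets.UTF_8);
        List<Map<String, Object>> rows = rows(xml,
                "/MedlineCitationSet/MedlineCitation",
                List.of(field("pmid", "PMID")));
        rows.forEach(System.out::println);
    }
}
```

The sketch materializes all rows in a list for brevity, whereas constructRowIterator returns an Iterator over the rows.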
-
declareNamespaces
public static void declareNamespaces(com.ximpleware.AutoPilot ap, Map<String,String> namespaceMap)
Declares the given namespaces to the passed auto pilot. The namespaceMap can automatically be derived from an XML document by calling buildNamespaceMap(VTDNav).
- Parameters:
ap -
namespaceMap -
-
buildNamespaceMap
public static Map<String,String> buildNamespaceMap(com.ximpleware.VTDNav vn) throws com.ximpleware.VTDException
Reads the namespace axis of the XML document associated with vn and returns a map connecting the namespace prefixes with their URI. This map can be passed to declareNamespaces(AutoPilot, Map) to declare all the namespaces of the document to an AutoPilot.
- Parameters:
vn -
- Returns:
- Throws:
com.ximpleware.VTDException
-
getVTDNav
public static com.ximpleware.VTDNav getVTDNav(InputStream is, int bufferSize) throws com.ximpleware.ParseException, FileTooBigException
- Throws:
com.ximpleware.ParseException
FileTooBigException
-
readStream
public static byte[] readStream(InputStream is, int bufferSize) throws IOException
Reads an InputStream buffer-wise, concatenates all buffers, and returns one byte[] whose length is exactly the number of bytes read.
- Parameters:
is - InputStream to read.
bufferSize - Maximum number of bytes to read in one is.read() call.
- Returns:
- A byte[] containing all the data of the InputStream.
- Throws:
IOException
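The buffer-wise reading described for readStream can be sketched with the JDK alone. This is an illustration of the technique, not the source of JulieXMLTools.readStream; the class ReadStreamSketch and the method readAll are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of de.julielab.xml.
final class ReadStreamSketch {

    // Reads the stream in chunks of at most bufferSize bytes and returns
    // a byte[] holding exactly the bytes that were read.
    static byte[] readAll(InputStream is, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        // read() returns the number of bytes placed in buffer, or -1 at end of stream.
        while ((n = is.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "<doc>some XML</doc>".getBytes(StandardCharsets.UTF_8);
        byte[] copy = readAll(new ByteArrayInputStream(input), 4);
        System.out.println(new String(copy, StandardCharsets.UTF_8));
    }
}
```

A small bufferSize means more read() calls, a large one means more memory per call; the returned array has the same length either way.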
-
gzipData
public static byte[] gzipData(byte[] data)
-
unGzipData
public static byte[] unGzipData(byte[] gzipData) throws IOException
- Throws:
IOException
-
getSolrServerURL
public static URL getSolrServerURL(String urlStr, boolean calledByCLI, org.slf4j.Logger LOG)
-
getElementText
public static String getElementText(com.ximpleware.VTDNav vn) throws com.ximpleware.NavException
- Throws:
com.ximpleware.NavException
-
getFragment
public static String getFragment(com.ximpleware.VTDNav vn, int fragmentType, boolean returnRawString) throws com.ximpleware.NavException
Returns the fragment of XML that vn currently points to, as a string.
- Parameters:
vn - The XML navigator.
fragmentType - Either ELEMENT_FRAGMENT or CONTENT_FRAGMENT. Determines which respective method on vn is called. The former returns the whole element, including its start and end tags; the latter omits the tags of the element and returns only its enclosed contents.
returnRawString - Whether to return a raw string, i.e. the pure XML fragment without resolving XML entities, or a "readable" string, which then possibly cannot be used for further XML parsing.
- Returns:
- The XML fragment of the current element vn points to.
- Throws:
com.ximpleware.NavException
-
setElementText
public static int setElementText(com.ximpleware.VTDNav vn, com.ximpleware.AutoPilot ap, com.ximpleware.XMLModifier xm, String xpath, String text) throws com.ximpleware.VTDException, UnsupportedEncodingException
Sets the text content of an XML element pointed to by xpath to text.
The cursor of vn is moved to the element determined by xpath.
- Parameters:
vn - VTDNav object navigating the XML document to modify.
ap - AutoPilot object bound to vn.
xm - XMLModifier object bound to vn.
xpath - An XPath expression pointing to the XML element whose text should be set.
text - The text which is to be set to the XML element pointed to by xpath.
- Returns:
- The VTD index of the changed element, -1 otherwise.
- Throws:
com.ximpleware.VTDException - If something goes wrong with navigation or modification of the XML document.
UnsupportedEncodingException
-
getXpathValue
public static String getXpathValue(String xpath, com.ximpleware.AutoPilot ap) throws com.ximpleware.XPathParseException
- Throws:
com.ximpleware.XPathParseException
-
getXpathValue
public static String getXpathValue(String xpath, com.ximpleware.VTDNav vn) throws com.ximpleware.XPathParseException
- Throws:
com.ximpleware.XPathParseException
-
getXpathValue
public static String getXpathValue(String xpath, InputStream is) throws IOException, com.ximpleware.XPathParseException, com.ximpleware.ParseException
- Throws:
IOException
com.ximpleware.XPathParseException
com.ximpleware.ParseException
-
-