de.julielab.xmlData.dataBase

Class DataBaseConnector

    • Field Detail

      • META_IN_ARRAY

        @Deprecated
        public static final int META_IN_ARRAY
        Deprecated.
        Used as a hack for the then-unpublished EMNLP paper. A more sophisticated system has since been implemented (EF, 18.01.2012).
        See Also:
        Constant Field Values
      • subsetColumns

        public static final LinkedHashMap<String,String> subsetColumns
        This is the definition of subset tables except the primary key.
    • Constructor Detail

      • DataBaseConnector

        public DataBaseConnector(InputStream configStream)
        Creates a connector that establishes a connection to a database and allows for convenient queries and commands.
        Parameters:
        configStream - used to read the configuration for this connector instance
      • DataBaseConnector

        public DataBaseConnector(InputStream configStream,
                                 int queryBatchSize)
        Creates a connector that establishes a connection to a database and allows for convenient queries and commands.
        Parameters:
        configStream - used to read the configuration for this connector instance
        queryBatchSize - background threads are used to speed up queries; this parameter determines the number of pre-fetched entries
      • DataBaseConnector

        public DataBaseConnector(String dbUrl,
                                 String user,
                                 String password,
                                 String pgSchema,
                                 InputStream fieldDefinition)
        Creates a connector that establishes a connection to a database and allows for convenient queries and commands.
        Parameters:
        dbUrl - the URL of the database
        user - the user name for the database
        password - the password for the user
        fieldDefinition - InputStream containing data of a configuration file
      • DataBaseConnector

        public DataBaseConnector(String dbUrl,
                                 String user,
                                 String password,
                                 String pgSchema,
                                 int queryBatchSize,
                                 InputStream configStream)
        Creates a connector that establishes a connection to a database and allows for convenient queries and commands.
        Parameters:
        dbUrl - the URL of the database
        user - the user name for the database
        password - the password for the user
        queryBatchSize - background threads are used to speed up queries; this parameter determines the number of pre-fetched entries
        configStream - used to read the configuration for this connector instance
      • DataBaseConnector

        public DataBaseConnector(String dbUrl,
                                 String user,
                                 String password)
        Creates a connector that establishes a connection to a database and allows for convenient queries and commands.
        Parameters:
        dbUrl - the URL of the database
        user - the user name for the database
        password - the password for the user
    • Method Detail

      • setHost

        public void setHost(String host)
      • setPort

        public void setPort(String port)
      • setPort

        public void setPort(Integer port)
      • setUser

        public void setUser(String user)
      • setPassword

        public void setPassword(String password)
      • getConn

        public Connection getConn()
        Returns:
        A Connection to the database.
      • getActiveDataTable

        public String getActiveDataTable()
        Returns:
        the activeDataTable
      • getEffectiveConfiguration

        public byte[] getEffectiveConfiguration()

        Returns the effective XML configuration as a byte[].

        The effective configuration consists of the default configuration merged with the given user configuration (the merge is performed by the ConfigReader in the constructor).

        Returns:
        the effectiveConfiguration
      • getActiveDataPGSchema

        public String getActiveDataPGSchema()
      • getActivePGSchema

        public String getActivePGSchema()
      • setActivePGSchema

        public void setActivePGSchema(String pgSchema)
      • getActiveTableSchema

        public String getActiveTableSchema()
      • setActiveTableSchema

        public void setActiveTableSchema(String schemaName)
      • getActiveTableFieldConfiguration

        public FieldConfig getActiveTableFieldConfiguration()
      • retrieveAndMark

        public List<Object[]> retrieveAndMark(String subsetTableName,
                                              String readerComponent,
                                              String hostName,
                                              String pid)
                                       throws TableSchemaMismatchException

        Retrieves primary keys from a subset table whose rows are marked neither as in process nor as processed, and marks the rows of the retrieved primary keys as "in process".

        The table is locked during this transaction. Locking and marking ensure that every primary key is returned exactly once. Remember to remove the marks (e.g. with resetSubset) if you want to use the subset again.

        Parameters:
        subsetTableName - name of a table conforming to the subset standard
        hostName - will be saved in the subset table
        pid - will be saved in the subset table
        Returns:
        An ArrayList of primary keys which have not yet been processed
        Throws:
        TableSchemaMismatchException
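        The exactly-once guarantee described above can be illustrated with a small in-memory model. The following is a hypothetical sketch, not the connector's implementation: the class InMemorySubset and its methods are illustrative names, a LinkedHashMap stands in for the subset table, and the synchronized modifier plays the role of the table lock, so concurrent callers never receive the same primary key twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a subset table (not part of DataBaseConnector). */
class InMemorySubset {
    enum State { UNPROCESSED, IN_PROCESS, PROCESSED }

    // Insertion-ordered map from primary key to its processing state.
    private final Map<String, State> rows = new LinkedHashMap<>();

    InMemorySubset(List<String> primaryKeys) {
        for (String pk : primaryKeys) {
            rows.put(pk, State.UNPROCESSED);
        }
    }

    /**
     * Returns up to limit keys that are neither in process nor processed and marks them
     * IN_PROCESS. The synchronized modifier plays the role of the table lock.
     */
    synchronized List<String> retrieveAndMark(int limit) {
        List<String> batch = new ArrayList<>();
        for (Map.Entry<String, State> row : rows.entrySet()) {
            if (batch.size() >= limit) {
                break;
            }
            if (row.getValue() == State.UNPROCESSED) {
                row.setValue(State.IN_PROCESS);
                batch.add(row.getKey());
            }
        }
        return batch;
    }

    /** Clears all marks so the subset can be used again (compare resetSubset). */
    synchronized void reset() {
        rows.replaceAll((pk, state) -> State.UNPROCESSED);
    }
}
```

        With the keys 1, 2 and 3, two calls to retrieveAndMark(2) return [1, 2] and then [3]; further calls return an empty list until reset() clears the marks, which mirrors why the marks must be removed before a subset can be reused.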
      • retrieveAndMark

        public List<Object[]> retrieveAndMark(String subsetTableName,
                                              String readerComponent,
                                              String hostName,
                                              String pid,
                                              int limit,
                                              String order)
                                       throws TableSchemaMismatchException

        Retrieves primary keys from a subset table and marks them as being "in process". The table schema - and thus the form of the primary keys - is assumed to match the active table schema determined in the configuration file.

        The table is locked during this transaction. Locking and marking ensure that every primary key is returned exactly once. Remember to remove the marks (e.g. with resetSubset) if you want to use the subset again.

        Parameters:
        subsetTableName - name of a table conforming to the subset standard
        hostName - will be saved in the subset table
        pid - will be saved in the subset table
        limit - batch size for marking/retrieving
        order - determines an ordering. If this parameter is null or empty, the default order (which may change over time) is used.
        Returns:
        An ArrayList of primary keys which have not yet been processed.
        Throws:
        TableSchemaMismatchException
        See Also:
        retrieveAndMark(String, String, String, String, String, int, String)
      • retrieveAndMark

        public List<Object[]> retrieveAndMark(String subsetTableName,
                                              String schemaName,
                                              String readerComponent,
                                              String hostName,
                                              String pid,
                                              int limit,
                                              String order)
                                       throws TableSchemaMismatchException

        Retrieves at most limit primary keys from a subset table whose rows are marked neither as in process nor as processed, and marks the rows of the retrieved primary keys as "in process".

        The following parameters may be set:

        • limit - sets the maximum number of primary keys retrieved
        • order - determines whether to retrieve the primary keys in a particular order. Note that the default order of rows is undefined. If you need the same order in every run, specify an ordering as an SQL 'ORDER BY' clause. If order does not start with 'ORDER BY' (case-insensitive), the prefix is inserted.

        The table is locked during this transaction. Locking and marking ensure that every primary key is returned exactly once. Remember to remove the marks (e.g. with resetSubset) if you want to use the subset again.

        Parameters:
        subsetTableName - name of a table conforming to the subset standard
        schemaName - name of the table schema to work with (determined in the configuration file)
        hostName - will be saved in the subset table
        pid - will be saved in the subset table
        limit - batch size for marking/retrieving
        order - determines an ordering. If this parameter is null or empty, the default order (which may change over time) is used.
        Returns:
        An ArrayList of primary keys which have not yet been processed.
        Throws:
        TableSchemaMismatchException
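        The documented handling of the order parameter can be sketched as a small helper. OrderClause.normalize is a hypothetical name, and details such as trimming surrounding whitespace are assumptions rather than the connector's code.

```java
import java.util.Locale;

/** Hypothetical helper mirroring the documented handling of the order parameter. */
final class OrderClause {
    private OrderClause() {}

    /**
     * Returns an empty string for a null or blank order (default, undefined order);
     * otherwise ensures the clause starts with "ORDER BY", compared case-insensitively.
     */
    static String normalize(String order) {
        if (order == null || order.trim().isEmpty()) {
            return "";
        }
        String trimmed = order.trim();
        if (trimmed.toUpperCase(Locale.ROOT).startsWith("ORDER BY")) {
            return trimmed;
        }
        return "ORDER BY " + trimmed;
    }
}
```

        For example, normalize("pmid") yields "ORDER BY pmid", while "order by pmid DESC" is kept unchanged because its prefix already matches.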
      • countUnprocessed

        public int countUnprocessed(String subsetTableName,
                                    String schemaName)
        Counts the unprocessed rows in a subset table.
        Parameters:
        subsetTableName - name of the subset table
        schemaName - name of the table schema to work with
        Returns:
        the number of unprocessed rows
      • countRowsOfDataTable

        public int countRowsOfDataTable(String tableName,
                                        String whereCondition)
      • countRowsOfDataTable

        public int countRowsOfDataTable(String tableName,
                                        String whereCondition,
                                        String schemaName)
      • hasUnfetchedRows

        public boolean hasUnfetchedRows(String tableName)
      • hasUnfetchedRows

        public boolean hasUnfetchedRows(String tableName,
                                        String schemaName)
      • deleteFromTableSimplePK

        public <T> void deleteFromTableSimplePK(String table,
                                                List<T> ids)
        Deletes entries from a table where the primary key of this table must consist of exactly one column. For deletion from tables which contain a multi-column-primary-key see deleteFromTable(String, List).
        Parameters:
        table - name of the table
        ids - primary key values defining the entries to delete
        See Also:
        deleteFromTable(String, List)
      • markAsProcessed

        public void markAsProcessed(String table,
                                    List<Object[]> ids)
        Modifies a subset table, marking entries as processed.
        Parameters:
        table - name of the subset table
        ids - primary key arrays defining the entries to mark as processed
      • modifyTable

        public void modifyTable(String sql,
                                List<Object[]> ids)

        Executes a given SQL command (which must end with "WHERE "!) and extends the WHERE clause with the primary keys, set to the values in ids.

        Assumes that the form of the primary keys matches the definition given in the active table schema in the configuration.

        Parameters:
        sql - a valid SQL command, ending with "WHERE "
        ids - list of primary key arrays
        See Also:
        modifyTable(String, List, String)
      • modifyTable

        public void modifyTable(String sql,
                                List<Object[]> ids,
                                String schemaName)

        Executes a given SQL command (which must end with "WHERE "!) and extends the WHERE clause with the primary keys, set to the values in ids.

        Parameters:
        sql - a valid SQL command, ending with "WHERE "
        ids - list of primary key arrays
        schemaName - name of the schema which defines the primary keys
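        The WHERE-clause extension performed by modifyTable can be sketched as follows, assuming the primary key column names are taken from the table schema. PkWhereClause.extend is a hypothetical helper; it only builds a parameterized statement, whose placeholders would then be bound for each Object[] in ids with PreparedStatement.setObject and collected with addBatch().

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper: appends one "column = ?" condition per primary key column. */
final class PkWhereClause {
    private PkWhereClause() {}

    static String extend(String sql, List<String> pkColumns) {
        // The documented contract: the command must end with "WHERE ".
        if (!sql.endsWith("WHERE ")) {
            throw new IllegalArgumentException("SQL must end with \"WHERE \": " + sql);
        }
        if (pkColumns.isEmpty()) {
            throw new IllegalArgumentException("At least one primary key column is required");
        }
        StringJoiner conditions = new StringJoiner(" AND ");
        for (String column : pkColumns) {
            conditions.add(column + " = ?");
        }
        return sql + conditions;
    }
}
```

        For a composite key (pmid, version), "UPDATE s SET is_processed = TRUE WHERE " becomes "UPDATE s SET is_processed = TRUE WHERE pmid = ? AND version = ?".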
      • getReferencedTable

        public String getReferencedTable(String referencingTable)
        Returns the name of a table referenced by an SQL-foreign-key.
        Parameters:
        referencingTable - the name of the table for which the foreign keys shall be checked
        Returns:
        the name of the first referenced table or null if there is no referenced table (i.e. the passed table name denotes a data table).
        Throws:
        IllegalArgumentException - When referencingTable is null.
      • createSchema

        public void createSchema(String schemaName)
        Creates the PostgreSQL schema schemaName in the active database.
        Parameters:
        schemaName - The name of the PostgreSQL schema to create.
      • createTable

        public void createTable(String tableName,
                                String comment)
                         throws SQLException
        Creates a new table according to the field schema definition corresponding to the active schema name determined in the configuration.
        Parameters:
        tableName - the name of the new table
        Throws:
        SQLException
      • createTable

        public void createTable(String tableName,
                                String schemaName,
                                String comment)
                         throws SQLException
        Creates a new table according to the field schema definition corresponding to the name schemaName given in the configuration file.
        Parameters:
        tableName - the name of the new table
        Throws:
        SQLException
      • createTable

        public void createTable(String tableName,
                                String referenceTableName,
                                String schemaName,
                                String comment)
                         throws SQLException

        Creates a new table according to the field schema definition corresponding to the name schemaName and with foreign key references to the primary key of referenceTableName.

        The primary keys of the tables tableName and referenceTableName must be equal. The foreign key constraint is configured with ON DELETE CASCADE, which means that when rows are deleted from the referenced table, the corresponding rows are also deleted from the table created by this method call.

        Parameters:
        tableName - The name of the new table.
        referenceTableName - The table to be referenced by this table.
        schemaName - The table schema determining the structure (especially the primary key) of the new table.
        comment - A comment for the new table.
        Throws:
        SQLException
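        To make the foreign key relationship concrete, the following sketch builds the kind of PostgreSQL DDL this method describes. ReferencingTableDdl is a hypothetical helper; the column map and primary key list are assumed inputs standing in for the field schema definition, and the statement the connector actually generates may differ.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical DDL builder for a table whose primary key references another table. */
final class ReferencingTableDdl {
    private ReferencingTableDdl() {}

    static String create(String tableName, String referenceTableName,
                         Map<String, String> columns, List<String> primaryKey) {
        StringBuilder sb = new StringBuilder("CREATE TABLE ").append(tableName).append(" (");
        // Column definitions in the iteration order of the map.
        for (Map.Entry<String, String> column : columns.entrySet()) {
            sb.append(column.getKey()).append(' ').append(column.getValue()).append(", ");
        }
        String pk = String.join(", ", primaryKey);
        sb.append("PRIMARY KEY (").append(pk).append("), ");
        // Deleting a row in the referenced table also deletes the matching row here.
        sb.append("FOREIGN KEY (").append(pk).append(") REFERENCES ")
          .append(referenceTableName).append(" (").append(pk).append(") ON DELETE CASCADE)");
        return sb.toString();
    }
}
```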
      • createSubsetTable

        public void createSubsetTable(String subsetTable,
                                      String supersetTable,
                                      Integer maxNumberRefHops,
                                      String comment)
                               throws SQLException

        Does the same as createSubsetTable(String, String, Integer, String, String) with the exception that the assumed table schema is that of the active schema defined in the configuration file.

        Parameters:
        subsetTable - name of the subset table
        supersetTable - name of the referenced table
        maxNumberRefHops - the maximum number of times a foreign key reference to a data table may be followed
        comment - will be added to the table in the database, used to make tables reproducible
        Throws:
        SQLException
      • createSubsetTable

        public void createSubsetTable(String subsetTable,
                                      String supersetTable,
                                      String comment)
                               throws SQLException

        Does the same as createSubsetTable(String, String, Integer, String, String) with the exception that the assumed table schema is that of the active schema defined in the configuration file and the first referenced data table is used as data table.

        Parameters:
        subsetTable - name of the subset table
        supersetTable - name of the referenced table
        comment - will be added to the table in the database, used to make tables reproducible
        Throws:
        SQLException
      • createSubsetTable

        public void createSubsetTable(String subsetTable,
                                      String supersetTable,
                                      Integer posOfDataTable,
                                      String comment,
                                      String schemaName)
                               throws SQLException

        Creates an empty table referencing the primary key of the data table given by supersetTable or, if this is a subset table itself, the data table referenced by that table.

        To fill the empty subset table with data, use one of the init[...] methods offered by this class.

        Subset tables have a particular table schema. They define a foreign key to the primary key of the referenced data table. In addition, they have the following columns:

        Name                  Type
        is_in_process         boolean
        is_processed          boolean
        last_component        text
        log                   text
        has_errors            boolean
        pid                   character varying(10)
        host_name             character varying(100)
        processing_timestamp  timestamp without time zone

        The subset table can be used for processing, e.g. by UIMA CollectionReaders, which store information about the processing in it.

        The actual data is located in the referenced table.

        Parameters:
        subsetTable - name of the subset table
        supersetTable - name of the referenced table
        posOfDataTable - the position of the data table that should be referenced: 1 denotes the nearest data table (possibly supersetTable itself), 2 the data table referenced by the first data table on the reference path, and so on.
        comment - will be added to the table in the database, used to make tables reproducible
        schemaName - name of the table schema to work with (determined in the configuration file)
        Throws:
        SQLException
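        The subset table layout described above can be sketched as DDL generation. SubsetTableDdl is hypothetical; it assumes the bookkeeping columns are exactly those in the table above (kept in a LinkedHashMap, like subsetColumns) and that the primary key columns and their types come from the table schema. Constraints other than the primary and foreign key are omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical DDL builder for a subset table (not the connector's actual code). */
final class SubsetTableDdl {
    private SubsetTableDdl() {}

    /** Bookkeeping columns as listed above; assumed to correspond to subsetColumns. */
    static Map<String, String> bookkeepingColumns() {
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("is_in_process", "boolean");
        cols.put("is_processed", "boolean");
        cols.put("last_component", "text");
        cols.put("log", "text");
        cols.put("has_errors", "boolean");
        cols.put("pid", "character varying(10)");
        cols.put("host_name", "character varying(100)");
        cols.put("processing_timestamp", "timestamp without time zone");
        return cols;
    }

    /** Builds DDL for a subset table whose primary key references that of dataTable. */
    static String create(String subsetTable, String dataTable, Map<String, String> pkColumns) {
        StringBuilder sb = new StringBuilder("CREATE TABLE ").append(subsetTable).append(" (");
        // Primary key columns first, then the bookkeeping columns.
        for (Map.Entry<String, String> c : pkColumns.entrySet()) {
            sb.append(c.getKey()).append(' ').append(c.getValue()).append(", ");
        }
        for (Map.Entry<String, String> c : bookkeepingColumns().entrySet()) {
            sb.append(c.getKey()).append(' ').append(c.getValue()).append(", ");
        }
        String pk = String.join(", ", pkColumns.keySet());
        sb.append("PRIMARY KEY (").append(pk).append("), ")
          .append("FOREIGN KEY (").append(pk).append(") REFERENCES ")
          .append(dataTable).append(" (").append(pk).append("))");
        return sb.toString();
    }
}
```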
      • createIndex

        public void createIndex(String table,
                                String... columns)
                         throws SQLException
        Creates an index on the given columns of table. The name of the index will be <table>_idx. It is currently not possible to create a second index for the same table since the names would collide; this would require extending this method to support different index names.
        Parameters:
        table - The table for which an index should be created.
        columns - The columns the index should cover.
        Throws:
        SQLException - In case something goes wrong.
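        The naming rule, and why a second index collides, can be seen in a sketch of the statement such a method would issue. IndexDdl.create is a hypothetical helper, and it assumes an unqualified table name (in PostgreSQL, the index name itself cannot be schema-qualified).

```java
/** Hypothetical builder for the index statement described above. */
final class IndexDdl {
    private IndexDdl() {}

    /** Builds "CREATE INDEX <table>_idx ON <table> (col1, col2, ...)". */
    static String create(String table, String... columns) {
        if (columns.length == 0) {
            throw new IllegalArgumentException("At least one column is required");
        }
        // The index name depends only on the table, so a second call yields the same name.
        return "CREATE INDEX " + table + "_idx ON " + table + " (" + String.join(", ", columns) + ")";
    }
}
```

        Both create("docs", "pmid") and create("docs", "version") name the index docs_idx, so executing the second statement would fail because a relation with that name already exists.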
      • getReferencedTable

        public String getReferencedTable(String startTable,
                                         Integer posOfDataTable)
                                  throws SQLException
        Gets the - possibly indirectly - referenced table of startTable where posOfDataTable specifies the position of the desired table in the reference chain starting at startTable.
        Parameters:
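        startTable - the table at which the reference chain starts
        posOfDataTable - the position of the desired table in the reference chain starting at startTable
        Returns:
        the referenced table at position posOfDataTable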
        SQLException
      • getNextDataTable

        public String getNextDataTable(String referencingTable)
                                throws SQLException
        Follows the foreign-key specifications of the given table to the referenced table. This process is repeated until a non-subset table (a table for which isSubsetTable(String) returns false) is encountered or a table without a foreign-key is found. If referencingTable has no foreign-key itself, null is returned since the referenced table does not exist.
        Parameters:
        referencingTable - The table to get the next referenced data table for, possibly across other subsets if referencingTable denotes a subset table.
        Returns:
        The found data table or null, if referencingTable is a data table itself.
        Throws:
        SQLException - If table meta data checking fails.
      • getNextOrThisDataTable

        public String getNextOrThisDataTable(String referencingTable)
                                      throws SQLException
        Determines the first data table on the reference path referencingTable -> table1 -> table2 -> ... -> lastTable -> null referenced from referencingTable. This means that referencingTable is returned itself if it is a data table.
        Parameters:
        referencingTable - The start point table for the path for which the first data table is to be returned.
        Returns:
        The first data table on the foreign-key path beginning with referencingTable itself.
        Throws:
        SQLException - If a database operation fails.
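        The traversal described by getNextDataTable and getNextOrThisDataTable can be modeled without a database. In this hypothetical sketch, a map from each table to the table its foreign key references stands in for the database meta data, and a set of names stands in for isSubsetTable; the real methods obtain this information from the database.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of foreign-key references between tables. */
final class ReferenceChain {
    private final Map<String, String> references; // table -> referenced table; absent if no foreign key
    private final Set<String> subsetTables;       // stands in for isSubsetTable(String)

    ReferenceChain(Map<String, String> references, Set<String> subsetTables) {
        this.references = references;
        this.subsetTables = subsetTables;
    }

    /**
     * Follows foreign keys until a non-subset table or a table without a foreign key is
     * reached. Assumes the references between subset tables contain no cycle.
     */
    String nextDataTable(String referencingTable) {
        String current = references.get(referencingTable);
        if (current == null) {
            return null; // referencingTable has no foreign key
        }
        while (subsetTables.contains(current) && references.containsKey(current)) {
            current = references.get(current);
        }
        return current;
    }

    /** Returns table itself if it is a data table, otherwise the next data table. */
    String nextOrThisDataTable(String table) {
        return subsetTables.contains(table) ? nextDataTable(table) : table;
    }
}
```

        For the path subset_b -> subset_a -> docs, both methods return docs when started at subset_b; started at docs, nextDataTable returns null while nextOrThisDataTable returns docs.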
      • isSubsetTable

        public boolean isSubsetTable(String table)
                              throws SQLException

        Checks if the given table is a subset table.

        A database table is identified to be a subset table if it exhibits all the column names that subsets have. Those are defined in subsetColumns.

        Parameters:
        table - The table to check for being a subset table.
        Returns:
        True, iff table denotes a subset table, false otherwise. The latter case includes the table parameter being null.
        Throws:
        SQLException - If table meta data checking fails.
      • tableExists

        public boolean tableExists(Connection conn,
                                   String tableName)
        Tests if a table exists.
        Parameters:
        conn - the database connection to use
        tableName - name of the table to test
        Returns:
        true if the table exists, false otherwise
      • tableExists

        public boolean tableExists(String tableName)
        Tests if a table exists.
        Parameters:
        tableName - name of the table to test
        Returns:
        true if the table exists, false otherwise
      • schemaExists

        public boolean schemaExists(String schemaName)
        Tests if a schema exists.
        Parameters:
        schemaName - name of the schema to test
        Returns:
        true if the schema exists, false otherwise
      • isEmpty

        public boolean isEmpty(String tableName)
        Tests whether a table is empty.
        Parameters:
        tableName - name of the table to test
        Returns:
        true if the table has no entries, false otherwise
      • defineSubset

        public void defineSubset(String subsetTable,
                                 String supersetTable,
                                 String comment)
                          throws SQLException

        Convenience method for creating and initializing a subset in one step. See method references below for more information.

        Parameters:
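        subsetTable - name of the subset table
        supersetTable - name of the referenced table
        comment - will be added to the table in the database, used to make tables reproducible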
        Throws:
        SQLException
        See Also:
        initSubset(String, String)
      • defineMirrorSubset

        public void defineMirrorSubset(String subsetTable,
                                       String supersetTable,
                                       boolean performUpdate,
                                       String comment)
                                throws SQLException

        Convenience method for creating and initializing a subset in one step. See method references below for more information.

        Parameters:
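        subsetTable - name of the subset table
        supersetTable - name of the referenced table
        comment - will be added to the table in the database, used to make tables reproducible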
        Throws:
        SQLException
      • defineMirrorSubset

        public void defineMirrorSubset(String subsetTable,
                                       String supersetTable,
                                       boolean performUpdate,
                                       Integer maxNumberRefHops,
                                       String comment)
                                throws SQLException

        Convenience method for creating and initializing a subset in one step. See method references below for more information.

        Parameters:
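        subsetTable - name of the subset table
        supersetTable - name of the referenced table
        maxNumberRefHops - the maximum number of times a foreign key reference to a data table may be followed
        comment - will be added to the table in the database, used to make tables reproducible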
        Throws:
        SQLException
        See Also:
        createSubsetTable(String, String, Integer, String)
      • defineMirrorSubset

        public void defineMirrorSubset(String subsetTable,
                                       String supersetTable,
                                       boolean performUpdate,
                                       String comment,
                                       String schemaName)
                                throws SQLException

        Convenience method for creating and initializing a subset in one step. See method references below for more information.

        Parameters:
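        subsetTable - name of the subset table
        supersetTable - name of the referenced table
        comment - will be added to the table in the database, used to make tables reproducible
        schemaName - name of the table schema to work with (determined in the configuration file)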
        Throws:
        SQLException
      • initRandomSubset

        public void initRandomSubset(int size,
                                     String subsetTable,
                                     String superSetTable,
                                     String schemaName)

        Randomly selects size rows of the given superset table and inserts them into the subset table.

        Parameters:
        size - size of the subset to create
        subsetTable - name of subset table to insert the chosen rows into
        superSetTable - name of the table to choose from
        schemaName - name of the schema to use
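        One common way to express such a random selection in PostgreSQL is ORDER BY random() combined with LIMIT. The sketch below builds such a statement; RandomSubsetSql is a hypothetical helper, and whether initRandomSubset uses this technique is an assumption. Note that ORDER BY random() sorts the whole superset table, which can be slow for very large tables.

```java
import java.util.List;

/** Hypothetical builder for a random-sample INSERT ... SELECT (PostgreSQL syntax). */
final class RandomSubsetSql {
    private RandomSubsetSql() {}

    /** Copies the primary keys of size randomly chosen rows into the subset table. */
    static String insertRandom(int size, String subsetTable, String superSetTable, List<String> pkColumns) {
        if (size < 0) {
            throw new IllegalArgumentException("size must not be negative: " + size);
        }
        String pk = String.join(", ", pkColumns);
        return "INSERT INTO " + subsetTable + " (" + pk + ") SELECT " + pk
                + " FROM " + superSetTable + " ORDER BY random() LIMIT " + size;
    }
}
```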
      • initSubset

        public void initSubset(List<String> values,
                               String subsetTable,
                               String supersetTable,
                               String columnToTest)
        Defines a subset by populating a subset table with primary keys from another table. A WHERE clause controls which entries are copied: only those whose columnToTest holds one of the given values.
        Parameters:
        values - Desired values for the columnToTest
        subsetTable - name of the subset table
        supersetTable - name of table to reference
        columnToTest - column to check for value
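        The value-based selection can be pictured as an INSERT ... SELECT with an IN list. SubsetByValuesSql is a hypothetical helper that produces a parameterized statement (one ? per entry in values, to be bound with PreparedStatement.setString); the primary key column list is an assumed input, since the real method derives it from the table schema.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical builder for a value-filtered subset initialization. */
final class SubsetByValuesSql {
    private SubsetByValuesSql() {}

    static String insertMatching(String subsetTable, String supersetTable,
                                 List<String> pkColumns, String columnToTest, int valueCount) {
        if (valueCount < 1) {
            throw new IllegalArgumentException("At least one value is required");
        }
        String pk = String.join(", ", pkColumns);
        // One placeholder per desired value of columnToTest.
        String placeholders = String.join(", ", Collections.nCopies(valueCount, "?"));
        return "INSERT INTO " + subsetTable + " (" + pk + ") SELECT " + pk + " FROM " + supersetTable
                + " WHERE " + columnToTest + " IN (" + placeholders + ")";
    }
}
```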
      • initSubset

        public void initSubset(List<String> values,
                               String subsetTable,
                               String supersetTable,
                               String columnToTest,
                               String schemaName)
        Defines a subset by populating a subset table with primary keys from another table. A WHERE clause controls which entries are copied: only those whose columnToTest holds one of the given values.
        Parameters:
        values - Desired values for the columnToTest
        subsetTable - name of the subset table
        supersetTable - name of table to reference
        schemaName - schema to use
        columnToTest - column to check for value
      • initSubset

        public void initSubset(String subsetTable,
                               String supersetTable)
        Initializes subsetTable by inserting one row for each entry in supersetTable.
        Parameters:
        subsetTable - name of the subset table
        supersetTable - name of the table to reference
        See Also:
        initSubset(String, String, String)
      • initSubset

        public void initSubset(String subsetTable,
                               String supersetTable,
                               String schemaName)
        Defines a subset by populating a subset table with all primary keys from another table.
        Parameters:
        subsetTable - name of the subset table
        supersetTable - name of table to reference
        schemaName - name of the schema used to determine the primary keys
      • initSubsetWithWhereClause

        public void initSubsetWithWhereClause(String subsetTable,
                                              String supersetTable,
                                              String whereClause)
        Defines a subset by populating a subset table with primary keys from another table. All entries for which whereClause evaluates to true are selected.
        Parameters:
        subsetTable - name of the subset table
        supersetTable - name of table to reference
        whereClause - condition to check by a SQL WHERE clause, e.g. 'foo > 10'
        See Also:
        initSubsetWithWhereClause(String, String, String, String)
      • initSubsetWithWhereClause

        public void initSubsetWithWhereClause(String subsetTable,
                                              String supersetTable,
                                              String whereClause,
                                              String schemaName)
        Defines a subset by populating a subset table with primary keys from another table. All entries for which whereClause evaluates to true are selected.
        Parameters:
        subsetTable - name of the subset table
        supersetTable - name of table to reference
        schemaName - name of the schema used to determine the primary keys
        whereClause - condition to check by a SQL WHERE clause, e.g. 'foo > 10'
      • initMirrorSubset

        public void initMirrorSubset(String subsetTable,
                                     String supersetTable,
                                     boolean performUpdate,
                                     String schemaName)
                              throws SQLException
        Defines a mirror subset by populating a subset table with primary keys from another table.
        Its name is saved into a special meta data table to enable automatic syncing (changes to the superset are propagated to the mirror subset).
        Parameters:
        subsetTable - name of the subset table
        supersetTable - name of table to reference
        schemaName - name of the schema used to determine the primary keys
        Throws:
        SQLException
      • resetSubset

        public void resetSubset(String subsetTableName)
        Sets the values in the is_processed, is_in_process, has_errors and log columns of a subset to FALSE.
        Parameters:
        subsetTableName - name of the subset to reset
      • resetSubset

        public void resetSubset(String subsetTableName,
                                boolean whereNotProcessed,
                                boolean whereNoErrors,
                                String lastComponent)
        Sets the values in the is_processed, is_in_process, has_errors and log columns of a subset to FALSE where the corresponding rows are is_in_process or is_processed.

        The boolean parameter whereNotProcessed serves the use case where only those rows should be reset that are in process but not processed, which may happen when a pipeline crashes, a document has errors, or a pipeline is simply canceled.

        In a similar fashion, whereNoErrors resets those rows that have no errors.

        Both boolean parameters may be combined in which case only non-processed rows without errors will be reset.
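        How the two flags narrow the reset can be shown by building the corresponding UPDATE statement. ResetSubsetSql is a hypothetical helper written from the description above; it leaves out the log column and the lastComponent parameter, whose exact handling is not documented here, so the statement issued by resetSubset may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder illustrating how whereNotProcessed and whereNoErrors combine. */
final class ResetSubsetSql {
    private ResetSubsetSql() {}

    static String update(String subsetTable, boolean whereNotProcessed, boolean whereNoErrors) {
        List<String> conditions = new ArrayList<>();
        // Base case: only rows that carry a processing mark are reset.
        conditions.add("(is_in_process = TRUE OR is_processed = TRUE)");
        if (whereNotProcessed) {
            // Rows left in process, e.g. after a crashed or canceled pipeline.
            conditions.add("is_in_process = TRUE AND is_processed = FALSE");
        }
        if (whereNoErrors) {
            conditions.add("has_errors = FALSE");
        }
        return "UPDATE " + subsetTable
                + " SET is_in_process = FALSE, is_processed = FALSE, has_errors = FALSE"
                + " WHERE " + String.join(" AND ", conditions);
    }
}
```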

        Parameters:
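        subsetTableName - name of the subset table to reset
        whereNotProcessed - if true, only rows that are in process but not processed are reset
        whereNoErrors - if true, only rows without errors are reset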
      • resetSubset

        public int[] resetSubset(String subsetTableName,
                                 List<Object[]> pkValues)
        Parameters:
        subsetTableName - name of the table to reset
        pkValues - list of primary keys
        Returns:
      • resetSubset

        public int[] resetSubset(String subsetTableName,
                                 List<Object[]> pkValues,
                                 String schemaName)
        Sets the values in the is_processed and is_in_process columns of a subset to FALSE. Only resets the subset table rows whose primary key equals one of the entries in pkValues.
        Parameters:
        subsetTableName - name of the table to reset
        pkValues - list of primary keys
        schemaName - name of the schema used to determine the primary keys
        Returns:
      • determineExistingSubsetRows

        public int[] determineExistingSubsetRows(String subsetTableName,
                                                 List<Object[]> pkValues,
                                                 String schemaName)
      • importFromXML

        public void importFromXML(Iterable<byte[]> xmls,
                                  String tableName,
                                  String identifier,
                                  String schemaName)
        Imports XMLs into a table.
        Parameters:
        xmls - an Iterable over XML documents, each given as a byte[]
        tableName - name of the table to import into
        identifier - used for error messages
      • importFromXMLFile

        public void importFromXMLFile(String fileStr,
                                      String tableName)
        Imports new MEDLINE XML documents into an existing table from an XML file or a directory of XML files. The XML must be in MEDLINE XML format and may additionally be (G)Zipped.
        Parameters:
        fileStr - path to a MEDLINE XML file or a directory of such files, optionally (G)Zipped
        tableName - name of the target table
        See Also:
        importFromXMLFile(String, String, String)
      • importFromXMLFile

        public void importFromXMLFile(String fileStr,
                                      String tableName,
                                      String schemaName)
        Imports new MEDLINE XML documents into an existing table from an XML file or a directory of XML files. The XML must be in MEDLINE XML format and may additionally be (G)Zipped.
        Parameters:
        fileStr - path to a MEDLINE XML file or a directory of such files, optionally (G)Zipped
        tableName - name of the target table
        schemaName - the table schema to use for the import
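        Since input files may or may not be (G)Zipped, a reader has to decide per file whether to decompress. A minimal sketch using only the JDK, assuming detection by the GZIP magic bytes (0x1f 0x8b) rather than by file extension and a non-recursive directory listing; how importFromXMLFile actually detects compressed input is not documented here, and the class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class XmlInputSketch {

    /** Opens a file and transparently gunzips it if it starts with the GZIP magic bytes 0x1f 0x8b. */
    public static InputStream openPossiblyGzipped(Path file) throws IOException {
        BufferedInputStream in = new BufferedInputStream(Files.newInputStream(file));
        in.mark(2);
        int b1 = in.read();
        int b2 = in.read();
        in.reset(); // rewind so the consumer sees the complete stream
        if (b1 == 0x1f && b2 == 0x8b) {
            return new GZIPInputStream(in);
        }
        return in;
    }

    /** Returns the file itself, or all regular files directly inside it (sorted) if it is a directory. */
    public static List<Path> collectInputFiles(Path fileOrDir) throws IOException {
        if (!Files.isDirectory(fileOrDir)) {
            return List.of(fileOrDir);
        }
        try (Stream<Path> entries = Files.list(fileOrDir)) {
            return entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("medline");
        Path gz = dir.resolve("sample.xml.gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            out.write("<MedlineCitationSet/>".getBytes(StandardCharsets.UTF_8));
        }
        for (Path p : collectInputFiles(dir)) {
            try (InputStream in = openPossiblyGzipped(p)) {
                System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
    }
}
```

        Checking the magic bytes instead of the file name means that a gzipped file without a .gz suffix is still decompressed correctly.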
      • updateFromXML

        public void updateFromXML(String fileStr,
                                  String tableName,
                                  String schemaName)
        Updates an existing database table. Entries in the file that are not yet in the table are inserted; all other entries are updated to the version in the file.
        Parameters:
        fileStr - file containing new or updated entries
        tableName - table to update
      • importFromRowIterator

        public void importFromRowIterator(Iterator<Map<String,Object>> it,
                                          String tableName,
                                          Connection externalConn,
                                          boolean commit,
                                          String schemaName)
        Internal method to import into an existing table.
        Parameters:
        it - an Iterator yielding rows to insert into the database
        tableName - the target table
        externalConn - if not null, this connection is used instead of obtaining a new one
        commit - if true, the inserted data is committed in batches within this method; otherwise, no commits happen.
        schemaName - the name of the table schema corresponding to the data table
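        The commit parameter distinguishes between committing batches inside the method and leaving the transaction to the owner of externalConn. A hypothetical sketch of that control flow, using an in-memory stand-in instead of a JDBC connection; the batch size, the RecordingSink class and the column names pmid and xml are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchImportSketch {

    /** Hypothetical stand-in for a JDBC connection: records what would be committed. */
    public static class RecordingSink {
        public final List<Map<String, Object>> pending = new ArrayList<>();
        public final List<List<Map<String, Object>>> committedBatches = new ArrayList<>();

        void add(Map<String, Object> row) {
            pending.add(row);
        }

        void commit() {
            committedBatches.add(new ArrayList<>(pending));
            pending.clear();
        }
    }

    /** Adds all rows; if commit is true, commits after every batchSize rows and once at the end. */
    public static void importRows(Iterator<Map<String, Object>> it, RecordingSink sink, int batchSize, boolean commit) {
        int inBatch = 0;
        while (it.hasNext()) {
            sink.add(it.next());
            inBatch++;
            if (commit && inBatch == batchSize) {
                sink.commit();
                inBatch = 0;
            }
        }
        if (commit && inBatch > 0) {
            sink.commit(); // flush the last, partially filled batch
        }
        // If commit is false, the rows stay pending for the caller to commit.
    }

    /** Builds one row keyed by column name, as the Iterator<Map<String,Object>> parameter expects. */
    public static Map<String, Object> row(String pmid, String xml) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("pmid", pmid);
        r.put("xml", xml);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(row("1", "<a/>"), row("2", "<b/>"), row("3", "<c/>"));
        RecordingSink sink = new RecordingSink();
        importRows(rows.iterator(), sink, 2, true);
        System.out.println(sink.committedBatches.size()); // 2 commits: rows 1-2, then row 3
    }
}
```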
      • updateFromRowIterator

        public void updateFromRowIterator(Iterator<Map<String,Object>> it,
                                          String tableName)

        Updates a table with the entries yielded by the iterator. If an entry is not yet in the table, it is inserted instead.

        The input rows are expected to fit the active table schema.

        Parameters:
        it - an Iterator yielding new or updated entries
        tableName - the updated table
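        The documentation does not state how the insert-or-update is carried out. In PostgreSQL 9.5 and later, the same semantics can be expressed with INSERT ... ON CONFLICT ... DO UPDATE, provided the conflict columns carry a unique or primary key constraint. The sketch below only builds such a statement; it is not necessarily what updateFromRowIterator does, and the table, column, class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class UpsertSqlSketch {

    /** Builds a PostgreSQL upsert that inserts new rows and overwrites non-key columns of existing ones. */
    public static String buildUpsertSql(String table, List<String> pkColumns, List<String> otherColumns) {
        List<String> all = new ArrayList<>(pkColumns);
        all.addAll(otherColumns);
        String placeholders = all.stream().map(c -> "?").collect(Collectors.joining(", "));
        String sql = "INSERT INTO " + table + " (" + String.join(", ", all) + ") VALUES (" + placeholders + ")"
                + " ON CONFLICT (" + String.join(", ", pkColumns) + ")";
        if (otherColumns.isEmpty()) {
            // Nothing to update besides the key itself.
            return sql + " DO NOTHING";
        }
        // EXCLUDED refers to the row that was proposed for insertion.
        String updates = otherColumns.stream()
                .map(c -> c + " = EXCLUDED." + c)
                .collect(Collectors.joining(", "));
        return sql + " DO UPDATE SET " + updates;
    }

    public static void main(String[] args) {
        System.out.println(buildUpsertSql("my_data_table", List.of("pmid"), List.of("xml")));
    }
}
```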
      • updateFromRowIterator

        public void updateFromRowIterator(Iterator<Map<String,Object>> it,
                                          String tableName,
                                          Connection externalConn,
                                          boolean commit,
                                          String schemaName)

        Updates a table with the entries yielded by the iterator. If an entry is not yet in the table, it is inserted instead.

        The input rows are expected to fit the table schema schemaName.

        Parameters:
        it - an Iterator yielding new or updated entries
        tableName - the updated table
        externalConn - if not null, this connection is used instead of obtaining a new one
        commit - if true, the updated data is committed in batches within this method; otherwise, no commits happen.
        schemaName - the name of the table schema corresponding to the updated data table
      • queryWithTime

        public DBCIterator<byte[][]> queryWithTime(List<Object[]> ids,
                                                   String table,
                                                   String timestamp,
                                                   String schemaName)
        Returns an iterator over all rows in the table with matching id and a timestamp newer (>) than timestamp. The Iterator will use threads, memory and a connection until all matches are returned.
        Parameters:
        ids - list with primary keys
        table - table to query
        timestamp - only rows with a newer timestamp are returned
        Returns:
        pmid and xml as an Iterator
      • queryAll

        public DBCIterator<Object[]> queryAll(List<String> fields,
                                              String table)
        Returns an iterator over the columns fields in the table table. NOTE: The iterator will use threads, memory and a connection until it is exhausted, i.e. until hasNext() returns false!
        Parameters:
        fields - fields to return
        table - table to query
        Returns:
        results as an Iterator
      • query

        public DBCIterator<Object[]> query(String table,
                                           List<String> fields)
        Returns the requested fields from the requested table. The iterator must be fully consumed; otherwise, dangling threads and connections will remain, possibly causing the application to wait forever for an open connection.
        Parameters:
        table - The table to query.
        fields - The names of the columns to retrieve values from.
        Returns:
        An iterator over the values of the requested columns.
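        Why the iterator must be fully consumed can be illustrated with a stand-in. In this hypothetical sketch, ResourceHoldingIterator (not part of this API) releases its simulated connection only once hasNext() returns false, mirroring the behavior described above; takeThenDrain shows one way to keep only the first rows while still exhausting the iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainIteratorSketch {

    /** Simulated count of open connections held by iterators. */
    public static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

    /** Hypothetical iterator that holds a "connection" until it is exhausted. */
    public static class ResourceHoldingIterator implements Iterator<Object[]> {
        private final Iterator<Object[]> delegate;
        private boolean released = false;

        public ResourceHoldingIterator(List<Object[]> rows) {
            this.delegate = rows.iterator();
            OPEN_CONNECTIONS.incrementAndGet();
        }

        @Override
        public boolean hasNext() {
            boolean more = delegate.hasNext();
            if (!more && !released) {
                released = true;
                OPEN_CONNECTIONS.decrementAndGet(); // freed only once exhausted
            }
            return more;
        }

        @Override
        public Object[] next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return delegate.next();
        }
    }

    /** Keeps at most n rows but iterates to the end so that the iterator releases its resources. */
    public static List<Object[]> takeThenDrain(Iterator<Object[]> it, int n) {
        List<Object[]> kept = new ArrayList<>();
        while (kept.size() < n && it.hasNext()) {
            kept.add(it.next());
        }
        while (it.hasNext()) {
            it.next(); // discard, but keep going until hasNext() returns false
        }
        return kept;
    }

    public static void main(String[] args) {
        Iterator<Object[]> it = new ResourceHoldingIterator(
                List.of(new Object[] {"1"}, new Object[] {"2"}, new Object[] {"3"}));
        List<Object[]> first = takeThenDrain(it, 1);
        System.out.println(first.size() + " row kept, open connections: " + OPEN_CONNECTIONS.get());
    }
}
```

        When only a few rows are needed, the query(String, List, long) overload with a limit avoids reading and discarding the rest.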
      • query

        public DBCIterator<Object[]> query(String table,
                                           List<String> fields,
                                           long limit)
        Returns the requested fields from the requested table. The iterator must be fully consumed; otherwise, dangling threads and connections will remain, possibly causing the application to wait forever for an open connection.
        Parameters:
        table - The table to query.
        fields - The names of the columns to retrieve values from.
        limit - A limit of documents to retrieve.
        Returns:
        An iterator over the values of the requested columns.
      • query

        public DBCIterator<Object[]> query(List<String[]> keys,
                                           String table,
                                           String schemaName)
        Returns the values of the column DEFAULT_FIELD in the given table. The iterator will use threads, memory and a connection until all matches have been returned.
        Parameters:
        keys - list of String[] containing the parts of the primary key
        table - table to query
        Returns:
        results as an Iterator
      • retrieveColumnsByTableSchema

        public DBCIterator<byte[][]> retrieveColumnsByTableSchema(List<Object[]> ids,
                                                                  String table,
                                                                  String schemaName)
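        Retrieves row values of table from the database. The returned columns are those that are configured to be retrieved in the table schema with name schemaName.
        Parameters:
        ids - a list of primary keys identifying the rows to retrieve
        table - the table to retrieve the rows from
        schemaName - the name of the table schema that determines the retrieved columns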
      • retrieveColumnsByTableSchema

        public DBCIterator<byte[][]> retrieveColumnsByTableSchema(List<Object[]> ids,
                                                                  String[] tables,
                                                                  String[] schemaNames)
        Retrieves data from the database over multiple tables. All tables will be joined on the given IDs. The columns to be retrieved for each table are determined by its table schema. For this purpose, the tables and schemaNames arrays are required to be parallel.
        Parameters:
        ids - A list of primary keys identifying the items to retrieve.
        tables - The tables from which the items should be retrieved that are identified by ids.
        schemaNames - A parallel array to tables that specifies the table schema name of each table.
        Returns:
        The joined data from the requested tables.
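        The parallel-array contract can be pictured as a join over the primary key. The following hypothetical sketch assumes a single-column primary key named pmid shared by all tables and takes the to-be-retrieved columns of each table schema as plain string arrays; the SQL that retrieveColumnsByTableSchema generates is not documented here and may differ, and the table names are invented.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinQuerySketch {

    /**
     * Builds a SELECT joining all tables on one shared primary key column.
     * retrieveColumns[i] holds the columns marked for retrieval in the schema of tables[i].
     */
    public static String buildJoin(String[] tables, String[][] retrieveColumns, String pkColumn) {
        if (tables.length == 0 || tables.length != retrieveColumns.length) {
            throw new IllegalArgumentException("tables and schema column lists must be non-empty parallel arrays");
        }
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < tables.length; i++) {
            for (String column : retrieveColumns[i]) {
                selected.add(tables[i] + "." + column);
            }
        }
        StringBuilder sql = new StringBuilder("SELECT ").append(String.join(", ", selected))
                .append(" FROM ").append(tables[0]);
        for (int i = 1; i < tables.length; i++) {
            // Every further table is joined to the first one on the primary key.
            sql.append(" JOIN ").append(tables[i])
               .append(" ON ").append(tables[0]).append('.').append(pkColumn)
               .append(" = ").append(tables[i]).append('.').append(pkColumn);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        String[] tables = {"documents", "annotations"};
        String[][] columns = {{"pmid", "xml"}, {"xmi"}};
        System.out.println(buildJoin(tables, columns, "pmid"));
    }
}
```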
      • queryDataTable

        public DBCIterator<byte[][]> queryDataTable(String tableName,
                                                    String whereCondition)

        Returns the data of all columns of the data table tableName that are marked as 'to be retrieved' in the active table schema.

        For more specific information, please refer to queryDataTable(String, String, String).

        Parameters:
        tableName - Name of a data table.
        whereCondition - Optional additional specifications for the SQL "SELECT" statement.
        See Also:
        queryDataTable(String, String, String)
      • queryDataTable

        public DBCIterator<byte[][]> queryDataTable(String tableName,
                                                    String whereCondition,
                                                    String schemaName)

        Returns the data of all columns of the data table tableName that are marked as 'to be retrieved' in the table schema specified by schemaName.

        This method offers direct access to the table data by using an SQL ResultSet in cursor mode, allowing for queries leading to large results.

        An optional where clause (actually everything after the "FROM" in the SQL select statement) may be passed to restrict the rows being returned. All specifications that do not alter the number of columns returned are allowed (like "GROUP BY").

        Parameters:
        tableName - Name of a data table.
        whereCondition - Optional additional specifications for the SQL "SELECT" statement.
        schemaName - The table schema name to determine which columns should be retrieved.
        Returns:
        An iterator over byte[][]. Each returned byte array contains one nested byte array for each retrieved column, holding the column's data in a sequence of bytes.
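        Each row returned by the iterator is a byte[][] with one entry per retrieved column. A minimal sketch for turning such a row into strings, assuming all retrieved columns hold UTF-8 text and that a null entry stands for SQL NULL; binary or gzipped columns would need different handling. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RowDecodingSketch {

    /** Converts one row (one byte[] per retrieved column) into strings, assuming UTF-8 text columns. */
    public static List<String> decodeRow(byte[][] row) {
        List<String> values = new ArrayList<>(row.length);
        for (byte[] column : row) {
            // A null entry is assumed to represent SQL NULL.
            values.add(column == null ? null : new String(column, StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        byte[][] row = {
            "12345".getBytes(StandardCharsets.UTF_8),
            "<MedlineCitation/>".getBytes(StandardCharsets.UTF_8)
        };
        System.out.println(decodeRow(row)); // [12345, <MedlineCitation/>]
    }
}
```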
      • getQueryBatchSize

        public int getQueryBatchSize()
      • setQueryBatchSize

        public void setQueryBatchSize(int queryBatchSize)
      • querySubset

        public DBCIterator<byte[][]> querySubset(String tableName,
                                                 String whereClause,
                                                 long limitParam,
                                                 Integer numberRefHops,
                                                 String schemaName)
                                          throws SQLException

        Retrieves XML field values in the data table referenced by the subset table tableName or tableName itself if it is a data table.

        The method always first retrieves a batch of primary keys from the subset table and then gets the actual documents from the data table (as required by the data table/subset paradigm). As this is unnecessary when querying a data table directly, this method delegates such queries to queryDataTable(String, String, String).

        The number of returned documents is restricted by limitParam. All documents are returned if limitParam is negative.
        Note: Of course, whereClause could already contain an SQL 'LIMIT' specification. However, this won't work as expected since the limit expression would be applied to each batch of subset IDs that is used to query the data table. Using the limitParam parameter ensures that the iterator returns at most as many documents as specified. If tableName denotes a data table and whereClause does not already contain a 'LIMIT' expression, limitParam will be added to whereClause for the subsequent call to queryDataTable.

        Parameters:
        tableName - Subset table determining which documents to retrieve from the data table; may also be a data table itself.
        whereClause - An SQL where clause restricting the returned columns of each queried subset-ID batch. This clause must not change the rows returned (e.g. by 'GROUP BY').
        limitParam - Number restriction of documents to return.
        numberRefHops -
        schemaName - The name of table schema of the referenced data table.
        Returns:
        An iterator returning the documents referenced from or contained in the table tableName.
        Throws:
        SQLException
        See Also:
        queryDataTable(String, String, String)
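        The limit handling described above can be sketched as two separate concerns: appending a LIMIT to a where clause that lacks one, and enforcing an overall limit while iterating over batches of subset IDs, which a per-batch SQL LIMIT cannot achieve. Both methods are hypothetical, and the regular-expression check for an existing LIMIT is a deliberately crude assumption (it would also match the word "limit" inside a string literal).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LimitClauseSketch {

    private static final Pattern LIMIT = Pattern.compile("\\blimit\\b", Pattern.CASE_INSENSITIVE);

    /** Appends "LIMIT n" if limitParam is non-negative and the clause has no LIMIT yet. */
    public static String withLimit(String whereClause, long limitParam) {
        String clause = whereClause == null ? "" : whereClause.trim();
        if (limitParam < 0 || LIMIT.matcher(clause).find()) {
            return clause; // negative limit means "return everything"
        }
        return clause.isEmpty() ? "LIMIT " + limitParam : clause + " LIMIT " + limitParam;
    }

    /** Enforces one overall limit across all ID batches; a negative limit returns everything. */
    public static List<String> takeAcrossBatches(List<List<String>> batches, long limitParam) {
        List<String> out = new ArrayList<>();
        for (List<String> batch : batches) {
            for (String doc : batch) {
                if (limitParam >= 0 && out.size() >= limitParam) {
                    return out;
                }
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withLimit("WHERE pmid > '100'", 10)); // WHERE pmid > '100' LIMIT 10
        System.out.println(takeAcrossBatches(List.of(List.of("a", "b"), List.of("c", "d")), 3)); // [a, b, c]
    }
}
```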
      • getNumColumnsAndFields

        public org.apache.commons.lang3.tuple.Pair<Integer,List<Map<String,String>>> getNumColumnsAndFields(boolean joined,
                                                                                                            String[] schemaNames)
        Helper method to determine the columns that are returned in case of a join operation. Returns the number of returned fields and the corresponding field definitions. If joined is set to false, only the first table and the first schema are taken into account.
        Parameters:
        joined - Whether the data is joined.
        schemaNames - The names of the table schemas of the tables that are read. The columns marked to be retrieved are extracted from the respective table schemas.
        Returns:
        A pair holding the number of retrieved columns and those columns themselves.
      • getNumRows

        public long getNumRows(String tableName)
        Returns the row count of the requested table.
        Parameters:
        tableName - The table to count the rows of.
        Returns:
        The table row count.
      • getTables

        public ArrayList<String> getTables()
        Returns:
        all tables in the active schema
      • getTableDefinition

        public List<String> getTableDefinition(String tableName)
        Queries the database metadata for the columns of a table.
        Parameters:
        tableName - the table
        Returns:
        list of Strings containing the name and type of each column
      • getScheme

        public String getScheme()
        Returns:
        the active Postgres schema
      • getFieldConfiguration

        public FieldConfig getFieldConfiguration()
        Returns:
        the active field configuration
      • addFieldConfiguration

        public void addFieldConfiguration(FieldConfig config)
      • getFieldConfiguration

        public FieldConfig getFieldConfiguration(String schemaName)
        Parameters:
        schemaName - The name of the table schema whose FieldConfig should be returned.
        Returns:
        The field configuration for schemaName.
      • checkTableDefinition

        public void checkTableDefinition(String tableName,
                                         String schemaName)
                                  throws TableSchemaMismatchException
        Compares the actual table in the database with its definition in the XML configuration.
        Note: This method currently does not check columns other than the primary key columns for tables that reference another table, even if those should actually be data tables.
        Parameters:
        tableName - table to check
        Throws:
        TableSchemaMismatchException
      • setProcessed

        public void setProcessed(String subsetTableName,
                                 ArrayList<byte[][]> primaryKeyList)

        Sets the values of is_processed to TRUE and of is_in_process to FALSE for a collection of documents according to the given primary keys.

        Parameters:
        subsetTableName - name of the subset
        primaryKeyList - the list of primary keys which itself can consist of several primary key elements
      • setException

        public void setException(String subsetTableName,
                                 ArrayList<byte[][]> primaryKeyList,
                                 HashMap<byte[][],String> logException)

        Sets the value of has_errors to TRUE and adds a description to the log column for exceptions that occurred during the processing of a collection of documents, according to the given primary keys.

        Parameters:
        subsetTableName - name of the subset
        primaryKeyList - the list of primary keys which itself can consist of several primary key elements
        logException - maps the primary keys of unsuccessfully processed documents to the exceptions that occurred during their processing
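        Note that primaryKeyList and logException use byte[][] values as keys. Java arrays inherit equals and hashCode from Object, so a HashMap lookup only succeeds with the very same array instance that was used as the key; a copy with equal content is not found. The following sketch, with a hypothetical buildLog helper, demonstrates this and suggests reusing the key instances from primaryKeyList when filling logException. Whether setException relies on identity in this way is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;

public class ArrayKeySketch {

    /** Maps every key in primaryKeyList to the same log message, reusing the key instances. */
    public static HashMap<byte[][], String> buildLog(ArrayList<byte[][]> primaryKeyList, String message) {
        HashMap<byte[][], String> logException = new HashMap<>();
        for (byte[][] key : primaryKeyList) {
            logException.put(key, message);
        }
        return logException;
    }

    public static void main(String[] args) {
        byte[][] key = { "12345".getBytes(StandardCharsets.UTF_8) };
        ArrayList<byte[][]> primaryKeyList = new ArrayList<>();
        primaryKeyList.add(key);
        HashMap<byte[][], String> logException = buildLog(primaryKeyList, "parse error");

        // Arrays use identity-based equals/hashCode inherited from Object.
        byte[][] equalCopy = { "12345".getBytes(StandardCharsets.UTF_8) };
        System.out.println(logException.containsKey(key));       // true
        System.out.println(logException.containsKey(equalCopy)); // false
    }
}
```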
      • getPrimaryKeyIndices

        public List<Integer> getPrimaryKeyIndices()
        Returns the indices of the primary keys, beginning with 0.
      • getDbURL

        public String getDbURL()
      • setDbURL

        public void setDbURL(String uri)
      • close

        public void close()
      • isDatabaseReachable

        public boolean isDatabaseReachable()
      • addXmiDocumentFieldConfiguration

        public FieldConfig addXmiDocumentFieldConfiguration(List<Map<String,String>> primaryKey,
                                                            boolean doGzip)
        Adds an auto-generated field configuration that exhibits the given primary key and all the fields required to store complete XMI document data (i.e. not segmented XMI parts but the whole serialized CAS) in a database table. The field configuration will have the given primary key and an additional field named 'xmi'. This method is used by the Jena Document Information System (JeDIS) components jcore-xmi-db-reader and jcore-xmi-db-consumer.
        Parameters:
        primaryKey - The document primary key for which a document CAS XMI table schema should be created.
        doGzip - Whether the XMI data should be gzipped in the table.
        Returns:
        The created field configuration.
      • addXmiTextFieldConfiguration

        public FieldConfig addXmiTextFieldConfiguration(List<Map<String,String>> primaryKey,
                                                        boolean doGzip)
        Adds an auto-generated field configuration that exhibits the given primary key and all the fields required to store XMI base document data (i.e. the document text but not its annotations) in a database table. The additional fields are
        1. xmi
        2. max_xmi_id
        3. sofa_mapping
        and are required for the storage of XMI annotation graph segments stored in other tables. The schema created with this method is to be used for the base documents that include the document text. To get a schema with a specific primary key that stores annotation data, see addXmiAnnotationFieldConfiguration(List, boolean). This method is used by the Jena Document Information System (JeDIS) components jcore-xmi-db-reader and jcore-xmi-db-consumer.
        Parameters:
        primaryKey - The document primary key for which a base document XMI segmentation table schema should be created.
        doGzip - Whether the XMI data should be gzipped in the table.
        Returns:
        The created field configuration.
      • addXmiAnnotationFieldConfiguration

        public FieldConfig addXmiAnnotationFieldConfiguration(List<Map<String,String>> primaryKey,
                                                              boolean doGzip)
        Adds an auto-generated field configuration that exhibits the given primary key and all the fields required to store XMI annotation data (not base documents) in database tables. The only field besides the primary key is xmi and will store the actual XMI annotation data. This table schema is used for the storage of XMI annotation graph segments. Those segments will then correspond to UIMA annotation types that are stored in tables of their own. A table schema to store the base document is created by addXmiTextFieldConfiguration(List, boolean). This method is used by the Jena Document Information System (JeDIS) components jcore-xmi-db-reader and jcore-xmi-db-consumer.
        Parameters:
        primaryKey - The document primary key for which an XMI annotation table schema should be created.
        doGzip - Whether the XMI data should be gzipped in the table.
        Returns:
        The created field configuration.

Copyright © 2018 JULIE Lab, Germany. All rights reserved.