public class DataBaseConnector extends Object
Visit
http://commons.apache.org/dbcp/apidocs/org/apache/commons/dbcp/package-summary.html#package_description
for more information about the connection pooling.
| Modifier and Type | Class and Description |
|---|---|
| static class | DataBaseConnector.StatusElement |
| Modifier and Type | Field and Description |
|---|---|
| static String | DEFAULT_PIPELINE_STATE |
| static int | META_IN_ARRAY Deprecated. |
| static LinkedHashMap<String,String> | subsetColumns This is the definition of subset tables except the primary key. |
| Constructor and Description |
|---|
DataBaseConnector(InputStream configStream)
This class creates a connection with a database and allows for convenient
queries and commands.
|
DataBaseConnector(InputStream configStream,
int queryBatchSize)
This class creates a connection with a database and allows for convenient
queries and commands.
|
DataBaseConnector(String configPath)
|
DataBaseConnector(String dbUrl,
String user,
String password)
This class creates a connection with a database and allows for convenient
queries and commands.
|
DataBaseConnector(String dbUrl,
String user,
String password,
String pgSchema,
InputStream fieldDefinition)
This class creates a connection with a database and allows for convenient
queries and commands.
|
DataBaseConnector(String dbUrl,
String user,
String password,
String pgSchema,
int queryBatchSize,
InputStream configStream)
This class creates a connection with a database and allows for convenient
queries and commands.
|
DataBaseConnector(String serverName,
String dbName,
String user,
String password,
String pgSchema,
InputStream fieldDefinition) |
DataBaseConnector(String serverName,
String dbName,
String user,
String password,
String pgSchema,
int queryBatchSize,
InputStream configStream) |
| Modifier and Type | Method and Description |
|---|---|
void |
addFieldConfiguration(FieldConfig config) |
FieldConfig |
addPKAdaptedFieldConfiguration(List<Map<String,String>> primaryKey,
String fieldConfigurationForAdaption,
String fieldConfigurationNameSuffix) |
FieldConfig |
addXmiAnnotationFieldConfiguration(List<Map<String,String>> primaryKey,
boolean doGzip)
Adds an auto-generated field configuration that exhibits the given primary key and all the fields required to
store XMI annotation data (not base documents) in database tables.
|
FieldConfig |
addXmiDocumentFieldConfiguration(List<Map<String,String>> primaryKey,
boolean doGzip)
Adds an auto-generated field configuration that exhibits the given primary key and all the fields required to
store complete XMI document data (i.e.
|
FieldConfig |
addXmiTextFieldConfiguration(List<Map<String,String>> primaryKey,
boolean doGzip)
Adds an auto-generated field configuration that exhibits the given primary key and all the fields required to
store XMI base document data (i.e.
|
void |
checkTableDefinition(String tableName)
Checks whether the given table matches the active table schema.
|
void |
checkTableDefinition(String tableName,
String schemaName)
Compares the actual table in the database with its definition in the XML
configuration.
Note: This method currently does not check columns other than primary key columns for
tables that reference another table, even if those should actually be data
tables.
|
void |
checkTableSchemaCompatibility(String... schemaNames) |
void |
checkTableSchemaCompatibility(String referenceSchema,
String[] schemaNames) |
void |
close() |
int |
countRowsOfDataTable(String tableName,
String whereCondition) |
int |
countRowsOfDataTable(String tableName,
String whereCondition,
String schemaName) |
int |
countUnprocessed(String subsetTableName) |
int |
countUnprocessed(String subsetTableName,
String schemaName)
Counts the unprocessed rows in a subset table
|
void |
createIndex(String table,
String... columns)
Creates an index for table table on the given columns.
|
void |
createSchema(String schemaName)
Creates the PostgreSQL schema
schemaName in the active database. |
void |
createSubsetTable(String subsetTable,
String supersetTable,
Integer maxNumberRefHops,
String comment)
Does the same as
createSubsetTable(String, String, Integer, String, String)
with the exception that the assumed table schema is that of the active schema
defined in the configuration file. |
void |
createSubsetTable(String subsetTable,
String supersetTable,
Integer posOfDataTable,
String comment,
String schemaName)
Creates an empty table referencing the primary key of the data table given by
superSetTable or, if this is a subset table itself, the data
table referenced by that table. |
void |
createSubsetTable(String subsetTable,
String supersetTable,
String comment)
Does the same as
createSubsetTable(String, String, Integer, String, String)
with the exception that the assumed table schema is that of the active schema
defined in the configuration file and the first referenced data table is used as data table. |
void |
createTable(String tableName,
String comment)
Creates a new table according to the field schema definition corresponding to
the active schema name determined in the configuration.
|
void |
createTable(String tableName,
String schemaName,
String comment)
Creates a new table according to the field schema definition corresponding to
the name
schemaName given in the configuration file. |
void |
createTable(String tableName,
String referenceTableName,
String schemaName,
String comment)
Creates a new table according to the field schema definition corresponding to
the name
schemaName and with foreign key references to the
primary key of referenceTableName. |
void |
defineMirrorSubset(String subsetTable,
String supersetTable,
boolean performUpdate,
Integer maxNumberRefHops,
String comment)
Convenience method for creating and initializing a subset in one step.
|
void |
defineMirrorSubset(String subsetTable,
String supersetTable,
boolean performUpdate,
String comment)
Convenience method for creating and initializing a subset in one step.
|
void |
defineMirrorSubset(String subsetTable,
String supersetTable,
boolean performUpdate,
String comment,
String schemaName)
Convenience method for creating and initializing a subset in one step.
|
void |
defineRandomSubset(int size,
String subsetTable,
String supersetTable,
String comment)
Convenience method for creating and initializing a subset in one step.
|
void |
defineRandomSubset(int size,
String subsetTable,
String supersetTable,
String comment,
String schemaName)
Convenience method for creating and initializing a subset in one step.
|
void |
defineSubset(List<String> values,
String subsetTable,
String supersetTable,
String columnToTest,
String comment)
Convenience method for creating and initializing a subset in one step.
|
void |
defineSubset(List<String> values,
String subsetTable,
String supersetTable,
String columnToTest,
String comment,
String schemaName)
Convenience method for creating and initializing a subset in one step.
|
void |
defineSubset(String subsetTable,
String supersetTable,
String comment)
Convenience method for creating and initializing a subset in one step.
|
void |
defineSubset(String subsetTable,
String supersetTable,
String comment,
String schemaName)
Convenience method for creating and initializing a subset in one step.
|
void |
defineSubsetWithWhereClause(String subsetTable,
String supersetTable,
String conditionToCheck,
String comment)
Convenience method for creating and initializing a subset in one step.
|
void |
defineSubsetWithWhereClause(String subsetTable,
String supersetTable,
String conditionToCheck,
String comment,
String schemaName)
Convenience method for creating and initializing a subset in one step.
|
void |
deleteFromTable(String table,
List<Object[]> ids)
Deletes entries from a table
|
<T> void |
deleteFromTableSimplePK(String table,
List<T> ids)
Deletes entries from a table where the primary key of this table must consist
of exactly one column.
|
int[] |
determineExistingSubsetRows(String subsetTableName,
List<Object[]> pkValues,
String schemaName) |
boolean |
dropTable(String table) |
String |
getActiveDataPGSchema() |
String |
getActiveDataTable() |
String |
getActivePGSchema() |
FieldConfig |
getActiveTableFieldConfiguration() |
String |
getActiveTableSchema() |
ConfigReader |
getConfig() |
Connection |
getConn() |
String |
getDbURL() |
byte[] |
getEffectiveConfiguration()
Returns the effective XML configuration as a
byte[]. |
FieldConfig |
getFieldConfiguration() |
FieldConfig |
getFieldConfiguration(String schemaName) |
String |
getNextDataTable(String referencingTable)
Follows the foreign-key specifications of the given table to the referenced table.
|
String |
getNextOrThisDataTable(String referencingTable)
Determines the first data table on the reference path
referencingTable -> table1 -> table2 -> ... |
org.apache.commons.lang3.tuple.Pair<Integer,List<Map<String,String>>> |
getNumColumnsAndFields(boolean joined,
String[] schemaNames)
Helper method to determine the columns that are returned in case of a joining operation.
|
long |
getNumRows(String tableName)
Returns the row count of the requested table.
|
List<Integer> |
getPrimaryKeyIndices()
Returns the indices of the primary keys, beginning with 0.
|
int |
getQueryBatchSize() |
String |
getReferencedTable(String referencingTable)
Returns the name of a table referenced by an SQL-foreign-key.
|
String |
getReferencedTable(String startTable,
Integer posOfDataTable)
Gets the - possibly indirectly - referenced table of startTable
where posOfDataTable specifies the position of the desired table in
the reference chain starting at startTable.
|
String |
getScheme() |
List<String> |
getTableDefinition(String tableName)
Query the MetaData for the columns of a table
|
ArrayList<String> |
getTables() |
boolean |
hasUnfetchedRows(String tableName) |
boolean |
hasUnfetchedRows(String tableName,
String schemaName)
|
void |
importFromRowIterator(Iterator<Map<String,Object>> it,
String tableName) |
void |
importFromRowIterator(Iterator<Map<String,Object>> it,
String tableName,
Connection externalConn,
boolean commit,
String schemaName)
Internal method to import into an existing table
|
void |
importFromRowIterator(Iterator<Map<String,Object>> it,
String tableName,
String tableSchema) |
void |
importFromXML(Iterable<byte[]> xmls,
String identifier,
String tableName) |
void |
importFromXML(Iterable<byte[]> xmls,
String tableName,
String identifier,
String schemaName)
Imports XMLs into a table.
|
void |
importFromXMLFile(String fileStr,
String tableName)
Imports new MEDLINE XMLs into an existing table from an XML file or a directory
of XML files.
|
void |
importFromXMLFile(String fileStr,
String tableName,
String schemaName)
Imports new MEDLINE XMLs into an existing table from an XML file or a directory
of XML files.
|
void |
initMirrorSubset(String subsetTable,
String supersetTable,
boolean performUpdate) |
void |
initMirrorSubset(String subsetTable,
String supersetTable,
boolean performUpdate,
String schemaName)
Defines a mirror subset populating a subset table with primary keys from
another table.
|
void |
initRandomSubset(int size,
String subsetTable,
String supersetTable) |
void |
initRandomSubset(int size,
String subsetTable,
String superSetTable,
String schemaName)
Selects
size rows of the given super set table randomly and
inserts them into the subset table. |
void |
initSubset(List<String> values,
String subsetTable,
String supersetTable,
String columnToTest)
Defines a subset by populating a subset table with primary keys from another
table.
|
void |
initSubset(List<String> values,
String subsetTable,
String supersetTable,
String columnToTest,
String schemaName)
Defines a subset by populating a subset table with primary keys from another
table.
|
void |
initSubset(String subsetTable,
String supersetTable)
Initializes
subsetTable by inserting one row for each entry in supersetTable. |
void |
initSubset(String subsetTable,
String supersetTable,
String schemaName)
Defines a subset by populating a subset table with all primary keys from
another table.
|
void |
initSubsetWithWhereClause(String subsetTable,
String supersetTable,
String whereClause)
Defines a subset by populating a subset table with primary keys from another
table.
|
void |
initSubsetWithWhereClause(String subsetTable,
String supersetTable,
String whereClause,
String schemaName)
Defines a subset by populating a subset table with primary keys from another
table.
|
boolean |
isDatabaseReachable() |
boolean |
isDataTable(String table) |
boolean |
isEmpty(String tableName)
Tests if a table contains entries.
|
boolean |
isSubsetTable(String table)
Checks if the given table is a subset table.
|
void |
markAsProcessed(String table,
List<Object[]> ids)
Modifies a subset table, marking entries as processed.
|
void |
modifyTable(String sql,
List<Object[]> ids)
Executes a given SQL command (must end with "WHERE "!) and extends the
WHERE-clause with the primary keys, set to the values in ids.
|
void |
modifyTable(String sql,
List<Object[]> ids,
String schemaName)
Executes a given SQL command (must end with "WHERE "!) and extends the
WHERE-clause with the primary keys, set to the values in ids.
|
int[] |
performBatchUpdate(List<Object[]> pkValues,
String sqlFormatString,
String schemaName) |
DBCIterator<Object[]> |
query(List<String[]> keys,
String table)
Returns the values of the column
DEFAULT_FIELD in the given table. |
DBCIterator<Object[]> |
query(List<String[]> keys,
String table,
String schemaName)
Returns the values of the column
DEFAULT_FIELD in the given table. |
DBCIterator<Object[]> |
query(String table,
List<String> fields)
Returns the requested fields from the requested table.
|
DBCIterator<Object[]> |
query(String table,
List<String> fields,
long limit)
Returns the requested fields from the requested table.
|
DBCIterator<Object[]> |
queryAll(List<String> fields,
String table)
Returns an iterator over the column
field in the table
table. |
DBCIterator<byte[][]> |
queryDataTable(String tableName,
String whereCondition)
Returns all column data from the data table
tableName which is
marked as 'to be retrieved' in the table scheme specified by the active table
scheme. |
DBCIterator<byte[][]> |
queryDataTable(String tableName,
String whereCondition,
String schemaName)
Returns all column data from the data table
tableName which is
marked as 'to be retrieved' in the table scheme specified by
schemaName. |
DBCIterator<byte[][]> |
querySubset(String tableName,
long limitParam) |
DBCIterator<byte[][]> |
querySubset(String tableName,
String whereClause,
long limitParam,
Integer numberRefHops,
String schemaName)
Retrieves XML field values in the data table referenced by the subset table
tableName or tableName itself if it is a data
table. |
DBCIterator<byte[][]> |
queryWithTime(List<Object[]> ids,
String table,
String timestamp) |
DBCIterator<byte[][]> |
queryWithTime(List<Object[]> ids,
String table,
String timestamp,
String schemaName)
Returns an iterator over all rows in the table with matching id and a
timestamp newer (>) than
timestamp. |
void |
resetSubset(String subsetTableName)
Sets the values in the
is_processed, is_in_process,
has_errors and log columns of a subset to
FALSE. |
void |
resetSubset(String subsetTableName,
boolean whereNotProcessed,
boolean whereNoErrors,
String lastComponent)
Sets the values in the
is_processed, is_in_process,
has_errors and log columns of a subset to
FALSE where the corresponding rows are
is_in_process or is_processed. |
int[] |
resetSubset(String subsetTableName,
List<Object[]> pkValues) |
int[] |
resetSubset(String subsetTableName,
List<Object[]> pkValues,
String schemaName)
Sets the values in the
is_processed and
is_in_process rows of a subset to FALSE. |
List<Object[]> |
retrieveAndMark(String subsetTableName,
String readerComponent,
String hostName,
String pid)
Retrieves from a subset-table
limit primary keys whose rows are
not marked to be in process or finished being processed and sets the rows of
the retrieved primary keys as being "in process". |
List<Object[]> |
retrieveAndMark(String subsetTableName,
String readerComponent,
String hostName,
String pid,
int limit,
String order)
Retrieves primary keys from a subset table and marks them as being "in
process".
|
List<Object[]> |
retrieveAndMark(String subsetTableName,
String schemaName,
String readerComponent,
String hostName,
String pid,
int limit,
String order)
Retrieves from a subset-table
limit primary keys whose rows are
not marked to be in process or finished being processed and sets the rows of
the retrieved primary keys as being "in process". |
DBCIterator<byte[][]> |
retrieveColumnsByTableSchema(List<Object[]> ids,
String table)
Retrieves row values of
table from the database. |
DBCIterator<byte[][]> |
retrieveColumnsByTableSchema(List<Object[]> ids,
String[] tables,
String[] schemaNames)
Retrieves data from the database over multiple tables.
|
DBCIterator<byte[][]> |
retrieveColumnsByTableSchema(List<Object[]> ids,
String table,
String schemaName)
Retrieves row values of
table from the database. |
boolean |
schemaExists(String schemaName)
Tests if a schema exists.
|
void |
setActivePGSchema(String pgSchema) |
void |
setActiveTableSchema(String schemaName) |
void |
setDbURL(String uri) |
void |
setException(String subsetTableName,
ArrayList<byte[][]> primaryKeyList,
HashMap<byte[][],String> logException)
Sets the value of
has_errors to TRUE and adds a
description in log for exceptions which occurred during the
processing of a collection of documents according to the given primary keys. |
void |
setHost(String host) |
void |
setPassword(String password) |
void |
setPort(Integer port) |
void |
setPort(String port) |
void |
setProcessed(String subsetTableName,
ArrayList<byte[][]> primaryKeyList)
Sets the values of
is_processed to TRUE and of
is_in_process to FALSE for a collection of
documents according to the given primary keys. |
void |
setQueryBatchSize(int queryBatchSize) |
void |
setUser(String user) |
SubsetStatus |
status(String subsetTableName,
Set<DataBaseConnector.StatusElement> statusElementsToReturn)
Returns a map with information about how many rows are marked as
is_in_process, is_processed and how many rows there are in
total.
The respective values are stored under the keys Constants.IN_PROCESS, Constants.PROCESSED and
Constants.TOTAL. |
boolean |
tableExists(Connection conn,
String tableName)
Tests if a table exists.
|
boolean |
tableExists(String tableName)
Tests if a table exists.
|
void |
updateFromRowIterator(Iterator<Map<String,Object>> it,
String tableName)
Updates a table with the entries yielded by the iterator.
|
void |
updateFromRowIterator(Iterator<Map<String,Object>> it,
String tableName,
Connection externalConn,
boolean commit,
String schemaName)
Updates a table with the entries yielded by the iterator.
|
void |
updateFromXML(String fileStr,
String tableName) |
void |
updateFromXML(String fileStr,
String tableName,
String schemaName)
Updates an existing database.
|
public static final String DEFAULT_PIPELINE_STATE
@Deprecated public static final int META_IN_ARRAY
public static final LinkedHashMap<String,String> subsetColumns
public DataBaseConnector(String configPath) throws FileNotFoundException
Throws:
FileNotFoundException
public DataBaseConnector(InputStream configStream)
Parameters:
configStream - used to read the configuration for this connector instance
public DataBaseConnector(InputStream configStream, int queryBatchSize)
Parameters:
configStream - used to read the configuration for this connector instance
queryBatchSize - background threads are utilized to speed up queries, this parameter determines the number of pre-fetched entries
public DataBaseConnector(String dbUrl, String user, String password, String pgSchema, InputStream fieldDefinition)
Parameters:
dbUrl - the url of the database
user - the username for the db
password - the password for the username
fieldDefinition - InputStream containing data of a configuration file
public DataBaseConnector(String serverName, String dbName, String user, String password, String pgSchema, InputStream fieldDefinition)
public DataBaseConnector(String dbUrl, String user, String password, String pgSchema, int queryBatchSize, InputStream configStream)
Parameters:
dbUrl - the url of the database
user - the username for the db
password - the password for the username
queryBatchSize - background threads are utilized to speed up queries, this parameter determines the number of pre-fetched entries
configStream - used to read the configuration for this connector instance
public DataBaseConnector(String serverName, String dbName, String user, String password, String pgSchema, int queryBatchSize, InputStream configStream)
public ConfigReader getConfig()
public void setHost(String host)
public void setPort(String port)
public void setPort(Integer port)
public void setUser(String user)
public void setPassword(String password)
public Connection getConn()
public String getActiveDataTable()
public byte[] getEffectiveConfiguration()
Returns the effective XML configuration as a byte[].
The effective configuration consists of the default configuration and the given user configuration as well (merged by the ConfigReader in the constructor).
public String getActiveDataPGSchema()
public String getActivePGSchema()
public void setActivePGSchema(String pgSchema)
public String getActiveTableSchema()
public void setActiveTableSchema(String schemaName)
public FieldConfig getActiveTableFieldConfiguration()
public List<Object[]> retrieveAndMark(String subsetTableName, String readerComponent, String hostName, String pid) throws TableSchemaMismatchException
Retrieves from a subset-table limit primary keys whose rows are
not marked to be in process or finished being processed and sets the rows of
the retrieved primary keys as being "in process".
The table is locked during this transaction. Locking and marking ensure that every primary key will be returned exactly once. Remember to remove the marks if you want to use the subset again ;)
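The exactly-once guarantee described above can be illustrated without a database. The following is only a minimal in-memory sketch: the class InMemorySubset and its methods are hypothetical and not part of this library, and the synchronized keyword merely stands in for the table lock taken during the real transaction.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for a subset table; not part of the library. */
final class InMemorySubset {
    // Maps each primary key to its "in process" mark, in insertion order.
    private final Map<String, Boolean> inProcess = new LinkedHashMap<>();

    InMemorySubset(List<String> primaryKeys) {
        for (String pk : primaryKeys) {
            inProcess.put(pk, Boolean.FALSE);
        }
    }

    /**
     * Returns up to 'limit' unmarked keys and marks them in the same step.
     * 'synchronized' plays the role of the table lock, so two callers can
     * never receive the same key.
     */
    synchronized List<String> retrieveAndMark(int limit) {
        List<String> batch = new ArrayList<>();
        for (Map.Entry<String, Boolean> row : inProcess.entrySet()) {
            if (batch.size() >= limit) {
                break;
            }
            if (!row.getValue()) {
                row.setValue(Boolean.TRUE);
                batch.add(row.getKey());
            }
        }
        return batch;
    }

    /** Removes all marks so that the subset can be used again. */
    synchronized void reset() {
        inProcess.replaceAll((pk, marked) -> Boolean.FALSE);
    }
}
```

Once every key has been handed out, further calls return an empty list until the marks are removed, which mirrors the reminder above about resetting a subset before reusing it.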
Parameters:
subsetTableName - name of a table, conforming to the subset standard
hostName - will be saved in the subset table
pid - will be saved in the subset table
Throws:
TableSchemaMismatchException
public List<Object[]> retrieveAndMark(String subsetTableName, String readerComponent, String hostName, String pid, int limit, String order) throws TableSchemaMismatchException
Retrieves primary keys from a subset table and marks them as being "in process". The table schema - and thus the form of the primary keys - is assumed to match the active table schema determined in the configuration file.
The table is locked during this transaction. Locking and marking ensure that every primary key will be returned exactly once. Remember to remove the marks if you want to use the subset again ;)
Parameters:
subsetTableName - name of a table, conforming to the subset standard
hostName - will be saved in the subset table
pid - will be saved in the subset table
limit - batch size for marking/retrieving
order - determines an ordering. The default order (which may change over time) is used when this parameter is null or empty.
Throws:
TableSchemaMismatchException
See Also:
retrieveAndMark(String, String, String, String, int, String)
public List<Object[]> retrieveAndMark(String subsetTableName, String schemaName, String readerComponent, String hostName, String pid, int limit, String order) throws TableSchemaMismatchException
Retrieves from a subset-table limit primary keys whose rows are
not marked to be in process or finished being processed and sets the rows of
the retrieved primary keys as being "in process".
The following parameters may be set:
limit - sets the maximum number of primary keys retrieved
order - determines whether to retrieve the primary keys in a
particular order. Note that the default order of rows is undefined. If you
need the same order in every run, you should specify some ordering as an SQL
'ORDER BY' clause. When order is not prefixed with 'ORDER BY'
(case ignored), the prefix is inserted.
The table is locked during this transaction. Locking and marking ensure that every primary key will be returned exactly once. Remember to remove the marks if you want to use the subset again ;)
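The handling of the order parameter described above can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper normalizeOrder is hypothetical, and returning an empty string for a null or empty value is only one plausible way to express "use the default order".

```java
import java.util.Locale;

/** Hypothetical helper illustrating the 'ORDER BY' prefix rule. */
final class OrderClauseSketch {
    static String normalizeOrder(String order) {
        if (order == null || order.trim().isEmpty()) {
            return ""; // assumption: no clause means the default row order
        }
        String trimmed = order.trim();
        // The prefix check ignores case, as stated in the description.
        boolean hasPrefix = trimmed.toUpperCase(Locale.ROOT).startsWith("ORDER BY");
        return hasPrefix ? trimmed : "ORDER BY " + trimmed;
    }
}
```

With this sketch, both "pmid DESC" and "order by pmid DESC" yield a usable clause, and only the first one gets the prefix added.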
Parameters:
subsetTableName - name of a table, conforming to the subset standard
hostName - will be saved in the subset table
pid - will be saved in the subset table
limit - batch size for marking/retrieving
order - determines an ordering. The default order (which may change over time) is used when this parameter is null or empty.
Throws:
TableSchemaMismatchException
public int countUnprocessed(String subsetTableName)
Parameters:
subsetTableName -
See Also:
countUnprocessed(String)
public int countUnprocessed(String subsetTableName, String schemaName)
Parameters:
subsetTableName - name of the subset table
public int countRowsOfDataTable(String tableName, String whereCondition, String schemaName)
public boolean hasUnfetchedRows(String tableName)
public boolean hasUnfetchedRows(String tableName, String schemaName)
public void deleteFromTable(String table, List<Object[]> ids)
Parameters:
table - name of the table
ids - primary key arrays defining the entries to delete
See Also:
deleteFromTableSimplePK(String, List)
public <T> void deleteFromTableSimplePK(String table, List<T> ids)
Deletes entries from a table where the primary key of this table must consist of exactly one column. See deleteFromTable(String, List).
Parameters:
table - name of the table
ids - primary key arrays defining the entries to delete
See Also:
deleteFromTable(String, List)
public void markAsProcessed(String table, List<Object[]> ids)
Parameters:
table - name of the subset table
ids - primary key arrays defining the entries to mark as processed
public void modifyTable(String sql, List<Object[]> ids)
Executes a given SQL command (must end with "WHERE "!) and extends the WHERE-clause with the primary keys, set to the values in ids.
Assumes that the form of the primary keys matches the definition given in the active table schema in the configuration.
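To make the "must end with WHERE" requirement concrete, here is a rough sketch of how such a command could be extended with primary key conditions. The helper extendWhere is hypothetical, and the use of JDBC-style ? placeholders joined with AND is an assumption; the actual implementation may build the clause differently.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper illustrating the extension of a trailing "WHERE ". */
final class WhereClauseSketch {
    static String extendWhere(String sql, List<String> primaryKeyColumns) {
        if (!sql.endsWith("WHERE ")) {
            throw new IllegalArgumentException("SQL command must end with \"WHERE \"");
        }
        // One placeholder condition per primary key column, joined by AND.
        String conditions = primaryKeyColumns.stream()
                .map(column -> column + " = ?")
                .collect(Collectors.joining(" AND "));
        return sql + conditions;
    }
}
```

Each Object[] in ids would then supply one value per placeholder, in the order of the primary key columns defined by the table schema.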
Parameters:
sql - a valid SQL command, ending with "WHERE "
ids - list of primary key arrays
See Also:
modifyTable(String, List)
public void modifyTable(String sql, List<Object[]> ids, String schemaName)
Executes a given SQL command (must end with "WHERE "!) and extends the WHERE-clause with the primary keys, set to the values in ids.
Parameters:
sql - a valid SQL command, ending with "WHERE "
ids - list of primary key arrays
schemaName - name of the schema which defines the primary keys
public String getReferencedTable(String referencingTable)
Parameters:
referencingTable - the name of the table for which the foreign keys shall be checked
Returns:
the name of the referenced table, or null if there is no referenced table (i.e. the passed table name denotes a data table).
Throws:
IllegalArgumentException - When referencingTable is null.
public void createSchema(String schemaName)
Creates the PostgreSQL schema schemaName in the active database.
Parameters:
schemaName - The name of the PostgreSQL schema to create.
public void createTable(String tableName, String comment) throws SQLException
Parameters:
tableName - the name of the new table
Throws:
SQLException
public void createTable(String tableName, String schemaName, String comment) throws SQLException
Creates a new table according to the field schema definition corresponding to the name schemaName given in the configuration file.
Parameters:
tableName - the name of the new table
Throws:
SQLException
public void createTable(String tableName, String referenceTableName, String schemaName, String comment) throws SQLException
Creates a new table according to the field schema definition corresponding to
the name schemaName and with foreign key references to the
primary key of referenceTableName.
The primary keys of the tables tableName and referenceTableName must be equal. The foreign key constraint is configured with ON DELETE CASCADE, which means that when rows are deleted in the referenced table, the corresponding rows are also deleted in the table created by this method call.
Parameters:
tableName - The name of the new table.
referenceTableName - The table to be referenced by this table.
schemaName - The table schema determining the structure (especially the primary key) of the new table.
comment - A comment for the new table.
Throws:
SQLException
public void createSubsetTable(String subsetTable, String supersetTable, Integer maxNumberRefHops, String comment) throws SQLException
Does the same as createSubsetTable(String, String, Integer, String, String)
with the exception that the assumed table schema is that of the active schema
defined in the configuration file.
Parameters:
subsetTable - name of the subset table
supersetTable - name of the referenced table
maxNumberRefHops - the maximum number of times a foreign key reference to a data table may be followed
comment - will be added to the table in the database, used to make tables reproducible
Throws:
SQLException
public void createSubsetTable(String subsetTable, String supersetTable, String comment) throws SQLException
Does the same as createSubsetTable(String, String, Integer, String, String)
with the exception that the assumed table schema is that of the active schema
defined in the configuration file and the first referenced data table is used as data table.
Parameters:
subsetTable - name of the subset table
supersetTable - name of the referenced table
comment - will be added to the table in the database, used to make tables reproducible
Throws:
SQLException
public void createSubsetTable(String subsetTable, String supersetTable, Integer posOfDataTable, String comment, String schemaName) throws SQLException
Creates an empty table referencing the primary key of the data table given by
superSetTable or, if this is a subset table itself, the data
table referenced by that table.
To fill the empty subset table with data, use one of the
init[...] methods offered by this class.
Subset tables have a particular table scheme. They define a foreign key to the primary key of the referenced data table. There are the following additional columns:
| Name | Type |
|---|---|
| is_in_process | boolean |
| is_processed | boolean |
| last_component | text |
| log | text |
| has_errors | boolean |
| pid | character varying(10) |
| host_name | character varying(100) |
| processing_timestamp | timestamp without time zone |
The subset table can be used for processing, e.g. by UIMA CollectionReaders, which store information about the processing in it.
The actual data is located in the referenced table.
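The subset table layout described above can be sketched as a DDL statement. This is only an illustration under assumptions: the class SubsetDdlSketch, its method names and the single-column primary key are hypothetical, and the statement actually issued by createSubsetTable may differ (for example in defaults, comments or multi-column keys). Only the additional column names and types are taken from the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of the DDL for a subset table with a one-column key. */
final class SubsetDdlSketch {
    /** The additional subset columns, in the order listed in the table above. */
    static Map<String, String> subsetColumns() {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("is_in_process", "boolean");
        columns.put("is_processed", "boolean");
        columns.put("last_component", "text");
        columns.put("log", "text");
        columns.put("has_errors", "boolean");
        columns.put("pid", "character varying(10)");
        columns.put("host_name", "character varying(100)");
        columns.put("processing_timestamp", "timestamp without time zone");
        return columns;
    }

    static String createSubsetTableSql(String subsetTable, String dataTable,
                                       String pkColumn, String pkType) {
        StringJoiner definitions = new StringJoiner(", ");
        // The subset's key doubles as a foreign key to the data table's key.
        definitions.add(pkColumn + " " + pkType + " PRIMARY KEY REFERENCES "
                + dataTable + " (" + pkColumn + ")");
        subsetColumns().forEach((name, type) -> definitions.add(name + " " + type));
        return "CREATE TABLE " + subsetTable + " (" + definitions + ")";
    }
}
```

The subset table itself holds only keys and processing state; a reader would look up the document data in the referenced data table by primary key.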
Parameters:
subsetTable - name of the subset table
supersetTable - name of the referenced table
posOfDataTable - the position of the data table that should be referenced; the 1st would be the nearest data table, i.e. perhaps supersetTable itself. The 2nd would be the data table referenced by the first data table on the reference path.
schemaName - name of the table schema to work with (determined in the configuration file)
comment - will be added to the table in the database, used to make tables reproducible
Throws:
SQLException
public void createIndex(String table, String... columns) throws SQLException
Parameters:
table - The table for which an index should be created.
columns - The columns the index should cover.
Throws:
SQLException - In case something goes wrong.
public String getReferencedTable(String startTable, Integer posOfDataTable) throws SQLException
Parameters:
startTable -
posOfDataTable -
Throws:
SQLException
public String getNextDataTable(String referencingTable) throws SQLException
Follows the foreign-key specifications of the given table to the referenced table. The references are followed until a table that is not a subset table (isSubsetTable(String) returns false) is encountered or a table without a foreign-key is found. If referencingTable has no foreign-key itself, null is returned since the referenced table does not exist.
Parameters:
referencingTable - The table to get the next referenced data table for, possibly across other subsets if referencingTable denotes a subset table.
Returns:
the next referenced data table, or null if referencingTable is a data table itself.
Throws:
SQLException - If table meta data checking fails.
public String getNextOrThisDataTable(String referencingTable) throws SQLException
Determines the first data table on the reference path
referencingTable -> table1 -> table2 -> ... -> lastTable -> null
referenced from referencingTable. This means that referencingTable is returned itself if it is a data table.
Parameters:
referencingTable - The start point table for the path for which the first data table is to be returned.
Returns:
the first data table on the reference path, possibly referencingTable itself.
Throws:
SQLException - If a database operation fails.
public boolean isSubsetTable(String table) throws SQLException
Checks if the given table is a subset table.
A database table is identified to be a subset table if it exhibits all the column names that subsets
have. Those are defined in subsetColumns.
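The check described here, together with the walk along foreign-key references to the first data table, can be illustrated with a small in-memory sketch. It does not call DataBaseConnector or a database. The class SubsetTableSketch, the table names, and the two maps are hypothetical, and the column set is an assumed stand-in for subsetColumns that uses only the column names mentioned on this page.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: mimics the documented behaviour without a database.
final class SubsetTableSketch {
    // Assumed stand-in for subsetColumns (the real definition may differ).
    static final Set<String> SUBSET_COLUMNS = new LinkedHashSet<>(
            Arrays.asList("is_processed", "is_in_process", "has_errors", "log"));

    /** True if the table exhibits all subset column names; false for null. */
    static boolean isSubsetTable(Set<String> tableColumns) {
        return tableColumns != null && tableColumns.containsAll(SUBSET_COLUMNS);
    }

    /**
     * Follows references from start until a non-subset table is reached.
     * Returns start itself if it is already a data table, or null if the
     * reference path ends without reaching one.
     */
    static String nextOrThisDataTable(String start,
                                      Map<String, Set<String>> columnsByTable,
                                      Map<String, String> referencedTable) {
        Set<String> visited = new HashSet<>(); // stops the walk if a table repeats
        String current = start;
        while (current != null && visited.add(current)
                && isSubsetTable(columnsByTable.get(current))) {
            current = referencedTable.get(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> columns = new HashMap<>();
        columns.put("documents", new HashSet<>(Arrays.asList("pmid", "xml")));
        Set<String> subsetCols = new HashSet<>(SUBSET_COLUMNS);
        subsetCols.add("pmid"); // primary key column in addition to the subset columns
        columns.put("subset_a", subsetCols);
        columns.put("subset_b", subsetCols);

        Map<String, String> refs = new HashMap<>();
        refs.put("subset_b", "subset_a");
        refs.put("subset_a", "documents");

        System.out.println(isSubsetTable(columns.get("subset_a")));          // true
        System.out.println(nextOrThisDataTable("subset_b", columns, refs));  // documents
        System.out.println(nextOrThisDataTable("documents", columns, refs)); // documents
    }
}
```

The real methods inspect database meta data. The sketch only shows the decision rule: a table counts as a subset when every subset column is present.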
table - The table to check for being a subset table.
true if table denotes a subset table, false otherwise. The latter case includes the table parameter being null.
SQLException - If table meta data checking fails.
public boolean isDataTable(String table) throws SQLException
SQLException
public boolean dropTable(String table) throws SQLException
SQLException
public boolean tableExists(Connection conn, String tableName)
tableName - name of the table to test
public boolean tableExists(String tableName)
tableName - name of the table to test
public boolean schemaExists(String schemaName)
schemaName - name of the schema to test
public boolean isEmpty(String tableName)
tableName - name of the table to test
public void defineRandomSubset(int size,
String subsetTable,
String supersetTable,
String comment)
throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
size -
subsetTable -
supersetTable -
comment -
SQLException
initRandomSubset(int, String, String)
public void defineRandomSubset(int size,
String subsetTable,
String supersetTable,
String comment,
String schemaName)
throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
size -
subsetTable -
supersetTable -
comment -
schemaName -
SQLException
initRandomSubset(int, String, String, String)
public void defineSubset(List<String> values, String subsetTable, String supersetTable, String columnToTest, String comment) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
values -
subsetTable -
supersetTable -
columnToTest -
comment -
SQLException
initSubset(List, String, String, String)
public void defineSubset(List<String> values, String subsetTable, String supersetTable, String columnToTest, String comment, String schemaName) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
values -
subsetTable -
supersetTable -
columnToTest -
comment -
schemaName -
SQLException
initSubset(List, String, String, String, String)
public void defineSubset(String subsetTable, String supersetTable, String comment) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
subsetTable -
supersetTable -
comment -
SQLException
initSubset(String, String)
public void defineSubset(String subsetTable, String supersetTable, String comment, String schemaName) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
subsetTable -
supersetTable -
comment -
schemaName -
SQLException
initSubset(List, String, String, String, String)
public void defineSubsetWithWhereClause(String subsetTable, String supersetTable, String conditionToCheck, String comment) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
subsetTable -
supersetTable -
conditionToCheck -
comment -
SQLException
initSubsetWithWhereClause(String, String, String)
public void defineSubsetWithWhereClause(String subsetTable, String supersetTable, String conditionToCheck, String comment, String schemaName) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
subsetTable -
supersetTable -
conditionToCheck -
comment -
schemaName -
SQLException
initSubsetWithWhereClause(String, String, String, String)
public void defineMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate, String comment) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
subsetTable -
supersetTable -
comment -
SQLException
public void defineMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate, Integer maxNumberRefHops, String comment) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
subsetTable -
supersetTable -
maxNumberRefHops - the maximum number of times a foreign key reference to a data
table may be followed
comment -
SQLException
createSubsetTable(String, String, Integer, String)
public void defineMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate, String comment, String schemaName) throws SQLException
Convenience method for creating and initializing a subset in one step. See method references below for more information.
subsetTable -
supersetTable -
comment -
schemaName -
SQLException
public void initRandomSubset(int size,
String subsetTable,
String superSetTable,
String schemaName)
Selects size rows of the given super set table randomly and
inserts them into the subset table.
size - size of the subset to create
subsetTable - name of subset table to insert the chosen rows into
superSetTable - name of the table to choose from
schemaName - name of the schema to use
public void initSubset(List<String> values, String subsetTable, String supersetTable, String columnToTest)
values - Desired values for the columnToTest
subsetTable - name of the subset table
supersetTable - name of table to reference
columnToTest - column to check for value
public void initSubset(List<String> values, String subsetTable, String supersetTable, String columnToTest, String schemaName)
values - Desired values for the columnToTest
subsetTable - name of the subset table
supersetTable - name of table to reference
schemaName - schema to use
columnToTest - column to check for value
public void initSubset(String subsetTable, String supersetTable)
subsetTable by inserting one row for each entry in supersetTable.
subsetTable -
supersetTable -
initSubset(String, String, String)
public void initSubset(String subsetTable, String supersetTable, String schemaName)
subsetTable - name of the subset table
supersetTable - name of table to reference
schemaName - name of the schema used to determine the primary keys
public void initSubsetWithWhereClause(String subsetTable, String supersetTable, String whereClause)
subsetTable - name of the subset table
supersetTable - name of table to reference
whereClause - condition to check by a SQL WHERE clause, e.g. 'foo > 10'
initSubsetWithWhereClause(String, String, String, String)
public void initSubsetWithWhereClause(String subsetTable, String supersetTable, String whereClause, String schemaName)
subsetTable - name of the subset table
supersetTable - name of table to reference
schemaName - name of the schema used to determine the primary keys
whereClause - condition to check by a SQL WHERE clause, e.g. 'foo > 10'
public void initMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate) throws SQLException
SQLException
public void initMirrorSubset(String subsetTable, String supersetTable, boolean performUpdate, String schemaName) throws SQLException
subsetTable - name of the subset table
supersetTable - name of table to reference
SQLException
public void resetSubset(String subsetTableName)
is_processed, is_in_process,
has_errors and log columns of a subset to
FALSE.
subsetTableName - name of the subset to reset
public void resetSubset(String subsetTableName, boolean whereNotProcessed, boolean whereNoErrors, String lastComponent)
is_processed, is_in_process,
has_errors and log columns of a subset to
FALSE where the corresponding rows are
is_in_process or is_processed.
The boolean parameter whereNotProcessed is used for the use case
where only those rows should be reset that are is_in_process but
not is_processed, which may happen when a pipeline crashed, a
document has errors or a pipeline is just canceled.
In a similar fashion, whereNoErrors resets those rows that have
no errors.
Both boolean parameters may be combined in which case only non-processed rows without errors will be reset.
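How the two flags narrow the set of rows that get reset can be shown with an in-memory sketch. This is not the SQL the library issues. The class ResetSubsetSketch, its Row type, and the sample rows are hypothetical, the lastComponent parameter is left out because its effect is not described here, and clearing log to null is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models subset rows in memory instead of in a database.
final class ResetSubsetSketch {
    static final class Row {
        boolean isProcessed;
        boolean isInProcess;
        boolean hasErrors;
        String log;

        Row(boolean isProcessed, boolean isInProcess, boolean hasErrors, String log) {
            this.isProcessed = isProcessed;
            this.isInProcess = isInProcess;
            this.hasErrors = hasErrors;
            this.log = log;
        }
    }

    /** Resets matching rows and returns how many rows were reset. */
    static int reset(List<Row> rows, boolean whereNotProcessed, boolean whereNoErrors) {
        int count = 0;
        for (Row r : rows) {
            // Base condition: only rows that are in process or processed.
            boolean candidate = r.isInProcess || r.isProcessed;
            if (whereNotProcessed) candidate = candidate && !r.isProcessed;
            if (whereNoErrors) candidate = candidate && !r.hasErrors;
            if (candidate) {
                r.isProcessed = false;
                r.isInProcess = false;
                r.hasErrors = false;
                r.log = null; // assumption: the log entry is cleared
                count++;
            }
        }
        return count;
    }

    /** Four rows: finished, interrupted, interrupted with error, untouched. */
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(true, false, false, null));
        rows.add(new Row(false, true, false, null));
        rows.add(new Row(false, true, true, "some error"));
        rows.add(new Row(false, false, false, null));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(reset(sampleRows(), false, false)); // 3
        System.out.println(reset(sampleRows(), true, false));  // 2
        System.out.println(reset(sampleRows(), false, true));  // 2
        System.out.println(reset(sampleRows(), true, true));   // 1
    }
}
```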
subsetTableName - name of the table to reset unprocessed rows
public int[] resetSubset(String subsetTableName, List<Object[]> pkValues)
subsetTableName -
pkValues -
public int[] performBatchUpdate(List<Object[]> pkValues, String sqlFormatString, String schemaName)
pkValues -
sqlFormatString -
schemaName -
resetSubset(String, List, String)
public int[] resetSubset(String subsetTableName, List<Object[]> pkValues, String schemaName)
is_processed and
is_in_process rows of a subset to FALSE. Only
resets the subset table rows where the primary key equals one of the entries
in pkValues.
subsetTableName - name of the table to reset
pkValues - list of primary keys
public int[] determineExistingSubsetRows(String subsetTableName, List<Object[]> pkValues, String schemaName)
public void importFromXML(Iterable<byte[]> xmls, String identifier, String tableName)
xmls -
tableName -
identifier -
importFromXML(Iterable, String, String, String)
public void importFromXML(Iterable<byte[]> xmls, String tableName, String identifier, String schemaName)
xmls - an Iterator over XMLs as byte[]
tableName - name of the table to import
identifier - used for error messages
public void importFromXMLFile(String fileStr, String tableName)
fileStr - path to file or directory of (G)Zipped MEDLINE XML file(s)
tableName - name of the target table
importFromXMLFile(String, String, String)
public void importFromXMLFile(String fileStr, String tableName, String schemaName)
fileStr - path to file or directory of (G)Zipped MEDLINE XML file(s)
tableName - name of the target table
schemaName - the table schema to use for the import
public void updateFromXML(String fileStr, String tableName)
fileStr -
tableName -
updateFromXML(String, String)
public void updateFromXML(String fileStr, String tableName, String schemaName)
fileStr - file containing new or updated entries
tableName - table to update
public void importFromRowIterator(Iterator<Map<String,Object>> it, String tableName)
it -
tableName -
public void importFromRowIterator(Iterator<Map<String,Object>> it, String tableName, String tableSchema)
it -
tableName -
public void importFromRowIterator(Iterator<Map<String,Object>> it, String tableName, Connection externalConn, boolean commit, String schemaName)
it - an Iterator, yielding rows to insert into the database
tableName - the updated table
externalConn - if not null, this connection will be employed instead
of asking for a new connection
commit - if true, the inserted data will be committed in batches
within this method; no commits will happen otherwise.
schemaName - the name of the table schema corresponding to the data table
public void updateFromRowIterator(Iterator<Map<String,Object>> it, String tableName)
Updates a table with the entries yielded by the iterator. If an entry is not yet in the table, it will be inserted instead.
The input rows are expected to fit the active table schema.
it - an Iterator, yielding new or updated entries.
tableName - the updated table
public void updateFromRowIterator(Iterator<Map<String,Object>> it, String tableName, Connection externalConn, boolean commit, String schemaName)
Updates a table with the entries yielded by the iterator. If an entry is not yet in the table, it will be inserted instead.
The input rows are expected to fit the table schema schemaName.
it - an Iterator, yielding new or updated entries.
tableName - the updated table
externalConn - if not null, this connection will be employed instead
of asking for a new connection
commit - if true, the updated data will be committed in batches
within this method; nothing will be committed otherwise.
schemaName - the name of the table schema corresponding to the updated data
table
public DBCIterator<byte[][]> queryWithTime(List<Object[]> ids, String table, String timestamp)
ids -
table -
timestamp -
queryWithTime(List, String, String, String)
public DBCIterator<byte[][]> queryWithTime(List<Object[]> ids, String table, String timestamp, String schemaName)
timestamp. The Iterator will use
threads, memory and a connection until all matches are returned.
ids - List with primary keys
table - table to query
timestamp - timestamp (only rows with newer timestamp are returned)
public DBCIterator<Object[]> queryAll(List<String> fields, String table)
field in the table
table. NOTE: The Iterator will use threads, memory and a
connection until the iterator is empty, i.e. hasNext() returns
false!
fields - field to return
table - table to query
public DBCIterator<Object[]> query(String table, List<String> fields)
table - The table to query.
fields - The names of the columns to retrieve values from.
public DBCIterator<Object[]> query(String table, List<String> fields, long limit)
table - The table to query.
fields - The names of the columns to retrieve values from.
limit - A limit of documents to retrieve.
public DBCIterator<Object[]> query(List<String[]> keys, String table)
DEFAULT_FIELD in the given table.
The Iterator will use threads, memory and a connection until all matches were
returned.
keys -
table -
query(List, String, String)
public DBCIterator<Object[]> query(List<String[]> keys, String table, String schemaName)
DEFAULT_FIELD in the given table. The
Iterator will use threads, memory and a connection until all matches were
returned.
keys - list of String[] containing the parts of the primary key
table - table to query
public DBCIterator<byte[][]> retrieveColumnsByTableSchema(List<Object[]> ids, String table)
table from the database. The returned columns are those
that are configured to be retrieved in the active table schema.
ids -
table -
retrieveColumnsByTableSchema(List, String, String)
public DBCIterator<byte[][]> retrieveColumnsByTableSchema(List<Object[]> ids, String table, String schemaName)
table from the database. The returned columns are those
that are configured to be retrieved in the table schema with name schemaName.
ids -
table -
schemaName -
public DBCIterator<byte[][]> retrieveColumnsByTableSchema(List<Object[]> ids, String[] tables, String[] schemaNames)
tables and schemaNames arrays are required to be parallel.
ids - A list of primary keys identifying the items to retrieve.
tables - The tables from which the items should be retrieved that are identified by ids.
schemaNames - A parallel array to tables that specifies the table schema name of each table.
public DBCIterator<byte[][]> queryDataTable(String tableName, String whereCondition)
Returns all column data from the data table tableName which is
marked as 'to be retrieved' in the table scheme specified by the active table
scheme.
For more specific information, please refer to
queryDataTable(String, String, String).
tableName - Name of a data table.
whereCondition - Optional additional specifications for the SQL "SELECT" statement.
queryDataTable(String, String, String)
public DBCIterator<byte[][]> queryDataTable(String tableName, String whereCondition, String schemaName)
Returns all column data from the data table tableName which is
marked as 'to be retrieved' in the table scheme specified by
schemaName.
This method offers direct access to the table data by using an SQL
ResultSet in cursor mode, allowing for queries leading to large
results.
An optional where clause (actually everything behind the "FROM" in the SQL select statement) may be passed to restrict the columns being returned. All specifications are allowed which do not alter the number of columns returned (like "GROUP BY").
tableName - Name of a data table.
whereCondition - Optional additional specifications for the SQL "SELECT" statement.
schemaName - The table schema name to determine which columns should be
retrieved.
An iterator over byte[][].
Each returned byte array contains one nested byte array for each
retrieved column, holding the column's data in a sequence of
bytes.
public DBCIterator<byte[][]> querySubset(String tableName, long limitParam) throws SQLException
tableName -
limitParam -
SQLException
public int getQueryBatchSize()
public void setQueryBatchSize(int queryBatchSize)
public DBCIterator<byte[][]> querySubset(String tableName, String whereClause, long limitParam, Integer numberRefHops, String schemaName) throws SQLException
Retrieves XML field values in the data table referenced by the subset table
tableName or tableName itself if it is a data
table.
The method always first retrieves a batch of primary keys from the subset
table and then gets the actual documents from the data table (necessary for
the data table - subset paradigm). As this is unnecessary when querying
directly from a data table, for that kind of queries this method calls
queryDataTable(String, String, String).
The number of returned documents is limited by
limitParam. All documents are returned if
limitParam is negative.
Note: whereClause could already contain an SQL
'LIMIT' specification. However, it won't work as expected since this limit
expression would be applied to each batch of subset IDs which is used to
query the data table. Using the limitParam parameter will ensure
you get at most as many documents from the iterator as specified. If
tableName denotes a data table and whereClause does
not already contain a 'LIMIT' expression, limitParam will be
added to whereClause for the subsequent call to
queryDataTable.
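The difference between a 'LIMIT' inside whereClause and the limitParam parameter can be made concrete with a small simulation. It uses plain integer IDs in place of documents and does not touch a database. The class SubsetLimitSketch and its methods are hypothetical, and the batch size of 4 is chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: simulates batch-wise retrieval with plain integers.
final class SubsetLimitSketch {
    /** Splits ids into consecutive batches of at most batchSize elements. */
    static List<List<Integer>> batches(List<Integer> ids, int batchSize) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            result.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
        }
        return result;
    }

    /** A limit applied to every batch separately, like 'LIMIT' in whereClause. */
    static List<Integer> limitPerBatch(List<Integer> ids, int batchSize, int limit) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> batch : batches(ids, batchSize)) {
            out.addAll(batch.subList(0, Math.min(limit, batch.size())));
        }
        return out;
    }

    /** One limit across all batches, like limitParam; negative means no limit. */
    static List<Integer> limitOverall(List<Integer> ids, int batchSize, long limitParam) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> batch : batches(ids, batchSize)) {
            for (Integer id : batch) {
                if (limitParam >= 0 && out.size() >= limitParam) {
                    return out;
                }
                out.add(id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            ids.add(i);
        }
        System.out.println(limitPerBatch(ids, 4, 3).size()); // 8
        System.out.println(limitOverall(ids, 4, 3).size());  // 3
        System.out.println(limitOverall(ids, 4, -1).size()); // 10
    }
}
```

With ten IDs and batches of four, a per-batch limit of 3 yields 3 + 3 + 2 = 8 results, while the overall limit yields exactly 3.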
tableName - Subset table determining which documents to retrieve from the data
table; may also be a data table itself.
whereClause - An SQL where clause restricting the returned columns of each
queried subset-ID batch. This clause must not change the rows
returned (e.g. by 'GROUP BY').
limitParam - Number restriction of documents to return.
numberRefHops -
schemaName - The name of the table schema of the referenced data table.
tableName.
SQLException
queryDataTable(String, String, String)
public org.apache.commons.lang3.tuple.Pair<Integer,List<Map<String,String>>> getNumColumnsAndFields(boolean joined, String[] schemaNames)
joined is set to false, only the
first table and the first schema is taken into account.
joined - Whether the data is joined.
schemaNames - The names of the table schemas of the tables that are read. From the respective table schemas,
the columns that are marked to be retrieved are extracted.
public long getNumRows(String tableName)
tableName - The table to count the rows of.
public SubsetStatus status(String subsetTableName, Set<DataBaseConnector.StatusElement> statusElementsToReturn) throws TableNotFoundException
Constants.IN_PROCESS, Constants.PROCESSED and
Constants.TOTAL.
subsetTableName - name of the subset table to gain status information for
TableNotFoundException - If subsetTableName does not point to a database table.
public List<String> getTableDefinition(String tableName)
tableName - the table
public String getScheme()
public FieldConfig getFieldConfiguration()
public void addFieldConfiguration(FieldConfig config)
public FieldConfig getFieldConfiguration(String schemaName)
schemaName - The name of the schema for which the eventual
FieldConfig should be returned.
schemaName.
public void checkTableDefinition(String tableName) throws TableSchemaMismatchException
tableName - The table to check.
TableSchemaMismatchException
checkTableDefinition(String, String)
public void checkTableDefinition(String tableName, String schemaName) throws TableSchemaMismatchException
tableName - table to check
TableSchemaMismatchException
public void setProcessed(String subsetTableName, ArrayList<byte[][]> primaryKeyList)
Sets the values of is_processed to TRUE and of
is_in_process to FALSE for a collection of
documents according to the given primary keys.
subsetTableName - name of the subset
primaryKeyList - the list of primary keys which itself can consist of several
primary key elements
public void setException(String subsetTableName, ArrayList<byte[][]> primaryKeyList, HashMap<byte[][],String> logException)
Sets the value of has_errors to TRUE and adds a
description in log for exceptions which occurred during the
processing of a collection of documents according to the given primary keys.
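The primary key arguments of setProcessed and setException are byte[][] values, one nested byte[] per primary key element. The sketch below shows one way such values could be built and points out a general Java property of using arrays as HashMap keys. The class PrimaryKeySketch, the helper toKey, the sample key values, and the choice of UTF-8 are assumptions for illustration and are not taken from the library.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;

// Illustrative only: builds the argument shapes without calling the library.
final class PrimaryKeySketch {
    /** Encodes one primary key, given as one String per key element. */
    static byte[][] toKey(String... elements) {
        byte[][] key = new byte[elements.length][];
        for (int i = 0; i < elements.length; i++) {
            // Assumption: key elements are encoded as UTF-8 text.
            key[i] = elements[i].getBytes(StandardCharsets.UTF_8);
        }
        return key;
    }

    public static void main(String[] args) {
        ArrayList<byte[][]> primaryKeyList = new ArrayList<>();
        byte[][] failed = toKey("12345", "v2");
        primaryKeyList.add(toKey("12344", "v1"));
        primaryKeyList.add(failed);

        HashMap<byte[][], String> logException = new HashMap<>();
        logException.put(failed, "java.io.IOException: truncated document");

        // Java arrays use identity for equals/hashCode: the same instance is
        // found, a second array with identical content is not.
        System.out.println(logException.containsKey(failed));               // true
        System.out.println(logException.containsKey(toKey("12345", "v2"))); // false
    }
}
```

Because of this identity behaviour, it seems safest to put the very same byte[][] instances into both primaryKeyList and logException.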
subsetTableName - name of the subset
primaryKeyList - the list of primary keys which itself can consist of several
primary key elements
logException - matches primary keys of unsuccessfully processed documents and
exceptions that occurred during the processing
public List<Integer> getPrimaryKeyIndices()
public void checkTableSchemaCompatibility(String referenceSchema, String[] schemaNames) throws TableSchemaMismatchException
TableSchemaMismatchException
public void checkTableSchemaCompatibility(String... schemaNames) throws TableSchemaMismatchException
TableSchemaMismatchException
public String getDbURL()
public void setDbURL(String uri)
public void close()
public boolean isDatabaseReachable()
public FieldConfig addXmiDocumentFieldConfiguration(List<Map<String,String>> primaryKey, boolean doGzip)
primaryKey - The document primary key for which a document CAS XMI table schema should be created.
doGzip - Whether the XMI data should be gzipped in the table.
public FieldConfig addPKAdaptedFieldConfiguration(List<Map<String,String>> primaryKey, String fieldConfigurationForAdaption, String fieldConfigurationNameSuffix)
public FieldConfig addXmiTextFieldConfiguration(List<Map<String,String>> primaryKey, boolean doGzip)
addXmiAnnotationFieldConfiguration(List, boolean).
This method is used by the Jena Document Information
System (JeDIS) components jcore-xmi-db-reader and jcore-xmi-db-consumer.
primaryKey - The document primary key for which a base document XMI segmentation table schema should be created.
doGzip - Whether the XMI data should be gzipped in the table.
public FieldConfig addXmiAnnotationFieldConfiguration(List<Map<String,String>> primaryKey, boolean doGzip)
xmi and will store the actual XMI annotation data. This table schema
is used for the storage of XMI annotation graph segments. Those segments will then correspond to
UIMA annotation types that are stored in tables of their own. A table schema to store the base document
is created by addXmiTextFieldConfiguration(List, boolean).
This method is used by the Jena Document Information
System (JeDIS) components jcore-xmi-db-reader and jcore-xmi-db-consumer.
primaryKey - The document primary key for which a base document XMI segmentation table schema should be created.
doGzip - Whether the XMI data should be gzipped in the table.
Copyright © 2018 JULIE Lab, Germany. All rights reserved.