public abstract class DBReader extends DBSubsetReader
The reader interacts with two tables. The first is a 'subset' table that lists the document collection, one document per row. Each row also contains fields for the document's current processing status, its error status, and the processing host. This table is locked while a batch of documents to process is being fetched, so it also serves as a synchronization medium.
The second table holds the actual document data and is therefore called the 'data table'. The subset table has to define foreign keys to the data table; this is how the reader determines which table to retrieve the document data from.
This data management is done by the julie-medline-manager package.
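
The batch-claiming role of the subset table can be pictured with a small in-memory analogy. The sketch below is not the reader's implementation: `SubsetTableSketch`, its `Row` fields, and `claimBatch` are hypothetical names, and a Java monitor lock stands in for the database table lock. It only illustrates the idea that rows are marked as in process, together with the claiming host, while the lock is held, so that two readers never receive the same document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SubsetTableSketch {

    /** One subset-table row: processing status plus claiming host (hypothetical layout). */
    static final class Row {
        boolean inProcess;
        String host;
    }

    // Document ID -> status row; insertion order is kept so batches are deterministic.
    private final Map<String, Row> rows = new LinkedHashMap<>();

    SubsetTableSketch(List<String> documentIds) {
        for (String id : documentIds) {
            rows.put(id, new Row());
        }
    }

    /**
     * Claims up to batchSize unclaimed documents for the given host.
     * The synchronized keyword plays the role of the table lock in this analogy.
     */
    synchronized List<String> claimBatch(String host, int batchSize) {
        List<String> claimed = new ArrayList<>();
        for (Map.Entry<String, Row> entry : rows.entrySet()) {
            if (claimed.size() == batchSize) {
                break;
            }
            Row row = entry.getValue();
            if (!row.inProcess) {
                row.inProcess = true;
                row.host = host;
                claimed.add(entry.getKey());
            }
        }
        return claimed;
    }

    public static void main(String[] args) {
        SubsetTableSketch table = new SubsetTableSketch(Arrays.asList("d1", "d2", "d3"));
        System.out.println(table.claimBatch("hostA", 2)); // [d1, d2]
        System.out.println(table.claimBatch("hostB", 2)); // [d3]
        System.out.println(table.claimBatch("hostA", 2)); // []
    }
}
```

Because every claim happens under the lock, a second reader asking for a batch sees only rows that have not yet been marked.
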
Please note that this class does not implement
JCasCollectionReader_ImplBase.getNext(org.apache.uima.cas.CAS). Instead,
getNextArtifactData() is offered to expose the documents read from the
database. Up to this point, no assumption about the document's structure has
been made: this class does not care whether it deals with Medline abstracts,
plain text, HTML documents, or anything else. Translating these documents into
a CAS with respect to a particular type system is delegated to the extending
class.
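
The division of labor described above can be sketched without any UIMA or database dependency. In the example below, `RawRowReader` is a hypothetical stand-in for this class and `XmlStringReader` for an extending class; only the `byte[][]` contract of `getNextArtifactData()` (ID at `[0]`, XML at `[1]`, `null` when nothing is left) is taken from this page. The UTF-8 decoding is an assumption, and a real subclass would populate a CAS rather than return a string.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class ArtifactDataSketch {

    /** Hypothetical stand-in for DBReader: hands out raw rows without interpreting them. */
    abstract static class RawRowReader {
        private final Deque<byte[][]> pending = new ArrayDeque<>();

        RawRowReader(List<byte[][]> rows) {
            pending.addAll(rows);
        }

        /** Mirrors the documented contract: ID bytes at [0], XML bytes at [1], null when exhausted. */
        byte[][] getNextArtifactData() {
            return pending.poll();
        }
    }

    /** Hypothetical extending class that decides how the raw bytes are interpreted. */
    static final class XmlStringReader extends RawRowReader {
        XmlStringReader(List<byte[][]> rows) {
            super(rows);
        }

        /** Returns "id: xml" for the next document, or null if none is left. */
        String getNextDocument() {
            byte[][] data = getNextArtifactData();
            if (data == null) {
                return null;
            }
            // Assumption: both columns hold UTF-8 encoded text.
            String id = new String(data[0], StandardCharsets.UTF_8);
            String xml = new String(data[1], StandardCharsets.UTF_8);
            return id + ": " + xml;
        }
    }

    /** Builds one raw row from an ID and an XML string. */
    static byte[][] row(String id, String xml) {
        return new byte[][] {
            id.getBytes(StandardCharsets.UTF_8),
            xml.getBytes(StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) {
        XmlStringReader reader =
                new XmlStringReader(Arrays.asList(row("1", "<a/>"), row("2", "<b/>")));
        String doc;
        while ((doc = reader.getNextDocument()) != null) {
            System.out.println(doc);
        }
    }
}
```
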
| Modifier and Type | Class and Description |
|---|---|
| protected class | DBReader.RetrievingThread: This class is charged with retrieving batches of document IDs and documents while previously fetched documents are in process. |

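
The idea of fetching the next batch while the current one is being processed can be illustrated with a generic prefetching loop. This is a sketch under assumptions, not the code of `DBReader.RetrievingThread`: `PrefetchSketch`, `processAll`, and `fetch` are hypothetical names, an `Iterator` stands in for the database, and a single-thread executor stands in for the retrieving thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PrefetchSketch {

    /** Returns the next batch, or null when the source is exhausted. */
    static List<String> fetch(Iterator<List<String>> source) {
        return source.hasNext() ? source.next() : null;
    }

    /** Handles all batches; the following batch is fetched in the background meanwhile. */
    static List<String> processAll(Iterator<List<String>> batchSource) {
        ExecutorService fetcher = Executors.newSingleThreadExecutor();
        Callable<List<String>> fetchTask = () -> fetch(batchSource);
        List<String> processed = new ArrayList<>();
        try {
            Future<List<String>> next = fetcher.submit(fetchTask);
            while (true) {
                List<String> current = next.get(); // wait for the prefetched batch
                if (current == null) {
                    break;
                }
                next = fetcher.submit(fetchTask);  // start fetching the following batch
                processed.addAll(current);         // "process" the current batch
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Fetching a batch failed", e);
        } finally {
            fetcher.shutdown();
        }
        return processed;
    }

    public static void main(String[] args) {
        List<List<String>> batches =
                Arrays.asList(Arrays.asList("doc1", "doc2"), Arrays.asList("doc3"));
        System.out.println(processAll(batches.iterator())); // [doc1, doc2, doc3]
    }
}
```

Only the background task touches the iterator, and a new fetch is submitted only after the previous one has completed, so no additional synchronization is needed in this sketch.
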
| Modifier and Type | Field and Description |
|---|---|
| protected String | dataTimestamp |

Inherited fields, grouped by declaring superclass (nearest first):

additionalTableNames, additionalTableSchemas, dataTable, fetchIdsProactively, hostName, PARAM_ADDITIONAL_TABLES, PARAM_ADDITONAL_TABLES_STORAGE_PG_SCHEMA, PARAM_RESET_TABLE, pid, readDataTable, resetTable, schemas, tables

batchSize, costosysConfig, dbc, driver, hasNext, joinTables, limitParameter, numberFetchedDocIDs, PARAM_BATCH_SIZE, PARAM_COSTOSYS_CONFIG_NAME, PARAM_TABLE, processedDocuments, selectionOrder, tableName, totalDocumentCount, whereCondition

| Constructor and Description |
|---|
| DBReader() |

| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| byte[][] | getNextArtifactData(): Returns the next byte[][] containing a byte[] for the pmid at [0] and a byte[] for the XML at [1], or null if there are no unprocessed documents left. |
| org.apache.uima.util.Progress[] | getProgress() |
| protected abstract String | getReaderComponentName() |
| boolean | hasNext() |
| void | initialize(org.apache.uima.UimaContext context) |
| static String | setDBProcessingMetaData(de.julielab.xmlData.dataBase.DataBaseConnector dbc, boolean readDataTable, String tableName, byte[][] data, org.apache.uima.jcas.JCas cas) |

Inherited methods, grouped by declaring class (nearest superclass first, ending with java.lang.Object):

checkAdditionalTableParameters, checkAndAdjustAdditionalTables, getAllRetrievedColumns

getLogger, getNext, getNext, initialize

destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit

getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue

getCasManager, getMetaData, getRelativePathResolver, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger, setMetaData

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

protected String dataTimestamp
public static String setDBProcessingMetaData(de.julielab.xmlData.dataBase.DataBaseConnector dbc, boolean readDataTable, String tableName, byte[][] data, org.apache.uima.jcas.JCas cas)
public void initialize(org.apache.uima.UimaContext context)
throws org.apache.uima.resource.ResourceInitializationException
Overrides: initialize in class DBSubsetReader
Throws: org.apache.uima.resource.ResourceInitializationException

public boolean hasNext()
throws IOException,
org.apache.uima.collection.CollectionException
Throws: IOException, org.apache.uima.collection.CollectionException

public byte[][] getNextArtifactData()
throws org.apache.uima.collection.CollectionException
Throws: org.apache.uima.collection.CollectionException

public org.apache.uima.util.Progress[] getProgress()

public void close()
Specified by: close in interface org.apache.uima.collection.base_cpm.BaseCollectionReader
Overrides: close in class org.apache.uima.fit.component.JCasCollectionReader_ImplBase

protected abstract String getReaderComponentName()
Copyright © 2019 JULIE Lab, Germany. All rights reserved.