Class DocumentReleaseCheckpoint
java.lang.Object
    de.julielab.jcore.ae.checkpoint.DocumentReleaseCheckpoint

public class DocumentReleaseCheckpoint extends Object
This class is a synchronization point for JeDIS components to report documents as being completely finished with processing.
Problem explanation: This synchronization is necessary because most database-operating components work in batch mode for performance reasons. However, if multiple components use batching, their batches may be out of sync due to different batch sizes and possibly other factors, so at a particular point in time one component may have sent a batch of document data to the database while other components have not. If the pipeline crashes or is manually interrupted at such a point, the data actually written is incoherent in the sense that some components have sent their data for a particular document and others have not.
This class does not completely resolve this issue, i.e. the asynchronous sending of batches is still an issue when using this class. However, this class is used by the DBCheckpointAE to determine whether a set of registered components have all released a given DocumentId before marking it as successfully processed in the JeDIS database subset table. This way, an incoherent state becomes visible in the database as items that are marked as in process but not as processed after the pipeline finishes. Those documents can then easily be reprocessed by removing the in process mark with CoStoSys.
Note that this requires that the DBCheckpointAE marking documents as processed is the last component in the pipeline.
-
-
Field Summary
Fields (Modifier and Type, Field):
static String PARAM_JEDIS_SYNCHRONIZATION_KEY
static String SYNC_PARAM_DESC
-
Method Summary
Methods (Modifier and Type, Method, Description):
static DocumentReleaseCheckpoint get()
int getNumberOfRegisteredComponents()
    Returns the number of currently registered components.
Set<DocumentId> getReleasedDocumentIds()
    Used by the DBCheckpointAE to determine documents that can safely be marked as being finished with processing.
void register(String componentKey)
    Registers a component that will add DocumentIds via the release(String, Stream) method.
void release(String componentKey, Stream<DocumentId> releasedDocumentIds)
    To be called from synchronizing components.
void unregister(String componentKey)
    Removes a component from the list of document ID releasing components.
-
-
-
Field Detail
-
SYNC_PARAM_DESC
public static final String SYNC_PARAM_DESC
- See Also:
- Constant Field Values
-
PARAM_JEDIS_SYNCHRONIZATION_KEY
public static final String PARAM_JEDIS_SYNCHRONIZATION_KEY
- See Also:
- Constant Field Values
-
-
Method Detail
-
get
public static DocumentReleaseCheckpoint get()
-
register
public void register(String componentKey)
Registers a component that will add DocumentIds via the release(String, Stream) method.
- Parameters:
componentKey - A canonical identifier of the component taking part in synchronization.
-
unregister
public void unregister(String componentKey)
Removes a component from the list of document ID releasing components.
This method is not commonly required and is only here for functional completeness.
- Parameters:
componentKey - The canonical identifier provided in register(String) earlier.
-
release
public void release(String componentKey, Stream<DocumentId> releasedDocumentIds)
To be called from synchronizing components. They send their registration key and the document IDs they are definitely finished with.
- Parameters:
componentKey - The canonical identifier provided in register(String) earlier.
releasedDocumentIds - The document IDs to be released.
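The following is a minimal sketch of how a batching component might use register and release: it registers once, buffers document IDs while its batch fills, and releases them only after the batch has been sent. It does not use the real class. CheckpointStandIn and BatchingWriter are hypothetical names, document IDs are modeled as plain strings instead of DocumentId, and rejecting unregistered keys is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

public class BatchReleaseSketch {

    /** Hypothetical stand-in that only mimics register/release; not the real checkpoint. */
    static class CheckpointStandIn {
        final Map<String, List<String>> released = new HashMap<>();

        synchronized void register(String componentKey) {
            released.putIfAbsent(componentKey, new ArrayList<>());
        }

        synchronized void release(String componentKey, Stream<String> releasedDocumentIds) {
            List<String> ids = released.get(componentKey);
            if (ids == null) {
                // Assumption for this sketch: unregistered components are rejected.
                throw new IllegalArgumentException("Not registered: " + componentKey);
            }
            releasedDocumentIds.forEach(ids::add);
        }
    }

    /** Hypothetical database writer that sends its data in batches. */
    static class BatchingWriter {
        private final String componentKey;
        private final int batchSize;
        private final CheckpointStandIn checkpoint;
        private final List<String> pending = new ArrayList<>();

        BatchingWriter(String componentKey, int batchSize, CheckpointStandIn checkpoint) {
            this.componentKey = componentKey;
            this.batchSize = batchSize;
            this.checkpoint = checkpoint;
            checkpoint.register(componentKey);
        }

        void process(String documentId) {
            pending.add(documentId);
            if (pending.size() >= batchSize) {
                flush();
            }
        }

        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            // The actual database write of the batch would happen here.
            // Only after it succeeded are the documents released.
            checkpoint.release(componentKey, pending.stream());
            pending.clear();
        }
    }

    public static void main(String[] args) {
        CheckpointStandIn checkpoint = new CheckpointStandIn();
        BatchingWriter writer = new BatchingWriter("exampleWriter", 2, checkpoint);
        writer.process("doc1"); // buffered, nothing released yet
        writer.process("doc2"); // batch is full: both IDs are released
        writer.process("doc3"); // buffered again
        writer.flush();         // end of processing: release the remainder
        System.out.println(checkpoint.released.get("exampleWriter"));
    }
}
```

Releasing only after the batch write keeps a document from being reported as finished while its data is still sitting in a local buffer.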
-
getReleasedDocumentIds
public Set<DocumentId> getReleasedDocumentIds()
Used by the DBCheckpointAE to determine documents that can safely be marked as being finished with processing.
Gets all the document IDs from all synchronizing components that those components have released. The returned list will contain duplicates of document IDs when multiple components have released that document. The DBCheckpointAE will only mark those documents as processed that have been released as often as synchronizing components have been registered with register(String).
- Returns:
- The currently released document IDs.
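The rule described above, that a document counts as finished once it has been released as often as there are registered components, can be illustrated with a small standalone sketch. This is not the code of DBCheckpointAE. The class name ReleaseCountSketch and the method fullyReleased are hypothetical, document IDs are modeled as plain strings, and the released IDs are passed in as a list with duplicates as described in the text.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ReleaseCountSketch {

    /**
     * Returns the IDs that occur in releasedIds exactly as often as there are
     * registered components, i.e. the IDs every component has released once.
     */
    static Set<String> fullyReleased(List<String> releasedIds, int registeredComponents) {
        // Count how often each document ID was released.
        Map<String, Long> counts = releasedIds.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Keep only the IDs whose release count matches the number of components.
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == registeredComponents)
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        // Two registered components: doc1 was released by both, doc2 by only one.
        List<String> releasedIds = List.of("doc1", "doc2", "doc1");
        System.out.println(fullyReleased(releasedIds, 2)); // prints [doc1]
    }
}
```

In this sketch doc2 stays unmarked, which corresponds to an item that remains in process until the second component releases it.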
-
getNumberOfRegisteredComponents
public int getNumberOfRegisteredComponents()
Returns the number of currently registered components.
- Returns:
- The number of currently registered components.
-
-