Interface Appenderator

All Superinterfaces:
QuerySegmentWalker
All Known Implementing Classes:
BatchAppenderator, StreamAppenderator

public interface Appenderator extends QuerySegmentWalker
An Appenderator indexes data. It has some in-memory data and some persisted-on-disk data. It can serve queries on both of those. It can also push data to deep storage. But, it does not decide which segments data should go into. It also doesn't publish segments to the metadata store or monitor handoff; you have to do that yourself!

You can provide a Committer, or a Supplier of one, when you call add(org.apache.druid.segment.realtime.appenderator.SegmentIdWithShardSpec, org.apache.druid.data.input.InputRow, com.google.common.base.Supplier<org.apache.druid.data.input.Committer>), persistAll(org.apache.druid.data.input.Committer), or push(java.util.Collection<org.apache.druid.segment.realtime.appenderator.SegmentIdWithShardSpec>, org.apache.druid.data.input.Committer, boolean). The Committer should represent all data you have given to the Appenderator so far. It will be used once that data has been persisted to disk.

Concurrency: all methods defined directly in this interface, including close() and closeNow() (that is, all methods of the data appending and indexing lifecycle except drop(org.apache.druid.segment.realtime.appenderator.SegmentIdWithShardSpec)), must be called from a single thread. Methods inherited from QuerySegmentWalker can be called concurrently from multiple threads.

Important note: for historical reasons this interface had a single implementation (AppenderatorImpl), but that has since been split into two classes: StreamAppenderator and BatchAppenderator. With this change, query support and concurrency were removed from or changed in BatchAppenderator, so it no longer makes sense for that class to be an Appenderator. In the future we may want to refactor the Appenderator interface away from BatchAppenderator.
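The lifecycle described above can be sketched as follows. This is a hedged illustration, not a definitive usage pattern: the appenderator instance, the rows collection, the identifier, and the committerFor(...) helper are all assumed to be supplied by the caller.

```java
// Sketch of the single-threaded appending lifecycle. `appenderator`,
// `rows`, `identifier`, and `committerFor(...)` are hypothetical names.
Object commitMetadata = appenderator.startJob(); // restore persisted commit metadata

for (InputRow row : rows) {
  // May trigger an incremental persist using the supplied Committer.
  appenderator.add(identifier, row, () -> committerFor(rows));
}

// Persist remaining in-memory data, then merge and push to deep storage.
appenderator.persistAll(committerFor(rows)).get();
SegmentsAndCommitMetadata pushed =
    appenderator.push(appenderator.getSegments(), committerFor(rows), false).get();

// Publishing segments to the metadata store and monitoring handoff are
// the caller's responsibility, per the interface description above.
appenderator.close();
```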

  • Method Details

    • getId

      String getId()
      Return the identifier of this Appenderator; useful for log messages and such.
    • getDataSource

      String getDataSource()
      Return the name of the dataSource associated with this Appenderator.
    • startJob

      Object startJob()
      Perform any initial setup. Should be called before using any other methods.
      Returns:
      currently persisted commit metadata
    • add

      default Appenderator.AppenderatorAddResult add(SegmentIdWithShardSpec identifier, InputRow row, com.google.common.base.Supplier<Committer> committerSupplier) throws SegmentNotWritableException
Same as add(SegmentIdWithShardSpec, InputRow, Supplier, boolean), with allowIncrementalPersists set to true.
      Throws:
      SegmentNotWritableException
    • add

      Appenderator.AppenderatorAddResult add(SegmentIdWithShardSpec identifier, InputRow row, @Nullable com.google.common.base.Supplier<Committer> committerSupplier, boolean allowIncrementalPersists) throws SegmentNotWritableException
      Add a row. Must not be called concurrently from multiple threads.

      If no pending segment exists for the provided identifier, a new one will be created.

      This method may trigger a persistAll(Committer) using the supplied Committer. If it does this, the Committer is guaranteed to be *created* synchronously with the call to add, but will actually be used asynchronously.

      If committer is not provided, no metadata is persisted.

      Parameters:
      identifier - the segment into which this row should be added
      row - the row to add
committerSupplier - supplier of a committer associated with all data that has been added so far, including this row. If allowIncrementalPersists is set to false, this supplier will not be used, since no persist will be done automatically.
allowIncrementalPersists - whether an automatic persist may be performed if required. If this flag is false and a persist was skipped because of it, the return value will have Appenderator.AppenderatorAddResult.isPersistRequired set to true, and the responsibility of calling persistAll(Committer) falls on the caller.
      Returns:
      Appenderator.AppenderatorAddResult
      Throws:
      SegmentNotWritableException - if the requested segment is known, but has been closed
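The allowIncrementalPersists = false path can be sketched as follows. This is an assumed usage pattern; the appenderator, identifier, row, and committer variables are illustrative, and isPersistRequired is read from the add result as described above.

```java
// Sketch: with incremental persists disabled, the caller must check
// isPersistRequired and persist manually. Variable names are illustrative.
Appenderator.AppenderatorAddResult result =
    appenderator.add(identifier, row, null, /* allowIncrementalPersists */ false);

if (result.isPersistRequired()) {
  // The row was added, but a persist was skipped because of the flag;
  // calling persistAll is now the caller's responsibility.
  appenderator.persistAll(committer);
}
```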
    • getSegments

      List<SegmentIdWithShardSpec> getSegments()
      Returns a list of all currently active segments.
    • getRowCount

      int getRowCount(SegmentIdWithShardSpec identifier)
      Returns the number of rows in a particular pending segment.
      Parameters:
      identifier - segment to examine
      Returns:
      row count
      Throws:
      IllegalStateException - if the segment is unknown
    • getTotalRowCount

      int getTotalRowCount()
      Returns the total number of rows across all segments pending push in this appenderator.
      Returns:
      total number of rows
    • clear

      void clear() throws InterruptedException
      Drop all in-memory and on-disk data, and forget any previously-remembered commit metadata. This could be useful if, for some reason, rows have been added that we do not actually want to hand off. Blocks until all data has been cleared. This may take some time, since all pending persists must finish first.
      Throws:
      InterruptedException
    • drop

      com.google.common.util.concurrent.ListenableFuture<?> drop(SegmentIdWithShardSpec identifier)
      Schedule dropping all data associated with a particular pending segment. Unlike clear(), any on-disk commit metadata will remain unchanged. If there is no pending segment with this identifier, then this method will do nothing.

      You should not write to the dropped segment after calling drop(). If you need to drop all your data and re-write it, consider clear() instead. Unlike all other methods in this class (except those inherited from QuerySegmentWalker), this method may be called concurrently from a thread other than the main data appending / indexing thread; this typically happens when drop() is called in an async future callback. drop() itself is cheap and relays the heavy dropping work to an internal executor of this Appenderator.

      Parameters:
      identifier - the pending segment to drop
      Returns:
      future that resolves when data is dropped
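Since drop() is the one lifecycle method that may be called from another thread, a common shape is to invoke it from a Guava future callback. This is a sketch under assumptions: handoffFuture and callbackExecutor are hypothetical, standing in for whatever async completion signal the caller uses.

```java
// Sketch: drop() called from a callback thread, which the contract above
// permits. `handoffFuture` and `callbackExecutor` are assumed names.
Futures.addCallback(handoffFuture, new FutureCallback<SegmentIdWithShardSpec>() {
  @Override
  public void onSuccess(SegmentIdWithShardSpec segment) {
    // Cheap call; the heavy dropping work runs on the Appenderator's
    // internal executor, not on this callback thread.
    appenderator.drop(segment);
  }

  @Override
  public void onFailure(Throwable t) {
    // Handle the failed handoff; the pending segment is not dropped.
  }
}, callbackExecutor);
```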
    • persistAll

      com.google.common.util.concurrent.ListenableFuture<Object> persistAll(@Nullable Committer committer)
      Persist any in-memory indexed data to durable storage. This may be only somewhat durable, e.g. the machine's local disk. The Committer will be created synchronously with the call to persistAll, but will actually be used asynchronously. Any metadata returned by the committer will be associated with the data persisted to disk.

      If committer is not provided, no metadata is persisted.

      Parameters:
      committer - a committer associated with all data that has been added so far
      Returns:
      future that resolves when all pending data has been persisted, contains commit metadata for this persist
    • push

      com.google.common.util.concurrent.ListenableFuture<SegmentsAndCommitMetadata> push(Collection<SegmentIdWithShardSpec> identifiers, @Nullable Committer committer, boolean useUniquePath)
      Merge and push particular segments to deep storage. This will trigger an implicit persistAll(Committer) using the provided Committer.

      After this method is called, you cannot add new data to any segments that were previously under construction.

      If committer is not provided, no metadata is persisted.

      Parameters:
      identifiers - list of segments to push
      committer - a committer associated with all data that has been added so far
      useUniquePath - true if the segment should be written to a path with a unique identifier
      Returns:
      future that resolves when all segments have been pushed. The segment list will be the list of segments that have been pushed and the commit metadata from the Committer.
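A sketch of consuming the push result, under the assumption that SegmentsAndCommitMetadata exposes the pushed segment list and the committer's metadata as described above; variable names are illustrative.

```java
// Sketch: push returns a future resolving to the pushed segments plus
// the commit metadata from the Committer. Names are illustrative.
ListenableFuture<SegmentsAndCommitMetadata> pushFuture =
    appenderator.push(appenderator.getSegments(), committer, /* useUniquePath */ true);

SegmentsAndCommitMetadata result = pushFuture.get();
for (DataSegment segment : result.getSegments()) {
  // Segments are now in deep storage; publishing them to the metadata
  // store and waiting for handoff remain the caller's job.
}
Object metadata = result.getCommitMetadata();
```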
    • close

      void close()
      Stop any currently-running processing and clean up after ourselves. This allows currently running persists and pushes to finish. This will not remove any on-disk persisted data, but it will drop any data that has not yet been persisted.
    • closeNow

      void closeNow()
      Stop all processing and abandon current pushes. A currently running persist may be allowed to finish if it persists critical metadata; otherwise, shutdown is immediate. This will not remove any on-disk persisted data, but it will drop any data that has not yet been persisted. Since this does not wait for pushes to finish, implementations must ensure that any push still running in a background thread does not cause problems.
    • setTaskThreadContext

      default void setTaskThreadContext()
      Sets the thread context for task threads on Indexers. Since the Appenderator and the underlying thread pools for persist, push, and publish are freshly created for each task ID, this context need not be cleared.