Class StreamAppenderatorDriver

java.lang.Object
org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver
org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver
All Implemented Interfaces:
Closeable, AutoCloseable

public class StreamAppenderatorDriver extends BaseAppenderatorDriver
This class is specialized for streaming ingestion, where each segment moves through the following lifecycle:
 APPENDING -> APPEND_FINISHED -> PUBLISHED
 
  • APPENDING: Segment is available for appending.
  • APPEND_FINISHED: Segment can no longer be updated (no more data can be added) and is waiting to be published.
  • PUBLISHED: Segment has been pushed to deep storage, its metadata has been published to the metadata storage, and finally the segment has been dropped from local storage.
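The lifecycle above can be read as a small forward-only state machine. The following is a minimal sketch of that reading, not code from Druid: the enum SegmentLifecycleState and its methods are hypothetical names introduced only for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative only: models the three documented states; not a Druid class. */
enum SegmentLifecycleState {
    APPENDING,        // rows may still be added
    APPEND_FINISHED,  // closed to new rows, waiting to be published
    PUBLISHED;        // pushed to deep storage, metadata published, dropped locally

    /** States reachable in one step; the lifecycle only moves forward. */
    Set<SegmentLifecycleState> next() {
        switch (this) {
            case APPENDING:
                return EnumSet.of(APPEND_FINISHED);
            case APPEND_FINISHED:
                return EnumSet.of(PUBLISHED);
            default:
                return EnumSet.noneOf(SegmentLifecycleState.class);
        }
    }

    /** True if the documented lifecycle allows moving directly to the target state. */
    boolean canMoveTo(SegmentLifecycleState target) {
        return next().contains(target);
    }

    /** Only a segment that is still APPENDING accepts new rows. */
    boolean acceptsRows() {
        return this == APPENDING;
    }
}
```

Under this reading, a segment can never return to APPENDING once appending has finished, and PUBLISHED is terminal.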
  • Constructor Details

    • StreamAppenderatorDriver

      public StreamAppenderatorDriver(Appenderator appenderator, SegmentAllocator segmentAllocator, SegmentHandoffNotifierFactory handoffNotifierFactory, PublishedSegmentRetriever segmentRetriever, DataSegmentKiller dataSegmentKiller, com.fasterxml.jackson.databind.ObjectMapper objectMapper, SegmentGenerationMetrics metrics)
      Create a driver.
      Parameters:
      appenderator - appenderator
      segmentAllocator - segment allocator
      handoffNotifierFactory - handoff notifier factory
      segmentRetriever - published segment retriever
      dataSegmentKiller - data segment killer
      objectMapper - object mapper, used for serde of commit metadata
      metrics - segment generation metrics
  • Method Details

    • startJob

      @Nullable public Object startJob(AppenderatorDriverSegmentLockHelper lockHelper)
      Description copied from class: BaseAppenderatorDriver
      Perform any initial setup and return currently persisted commit metadata.

      Note that this method returns the same metadata you've passed in with your Committers, even though this class stores extra metadata on disk.

      Specified by:
      startJob in class BaseAppenderatorDriver
      Returns:
      currently persisted commit metadata
    • add

      public AppenderatorDriverAddResult add(InputRow row, String sequenceName, com.google.common.base.Supplier<Committer> committerSupplier) throws IOException
      Throws:
      IOException
    • add

      public AppenderatorDriverAddResult add(InputRow row, String sequenceName, com.google.common.base.Supplier<Committer> committerSupplier, boolean skipSegmentLineageCheck, boolean allowIncrementalPersists) throws IOException
      Add a row. Must not be called concurrently from multiple threads.
      Parameters:
      row - the row to add
      sequenceName - sequenceName for this row's segment
      committerSupplier - supplier of a committer associated with all data that has been added, including this row. If allowIncrementalPersists is set to false, this will not be used.
      skipSegmentLineageCheck - if false, lineage validation is performed using previousSegmentId for this sequence; if true, lineage validation is skipped (not enforced). Note that for Kafka streams this check should be disabled by setting this parameter to true.
      allowIncrementalPersists - whether to allow a persist to happen when the maxRowsInMemory or intermediate persist period threshold is hit
      Returns:
      AppenderatorDriverAddResult
      Throws:
      IOException - if there is an I/O error while allocating or writing to a segment
    • moveSegmentOut

      public void moveSegmentOut(String sequenceName, List<SegmentIdWithShardSpec> identifiers)
      Move a set of identifiers out from "active", making way for newer segments. This method exists to support KafkaIndexTask's legacy mode and will be removed in the future. See KafkaIndexTask.runLegacy().
    • persist

      public Object persist(Committer committer) throws InterruptedException
      Persist all data indexed through this driver so far. Blocks until complete.

      Should be called after all data has been added through add(InputRow, String, Supplier, boolean, boolean).

      Parameters:
      committer - committer representing all data that has been added so far
      Returns:
      commitMetadata persisted
      Throws:
      InterruptedException
    • persistAsync

      public com.google.common.util.concurrent.ListenableFuture<Object> persistAsync(Committer committer)
      Persist all data indexed through this driver so far. Returns a future of persisted commitMetadata.

      Should be called after all data has been added through add(InputRow, String, Supplier, boolean, boolean).

      Parameters:
      committer - committer representing all data that has been added so far
      Returns:
      future containing commitMetadata persisted
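The difference between persist and persistAsync is whether the caller waits for the commit metadata. The sketch below illustrates that relationship only; it uses java.util.concurrent.CompletableFuture as a stand-in for Guava's ListenableFuture, and PersistSketch and its method bodies are hypothetical, not the driver's implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

/** Hypothetical stand-in; these methods only mimic the documented shapes. */
class PersistSketch {
    // Stand-in for persistAsync(committer): returns immediately with a future
    // that completes with the commit metadata once the data is persisted.
    static CompletableFuture<Object> persistAsync(Object commitMetadata) {
        return CompletableFuture.supplyAsync(() -> commitMetadata);
    }

    // Stand-in for persist(committer): the same work, but the caller blocks
    // until the future completes and receives the metadata directly.
    static Object persist(Object commitMetadata) throws InterruptedException {
        try {
            return persistAsync(commitMetadata).get();
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }
}
```

A caller that has more work to do while data is flushed would prefer the asynchronous form and attach a callback or join later.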
    • publish

      public com.google.common.util.concurrent.ListenableFuture<SegmentsAndCommitMetadata> publish(TransactionalSegmentPublisher publisher, Committer committer, Collection<String> sequenceNames)
      Execute a background task to publish all segments corresponding to the given sequence names. The task first pushes the segments to deep storage, and then publishes their metadata to the metadata storage.
      Parameters:
      publisher - segment publisher
      committer - committer
      sequenceNames - a collection of sequence names to be published
      Returns:
      a ListenableFuture for the submitted task which removes published sequenceNames from activeSegments and publishPendingSegments
    • registerHandoff

      public com.google.common.util.concurrent.ListenableFuture<SegmentsAndCommitMetadata> registerHandoff(SegmentsAndCommitMetadata segmentsAndCommitMetadata)
      Register the segments in the given SegmentsAndCommitMetadata to be handed off and execute a background task which waits until the hand off completes.
      Parameters:
      segmentsAndCommitMetadata - the result segments and metadata of publish(TransactionalSegmentPublisher, Committer, Collection)
      Returns:
      null if the input segmentsAndCommitMetadata is null. Otherwise, a ListenableFuture for the submitted task which returns SegmentsAndCommitMetadata containing the segments successfully handed off and the metadata of the caller of AppenderatorDriverMetadata
    • publishAndRegisterHandoff

      public com.google.common.util.concurrent.ListenableFuture<SegmentsAndCommitMetadata> publishAndRegisterHandoff(TransactionalSegmentPublisher publisher, Committer committer, Collection<String> sequenceNames)
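No description is given for publishAndRegisterHandoff. Judging only by its name and signature, it presumably chains publish(TransactionalSegmentPublisher, Committer, Collection) with registerHandoff(SegmentsAndCommitMetadata). The sketch below shows that kind of chaining under that assumption; it uses java.util.concurrent.CompletableFuture instead of Guava's ListenableFuture, represents segments as plain strings, and every name in it is hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

/** Hypothetical stand-in; none of these names are Druid APIs. */
class PublishHandoffSketch {
    // Stand-in for publish(...): completes with the segments that were published.
    static CompletableFuture<List<String>> publish(List<String> sequenceNames) {
        return CompletableFuture.supplyAsync(() ->
            sequenceNames.stream()
                         .map(name -> "segment-for-" + name)
                         .collect(Collectors.toList()));
    }

    // Stand-in for registerHandoff(...): completes once the segments are handed off.
    static CompletableFuture<List<String>> registerHandoff(List<String> published) {
        return CompletableFuture.supplyAsync(() -> published);
    }

    // Chain the two steps: handoff registration starts only after publishing completes.
    static CompletableFuture<List<String>> publishAndRegisterHandoff(List<String> sequenceNames) {
        return publish(sequenceNames).thenCompose(PublishHandoffSketch::registerHandoff);
    }
}
```

With Guava futures, the analogous composition would typically be written with Futures.transformAsync; whether the driver does exactly that is not stated on this page.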
    • close

      public void close()
      Description copied from class: BaseAppenderatorDriver
      Closes this driver. Does not close the underlying Appenderator; you should do that yourself.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class BaseAppenderatorDriver
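Because close() leaves the underlying Appenderator open, the caller owns both shutdowns. The sketch below illustrates one way to order them with try-with-resources. FakeAppenderator and FakeDriver are hypothetical stand-ins that implement AutoCloseable for the sake of the example; whether the real Appenderator can be used in a try-with-resources statement is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins that record the order in which resources are closed. */
class CloseOrderSketch {
    static final List<String> events = new ArrayList<>();

    static class FakeAppenderator implements AutoCloseable {
        @Override
        public void close() {
            events.add("appenderator closed");
        }
    }

    static class FakeDriver implements AutoCloseable {
        private final FakeAppenderator appenderator;

        FakeDriver(FakeAppenderator appenderator) {
            this.appenderator = appenderator;
        }

        // Mirrors the documented contract: the driver does not close its appenderator.
        @Override
        public void close() {
            events.add("driver closed");
        }
    }

    static List<String> run() {
        events.clear();
        // Resources are closed in reverse declaration order: driver first, then appenderator.
        try (FakeAppenderator appenderator = new FakeAppenderator();
             FakeDriver driver = new FakeDriver(appenderator)) {
            events.add("work done");
        }
        return new ArrayList<>(events);
    }
}
```

Declaring the appenderator first means it outlives the driver, so the driver never observes a closed appenderator during its own shutdown.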