Interface IndexerMetadataStorageCoordinator

All Known Implementing Classes:
IndexerSQLMetadataStorageCoordinator

public interface IndexerMetadataStorageCoordinator
Handles metadata transactions performed by the Overlord.
  • Method Details

    • retrieveAllDatasourceNames

      Set<String> retrieveAllDatasourceNames()
      Returns:
      Set of all datasource names for which there are used or unused segments present in the metadata store.
    • retrieveUsedSegmentsForInterval

      default Set<DataSegment> retrieveUsedSegmentsForInterval(String dataSource, org.joda.time.Interval interval, Segments visibility)
      Retrieves all published segments that have partial or complete overlap with the given interval and are marked as used.
    • retrieveAllUsedSegments

      Set<DataSegment> retrieveAllUsedSegments(String dataSource, Segments visibility)
      Retrieves all published used segments for the given data source.
    • retrieveUsedSegmentsAndCreatedDates

      Collection<Pair<DataSegment,String>> retrieveUsedSegmentsAndCreatedDates(String dataSource, List<org.joda.time.Interval> intervals)
      Retrieve all published segments which are marked as used and the created_date of these segments belonging to the given data source and list of intervals from the metadata store.

      Unlike other similar methods in this interface, this method doesn't accept a Segments "visibility" parameter. The returned collection may include overshadowed segments and their created_dates, as if Segments.INCLUDING_OVERSHADOWED was passed. It's the responsibility of the caller to filter out overshadowed ones if needed.

      Parameters:
      dataSource - The data source to query
      intervals - The list of intervals to query
      Returns:
      The DataSegments and the related created_date of segments
    • retrieveUsedSegmentsForIntervals

      Set<DataSegment> retrieveUsedSegmentsForIntervals(String dataSource, List<org.joda.time.Interval> intervals, Segments visibility)
      Retrieves all published segments that have partial or complete overlap with the given intervals and are marked as used.
    • retrieveUnusedSegmentsForInterval

      default List<DataSegment> retrieveUnusedSegmentsForInterval(String dataSource, org.joda.time.Interval interval, @Nullable Integer limit, @Nullable org.joda.time.DateTime maxUsedStatusLastUpdatedTime)
      Retrieve all published segments which include ONLY data within the given interval and are marked as unused from the metadata store.
      Parameters:
      dataSource - The data source the segments belong to
      interval - Filter the data segments to ones that include data in this interval exclusively.
      limit - The maximum number of unused segments to retrieve. If null, no limit is applied.
      maxUsedStatusLastUpdatedTime - The maximum used_status_last_updated time. Any unused segment in the interval with used_status_last_updated no later than this time will be included in the kill task. Segments without a used_status_last_updated time (due to an upgrade from legacy Druid) will have maxUsedStatusLastUpdatedTime ignored.
      Returns:
      DataSegments which include ONLY data within the requested interval and are marked as unused. Segments NOT returned here may include data in the interval
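      The "ONLY data within the given interval" condition describes containment rather than overlap. A minimal sketch of the difference, assuming half-open [start, end) ranges in epoch milliseconds as a stand-in for org.joda.time.Interval. The class and method names here (IntervalMatchSketch, overlaps, contains) are hypothetical and not part of Druid:

```java
// Hypothetical helper, not part of Druid.
// Ranges are half-open [start, end) in epoch milliseconds.
final class IntervalMatchSketch {

    /** True if the two ranges share at least one instant (partial or complete overlap). */
    static boolean overlaps(long aStart, long aEnd, long bStart, long bEnd) {
        return aStart < bEnd && bStart < aEnd;
    }

    /** True if the segment range lies entirely inside the query range. */
    static boolean contains(long queryStart, long queryEnd, long segStart, long segEnd) {
        return queryStart <= segStart && segEnd <= queryEnd;
    }
}
```

      Under these assumptions, a segment covering [5, 15) overlaps a query for [0, 10) but is not contained in it, which is why segments not returned by this method may still include data in the interval.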
    • retrieveUnusedSegmentsForInterval

      List<DataSegment> retrieveUnusedSegmentsForInterval(String dataSource, org.joda.time.Interval interval, @Nullable List<String> versions, @Nullable Integer limit, @Nullable org.joda.time.DateTime maxUsedStatusLastUpdatedTime)
      Retrieve all published segments which include ONLY data within the given interval and are marked as unused from the metadata store.
      Parameters:
      dataSource - The data source the segments belong to
      interval - Filter the data segments to ones that include data in this interval exclusively.
      versions - An optional list of segment versions to retrieve in the given interval. If unspecified, all versions of unused segments in the interval must be retrieved. If an empty list is passed, no segments are retrieved.
      limit - The maximum number of unused segments to retrieve. If null, no limit is applied.
      maxUsedStatusLastUpdatedTime - The maximum used_status_last_updated time. Any unused segment in the interval with used_status_last_updated no later than this time will be included in the kill task. Segments without a used_status_last_updated time (due to an upgrade from legacy Druid) will have maxUsedStatusLastUpdatedTime ignored.
      Returns:
      DataSegments which include ONLY data within the requested interval and are marked as unused. Segments NOT returned here may include data in the interval
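      The versions parameter distinguishes null (no version filter) from an empty list (match nothing). A minimal sketch of that convention, using plain strings for versions. VersionFilterSketch and filterByVersions are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Druid.
final class VersionFilterSketch {

    /**
     * Returns the candidate versions that pass the filter.
     * A null filter accepts every candidate; an empty filter accepts none.
     */
    static List<String> filterByVersions(List<String> candidateVersions, List<String> versions) {
        if (versions == null) {
            return new ArrayList<>(candidateVersions);
        }
        List<String> matched = new ArrayList<>();
        for (String candidate : candidateVersions) {
            if (versions.contains(candidate)) {
                matched.add(candidate);
            }
        }
        return matched;
    }
}
```

      Callers that build the list dynamically may want to check for the empty case explicitly, since it silently selects no segments.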
    • retrieveUnusedSegmentsWithExactInterval

      List<DataSegment> retrieveUnusedSegmentsWithExactInterval(String dataSource, org.joda.time.Interval interval, org.joda.time.DateTime maxUpdatedTime, int limit)
      Retrieves unused segments from the metadata store that match the given interval exactly. There is no guarantee on the order of segments in the list or on whether the limited list contains the highest or lowest segment IDs in the interval.
      Parameters:
      interval - Returned segments must exactly match this interval.
      maxUpdatedTime - Returned segments must have a used_status_last_updated which is either null or earlier than this value.
      limit - Maximum number of segments to return.
      Returns:
      Unsorted list of unused segments that match the given parameters.
    • retrieveSegmentsById

      Set<DataSegment> retrieveSegmentsById(String dataSource, Set<String> segmentIds)
      Retrieves segments for the given IDs, regardless of their visibility (visible, overshadowed or unused).
    • markSegmentAsUnused

      boolean markSegmentAsUnused(SegmentId segmentId)
      Marks the segment as unused.
      Returns:
      true if the segment was updated, false otherwise
    • markSegmentsAsUnused

      int markSegmentsAsUnused(String dataSource, Set<SegmentId> segmentIds)
      Marks the given segments as unused.
      Returns:
      Number of segments updated
    • markAllSegmentsAsUnused

      int markAllSegmentsAsUnused(String dataSource)
      Marks all the segments in the given datasource as unused.
      Returns:
      Number of updated segments
    • markSegmentsWithinIntervalAsUnused

      int markSegmentsWithinIntervalAsUnused(String dataSource, org.joda.time.Interval interval, @Nullable List<String> versions)
      Marks segments that are fully contained in the given interval as unused.
      Parameters:
      versions - Optional list of segment versions eligible for update. If this list is passed as null, all segment versions are eligible for update. If passed as empty, no segment is updated.
      Returns:
      Number of segments updated
    • commitSegments

      Set<DataSegment> commitSegments(Set<DataSegment> segments, @Nullable SegmentSchemaMapping segmentSchemaMapping)
      Attempts to insert a set of segments and corresponding schema to the metadata storage. Returns the set of segments actually added (segments with identifiers already in the metadata storage will not be added).
      Parameters:
      segments - set of segments to add
      segmentSchemaMapping - segment schema information to add
      Returns:
      set of segments actually added
    • allocatePendingSegments

      Map<SegmentCreateRequest,SegmentIdWithShardSpec> allocatePendingSegments(String dataSource, org.joda.time.Interval interval, boolean skipSegmentLineageCheck, List<SegmentCreateRequest> requests, boolean reduceMetadataIO)
      Allocates pending segments for the given requests in the pending segments table. The segment id allocated for a request will not be given out again unless a request is made with the same SegmentCreateRequest.
      Parameters:
      dataSource - dataSource for which to allocate a segment
      interval - interval for which to allocate a segment
      skipSegmentLineageCheck - if false, perform lineage validation using previousSegmentId for this sequence. Should be set to false if replica tasks would index events in the same order.
      requests - Requests for which to allocate segments. All the requests must share the same partition space.
      reduceMetadataIO - If true, try to use the segment ids instead of fetching every segment payload from the metadata store
      Returns:
      Map from request to allocated segment id. The map does not contain entries for failed requests.
    • getSegmentTimelineForAllocation

      SegmentTimeline getSegmentTimelineForAllocation(String dataSource, org.joda.time.Interval interval, boolean skipSegmentPayloadFetchForAllocation)
      Returns a segment timeline of all used segments, including overshadowed ones, for a given datasource and interval. If skipSegmentPayloadFetchForAllocation is set to true, the segment payloads are not all fetched for allocation. Instead, the ids and numCorePartitions are fetched using exactly one segment per version per interval, and a dummy DataSegment is returned for each id that holds only the SegmentId and a NumberedShardSpec with numCorePartitions.
    • allocatePendingSegment

      @Nullable SegmentIdWithShardSpec allocatePendingSegment(String dataSource, org.joda.time.Interval interval, boolean skipSegmentLineageCheck, SegmentCreateRequest createRequest)
      Allocate a new pending segment in the pending segments table. This segment identifier will never be given out again, unless another call is made with the same dataSource, sequenceName, and previousSegmentId.

      The sequenceName and previousSegmentId parameters are meant to make it easy for two independent ingestion tasks to produce the same series of segments.

      Note that a segment sequence may include segments with a variety of different intervals and versions.

      Parameters:
      dataSource - dataSource for which to allocate a segment
      interval - interval for which to allocate a segment
      skipSegmentLineageCheck - if false, perform lineage validation using previousSegmentId for this sequence. Should be set to false if replica tasks would index events in the same order.
      Returns:
      the pending segment identifier, or null if it was impossible to allocate a new segment
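      The allocation contract is idempotent per (dataSource, sequenceName, previousSegmentId): repeating a call returns the same identifier, and any other call gets a fresh one. A minimal in-memory sketch of that contract follows. PendingSegmentAllocatorSketch and the identifier format are hypothetical, and the real implementation persists to the pending segments table rather than a map:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the pending segments table.
final class PendingSegmentAllocatorSketch {
    private final Map<List<String>, String> allocated = new HashMap<>();
    private int nextPartition = 0;

    /** Returns the existing identifier for this key, or allocates a new one. */
    String allocate(String dataSource, String sequenceName, String previousSegmentId) {
        // Arrays.asList tolerates a null previousSegmentId (first segment of a sequence).
        List<String> key = Arrays.asList(dataSource, sequenceName, previousSegmentId);
        String existing = allocated.get(key);
        if (existing != null) {
            return existing;
        }
        // Illustrative identifier format only; real segment ids look different.
        String id = dataSource + "_pending_" + nextPartition++;
        allocated.put(key, id);
        return id;
    }
}
```

      This is what lets two independent replica tasks that walk the same sequence end up with the same series of segment identifiers.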
    • deletePendingSegmentsCreatedInInterval

      int deletePendingSegmentsCreatedInInterval(String dataSource, org.joda.time.Interval deleteInterval)
      Delete pending segments created in the given interval belonging to the given data source from the pending segments table. The created_date field of the pending segments table is checked to find segments to be deleted.

      Note that the semantics of the interval (for `created_date`s) differ from the semantics of the interval parameters in some other methods in this class, such as retrieveUsedSegmentsForInterval(java.lang.String, org.joda.time.Interval, org.apache.druid.indexing.overlord.Segments) (where the interval is about the time column value in rows belonging to the segment).

      Parameters:
      dataSource - dataSource
      deleteInterval - interval to check the created_date of pendingSegments
      Returns:
      number of deleted pending segments
    • deletePendingSegments

      int deletePendingSegments(String dataSource)
      Delete all pending segments belonging to the given data source from the pending segments table.
      Returns:
      number of deleted pending segments
    • commitSegmentsAndMetadata

      SegmentPublishResult commitSegmentsAndMetadata(Set<DataSegment> segments, @Nullable String supervisorId, @Nullable DataSourceMetadata startMetadata, @Nullable DataSourceMetadata endMetadata, @Nullable SegmentSchemaMapping segmentSchemaMapping)
      Attempts to insert a set of segments and corresponding schema to the metadata storage. Returns the set of segments actually added (segments with identifiers already in the metadata storage will not be added).

      If startMetadata and endMetadata are set, this insertion will be atomic with a compare-and-swap on dataSource commit metadata.

      If segmentsToDrop is not null and not empty, this insertion will be atomic with an insert-and-drop on inserting segments and dropping segmentsToDrop.

      Parameters:
      supervisorId - supervisorID which is committing the segments. Cannot be null if startMetadata and endMetadata are both non-null.
      segments - set of segments to add, must all be from the same dataSource
      startMetadata - dataSource metadata pre-insert must match this startMetadata according to DataSourceMetadata.matches(DataSourceMetadata). If null, this insert will not involve a metadata transaction
      endMetadata - dataSource metadata post-insert will have this endMetadata merged in with DataSourceMetadata.plus(DataSourceMetadata). If null, this insert will not involve a metadata transaction
      segmentSchemaMapping - segment schema information to persist.
      Returns:
      segment publish result indicating transaction success or failure, and set of segments actually published. This method must only return a failure code if it is sure that the transaction did not happen. If it is not sure, it must throw an exception instead.
      Throws:
      IllegalArgumentException - if exactly one of startMetadata and endMetadata is null (they must be either both null or both non-null)
      RuntimeException - if the state of metadata storage after this call is unknown
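      The compare-and-swap on commit metadata can be sketched with an in-memory store. This is a simplified illustration under stated assumptions: metadata is modelled as a partition-to-offset map, DataSourceMetadata.matches is approximated by plain equality, DataSourceMetadata.plus by a map merge, and a missing stored entry is treated as matching any startMetadata. MetadataCasSketch and its methods are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the dataSource metadata table.
final class MetadataCasSketch {
    // supervisorId -> committed offsets (partition -> offset)
    private final Map<String, Map<Integer, Long>> store = new HashMap<>();

    /** Returns true if the commit was applied, false if it was definitely not applied. */
    boolean commit(String supervisorId, Map<Integer, Long> start, Map<Integer, Long> end) {
        if ((start == null) != (end == null)) {
            throw new IllegalArgumentException("start and end metadata must both be null or both be non-null");
        }
        if (start == null) {
            return true; // no metadata transaction involved
        }
        if (supervisorId == null) {
            throw new IllegalArgumentException("supervisorId is required when metadata is committed");
        }
        Map<Integer, Long> current = store.get(supervisorId);
        // Compare: the stored state must match the caller's expected start state.
        if (current != null && !current.equals(start)) {
            return false;
        }
        // Swap: merge the end state into the stored state.
        Map<Integer, Long> merged = new HashMap<>(current == null ? start : current);
        merged.putAll(end);
        store.put(supervisorId, merged);
        return true;
    }

    Map<Integer, Long> get(String supervisorId) {
        return store.get(supervisorId);
    }
}
```

      A task holding a stale startMetadata gets a definite failure here. In a real store, an error that leaves the outcome unknown should surface as an exception instead, as the Returns and Throws clauses above require.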
    • commitAppendSegments

      SegmentPublishResult commitAppendSegments(Set<DataSegment> appendSegments, Map<DataSegment,ReplaceTaskLock> appendSegmentToReplaceLock, String taskAllocatorId, @Nullable SegmentSchemaMapping segmentSchemaMapping)
      Commits segments and corresponding schema created by an APPEND task. This method also handles segment upgrade scenarios that may result from concurrent append and replace.
      • If a REPLACE task committed a segment that overlaps with any of the appendSegments while this APPEND task was in progress, the appendSegments are upgraded to the version of the replace segment.
      • If an appendSegment is covered by a currently active REPLACE lock, then an entry is created for it in the upgrade_segments table, so that when the REPLACE task finishes, it can upgrade the appendSegment as required.
      Parameters:
      appendSegments - All segments created by an APPEND task that must be committed in a single transaction.
      appendSegmentToReplaceLock - Map from append segment to the currently active REPLACE lock (if any) covering it
      taskAllocatorId - allocator id of the task committing the segments to be appended
      segmentSchemaMapping - schema of append segments
    • commitAppendSegmentsAndMetadata

      SegmentPublishResult commitAppendSegmentsAndMetadata(Set<DataSegment> appendSegments, Map<DataSegment,ReplaceTaskLock> appendSegmentToReplaceLock, @Nullable String supervisorId, DataSourceMetadata startMetadata, DataSourceMetadata endMetadata, String taskAllocatorId, @Nullable SegmentSchemaMapping segmentSchemaMapping)
      Commits segments created by an APPEND task. This method also handles segment upgrade scenarios that may result from concurrent append and replace. Also commits start and end DataSourceMetadata.
    • commitReplaceSegments

      SegmentPublishResult commitReplaceSegments(Set<DataSegment> replaceSegments, Set<ReplaceTaskLock> locksHeldByReplaceTask, @Nullable SegmentSchemaMapping segmentSchemaMapping)
      Commits segments and corresponding schema created by a REPLACE task. This method also handles the segment upgrade scenarios that may result from concurrent append and replace.
      Parameters:
      replaceSegments - All segments created by a REPLACE task that must be committed in a single transaction.
      locksHeldByReplaceTask - All active non-revoked REPLACE locks held by the task
      segmentSchemaMapping - Segment schema to add.
    • retrieveDataSourceMetadata

      @Nullable DataSourceMetadata retrieveDataSourceMetadata(String supervisorId)
      Retrieves DataSourceMetadata entry for supervisorId from the metadata store. Returns null if there is no metadata.
    • deleteDataSourceMetadata

      boolean deleteDataSourceMetadata(String supervisorId)
      Removes entry for supervisorId from the dataSource metadata table.
      Parameters:
      supervisorId - identifier
      Returns:
      true if the entry was deleted, false otherwise
    • resetDataSourceMetadata

      boolean resetDataSourceMetadata(String supervisorId, DataSourceMetadata dataSourceMetadata) throws IOException
      Resets DataSourceMetadata entry for supervisorId to the one supplied.
      Parameters:
      supervisorId - identifier
      dataSourceMetadata - value to set
      Returns:
      true if the entry was reset, false otherwise
      Throws:
      IOException
    • insertDataSourceMetadata

      boolean insertDataSourceMetadata(String supervisorId, DataSourceMetadata dataSourceMetadata)
      Insert DataSourceMetadata entry for supervisorId.
      Parameters:
      supervisorId - identifier
      dataSourceMetadata - value to set
      Returns:
      true if the entry was inserted, false otherwise
    • removeDataSourceMetadataOlderThan

      int removeDataSourceMetadataOlderThan(long timestamp, @NotNull Set<String> excludeSupervisorIds)
      Remove supervisors' datasource metadata created before the given timestamp and not in the given excludeSupervisorIds set.
      Parameters:
      timestamp - timestamp in milliseconds
      excludeSupervisorIds - set of supervisor ids to exclude from removal
      Returns:
      number of datasource metadata removed
    • commitMetadataOnly

      SegmentPublishResult commitMetadataOnly(String supervisorId, String dataSource, DataSourceMetadata startMetadata, DataSourceMetadata endMetadata)
      Similar to commitSegments(java.util.Set<org.apache.druid.timeline.DataSegment>, org.apache.druid.segment.SegmentSchemaMapping), but meant for streaming ingestion tasks for handling the case where the task ingested no records and created no segments, but still needs to update the metadata with the progress that the task made.

      The metadata should undergo the same validation checks as performed by commitSegments(java.util.Set<org.apache.druid.timeline.DataSegment>, org.apache.druid.segment.SegmentSchemaMapping).

      Parameters:
      supervisorId - the supervisorId
      dataSource - the dataSource
      startMetadata - dataSource metadata pre-insert must match this startMetadata according to DataSourceMetadata.matches(DataSourceMetadata).
      endMetadata - dataSource metadata post-insert will have this endMetadata merged in with DataSourceMetadata.plus(DataSourceMetadata).
      Returns:
      segment publish result indicating transaction success or failure. This method must only return a failure code if it is sure that the transaction did not happen. If it is not sure, it must throw an exception instead.
      Throws:
      IllegalArgumentException - if either startMetadata or endMetadata is null
      RuntimeException - if the state of metadata storage after this call is unknown
    • updateSegmentMetadata

      void updateSegmentMetadata(Set<DataSegment> segments)
    • deleteSegments

      int deleteSegments(Set<DataSegment> segments)
      Deletes unused segments from the metadata store.
      Returns:
      Number of segments actually deleted.
    • retrieveSegmentForId

      DataSegment retrieveSegmentForId(SegmentId segmentId)
      Retrieves the segment for a given id from the metadata store. Returns null if no such segment exists.
      The retrieval also considers the set of unused segments in the metadata store. Unused segments could be deleted by a kill task at any time and might lead to unexpected behaviour. This option exists mainly to provide a consistent view of the metadata, for example, in calls from MSQ controller and worker and would generally not be required.
    • retrieveUsedSegmentForId

      DataSegment retrieveUsedSegmentForId(SegmentId segmentId)
    • deleteUpgradeSegmentsForTask

      int deleteUpgradeSegmentsForTask(String taskId)
      Deletes entries from the upgrade segments table after the corresponding replace task has ended.
      Parameters:
      taskId - id of the task with replace locks
      Returns:
      number of deleted entries from the metadata store
    • deletePendingSegmentsForTaskAllocatorId

      int deletePendingSegmentsForTaskAllocatorId(String datasource, String taskAllocatorId)
      Deletes pending segments for a given task group after all the tasks belonging to it have completed.
      Parameters:
      datasource - datasource of the task
      taskAllocatorId - task id / task group / replica group for an appending task
      Returns:
      number of pending segments deleted from the metadata store
    • getPendingSegments

      List<PendingSegmentRecord> getPendingSegments(String datasource, org.joda.time.Interval interval)
      Fetches all the pending segments of the datasource that overlap with a given interval.
      Parameters:
      datasource - datasource to be queried
      interval - interval with which segments overlap
      Returns:
      List of pending segment records
    • retrieveUpgradedFromSegmentIds

      Map<String,String> retrieveUpgradedFromSegmentIds(String dataSource, Set<String> segmentIds)
      Map from a segment ID to the segment ID from which it was upgraded. There should be no entry in the map for an original non-upgraded segment.
      Parameters:
      dataSource - data source
      segmentIds - ids of segments
    • retrieveUpgradedToSegmentIds

      Map<String,Set<String>> retrieveUpgradedToSegmentIds(String dataSource, Set<String> segmentIds)
      Map from a segment ID to a set containing (1) all segment IDs that were upgraded from it and are still present in the metadata store, and (2) the segment ID itself if and only if it is still present in the metadata store.
      Parameters:
      dataSource - data source
      segmentIds - ids of the first segments which had the corresponding load spec
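      The "upgraded to" map is essentially the inverse of the "upgraded from" map, restricted to segments still present in the metadata store. A minimal sketch under that reading follows. UpgradeLineageSketch and its parameters are hypothetical, and whether the real method returns an entry with an empty set for an ID with nothing left in the store is not stated here; the sketch keeps such an entry:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Druid.
final class UpgradeLineageSketch {

    /**
     * @param requestedIds ids to look up
     * @param presentIds   ids still present in the (simulated) metadata store
     * @param upgradedFrom map from an upgraded segment id to the id it was upgraded from
     */
    static Map<String, Set<String>> upgradedTo(
            Set<String> requestedIds,
            Set<String> presentIds,
            Map<String, String> upgradedFrom) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String id : requestedIds) {
            Set<String> ids = new HashSet<>();
            if (presentIds.contains(id)) {
                ids.add(id); // the id itself, only if still present
            }
            result.put(id, ids);
        }
        for (Map.Entry<String, String> entry : upgradedFrom.entrySet()) {
            Set<String> ids = result.get(entry.getValue());
            if (ids != null && presentIds.contains(entry.getKey())) {
                ids.add(entry.getKey()); // an upgraded copy that is still present
            }
        }
        return result;
    }
}
```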
    • iterateAllUnusedSegmentsForDatasource

      List<DataSegmentPlus> iterateAllUnusedSegmentsForDatasource(String datasource, @Nullable org.joda.time.Interval interval, @Nullable Integer limit, @Nullable String lastSegmentId, @Nullable SortOrder sortOrder)
      Returns a list of unused segments and their associated metadata for a given datasource over an optional interval. The order in which segments are iterated is from earliest start-time, with ties being broken with earliest end-time first. Note: the iteration may not be as trivially cheap as for example, iteration over an ArrayList. Try (to some reasonable extent) to organize the code so that it iterates the returned iterable only once rather than several times.
      Parameters:
      datasource - the name of the datasource.
      interval - an optional interval to search over. If none is specified, Intervals.ETERNITY is used.
      limit - an optional maximum number of results to return. If none is specified, the results are not limited.
      lastSegmentId - an optional last segment id from which to search for results. All segments returned are > this segment lexicographically if sortOrder is null or SortOrder.ASC, or < this segment lexicographically if sortOrder is SortOrder.DESC. If none is specified, no such filter is used.
      sortOrder - an optional order with which to return the matching segments by id, start time, end time. If none is specified, the order of the results is not guaranteed.
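      The lastSegmentId and limit parameters together allow keyset-style paging. A minimal sketch of the filter follows, simplified to sort by segment id only and to take a boolean in place of SortOrder. UnusedSegmentPagingSketch and page are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Druid. Sorts by id only, for illustration.
final class UnusedSegmentPagingSketch {

    /** Returns up to limit ids strictly after (ASC) or before (DESC) lastSegmentId. */
    static List<String> page(List<String> ids, String lastSegmentId, boolean descending, Integer limit) {
        List<String> matched = new ArrayList<>();
        for (String id : ids) {
            boolean pastCursor = lastSegmentId == null
                    || (descending ? id.compareTo(lastSegmentId) < 0 : id.compareTo(lastSegmentId) > 0);
            if (pastCursor) {
                matched.add(id);
            }
        }
        Collections.sort(matched); // String.compareTo gives lexicographic order
        if (descending) {
            Collections.reverse(matched);
        }
        if (limit != null && matched.size() > limit) {
            return new ArrayList<>(matched.subList(0, limit));
        }
        return matched;
    }
}
```

      A caller would pass the last id of one page as lastSegmentId for the next call, keeping the same sort direction throughout.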
    • getUnusedSegmentIntervals

      List<org.joda.time.Interval> getUnusedSegmentIntervals(String dataSource, @Nullable org.joda.time.DateTime minStartTime, org.joda.time.DateTime maxEndTime, int limit, org.joda.time.DateTime maxUsedStatusLastUpdatedTime)
      Returns a list of up to limit unused segment intervals for the specified datasource. Segments are filtered based on the following criteria:
      • The start time of the segment must be no earlier than the specified minStartTime (if not null).
      • The end time of the segment must be no later than the specified maxEndTime.
      • The used_status_last_updated time of the segment must be no later than maxUsedStatusLastUpdatedTime. Segments that have no used_status_last_updated time (due to an upgrade from legacy Druid) will have maxUsedStatusLastUpdatedTime ignored.
      Returns:
      list of intervals ordered by segment start time and then by end time. Note that the list may contain duplicate intervals.
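      The three filter criteria above can be read as a single predicate. A minimal sketch follows, using epoch milliseconds in place of org.joda.time.DateTime and a nullable Long for a missing used_status_last_updated value. UnusedSegmentFilterSketch and isEligible are hypothetical names:

```java
// Hypothetical helper, not part of Druid. All times are epoch milliseconds.
final class UnusedSegmentFilterSketch {

    static boolean isEligible(
            long segmentStart,
            long segmentEnd,
            Long usedStatusLastUpdated,   // null for segments from legacy Druid
            Long minStartTime,            // null means no lower bound
            long maxEndTime,
            long maxUsedStatusLastUpdatedTime) {
        if (minStartTime != null && segmentStart < minStartTime) {
            return false; // starts earlier than minStartTime
        }
        if (segmentEnd > maxEndTime) {
            return false; // ends later than maxEndTime
        }
        // A missing used_status_last_updated bypasses the last check.
        return usedStatusLastUpdated == null
                || usedStatusLastUpdated <= maxUsedStatusLastUpdatedTime;
    }
}
```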
    • retrieveUnusedSegmentIntervals

      List<org.joda.time.Interval> retrieveUnusedSegmentIntervals(String dataSource, int limit)
      Retrieves intervals of the specified datasource that contain any unused segments. There is no guarantee on the order of intervals in the list or on whether the limited list contains the earliest or latest intervals of the datasource.
      Returns:
      Unsorted list of unused segment intervals containing up to limit entries.
    • markAllNonOvershadowedSegmentsAsUsed

      int markAllNonOvershadowedSegmentsAsUsed(String dataSource)
      Returns the number of segment entries in the database whose state was changed as the result of this call (that is, the segments were marked as used). If the call results in a database error, an exception is relayed to the caller.
      Returns:
      Number of segments updated in the metadata store
    • markNonOvershadowedSegmentsAsUsed

      int markNonOvershadowedSegmentsAsUsed(String dataSource, org.joda.time.Interval interval, @Nullable List<String> versions)
      Marks non-overshadowed unused segments for the given interval and optional list of versions as used. If versions are not specified, all versions of non-overshadowed unused segments in the interval will be marked as used. If an empty list of versions is passed, no segments are marked as used.
      Returns:
      Number of segments updated in the metadata store
    • markNonOvershadowedSegmentsAsUsed

      int markNonOvershadowedSegmentsAsUsed(String dataSource, Set<SegmentId> segmentIds)
      Marks the given segment IDs as "used" only if they are not already overshadowed by other used segments. Qualifying segment IDs that are already marked as "used" are not updated.
      Returns:
      Number of segments updated
      Throws:
      DruidException - of category INVALID_INPUT if any of the given segment IDs do not exist in the metadata store.
    • markSegmentAsUsed

      boolean markSegmentAsUsed(SegmentId segmentId)
      Returns true if the state of the segment entry is changed in the database as the result of this call (that is, the segment was marked as used), false otherwise. If the call results in a database error, an exception is relayed to the caller.
    • validateDataSourceMetadata

      static void validateDataSourceMetadata(@Nullable String supervisorId, @Nullable DataSourceMetadata startMetadata, @Nullable DataSourceMetadata endMetadata)
      Validates the given supervisorId and given metadata to ensure that start/end metadata non-null implies supervisor ID is non-null.