Class SqlSegmentsMetadataQuery

java.lang.Object
org.apache.druid.metadata.SqlSegmentsMetadataQuery

public class SqlSegmentsMetadataQuery extends Object
An object that is used to query the segments table in the metadata store. Each instance of this class is scoped to a single Handle and is meant to be short-lived.
  • Method Details

    • forHandle

      public static SqlSegmentsMetadataQuery forHandle(org.skife.jdbi.v2.Handle handle, SQLMetadataConnector connector, MetadataStorageTablesConfig dbTables, com.fasterxml.jackson.databind.ObjectMapper jsonMapper)
      Create a query object. This instance is scoped to a single handle and is meant to be short-lived. It is okay to use it for more than one query, though.
    • nullAndEmptySafeDate

      @Nullable public static org.joda.time.DateTime nullAndEmptySafeDate(String date)
      Create a DateTime object from a string. If the string is null or empty, return null.
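      The null-and-empty handling described above can be sketched as follows. This is only an illustration: it uses java.time.Instant as a stand-in for org.joda.time.DateTime so that it runs without extra dependencies, and the class and method names are hypothetical.

```java
import java.time.Instant;

// Hypothetical stand-in; the real method returns org.joda.time.DateTime.
class NullSafeDateSketch {
    // Returns null for a null or empty string, otherwise parses an ISO-8601 instant.
    public static Instant nullAndEmptySafeInstant(String date) {
        if (date == null || date.isEmpty()) {
            return null;
        }
        return Instant.parse(date);
    }

    public static void main(String[] args) {
        System.out.println(nullAndEmptySafeInstant(null));
        System.out.println(nullAndEmptySafeInstant(""));
        System.out.println(nullAndEmptySafeInstant("2024-01-01T00:00:00Z"));
    }
}
```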
    • retrieveUsedSegments

      public CloseableIterator<DataSegment> retrieveUsedSegments(String dataSource, Collection<org.joda.time.Interval> intervals)
      Retrieves segments for a given datasource that are marked used (i.e. published) in the metadata store, and that *overlap* any interval in a particular collection of intervals. If the collection of intervals is empty, this method will retrieve all used segments.

      You cannot assume that segments returned by this call are actually active. Because there is some delay between new segment publishing and the marking-unused of older segments, it is possible that some segments returned by this call are overshadowed by other segments. To check for this, use SegmentTimeline.forSegments(Iterable).

      This call does not return any information about realtime segments.

      Returns:
      a closeable iterator. You should close it when you are done.
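      The used-segment methods match segments that overlap a search interval, while the unused-segment methods further down match segments that are fully contained by it. A minimal sketch of the difference, assuming half-open [start, end) intervals as in Joda-Time; the Span class and method names are hypothetical and this is not the SQL that the class generates:

```java
// Hypothetical illustration of the two interval match modes.
class IntervalMatchSketch {
    // Half-open interval [startMillis, endMillis).
    static final class Span {
        final long startMillis;
        final long endMillis;

        Span(long startMillis, long endMillis) {
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }
    }

    // True if the two spans share at least one instant; abutting spans do not overlap.
    public static boolean overlaps(Span segment, Span search) {
        return segment.startMillis < search.endMillis && search.startMillis < segment.endMillis;
    }

    // True if the segment span lies entirely inside the search span.
    public static boolean isContainedBy(Span segment, Span search) {
        return search.startMillis <= segment.startMillis && segment.endMillis <= search.endMillis;
    }

    public static void main(String[] args) {
        Span search = new Span(0, 10);
        Span straddling = new Span(5, 15);
        // A segment that straddles the search boundary overlaps it but is not contained by it.
        System.out.println(overlaps(straddling, search));
        System.out.println(isContainedBy(straddling, search));
    }
}
```

      A segment that straddles the boundary of the search interval would therefore be returned by an overlap query but skipped by a containment query.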
    • retrieveUsedSegments

      public CloseableIterator<DataSegment> retrieveUsedSegments(String dataSource, Collection<org.joda.time.Interval> intervals, List<String> versions)
      Similar to retrieveUsedSegments(java.lang.String, java.util.Collection<org.joda.time.Interval>), but with an additional versions argument. When versions is specified, all used segments in the specified intervals and versions are retrieved.
    • retrieveUsedSegmentsPlus

      public CloseableIterator<DataSegmentPlus> retrieveUsedSegmentsPlus(String dataSource, Collection<org.joda.time.Interval> intervals)
    • retrieveHighestUnusedSegmentId

      @Nullable public SegmentId retrieveHighestUnusedSegmentId(String datasource, org.joda.time.Interval interval, String version)
      Retrieves the ID of the unused segment that has the highest partition number amongst all unused segments that exactly match the given interval and version.
      Returns:
      null if no unused segment exists for the given parameters.
    • iterateAllUnusedSegmentsForDatasource

      public List<DataSegmentPlus> iterateAllUnusedSegmentsForDatasource(String datasource, @Nullable org.joda.time.Interval interval, @Nullable Integer limit, @Nullable String lastSegmentId, @Nullable SortOrder sortOrder)
      Retrieves segments and their associated metadata for a given datasource that are marked unused and that are *fully contained by* an optionally specified interval. If the interval specified is null, this method will retrieve all unused segments. This call does not return any information about realtime segments.
      Parameters:
      datasource - The name of the datasource
      interval - an optional interval to search over.
      limit - an optional maximum number of results to return. If none is specified, the results are not limited.
      lastSegmentId - an optional last segment id from which to search for results. All segments returned are > this segment lexicographically if sortOrder is null or SortOrder.ASC, or < this segment lexicographically if sortOrder is SortOrder.DESC. If none is specified, no such filter is used.
      sortOrder - an optional order with which to return the matching segments by id, start time, end time. If none is specified, the order of the results is not guaranteed.
      Returns:
      a list of the matching unused segments and their associated metadata.
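      The interaction of lastSegmentId, sortOrder and limit amounts to keyset pagination over segment ids. The sketch below works on an in-memory list of id strings under a few assumptions: the Order enum is a hypothetical stand-in for SortOrder, ordering is by id only, and a null order is treated as ascending purely to keep the output deterministic (the method itself makes no ordering guarantee in that case).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory illustration; the real method queries the metadata store.
class SegmentIdPagingSketch {
    enum Order { ASC, DESC }

    public static List<String> page(List<String> ids, String lastSegmentId, Order order, Integer limit) {
        boolean descending = order == Order.DESC;
        List<String> result = new ArrayList<>();
        for (String id : ids) {
            if (lastSegmentId == null) {
                // No cursor given: every id is eligible.
                result.add(id);
            } else {
                int cmp = id.compareTo(lastSegmentId);
                // Keep ids strictly after the cursor (ASC or null) or strictly before it (DESC).
                if (descending ? cmp < 0 : cmp > 0) {
                    result.add(id);
                }
            }
        }
        result.sort(descending ? Comparator.<String>reverseOrder() : Comparator.<String>naturalOrder());
        if (limit != null && result.size() > limit) {
            return new ArrayList<>(result.subList(0, limit));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("ds_a", "ds_b", "ds_c", "ds_d");
        // Next page of size 1 after "ds_b" in ascending order.
        System.out.println(page(ids, "ds_b", Order.ASC, 1));
    }
}
```

      A caller would typically pass the last id of the previous page as lastSegmentId to fetch the next page.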
    • findUnusedSegments

      public List<DataSegment> findUnusedSegments(String dataSource, org.joda.time.Interval interval, @Nullable List<String> versions, @Nullable Integer limit, @Nullable org.joda.time.DateTime maxUpdatedTime)
      Retrieves unused segments that are fully contained within the given interval.
      Parameters:
      interval - Returned segments must be fully contained within this interval
      versions - Optional list of segment versions. If passed as null, all segment versions are eligible.
      limit - Maximum number of segments to return. If passed as null, all segments are returned.
      maxUpdatedTime - Returned segments must have a used_status_last_updated which is either null or earlier than this value.
    • retrieveUnusedSegments

      public CloseableIterator<DataSegment> retrieveUnusedSegments(String dataSource, Collection<org.joda.time.Interval> intervals, @Nullable List<String> versions, @Nullable Integer limit, @Nullable String lastSegmentId, @Nullable SortOrder sortOrder, @Nullable org.joda.time.DateTime maxUsedStatusLastUpdatedTime)
      Retrieves segments for a given datasource that are marked unused and that are fully contained by any interval in a particular collection of intervals. If the collection of intervals is empty, this method will retrieve all unused segments.

      This call does not return any information about realtime segments.

      Parameters:
      dataSource - The name of the datasource
      intervals - The intervals to search over
      versions - An optional list of unused segment versions to retrieve in the given intervals. If unspecified, all versions of unused segments in the intervals are retrieved. If an empty list is passed, no segments are retrieved.
      limit - The maximum number of segments to return
      lastSegmentId - the last segment id from which to search for results. All segments returned are > this segment lexicographically if sortOrder is null or ASC, or < this segment lexicographically if sortOrder is DESC.
      sortOrder - Specifies the order with which to return the matching segments by start time, end time. A null value indicates that order does not matter.
      maxUsedStatusLastUpdatedTime - The maximum used_status_last_updated time. Any unused segment in intervals with used_status_last_updated no later than this time will be included in the iterator. For segments that have no used_status_last_updated time (due to an upgrade from legacy Druid), maxUsedStatusLastUpdatedTime is ignored.
      Returns:
      a closeable iterator. You should close it when you are done.
    • retrieveUnusedSegmentsPlus

      public CloseableIterator<DataSegmentPlus> retrieveUnusedSegmentsPlus(String dataSource, Collection<org.joda.time.Interval> intervals, @Nullable List<String> versions, @Nullable Integer limit, @Nullable String lastSegmentId, @Nullable SortOrder sortOrder, @Nullable org.joda.time.DateTime maxUsedStatusLastUpdatedTime)
      Similar to retrieveUnusedSegments(java.lang.String, java.util.Collection<org.joda.time.Interval>, java.util.List<java.lang.String>, java.lang.Integer, java.lang.String, org.apache.druid.metadata.SortOrder, org.joda.time.DateTime), but also retrieves associated metadata for the segments for a given datasource that are marked unused and that are fully contained by any interval in a particular collection of intervals. If the collection of intervals is empty, this method will retrieve all unused segments. This call does not return any information about realtime segments.
      Parameters:
      dataSource - The name of the datasource
      intervals - The intervals to search over
      limit - The maximum number of segments to return
      lastSegmentId - the last segment id from which to search for results. All segments returned are > this segment lexicographically if sortOrder is null or ASC, or < this segment lexicographically if sortOrder is DESC.
      sortOrder - Specifies the order with which to return the matching segments by start time, end time. A null value indicates that order does not matter.
      maxUsedStatusLastUpdatedTime - The maximum used_status_last_updated time. Any unused segment in intervals with used_status_last_updated no later than this time will be included in the iterator. For segments that have no used_status_last_updated time (due to an upgrade from legacy Druid), maxUsedStatusLastUpdatedTime is ignored.
      Returns:
      a closeable iterator. You should close it when you are done.
    • retrieveUsedSegmentIds

      public Set<SegmentId> retrieveUsedSegmentIds(String dataSource, org.joda.time.Interval interval)
      Retrieves IDs of used segments that belong to the datasource and overlap the given interval.
    • retrieveSegmentsById

      public List<DataSegmentPlus> retrieveSegmentsById(String datasource, Set<SegmentId> segmentIds)
      Retrieves segments for the given segment IDs from the metadata store.
    • retrieveSegmentsByIdIterator

      public CloseableIterator<DataSegmentPlus> retrieveSegmentsByIdIterator(String datasource, Set<SegmentId> segmentIds, boolean includeSchemaInfo)
      Retrieves segments for the specified IDs in batches of a small size.
      Parameters:
      includeSchemaInfo - If true, additional metadata info such as number of rows and schema fingerprint is also retrieved
      Returns:
      CloseableIterator over the retrieved segments which must be closed once the result has been handled. If the iterator is closed while reading a batch of segments, queries for subsequent batches are not fired.
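      The batching and close behaviour described above can be sketched with a lazy iterator that fetches one batch at a time and stops issuing fetches once it is closed. Everything here is hypothetical (class names, the fetch function, the batchesFetched counter); it illustrates the pattern and is not the class's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical illustration of lazy, batched lookups behind a closeable iterator.
class BatchedLookupSketch {
    static final class BatchingIterator<K, V> implements Iterator<V>, AutoCloseable {
        private final List<K> keys;
        private final int batchSize;
        private final Function<List<K>, List<V>> fetchBatch;
        private Iterator<V> currentBatch = Collections.emptyIterator();
        private int nextKeyIndex = 0;
        private boolean closed = false;
        int batchesFetched = 0; // exposed only so the sketch can be observed

        BatchingIterator(List<K> keys, int batchSize, Function<List<K>, List<V>> fetchBatch) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.keys = keys;
            this.batchSize = batchSize;
            this.fetchBatch = fetchBatch;
        }

        @Override
        public boolean hasNext() {
            while (!currentBatch.hasNext()) {
                // Once closed, or once all keys are consumed, no further batch is fetched.
                if (closed || nextKeyIndex >= keys.size()) {
                    return false;
                }
                int end = Math.min(nextKeyIndex + batchSize, keys.size());
                currentBatch = fetchBatch.apply(keys.subList(nextKeyIndex, end)).iterator();
                nextKeyIndex = end;
                batchesFetched++;
            }
            return true;
        }

        @Override
        public V next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return currentBatch.next();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        List<Integer> keys = List.of(1, 2, 3, 4, 5);
        // try-with-resources guarantees close() even if the loop exits early.
        try (BatchingIterator<Integer, String> it = new BatchingIterator<>(keys, 2, batch -> {
            List<String> out = new ArrayList<>();
            for (Integer key : batch) {
                out.add("segment-" + key);
            }
            return out;
        })) {
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }
}
```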
    • retrieveSegmentsWithSchemaById

      public List<DataSegmentPlus> retrieveSegmentsWithSchemaById(String datasource, Set<SegmentId> segmentIds)
      Retrieves segments with additional metadata such as number of rows and schema fingerprint.
    • retrieveAllUsedSegmentSchemaFingerprints

      public Set<String> retrieveAllUsedSegmentSchemaFingerprints()
      Retrieves all used schema fingerprints present in the metadata store.
    • retrieveAllUsedSegmentSchemas

      public List<SegmentSchemaRecord> retrieveAllUsedSegmentSchemas()
      Retrieves all used segment schemas present in the metadata store irrespective of their last updated time.
    • retrieveUsedSegmentSchemasForFingerprints

      public List<SegmentSchemaRecord> retrieveUsedSegmentSchemasForFingerprints(Set<String> schemaFingerprints)
      Retrieves segment schemas from the metadata store for the given fingerprints.
    • markSegmentsAsUsed

      public int markSegmentsAsUsed(Set<SegmentId> segmentIds, org.joda.time.DateTime updateTime)
      Marks the given segment IDs as used.
      Parameters:
      segmentIds - Segment IDs to update. For better performance, ensure that these segment IDs are not already marked as used.
      updateTime - Updated segments will have their used_status_last_updated column set to this value
      Returns:
      Number of segments updated in the metadata store.
    • markSegmentsAsUnused

      public int markSegmentsAsUnused(Set<SegmentId> segmentIds, org.joda.time.DateTime updateTime)
      Marks the given segment IDs as unused.
      Parameters:
      segmentIds - Segment IDs to update. For better performance, ensure that these segment IDs are not already marked as unused.
      updateTime - Updated segments will have their used_status_last_updated column set to this value
      Returns:
      Number of segments updated in the metadata store.
    • markSegmentsUnused

      public int markSegmentsUnused(String dataSource, org.joda.time.Interval interval, @Nullable List<String> versions, org.joda.time.DateTime updateTime)
      Marks as unused all used segments that are fully contained by the given interval, optionally filtered by a list of versions.
      Parameters:
      interval - Only used segments fully contained within this interval are eligible to be marked as unused
      versions - List of eligible segment versions. If null or empty, all versions are considered eligible to be marked as unused.
      updateTime - Updated segments will have their used_status_last_updated column set to this value
      Returns:
      Number of segments updated.
    • markSegmentAsUsed

      public boolean markSegmentAsUsed(SegmentId segmentId, org.joda.time.DateTime updateTime)
    • markAllNonOvershadowedSegmentsAsUsed

      public int markAllNonOvershadowedSegmentsAsUsed(String dataSource, org.joda.time.DateTime updateTime)
    • markNonOvershadowedSegmentsAsUsed

      public int markNonOvershadowedSegmentsAsUsed(String dataSource, org.joda.time.Interval interval, @Nullable List<String> versions, org.joda.time.DateTime updateTime)
    • markNonOvershadowedSegmentsAsUsed

      public int markNonOvershadowedSegmentsAsUsed(String dataSource, Set<SegmentId> segmentIds, org.joda.time.DateTime updateTime)
    • retrieveUnusedSegmentIntervals

      public List<org.joda.time.Interval> retrieveUnusedSegmentIntervals(String dataSource, @Nullable org.joda.time.DateTime minStartTime, org.joda.time.DateTime maxEndTime, int limit, org.joda.time.DateTime maxUsedStatusLastUpdatedTime)
    • retrieveUnusedSegmentIntervals

      public List<org.joda.time.Interval> retrieveUnusedSegmentIntervals(String dataSource, int limit)
      Gets unused segment intervals for the specified datasource. There is no guarantee on the order of intervals in the list or on whether the limited list contains the earliest or latest intervals present in the datasource.
      Returns:
      List of unused segment intervals containing up to limit entries.
    • retrieveUnusedSegmentsWithExactInterval

      public List<DataSegment> retrieveUnusedSegmentsWithExactInterval(String dataSource, org.joda.time.Interval interval, org.joda.time.DateTime maxUpdatedTime, int limit)
      Retrieves unused segments that exactly match the given interval.
      Parameters:
      interval - Returned segments must exactly match this interval.
      maxUpdatedTime - Returned segments must have a used_status_last_updated which is either null or earlier than this value.
      limit - Maximum number of segments to return
    • retrieveUnusedSegmentVersionsWithInterval

      public Set<String> retrieveUnusedSegmentVersionsWithInterval(String dataSource, org.joda.time.Interval interval)
      Retrieves the versions of unused segments which are perfectly aligned with the given interval.
    • retrieveUsedSegmentForId

      @Nullable public DataSegment retrieveUsedSegmentForId(SegmentId segmentId)
      Retrieves the used segment for the given ID if it exists in the metadata store; returns null otherwise.
    • retrieveSegmentForId

      @Nullable public DataSegment retrieveSegmentForId(SegmentId segmentId)
      Retrieves the segment for the given ID if it exists in the metadata store; returns null otherwise.
    • retrievePendingSegmentIds

      public List<SegmentIdWithShardSpec> retrievePendingSegmentIds(String dataSource, String sequenceName, String sequencePreviousId)
    • retrievePendingSegmentIdsWithExactInterval

      public List<SegmentIdWithShardSpec> retrievePendingSegmentIdsWithExactInterval(String dataSource, String sequenceName, org.joda.time.Interval interval)
    • retrievePendingSegmentsWithExactInterval

      public List<PendingSegmentRecord> retrievePendingSegmentsWithExactInterval(String dataSource, org.joda.time.Interval interval)
    • retrievePendingSegmentsOverlappingInterval

      public List<PendingSegmentRecord> retrievePendingSegmentsOverlappingInterval(String dataSource, org.joda.time.Interval interval)
      Fetches from the metadata store all pending segments whose interval overlaps the given search interval.
    • retrievePendingSegmentsForTaskAllocatorId

      public List<PendingSegmentRecord> retrievePendingSegmentsForTaskAllocatorId(String dataSource, String taskAllocatorId)
    • getConditionForIntervalsAndMatchMode

      public static String getConditionForIntervalsAndMatchMode(Collection<org.joda.time.Interval> intervals, org.apache.druid.metadata.SqlSegmentsMetadataQuery.IntervalMode matchMode, String quoteString)
      Get the condition for the interval and match mode.
      Parameters:
      intervals - intervals to fetch the segments for
      matchMode - interval match mode: overlaps or contains
      quoteString - the connector-specific quote string
    • bindIntervalsToQuery

      public static void bindIntervalsToQuery(org.skife.jdbi.v2.Query<Map<String,Object>> query, Collection<org.joda.time.Interval> intervals)
      Bind the supplied intervals to query.
    • getParameterizedInConditionForColumn

      public static String getParameterizedInConditionForColumn(String columnName, List<String> values)
      Returns:
      a parameterized IN clause for the specified columnName. The column values need to be bound to a query by calling bindColumnValuesToQueryWithInCondition(String, List, SQLStatement).
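      The pairing of a parameterized IN clause with a later binding step can be sketched as below. The placeholder naming scheme (column name plus index) and the map of bindings are assumptions made for this illustration; they are not necessarily what this class generates, and the real binding step operates on a JDBI SQLStatement rather than a map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of building an IN clause with named placeholders.
class InClauseSketch {
    // Builds e.g. "version IN (:version0, :version1)" for two values.
    public static String parameterizedInCondition(String columnName, List<String> values) {
        if (values.isEmpty()) {
            // "col IN ()" is not valid SQL, so the sketch rejects empty input.
            throw new IllegalArgumentException("values must not be empty");
        }
        StringBuilder sb = new StringBuilder(columnName).append(" IN (");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(':').append(columnName).append(i);
        }
        return sb.append(')').toString();
    }

    // Pairs each placeholder name with its value, in the same order as the clause.
    public static Map<String, String> bindings(String columnName, List<String> values) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            result.put(columnName + i, values.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> versions = List.of("v1", "v2");
        System.out.println(parameterizedInCondition("version", versions));
        System.out.println(bindings("version", versions));
    }
}
```

      Keeping the values out of the SQL string and binding them separately avoids quoting problems and SQL injection, which is presumably why the clause and the binding are exposed as two steps.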
    • bindColumnValuesToQueryWithInCondition

      public static void bindColumnValuesToQueryWithInCondition(String columnName, List<String> values, org.skife.jdbi.v2.SQLStatement<?> query)
      Binds the provided list of values to the specified columnName in the given SQL query that contains an IN clause.