Class AbstractSegmentMetadataCache<T extends DataSourceInformation>

java.lang.Object
org.apache.druid.segment.metadata.AbstractSegmentMetadataCache<T>
Type Parameters:
T - The type of information associated with the data source, which must extend DataSourceInformation.
Direct Known Subclasses:
CoordinatorSegmentMetadataCache

public abstract class AbstractSegmentMetadataCache<T extends DataSourceInformation> extends Object
An abstract class that listens for segment change events and caches segment metadata. It periodically refreshes segments by fetching their metadata, which includes schema information, from sources such as data nodes, tasks, and the metadata database, and then builds the table schema.

At startup, the cache awaits the initialization of the timeline. If the cache uses a segment metadata query to retrieve segment schema, it attempts to refresh at most MAX_SEGMENTS_PER_QUERY segments for each datasource in each refresh cycle. Once all datasources have gone through this process, the initial schema of each datasource is constructed and the cache is marked as initialized. The cache then continues to periodically refresh segments and update the datasource schema. If a segment refresh fails, the refresh work is paused and resumes in the next refresh cycle.
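The per-datasource cap described above can be illustrated with a minimal sketch. It does not use Druid classes: RefreshBatchSketch, nextBatch, and the String segment ids are hypothetical stand-ins, and the limit is passed in rather than read from MAX_SEGMENTS_PER_QUERY, whose actual value is not stated here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for illustration; not part of Druid. */
class RefreshBatchSketch {

    /**
     * Picks at most `limit` pending segment ids per datasource for one refresh cycle.
     * `limit` plays the role of MAX_SEGMENTS_PER_QUERY (assumption: the cap is applied
     * per datasource, as the class description states).
     */
    static Map<String, List<String>> nextBatch(Map<String, List<String>> pending, int limit) {
        Map<String, List<String>> batch = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : pending.entrySet()) {
            List<String> ids = entry.getValue();
            int end = Math.min(limit, ids.size());
            // Copy the sub-list so the batch does not alias the pending list.
            batch.put(entry.getKey(), new ArrayList<>(ids.subList(0, end)));
        }
        return batch;
    }
}
```

Segments left out of a batch simply stay pending and are picked up in a later cycle, which is why the initial schema is only built once every datasource has been processed.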

This class has an abstract method refresh(Set, Set) which the child class must override with the logic to build and cache table schema.

Note on handling tombstone segments: These segments lack data or column information. Segment metadata queries are not yet implemented for tombstone segments (see: https://github.com/apache/druid/pull/12137) and therefore return no metadata for them, which would otherwise lead to indefinite refresh attempts. For this reason, tombstone segments are never added to the set of segments being refreshed.
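A minimal sketch of the tombstone rule, assuming a simplified segment type: TombstoneFilterSketch and its nested Segment class are hypothetical and only model an id and a tombstone flag; they are not the Druid DataSegment or SegmentId types.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for illustration; not part of Druid. */
class TombstoneFilterSketch {

    /** Simplified segment: just an id and a tombstone flag (assumption). */
    static final class Segment {
        final String id;
        final boolean tombstone;

        Segment(String id, boolean tombstone) {
            this.id = id;
            this.tombstone = tombstone;
        }
    }

    /** Returns the ids that should be queued for refresh, skipping tombstones. */
    static Set<String> segmentsToRefresh(Iterable<Segment> announced) {
        Set<String> needRefresh = new TreeSet<>();
        for (Segment segment : announced) {
            // A tombstone would never yield metadata, so queuing it would retry forever.
            if (!segment.tombstone) {
                needRefresh.add(segment.id);
            }
        }
        return needRefresh;
    }
}
```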

  • Field Details

  • Constructor Details

  • Method Details

    • cacheExecLoop

      protected void cacheExecLoop()
    • start

      public abstract void start() throws InterruptedException
      Lifecycle start method.
      Throws:
      InterruptedException
    • stop

      public abstract void stop()
      Lifecycle stop method.
    • refreshWaitCondition

      public void refreshWaitCondition() throws InterruptedException
      Throws:
      InterruptedException
    • shouldRefresh

      protected boolean shouldRefresh()
      Refresh is executed only when there are segments or datasources needing refresh.
    • awaitInitialization

      public void awaitInitialization() throws InterruptedException
      Throws:
      InterruptedException
    • getDatasource

      @Nullable public T getDatasource(String name)
      Fetch schema for the given datasource.
      Parameters:
      name - datasource
      Returns:
      schema information for the given datasource
    • getDataSourceInformationMap

      public Map<String,T> getDataSourceInformationMap()
      Returns:
      Map of datasource and corresponding schema information.
    • getDatasourceNames

      public Set<String> getDatasourceNames()
      Returns:
      Set of datasources for which schema information is cached.
    • getSegmentMetadataSnapshot

      public Map<SegmentId,AvailableSegmentMetadata> getSegmentMetadataSnapshot()
      Get metadata for all cached segments, including information such as RowSignature, realtime, and numRows.
      Returns:
      Map of segmentId and corresponding metadata.
    • iterateSegmentMetadata

      public Iterator<AvailableSegmentMetadata> iterateSegmentMetadata()
      Get metadata for all cached segments, including information such as RowSignature, realtime, and numRows. This method has lower overhead than getSegmentMetadataSnapshot().
      Returns:
      iterator of metadata.
    • getAvailableSegmentMetadata

      @Nullable public AvailableSegmentMetadata getAvailableSegmentMetadata(String datasource, SegmentId segmentId)
      Get metadata for the specified segment, including information such as RowSignature, realtime, and numRows.
      Parameters:
      datasource - segment datasource
      segmentId - segment Id
      Returns:
      Metadata information for the given segment
    • getTotalSegments

      public int getTotalSegments()
      Returns the total number of segments. This method intentionally does not use the lock, to avoid expensive contention. As a result, the returned value might be inexact.
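      The trade-off can be sketched as follows. This is not the Druid implementation: SegmentCountSketch and its nested map layout are assumptions chosen only to show why an unlocked count may be slightly off while other threads add or remove segments.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for illustration; not part of Druid. */
class SegmentCountSketch {
    // datasource -> (segment id -> marker); layout is an assumption for this sketch
    final Map<String, Map<String, Boolean>> byDataSource = new ConcurrentHashMap<>();

    void add(String dataSource, String segmentId) {
        byDataSource
            .computeIfAbsent(dataSource, k -> new ConcurrentHashMap<>())
            .put(segmentId, Boolean.TRUE);
    }

    /**
     * Sums per-datasource sizes without taking any lock. Writers may change the
     * maps while the loop runs, so under concurrency the sum is only approximate.
     */
    int totalSegments() {
        int total = 0;
        for (Map<String, Boolean> segments : byDataSource.values()) {
            total += segments.size();
        }
        return total;
    }
}
```

      In a single-threaded caller the count is exact; the inexactness only shows up when additions or removals interleave with the summation.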
    • refresh

      public abstract void refresh(Set<SegmentId> segmentsToRefresh, Set<String> dataSourcesToRebuild) throws IOException
      The child classes must override this method with the logic to build and cache table schema.
      Parameters:
      segmentsToRefresh - segments for which the schema might have changed
      dataSourcesToRebuild - datasources for which the schema might have changed
      Throws:
      IOException - when querying segment schema from data nodes and tasks
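      A minimal sketch of what an override might look like, under heavy simplification. SchemaCacheSketch and ColumnUnionCacheSketch are hypothetical classes that use String ids and column-name lists in place of SegmentId, RowSignature, and DataSourceInformation, and the sketch skips the actual schema query to data nodes and tasks.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified mirror of the abstract contract; not part of Druid. */
abstract class SchemaCacheSketch {
    abstract void refresh(Set<String> segmentsToRefresh, Set<String> dataSourcesToRebuild)
        throws IOException;
}

/** Hypothetical subclass that rebuilds a table schema as the union of segment columns. */
class ColumnUnionCacheSketch extends SchemaCacheSketch {
    // segment id -> column names (stands in for per-segment schema)
    final Map<String, List<String>> segmentColumns = new LinkedHashMap<>();
    // segment id -> datasource name
    final Map<String, String> segmentDataSource = new LinkedHashMap<>();
    // datasource name -> merged columns (stands in for the cached table schema)
    final Map<String, Set<String>> tables = new LinkedHashMap<>();

    // No network I/O happens in this sketch, so the override drops "throws IOException".
    @Override
    void refresh(Set<String> segmentsToRefresh, Set<String> dataSourcesToRebuild) {
        // A datasource needs rebuilding if requested directly or if one of its segments changed.
        Set<String> toRebuild = new LinkedHashSet<>(dataSourcesToRebuild);
        for (String segmentId : segmentsToRefresh) {
            String dataSource = segmentDataSource.get(segmentId);
            if (dataSource != null) {
                toRebuild.add(dataSource);
            }
        }
        for (String dataSource : toRebuild) {
            Set<String> merged = new LinkedHashSet<>();
            for (Map.Entry<String, String> entry : segmentDataSource.entrySet()) {
                if (entry.getValue().equals(dataSource)) {
                    merged.addAll(segmentColumns.getOrDefault(entry.getKey(), List.of()));
                }
            }
            tables.put(dataSource, merged);
        }
    }
}
```

      The union-of-columns merge is a design choice for this sketch only; a real subclass decides how segment schemas are combined into the table schema it caches.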
    • addSegment

      public void addSegment(DruidServerMetadata server, DataSegment segment)
    • removeSegment

      public void removeSegment(DataSegment segment)
    • removeSegmentAction

      protected abstract void removeSegmentAction(SegmentId segmentId)
      This method should be overridden by child classes to execute any action on segment removal.
    • removeServerSegment

      public void removeServerSegment(DruidServerMetadata server, DataSegment segment)
    • markSegmentAsNeedRefresh

      protected void markSegmentAsNeedRefresh(SegmentId segmentId)
    • unmarkSegmentAsMutable

      protected void unmarkSegmentAsMutable(SegmentId segmentId)
    • markDataSourceAsNeedRebuild

      public void markDataSourceAsNeedRebuild(String datasource)
    • refreshSegments

      public Set<SegmentId> refreshSegments(Set<SegmentId> segments) throws IOException
      Attempt to refresh row signature for a set of segments.
      Returns:
      Set of segment IDs actually updated.
      Throws:
      IOException
    • updateSegmentMetadata

      protected boolean updateSegmentMetadata(SegmentId segmentId, SegmentAnalysis analysis)
      Updates metadata of a segment using the results of a metadata query.
      Returns:
      true if the segment metadata was updated successfully.
    • buildDataSourceRowSignature

      @Nullable public RowSignature buildDataSourceRowSignature(String dataSource)
    • getSegmentsNeedingRefresh

      public TreeSet<SegmentId> getSegmentsNeedingRefresh()
    • getMutableSegments

      public TreeSet<SegmentId> getMutableSegments()
    • getDataSourcesNeedingRebuild

      public Set<String> getDataSourcesNeedingRebuild()
    • fetchAggregatorsInSegmentMetadataQuery

      protected boolean fetchAggregatorsInSegmentMetadataQuery()
    • runSegmentMetadataQuery

      public Sequence<SegmentAnalysis> runSegmentMetadataQuery(Iterable<SegmentId> segments)
      Execute a SegmentMetadata query and return a Sequence of SegmentAnalysis.
      Parameters:
      segments - Iterable of SegmentId objects that are the subject of the SegmentMetadata query.
      Returns:
      Sequence of SegmentAnalysis objects
    • setAvailableSegmentMetadata

      public void setAvailableSegmentMetadata(SegmentId segmentId, AvailableSegmentMetadata availableSegmentMetadata)
      This method is not thread-safe and must be used only in unit tests.
    • doInLock

      protected void doInLock(Runnable runnable)
      This is a helper method for unit tests that emulates heavy work done while holding the lock. It must be used only in unit tests.
    • emitMetric

      protected void emitMetric(String metric, long value)
    • emitMetric

      protected void emitMetric(String metric, long value, ServiceMetricEvent.Builder builder)