Class AbstractSegmentMetadataCache<T extends DataSourceInformation>
- Type Parameters:
T - The type of information associated with the data source, which must extend DataSourceInformation.
- Direct Known Subclasses:
CoordinatorSegmentMetadataCache
At startup, the cache awaits the initialization of the timeline.
If the cache employs a segment metadata query to retrieve segment schema, it attempts to refresh a maximum
of MAX_SEGMENTS_PER_QUERY segments for each datasource in each refresh cycle.
Once all datasources have undergone this process, the initial schema of each datasource is constructed,
and the cache is marked as initialized.
Subsequently, the cache continues to periodically refresh segments and update the datasource schema.
A failure in segment refresh pauses the refresh work,
which resumes in the next refresh cycle.
This class has an abstract method refresh(Set, Set) which the child class must override
with the logic to build and cache table schema.
Note on handling tombstone segments: These segments lack data or column information. Segment metadata queries are not yet implemented for tombstone segments (see: https://github.com/apache/druid/pull/12137) and therefore return no metadata for them, which would lead to indefinite refresh attempts. For this reason, tombstone segments are never added to the set of segments being refreshed.
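The per-cycle selection described above (skip tombstones, refresh at most MAX_SEGMENTS_PER_QUERY segments per datasource) can be sketched roughly as follows. This is an illustration only: the Segment class, the selectForCycle method, and the cap value of 2 are hypothetical and are not taken from the Druid source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RefreshBatchSketch {
    /** Hypothetical stand-in for a segment; this is not Druid's DataSegment. */
    public static final class Segment {
        final String dataSource;
        final String id;
        final boolean tombstone;

        public Segment(String dataSource, String id, boolean tombstone) {
            this.dataSource = dataSource;
            this.id = id;
            this.tombstone = tombstone;
        }
    }

    // Assumed value for illustration; the real constant's value is not given on this page.
    public static final int MAX_SEGMENTS_PER_QUERY = 2;

    /** Picks the segment ids to refresh in one cycle, grouped by datasource. */
    public static Map<String, List<String>> selectForCycle(List<Segment> pending) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (Segment s : pending) {
            if (s.tombstone) {
                continue; // tombstones carry no schema, so they are never queued
            }
            List<String> batch = batches.computeIfAbsent(s.dataSource, k -> new ArrayList<>());
            if (batch.size() < MAX_SEGMENTS_PER_QUERY) {
                batch.add(s.id); // anything over the cap waits for a later cycle
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Segment> pending = Arrays.asList(
            new Segment("wiki", "w1", false),
            new Segment("wiki", "t1", true),
            new Segment("wiki", "w2", false),
            new Segment("wiki", "w3", false),
            new Segment("logs", "l1", false)
        );
        System.out.println(selectForCycle(pending)); // prints {wiki=[w1, w2], logs=[l1]}
    }
}
```

In this sketch, segments beyond the cap are simply left out of the current batch; a caller would keep them pending and offer them again in the next cycle.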
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static interface | | ColumnTypeMergePolicy defines the rules of which type to use when faced with the possibility of different types for the same column from segment to segment.
static class | | Classic logic, we use the first type we encounter.
static class | | Resolves types using ColumnType.leastRestrictiveType(ColumnType, ColumnType) to find the ColumnType that can best represent all data contained across all segments.
-
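The difference between the two merge policies can be illustrated with a toy model. Everything below is a sketch under stated assumptions: ToyType, MergePolicy, and resolve are hypothetical names, and the three-value ordering is a simplification that does not reproduce the actual rules of ColumnType.leastRestrictiveType.

```java
import java.util.Arrays;
import java.util.List;

public class MergePolicySketch {
    /** Toy types ordered from most to least restrictive; illustrative only. */
    public enum ToyType { LONG, DOUBLE, STRING }

    /** Hypothetical policy interface: merges the type seen so far with a new one. */
    public interface MergePolicy {
        ToyType merge(ToyType existing, ToyType incoming);
    }

    /** "First type wins": keep whatever type was encountered first. */
    public static final MergePolicy FIRST_TYPE =
        (existing, incoming) -> existing == null ? incoming : existing;

    /** "Least restrictive": keep the type able to represent both inputs (toy ordering). */
    public static final MergePolicy LEAST_RESTRICTIVE = (existing, incoming) -> {
        if (existing == null) {
            return incoming;
        }
        return existing.ordinal() >= incoming.ordinal() ? existing : incoming;
    };

    /** Folds the per-segment types of one column into a single resolved type. */
    public static ToyType resolve(MergePolicy policy, List<ToyType> typesPerSegment) {
        ToyType result = null;
        for (ToyType t : typesPerSegment) {
            result = policy.merge(result, t);
        }
        return result;
    }

    public static void main(String[] args) {
        List<ToyType> types = Arrays.asList(ToyType.LONG, ToyType.DOUBLE, ToyType.STRING);
        System.out.println(resolve(FIRST_TYPE, types));        // prints LONG
        System.out.println(resolve(LEAST_RESTRICTIVE, types)); // prints STRING
    }
}
```

With a first-type policy the result depends on the order in which segments are visited, which is one reason a deterministic segment ordering matters.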
Field Summary
Fields
Modifier and Type | Field | Description
protected final ExecutorService | |
protected final ExecutorService | |
protected boolean | |
protected final Object | | This lock coordinates the access from multiple threads to those variables guarded by this lock.
protected static final com.google.common.collect.Interner<RowSignature> | |
protected static final Comparator<SegmentId> | |
protected final ConcurrentHashMap<String,ConcurrentSkipListMap<SegmentId, AvailableSegmentMetadata>> | | DataSource -> Segment -> AvailableSegmentMetadata (contains RowSignature) for that segment.
protected final ConcurrentHashMap<String,T> | | Map from datasource name to DataSourceInformation.
-
Constructor Summary
Constructors
Constructor | Description
AbstractSegmentMetadataCache(QueryLifecycleFactory queryLifecycleFactory, SegmentMetadataCacheConfig config, Escalator escalator, InternalQueryConfig internalQueryConfig, ServiceEmitter emitter) |
-
Method Summary
Modifier and Type | Method | Description
void | addSegment(DruidServerMetadata server, DataSegment segment) |
void | buildDataSourceRowSignature(String dataSource) |
protected void | |
protected void | | This is a helper method for unit tests to emulate heavy work done with lock.
protected void | emitMetric(String metric, long value) |
protected void | emitMetric(String metric, long value, ServiceMetricEvent.Builder builder) |
protected boolean | |
 | getAvailableSegmentMetadata(String datasource, SegmentId segmentId) | Get metadata for the specified segment, which includes information like RowSignature, realtime & numRows.
 | getDatasource(String name) | Fetch schema for the given datasource.
 | | Get metadata for all the cached segments, which includes information like RowSignature, realtime & numRows etc.
int | | Returns total number of segments.
 | | Get metadata for all the cached segments, which includes information like RowSignature, realtime & numRows etc.
void | markDataSourceAsNeedRebuild(String datasource) |
protected void | markSegmentAsNeedRefresh(SegmentId segmentId) |
abstract void | | The child classes must override this method with the logic to build and cache table schema.
 | refreshSegments(Set<SegmentId> segments) | Attempt to refresh row signature for a set of segments.
void | |
void | removeSegment(DataSegment segment) |
protected abstract void | removeSegmentAction(SegmentId segmentId) | This method should be overridden by child classes to execute any action on segment removal.
void | removeServerSegment(DruidServerMetadata server, DataSegment segment) |
 | runSegmentMetadataQuery(Iterable<SegmentId> segments) | Execute a SegmentMetadata query and return a Sequence of SegmentAnalysis.
void | setAvailableSegmentMetadata(SegmentId segmentId, AvailableSegmentMetadata availableSegmentMetadata) | This method is not thread-safe and must be used only in unit tests.
protected boolean | | Refresh is executed only when there are segments or datasources needing refresh.
abstract void | start() | Lifecycle start method.
abstract void | stop() | Lifecycle stop method.
protected void | unmarkSegmentAsMutable(SegmentId segmentId) |
protected boolean | updateSegmentMetadata(SegmentId segmentId, SegmentAnalysis analysis) | Updates metadata of a segment using the results of a metadata query.
-
Field Details
-
SEGMENT_ORDER
-
ROW_SIGNATURE_INTERNER
-
segmentMetadataInfo
protected final ConcurrentHashMap<String,ConcurrentSkipListMap<SegmentId, AvailableSegmentMetadata>> segmentMetadataInfo
DataSource -> Segment -> AvailableSegmentMetadata (contains RowSignature) for that segment. Use SortedMap for segments so they are merged in deterministic order, from older to newer.
This map is updated by these two threads.
- callbackExec can update it in addSegment(org.apache.druid.server.coordination.DruidServerMetadata, org.apache.druid.timeline.DataSegment), removeServerSegment(org.apache.druid.server.coordination.DruidServerMetadata, org.apache.druid.timeline.DataSegment), and removeSegment(org.apache.druid.timeline.DataSegment).
- cacheExec can update it in refreshSegmentsForDataSource(java.lang.String, java.util.Set<org.apache.druid.timeline.SegmentId>).
While it is being updated, this map is read by these two types of thread.
- cacheExec can iterate all AvailableSegmentMetadatas per datasource. See buildDataSourceRowSignature(java.lang.String).
- Query threads can create a snapshot of the entire map for processing queries on the system table. See getSegmentMetadataSnapshot().
As the access pattern of this map is read-intensive, we should minimize the contention between writers and readers. Since there are two threads that can update this map at the same time, a writer should lock the inner map first and then lock the entry before it updates segment metadata. This can be done using ConcurrentMap.compute(K, java.util.function.BiFunction<? super K, ? super V, ? extends V>) as below. Note that, if you need to update the variables guarded by lock inside of compute(), you should acquire the lock before calling compute(), so that the function executed inside compute() stays inexpensive.
    segmentMetadataInfo.compute(
        datasourceParam,
        (datasource, segmentsMap) -> {
          if (segmentsMap == null) return null;
          else {
            segmentsMap.compute(
                segmentIdParam,
                (segmentId, segmentMetadata) -> {
                  // update segmentMetadata
                }
            );
            return segmentsMap;
          }
        }
    );
Readers can simply delegate the locking to the concurrent map and iterate map entries.
-
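The nested compute() pattern described for segmentMetadataInfo can be tried out in isolation with a self-contained sketch. The class below is an assumption-laden stand-in: it uses String segment ids and a Long row count in place of SegmentId and AvailableSegmentMetadata, and the method names addSegment, updateNumRows, and getNumRows are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class NestedComputeSketch {
    // datasource -> (segment id -> row count); simplified stand-in for segmentMetadataInfo.
    private final ConcurrentHashMap<String, ConcurrentSkipListMap<String, Long>> info =
        new ConcurrentHashMap<>();

    /** Registers a segment, creating the inner sorted map on first use. */
    public void addSegment(String dataSource, String segmentId, long numRows) {
        info.computeIfAbsent(dataSource, k -> new ConcurrentSkipListMap<>())
            .put(segmentId, numRows);
    }

    /** Updates an existing segment; unknown datasources or segments are left untouched. */
    public void updateNumRows(String dataSource, String segmentId, long numRows) {
        info.compute(dataSource, (ds, segmentsMap) -> {
            if (segmentsMap == null) {
                return null; // datasource not tracked: do not create a mapping
            }
            // The update runs while the outer entry for this datasource is held.
            segmentsMap.compute(segmentId, (id, oldRows) ->
                oldRows == null ? null : Long.valueOf(numRows));
            return segmentsMap;
        });
    }

    /** Returns the cached row count, or null if the segment is unknown. */
    public Long getNumRows(String dataSource, String segmentId) {
        ConcurrentSkipListMap<String, Long> segments = info.get(dataSource);
        return segments == null ? null : segments.get(segmentId);
    }

    public static void main(String[] args) {
        NestedComputeSketch cache = new NestedComputeSketch();
        cache.addSegment("wiki", "seg1", 10L);
        cache.updateNumRows("wiki", "seg1", 42L);
        System.out.println(cache.getNumRows("wiki", "seg1")); // prints 42
    }
}
```

Returning null from either remapping function leaves an absent key absent, so an update for an untracked datasource or segment does not create a new entry.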
cacheExec
-
callbackExec
-
isServerViewInitialized
protected boolean isServerViewInitialized
-
tables
Map from datasource name to DataSourceInformation. This structure can be accessed by cacheExec and callbackExec threads.
-
lock
This lock coordinates the access from multiple threads to those variables guarded by this lock. Currently, there are 2 threads that can access these variables.
- callbackExec executes the timeline callbacks whenever ServerView changes.
- cacheExec periodically refreshes segment metadata and DataSourceInformation if necessary based on the information collected via timeline callbacks.
-
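The division of work between the two threads can be sketched with a plain monitor lock. This is a minimal illustration, not the class's actual implementation: the set of String ids and the method drainForRefresh are hypothetical, and the real locking details are not shown on this page.

```java
import java.util.HashSet;
import java.util.Set;

public class LockGuardSketch {
    private final Object lock = new Object();

    // Guarded by lock. Simplified stand-in for segmentsNeedingRefresh.
    private final Set<String> segmentsNeedingRefresh = new HashSet<>();

    /** Would be called from the callback thread when the server view changes. */
    public void markSegmentAsNeedRefresh(String segmentId) {
        synchronized (lock) {
            segmentsNeedingRefresh.add(segmentId);
            lock.notifyAll(); // wake a refresh thread that may be waiting on the lock
        }
    }

    /** A refresh is worthwhile only when something is pending. */
    public boolean shouldRefresh() {
        synchronized (lock) {
            return !segmentsNeedingRefresh.isEmpty();
        }
    }

    /** Would be called from the refresh thread: copy under the lock, work outside it. */
    public Set<String> drainForRefresh() {
        synchronized (lock) {
            Set<String> copy = new HashSet<>(segmentsNeedingRefresh);
            segmentsNeedingRefresh.clear();
            return copy;
        }
    }

    public static void main(String[] args) {
        LockGuardSketch state = new LockGuardSketch();
        state.markSegmentAsNeedRefresh("seg1");
        System.out.println(state.shouldRefresh());   // prints true
        System.out.println(state.drainForRefresh()); // prints [seg1]
        System.out.println(state.shouldRefresh());   // prints false
    }
}
```

Copying the pending set while holding the lock and doing the expensive refresh afterwards keeps the critical section short, so callbacks are not blocked by slow queries.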
mutableSegments
-
dataSourcesNeedingRebuild
-
segmentsNeedingRefresh
-
-
Constructor Details
-
AbstractSegmentMetadataCache
public AbstractSegmentMetadataCache(QueryLifecycleFactory queryLifecycleFactory, SegmentMetadataCacheConfig config, Escalator escalator, InternalQueryConfig internalQueryConfig, ServiceEmitter emitter)
-
-
Method Details
-
cacheExecLoop
protected void cacheExecLoop()
-
start
Lifecycle start method.
- Throws:
InterruptedException
-
stop
public abstract void stop()
Lifecycle stop method.
-
refreshWaitCondition
- Throws:
InterruptedException
-
shouldRefresh
protected boolean shouldRefresh()
Refresh is executed only when there are segments or datasources needing refresh.
-
awaitInitialization
- Throws:
InterruptedException
-
getDatasource
Fetch schema for the given datasource.
- Parameters:
name - datasource
- Returns:
- schema information for the given datasource
-
getDataSourceInformationMap
- Returns:
- Map of datasource and corresponding schema information.
-
getDatasourceNames
- Returns:
- Set of datasources for which schema information is cached.
-
getSegmentMetadataSnapshot
Get metadata for all the cached segments, which includes information like RowSignature, realtime & numRows etc.
- Returns:
- Map of segmentId and corresponding metadata.
-
iterateSegmentMetadata
Get metadata for all the cached segments, which includes information like RowSignature, realtime & numRows etc. This is a lower-overhead method than getSegmentMetadataSnapshot().
- Returns:
- iterator of metadata.
-
getAvailableSegmentMetadata
@Nullable public AvailableSegmentMetadata getAvailableSegmentMetadata(String datasource, SegmentId segmentId)
Get metadata for the specified segment, which includes information like RowSignature, realtime & numRows.
- Parameters:
datasource - segment datasource
segmentId - segment Id
- Returns:
- Metadata information for the given segment
-
getTotalSegments
public int getTotalSegments()
Returns total number of segments. This method intentionally does not use the lock, to avoid expensive contention. As a result, the returned value might be inexact.
-
refresh
public abstract void refresh(Set<SegmentId> segmentsToRefresh, Set<String> dataSourcesToRebuild) throws IOException
The child classes must override this method with the logic to build and cache table schema.
- Parameters:
segmentsToRefresh - segments for which the schema might have changed
dataSourcesToRebuild - datasources for which the schema might have changed
- Throws:
IOException - when querying segment schema from data nodes and tasks
-
addSegment
-
removeSegment
-
removeSegmentAction
This method should be overridden by child classes to execute any action on segment removal. -
removeServerSegment
-
markSegmentAsNeedRefresh
-
unmarkSegmentAsMutable
-
markDataSourceAsNeedRebuild
-
refreshSegments
Attempt to refresh row signature for a set of segments.
- Returns:
- Set of segment IDs actually updated.
- Throws:
IOException
-
updateSegmentMetadata
Updates metadata of a segment using the results of a metadata query.
- Returns:
- true if the segment metadata was updated successfully.
-
buildDataSourceRowSignature
-
getSegmentsNeedingRefresh
-
getMutableSegments
-
getDataSourcesNeedingRebuild
-
fetchAggregatorsInSegmentMetadataQuery
protected boolean fetchAggregatorsInSegmentMetadataQuery()
-
runSegmentMetadataQuery
Execute a SegmentMetadata query and return a Sequence of SegmentAnalysis.
- Parameters:
segments - Iterable of SegmentId objects that are subject of the SegmentMetadata query.
- Returns:
Sequence of SegmentAnalysis objects
-
setAvailableSegmentMetadata
public void setAvailableSegmentMetadata(SegmentId segmentId, AvailableSegmentMetadata availableSegmentMetadata)
This method is not thread-safe and must be used only in unit tests.
-
doInLock
This is a helper method for unit tests to emulate heavy work done with lock. It must be used only in unit tests.
-
emitMetric
-
emitMetric
-