public abstract class ColumnStatsAggregator extends Object
| Modifier and Type | Field and Description |
|---|---|
| double | ndvTuner: The tuner controls the derivation of the NDV value when aggregating statistics from multiple partitions. |
| boolean | useDensityFunctionForNDVEstimation |

| Constructor and Description |
|---|
| ColumnStatsAggregator() |

| Modifier and Type | Method and Description |
|---|---|
| abstract ColumnStatisticsObj | aggregate(List<MetaStoreServerUtils.ColStatsObjWithSourceInfo> colStatsWithSourceInfo, List<String> partNames, boolean areAllPartsFound) |
| protected abstract ColumnStatisticsData | initColumnStatisticsData() |
| protected KllHistogramEstimator | mergeHistograms(List<MetaStoreServerUtils.ColStatsObjWithSourceInfo> colStatsWithSourceInfo) |
public boolean useDensityFunctionForNDVEstimation
public double ndvTuner
For example, consider the aggregation of three partitions with NDV values 2, 3, and 4, respectively. The NDV lower bound is 4 (the highest among the individual NDVs), and the upper bound is 9 (the sum of the individual NDVs). The aggregated NDV therefore falls in the range [4, 9], reaching the lower bound when the tuner is 0 and the upper bound when it is 1.
The tuner is optional; concrete implementations can choose to ignore it completely.
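The arithmetic in this example can be sketched in a few lines of Java. The class `NdvTunerSketch` and its method `tunedNdv` are hypothetical names used only for illustration and are not part of Hive. The sketch assumes the tuner lies in [0, 1] and that values between the bounds are obtained by linear interpolation; the description above only fixes the behaviour at 0 and 1, so a concrete aggregator may derive intermediate values differently.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Hive: illustrates the NDV bounds described above. */
final class NdvTunerSketch {

    private NdvTunerSketch() {}

    /**
     * Returns an aggregated NDV between the lower bound (largest per-partition NDV)
     * and the upper bound (sum of per-partition NDVs).
     * Assumption: linear interpolation between the two bounds, truncated to a long.
     */
    static long tunedNdv(long[] partitionNdvs, double ndvTuner) {
        if (partitionNdvs.length == 0) {
            throw new IllegalArgumentException("at least one partition NDV is required");
        }
        // Written as a negated range check so that NaN is rejected as well.
        if (!(ndvTuner >= 0.0 && ndvTuner <= 1.0)) {
            throw new IllegalArgumentException("ndvTuner is assumed to lie in [0, 1]");
        }
        long lowerBound = Arrays.stream(partitionNdvs).max().getAsLong();
        long upperBound = Arrays.stream(partitionNdvs).sum();
        return (long) (lowerBound + (upperBound - lowerBound) * ndvTuner);
    }

    public static void main(String[] args) {
        long[] ndvs = {2, 3, 4};
        System.out.println(tunedNdv(ndvs, 0.0)); // 4, the lower bound
        System.out.println(tunedNdv(ndvs, 1.0)); // 9, the upper bound
    }
}
```

A tuner near 0 suits partitions whose distinct values largely overlap, while a tuner near 1 suits partitions whose values are mostly disjoint.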
public abstract ColumnStatisticsObj aggregate(List<MetaStoreServerUtils.ColStatsObjWithSourceInfo> colStatsWithSourceInfo, List<String> partNames, boolean areAllPartsFound) throws MetaException
Throws: MetaException
protected abstract ColumnStatisticsData initColumnStatisticsData()
protected KllHistogramEstimator mergeHistograms(List<MetaStoreServerUtils.ColStatsObjWithSourceInfo> colStatsWithSourceInfo)
Copyright © 2024 The Apache Software Foundation. All rights reserved.