public class ColumnStatsIndices extends Object
| Modifier and Type | Method and Description |
|---|---|
| static List<org.apache.flink.table.data.RowData> | readColumnStatsIndex(String basePath, HoodieMetadataConfig metadataConfig, String[] targetColumns) |
| static Pair<List<org.apache.flink.table.data.RowData>,String[]> | transposeColumnStatsIndex(List<org.apache.flink.table.data.RowData> colStats, String[] queryColumns, org.apache.flink.table.types.logical.RowType tableSchema) Transposes and converts the raw table format of the Column Stats Index, where each row/record corresponds to an individual (column, file) pair, into a table format where each row corresponds to a single file, with the statistics for individual columns collated within that row. |

public static List<org.apache.flink.table.data.RowData> readColumnStatsIndex(String basePath, HoodieMetadataConfig metadataConfig, String[] targetColumns)
public static Pair<List<org.apache.flink.table.data.RowData>,String[]> transposeColumnStatsIndex(List<org.apache.flink.table.data.RowData> colStats, String[] queryColumns, org.apache.flink.table.types.logical.RowType tableSchema)
Metadata Table Column Stats Index format:

| fileName | columnName | minValue | maxValue | num_nulls |
|---|---|---|---|---|
| one_base_file.parquet | A | 1 | 10 | 0 |
| another_base_file.parquet | A | -10 | 0 | 5 |

Returned table format:

| file | A_minValue | A_maxValue | A_nullCount |
|---|---|---|---|
| one_base_file.parquet | 1 | 10 | 0 |
| another_base_file.parquet | -10 | 0 | 5 |

NOTE: The Column Stats Index may contain statistics for many (if not all) columns, while the query at hand may reference only a handful of them. We therefore collect all column references from the filtering expressions and transpose only the records for the columns those expressions reference.
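
The reshaping described above can be illustrated with a minimal, self-contained sketch. It is not the Hudi implementation: the real method operates on Flink `RowData` and a `RowType` schema, whereas this stand-in uses a hypothetical `RawStat` class, types all statistics as `long`, and represents each output row as an `Object[]`. The class name `ColumnStatsTransposeSketch`, the `transpose` helper, and the choice to leave missing statistics as `null` are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: plain-Java stand-in for the (column, file) to per-file transposition. */
final class ColumnStatsTransposeSketch {

  /** Hypothetical raw index record: statistics of one column in one file. */
  static final class RawStat {
    final String fileName;
    final String columnName;
    final long minValue;
    final long maxValue;
    final long nullCount;

    RawStat(String fileName, String columnName, long minValue, long maxValue, long nullCount) {
      this.fileName = fileName;
      this.columnName = columnName;
      this.minValue = minValue;
      this.maxValue = maxValue;
      this.nullCount = nullCount;
    }
  }

  /**
   * Groups raw records by file and emits one row per file, laid out as
   * [file, c1_min, c1_max, c1_nulls, c2_min, ...] in the order of queryColumns.
   * Assumption: stats missing for a (file, column) pair are left as null.
   */
  static List<Object[]> transpose(List<RawStat> rawStats, String[] queryColumns) {
    Set<String> wanted = new HashSet<>(Arrays.asList(queryColumns));
    // LinkedHashMap keeps files in first-seen order so the output is deterministic.
    Map<String, Map<String, RawStat>> byFile = new LinkedHashMap<>();
    for (RawStat stat : rawStats) {
      if (!wanted.contains(stat.columnName)) {
        continue; // skip columns the query does not reference
      }
      byFile.computeIfAbsent(stat.fileName, f -> new LinkedHashMap<>())
          .put(stat.columnName, stat);
    }

    List<Object[]> rows = new ArrayList<>();
    for (Map.Entry<String, Map<String, RawStat>> entry : byFile.entrySet()) {
      Object[] row = new Object[1 + 3 * queryColumns.length];
      row[0] = entry.getKey();
      for (int i = 0; i < queryColumns.length; i++) {
        RawStat stat = entry.getValue().get(queryColumns[i]);
        if (stat != null) {
          row[1 + 3 * i] = stat.minValue;
          row[2 + 3 * i] = stat.maxValue;
          row[3 + 3 * i] = stat.nullCount;
        }
      }
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    List<RawStat> raw = Arrays.asList(
        new RawStat("one_base_file.parquet", "A", 1, 10, 0),
        new RawStat("another_base_file.parquet", "A", -10, 0, 5),
        new RawStat("one_base_file.parquet", "B", 7, 9, 2)); // B is not queried, so it is dropped
    for (Object[] row : transpose(raw, new String[] {"A"})) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

With `queryColumns = {"A"}`, the sketch produces one row per file that mirrors the returned table format shown above, and the record for column `B` is filtered out before transposition.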
Parameters:
colStats - RowData list bearing raw Column Stats Index table
queryColumns - target columns to be included into the final table
tableSchema - schema of the source data table

Copyright © 2022 The Apache Software Foundation. All rights reserved.