public abstract class HoodieReaderContext<T> extends Object

Type parameter: T - the type of the engine-specific record representation, e.g., InternalRow in Spark and RowData in Flink.

An abstract context for HoodieFileGroupReader to use, containing APIs for the engine-specific implementation of reading data files, getting field values from a record, transforming a record, etc. For each query engine, this class should be extended and plugged into HoodieFileGroupReader to realize file group reading.
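As a rough illustration of this plug-in pattern, an engine supplies its record representation as the type parameter and implements the engine-specific hooks, so that engine-agnostic reading logic can be written once against them. All type and method names below are simplified stand-ins for illustration, not the actual Hudi API; `_hoodie_record_key` is Hudi's record-key metadata column.

```java
import java.util.Map;

// Simplified stand-in for the reader-context pattern; the real class has many more hooks.
abstract class ReaderContextSketch<T> {
    // Engine-specific: read one field out of a record.
    public abstract Object getValue(T record, String fieldName);

    // Engine-agnostic logic can then be written once against the hooks.
    public String getRecordKey(T record) {
        return String.valueOf(getValue(record, "_hoodie_record_key"));
    }
}

// A toy "engine" whose records are plain maps.
class MapReaderContext extends ReaderContextSketch<Map<String, Object>> {
    @Override
    public Object getValue(Map<String, Object> record, String fieldName) {
        return record.get(fieldName);
    }
}
```

A real integration (e.g., Spark's) would instead implement the hooks over its native row type and hand the context to HoodieFileGroupReader.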
| Modifier and Type | Field and Description |
|---|---|
| static String | INTERNAL_META_INSTANT_TIME |
| static String | INTERNAL_META_OPERATION |
| static String | INTERNAL_META_ORDERING_FIELD |
| static String | INTERNAL_META_PARTITION_PATH |
| static String | INTERNAL_META_RECORD_KEY |
| static String | INTERNAL_META_SCHEMA |
| Constructor and Description |
|---|
| HoodieReaderContext() |
| Modifier and Type | Method and Description |
|---|---|
| int | compareTo(Comparable o1, Comparable o2)<br>Compares values of different types, which may include engine-specific types. |
| abstract HoodieRecord<T> | constructHoodieRecord(Option<T> recordOption, Map<String,Object> metadataMap)<br>Constructs a new HoodieRecord based on the record of engine-specific type and the metadata for merging. |
| T | constructRawDeleteRecord(Map<String,Object> metadata)<br>Constructs an engine-specific delete record. |
| abstract T | convertAvroRecord(org.apache.avro.generic.IndexedRecord avroRecord)<br>Converts an Avro record, e.g., one serialized in the log files, to an engine-specific record. |
| long | extractRecordPosition(T record, org.apache.avro.Schema schema, String fieldName, long providedPositionIfNeeded)<br>Extracts the record position value from the record itself. |
| Map<String,Object> | generateMetadataForRecord(String recordKey, String partitionPath, Comparable orderingVal)<br>Generates the metadata map from the given record key, partition path, and ordering value. |
| Map<String,Object> | generateMetadataForRecord(T record, org.apache.avro.Schema schema)<br>Generates the metadata of the record. |
| abstract ClosableIterator<T> | getFileRecordIterator(StoragePath filePath, long start, long length, org.apache.avro.Schema dataSchema, org.apache.avro.Schema requiredSchema, HoodieStorage storage)<br>Gets an iterator of records, in the engine-specific record representation, from the file. |
| boolean | getHasBootstrapBaseFile() |
| boolean | getHasLogFiles() |
| String | getLatestCommitTime() |
| boolean | getNeedsBootstrapMerge() |
| Comparable | getOrderingValue(Option<T> recordOption, Map<String,Object> metadataMap, org.apache.avro.Schema schema, TypedProperties props)<br>Gets the ordering value in a particular type. |
| String | getRecordKey(T record, org.apache.avro.Schema schema)<br>Gets the record key as a String. |
| HoodieRecordMerger | getRecordMerger() |
| abstract HoodieRecordMerger | getRecordMerger(String mergerStrategy) |
| HoodieFileGroupReaderSchemaHandler<T> | getSchemaHandler() |
| boolean | getShouldMergeUseRecordPosition() |
| abstract HoodieStorage | getStorage(String path, StorageConfiguration<?> conf)<br>Gets the HoodieStorage instance based on the file path and configuration. |
| String | getTablePath() |
| abstract Object | getValue(T record, org.apache.avro.Schema schema, String fieldName)<br>Gets the field value. |
| abstract ClosableIterator<T> | mergeBootstrapReaders(ClosableIterator<T> skeletonFileIterator, org.apache.avro.Schema skeletonRequiredSchema, ClosableIterator<T> dataFileIterator, org.apache.avro.Schema dataRequiredSchema)<br>Merges the skeleton-file and data-file iterators into a single iterator that produces rows containing all columns from the skeleton-file iterator, followed by all columns from the data-file iterator. |
| UnaryOperator<T> | projectRecord(org.apache.avro.Schema from, org.apache.avro.Schema to) |
| abstract UnaryOperator<T> | projectRecord(org.apache.avro.Schema from, org.apache.avro.Schema to, Map<String,String> renamedColumns)<br>Creates a function that reorders records of schema "from" into schema "to"; every field in "to" must exist in "from", but not every field in "from" has to be in "to". |
| abstract T | seal(T record)<br>Seals the engine-specific record to make sure the data referenced in memory does not change. |
| void | setHasBootstrapBaseFile(boolean hasBootstrapBaseFile) |
| void | setHasLogFiles(boolean hasLogFiles) |
| void | setLatestCommitTime(String latestCommitTime) |
| void | setNeedsBootstrapMerge(boolean needsBootstrapMerge) |
| void | setRecordMerger(HoodieRecordMerger recordMerger) |
| void | setSchemaHandler(HoodieFileGroupReaderSchemaHandler<T> schemaHandler) |
| void | setShouldMergeUseRecordPosition(boolean shouldMergeUseRecordPosition) |
| void | setTablePath(String tablePath) |
| boolean | supportsParquetRowIndex() |
| Map<String,Object> | updateSchemaAndResetOrderingValInMetadata(Map<String,Object> meta, org.apache.avro.Schema schema)<br>Updates the schema and resets the ordering value in the existing metadata mapping of a record. |
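Several of the abstract methods above return a ClosableIterator, an iterator that must also release its underlying resources (e.g., open file handles) when the caller is done with it. A minimal sketch of the idea, using a stand-in interface rather than the real Hudi one:

```java
import java.util.Iterator;
import java.util.List;

// Stand-in for Hudi's ClosableIterator: an Iterator that must release resources when done.
interface ClosableIteratorSketch<T> extends Iterator<T>, AutoCloseable {
    @Override
    void close(); // narrow the signature so no checked exception is thrown
}

class ListBackedIterator<T> implements ClosableIteratorSketch<T> {
    private final Iterator<T> inner;
    private boolean closed = false;

    ListBackedIterator(List<T> values) { this.inner = values.iterator(); }

    @Override public boolean hasNext() { return !closed && inner.hasNext(); }
    @Override public T next() { return inner.next(); }
    @Override public void close() { closed = true; } // a real reader would close file handles here
    boolean isClosed() { return closed; }
}
```

Because it extends AutoCloseable, such an iterator can be driven inside a try-with-resources block so the backing file is always released.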
public static final String INTERNAL_META_RECORD_KEY
public static final String INTERNAL_META_PARTITION_PATH
public static final String INTERNAL_META_ORDERING_FIELD
public static final String INTERNAL_META_OPERATION
public static final String INTERNAL_META_INSTANT_TIME
public static final String INTERNAL_META_SCHEMA
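These keys index the per-record metadata map that travels alongside each record through the reader. A minimal sketch of how such a map might be assembled, mirroring the shape of generateMetadataForRecord(recordKey, partitionPath, orderingVal); the literal key values below are invented for illustration, as the real constant values are internal to Hudi:

```java
import java.util.HashMap;
import java.util.Map;

class RecordMetadataSketch {
    // Illustrative stand-ins for the INTERNAL_META_* constants; actual values are internal to Hudi.
    static final String INTERNAL_META_RECORD_KEY = "_0";
    static final String INTERNAL_META_PARTITION_PATH = "_1";
    static final String INTERNAL_META_ORDERING_FIELD = "_2";

    // Builds the per-record metadata map from the given information.
    static Map<String, Object> generateMetadata(String recordKey, String partitionPath,
                                                Comparable<?> orderingVal) {
        Map<String, Object> meta = new HashMap<>();
        meta.put(INTERNAL_META_RECORD_KEY, recordKey);
        meta.put(INTERNAL_META_PARTITION_PATH, partitionPath);
        meta.put(INTERNAL_META_ORDERING_FIELD, orderingVal);
        return meta;
    }
}
```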
public HoodieFileGroupReaderSchemaHandler<T> getSchemaHandler()
public void setSchemaHandler(HoodieFileGroupReaderSchemaHandler<T> schemaHandler)
public String getTablePath()
public void setTablePath(String tablePath)
public String getLatestCommitTime()
public void setLatestCommitTime(String latestCommitTime)
public HoodieRecordMerger getRecordMerger()
public void setRecordMerger(HoodieRecordMerger recordMerger)
public boolean getHasLogFiles()
public void setHasLogFiles(boolean hasLogFiles)
public boolean getHasBootstrapBaseFile()
public void setHasBootstrapBaseFile(boolean hasBootstrapBaseFile)
public boolean getNeedsBootstrapMerge()
public void setNeedsBootstrapMerge(boolean needsBootstrapMerge)
public boolean getShouldMergeUseRecordPosition()
public void setShouldMergeUseRecordPosition(boolean shouldMergeUseRecordPosition)
public abstract HoodieStorage getStorage(String path, StorageConfiguration<?> conf)
Parameters:
path - File path to get the file system.
conf - StorageConfiguration for I/O.
Returns:
HoodieStorage instance to use.

public abstract ClosableIterator<T> getFileRecordIterator(StoragePath filePath, long start, long length, org.apache.avro.Schema dataSchema, org.apache.avro.Schema requiredSchema, HoodieStorage storage) throws IOException
Parameters:
filePath - StoragePath instance of a file.
start - Starting byte offset to read from.
length - Number of bytes to read.
dataSchema - Avro Schema of records in the file.
requiredSchema - Avro Schema containing the required fields to read, for projection.
storage - HoodieStorage for reading records.
Returns:
ClosableIterator that can return all records through iteration.
Throws:
IOException

public abstract T convertAvroRecord(org.apache.avro.generic.IndexedRecord avroRecord)
Parameters:
avroRecord - The Avro record.
Returns:
An engine-specific record in type T.

public abstract HoodieRecordMerger getRecordMerger(String mergerStrategy)
Parameters:
mergerStrategy - Merger strategy UUID.
Returns:
HoodieRecordMerger to use.

public abstract Object getValue(T record, org.apache.avro.Schema schema, String fieldName)
Parameters:
record - The record in engine-specific type.
schema - The Avro schema of the record.
fieldName - The field name.

public String getRecordKey(T record, org.apache.avro.Schema schema)
Parameters:
record - The record in engine-specific type.
schema - The Avro schema of the record.

public Comparable getOrderingValue(Option<T> recordOption, Map<String,Object> metadataMap, org.apache.avro.Schema schema, TypedProperties props)
Parameters:
recordOption - An option of the record.
metadataMap - A map containing the record metadata.
schema - The Avro schema of the record.
props - Properties.

public abstract HoodieRecord<T> constructHoodieRecord(Option<T> recordOption, Map<String,Object> metadataMap)
Constructs a new HoodieRecord based on the record of engine-specific type and the metadata for merging.
Parameters:
recordOption - An option of the record in engine-specific type, if it exists.
metadataMap - The record metadata.
Returns:
A new instance of HoodieRecord.

public abstract T seal(T record)
Parameters:
record - The record.

public int compareTo(Comparable o1, Comparable o2)
Parameters:
o1 - Comparable object.
o2 - The other Comparable object to compare to.

public Map<String,Object> generateMetadataForRecord(String recordKey, String partitionPath, Comparable orderingVal)
Parameters:
recordKey - Record key in String.
partitionPath - Partition path in String.
orderingVal - Ordering value.

public Map<String,Object> generateMetadataForRecord(T record, org.apache.avro.Schema schema)
Parameters:
record - The record.
schema - The Avro schema of the record.

public Map<String,Object> updateSchemaAndResetOrderingValInMetadata(Map<String,Object> meta, org.apache.avro.Schema schema)
Parameters:
meta - Metadata in a mapping.
schema - New schema to set.

public abstract ClosableIterator<T> mergeBootstrapReaders(ClosableIterator<T> skeletonFileIterator, org.apache.avro.Schema skeletonRequiredSchema, ClosableIterator<T> dataFileIterator, org.apache.avro.Schema dataRequiredSchema)
Parameters:
skeletonFileIterator - Iterator over bootstrap skeleton files that contain Hudi metadata columns.
dataFileIterator - Iterator over data files that were bootstrapped into the Hudi table.

public abstract UnaryOperator<T> projectRecord(org.apache.avro.Schema from, org.apache.avro.Schema to, Map<String,String> renamedColumns)
Parameters:
from - The schema of records to be passed into the UnaryOperator.
to - The schema of records produced by the UnaryOperator.
renamedColumns - Map of renamed columns, where the key is the new name from the query and the value is the old name that exists in the file.

public final UnaryOperator<T> projectRecord(org.apache.avro.Schema from, org.apache.avro.Schema to)
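The projection contract, reordering a record into the target schema and resolving renamed columns back to their on-file names, can be sketched with map-backed records. This is a simplified stand-in: real implementations operate on engine-native rows and Avro schemas, not maps and field-name lists.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class ProjectionSketch {
    // Builds an operator that reorders a record to the "to" field list, applying column renames.
    // Every target field must be resolvable from the source record, mirroring the contract above.
    static UnaryOperator<Map<String, Object>> projectRecord(List<String> toFields,
                                                            Map<String, String> renamedColumns) {
        return record -> {
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String field : toFields) {
                // If the query renamed the column, read it under its old name in the file.
                String sourceField = renamedColumns.getOrDefault(field, field);
                projected.put(field, record.get(sourceField));
            }
            return projected;
        };
    }
}
```

Returning a reusable UnaryOperator lets the reader build the projection once per file and apply it per record, rather than re-resolving schemas on every row.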
public long extractRecordPosition(T record, org.apache.avro.Schema schema, String fieldName, long providedPositionIfNeeded)
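A sketch of the fallback behavior implied by the providedPositionIfNeeded parameter: use the position stored in the record itself when the field is present, otherwise fall back to the caller-provided position. The helper below is simplified to map-backed records and its name is invented for illustration.

```java
import java.util.Map;

class RecordPositionSketch {
    // Returns the record's own position field when present, else the provided fallback.
    static long extractPosition(Map<String, Object> record, String fieldName,
                                long providedPositionIfNeeded) {
        Object position = record.get(fieldName);
        return (position instanceof Long) ? (Long) position : providedPositionIfNeeded;
    }
}
```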
public boolean supportsParquetRowIndex()
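The mergeBootstrapReaders contract described above, pairing each skeleton row with its corresponding data row and concatenating the columns (skeleton columns first, then data columns), can be sketched with map-backed records; this is purely illustrative and omits the schema arguments of the real method.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class BootstrapMergeSketch {
    // Zips the skeleton iterator (Hudi metadata columns) with the data iterator (user columns),
    // producing rows with the skeleton columns first, then the data columns.
    static Iterator<Map<String, Object>> mergeReaders(Iterator<Map<String, Object>> skeleton,
                                                      Iterator<Map<String, Object>> data) {
        return new Iterator<>() {
            @Override public boolean hasNext() { return skeleton.hasNext() && data.hasNext(); }
            @Override public Map<String, Object> next() {
                Map<String, Object> merged = new LinkedHashMap<>(skeleton.next());
                merged.putAll(data.next());
                return merged;
            }
        };
    }
}
```

The zip works because both files store rows in the same order for a bootstrapped file group, so position i of the skeleton file corresponds to position i of the data file.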
Copyright © 2024 The Apache Software Foundation. All rights reserved.