public class HoodieMetadataPayload extends Object implements HoodieRecordPayload<HoodieMetadataPayload>
This single metadata payload is shared by all the partitions under the metadata table. The partition specific records are determined by the field "type" saved within the record. The following types are supported:
METADATA_TYPE_PARTITION_LIST (1):
-- List of all partitions. There is a single such record
-- key = @HoodieTableMetadata.RECORDKEY_PARTITION_LIST
METADATA_TYPE_FILE_LIST (2):
-- List of all files in a partition. There is one such record for each partition
-- key = partition name
METADATA_TYPE_COLUMN_STATS (3):
-- This is an index for column stats in the table
METADATA_TYPE_BLOOM_FILTER (4):
-- This is an index for base file bloom filters. This is a map of FileID to its BloomFilter byte[].
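To illustrate how the "type" field selects the kind of record, the following is a minimal sketch that maps the integer values listed above to named constants. The enum `MetadataRecordTypeSketch` and its `fromValue` helper are hypothetical names introduced for illustration and are not part of Hudi; only the four numeric values come from this page.

```java
/** Hypothetical enum mirroring the record types listed above; not a Hudi class. */
enum MetadataRecordTypeSketch {
    PARTITION_LIST(1),
    FILE_LIST(2),
    COLUMN_STATS(3),
    BLOOM_FILTER(4);

    private final int value;

    MetadataRecordTypeSketch(int value) {
        this.value = value;
    }

    /** The integer that would be stored in the record's "type" field. */
    int value() {
        return value;
    }

    /** Resolves a stored "type" value to a constant, rejecting unknown values. */
    static MetadataRecordTypeSketch fromValue(int value) {
        for (MetadataRecordTypeSketch t : values()) {
            if (t.value == value) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown metadata record type: " + value);
    }
}
```

A reader of a metadata record could call `MetadataRecordTypeSketch.fromValue(type)` and branch on the result; values outside the list on this page are rejected by the sketch.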
During compaction on the table, the deletions are merged with additions and hence records are pruned.
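The pruning described above can be pictured with a simplified model. The sketch below is an assumption-laden illustration, not Hudi's merge implementation: it represents a file listing as a map of file name to size, and applies a newer record's additions and deletions to an older listing. The class `FileListMergeSketch` and its `merge` method are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified, hypothetical model of merging file-list records; not Hudi code. */
final class FileListMergeSketch {

    private FileListMergeSketch() {
    }

    /**
     * Applies a newer record's changes to an older file listing.
     *
     * @param older        file name to size, as known before the newer record
     * @param filesAdded   file name to size for files added by the newer record
     * @param filesDeleted names of files deleted by the newer record
     * @return a new sorted map holding only the files that still exist
     */
    static Map<String, Long> merge(Map<String, Long> older,
                                   Map<String, Long> filesAdded,
                                   List<String> filesDeleted) {
        Map<String, Long> merged = new TreeMap<>(older);
        // Additions overwrite any earlier entry for the same file name.
        merged.putAll(filesAdded);
        // Deletions remove the entry, so deleted files are pruned from the result.
        for (String deleted : filesDeleted) {
            merged.remove(deleted);
        }
        return merged;
    }
}
```

In this model a deletion always wins over an addition within the same merge step; whether Hudi resolves that case the same way is not stated on this page.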
| Modifier | Constructor and Description |
|---|---|
| | HoodieMetadataPayload(org.apache.avro.generic.GenericRecord record, Comparable<?> orderingVal) |
| | HoodieMetadataPayload(Option<org.apache.avro.generic.GenericRecord> recordOpt) |
| protected | HoodieMetadataPayload(String key, int type, Map<String,HoodieMetadataFileInfo> filesystemMetadata, HoodieMetadataBloomFilter metadataBloomFilter, HoodieMetadataColumnStats columnStats, HoodieRecordIndexInfo recordIndexMetadata, HoodieSecondaryIndexInfo secondaryIndexMetadata) |
| Modifier and Type | Method and Description |
|---|---|
| Option<org.apache.avro.generic.IndexedRecord> | combineAndGetUpdateValue(org.apache.avro.generic.IndexedRecord oldRecord, org.apache.avro.Schema schema) This method is deprecated. |
| Option<org.apache.avro.generic.IndexedRecord> | combineAndGetUpdateValue(org.apache.avro.generic.IndexedRecord oldRecord, org.apache.avro.Schema schema, Properties properties) This method lets you write custom merging/combining logic to produce new values as a function of the current value on storage and what is contained in this object. |
| static Option<HoodieRecord<HoodieMetadataPayload>> | combineSecondaryIndexRecord(HoodieRecord<HoodieMetadataPayload> oldRecord, HoodieRecord<HoodieMetadataPayload> newRecord) |
| static HoodieRecord<HoodieMetadataPayload> | createBloomFilterMetadataRecord(String partitionName, String baseFileName, String timestamp, String bloomFilterType, ByteBuffer bloomFilter, boolean isDeleted) Create bloom filter metadata record. |
| static Stream<HoodieRecord> | createColumnStatsRecords(String partitionName, Collection<HoodieColumnRangeMetadata<Comparable>> columnRangeMetadataList, boolean isDeleted) |
| static HoodieRecord<HoodieMetadataPayload> | createPartitionFilesRecord(String partition, Map<String,Long> filesAdded, List<String> filesDeleted) Create and return a HoodieMetadataPayload to save list of files within a partition. |
| static HoodieRecord<HoodieMetadataPayload> | createPartitionListRecord(List<String> partitions) Create and return a HoodieMetadataPayload to save list of partitions. |
| static HoodieRecord<HoodieMetadataPayload> | createPartitionListRecord(List<String> partitions, boolean isDeleted) Create and return a HoodieMetadataPayload to save list of partitions. |
| static Stream<HoodieRecord> | createPartitionStatsRecords(String partitionPath, Collection<HoodieColumnRangeMetadata<Comparable>> columnRangeMetadataList, boolean isDeleted) |
| static HoodieRecord | createRecordIndexDelete(String recordKey) Create and return a HoodieMetadataPayload to delete a record in the Metadata Table's record index. |
| static HoodieRecord<HoodieMetadataPayload> | createRecordIndexUpdate(String recordKey, String partition, String fileId, String instantTime, int fileIdEncoding) Create and return a HoodieMetadataPayload to insert or update an entry for the record index. |
| static HoodieRecord<HoodieMetadataPayload> | createSecondaryIndex(String recordKey, String secondaryKey, String partitionPath, Boolean isDeleted) Create and return a HoodieMetadataPayload to insert or update an entry for the secondary index. |
| boolean | equals(Object other) |
| static String | getBloomFilterIndexKey(PartitionIndexID partitionIndexID, FileIndexID fileIndexID) Get bloom filter index key. |
| Option<HoodieMetadataBloomFilter> | getBloomFilterMetadata() Get the bloom filter metadata from this payload. |
| Option<HoodieMetadataColumnStats> | getColumnStatMetadata() Get the column stats metadata from this payload. |
| static String | getColumnStatsIndexKey(PartitionIndexID partitionIndexID, FileIndexID fileIndexID, ColumnIndexID columnIndexID) Get column stats index key. |
| static String | getColumnStatsIndexKey(String partitionName, HoodieColumnRangeMetadata<Comparable> columnRangeMetadata) Get column stats index key from the column range metadata. |
| List<String> | getDeletions() Returns the list of filenames deleted as part of this record. |
| List<StoragePathInfo> | getFileList(HoodieStorage storage, StoragePath partitionPath) Returns the files added as part of this record. |
| List<String> | getFilenames() Returns the list of filenames added as part of this record. |
| Option<org.apache.avro.generic.IndexedRecord> | getInsertValue(org.apache.avro.Schema schema) This method is deprecated. |
| Option<org.apache.avro.generic.IndexedRecord> | getInsertValue(org.apache.avro.Schema schemaIgnored, Properties propertiesIgnored) Generates an avro record out of the given HoodieRecordPayload, to be written out to storage. |
| static String | getPartitionStatsIndexKey(String partitionPath, String columnName) |
| HoodieRecordGlobalLocation | getRecordGlobalLocation() If this is a record-level index entry, returns the file to which this is mapped. |
| String | getRecordKeyFromSecondaryIndex() |
| int | hashCode() |
| boolean | isDeleted() |
| boolean | isSecondaryIndexDeleted() |
| HoodieMetadataPayload | preCombine(HoodieMetadataPayload previousRecord) This method is deprecated. |
| String | toString() |
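Methods inherited from class java.lang.Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface HoodieRecordPayload: getMetadata, getOrderingValue, preCombine, preCombine
public static final String KEY_FIELD_NAME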
public static final String SCHEMA_FIELD_NAME_TYPE
public static final String SCHEMA_FIELD_NAME_METADATA
public static final String SCHEMA_FIELD_ID_COLUMN_STATS
public static final String SCHEMA_FIELD_ID_BLOOM_FILTER
public static final String SCHEMA_FIELD_ID_RECORD_INDEX
public static final String SCHEMA_FIELD_ID_SECONDARY_INDEX
public static final String COLUMN_STATS_FIELD_MIN_VALUE
public static final String COLUMN_STATS_FIELD_MAX_VALUE
public static final String COLUMN_STATS_FIELD_NULL_COUNT
public static final String COLUMN_STATS_FIELD_VALUE_COUNT
public static final String COLUMN_STATS_FIELD_TOTAL_SIZE
public static final String COLUMN_STATS_FIELD_FILE_NAME
public static final String COLUMN_STATS_FIELD_COLUMN_NAME
public static final String COLUMN_STATS_FIELD_TOTAL_UNCOMPRESSED_SIZE
public static final String COLUMN_STATS_FIELD_IS_DELETED
public static final String RECORD_INDEX_FIELD_PARTITION
public static final String RECORD_INDEX_FIELD_FILEID_HIGH_BITS
public static final String RECORD_INDEX_FIELD_FILEID_LOW_BITS
public static final String RECORD_INDEX_FIELD_FILE_INDEX
public static final String RECORD_INDEX_FIELD_INSTANT_TIME
public static final String RECORD_INDEX_FIELD_FILEID
public static final String RECORD_INDEX_FIELD_FILEID_ENCODING
public static final int RECORD_INDEX_FIELD_FILEID_ENCODING_UUID
public static final int RECORD_INDEX_FIELD_FILEID_ENCODING_RAW_STRING
public static final String RECORD_INDEX_FIELD_POSITION
public static final int RECORD_INDEX_MISSING_FILEINDEX_FALLBACK
public static final String SECONDARY_INDEX_FIELD_RECORD_KEY
public static final String SECONDARY_INDEX_FIELD_IS_DELETED
public HoodieMetadataPayload(@Nullable org.apache.avro.generic.GenericRecord record, Comparable<?> orderingVal)
public HoodieMetadataPayload(Option<org.apache.avro.generic.GenericRecord> recordOpt)
protected HoodieMetadataPayload(String key, int type, Map<String,HoodieMetadataFileInfo> filesystemMetadata, HoodieMetadataBloomFilter metadataBloomFilter, HoodieMetadataColumnStats columnStats, HoodieRecordIndexInfo recordIndexMetadata, HoodieSecondaryIndexInfo secondaryIndexMetadata)
public static HoodieRecord<HoodieMetadataPayload> createPartitionListRecord(List<String> partitions)
Create and return a HoodieMetadataPayload to save list of partitions.
Parameters:
partitions - The list of partitions
public static HoodieRecord<HoodieMetadataPayload> createPartitionListRecord(List<String> partitions, boolean isDeleted)
Create and return a HoodieMetadataPayload to save list of partitions.
Parameters:
partitions - The list of partitions
public static HoodieRecord<HoodieMetadataPayload> createPartitionFilesRecord(String partition, Map<String,Long> filesAdded, List<String> filesDeleted)
Create and return a HoodieMetadataPayload to save list of files within a partition.
Parameters:
partition - The name of the partition
filesAdded - Mapping of files to their sizes for files which have been added to this partition
filesDeleted - List of files which have been deleted from this partition
public static HoodieRecord<HoodieMetadataPayload> createBloomFilterMetadataRecord(String partitionName, String baseFileName, String timestamp, String bloomFilterType, ByteBuffer bloomFilter, boolean isDeleted)
Create bloom filter metadata record.
Parameters:
partitionName - Partition name
baseFileName - Base file name for which the bloom filter needs to be persisted
timestamp - Instant timestamp responsible for this record
bloomFilter - Bloom filter for the file
isDeleted - Whether the bloom filter is no longer valid
public HoodieMetadataPayload preCombine(HoodieMetadataPayload previousRecord)
Description copied from interface: HoodieRecordPayload
This method is deprecated. Use the HoodieRecordPayload.preCombine(HoodieRecordPayload, Properties) method instead.
Specified by: preCombine in interface HoodieRecordPayload<HoodieMetadataPayload>
public static Option<HoodieRecord<HoodieMetadataPayload>> combineSecondaryIndexRecord(HoodieRecord<HoodieMetadataPayload> oldRecord, HoodieRecord<HoodieMetadataPayload> newRecord)
public Option<org.apache.avro.generic.IndexedRecord> combineAndGetUpdateValue(org.apache.avro.generic.IndexedRecord oldRecord, org.apache.avro.Schema schema, Properties properties) throws IOException
Description copied from interface: HoodieRecordPayload
This method lets you write custom merging/combining logic to produce new values as a function of the current value on storage and what is contained in this object. For example: 1) You are updating counters, and may want to add counts to currentValue and write back the updated counts. 2) You may be reading DB redo logs and merging them with the current image for a database row on storage.
Specified by: combineAndGetUpdateValue in interface HoodieRecordPayload<HoodieMetadataPayload>
Parameters:
oldRecord - Current value in storage, to merge/combine this payload with
schema - Schema used for record
properties - Payload related properties. For example pass the ordering field(s) name to extract from value in storage.
Throws: IOException
public Option<org.apache.avro.generic.IndexedRecord> combineAndGetUpdateValue(org.apache.avro.generic.IndexedRecord oldRecord, org.apache.avro.Schema schema) throws IOException
Description copied from interface: HoodieRecordPayload
This method is deprecated. Refer to HoodieRecordPayload.combineAndGetUpdateValue(IndexedRecord, Schema, Properties) for Javadocs.
Specified by: combineAndGetUpdateValue in interface HoodieRecordPayload<HoodieMetadataPayload>
Throws: IOException
public Option<org.apache.avro.generic.IndexedRecord> getInsertValue(org.apache.avro.Schema schemaIgnored, Properties propertiesIgnored) throws IOException
Description copied from interface: HoodieRecordPayload
Generates an avro record out of the given HoodieRecordPayload, to be written out to storage.
Specified by: getInsertValue in interface HoodieRecordPayload<HoodieMetadataPayload>
Parameters:
schemaIgnored - Schema used for record
propertiesIgnored - Payload related properties. For example pass the ordering field(s) name to extract from value in storage.
Returns: the IndexedRecord to be inserted.
Throws: IOException
public Option<org.apache.avro.generic.IndexedRecord> getInsertValue(org.apache.avro.Schema schema) throws IOException
Description copied from interface: HoodieRecordPayload
This method is deprecated. Refer to HoodieRecordPayload.getInsertValue(Schema, Properties) for Javadocs.
Specified by: getInsertValue in interface HoodieRecordPayload<HoodieMetadataPayload>
Parameters:
schema - Schema used for record
Returns: the IndexedRecord to be inserted.
Throws: IOException
public List<String> getFilenames()
public List<String> getDeletions()
public Option<HoodieMetadataBloomFilter> getBloomFilterMetadata()
public Option<HoodieMetadataColumnStats> getColumnStatMetadata()
public List<StoragePathInfo> getFileList(HoodieStorage storage, StoragePath partitionPath)
public static String getBloomFilterIndexKey(PartitionIndexID partitionIndexID, FileIndexID fileIndexID)
Get bloom filter index key.
Parameters:
partitionIndexID - Partition index id
fileIndexID - File index id
public static String getColumnStatsIndexKey(PartitionIndexID partitionIndexID, FileIndexID fileIndexID, ColumnIndexID columnIndexID)
Get column stats index key.
Parameters:
partitionIndexID - Partition index id
fileIndexID - File index id
columnIndexID - Column index id
public static String getColumnStatsIndexKey(String partitionName, HoodieColumnRangeMetadata<Comparable> columnRangeMetadata)
Get column stats index key from the column range metadata.
Parameters:
partitionName - Partition name
columnRangeMetadata - Column range metadata
public static Stream<HoodieRecord> createColumnStatsRecords(String partitionName, Collection<HoodieColumnRangeMetadata<Comparable>> columnRangeMetadataList, boolean isDeleted)
public static Stream<HoodieRecord> createPartitionStatsRecords(String partitionPath, Collection<HoodieColumnRangeMetadata<Comparable>> columnRangeMetadataList, boolean isDeleted)
public static String getPartitionStatsIndexKey(String partitionPath, String columnName)
public static HoodieRecord<HoodieMetadataPayload> createRecordIndexUpdate(String recordKey, String partition, String fileId, String instantTime, int fileIdEncoding)
Create and return a HoodieMetadataPayload to insert or update an entry for the record index.
Each entry maps the key of a single record in HUDI to its location.
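The field names RECORD_INDEX_FIELD_FILEID_HIGH_BITS, RECORD_INDEX_FIELD_FILEID_LOW_BITS and RECORD_INDEX_FIELD_FILEID_ENCODING_UUID suggest that a UUID-shaped file ID can be stored as two long values. The sketch below shows that general technique using only `java.util.UUID`; it is an assumption about the idea behind those fields, not Hudi's actual encoding, and the class `FileIdUuidEncodingSketch` is a hypothetical name.

```java
import java.util.UUID;

/** Hypothetical illustration of storing a UUID string as two longs; not Hudi code. */
final class FileIdUuidEncodingSketch {

    private FileIdUuidEncodingSketch() {
    }

    /**
     * Splits a canonical UUID string into its 64 high bits and 64 low bits.
     *
     * @return a two-element array: index 0 holds the high bits, index 1 the low bits
     */
    static long[] encode(String uuidFileId) {
        UUID uuid = UUID.fromString(uuidFileId);
        return new long[] {uuid.getMostSignificantBits(), uuid.getLeastSignificantBits()};
    }

    /** Rebuilds the canonical (lowercase) UUID string from the two halves. */
    static String decode(long highBits, long lowBits) {
        return new UUID(highBits, lowBits).toString();
    }
}
```

Two longs take 16 bytes, compared with 36 characters for the textual form, which is one plausible reason to prefer such an encoding when the file ID is a UUID. A file ID that is not a valid UUID string would make `UUID.fromString` throw, so a raw-string fallback (compare RECORD_INDEX_FIELD_FILEID_ENCODING_RAW_STRING) would be needed in that case.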
Parameters:
recordKey - Key of the record
partition - Name of the partition which contains the record
fileId - fileId which contains the record
instantTime - instantTime when the record was added
public static HoodieRecord<HoodieMetadataPayload> createSecondaryIndex(String recordKey, String secondaryKey, String partitionPath, Boolean isDeleted)
Create and return a HoodieMetadataPayload to insert or update an entry for the secondary index.
Each entry maps the secondary key of a single record in HUDI to its record (or primary) key.
Parameters:
recordKey - Primary key of the record
secondaryKey - Secondary key of the record
isDeleted - true if this record is deleted
public String getRecordKeyFromSecondaryIndex()
public boolean isSecondaryIndexDeleted()
public static HoodieRecord createRecordIndexDelete(String recordKey)
Create and return a HoodieMetadataPayload to delete a record in the Metadata Table's record index.
Parameters:
recordKey - Key of the record to be deleted
public HoodieRecordGlobalLocation getRecordGlobalLocation()
public boolean isDeleted()
Copyright © 2024 The Apache Software Foundation. All rights reserved.