@PublicAPIClass(maturity=EVOLVING) public interface HoodieRecordMerger extends Serializable

| Modifier and Type | Field and Description |
|---|---|
| `static String` | `DEFAULT_MERGER_STRATEGY_UUID` |
| `static String` | `OVERWRITE_MERGER_STRATEGY_UUID` |


| Modifier and Type | Method and Description |
|---|---|
| `default List<Pair<HoodieRecord,org.apache.avro.Schema>>` | `fullOuterMerge(HoodieRecord older, org.apache.avro.Schema oldSchema, HoodieRecord newer, org.apache.avro.Schema newSchema, TypedProperties props)`: Merges two records with the same key in full outer merge fashion. |
| `default String[]` | `getMandatoryFieldsForMerging(HoodieTableConfig cfg)` |
| `String` | `getMergingStrategy()`: The kind of merging strategy this record merger belongs to. |
| `default RecordMergeMode` | `getRecordMergeMode()`: The record merge mode that corresponds to this record merger. |
| `HoodieRecord.HoodieRecordType` | `getRecordType()`: The record type handled by the current merger. |
| `Option<Pair<HoodieRecord,org.apache.avro.Schema>>` | `merge(HoodieRecord older, org.apache.avro.Schema oldSchema, HoodieRecord newer, org.apache.avro.Schema newSchema, TypedProperties props)`: Combines the roles of `combineAndGetUpdateValue` and `preCombine` from `HoodieRecordPayload` in a single method. |
| `default Option<Pair<HoodieRecord,org.apache.avro.Schema>>` | `partialMerge(HoodieRecord older, org.apache.avro.Schema oldSchema, HoodieRecord newer, org.apache.avro.Schema newSchema, org.apache.avro.Schema readerSchema, TypedProperties props)`: Merges records that can contain partial updates. That is, only a subset of fields and values is present in the record representing the updates, and absent fields are not updated. |
| `default boolean` | `shouldFlush(HoodieRecord record, org.apache.avro.Schema schema, TypedProperties props)`: In some cases, business logic performs checks before a merged record is flushed to disk. |

static final String DEFAULT_MERGER_STRATEGY_UUID
static final String OVERWRITE_MERGER_STRATEGY_UUID
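Both constants name merging strategies, and each merger reports the strategy it belongs to through `getMergingStrategy()`. The following is a minimal sketch that assumes a caller picks a merger by matching strategy IDs. `MergerSketch`, `StrategySelection`, and the placeholder ID strings are hypothetical, and the real values of the two UUID constants are not reproduced here.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for HoodieRecordMerger: only the strategy accessor is modeled.
interface MergerSketch {
  String getMergingStrategy();
}

final class StrategySelection {
  // Placeholder IDs. These are NOT the real values of
  // DEFAULT_MERGER_STRATEGY_UUID / OVERWRITE_MERGER_STRATEGY_UUID.
  static final String DEFAULT_STRATEGY_ID = "default-strategy-placeholder";
  static final String OVERWRITE_STRATEGY_ID = "overwrite-strategy-placeholder";

  private StrategySelection() {}

  // Returns the first candidate whose reported strategy ID equals the requested one,
  // or an empty Optional when no candidate matches.
  static Optional<MergerSketch> select(List<MergerSketch> candidates, String strategyId) {
    return candidates.stream()
        .filter(m -> m.getMergingStrategy().equals(strategyId))
        .findFirst();
  }
}
```

Matching on a stable string ID, rather than on a class name, lets several implementations (for example, one per engine record type) share one strategy.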
Option<Pair<HoodieRecord,org.apache.avro.Schema>> merge(HoodieRecord older, org.apache.avro.Schema oldSchema, HoodieRecord newer, org.apache.avro.Schema newSchema, TypedProperties props) throws IOException
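`merge` takes over the work of the payload methods `combineAndGetUpdateValue` and `preCombine`. The following is a minimal sketch of one common policy for that role, event-time ordering. It assumes the following: the version with the larger ordering value wins, ties favor the newer record, and an empty result means the key is deleted. The last point is an assumption about how the returned `Option` is interpreted. `SimpleRecord` and `EventTimeMergeSketch` are hypothetical, and Hudi's real record and Avro schema types are left out.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical record model: a key, an ordering value (e.g. an event timestamp),
// a delete marker, and the column values. Not Hudi's HoodieRecord.
final class SimpleRecord {
  final String key;
  final long orderingValue;
  final boolean isDelete;
  final Map<String, Object> values;

  SimpleRecord(String key, long orderingValue, boolean isDelete, Map<String, Object> values) {
    this.key = key;
    this.orderingValue = orderingValue;
    this.isDelete = isDelete;
    this.values = values;
  }
}

final class EventTimeMergeSketch {
  private EventTimeMergeSketch() {}

  // Merges two versions of the same key. The version with the larger ordering value
  // wins (ties go to the newer one); an empty result models a deleted key.
  static Optional<SimpleRecord> merge(SimpleRecord older, SimpleRecord newer) {
    if (!older.key.equals(newer.key)) {
      throw new IllegalArgumentException("records must share the same key");
    }
    SimpleRecord winner = newer.orderingValue >= older.orderingValue ? newer : older;
    return winner.isDelete ? Optional.empty() : Optional.of(winner);
  }
}
```

A late-arriving update with a smaller ordering value therefore leaves the older version in place, which is the main difference from a plain "latest write wins" overwrite.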
default Option<Pair<HoodieRecord,org.apache.avro.Schema>> partialMerge(HoodieRecord older, org.apache.avro.Schema oldSchema, HoodieRecord newer, org.apache.avro.Schema newSchema, org.apache.avro.Schema readerSchema, TypedProperties props) throws IOException
For example, suppose the reader schema has the following fields: `[{"name":"id", "type":"string"}, {"name":"ts", "type":"long"}, {"name":"name", "type":"string"}, {"name":"price", "type":"double"}, {"name":"tags", "type":"string"}]`. The older and newer records can be as follows (Hudi meta fields omitted):

(1) older (complete record update):

| id | ts | name | price | tags |
|---|---|---|---|---|
| 1 | 10 | apple | 2.3 | fruit |

newer (partial record update):

| ts | price |
|---|---|
| 16 | 2.8 |

In this case, only the "ts" and "price" fields are updated in the newer record. With the default merging strategy, the newer record updates the older record, and the merged result is:

| id | ts | name | price | tags |
|---|---|---|---|---|
| 1 | 16 | apple | 2.8 | fruit |

(2) older (partial record update):

| ts | price |
|---|---|
| 10 | 2.8 |

newer (partial record update):

| ts | tags |
|---|---|
| 16 | fruit,juicy |

In this case, only the "ts" and "price" fields are updated in the older record, and only the "ts" and "tags" fields are updated in the newer record. With the default merging strategy, all changed fields are included in the merged result:

| ts | price | tags |
|---|---|---|
| 16 | 2.8 | fruit,juicy |

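The two cases above can be reproduced with a minimal sketch that models records as field-name-to-value maps. The sketch makes these assumptions:

* Fields present in the newer record overwrite the older values, without comparing ordering values.
* Fields absent from the newer record keep their older values.
* The output follows the reader-schema field order and contains only fields present in at least one input.

`PartialMergeSketch` is hypothetical and omits Hudi's record and schema types.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PartialMergeSketch {
  private PartialMergeSketch() {}

  // Overlays the fields present in `newer` on top of `older`. Fields missing from
  // `newer` keep their older value; fields missing from both inputs are skipped.
  // Iterating over readerFields makes the result follow the reader-schema order.
  static Map<String, Object> partialMerge(
      Map<String, Object> older, Map<String, Object> newer, List<String> readerFields) {
    Map<String, Object> merged = new LinkedHashMap<>();
    for (String field : readerFields) {
      if (newer.containsKey(field)) {
        merged.put(field, newer.get(field));
      } else if (older.containsKey(field)) {
        merged.put(field, older.get(field));
      }
    }
    return merged;
  }
}
```

With the reader fields `id, ts, name, price, tags`, case (1) yields `id=1, ts=16, name=apple, price=2.8, tags=fruit`. Case (2) yields `ts=16, price=2.8, tags=fruit,juicy`.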
Parameters:
- older - Older record.
- oldSchema - Schema of the older record.
- newer - Newer record.
- newSchema - Schema of the newer record.
- readerSchema - Reader schema containing all the fields to read. This is used to maintain the ordering of the fields of the merged record.
- props - Configuration in TypedProperties.

Throws:
- IOException - if an error occurs during merging.

default boolean shouldFlush(HoodieRecord record, org.apache.avro.Schema schema, TypedProperties props) throws IOException
Parameters:
- record - the merged record.
- schema - the schema of the merged record.

This interface is experimental and may evolve in the future.
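`shouldFlush` gives an implementation a final check on a merged record before it is written out. The following is a minimal sketch that assumes `true` means "flush" and that the default flushes every record. `FlushPolicy`, `PositivePriceFlushPolicy`, and the "positive price" rule are hypothetical examples of such business logic, applied to a record modeled as a field map.

```java
import java.util.Map;

// Hypothetical filter deciding whether a merged record is written to disk.
// Assumption: returning true means "flush this record".
interface FlushPolicy {
  boolean shouldFlush(Map<String, Object> mergedRecord);

  // Flushes every merged record (assumed to mirror the default behavior).
  static FlushPolicy always() {
    return merged -> true;
  }
}

// Hypothetical business rule: skip merged records whose "price" is missing or not positive.
final class PositivePriceFlushPolicy implements FlushPolicy {
  @Override
  public boolean shouldFlush(Map<String, Object> mergedRecord) {
    Object price = mergedRecord.get("price");
    return price instanceof Number && ((Number) price).doubleValue() > 0;
  }
}
```

Keeping this check separate from the merge itself lets the merge stay a pure combination of two records, while write-time filtering remains a distinct, overridable step.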
default List<Pair<HoodieRecord,org.apache.avro.Schema>> fullOuterMerge(HoodieRecord older, org.apache.avro.Schema oldSchema, HoodieRecord newer, org.apache.avro.Schema newSchema, TypedProperties props) throws IOException
default String[] getMandatoryFieldsForMerging(HoodieTableConfig cfg)
HoodieRecord.HoodieRecordType getRecordType()
String getMergingStrategy()
default RecordMergeMode getRecordMergeMode()
Copyright © 2024 The Apache Software Foundation. All rights reserved.