public class PartialUpdateAvroPayload extends OverwriteNonDefaultsWithLatestAvroPayload
Simplified partial update logic:
1. #preCombine
For records with the same record key in one batch
or in the delta logs that belong to the same File Group,
checks whether one record's ordering value is larger than the other record's.
If yes, overwrites the existing one for the specified fields that are not null.
2. #combineAndGetUpdateValue
For every incoming record that has an existing record in storage (same record key),
checks whether the incoming record's ordering value is larger than the existing record's.
If yes, overwrites the existing one for the specified fields that are not null;
otherwise overwrites the incoming one with the existing record for the specified fields that are not null,
and returns a merged record.
Illustration with simple data.
Let's say the ordering field is 'ts' and the schema is:
{
[
{"name":"id","type":"string"},
{"name":"ts","type":"long"},
{"name":"name","type":"string"},
{"name":"price","type":"string"}
]
}
case 1
Current data:
id ts name price
1 1 name_1 price_1
Insert data:
id ts name price
1 2 null price_2
Result data after #preCombine or #combineAndGetUpdateValue:
id ts name price
1 2 name_1 price_2
case 2
Current data:
id ts name price
1 2 name_1 null
Insert data:
id ts name price
1 1 null price_1
Result data after #preCombine or #combineAndGetUpdateValue:
id ts name price
1 2 name_1 price_1
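The two cases above can be reproduced with a minimal sketch of the merge rule. This is not Hudi code: it models records as plain maps instead of Avro records, and the class `PartialUpdateSketch` and its methods `merge` and `record` are hypothetical names introduced only for illustration. It also assumes that a tie in the ordering value favors the incoming record.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative stand-in for the partial update merge rule; not Hudi code. */
final class PartialUpdateSketch {

    /** Ordering field used in the illustration above. */
    static final String ORDERING_FIELD = "ts";

    /**
     * Merges two versions of the same record key.
     * The version with the smaller ordering value is the base; every non-null
     * field of the version with the larger ordering value is written over it.
     * Assumption: on a tie, the incoming record is treated as the newer one.
     */
    static Map<String, Object> merge(Map<String, Object> existing, Map<String, Object> incoming) {
        long existingTs = (Long) existing.get(ORDERING_FIELD);
        long incomingTs = (Long) incoming.get(ORDERING_FIELD);
        boolean incomingIsNewer = incomingTs >= existingTs;
        Map<String, Object> base = incomingIsNewer ? existing : incoming;
        Map<String, Object> overlay = incomingIsNewer ? incoming : existing;

        Map<String, Object> merged = new LinkedHashMap<>(base);
        for (Map.Entry<String, Object> field : overlay.entrySet()) {
            if (field.getValue() != null) {
                merged.put(field.getKey(), field.getValue());
            }
        }
        return merged;
    }

    /** Builds a record with the four fields of the illustration schema. */
    static Map<String, Object> record(String id, long ts, String name, String price) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("ts", ts);
        r.put("name", name);
        r.put("price", price);
        return r;
    }

    public static void main(String[] args) {
        // Case 1: the incoming record is newer and carries a null name.
        System.out.println(merge(record("1", 1L, "name_1", "price_1"),
                                 record("1", 2L, null, "price_2")));
        // Case 2: the incoming record is older and carries a null name.
        System.out.println(merge(record("1", 2L, "name_1", null),
                                 record("1", 1L, null, "price_1")));
    }
}
```

In both cases the merged record keeps the larger ordering value, and a null field never overwrites a non-null one.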
Gotchas:
When a batch of records is preCombined before combineAndGetUpdateValue is applied against the underlying records stored in parquet files, the end state of the records might differ from what one would expect from applying a straightforward partial update record by record.
Gotchas-Example:
-- Insertion order of records:
INSERT INTO t1 VALUES (1, 'a1', 10, 1000); -- (1)
INSERT INTO t1 VALUES (1, 'a1', 11, 999), (1, 'a1_0', null, 1001); -- (2)
SELECT id, name, price, _ts FROM t1;
-- One would expect the results to return:
-- 1 a1_0 10.0 1001
-- However, the results returned are:
-- 1 a1_0 11.0 1001
-- This occurs as preCombine is applied on (2) first to return:
-- 1 a1_0 11.0 1001
-- And this is then passed to combineAndGetUpdateValue with the existing oldValue:
-- 1 a1 10.0 1000
-- To return:
-- 1 a1_0 11.0 1001
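The difference between the two orders of application can be simulated with a small sketch. It is not Hudi code: rows are plain maps, the class `PreCombineGotchaSketch` and its methods are hypothetical names, and the merge rule is the simplified one described at the top of this page, with ties assumed to favor the incoming row.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative simulation of the gotcha; not Hudi code. */
final class PreCombineGotchaSketch {

    /** Builds a row of table t1 (id, name, price, _ts). */
    static Map<String, Object> row(int id, String name, Double price, long ts) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("name", name);
        r.put("price", price);
        r.put("_ts", ts);
        return r;
    }

    /** Overlays the non-null fields of the row with the larger _ts onto the other row. */
    static Map<String, Object> merge(Map<String, Object> existing, Map<String, Object> incoming) {
        boolean incomingIsNewer = (Long) incoming.get("_ts") >= (Long) existing.get("_ts");
        Map<String, Object> merged = new LinkedHashMap<>(incomingIsNewer ? existing : incoming);
        for (Map.Entry<String, Object> f : (incomingIsNewer ? incoming : existing).entrySet()) {
            if (f.getValue() != null) {
                merged.put(f.getKey(), f.getValue());
            }
        }
        return merged;
    }

    /** Reduces the batch to one row first, then merges that row with the stored row. */
    static Map<String, Object> preCombineThenCombine(Map<String, Object> stored,
                                                     List<Map<String, Object>> batch) {
        Map<String, Object> combined = batch.get(0);
        for (int i = 1; i < batch.size(); i++) {
            combined = merge(combined, batch.get(i));
        }
        return merge(stored, combined);
    }

    /** Merges each batch row into the stored row one at a time, in batch order. */
    static Map<String, Object> applyOneByOne(Map<String, Object> stored,
                                             List<Map<String, Object>> batch) {
        Map<String, Object> result = stored;
        for (Map<String, Object> incoming : batch) {
            result = merge(result, incoming);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = row(1, "a1", 10.0, 1000L);
        List<Map<String, Object>> batch =
            Arrays.asList(row(1, "a1", 11.0, 999L), row(1, "a1_0", null, 1001L));
        System.out.println("preCombine first: " + preCombineThenCombine(stored, batch));
        System.out.println("one by one:       " + applyOneByOne(stored, batch));
    }
}
```

Under these assumptions, reducing the batch first carries the price 11.0 from the row with _ts 999 into the combined row, which then overwrites the stored price. Applying the rows one at a time leaves the stored price 10.0 in place, because the row with _ts 999 is older than the stored row.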
Inherited fields: isDeletedRecord, orderingVal, recordBytes

| Constructor and Description |
|---|
| PartialUpdateAvroPayload(org.apache.avro.generic.GenericRecord record, Comparable orderingVal) |
| PartialUpdateAvroPayload(Option<org.apache.avro.generic.GenericRecord> record) |

| Modifier and Type | Method and Description |
|---|---|
| Option<org.apache.avro.generic.IndexedRecord> | combineAndGetUpdateValue(org.apache.avro.generic.IndexedRecord currentValue, org.apache.avro.Schema schema) This method is deprecated. |
| Option<org.apache.avro.generic.IndexedRecord> | combineAndGetUpdateValue(org.apache.avro.generic.IndexedRecord currentValue, org.apache.avro.Schema schema, Properties prop) This method lets you write custom merging/combining logic to produce new values as a function of the current value on storage and what is contained in this object. |
| Option<org.apache.avro.generic.IndexedRecord> | getInsertValue(org.apache.avro.Schema schema, boolean isPreCombining) Returns itself as long as it is called by preCombine. |
| protected Option<org.apache.avro.generic.IndexedRecord> | mergeDisorderRecordsWithMetadata(org.apache.avro.Schema schema, org.apache.avro.generic.GenericRecord oldRecord, org.apache.avro.generic.GenericRecord updatingRecord, boolean isPreCombining) Merges the given disorder records with metadata. |
| Boolean | overwriteField(Object value, Object defaultValue) Returns true if value equals defaultValue, otherwise false. |
| PartialUpdateAvroPayload | preCombine(OverwriteWithLatestAvroPayload oldValue, org.apache.avro.Schema schema, Properties properties) When more than one HoodieRecord has the same HoodieKey in the incoming batch, this function combines them before attempting to insert/upsert by taking in a schema. |

Inherited methods:
mergeRecords, setField
getInsertValue, getOrderingValue, preCombine
canProduceSentinel, getOrderingVal, isDeleted, isDeleteRecord
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getInsertValue, getMetadata, preCombine

public PartialUpdateAvroPayload(org.apache.avro.generic.GenericRecord record, Comparable orderingVal)

public PartialUpdateAvroPayload(Option<org.apache.avro.generic.GenericRecord> record)

public PartialUpdateAvroPayload preCombine(OverwriteWithLatestAvroPayload oldValue, org.apache.avro.Schema schema, Properties properties)
Description copied from interface: HoodieRecordPayload
Parameters:
oldValue - instance of the old HoodieRecordPayload to be combined with.
schema - Payload related schema. For example, use the schema to overwrite the old instance for specified fields that are not equal to the default value.
properties - Payload related properties. For example, pass the ordering field(s) name to extract from the value in storage.

public Option<org.apache.avro.generic.IndexedRecord> combineAndGetUpdateValue(org.apache.avro.generic.IndexedRecord currentValue, org.apache.avro.Schema schema) throws IOException
Description copied from interface: HoodieRecordPayload
This method is deprecated. See HoodieRecordPayload.combineAndGetUpdateValue(IndexedRecord, Schema, Properties) for the Javadoc.
Specified by: combineAndGetUpdateValue in interface HoodieRecordPayload<OverwriteWithLatestAvroPayload>
Overrides: combineAndGetUpdateValue in class OverwriteNonDefaultsWithLatestAvroPayload
Throws: IOException

public Option<org.apache.avro.generic.IndexedRecord> combineAndGetUpdateValue(org.apache.avro.generic.IndexedRecord currentValue, org.apache.avro.Schema schema, Properties prop) throws IOException
Description copied from interface: HoodieRecordPayload
For example: 1) You are updating counters; you may want to add counts to currentValue and write back the updated counts. 2) You may be reading DB redo logs and merging them with the current image of a database row on storage.
Parameters:
currentValue - Current value in storage, to merge/combine this payload with.
schema - Schema used for the record.
prop - Payload related properties. For example, pass the ordering field(s) name to extract from the value in storage.
Throws: IOException

public Boolean overwriteField(Object value, Object defaultValue)
Overrides: overwriteField in class OverwriteWithLatestAvroPayload

public Option<org.apache.avro.generic.IndexedRecord> getInsertValue(org.apache.avro.Schema schema, boolean isPreCombining) throws IOException
Parameters:
schema -
isPreCombining -
Throws: IOException

protected Option<org.apache.avro.generic.IndexedRecord> mergeDisorderRecordsWithMetadata(org.apache.avro.Schema schema, org.apache.avro.generic.GenericRecord oldRecord, org.apache.avro.generic.GenericRecord updatingRecord, boolean isPreCombining)
Parameters:
schema - The record schema
oldRecord - The current record from file
updatingRecord - The incoming record

Copyright © 2024 The Apache Software Foundation. All rights reserved.