@NotThreadSafe
public class HoodieConcatHandle<T,I,K,O> extends HoodieMergeHandle<T,I,K,O>

When HoodieWriteConfig.allowDuplicateInserts() is set, this handle is used instead of HoodieMergeHandle.
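As a usage note, duplicate-tolerant inserts are typically enabled through the writer config backing HoodieWriteConfig.allowDuplicateInserts(). The key below is an assumption based on that accessor; verify it against your Hudi version:

```properties
# Assumed config key (verify against your Hudi release): when true,
# inserts into an existing file go through HoodieConcatHandle and are
# appended without key-based de-duplication.
hoodie.merge.allow.duplicate.on.inserts=true
```

When this is left at its default of false, inserts into an existing base file are handled by HoodieMergeHandle, which merges records by key.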
Simplified logic:
- For every existing record, write the record as is.
- For all incoming records, write to the file as is, without de-duplicating on the record key.
Illustration with simple data.

Incoming data: rec1_2, rec1_3, rec4_2, rec5_1, rec6_1
Existing data: rec1_1, rec2_1, rec3_1, rec4_1

For every existing record, write to storage as is:
=> rec1_1, rec2_1, rec3_1 and rec4_1 are written to storage

Write all records from the incoming set to storage:
=> rec1_2, rec1_3, rec4_2, rec5_1 and rec6_1

Final snapshot in storage:
rec1_1, rec2_1, rec3_1, rec4_1, rec1_2, rec1_3, rec4_2, rec5_1, rec6_1
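The walk-through above can be simulated with plain Java collections. This is a minimal sketch of the concat semantics only, not the actual handle (which writes to storage via a file writer); the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class ConcatSketch {
    // Simulates HoodieConcatHandle semantics: every existing record is
    // written as-is, then all incoming records are appended without any
    // de-duplication on the record key.
    static List<String> concatWrite(List<String> existing, List<String> incoming) {
        List<String> out = new ArrayList<>(existing); // write existing records as-is
        out.addAll(incoming);                         // append incoming records, no key lookup
        return out;
    }

    public static void main(String[] args) {
        List<String> existing = List.of("rec1_1", "rec2_1", "rec3_1", "rec4_1");
        List<String> incoming = List.of("rec1_2", "rec1_3", "rec4_2", "rec5_1", "rec6_1");
        // rec1 and rec4 end up with multiple versions in the same snapshot.
        System.out.println(concatWrite(existing, incoming));
    }
}
```

Note how rec1_1/rec1_2/rec1_3 all survive in the result: unlike the merge path, no version wins by key.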
Users should ensure there are no duplicates when the "insert" operation is used with this config enabled; the scenario above should not happen, and every batch should contain only new records to be inserted. The example above is for illustration purposes only.

Fields inherited from class HoodieMergeHandle:
baseFileToMerge, fileWriter, insertRecordsWritten, keyGeneratorOpt, keyToNewRecords, newFilePath, oldFilePath, partitionFields, partitionValues, preserveMetadata, recordsDeleted, recordsWritten, updatedRecordsWritten, writtenRecordKeys

Fields inherited from class HoodieWriteHandle:
fileId, newRecordLocation, partitionPath, recordMerger, schemaOnReadEnabled, taskContextSupplier, timer, writeSchema, writeSchemaWithMetaFields, writeStatus, writeToken

Fields inherited from class HoodieIOHandle:
config, hoodieTable, instantTime, storage

| Constructor and Description |
|---|
| HoodieConcatHandle(HoodieWriteConfig config, String instantTime, HoodieTable<T,I,K,O> hoodieTable, Iterator<HoodieRecord<T>> recordItr, String partitionPath, String fileId, TaskContextSupplier taskContextSupplier, Option<BaseKeyGenerator> keyGeneratorOpt) |
| HoodieConcatHandle(HoodieWriteConfig config, String instantTime, HoodieTable hoodieTable, Map<String,HoodieRecord<T>> keyToNewRecords, String partitionPath, String fileId, HoodieBaseFile dataFileToBeMerged, TaskContextSupplier taskContextSupplier) |
| Modifier and Type | Method and Description |
|---|---|
| void | write(HoodieRecord oldRecord) Write the old record as is, without merging with the incoming record. |
| protected void | writeIncomingRecords() |
Methods inherited from class HoodieMergeHandle:
baseFileForMerge, close, getIOType, getLatestBaseFile, getOldFilePath, getPartitionFields, getPartitionValues, getWriteStatusesAsIterator, init, initializeIncomingRecordsMap, makeOldAndNewFilePaths, performMergeDataValidationCheck, setPartitionFields, setPartitionValues, setWriteStatusPath, writeInsertRecord, writeInsertRecord, writeRecord, writeToFile, writeUpdateRecord

Methods inherited from class HoodieWriteHandle:
canWrite, createLogWriter, createLogWriter, createMarkerFile, doWrite, getAttemptId, getConfig, getFileId, getHoodieTableMetaClient, getLogCreationCallback, getPartitionId, getPartitionPath, getStageId, getStorage, getWriterSchema, getWriterSchemaWithMetaFields, getWriteStatuses, isClosed, makeNewFilePath, makeNewPath, markClosed, toAvroRecord, write

Constructor Detail

public HoodieConcatHandle(HoodieWriteConfig config, String instantTime, HoodieTable<T,I,K,O> hoodieTable, Iterator<HoodieRecord<T>> recordItr, String partitionPath, String fileId, TaskContextSupplier taskContextSupplier, Option<BaseKeyGenerator> keyGeneratorOpt)
public HoodieConcatHandle(HoodieWriteConfig config, String instantTime, HoodieTable hoodieTable, Map<String,HoodieRecord<T>> keyToNewRecords, String partitionPath, String fileId, HoodieBaseFile dataFileToBeMerged, TaskContextSupplier taskContextSupplier)
public void write(HoodieRecord oldRecord)
protected void writeIncomingRecords() throws IOException

Overrides: writeIncomingRecords in class HoodieMergeHandle<T,I,K,O>
Throws: IOException

Copyright © 2024 The Apache Software Foundation. All rights reserved.