public class BatchBootstrapOperator<I,O extends HoodieRecord<?>> extends BootstrapOperator<I,O>
This operator should only be used with a bounded source.
When a record arrives, the operator first checks whether the record's partition path has already been loaded. If the partition has not been loaded yet, it loads the entire partition and sends the index records to downstream operators before sending the input record; if the partition has already been loaded, it sends the input record directly.
The input records should be shuffled by partition path to avoid loading the same partition repeatedly.
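
The per-record behavior described above can be sketched without any Flink or Hudi dependencies. The following is a minimal illustration, not the operator's actual implementation: the class `LazyPartitionBootstrapSketch`, its fields, and the string-based "records" are all hypothetical stand-ins, and the existing index is modeled as a plain in-memory map from partition path to record keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified, Flink-free model of "load the partition on first sight, then forward". */
class LazyPartitionBootstrapSketch {
    // Partition paths whose index records this task has already emitted.
    private final Set<String> loadedPartitions = new HashSet<>();
    // Hypothetical stand-in for the table's existing index: partition path -> record keys.
    private final Map<String, List<String>> existingIndex;
    // Everything "sent downstream", in emission order.
    final List<String> emitted = new ArrayList<>();
    // Number of full partition loads performed by this task.
    int loadCount = 0;

    LazyPartitionBootstrapSketch(Map<String, List<String>> existingIndex) {
        this.existingIndex = existingIndex;
    }

    void processElement(String partitionPath, String recordKey) {
        // Set.add returns true only the first time a partition path is seen.
        if (loadedPartitions.add(partitionPath)) {
            loadCount++;
            List<String> keys =
                existingIndex.getOrDefault(partitionPath, Collections.<String>emptyList());
            for (String key : keys) {
                // Index records go downstream before the triggering input record.
                emitted.add("index:" + partitionPath + "/" + key);
            }
        }
        // The input record is always forwarded.
        emitted.add("input:" + partitionPath + "/" + recordKey);
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new HashMap<>();
        index.put("2023/01/01", Arrays.asList("k1", "k2"));

        LazyPartitionBootstrapSketch op = new LazyPartitionBootstrapSketch(index);
        op.processElement("2023/01/01", "k3"); // first record: triggers the partition load
        op.processElement("2023/01/01", "k4"); // partition already loaded: forwarded directly
        op.processElement("2023/01/02", "k5"); // new partition with no existing index entries

        op.emitted.forEach(System.out::println);
        System.out.println("partition loads: " + op.loadCount);
    }
}
```

Running `main` emits the two index entries for `2023/01/01` once, ahead of the first input record for that partition, and performs two partition loads in total. If records for one partition were spread across several parallel tasks, each task would repeat the load, which is why the input should be shuffled by partition path.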
Fields inherited from class BootstrapOperator: conf, hadoopConf, hoodieTable, writeConfig

| Constructor and Description |
|---|
| BatchBootstrapOperator(org.apache.flink.configuration.Configuration conf) |

| Modifier and Type | Method and Description |
|---|---|
| void | open() |
| protected void | preLoadIndexRecords() Load the index records before BootstrapOperator.processElement(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<I>). |
| void | processElement(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<I> element) |
| protected boolean | shouldLoadFile(String fileId, int maxParallelism, int parallelism, int taskID) |

Methods inherited from class BootstrapOperator: generateHoodieRecord, initializeState, isAlreadyBootstrap, loadRecords, snapshotState

Methods inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator: close, finish, getChainingStrategy, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processWatermark, processWatermark1, processWatermark2, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setChainingStrategy, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setProcessingTimeService, setup, snapshotState

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface OneInputStreamOperator: setKeyContextElement

Methods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator: close, finish, getMetricGroup, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState

Methods inherited from interface CheckpointListener: notifyCheckpointAborted, notifyCheckpointComplete

Methods inherited from interface KeyContext: getCurrentKey, setCurrentKey

public BatchBootstrapOperator(org.apache.flink.configuration.Configuration conf)

public void open() throws Exception

Specified by: open in interface org.apache.flink.streaming.api.operators.StreamOperator<O extends HoodieRecord<?>>
Overrides: open in class org.apache.flink.streaming.api.operators.AbstractStreamOperator<O extends HoodieRecord<?>>
Throws: Exception

protected void preLoadIndexRecords()

Description copied from class: BootstrapOperator
Load the index records before BootstrapOperator.processElement(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<I>).
Overrides: preLoadIndexRecords in class BootstrapOperator<I,O extends HoodieRecord<?>>

public void processElement(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<I> element) throws Exception

Specified by: processElement in interface org.apache.flink.streaming.api.operators.Input<I>
Overrides: processElement in class BootstrapOperator<I,O extends HoodieRecord<?>>
Throws: Exception

protected boolean shouldLoadFile(String fileId, int maxParallelism, int parallelism, int taskID)

Overrides: shouldLoadFile in class BootstrapOperator<I,O extends HoodieRecord<?>>

Copyright © 2023 The Apache Software Foundation. All rights reserved.