public class BucketAssignFunction<K,I,O extends HoodieRecord<?>>
extends org.apache.flink.streaming.api.functions.KeyedProcessFunction<K,I,O>
implements org.apache.flink.streaming.api.checkpoint.CheckpointedFunction, org.apache.flink.api.common.state.CheckpointListener
Assigns each incoming record to a bucket using BucketAssigner.
All records are tagged with a HoodieRecordLocation. Instead of the real instant time, an INSERT record carries "I" and an UPSERT record carries "U" as its instant time. Keeping the real instant time for each record is unnecessary, because the bucket ID (partition path and file ID) alone decides where the record is written. The "I" and "U" tags exist only so that downstream operators can tell whether a data bucket is an INSERT or an UPSERT bucket. They should be factored out once the underlying writer supports specifying the bucket type explicitly.
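To make the tagging scheme concrete, here is a minimal, self-contained sketch. It is a model of the idea, not Hudi code: `TagSketch`, `Location` and `tag` are hypothetical names, `Location` only mimics the fields of HoodieRecordLocation that the text mentions (a placeholder instant time plus the partition path and file ID), and an in-memory `HashSet` stands in for whatever index lookup the real function uses to decide whether a key already exists.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the "I"/"U" tagging; none of these names are Hudi APIs.
class TagSketch {
    /** Stand-in for HoodieRecordLocation: placeholder instant time plus the bucket (partition path, file ID). */
    record Location(String instantTime, String partitionPath, String fileId) {
        /** Downstream operators only need this check, not a real instant time. */
        boolean isInsert() {
            return "I".equals(instantTime);
        }
    }

    // Keys that already have a location; an in-memory stand-in for an index lookup.
    private final Set<String> knownKeys = new HashSet<>();

    /** Tags a record "I" (INSERT) if its key is new, otherwise "U" (UPSERT). */
    Location tag(String recordKey, String partitionPath, String fileId) {
        String instant = knownKeys.add(recordKey) ? "I" : "U";
        return new Location(instant, partitionPath, fileId);
    }

    public static void main(String[] args) {
        TagSketch tagger = new TagSketch();
        Location first = tagger.tag("id-1", "2022/01/01", "file-0");
        Location second = tagger.tag("id-1", "2022/01/01", "file-0");
        System.out.println(first.instantTime() + " " + second.instantTime()); // prints "I U"
    }
}
```

The tag is a single character because, as described above, the bucket ID already carries the routing information; the tag only has to distinguish the two bucket types.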
The output records should then be shuffled by bucket ID so that the write scales out across parallel writers.
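The point of shuffling by bucket ID is that every record of one bucket reaches the same writer. In a Flink job this is usually expressed with a `keyBy` on the bucket ID; the sketch below imitates that partitioning in plain Java under stated assumptions. `ShuffleSketch`, `BucketId`, `Tagged` and `writerFor` are hypothetical names, and the hash-modulo routing is a simplification (Flink's own key-group assignment differs in detail).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: BucketId and Tagged are illustrative stand-ins, not Hudi classes.
class ShuffleSketch {
    /** A bucket is identified by its partition path and file ID. */
    record BucketId(String partitionPath, String fileId) {}

    /** A record after bucket assignment: its key plus the bucket it must be written to. */
    record Tagged(String key, BucketId bucket) {}

    /** Picks a writer subtask from the bucket ID only, so one bucket never spans two writers. */
    static int writerFor(Tagged tagged, int parallelism) {
        return Math.floorMod(tagged.bucket().hashCode(), parallelism);
    }

    /** Groups records by target writer, imitating a keyBy on the bucket ID. */
    static Map<Integer, List<Tagged>> shuffle(List<Tagged> records, int parallelism) {
        Map<Integer, List<Tagged>> byWriter = new TreeMap<>();
        for (Tagged r : records) {
            byWriter.computeIfAbsent(writerFor(r, parallelism), w -> new ArrayList<>()).add(r);
        }
        return byWriter;
    }

    public static void main(String[] args) {
        List<Tagged> records = List.of(
                new Tagged("a", new BucketId("2022/01/01", "file-0")),
                new Tagged("b", new BucketId("2022/01/01", "file-1")),
                new Tagged("c", new BucketId("2022/01/01", "file-0")));
        // Records "a" and "c" share a bucket, so they always land on the same writer.
        shuffle(records, 4).forEach((writer, rs) -> System.out.println("writer " + writer + " <- " + rs));
    }
}
```

Because routing depends only on the bucket ID, adding more writers spreads distinct buckets across them without ever splitting a single file's records between two writers.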
See Also: BucketAssigner, Serialized Form

| Constructor and Description |
|---|
| BucketAssignFunction(org.apache.flink.configuration.Configuration conf) |
| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| void | initializeState(org.apache.flink.runtime.state.FunctionInitializationContext context) |
| void | notifyCheckpointComplete(long checkpointId) |
| void | open(org.apache.flink.configuration.Configuration parameters) |
| void | processElement(I value, org.apache.flink.streaming.api.functions.KeyedProcessFunction.Context ctx, org.apache.flink.util.Collector<O> out) |
| void | snapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext context) |
Methods inherited from class org.apache.flink.api.common.functions.AbstractRichFunction: getIterationRuntimeContext, getRuntimeContext, setRuntimeContext

public BucketAssignFunction(org.apache.flink.configuration.Configuration conf)

public void open(org.apache.flink.configuration.Configuration parameters) throws Exception
Specified by: open in interface org.apache.flink.api.common.functions.RichFunction
Overrides: open in class org.apache.flink.api.common.functions.AbstractRichFunction
Throws: Exception

public void snapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext context)
Specified by: snapshotState in interface org.apache.flink.streaming.api.checkpoint.CheckpointedFunction

public void initializeState(org.apache.flink.runtime.state.FunctionInitializationContext context)
Specified by: initializeState in interface org.apache.flink.streaming.api.checkpoint.CheckpointedFunction

public void processElement(I value, org.apache.flink.streaming.api.functions.KeyedProcessFunction.Context ctx, org.apache.flink.util.Collector<O> out) throws Exception
Specified by: processElement in class org.apache.flink.streaming.api.functions.KeyedProcessFunction<K,I,O extends HoodieRecord<?>>
Throws: Exception

public void notifyCheckpointComplete(long checkpointId)
Specified by: notifyCheckpointComplete in interface org.apache.flink.api.common.state.CheckpointListener

Copyright © 2022 The Apache Software Foundation. All rights reserved.