I - Type of the input record
public class StreamWriteFunction<I> extends AbstractStreamWriteFunction<I>
The function first buffers the data as a batch of HoodieRecords.
It flushes (writes) the buffered batch when the batch size exceeds the configured size FlinkOptions.WRITE_BATCH_SIZE,
when the total buffer size exceeds the configured size FlinkOptions.WRITE_TASK_MAX_SIZE,
or when a Flink checkpoint starts. After a batch has been written successfully,
the function notifies its operator coordinator, StreamWriteOperatorCoordinator, to mark the write as successful.
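The three flush triggers can be illustrated with a minimal, self-contained sketch. It is not the Hudi implementation: `BufferingWriterSketch`, its methods, and the byte-based thresholds are hypothetical names chosen for illustration, records are plain strings, and a "flush" only appends an entry to a log. The sketch assumes the flush decision is made before the incoming record is added to its bucket.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; class and method names here are not Hudi APIs.
class BufferingWriterSketch {
    private final long batchSizeBytes; // plays the role of FlinkOptions.WRITE_BATCH_SIZE
    private final long maxTotalBytes;  // plays the role of FlinkOptions.WRITE_TASK_MAX_SIZE
    private final Map<String, List<String>> buckets = new LinkedHashMap<>();
    private final Map<String, Long> bucketBytes = new LinkedHashMap<>();
    private long totalBytes = 0;
    // Records every flush as "bucketId:reason:recordCount".
    final List<String> flushLog = new ArrayList<>();

    BufferingWriterSketch(long batchSizeBytes, long maxTotalBytes) {
        this.batchSizeBytes = batchSizeBytes;
        this.maxTotalBytes = maxTotalBytes;
    }

    void bufferRecord(String bucketId, String record, long recordBytes) {
        long newBucketBytes = bucketBytes.getOrDefault(bucketId, 0L) + recordBytes;
        if (newBucketBytes > batchSizeBytes) {
            // Trigger 1: this bucket would exceed the batch size, so flush it first.
            flushBucket(bucketId, "batch-size");
        } else if (totalBytes + recordBytes > maxTotalBytes) {
            // Trigger 2: the whole buffer would exceed the task limit, so flush the largest bucket.
            flushBucket(largestBucket(), "task-max-size");
        }
        buckets.computeIfAbsent(bucketId, k -> new ArrayList<>()).add(record);
        bucketBytes.merge(bucketId, recordBytes, Long::sum);
        totalBytes += recordBytes;
    }

    // Trigger 3: a checkpoint flushes everything that is still buffered.
    void onCheckpoint() {
        for (String id : new ArrayList<>(buckets.keySet())) {
            flushBucket(id, "checkpoint");
        }
    }

    long bufferedBytes() {
        return totalBytes;
    }

    private String largestBucket() {
        String largest = null;
        long max = -1;
        for (Map.Entry<String, Long> e : bucketBytes.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                largest = e.getKey();
            }
        }
        return largest;
    }

    private void flushBucket(String bucketId, String reason) {
        List<String> records = buckets.remove(bucketId);
        if (records == null) {
            return; // nothing buffered for this bucket
        }
        totalBytes -= bucketBytes.remove(bucketId);
        flushLog.add(bucketId + ":" + reason + ":" + records.size());
    }
}
```

Keeping one bucket per file ID means a size-triggered flush writes only the affected bucket, while a checkpoint drains every bucket.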
The task implements exactly-once semantics by buffering the data between checkpoints. The operator coordinator starts a new instant on the timeline when a checkpoint triggers. The coordinator's checkpoint always starts before its operator's, so when this function starts a checkpoint, a REQUESTED instant already exists.
The function's processing thread stops buffering data once the checkpoint thread has flushed the existing data buffer, and resumes only after the current checkpoint succeeds and the coordinator starts a new instant. Any error during metadata committing triggers a job failure. When the job recovers from a failure, the write function re-sends the write metadata to the coordinator to check whether the metadata can be re-committed. Thus, if an unexpected error happens while an instant is being committed, the coordinator retries the commit when the job recovers.
After a checkpoint finishes successfully, the operator coordinator checks and commits the last instant, then starts a new one. It rolls back any inflight instant before it starts a new instant, which means one Hoodie instant spans only one checkpoint. The write function blocks data buffer flushing for up to the configured checkpoint timeout before it throws an exception, and any checkpoint failure eventually triggers a job failure.
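The handshake between the write task and the coordinator can be sketched as a small single-threaded state machine. This is an illustrative simplification under several assumptions, not Hudi's code: `InstantLifecycleSketch` and its methods are hypothetical, `PENDING` collapses Hudi's REQUESTED and INFLIGHT states into one, `ROLLED_BACK` is an invented marker, and empty checkpoints, timeouts, and re-commit on recovery are not modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; these names are not Hudi or Flink APIs.
class InstantLifecycleSketch {
    // PENDING stands in for Hudi's REQUESTED and INFLIGHT states.
    enum State { PENDING, COMPLETED, ROLLED_BACK }

    final Map<String, State> timeline = new LinkedHashMap<>();
    private String currentInstant;
    private int counter = 0;
    private boolean metadataReceived = false;
    // True from the task's checkpoint flush until the next instant starts.
    private boolean confirming = false;

    // Coordinator side: roll back anything still pending, then open a new instant.
    String startInstant() {
        for (Map.Entry<String, State> e : timeline.entrySet()) {
            if (e.getValue() == State.PENDING) {
                e.setValue(State.ROLLED_BACK);
            }
        }
        currentInstant = "instant-" + (++counter);
        timeline.put(currentInstant, State.PENDING);
        metadataReceived = false;
        confirming = false;
        return currentInstant;
    }

    // Task side: the checkpoint flushes the buffer and sends write metadata.
    void taskCheckpoint() {
        if (currentInstant == null) {
            throw new IllegalStateException("no pending instant");
        }
        metadataReceived = true;
        confirming = true; // buffering is blocked until a new instant starts
    }

    boolean canBuffer() {
        return !confirming;
    }

    // Coordinator side: the checkpoint succeeded, so commit and start the next instant.
    String checkpointComplete() {
        if (!metadataReceived) {
            throw new IllegalStateException("no write metadata to commit");
        }
        timeline.put(currentInstant, State.COMPLETED);
        return startInstant();
    }
}
```

Because `startInstant()` rolls back every instant that is still pending, each instant in the sketch covers at most one checkpoint, mirroring the one-instant-per-checkpoint property described above.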
Note: The function task requires the input stream to be shuffled by file ID.
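To show what the shuffle requirement buys, the following sketch routes file IDs to writer tasks with a plain hash. It is only an approximation: `FileIdShuffleSketch` is a hypothetical helper, and Flink's actual key partitioning uses its own key-group hashing rather than `hashCode()` modulo the parallelism. The property illustrated is the same, though: every record carrying a given file ID reaches the same writer task.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; not the partitioner that Flink or Hudi actually uses.
class FileIdShuffleSketch {
    // Deterministically maps a file ID to a task index in [0, parallelism).
    static int taskFor(String fileId, int parallelism) {
        return Math.floorMod(fileId.hashCode(), parallelism);
    }

    // Groups the distinct file IDs by the task that would receive them.
    static Map<Integer, Set<String>> route(List<String> fileIds, int parallelism) {
        Map<Integer, Set<String>> byTask = new TreeMap<>();
        for (String id : fileIds) {
            byTask.computeIfAbsent(taskFor(id, parallelism), k -> new TreeSet<>()).add(id);
        }
        return byTask;
    }
}
```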
See also: StreamWriteOperatorCoordinator, Serialized Form
Fields inherited from AbstractStreamWriteFunction: config, confirming, currentInstant, eventGateway, metaClient, taskID, writeClient, writeStatuses
| Constructor and Description |
|---|
| StreamWriteFunction(org.apache.flink.configuration.Configuration config) Constructs a StreamWriteFunction. |
| Modifier and Type | Method and Description |
|---|---|
| protected void | bufferRecord(HoodieRecord<?> value) Buffers the given record. |
| void | close() |
| void | endInput() End input action for batch source. |
| Map<String,List<HoodieRecord>> | getDataBuffer() |
| void | open(org.apache.flink.configuration.Configuration parameters) |
| void | processElement(I value, org.apache.flink.streaming.api.functions.ProcessFunction.Context ctx, org.apache.flink.util.Collector<Object> out) |
| void | snapshotState() |
Methods inherited from AbstractStreamWriteFunction: handleOperatorEvent, initializeState, instantToWrite, isConfirming, lastPendingInstant, setOperatorEventGateway, snapshotState
public StreamWriteFunction(org.apache.flink.configuration.Configuration config)
config - The config options
public void open(org.apache.flink.configuration.Configuration parameters) throws IOException
open in interface org.apache.flink.api.common.functions.RichFunction
open in class org.apache.flink.api.common.functions.AbstractRichFunction
Throws: IOException
public void snapshotState()
snapshotState in class AbstractStreamWriteFunction<I>
public void processElement(I value, org.apache.flink.streaming.api.functions.ProcessFunction.Context ctx, org.apache.flink.util.Collector<Object> out) throws Exception
public void close()
close in interface org.apache.flink.api.common.functions.RichFunction
close in class org.apache.flink.api.common.functions.AbstractRichFunction
public void endInput()
endInput in interface org.apache.flink.streaming.api.operators.BoundedOneInput
endInput in class AbstractStreamWriteFunction<I>
@VisibleForTesting public Map<String,List<HoodieRecord>> getDataBuffer()
protected void bufferRecord(HoodieRecord<?> value)
Flushes the data bucket first if the size of its buffered records exceeds
the configured value FlinkOptions.WRITE_BATCH_SIZE.
Flushes the largest data bucket if the total buffer size exceeds the configured
threshold FlinkOptions.WRITE_TASK_MAX_SIZE.
value - HoodieRecord
Copyright © 2022 The Apache Software Foundation. All rights reserved.