I - Type of the input record

public abstract class AbstractStreamWriteFunction<I> extends AbstractWriteFunction<I> implements org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
See Also: StreamWriteOperatorCoordinator, Serialized Form

| Modifier and Type | Field and Description |
|---|---|
| protected org.apache.flink.configuration.Configuration | config: Config options. |
| protected boolean | confirming: Flag saying whether the write task is waiting for the checkpoint success notification after it finished a checkpoint. |
| protected String | currentInstant: The REQUESTED instant we write the data. |
| protected org.apache.flink.runtime.operators.coordination.OperatorEventGateway | eventGateway: Gateway to send operator events to the operator coordinator. |
| protected HoodieTableMetaClient | metaClient: Meta Client. |
| protected int | taskID: Id of current subtask. |
| protected HoodieFlinkWriteClient | writeClient: Write Client. |
| protected List<WriteStatus> | writeStatuses: Write status list for the current checkpoint. |

| Constructor and Description |
|---|
| AbstractStreamWriteFunction(org.apache.flink.configuration.Configuration config): Constructs an AbstractStreamWriteFunction. |

| Modifier and Type | Method and Description |
|---|---|
| void | endInput(): Invoked when the bounded source ends. |
| void | handleOperatorEvent(org.apache.flink.runtime.operators.coordination.OperatorEvent event): Handles the operator event sent by the coordinator. |
| void | initializeState(org.apache.flink.runtime.state.FunctionInitializationContext context) |
| protected String | instantToWrite(boolean hasData): Prepares the instant time to write with for the next checkpoint. |
| boolean | isConfirming() |
| protected String | lastPendingInstant(): Returns the last pending instant time. |
| void | setOperatorEventGateway(org.apache.flink.runtime.operators.coordination.OperatorEventGateway operatorEventGateway): Sets up the event gateway. |
| abstract void | snapshotState() |
| void | snapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext functionSnapshotContext) |

Inherited methods: onTimer, processElement

protected final org.apache.flink.configuration.Configuration config
protected int taskID
protected transient HoodieTableMetaClient metaClient
protected transient HoodieFlinkWriteClient writeClient
protected volatile String currentInstant
protected transient org.apache.flink.runtime.operators.coordination.OperatorEventGateway eventGateway
protected volatile boolean confirming
The flag is needed because the write task does not block during the waiting interval, so some data buckets may still be flushed with the old instant time. If the old instant is then committed successfully, such a flush can produce corrupted files in two cases: 1) the write handle was interrupted while writing data, leaving a corrupted parquet file; 2) the write handle finished the write but was not closed, leaving an empty parquet file.
To prevent this, data flushing (and therefore the #processElement method) is blocked while this flag is true. The flag is reset to false when the task receives the checkpoint success event, or when the latest inflight instant time changes (meaning the last instant was committed successfully).
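
The gating behaviour described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the Hudi implementation: the class `ConfirmingGateSketch` and its methods `onCheckpointFinished`, `onCommitConfirmed`, `buffered` and `flushed` are hypothetical names, records are plain strings, and the task holds records in a buffer instead of blocking the calling thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, standalone model of the "confirming" gate.
 * All names here are hypothetical; this is not Hudi code.
 */
class ConfirmingGateSketch {
    // True after a checkpoint finishes, until the commit is confirmed.
    private volatile boolean confirming = false;
    // Instant the task currently writes with (format is an assumption).
    private volatile String currentInstant;
    // Records received but not yet flushed.
    private final List<String> buffer = new ArrayList<>();
    // Flushed records, tagged with the instant they were written under.
    private final List<String> flushed = new ArrayList<>();

    ConfirmingGateSketch(String initialInstant) {
        this.currentInstant = initialInstant;
    }

    /** Buffers a record and flushes only if the gate is open. */
    void processElement(String record) {
        buffer.add(record);
        flushIfAllowed();
    }

    /** Models the end of a checkpoint: flush what is pending, then close the gate. */
    void onCheckpointFinished() {
        flushIfAllowed();
        confirming = true;
    }

    /** Models the checkpoint success event (or a change of the latest inflight instant). */
    void onCommitConfirmed(String nextInstant) {
        currentInstant = nextInstant;
        confirming = false;
        flushIfAllowed();
    }

    private void flushIfAllowed() {
        if (confirming) {
            return; // hold data back so nothing is written with the old instant
        }
        for (String record : buffer) {
            flushed.add(currentInstant + ":" + record);
        }
        buffer.clear();
    }

    boolean isConfirming() {
        return confirming;
    }

    int buffered() {
        return buffer.size();
    }

    List<String> flushed() {
        return flushed;
    }
}
```

In this model, a record that arrives between `onCheckpointFinished()` and `onCommitConfirmed(...)` stays in the buffer and is written under the new instant only, which is the property the flag is meant to guarantee.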
protected List<WriteStatus> writeStatuses
public AbstractStreamWriteFunction(org.apache.flink.configuration.Configuration config)
Parameters: config - The config options

public void initializeState(org.apache.flink.runtime.state.FunctionInitializationContext context) throws Exception
Specified by: initializeState in interface org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
Throws: Exception

public void snapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext functionSnapshotContext) throws Exception
Specified by: snapshotState in interface org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
Throws: Exception

public abstract void snapshotState()

public void endInput()
Description copied from class AbstractWriteFunction: Invoked when the bounded source ends.
Specified by: endInput in interface org.apache.flink.streaming.api.operators.BoundedOneInput
Specified by: endInput in class AbstractWriteFunction<I>

@VisibleForTesting public boolean isConfirming()

public void setOperatorEventGateway(org.apache.flink.runtime.operators.coordination.OperatorEventGateway operatorEventGateway)
Description copied from class AbstractWriteFunction: Sets up the event gateway.
Specified by: setOperatorEventGateway in class AbstractWriteFunction<I>

public void handleOperatorEvent(org.apache.flink.runtime.operators.coordination.OperatorEvent event)
Description copied from class AbstractWriteFunction: Handles the operator event sent by the coordinator.
Specified by: handleOperatorEvent in class AbstractWriteFunction<I>
Parameters: event - The event

protected String lastPendingInstant()

protected String instantToWrite(boolean hasData)
Parameters: hasData - Whether the task has buffered data

Copyright © 2022 The Apache Software Foundation. All rights reserved.