Type parameter: IN - type of results to be written into the sink.

@Internal
public class CollectSinkFunction<IN>
extends RichSinkFunction<IN>
implements CheckpointedFunction, org.apache.flink.runtime.state.CheckpointListener

This sink limits the number of results it buffers (the limit is configurable). When the buffer is full, it back-pressures the job until the client consumes some results.

NOTE: When using this sink, make sure that its parallelism is 1 and that it is used in a StreamTask.
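
The back-pressure behaviour described above can be pictured as a bounded buffer whose producer blocks while the buffer is full. The following is a minimal sketch in plain Java, not Flink's implementation: the class `BoundedResultBuffer`, its `capacity` parameter, and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical bounded buffer illustrating back-pressure; not Flink code. */
final class BoundedResultBuffer<T> {
    private final Queue<T> buffer = new ArrayDeque<>();
    private final int capacity; // assumed to play the role of the configurable limit

    BoundedResultBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Producer side: blocks while the buffer is full, which back-pressures the caller. */
    synchronized void put(T value) throws InterruptedException {
        while (buffer.size() >= capacity) {
            wait();
        }
        buffer.add(value);
    }

    /** Non-blocking variant: reports a full buffer instead of waiting. */
    synchronized boolean offer(T value) {
        if (buffer.size() >= capacity) {
            return false;
        }
        buffer.add(value);
        return true;
    }

    /** Consumer side: removes up to maxResults elements and wakes blocked producers. */
    synchronized List<T> poll(int maxResults) {
        List<T> batch = new ArrayList<>();
        while (batch.size() < maxResults && !buffer.isEmpty()) {
            batch.add(buffer.remove());
        }
        notifyAll();
        return batch;
    }

    synchronized int size() {
        return buffer.size();
    }

    public static void main(String[] args) {
        BoundedResultBuffer<String> results = new BoundedResultBuffer<>(2);
        results.offer("a");
        results.offer("b");
        System.out.println("accepted while full: " + results.offer("c"));
        System.out.println("consumed: " + results.poll(1));
        System.out.println("accepted after consuming: " + results.offer("c"));
    }
}
```

In this sketch `put` is the call that would stall an upstream operator, while `offer` exists only so that the full-buffer condition can be observed without blocking.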
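
The communication protocol maintains the following variables:

- version: identifies the current run of the sink. A version mismatch tells the client that the sink has restarted.
- offset: the number of results the client has received so far. The next batch of results starts from this position.
- lastCheckpointedOffset: the value of offset when the checkpoint happens. This value is restored from the checkpoint and set back to offset when the sink restarts. Clients that need exactly-once semantics rely on this value as the position to revert to when a failover happens.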
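
The relationship between offset and lastCheckpointedOffset across a checkpoint and a restart can be sketched as follows. This is a simplified model in plain Java and not the sink's actual code: the class `OffsetCheckpointSketch` and its methods are hypothetical, and the sketch ignores the distinction between taking a snapshot and being notified that the checkpoint completed.

```java
/** Hypothetical model of the sink's offset bookkeeping; not Flink code. */
final class OffsetCheckpointSketch {
    private long offset;                 // results before this position were received by the client
    private long lastCheckpointedOffset; // value of offset at the latest checkpoint

    /** The client reports that it has received everything before newOffset. */
    void acknowledge(long newOffset) {
        if (newOffset < offset) {
            throw new IllegalArgumentException("offset must not move backwards");
        }
        offset = newOffset;
    }

    /** On checkpoint: remember the current offset and return the value to persist. */
    long snapshot() {
        lastCheckpointedOffset = offset;
        return lastCheckpointedOffset;
    }

    /** On restart: both values fall back to what was persisted in the checkpoint. */
    void restore(long checkpointedOffset) {
        lastCheckpointedOffset = checkpointedOffset;
        offset = checkpointedOffset;
    }

    long offset() {
        return offset;
    }

    long lastCheckpointedOffset() {
        return lastCheckpointedOffset;
    }

    public static void main(String[] args) {
        OffsetCheckpointSketch sink = new OffsetCheckpointSketch();
        sink.acknowledge(5);
        long persisted = sink.snapshot();
        sink.acknowledge(8); // progress made after the checkpoint

        OffsetCheckpointSketch restarted = new OffsetCheckpointSketch();
        restarted.restore(persisted); // progress after the checkpoint is forgotten
        System.out.println("offset after restart: " + restarted.offset());
    }
}
```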

The client puts version and offset into each request, indicating which version it believes is current and how many results it has received so far.

The sink checks the validity of the request. If version mismatches or offset is smaller than expected, the sink sends back the current version and lastCheckpointedOffset with an empty result list.

If the request is valid, the sink prepares some results starting from offset and sends them back to the client together with lastCheckpointedOffset. If there are currently no results starting from offset, the sink does not wait but sends back an empty result list.
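
The request handling described above can be sketched in plain Java. This is an illustration under stated assumptions rather than the sink's real code: `SinkProtocolSketch`, `Response`, `add`, `checkpoint`, and `handleRequest` are hypothetical names, results are plain strings, and the sketch additionally treats an offset beyond the buffered results as invalid.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sink-side request handling; not Flink code. */
final class SinkProtocolSketch {

    /** Hypothetical response: current version, checkpointed offset, and a batch of results. */
    static final class Response {
        final String version;
        final long lastCheckpointedOffset;
        final List<String> results;

        Response(String version, long lastCheckpointedOffset, List<String> results) {
            this.version = version;
            this.lastCheckpointedOffset = lastCheckpointedOffset;
            this.results = results;
        }
    }

    private final String version;
    private final int maxResultsPerBatch;
    private final List<String> buffer = new ArrayList<>(); // buffer.get(0) sits at position `offset`
    private long offset;
    private long lastCheckpointedOffset;

    SinkProtocolSketch(String version, int maxResultsPerBatch) {
        this.version = version;
        this.maxResultsPerBatch = maxResultsPerBatch;
    }

    void add(String result) {
        buffer.add(result);
    }

    void checkpoint() {
        lastCheckpointedOffset = offset;
    }

    Response handleRequest(String requestVersion, long requestOffset) {
        // Invalid request: wrong version, offset smaller than expected, or (a choice of
        // this sketch) an offset beyond anything the sink has buffered.
        boolean valid = version.equals(requestVersion)
                && requestOffset >= offset
                && requestOffset <= offset + buffer.size();
        if (!valid) {
            return new Response(version, lastCheckpointedOffset, Collections.<String>emptyList());
        }
        // Everything before requestOffset has been received by the client and can be dropped.
        int acknowledged = (int) (requestOffset - offset);
        buffer.subList(0, acknowledged).clear();
        offset = requestOffset;
        // Send at most one batch; an empty batch is returned immediately instead of waiting.
        int batchSize = Math.min(maxResultsPerBatch, buffer.size());
        return new Response(version, lastCheckpointedOffset,
                new ArrayList<>(buffer.subList(0, batchSize)));
    }

    public static void main(String[] args) {
        SinkProtocolSketch sink = new SinkProtocolSketch("v1", 2);
        sink.add("a");
        sink.add("b");
        sink.add("c");
        System.out.println(sink.handleRequest("stale", 0).results); // invalid request
        System.out.println(sink.handleRequest("v1", 0).results);    // first batch
        System.out.println(sink.handleRequest("v1", 2).results);    // remaining results
    }
}
```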

A client that wants exactly-once semantics checks the following conditions when it receives a response:

- If version mismatches, the client knows that the sink has restarted. It discards all uncheckpointed results after lastCheckpointedOffset.
- If lastCheckpointedOffset increases, the client knows that a checkpoint has happened. It can now move all results before this offset to a user-visible buffer.

Note that

- the user can only see results before a lastCheckpointedOffset, and
- the client goes back to the latest lastCheckpointedOffset when the sink restarts,

so the client never throws away results in the user-visible buffer. This communication protocol therefore achieves exactly-once semantics.
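
The client-side rules can be sketched in plain Java as below. This is not the real client implementation: `ExactlyOnceClientSketch` and `onResponse` are hypothetical names, results are plain strings, and the sketch assumes that newly received results are held in a separate uncheckpointed list until a checkpoint covers them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical exactly-once client bookkeeping; not Flink code. */
final class ExactlyOnceClientSketch {
    private String version = "";  // version the client believes is current
    private long offset;          // number of results received so far
    private final List<String> userVisible = new ArrayList<>();    // covered by a checkpoint
    private final List<String> uncheckpointed = new ArrayList<>(); // positions [userVisible.size(), offset)

    void onResponse(String responseVersion, long lastCheckpointedOffset, List<String> results) {
        // A checkpoint happened: results before lastCheckpointedOffset become user-visible.
        long toPublish = Math.min(lastCheckpointedOffset, offset) - userVisible.size();
        for (long i = 0; i < toPublish; i++) {
            userVisible.add(uncheckpointed.remove(0));
        }
        // Version mismatch: the sink restarted, so uncheckpointed results are discarded
        // and the client goes back to the latest checkpointed position.
        if (!responseVersion.equals(version)) {
            uncheckpointed.clear();
            offset = userVisible.size();
            version = responseVersion;
            return;
        }
        // Otherwise keep the new results until a later checkpoint covers them.
        uncheckpointed.addAll(results);
        offset += results.size();
    }

    long offset() {
        return offset;
    }

    List<String> userVisible() {
        return userVisible;
    }

    public static void main(String[] args) {
        ExactlyOnceClientSketch client = new ExactlyOnceClientSketch();
        client.onResponse("v1", 0, new ArrayList<String>());  // learn the sink's version
        List<String> batch = new ArrayList<>();
        batch.add("a");
        batch.add("b");
        client.onResponse("v1", 0, batch);                    // received, not yet visible
        client.onResponse("v1", 2, new ArrayList<String>());  // checkpoint covers both results
        System.out.println("visible to the user: " + client.userVisible());
    }
}
```

Because results only ever move from the uncheckpointed list to the user-visible list, and only the uncheckpointed list is cleared on a restart, nothing the user has already seen is retracted.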

So that finishing or cancelling the job is not blocked, any results still in the sink's buffer when the job terminates are sent back to the client through accumulators.
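
To make the idea of shipping the remaining results as a single byte array more concrete, here is a round-trip sketch in plain Java. The byte layout is invented for illustration and is not the format Flink uses; the real methods rely on a TypeSerializer, whereas this sketch assumes string results. `FinalResultsSketch`, `FinalResults`, `serialize`, and `deserialize` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical packing of final results into bytes; the layout is NOT Flink's. */
final class FinalResultsSketch {

    /** Hypothetical holder for the decoded values. */
    static final class FinalResults {
        final long offset;
        final String version;
        final long lastCheckpointedOffset;
        final List<String> results;

        FinalResults(long offset, String version, long lastCheckpointedOffset, List<String> results) {
            this.offset = offset;
            this.version = version;
            this.lastCheckpointedOffset = lastCheckpointedOffset;
            this.results = results;
        }
    }

    /** Writes offset, version, lastCheckpointedOffset, result count, then each result. */
    static byte[] serialize(long offset, String version, long lastCheckpointedOffset,
                            List<String> bufferedResults) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(offset);
            out.writeUTF(version);
            out.writeLong(lastCheckpointedOffset);
            out.writeInt(bufferedResults.size());
            for (String result : bufferedResults) {
                out.writeUTF(result);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the fields back in the same order in which serialize wrote them. */
    static FinalResults deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long offset = in.readLong();
            String version = in.readUTF();
            long lastCheckpointedOffset = in.readLong();
            int count = in.readInt();
            List<String> results = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                results.add(in.readUTF());
            }
            return new FinalResults(offset, version, lastCheckpointedOffset, results);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> remaining = new ArrayList<>();
        remaining.add("x");
        remaining.add("y");
        FinalResults decoded = deserialize(serialize(7L, "v1", 5L, remaining));
        System.out.println(decoded.version + " " + decoded.offset + " " + decoded.results);
    }
}
```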

Nested classes/interfaces inherited from interface SinkFunction: SinkFunction.Context<T>

| Constructor and Description |
|---|
| CollectSinkFunction(org.apache.flink.api.common.typeutils.TypeSerializer<IN> serializer, int maxResultsPerBatch, String accumulatorName) |

| Modifier and Type | Method and Description |
|---|---|
| void | accumulateFinalResults() |
| void | close() |
| static <T> org.apache.flink.api.java.tuple.Tuple2<Long,CollectCoordinationResponse<T>> | deserializeAccumulatorResult(byte[] serializedAccResults) |
| void | initializeState(org.apache.flink.runtime.state.FunctionInitializationContext context): This method is called when the parallel function instance is created during distributed execution. |
| void | invoke(IN value, SinkFunction.Context context): Writes the given value to the sink. |
| void | notifyCheckpointAborted(long checkpointId) |
| void | notifyCheckpointComplete(long checkpointId) |
| void | open(org.apache.flink.configuration.Configuration parameters) |
| static <T> byte[] | serializeAccumulatorResult(long offset, String version, long lastCheckpointedOffset, List<T> bufferedResults, org.apache.flink.api.common.typeutils.TypeSerializer<T> serializer) |
| void | setOperatorEventGateway(org.apache.flink.runtime.operators.coordination.OperatorEventGateway eventGateway) |
| void | snapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext context): This method is called when a snapshot for a checkpoint is requested. |

Methods inherited from class org.apache.flink.api.common.functions.AbstractRichFunction: getIterationRuntimeContext, getRuntimeContext, setRuntimeContext

Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface SinkFunction: invoke

public void initializeState(org.apache.flink.runtime.state.FunctionInitializationContext context) throws Exception

- Description copied from interface CheckpointedFunction: This method is called when the parallel function instance is created during distributed execution.
- Specified by: initializeState in interface CheckpointedFunction
- Parameters: context - the context for initializing the operator
- Throws: Exception - Thrown if state could not be created or restored.

public void snapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext context) throws Exception

- Description copied from interface CheckpointedFunction: This method is called when a snapshot for a checkpoint is requested. It acts as a hook for the function to ensure that all state is exposed by means previously offered through FunctionInitializationContext when the Function was initialized, or offered now by FunctionSnapshotContext itself.
- Specified by: snapshotState in interface CheckpointedFunction
- Parameters: context - the context for drawing a snapshot of the operator
- Throws: Exception - Thrown if state could not be created or restored.

public void open(org.apache.flink.configuration.Configuration parameters) throws Exception

- Specified by: open in interface org.apache.flink.api.common.functions.RichFunction
- Overrides: open in class org.apache.flink.api.common.functions.AbstractRichFunction
- Throws: Exception

public void invoke(IN value, SinkFunction.Context context) throws Exception

- Description copied from interface SinkFunction: Writes the given value to the sink. You have to override this method when implementing a SinkFunction; it is a default method only for backward compatibility with the old-style method.
- Specified by: invoke in interface SinkFunction<IN>
- Parameters: value - The input record. context - Additional context about the input record.
- Throws: Exception - This method may throw exceptions. Throwing an exception will cause the operation to fail and may trigger recovery.

public void close() throws Exception

- Specified by: close in interface org.apache.flink.api.common.functions.RichFunction
- Overrides: close in class org.apache.flink.api.common.functions.AbstractRichFunction
- Throws: Exception

public void notifyCheckpointComplete(long checkpointId)

- Specified by: notifyCheckpointComplete in interface org.apache.flink.runtime.state.CheckpointListener

public void notifyCheckpointAborted(long checkpointId)

- Specified by: notifyCheckpointAborted in interface org.apache.flink.runtime.state.CheckpointListener

public void setOperatorEventGateway(org.apache.flink.runtime.operators.coordination.OperatorEventGateway eventGateway)

@VisibleForTesting
public static <T> byte[] serializeAccumulatorResult(long offset, String version, long lastCheckpointedOffset, List<T> bufferedResults, org.apache.flink.api.common.typeutils.TypeSerializer<T> serializer) throws IOException

- Throws: IOException

public static <T> org.apache.flink.api.java.tuple.Tuple2<Long,CollectCoordinationResponse<T>> deserializeAccumulatorResult(byte[] serializedAccResults) throws IOException

- Throws: IOException

Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.