@Deprecated
public class RollingSink<T>
extends org.apache.flink.streaming.api.functions.sink.RichSinkFunction<T>
implements org.apache.flink.api.java.typeutils.InputTypeConfigurable, org.apache.flink.streaming.api.checkpoint.CheckpointedFunction, org.apache.flink.runtime.state.CheckpointListener

Type Parameters: T - Type of the elements emitted by this sink

Deprecated. Use BucketingSink instead.

Sink that emits its input elements to FileSystem files. This is integrated
with the checkpointing mechanism to provide exactly-once semantics.
When creating the sink, a basePath must be specified. The base directory contains
one directory for every bucket. The bucket directories themselves contain several part files,
which hold the actual written data.
The sink uses a Bucketer to determine the name of bucket directories inside the
base directory. Whenever the Bucketer returns a different directory name than
it returned before, the sink closes the current part files inside that bucket
and starts a new bucket directory. The default bucketer is a DateTimeBucketer with
date format string "yyyy-MM-dd--HH". You can specify a custom Bucketer
using setBucketer(Bucketer). For example, use
NonRollingBucketer if you don't want to have
buckets but still want to write part files in a fault-tolerant way.
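The bucket-directory naming described above can be illustrated with plain Java. This is only a sketch of the documented default pattern "yyyy-MM-dd--HH", not the actual DateTimeBucketer implementation; the class and method names here are illustrative assumptions.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class BucketPathSketch {
    // Mimics a DateTimeBucketer-style bucket path: basePath + "/" + formatted
    // timestamp. "yyyy-MM-dd--HH" is the documented default format string.
    static String bucketPath(String basePath, long timestampMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd--HH");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for reproducibility
        return basePath + "/" + fmt.format(new Date(timestampMillis));
    }

    public static void main(String[] args) {
        // Epoch 0 is 1970-01-01 00:00 UTC, so the bucket is "1970-01-01--00".
        System.out.println(bucketPath("/out", 0L));
    }
}
```

Whenever the formatted timestamp changes (here, once per hour), the path changes, which is exactly the condition under which the sink starts a new bucket directory.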
The filenames of the part files contain the part prefix, the parallel subtask index of the sink,
and a rolling counter, for example "part-1-17". By default the part prefix is
"part", but this can be
configured using setPartPrefix(String). When a part file becomes bigger
than the batch size, the current part file is closed, the part counter is increased, and
a new part file is created. The batch size defaults to 384 MB; this can be configured
using setBatchSize(long).
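The naming and rolling behavior just described can be sketched in a few lines of plain Java. This is an illustration of the documented scheme, not the sink's internals; the helper names are assumptions.

```java
public class PartFileSketch {
    // Part-file name scheme from the text: "<partPrefix>-<subtaskIndex>-<counter>",
    // e.g. "part-1-17" for subtask 1, rolling counter 17.
    static String partFileName(String partPrefix, int subtaskIndex, int counter) {
        return partPrefix + "-" + subtaskIndex + "-" + counter;
    }

    // Roll to a new part file (increment the counter) once the current part file
    // has grown past the configured batch size.
    static int maybeRoll(long currentSizeBytes, long batchSizeBytes, int counter) {
        return currentSizeBytes > batchSizeBytes ? counter + 1 : counter;
    }

    public static void main(String[] args) {
        System.out.println(partFileName("part", 1, 17));          // the example from the text
        System.out.println(maybeRoll(400L << 20, 384L << 20, 17)); // 400 MB > 384 MB default
    }
}
```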
Part files can be in one of three states: in-progress, pending or finished. The reason for this
is how the sink works together with the checkpointing mechanism to provide exactly-once semantics
and fault-tolerance. The part file that is currently being written to is in-progress. Once
a part file is closed for writing it becomes pending. When a checkpoint is successful the
currently pending files will be moved to finished. If a failure occurs the pending files
will be deleted to reset state to the last checkpoint. The data in in-progress files will
also have to be rolled back. If the FileSystem supports the truncate call
this will be used to reset the file back to a previous state. If not, a special file
with the same name as the part file and the suffix ".valid-length" will be written
that contains the length up to which the file contains valid data. When reading the file
it must be ensured that it is only read up to that point. The prefixes and suffixes for
the different file states and valid-length files can be configured, for example with
setPendingSuffix(String).
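A consumer that honors the valid-length mechanism can be sketched with plain JDK file I/O. The ".valid-length" suffix is the documented default; the reader class and its layout here are illustrative assumptions, not part of the sink's API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class ValidLengthReader {
    // If "<file>.valid-length" exists next to a part file, it holds the number of
    // bytes of <file> that contain valid data; anything beyond that must be ignored.
    static byte[] readValid(Path dataFile) throws IOException {
        byte[] data = Files.readAllBytes(dataFile);
        Path marker = dataFile.resolveSibling(dataFile.getFileName() + ".valid-length");
        if (Files.exists(marker)) {
            long validLength = Long.parseLong(
                new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim());
            return Arrays.copyOf(data, (int) Math.min(validLength, data.length));
        }
        return data; // no marker: the whole file is valid
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("valid-length-demo");
        Path part = dir.resolve("part-0-0");
        Files.write(part, "valid data|garbage after failure".getBytes(StandardCharsets.UTF_8));
        // Pretend recovery recorded that only the first 10 bytes are valid.
        Files.write(dir.resolve("part-0-0.valid-length"), "10".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(readValid(part), StandardCharsets.UTF_8));
    }
}
```

On file systems that support truncate the sink resets the file itself instead, and no marker file is written, so a reader should fall back to the full file length when the marker is absent, as above.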
Note: If checkpointing is not enabled, the pending files will never be moved to the finished state.
In that case, the pending suffix/prefix can be set to "" to make the sink work
in a non-fault-tolerant way while still producing output without prefixes and suffixes.
The part files are written using an instance of Writer. By default
StringWriter is used, which writes the result
of toString() for every element, separated by newlines. You can configure the writer
using setWriter(Writer). For example,
SequenceFileWriter can be used to write
Hadoop SequenceFiles.
Example:
new RollingSink<Tuple2<IntWritable, Text>>(outPath)
.setWriter(new SequenceFileWriter<IntWritable, Text>())
.setBucketer(new DateTimeBucketer("yyyy-MM-dd--HHmm"));
This will create a sink that writes to SequenceFiles and rolls every minute.
| Modifier and Type | Class and Description |
|---|---|
| static class | RollingSink.BucketState Deprecated. This is used for keeping track of the current in-progress files and files that we mark for moving from pending to final location after we get a checkpoint-complete notification. |
| Constructor and Description |
|---|
| RollingSink(String basePath) Deprecated. Creates a new RollingSink that writes files to the given base directory. |
| Modifier and Type | Method and Description |
|---|---|
| void | close() Deprecated. |
| RollingSink<T> | disableCleanupOnOpen() Deprecated. This option is deprecated and remains only for backwards compatibility. We do not clean up lingering files anymore. |
| void | initializeState(org.apache.flink.runtime.state.FunctionInitializationContext context) Deprecated. |
| void | invoke(T value) Deprecated. |
| void | notifyCheckpointComplete(long checkpointId) Deprecated. |
| void | open(org.apache.flink.configuration.Configuration parameters) Deprecated. |
| RollingSink<T> | setAsyncTimeout(long timeout) Deprecated. Sets the default timeout for asynchronous operations such as recoverLease and truncate. |
| RollingSink<T> | setBatchSize(long batchSize) Deprecated. Sets the maximum bucket size in bytes. |
| RollingSink<T> | setBucketer(Bucketer bucketer) Deprecated. Sets the Bucketer to use for determining the bucket files to write to. |
| RollingSink<T> | setFSConfig(org.apache.flink.configuration.Configuration config) Deprecated. Specify a custom Configuration that will be used when creating the FileSystem for writing. |
| RollingSink<T> | setFSConfig(org.apache.hadoop.conf.Configuration config) Deprecated. Specify a custom Configuration that will be used when creating the FileSystem for writing. |
| RollingSink<T> | setInProgressPrefix(String inProgressPrefix) Deprecated. Sets the prefix of in-progress part files. |
| RollingSink<T> | setInProgressSuffix(String inProgressSuffix) Deprecated. Sets the suffix of in-progress part files. |
| void | setInputType(org.apache.flink.api.common.typeinfo.TypeInformation<?> type, org.apache.flink.api.common.ExecutionConfig executionConfig) Deprecated. |
| RollingSink<T> | setPartPrefix(String partPrefix) Deprecated. Sets the prefix of part files. |
| RollingSink<T> | setPendingPrefix(String pendingPrefix) Deprecated. Sets the prefix of pending part files. |
| RollingSink<T> | setPendingSuffix(String pendingSuffix) Deprecated. Sets the suffix of pending part files. |
| RollingSink<T> | setValidLengthPrefix(String validLengthPrefix) Deprecated. Sets the prefix of valid-length files. |
| RollingSink<T> | setValidLengthSuffix(String validLengthSuffix) Deprecated. Sets the suffix of valid-length files. |
| RollingSink<T> | setWriter(Writer<T> writer) Deprecated. Sets the Writer to be used for writing the incoming elements to bucket files. |
| void | snapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext context) Deprecated. |
Methods inherited from class org.apache.flink.api.common.functions.AbstractRichFunction:
getIterationRuntimeContext, getRuntimeContext, setRuntimeContext

public RollingSink(String basePath)
Deprecated. Creates a new RollingSink that writes files to the given base directory.
This uses a DateTimeBucketer as bucketer and a StringWriter as writer.
The maximum bucket size is set to 384 MB.
Parameters: basePath - The directory to which to write the bucket files.

public RollingSink<T> setFSConfig(org.apache.flink.configuration.Configuration config)
Deprecated. Specify a custom Configuration that will be used when creating
the FileSystem for writing.

public RollingSink<T> setFSConfig(org.apache.hadoop.conf.Configuration config)
Deprecated. Specify a custom Configuration that will be used when creating
the FileSystem for writing.

public void setInputType(org.apache.flink.api.common.typeinfo.TypeInformation<?> type, org.apache.flink.api.common.ExecutionConfig executionConfig)
Deprecated.
Specified by: setInputType in interface org.apache.flink.api.java.typeutils.InputTypeConfigurable

public void initializeState(org.apache.flink.runtime.state.FunctionInitializationContext context) throws Exception
Deprecated.
Specified by: initializeState in interface org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
Throws: Exception

public void open(org.apache.flink.configuration.Configuration parameters) throws Exception
Deprecated.
Specified by: open in interface org.apache.flink.api.common.functions.RichFunction
Overrides: open in class org.apache.flink.api.common.functions.AbstractRichFunction
Throws: Exception

public void close() throws Exception
Deprecated.
Specified by: close in interface org.apache.flink.api.common.functions.RichFunction
Overrides: close in class org.apache.flink.api.common.functions.AbstractRichFunction
Throws: Exception

public void notifyCheckpointComplete(long checkpointId) throws Exception
Deprecated.
Specified by: notifyCheckpointComplete in interface org.apache.flink.runtime.state.CheckpointListener
Throws: Exception

public void snapshotState(org.apache.flink.runtime.state.FunctionSnapshotContext context) throws Exception
Deprecated.
Specified by: snapshotState in interface org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
Throws: Exception

public RollingSink<T> setBatchSize(long batchSize)
Deprecated. Sets the maximum bucket size in bytes.
When a bucket part file becomes larger than this size a new bucket part file is started and
the old one is closed. The name of the bucket files depends on the Bucketer.
Parameters: batchSize - The bucket part file size in bytes.

public RollingSink<T> setBucketer(Bucketer bucketer)
Deprecated. Sets the Bucketer to use for determining the bucket files to write to.
Parameters: bucketer - The bucketer to use.

public RollingSink<T> setWriter(Writer<T> writer)
Deprecated. Sets the Writer to be used for writing the incoming elements to bucket files.
Parameters: writer - The Writer to use.

public RollingSink<T> setInProgressSuffix(String inProgressSuffix)
Deprecated. Sets the suffix of in-progress part files. The default is "in-progress".

public RollingSink<T> setInProgressPrefix(String inProgressPrefix)
Deprecated. Sets the prefix of in-progress part files. The default is "_".

public RollingSink<T> setPendingSuffix(String pendingSuffix)
Deprecated. Sets the suffix of pending part files. The default is ".pending".

public RollingSink<T> setPendingPrefix(String pendingPrefix)
Deprecated. Sets the prefix of pending part files. The default is "_".

public RollingSink<T> setValidLengthSuffix(String validLengthSuffix)
Deprecated. Sets the suffix of valid-length files. The default is ".valid-length".

public RollingSink<T> setValidLengthPrefix(String validLengthPrefix)
Deprecated. Sets the prefix of valid-length files. The default is "_".

public RollingSink<T> setPartPrefix(String partPrefix)
Deprecated. Sets the prefix of part files. The default is "part".

@Deprecated
public RollingSink<T> disableCleanupOnOpen()
Deprecated. This option is deprecated and remains only for backwards compatibility.
We do not clean up lingering files anymore.
This should only be disabled if using the sink without checkpoints, to not remove the files already in the directory.

public RollingSink<T> setAsyncTimeout(long timeout)
Deprecated. Sets the default timeout for asynchronous operations such as recoverLease and truncate.
Parameters: timeout - The timeout, in milliseconds.

Copyright © 2014–2019 The Apache Software Foundation. All rights reserved.