public class RocksDBStateBackend
extends org.apache.flink.runtime.state.AbstractStateBackend
A StateBackend that stores its state in RocksDB. This state backend can store very large state that exceeds memory and spills to disk.
All key/value state (including windows) is stored in the key/value index of RocksDB. For persistence against loss of machines, checkpoints take a snapshot of the RocksDB database, and persist that snapshot in a file system (by default) or another configurable state backend.
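Persisting a checkpoint, as described above, comes down to copying a local snapshot of the RocksDB files to durable storage. The following is a minimal sketch of that copy step using only java.nio.file. It assumes the snapshot is a plain local directory and the target is a fresh directory on a locally mounted file system. The class SnapshotCopier is hypothetical and is not part of Flink's API; a real checkpoint goes through Flink's file system abstraction to HDFS, S3, or similar.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not Flink API): copies a local snapshot directory to a checkpoint directory. */
final class SnapshotCopier {

    private SnapshotCopier() {}

    /** Recursively copies every file and directory under snapshotDir into checkpointDir. */
    static void copySnapshot(Path snapshotDir, Path checkpointDir) {
        List<Path> sources;
        // Collect the tree first so the directory stream is closed before copying starts.
        try (Stream<Path> walk = Files.walk(snapshotDir)) {
            sources = walk.collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (Path source : sources) {
            // Keep the snapshot's relative layout inside the checkpoint directory.
            Path target = checkpointDir.resolve(snapshotDir.relativize(source).toString());
            try {
                if (Files.isDirectory(source)) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Fails if the file already exists: the sketch assumes a fresh checkpoint directory.
                    Files.copy(source, target);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```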
The behavior of the RocksDB instances can be parametrized by setting RocksDB Options
using the methods setPredefinedOptions(PredefinedOptions) and
setOptions(OptionsFactory).
| Modifier and Type | Field and Description |
|---|---|
| protected org.rocksdb.RocksDB | db: Our RocksDB database. It is used by the actual subclasses of AbstractRocksDBState to store state. |
| Constructor and Description |
|---|
| RocksDBStateBackend(String checkpointDataUri): Creates a new RocksDBStateBackend that stores its checkpoint data in the file system and location defined by the given URI. |
| RocksDBStateBackend(String checkpointDataUri, org.apache.flink.runtime.state.AbstractStateBackend nonPartitionedStateBackend) |
| RocksDBStateBackend(URI checkpointDataUri): Creates a new RocksDBStateBackend that stores its checkpoint data in the file system and location defined by the given URI. |
| RocksDBStateBackend(URI checkpointDataUri, org.apache.flink.runtime.state.AbstractStateBackend nonPartitionedStateBackend) |
| Modifier and Type | Method and Description |
|---|---|
| <S extends Serializable> org.apache.flink.runtime.state.StateHandle<S> | checkpointStateSerializable(S state, long checkpointID, long timestamp) |
| void | close() |
| org.apache.flink.runtime.state.AbstractStateBackend.CheckpointStateOutputStream | createCheckpointStateOutputStream(long checkpointID, long timestamp) |
| protected <N,T,ACC> org.apache.flink.api.common.state.FoldingState<T,ACC> | createFoldingState(org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer, org.apache.flink.api.common.state.FoldingStateDescriptor<T,ACC> stateDesc) |
| protected <N,T> org.apache.flink.api.common.state.ListState<T> | createListState(org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer, org.apache.flink.api.common.state.ListStateDescriptor<T> stateDesc) |
| protected <N,T> org.apache.flink.api.common.state.ReducingState<T> | createReducingState(org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer, org.apache.flink.api.common.state.ReducingStateDescriptor<T> stateDesc) |
| protected <N,T> org.apache.flink.api.common.state.ValueState<T> | createValueState(org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer, org.apache.flink.api.common.state.ValueStateDescriptor<T> stateDesc) |
| Object | currentKey(): Used by k/v states to access the current key. |
| void | disableFullyAsyncSnapshots(): Disables fully asynchronous snapshotting of the partitioned state held in RocksDB. |
| void | dispose() |
| void | disposeAllStateForCurrentJob() |
| void | enableFullyAsyncSnapshots(): Enables fully asynchronous snapshotting of the partitioned state held in RocksDB. |
| protected org.rocksdb.ColumnFamilyHandle | getColumnFamily(org.apache.flink.api.common.state.StateDescriptor descriptor): Creates a column family handle for use with a k/v state. |
| org.rocksdb.ColumnFamilyOptions | getColumnOptions(): Gets the RocksDB ColumnFamilyOptions to be used for all RocksDB instances. |
| org.rocksdb.DBOptions | getDbOptions(): Gets the RocksDB DBOptions to be used for all RocksDB instances. |
| String[] | getDbStoragePaths() |
| OptionsFactory | getOptions(): Gets the options factory that lazily creates the RocksDB options. |
| PredefinedOptions | getPredefinedOptions(): Gets the currently set predefined options for RocksDB. |
| File[] | getStoragePaths(): Visible for tests. |
| void | initializeForJob(org.apache.flink.runtime.execution.Environment env, String operatorIdentifier, org.apache.flink.api.common.typeutils.TypeSerializer<?> keySerializer) |
| void | injectKeyValueStateSnapshots(HashMap<String,org.apache.flink.runtime.state.KvStateSnapshot> keyValueStateSnapshots) |
| org.apache.flink.api.common.typeutils.TypeSerializer | keySerializer(): Used by k/v states to access the key serializer. |
| void | setDbStoragePath(String path): Sets the path where the RocksDB local database files should be stored on the local file system. |
| void | setDbStoragePaths(String... paths): Sets the paths across which the local RocksDB database files are distributed on the local file system. |
| void | setOptions(OptionsFactory optionsFactory): Sets the Options for the RocksDB instances. |
| void | setPredefinedOptions(PredefinedOptions options): Sets the predefined options for RocksDB. |
| HashMap<String,org.apache.flink.runtime.state.KvStateSnapshot<?,?,?,?,?>> | snapshotPartitionedState(long checkpointId, long timestamp) |
protected transient volatile org.rocksdb.RocksDB db
Our RocksDB database. It is used by the actual subclasses of AbstractRocksDBState to store state. The different k/v states do not each have their own RocksDB instance; they all write to this instance, each into its own column family.
public RocksDBStateBackend(String checkpointDataUri) throws IOException
Creates a new RocksDBStateBackend that stores its checkpoint data in the file system and location defined by the given URI.
A state backend that stores checkpoints in HDFS or S3 must specify the file system host and port in the URI, or have the Hadoop configuration that describes the file system (host / high-availability group / possibly credentials) either referenced from the Flink config, or included in the classpath.
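The host and port requirement above can be checked before a job is submitted. The following is a minimal sketch of such a check, assuming a hypothetical helper CheckpointUriCheck (not part of Flink) that only inspects the URI's authority component via java.net.URI. Because it cannot see whether a Hadoop configuration is referenced or on the classpath, it reports a missing authority for hdfs or s3 URIs as "needs Hadoop configuration" rather than as an error.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical pre-flight check for checkpoint URIs (not part of Flink). */
final class CheckpointUriCheck {

    enum Result { SELF_CONTAINED, NEEDS_HADOOP_CONFIG }

    /** Schemes whose host and port must come from the URI or from a Hadoop configuration. */
    private static final Set<String> HADOOP_SCHEMES = new HashSet<>(Arrays.asList("hdfs", "s3"));

    private CheckpointUriCheck() {}

    static Result check(String checkpointDataUri) {
        URI uri = URI.create(checkpointDataUri);
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("Checkpoint URI has no scheme: " + checkpointDataUri);
        }
        // "hdfs:///path" has no authority, so the file system must be resolved from a Hadoop configuration.
        if (HADOOP_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT)) && uri.getAuthority() == null) {
            return Result.NEEDS_HADOOP_CONFIG;
        }
        return Result.SELF_CONTAINED;
    }
}
```

For example, hdfs://namenode:40010/flink/checkpoints names its host and port directly, while hdfs:///flink/checkpoints depends on a Hadoop configuration to locate the file system.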
Parameters:
checkpointDataUri - The URI describing the file system and path to the checkpoint data directory.
Throws:
IOException - Thrown if no file system can be found for the scheme in the URI.
public RocksDBStateBackend(URI checkpointDataUri) throws IOException
Creates a new RocksDBStateBackend that stores its checkpoint data in the file system and location defined by the given URI.
A state backend that stores checkpoints in HDFS or S3 must specify the file system host and port in the URI, or have the Hadoop configuration that describes the file system (host / high-availability group / possibly credentials) either referenced from the Flink config, or included in the classpath.
Parameters:
checkpointDataUri - The URI describing the file system and path to the checkpoint data directory.
Throws:
IOException - Thrown if no file system can be found for the scheme in the URI.
public RocksDBStateBackend(String checkpointDataUri, org.apache.flink.runtime.state.AbstractStateBackend nonPartitionedStateBackend) throws IOException
Throws:
IOException
public RocksDBStateBackend(URI checkpointDataUri, org.apache.flink.runtime.state.AbstractStateBackend nonPartitionedStateBackend) throws IOException
Throws:
IOException
public void initializeForJob(org.apache.flink.runtime.execution.Environment env,
String operatorIdentifier,
org.apache.flink.api.common.typeutils.TypeSerializer<?> keySerializer)
throws Exception
initializeForJob in class org.apache.flink.runtime.state.AbstractStateBackend
Throws:
Exception
public void disposeAllStateForCurrentJob()
throws Exception
disposeAllStateForCurrentJob in class org.apache.flink.runtime.state.AbstractStateBackend
Throws:
Exception
public void dispose()
dispose in class org.apache.flink.runtime.state.AbstractStateBackend
public void close()
throws Exception
close in class org.apache.flink.runtime.state.AbstractStateBackend
Throws:
Exception
public File[] getStoragePaths()
Visible for tests.
public HashMap<String,org.apache.flink.runtime.state.KvStateSnapshot<?,?,?,?,?>> snapshotPartitionedState(long checkpointId, long timestamp) throws Exception
snapshotPartitionedState in class org.apache.flink.runtime.state.AbstractStateBackend
Throws:
Exception
public final void injectKeyValueStateSnapshots(HashMap<String,org.apache.flink.runtime.state.KvStateSnapshot> keyValueStateSnapshots) throws Exception
injectKeyValueStateSnapshots in class org.apache.flink.runtime.state.AbstractStateBackend
Throws:
Exception
protected org.rocksdb.ColumnFamilyHandle getColumnFamily(org.apache.flink.api.common.state.StateDescriptor descriptor)
Creates a column family handle for use with a k/v state.
This also checks whether the StateDescriptor for a state matches the one
that we checkpointed, i.e. is already in the map of column families.
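The shared-instance design (all k/v states write to one RocksDB instance, each into its own column family), together with the descriptor check above, can be modeled in memory. The following is a minimal sketch, assuming a hypothetical SharedStoreSketch in which each column family is a TreeMap looked up by state name, and a Descriptor stand-in that compares a state name and a value-type label. The real backend works with RocksDB ColumnFamilyHandle objects and Flink StateDescriptor instances; throwing IllegalStateException on a mismatch is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical in-memory model (not Flink's code): one shared store, one column family per state. */
final class SharedStoreSketch {

    /** Stand-in for a state descriptor: a state name plus a label for the value type. */
    static final class Descriptor {
        final String name;
        final String valueType;

        Descriptor(String name, String valueType) {
            this.name = name;
            this.valueType = valueType;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Descriptor)) return false;
            Descriptor d = (Descriptor) o;
            return name.equals(d.name) && valueType.equals(d.valueType);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, valueType);
        }
    }

    /** A column family remembers the descriptor it was created (or restored) with. */
    private static final class ColumnFamily {
        final Descriptor descriptor;
        final TreeMap<String, String> data = new TreeMap<>();

        ColumnFamily(Descriptor descriptor) {
            this.descriptor = descriptor;
        }
    }

    private final Map<String, ColumnFamily> columnFamilies = new HashMap<>();

    /** Returns the column family for a state, creating it on first use. */
    TreeMap<String, String> getColumnFamily(Descriptor descriptor) {
        ColumnFamily existing = columnFamilies.get(descriptor.name);
        if (existing != null) {
            // A known (for example, restored) state must be accessed with a matching descriptor.
            if (!existing.descriptor.equals(descriptor)) {
                throw new IllegalStateException("State '" + descriptor.name + "' was registered with a different descriptor");
            }
            return existing.data;
        }
        ColumnFamily created = new ColumnFamily(descriptor);
        columnFamilies.put(descriptor.name, created);
        return created.data;
    }

    int columnFamilyCount() {
        return columnFamilies.size();
    }
}
```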
public Object currentKey()
Used by k/v states to access the current key.
public org.apache.flink.api.common.typeutils.TypeSerializer keySerializer()
Used by k/v states to access the key serializer.
protected <N,T> org.apache.flink.api.common.state.ValueState<T> createValueState(org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer,
org.apache.flink.api.common.state.ValueStateDescriptor<T> stateDesc)
throws Exception
createValueState in class org.apache.flink.runtime.state.AbstractStateBackend
Throws:
Exception
protected <N,T> org.apache.flink.api.common.state.ListState<T> createListState(org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer,
org.apache.flink.api.common.state.ListStateDescriptor<T> stateDesc)
throws Exception
createListState in class org.apache.flink.runtime.state.AbstractStateBackend
Throws:
Exception
protected <N,T> org.apache.flink.api.common.state.ReducingState<T> createReducingState(org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer,
org.apache.flink.api.common.state.ReducingStateDescriptor<T> stateDesc)
throws Exception
createReducingState in class org.apache.flink.runtime.state.AbstractStateBackend
Throws:
Exception
protected <N,T,ACC> org.apache.flink.api.common.state.FoldingState<T,ACC> createFoldingState(org.apache.flink.api.common.typeutils.TypeSerializer<N> namespaceSerializer,
org.apache.flink.api.common.state.FoldingStateDescriptor<T,ACC> stateDesc)
throws Exception
createFoldingState in class org.apache.flink.runtime.state.AbstractStateBackend
Throws:
Exception
public org.apache.flink.runtime.state.AbstractStateBackend.CheckpointStateOutputStream createCheckpointStateOutputStream(long checkpointID,
long timestamp)
throws Exception
createCheckpointStateOutputStream in class org.apache.flink.runtime.state.AbstractStateBackend
Throws:
Exception
public <S extends Serializable> org.apache.flink.runtime.state.StateHandle<S> checkpointStateSerializable(S state, long checkpointID, long timestamp) throws Exception
checkpointStateSerializable in class org.apache.flink.runtime.state.AbstractStateBackend
Throws:
Exception
public void enableFullyAsyncSnapshots()
Enables fully asynchronous snapshotting of the partitioned state held in RocksDB.
By default, this is disabled. This means that RocksDB state is copied in a synchronous step, during which normal processing of elements pauses, followed by an asynchronous step that copies the RocksDB backup to the final checkpoint location. Fully asynchronous snapshots take longer (time linear in the number of unique keys), but normal processing of elements is not paused.
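The default (not fully asynchronous) mode described above can be sketched as a short synchronous copy followed by an asynchronous write. In this hypothetical model, SemiAsyncSnapshotSketch keeps state in a TreeMap guarded by a lock that record processing also takes, so processing is blocked only while the copy is made; persist() is a stand-in for copying the backup to the checkpoint location. The fully asynchronous mode is not modeled here.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/**
 * Hypothetical model (not Flink's code) of the default snapshot mode: a synchronous copy
 * while record processing is blocked, followed by an asynchronous write of that copy.
 */
final class SemiAsyncSnapshotSketch {

    private final Object processingLock = new Object();
    private final TreeMap<String, String> state = new TreeMap<>();

    /** Record processing; it waits whenever the synchronous snapshot step holds the lock. */
    void put(String key, String value) {
        synchronized (processingLock) {
            state.put(key, value);
        }
    }

    /** Returns a future that completes with the persisted copy of the state. */
    CompletableFuture<Map<String, String>> snapshot(Executor uploadExecutor) {
        final TreeMap<String, String> copy;
        // Synchronous step: processing pauses only while the copy is taken.
        synchronized (processingLock) {
            copy = new TreeMap<>(state);
        }
        // Asynchronous step: write the copy while processing continues.
        return CompletableFuture.supplyAsync(() -> persist(copy), uploadExecutor);
    }

    /** Stand-in for writing the copy to the final checkpoint location. */
    private static Map<String, String> persist(TreeMap<String, String> copy) {
        return Collections.unmodifiableMap(copy);
    }
}
```

Updates made after snapshot() returns do not appear in the persisted copy, which is what makes the checkpoint consistent despite the asynchronous write.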
public void disableFullyAsyncSnapshots()
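Disables fully asynchronous snapshotting of the partitioned state held in RocksDB.
Fully asynchronous snapshotting is disabled by default.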
public void setDbStoragePath(String path)
Sets the path where the RocksDB local database files should be stored on the local file system.
Passing null to this function restores the default behavior, where the configured temp directories are used.
Parameters:
path - The path where the local RocksDB database files are stored.
public void setDbStoragePaths(String... paths)
Sets the paths across which the local RocksDB database files are distributed on the local file system.
Each distinct state is stored in one path, but when the state backend creates multiple states, they store their files on different paths.
Passing null to this function restores the default behavior, where the configured temp directories are used.
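The description above states that each state lives in one path and that multiple states are spread across the configured paths, but it does not say how a path is chosen. The following is a minimal sketch of one possible policy, assuming round-robin assignment; StoragePathSelector is hypothetical and not Flink's code, and its fallback to the temp directories mirrors the documented effect of passing null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model (not Flink's code) of spreading per-state directories over local paths. */
final class StoragePathSelector {

    private final List<String> paths;
    private int next = 0;

    /**
     * @param configuredPaths paths set by the user, or null to use the default temp directories
     * @param tempDirs        the configured temp directories
     */
    StoragePathSelector(String[] configuredPaths, String[] tempDirs) {
        // null restores the default behavior: fall back to the temp directories.
        String[] chosen = configuredPaths != null ? configuredPaths : tempDirs;
        if (chosen.length == 0) {
            throw new IllegalArgumentException("At least one storage path is required");
        }
        this.paths = new ArrayList<>(Arrays.asList(chosen));
    }

    /** Each call represents one new state, which then keeps the returned path; assignment is round-robin. */
    String pathForNewState() {
        String path = paths.get(next);
        next = (next + 1) % paths.size();
        return path;
    }
}
```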
Parameters:
paths - The paths across which the local RocksDB database files will be spread.
public String[] getDbStoragePaths()
public void setPredefinedOptions(PredefinedOptions options)
Sets the predefined options for RocksDB.
If a user-defined options factory is set (via setOptions(OptionsFactory)), then the options from the factory are applied on top of the predefined options specified here.
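The layering rule above (predefined profile first, then the factory's options on top) can be modeled with plain maps. In this hypothetical sketch, ProfileStub stands in for PredefinedOptions, a UnaryOperator stands in for the user's OptionsFactory, and the option names and values are illustrative only, not real RocksDB tuning advice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical model (not Flink's API) of option layering: profile first, then the factory. */
final class OptionLayering {

    /** Stand-in for a predefined profile; DEFAULT contributes no settings of its own. */
    enum ProfileStub {
        DEFAULT,
        TUNED;

        Map<String, String> baseOptions() {
            Map<String, String> opts = new HashMap<>();
            if (this == TUNED) {
                // Illustrative values only.
                opts.put("max_open_files", "-1");
                opts.put("compaction_style", "level");
            }
            return opts;
        }
    }

    private OptionLayering() {}

    /** Applies the profile, then the optional factory on top of it. */
    static Map<String, String> resolve(ProfileStub profile, UnaryOperator<Map<String, String>> factory) {
        Map<String, String> opts = profile.baseOptions();
        if (factory != null) {
            // Factory settings win where they overlap with the profile.
            opts = factory.apply(opts);
        }
        return opts;
    }
}
```

With ProfileStub.DEFAULT contributing nothing, the factory's output is the entire configuration, which corresponds to the case where the factory fully controls the options.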
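Parameters:
options - The options to set (must not be null).
public PredefinedOptions getPredefinedOptions()
Gets the currently set predefined options for RocksDB. The default options (if nothing was set via setPredefinedOptions(PredefinedOptions)) are PredefinedOptions.DEFAULT.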
If a user-defined options factory is set (via setOptions(OptionsFactory)),
then the options from the factory are applied on top of the predefined options.
public void setOptions(OptionsFactory optionsFactory)
Sets the Options for the RocksDB instances.
Because the options are not serializable and hold native code references,
they must be specified through a factory.
The options created by the factory here are applied on top of the pre-defined
options profile selected via setPredefinedOptions(PredefinedOptions).
If the pre-defined options profile is the default
(PredefinedOptions.DEFAULT), then the factory fully controls the RocksDB
options.
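The reason for the factory indirection, that RocksDB options are not serializable, can be illustrated with plain Java serialization. In this hypothetical sketch, NativeOptionsStub stands in for a RocksDB options object (deliberately not Serializable), OptionsFactoryStub stands in for Flink's OptionsFactory, and MaxOpenFilesFactory is an example user factory that carries only a plain int. None of these names are Flink's API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a RocksDB options object: mutable and not Serializable. */
final class NativeOptionsStub {
    int maxOpenFiles = -1;

    NativeOptionsStub setMaxOpenFiles(int n) {
        this.maxOpenFiles = n;
        return this;
    }
}

/** Hypothetical factory type: only this serializable object is shipped with the job. */
interface OptionsFactoryStub extends Serializable {
    NativeOptionsStub create(NativeOptionsStub current);
}

/** Example user factory holding only plain configuration values. */
final class MaxOpenFilesFactory implements OptionsFactoryStub {
    private static final long serialVersionUID = 1L;
    private final int maxOpenFiles;

    MaxOpenFilesFactory(int maxOpenFiles) {
        this.maxOpenFiles = maxOpenFiles;
    }

    @Override
    public NativeOptionsStub create(NativeOptionsStub current) {
        // Options are created lazily, on the machine where RocksDB actually runs.
        return current.setMaxOpenFiles(maxOpenFiles);
    }
}

final class SerializationCheck {
    private SerializationCheck() {}

    /** Returns true if Java serialization of the object succeeds. */
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```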
Parameters:
optionsFactory - The options factory that lazily creates the RocksDB options.
public OptionsFactory getOptions()
Gets the options factory that lazily creates the RocksDB options.
public org.rocksdb.DBOptions getDbOptions()
Gets the RocksDB DBOptions to be used for all RocksDB instances.
public org.rocksdb.ColumnFamilyOptions getColumnOptions()
Gets the RocksDB ColumnFamilyOptions to be used for all RocksDB instances.
Copyright © 2014–2016 The Apache Software Foundation. All rights reserved.