public abstract class StreamExecutionEnvironment extends Object
ExecutionEnvironment for streaming jobs. An instance of it is
necessary to construct streaming topologies.

| Modifier and Type | Field and Description |
|---|---|
| protected static StreamExecutionEnvironment | currentEnvironment |
| static String | DEFAULT_JOB_NAME |
| protected StreamGraph | streamGraph |

| Modifier | Constructor and Description |
|---|---|
| protected | StreamExecutionEnvironment() - Constructor for creating StreamExecutionEnvironment |

| Modifier and Type | Method and Description |
|---|---|
| void | addDefaultKryoSerializer(Class<?> type, Class<? extends com.esotericsoftware.kryo.Serializer<?>> serializerClass) - Adds a new Kryo default serializer to the Runtime. |
| void | addDefaultKryoSerializer(Class<?> type, com.esotericsoftware.kryo.Serializer<?> serializer) - Adds a new Kryo default serializer to the Runtime. |
| <OUT> DataStreamSource<OUT> | addSource(SourceFunction<OUT> function) - Adds a data source with custom type information, thus opening a DataStream. |
| <OUT> DataStreamSource<OUT> | addSource(SourceFunction<OUT> function, String sourceName) - Adds a data source with custom type information, thus opening a DataStream. |
| <F> F | clean(F f) - Returns a "closure-cleaned" version of the given function. |
| <OUT> DataStreamSource<OUT> | createInput(org.apache.flink.api.common.io.InputFormat<OUT,?> inputFormat) - Generic method to create an input data stream with InputFormat. |
| <OUT> DataStreamSource<OUT> | createInput(org.apache.flink.api.common.io.InputFormat<OUT,?> inputFormat, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo) - Generic method to create an input data stream with InputFormat. |
| static LocalStreamEnvironment | createLocalEnvironment() - Creates a LocalStreamEnvironment. |
| static LocalStreamEnvironment | createLocalEnvironment(int parallelism) - Creates a LocalStreamEnvironment. |
| static StreamExecutionEnvironment | createRemoteEnvironment(String host, int port, int parallelism, String... jarFiles) - Creates a RemoteStreamEnvironment. |
| static StreamExecutionEnvironment | createRemoteEnvironment(String host, int port, String... jarFiles) - Creates a RemoteStreamEnvironment. |
| StreamExecutionEnvironment | disableOperatorChaining() - Disables operator chaining for streaming operators. |
| StreamExecutionEnvironment | enableCheckpointing() - Method for enabling fault-tolerance. |
| StreamExecutionEnvironment | enableCheckpointing(long interval) - Method for enabling fault-tolerance. |
| StreamExecutionEnvironment | enableCheckpointing(long interval, boolean force) - Deprecated. |
| abstract org.apache.flink.api.common.JobExecutionResult | execute() - Triggers the program execution. |
| abstract org.apache.flink.api.common.JobExecutionResult | execute(String jobName) - Triggers the program execution. |
| <OUT> DataStreamSource<OUT> | fromCollection(Collection<OUT> data) - Creates a data stream from the given non-empty collection. |
| <OUT> DataStreamSource<OUT> | fromCollection(Collection<OUT> data, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo) - Creates a data stream from the given non-empty collection. Note that this operation will result in a non-parallel data stream source, i.e. a data stream source with a degree of parallelism of one. |
| <OUT> DataStreamSource<OUT> | fromCollection(Iterator<OUT> data, Class<OUT> type) - Creates a data stream from the given iterator. |
| <OUT> DataStreamSource<OUT> | fromCollection(Iterator<OUT> data, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo) - Creates a data stream from the given iterator. |
| <OUT> DataStreamSource<OUT> | fromElements(OUT... data) - Creates a new data stream that contains the given elements. |
| <OUT> DataStreamSource<OUT> | fromParallelCollection(org.apache.flink.util.SplittableIterator<OUT> iterator, Class<OUT> type) - Creates a new data stream that contains elements in the iterator. |
| <OUT> DataStreamSource<OUT> | fromParallelCollection(org.apache.flink.util.SplittableIterator<OUT> iterator, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo) - Creates a new data stream that contains elements in the iterator. |
| DataStreamSource<Long> | generateParallelSequence(long from, long to) - Creates a new data stream that contains a sequence of numbers. |
| DataStreamSource<Long> | generateSequence(long from, long to) - Creates a new data stream that contains a sequence of numbers. |
| long | getBufferTimeout() - Gets the maximum time frequency (milliseconds) for the flushing of the output buffers. |
| org.apache.flink.api.common.ExecutionConfig | getConfig() - Gets the config object. |
| int | getDegreeOfParallelism() - Deprecated. Please use getParallelism() |
| static StreamExecutionEnvironment | getExecutionEnvironment() - Creates an execution environment that represents the context in which the program is currently executed. |
| String | getExecutionPlan() - Creates the plan with which the system will execute the program, and returns it as a String using a JSON representation of the execution data flow graph. |
| int | getNumberOfExecutionRetries() - Gets the number of times the system will try to re-execute failed tasks. |
| int | getParallelism() - Gets the parallelism with which operations are executed by default. |
| StreamGraph | getStreamGraph() - Getter of the StreamGraph of the streaming job. |
| protected static void | initializeFromFactory(StreamExecutionEnvironmentFactory eef) |
| <OUT> DataStreamSource<OUT> | readFile(org.apache.flink.api.common.io.FileInputFormat<OUT> inputFormat, String filePath) - Reads the given file with the given input format. |
| <OUT> DataStreamSource<OUT> | readFileOfPrimitives(String filePath, Class<OUT> typeClass) - Creates a data stream that represents the primitive type produced by reading the given file line by line. |
| <OUT> DataStreamSource<OUT> | readFileOfPrimitives(String filePath, String delimiter, Class<OUT> typeClass) - Creates a data stream that represents the primitive type produced by reading the given file in a delimited way. |
| DataStream<String> | readFileStream(String filePath, long intervalMillis, FileMonitoringFunction.WatchType watchType) - Creates a data stream that contains the contents of files created while the system watches the given path. |
| DataStreamSource<String> | readTextFile(String filePath) - Creates a data stream that represents the Strings produced by reading the given file line by line. |
| DataStreamSource<String> | readTextFile(String filePath, String charsetName) - Creates a data stream that represents the Strings produced by reading the given file line by line. |
| DataStreamSource<org.apache.flink.types.StringValue> | readTextFileWithValue(String filePath) - Creates a data stream that represents the strings produced by reading the given file line by line. |
| DataStreamSource<org.apache.flink.types.StringValue> | readTextFileWithValue(String filePath, String charsetName, boolean skipInvalidLines) - Creates a data stream that represents the strings produced by reading the given file line by line. |
| void | registerType(Class<?> type) - Registers the given type with the serialization stack. |
| void | registerTypeWithKryoSerializer(Class<?> type, Class<? extends com.esotericsoftware.kryo.Serializer<?>> serializerClass) - Registers the given Serializer via its class as a serializer for the given type at the KryoSerializer. |
| void | registerTypeWithKryoSerializer(Class<?> type, com.esotericsoftware.kryo.Serializer<?> serializer) - Registers the given type with a Kryo Serializer. |
| StreamExecutionEnvironment | setBufferTimeout(long timeoutMillis) - Sets the maximum time frequency (milliseconds) for the flushing of the output buffers. |
| static void | setDefaultLocalParallelism(int parallelism) - Sets the default parallelism that will be used for the local execution environment created by createLocalEnvironment(). |
| StreamExecutionEnvironment | setDegreeOfParallelism(int parallelism) - Deprecated. Please use setParallelism(int) |
| void | setNumberOfExecutionRetries(int numberOfExecutionRetries) - Sets the number of times that failed tasks are re-executed. |
| StreamExecutionEnvironment | setParallelism(int parallelism) - Sets the parallelism for operations executed through this environment. |
| StreamExecutionEnvironment | setStateHandleProvider(org.apache.flink.runtime.state.StateHandleProvider<?> provider) - Sets the StateHandleProvider used for storing operator state checkpoints when checkpointing is enabled. |
| DataStreamSource<String> | socketTextStream(String hostname, int port) - Creates a new data stream that contains the strings received infinitely from a socket. |
| DataStreamSource<String> | socketTextStream(String hostname, int port, char delimiter) - Creates a new data stream that contains the strings received infinitely from a socket. |
| DataStreamSource<String> | socketTextStream(String hostname, int port, char delimiter, long maxRetry) - Creates a new data stream that contains the strings received infinitely from a socket. |

public static final String DEFAULT_JOB_NAME
protected static StreamExecutionEnvironment currentEnvironment
protected StreamGraph streamGraph
protected StreamExecutionEnvironment()
public org.apache.flink.api.common.ExecutionConfig getConfig()
Gets the config object.

@Deprecated public StreamExecutionEnvironment setDegreeOfParallelism(int parallelism)
Deprecated. Please use setParallelism(int)
The LocalStreamEnvironment uses by default a value equal to the
number of hardware contexts (CPU cores / threads). When executing the
program via the command line client from a JAR file, the default degree
of parallelism is the one configured for that setup.
Parameters:
parallelism - The parallelism

@Deprecated public int getDegreeOfParallelism()
Deprecated. Please use getParallelism()

public StreamExecutionEnvironment setParallelism(int parallelism)
Sets the parallelism for operations executed through this environment.
The LocalStreamEnvironment uses by default a value equal to the
number of hardware contexts (CPU cores / threads). When executing the
program via the command line client from a JAR file, the default degree
of parallelism is the one configured for that setup.
Parameters:
parallelism - The parallelism

public int getParallelism()
Gets the parallelism with which operations are executed by default.
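
The setParallelism description above says that a local environment defaults to the number of hardware contexts. As a rough illustration, that number can be inspected from plain Java. The sketch assumes that "hardware contexts" corresponds to what Runtime.availableProcessors() reports; the class HardwareContexts is made up for this example and is not part of Flink.

```java
// Sketch only: HardwareContexts is a hypothetical helper, not a Flink class.
final class HardwareContexts {

    /** Returns the number of processors the JVM reports as available (at least 1). */
    static int availableHardwareContexts() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        // The printed value depends on the machine the program runs on.
        System.out.println("hardware contexts: " + availableHardwareContexts());
    }
}
```

Calling setParallelism(int) with an explicit value replaces this default for the environment.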
public StreamExecutionEnvironment setBufferTimeout(long timeoutMillis)
Sets the maximum time frequency (milliseconds) for the flushing of the
output buffers.
Parameters:
timeoutMillis - The maximum time between two output flushes.

public long getBufferTimeout()
Gets the maximum time frequency (milliseconds) for the flushing of the
output buffers. See setBufferTimeout(long).

public StreamExecutionEnvironment disableOperatorChaining()
Disables operator chaining for streaming operators.

public StreamExecutionEnvironment enableCheckpointing(long interval)
Method for enabling fault-tolerance. Unless stated otherwise by calling the
setNumberOfExecutionRetries(int numberOfExecutionRetries) method,
in case of failure the job will be resubmitted to the cluster
indefinitely.
Parameters:
interval - Time interval between state checkpoints in millis

@Deprecated public StreamExecutionEnvironment enableCheckpointing(long interval, boolean force)
Deprecated.
Method for enabling fault-tolerance. Unless stated otherwise by calling the
setNumberOfExecutionRetries(int numberOfExecutionRetries) method,
in case of failure the job will be resubmitted to the cluster
indefinitely.
Parameters:
interval - Time interval between state checkpoints in millis
force - If true, checkpointing will be enabled for iterative jobs as well

public StreamExecutionEnvironment enableCheckpointing()
Method for enabling fault-tolerance. Unless stated otherwise by calling the
setNumberOfExecutionRetries(int numberOfExecutionRetries) method,
in case of failure the job will be resubmitted to the cluster
indefinitely.

public StreamExecutionEnvironment setStateHandleProvider(org.apache.flink.runtime.state.StateHandleProvider<?> provider)
Sets the StateHandleProvider used for storing operator state
checkpoints when checkpointing is enabled.
An example would be using a FileStateHandle.createProvider(String)
to use any Flink supported file system as a state backend.

public void setNumberOfExecutionRetries(int numberOfExecutionRetries)
Sets the number of times that failed tasks are re-executed. A value of -1
indicates that the system default value (as defined in the configuration)
should be used.
Parameters:
numberOfExecutionRetries - The number of times the system will try to re-execute failed tasks.

public int getNumberOfExecutionRetries()
Gets the number of times the system will try to re-execute failed tasks.
A value of -1 indicates that the system default value (as defined
in the configuration) should be used.

public static void setDefaultLocalParallelism(int parallelism)
Sets the default parallelism that will be used for the local execution
environment created by createLocalEnvironment().
Parameters:
parallelism - The parallelism to use as the default local parallelism.

public void addDefaultKryoSerializer(Class<?> type, com.esotericsoftware.kryo.Serializer<?> serializer)
Adds a new Kryo default serializer to the Runtime.
Parameters:
type - The class of the types serialized with the given serializer.
serializer - The serializer to use.

public void addDefaultKryoSerializer(Class<?> type, Class<? extends com.esotericsoftware.kryo.Serializer<?>> serializerClass)
Adds a new Kryo default serializer to the Runtime.
Parameters:
type - The class of the types serialized with the given serializer.
serializerClass - The class of the serializer to use.

public void registerTypeWithKryoSerializer(Class<?> type, com.esotericsoftware.kryo.Serializer<?> serializer)
Registers the given type with a Kryo Serializer.
Parameters:
type - The class of the types serialized with the given serializer.
serializer - The serializer to use.

public void registerTypeWithKryoSerializer(Class<?> type, Class<? extends com.esotericsoftware.kryo.Serializer<?>> serializerClass)
Registers the given Serializer via its class as a serializer for the
given type at the KryoSerializer.
Parameters:
type - The class of the types serialized with the given serializer.
serializerClass - The class of the serializer to use.

public void registerType(Class<?> type)
Registers the given type with the serialization stack.
Parameters:
type - The class of the type to register.

public DataStreamSource<Long> generateSequence(long from, long to)
Creates a new data stream that contains a sequence of numbers.
Parameters:
from - The number to start at (inclusive)
to - The number to stop at (inclusive)

public DataStreamSource<Long> generateParallelSequence(long from, long to)
Creates a new data stream that contains a sequence of numbers.
Parameters:
from - The number to start at (inclusive)
to - The number to stop at (inclusive)

public <OUT> DataStreamSource<OUT> fromElements(OUT... data)
Creates a new data stream that contains the given elements. The elements must all be of the same type, for example all
String or Integer.
The framework will try to determine the exact type from the elements. In case of generic elements, it may be
necessary to manually supply the type information via fromCollection(java.util.Collection,
org.apache.flink.api.common.typeinfo.TypeInformation).
Note that this operation will result in a non-parallel data stream source, i.e. a data stream source with a degree of parallelism of one.
Type Parameters:
OUT - The type of the returned data stream
Parameters:
data - The array of elements to create the data stream from.

public <OUT> DataStreamSource<OUT> fromCollection(Collection<OUT> data)
Creates a data stream from the given non-empty collection.
The framework will try to determine the exact type from the collection elements. In case of generic
elements, it may be necessary to manually supply the type information via
fromCollection(java.util.Collection, org.apache.flink.api.common.typeinfo.TypeInformation).
Note that this operation will result in a non-parallel data stream source, i.e. a data stream source with a degree of parallelism of one.
Type Parameters:
OUT - The type of the returned data stream
Parameters:
data - The collection of elements to create the data stream from

public <OUT> DataStreamSource<OUT> fromCollection(Collection<OUT> data, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo)
Creates a data stream from the given non-empty collection.
Note that this operation will result in a non-parallel data stream source, i.e. a data stream source with a degree of parallelism of one.
Type Parameters:
OUT - The type of the returned data stream
Parameters:
data - The collection of elements to create the data stream from
typeInfo - The TypeInformation for the produced data stream

public <OUT> DataStreamSource<OUT> fromCollection(Iterator<OUT> data, Class<OUT> type)
Creates a data stream from the given iterator.
Note that this operation will result in a non-parallel data stream source, i.e. a data stream source with a degree of parallelism of one.
Type Parameters:
OUT - The type of the returned data stream
Parameters:
data - The iterator of elements to create the data stream from
type - The class of the data produced by the iterator. Must not be a generic class.
See Also:
fromCollection(java.util.Iterator, org.apache.flink.api.common.typeinfo.TypeInformation)

public <OUT> DataStreamSource<OUT> fromCollection(Iterator<OUT> data, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo)
Creates a data stream from the given iterator. This method is useful for cases where the type is generic.
In that case, the type class (as given in fromCollection(java.util.Iterator, Class)) does not supply all type information.
Note that this operation will result in a non-parallel data stream source, i.e. a data stream source with a degree of parallelism of one.
Type Parameters:
OUT - The type of the returned data stream
Parameters:
data - The iterator of elements to create the data stream from
typeInfo - The TypeInformation for the produced data stream

public <OUT> DataStreamSource<OUT> fromParallelCollection(org.apache.flink.util.SplittableIterator<OUT> iterator, Class<OUT> type)
Creates a new data stream that contains elements in the iterator.
Because the iterator will remain unmodified until the actual execution happens, the type of data returned by the iterator must be given explicitly in the form of the type class (this is due to the fact that the Java compiler erases the generic type information).
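
The erasure mentioned above can be observed with the standard library alone. The following is a minimal sketch, not Flink code: ErasureDemo and its methods are names invented for this illustration. It shows that two differently parameterized lists share one runtime class, and that passing a Class token alongside an iterator keeps the element type available at runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sketch only: ErasureDemo is a hypothetical class, not part of Flink.
final class ErasureDemo {

    /** True if both lists have the same runtime class, i.e. their element types were erased. */
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    /** The explicit Class token carries the element type that the iterator itself cannot reveal. */
    static <T> String describe(Iterator<T> data, Class<T> type) {
        return "Iterator of " + type.getSimpleName();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Integer> ints = new ArrayList<Integer>();
        System.out.println(sameRuntimeClass(strings, ints)); // prints "true"
        System.out.println(describe(Arrays.asList(1L, 2L).iterator(), Long.class)); // prints "Iterator of Long"
    }
}
```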
Type Parameters:
OUT - The type of the returned data stream
Parameters:
iterator - The iterator that produces the elements of the data stream
type - The class of the data produced by the iterator. Must not be a generic class.

public <OUT> DataStreamSource<OUT> fromParallelCollection(org.apache.flink.util.SplittableIterator<OUT> iterator, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo)
Because the iterator will remain unmodified until the actual execution happens, the type of data returned by the
iterator must be given explicitly in the form of the type information. This method is useful for cases where the
type is generic. In that case, the type class (as given in fromParallelCollection(org.apache.flink.util.SplittableIterator,
Class)) does not supply all type information.
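
To see why a plain Class cannot describe a generic type while a richer type description can, consider the "type token" idiom below. This is a sketch under stated assumptions: TypeToken and TypeTokenDemo are hypothetical helpers written for this illustration, and they are not Flink's TypeInformation. The idiom relies on the fact that an anonymous subclass records its generic superclass, which reflection can read back.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Sketch only: TypeToken is a hypothetical helper, not Flink's TypeInformation.
abstract class TypeToken<T> {
    private final Type type;

    /** Reads the actual type argument T from the generic superclass of the anonymous subclass. */
    protected TypeToken() {
        ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
        this.type = superType.getActualTypeArguments()[0];
    }

    Type getType() {
        return type;
    }
}

final class TypeTokenDemo {

    /** Captures the full generic type List<String>, which List.class alone cannot express. */
    static Type listOfStrings() {
        return new TypeToken<List<String>>() { }.getType();
    }

    public static void main(String[] args) {
        ParameterizedType captured = (ParameterizedType) listOfStrings();
        System.out.println("raw type: " + captured.getRawType());
        System.out.println("element type: " + captured.getActualTypeArguments()[0]);
    }
}
```

The captured type keeps both the raw type and its element type, which is the kind of information a Class argument loses for generic classes.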
Type Parameters:
OUT - The type of the returned data stream
Parameters:
iterator - The iterator that produces the elements of the data stream
typeInfo - The TypeInformation for the produced data stream.

public DataStreamSource<String> readTextFile(String filePath)
Creates a data stream that represents the Strings produced by reading the given file line by line.
Parameters:
filePath - The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").

public DataStreamSource<String> readTextFile(String filePath, String charsetName)
Creates a data stream that represents the Strings produced by reading the given file line by line. The
Charset with the given name will be used to read the files.
Parameters:
filePath - The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path")
charsetName - The name of the character set used to read the file

public DataStreamSource<org.apache.flink.types.StringValue> readTextFileWithValue(String filePath)
Creates a data stream that represents the strings produced by reading the given file line by line. This method is similar to
readTextFile(String), but it produces a data stream with mutable StringValue
objects, rather than Java Strings. StringValues can be used to tune implementations to be less object and garbage
collection heavy.
The file will be read with the system's default character set.
Parameters:
filePath - The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path")

public DataStreamSource<org.apache.flink.types.StringValue> readTextFileWithValue(String filePath, String charsetName, boolean skipInvalidLines)
Creates a data stream that represents the strings produced by reading the given file line by line. This method is similar to
readTextFile(String, String), but it produces a data stream with mutable StringValue
objects, rather than Java Strings. StringValues can be used to tune implementations to be less object and
garbage collection heavy.
The Charset with the given name will be used to read the files.
Parameters:
filePath - The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path")
charsetName - The name of the character set used to read the file
skipInvalidLines - A flag to indicate whether to skip lines that cannot be read with the given character set

public <OUT> DataStreamSource<OUT> readFile(org.apache.flink.api.common.io.FileInputFormat<OUT> inputFormat, String filePath)
Reads the given file with the given input format.
Type Parameters:
OUT - The type of the returned data stream
Parameters:
filePath - The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path")
inputFormat - The input format used to create the data stream

public <OUT> DataStreamSource<OUT> readFileOfPrimitives(String filePath, Class<OUT> typeClass)
Creates a data stream that represents the primitive type produced by reading the given file line by line.
Type Parameters:
OUT - The type of the returned data stream
Parameters:
filePath - The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path")
typeClass - The primitive type class to be read

public <OUT> DataStreamSource<OUT> readFileOfPrimitives(String filePath, String delimiter, Class<OUT> typeClass)
Creates a data stream that represents the primitive type produced by reading the given file in a delimited way.
Type Parameters:
OUT - The type of the returned data stream
Parameters:
filePath - The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path")
delimiter - The delimiter of the given file
typeClass - The primitive type class to be read

public DataStream<String> readFileStream(String filePath, long intervalMillis, FileMonitoringFunction.WatchType watchType)
Creates a data stream that contains the contents of files created while the system watches the given path.
Parameters:
filePath - The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path/")
intervalMillis - The interval of file watching in milliseconds
watchType - The watch type of the file stream. When watchType is FileMonitoringFunction.WatchType.ONLY_NEW_FILES, the system processes
only new files. FileMonitoringFunction.WatchType.REPROCESS_WITH_APPENDED means that the system re-processes all contents of
appended files. FileMonitoringFunction.WatchType.PROCESS_ONLY_APPENDED means that the system processes only the appended
contents of files.

public DataStreamSource<String> socketTextStream(String hostname, int port, char delimiter, long maxRetry)
Creates a new data stream that contains the strings received infinitely from a socket.
Parameters:
hostname - The host name to which a server socket binds
port - The port number to which a server socket binds. A port number of 0 means that the port number is automatically
allocated.
delimiter - A character which splits received strings into records
maxRetry - The maximal retry interval in seconds while the program waits for a socket that is temporarily down.
Reconnection is initiated every second. A number of 0 means that the reader is immediately terminated,
while a negative value ensures retrying forever.

public DataStreamSource<String> socketTextStream(String hostname, int port, char delimiter)
Creates a new data stream that contains the strings received infinitely from a socket.
Parameters:
hostname - The host name to which a server socket binds
port - The port number to which a server socket binds. A port number of 0 means that the port number is automatically
allocated.
delimiter - A character which splits received strings into records

public DataStreamSource<String> socketTextStream(String hostname, int port)
Creates a new data stream that contains the strings received infinitely from a socket.
Parameters:
hostname - The host name to which a server socket binds
port - The port number to which a server socket binds. A port number of 0 means that the port number is automatically
allocated.

public <OUT> DataStreamSource<OUT> createInput(org.apache.flink.api.common.io.InputFormat<OUT,?> inputFormat)
Generic method to create an input data stream with InputFormat.
Since all data streams need specific information about their types, this method needs to determine the type of
the data produced by the input format. It will attempt to determine the data type by reflection, unless the
input format implements the ResultTypeQueryable interface. In the latter
case, this method will invoke the ResultTypeQueryable.getProducedType()
method to determine the data type produced by the input format.
Type Parameters:
OUT - The type of the returned data stream
Parameters:
inputFormat - The input format used to create the data stream

public <OUT> DataStreamSource<OUT> createInput(org.apache.flink.api.common.io.InputFormat<OUT,?> inputFormat, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo)
Generic method to create an input data stream with InputFormat.
The data stream is typed to the given TypeInformation. This method is intended for input formats where the
return type cannot be determined by reflection analysis, and that do not implement the
ResultTypeQueryable interface.
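
The order of type resolution described for the two createInput variants (ask the format if it can report its type, otherwise fall back to reflection, otherwise require explicit type information) can be sketched in plain Java. Everything below is a hypothetical stand-in written for illustration: Format, ProducedTypeAware, TypeResolution, LineFormat and GenericFormat are not Flink types, and Flink's actual type extraction is more involved.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

/** Hypothetical stand-in for an input format that produces records of type T. */
interface Format<T> {
    T next();
}

/** Hypothetical stand-in for an interface through which a format reports its produced type. */
interface ProducedTypeAware {
    Class<?> producedType();
}

final class TypeResolution {

    /** Asks the format for its type if it can report one; otherwise inspects its generic interfaces. */
    static Class<?> resolve(Format<?> format) {
        if (format instanceof ProducedTypeAware) {
            return ((ProducedTypeAware) format).producedType();
        }
        for (Type t : format.getClass().getGenericInterfaces()) {
            if (t instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) t;
                Type arg = p.getActualTypeArguments()[0];
                if (p.getRawType() == Format.class && arg instanceof Class) {
                    return (Class<?>) arg;
                }
            }
        }
        // Neither route worked: the caller has to supply the type explicitly.
        throw new IllegalArgumentException("Type cannot be determined; supply it explicitly");
    }

    public static void main(String[] args) {
        System.out.println(resolve(new LineFormat()));
        System.out.println(resolve(new GenericFormat<Long>(Long.class)));
    }
}

/** The type argument is fixed in the class declaration, so reflection can find it. */
final class LineFormat implements Format<String> {
    public String next() {
        return "line";
    }
}

/** The type argument is a type variable here, so the format reports its type itself. */
final class GenericFormat<T> implements Format<T>, ProducedTypeAware {
    private final Class<T> type;

    GenericFormat(Class<T> type) {
        this.type = type;
    }

    public T next() {
        return null;
    }

    public Class<?> producedType() {
        return type;
    }
}
```

A generic format that did not implement the reporting interface would reach the exception branch, which corresponds to the case where the caller passes the type information explicitly.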
Type Parameters:
OUT - The type of the returned data stream
Parameters:
inputFormat - The input format used to create the data stream

public <OUT> DataStreamSource<OUT> addSource(SourceFunction<OUT> function)
Adds a data source with custom type information, thus opening a
DataStream. Only in very special cases does the user need
to supply type information. Otherwise use addSource(org.apache.flink.streaming.api.functions.source.SourceFunction).
By default sources have a parallelism of 1. To enable parallel execution, the user defined source should
implement ParallelSourceFunction or extend RichParallelSourceFunction. In these cases the resulting source
will have the parallelism of the environment. To change this afterwards call DataStreamSource.setParallelism(int).
Type Parameters:
OUT - type of the returned stream
Parameters:
function - the user defined function

public <OUT> DataStreamSource<OUT> addSource(SourceFunction<OUT> function, String sourceName)
Adds a data source with custom type information, thus opening a
DataStream. Only in very special cases does the user need to
supply type information. Otherwise use
addSource(org.apache.flink.streaming.api.functions.source.SourceFunction).
Type Parameters:
OUT - type of the returned stream
Parameters:
function - the user defined function
sourceName - Name of the data source

public static StreamExecutionEnvironment getExecutionEnvironment()
Creates an execution environment that represents the context in which the
program is currently executed. See also createLocalEnvironment().

public static LocalStreamEnvironment createLocalEnvironment()
Creates a LocalStreamEnvironment. The local execution environment
will run the program in a multi-threaded fashion in the same JVM as the
environment was created in. The default parallelism of the local
environment is the number of hardware contexts (CPU cores / threads),
unless it was specified differently by setParallelism(int).

public static LocalStreamEnvironment createLocalEnvironment(int parallelism)
Creates a LocalStreamEnvironment. The local execution environment
will run the program in a multi-threaded fashion in the same JVM as the
environment was created in. It will use the parallelism specified in the
parameter.
Parameters:
parallelism - The parallelism for the local environment.

public static StreamExecutionEnvironment createRemoteEnvironment(String host, int port, String... jarFiles)
Creates a RemoteStreamEnvironment. The remote environment sends
(parts of) the program to a cluster for execution. Note that all file
paths used in the program must be accessible from the cluster. The
execution will use no parallelism, unless the parallelism is set
explicitly via setParallelism(int).
Parameters:
host - The host name or address of the master (JobManager), where the
program should be executed.
port - The port of the master (JobManager), where the program should
be executed.
jarFiles - The JAR files with code that needs to be shipped to the
cluster. If the program uses user-defined functions,
user-defined input formats, or any libraries, those must be
provided in the JAR files.

public static StreamExecutionEnvironment createRemoteEnvironment(String host, int port, int parallelism, String... jarFiles)
Creates a RemoteStreamEnvironment. The remote environment sends
(parts of) the program to a cluster for execution. Note that all file
paths used in the program must be accessible from the cluster. The
execution will use the specified parallelism.
Parameters:
host - The host name or address of the master (JobManager), where the
program should be executed.
port - The port of the master (JobManager), where the program should
be executed.
parallelism - The parallelism to use during the execution.
jarFiles - The JAR files with code that needs to be shipped to the
cluster. If the program uses user-defined functions,
user-defined input formats, or any libraries, those must be
provided in the JAR files.

public abstract org.apache.flink.api.common.JobExecutionResult execute() throws Exception
Triggers the program execution.
Throws:
Exception

public abstract org.apache.flink.api.common.JobExecutionResult execute(String jobName) throws Exception
Triggers the program execution.
Parameters:
jobName - Desired name of the job
Throws:
Exception

public StreamGraph getStreamGraph()
Getter of the StreamGraph of the streaming job.

public String getExecutionPlan()
Creates the plan with which the system will execute the program, and
returns it as a String using a JSON representation of the execution data
flow graph.

protected static void initializeFromFactory(StreamExecutionEnvironmentFactory eef)

public <F> F clean(F f)
Returns a "closure-cleaned" version of the given function. Cleans only if closure cleaning is not disabled in the
ExecutionConfig.

Copyright © 2014–2015 The Apache Software Foundation. All rights reserved.