public class DataStream<OUT> extends Object

OUT - The type of the DataStream, i.e., the type of the elements of the DataStream.
| Modifier and Type | Field and Description |
|---|---|
protected static Integer |
counter |
protected StreamExecutionEnvironment |
environment |
protected Integer |
id |
protected Integer |
iterationID |
protected Long |
iterationWaitTime |
protected int |
parallelism |
protected StreamPartitioner<OUT> |
partitioner |
protected StreamGraph |
streamGraph |
protected org.apache.flink.api.common.typeinfo.TypeInformation |
typeInfo |
protected List<DataStream<OUT>> |
unionizedStreams |
protected List<String> |
userDefinedNames |
| Constructor and Description |
|---|
DataStream(DataStream<OUT> dataStream)
Create a new DataStream by creating a copy of another DataStream
|
DataStream(StreamExecutionEnvironment environment,
org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo)
Create a new
DataStream in the given execution environment with
partitioning set to forward by default. |
| Modifier and Type | Method and Description |
|---|---|
DataStreamSink<OUT> |
addSink(SinkFunction<OUT> sinkFunction)
Adds the given sink to this DataStream.
|
protected SingleOutputStreamOperator<OUT,?> |
aggregate(AggregationFunction<OUT> aggregate) |
DataStream<OUT> |
broadcast()
Sets the partitioning of the
DataStream so that the output tuples
are broadcast to every parallel instance of the next component. |
protected void |
checkFieldRange(int pos)
Checks if the given field position is allowed for the output type
|
protected <F> F |
clean(F f) |
<R> ConnectedDataStream<OUT,R> |
connect(DataStream<R> dataStream)
Creates a new
ConnectedDataStream by connecting
DataStream outputs of (possible) different types with each other. |
protected <X> void |
connectGraph(DataStream<X> inputStream,
Integer outputID,
int typeNumber)
Internal function for assembling the underlying
JobGraph of the job. |
DataStream<OUT> |
copy()
Creates a copy of the
DataStream |
SingleOutputStreamOperator<Long,?> |
count()
Creates a new DataStream containing the current number (count) of
received records.
|
<IN2> StreamCrossOperator<OUT,IN2> |
cross(DataStream<IN2> dataStreamToCross)
Initiates a temporal Cross transformation.
A Cross transformation combines the elements of two DataStreams
into one DataStream over a specified time window. |
WindowedDataStream<OUT> |
every(WindowingHelper policyHelper)
Create a
WindowedDataStream on the full stream history, to
produce periodic aggregates. |
protected void |
fillInType(org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo)
Tries to fill in the type information.
|
SingleOutputStreamOperator<OUT,?> |
filter(org.apache.flink.api.common.functions.FilterFunction<OUT> filter)
Applies a Filter transformation on a
DataStream. |
<R> SingleOutputStreamOperator<R,?> |
flatMap(org.apache.flink.api.common.functions.FlatMapFunction<OUT,R> flatMapper)
Applies a FlatMap transformation on a
DataStream. |
<R> SingleOutputStreamOperator<R,?> |
fold(R initialValue,
org.apache.flink.api.common.functions.FoldFunction<OUT,R> folder)
Applies a fold transformation on the data stream.
|
DataStream<OUT> |
forward()
Sets the partitioning of the
DataStream so that the output tuples
are forwarded to the local subtask of the next component (whenever
possible). |
protected Class<?> |
getClassAtPos(int pos)
Gets the class of the field at the given position.
|
protected org.apache.flink.api.common.ExecutionConfig |
getExecutionConfig() |
StreamExecutionEnvironment |
getExecutionEnvironment() |
Integer |
getId()
Returns the ID of the
DataStream in the current StreamExecutionEnvironment. |
int |
getParallelism()
Gets the parallelism for this operator.
|
org.apache.flink.api.common.typeinfo.TypeInformation<OUT> |
getType()
Gets the type of the stream.
|
DataStream<OUT> |
global()
Sets the partitioning of the
DataStream so that the output values
all go to the first instance of the next processing operator. |
GroupedDataStream<OUT> |
groupBy(int... fields)
Groups the elements of a
DataStream by the given key positions to
be used with grouped operators like
GroupedDataStream.reduce(ReduceFunction) |
GroupedDataStream<OUT> |
groupBy(org.apache.flink.api.java.functions.KeySelector<OUT,?> keySelector)
Groups the elements of a
DataStream by the key extracted by the
KeySelector to be used with grouped operators like
GroupedDataStream.reduce(ReduceFunction). |
GroupedDataStream<OUT> |
groupBy(String... fields)
Groups a
DataStream using field expressions. |
IterativeDataStream<OUT> |
iterate()
Initiates an iterative part of the program that feeds back data streams.
|
IterativeDataStream<OUT> |
iterate(long maxWaitTimeMillis)
Initiates an iterative part of the program that feeds back data streams.
|
<IN2> StreamJoinOperator<OUT,IN2> |
join(DataStream<IN2> dataStreamToJoin)
Initiates a temporal Join transformation.
|
<R> SingleOutputStreamOperator<R,?> |
map(org.apache.flink.api.common.functions.MapFunction<OUT,R> mapper)
Applies a Map transformation on a
DataStream. |
SingleOutputStreamOperator<OUT,?> |
max(int positionToMax)
Applies an aggregation that gives the current maximum of the data stream
at the given position.
|
SingleOutputStreamOperator<OUT,?> |
max(String field)
Applies an aggregation that gives the current maximum of the pojo
data stream at the given field expression.
|
SingleOutputStreamOperator<OUT,?> |
maxBy(int positionToMaxBy)
Applies an aggregation that gives the current element with the
maximum value at the given position. If more elements have the maximum
value at the given position, the operator returns the first one by
default.
|
SingleOutputStreamOperator<OUT,?> |
maxBy(int positionToMaxBy,
boolean first)
Applies an aggregation that gives the current element with the
maximum value at the given position. If more elements have the maximum
value at the given position, the operator returns either the first or
the last one, depending on the parameter set.
|
SingleOutputStreamOperator<OUT,?> |
maxBy(String positionToMaxBy)
Applies an aggregation that gives the current element with the
maximum value at the given position. If more elements have the maximum
value at the given position, the operator returns the first one by
default.
|
SingleOutputStreamOperator<OUT,?> |
maxBy(String field,
boolean first)
Applies an aggregation that gives the current maximum element of the
pojo data stream by the given field expression.
|
SingleOutputStreamOperator<OUT,?> |
min(int positionToMin)
Applies an aggregation that gives the current minimum of the data
stream at the given position.
|
SingleOutputStreamOperator<OUT,?> |
min(String field)
Applies an aggregation that gives the current minimum of the pojo
data stream at the given field expression.
|
SingleOutputStreamOperator<OUT,?> |
minBy(int positionToMinBy)
Applies an aggregation that gives the current element with the
minimum value at the given position. If more elements have the minimum
value at the given position, the operator returns the first one by
default.
|
SingleOutputStreamOperator<OUT,?> |
minBy(int positionToMinBy,
boolean first)
Applies an aggregation that gives the current element with the
minimum value at the given position. If more elements have the minimum
value at the given position, the operator returns either the first or
the last one, depending on the parameter set.
|
SingleOutputStreamOperator<OUT,?> |
minBy(String positionToMinBy)
Applies an aggregation that gives the current element with the
minimum value at the given position. If more elements have the minimum
value at the given position, the operator returns the first one by
default.
|
SingleOutputStreamOperator<OUT,?> |
minBy(String field,
boolean first)
Applies an aggregation that gives the current minimum element of the
pojo data stream by the given field expression.
|
DataStream<OUT> |
partitionByHash(int... fields)
Sets the partitioning of the
DataStream so that the output is
partitioned hashing on the given fields. |
DataStream<OUT> |
partitionByHash(org.apache.flink.api.java.functions.KeySelector<OUT,?> keySelector)
Sets the partitioning of the
DataStream so that the output is
partitioned using the given KeySelector. |
DataStream<OUT> |
partitionByHash(String... fields)
Sets the partitioning of the
DataStream so that the output is
partitioned hashing on the given fields. |
DataStreamSink<OUT> |
print()
Writes a DataStream to the standard output stream (stdout).
For each element of the DataStream the result of Object.toString() is written. |
DataStreamSink<OUT> |
printToErr()
Writes a DataStream to the standard error stream (stderr).
For each element of the DataStream the result of Object.toString() is written. |
<R extends org.apache.flink.api.java.tuple.Tuple> |
project(int... fieldIndexes)
Initiates a Project transformation on a
Tuple DataStream. Note: Only Tuple DataStreams can be projected. The transformation projects each Tuple of the DataStream onto a (sub)set of fields. |
DataStream<OUT> |
rebalance()
Sets the partitioning of the
DataStream so that the output tuples
are distributed evenly to instances of the next component in a round-robin
fashion. |
SingleOutputStreamOperator<OUT,?> |
reduce(org.apache.flink.api.common.functions.ReduceFunction<OUT> reducer)
Applies a reduce transformation on the data stream.
|
protected DataStream<OUT> |
setConnectionType(StreamPartitioner<OUT> partitioner)
Internal function for setting the partitioner for the DataStream
|
DataStream<OUT> |
shuffle()
Sets the partitioning of the
DataStream so that the output tuples
are shuffled uniformly randomly to the next component. |
SplitDataStream<OUT> |
split(OutputSelector<OUT> outputSelector)
Operator used for directing tuples to specific named outputs using an
OutputSelector. |
SingleOutputStreamOperator<OUT,?> |
sum(int positionToSum)
Applies an aggregation that sums the data stream at the given position.
|
SingleOutputStreamOperator<OUT,?> |
sum(String field)
Applies an aggregation that gives the current sum of the pojo data
stream at the given field expression.
|
<R> SingleOutputStreamOperator<R,?> |
transform(String operatorName,
org.apache.flink.api.common.typeinfo.TypeInformation<R> outTypeInfo,
OneInputStreamOperator<OUT,R> operator)
Method for passing user defined operators along with the type
information that will transform the DataStream.
|
DataStream<OUT> |
union(DataStream<OUT>... streams)
Creates a new
DataStream by merging DataStream outputs of
the same type with each other. |
WindowedDataStream<OUT> |
window(TriggerPolicy<OUT> trigger,
EvictionPolicy<OUT> eviction)
|
WindowedDataStream<OUT> |
window(WindowingHelper policyHelper)
Create a
WindowedDataStream that can be used to apply
transformation like WindowedDataStream.reduceWindow(org.apache.flink.api.common.functions.ReduceFunction<OUT>),
WindowedDataStream.mapWindow(org.apache.flink.streaming.api.functions.WindowMapFunction<OUT, R>) or aggregations on preset
chunks(windows) of the data stream. |
DataStreamSink<OUT> |
write(org.apache.flink.api.common.io.OutputFormat<OUT> format,
long millis)
Writes the dataStream into an output, described by an OutputFormat.
|
<X extends org.apache.flink.api.java.tuple.Tuple> |
writeAsCsv(String path)
Writes a DataStream to the file specified by path in csv format.
|
<X extends org.apache.flink.api.java.tuple.Tuple> |
writeAsCsv(String path,
org.apache.flink.core.fs.FileSystem.WriteMode writeMode)
Writes a DataStream to the file specified by path in csv format.
|
<X extends org.apache.flink.api.java.tuple.Tuple> |
writeAsCsv(String path,
org.apache.flink.core.fs.FileSystem.WriteMode writeMode,
long millis)
Writes a DataStream to the file specified by path in csv format.
|
<X extends org.apache.flink.api.java.tuple.Tuple> |
writeAsCsv(String path,
long millis)
Writes a DataStream to the file specified by path in csv format.
|
DataStreamSink<OUT> |
writeAsText(String path)
Writes a DataStream to the file specified by path in text format.
|
DataStreamSink<OUT> |
writeAsText(String path,
org.apache.flink.core.fs.FileSystem.WriteMode writeMode)
Writes a DataStream to the file specified by path in text format.
|
DataStreamSink<OUT> |
writeAsText(String path,
org.apache.flink.core.fs.FileSystem.WriteMode writeMode,
long millis)
Writes a DataStream to the file specified by path in text format.
|
DataStreamSink<OUT> |
writeAsText(String path,
long millis)
Writes a DataStream to the file specified by path in text format.
|
DataStreamSink<OUT> |
writeToSocket(String hostName,
int port,
SerializationSchema<OUT,byte[]> schema)
Writes the DataStream to a socket as a byte array.
|
protected static Integer counter
protected final StreamExecutionEnvironment environment
protected final Integer id
protected int parallelism
protected StreamPartitioner<OUT> partitioner
protected org.apache.flink.api.common.typeinfo.TypeInformation typeInfo
protected List<DataStream<OUT>> unionizedStreams
protected Integer iterationID
protected Long iterationWaitTime
protected final StreamGraph streamGraph
public DataStream(StreamExecutionEnvironment environment, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo)
Create a new DataStream in the given execution environment with partitioning set to forward by default.
environment - StreamExecutionEnvironment
typeInfo - Type of the datastream

public DataStream(DataStream<OUT> dataStream)
Create a new DataStream by creating a copy of another DataStream.
dataStream - The DataStream that will be copied.

public Integer getId()
Returns the ID of the DataStream in the current StreamExecutionEnvironment.

public int getParallelism()
Gets the parallelism for this operator.
public org.apache.flink.api.common.typeinfo.TypeInformation<OUT> getType()
Gets the type of the stream.
protected void fillInType(org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo)
Tries to fill in the type information.
typeInfo - The type information to fill in.
IllegalStateException - Thrown, if the type information has been accessed before.

protected <F> F clean(F f)

public StreamExecutionEnvironment getExecutionEnvironment()

protected org.apache.flink.api.common.ExecutionConfig getExecutionConfig()
public DataStream<OUT> union(DataStream<OUT>... streams)
Creates a new DataStream by merging DataStream outputs of the same type with each other. The DataStreams merged using this operator will be transformed simultaneously.
streams - The DataStreams to union output with.

public SplitDataStream<OUT> split(OutputSelector<OUT> outputSelector)
Operator used for directing tuples to specific named outputs using an OutputSelector. Calling this method on an operator creates a new SplitDataStream.
outputSelector - The user defined OutputSelector for directing the tuples.

public <R> ConnectedDataStream<OUT,R> connect(DataStream<R> dataStream)
Creates a new ConnectedDataStream by connecting DataStream outputs of (possibly) different types with each other. The DataStreams connected using this operator can be used with CoFunctions to apply joint transformations.
dataStream - The DataStream with which this stream will be connected.

public GroupedDataStream<OUT> groupBy(int... fields)
Groups the elements of a DataStream by the given key positions to be used with grouped operators like GroupedDataStream.reduce(ReduceFunction). This operator also affects the partitioning of the stream, by forcing values with the same key to go to the same processing instance.
fields - The position of the fields on which the DataStream will be grouped.

public GroupedDataStream<OUT> groupBy(String... fields)
Groups a DataStream using field expressions. A field expression is either the name of a public field or a getter method with parentheses of the DataStream's underlying type. A dot can be used to drill down into objects, as in "field1.getInnerField2()". This method returns a GroupedDataStream. This operator also affects the partitioning of the stream, by forcing values with the same key to go to the same processing instance.
fields - One or more field expressions on which the DataStream will be grouped.

public GroupedDataStream<OUT> groupBy(org.apache.flink.api.java.functions.KeySelector<OUT,?> keySelector)
Groups the elements of a DataStream by the key extracted by the KeySelector to be used with grouped operators like GroupedDataStream.reduce(ReduceFunction). This operator also affects the partitioning of the stream, by forcing values with the same key to go to the same processing instance.
keySelector - The KeySelector that will be used to extract keys for the values

public DataStream<OUT> partitionByHash(int... fields)
Sets the partitioning of the DataStream so that the output is partitioned by hashing on the given fields. This setting only affects how the outputs will be distributed between the parallel instances of the next processing operator.
fields - The tuple fields that should be used for partitioning

public DataStream<OUT> partitionByHash(String... fields)
Sets the partitioning of the DataStream so that the output is partitioned by hashing on the given fields. This setting only affects how the outputs will be distributed between the parallel instances of the next processing operator.
fields - The tuple fields that should be used for partitioning

public DataStream<OUT> partitionByHash(org.apache.flink.api.java.functions.KeySelector<OUT,?> keySelector)
Sets the partitioning of the DataStream so that the output is partitioned using the given KeySelector. This setting only affects how the outputs will be distributed between the parallel instances of the next processing operator.
keySelector -

public DataStream<OUT> broadcast()
Sets the partitioning of the DataStream so that the output tuples are broadcast to every parallel instance of the next component. This setting only affects how the outputs will be distributed between the parallel instances of the next processing operator.
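The difference between broadcast and the keyed partitionings can be sketched in plain Java (hypothetical helper names, not Flink's StreamPartitioner API): broadcast copies every record into all downstream channels, while a hash partitioner sends each record to exactly one channel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two distribution modes described above.
class BroadcastSketch {
    // broadcast: every downstream channel receives every record
    static List<List<String>> broadcast(List<String> records, int parallelism) {
        List<List<String>> channels = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) channels.add(new ArrayList<>(records));
        return channels;
    }

    // hash partitioning: each record goes to exactly one channel, chosen by hash
    static List<List<String>> hashPartition(List<String> records, int parallelism) {
        List<List<String>> channels = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) channels.add(new ArrayList<>());
        for (String r : records) channels.get(Math.abs(r.hashCode() % parallelism)).add(r);
        return channels;
    }
}
```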
public DataStream<OUT> shuffle()
Sets the partitioning of the DataStream so that the output tuples are shuffled uniformly randomly to the next component. This setting only affects how the outputs will be distributed between the parallel instances of the next processing operator.
public DataStream<OUT> forward()
Sets the partitioning of the DataStream so that the output tuples are forwarded to the local subtask of the next component (whenever possible). This setting only affects how the outputs will be distributed between the parallel instances of the next processing operator.
public DataStream<OUT> rebalance()
Sets the partitioning of the DataStream so that the output tuples are distributed evenly to instances of the next component in a round-robin fashion. This setting only affects how the outputs will be distributed between the parallel instances of the next processing operator.
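The round-robin distribution behind rebalance() can be illustrated with a plain-Java sketch (a hypothetical helper, not Flink code): record i goes to channel i mod parallelism, so channels stay evenly loaded regardless of key skew.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of round-robin channel selection as described for rebalance().
class RebalanceSketch {
    static List<List<Integer>> roundRobin(List<Integer> records, int parallelism) {
        List<List<Integer>> channels = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) channels.add(new ArrayList<>());
        // record i is assigned to channel i % parallelism
        for (int i = 0; i < records.size(); i++) channels.get(i % parallelism).add(records.get(i));
        return channels;
    }
}
```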
public DataStream<OUT> global()
Sets the partitioning of the DataStream so that the output values all go to the first instance of the next processing operator. Use this setting with care since it might cause a serious performance bottleneck in the application.

public IterativeDataStream<OUT> iterate()
Initiates an iterative part of the program that feeds back data streams. The iterative part needs to be closed by calling IterativeDataStream.closeWith(DataStream). The transformation of this IterativeDataStream will be the iteration head. The data stream given to the IterativeDataStream.closeWith(DataStream) method is the data stream that will be fed back and used as the input for the iteration head. A common usage pattern for streaming iterations is to use output splitting to send a part of the closing data stream to the head. Refer to split(OutputSelector) for more information.
The iteration edge will be partitioned the same way as the first input of the iteration head.
By default a DataStream with iteration will never terminate, but the user can use the maxWaitTime parameter to set a max waiting time for the iteration head. If no data is received within the set time, the stream terminates.
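The maxWaitTime termination rule can be sketched with a plain-Java feedback queue (hypothetical names, not the Flink runtime): the iteration head keeps consuming feedback records and shuts down once no record arrives within the configured wait time.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the iteration head's maxWaitTime termination rule.
class IterationHeadSketch {
    static int consumeUntilIdle(BlockingQueue<Integer> feedback, long maxWaitMillis) {
        int processed = 0;
        try {
            while (true) {
                // wait at most maxWaitMillis for the next feedback record
                Integer record = feedback.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
                if (record == null) break; // no data within maxWaitTime: terminate
                processed++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop consuming if interrupted
        }
        return processed;
    }
}
```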
public IterativeDataStream<OUT> iterate(long maxWaitTimeMillis)
Initiates an iterative part of the program that feeds back data streams. The iterative part needs to be closed by calling IterativeDataStream.closeWith(DataStream). The transformation of this IterativeDataStream will be the iteration head. The data stream given to the IterativeDataStream.closeWith(DataStream) method is the data stream that will be fed back and used as the input for the iteration head. A common usage pattern for streaming iterations is to use output splitting to send a part of the closing data stream to the head. Refer to split(OutputSelector) for more information.
The iteration edge will be partitioned the same way as the first input of the iteration head.
By default a DataStream with iteration will never terminate, but the user can use the maxWaitTime parameter to set a max waiting time for the iteration head. If no data is received within the set time, the stream terminates.
maxWaitTimeMillis - Number of milliseconds to wait between inputs before shutting down

public <R> SingleOutputStreamOperator<R,?> map(org.apache.flink.api.common.functions.MapFunction<OUT,R> mapper)
DataStream. The transformation
calls a MapFunction for each element of the DataStream. Each
MapFunction call returns exactly one element. The user can also extend
RichMapFunction to gain access to other features provided by the
RichFunction interface.R - output typemapper - The MapFunction that is called for each element of the
DataStream.DataStream.public <R> SingleOutputStreamOperator<R,?> flatMap(org.apache.flink.api.common.functions.FlatMapFunction<OUT,R> flatMapper)
DataStream. The
transformation calls a FlatMapFunction for each element of the
DataStream. Each FlatMapFunction call can return any number of elements
including none. The user can also extend RichFlatMapFunction to
gain access to other features provided by the
RichFunction interface.R - output typeflatMapper - The FlatMapFunction that is called for each element of the
DataStream

public SingleOutputStreamOperator<OUT,?> reduce(org.apache.flink.api.common.functions.ReduceFunction<OUT> reducer)
Applies a reduce transformation on the data stream. The user can also extend RichReduceFunction to gain access to other features provided by the RichFunction interface.
reducer - The ReduceFunction that will be called for every element of the input values.

public <R> SingleOutputStreamOperator<R,?> fold(R initialValue, org.apache.flink.api.common.functions.FoldFunction<OUT,R> folder)
Applies a fold transformation on the data stream. The user can also extend RichFoldFunction to gain access to other features provided by the RichFunction interface.
folder - The FoldFunction that will be called for every element of the input values.

public SingleOutputStreamOperator<OUT,?> filter(org.apache.flink.api.common.functions.FilterFunction<OUT> filter)
DataStream. The
transformation calls a FilterFunction for each element of the
DataStream and retains only those element for which the function returns
true. Elements for which the function returns false are filtered. The
user can also extend RichFilterFunction to gain access to other
features provided by the
RichFunction interface.filter - The FilterFunction that is called for each element of the
DataStream.public <R extends org.apache.flink.api.java.tuple.Tuple> SingleOutputStreamOperator<R,?> project(int... fieldIndexes)
Tuple DataStream.fieldIndexes - The field indexes of the input tuples that are retained. The
order of fields in the output tuple corresponds to the order
of field indexes.Tuple,
DataStreampublic <IN2> StreamCrossOperator<OUT,IN2> cross(DataStream<IN2> dataStreamToCross)
DataStreams
into one DataStream over a specified time window. It builds all pair
combinations of elements of both DataStreams, i.e., it builds a Cartesian
product.
This method returns a StreamCrossOperator on which the
TemporalOperator.onWindow(long, java.util.concurrent.TimeUnit) should be called to define the
window.
Call StreamCrossOperator.CrossWindow#with(org.apache.flink.api.common.functions.CrossFunction)
to define a custom cross function.
dataStreamToCross - The other DataStream with which this DataStream is crossed.StreamCrossOperator to continue the definition of the
cross transformation.public <IN2> StreamJoinOperator<OUT,IN2> join(DataStream<IN2> dataStreamToJoin)
DataStreams on key equality over a specified time window.
This method returns a StreamJoinOperator on which the
TemporalOperator.onWindow(long, java.util.concurrent.TimeUnit)
should be called to define the window, and then the
StreamJoinOperator.JoinWindow#where(int...) and
StreamJoinOperator.JoinPredicate#equalTo(int...) can be used to define
the join keys.
The user can also use the
StreamJoinOperator.JoinedStream#with(org.apache.flink.api.common.functions.JoinFunction)
to apply a custom join function.
dataStreamToJoin - The other DataStream with which this DataStream is joined.StreamJoinOperator to continue the definition of the
Join transformation.public SingleOutputStreamOperator<OUT,?> sum(int positionToSum)
positionToSum - The position in the data point to sum

public SingleOutputStreamOperator<OUT,?> sum(String field)
Applies an aggregation that gives the current sum of the pojo data stream at the given field expression. A field expression is either the name of a public field or a getter method with parentheses of the DataStream's underlying type. A dot can be used to drill down into objects, as in "field1.getInnerField2()".
field - The field expression based on which the aggregation will be applied.

public SingleOutputStreamOperator<OUT,?> min(int positionToMin)
positionToMin - The position in the data point to minimize

public SingleOutputStreamOperator<OUT,?> min(String field)
Applies an aggregation that gives the current minimum of the pojo data stream at the given field expression. A field expression is either the name of a public field or a getter method with parentheses of the DataStream's underlying type. A dot can be used to drill down into objects, as in "field1.getInnerField2()".
field - The field expression based on which the aggregation will be applied.

public SingleOutputStreamOperator<OUT,?> max(int positionToMax)
positionToMax - The position in the data point to maximize

public SingleOutputStreamOperator<OUT,?> max(String field)
Applies an aggregation that gives the current maximum of the pojo data stream at the given field expression. A field expression is either the name of a public field or a getter method with parentheses of the DataStream's underlying type. A dot can be used to drill down into objects, as in "field1.getInnerField2()".
field - The field expression based on which the aggregation will be applied.

public SingleOutputStreamOperator<OUT,?> minBy(String field, boolean first)
Applies an aggregation that gives the current minimum element of the pojo data stream by the given field expression. A field expression is either the name of a public field or a getter method with parentheses of the DataStream's underlying type. A dot can be used to drill down into objects, as in "field1.getInnerField2()".
field - The field expression based on which the aggregation will be applied.
first - If True then in case of field equality the first object will be returned

public SingleOutputStreamOperator<OUT,?> maxBy(String field, boolean first)
Applies an aggregation that gives the current maximum element of the pojo data stream by the given field expression. A field expression is either the name of a public field or a getter method with parentheses of the DataStream's underlying type. A dot can be used to drill down into objects, as in "field1.getInnerField2()".
field - The field expression based on which the aggregation will be applied.
first - If True then in case of field equality the first object will be returned

public SingleOutputStreamOperator<OUT,?> minBy(int positionToMinBy)
positionToMinBy - The position in the data point to minimize

public SingleOutputStreamOperator<OUT,?> minBy(String positionToMinBy)
positionToMinBy - The position in the data point to minimize

public SingleOutputStreamOperator<OUT,?> minBy(int positionToMinBy, boolean first)
positionToMinBy - The position in the data point to minimize
first - If true, then the operator returns the first element with the minimal value, otherwise the last

public SingleOutputStreamOperator<OUT,?> maxBy(int positionToMaxBy)
positionToMaxBy - The position in the data point to maximize

public SingleOutputStreamOperator<OUT,?> maxBy(String positionToMaxBy)
positionToMaxBy - The position in the data point to maximize

public SingleOutputStreamOperator<OUT,?> maxBy(int positionToMaxBy, boolean first)
positionToMaxBy - The position in the data point to maximize
first - If true, then the operator returns the first element with the maximum value, otherwise the last

public SingleOutputStreamOperator<Long,?> count()
Creates a new DataStream containing the current number (count) of received records.
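The tie-breaking behavior of the maxBy/minBy "first" flag can be illustrated with a plain-Java sketch (hypothetical helper, not the Flink operator): among elements sharing the maximum key, the flag selects either the earliest or the latest element seen.

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical sketch of maxBy tie-breaking: on a key tie, keep the
// earliest element when first == true, otherwise keep the latest.
class MaxBySketch {
    static String maxBy(List<String> elements, ToIntFunction<String> key, boolean first) {
        String best = null;
        int bestKey = Integer.MIN_VALUE;
        for (String e : elements) {
            int k = key.applyAsInt(e);
            // strictly greater always replaces; on a tie, replace only when first == false
            if (k > bestKey || (k == bestKey && !first)) {
                best = e;
                bestKey = k;
            }
        }
        return best;
    }
}
```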
public WindowedDataStream<OUT> window(WindowingHelper policyHelper)
WindowedDataStream that can be used to apply
transformation like WindowedDataStream.reduceWindow(org.apache.flink.api.common.functions.ReduceFunction<OUT>),
WindowedDataStream.mapWindow(org.apache.flink.streaming.api.functions.WindowMapFunction<OUT, R>) or aggregations on preset
chunks (windows) of the data stream. To define windows a
WindowingHelper such as Time, Count,
Delta and FullStream can be used. When applied
to a grouped data stream, the windows (evictions) and slide sizes
(triggers) will be computed on a per group basis. For more
advanced control over the trigger and eviction policies please refer to
window(TriggerPolicy, EvictionPolicy). For example, to create a
sum every 5 seconds in a tumbling fashion:
ds.window(Time.of(5, TimeUnit.SECONDS)).sum(field)
To create sliding windows use WindowedDataStream.every(WindowingHelper). The same example with 3 second slides:
ds.window(Time.of(5, TimeUnit.SECONDS)).every(Time.of(3, TimeUnit.SECONDS)).sum(field)
policyHelper - Any WindowingHelper such as Time, Count, Delta, FullStream to define the window size.

public WindowedDataStream<OUT> window(TriggerPolicy<OUT> trigger, EvictionPolicy<OUT> eviction)
WindowedDataStream using the given TriggerPolicy
and EvictionPolicy. Windowing can be used to apply transformation
like WindowedDataStream.reduceWindow(org.apache.flink.api.common.functions.ReduceFunction<OUT>),
WindowedDataStream.mapWindow(org.apache.flink.streaming.api.functions.WindowMapFunction<OUT, R>) or aggregations on preset
chunks (windows) of the data stream. For most common use-cases please refer to window(WindowingHelper).
trigger - The TriggerPolicy that will determine how often the user function is called on the window.
eviction - The EvictionPolicy that will determine the number of elements in each time window.

public WindowedDataStream<OUT> every(WindowingHelper policyHelper)
Create a WindowedDataStream on the full stream history, to produce periodic aggregates.

public DataStreamSink<OUT> print()
Writes a DataStream to the standard output stream (stdout). For each element of the DataStream the result of Object.toString() is written.

public DataStreamSink<OUT> printToErr()
Writes a DataStream to the standard error stream (stderr). For each element of the DataStream the result of Object.toString() is written.

public DataStreamSink<OUT> writeAsText(String path)
Writes a DataStream to the file specified by path in text format. For each element of the DataStream the result of Object.toString() is written.
path - the path pointing to the location the text file is written to

public DataStreamSink<OUT> writeAsText(String path, long millis)
Writes a DataStream to the file specified by path in text format. For each element of the DataStream the result of Object.toString() is written.
path - the path pointing to the location the text file is written to
millis - the file update frequency

public DataStreamSink<OUT> writeAsText(String path, org.apache.flink.core.fs.FileSystem.WriteMode writeMode)
Writes a DataStream to the file specified by path in text format. For each element of the DataStream the result of Object.toString() is written.
path - the path pointing to the location the text file is written to
writeMode - Controls the behavior for existing files. Options are NO_OVERWRITE and OVERWRITE.

public DataStreamSink<OUT> writeAsText(String path, org.apache.flink.core.fs.FileSystem.WriteMode writeMode, long millis)
Writes a DataStream to the file specified by path in text format. For each element of the DataStream the result of Object.toString() is written.
path - the path pointing to the location the text file is written to
writeMode - Controls the behavior for existing files. Options are NO_OVERWRITE and OVERWRITE.
millis - the file update frequency

public <X extends org.apache.flink.api.java.tuple.Tuple> DataStreamSink<OUT> writeAsCsv(String path)
Writes a DataStream to the file specified by path in csv format. For each element of the DataStream the result of Object.toString() is written. This method can only be used on data streams of tuples.
path - the path pointing to the location the text file is written to

public <X extends org.apache.flink.api.java.tuple.Tuple> DataStreamSink<OUT> writeAsCsv(String path, long millis)
Writes a DataStream to the file specified by path in csv format. For each element of the DataStream the result of Object.toString() is written. This method can only be used on data streams of tuples.
path - the path pointing to the location the text file is written to
millis - the file update frequency

public <X extends org.apache.flink.api.java.tuple.Tuple> DataStreamSink<OUT> writeAsCsv(String path, org.apache.flink.core.fs.FileSystem.WriteMode writeMode)
Writes a DataStream to the file specified by path in csv format. For each element of the DataStream the result of Object.toString() is written. This method can only be used on data streams of tuples.
path - the path pointing to the location the text file is written to
writeMode - Controls the behavior for existing files. Options are NO_OVERWRITE and OVERWRITE.

public <X extends org.apache.flink.api.java.tuple.Tuple> DataStreamSink<OUT> writeAsCsv(String path, org.apache.flink.core.fs.FileSystem.WriteMode writeMode, long millis)
Writes a DataStream to the file specified by path in csv format. For each element of the DataStream the result of Object.toString() is written. This method can only be used on data streams of tuples.
path - the path pointing to the location the text file is written to
writeMode - Controls the behavior for existing files. Options are NO_OVERWRITE and OVERWRITE.
millis - the file update frequency

public DataStreamSink<OUT> writeToSocket(String hostName, int port, SerializationSchema<OUT,byte[]> schema)
Writes the DataStream to a socket as a byte array. The format of the output is specified by a SerializationSchema.
hostName - host of the socket
port - port of the socket
schema - schema for serialization

public DataStreamSink<OUT> write(org.apache.flink.api.common.io.OutputFormat<OUT> format, long millis)
Writes the dataStream into an output, described by an OutputFormat.
format - The output format
millis - the write frequency

protected SingleOutputStreamOperator<OUT,?> aggregate(AggregationFunction<OUT> aggregate)
public <R> SingleOutputStreamOperator<R,?> transform(String operatorName, org.apache.flink.api.common.typeinfo.TypeInformation<R> outTypeInfo, OneInputStreamOperator<OUT,R> operator)
R - type of the return stream
operatorName - name of the operator, for logging purposes
outTypeInfo - the output type of the operator
operator - the object containing the transformation logic

protected DataStream<OUT> setConnectionType(StreamPartitioner<OUT> partitioner)
Internal function for setting the partitioner for the DataStream.
partitioner - Partitioner to set.

protected <X> void connectGraph(DataStream<X> inputStream, Integer outputID, int typeNumber)
Internal function for assembling the underlying JobGraph of the job. Connects the outputs of the given input stream to the specified output stream given by the outputID.
inputStream - input data stream
outputID - ID of the output
typeNumber - Number of the type (used at co-functions)

public DataStreamSink<OUT> addSink(SinkFunction<OUT> sinkFunction)
Adds the given sink to this DataStream. Only streams with sinks added will be executed once the StreamExecutionEnvironment.execute() method is called.
sinkFunction - The object containing the sink's invoke function.

protected Class<?> getClassAtPos(int pos)
Gets the class of the field at the given position.
pos - Position of the field

protected void checkFieldRange(int pos)
Checks if the given field position is allowed for the output type.
pos - Position to check

public DataStream<OUT> copy()
Creates a copy of the DataStream.

Copyright © 2014–2015 The Apache Software Foundation. All rights reserved.