Type Parameters:
IN - The data type of the input data set.
OUT - The data type of the returned data set.

public abstract class SingleInputUdfOperator<IN,OUT,O extends SingleInputUdfOperator<IN,OUT,O>> extends SingleInputOperator<IN,OUT,O> implements UdfOperator<O>
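Base class of all unary operators that execute user-defined functions (UDFs). The UDFs encapsulated by this operator take a single input (such as RichMapFunction or RichReduceFunction).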
This class encapsulates utilities for the UDFs, such as broadcast variables, parameterization through configuration objects, and semantic properties.
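The recursive bound O extends SingleInputUdfOperator<IN,OUT,O> in the class signature acts as a self type: the configuration methods return O, so a chain of calls keeps the concrete operator type. The standalone sketch below illustrates only that pattern. FluentOperator, MapLikeOperator, SelfTypeSketch, and all of their methods are hypothetical names chosen for illustration, not Flink classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical base class: O stands for the concrete subclass (the "self type").
abstract class FluentOperator<O extends FluentOperator<O>> {
    private final Map<String, String> parameters = new LinkedHashMap<>();
    private String name = "unnamed";

    // Returns O instead of FluentOperator so subclass methods stay reachable in a chain.
    @SuppressWarnings("unchecked")
    public O withParameter(String key, String value) {
        parameters.put(key, value);
        return (O) this;
    }

    @SuppressWarnings("unchecked")
    public O name(String newName) {
        this.name = newName;
        return (O) this;
    }

    public String describe() {
        return name + parameters;
    }
}

// Hypothetical concrete operator that binds O to itself.
final class MapLikeOperator extends FluentOperator<MapLikeOperator> {
    private boolean hinted;

    public MapLikeOperator withHint() {
        hinted = true;
        return this;
    }

    public boolean isHinted() {
        return hinted;
    }
}

final class SelfTypeSketch {
    static MapLikeOperator build() {
        // name(...) and withParameter(...) both return MapLikeOperator,
        // so the subclass-only method withHint() still compiles at the end of the chain.
        return new MapLikeOperator().name("map").withParameter("limit", "10").withHint();
    }

    public static void main(String[] args) {
        MapLikeOperator op = build();
        System.out.println(op.describe() + " hinted=" + op.isHinted());
    }
}
```

Without the self type, name(...) would return the base class and the chain would need a cast before any subclass-specific call.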
Fields inherited from class Operator: name, parallelism

| Modifier | Constructor and Description |
|---|---|
| protected | SingleInputUdfOperator(DataSet<IN> input, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> resultType)<br>Creates a new operator with the given data set as input. |
| Modifier and Type | Method and Description |
|---|---|
| protected org.apache.flink.api.common.operators.SingleInputSemanticProperties | extractSemanticAnnotations(Class<?> udfClass) |
| protected boolean | getAnalyzedUdfSemanticsFlag() |
| Map<String,DataSet<?>> | getBroadcastSets()<br>Gets the broadcast sets (name and data set) that have been added to context of the UDF. |
| protected abstract org.apache.flink.api.common.functions.Function | getFunction() |
| org.apache.flink.configuration.Configuration | getParameters()<br>Gets the configuration parameters that will be passed to the UDF's open method AbstractRichFunction.open(Configuration). |
| org.apache.flink.api.common.operators.SingleInputSemanticProperties | getSemanticProperties()<br>Gets the semantic properties that have been set for the user-defined functions (UDF). |
| O | returns(Class<OUT> typeClass)<br>Adds a type information hint about the return type of this operator. |
| O | returns(String typeInfoString)<br>Adds a type information hint about the return type of this operator. |
| O | returns(org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo)<br>Adds a type information hint about the return type of this operator. |
| protected void | setAnalyzedUdfSemanticsFlag() |
| void | setSemanticProperties(org.apache.flink.api.common.operators.SingleInputSemanticProperties properties)<br>Sets the semantic properties for the user-defined function (UDF). |
| protected boolean | udfWithForwardedFieldsAnnotation(Class<?> udfClass) |
| O | withBroadcastSet(DataSet<?> data, String name)<br>Adds a certain data set as a broadcast set to this operator. |
| O | withForwardedFields(String... forwardedFields)<br>Adds semantic information about forwarded fields of the user-defined function. |
| O | withParameters(org.apache.flink.configuration.Configuration parameters)<br>Sets the configuration parameters for the UDF. |
Methods inherited from class SingleInputOperator: getInput, getInputType, translateToDataFlow

Methods inherited from class Operator: getName, getParallelism, getResultType, name, setParallelism

Methods inherited from class DataSet: aggregate, checkSameExecutionContext, clean, coGroup, collect, combineGroup, count, cross, crossWithHuge, crossWithTiny, distinct, distinct, distinct, distinct, fillInType, filter, first, flatMap, getExecutionEnvironment, getType, groupBy, groupBy, groupBy, iterate, iterateDelta, join, join, joinWithHuge, joinWithTiny, map, mapPartition, max, maxBy, min, minBy, output, partitionByHash, partitionByHash, partitionByHash, partitionCustom, partitionCustom, partitionCustom, print, print, printOnTaskManager, printToErr, printToErr, project, rebalance, reduce, reduceGroup, runOperation, sortPartition, sortPartition, sum, union, write, write, writeAsCsv, writeAsCsv, writeAsCsv, writeAsCsv, writeAsFormattedText, writeAsFormattedText, writeAsText, writeAsText

protected SingleInputUdfOperator(DataSet<IN> input, org.apache.flink.api.common.typeinfo.TypeInformation<OUT> resultType)
Creates a new operator with the given data set as input.
Parameters:
input - The data set that is the input to the operator.
resultType - The type of the elements in the resulting data set.

protected abstract org.apache.flink.api.common.functions.Function getFunction()
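
public O withParameters(org.apache.flink.configuration.Configuration parameters)
Description copied from interface: UdfOperator
Sets the configuration parameters for the UDF. The parameters are passed to the UDF in the AbstractRichFunction.open(Configuration) method.
Specified by:
withParameters in interface UdfOperator<O extends SingleInputUdfOperator<IN,OUT,O>>
Parameters:
parameters - The configuration parameters for the UDF.

public O withBroadcastSet(DataSet<?> data, String name)
Description copied from interface: UdfOperator
Adds a certain data set as a broadcast set to this operator. The broadcast data set is registered under the given name and can be retrieved under that name via RuntimeContext.getBroadcastVariable(String). The runtime context itself is available in all UDFs via AbstractRichFunction.getRuntimeContext().
Specified by:
withBroadcastSet in interface UdfOperator<O extends SingleInputUdfOperator<IN,OUT,O>>
Parameters:
data - The data set to be broadcast.
name - The name under which the broadcast data set is retrieved.

public O withForwardedFields(String... forwardedFields)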
Adds semantic information about forwarded fields of the user-defined function. The forwarded fields information declares fields that are never modified by the function and that are either forwarded to the same position in the output or copied unchanged to another position in the output.
Fields that are forwarded at the same position are specified by their position. The specified position must be valid for the input and output data types, and the field must have the same type in both. For example, withForwardedFields("f2") declares that the third field of a Java input tuple is copied to the third field of an output tuple.
Fields that are copied unchanged to another position in the output are declared by specifying the source field reference in the input and the target field reference in the output. withForwardedFields("f0->f2") denotes that the first field of the Java input tuple is copied unchanged to the third field of the Java output tuple. When using a wildcard ("*"), ensure that the number of declared fields and their types match in the input and output types.
Multiple forwarded fields can be annotated in one String (withForwardedFields("f2; f3->f0; f4")) or in separate Strings (withForwardedFields("f2", "f3->f0", "f4")).
Please refer to the JavaDoc of Function or Flink's documentation for details on field references such as nested fields and wildcards.
It is not possible to override existing semantic information about forwarded fields, for example information added by a FunctionAnnotation.ForwardedFields class annotation.
NOTE: Adding semantic information for functions is optional! If used correctly, semantic information can help the Flink optimizer to generate more efficient execution plans. However, incorrect semantic information can cause the optimizer to generate incorrect execution plans which compute wrong results! So be careful when adding semantic information.
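To make the expression syntax concrete, the following standalone sketch turns forwarded field strings into a map from input position to output position. It is a simplified model for illustration, not Flink's own parser: the class ForwardedFieldsSketch and its methods are hypothetical, it handles only flat tuple fields (f0, f1, and so on), it keeps a single target per source field, and it ignores nested fields and wildcards.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: models the "fN" and "fN->fM" syntax only.
final class ForwardedFieldsSketch {

    // Accepts one String with ';' separators or several Strings, as the text above describes.
    static Map<Integer, Integer> parse(String... expressions) {
        Map<Integer, Integer> forwarded = new LinkedHashMap<>();
        for (String expression : expressions) {
            for (String part : expression.split(";")) {
                String trimmed = part.trim();
                if (trimmed.isEmpty()) {
                    continue; // tolerate a trailing ';'
                }
                String[] sides = trimmed.split("->", -1);
                if (sides.length > 2) {
                    throw new IllegalArgumentException("Too many '->' in: " + trimmed);
                }
                int source = position(sides[0]);
                // "f2" alone means: same position in input and output.
                int target = sides.length == 2 ? position(sides[1]) : source;
                forwarded.put(source, target);
            }
        }
        return forwarded;
    }

    // Converts a flat tuple field name such as "f3" into its zero-based position.
    private static int position(String field) {
        String name = field.trim();
        if (!name.matches("f\\d+")) {
            throw new IllegalArgumentException("Unsupported field reference: " + field);
        }
        return Integer.parseInt(name.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(parse("f2; f3->f0; f4"));     // one String
        System.out.println(parse("f2", "f3->f0", "f4")); // separate Strings
    }
}
```

Both calls in main describe the same forwarding: fields 2 and 4 stay in place, and input field 3 moves to output position 0.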
Parameters:
forwardedFields - A list of field forward expressions.
See Also:
FunctionAnnotation, FunctionAnnotation.ForwardedFields

public O returns(String typeInfoString)
Adds a type information hint about the return type of this operator.
Type hints are important in cases where the Java compiler throws away generic type information necessary for efficient execution.
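A function written as a Java lambda is one such case. The plain-Java sketch below, which uses no Flink classes, compares what reflection can recover from an anonymous class and from a lambda that implement the same generic interface. TypeErasureSketch and its methods are hypothetical names, and the lambda result reflects the behavior of current OpenJDK builds rather than a guarantee of the language specification.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Function;

// Hypothetical demo class: shows when generic type arguments survive compilation.
final class TypeErasureSketch {

    // Anonymous class: the compiler records Function<String, Integer> in the class file.
    static Function<String, Integer> anonymous() {
        return new Function<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        };
    }

    // Lambda: on OpenJDK the generated class implements only the raw Function interface.
    static Function<String, Integer> lambda() {
        return s -> s.length();
    }

    // True if reflection still sees type arguments on an implemented interface.
    static boolean keepsTypeArguments(Object function) {
        for (Type implemented : function.getClass().getGenericInterfaces()) {
            if (implemented instanceof ParameterizedType) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("anonymous class keeps type arguments: " + keepsTypeArguments(anonymous()));
        System.out.println("lambda keeps type arguments: " + keepsTypeArguments(lambda()));
    }
}
```

When the type arguments are not recoverable in this way, an explicit hint through one of the returns(...) overloads supplies the missing information.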
This method takes a type information string that will be parsed. A type information string can contain the following types:
Basic types such as Integer, String, etc.
Basic type arrays such as Integer[], String[], etc.
Tuple types such as Tuple1<TYPE0>, Tuple2<TYPE0, TYPE1>, etc.
Pojo types such as org.my.MyPojo<myFieldName=TYPE0,myFieldName2=TYPE1>, etc.
Generic types such as java.lang.Class, etc.
Custom type arrays such as org.my.CustomClass[], org.my.CustomClass$StaticInnerClass[], etc.
Value types such as DoubleValue, StringValue, IntegerValue, etc.
Tuple array types such as Tuple2<TYPE0,TYPE1>[], etc.
Writable types such as Writable<org.my.CustomWritable>
Enum types such as Enum<org.my.CustomEnum>
Example: "Tuple2<String,Tuple2<Integer,org.my.MyJob$Pojo<word=String>>>"
Parameters:
typeInfoString - type information string to be parsed

public O returns(org.apache.flink.api.common.typeinfo.TypeInformation<OUT> typeInfo)
Adds a type information hint about the return type of this operator.
Type hints are important in cases where the Java compiler throws away generic type information necessary for efficient execution.
This method takes an instance of TypeInformation such as:
BasicTypeInfo
BasicArrayTypeInfo
TupleTypeInfo
PojoTypeInfo
WritableTypeInfo
ValueTypeInfo
Parameters:
typeInfo - type information as a return type hint

public O returns(Class<OUT> typeClass)
Adds a type information hint about the return type of this operator.
Type hints are important in cases where the Java compiler throws away generic type information necessary for efficient execution.
This method takes a class that will be analyzed by Flink's type extraction capabilities.
Examples of classes are:
Basic types such as Integer.class, String.class, etc.
POJOs such as MyPojo.class
Classes that extend tuples. Tuple1.class, Tuple2.class, etc. are not sufficient.
Arrays such as String[].class, etc.
Parameters:
typeClass - class as a return type hint

public Map<String,DataSet<?>> getBroadcastSets()
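Description copied from interface: UdfOperator
Gets the broadcast sets (name and data set) that have been added to the context of the UDF. These are specified via UdfOperator.withBroadcastSet(DataSet, String).
Specified by:
getBroadcastSets in interface UdfOperator<O extends SingleInputUdfOperator<IN,OUT,O>>

public org.apache.flink.configuration.Configuration getParameters()
Description copied from interface: UdfOperator
Gets the configuration parameters that will be passed to the UDF's open method AbstractRichFunction.open(Configuration). The configuration is set via the UdfOperator.withParameters(Configuration) method.
Specified by:
getParameters in interface UdfOperator<O extends SingleInputUdfOperator<IN,OUT,O>>

public org.apache.flink.api.common.operators.SingleInputSemanticProperties getSemanticProperties()
Description copied from interface: UdfOperator
Gets the semantic properties that have been set for the user-defined functions (UDF).
Specified by:
getSemanticProperties in interface UdfOperator<O extends SingleInputUdfOperator<IN,OUT,O>>

public void setSemanticProperties(org.apache.flink.api.common.operators.SingleInputSemanticProperties properties)
Sets the semantic properties for the user-defined function (UDF). The configured properties can be retrieved via UdfOperator.getSemanticProperties().
Parameters:
properties - The semantic properties for the UDF.
See Also:
UdfOperator.getSemanticProperties()

protected boolean getAnalyzedUdfSemanticsFlag()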
protected void setAnalyzedUdfSemanticsFlag()
protected org.apache.flink.api.common.operators.SingleInputSemanticProperties extractSemanticAnnotations(Class<?> udfClass)
protected boolean udfWithForwardedFieldsAnnotation(Class<?> udfClass)
Copyright © 2014–2015 The Apache Software Foundation. All rights reserved.