public interface Coder&lt;T&gt; extends Serializable

T - the type of the values being transcoded.
Coder<T> defines how to encode and decode values of type T into
byte streams.
Coder instances are serialized during job creation and deserialized
before use, via JSON serialization. See SerializableCoder for an example of a
Coder that adds a custom field to
the Coder serialization. It provides a constructor annotated with
JsonCreator, which is a factory method used when
deserializing a Coder instance.
Coder classes for compound types are often composed from coder classes for the types
contained therein. The composition of Coder instances into a coder for the compound
class is the subject of the CoderFactory type, which enables automatic generic
composition of Coder classes within the CoderRegistry. With particular
static methods on a compound Coder class, a CoderFactory can be automatically
inferred. See KvCoder for an example of a simple compound Coder that supports
automatic composition in the CoderRegistry.
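The composition pattern can be illustrated with a simplified, Beam-independent sketch. The `MiniCoder`, `MiniIntCoder`, and `MiniPairCoder` types below are hypothetical stand-ins for `Coder`, an atomic coder, and `KvCoder` respectively, not the real SDK classes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class ComposedCoderDemo {

    // Hypothetical, simplified stand-in for the SDK's Coder interface.
    interface MiniCoder<T> {
        void encode(T value, OutputStream out) throws IOException;
        T decode(InputStream in) throws IOException;
    }

    // Atomic coder for Integers, using a fixed 4-byte big-endian format.
    static class MiniIntCoder implements MiniCoder<Integer> {
        public void encode(Integer value, OutputStream out) throws IOException {
            new DataOutputStream(out).writeInt(value);
        }
        public Integer decode(InputStream in) throws IOException {
            return new DataInputStream(in).readInt();
        }
    }

    // Compound coder composed from coders for its component types,
    // analogous to how KvCoder composes a key coder and a value coder.
    static class MiniPairCoder<K, V> implements MiniCoder<Map.Entry<K, V>> {
        private final MiniCoder<K> keyCoder;
        private final MiniCoder<V> valueCoder;

        MiniPairCoder(MiniCoder<K> keyCoder, MiniCoder<V> valueCoder) {
            this.keyCoder = keyCoder;
            this.valueCoder = valueCoder;
        }
        public void encode(Map.Entry<K, V> pair, OutputStream out) throws IOException {
            keyCoder.encode(pair.getKey(), out);     // key bytes first
            valueCoder.encode(pair.getValue(), out); // then value bytes
        }
        public Map.Entry<K, V> decode(InputStream in) throws IOException {
            K key = keyCoder.decode(in);
            V value = valueCoder.decode(in);
            return new SimpleEntry<>(key, value);
        }
    }

    // Round-trips a pair of ints through the composed coder.
    static Map.Entry<Integer, Integer> roundTrip(int k, int v) {
        try {
            MiniPairCoder<Integer, Integer> coder =
                new MiniPairCoder<>(new MiniIntCoder(), new MiniIntCoder());
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            coder.encode(new SimpleEntry<>(k, v), bytes);
            return coder.decode(new ByteArrayInputStream(bytes.toByteArray()));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The compound coder holds its component coders as fields and delegates to them in a fixed order, which is the same shape automatic composition in the CoderRegistry produces.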
The binary format of a Coder is identified by getEncodingId(); be sure to
understand the requirements for evolving coder formats.
All methods of a Coder are required to be thread safe.
| Modifier and Type | Interface and Description |
|---|---|
| static class | Coder.Context: The context in which encoding or decoding is being done. |
| static class | Coder.NonDeterministicException: Exception thrown by verifyDeterministic() if the encoding is not deterministic, including details of why the encoding is not deterministic. |
| Modifier and Type | Method and Description |
|---|---|
| org.apache.beam.sdk.util.CloudObject | asCloudObject(): Returns the CloudObject that represents this Coder. |
| boolean | consistentWithEquals(): Returns true if this Coder is injective with respect to Object.equals(java.lang.Object). |
| T | decode(InputStream inStream, Coder.Context context): Decodes a value of type T from the given input stream in the given context. |
| void | encode(T value, OutputStream outStream, Coder.Context context): Encodes the given value of type T onto the given output stream in the given context. |
| Collection&lt;String&gt; | getAllowedEncodings(): A collection of encodings supported by decode(java.io.InputStream, org.apache.beam.sdk.coders.Coder.Context) in addition to the encoding from getEncodingId() (which is assumed supported). |
| List&lt;? extends Coder&lt;?&gt;&gt; | getCoderArguments(): If this is a Coder for a parameterized type, returns the list of Coders being used for each of the parameters, or returns null if this cannot be done or this is not a parameterized type. |
| String | getEncodingId(): An identifier for the binary format written by encode(T, java.io.OutputStream, org.apache.beam.sdk.coders.Coder.Context). |
| boolean | isRegisterByteSizeObserverCheap(T value, Coder.Context context): Returns whether registerByteSizeObserver(T, org.apache.beam.sdk.util.common.ElementByteSizeObserver, org.apache.beam.sdk.coders.Coder.Context) is cheap enough to call for every element, that is, if this Coder can calculate the byte size of the element to be coded in roughly constant time (or lazily). |
| void | registerByteSizeObserver(T value, org.apache.beam.sdk.util.common.ElementByteSizeObserver observer, Coder.Context context): Notifies the ElementByteSizeObserver about the byte size of the encoded value using this Coder. |
| Object | structuralValue(T value): Returns an object with an Object.equals() method that represents structural equality on the argument. |
| void | verifyDeterministic(): Throws Coder.NonDeterministicException if the coding is not deterministic. |
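Before the detailed method contracts, here is a rough illustration of the encode/decode round-trip those contracts describe. This is a self-contained sketch that omits Coder.Context and the rest of the SDK machinery; StringSketchCoder is a hypothetical name, not an SDK class:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class StringSketchCoder {

    // Encodes a String as a length-prefixed UTF-8 byte sequence.
    static byte[] encode(String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            out.writeInt(utf8.length); // length prefix
            out.write(utf8);           // payload
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Decodes the same format; a decoder must consume exactly the
    // bytes that encode() wrote, or adjacent values would be corrupted.
    static String decode(byte[] encoded) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            return new String(utf8, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```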
void encode(T value, OutputStream outStream, Coder.Context context) throws CoderException, IOException

Encodes the given value of type T onto the given output stream in the given context.

Throws:
IOException - if writing to the OutputStream fails for some reason
CoderException - if the value could not be encoded for some reason

T decode(InputStream inStream, Coder.Context context) throws CoderException, IOException

Decodes a value of type T from the given input stream in the given context. Returns the decoded value.

Throws:
IOException - if reading from the InputStream fails for some reason
CoderException - if the value could not be decoded for some reason

List&lt;? extends Coder&lt;?&gt;&gt; getCoderArguments()
If this is a Coder for a parameterized type, returns the list of Coders being used for each of the parameters, or returns null if this cannot be done or this is not a parameterized type.

org.apache.beam.sdk.util.CloudObject asCloudObject()

Returns the CloudObject that represents this Coder.

void verifyDeterministic() throws Coder.NonDeterministicException
Throw Coder.NonDeterministicException if the coding is not deterministic.

In order for a Coder to be considered deterministic, the following must be true:
- two values that compare as equal (via Object.equals() or Comparable.compareTo(), if supported) have the same encoding.
- the Coder always produces a canonical encoding, which is the same for an instance of an object even if produced on different computers at different times.

Throws:
Coder.NonDeterministicException - if this coder is not deterministic.

boolean consistentWithEquals()
Returns true if this Coder is injective with respect to Object.equals(java.lang.Object).
Whenever the encoded bytes of two values are equal, then the original values are equal
according to Objects.equals(). Note that this is well-defined for null.
This condition is most notably false for arrays. More generally, this condition is false
whenever equals() compares object identity, rather than performing a
semantic/structural comparison.
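The array caveat can be demonstrated directly: two distinct byte[] instances with identical contents (which a byte-array coder would encode identically) are still unequal under equals(), because arrays compare by object identity. ArrayEqualityDemo is just an illustrative name:

```java
import java.util.Arrays;

class ArrayEqualityDemo {

    // Arrays inherit equals() from Object, so this compares identity.
    static boolean equalsByIdentity(byte[] a, byte[] b) {
        return a.equals(b);
    }

    // Structural comparison of contents, which is what a structuralValue()
    // for a byte-array coder would need to provide instead.
    static boolean equalsByContents(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }
}
```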
Object structuralValue(T value) throws Exception

Returns an object with an Object.equals() method that represents structural equality on the argument.

For any two values x and y of type T, if their encoded bytes are the same, then it must be the case that structuralValue(x).equals(structuralValue(y)).

Most notably: the structural value for null should be a proper object with an equals() method, even if the input value is null.

See also consistentWithEquals().

Throws:
Exception

boolean isRegisterByteSizeObserverCheap(T value, Coder.Context context)
Returns whether registerByteSizeObserver(T, org.apache.beam.sdk.util.common.ElementByteSizeObserver, org.apache.beam.sdk.coders.Coder.Context) is cheap enough to call for every element, that is, if this Coder can calculate the byte size of the element to be coded in roughly constant time (or lazily).

Not intended to be called by user code, but instead by PipelineRunner implementations.
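A sketch of the cheap-versus-expensive distinction (hypothetical helper methods, not SDK code): a fixed-width encoding knows its size in constant time, while a coder for variable-width elements generally has to walk the whole collection to sum encoded sizes:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

class ByteSizeDemo {

    // Fixed-width encoding: the size is a constant, independent of the value,
    // so such a coder could return true from isRegisterByteSizeObserverCheap.
    static long fixedLongSize(long value) {
        return Long.BYTES; // always 8
    }

    // Variable-width encoding (here: length-prefixed UTF-8 strings): the size
    // depends on every element, so computing it requires a full traversal and
    // a list coder like this would return false from isRegisterByteSizeObserverCheap.
    static long stringListSize(List<String> values) {
        long total = Integer.BYTES; // element-count prefix
        for (String s : values) {
            total += Integer.BYTES; // per-element length prefix
            total += s.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }
}
```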
void registerByteSizeObserver(T value, org.apache.beam.sdk.util.common.ElementByteSizeObserver observer, Coder.Context context) throws Exception
Notifies the ElementByteSizeObserver about the byte size of the encoded value using this Coder.

Not intended to be called by user code, but instead by PipelineRunner implementations.

Throws:
Exception

@Experimental(value=CODER_ENCODING_ID) String getEncodingId()

An identifier for the binary format written by encode(T, java.io.OutputStream, org.apache.beam.sdk.coders.Coder.Context).
This value, along with the fully qualified class name, forms an identifier for the binary format of this coder. Whenever this value changes, the new encoding is considered incompatible with the prior format: It is presumed that the prior version of the coder will be unable to correctly read the new format and the new version of the coder will be unable to correctly read the old format.
If the format is changed in a backwards-compatible way (the Coder can still accept data from
the prior format), such as by adding optional fields to a Protocol Buffer or Avro definition,
and you want Dataflow to understand that the new coder is compatible with the prior coder,
this value must remain unchanged. It is then the responsibility of decode(java.io.InputStream, org.apache.beam.sdk.coders.Coder.Context) to correctly
read data from the prior format.
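A backward-compatible change of the kind described above might look like this sketch. The v1/v2 formats are hypothetical, and the trailing-byte detection stands in for whatever versioning a real coder uses: version 2 appends an optional flag byte, and decode tolerates its absence so data written by the version 1 coder still reads correctly under the same encoding id.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class EvolvingFormatDemo {

    // v1 format: a 4-byte big-endian int.
    static byte[] encodeV1(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new DataOutputStream(bytes).writeInt(value);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // v2 format: the v1 payload plus an optional trailing flag byte.
    static byte[] encodeV2(int value, boolean flag) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(value);
            out.writeBoolean(flag);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Decodes both formats: reads the int, then treats a missing
    // trailing byte (v1 data) the same as an absent flag.
    static int decodeEither(byte[] encoded) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
            int value = in.readInt();
            // Remaining bytes mean a v2 writer included the optional flag.
            if (in.available() > 0) {
                in.readBoolean();
            }
            return value;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because the v2 decoder still accepts v1 data, the encoding id would stay unchanged under this scheme.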
@Experimental(value=CODER_ENCODING_ID) Collection&lt;String&gt; getAllowedEncodings()

A collection of encodings supported by decode(java.io.InputStream, org.apache.beam.sdk.coders.Coder.Context) in addition to the encoding from getEncodingId() (which is assumed supported).

This information is not currently used for any purpose. It is descriptive only, and this method is subject to change.

See also: getEncodingId()