T - the type of elements handled by this coder

public class AvroCoder<T> extends StandardCoder<T>
A Coder using Avro binary format.
Each instance of AvroCoder<T> encapsulates an Avro schema for objects of type
T.
The Avro schema may be provided explicitly via of(Class, Schema) or
omitted via of(Class), in which case it will be inferred
using Avro's ReflectData.
For complete details about schema generation and how it can be controlled please see
the org.apache.avro.reflect package.
Only concrete classes with a no-argument constructor can be mapped to Avro records.
All inherited fields that are not static or transient are included. Fields are not permitted to
be null unless annotated by Nullable or a Union schema
containing "null".
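The field-selection rule above can be illustrated with plain Java reflection. This is a minimal sketch under stated assumptions, not Avro's or Beam's actual implementation: `RecordFieldSketch`, `Base`, `isMappable`, and `includedFields` are hypothetical names introduced here, and the nested `MyCustomElement` is a stand-in for a user-defined element class. It only mirrors the stated criteria (concrete class, no-argument constructor, inherited fields that are neither static nor transient).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that mirrors the documented rules; not part of Beam or Avro. */
final class RecordFieldSketch {

  /** Example base class: the instance field 'id' is inherited by subclasses. */
  static class Base {
    long id;
    static int instances; // static, so excluded
  }

  /** Example element class with a no-argument constructor. */
  static class MyCustomElement extends Base {
    String name;
    transient String cache; // transient, so excluded

    MyCustomElement() {}
  }

  /** Returns true if the class is concrete and declares a no-argument constructor. */
  static boolean isMappable(Class<?> type) {
    if (type.isInterface() || Modifier.isAbstract(type.getModifiers())) {
      return false;
    }
    try {
      type.getDeclaredConstructor();
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  /** Collects declared and inherited fields that are neither static nor transient. */
  static List<String> includedFields(Class<?> type) {
    List<String> names = new ArrayList<>();
    for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
      for (Field f : c.getDeclaredFields()) {
        int mods = f.getModifiers();
        if (!Modifier.isStatic(mods) && !Modifier.isTransient(mods) && !f.isSynthetic()) {
          names.add(f.getName());
        }
      }
    }
    return names;
  }
}
```

For the example class, `includedFields` yields `name` and the inherited `id` (in an unspecified order), while `cache` and `instances` are skipped. Nullability is a separate concern that this sketch does not model.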
To use, specify the Coder type on a PCollection:
PCollection<MyCustomElement> records =
    input.apply(...)
         .setCoder(AvroCoder.of(MyCustomElement.class));
or annotate the element class using @DefaultCoder.
@DefaultCoder(AvroCoder.class)
public class MyCustomElement {
...
}
The implementation attempts to determine if the Avro encoding of the given type will satisfy
the criteria of Coder.verifyDeterministic() by inspecting both the type and the
Schema provided or generated by Avro. Only coders that are deterministic can be used in
GroupByKey operations.
Coder.Context, Coder.NonDeterministicException

| Modifier and Type | Field and Description |
|---|---|
| static CoderProvider | PROVIDER |

| Modifier | Constructor and Description |
|---|---|
| protected | AvroCoder(Class<T> type, Schema schema) |

| Modifier and Type | Method | Description |
|---|---|---|
| org.apache.beam.sdk.util.CloudObject | asCloudObject() | Returns the CloudObject that represents this Coder. |
| DatumReader<T> | createDatumReader() | Deprecated. For AvroCoder internal use only. |
| DatumWriter<T> | createDatumWriter() | Deprecated. For AvroCoder internal use only. |
| T | decode(InputStream inStream, Coder.Context context) | Decodes a value of type T from the given input stream in the given context. |
| void | encode(T value, OutputStream outStream, Coder.Context context) | Encodes the given value of type T onto the given output stream in the given context. |
| List<? extends Coder<?>> | getCoderArguments() | If this is a Coder for a parameterized type, returns the list of Coders being used for each of the parameters, or returns null if this cannot be done or this is not a parameterized type. |
| String | getEncodingId() | The encoding identifier is designed to support evolution as per the design of Avro. In order to use this class effectively, carefully read the Avro documentation at Schema Resolution to ensure that the old and new schema match. |
| Schema | getSchema() | Returns the schema used by this coder. |
| Class<T> | getType() | Returns the type this coder encodes/decodes. |
| static <T> AvroCoder<T> | of(Class<T> clazz) | Returns an AvroCoder instance for the provided element class. |
| static <T> AvroCoder<T> | of(Class<T> type, Schema schema) | Returns an AvroCoder instance for the provided element type using the provided Avro schema. |
| static AvroCoder<GenericRecord> | of(Schema schema) | Returns an AvroCoder instance for the Avro schema. |
| static AvroCoder<?> | of(String classType, String schema) | |
| static <T> AvroCoder<T> | of(TypeDescriptor<T> type) | Returns an AvroCoder instance for the provided element type. |
| void | verifyDeterministic() | Throw Coder.NonDeterministicException if the coding is not deterministic. |

consistentWithEquals, equals, getAllowedEncodings, getComponents, getEncodedElementByteSize, hashCode, isRegisterByteSizeObserverCheap, registerByteSizeObserver, structuralValue, toString, verifyDeterministic, verifyDeterministic

public static final CoderProvider PROVIDER

public static <T> AvroCoder<T> of(TypeDescriptor<T> type)
Returns an AvroCoder instance for the provided element type.
T - the element type

public static <T> AvroCoder<T> of(Class<T> clazz)
Returns an AvroCoder instance for the provided element class.
T - the element type

public static AvroCoder<GenericRecord> of(Schema schema)
Returns an AvroCoder instance for the Avro schema. The implicit type is GenericRecord.

public static <T> AvroCoder<T> of(Class<T> type, Schema schema)
Returns an AvroCoder instance for the provided element type using the provided Avro schema.
If the type argument is GenericRecord, the schema may be arbitrary. Otherwise, the schema must correspond to the type provided.
T - the element type

public static AvroCoder<?> of(String classType, String schema) throws ClassNotFoundException
ClassNotFoundException

public String getEncodingId()
The encoding identifier is designed to support evolution as per the design of Avro. In order to use this class effectively, carefully read the Avro documentation at Schema Resolution to ensure that the old and new schema match.
In particular, this encoding identifier is guaranteed to be the same for AvroCoder
instances of the same principal class, and otherwise distinct. The schema is not included
in the identifier.
When modifying a class to be encoded as Avro, here are some guidelines; see the above link for greater detail.
- Never remove a required field.
- Only add optional fields, with sensible defaults.
Code consuming this message class should be prepared to support all versions of the class until it is certain that no remaining serialized instances exist.
If backwards incompatible changes must be made, the best recourse is to change the name of your class.
getEncodingId in interface Coder<T>
getEncodingId in class StandardCoder<T>

public void encode(T value, OutputStream outStream, Coder.Context context) throws IOException
Encodes the given value of type T onto the given output stream in the given context.
IOException - if writing to the OutputStream fails for some reason
CoderException - if the value could not be encoded for some reason

public T decode(InputStream inStream, Coder.Context context) throws IOException
Decodes a value of type T from the given input stream in the given context. Returns the decoded value.
IOException - if reading from the InputStream fails for some reason
CoderException - if the value could not be decoded for some reason

public List<? extends Coder<?>> getCoderArguments()
If this is a Coder for a parameterized type, returns the list of Coders being used for each of the parameters, or returns null if this cannot be done or this is not a parameterized type.

public org.apache.beam.sdk.util.CloudObject asCloudObject()
Returns the CloudObject that represents this Coder.
asCloudObject in interface Coder<T>
asCloudObject in class StandardCoder<T>

public void verifyDeterministic() throws Coder.NonDeterministicException
Throw Coder.NonDeterministicException if the coding is not deterministic.
In order for a Coder to be considered deterministic, the following must be true:
- Two values that compare as equal (via Object.equals() or Comparable.compareTo(), if supported) have the same encoding.
- The Coder always produces a canonical encoding, which is the same for an instance of an object even if produced on different computers at different times.
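
The two criteria above can be made concrete with a small, library-free sketch. It does not use AvroCoder or Avro at all: `DeterminismSketch` and both of its methods are hypothetical names introduced for illustration, and the byte layout is an arbitrary one chosen for the example. It shows how two values that are equal under `equals()` can still encode to different bytes when the encoder follows iteration order, and how sorting first gives a canonical encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of deterministic vs. non-deterministic encodings. */
final class DeterminismSketch {

  /** Writes entries in the map's own iteration order, which equal maps need not share. */
  static byte[] encodeInIterationOrder(Map<String, Integer> map) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(map.size());
      for (Map.Entry<String, Integer> entry : map.entrySet()) {
        out.writeUTF(entry.getKey());
        out.writeInt(entry.getValue());
      }
    } catch (IOException e) {
      // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Copies into a TreeMap first, so equal maps always produce identical bytes. */
  static byte[] encodeCanonically(Map<String, Integer> map) {
    return encodeInIterationOrder(new TreeMap<>(map));
  }
}
```

Two `LinkedHashMap` instances holding the same entries in different insertion order are equal, yet `encodeInIterationOrder` produces different byte arrays for them, which violates the first criterion. `encodeCanonically` produces the same bytes for both, which is the property a grouping operation such as GroupByKey relies on when it compares encoded keys.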
NonDeterministicException - when the type may not be deterministically encoded using the given Schema, the directBinaryEncoder, and the ReflectDatumWriter or GenericDatumWriter.
Coder.NonDeterministicException - if this coder is not deterministic.

@Deprecated public DatumReader<T> createDatumReader()
Deprecated. For AvroCoder internal use only.
Returns a DatumReader that can be used to read from an Avro file directly. Assumes the schema used to read is the same as the schema that was used when writing.

@Deprecated public DatumWriter<T> createDatumWriter()
Deprecated. For AvroCoder internal use only.
Returns a DatumWriter that can be used to write to an Avro file directly.

public Schema getSchema()