Class DynamoDBIO
- java.lang.Object
  - org.apache.beam.sdk.io.aws2.dynamodb.DynamoDBIO
@Experimental(SOURCE_SINK)
public final class DynamoDBIO
extends java.lang.Object
IO to read from and write to DynamoDB tables.
Reading from DynamoDB
Example usage:
    PCollection<List<Map<String, AttributeValue>>> output =
        pipeline.apply(
            DynamoDBIO.<List<Map<String, AttributeValue>>>read()
                .withScanRequestFn(in -> ScanRequest.builder().tableName(tableName).totalSegments(1).build())
                .items()); // ScanResponse items mapper
At a minimum you have to provide:
- a scanRequestFn providing the ScanRequest instance; table name and total segments are required. Note: Choose total segments according to the number of workers used.
- a scanResponseMapperFn to map the ScanResponse to the expected output type, such as DynamoDBIO.Read.items().
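The note on total segments can be made concrete with a small sketch. In a DynamoDB parallel scan, each request addresses one segment index in the range 0 to totalSegments - 1, so totalSegments caps how many scans can run concurrently. The class and method names below (ParallelScanSketch, SegmentScan, split) are hypothetical stand-ins for illustration only; they are not part of Beam or the AWS SDK, and the IO's actual splitting logic may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class ParallelScanSketch {
  /** Hypothetical stand-in for one segment's scan request (not an AWS SDK type). */
  static final class SegmentScan {
    final String tableName;
    final int segment;
    final int totalSegments;

    SegmentScan(String tableName, int segment, int totalSegments) {
      this.tableName = tableName;
      this.segment = segment;
      this.totalSegments = totalSegments;
    }
  }

  /** Produces one scan descriptor per segment index, from 0 to totalSegments - 1. */
  static List<SegmentScan> split(String tableName, int totalSegments) {
    if (totalSegments < 1) {
      throw new IllegalArgumentException("totalSegments must be >= 1");
    }
    List<SegmentScan> scans = new ArrayList<>();
    for (int segment = 0; segment < totalSegments; segment++) {
      scans.add(new SegmentScan(tableName, segment, totalSegments));
    }
    return scans;
  }

  public static void main(String[] args) {
    // With 4 segments, at most 4 workers can scan disjoint parts of the table.
    for (SegmentScan scan : split("my-table", 4)) {
      System.out.println(scan.tableName + ": segment " + scan.segment + " of " + scan.totalSegments);
    }
  }
}
```

With totalSegments(1), as in the example usage, there is only one unit of work, so additional workers would have nothing to scan.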
Writing to DynamoDB
Example usage:
    PCollection<T> data = ...;
    SerializableFunction<T, WriteRequest> requestBuilder = ...;
    data.apply(
        DynamoDBIO.<T>write()
            .withWriteRequestMapperFn(t -> KV.of(tableName, requestBuilder.apply(t))));
At a minimum you have to provide a writeRequestMapperFn to map each element into a KV of table name and WriteRequest.
Note: AWS does not allow writing duplicate keys within a single batch operation. If primary keys possibly repeat in your stream (i.e. an upsert stream), you may encounter a `ValidationError`. To address this you have to provide the key names corresponding to your primary key using DynamoDBIO.Write.withDeduplicateKeys(List). Based on these keys only the last observed element is kept. Nevertheless, if no deduplication keys are provided, identical elements are still deduplicated.
Configuration of AWS clients
AWS clients for all AWS IOs can be configured using AwsOptions, e.g. --awsRegion=us-west-1. AwsOptions contain reasonable defaults based on default providers for Region and AwsCredentialsProvider.
If you require more advanced configuration, you may change the ClientBuilderFactory using AwsOptions.setClientBuilderFactory(Class).
Configuration for a specific IO can be overwritten using withClientConfiguration(), which also lets you configure the retry behavior for the respective IO.
Retries
Retries for failed requests can be configured using ClientConfiguration.Builder.retry(Consumer) and are handled by the AWS SDK unless there's a partial success (batch requests). The SDK uses a backoff strategy with equal jitter for computing the delay before the next retry.
Note: Once retries are exhausted, the error is surfaced to the runner, which may then opt to retry the current partition in its entirety or abort if the runner's maximum number of retries is reached.
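As an illustration of what "equal jitter" means, the following is a minimal sketch of the general technique: the exponential delay ceiling is split in half, one half is always waited, and a random share of the other half is added. The class and method names (EqualJitterSketch, delayBeforeRetry) and the sample base and maximum delays are assumptions made for this example. This is not the AWS SDK's implementation, which may differ in details.

```java
import java.time.Duration;
import java.util.Random;

final class EqualJitterSketch {
  /**
   * Computes a delay in the range [ceiling / 2, ceiling), where
   * ceiling = min(maxDelay, baseDelay * 2^retriesAttempted).
   */
  static Duration delayBeforeRetry(
      Duration baseDelay, Duration maxDelay, int retriesAttempted, Random random) {
    // Cap the exponent so the shift stays well-defined for millisecond-scale base delays.
    int exponent = Math.min(Math.max(retriesAttempted, 0), 30);
    long ceiling = Math.min(maxDelay.toMillis(), baseDelay.toMillis() * (1L << exponent));
    long fixedHalf = ceiling / 2; // always waited
    long jitter = (long) (random.nextDouble() * (ceiling - fixedHalf)); // random part
    return Duration.ofMillis(fixedHalf + jitter);
  }

  public static void main(String[] args) {
    Random random = new Random();
    // Sample values only: 100 ms base delay, 20 s maximum delay.
    for (int attempt = 0; attempt < 5; attempt++) {
      Duration delay =
          delayBeforeRetry(Duration.ofMillis(100), Duration.ofSeconds(20), attempt, random);
      System.out.println("retry " + attempt + ": wait " + delay.toMillis() + " ms");
    }
  }
}
```

Compared with a fully random delay, keeping a fixed half guarantees a minimum wait between attempts while still spreading concurrent retries apart.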
-
-
Nested Class Summary
Nested Classes
- static class DynamoDBIO.Read<T>: Read data from DynamoDB using DynamoDBIO.Read.getScanRequestFn() and emit an element of type T for each ScanResponse using the mapping function DynamoDBIO.Read.getScanResponseMapperFn().
- static class DynamoDBIO.RetryConfiguration: Deprecated. Use RetryConfiguration instead to delegate retries to the AWS SDK.
- static class DynamoDBIO.Write<T>: Write the data of a PCollection<T> into DynamoDB.
-
Constructor Summary
Constructors
- DynamoDBIO()
-
Method Summary
- static <T> DynamoDBIO.Read<T> read()
- static <T> DynamoDBIO.Write<T> write()
-
-
-
Method Detail
-
read
public static <T> DynamoDBIO.Read<T> read()
-
write
public static <T> DynamoDBIO.Write<T> write()
-
-