package s3upload
Type Members
- class ExecutorServiceResultsHandler[V] extends Iterable[V]
Wrapper around an ExecutorService that allows you to easily submit Callables, get results via iteration, and handle failure quickly.

When a submitted callable throws an exception in its thread, this will result in a RuntimeException when iterating over results. Typical usage is as follows:

1. Create an ExecutorService and pass it to the constructor.
2. Create Callables and ensure that they respond to interruption, e.g. by regularly calling:

       if (Thread.currentThread().isInterrupted()) {
           throw new RuntimeException("The thread was interrupted, likely indicating failure in a sibling thread.");
       }

3. Pass the callables to the submit() method.
4. Call finishedSubmitting().
5. Iterate over this object (e.g. with a foreach loop) to get results from the callables. Each iteration will block waiting for the next result.

If one of the callables throws an unhandled exception or the thread is interrupted during iteration, then ExecutorService#shutdownNow() will be called, resulting in all still-running callables being interrupted, and a RuntimeException will be thrown. You can also call abort() to shut down the threads yourself.
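The fail-fast pattern described above can be sketched with the JDK's own ExecutorCompletionService: submit callables, take results in completion order, and shut everything down on the first failure. The class and method names below are illustrative stand-ins, not the library's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch of fail-fast result iteration over an ExecutorService;
// not the library's ExecutorServiceResultsHandler itself.
class FailFastResults {
    static List<Integer> collectSquares(List<Integer> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletionService<Integer> completion = new ExecutorCompletionService<>(pool);
        for (int n : inputs) {
            final int value = n;
            completion.submit(() -> value * value);   // analogous to submit(callable)
        }
        pool.shutdown();                              // analogous to finishedSubmitting()
        List<Integer> results = new ArrayList<>();
        try {
            for (int i = 0; i < inputs.size(); i++) {
                results.add(completion.take().get()); // blocks for the next result
            }
        } catch (InterruptedException | ExecutionException e) {
            pool.shutdownNow();                       // interrupt siblings, like abort()
            throw new RuntimeException(e);
        }
        return results;
    }
}
```

As in the library, the first exception aborts the remaining work and surfaces as a RuntimeException to the iterating thread.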
- class MultiPartOutputStream extends OutputStream
An OutputStream which packages data written to it into discrete StreamParts which can be obtained in a separate thread via iteration and uploaded to S3.

A single MultiPartOutputStream is allocated a range of part numbers it can assign to the StreamParts it produces, which is determined at construction.

Creating a StreamPart is triggered when MultiPartOutputStream#checkSize() is called and the stream holds enough data, so be sure to call this regularly when writing data. It is also essential to call MultiPartOutputStream#close() when finished so that it can create the final StreamPart and consumers can finish.
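The part-packaging behaviour can be sketched as a plain OutputStream that buffers bytes and cuts a part whenever a checkSize-style call finds enough buffered data. The in-memory part list and the small threshold below are illustrative simplifications; the real class hands StreamParts to uploader threads and enforces S3's part-size rules.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of an OutputStream that packages writes into discrete
// parts; loosely modelled on MultiPartOutputStream, not the real class.
class ChunkingOutputStream extends OutputStream {
    private final int partSize;
    private ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    final List<byte[]> parts = new ArrayList<>();

    ChunkingOutputStream(int partSize) { this.partSize = partSize; }

    @Override public void write(int b) { buffer.write(b); }

    @Override public void write(byte[] b) {
        for (byte value : b) write(value);
    }

    // Like checkSize(): call regularly; cuts a part once enough data is buffered.
    void checkSize() {
        if (buffer.size() >= partSize) {
            parts.add(buffer.toByteArray());
            buffer = new ByteArrayOutputStream();
        }
    }

    // Like close(): flushes remaining data as the final (possibly small) part.
    @Override public void close() {
        if (buffer.size() > 0) {
            parts.add(buffer.toByteArray());
        }
    }
}
```

Forgetting either the regular checkSize-style call or the final close would leave data stuck in the buffer, which mirrors why both calls are described as essential above.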
- class StreamTransferManager extends AnyRef
Manages streaming of data to S3 without knowing the size beforehand and without keeping it all in memory or writing to disk.

The data is split into chunks and uploaded using the multipart upload API. The uploading is done on separate threads, the number of which is configured by the user.

After creating an instance with details of the upload, use StreamTransferManager#getMultiPartOutputStreams() to get a list of MultiPartOutputStreams. As you write data to these streams, call MultiPartOutputStream#checkSize() regularly. When you finish, call MultiPartOutputStream#close(). Parts will be uploaded to S3 as you write.

Once all streams have been closed, call StreamTransferManager#complete(). Alternatively you can call StreamTransferManager#abort() at any point if needed.

Here is an example. A lot of the code relates to setting up threads for creating data unrelated to the library. The essential parts are commented.
    AmazonS3Client client = new AmazonS3Client(awsCreds);
    int numStreams = 2;
    int numUploadThreads = 2;
    int queueCapacity = 2;
    int partSize = 5;

    // Setting up
    final StreamTransferManager manager = new StreamTransferManager(bucket, key, client, numStreams,
                                                                    numUploadThreads, queueCapacity, partSize);
    final List<MultiPartOutputStream> streams = manager.getMultiPartOutputStreams();

    ExecutorService pool = Executors.newFixedThreadPool(numStreams);
    for (int i = 0; i < numStreams; i++) {
        final int streamIndex = i;
        pool.submit(new Runnable() {
            public void run() {
                try {
                    MultiPartOutputStream outputStream = streams.get(streamIndex);
                    for (int lineNum = 0; lineNum < 1000000; lineNum++) {
                        String line = generateData(streamIndex, lineNum);

                        // Writing data and potentially sending off a part
                        outputStream.write(line.getBytes());
                        try {
                            outputStream.checkSize();
                        } catch (InterruptedException e) {
                            throw new RuntimeException(e);
                        }
                    }

                    // The stream must be closed once all the data has been written
                    outputStream.close();
                } catch (Exception e) {
                    // Aborts all uploads
                    manager.abort(e);
                }
            }
        });
    }
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);

    // Finishing off
    manager.complete();

The final file on S3 will then usually be the result of concatenating all the data written to each stream, in the order in which the streams appear in the list obtained from getMultiPartOutputStreams(). However this may not be true if multiple streams are used and some of them produce less than 5 MB of data. This is because the multipart upload API does not allow the uploading of more than one part smaller than 5 MB, which leads to fundamental limits on what this class can accomplish. If the order of the data is important to you, then either use only one stream or ensure that you write at least 5 MB to every stream.

While performing the multipart upload this class will create instances of InitiateMultipartUploadRequest, UploadPartRequest, and CompleteMultipartUploadRequest, fill in the essential details, and send them off. If you need to add additional details then override the appropriate customise*Request methods and set the required properties within.

This class does not perform retries when uploading. If an exception is thrown at any stage the upload will be aborted and the exception rethrown, wrapped in a RuntimeException.
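The customise*Request extension point is an instance of the template-method pattern: the base class fills in the essential request details and then calls an overridable hook. The sketch below uses a stand-in request type, since the real hooks receive AWS SDK request objects; all names here are illustrative, not the library's or the SDK's API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for an S3 request object (not the AWS SDK class).
class UploadRequest {
    final Map<String, String> properties = new HashMap<>();
}

class Uploader {
    // Fills in the essential details, then gives subclasses a chance to add more.
    final UploadRequest buildRequest() {
        UploadRequest request = new UploadRequest();
        request.properties.put("content-type", "application/octet-stream");
        customiseRequest(request);  // override point, like the customise*Request methods
        return request;
    }

    protected void customiseRequest(UploadRequest request) {
        // No extra details by default.
    }
}

class EncryptedUploader extends Uploader {
    @Override
    protected void customiseRequest(UploadRequest request) {
        request.properties.put("server-side-encryption", "AES256");
    }
}
```

A subclass only sets the extra properties it needs; the essential fields filled in by the base class are left intact.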