Packages

package s3upload

Type Members

  1. class ExecutorServiceResultsHandler[V] extends Iterable[V]

    Wrapper around an ExecutorService that allows you to easily submit Callables, get results via iteration, and handle failure quickly.

    When a submitted callable throws an exception in its thread this will result in a RuntimeException when iterating over results. Typical usage is as follows:

    • Create an ExecutorService and pass it to the constructor.
    • Create Callables and ensure that they respond to interruption, e.g. regularly call:
      
          if (Thread.currentThread().isInterrupted()) {
              throw new RuntimeException("The thread was interrupted, likely indicating failure in a sibling thread.");
          }
      
    • Pass the callables to the submit() method.
    • Call finishedSubmitting().
    • Iterate over this object (e.g. with a foreach loop) to get results from the callables. Each iteration will block waiting for the next result. If one of the callables throws an unhandled exception, or the thread is interrupted during iteration, then ExecutorService#shutdownNow() will be called, interrupting all still-running callables, and a RuntimeException will be thrown.

    You can also call abort() to shut down the threads yourself.
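    The submit/iterate/abort pattern described above can be sketched with only the JDK, using ExecutorCompletionService in place of this library's wrapper. This is a minimal illustrative analogue, not the library's implementation; the comments note which calls correspond to which methods of ExecutorServiceResultsHandler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionDemo {
    // Pure-JDK sketch of the pattern ExecutorServiceResultsHandler wraps:
    // submit Callables, then block on each result in turn, shutting
    // everything down if any task fails.
    public static List<Integer> runAll(List<Callable<Integer>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletionService<Integer> completion = new ExecutorCompletionService<>(pool);
        for (Callable<Integer> task : tasks) {
            completion.submit(task);                  // like submit()
        }
        pool.shutdown();                              // like finishedSubmitting()
        List<Integer> results = new ArrayList<>();
        try {
            for (int i = 0; i < tasks.size(); i++) {
                // take() blocks for the next completed task; get() rethrows
                // any exception the task threw, wrapped in ExecutionException.
                results.add(completion.take().get()); // like iterating over the handler
            }
        } catch (InterruptedException | ExecutionException e) {
            pool.shutdownNow();                       // like abort(): interrupt the rest
            throw new RuntimeException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            final int n = i;
            tasks.add(() -> n * n);
        }
        System.out.println(runAll(tasks));
    }
}
```

    Note that results arrive in completion order, not submission order, so tasks whose order matters should carry an index in their result.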

  2. class MultiPartOutputStream extends OutputStream

    An OutputStream which packages data written to it into discrete StreamParts which can be obtained in a separate thread via iteration and uploaded to S3.

    A single MultiPartOutputStream is allocated a range of part numbers it can assign to the StreamParts it produces, which is determined at construction.

    Creating a StreamPart is triggered when MultiPartOutputStream#checkSize() is called and the stream holds enough data, so be sure to call this regularly when writing data. It's also essential to call MultiPartOutputStream#close() when finished so that it can create the final StreamPart and consumers can finish.

  3. class StreamTransferManager extends AnyRef

    Manages streaming of data to S3 without knowing the size beforehand and without keeping it all in memory or writing to disk.

    The data is split into chunks and uploaded using the multipart upload API. The uploading is done on separate threads, the number of which is configured by the user.

    After creating an instance with details of the upload, use StreamTransferManager#getMultiPartOutputStreams() to get a list of MultiPartOutputStreams. As you write data to these streams, call MultiPartOutputStream#checkSize() regularly. When you finish, call MultiPartOutputStream#close(). Parts will be uploaded to S3 as you write.

    Once all streams have been closed, call StreamTransferManager#complete(). Alternatively you can call StreamTransferManager#abort() at any point if needed.

    Here is an example. Much of the code just sets up threads for generating sample data and is unrelated to the library; the essential parts are commented.

    
        AmazonS3Client client = new AmazonS3Client(awsCreds);
        int numStreams = 2;
        int numUploadThreads = 2;
        int queueCapacity = 2;
        int partSize = 5;
    
        // Setting up
        final StreamTransferManager manager = new StreamTransferManager(bucket, key, client, numStreams,
                                                                        numUploadThreads, queueCapacity, partSize);
        final List<MultiPartOutputStream> streams = manager.getMultiPartOutputStreams();
    
        ExecutorService pool = Executors.newFixedThreadPool(numStreams);
        for (int i = 0; i < numStreams; i++) {
            final int streamIndex = i;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        MultiPartOutputStream outputStream = streams.get(streamIndex);
                        for (int lineNum = 0; lineNum < 1000000; lineNum++) {
                            String line = generateData(streamIndex, lineNum);
    
                            // Writing data and potentially sending off a part
                            outputStream.write(line.getBytes());
                        try {
                            outputStream.checkSize();
                        } catch (InterruptedException e) {
                            throw new RuntimeException(e);
                        }
                        }
    
                        // The stream must be closed once all the data has been written
                        outputStream.close();
                    } catch (Exception e) {
    
                        // Aborts all uploads
                        manager.abort(e);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    
        // Finishing off
        manager.complete();
    

    The final file on S3 will then usually be the result of concatenating all the data written to each stream, in the order in which the streams appear in the list obtained from getMultiPartOutputStreams(). However, this may not hold if multiple streams are used and some of them produce less than 5 MB of data. This is because the multipart upload API allows at most one part smaller than 5 MB, which places fundamental limits on what this class can accomplish. If the order of the data matters to you, either use only one stream or ensure that you write at least 5 MB to every stream.

    While performing the multipart upload this class will create instances of InitiateMultipartUploadRequest, UploadPartRequest, and CompleteMultipartUploadRequest, fill in the essential details, and send them off. If you need to add additional details then override the appropriate customise*Request methods and set the required properties within.
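    As an illustration, server-side encryption could be requested by overriding the hook for the initiate request. This is a hedged sketch only: it assumes a public customiseInitiateRequest(InitiateMultipartUploadRequest) override point (following the customise*Request naming above), a seven-argument constructor matching the earlier example, and AWS SDK v1 request setters.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;

// Sketch: subclass StreamTransferManager and override one of the
// customise*Request hooks to add details before the request is sent.
public class EncryptedStreamTransferManager extends StreamTransferManager {
    public EncryptedStreamTransferManager(String bucket, String key, AmazonS3 client,
                                          int numStreams, int numUploadThreads,
                                          int queueCapacity, int partSize) {
        super(bucket, key, client, numStreams, numUploadThreads, queueCapacity, partSize);
    }

    @Override
    public void customiseInitiateRequest(InitiateMultipartUploadRequest request) {
        // Ask S3 to encrypt the object at rest (SSE-S3).
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION);
        request.setObjectMetadata(metadata);
    }
}
```

    The same approach applies to the UploadPartRequest and CompleteMultipartUploadRequest hooks.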

    This class does not perform retries when uploading. If an exception is thrown at any stage the upload will be aborted and the exception rethrown, wrapped in a RuntimeException.
