class StreamTransferManager extends AnyRef
Manages streaming of data to S3 without knowing the size beforehand and without keeping it all in memory or writing to disk.
The data is split into chunks and uploaded using the multipart upload API. The uploading is done on separate threads, the number of which is configured by the user.
After creating an instance with details of the upload, use StreamTransferManager#getMultiPartOutputStreams() to get a list of MultiPartOutputStreams. As you write data to these streams, call MultiPartOutputStream#checkSize() regularly. When you finish, call MultiPartOutputStream#close(). Parts will be uploaded to S3 as you write.
Once all streams have been closed, call StreamTransferManager#complete(). Alternatively, you can call StreamTransferManager#abort() at any point if needed.
Here is an example. A lot of the code relates to setting up threads for creating data unrelated to the library. The essential parts are commented.
    AmazonS3Client client = new AmazonS3Client(awsCreds);
    int numStreams = 2;
    int numUploadThreads = 2;
    int queueCapacity = 2;
    int partSize = 5;

    // Setting up
    // Note: the constructor listed under Instance Constructors also takes an ObjectMetadata argument (meta).
    final StreamTransferManager manager = new StreamTransferManager(bucket, key, client, numStreams,
                                                                    numUploadThreads, queueCapacity, partSize);
    final List<MultiPartOutputStream> streams = manager.getMultiPartOutputStreams();

    // One producer thread per stream; this part is unrelated to the library
    ExecutorService pool = Executors.newFixedThreadPool(numStreams);
    for (int i = 0; i < numStreams; i++) {
        final int streamIndex = i;
        pool.submit(new Runnable() {
            public void run() {
                try {
                    MultiPartOutputStream outputStream = streams.get(streamIndex);
                    for (int lineNum = 0; lineNum < 1000000; lineNum++) {
                        String line = generateData(streamIndex, lineNum);

                        // Writing data and potentially sending off a part
                        outputStream.write(line.getBytes());
                        try {
                            outputStream.checkSize();
                        } catch (InterruptedException e) {
                            throw new RuntimeException(e);
                        }
                    }

                    // The stream must be closed once all the data has been written
                    outputStream.close();
                } catch (Exception e) {
                    // Aborts all uploads
                    manager.abort(e);
                }
            }
        });
    }
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);

    // Finishing off
    manager.complete();
The final file on S3 will then usually be the result of concatenating all the data written to each stream, in the order in which the streams appear in the list obtained from getMultiPartOutputStreams(). However, this may not be true if multiple streams are used and some of them produce less than 5 MB of data. This is because the multipart upload API does not allow more than one uploaded part to be smaller than 5 MB, which places fundamental limits on what this class can accomplish. If the order of data is important to you, then either use only one stream or ensure that you write at least 5 MB to every stream.
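One way to check the "at least 5 MB per stream" condition is to count the bytes written to each stream. The following is a minimal sketch and not part of the library: ByteCountingStream and ByteCountingDemo are hypothetical names, an in-memory ByteArrayOutputStream stands in for a MultiPartOutputStream, and 5 MB is assumed to mean 5 × 1024 × 1024 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper (not part of the library): counts the bytes written through it. */
class ByteCountingStream extends FilterOutputStream {
    /** Assumption: "5 MB" means 5 * 1024 * 1024 bytes. */
    static final long MIN_PART_BYTES = 5L * 1024 * 1024;

    private long count = 0;

    ByteCountingStream(OutputStream target) {
        super(target);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Forward the whole range at once instead of byte by byte.
        out.write(buf, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }

    boolean reachedMinimum() {
        return count >= MIN_PART_BYTES;
    }
}

class ByteCountingDemo {
    /** Writes totalBytes to an in-memory stand-in and reports whether the minimum was reached. */
    static boolean writesEnough(int totalBytes) {
        try (ByteCountingStream counting = new ByteCountingStream(new ByteArrayOutputStream())) {
            byte[] chunk = new byte[64 * 1024];
            int remaining = totalBytes;
            while (remaining > 0) {
                int n = Math.min(chunk.length, remaining);
                counting.write(chunk, 0, n);
                remaining -= n;
            }
            return counting.reachedMinimum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writesEnough(6 * 1024 * 1024));
        System.out.println(writesEnough(1024));
    }
}
```

If a wrapper like this were placed around a real stream, you would still need a reference to the underlying MultiPartOutputStream in order to call checkSize().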
While performing the multipart upload this class will create instances of InitiateMultipartUploadRequest,
UploadPartRequest, and CompleteMultipartUploadRequest, fill in the essential details, and send them
off. If you need to add additional details then override the appropriate customise*Request methods and
set the required properties within.
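The override mechanism described above follows the usual hook pattern: the class builds a request, passes it to a customise method, and then sends it. The sketch below illustrates only that pattern with stand-in types. StandInUploadPartRequest, StandInManager, and the "exampleProperty" key are hypothetical and are not the real SDK or library classes; the no-op default for the hook is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for an SDK request object; NOT the real UploadPartRequest. */
class StandInUploadPartRequest {
    final Map<String, String> properties = new HashMap<>();
}

/** Stand-in for the manager: fills in the essentials, then calls the hook. */
class StandInManager {
    /** Hook for subclasses; assumed here to do nothing by default. */
    protected void customiseUploadPartRequest(StandInUploadPartRequest request) {
    }

    StandInUploadPartRequest buildPartRequest(int partNumber) {
        StandInUploadPartRequest request = new StandInUploadPartRequest();
        request.properties.put("partNumber", Integer.toString(partNumber));
        // The subclass gets a chance to add details before the request is sent.
        customiseUploadPartRequest(request);
        return request;
    }
}

class CustomisedManagerDemo {
    static StandInUploadPartRequest build() {
        StandInManager manager = new StandInManager() {
            @Override
            protected void customiseUploadPartRequest(StandInUploadPartRequest request) {
                // Hypothetical extra detail set by the caller.
                request.properties.put("exampleProperty", "true");
            }
        };
        return manager.buildPartRequest(1);
    }

    public static void main(String[] args) {
        System.out.println(build().properties);
    }
}
```

With the real class, the same shape would apply: subclass StreamTransferManager, override the relevant customise*Request method listed under Value Members, and set properties on the request it receives.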
This class does not perform retries when uploading. If an exception is thrown at any stage the upload will be aborted and the
exception rethrown, wrapped in a RuntimeException.
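Because the class does not retry, any retrying has to happen on the caller's side, around the whole upload. The following is a minimal sketch of such a wrapper. UploadRetry and retryWholeUpload are hypothetical names, and the sketch assumes that each attempt creates a fresh StreamTransferManager and is able to regenerate its data from the start, since a failed upload has already been aborted.

```java
import java.io.IOException;
import java.util.function.Supplier;

/** Hypothetical caller-side helper; not part of the library. */
class UploadRetry {
    /**
     * Runs the attempt up to maxAttempts times and returns its first successful result.
     * If every attempt throws a RuntimeException, the last one is rethrown.
     */
    static <T> T retryWholeUpload(Supplier<T> attempt, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                // The original failure, if wrapped, is available via e.getCause().
                lastFailure = e;
            }
        }
        throw lastFailure;
    }
}

class UploadRetryDemo {
    /** Simulates an upload that fails twice and then succeeds; returns the number of attempts made. */
    static int attemptsUntilSuccess() {
        int[] calls = {0};
        UploadRetry.retryWholeUpload(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new RuntimeException(new IOException("simulated upload failure"));
            }
            return "uploaded";
        }, 5);
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println(attemptsUntilSuccess()); // prints 3
    }
}
```

In real use, the Supplier body would contain the full sequence from the example above: construct the manager, write and close every stream, and call complete().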
Instance Constructors
- new StreamTransferManager(bucketName: String, putKey: String, s3Client: AmazonS3, meta: ObjectMetadata, numStreams: Int, numUploadThreads: Int, queueCapacity: Int, partSize: Int)
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def abort(): Unit
  Aborts the upload. Repeated calls have no effect.
- def abort(throwable: Throwable): Unit
  Aborts the upload and logs a message including the stack trace of the given throwable.
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[java.lang]
  - Definition Classes: AnyRef
  - Annotations: @native() @throws( ... )
- def complete(): Unit
  Blocks while waiting for the threads uploading the contents of the streams returned by StreamTransferManager#getMultiPartOutputStreams() to finish, then sends a request to S3 to complete the upload. For the former to complete, it's essential that every stream is closed, otherwise the upload threads will block forever waiting for more data.
- def customiseCompleteRequest(request: CompleteMultipartUploadRequest): Unit
- def customiseInitiateRequest(request: InitiateMultipartUploadRequest): Unit
- def customiseUploadPartRequest(request: UploadPartRequest): Unit
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def finalize(): Unit
  - Attributes: protected[java.lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def getMultiPartOutputStreams(): List[MultiPartOutputStream]
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: StreamTransferManager → AnyRef → Any
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @native() @throws( ... )