Package de.jungblut.datastructure
Class SortedFile<M extends org.apache.hadoop.io.WritableComparable>
- java.lang.Object
-
- de.jungblut.datastructure.SortedFile<M>
-
- Type Parameters:
M - the message type extending WritableComparable.
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, org.apache.hadoop.util.IndexedSortable

public final class SortedFile<M extends org.apache.hadoop.io.WritableComparable> extends java.lang.Object implements org.apache.hadoop.util.IndexedSortable, java.io.Closeable

A file that serializes WritableComparables to a buffer. Once the buffer hits a threshold, it is sorted in memory. After the file is closed, all sorted segments are merged into a single file. The file can then be read back using the provided WritableComparable, in the order that class defines.
- Author:
- thomasjungblut
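
The buffer, sort, spill, and merge cycle described above can be sketched with the standard library alone. This is not the SortedFile implementation: the class name SpillMergeSketch, the record-count threshold, the segment file names, and the use of plain int records instead of WritableComparables are all assumptions made to keep the example self-contained.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/** Illustrative sketch only; hypothetical class, not part of de.jungblut.datastructure. */
final class SpillMergeSketch implements Closeable {
    private final Path dir;
    private final Path finalFile;
    private final int maxBuffered;                    // assumed threshold, counted in records
    private final List<Integer> buffer = new ArrayList<>();
    private final List<Path> segments = new ArrayList<>();
    private int total = 0;

    SpillMergeSketch(Path dir, String finalFileName, int maxBuffered) throws IOException {
        this.dir = Files.createDirectories(dir);      // created if it does not exist
        this.finalFile = this.dir.resolve(finalFileName);
        this.maxBuffered = maxBuffered;
    }

    /** Buffers a value; sorts and spills synchronously once the threshold is reached. */
    void collect(int value) throws IOException {
        buffer.add(value);
        total++;
        if (buffer.size() >= maxBuffered) spill();
    }

    /** Sorts the in-memory buffer and writes it out as one sorted segment. */
    private void spill() throws IOException {
        if (buffer.isEmpty()) return;
        Collections.sort(buffer);
        Path seg = dir.resolve("segment-" + segments.size() + ".bin");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(seg)))) {
            out.writeInt(buffer.size());
            for (int v : buffer) out.writeInt(v);
        }
        segments.add(seg);
        buffer.clear();
    }

    /** Spills what is left, then k-way merges all segments into the final file. */
    @Override
    public void close() throws IOException {
        spill();
        List<DataInputStream> ins = new ArrayList<>();
        int[] remaining = new int[segments.size()];
        // Heap entries are {value, segmentIndex}, ordered by value.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(finalFile)))) {
            out.writeInt(total);                      // count prefix, then the sorted records
            for (int i = 0; i < segments.size(); i++) {
                DataInputStream in = new DataInputStream(
                        new BufferedInputStream(Files.newInputStream(segments.get(i))));
                ins.add(in);
                remaining[i] = in.readInt();
                if (remaining[i]-- > 0) heap.add(new int[] {in.readInt(), i});
            }
            while (!heap.isEmpty()) {
                int[] top = heap.poll();
                out.writeInt(top[0]);
                int i = top[1];
                if (remaining[i]-- > 0) heap.add(new int[] {ins.get(i).readInt(), i});
            }
        } finally {
            for (DataInputStream in : ins) in.close();
            for (Path p : segments) Files.deleteIfExists(p);
        }
    }

    /** Reads the merged file back: a 4-byte count followed by that many ints. */
    static List<Integer> readAll(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            List<Integer> values = new ArrayList<>(count);
            for (int i = 0; i < count; i++) values.add(in.readInt());
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("spill-sketch");
        try (SpillMergeSketch sketch = new SpillMergeSketch(tmp, "sorted.bin", 3)) {
            for (int v : new int[] {5, 1, 4, 2, 9, 7, 3}) sketch.collect(v);
        }
        System.out.println(readAll(tmp.resolve("sorted.bin"))); // prints [1, 2, 3, 4, 5, 7, 9]
    }
}
```

With a threshold of 3, the seven values produce three sorted segments, which the merge in close() combines into one ordered file.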
-
-
Constructor Summary
Constructors
Constructor - Description
SortedFile(java.lang.String dir, java.lang.String finalFileName, int bufferSize, java.lang.Class<M> msgClass) - Creates a single sorted file.
SortedFile(java.lang.String dir, java.lang.String finalFileName, int bufferSize, java.lang.Class<M> msgClass, boolean intermediateMerge) - Creates a single sorted file.
-
Method Summary
Modifier and Type, Method - Description
void close()
void collect(M msg) - Collects a message.
int compare(int left, int right)
void swap(int left, int right)
-
-
-
Constructor Detail
-
SortedFile
public SortedFile(java.lang.String dir, java.lang.String finalFileName, int bufferSize, java.lang.Class<M> msgClass) throws java.io.IOException

Creates a single sorted file. There is no intermediate file format: the sorted file can be read with the read/write methods of the provided msgClass. The first 4 bytes hold the total number of times the class was serialized to the file. As a result, the output file is not usable for further merging.
- Parameters:
dir - the directory to use for swapping; it is created if it does not exist.
finalFileName - the final file where the merged data should end up.
bufferSize - the buffer size. By default, the spill starts when 90% of the buffer is filled, so you should overallocate by ~10% of the data size.
msgClass - the class that implements the comparable, usually the message class that will be passed to collect.
- Throws:
java.io.IOException - if the directory does not exist and could not be created.
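
Reading a file in the layout described above (a 4-byte record count followed by the serialized records) might look like the following sketch. It assumes the count is a big-endian int as written by DataOutput.writeInt, which the documentation does not state. IntMessage is a hypothetical stand-in that mimics the write/readFields shape of a Hadoop Writable so the example runs without Hadoop on the classpath.

```java
import java.io.*;
import java.util.*;

/** Illustrative sketch only; hypothetical class, not part of the library. */
final class CountPrefixedReaderSketch {

    /** Stand-in message with the same write/readFields shape as a Writable. */
    static final class IntMessage {
        int value;
        void write(DataOutput out) throws IOException { out.writeInt(value); }
        void readFields(DataInput in) throws IOException { value = in.readInt(); }
    }

    /** Builds bytes in the assumed layout: count first, then each record. */
    static byte[] writeAll(int... sortedValues) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(sortedValues.length);
            IntMessage msg = new IntMessage();
            for (int v : sortedValues) {
                msg.value = v;
                msg.write(out);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads the count, then lets the message class deserialize each record in turn. */
    static List<Integer> readAll(InputStream raw) throws IOException {
        List<Integer> values = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(raw)) {
            int count = in.readInt();
            IntMessage msg = new IntMessage();       // one instance reused per record
            for (int i = 0; i < count; i++) {
                msg.readFields(in);
                values.add(msg.value);
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = writeAll(1, 2, 3);
        System.out.println(readAll(new ByteArrayInputStream(file))); // prints [1, 2, 3]
    }
}
```

Because the records carry no per-record framing in this layout, the reader relies entirely on the count prefix and on the message class consuming exactly the bytes it wrote.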
-
SortedFile
public SortedFile(java.lang.String dir, java.lang.String finalFileName, int bufferSize, java.lang.Class<M> msgClass, boolean intermediateMerge) throws java.io.IOException

Creates a single sorted file.
- Parameters:
dir - the directory to use for swapping; it is created if it does not exist.
finalFileName - the final file where the merged data should end up.
bufferSize - the buffer size. By default, the spill starts when 90% of the buffer is filled, so you should overallocate by ~10% of the data size.
msgClass - the class that implements the comparable, usually the message class that will be passed to collect.
intermediateMerge - if true, the single sorted output file has a special format so that the Merger can read it for further merging.
- Throws:
java.io.IOException - if the directory does not exist and could not be created.
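
The bufferSize advice can be turned into a small calculation. This sketch assumes that bufferSize is measured in bytes, that the spill triggers at exactly 90% of it, and that the goal is to hold a known amount of data without spilling; none of these details are confirmed by the documentation, and bufferSizeFor is a hypothetical helper.

```java
/** Illustrative sketch only; hypothetical helper, not part of the library. */
final class BufferSizingSketch {

    /**
     * Smallest buffer size whose 90% mark lies strictly above expectedBytes,
     * computed in integer arithmetic: ceil(expectedBytes * 10 / 9) + 1.
     */
    static int bufferSizeFor(long expectedBytes) {
        long size = (expectedBytes * 10 + 8) / 9 + 1;
        return Math.toIntExact(size);                // bufferSize is an int parameter
    }

    public static void main(String[] args) {
        System.out.println(bufferSizeFor(900));      // prints 1001
    }
}
```

For 900 bytes of data this yields 1001, whose 90% mark (900.9) sits just above the data size, which matches the roughly 10% overallocation the parameter description recommends.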
-
-
Method Detail
-
collect
public void collect(M msg) throws java.io.IOException
Collects a message. If the buffer threshold is exceeded, the buffer is sorted and spilled to disk. This happens synchronously, so the call blocks until the spill is finished.
- Parameters:
msg - the message to add.
- Throws:
java.io.IOException - when an IO error happens.
-
compare
public int compare(int left, int right)
- Specified by:
compare in interface org.apache.hadoop.util.IndexedSortable
-
swap
public void swap(int left, int right)
- Specified by:
swap in interface org.apache.hadoop.util.IndexedSortable
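
The compare/swap pair lets a sorting routine reorder records purely by position, without knowing how they are stored. The sketch below shows the idea with a local, hypothetical IndexSortable interface that mirrors the two method signatures above; it does not use the Hadoop interface, and the insertion sort is a stand-in chosen for brevity rather than the sorter SortedFile is driven by.

```java
import java.util.Arrays;

/** Illustrative sketch only; hypothetical class, not part of the library. */
final class IndexedSortSketch {

    /** Local mirror of the compare/swap contract, so no Hadoop dependency is needed. */
    interface IndexSortable {
        int compare(int left, int right);
        void swap(int left, int right);
    }

    /** Insertion sort over positions [from, to) that touches data only via compare and swap. */
    static void sort(IndexSortable s, int from, int to) {
        for (int i = from + 1; i < to; i++) {
            for (int j = i; j > from && s.compare(j - 1, j) > 0; j--) {
                s.swap(j - 1, j);
            }
        }
    }

    /** Sorts a copy of the input by exposing it through the index-based contract. */
    static int[] sortedCopy(int[] input) {
        final int[] data = input.clone();
        sort(new IndexSortable() {
            @Override public int compare(int l, int r) { return Integer.compare(data[l], data[r]); }
            @Override public void swap(int l, int r) { int t = data[l]; data[l] = data[r]; data[r] = t; }
        }, 0, data.length);
        return data;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedCopy(new int[] {3, 1, 2}))); // prints [1, 2, 3]
    }
}
```

A class that keeps serialized records in a byte buffer can implement the same two methods over record offsets, which is presumably why SortedFile exposes them.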
-
close
public void close() throws java.io.IOException
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException
-
-