Class SortedFile<M extends org.apache.hadoop.io.WritableComparable>

  • Type Parameters:
    M - the message type extending WritableComparable.
    All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, org.apache.hadoop.util.IndexedSortable

    public final class SortedFile<M extends org.apache.hadoop.io.WritableComparable>
    extends java.lang.Object
    implements org.apache.hadoop.util.IndexedSortable, java.io.Closeable
    A file that serializes WritableComparables into an in-memory buffer. Once the buffer reaches a threshold, it is sorted in memory and spilled to disk as a sorted segment. When the file is closed, all sorted segments are merged into a single file. That file can then be read with the provided WritableComparable class, in the order defined by its comparison.
    Author:
    thomasjungblut
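The buffer, sort, spill, and merge cycle described for this class can be illustrated with a minimal sketch that uses only the JDK. It is not the SortedFile implementation: the class name SpillSortSketch, the segment file names, the use of plain int values instead of WritableComparables, and the spill-when-full rule are all assumptions made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for illustration only; not the SortedFile implementation.
final class SpillSortSketch implements Closeable {
    private final Path dir;
    private final Path finalFile;
    private final int[] buffer;
    private final List<Path> segments = new ArrayList<>();
    private int used;
    private boolean closed;

    SpillSortSketch(Path dir, Path finalFile, int capacity) throws IOException {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        Files.createDirectories(dir); // create the swap directory if it is missing
        this.dir = dir;
        this.finalFile = finalFile;
        this.buffer = new int[capacity];
    }

    // Adds a value; when the buffer is full it is sorted and spilled synchronously.
    void collect(int value) throws IOException {
        if (used == buffer.length) spill();
        buffer[used++] = value;
    }

    // Sorts the buffered values and writes them as one count-prefixed segment.
    private void spill() throws IOException {
        if (used == 0) return;
        Arrays.sort(buffer, 0, used);
        Path segment = dir.resolve("segment-" + segments.size() + ".bin");
        try (DataOutputStream out = openOutput(segment)) {
            out.writeInt(used);
            for (int i = 0; i < used; i++) out.writeInt(buffer[i]);
        }
        segments.add(segment);
        used = 0;
    }

    // Spills the remainder, then k-way merges all segments into the final file.
    @Override
    public void close() throws IOException {
        if (closed) return;
        closed = true;
        spill();
        List<DataInputStream> inputs = new ArrayList<>();
        int[] remaining = new int[segments.size()];
        // Heap entries are {value, segmentIndex}, ordered by value.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        int total = 0;
        try {
            for (int i = 0; i < segments.size(); i++) {
                DataInputStream in = new DataInputStream(
                        new BufferedInputStream(Files.newInputStream(segments.get(i))));
                inputs.add(in);
                remaining[i] = in.readInt();
                total += remaining[i];
                if (remaining[i] > 0) {
                    heap.add(new int[] {in.readInt(), i});
                    remaining[i]--;
                }
            }
            try (DataOutputStream out = openOutput(finalFile)) {
                out.writeInt(total); // leading record count
                while (!heap.isEmpty()) {
                    int[] head = heap.poll();
                    out.writeInt(head[0]);
                    int s = head[1];
                    if (remaining[s] > 0) {
                        heap.add(new int[] {inputs.get(s).readInt(), s});
                        remaining[s]--;
                    }
                }
            }
        } finally {
            for (DataInputStream in : inputs) in.close();
            for (Path p : segments) Files.deleteIfExists(p);
        }
    }

    private static DataOutputStream openOutput(Path path) throws IOException {
        return new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(path)));
    }
}
```

The sketch keeps memory use bounded by the buffer capacity: each spill produces one sorted segment, and the merge on close only holds one pending value per segment in the heap.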
    • Constructor Summary

      Constructors 
      Constructor Description
      SortedFile​(java.lang.String dir, java.lang.String finalFileName, int bufferSize, java.lang.Class<M> msgClass)
      Creates a single sorted file.
      SortedFile​(java.lang.String dir, java.lang.String finalFileName, int bufferSize, java.lang.Class<M> msgClass, boolean intermediateMerge)
      Creates a single sorted file.
    • Method Summary

      Modifier and Type Method Description
      void close()  
      void collect​(M msg)
      Collects a message.
      int compare​(int left, int right)  
      void swap​(int left, int right)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • SortedFile

        public SortedFile​(java.lang.String dir,
                          java.lang.String finalFileName,
                          int bufferSize,
                          java.lang.Class<M> msgClass)
                   throws java.io.IOException
        Creates a single sorted file. There is no intermediate file format: the sorted file can be read with the read/write methods of the provided msgClass. The first 4 bytes hold the total number of records serialized to the file. Because it lacks the intermediate format, the output file cannot be used for further merging.
        Parameters:
        dir - the directory to use for swapping; it is created if it does not exist.
        finalFileName - the final file where the merged data should end up.
        bufferSize - the buffer size. By default, spilling starts when 90% of the buffer is filled, so you should overallocate by ~10% of the data size.
        msgClass - the class that implements WritableComparable, usually the class of the messages passed to collect.
        Throws:
        java.io.IOException - if the directory does not exist and could not be created.
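The layout described for this constructor (a 4-byte record count followed by the records written with the message class's own read/write methods) could be read back roughly as follows. This is a sketch under assumptions: the Record interface is a hypothetical stand-in that mirrors the write/readFields pair of Hadoop's Writable, IntRecord is an invented record type, and the count is assumed to be a big-endian int as produced by DataOutput.writeInt.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of the documented API.
final class CountPrefixedFileSketch {

    // Stand-in for the write/readFields pair of Hadoop's Writable (assumption).
    interface Record {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Invented record type that carries a single int.
    static final class IntRecord implements Record {
        int value;

        IntRecord() {}

        IntRecord(int value) {
            this.value = value;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(value);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readInt();
        }
    }

    // Writes the record count first, then each record back to back.
    static void writeAll(DataOutput out, List<? extends Record> records) throws IOException {
        out.writeInt(records.size());
        for (Record r : records) {
            r.write(out);
        }
    }

    // Reads the leading count, then deserializes exactly that many records.
    static <R extends Record> List<R> readAll(DataInput in, Supplier<R> factory) throws IOException {
        int count = in.readInt();
        List<R> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            R record = factory.get();
            record.readFields(in);
            result.add(record);
        }
        return result;
    }
}
```

With the real class, the factory would typically create instances of the msgClass passed to the constructor, and the DataInput would wrap a stream over the final file.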
      • SortedFile

        public SortedFile​(java.lang.String dir,
                          java.lang.String finalFileName,
                          int bufferSize,
                          java.lang.Class<M> msgClass,
                          boolean intermediateMerge)
                   throws java.io.IOException
        Creates a single sorted file.
        Parameters:
        dir - the directory to use for swapping; it is created if it does not exist.
        finalFileName - the final file where the merged data should end up.
        bufferSize - the buffer size. By default, spilling starts when 90% of the buffer is filled, so you should overallocate by ~10% of the data size.
        msgClass - the class that implements WritableComparable, usually the class of the messages passed to collect.
        intermediateMerge - if true, the resulting sorted file is written in a special format that the Merger can read for further merging.
        Throws:
        java.io.IOException - if the directory does not exist and could not be created.
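The bufferSize guidance (spilling at 90% of the buffer, so overallocate by roughly 10%) amounts to a small piece of arithmetic, sketched below. The class and method names are hypothetical, and the sketch assumes that a spill is triggered once the fill level passes the 90% mark and that bufferSize and the data size use the same unit; the documentation does not state either detail.

```java
// Hypothetical sizing helper for illustration only; not part of the documented API.
final class BufferSizingSketch {

    // Fill level (same unit as bufferSize) at which a spill would start, assuming a 90% threshold.
    static long spillThreshold(int bufferSize) {
        return (long) bufferSize * 9 / 10;
    }

    // Smallest buffer size whose 90% threshold still covers the expected data size.
    static int bufferSizeFor(long expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("expectedSize must not be negative");
        }
        if (expectedSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("data does not fit into an int-sized buffer");
        }
        long size = (expectedSize * 10 + 8) / 9; // ceil(expectedSize / 0.9) in integer arithmetic
        if (size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("required buffer exceeds int range");
        }
        return (int) size;
    }
}
```

For example, under these assumptions 900 units of data call for a buffer of 1000, whose threshold is exactly 900. Data that cannot fit this way still works, since the buffer is simply spilled more than once.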
    • Method Detail

      • collect

        public void collect​(M msg)
                     throws java.io.IOException
        Collects a message. If the buffer threshold is exceeded, the buffer is sorted and spilled to disk. This happens synchronously, so the call blocks until the spill has finished.
        Parameters:
        msg - the message to add.
        Throws:
        java.io.IOException - if an I/O error occurs.
      • compare

        public int compare​(int left,
                           int right)
        Specified by:
        compare in interface org.apache.hadoop.util.IndexedSortable
      • swap

        public void swap​(int left,
                         int right)
        Specified by:
        swap in interface org.apache.hadoop.util.IndexedSortable
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException