org.apache.hadoop.tools.util
Class DistCpUtils

java.lang.Object
  extended by org.apache.hadoop.tools.util.DistCpUtils

public class DistCpUtils
extends Object

Utility functions used in DistCp.


Constructor Summary
DistCpUtils()
           
 
Method Summary
static void checkFileSystemAclSupport(org.apache.hadoop.fs.FileSystem fs)
          Determines if a file system supports ACLs by running a canary getAclStatus request on the file system root.
static void checkFileSystemXAttrSupport(org.apache.hadoop.fs.FileSystem fs)
          Determines if a file system supports XAttrs by running a getXAttrs request on the file system root.
static boolean checksumsAreEqual(org.apache.hadoop.fs.FileSystem sourceFS, org.apache.hadoop.fs.Path source, org.apache.hadoop.fs.FileChecksum sourceChecksum, org.apache.hadoop.fs.FileSystem targetFS, org.apache.hadoop.fs.Path target)
          Utility to compare checksums for the paths specified.
static boolean compareFs(org.apache.hadoop.fs.FileSystem srcFs, org.apache.hadoop.fs.FileSystem destFs)
           
static List<org.apache.hadoop.fs.permission.AclEntry> getAcl(org.apache.hadoop.fs.FileSystem fileSystem, org.apache.hadoop.fs.FileStatus fileStatus)
          Returns a file's full logical ACL.
static long getFileSize(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration configuration)
          Retrieves size of the file at the specified path.
static DecimalFormat getFormatter()
           
static int getInt(org.apache.hadoop.conf.Configuration configuration, String label)
          Utility to retrieve a specified key from a Configuration.
static long getLong(org.apache.hadoop.conf.Configuration configuration, String label)
          Utility to retrieve a specified key from a Configuration.
static String getRelativePath(org.apache.hadoop.fs.Path sourceRootPath, org.apache.hadoop.fs.Path childPath)
          Gets the relative path of a child path with respect to a root path.
static Class<? extends org.apache.hadoop.mapreduce.InputFormat> getStrategy(org.apache.hadoop.conf.Configuration conf, DistCpOptions options)
          Returns the class that implements a copy strategy.
static String getStringDescriptionFor(long nBytes)
           
static Map<String,byte[]> getXAttrs(org.apache.hadoop.fs.FileSystem fileSystem, org.apache.hadoop.fs.Path path)
          Returns all of a file's XAttrs.
static String packAttributes(EnumSet<DistCpOptions.FileAttribute> attributes)
          Pack file preservation attributes into a string, containing just the first character of each preservation attribute
static void preserve(org.apache.hadoop.fs.FileSystem targetFS, org.apache.hadoop.fs.Path path, CopyListingFileStatus srcFileStatus, EnumSet<DistCpOptions.FileAttribute> attributes, boolean preserveRawXattrs)
          Preserves attributes on the file to match those of the file status passed as an argument.
static <T> void publish(org.apache.hadoop.conf.Configuration configuration, String label, T value)
          Utility to publish a value to a configuration.
static org.apache.hadoop.fs.Path sortListing(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path sourceListing)
          Sorts a sequence file containing FileStatus and Text as key and value respectively.
static CopyListingFileStatus toCopyListingFileStatus(org.apache.hadoop.fs.FileSystem fileSystem, org.apache.hadoop.fs.FileStatus fileStatus, boolean preserveAcls, boolean preserveXAttrs, boolean preserveRawXAttrs)
          Converts a FileStatus to a CopyListingFileStatus.
static EnumSet<DistCpOptions.FileAttribute> unpackAttributes(String attributes)
          Unpacks preservation attribute string containing the first character of each preservation attribute back to a set of attributes to preserve
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DistCpUtils

public DistCpUtils()
Method Detail

getFileSize

public static long getFileSize(org.apache.hadoop.fs.Path path,
                               org.apache.hadoop.conf.Configuration configuration)
                        throws IOException
Retrieves size of the file at the specified path.

Parameters:
path - The path of the file whose size is sought.
configuration - Configuration, to retrieve the appropriate FileSystem.
Returns:
The file-size, in number of bytes.
Throws:
IOException - on failure.

publish

public static <T> void publish(org.apache.hadoop.conf.Configuration configuration,
                               String label,
                               T value)
Utility to publish a value to a configuration.

Type Parameters:
T - The type of the value.
Parameters:
configuration - The Configuration to which the value must be written.
label - The label for the value being published.
value - The value being published.

getInt

public static int getInt(org.apache.hadoop.conf.Configuration configuration,
                         String label)
Utility to retrieve a specified key from a Configuration. Throws an exception if the key is not found.

Parameters:
configuration - The Configuration in which the key is sought.
label - The key being sought.
Returns:
Integer value of the key.

getLong

public static long getLong(org.apache.hadoop.conf.Configuration configuration,
                           String label)
Utility to retrieve a specified key from a Configuration. Throws an exception if the key is not found.

Parameters:
configuration - The Configuration in which the key is sought.
label - The key being sought.
Returns:
Long value of the key.
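
Taken together, publish, getInt and getLong describe a write-then-read contract: a value is stored under a label, and a later lookup either returns it or fails when the label is absent. The following is a minimal sketch of that contract, not the Hadoop implementation. It assumes a plain Map<String, String> as a stand-in for Configuration; the class name ConfigSketch and the choice of IllegalStateException are hypothetical, since this page does not say which exception is thrown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the publish/getInt/getLong trio; not the Hadoop code.
final class ConfigSketch {
    // A plain map plays the role of the Configuration object.
    private final Map<String, String> values = new HashMap<>();

    // Store any value under the label, in its string form.
    <T> void publish(String label, T value) {
        values.put(label, String.valueOf(value));
    }

    // Return the int stored under the label, or throw if the label is absent.
    int getInt(String label) {
        return Integer.parseInt(require(label));
    }

    // Return the long stored under the label, or throw if the label is absent.
    long getLong(String label) {
        return Long.parseLong(require(label));
    }

    // Assumption: a missing key is reported with IllegalStateException.
    private String require(String label) {
        String raw = values.get(label);
        if (raw == null) {
            throw new IllegalStateException("Couldn't find " + label);
        }
        return raw;
    }
}
```

Storing everything as a string keeps the sketch close to how key/value configuration stores usually behave; the typed getters only parse on the way out.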

getStrategy

public static Class<? extends org.apache.hadoop.mapreduce.InputFormat> getStrategy(org.apache.hadoop.conf.Configuration conf,
                                                                                   DistCpOptions options)
Returns the class that implements a copy strategy. Looks up the implementation for a particular strategy from distcp-default.xml

Parameters:
conf - Configuration object
options - Handle to input options
Returns:
Class implementing the strategy specified in options.
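
The lookup described above maps a strategy name to an implementing class through configuration. A rough sketch of that general pattern follows, using java.util.Properties in place of distcp-default.xml. The key format "example.<strategy>.impl", the class StrategyLookupSketch, and the decision to throw when no mapping exists are all assumptions made for illustration; the real key names and fallback behavior are not stated on this page.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch of a name-to-class lookup; key names are illustrative only.
final class StrategyLookupSketch {
    // Resolve "example.<strategy>.impl" to a class that implements baseType.
    static <T> Class<? extends T> resolve(Properties defaults, String strategy, Class<T> baseType) {
        String key = "example." + strategy.toLowerCase(Locale.ROOT) + ".impl";
        String className = defaults.getProperty(key);
        if (className == null) {
            // Assumption: an unmapped strategy is an error rather than a silent default.
            throw new IllegalArgumentException("No implementation configured for " + key);
        }
        try {
            // asSubclass rejects classes that do not implement the expected base type.
            return Class.forName(className).asSubclass(baseType);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }
}
```

Checking the loaded class with asSubclass gives the caller a typed Class object, which mirrors the bounded return type in the signature above.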

getRelativePath

public static String getRelativePath(org.apache.hadoop.fs.Path sourceRootPath,
                                     org.apache.hadoop.fs.Path childPath)
Gets the relative path of a child path with respect to a root path. For example:
If childPath = /tmp/abc/xyz/file and sourceRootPath = /tmp/abc, the relative path is /xyz/file.
If childPath = /file and sourceRootPath = /, the relative path is /file.

Parameters:
sourceRootPath - Source root path
childPath - Path for which relative path is required
Returns:
Relative portion of the child path (always prefixed with / unless it is empty)
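
The rule in the examples above can be sketched with plain strings. This is an illustrative approximation, not the Hadoop code: the real method takes Path objects, while the hypothetical RelativePathSketch below assumes absolute, normalized path strings with no trailing slash.

```java
// Hypothetical string-based sketch of the relative-path rule described above.
final class RelativePathSketch {
    static String relativePath(String sourceRoot, String child) {
        if (sourceRoot.equals("/")) {
            // Every absolute path is already relative to the file system root.
            return child;
        }
        if (child.equals(sourceRoot)) {
            // The child is the root itself, so the relative portion is empty.
            return "";
        }
        if (!child.startsWith(sourceRoot + "/")) {
            // Assumption: a child outside the root is rejected rather than guessed at.
            throw new IllegalArgumentException(child + " is not under " + sourceRoot);
        }
        // Drop the root prefix and keep the leading "/".
        return child.substring(sourceRoot.length());
    }
}
```

Requiring the separator after the root prefix keeps /tmp/abcd from being treated as a child of /tmp/abc.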

packAttributes

public static String packAttributes(EnumSet<DistCpOptions.FileAttribute> attributes)
Pack file preservation attributes into a string, containing just the first character of each preservation attribute

Parameters:
attributes - Attribute set to preserve
Returns:
String containing first letters of each attribute to preserve

unpackAttributes

public static EnumSet<DistCpOptions.FileAttribute> unpackAttributes(String attributes)
Unpacks preservation attribute string containing the first character of each preservation attribute back to a set of attributes to preserve

Parameters:
attributes - Attribute string
Returns:
Attribute set
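
packAttributes and unpackAttributes are inverses: one reduces each attribute to its first character, the other maps the characters back to attributes. The sketch below shows that round trip under stated assumptions. The nested enum Attr is a hypothetical stand-in for DistCpOptions.FileAttribute (its constants are chosen for illustration, not copied from the real enum), and the scheme only works while every constant has a unique initial.

```java
import java.util.EnumSet;

// Hypothetical sketch of first-letter packing; Attr is an assumed stand-in enum.
final class AttributePackingSketch {
    enum Attr { REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION }

    // Concatenate the first character of each attribute name.
    static String pack(EnumSet<Attr> attributes) {
        StringBuilder packed = new StringBuilder();
        for (Attr attribute : attributes) { // EnumSet iterates in declaration order
            packed.append(attribute.name().charAt(0));
        }
        return packed.toString();
    }

    // Map each character back to the attribute whose name starts with it.
    static EnumSet<Attr> unpack(String packed) {
        EnumSet<Attr> attributes = EnumSet.noneOf(Attr.class);
        for (char initial : packed.toCharArray()) {
            attributes.add(byInitial(initial));
        }
        return attributes;
    }

    private static Attr byInitial(char initial) {
        for (Attr attribute : Attr.values()) {
            if (attribute.name().charAt(0) == Character.toUpperCase(initial)) {
                return attribute;
            }
        }
        throw new IllegalArgumentException("Unknown attribute initial: " + initial);
    }
}
```

Because the packed form is a short string, it can be carried through a string-valued configuration entry and unpacked on the other side.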

preserve

public static void preserve(org.apache.hadoop.fs.FileSystem targetFS,
                            org.apache.hadoop.fs.Path path,
                            CopyListingFileStatus srcFileStatus,
                            EnumSet<DistCpOptions.FileAttribute> attributes,
                            boolean preserveRawXattrs)
                     throws IOException
Preserves attributes on the file to match those of the file status passed as an argument. All attributes except the block size are preserved by this function.

Parameters:
targetFS - File system
path - Path that needs to preserve original file status
srcFileStatus - Original file status
attributes - Attribute set that needs to be preserved
preserveRawXattrs - if true, raw.* xattrs should be preserved
Throws:
IOException - Exception if any (particularly relating to group/owner change or any transient error)

getAcl

public static List<org.apache.hadoop.fs.permission.AclEntry> getAcl(org.apache.hadoop.fs.FileSystem fileSystem,
                                                                    org.apache.hadoop.fs.FileStatus fileStatus)
                                                             throws IOException
Returns a file's full logical ACL.

Parameters:
fileSystem - FileSystem containing the file
fileStatus - FileStatus of file
Returns:
List containing full logical ACL
Throws:
IOException - if there is an I/O error

getXAttrs

public static Map<String,byte[]> getXAttrs(org.apache.hadoop.fs.FileSystem fileSystem,
                                           org.apache.hadoop.fs.Path path)
                                    throws IOException
Returns all of a file's XAttrs.

Parameters:
fileSystem - FileSystem containing the file
path - file path
Returns:
Map containing all xAttrs
Throws:
IOException - if there is an I/O error

toCopyListingFileStatus

public static CopyListingFileStatus toCopyListingFileStatus(org.apache.hadoop.fs.FileSystem fileSystem,
                                                            org.apache.hadoop.fs.FileStatus fileStatus,
                                                            boolean preserveAcls,
                                                            boolean preserveXAttrs,
                                                            boolean preserveRawXAttrs)
                                                     throws IOException
Converts a FileStatus to a CopyListingFileStatus. If preserving ACLs, populates the CopyListingFileStatus with the ACLs. If preserving XAttrs, populates the CopyListingFileStatus with the XAttrs.

Parameters:
fileSystem - FileSystem containing the file
fileStatus - FileStatus of file
preserveAcls - boolean true if preserving ACLs
preserveXAttrs - boolean true if preserving XAttrs
preserveRawXAttrs - boolean true if preserving raw.* XAttrs
Throws:
IOException - if there is an I/O error

sortListing

public static org.apache.hadoop.fs.Path sortListing(org.apache.hadoop.fs.FileSystem fs,
                                                    org.apache.hadoop.conf.Configuration conf,
                                                    org.apache.hadoop.fs.Path sourceListing)
                                             throws IOException
Sorts a sequence file containing FileStatus and Text as key and value respectively.

Parameters:
fs - File System
conf - Configuration
sourceListing - Source listing file
Returns:
Path of the sorted file: the source listing path with _sorted appended to the name.
Throws:
IOException - Any exception during sort.

checkFileSystemAclSupport

public static void checkFileSystemAclSupport(org.apache.hadoop.fs.FileSystem fs)
                                      throws CopyListing.AclsNotSupportedException
Determines if a file system supports ACLs by running a canary getAclStatus request on the file system root. This method is used before distcp job submission to fail fast if the user requested preserving ACLs, but the file system cannot support ACLs.

Parameters:
fs - FileSystem to check
Throws:
CopyListing.AclsNotSupportedException - if fs does not support ACLs

checkFileSystemXAttrSupport

public static void checkFileSystemXAttrSupport(org.apache.hadoop.fs.FileSystem fs)
                                        throws CopyListing.XAttrsNotSupportedException
Determines if a file system supports XAttrs by running a getXAttrs request on the file system root. This method is used before distcp job submission to fail fast if the user requested preserving XAttrs, but the file system cannot support XAttrs.

Parameters:
fs - FileSystem to check
Throws:
CopyListing.XAttrsNotSupportedException - if fs does not support XAttrs
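
Both support checks above follow the same fail-fast canary pattern: issue one cheap request against the file system root and translate a failure into a dedicated "not supported" exception before any job is submitted. The sketch below illustrates the pattern with stand-in types. Probe, FeatureNotSupportedException and checkSupport are hypothetical names, and treating every exception from the probe as lack of support is an assumption of this sketch.

```java
// Hypothetical sketch of the fail-fast canary check; all types are stand-ins.
final class CanaryCheckSketch {
    // Stand-in for a file system request such as getAclStatus or getXAttrs on the root.
    interface Probe {
        void run() throws Exception;
    }

    // Stand-in for CopyListing.AclsNotSupportedException / XAttrsNotSupportedException.
    static final class FeatureNotSupportedException extends RuntimeException {
        FeatureNotSupportedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Run the canary once; any failure is reported as the feature being unsupported.
    static void checkSupport(String feature, Probe canary) {
        try {
            canary.run();
        } catch (Exception e) {
            throw new FeatureNotSupportedException(feature + " not supported by file system", e);
        }
    }
}
```

Running the probe before submission means an unsupported option surfaces as one clear error instead of failing every copy task later.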

getFormatter

public static DecimalFormat getFormatter()

getStringDescriptionFor

public static String getStringDescriptionFor(long nBytes)

checksumsAreEqual

public static boolean checksumsAreEqual(org.apache.hadoop.fs.FileSystem sourceFS,
                                        org.apache.hadoop.fs.Path source,
                                        org.apache.hadoop.fs.FileChecksum sourceChecksum,
                                        org.apache.hadoop.fs.FileSystem targetFS,
                                        org.apache.hadoop.fs.Path target)
                                 throws IOException
Utility to compare checksums for the paths specified. If the checksums can't be retrieved, the comparison does not fail. The only time the comparison fails is when both checksums are available and they don't match.

Parameters:
sourceFS - FileSystem for the source path.
source - The source path.
sourceChecksum - The checksum of the source file. If it is null we still need to retrieve it through sourceFS.
targetFS - FileSystem for the target path.
target - The target path.
Returns:
true if either checksum could not be retrieved, or if both were retrieved and match; false only if both were retrieved and differ.
Throws:
IOException - if there's an exception while retrieving checksums.
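
The description above calls for a lenient comparison: a checksum that cannot be retrieved does not fail the check, and only two available but different checksums do. A minimal sketch of that decision rule follows. It is not the Hadoop code; the hypothetical ChecksumCompareSketch uses byte arrays in place of FileChecksum, with null standing for a checksum that could not be retrieved.

```java
import java.util.Arrays;

// Hypothetical sketch of the lenient comparison; byte[] stands in for FileChecksum.
final class ChecksumCompareSketch {
    static boolean checksumsAreEqual(byte[] sourceChecksum, byte[] targetChecksum) {
        if (sourceChecksum == null || targetChecksum == null) {
            // Nothing to compare against, so the check does not fail.
            return true;
        }
        // Both checksums are available: they must match byte for byte.
        return Arrays.equals(sourceChecksum, targetChecksum);
    }
}
```

The lenient rule suits copies between file systems that do not all expose checksums, at the cost of not detecting corruption when one side cannot report one.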

compareFs

public static boolean compareFs(org.apache.hadoop.fs.FileSystem srcFs,
                                org.apache.hadoop.fs.FileSystem destFs)


Copyright © 2014 Apache Software Foundation. All Rights Reserved.