org.apache.hadoop.tools
Class SimpleCopyListing

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.hadoop.tools.CopyListing
          extended by org.apache.hadoop.tools.SimpleCopyListing
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable

public class SimpleCopyListing
extends CopyListing

SimpleCopyListing builds the exhaustive list of all files and directories under its specified input paths and writes that list to the specified copy-listing file. Note: SimpleCopyListing does not handle wildcards in the input paths.
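
As a rough illustration of what an exhaustive listing means, the sketch below walks each input path recursively and records every file and directory it finds. It uses java.nio.file on a local file system rather than Hadoop's FileSystem API, and the names ExhaustiveListingSketch, listAll and createSampleTree are hypothetical. It is not how SimpleCopyListing itself is implemented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for the listing step; assumes a local file system. */
final class ExhaustiveListingSketch {

    /** Returns every file and directory under each input path, including the input path itself. */
    static List<Path> listAll(List<Path> inputPaths) {
        List<Path> listing = new ArrayList<>();
        for (Path input : inputPaths) {
            // Input paths are taken literally: no wildcard (glob) expansion is attempted.
            try (Stream<Path> walk = Files.walk(input)) {
                listing.addAll(walk.sorted().collect(Collectors.toList()));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return listing;
    }

    /** Creates a small temporary tree: root/, root/b.txt, root/dir/, root/dir/a.txt. */
    static Path createSampleTree() {
        try {
            Path root = Files.createTempDirectory("listing-sketch");
            Files.createDirectories(root.resolve("dir"));
            Files.createFile(root.resolve("dir").resolve("a.txt"));
            Files.createFile(root.resolve("b.txt"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = createSampleTree();
        // Print one line per listed path, directories included.
        for (Path p : listAll(Arrays.asList(root))) {
            System.out.println(p);
        }
    }
}
```

In this sketch directories appear in the listing alongside files, which mirrors the "files/directories" wording of the class description.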


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.tools.CopyListing
CopyListing.AclsNotSupportedException, CopyListing.XAttrsNotSupportedException
 
Constructor Summary
protected SimpleCopyListing(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.security.Credentials credentials)
          Protected constructor, to initialize configuration.
 
Method Summary
 void doBuildListing(org.apache.hadoop.fs.Path pathToListingFile, DistCpOptions options)
          The interface to be implemented by sub-classes, to create the source/target file listing.
 void doBuildListing(org.apache.hadoop.io.SequenceFile.Writer fileListWriter, DistCpOptions options)
          Collect the list of paths to be copied and write it to the sequence file.
protected  long getBytesToCopy()
          Return the total number of bytes that DistCp should copy for the source paths. This does not account for files that are unchanged and would be skipped during the copy.
protected  long getNumberOfPaths()
          Return the total number of paths to copy, including directories. This does not account for files or directories that are already present and would be skipped during the copy.
protected  boolean shouldCopy(org.apache.hadoop.fs.Path path, DistCpOptions options)
          Provide an option to skip copying a path. Allows for exclusion of files such as FileOutputCommitter.SUCCEEDED_FILE_NAME.
protected  void validatePaths(DistCpOptions options)
          Validate input and output paths
 
Methods inherited from class org.apache.hadoop.tools.CopyListing
buildListing, getCopyListing, getCredentials, setCredentials
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimpleCopyListing

protected SimpleCopyListing(org.apache.hadoop.conf.Configuration configuration,
                            org.apache.hadoop.security.Credentials credentials)
Protected constructor, to initialize configuration.

Parameters:
configuration - The input configuration, with which the source/target FileSystems may be accessed.
credentials - Credentials object on which the FS delegation tokens are cached. If null, delegation token caching is skipped.
Method Detail

validatePaths

protected void validatePaths(DistCpOptions options)
                      throws IOException,
                             org.apache.hadoop.tools.CopyListing.InvalidInputException
Description copied from class: CopyListing
Validate input and output paths

Specified by:
validatePaths in class CopyListing
Parameters:
options - Input options
Throws:
IOException
org.apache.hadoop.tools.CopyListing.InvalidInputException

doBuildListing

public void doBuildListing(org.apache.hadoop.fs.Path pathToListingFile,
                           DistCpOptions options)
                    throws IOException
The interface to be implemented by sub-classes, to create the source/target file listing.

Specified by:
doBuildListing in class CopyListing
Parameters:
pathToListingFile - Path on HDFS where the listing file is written.
options - Input options for DistCp (indicating source/target paths).
Throws:
IOException

doBuildListing

public void doBuildListing(org.apache.hadoop.io.SequenceFile.Writer fileListWriter,
                           DistCpOptions options)
                    throws IOException
Collect the list of paths to be copied and write it to the sequence file. In essence, any file or directory that needs to be copied or synced is written as an entry to the sequence file, with the possible exception of the source root: when either the -update (sync) or the -overwrite switch is specified and the source root is a directory, the source root entry is not written to the sequence file, because only the contents of the source directory need to be copied in this case. See DistCpUtils.getRelativePath(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.Path) for how the relative path is computed. See the computeSourceRootPath method for how the root path of the source is computed.

Parameters:
fileListWriter - Writer for the sequence file to which the listing entries are written.
options - Input options for DistCp.
Throws:
IOException
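
The source-root rule described for this method can be sketched as a small decision function. The code below is only an illustration of the rule as stated in the description. It models paths as strings, and the names SourceRootRuleSketch, writeSourceRootEntry and entries are hypothetical and not part of the DistCp API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the source-root rule; paths are plain strings here. */
final class SourceRootRuleSketch {

    /** Decides whether the source root itself gets an entry in the listing. */
    static boolean writeSourceRootEntry(boolean rootIsDirectory, boolean update, boolean overwrite) {
        // With -update or -overwrite, only the contents of a source directory are copied,
        // so the directory's own entry is left out.
        return !(rootIsDirectory && (update || overwrite));
    }

    /** Builds the entry list for one source root and its already-discovered descendants. */
    static List<String> entries(String root, boolean rootIsDirectory, List<String> descendants,
                                boolean update, boolean overwrite) {
        List<String> result = new ArrayList<>();
        if (writeSourceRootEntry(rootIsDirectory, update, overwrite)) {
            result.add(root);
        }
        result.addAll(descendants);
        return result;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("/src/a", "/src/b");
        // Plain copy: the root directory is listed along with its contents.
        System.out.println(entries("/src", true, children, false, false));
        // Copy with -update: only the contents are listed.
        System.out.println(entries("/src", true, children, true, false));
    }
}
```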

shouldCopy

protected boolean shouldCopy(org.apache.hadoop.fs.Path path,
                             DistCpOptions options)
Provide an option to skip copying a path. Allows for exclusion of files such as FileOutputCommitter.SUCCEEDED_FILE_NAME.

Parameters:
path - Path being considered for copy while building the file listing
options - Input options passed during DistCp invocation
Returns:
True if the path should be considered for copy, false otherwise
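
One way to picture the kind of filter this hook allows is a predicate on the final path component. The sketch below is an assumption about what an overriding implementation might do, not a statement of the default behavior. It works on plain strings instead of org.apache.hadoop.fs.Path, ignores DistCpOptions, and the class name ShouldCopySketch and the filter helper are hypothetical. The constant value "_SUCCESS" is the value of FileOutputCommitter.SUCCEEDED_FILE_NAME.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical path filter in the spirit of shouldCopy; uses strings, not Hadoop Paths. */
final class ShouldCopySketch {

    // Mirrors the value of FileOutputCommitter.SUCCEEDED_FILE_NAME.
    static final String SUCCEEDED_FILE_NAME = "_SUCCESS";

    /** Returns false for job-success marker files, true for everything else. */
    static boolean shouldCopy(String path) {
        // Take the last path component; works for names without any '/' as well.
        String name = path.substring(path.lastIndexOf('/') + 1);
        return !name.equals(SUCCEEDED_FILE_NAME);
    }

    /** Keeps only the candidates that pass shouldCopy, preserving their order. */
    static List<String> filter(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (shouldCopy(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter(Arrays.asList("/out/part-00000", "/out/_SUCCESS")));
    }
}
```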

getBytesToCopy

protected long getBytesToCopy()
Return the total number of bytes that DistCp should copy for the source paths. This does not account for files that are unchanged and would be skipped during the copy.

Specified by:
getBytesToCopy in class CopyListing
Returns:
total bytes to copy

getNumberOfPaths

protected long getNumberOfPaths()
Return the total number of paths to copy, including directories. This does not account for files or directories that are already present and would be skipped during the copy.

Specified by:
getNumberOfPaths in class CopyListing
Returns:
Total number of paths to distcp
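
The two totals returned by getBytesToCopy and getNumberOfPaths can be thought of as running counters updated once per listing entry. The sketch below assumes that directories count as paths but contribute no bytes, and that no skip logic is applied, as the descriptions state. The class ListingTotalsSketch and its record method are hypothetical and do not reflect the actual fields of SimpleCopyListing.

```java
/** Hypothetical accumulator for listing totals; not the Hadoop implementation. */
final class ListingTotalsSketch {

    private long totalBytesToCopy;
    private long totalPaths;

    /** Records one listing entry. Every entry counts as a path; only files add bytes. */
    void record(boolean isDirectory, long lengthInBytes) {
        totalPaths++;
        if (!isDirectory) {
            totalBytesToCopy += lengthInBytes;
        }
    }

    /** Total bytes over all recorded files, regardless of whether a copy would later be skipped. */
    long getBytesToCopy() {
        return totalBytesToCopy;
    }

    /** Total number of recorded entries, directories included. */
    long getNumberOfPaths() {
        return totalPaths;
    }

    public static void main(String[] args) {
        ListingTotalsSketch totals = new ListingTotalsSketch();
        totals.record(true, 0);    // a directory
        totals.record(false, 100); // a 100-byte file
        totals.record(false, 50);  // a 50-byte file
        System.out.println(totals.getNumberOfPaths() + " paths, " + totals.getBytesToCopy() + " bytes");
    }
}
```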


Copyright © 2014 Apache Software Foundation. All Rights Reserved.