-appendToFile <localsrc> ... <dst> Appends the contents of all the given local files to the given dst file. The dst file will be created if it does not exist. If <localsrc> is -,
then the input is read from stdin. This Hadoop fs command appends a single source or multiple sources from the local file system to the
file system at the destination; it can also read input from stdin and append it to the destination file system.
• hadoop fs -appendToFile localfile /user/hadoop/hadoopfile
• hadoop fs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile
• hadoop fs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile
• hadoop fs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile Reads the input from stdin.
-cat [-ignoreCrc] <src> ... Fetches all files that match the file pattern <src> and copies their content to stdout.
• hadoop fs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
-checksum <src> ... Dump checksum information for files that match the file pattern <src> to stdout. Note that this requires a round-trip to a datanode
storing each block of the file, and thus is not efficient to run on a large number of files. The checksum of a file depends on its
content, block size and the checksum algorithm and parameters used for creating the file.
• hadoop fs -checksum hdfs://nn1.example.com/file1
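As noted above, the reported checksum depends on the file's content as well as its block size and algorithm. The content dependence can be sketched locally with any ordinary checksum tool; md5sum here is only a stand-in for HDFS's own checksum scheme, and no cluster is required:

```shell
# Identical content yields identical checksums; changing the content changes
# the checksum. md5sum stands in for the HDFS file checksum.
f1=$(mktemp); f2=$(mktemp)
printf 'hello' > "$f1"
printf 'hello' > "$f2"
sum1=$(md5sum "$f1" | cut -d' ' -f1)
sum2=$(md5sum "$f2" | cut -d' ' -f1)   # same content -> same checksum
printf '!' >> "$f2"                    # mutate the second file
sum3=$(md5sum "$f2" | cut -d' ' -f1)   # content changed -> checksum changed
rm -f "$f1" "$f2"
```

On a real cluster, two byte-identical files can still report different `-checksum` values if they were written with different block sizes or checksum parameters.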
-chgrp [-R] GROUP PATH... Changes the group of files; this is equivalent to -chown ... :GROUP .... The -R option applies the change recursively.
hdfs dfs -chgrp superadmin /home/training
hdfs dfs -chown :superadmin /home/training -- here the owner is not provided, so only the group is updated.
-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH... Changes the permissions of files. This works similarly to the shell's chmod command with a few
exceptions. With -R, the change is made recursively through the directory structure. The user must be the owner of the file, or else a
super-user.
-R modifies the files recursively. This is the only option currently supported.
<MODE> Mode is the same as mode used for the shell's command. The only letters recognized are 'rwxXt', e.g. +t,a+r,g-w,+rwx,o=r.
<OCTALMODE> Mode specified in 3 or 4 digits. If 4 digits, the first may be 1 or 0 to turn the sticky bit on or off, respectively. Unlike
the shell command, it is not possible to specify only part of the mode, e.g. 754 is same as u=rwx,g=rx,o=r.
If none of 'augo' is specified, 'a' is assumed and unlike the shell command, no umask is applied.
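The octal/symbolic equivalence described above (754 versus u=rwx,g=rx,o=r) is the same as in the local shell's chmod, so it can be checked without a cluster; hadoop fs -chmod accepts the same octal notation:

```shell
# Verify that octal 754 and symbolic u=rwx,g=rx,o=r set the same permission
# bits, using local chmod/stat as a stand-in for hadoop fs -chmod.
tmp=$(mktemp)
chmod 754 "$tmp"
mode_octal=$(stat -c '%a' "$tmp")      # permissions after the octal form
chmod u=rwx,g=rx,o=r "$tmp"
mode_symbolic=$(stat -c '%a' "$tmp")   # permissions after the symbolic form
echo "$mode_octal $mode_symbolic"      # both should read 754
rm -f "$tmp"
```

The one caveat from the text still applies: unlike the local shell, HDFS applies no umask when 'augo' is omitted.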
-chown [-R] [OWNER][:[GROUP]] PATH... Changes the owner and group of a file. This is similar to the shell's chown command with a few exceptions.
-R modifies the files recursively. This is the only option currently supported.
If only the owner or the group is specified, then only the owner or the group is modified. The owner and group names may only consist of
digits, letters, and any of [-_./@]. The names are case sensitive.
WARNING: Avoid using '.' to separate user name and group even though Linux allows it. If user names have dots in them and you are
using the local file system, you might see surprising results since the shell command 'chown' is used for local files.
hdfs dfs -chown anish:superadmin /home/training
hdfs dfs -chown :superadmin /home/training -- here the owner is not provided, so only the group is updated.
-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst> This Hadoop shell command is similar to the put command, but the source is restricted to a local
file reference.
hdfs dfs -copyFromLocal /home/dataflair/Desktop/sample /user/dataflair/dir1
-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst> Identical to the -get command, except that the destination is restricted to a local file
reference.
hdfs dfs -copyToLocal /user/dataflair/dir1/sample /home/dataflair/Desktop
-count [-q] [-h] [-v] [-x] Count the number of directories, files and bytes under the paths that match the specified file pattern. The output columns are:
<path> ... DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
with the -q option: QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
The -h option shows file sizes in human readable format.
The -v option displays a header line.
The -x option excludes snapshots from being calculated.
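The default output columns listed above are whitespace-separated and can be split with awk; the sample line below is made up for illustration, while a real one would come from hadoop fs -count <path>:

```shell
# Parse the default -count columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
# The sample line is illustrative; on a cluster it would be the output of:
#   hadoop fs -count /user/hadoop
sample='           4           12        52829 /user/hadoop'
parsed=$(echo "$sample" | awk '{printf "dirs=%s files=%s bytes=%s path=%s", $1, $2, $3, $4}')
echo "$parsed"   # dirs=4 files=12 bytes=52829 path=/user/hadoop
```

With -q the same approach works, but four quota columns precede the four shown here.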
-cp [-f] [-p | -p[topax]] <src> ... <dst> Copies files that match the file pattern <src> to a destination, within HDFS. When copying multiple files,
the destination must be a directory. Passing -p preserves status [topax] (timestamps, ownership, permission, ACL, XAttr).
If -p is specified with no <arg>, then it preserves timestamps, ownership and permission.
If -pa is specified, then it preserves permission also, because ACL is a super-set of permission.
Passing -f overwrites the destination if it already exists. Raw namespace extended attributes are preserved if (1) they are supported
(HDFS only) and (2) all of the source and target pathnames are in the /.reserved/raw hierarchy. Raw namespace xattr preservation
is determined solely by the presence (or absence) of the /.reserved/raw prefix and not by the -p option.
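The timestamp preservation behind the 't' in -p[topax] behaves like the local cp -p, which can be sketched without a cluster; hadoop fs -cp -p applies the same idea to HDFS metadata:

```shell
# Show timestamp preservation (the 't' in -p[topax]) using local cp as an
# analogue for hadoop fs -cp -p.
src=$(mktemp)
dst=$(mktemp -u)                          # name only; cp creates the file
touch -d '2020-01-01 00:00:00' "$src"     # give the source a known mtime
cp -p "$src" "$dst"                       # -p preserves timestamps and mode
t1=$(stat -c '%Y' "$src")
t2=$(stat -c '%Y' "$dst")
echo "$t1 $t2"                            # identical modification times
rm -f "$src" "$dst"
```

Without -p, the copy would get the current time as its modification time, just as a plain cp would.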
-createSnapshot <snapshotDir> [<snapshotName>] Create a snapshot of a directory.
-deleteSnapshot <snapshotDir> <snapshotName> Delete a snapshot from a directory.
-df [-h] [<path> ...] This Hadoop fs command displays free space. It shows the capacity, free and used space of the filesystem. If the filesystem has multiple
partitions, and no path to a particular partition is specified, then the status of the root partitions will be shown.
-h Formats the sizes of files in a human-readable fashion rather than a number of bytes.
-du [-s] [-h] [-x] <path> ... This HDFS dfs command shows disk usage, in bytes, for all the files present on the path provided by the user; filenames
are reported with the full HDFS protocol prefix. It shows the amount of space, in bytes, used by the files that match the specified file
pattern. The following flags are optional:
-s Rather than showing the size of each individual file that matches the pattern, shows the total (summary) size.
-h Formats the sizes of files in a human-readable fashion rather than a number of bytes.
-x Excludes snapshots from being counted.
Note that, even without the -s option, this only shows size summaries one level deep into a directory. The output is in the form:
size, disk space consumed, name (full path).
-expunge Empty the Trash
-find <path> ... Finds all files that match the specified expression and applies selected actions to them. If no <path> is specified then defaults to the
<expression> ... current working directory. If no expression is specified then defaults to -print.
The following primary expressions are recognised:
-name pattern
-iname pattern
Evaluates as true if the basename of the file matches the pattern using standard file system globbing. If -iname is used then the
match is case insensitive.
-print
-print0
Always evaluates to true. Causes the current pathname to be written to standard output followed by a newline. If the -print0
expression is used then an ASCII NULL character is appended rather than a newline.
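The NUL separator emitted by -print0 exists so that downstream tools can handle names containing spaces or newlines; it pairs with xargs -0. The effect can be sketched with the local find as an analogue for hadoop fs -find ... -print0, no cluster needed:

```shell
# Demonstrate why -print0 matters: filenames with spaces survive the pipeline.
# Local find/xargs stand in for hadoop fs -find <path> -name '*' -print0.
dir=$(mktemp -d)
touch "$dir/a file" "$dir/b file"
# NUL-separated names consumed NUL-aware: one line per file, spaces intact.
count=$(find "$dir" -type f -print0 | xargs -0 -n1 echo | wc -l)
echo "$count"   # 2
rm -rf "$dir"
```

With the plain -print form, a consumer splitting on whitespace would see four tokens instead of two filenames.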
-mkdir [-p] <path> ... Creates a directory at the specified location. mkdir creates any parent directories in the path that are missing.
-p Do not fail if the directory already exists.
hdfs dfs -mkdir /user/dataflair/dir1
-moveFromLocal <localsrc> ... <dst> This HDFS command copies the file or directory from the local file system to the destination within HDFS,
and then deletes the local copy on success. Same as -put, except that the source is deleted after it is copied.
-moveToLocal <src> <localdst> This basic Hadoop command works like -get, but deletes the HDFS copy on success.
-mv <src> ... <dst> This basic HDFS command moves the file or directory indicated by the source to destination, within HDFS. Move files that match
the specified file pattern <src> to a destination <dst>. When moving multiple files, the destination must be a directory.
-put [-f] [-p] [-l] <localsrc> ... <dst> Copies the file or directory from the local file system to the destination within HDFS. Copying fails if the
file already exists, unless the -f flag is given.
Flags:
-p Preserves access and modification times, ownership and the mode.
-f Overwrites the destination if it already exists.
-l Allow DataNode to lazily persist the file to disk. Forces replication factor of 1. This flag will result in reduced durability. Use with
care.
hdfs dfs -put /home/dataflair/Desktop/sample /user/dataflair/dir1
-text [-ignoreCrc] <src> ... Takes a source file and outputs the file in text format. The allowed formats are zip, TextRecordInputStream and Avro.
-touchz <path> ... Creates a file of zero length at <path>, with the current time as the timestamp of that <path>. An error is returned if the file
exists with non-zero length.
-usage [cmd ...] Displays the usage for given command or all commands if none is specified.
Overview
All HDFS commands are invoked by the bin/hdfs script. Running the hdfs script without any arguments prints the description for all commands.
Hadoop has an option parsing framework that employs parsing generic options as well as running classes.
COMMAND_OPTIONS Description
--config, --loglevel The common set of shell options. These are documented on the Commands Manual page.
GENERIC_OPTIONS The common set of options supported by multiple commands. See the Hadoop Commands Manual for more information.
COMMAND COMMAND_OPTIONS Various commands with their options are described in the following sections. The commands have been grouped into User Commands, Administration Commands and Debug Commands.
User Commands
classpath
Prints the class path needed to get the Hadoop jar and the required libraries
dfs
Run a filesystem command on the file system supported in Hadoop. The various COMMAND_OPTIONS can be found at File System Shell Guide.
fetchdt
Gets Delegation Token from a NameNode. See fetchdt for more info.
COMMAND_OPTION Description
--webservice https_address Use http protocol instead of RPC.
fsck
COMMAND_OPTION Description
path Start checking from this path.
-delete Delete corrupted files.
-files -blocks -racks Print out network topology for data-node locations.
-includeSnapshots Include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it.
-list-corruptfileblocks Print out list of missing blocks and files they belong to.
Runs the HDFS filesystem checking utility. See fsck for more info.
getconf
COMMAND_OPTION Description
-namenodes Gets the list of namenodes in the cluster.
-includeFile Gets the include file path that defines the datanodes that can join the cluster.
-excludeFile Gets the exclude file path that defines the datanodes that need to be decommissioned.
groups
Returns the group information given one or more usernames.
lsSnapshottableDir
COMMAND_OPTION Description
-help print help
Get the list of snapshottable directories. When this is run as a super user, it returns all snapshottable directories. Otherwise it returns those
directories that are owned by the current user.
jmxget
Usage: hdfs jmxget [-localVM ConnectorURL | -port port | -server mbeanserver | -service service]
COMMAND_OPTION Description
-help print help
-port mbean server port Specify the mbean server port; if missing, it will try to connect to the MBean Server in the same VM.
oev
COMMAND_OPTION Description
-i,--inputFile arg edits file to process, xml (case insensitive) extension means XML format, any other filename means binary format
-o,--outputFile arg Name of output file. If the specified file exists, it will be overwritten, format of the file is determined by -p option
Optional command line arguments:
COMMAND_OPTION Description
-f,--fix-txids Renumber the transaction IDs in the input, so that there are no gaps or invalid transaction IDs.
-r,--recover When reading binary edit logs, use recovery mode. This will give you the chance to skip corrupt parts of the edit log.
-p,--processor arg Select which type of processor to apply against the image file. Currently supported processors are: binary (native binary format that Hadoop uses), xml (default, XML format), stats (prints statistics about the edits file).
-v,--verbose More verbose output; prints the input and output filenames, and for processors that write to a file, also outputs to screen. On large image files this will dramatically increase processing time (default is false).
oiv
COMMAND_OPTION Description
-i,--inputFile arg fsimage file to process, xml (case insensitive) extension means XML format, any other filename means binary format
Optional command line arguments:
COMMAND_OPTION Description
-h,--help Display usage information and exit
-o,--outputFile arg Name of output file. If the specified file exists, it will be overwritten. The format of the file is determined by the -p option.
-p,--processor arg Select which type of processor to apply against the image file. Currently supported processors are: binary (native binary format that Hadoop uses), xml (default, XML format), stats (prints statistics about the edits file).
oiv_legacy
COMMAND_OPTION Description
-h,--help Display usage information and exit
-i,--inputFile arg fsimage file to process, xml (case insensitive) extension means XML format, any other filename means binary format
-o,--outputFile arg Name of output file. If the specified file exists, it will be overwritten, format of the file is determined by -p option
snapshotDiff
Determine the difference between HDFS snapshots. See the HDFS Snapshot Documentation for more information.
version
Prints the version.
Administration Commands
Commands useful for administrators of a hadoop cluster.
balancer
Usage:
hdfs balancer
[-threshold <threshold>]
[-policy <policy>]
[-exclude [-f <hosts-file> | <comma-separated list of hosts>]]
[-include [-f <hosts-file> | <comma-separated list of hosts>]]
[-idleiterations <idleiterations>]
COMMAND_OPTION Description
-policy <policy> datanode (default): Cluster is balanced if each datanode is balanced.
blockpool: Cluster is balanced if each block pool in each datanode is balanced.
-threshold <threshold> Percentage of disk capacity. This overwrites the default threshold.
-exclude -f <hosts-file> | <comma-separated list of hosts> Excludes the specified datanodes from being balanced by the balancer.
-include -f <hosts-file> | <comma-separated list of hosts> Includes only the specified datanodes to be balanced by the balancer.
-idleiterations <iterations> Maximum number of idle iterations before exit. This overwrites the default idleiterations (5).
Runs a cluster balancing utility. An administrator can simply press Ctrl-C to stop the rebalancing process. See Balancer for more details.
Note that the blockpool policy is more strict than the datanode policy.
cacheadmin
Usage: hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]
See the HDFS Cache Administration Documentation for more information.
crypto
See the HDFS Transparent Encryption Documentation for more information.
datanode
COMMAND_OPTION Description
-regular Normal datanode startup (default).
-rollback Rollback the datanode to the previous version. This should be used after stopping the datanode and distributing the old hadoop version.
dfsadmin
COMMAND_OPTION Description
-report [-live] [-dead] [-decommissioning] Reports basic filesystem information and statistics. Optional flags may be used to filter the list of displayed DataNodes.
-safemode enter|leave|get|wait Safe mode maintenance command. Safe mode is a Namenode state in which it
1. does not accept changes to the name space (read-only)
2. does not replicate or delete blocks.
Safe mode is entered automatically at Namenode startup, and is left automatically when the configured minimum percentage of blocks satisfies the
minimum replication condition. Safe mode can also be entered manually, but then it can only be turned off manually as well.
-saveNamespace Save current namespace into storage directories and reset edits log. Requires safe mode.
-restoreFailedStorage true|false|check This option will turn on/off the automatic attempt to restore failed storage replicas. If a failed storage becomes available again, the system will attempt to restore edits
and/or fsimage during checkpoint. The 'check' option will return the current setting.
-refreshNodes Re-read the hosts and exclude files to update the set of Datanodes that are allowed to connect to the Namenode and those that should be decommissioned or
recommissioned.
-setQuota <quota> <dirname>…<dirname> See HDFS Quotas Guide for the detail.
-setSpaceQuota <quota> <dirname>…<dirname> See HDFS Quotas Guide for the detail.
-finalizeUpgrade Finalize upgrade of HDFS. Datanodes delete their previous version working directories, followed by the Namenode doing the same. This completes the upgrade process.
-metasave filename Save the Namenode's primary data structures to filename in the directory specified by the hadoop.log.dir property. filename is overwritten if it exists. filename will contain
one line for each of the following:
1. Datanodes heart beating with Namenode
2. Blocks waiting to be replicated
3. Blocks currently being replicated
4. Blocks waiting to be deleted
-refreshServiceAcl Reload the service-level authorization policy file.
-refresh <host:ipc_port> <key> [arg1..argn] Triggers a runtime-refresh of the resource specified by <key> on <host:ipc_port>. All other args after are sent to the host.
-reconfig <datanode |…> <host:ipc_port> <start|status> Start reconfiguration or get the status of an ongoing reconfiguration. The second parameter specifies the node type. Currently, only reloading DataNode's
configuration is supported.
-printTopology Print a tree of the racks and their nodes as reported by the Namenode.
-refreshNamenodes datanodehost:port For the given datanode, reloads the configuration files, stops serving the removed block-pools and starts serving new block-pools.
-deleteBlockPool datanode-host:port blockpoolId [force] If force is passed, the block pool directory for the given blockpool id on the given datanode is deleted along with its contents; otherwise the directory
is deleted only if it is empty. The command will fail if the datanode is still serving the block pool. Refer to refreshNamenodes to shutdown a block pool service on a datanode.
-setBalancerBandwidth <bandwidth in bytes per second> Changes the network bandwidth used by each datanode during HDFS block balancing. <bandwidth> is the maximum number of bytes per second that will be used
by each datanode. This value overrides the dfs.balance.bandwidthPerSec parameter. NOTE: The new value is not persistent on the DataNode.
-allowSnapshot <snapshotDir> Allowing snapshots of a directory to be created. If the operation completes successfully, the directory becomes snapshottable. See the HDFS Snapshot
Documentation for more information.
-disallowSnapshot <snapshotDir> Disallowing snapshots of a directory to be created. All snapshots of the directory must be deleted before disallowing snapshots. See the HDFS Snapshot
Documentation for more information.
-fetchImage <local directory> Downloads the most recent fsimage from the NameNode and saves it in the specified local directory.
-shutdownDatanode <datanode_host:ipc_port> [upgrade] Submit a shutdown request for the given datanode. See the Rolling Upgrade document for the detail.
-getDatanodeInfo <datanode_host:ipc_port> Get the information about the given datanode. See the Rolling Upgrade document for the detail.
-triggerBlockReport [-incremental] <datanode_host:ipc_port> Trigger a block report for the given datanode. If 'incremental' is specified, it will be an incremental block report; otherwise, it will be a full block report.
-help [cmd] Displays help for the given command or all commands if none is specified.
haadmin
COMMAND_OPTION Description
-checkHealth check the health of the given NameNode
-transitionToActive transition the state of the given NameNode to Active (Warning: No fencing is done)
-transitionToStandby transition the state of the given NameNode to Standby (Warning: No fencing is done)
See HDFS HA with NFS or HDFS HA with QJM for more information on this command.
journalnode
This command starts a journalnode for use with HDFS HA with QJM.
mover
COMMAND_OPTION Description
-f <local file> Specify a local file containing a list of HDFS files/dirs to migrate.
Runs the data migration utility. See Mover for more details.
Note that, when both -p and -f options are omitted, the default path is the root directory.
namenode
COMMAND_OPTION Description
-backup Start backup node.
-format [-clusterid cid] [-force] [-nonInteractive] Formats the specified NameNode. It starts the NameNode, formats it and then shuts it down. The
-nonInteractive option aborts if the name directory exists, unless the -force option is specified.
-upgrade [-clusterid cid] [-renameReserved <k-v pairs>] The Namenode should be started with the upgrade option after the distribution of a new Hadoop version.
-upgradeOnly [-clusterid cid] [-renameReserved <k-v pairs>] Upgrade the specified NameNode and then shut it down.
-rollback Rollback the NameNode to the previous version. This should be used after stopping the cluster and distributing the old Hadoop version.
-finalize Finalize will remove the previous state of the file system. The most recent upgrade will become permanent, and the rollback option will not be available anymore. After
finalization it shuts the NameNode down.
-importCheckpoint Loads the image from a checkpoint directory and saves it into the current one. The checkpoint dir is read from the property fs.checkpoint.dir.
-initializeSharedEdits Format a new shared edits dir and copy in enough edit log segments so that the standby NameNode can start up.
-bootstrapStandby Allows the standby NameNode's storage directories to be bootstrapped by copying the latest namespace snapshot from the active NameNode. This is used when
first configuring an HA cluster.
-recover [-force] Recover lost metadata on a corrupt filesystem. See HDFS User Guide for the detail.
-metadataVersion Verify that the configured directories exist, then print the metadata versions of the software and the image.
Runs the namenode. More info about the upgrade, rollback and finalize is at Upgrade Rollback.
nfs3
This command starts the NFS3 gateway for use with the HDFS NFS3 Service.
portmap
This command starts the RPC portmap for use with the HDFS NFS3 Service.
secondarynamenode
COMMAND_OPTION Description
-checkpoint [force] Checkpoints the SecondaryNameNode if EditLog size >= fs.checkpoint.size. If force is used, checkpoint irrespective of EditLog size.
-format Format the local storage during startup.
Runs the HDFS secondary namenode. See Secondary Namenode for more info.
storagepolicies
Lists out all storage policies. See the HDFS Storage Policy Documentation for more information.
zkfc
Usage: hdfs zkfc [-formatZK [-force] [-nonInteractive]]
COMMAND_OPTION Description
-formatZK Format the ZooKeeper instance.
-h Display help
This command starts a ZooKeeper Failover Controller process for use with HDFS HA with QJM.
Debug Commands
Useful commands to help administrators debug HDFS issues, like validating block files and calling recoverLease.
verify
COMMAND_OPTION Description
-block block-file Optional parameter to specify the absolute path for the block file on the local file system of the data node.
-meta metadata-file Absolute path for the metadata file on the local file system of the data node.
Verify HDFS metadata and block files. If a block file is specified, we will verify that the checksums in the metadata file match the block file.
recoverLease
COMMAND_OPTION Description
[-path path] HDFS path for which to recover the lease.
[-retries num-retries] Number of times the client will retry calling recoverLease. The default number of retries is 1.
Recover the lease on the specified path. The path must reside on an HDFS filesystem. The default number of retries is 1.