Hadoop Commands Only
The following three commands appear similar but have minute
differences:
1. hadoop fs {args}
2. hadoop dfs {args}
3. hdfs dfs {args}
hadoop fs <args>
FS refers to a generic file system, which can point to any file system
such as the local file system or HDFS. Use this form when you are dealing
with different file systems such as the local FS, (S)FTP, S3, and others.
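For example, the same command can address different file systems through
the URI scheme (the hostname, port, and bucket below are illustrative, and
the S3 connector must already be configured):
$ hadoop fs -ls file:///tmp
$ hadoop fs -ls hdfs://namenode:9000/
$ hadoop fs -ls s3a://my-bucket/data/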
hadoop dfs <args>
dfs is specific to HDFS and works only for operations related to HDFS.
It has been deprecated; use hdfs dfs instead.
hdfs dfs <args>
Same as the 2nd, i.e., it works for all operations related to HDFS, and it is
the recommended command instead of hadoop dfs.
The first set of shell commands is very similar to common Linux
file system commands such as ls, mkdir and so on. For example, the
command hdfs dfs -cat /path/to/hdfs/file works the same as the
Linux cat command, printing the contents of a file to the screen.
The second set of HDFS shell commands are specific to HDFS, such as
the command that lets you set the file replication factor.
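For instance, the replication factor of a file can be changed with setrep,
covered in the command list below (the path and value here are illustrative):
$ hdfs dfs -setrep -w 2 /user/data/sample.txt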
Various ways to access HDFS
Commands
1. ls <path>
Lists the contents of the directory identified by path, showing the names,
permissions, owner, size, and modification date of each entry.
2. lsr <path>
Behaves like -ls, but recursively displays entries in all subdirectories of path.
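Example (in recent releases lsr is deprecated in favour of ls -R; the path is
illustrative):
$ hadoop fs -ls -R /user/hadoop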
3. du <path>
Shows disk usage, in bytes, for all the files which match path; filenames are
reported with the full HDFS protocol prefix.
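Example (the path is illustrative):
$ hadoop fs -du /user/hadoop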
4. dus <path>
Like -du, but prints a summary of disk usage of all files/directories in the path.
5. mv <src> <dest>
Moves the file or directory indicated by src to dest, within HDFS.
$ hadoop fs -mv /user/hadoop/sample1.txt /user/text/
6. cp <src> <dest>
Copies the file or directory identified by src to dest, within HDFS.
Example:
$ hadoop fs -cp /user/data/sample1.txt /user/hadoop1
$ hadoop fs -cp /user/data/sample2.txt /user/test/in1
7. rm <path>
Removes the file or empty directory identified by path.
Options:
-f does not report an error if the file does not exist; -skipTrash bypasses
the trash (if enabled) and deletes the file immediately.
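Example (the path is illustrative):
$ hadoop fs -rm /user/data/sample1.txt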
8. rmr <path>
Removes the file or directory identified by path. Recursively deletes any child
entries (i.e., files or subdirectories of path).
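Example (in recent releases rmr is deprecated in favour of rm -r; the path is
illustrative):
$ hadoop fs -rm -r /user/test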
9. put <localSrc> <dest>
Copies the file or directory from the local file system identified by localSrc to
dest within the DFS. It fails if the destination already exists, unless the -f
option is given to overwrite it.
Uploads a single file or multiple source files.
Options:
-f overwrites the destination if it already exists; -p preserves access and
modification times, ownership, and permissions.
Syntax:
$ hadoop fs -put [-f] [-p] <localSrc> ... <dest>
Example:
$ hadoop fs -put sample.txt /user/data/
To copy more than one file into HDFS, either pass several source files (the
destination must then be a directory) or give a directory as the source.
11. moveFromLocal <localSrc> <dest>
Copies the file or directory from the local file system identified by localSrc to
dest within HDFS, and then deletes the local copy on success.
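Example (paths illustrative):
$ hadoop fs -moveFromLocal sample.txt /user/data/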
13. getmerge <src> <localDest>
Retrieves all files that match the path src in HDFS, and copies them to a
single, merged file in the local file system identified by localDest.
Example:
$ hadoop fs -getmerge /user/data merged.txt
14. cat <filename>
Displays the contents of filename on stdout.
Example:
$ hadoop fs -cat /user/data/sampletext.txt
15. copyToLocal <src> <localDest>
Identical to -get.
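Example (paths illustrative):
$ hadoop fs -copyToLocal /user/data/sampletext.txt /home/hadoop/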
16. moveToLocal <src> <localDest>
Works like -get, but deletes the HDFS copy on success. (Note: current
Hadoop releases have not implemented this command yet.)
17. mkdir <path>
Creates a directory named path in HDFS.
With the -p option, also creates any parent directories in path that are
missing (like mkdir -p in Linux).
Options:
-p creates missing parent directories and does not fail if the directory
already exists.
Syntax:
$ hadoop fs -mkdir [-p] <path>
Example:
$ hadoop fs -mkdir -p /user/hadoop/
$ hadoop fs -mkdir -p /user/data/
Without -p, the parent directory must already exist before a subdirectory can
be created; otherwise a 'No such file or directory' message appears.
18. setrep [-R] [-w] rep <path>
Sets the target replication factor for files identified by path to rep. (The actual
replication factor will move toward the target over time.)
Options:
-w waits for the replication to reach the target (this can take a long time);
-R is accepted for backwards compatibility and has no effect.
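Example (the path and value are illustrative):
$ hadoop fs -setrep -w 3 /user/data/sample.txt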
19. touchz <path>
Creates a zero-length file at path, with the current time as its timestamp.
Fails if a file already exists at path, unless that file is already size 0.
Example:
$ hadoop fs -touchz /user/test/empty.txt
20. test -[defsz] <path>
Checks whether path is a directory (-d), exists (-e), is a regular file (-f),
has non-zero size (-s), or has zero length (-z); returns 0 if the test is true.
Example:
$ hadoop fs -test -e /user/test/test.txt
21. stat [format] <path>
Prints information about path. Format is a string which accepts the file size in
bytes (%b), filename (%n), block size (%o), replication (%r), and
modification date (%y, %Y).
26. help <cmd-name>
Returns usage information for one of the commands listed above. You must
omit the leading '-' character in cmd-name.
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#get
text
HDFS Command that takes a source file and outputs the file in text format.
Usage: hdfs dfs -text /directory/filename
Command: hdfs dfs -text /new_edureka/test
sudo jps
Command to see all Hadoop daemons running on the system.
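On a single-node cluster, the output might look something like this (the
process IDs are illustrative):
4852 NameNode
4997 DataNode
5215 SecondaryNameNode
5411 ResourceManager
5562 NodeManager
5701 Jps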
fsck
HDFS Command to check the health of the Hadoop file system.
Command: hdfs fsck /
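A more detailed report can be requested with standard fsck flags, e.g.:
$ hdfs fsck / -files -blocks -locations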
count
HDFS Command to count the number of directories, files, and bytes under the paths
that match the specified file pattern.
Usage: hdfs dfs -count <path>
Command: hdfs dfs -count /user
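The output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and
PATHNAME; the figures below are illustrative:
$ hdfs dfs -count /user
           4           10            5342 /user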
expunge
HDFS Command that makes the trash empty.
Command: hdfs dfs -expunge
A video for learning the working of HDFS and HDFS commands:
https://www.youtube.com/watch?v=m9v9lky3zcE
EXAMPLES
Note that when you list HDFS files, each file will show its replication factor. In this case, the
file test1.txt has a replication factor of 3 (the default replication factor).
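Such a listing might look along these lines (owner, size, and date are
illustrative); the second column is the replication factor:
$ hadoop fs -ls /user/hadoop
-rw-r--r--   3 hadoop supergroup   1366 2024-01-15 10:30 /user/hadoop/test1.txt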
$ hadoop namenode -format
After formatting the HDFS, start the distributed file system. The following command will start the namenode
as well as the data nodes as a cluster.
$ start-dfs.sh
Step 1
You have to create an input directory.
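The directory can be created with the mkdir command; the path below is
illustrative:
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input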
Step 2
Transfer and store a data file from the local system to the Hadoop file system using the put command.
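For example (the local file path is illustrative):
$ $HADOOP_HOME/bin/hadoop fs -put /home/hadoop/file.txt /user/input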
Step 3
You can verify the file using the ls command.
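For example:
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input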
Step 1
Initially, view the data from HDFS using the cat command.
$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/outfile
Step 2
Get the file from HDFS to the local file system using the get command.
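For example (the local destination is illustrative):
$ $HADOOP_HOME/bin/hadoop fs -get /user/output/outfile /home/hadoop/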
Finally, shut down the HDFS using the following command.
$ stop-dfs.sh
Data Lake : A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed.