Hadoop Commands Only

Hadoop Operation Modes

Hadoop is operated in one of the three supported modes:


 Local/Standalone Mode: After downloading Hadoop, by default it is configured in standalone mode and runs as a single Java process.
 Pseudo-Distributed Mode: This is a distributed simulation on a single machine. Each Hadoop daemon, such as HDFS, YARN, MapReduce etc., runs as a separate Java process. This mode is useful for development.
 Fully Distributed Mode: This mode is fully distributed, with a minimum of two or more machines forming a cluster.
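As a rough sketch of what the first two modes look like in practice (the commands below are illustrative; the daemon start scripts may live under $HADOOP_HOME/sbin depending on your installation):

# Standalone mode: no daemons are needed; the shell reads the local file system directly
$ hadoop fs -ls file:///tmp

# Pseudo-distributed mode: start the HDFS and YARN daemons locally, then verify with jps
$ start-dfs.sh
$ start-yarn.sh
$ jps        # should list NameNode, DataNode, ResourceManager, NodeManager, etc.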
HADOOP FILE SYSTEMS
File System: A file system is a method of organizing and retrieving files from a
storage medium, such as a hard drive. Examples: Microsoft uses FAT, LINUX uses
ext3, ext4 etc.
FS Shell: The FileSystem (FS) shell is invoked by bin/hadoop fs.
The File System (FS) shell includes various shell-like commands that directly
interact with the Hadoop Distributed File System (HDFS) as well as other file
systems that Hadoop supports, such as Local FS, WebHDFS, S3 FS, and others.
There are several other Java implementations of file systems that work with Hadoop. These include the local file system (file), WebHDFS (webhdfs), HAR (Hadoop archive files), View (viewfs), S3 (s3a) and others.
For each file system, Hadoop uses a different URI scheme for the file system
instance in order to connect with it.
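As an illustration of these URI schemes (the hostname, port and bucket names below are placeholders, not values from these notes):

$ hdfs dfs -ls hdfs://namenode.example.com:9000/user/data   # explicit HDFS URI
$ hadoop fs -ls file:///tmp                                 # local file system
$ hadoop fs -ls s3a://my-bucket/logs/                       # S3, assuming the s3a connector is configured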
DFS Shell: The HDFS shell is invoked by bin/hadoop dfs. It is specific to HDFS. Most of the commands in the HDFS shell behave like the corresponding Unix commands.

To know details about the file systems on your Hadoop machine:


$ df -T

The following three commands appear similar but have subtle differences:
1. hadoop fs {args}
2. hadoop dfs {args}
3. hdfs dfs {args}
hadoop fs <args>
FS relates to a generic file system which can point to any file system such as local, HDFS etc. So this can be used when you are dealing with different file systems such as Local FS, (S)FTP, S3, and others.
hadoop dfs <args>
dfs is very specific to HDFS. It works for operations related to HDFS. This has been deprecated and we should use hdfs dfs instead.
hdfs dfs <args>
Same as the 2nd, i.e. it works for all operations related to HDFS and is the recommended command instead of hadoop dfs.
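As a quick illustration (the path is a placeholder), all three forms can list the same HDFS directory, but the deprecated form typically prints a deprecation warning:

$ hadoop fs -ls /user/data     # generic FS shell; the URI scheme decides which file system is used
$ hadoop dfs -ls /user/data    # deprecated; typically warns and forwards to hdfs dfs
$ hdfs dfs -ls /user/data      # recommended form for HDFS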

You can use two types of HDFS shell commands:

 The first set of shell commands are very similar to common Linux
file system commands such as ls, mkdir and so on. For example, the
command hdfs dfs -cat /path/to/hdfs/file works the same as the
Linux cat command, printing the contents of a file to the screen.

 The second set of HDFS shell commands are specific to HDFS, such as
the command that lets you set the file replication factor.
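For instance (a hedged sketch; the path and replication factor are arbitrary), the HDFS-specific setrep command sets a file's replication factor:

$ hdfs dfs -setrep -w 2 /user/data/sample.txt   # -w waits until replication reaches the target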
Various ways to access the HDFS

 The command line
 Over the web through a web interface called WebHDFS
 Using the HttpFS gateway to access HDFS from behind a firewall
 Through application code
 Through tools like Hive, Pig etc.

Commands

1. hdfs dfs - You may view all available HDFS commands by simply invoking the hdfs dfs command with no options.

2. hadoop version - To check the Hadoop version.

3. hdfs dfs -help - Details of all commands.

4. hdfs dfs -help <command> - Details of the mentioned <command>.

5. sudo jps - Tells about the daemons running on the machine.

6. clear - To clear the screen.

7. start-all.sh - To start all daemons.

8. stop-all.sh - To stop all daemons.

9. df -T - To know details about the file systems on your machine.

Note: There is no cd command in Hadoop. Use absolute paths wherever required.
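Because there is no working directory to change into, every command names its target explicitly; for example (the path is a placeholder):

$ hdfs dfs -ls /user/hadoop/input        # always spell out the full HDFS path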
1. ls <path>
Lists the contents of the directory specified by path, showing the names, permissions, owner, size and modification date for each entry.
Options:
-d  List directories as plain files.
-h  Format file sizes in a human-readable manner rather than as a raw number of bytes.
-R  Recursively list the contents of subdirectories.
Syntax:
$ hadoop fs -ls [-d] [-h] [-R] <path>
Example:
$ hadoop fs -ls /
$ hadoop fs -lsr /
$ hadoop fs -usage ls
The usage command gives all the options that can be used with a particular HDFS command.

2. lsr <path>
Behaves like -ls, but recursively displays entries in all subdirectories of path.

3. du <path>
Shows disk usage, in bytes, for all the files which match path; filenames are reported with the full HDFS protocol prefix.

4. dus <path>
Like -du, but prints a summary of disk usage of all files/directories in the path.
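For illustration (the path is a placeholder); note that recent Hadoop releases deprecate -lsr and -dus in favour of -ls -R and -du -s:

$ hadoop fs -du /user/data          # per-file sizes in bytes
$ hadoop fs -du -s /user/data       # summary total (newer form of -dus)
$ hadoop fs -ls -R /user/data       # newer form of -lsr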

5. mv <src> <dest>
Moves the file or directory indicated by src to dest, within HDFS.
$ hadoop fs -mv /user/hadoop/sample1.txt /user/text/

6. cp <src> <dest>
Copies the file or directory identified by src to dest, within HDFS.
Example:
$ hadoop fs -cp /user/data/sample1.txt /user/hadoop1
$ hadoop fs -cp /user/data/sample2.txt /user/test/in1

7. rm <path>
Removes the file or empty directory identified by path.
Options:
-r / -R     Recursively remove directories and files.
-skipTrash  Bypasses the trash and immediately deletes the source.
-f          Does not report an error if the file does not exist.
Without -r or -R, only files can be removed; directories cannot be deleted by this command.
Syntax:
$ hadoop fs -rm [-f] [-r|-R] [-skipTrash] <path>
Example:
$ hadoop fs -rm -r /user/test/sample.txt

8. rmr <path>
Removes the file or directory identified by path. Recursively deletes any child
entries (i.e., files or subdirectories of path).

9. put <localSrc> <dest>
Copies the file or directory from the local file system identified by localSrc to dest within the DFS. This overwrites the destination if the file already exists before the copy.
Uploads a single file or multiple source files.
Option:
-p  Preserves the access and modification times, ownership and the mode.
Syntax:
$ hadoop fs -put [-f] [-p] <localSrc> ... <dest>
Example:
$ hadoop fs -put sample.txt /user/data/
If we want to copy more than one file into HDFS, then we have to give a directory as the destination:

$ hadoop fs -put file1 file2 hadoop-dir
$ hadoop fs -put abc.txt wc.txt techaltum
10. copyFromLocal <localSrc> <dest>
Identical to -put.
For the following examples, we will use the Sample.txt file available in the /home/cloudera location.

Example: $ hadoop fs -copyFromLocal Sample1.txt /user/cloudera/dezyre1
Copies/uploads Sample1.txt available in /home/cloudera (local default) to /user/cloudera/dezyre1 (HDFS path).

11. moveFromLocal <localSrc> <dest>
Copies the file or directory from the local file system identified by localSrc to dest within HDFS, and then deletes the local copy on success.

Example: $ hadoop fs -moveFromLocal Sample3.txt /user/cloudera/dezyre1
Moves Sample3.txt available in /home/cloudera (local default) to /user/cloudera/dezyre1 (HDFS path). The source file is deleted after the move.

12. get [-crc] <src> <localDest>
Copies the file or directory in HDFS identified by src to the local file system path identified by localDest.
Syntax:
$ hadoop fs -get [-p] [-f] [-ignorecrc] [-crc] <src> <localDest>
 -p : Preserves access and modification times, ownership and the permissions (assuming the permissions can be propagated across file systems).
 -f : Overwrites the destination if it already exists.
 -ignorecrc : Skips CRC checks on the file(s) downloaded.
 -crc : Writes CRC checksums for the files downloaded.
Example:
$ hadoop fs -get /user/data/sample.txt workspace/

The usual way of detecting corrupted data is by computing a checksum for the data when it first enters the system, and again whenever it is transmitted across a channel that is unreliable and hence capable of corrupting the data. The data is deemed to be corrupt if the newly generated checksum does not exactly match the original. This technique does not offer any way to fix the data; it is merely error detection.
A commonly used error-detecting code is CRC-32 (cyclic redundancy check),
which computes a 32-bit integer checksum for input of any size.
HDFS transparently checksums all data written to it and by default verifies
checksums when reading data. A separate checksum is created for
every io.bytes.per.checksum bytes of data. The default is 512 bytes, and
because a CRC-32 checksum is 4 bytes long, the storage overhead is less than
1%.
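As a quick worked example of that overhead figure: 4 bytes of CRC-32 per 512 bytes of data is 4/512, roughly 0.78%, i.e. under 1%. You can also ask for a file's checksum from the shell (the path is a placeholder, and the exact output format depends on the Hadoop version):

$ hadoop fs -checksum /user/data/sample.txt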

More about checksums: https://www.oreilly.com/library/view/hadoop-the-definitive/9781449328917/ch04.html

13. getmerge <src> <localDest>
Retrieves all files that match the path src in HDFS, and copies them to a single, merged file in the local file system identified by localDest.
Example:
$ hadoop fs -getmerge /user/data merged.txt

14. cat <filename>
Displays the contents of filename on stdout.
Example:
$ hadoop fs -cat /user/data/sampletext.txt

15. copyToLocal <src> <localDest>
Identical to -get.

16. moveToLocal <src> <localDest>
Works like -get, but deletes the HDFS copy on success.

17. mkdir <path>
Creates a directory named path in HDFS.
With -p, creates any parent directories in path that are missing (like mkdir -p in Linux).
Options:
-p  Do not fail if the directory already exists.
Syntax:
$ hadoop fs -mkdir [-p] <path>
Example:
$ hadoop fs -mkdir -p /user/hadoop/
$ hadoop fs -mkdir -p /user/data/
Without -p, the parent directory must already exist in order to create subdirectories. If that condition is not met, a 'No such file or directory' message appears.
18. setrep [-R] [-w] <rep> <path>
Sets the target replication factor for files identified by path to rep. (The actual replication factor will move toward the target over time.)
Options:
-w  Requests that the command wait for the replication to complete.
-R  Accepted for backward compatibility; it has no effect.
Syntax:
$ hadoop fs -setrep [-R] [-w] <rep> <path>
Example:
$ hadoop fs -setrep -w 3 /user/hadoop/

19. touchz <path>
Creates a file at path containing the current time as a timestamp. Fails if a file already exists at path, unless the file is already size 0.
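For illustration (the path is just a placeholder):

$ hadoop fs -touchz /user/data/empty.txt
$ hadoop fs -ls /user/data/empty.txt      # the new file has size 0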

20. test -[defsz] <path>
Checks a condition on path; the exit status is 0 if the condition holds, or non-zero otherwise.
Options:
-d  Checks whether the path is a directory; returns 0 if it is a directory.
-e  Checks whether the path exists; returns 0 if it exists.
-f  Checks whether the path is a file; returns 0 if the file exists.
-s  Checks whether the file size is greater than 0 bytes; returns 0 if it is.
-z  Checks whether the file size is zero bytes; returns 0 if the file size is zero bytes, or 1 otherwise.
Syntax:
$ hadoop fs -test -[defsz] <path>
Example:
$ hadoop fs -test -e /user/test/test.txt
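Since test reports its result through the exit status rather than printed output, you typically check $? or chain the command (a small sketch; paths are placeholders):

$ hadoop fs -test -d /user/test && echo "is a directory"
$ hadoop fs -test -e /user/test/test.txt; echo $?    # prints 0 if the file exists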
21. stat [format] <path>
Prints information about path. Format is a string which accepts file size in bytes (%b), filename (%n), block size (%o), replication (%r), and modification date (%y, %Y).

22. tail [-f] <filename>
Shows the last 1KB of the file on stdout.
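For illustration (the path is a placeholder):

$ hadoop fs -tail /user/data/sample.txt        # print the last kilobyte of the file
$ hadoop fs -tail -f /user/data/sample.txt     # keep following as the file grows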

23. chmod [-R] mode,mode,... <path>...
Changes the file permissions associated with one or more objects identified by path. Performs changes recursively with -R. mode is a 3-digit octal mode, or {augo}+/-{rwxX}. Assumes a if no scope is specified, and does not apply an umask.
Example:
$ hadoop fs -chmod a=rw file1
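A couple of further hedged examples (the paths are placeholders) showing the octal form and recursive use:

$ hadoop fs -chmod 644 /user/data/file1        # rw-r--r--
$ hadoop fs -chmod -R 750 /user/data/reports   # apply rwxr-x--- to the whole directory tree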

24. chown [-R] [owner][:[group]] <path>...
Sets the owning user and/or group for files or directories identified by path. Sets the owner recursively if -R is specified.

$ hdfs dfs -chown sam:produsers /data/customers/names.txt

You must be a superuser to modify the ownership of files and directories.

25. chgrp [-R] group <path>...
Sets the owning group for files or directories identified by path. Sets the group recursively if -R is specified.
You can change just the group of a file with the chgrp command, as shown here:

$ sudo -u hdfs hdfs dfs -chgrp marketing /users/sales/markets.txt

This changes the markets.txt file's group membership from supergroup to marketing.
$ sudo -u hdfs hadoop fs -chgrp -R cloudera /dezyre
This changes the /dezyre directory's group membership from supergroup to cloudera. (Superuser permission is required to perform this operation.)

26. help <cmd-name>
Returns usage information for one of the commands listed above. You must omit the leading '-' character in cmd.

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#get
 text
HDFS command that takes a source file and outputs the file in text format.
Usage: hdfs dfs -text /directory/filename
Command: hdfs dfs -text /new_edureka/test
 sudo jps
Command to see all daemons running on the system.
 fsck
HDFS command to check the health of the Hadoop file system.
Command: hdfs fsck /
 count
HDFS command to count the number of directories, files, and bytes under the paths that match the specified file pattern.
Usage: hdfs dfs -count <path>
Command: hdfs dfs -count /user
 expunge
HDFS command that makes the trash empty.
Command: hdfs dfs -expunge
Website for learning the working of HDFS and HDFS commands:
https://www.youtube.com/watch?v=m9v9lky3zcE

EXAMPLES

Listing Both Files and Directories


If the target of the ls command is a file, it shows the statistics for the file, and if it’s
a directory, it lists the contents of that directory. You can use the following
command to get a directory listing of the HDFS root directory:
$ hdfs dfs -ls /
Found 8 items
drwxr-xr-x   - hdfs hdfs         0 2013-12-11 09:09 /data
drwxr-xr-x   - hdfs supergroup   0 2015-05-04 13:22 /lost+found
drwxrwxrwt   - hdfs hdfs         0 2015-05-20 07:49 /tmp
drwxr-xr-x   - hdfs supergroup   0 2015-05-07 14:38 /user
...
For example, the following command shows all files within a directory ordered by
filenames:
$ hdfs dfs -ls /user/hadoop/testdir1
Alternately, you can specify the HDFS URI when listing files:
$ hdfs dfs -ls hdfs://<hostname>:9000/user/hdfs/dir1/
You can also specify multiple files or directories with the ls command:
$ hdfs dfs -ls /user/hadoop/testdir1 /user/hadoop/testdir2
Listing Just Directories
You can view information that pertains just to directories by passing the -d option:
$ hdfs dfs -ls -d /user/alapati
drwxr-xr-x   - hdfs supergroup   0 2015-05-20 12:27 /user/alapati

Note that when you list HDFS files, each file shows its replication factor. In this case, the file test.txt has a replication factor of 3 (the default replication factor).

$ hdfs dfs -ls /user/alapati/
-rw-r--r--   3 hdfs supergroup  12 2016-05-24 15:44 /user/alapati/test.txt

Using the hdfs stat Command to Get Details about a File


Although the hdfs dfs -ls command lets you get the file information you need, there are times when you need specific bits of information from HDFS. When you run the hdfs dfs -ls command, it returns the complete path of the file. When you want to see only the base name, you can use the hdfs dfs -stat command to view only specific details of a file.
You can format the hdfs dfs -stat command with the following options:
%b Size of file in bytes
%F Will return "file", "directory", or "symlink" depending on the type of inode
%g Group name
%n Filename
%o HDFS Block size in bytes ( 128MB by default )
%r Replication factor
%u Username of owner
%y Formatted mtime of inode
%Y UNIX Epoch mtime of inode
In the following example, I show how to confirm if a file or directory exists.
# hdfs dfs -stat "%n" /user/alapati/messages
messages
If you run the hdfs dfs -stat command against a directory, it tells you that the name you specify is indeed a directory.
$ hdfs dfs -stat "%b %F %g %n %o %r %u %y %Y" /user/alapati/test2222
0 directory supergroup test2222 0 0 hdfs 2015-08-24 20:44:11 1432500251198
$
The following examples show how you can view different types of information with the hdfs dfs -stat command when compared to the hdfs dfs -ls command. Note that I specify all the -stat command options here.

$ hdfs dfs -ls /user/alapati/test2222/true.txt
-rw-r--r--   2 hdfs supergroup  12 2015-08-24 15:44 /user/alapati/test2222/true.txt

$ hdfs dfs -stat "%b %F %g %n %o %r %u %y %Y" /user/alapati/test2222/true.txt
12 regular file supergroup true.txt 268435456 2 hdfs 2015-05-24 20:44:11 1432500251189
$

Website link: http://www.informit.com/articles/article.aspx?p=2755708


Commands used when you work on the NameNode
Starting HDFS
Initially you have to format the configured HDFS file system. Open the namenode (HDFS server) and execute the following command.

$ hadoop namenode -format

After formatting the HDFS, start the distributed file system. The following command will start the namenode as well as the data nodes as a cluster.

$ start-dfs.sh

Listing Files in HDFS


After loading the information into the server, we can find the list of files in a directory, or the status of a file, using 'ls'. Given below is the syntax of ls; you can pass a directory or a filename as an argument.

$ $HADOOP_HOME/bin/hadoop fs -ls <args>

Inserting Data into HDFS


Assume we have data in a file called file.txt in the local system which ought to be saved in the HDFS file system. Follow the steps given below to insert the required file into the Hadoop file system.

Step 1
You have to create an input directory.

$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input

Step 2
Transfer and store a data file from local systems to the Hadoop file system using the put command.

$ $HADOOP_HOME/bin/hadoop fs -put /home/file.txt /user/input

Step 3
You can verify the file using ls command.

$ $HADOOP_HOME/bin/hadoop fs -ls /user/input

Retrieving Data from HDFS


Assume we have a file in HDFS called outfile. Given below is a simple demonstration for retrieving the
required file from the Hadoop file system.

Step 1
Initially, view the data from HDFS using cat command.
$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/outfile

Step 2
Get the file from HDFS to the local file system using get command.

$ $HADOOP_HOME/bin/hadoop fs -get /user/output/ /home/hadoop_tp/

Shutting Down the HDFS


You can shut down the HDFS by using the following command.

$ stop-dfs.sh

There are many more commands in "$HADOOP_HOME/bin/hadoop fs" than are demonstrated here, although these basic operations will get you started. Running ./bin/hadoop dfs with no additional arguments will list all the commands that can be run with the FsShell system.
Furthermore, $HADOOP_HOME/bin/hadoop fs -help commandName will display a short usage summary for the operation in question, if you are stuck.

Data Lake : A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed.
