HDFS and MapReduce Model
ISIT312/912 LECTURE 2
1
Content
1. Hadoop Distributed File System (HDFS)
a) Shell Interface
b) Java Interface
c) Internals
2. The MapReduce model
2
Recall Hadoop’s Core
Components
3
Interacting with HDFS
HDFS provides multiple interfaces to read, write, interrogate, and
manage the filesystem:
Ø The filesystem shell (Command-Line Interface): hadoop fs or
hdfs dfs
Ø The Hadoop Filesystem Java API
Ø Hadoop’s simple Web UI
Ø Other interfaces, such as RESTful proxy interfaces (e.g., HttpFS)
4
Hadoop’s home directory
Commands are issued in the Bash shell:
> which bash
/bin/bash
> cd $HADOOP_HOME
… > ls
bin include libexec logs README.txt share
etc lib LICENSE.txt NOTICE.txt sbin
You will mostly use scripts in the bin and sbin folders, and use jar files
in the share folder.
5
Hadoop Daemons
> jps
28530 SecondaryNameNode
11188 NodeManager
28133 NameNode
28311 DataNode
10845 ResourceManager
3542 Jps
6
HDFS Command-Line Interface
Create a home directory for the HDFS user “bigdata” (already created in the VM):
> bin/hdfs dfs -mkdir -p /user/bigdata
7
HDFS Command-Line Interface
Upload a file to HDFS:
> bin/hadoop fs -put README.txt input
> bin/hadoop fs -ls input
-rw-r--r-- 1 bigdata supergroup 1494 2017-07-12 17:53 input/README.txt
8
Paths in HDFS
The path in HDFS is represented as a URI with the prefix “hdfs://”
For example,
◦ “hdfs://<hostname>:<port>/user/bigdata/input” refers to the “input”
directory in HDFS under the user “bigdata”
◦ “hdfs://<hostname>:<port>/user/bigdata/input/README.txt” refers to the
file “README.txt” in the above “input” directory in HDFS.
When interacting with HDFS in the default setting, one can omit the
hostname, port and user, and simply give the directory or file name.
Thus, the full spelling of “hadoop fs -ls input” is:
“hadoop fs -ls hdfs://<hostname>:<port>/user/bigdata/input”
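For example, on a pseudo-distributed installation where the namenode address is hdfs://localhost:9000 (the hostname and port are installation specific), the following two commands are equivalent:
> hadoop fs -ls input
> hadoop fs -ls hdfs://localhost:9000/user/bigdata/input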
9
Some Usual Commands
Command Description
-put Upload a file (or files) from the local filesystem to HDFS
-mkdir Create a directory in HDFS
-ls List the files in a directory in HDFS
-cat Read the content of a file (or files) in HDFS
-copyFromLocal Copy a file from the local filesystem to HDFS (similar to “put”)
-copyToLocal Copy a file (or files) from HDFS to the local filesystem
-rm Delete a file (or files) in HDFS
-rm -r Delete a directory in HDFS
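For example, continuing with the input directory created earlier (the local destination path is illustrative):
> hadoop fs -cat input/README.txt
> hadoop fs -copyToLocal input/README.txt /tmp/README.txt
> hadoop fs -rm input/README.txt
> hadoop fs -rm -r input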
10
Web Interface of HDFS
11
Web Interface of HDFS
12
Java Interface:
The FileSystem API
A file in a Hadoop filesystem (including HDFS) is represented by a
Hadoop Path object
◦ Its syntax is that of a URI,
◦ e.g., hdfs://localhost:8020/user/bigdata/input/README.txt
13
The FileSystem API: Reading
A Configuration object is determined by the Hadoop configuration files
or user-provided parameters.
Using the default configuration, one can simply set
◦ Configuration conf = new Configuration()
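A minimal reading sketch under the default configuration (using the README.txt uploaded earlier; the classes are from org.apache.hadoop.conf and org.apache.hadoop.fs, and the FileSystemCat class on the next slide follows the same pattern):
Configuration conf = new Configuration();                        // default configuration
FileSystem fs = FileSystem.get(conf);                            // the default filesystem (HDFS here)
FSDataInputStream in = fs.open(new Path("input/README.txt"));    // open() returns a seekable input stream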
14
A File Reading Application
Putting it all together, we can create the following class:
public class FileSystemCat {
  public static void main(String[] args) throws Exception {
    String uri = args[0];                                   // HDFS path, e.g. input/README.txt
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    FSDataInputStream in = fs.open(new Path(uri));
    IOUtils.copyBytes(in, System.out, 4096, true);          // copy to stdout, then close the stream
  }
}
15
Compiling and Running APP in
Hadoop
The compilation simply uses the “javac” command, but the Hadoop
dependencies must be on the classpath.
> export HADOOP_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
> javac -cp $HADOOP_CLASSPATH FileSystemCat.java
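One way to run the compiled class (a sketch; it assumes the class file is in the current directory and that input/README.txt exists in HDFS):
> java -cp .:$HADOOP_CLASSPATH FileSystemCat input/README.txt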
16
The FileSystem API: Write
Suppose an input stream is created to read a local file.
To write a file on HDFS, the simplest way is to take a Path object for the
file to be created and return an output stream to write to:
public FSDataOutputStream create(Path f) throws IOException
And then just copy the input stream to the output stream.
Another (more flexible) way is to read the input stream into a buffer
and then write to the output stream.
17
A File Writing Application
public class FileSystemPut {
  // ... same Configuration/FileSystem/Path setup as FileSystemPutAlt on the next slide ...
  FSDataInputStream in = local.open(localFile);
  FSDataOutputStream out = hdfs.create(hdfsFile);
  IOUtils.copyBytes(in, out, 4096, true);   // copy the input stream to the output stream, then close both
}
18
Another File Writing
Application
public class FileSystemPutAlt {
public static void main(String[] args) throws Exception {
String localStr = args[0];
String hdfsStr = args[1];
Configuration conf = new Configuration();
FileSystem local = FileSystem.getLocal(conf);
FileSystem hdfs = FileSystem.get(URI.create(hdfsStr), conf);
Path localFile = new Path(localStr);
Path hdfsFile = new Path(hdfsStr);
FSDataInputStream in = local.open(localFile);
FSDataOutputStream out = hdfs.create(hdfsFile);
byte[] buffer = new byte[256];
int bytesRead = 0;
while ((bytesRead = in.read(buffer)) > 0) {
out.write(buffer, 0, bytesRead);
}
in.close();
out.close();
}
}
19
Other FileSystem API
§ The method mkdirs() creates a directory
§ The method getFileStatus() gets the meta information for a single file
or directory
§ The method listStatus() lists the files in a directory
§ The method exists() checks whether a file exists
§ The method delete() removes a file
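A minimal sketch exercising these methods (assuming fs is a FileSystem object obtained as in the earlier examples, the input directory from the earlier slides, and FileStatus from org.apache.hadoop.fs):
FileStatus[] statuses = fs.listStatus(new Path("input"));          // one FileStatus per entry
for (FileStatus status : statuses) {
    System.out.println(status.getPath() + "  " + status.getLen()); // path and length in bytes
}
if (fs.exists(new Path("input/README.txt"))) {
    fs.delete(new Path("input/README.txt"), false);                // false: do not delete recursively
}
fs.mkdirs(new Path("newdir"));                                     // create a new directory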
20
Read Data in HDFS: What
Happens Inside
21
Read Data in HDFS
Step 1: The client opens the file it
wishes to read by calling open() on
the FileSystem object, which for
HDFS is an instance of
DistributedFileSystem.
Step 2: DistributedFileSystem calls
the namenode, using remote
procedure calls (RPCs), to determine
the locations of the first few blocks
in the file.
Step 3: DistributedFileSystem returns
an FSDataInputStream to the client,
which then calls read() on the stream.
22
Read Data in HDFS
Step 4: FSDataInputStream
connects to the first datanode for
the first block in the file, and then
data is streamed from the
datanode back to the client, by
calling read() repeatedly on the
stream.
Step 5: When the end of the block
is reached, FSDataInputStream will
close the connection to the
datanode, then find the best
(possibly the same) datanode for
the next block.
Step 6: When the client has finished reading, it calls close() on the
FSDataInputStream.
23
Write Data in HDFS
24
Write Data In HDFS
Step 1: The client creates the file by
calling create() on
DistributedFileSystem.
Step 2: DistributedFileSystem makes
an RPC call to the namenode to
create a new file in the filesystem’s
namespace and returns an
FSDataOutputStream for the client
to start writing data to.
Step 3: The client writes data into
the FSDataOutputStream.
Step 4: Data written to the FSDataOutputStream is split into packets, which
are pushed onto a data queue; the packets are streamed to a datanode and
forwarded to the other (usually two) datanodes in the pipeline.
25
Write Data In HDFS
Step 5: When FSDataOutputStream
receives an acknowledgement from
the datanodes, the data packets are
removed from the queue.
Step 6: When the client has
finished writing data, it calls
close() on the stream.
Step 7: The client signals the
namenode that the writing is
completed.
26
The MapReduce
Model
27
Key-Value Pairs: MapReduce’s
Basic Data Model
Key            Value
City           Sydney
Employee ID    Employee Albot’s profile
28
MapReduce Model
(Diagram: Map → Partition → Reduce; the Map and Reduce functions are implemented by the developer.)
29
An Abstract MapReduce
Program: WordCount
function map(Long lineNo, String line):
// lineNo: the position no. of a line in the text
// line: a line of text
for each word w in line:
emit (w, 1)
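The corresponding reduce function (a sketch in the same abstract style) sums the 1’s emitted for each word:
function reduce(String word, List<Integer> counts):
// word: a word emitted by map
// counts: the list of 1’s emitted for that word
sum = 0
for each c in counts:
sum += c
emit (word, sum)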
30
The MapReduce Model
31
Map Phase
ØMap Phase uses input format and record reader functions to derive
records in the form of key-value pairs for the input data
ØMap Phase applies a function or functions to each key-value pair over a
portion of the dataset
vIn the case of a dataset hosted in HDFS, this portion is usually called
a block.
vIf there are n blocks of data in the input dataset, there will be at
least n Map tasks (also referred to as Mappers).
32
Map Phase
Each Map task operates against
one filesystem (HDFS) block.
As illustrated, a Map task will call
its map() function, represented by
M in the diagram, once for each
record, or key-value pair; for
example, rec1, rec2, and so on.
33
Map Phase
Each call of the map() function
accepts one key-value pair and
emits (or outputs) zero or more
key-value pairs:
map (in_key, in_value) →
list (itm_key, itm_value)
The emitted data from Mapper,
also in the form of lists of key-value
pairs, will be subsequently
processed in the Reduce phase.
Different Mappers do not
communicate or share data with
each other!
34
Examples of Map Functions
Common map() functions include filtering of specific keys, such as
filtering log messages if you only want to count or analyse ERROR log
messages:
let map (k, v) = if (ERROR in v) then emit (k, v)
Another example of a map() function would be to manipulate values,
such as a map() function that converts a text value to lowercase:
let map (k, v) = emit (k, v.toLowerCase())
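The same filtering idea written against Hadoop’s Java MapReduce API (a sketch; the class name ErrorFilterMapper is illustrative):
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ErrorFilterMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // emit the record only if the line contains "ERROR"
        if (value.toString().contains("ERROR")) {
            context.write(key, value);
        }
    }
}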
35
Partitioning Function
ØPartition function, or Partitioner, ensures each key and its list of values
is passed to one and only one Reduce task or Reducer
ØThe number of partitions is determined by the (default or user-
defined) number of Reducers
ØCustom Partitioners are developed for various practical purposes
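A sketch of a custom Partitioner in Hadoop’s Java API (the class name and routing rule are illustrative); it would be registered on the job with job.setPartitionerClass(...):
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// All keys starting with "ERROR" go to reducer 0; other keys are hashed over the reducers.
public class ErrorPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.toString().startsWith("ERROR")) {
            return 0;
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}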
36
Reduce Phase
ØInput of the Reduce phase is the output of the Map phase (via shuffle-
and-sort)
ØEach Reduce task (or Reducer) executes a reduce() function for each
intermediate key and its list of associated intermediate values.
ØThe output from each reduce() function is zero or more key-value pairs:
reduce (intermediate_key, list (intermediate_value))
→ (out_key, out_value)
ØNote that, in reality, the output from Reducer may be the input of
another Map phase in a complex multistage computational workflow.
37
Example of Reduce Functions
The simplest and most common reduce() function is the summation,
which simply sums a list of values for each key:
let reduce (k, list<v>) = {
sum = 0
for each i in list<v>:
sum += i
emit (k, sum) }
A count operation is as simple as summing a set of numbers
representing instances of the values you wish to count.
Other examples of the reduce() function: max/min and average
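The same summation written against Hadoop’s Java API (a sketch; the class name is illustrative):
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {     // sum the list of values for this key
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}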
38
Shuffle and Sort
ØShuffle-and-sort is the process where data are transferred from
Mapper to Reducer
q It is “the heart of MapReduce where the ‘magic’ happens.”
39
Shuffle and Sort in
MapReduce
40
Combine Phase
Suppose the Reduce function is a sum.
41
Combine Function
ØIf the Reduce function is commutative and associative, it can be
performed before the Shuffle-and-Sort phase. In this case, the Reduce
function is called a Combiner function.
Ø E.g., sum (or count) is commutative and associative, but average is not.
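In Hadoop, a Combiner is configured on the job; for a sum it is common to reuse the Reducer class (assuming a sum Reducer such as the IntSumReducer sketched earlier and a Job object named job):
job.setCombinerClass(IntSumReducer.class);   // applied to map output before shuffle-and-sort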
42
Map-Only MapReduce
A MapReduce application may contain zero Reduce tasks. In this case, it
is a map-only application.
Examples of map-only MapReduce jobs:
◦ ETL routines without data
summarization, aggregation
and reduction
◦ File format conversion jobs
◦ Image processing jobs
43
An Election Analogy for
MapReduce
44
MapReduce Example:
Average Contact Number
For a database of 1 billion people, compute the average number of
social contacts a person has according to age.
In SQL-like language:
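(A sketch of such a query; the column names are assumptions:)
SELECT age, AVG(contacts)
FROM social.person
GROUP BY age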
45
MapReduce Example:
Average Contact Number
Now suppose these records are stored in different datanodes. In
MapReduce:
function Map is
input: integer K between 1 and 1000 // thus each integer
representing a batch of 1 million social.person records
for each social.person record in the K-th batch do
produce one output record (Y,(N,1))
where Y is the person's age
and N is the number of contacts that the person has
end function
46
MapReduce Example:
Average Contact Number
function Reduce is
input: age (in years) Y, number of contacts N, count C
for each input record (Y,(N,C)) do
Accumulate in S the sum of N*C
Accumulate in D the sum of C
produce one output record (Y, S/D)
end function
MapReduce sends the codes to the location of each data batch (not the other
way around)
Question: the output from Map is multiple copies of (Y, (N, 1)), but the input to
Reduce is (Y, (N, C)), so what fills the gap?
47
Submit A MapReduce
Application to Hadoop
A MapReduce application in Hadoop is a Java implementation of the
MapReduce model for a specific problem (e.g., word count).
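For example, the WordCount program in Hadoop’s bundled examples jar can be submitted as follows (the exact jar file name depends on the Hadoop version; input and output are HDFS paths):
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount input output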
48
Sample run on the Screen
49
50
Behind the Screen: Running of
MapReduce Jobs
Client: submits an MR Job
YARN resource manager:
coordinates the allocation of
computing resources in the
cluster
YARN node manager(s): launch & monitor
containers on machines in the cluster.
MapReduce application master: runs in a
container, and coordinates the tasks in a
MapReduce job.
HDFS: used for sharing job files between
the other entities.
51
Summary
How to interact with Hadoop’s storage system HDFS
◦ Command-Line Interface: the hadoop fs (and hdfs dfs) commands
◦ Java API: Read, write and other operations
The MapReduce model and its implementation in
Hadoop
◦ Map stage, Reduce stage, Shuffle and Sort, Partitioner, Combiner
52