CSE545 Sp23 (3) Hadoop MapReduce 2-13


“Hadoop”

A Distributed Architecture, FileSystem, & MapReduce

H. Andrew Schwartz

CSE545
Spring 2023

(freesvg.org/1534373472)
Big Data Analytics, The Class
Goal: Generalizations
A model or summarization of the data.

Data Workflow Frameworks: Hadoop File System, Spark, Streaming, MapReduce, Deep Learning Frameworks

Analytics and Algorithms: Similarity Search, Hypothesis Testing, Transformers/Self-Supervision, Recommendation Systems, Link Analysis
Big Data Analytics, The Class
[Figure: "Big Data Analytics" at the center, surrounded by Workflow Systems, Algorithms, Statistical Methods, and Distributed Tools]
Classical Data Analytics
[Figure: data on a single machine with CPU, Memory (64 GB), and Disk]
IO Bounded
Reading a word from disk versus main memory: 10^5 times slower!
Reading many contiguously stored words is faster per word, but fast modern disks still only reach ~1 GB/s for sequential reads.

IO Bound: the biggest performance bottleneck is reading / writing to disk.

Starts around 500 GB: over 8 minutes just to read

500 TB: over 8,000 minutes, roughly 6 days
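The read times above follow from simple division. A minimal sketch, assuming ~1 GB/s sequential throughput and decimal units (1 GB = 10^9 bytes); `read_time_seconds` is an illustrative helper, not a library function:

```python
# Sequential read time at a fixed throughput.
# Assumptions: ~1 GB/s (the slide's figure for a fast disk), decimal units.
GB = 10**9
TB = 10**12


def read_time_seconds(num_bytes, bytes_per_second=1 * GB):
    """Seconds needed to stream num_bytes from disk at bytes_per_second."""
    return num_bytes / bytes_per_second


for label, size in [("500 GB", 500 * GB), ("500 TB", 500 * TB)]:
    secs = read_time_seconds(size)
    print(f"{label}: {secs / 60:,.1f} minutes = {secs / 86400:.2f} days")
```

Under these assumptions 500 GB takes 500 s (about 8.3 minutes) and 500 TB takes 500,000 s (about 8,333 minutes, or 5.8 days); halving the throughput doubles both.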
Classical Big Data
[Figure: a single machine with CPU, Memory, and Disk]
Classical focus: efficient use of disk (e.g. Apache Lucene / Solr).
Classical limitation: still bounded when needing to process all of a large file.
How to solve?
Distributed Architecture
[Figure: racks of nodes, each node with a CPU, memory, and disk. Nodes within a rack (Rack 1, Rack 2, ...) connect through a rack switch (~1 Gbps); the rack switches connect through a top-level switch (~10 Gbps).]
Distributed Architecture
In reality, modern setups often have multiple CPUs and disks per server, but we will model the cluster as if there were one machine per CPU-disk pair.
[Figure: one rack of nodes, each with CPU, memory, and disk, connected by a ~1 Gbps switch]
Distributed Architecture (Cluster)
Challenges for IO Cluster Computing
1. Nodes fail
1 in 1000 nodes fail a day
Duplicate Data
2. Network is a bottleneck
Typically 1-10 Gb/s throughput
Bring computation to nodes, rather than data to nodes.
3. Traditional distributed programming is often ad-hoc and complicated
Stipulate a programming system that can easily be distributed
HDFS with MapReduce accomplishes all!
Distributed Filesystem

The effectiveness of MapReduce, Spark, and other distributed processing systems is in part simply due to the use of a distributed filesystem!
Distributed Filesystem
Characteristics for Big Data Tasks
Large files (e.g. >100 GB to TBs)
Reads are most common
No need to update in place (append preferred)
[Figure: a single machine with CPU, Memory, and Disk]
Distributed Filesystem
(e.g. Apache HadoopDFS, GoogleFS, EMRFS)

C, D: Two different files
[Figure: files C and D stored across the cluster (https://opensource.com/life/14/8/intro-apache-hadoop-big-data)]

“Hadoop” was named after a toy elephant belonging to Doug Cutting’s son. Cutting was one of Hadoop’s creators.
Distributed Filesystem
(e.g. Apache HadoopDFS, GoogleFS, EMRFS)

C, D: Two different files; break into chunks (or "partitions"):

C0 C1 C2 C3 C4 C5
D0 D1 D2 D3 D4 D5
Distributed Filesystem
(e.g. Apache HadoopDFS, GoogleFS, EMRFS)

C, D: Two different files
[Figure: chunks of C and D spread across chunk server 1, chunk server 2, chunk server 3, ..., chunk server n (Leskovec et al., 2014; http://www.mmds.org/)]

Components of a Distributed Filesystem
Chunk servers (on Data Nodes)
File is split into contiguous chunks
Typically each chunk is 16-64 MB
Each chunk replicated (usually 2x or 3x)
Try to keep replicas in different racks
Name node (aka master node)
Stores metadata about where files are stored
Might be replicated or distributed across data nodes.

(Leskovec et al., 2014; http://www.mmds.org/)
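To make the chunk-server and name-node roles concrete, here is a sketch assuming 64 MB chunks (the top of the 16-64 MB range), 3x replication, and a made-up two-rack cluster. The rotate-across-racks placement rule and the node names are illustrative assumptions, not HDFS's actual placement policy:

```python
# Sketch: split a file into contiguous chunks and record, as a name node
# would, which (rack, node) pairs hold each chunk's replicas.
# Cluster layout and placement rule are hypothetical.
from itertools import cycle

MB = 2**20


def split_into_chunks(file_size, chunk_size):
    """(offset, length) of each contiguous chunk; the last may be shorter."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]


def place_replicas(num_chunks, racks, replication=3):
    """Map chunk index -> list of (rack, node) holding a replica.

    Consecutive replicas of a chunk rotate through the racks, so with at
    least two racks, losing one whole rack never loses every copy.
    """
    rack_names = list(racks)
    next_node = {rack: cycle(nodes) for rack, nodes in racks.items()}
    metadata = {}
    for i in range(num_chunks):
        replicas = []
        for j in range(replication):
            rack = rack_names[(i + j) % len(rack_names)]
            replicas.append((rack, next(next_node[rack])))
        metadata[i] = replicas
    return metadata


chunks = split_into_chunks(200 * MB, 64 * MB)          # a 200 MB file
racks = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
name_node = place_replicas(len(chunks), racks)
print(chunks)
print(name_node[0])
```

`name_node` plays the role of the name node's metadata: it records only where each chunk lives, while the chunk bytes themselves would sit on the data nodes.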

Distributed Architecture (Cluster)
Challenges for IO Cluster Computing
1. Nodes fail
1 in 1000 nodes fail a day
Duplicate Data (Distributed FS)
2. Network is a bottleneck
Typically 1-10 Gb/s throughput
Bring computation to nodes, rather than
data to nodes.
3. Traditional distributed programming is
often ad-hoc and complicated
Stipulate a programming system that
can easily be distributed
What is MapReduce
noun.1 - A style of programming

input chunks => map tasks | group_by keys | reduce tasks => output

“|” is the Linux “pipe” symbol: passes the output (stdout) of the first process to the input (stdin) of the next.

E.g. counting words:

cat file.txt | tr -s '[:space:]' '\n' | sort | uniq -c

noun.2 - A system that distributes MapReduce-style programs across a distributed file system.

(e.g. Google’s internal “MapReduce” or apache.hadoop.mapreduce with HDFS)
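The word-count pipe can be mirrored stage by stage in Python on a single machine. This sketch uses `str.split()` as a stand-in tokenizer (an assumption) and `itertools.groupby`, which, like `uniq -c`, only merges adjacent equal items and therefore needs the preceding sort:

```python
# Single-machine analogue of:
#   cat file.txt | tr -s '[:space:]' '\n' | sort | uniq -c
from itertools import groupby


def tokenize(text):
    # stand-in for tr: split on any run of whitespace
    return text.split()


def word_count_pipeline(text):
    words = tokenize(text)                  # map: text -> words
    words.sort()                            # sort: bring equal keys together
    return [(w, len(list(group)))           # uniq -c: count each run
            for w, group in groupby(words)]


print(word_count_pipeline("the cat saw the dog\nthe end"))
```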

What is MapReduce
Map: extract what you care about. line => (k, v)
Sort and shuffle: many (k, v) => (k, [v1, v2]), ...
Reduce: aggregate, summarize.
What is MapReduce
Easy as 1, 2, 3!
Step 1: Map    Step 2: Sort / Group by    Step 3: Reduce
[Figures: (1) The Map Step, (2) The Sort / Group-by Step, (3) The Reduce Step (Leskovec et al., 2014; http://www.mmds.org/)]

What is MapReduce
Map: (k, v) -> (k', v')*    (written by programmer)

Group by key: (k1', v1'), (k2', v2'), ... -> (k1', (v1', v', ...)), (k2', (v1', v', ...)), ...    (system handles)

Reduce: (k', (v1', v', ...)) -> (k', v'')*    (written by programmer)

(* = zero or more pairs)
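The three signatures can be exercised with a small single-process driver. `run_mapreduce` is a hypothetical helper written for illustration; a real MapReduce system runs each phase in parallel across nodes and does the grouping through the sort and shuffle:

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def run_mapreduce(inputs: Iterable[Pair],
                  map_fn: Callable[[Any, Any], Iterable[Pair]],
                  reduce_fn: Callable[[Any, List[Any]], Iterable[Pair]]) -> List[Pair]:
    """Single-process stand-in for a MapReduce system (hypothetical helper)."""
    groups = defaultdict(list)
    for k, v in inputs:                        # Map: (k, v) -> (k', v')*
        for k2, v2 in map_fn(k, v):
            groups[k2].append(v2)              # Group by key (system's job)
    output = []
    for k2 in sorted(groups):                  # keys visited in sorted order
        output.extend(reduce_fn(k2, groups[k2]))  # Reduce: (k', [v']) -> (k', v'')*
    return output


# Usage: word count, with reduce emitting a single (word, total) pair.
docs = [("doc1", "a b a"), ("doc2", "b c")]
counts = run_mapreduce(
    docs,
    lambda k, v: ((w, 1) for w in v.split()),
    lambda k, vs: [(k, sum(vs))],
)
print(counts)
```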
Example: Word Count
tokenize(document) | sort | uniq -c

Map: extract what you care about.  Sort and shuffle.  Reduce: aggregate, summarize.
[Figure: word count over input chunks (Leskovec et al., 2014; http://www.mmds.org/)]
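Example: Word Count
from abc import ABC, abstractmethod

class MapReduceJob(ABC):  # hypothetical name for the job interface
    @abstractmethod
    def map(self, k, v):
        pass  # yield zero or more (k', v') pairs

    @abstractmethod
    def reduce(self, k, vs):
        pass  # aggregate all values vs that share key k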
Example: Word Count (v1)
def tokenize(s):  # simple version
    return s.split()

def map(k, v):
    for w in tokenize(v):
        yield (w, 1)

def reduce(k, vs):
    return (k, len(vs))
Example: Word Count (v2)
def map(k, v):
    counts = dict()
    for w in tokenize(v):  # count each word within the chunk
        try:  # try/except is typically faster than "if w in counts"
            counts[w] += 1
        except KeyError:
            counts[w] = 1
    for item in counts.items():
        yield item

def reduce(k, vs):
    # sum of counts from different chunks
    return (k, sum(vs))
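Running both versions on the same made-up chunks shows why v2 helps: it emits one pair per distinct word per chunk instead of one per occurrence, so fewer pairs cross the network. The `run` helper and the chunk contents are assumptions for illustration:

```python
from collections import defaultdict


def tokenize(s):
    return s.split()


def map_v1(k, v):                 # one (word, 1) pair per occurrence
    for w in tokenize(v):
        yield (w, 1)


def map_v2(k, v):                 # one (word, count) pair per distinct word
    counts = dict()
    for w in tokenize(v):
        try:
            counts[w] += 1
        except KeyError:
            counts[w] = 1
    yield from counts.items()


def reduce_sum(k, vs):
    return (k, sum(vs))


def run(map_fn, chunks):
    """Return (sorted reduce output, number of pairs sent to the shuffle)."""
    groups = defaultdict(list)
    shuffled = 0
    for k, v in chunks:
        for w, c in map_fn(k, v):
            groups[w].append(c)
            shuffled += 1
    return sorted(reduce_sum(w, cs) for w, cs in groups.items()), shuffled


chunks = [("chunk0", "to be or not to be"), ("chunk1", "to be is to do")]
out1, pairs1 = run(map_v1, chunks)
out2, pairs2 = run(map_v2, chunks)
print(pairs1, pairs2)
```

The reducer sums the values, which is correct for both mappers; a reducer returning `len(vs)` would only be correct for v1, because v2's values are per-chunk counts rather than 1s.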
Distributed Architecture (Cluster)
Challenges for IO Cluster Computing
1. Nodes fail
1 in 1000 nodes fail a day
Duplicate Data (Distributed FS)
2. Network is a bottleneck
Typically 1-10 Gb/s throughput
Bring computation to nodes, rather than data to nodes. (Sort and Shuffle)
3. Traditional distributed programming is often ad-hoc and complicated
Stipulate a programming system that can easily be distributed (Simply define a map and reduce)
Example: Relational Algebra

Select

Project

Union, Intersection, Difference

Natural Join

Grouping
Example: Relational Algebra

Select

R(A1,A2,A3,...), Relation R, Attributes A*

return only those attribute tuples where condition C is true

def map(k, v):  # v is list of attribute tuples
    for t in v:
        if C(t):  # C: the selection condition, given as a predicate on tuples
            yield (t, t)

def reduce(k, vs):
    for v in vs:
        yield (k, v)
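A runnable version of Select, assuming the condition C is passed in as a Python predicate. The relation, the condition `salary_over_50`, and the `run` driver are hypothetical examples:

```python
from collections import defaultdict


def make_select_map(C):
    """Build the Select mapper for a condition C (a predicate on tuples)."""
    def select_map(k, v):           # v: list of attribute tuples in one chunk
        for t in v:
            if C(t):
                yield (t, t)
    return select_map


def select_reduce(k, vs):           # identity: pass each pair through
    for v in vs:
        yield (k, v)


def run(map_fn, reduce_fn, inputs):
    groups = defaultdict(list)
    for k, v in inputs:
        for k2, v2 in map_fn(k, v):
            groups[k2].append(v2)
    return [out for k2, vs in groups.items() for out in reduce_fn(k2, vs)]


# Hypothetical relation R(name, dept, salary), stored as two chunks.
R_chunks = [("chunk0", [("ann", "cs", 90), ("bob", "ee", 40)]),
            ("chunk1", [("cat", "cs", 70), ("dan", "me", 55)])]
salary_over_50 = lambda t: t[2] > 50   # the condition C
selected = [t for t, _ in run(make_select_map(salary_over_50), select_reduce, R_chunks)]
print(selected)
```

Because the mapper already filters, the reducer only passes pairs through; Select could also be run with the work done entirely on the map side.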
Example: Relational Algebra
Natural Join
Given R1 and R2 return Rjoin
-- union of all pairs of tuples that match given attributes.
def map(k, v):  # k in {'R1', 'R2'}, v is (A, B) for R1, (B, C) for R2
                # B are the matched attributes
    if k == 'R1':
        (a, b) = v
        return (b, ('R1', a))
    if k == 'R2':
        (b, c) = v
        return (b, ('R2', c))

def reduce(k, vs):
    r1, r2, rjn = [], [], []
    for (s, x) in vs:  # separate rs
        if s == 'R1': r1.append(x)
        else: r2.append(x)
    for a in r1:  # join as tuple
        for c in r2:
            rjn.append(('Rjoin', (a, k, c)))  # k is b
    return rjn
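A runnable check of the join on tiny made-up relations. The mapper here uses `yield` so that it fits a generic driver; `run` and the data are illustrative assumptions:

```python
from collections import defaultdict


def join_map(k, v):            # k in {'R1', 'R2'}; v is one tuple
    if k == 'R1':
        a, b = v
        yield (b, ('R1', a))
    elif k == 'R2':
        b, c = v
        yield (b, ('R2', c))


def join_reduce(k, vs):        # k is the shared B value
    r1 = [x for rel, x in vs if rel == 'R1']
    r2 = [x for rel, x in vs if rel == 'R2']
    for a in r1:
        for c in r2:
            yield ('Rjoin', (a, k, c))


def run(map_fn, reduce_fn, inputs):
    groups = defaultdict(list)
    for k, v in inputs:
        for k2, v2 in map_fn(k, v):
            groups[k2].append(v2)
    return [out for k2, vs in groups.items() for out in reduce_fn(k2, vs)]


R1 = [(1, 'x'), (2, 'y'), (3, 'x')]            # R1(A, B), made-up data
R2 = [('x', 'p'), ('x', 'q'), ('z', 'r')]      # R2(B, C), made-up data
inputs = [('R1', t) for t in R1] + [('R2', t) for t in R2]
joined = sorted(t for _, t in run(join_map, join_reduce, inputs))
print(joined)
```

B values that occur in only one relation ('y' and 'z' here) produce no output, as expected for a natural join.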
Data Flow
[Figure: map tasks and reduce tasks (both written by the programmer), with map output routed to reduce tasks by a hash of the key (Leskovec et al., 2014; http://www.mmds.org/)]

DFS -> Map -> Map's Local FS -> Reduce -> DFS
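The hash step between Map and Reduce can be sketched as follows: each map task splits its output into R partitions on its local disk using hash(key) mod R, and reduce task r fetches partition r from every map task. The sketch uses `zlib.crc32` as a deterministic hash (Python's built-in `hash()` of strings is randomized per process); R = 3 and the helper names are arbitrary assumptions:

```python
import zlib
from collections import defaultdict


def partition(key, R):
    """Deterministic hash(key) mod R."""
    return zlib.crc32(str(key).encode("utf-8")) % R


def map_side_partition(pairs, R):
    """One map task's output split into R 'local files' (lists here)."""
    files = [[] for _ in range(R)]
    for k, v in pairs:
        files[partition(k, R)].append((k, v))
    return files


def reduce_side_fetch(all_map_outputs, r):
    """Reduce task r gathers partition r from every map task, grouped by key."""
    groups = defaultdict(list)
    for files in all_map_outputs:
        for k, v in files[r]:
            groups[k].append(v)
    return dict(groups)


R = 3
map_outputs = [map_side_partition([("a", 1), ("b", 1), ("a", 1)], R),
               map_side_partition([("b", 1), ("c", 1)], R)]
reduce_inputs = [reduce_side_fetch(map_outputs, r) for r in range(R)]
print(reduce_inputs)
```

Since every map task uses the same function, all pairs for a given key land at the same reduce task, which is what makes the group-by correct.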

Data Flow

MapReduce system handles:

● Partitioning
● Scheduling map / reducer execution
● Group by key

● Restarts from node failures

● Inter-machine communication
Data Flow

DFS -> MapReduce -> DFS

● Schedule map tasks near physical storage of chunk

● Intermediate results stored locally
● Master / Name Node coordinates
○ Task status: idle, in-progress, complete
○ Receives location of intermediate results and schedules them with reducers
○ Checks nodes for failures and restarts when necessary
■ All map tasks on a failed node must be completely restarted (their intermediate results were on that node's local disk)
■ Only the failed reduce tasks need to be restarted
Data Flow

DFS -> MapReduce -> DFS -> MapReduce -> DFS
(the output of one MapReduce job can be the input of the next)

Data Flow

Skew: The degree to which certain tasks end up taking much longer than others.

Handled with:

● More reducers (i.e. distinct keys) than reduce tasks

● More reduce tasks than nodes
Data Flow

Key Question: How many Map and Reduce tasks?

M: map tasks, R: reduce tasks

Answer:
1) If possible, one chunk per map task (maximizes flexibility for scheduling)
2) M >> |nodes| ≈ |cores| (better handling of node failures, better load balancing)
3) R <= M (reduces number of parts stored in DFS)
Data Flow Tasks (Map Task or Reduce Task)
[Figure: task timelines on node1 through node5, each task drawn by its time to complete (some tasks take much longer).
version 1: few reduce tasks (same number of reduce tasks as nodes); the last task completes late.
version 2: more reduce tasks (more reduce tasks than nodes); these tasks can be redistributed to other nodes, so the last task now completes much earlier.]
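The effect in the figure can be reproduced with a toy greedy scheduler that hands each task to whichever node frees up first. The durations are made up, and splitting each version 1 task into three equal pieces assumes its work spans many keys; a single huge key could not be split this way:

```python
import heapq


def makespan(task_durations, num_nodes):
    """Greedy list scheduling: each task, in order, goes to the node that
    becomes free first. Returns the time at which the last task finishes."""
    free_at = [(0, node) for node in range(num_nodes)]  # (time free, node id)
    heapq.heapify(free_at)
    for d in task_durations:
        t, node = heapq.heappop(free_at)
        heapq.heappush(free_at, (t + d, node))
    return max(t for t, _ in free_at)


nodes = 5
v1 = [12, 3, 3, 3, 3]                         # one task per node; one is slow
v2 = [d // 3 for d in v1 for _ in range(3)]   # same work, 3x as many tasks
print(makespan(v1, nodes), makespan(v2, nodes))
```

With five nodes, the last task finishes at time 12 in version 1 but at time 5 in version 2, for the same total work of 24.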
Communication Cost Model

How to assess performance?

(1) Computation: Map + Reduce + System Tasks
● Mappers and reducers often single pass O(n) within node
● System: sorting the keys is usually most expensive
● Even if map executes on same node, disk read usually dominates
● In any case, can add more nodes

(2) Communication: Moving (key, value) pairs

Ultimate Goal: wall-clock Time.

Communication Cost = input size + (sum of size of all map-to-reducer files)

Often dominates computation.
● Connection speeds: 1-10 gigabits per sec; HD read: 50-150 megabytes per sec
● Even reading from disk to memory typically takes longer than operating on the data.
● Output from reducer ignored because it’s either small (finished summarizing data) or being passed to another MapReduce job.
Communication Cost: Natural Join

R1, R2: Relations (Tables) R1(A, B) ⨝ R2(B, C)

Communication Cost = input size + (sum of size of all map-to-reducer files)
= |R1| + |R2| + (|R1| + |R2|)
= O(|R1| + |R2|)

DFS -> Map -> LocalFS -> Network -> Reduce -> DFS
(Anytime MapReduce would need to write to and read from disk a lot.)

def map(k, v):
    if k == 'R1':
        (a, b) = v
        yield (b, ('R1', a))
    if k == 'R2':
        (b, c) = v
        yield (b, ('R2', c))

def reduce(k, vs):
    r1, r2 = [], []
    for (rel, x) in vs:  # separate rs
        if rel == 'R1': r1.append(x)
        else: r2.append(x)
    for a in r1:  # join as tuple
        for c in r2:
            yield ('Rjoin', (a, k, c))  # k is b
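The cost formula can be checked by counting tuples rather than bytes (a simplification). `communication_cost` and the relations are illustrative assumptions; because the mapper emits exactly one pair per input tuple, the count comes out to 2(|R1| + |R2|):

```python
def join_map(k, v):            # mapper tagged by relation, one pair per tuple
    if k == 'R1':
        a, b = v
        yield (b, ('R1', a))
    elif k == 'R2':
        b, c = v
        yield (b, ('R2', c))


def communication_cost(inputs, map_fn):
    """Input size + number of (key, value) pairs shipped map -> reduce,
    both counted in tuples instead of bytes."""
    input_size = len(inputs)
    shuffled = sum(1 for k, v in inputs for _ in map_fn(k, v))
    return input_size + shuffled


R1 = [(1, 'x'), (2, 'y'), (3, 'x')]                    # made-up R1(A, B)
R2 = [('x', 'p'), ('x', 'q'), ('z', 'r'), ('y', 's')]  # made-up R2(B, C)
inputs = [('R1', t) for t in R1] + [('R2', t) for t in R2]
print(communication_cost(inputs, join_map))
```

Reducer output is not counted, following the convention above, even though a join's output can be as large as |R1| × |R2| when every tuple shares one B value.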
MapReduce: Final Considerations
● Performance Refinements:
○ Combiners (like word count version 2 but done via reduce)
■ Run reduce right after map on the same node, before passing to reduce (the map task can execute it)
■ Reduces communication cost but requires commutative and associative reduce steps
○ Backup tasks (aka speculative tasks)
■ Schedule multiple copies of tasks when close to the end to mitigate certain nodes running slow.

○ Override partition hash function to organize data

E.g. instead of hash(url) use hash(hostname(url))
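Two of these refinements in miniature, as sketches rather than Hadoop's API: a combiner that applies the reduce logic to one map task's output, showing why it needs a commutative and associative reduce, and a partitioner that hashes the hostname instead of the full URL. The data and helper names are made up:

```python
import zlib
from collections import defaultdict
from urllib.parse import urlsplit


def combine(map_output, reduce_values):
    """Combiner: apply the reduce logic to one map task's local output."""
    local = defaultdict(list)
    for k, v in map_output:
        local[k].append(v)
    return [(k, reduce_values(vs)) for k, vs in local.items()]


# (1a) Sum is commutative and associative: combining first is safe.
task_a = [("x", 1), ("x", 1), ("y", 1)]
task_b = [("x", 1)]
shuffled = combine(task_a, sum) + combine(task_b, sum)  # 3 pairs instead of 4
total_x = sum(v for k, v in shuffled if k == "x")


# (1b) Mean is not associative: a mean of per-task means is wrong.
def mean(vs):
    return sum(vs) / len(vs)


vals_a, vals_b = [1, 2, 3], [10]
true_mean = mean(vals_a + vals_b)
mean_of_means = mean([mean(vals_a), mean(vals_b)])


# (2) Partition by hostname so all pages of one site reach the same reducer.
def host_partition(url, num_reducers):
    host = urlsplit(url).hostname or ""
    return zlib.crc32(host.encode("utf-8")) % num_reducers


print(total_x, true_mean, mean_of_means)
```

Here the combined sums still give the correct total for "x" (3), while the mean of means (6.0) differs from the true mean (4.0).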
