Lect7 IoT BigData1
Presented by
Dr. Amany AbdElSamea
Outline
What is Big Data?
• A collection of data sets so large and complex that they are difficult to
process using on-hand database management tools or traditional data
processing applications.
• The scale, diversity, and complexity of the data require new architectures,
techniques, algorithms, and analytics to manage it and to extract value and
hidden knowledge from it.
Why Big Data?
• MapReduce
An underlying distributed file system (e.g., GFS) splits large data files into
chunks, which are managed by different nodes in the cluster.
Even though the file chunks are distributed across several machines, they form
a single namespace.
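The chunking idea above can be illustrated with a minimal in-memory sketch. It is not how GFS is implemented: the class name `ToyDFS`, the node names, the tiny chunk size, and the round-robin placement are all assumptions made for illustration. It only shows that a file split into chunks on several nodes can still be read back through one path in a single namespace.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDFS:
    """Toy in-memory stand-in for a distributed file system (hypothetical)."""
    nodes: list
    chunk_size: int = 8  # bytes; tiny on purpose, real systems use far larger chunks
    # Single namespace: file path -> ordered list of (node, chunk_id)
    namespace: dict = field(default_factory=dict)
    # Per-node storage: node -> {chunk_id: chunk bytes}
    storage: dict = field(default_factory=dict)

    def put(self, path, data):
        """Split data into chunks and spread them over the nodes."""
        locations = []
        for start in range(0, len(data), self.chunk_size):
            index = start // self.chunk_size
            # Round-robin placement is an assumption, not a GFS policy.
            node = self.nodes[index % len(self.nodes)]
            chunk_id = f"{path}#C{index}"
            chunk = data[start:start + self.chunk_size]
            self.storage.setdefault(node, {})[chunk_id] = chunk
            locations.append((node, chunk_id))
        self.namespace[path] = locations

    def get(self, path):
        """Reassemble the file by following the namespace entry for its path."""
        return b"".join(self.storage[node][cid] for node, cid in self.namespace[path])


dfs = ToyDFS(nodes=["node0", "node1", "node2"])
payload = b"big data needs many machines"
dfs.put("/logs/a.txt", payload)
for node, chunk_id in dfs.namespace["/logs/a.txt"]:
    print(node, chunk_id)
```

The caller only ever uses the path `/logs/a.txt`; which node holds which chunk stays hidden behind the namespace lookup.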
MapReduce Steps
In MapReduce, chunks are processed in isolation by tasks called Mappers.
The outputs from the mappers are denoted as intermediate outputs (IOs).
The process of bringing together IOs into a set of Reducers is known as the
shuffling process.
The map and reduce functions receive and emit (K, V) pairs.
[Figure: MapReduce data flow. Map phase: input splits (chunks C0, C1, C2, C3) are processed by mappers M0, M1, M2, M3, producing intermediate outputs. Reduce phase: after shuffling, reducers R0 and R1 produce the final outputs.]
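The map, shuffle, and reduce steps can be sketched as a single-process word count. This is a minimal illustration, not a real MapReduce framework: the function names (`map_fn`, `reduce_fn`, `shuffle`, `run_mapreduce`), the sample chunks, and the simple partitioner are assumptions chosen for the example.

```python
from collections import defaultdict


def map_fn(_key, chunk):
    """Mapper: emit a (word, 1) pair for every word in one chunk."""
    for word in chunk.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reducer: emit a (word, total) pair for one key."""
    yield word, sum(counts)


def shuffle(intermediate, num_reducers):
    """Group intermediate (K, V) pairs by key and assign each key to one reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in intermediate:
        # Deterministic toy partitioner (assumption): code-point sum modulo reducer count.
        reducer = sum(map(ord, key)) % num_reducers
        partitions[reducer][key].append(value)
    return partitions


def run_mapreduce(chunks, num_reducers=2):
    # Map phase: each chunk is processed in isolation by its own mapper call.
    intermediate = []
    for index, chunk in enumerate(chunks):
        intermediate.extend(map_fn(index, chunk))
    # Shuffle, then reduce phase: each reducer handles only its own partition.
    final = {}
    for partition in shuffle(intermediate, num_reducers):
        for key in sorted(partition):
            for out_key, out_value in reduce_fn(key, partition[key]):
                final[out_key] = out_value
    return final


chunks = [
    "big data is big",
    "data moves to reducers",
    "reducers emit pairs",
    "pairs of key and value",
]
result = run_mapreduce(chunks)
print(result)
```

Every value for a given key lands in exactly one reducer, which is what lets the reducers run independently of one another.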