1 SQL Hadoop Analyzing Big Data Hive m1 Intro Hadoop Slides
Ahmad Alkilani
www.pluralsight.com
Introduction to Hadoop
Outline
[Diagram: CPU, Memory, Disk]
Google
~40 billion web pages x 30 KB each ≈ 1.2 PB (roughly a petabyte)
Today’s average disk reads about 120 MB/sec
At that rate, one disk needs a little over 3 months to read a petabyte
Approximately 1,000 drives needed to store and use the data
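
The arithmetic behind the figures above can be checked with a short sketch. It assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes) and that multiple drives read in parallel at the same rate; the function and constant names are illustrative and do not come from the slides.

```python
# Back-of-the-envelope check of the slide's numbers (decimal units assumed).
PAGES = 40_000_000_000                 # ~40 billion web pages
PAGE_SIZE_BYTES = 30_000               # 30 KB per page
DISK_READ_BYTES_PER_SEC = 120_000_000  # 120 MB/sec per disk
SECONDS_PER_DAY = 86_400


def total_bytes(pages=PAGES, page_size=PAGE_SIZE_BYTES):
    """Total size of the data set in bytes."""
    return pages * page_size


def read_time_days(num_bytes, drives=1, rate=DISK_READ_BYTES_PER_SEC):
    """Days needed to read num_bytes when `drives` disks read in parallel."""
    return num_bytes / (rate * drives) / SECONDS_PER_DAY


if __name__ == "__main__":
    size = total_bytes()
    print(f"Total size: {size / 1e15:.1f} PB")
    print(f"One drive: {read_time_days(size):.0f} days")
    print(f"1,000 drives: {read_time_days(size, drives=1000) * 24:.1f} hours")
```

With a single drive the full 1.2 PB takes close to four months; rounding the data set down to one petabyte gives the slide's "little over 3 months". Spreading the same read across 1,000 drives brings it down to a few hours, which is the motivation for distributing both storage and computation.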
Distributed Computing Challenges
[Diagram: several data nodes, each with its own CPU and disk]
Hadoop File System (HDFS)
[Diagram: Server Rack A and Server Rack B, with a file shown as four 64 MB blocks]
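
As a rough illustration of the 64 MB blocks shown above, the sketch below computes how a file of a given size would be cut into fixed-size blocks. It only models the splitting; how blocks are placed on nodes and racks is not covered. The function name `split_into_blocks` is hypothetical, and 64 MB is taken as 64 * 1024 * 1024 bytes, which is an assumption.

```python
# Minimal sketch: cut a file into fixed-size blocks (offset, length).
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB block size from the slide (binary MB assumed)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs covering file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    mb = 1024 * 1024
    for offset, length in split_into_blocks(200 * mb):
        print(f"block at {offset // mb:>3} MB, length {length // mb} MB")
```

A 256 MB file yields four full 64 MB blocks, matching the diagram; a 200 MB file yields three full blocks plus a final 8 MB block.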
MapReduce
[Diagram: folder in HDFS, Reducer A, Reducer B]
Word Count Example
Reducer A          Reducer B
Key      Value     Key    Value
first    1         is     2
second   1         line   2
the      2
This     2
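
The word count above can be sketched as a small in-memory simulation of the map, shuffle, and reduce steps with two reducers. The slides do not show the input text; the two input lines below are an assumption chosen because they reproduce the counts in the table. The partitioner (CRC32 of the key modulo the number of reducers) is an illustrative choice, not Hadoop's own, so which reducer receives which key may differ from the slide.

```python
# Minimal in-memory sketch of MapReduce word count (not Hadoop API code).
import zlib
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def partition(key, num_reducers):
    """Illustrative partitioner: deterministic hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Group values by key and route each key to exactly one reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_phase(bucket):
    """Reducer: sum the values collected for each key."""
    return {key: sum(values) for key, values in bucket.items()}


def word_count(lines, num_reducers=2):
    """Run map, shuffle, and reduce; return one result dict per reducer."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(bucket) for bucket in shuffle(pairs, num_reducers)]


# Assumed input: not shown in the slides, chosen to match the table's counts.
lines = ["This is the first line", "This is the second line"]
results = word_count(lines)

if __name__ == "__main__":
    for name, counts in zip("AB", results):
        print(f"Reducer {name}: {counts}")
```

Because every occurrence of a key is routed to the same reducer, each reducer can total its keys independently, and the union of the reducer outputs is the complete word count.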
Basic commands using HDFS
Hadoop Demo
Environment Setup