This document introduces Hadoop and its components. It discusses the motivation for Hadoop in dealing with big data that exceeds single computer capabilities. It describes the Hadoop architecture including HDFS for reliable storage across clusters and MapReduce for distributed processing of large datasets in parallel. It provides an example of how MapReduce and HDFS can be used to solve a word counting problem on big data in a distributed manner.

SQL on Hadoop - Analyzing Big Data with Hive
Ahmad Alkilani
www.pluralsight.com

Introduction to Hadoop
Outline

 Why Hadoop? Motivation
 Hadoop architecture and distributed computing
 HDFS
 MapReduce
 Getting up and running
Motivation for Hadoop

[Diagram: a single machine with its CPU, memory, and disk]

Google
 ~40 billion web pages x 30 KB each ≈ a petabyte of data
 Today’s average disk reads about 120 MB/sec
 A little over 3 months just to read the web
 Approximately 1,000 drives to store and use it
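The slide's arithmetic can be checked in a few lines. This is a back-of-envelope sketch; the page count, page size, and disk speed are the slide's rough figures, not measurements.

```python
pages = 40e9            # ~40 billion web pages
page_size = 30e3        # ~30 KB each, in bytes
disk_speed = 120e6      # ~120 MB/sec sequential read

total_bytes = pages * page_size          # 1.2e15 bytes, about a petabyte
seconds = total_bytes / disk_speed       # time for ONE drive to read it all
months = seconds / (60 * 60 * 24 * 30)

print(f"{total_bytes / 1e15:.1f} PB, ~{months:.1f} months to read")
```

A single drive would need nearly four months of continuous sequential reads, which is exactly why the work has to be spread across many machines.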
Distributed Computing Challenges

 Scale out with distributed computing
 Hadoop based on Google’s implementation
 Volume, Velocity, and Variety
 Recover from failures
 Shared-nothing architecture
 Hadoop file system (HDFS)
 MapReduce

[Diagram: a Name Node and Job Tracker coordinating racks of Data Nodes, each with its own CPU and disk]
Hadoop File System (HDFS)

[Diagram: a file split into 64 MB blocks, replicated across Data Nodes in Server Rack A and Server Rack B]
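The block mechanics above are easy to sketch: HDFS splits a file into fixed-size blocks (64 MB in classic Hadoop) and stores the remainder as a shorter tail block. The helper below is an illustrative Python sketch, not HDFS code.

```python
def hdfs_blocks(file_size_bytes, block_size=64 * 1024 * 1024):
    """Return the sizes of the blocks a file of the given size splits into."""
    full, last = divmod(file_size_bytes, block_size)
    return [block_size] * full + ([last] if last else [])

# A 200 MB file becomes three full 64 MB blocks plus one 8 MB tail block.
sizes = hdfs_blocks(200 * 1024 * 1024)
print(len(sizes), sizes[-1] // (1024 * 1024))  # → 4 8
```

Each of those blocks is then replicated (three copies by default) across Data Nodes on different racks, so losing one server or even one rack loses no data.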
MapReduce

 One mapper per block
 Parallel, distributed processing given a file split into blocks across multiple servers

[Diagram: each Data Node runs a Map task over its local block of data, emitting key-value pairs (e.g. 5→Value, 9→Value, 2→Value); Shuffle and Sort then routes each key to Reducer A or Reducer B, and the reducers write their output to a folder in HDFS]
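How does Shuffle and Sort decide which reducer gets which key? Typically by hashing the key modulo the number of reducers. The sketch below is a Python analogue of that idea, mirroring the behavior of Hadoop's default hash partitioner rather than reproducing its Java code.

```python
def partition(key, num_reducers):
    # Hadoop's default partitioner hashes the key and takes it modulo the
    # reducer count; Python's built-in hash() stands in for hashCode() here.
    return hash(key) % num_reducers

# Keys emitted by the mappers in the diagram, routed to two reducers.
mapper_keys = [5, 9, 2, 3, 7]
assignments = {key: partition(key, 2) for key in mapper_keys}
print(assignments)
```

The crucial property is determinism: every occurrence of the same key, no matter which mapper emitted it, lands on the same reducer, so that reducer sees all of the key's values together after the sort.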
Word Count Example

Input (key = byte offset, value = line of text):

  Byte offset | This is the first line
  Byte offset | This is the second line

Mapper output (one key-value pair per word):

  Mapper 1: This 1, is 1, the 1, first 1, line 1
  Mapper 2: This 1, is 1, the 1, second 1, line 1

After Shuffle and Sort (all pairs for a given key grouped on one reducer):

  Reducer A: This 1, This 1, the 1, the 1, second 1, first 1
  Reducer B: line 1, line 1, is 1, is 1

Reducer output:

  Reducer A: first 1, second 1, the 2, This 2
  Reducer B: is 2, line 2
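The whole flow on this slide can be simulated locally in a few lines. This is a single-process Python sketch of the map, shuffle-and-sort, and reduce phases, not Hadoop itself.

```python
from collections import defaultdict

lines = ["This is the first line", "This is the second line"]

# Map: emit (word, 1) for every word in every input line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle and sort: group all values for the same key together.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce: sum the counts for each word.
counts = {word: sum(values) for word, values in sorted(grouped.items())}
print(counts)
# {'This': 2, 'first': 1, 'is': 2, 'line': 2, 'second': 1, 'the': 2}
```

Note that "This" and "the" stay separate counts, just as on the slide: the mappers emit literal tokens, so case folding would have to happen in the map phase if it were wanted.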
Basic commands using HDFS

Hadoop Demo
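The demo exercises the standard `hadoop fs` shell. A few of its most common commands, for reference (the paths shown are illustrative, and running them requires a Hadoop installation):

```
hadoop fs -ls /                         # list the root of HDFS
hadoop fs -mkdir /user/demo             # create a directory
hadoop fs -put words.txt /user/demo/    # copy a local file into HDFS
hadoop fs -cat /user/demo/words.txt     # print a file's contents
hadoop fs -get /user/demo/words.txt .   # copy a file back to local disk
hadoop fs -rm /user/demo/words.txt      # delete a file
```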
Environment Setup

 Course focus is on development
 Use a Virtual Machine image to follow along with the examples
 Pseudo-distributed sandbox
 Replication factor set to 1
 Name Node, Job Tracker, Data Node, and Task Tracker on a single machine
 Demos use Hortonworks’ HDP sandbox
 Hive 0.10, 0.11 and above
Summary

 Distributed computing and scaling out to solve big data problems
 Key system characteristics
 Built to handle failures
 Move processing to the data
 Failures are inevitable; embracing this allows for solutions built on commodity servers
 MapReduce
 Mapper assigned to each block of data
 Key-value pairs are both the input to and output of each phase
 Keys must implement the WritableComparable interface
 Shuffle and Sort plays a key role in solving the problem