By - Shubham Parmar

Hadoop is an open-source software framework for distributed storage and processing of large data sets across clusters of computers using simple programming models. It has two main components: the Hadoop Distributed File System (HDFS) for storage, and MapReduce for processing and analyzing data in parallel across the cluster. It addresses problems such as hardware failure, long processing times, and combining results by providing redundancy, scalable processing, and a simplified programming model.


BY – SHUBHAM PARMAR

What is Hadoop?
• The Apache Hadoop software library is a
framework that allows for the distributed
processing of large data sets across clusters
of computers using simple programming
models.

• It was developed by the Apache Software Foundation; version 1.0 was released in 2011.

• Written in Java.

• Hadoop is open-source software.

Hadoop provides:

• a framework,

• massive storage, and

• processing power.
Big Data
• Big data is a term used to describe the very large volumes of unstructured and semi-structured data a company creates.

• The term is used when talking about petabytes and exabytes of data.

• Loading that much data into a relational database for analysis would take too much time and cost too much.

• Facebook has almost 10 billion photos, taking up about 1 petabyte of storage.


So what is the problem?
1. Processing such large data sets in a relational database is very difficult.

2. It would take too much time and cost too much to process the data.


We can solve this problem with distributed computing.
But distributed computing has problems of its own:

1. Hardware failure
There is always a chance of hardware failure.

2. Combining the data after analysis
Data from all the disks has to be combined, which is a mess.
Hadoop came along to solve all of these problems.
It has two main parts:

1. the Hadoop Distributed File System (HDFS), and

2. MapReduce, the data-processing framework.


1. Hadoop Distributed File System (HDFS)
It ties many small, reasonably priced machines together into a single, cost-effective compute cluster.

Data and application processing are protected against hardware failure.

• If a node goes down, jobs are automatically redirected to other nodes so that the distributed computation does not fail.

• It automatically stores multiple copies of all data.

• It provides a simplified programming model that lets users quickly read from and write to the distributed file system.
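The replication idea above can be illustrated with a minimal Python sketch (this is a toy model, not real HDFS code; the class and method names are invented for illustration). Each block is copied to several nodes, so losing one node does not lose the data:

```python
import random

REPLICATION_FACTOR = 3  # HDFS's default replication factor


class MiniCluster:
    """Toy model of HDFS-style block replication (illustration only)."""

    def __init__(self, node_count):
        # node id -> {block_id: data}
        self.nodes = {i: {} for i in range(node_count)}

    def put_block(self, block_id, data):
        # Store REPLICATION_FACTOR copies on distinct nodes.
        for node_id in random.sample(list(self.nodes), REPLICATION_FACTOR):
            self.nodes[node_id][block_id] = data

    def fail_node(self, node_id):
        # Simulate a hardware failure: the node and its copies vanish.
        del self.nodes[node_id]

    def get_block(self, block_id):
        # Any surviving replica can serve the read.
        for blocks in self.nodes.values():
            if block_id in blocks:
                return blocks[block_id]
        raise IOError("block lost: " + block_id)


cluster = MiniCluster(node_count=5)
cluster.put_block("blk_0001", b"some file contents")
cluster.fail_node(0)                   # one node dies...
print(cluster.get_block("blk_0001"))   # ...but the data survives: b'some file contents'
```

With 5 nodes and 3 replicas, any single node failure leaves at least two copies, which is exactly why HDFS can redirect work to surviving nodes.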
2. MapReduce
MapReduce is a programming model, and an associated implementation, for processing and generating large data sets with a parallel, distributed algorithm on a cluster.

A MAP function processes a key/value pair to generate a set of intermediate key/value pairs.

A REDUCE function merges all intermediate values associated with the same intermediate key.
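The map and reduce phases above can be sketched with a classic word count in plain Python (no Hadoop involved; `map_fn`, `reduce_fn`, and `run_job` are illustrative names, not Hadoop API calls):

```python
from collections import defaultdict


def map_fn(_doc_id, text):
    # MAP: for each input (doc_id, text) pair, emit intermediate (word, 1) pairs.
    for word in text.split():
        yield word, 1


def reduce_fn(word, counts):
    # REDUCE: merge all values that share the same intermediate key.
    return word, sum(counts)


def run_job(documents):
    # Shuffle phase: group intermediate pairs by key before reducing.
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, count in map_fn(doc_id, text):
            grouped[word].append(count)
    return dict(reduce_fn(w, c) for w, c in grouped.items())


docs = {"d1": "big data big cluster", "d2": "big cluster"}
print(run_job(docs))  # {'big': 3, 'data': 1, 'cluster': 2}
```

In real Hadoop, the map calls run in parallel on the nodes holding the data blocks, and the framework performs the shuffle across the network; the sequential loop here only stands in for that machinery.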
Pros of Hadoop

1. Computing power
2. Flexibility
3. Fault Tolerance
4. Low Cost
5. Scalability
Cons of Hadoop

1. Integration with existing systems


Hadoop is not optimised for ease of use. Installing it and integrating it with existing
databases can be difficult, especially since no commercial software support is
provided.
2. Administration and ease of use
Hadoop requires knowledge of MapReduce, while most data practitioners use SQL. This
means significant training may be required to administer Hadoop clusters.
3. Security
Hadoop lacks the level of security functionality needed for safe enterprise deployment,
especially where sensitive data is concerned.
