
Name: Ayanava Chatterjee

Roll: 144001200127
Sem: 8th SEM
Sub: Big Data
Topic: Big Data with HADOOP
Introduction to Big Data

Definition of Big Data


Characteristics: Volume, Velocity, Variety, Veracity, and
Value
Importance of Big Data in today’s world
Applications of Big Data (e.g., healthcare, finance,
marketing)
Challenges of Big Data

Data Storage and Management


Data Processing Speed
Data Analysis Complexity
Data Security and Privacy Concerns
Hadoop Ecosystem
Components
HDFS (Hadoop Distributed File System): storage layer of Hadoop
MapReduce: programming model for processing large datasets
YARN (Yet Another Resource Negotiator): resource management layer
Hive, Pig, HBase, Sqoop, Flume, Oozie, etc.: tools and frameworks for data processing, querying, and management
How Hadoop Works

 Data storage in HDFS


 Data processing using MapReduce
 Resource management using YARN
 Fault tolerance and scalability in Hadoop
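
To make the storage-plus-processing flow concrete, below is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API. The input and output paths are placeholders passed on the command line; cluster settings are assumed to come from the site configuration.

// WordCount.java: count word occurrences across files stored in HDFS.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Map phase: emit (word, 1) for every word in the input split
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) sum += val.get();
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

YARN schedules the map tasks on nodes that hold the input blocks (data locality), and the reduce tasks aggregate the shuffled partial counts.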
Advantages of Hadoop

 Scalability: Easily scales to handle petabytes of data


 Fault tolerance: Data is replicated across different nodes
 Cost-effectiveness: Runs on commodity hardware
 Flexibility: Supports various data formats (structured, semi-
structured, unstructured)
Hadoop Use Cases

 Social Media Data Analysis


 E-Commerce & Recommendation Systems
 Real-Time Analytics
 Data Warehousing
 Internet of Things (IoT) Data Management
Hadoop vs Traditional Databases

Comparison of Hadoop and relational databases (RDBMS):
1) Scalability
2) Flexibility
3) Cost
4) Speed
5) Use cases
Future of Hadoop and Big Data

Integration with Cloud Computing


Advancements in Machine Learning and AI
More adoption in industries like finance, healthcare, and
transportation
Big Data Technologies Overview

Overview of popular Big Data tools and frameworks:


Apache Spark: Fast, in-memory data processing
Apache Flink: Stream processing
Apache Kafka: Real-time data streaming
NoSQL Databases (MongoDB, Cassandra, etc.)
Elasticsearch: Search and analytics engine
Hadoop vs Spark

Apache Spark:
In-memory processing, far faster than disk-based MapReduce
Real-time stream processing
More user-friendly APIs for data analytics and machine learning
Hadoop MapReduce:
Batch processing
Slower due to disk-based storage
Best for large-scale batch jobs
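
For contrast, here is a sketch of the same word count in Spark's Java API. Intermediate data stays in memory between transformations instead of being written to disk between the map and reduce stages, which is where most of the speedup comes from. Paths are placeholders.

// SparkWordCount.java: in-memory word count with Spark's Java API.
import java.util.Arrays;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class SparkWordCount {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("word count").getOrCreate();
    JavaRDD<String> lines = spark.read().textFile(args[0]).javaRDD();
    JavaPairRDD<String, Integer> counts = lines
        .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator()) // split into words
        .mapToPair(word -> new Tuple2<>(word, 1))                      // (word, 1) pairs
        .reduceByKey(Integer::sum);                                    // sum per word
    counts.saveAsTextFile(args[1]);
    spark.stop();
  }
}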
Hadoop Distributed File System
(HDFS)

HDFS Overview:
Designed for storing large files across multiple machines
Data replication for fault tolerance
High throughput access to data
HDFS Architecture:
NameNode: Manages metadata and file structure
DataNodes: Store the actual data blocks
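
A brief sketch of how a client program talks to HDFS through the Java FileSystem API; the NameNode address, path, and file content here are illustrative.

// HdfsExample.java: write a file to HDFS and inspect its replication.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:9000");  // illustrative address
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/data/example.txt");
    try (FSDataOutputStream out = fs.create(file)) {   // NameNode records the metadata,
      out.writeUTF("hello hdfs");                      // DataNodes store the blocks
    }
    // Replication factor shows how many DataNodes hold each block
    System.out.println("Replication: " + fs.getFileStatus(file).getReplication());
    fs.close();
  }
}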
YARN (Yet Another Resource
Negotiator)

YARN’s Role in the Hadoop Ecosystem:


Resource management and job scheduling
Enables multiple applications to run on a single Hadoop cluster
Manages and allocates resources dynamically
YARN Components:
ResourceManager: Manages resources across the cluster
NodeManager: Runs on each node and manages resources on that node
Hadoop Ecosystem: Hive

Apache Hive:
A data warehouse system built on top of Hadoop
Provides SQL-like querying capabilities for Hadoop
Supports ETL operations and batch processing
Hive Architecture:
Metastore: Stores schema information
Query Compiler: Converts SQL queries into MapReduce jobs
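
A sketch of issuing a Hive query from Java over JDBC, assuming a running HiveServer2 and the Hive JDBC driver on the classpath; the host, credentials, and web_logs table are invented for illustration.

// HiveQueryExample.java: run a SQL-like query against Hive over JDBC.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    // HiveServer2 JDBC URL; host, port, and database are illustrative
    String url = "jdbc:hive2://hiveserver:10000/default";
    try (Connection conn = DriverManager.getConnection(url, "user", "");
         Statement stmt = conn.createStatement();
         // Hive's compiler turns this query into jobs executed on the cluster
         ResultSet rs = stmt.executeQuery(
             "SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page")) {
      while (rs.next()) {
        System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
      }
    }
  }
}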
Hadoop Ecosystem: HBase

HBase:
A NoSQL database built on top of HDFS
Provides random read/write access to large datasets
Scalable and distributed architecture
Use cases: Real-time analytics, serving large-scale data applications
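
A sketch of the random read/write access HBase provides, using its Java client; the users table and info column family are invented for illustration.

// HBaseExample.java: write one cell to an HBase table, then read it back.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("users"))) {
      // Write: row key, column family, qualifier, value
      Put put = new Put(Bytes.toBytes("user-42"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
      table.put(put);

      // Random read of the same row by key
      Result result = table.get(new Get(Bytes.toBytes("user-42")));
      byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
      System.out.println("name = " + Bytes.toString(name));
    }
  }
}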
Hadoop Ecosystem: Pig

Apache Pig:
A high-level platform for creating MapReduce programs
Uses Pig Latin, a scripting language that simplifies data processing
Pig vs MapReduce:
Pig is easier to write, but MapReduce is more flexible for complex
workflows
Ideal for ETL (Extract, Transform, Load) tasks
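
A sketch of an ETL-style pipeline driven from Java through PigServer, which lets a program register Pig Latin statements directly; the paths and field names are invented for illustration.

// PigEtlExample.java: load, filter, group, and store with embedded Pig Latin.
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigEtlExample {
  public static void main(String[] args) throws Exception {
    PigServer pig = new PigServer(ExecType.MAPREDUCE);
    pig.registerQuery("logs = LOAD '/data/logs' AS (level:chararray, msg:chararray);");
    pig.registerQuery("errors = FILTER logs BY level == 'ERROR';");
    pig.registerQuery("grouped = GROUP errors BY level;");
    pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(errors);");
    pig.store("counts", "/data/error_counts");  // compiles the pipeline to MapReduce
  }
}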
Hadoop Ecosystem: Sqoop and
Flume

Sqoop:
Designed for importing and exporting data between Hadoop and
relational databases
Used for batch processing tasks
Flume:
Collects and aggregates large amounts of log data
Streams data in real-time to Hadoop HDFS
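
A typical Sqoop import invocation might look like the following sketch; the JDBC URL, credentials, table, and target directory are placeholders.

sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/orders \
  --num-mappers 4

Sqoop runs the transfer as parallel map tasks (four here), writing the rows into HDFS.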
Security in Hadoop

Authentication:
Kerberos: A network authentication protocol to secure access to Hadoop
services
Authorization:
Apache Ranger: Provides centralized access control and policy
management
Data Encryption:
Encrypt data at rest (HDFS) and in transit (between components)
Auditing:
Track user access and behavior with auditing tools
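
As a sketch, the core-site.xml settings below switch a cluster to Kerberos authentication and enable service-level authorization; per-service principals and keytab files are configured separately.

<!-- core-site.xml fragment: enable Kerberos and authorization checks -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>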
Real-World Big Data Use Cases

Healthcare:

Predictive analytics for patient outcomes


Managing electronic health records (EHRs) and medical research

Finance:
Fraud detection in real-time financial transactions

High-frequency trading analysis

Telecommunications:

Network performance monitoring and predictive maintenance

Customer churn prediction and service optimization

E-Commerce:

Real-time personalized recommendations

Fraud detection and customer behavior analysis


Hadoop in the Cloud

 Cloud Platforms Supporting Hadoop:


 Amazon EMR (Elastic MapReduce)
 Google Cloud Dataproc
 Microsoft Azure HDInsight
 Benefits of Cloud Hadoop:
 Scalability without infrastructure management
 Pay-per-use model for computing resources
 Easy integration with other cloud services like storage and analytics tools
Hadoop Ecosystem: Oozie

Oozie:
A workflow scheduler system for managing Hadoop jobs
Supports complex job workflows, such as MapReduce, Hive, and Pig
Key Features:
Error handling
Job scheduling and dependency management
Integration with other components like HDFS, HBase, and Hive
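
A sketch of an Oozie workflow definition with a single Hive action and basic error handling; the workflow name, script, and parameters are invented for illustration.

<!-- workflow.xml: one Hive action with ok/error transitions -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="daily-etl">
  <start to="run-hive"/>
  <action name="run-hive">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>etl.q</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>ETL failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>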
Hadoop Performance Tuning

 Techniques to optimize Hadoop performance:


 Data Locality: Ensuring tasks are executed on nodes where the data
resides
 Compression: Reducing the size of data being stored and transmitted
 Caching: Storing frequently accessed data in memory to speed up tasks
 Increasing Parallelism: Splitting tasks into smaller units and running them
concurrently
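
As a sketch, here are two of these techniques applied when configuring a MapReduce job in Java; the Snappy codec and the reducer count are illustrative starting points, not prescriptions (Snappy also requires the native library to be installed).

// TunedJob.java: compress intermediate map output and raise parallelism.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

public class TunedJob {
  public static Job configure() throws Exception {
    Configuration conf = new Configuration();
    // Compress map output to cut shuffle traffic between nodes
    conf.setBoolean("mapreduce.map.output.compress", true);
    conf.setClass("mapreduce.map.output.compress.codec",
                  SnappyCodec.class, CompressionCodec.class);
    Job job = Job.getInstance(conf, "tuned job");
    job.setNumReduceTasks(8);  // more reducers -> more reduce-side parallelism
    return job;
  }
}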
Machine Learning with Hadoop

MLlib (Apache Spark) and Mahout (Apache Hadoop):


Machine learning libraries for large-scale data processing
Algorithms for classification, regression, clustering, and recommendation
systems
Use Case: Predictive modeling on big datasets, fraud detection,
recommendation engines
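
A sketch of k-means clustering with Spark MLlib's Java API; the input path and the LIBSVM file format are assumptions for illustration.

// KMeansExample.java: fit k-means on a dataset with a "features" column.
import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.ml.linalg.Vector;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KMeansExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("kmeans").getOrCreate();
    // Expects a vector "features" column, e.g. data stored in LIBSVM format
    Dataset<Row> data = spark.read().format("libsvm").load("/data/features.txt");
    KMeansModel model = new KMeans().setK(3).setSeed(1L).fit(data);
    for (Vector center : model.clusterCenters()) {
      System.out.println(center);              // learned cluster centroids
    }
    model.transform(data).show(5);             // rows with predicted cluster labels
    spark.stop();
  }
}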
Hadoop Performance Metrics &
Monitoring

Tools to monitor Hadoop clusters:


Ganglia: Real-time monitoring system
Ambari: Provides cluster management and monitoring for Hadoop
Cloudera Manager: For managing and monitoring Hadoop clusters
Key Metrics:
Resource utilization (CPU, memory, disk, network)
Job performance (MapReduce job statistics, task completion times)
Hadoop Use Case: Log Analysis

Using Hadoop for large-scale log analysis:
1) Collecting data from web servers, databases, and applications
2) Using tools like Flume for data ingestion and Hive for querying logs
Benefits: Scalability and flexibility to process massive log data
Future of Hadoop and Big Data

Evolution of Hadoop:
Integration with cloud computing
Real-time stream processing and machine learning
Other upcoming trends:
AI and Deep Learning for Big Data analytics
IoT (Internet of Things) applications using Big Data tools
Increased adoption of edge computing for data
processing at the source
THANK YOU
