RDBMS vs Hadoop vs Spark
Submitted by:
Aastha Joshi
Aishwarya Singh
Joel Keith Pais
Laxmi
P Vignesh
Contents
RDBMS
Hadoop
Apache Spark
Comparison between them
RDBMS
Stands for ‘Relational Database Management
System’
It is a database that stores data in a structured
format using rows and columns.
Queries can be run on the data to add, update,
and search for values.
Presenting the data as tables also gives a clear
visual representation of it.
It is "relational" because the values within each
table are related to each other.
The relational structure makes it possible to run
queries across multiple tables at once.
Structured Query Language (SQL) is the standard
language used to access the database.
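As a minimal sketch of the points above (rows and columns, adding, updating and searching values, and one query across two related tables), the following uses Python's standard-library sqlite3 module with an in-memory database. The customers and orders tables, their columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3


def total_spent_per_customer():
    """Build two related tables, modify them, and query across both."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # Structured storage: each table has fixed, typed columns.
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    cur.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customers(id), "
        "amount REAL NOT NULL)"
    )

    # Adding values (hypothetical sample rows).
    cur.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Asha"), (2, "Ravi")],
    )
    cur.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(1, 250.0), (1, 100.0), (2, 75.0)],
    )

    # Updating a value.
    cur.execute("UPDATE orders SET amount = 80.0 WHERE customer_id = 2")

    # Searching across both tables at once with a JOIN.
    rows = cur.execute(
        "SELECT c.name, SUM(o.amount) "
        "FROM customers AS c JOIN orders AS o ON o.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, total in total_spent_per_customer():
        print(name, total)
```

The JOIN works because orders.customer_id refers to customers.id; that link between tables is what the relational structure provides.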
ADVANTAGES
Addresses the need for integrating,
managing and analysing data from
multiple sources across on-
premises and cloud environments
Easy to locate and access specific
values within the database
High flexibility due to storage,
retrieval and publishing of JSON data
within a relational database
Hadoop
The days when data was limited are over.
The world has already experienced the power
of Big Data, which is now analyzed to
frame business strategies and other decisions.
Apache Hadoop is an open-source platform
that we can use to store and
process large datasets ranging from
gigabytes to petabytes. It allows
multiple computers to form clusters and analyze
large datasets in parallel and efficiently.
Four main components of the Hadoop ecosystem:
1 HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
A primary data storage system that runs on commodity
hardware and manages enormous data collections. It
also has high data throughput and high fault
tolerance.
2 YET ANOTHER RESOURCE NEGOTIATOR (YARN)
YARN is a cluster resource manager that schedules
tasks and assigns resources (such as CPU and memory)
to applications.
3 HADOOP MAPREDUCE
Breaks down big data processing tasks into smaller
ones, distributes them across different nodes, and then
runs each one.
4 HADOOP COMMON (HADOOP CORE)
A collection of common libraries and utilities on which
the other three modules rely.
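The MapReduce idea described above (split the work, process the pieces independently, then combine) can be sketched in a few lines of plain Python. This is a single-process toy, not the Hadoop MapReduce API: the function names map_phase, shuffle, reduce_phase and word_count are hypothetical, and in a real cluster each map and reduce call would run on a different node.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big", "data tools"]))
```

Each call to map_phase depends only on its own line, which is why this kind of job parallelizes well across nodes.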
Importance of Hadoop
Ability to quickly store and handle large amounts of any type of data
That's an important concern as data volumes and varieties continue to grow, notably from social
media and the Internet of Things (IoT).
Computing power
Hadoop's distributed computing model efficiently processes large amounts of data. The more
computing nodes you use, the more processing power you have.
Fault tolerance
Data and application processing are protected against hardware failure. If a node fails, jobs are
automatically transferred to other nodes, ensuring that the distributed computing does not fail.
Multiple copies of all data are stored automatically.
Flexibility
Unlike with traditional relational databases, we don’t have to preprocess data before storing it. We can
store as much data as we want and decide how to use it later. This includes unstructured data like text,
pictures, and videos.
Low cost
The open-source framework is free and stores large amounts of data by using commodity
hardware.
Scalability
By simply adding nodes, we can easily expand our system to handle more data. Little administration
is required.
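The fault-tolerance point above (multiple copies of all data, so a failed node does not lose anything) can be illustrated with a small toy model. This is only a sketch under stated assumptions: the round-robin placement and the names place_blocks and readable_blocks are hypothetical and are not how HDFS actually chooses nodes (real placement is rack-aware). The replication factor of 3 matches the usual HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]  # hypothetical node names
    placement = place_blocks(6, nodes, replication=3)
    # One failed node: every block is still readable from another replica.
    print(readable_blocks(placement, {"n2"}))
    # Three failed nodes: blocks whose replicas were all on those nodes are lost.
    print(readable_blocks(placement, {"n1", "n2", "n3"}))
```

In this model, data is lost only when every node holding a block's replicas fails at the same time.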
Challenges in using Hadoop
1 MAPREDUCE PROGRAMMING ISN'T SUITED FOR EVERY PROBLEM
It performs well for simple information requests and problems that can be broken down into independent units, but it is
inefficient for iterative and interactive analytic operations. MapReduce is file-intensive. Iterative algorithms require
multiple map-shuffle/sort-reduce phases to complete because the nodes mainly communicate through sorts and
shuffles. As a result, many files are created between MapReduce phases, which is inefficient for advanced
analytics computing.
2 THERE’S A WIDELY ACKNOWLEDGED TALENT GAP
It can be difficult to find entry-level programmers who have adequate Java expertise to be productive with MapReduce.
That's one reason distribution providers are racing to put relational (SQL) technology on top of Hadoop. Programmers
with SQL skills are easier to find than programmers with MapReduce skills. Hadoop administration is also a mix of art and science,
requiring a basic understanding of operating systems, hardware, and Hadoop kernel settings.
3 DATA SECURITY
Another concern is fragmented data protection, which is being addressed by new tools and technologies.
The Kerberos authentication protocol is a significant step toward securing Hadoop environments.
4 FULL-FLEDGED DATA GOVERNANCE AND MANAGEMENT
Hadoop lacks user-friendly, full-featured tools for data management, data cleansing, governance, and metadata services.
Apache Spark
Apache Spark began in 2009 as a research project at UC Berkeley's AMPLab focused on data-intensive
application areas.
Apache Spark is an open-source, distributed processing system used for big data workloads.
For rapid analytic queries against data of any size, it uses in-memory caching and optimized
query execution.
It allows code reuse across different workloads (batch processing, interactive queries, real-time
analytics, machine learning, and graph processing) and provides development APIs in
Java, Scala, Python, and R.
Spark's objective was to build a new framework optimised for fast iterative
processing, such as machine learning and interactive data analysis, while preserving Hadoop
MapReduce's scalability and fault tolerance.
Spark matters in the big data industry primarily because its in-memory
data processing makes it a much faster processing engine than MapReduce.
Apache Spark delivers a better-integrated framework that supports a wide range of big data
workloads and formats, such as batch data, text data, real-time streaming data, and graph data.
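To make the in-memory argument concrete, here is a minimal sketch in plain Python (deliberately not the Spark API) comparing an iterative computation that re-reads its input from storage on every pass, as a chain of MapReduce jobs would, with one that loads the data once and keeps it in memory. CountingSource, iterate_rereading and iterate_cached are hypothetical names, and the update rule is an arbitrary stand-in for an iterative algorithm.

```python
class CountingSource:
    """Stands in for slow storage and counts how many times it is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._values)


def _step(estimate, data):
    """One iteration: move the estimate halfway toward the mean of the data."""
    mean = sum(data) / len(data)
    return estimate + 0.5 * (mean - estimate)


def iterate_rereading(source, steps):
    """Re-read the dataset from storage on every iteration."""
    estimate = 0.0
    for _ in range(steps):
        estimate = _step(estimate, source.load())
    return estimate


def iterate_cached(source, steps):
    """Read the dataset once and reuse the in-memory copy on every iteration."""
    data = source.load()
    estimate = 0.0
    for _ in range(steps):
        estimate = _step(estimate, data)
    return estimate


if __name__ == "__main__":
    slow, fast = CountingSource([2, 4, 6]), CountingSource([2, 4, 6])
    print(iterate_rereading(slow, 3), slow.reads)
    print(iterate_cached(fast, 3), fast.reads)
```

Both versions compute the same result; the only difference is how many times storage is read, and that cost grows with the number of iterations in the first version.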
Core Components
The Apache Spark framework consists of five main components that are responsible
for the functioning of Spark.
Spark SQL and DataFrames: Spark SQL
allows users to run SQL and HQL queries
to process structured and semi-structured
data.
Spark Streaming: Spark Streaming facilitates
the processing of live data streams, e.g. log files.
It also contains APIs to manipulate data
streams.
MLlib (Machine Learning): MLlib is the Spark
library with machine learning functionality. It
contains various machine learning algorithms
such as regression, clustering, collaborative
filtering, classification, etc.
GraphX: The library that supports graph
computation is known as GraphX. It
enables users to perform graph
manipulation. It also provides graph
computation algorithms.
Apache Spark Core API: It provides a
platform to execute Spark applications.
Advantages
Speed: For large-scale data processing, Spark can be up to 100 times faster than Hadoop MapReduce
for some in-memory workloads, because it uses an in-memory (RAM) processing architecture.
Ease of Use: Apache Spark provides simple APIs for working with big datasets. It has over 80
high-level operators that make it easy to write parallel programs.
Advanced Analytics: Spark does more than support 'map' and 'reduce'. Machine learning
(ML), graph algorithms, streaming data, SQL queries, and other features are also supported.
For some analytic workloads, Apache Spark can also be faster than traditional data warehouses.
Dynamic: Apache Spark makes it simple to build parallel applications by combining its
high-level operators.
Multilingual: Java, Scala, Python, and R are supported by Apache
Spark.
Powerful: Because of its low-latency in-memory data processing capacity, Apache Spark can
handle a wide range of analytics problems. It has well-developed libraries for graph analytics
and machine learning techniques.
Open-source: A major strength of Apache Spark is the large open-source community
behind it.
RDBMS vs Hadoop vs Spark
THANK YOU
RDBMS vs Hadoop vs Spark

  • 1. vs vs Submitted by: Aastha Joshi Aishwarya Singh Joel Keith Pais Laxmi P Vignesh
  • 3. RDBMS Stands for ‘Relational Database Management System’ It is a database that stores data in a structured format using rows and columns. One can execute queries on the data like adding, updating, and searching for values. It also provides a visual representation of the data. It is "relational" because the values within each table are related to each other. The relational structure makes it possible to run queries across multiple tables at once. Structured Query Language is the standard programming language used to access the database. ADVANTAGES Addresses the need for integrating, managing and analysing data from multiple sources across on- premises and cloud environments Ease to locate and access specific values within the database High flexibility due to storage, retrieval and publishing of JSON data within a relational database EXAMPLES
  • 4. It is a matter of the past when data were limited. Now, the world has already experienced the power of Big Data, and the same is used to analyze to frame different business strategies and others. Apache Hadoop is one of the kinds of open- source platforms that we can use to store and process relatively large datasets amounting from gigabytes to petabytes. This open-source allows multiple computers to make clusters and analyze the large datasets in parallel and effectively.
  • 5. Four main components of the Hadoop ecosystem: HADOOP DISTRIBUTED FILE SYSTEM (HDFS) A primary data storage system that runs on commodity hardware and manages enormous data collections. It also has a high data throughput and a high fault tolerance. YET ANOTHER RESOURCE NEGOTIATOR (YARN) YARN is a cluster resource manager that schedules tasks and assigns resources (such as CPU and memory) to applications. 1 2 3 HADOOP MAPREDUCE Breaks down the big data processing tasks into smaller ones, distributes them across different nodes, and then runs each one. 4 HADOOP COMMON (HADOOP CORE): A collection of common libraries and utilities on which the other three modules rely.
  • 6. Importance of Hadoop Ability to quickly store and handle large amounts of any type of data That's an important concern as data volumes and varieties continue to grow, notably from social media and the Internet of Things (IoT). Computer processing power. Hadoop's distributed computing model efficiently processes large amounts of data. The more computing nodes you use, the more processing power you have. Fault tolerance Data and application processing are protected against hardware failure. If a node fails, jobs are automatically transferred to other nodes, ensuring that the distributed computing does not fail. Multiple copies of all data are stored automatically. Flexibility Unlike traditional relational databases, we don’t have to preprocess data before storing it. We can store as much data as we want and decide how to use it later. It includes unstructured data like text, pictures, and videos. Low cost The open-source framework is free and stores large amounts of data by using commodity hardware. Scalability By simply adding nodes, we can easily expand our system to handle more data. A little administrative is required.
  • 7. Challenges in using Hadoop. 1 MAPREDUCE PROGRAMMING ISN'T SUITED FOR EVERY PROBLEM: It performs well for simple information requests and for problems that can be broken down into independent units, but it is inefficient for iterative and interactive analytic tasks. MapReduce is file-intensive. Because the nodes communicate mainly through sorts and shuffles, iterative algorithms need multiple map-shuffle/sort-reduce phases to complete. This creates many files between MapReduce phases, which is inefficient for advanced analytic computing. 2 THERE'S A WIDELY ACKNOWLEDGED TALENT GAP: It can be difficult to find entry-level programmers with enough Java expertise to be productive with MapReduce. That is one reason distribution providers are racing to put relational (SQL) technology on top of Hadoop: programmers with SQL skills are easier to find than programmers with MapReduce skills. Hadoop administration is also part art and part science, requiring an understanding of operating systems, hardware, and Hadoop kernel settings. 3 DATA SECURITY: Another concern is fragmented data security, although new tools and technologies are addressing it. The Kerberos authentication protocol is a significant step toward securing Hadoop environments. 4 FULL-FLEDGED DATA GOVERNANCE AND MANAGEMENT: Hadoop lacks user-friendly, full-featured tools for data management, data cleansing, governance, and metadata services.
  • 8. Apache Spark. Apache Spark began in 2009 as a research project at UC Berkeley's AMPLab focused on data-intensive application areas. It is an open-source, distributed processing system used for big data workloads. It uses in-memory caching and efficient query execution to run fast analytic queries against data of any size. It allows code reuse across different workloads (batch processing, interactive queries, real-time analytics, machine learning, and graph processing) and provides development APIs in Java, Scala, Python, and R. Spark's objective was to build a new framework optimised for fast iterative processing, such as machine learning and interactive data analysis, while preserving Hadoop MapReduce's scalability and fault tolerance. Spark's main importance in the Big Data industry comes from its in-memory data processing, which makes it a faster processing engine than MapReduce. It also offers a better-integrated framework that supports many kinds of Big Data workloads, such as batch data, text data, real-time streaming data, and graph data.
  • 9. Core Components. The Apache Spark framework consists of five main components that are responsible for the functioning of Spark. Spark SQL and DataFrames: Spark SQL lets users run SQL and HiveQL (HQL) queries to process structured and semi-structured data. Spark Streaming: Spark Streaming processes live data streams, such as log files. It also contains APIs to manipulate data streams. MLlib (Machine Learning): MLlib is the Spark library with machine learning functionality. It contains various machine learning algorithms for regression, clustering, collaborative filtering, classification, and more. GraphX: GraphX is the library that supports graph computation. It lets users perform graph manipulation and also provides graph computation algorithms. Apache Spark Core API: It provides the platform on which Spark applications execute.
  • 10. Advantages. Speed: For large-scale data processing, Spark can be up to 100 times faster than Hadoop MapReduce, because it uses an in-memory (RAM) processing architecture. Ease of Use: Apache Spark provides simple APIs for working with big datasets. It has over 80 high-level operators that make it easy to write parallel programs. Advanced Analytics: Spark supports more than just 'map' and 'reduce'. Machine learning (ML), graph algorithms, streaming data, SQL queries, and other features are also supported. Apache Spark is faster than most data warehouses. Dynamic: The same high-level operators make it simple to build parallel applications. Multilingual: Apache Spark supports Python, Java, Scala, and R. Powerful: Because of its low-latency in-memory data processing, Apache Spark can handle a wide range of analytics problems. It has well-developed libraries for graph analytics and machine learning. Open-source: Apache Spark has a massive open-source community behind it.
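The map, shuffle/sort, and reduce phases mentioned in the Hadoop slides can be illustrated with a minimal, single-process sketch in plain Python. This is not the Hadoop or Spark API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for illustration. A real cluster would run the map and reduce steps on different nodes and exchange the intermediate pairs between them.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle/sort: group all emitted values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: collapse the values collected for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of input splits."""
    mapped = []
    for document in documents:  # on a cluster, each split would be mapped on a separate node
        mapped.extend(map_phase(document))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two small "splits" standing in for blocks of a large file.
splits = [
    "big data needs big clusters",
    "spark and hadoop process big data",
]
counts = word_count(splits)
print(counts)
```

An iterative algorithm would have to repeat this whole pipeline once per iteration. In MapReduce the intermediate results are written to files between phases, which is the inefficiency the slides describe, whereas Spark is designed to keep such working data in memory.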