SlideShare a Scribd company logo
www.edureka.co/r-for-analytics
www.edureka.co/apache-spark-scala-training
Apache Spark: Beyond Hadoop MapReduce
Presenter: Vishal
Slide 2Slide 2Slide 2 www.edureka.co/apache-spark-scala-training
What will you learn today?
 Strength of MapReduce
 Limitations of MapReduce
 How MapReduce limitations can be overcome
 How Spark fits the bill
 Other exciting features in Spark
Strength of MapReduce
Slide 4Slide 4Slide 4 www.edureka.co/apache-spark-scala-training
Simple
Scalable
Fault
Tolerant
Minimal
data
motion
Strength of MapReduce
Independent of a programming language, such as
Java, C++ or Python.
It can process petabytes of data,
stored in HDFS on one cluster
MapReduce takes care of failures
using the replicated copies.
Process moves towards data to minimize Disk I/O
Limitations of MapReduce
Slide 6Slide 6Slide 6 www.edureka.co/apache-spark-scala-training
Real
Time
Complex
Algorithm
Re-reading
and parsing
Data
Minimal
Data
Motion
Graph
Processing
Iterative
Tasks
Random
Access
Limitations Of MR
Slide 7Slide 7Slide 7 www.edureka.co/apache-spark-scala-training
Feature Comparison with Spark
Fast 100x faster than MapReduce
Batch Processing Batch and Real-time Processing
Stores Data on Disk Stores Data in Memory
Written in Java Written in Scala
Hadoop MapReduce Hadoop Spark
Source: Databrix
What are the MR limitations and
how Spark overcomes it?
Slide 9Slide 9Slide 9 www.edureka.co/apache-spark-scala-training
Overcoming MR limitations
By Cutting down on the number
of Reads and Writes to the disc
Real
time
Slide 10Slide 10Slide 10 www.edureka.co/apache-spark-scala-training
Spark tries to keep things in-memory of its distributed workers, allowing for significantly faster/lower-latency
computations, whereas MapReduce keeps shuffling things in and out of disk.
Spark Cuts Down Read/Write I/O To Disk
Slide 11Slide 11Slide 11 www.edureka.co/apache-spark-scala-training
Overcoming MR limitations
Libraries for Machine
Learning & Streaming
Graph
processing
Complex
algorithm
Slide 12Slide 12Slide 12 www.edureka.co/apache-spark-scala-training
Libraries For ML, Graph Programming …
Machine Learning
Library
Graph
programming
Spark interface
For RDBMS lovers
Utility for
continuous
ingestion of data
Slide 13Slide 13Slide 13 www.edureka.co/apache-spark-scala-training
Overcoming MR limitations
Cyclic data flows
Random
access
Slide 14Slide 14Slide 14 www.edureka.co/apache-spark-scala-training
Cyclic Data Flows
• All jobs in spark comprise a series of operators and run on a set of data.
• All the operators in a job are used to construct a DAG (Directed Acyclic
Graph).
• The DAG is optimized by rearranging and combining operators where
possible.
Slide 15Slide 15Slide 15 www.edureka.co/apache-spark-scala-training
Spark Features makes its Architecture better
than MR
Other Spark Features In Demand
Slide 17Slide 17Slide 17 www.edureka.co/apache-spark-scala-training
Spark Features/Modules In Demand
Source: Typesafe
Slide 18Slide 18Slide 18 www.edureka.co/apache-spark-scala-training
New Features In 2015
Data Frames 
• Similar API to data frames in R and Pandas
• Automatically optimised via Spark SQL
• Released in Spark 1.3
SparkR 
• Released in Spark 1.4
• Exposes DataFrames, RDD’s & MLlibrary in R
Machine Learning Pipelines 
• High Level API
• Featurization
• Evaluation
• Model Tuning
External Data Sources 
• Platform API to plug Data-Sources into Spark
• Pushes logic into sources
Source: Databrix
Slide 19Slide 19Slide 19 www.edureka.co/apache-spark-scala-training
Get Certified in Spark from Edureka
Edureka's Spark and Scala course:
• Learn large-scale data processing by mastering the concepts of Scala, RDD, Traits, OOPS and Spark SQL
• Online Live Courses: 24 hours
• Assignments: 32 hours
• Project: 20 hours
• Lifetime Access + 24 X 7 Support
Go to www.edureka.co/apache-spark-scala-training
Batch starts from 10th October (Weekend Batch)
Thank You
Questions/Queries/Feedback/Survey
Recording and presentation will be made available to you within 24 hours

More Related Content

What's hot (20)

Big data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaBig data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & Scala
Edureka!
 
Spark Streaming
Spark StreamingSpark Streaming
Spark Streaming
Edureka!
 
5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!
Edureka!
 
Apache spark
Apache sparkApache spark
Apache spark
Dona Mary Philip
 
Apache spark linkedin
Apache spark linkedinApache spark linkedin
Apache spark linkedin
Yukti Kaura
 
Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & libraries
Walaa Hamdy Assy
 
End-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache SparkEnd-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache Spark
Burak Yavuz
 
An Introduction to Apache Spark
An Introduction to Apache SparkAn Introduction to Apache Spark
An Introduction to Apache Spark
Dona Mary Philip
 
Sydney Apache Spark Meetup - Spark Natural Language Processing
Sydney Apache Spark Meetup - Spark Natural Language ProcessingSydney Apache Spark Meetup - Spark Natural Language Processing
Sydney Apache Spark Meetup - Spark Natural Language Processing
Andy Huang
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
Spark For Faster Batch Processing
Spark For Faster Batch ProcessingSpark For Faster Batch Processing
Spark For Faster Batch Processing
Edureka!
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
Edureka!
 
Sydney Spark Meetup - September 2015
Sydney Spark Meetup - September 2015Sydney Spark Meetup - September 2015
Sydney Spark Meetup - September 2015
Andy Huang
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
Spark Summit
 
Spark: The State of the Art Engine for Big Data Processing
Spark: The State of the Art Engine for Big Data ProcessingSpark: The State of the Art Engine for Big Data Processing
Spark: The State of the Art Engine for Big Data Processing
Ramaninder Singh Jhajj
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal Malohlava
Spark Summit
 
Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why
Edureka!
 
Apache spark
Apache sparkApache spark
Apache spark
TEJPAL GAUTAM
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
airisData
 
Spark_Part 1
Spark_Part 1Spark_Part 1
Spark_Part 1
Shashi Prakash
 
Big data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & ScalaBig data Processing with Apache Spark & Scala
Big data Processing with Apache Spark & Scala
Edureka!
 
Spark Streaming
Spark StreamingSpark Streaming
Spark Streaming
Edureka!
 
5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!
Edureka!
 
Apache spark linkedin
Apache spark linkedinApache spark linkedin
Apache spark linkedin
Yukti Kaura
 
Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & libraries
Walaa Hamdy Assy
 
End-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache SparkEnd-to-End Data Pipelines with Apache Spark
End-to-End Data Pipelines with Apache Spark
Burak Yavuz
 
An Introduction to Apache Spark
An Introduction to Apache SparkAn Introduction to Apache Spark
An Introduction to Apache Spark
Dona Mary Philip
 
Sydney Apache Spark Meetup - Spark Natural Language Processing
Sydney Apache Spark Meetup - Spark Natural Language ProcessingSydney Apache Spark Meetup - Spark Natural Language Processing
Sydney Apache Spark Meetup - Spark Natural Language Processing
Andy Huang
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
Spark For Faster Batch Processing
Spark For Faster Batch ProcessingSpark For Faster Batch Processing
Spark For Faster Batch Processing
Edureka!
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
Edureka!
 
Sydney Spark Meetup - September 2015
Sydney Spark Meetup - September 2015Sydney Spark Meetup - September 2015
Sydney Spark Meetup - September 2015
Andy Huang
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
Spark Summit
 
Spark: The State of the Art Engine for Big Data Processing
Spark: The State of the Art Engine for Big Data ProcessingSpark: The State of the Art Engine for Big Data Processing
Spark: The State of the Art Engine for Big Data Processing
Ramaninder Singh Jhajj
 
An Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal MalohlavaAn Introduction to Sparkling Water by Michal Malohlava
An Introduction to Sparkling Water by Michal Malohlava
Spark Summit
 
Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why Spark Will Replace Hadoop ! Know Why
Spark Will Replace Hadoop ! Know Why
Edureka!
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
airisData
 

Viewers also liked (18)

MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
Edureka!
 
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Edureka!
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
Edureka!
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
Edureka!
 
Hadoop Introduction
Hadoop IntroductionHadoop Introduction
Hadoop Introduction
tutorialvillage
 
Les Business Analysts face à l'agilité : de nouveaux challenges à relever
Les Business Analysts face à l'agilité : de nouveaux challenges à releverLes Business Analysts face à l'agilité : de nouveaux challenges à relever
Les Business Analysts face à l'agilité : de nouveaux challenges à relever
OCTO Technology Suisse
 
A Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationA Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animation
Sameer Tiwari
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
Cloudera, Inc.
 
Understanding Big Data And Hadoop
Understanding Big Data And HadoopUnderstanding Big Data And Hadoop
Understanding Big Data And Hadoop
Edureka!
 
De la pensée projet à la pensée produit
De la pensée projet à la pensée produitDe la pensée projet à la pensée produit
De la pensée projet à la pensée produit
OCTO Technology Suisse
 
Cloud : en 2017, sortez du stratus !
Cloud : en 2017, sortez du stratus !Cloud : en 2017, sortez du stratus !
Cloud : en 2017, sortez du stratus !
OCTO Technology Suisse
 
Fault Tolerance with Kafka
Fault Tolerance with KafkaFault Tolerance with Kafka
Fault Tolerance with Kafka
Edureka!
 
Démystifions l'API-culture!
Démystifions l'API-culture!Démystifions l'API-culture!
Démystifions l'API-culture!
OCTO Technology Suisse
 
Afterwork Blockchain : la prochaine technologie disruptive ?
Afterwork Blockchain : la prochaine technologie disruptive ?Afterwork Blockchain : la prochaine technologie disruptive ?
Afterwork Blockchain : la prochaine technologie disruptive ?
OCTO Technology Suisse
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
Edureka!
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Storm
viirya
 
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
Edureka!
 
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Edureka!
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
Edureka!
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
Edureka!
 
Les Business Analysts face à l'agilité : de nouveaux challenges à relever
Les Business Analysts face à l'agilité : de nouveaux challenges à releverLes Business Analysts face à l'agilité : de nouveaux challenges à relever
Les Business Analysts face à l'agilité : de nouveaux challenges à relever
OCTO Technology Suisse
 
A Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationA Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animation
Sameer Tiwari
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
Cloudera, Inc.
 
Understanding Big Data And Hadoop
Understanding Big Data And HadoopUnderstanding Big Data And Hadoop
Understanding Big Data And Hadoop
Edureka!
 
De la pensée projet à la pensée produit
De la pensée projet à la pensée produitDe la pensée projet à la pensée produit
De la pensée projet à la pensée produit
OCTO Technology Suisse
 
Fault Tolerance with Kafka
Fault Tolerance with KafkaFault Tolerance with Kafka
Fault Tolerance with Kafka
Edureka!
 
Afterwork Blockchain : la prochaine technologie disruptive ?
Afterwork Blockchain : la prochaine technologie disruptive ?Afterwork Blockchain : la prochaine technologie disruptive ?
Afterwork Blockchain : la prochaine technologie disruptive ?
OCTO Technology Suisse
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
Edureka!
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Storm
viirya
 

Similar to Apache Spark beyond Hadoop MapReduce (20)

Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Edureka!
 
Spark is going to replace Apache Hadoop! Know Why?
Spark is going to replace Apache Hadoop! Know Why?Spark is going to replace Apache Hadoop! Know Why?
Spark is going to replace Apache Hadoop! Know Why?
Edureka!
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
Edureka!
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Edureka!
 
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
IT Event
 
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Edureka!
 
Apache Spark & Scala
Apache Spark & ScalaApache Spark & Scala
Apache Spark & Scala
Edureka!
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
Synergetics Learning and Cloud Consulting
 
Introduction to Spark - DataFactZ
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZ
DataFactZ
 
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxUnit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptx
Rahul Borate
 
4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf
4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf
4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf
yafora8192
 
Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?
Ahmed Kamal
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
Girish Khanzode
 
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Edureka!
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
Zahra Eskandari
 
APACHE SPARK.pptx
APACHE SPARK.pptxAPACHE SPARK.pptx
APACHE SPARK.pptx
DeepaThirumurugan
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
IndicThreads
 
Spark
SparkSpark
Spark
Heena Madan
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
Janu Jahnavi
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
Janu Jahnavi
 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Edureka!
 
Spark is going to replace Apache Hadoop! Know Why?
Spark is going to replace Apache Hadoop! Know Why?Spark is going to replace Apache Hadoop! Know Why?
Spark is going to replace Apache Hadoop! Know Why?
Edureka!
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
Edureka!
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Edureka!
 
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
Volodymyr Lyubinets "Introduction to big data processing with Apache Spark"
IT Event
 
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Spark Interview Questions and Answers | Apache Spark Interview Questions | Sp...
Edureka!
 
Apache Spark & Scala
Apache Spark & ScalaApache Spark & Scala
Apache Spark & Scala
Edureka!
 
Introduction to Spark - DataFactZ
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZ
DataFactZ
 
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxUnit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptx
Rahul Borate
 
4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf
4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf
4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf
yafora8192
 
Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?
Ahmed Kamal
 
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Apache Spark Training | Spark Tutorial For Beginners | Apache Spark Certifica...
Edureka!
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
Zahra Eskandari
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
IndicThreads
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
Janu Jahnavi
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
Janu Jahnavi
 

More from Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
Edureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
Edureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
Edureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
Edureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
Edureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
Edureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
Edureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
Edureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
Edureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
Edureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
Edureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
Edureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
Edureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
Edureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
Edureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
Edureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
Edureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
Edureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
Edureka!
 
What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
Edureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
Edureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
Edureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
Edureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
Edureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
Edureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
Edureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
Edureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
Edureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
Edureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
Edureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
Edureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
Edureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
Edureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
Edureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
Edureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
Edureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
Edureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
Edureka!
 

Recently uploaded (20)

GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
James Anderson
 
AI in Java - MCP in Action, Langchain4J-CDI, SmallRye-LLM, Spring AI
AI in Java - MCP in Action, Langchain4J-CDI, SmallRye-LLM, Spring AIAI in Java - MCP in Action, Langchain4J-CDI, SmallRye-LLM, Spring AI
AI in Java - MCP in Action, Langchain4J-CDI, SmallRye-LLM, Spring AI
Buhake Sindi
 
European Accessibility Act & Integrated Accessibility Testing
European Accessibility Act & Integrated Accessibility TestingEuropean Accessibility Act & Integrated Accessibility Testing
European Accessibility Act & Integrated Accessibility Testing
Julia Undeutsch
 
MCP Dev Summit - Pragmatic Scaling of Enterprise GenAI with MCP
MCP Dev Summit - Pragmatic Scaling of Enterprise GenAI with MCPMCP Dev Summit - Pragmatic Scaling of Enterprise GenAI with MCP
MCP Dev Summit - Pragmatic Scaling of Enterprise GenAI with MCP
Sambhav Kothari
 
System Card: Claude Opus 4 & Claude Sonnet 4
System Card: Claude Opus 4 & Claude Sonnet 4System Card: Claude Opus 4 & Claude Sonnet 4
System Card: Claude Opus 4 & Claude Sonnet 4
Razin Mustafiz
 
Talk: On an adventure into the depths of Maven - Kaya Weers
Talk: On an adventure into the depths of Maven - Kaya WeersTalk: On an adventure into the depths of Maven - Kaya Weers
Talk: On an adventure into the depths of Maven - Kaya Weers
Kaya Weers
 
Master tester AI toolbox - Kari Kakkonen at Testaus ja AI 2025 Professio
Master tester AI toolbox - Kari Kakkonen at Testaus ja AI 2025 ProfessioMaster tester AI toolbox - Kari Kakkonen at Testaus ja AI 2025 Professio
Master tester AI toolbox - Kari Kakkonen at Testaus ja AI 2025 Professio
Kari Kakkonen
 
From Legacy to Cloud-Native: A Guide to AWS Modernization.pptx
From Legacy to Cloud-Native: A Guide to AWS Modernization.pptxFrom Legacy to Cloud-Native: A Guide to AWS Modernization.pptx
From Legacy to Cloud-Native: A Guide to AWS Modernization.pptx
Mohammad Jomaa
 
cloudgenesis cloud workshop , gdg on campus mita
cloudgenesis cloud workshop , gdg on campus mitacloudgenesis cloud workshop , gdg on campus mita
cloudgenesis cloud workshop , gdg on campus mita
siyaldhande02
 
With Claude 4, Anthropic redefines AI capabilities, effectively unleashing a ...
With Claude 4, Anthropic redefines AI capabilities, effectively unleashing a ...With Claude 4, Anthropic redefines AI capabilities, effectively unleashing a ...
With Claude 4, Anthropic redefines AI capabilities, effectively unleashing a ...
SOFTTECHHUB
 
SAP Sapphire 2025 ERP1612 Enhancing User Experience with SAP Fiori and AI
SAP Sapphire 2025 ERP1612 Enhancing User Experience with SAP Fiori and AISAP Sapphire 2025 ERP1612 Enhancing User Experience with SAP Fiori and AI
SAP Sapphire 2025 ERP1612 Enhancing User Experience with SAP Fiori and AI
Peter Spielvogel
 
Agentic AI - The New Era of Intelligence
Agentic AI - The New Era of IntelligenceAgentic AI - The New Era of Intelligence
Agentic AI - The New Era of Intelligence
Muzammil Shah
 
UiPath Community Berlin: Studio Tips & Tricks and UiPath Insights
UiPath Community Berlin: Studio Tips & Tricks and UiPath InsightsUiPath Community Berlin: Studio Tips & Tricks and UiPath Insights
UiPath Community Berlin: Studio Tips & Tricks and UiPath Insights
UiPathCommunity
 
What is DePIN? The Hottest Trend in Web3 Right Now!
What is DePIN? The Hottest Trend in Web3 Right Now!What is DePIN? The Hottest Trend in Web3 Right Now!
What is DePIN? The Hottest Trend in Web3 Right Now!
cryptouniversityoffi
 
AI Emotional Actors: “When Machines Learn to Feel and Perform"
AI Emotional Actors:  “When Machines Learn to Feel and Perform"AI Emotional Actors:  “When Machines Learn to Feel and Perform"
AI Emotional Actors: “When Machines Learn to Feel and Perform"
AkashKumar809858
 
UiPath Community Zurich: Release Management and Build Pipelines
UiPath Community Zurich: Release Management and Build PipelinesUiPath Community Zurich: Release Management and Build Pipelines
UiPath Community Zurich: Release Management and Build Pipelines
UiPathCommunity
 
Introducing FME Realize: A New Era of Spatial Computing and AR
Introducing FME Realize: A New Era of Spatial Computing and ARIntroducing FME Realize: A New Era of Spatial Computing and AR
Introducing FME Realize: A New Era of Spatial Computing and AR
Safe Software
 
Security Operations and the Defense Analyst - Splunk Certificate
Security Operations and the Defense Analyst - Splunk CertificateSecurity Operations and the Defense Analyst - Splunk Certificate
Security Operations and the Defense Analyst - Splunk Certificate
VICTOR MAESTRE RAMIREZ
 
"AI in the browser: predicting user actions in real time with TensorflowJS", ...
"AI in the browser: predicting user actions in real time with TensorflowJS", ..."AI in the browser: predicting user actions in real time with TensorflowJS", ...
"AI in the browser: predicting user actions in real time with TensorflowJS", ...
Fwdays
 
SDG 9000 Series: Unleashing multigigabit everywhere
SDG 9000 Series: Unleashing multigigabit everywhereSDG 9000 Series: Unleashing multigigabit everywhere
SDG 9000 Series: Unleashing multigigabit everywhere
Adtran
 
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
James Anderson
 
AI in Java - MCP in Action, Langchain4J-CDI, SmallRye-LLM, Spring AI
AI in Java - MCP in Action, Langchain4J-CDI, SmallRye-LLM, Spring AIAI in Java - MCP in Action, Langchain4J-CDI, SmallRye-LLM, Spring AI
AI in Java - MCP in Action, Langchain4J-CDI, SmallRye-LLM, Spring AI
Buhake Sindi
 
European Accessibility Act & Integrated Accessibility Testing
European Accessibility Act & Integrated Accessibility TestingEuropean Accessibility Act & Integrated Accessibility Testing
European Accessibility Act & Integrated Accessibility Testing
Julia Undeutsch
 
MCP Dev Summit - Pragmatic Scaling of Enterprise GenAI with MCP
MCP Dev Summit - Pragmatic Scaling of Enterprise GenAI with MCPMCP Dev Summit - Pragmatic Scaling of Enterprise GenAI with MCP
MCP Dev Summit - Pragmatic Scaling of Enterprise GenAI with MCP
Sambhav Kothari
 
System Card: Claude Opus 4 & Claude Sonnet 4
System Card: Claude Opus 4 & Claude Sonnet 4System Card: Claude Opus 4 & Claude Sonnet 4
System Card: Claude Opus 4 & Claude Sonnet 4
Razin Mustafiz
 
Talk: On an adventure into the depths of Maven - Kaya Weers
Talk: On an adventure into the depths of Maven - Kaya WeersTalk: On an adventure into the depths of Maven - Kaya Weers
Talk: On an adventure into the depths of Maven - Kaya Weers
Kaya Weers
 
Master tester AI toolbox - Kari Kakkonen at Testaus ja AI 2025 Professio
Master tester AI toolbox - Kari Kakkonen at Testaus ja AI 2025 ProfessioMaster tester AI toolbox - Kari Kakkonen at Testaus ja AI 2025 Professio
Master tester AI toolbox - Kari Kakkonen at Testaus ja AI 2025 Professio
Kari Kakkonen
 
From Legacy to Cloud-Native: A Guide to AWS Modernization.pptx
From Legacy to Cloud-Native: A Guide to AWS Modernization.pptxFrom Legacy to Cloud-Native: A Guide to AWS Modernization.pptx
From Legacy to Cloud-Native: A Guide to AWS Modernization.pptx
Mohammad Jomaa
 
cloudgenesis cloud workshop , gdg on campus mita
cloudgenesis cloud workshop , gdg on campus mitacloudgenesis cloud workshop , gdg on campus mita
cloudgenesis cloud workshop , gdg on campus mita
siyaldhande02
 
With Claude 4, Anthropic redefines AI capabilities, effectively unleashing a ...
With Claude 4, Anthropic redefines AI capabilities, effectively unleashing a ...With Claude 4, Anthropic redefines AI capabilities, effectively unleashing a ...
With Claude 4, Anthropic redefines AI capabilities, effectively unleashing a ...
SOFTTECHHUB
 
SAP Sapphire 2025 ERP1612 Enhancing User Experience with SAP Fiori and AI
SAP Sapphire 2025 ERP1612 Enhancing User Experience with SAP Fiori and AISAP Sapphire 2025 ERP1612 Enhancing User Experience with SAP Fiori and AI
SAP Sapphire 2025 ERP1612 Enhancing User Experience with SAP Fiori and AI
Peter Spielvogel
 
Agentic AI - The New Era of Intelligence
Agentic AI - The New Era of IntelligenceAgentic AI - The New Era of Intelligence
Agentic AI - The New Era of Intelligence
Muzammil Shah
 
UiPath Community Berlin: Studio Tips & Tricks and UiPath Insights
UiPath Community Berlin: Studio Tips & Tricks and UiPath InsightsUiPath Community Berlin: Studio Tips & Tricks and UiPath Insights
UiPath Community Berlin: Studio Tips & Tricks and UiPath Insights
UiPathCommunity
 
What is DePIN? The Hottest Trend in Web3 Right Now!
What is DePIN? The Hottest Trend in Web3 Right Now!What is DePIN? The Hottest Trend in Web3 Right Now!
What is DePIN? The Hottest Trend in Web3 Right Now!
cryptouniversityoffi
 
AI Emotional Actors: “When Machines Learn to Feel and Perform"
AI Emotional Actors:  “When Machines Learn to Feel and Perform"AI Emotional Actors:  “When Machines Learn to Feel and Perform"
AI Emotional Actors: “When Machines Learn to Feel and Perform"
AkashKumar809858
 
UiPath Community Zurich: Release Management and Build Pipelines
UiPath Community Zurich: Release Management and Build PipelinesUiPath Community Zurich: Release Management and Build Pipelines
UiPath Community Zurich: Release Management and Build Pipelines
UiPathCommunity
 
Introducing FME Realize: A New Era of Spatial Computing and AR
Introducing FME Realize: A New Era of Spatial Computing and ARIntroducing FME Realize: A New Era of Spatial Computing and AR
Introducing FME Realize: A New Era of Spatial Computing and AR
Safe Software
 
Security Operations and the Defense Analyst - Splunk Certificate
Security Operations and the Defense Analyst - Splunk CertificateSecurity Operations and the Defense Analyst - Splunk Certificate
Security Operations and the Defense Analyst - Splunk Certificate
VICTOR MAESTRE RAMIREZ
 
"AI in the browser: predicting user actions in real time with TensorflowJS", ...
"AI in the browser: predicting user actions in real time with TensorflowJS", ..."AI in the browser: predicting user actions in real time with TensorflowJS", ...
"AI in the browser: predicting user actions in real time with TensorflowJS", ...
Fwdays
 
SDG 9000 Series: Unleashing multigigabit everywhere
SDG 9000 Series: Unleashing multigigabit everywhereSDG 9000 Series: Unleashing multigigabit everywhere
SDG 9000 Series: Unleashing multigigabit everywhere
Adtran
 

Apache Spark beyond Hadoop MapReduce

  • 2. Slide 2Slide 2Slide 2 www.edureka.co/apache-spark-scala-training What will you learn today?  Strength of MapReduce  Limitations of MapReduce  How MapReduce limitations can be overcome  How Spark fits the bill  Other exciting features in Spark
  • 4. Slide 4Slide 4Slide 4 www.edureka.co/apache-spark-scala-training Simple Scalable Fault Tolerant Minimal data motion Strength of MapReduce Independent of a programming language, such as Java, C++ or Python. It can process petabytes of data, stored in HDFS on one cluster MapReduce takes care of failures using the replicated copies. Process moves towards data to minimize Disk I/O
  • 6. Slide 6Slide 6Slide 6 www.edureka.co/apache-spark-scala-training Real Time Complex Algorithm Re-reading and parsing Data Minimal Data Motion Graph Processing Iterative Tasks Random Access Limitations Of MR
  • 7. Slide 7Slide 7Slide 7 www.edureka.co/apache-spark-scala-training Feature Comparison with Spark Fast 100x faster than MapReduce Batch Processing Batch and Real-time Processing Stores Data on Disk Stores Data in Memory Written in Java Written in Scala Hadoop MapReduce Hadoop Spark Source: Databrix
  • 8. What are the MR limitations and how Spark overcomes it?
  • 9. Slide 9Slide 9Slide 9 www.edureka.co/apache-spark-scala-training Overcoming MR limitations By Cutting down on the number of Reads and Writes to the disc Real time
  • 10. Slide 10Slide 10Slide 10 www.edureka.co/apache-spark-scala-training Spark tries to keep things in-memory of its distributed workers, allowing for significantly faster/lower-latency computations, whereas MapReduce keeps shuffling things in and out of disk. Spark Cuts Down Read/Write I/O To Disk
  • 11. Slide 11Slide 11Slide 11 www.edureka.co/apache-spark-scala-training Overcoming MR limitations Libraries for Machine Learning & Streaming Graph processing Complex algorithm
  • 12. Slide 12Slide 12Slide 12 www.edureka.co/apache-spark-scala-training Libraries For ML, Graph Programming … Machine Learning Library Graph programming Spark interface For RDBMS lovers Utility for continuous ingestion of data
  • 13. Slide 13Slide 13Slide 13 www.edureka.co/apache-spark-scala-training Overcoming MR limitations Cyclic data flows Random access
  • 14. Slide 14Slide 14Slide 14 www.edureka.co/apache-spark-scala-training Cyclic Data Flows • All jobs in spark comprise a series of operators and run on a set of data. • All the operators in a job are used to construct a DAG (Directed Acyclic Graph). • The DAG is optimized by rearranging and combining operators where possible.
  • 15. Slide 15Slide 15Slide 15 www.edureka.co/apache-spark-scala-training Spark Features makes its Architecture better than MR
  • 16. Other Spark Features In Demand
  • 17. Slide 17Slide 17Slide 17 www.edureka.co/apache-spark-scala-training Spark Features/Modules In Demand Source: Typesafe
  • 18. Slide 18Slide 18Slide 18 www.edureka.co/apache-spark-scala-training New Features In 2015 Data Frames  • Similar API to data frames in R and Pandas • Automatically optimised via Spark SQL • Released in Spark 1.3 SparkR  • Released in Spark 1.4 • Exposes DataFrames, RDD’s & MLlibrary in R Machine Learning Pipelines  • High Level API • Featurization • Evaluation • Model Tuning External Data Sources  • Platform API to plug Data-Sources into Spark • Pushes logic into sources Source: Databrix
  • 19. Slide 19Slide 19Slide 19 www.edureka.co/apache-spark-scala-training Get Certified in Spark from Edureka Edureka's Spark and Scala course: • Learn large-scale data processing by mastering the concepts of Scala, RDD, Traits, OOPS and Spark SQL • Online Live Courses: 24 hours • Assignments: 32 hours • Project: 20 hours • Lifetime Access + 24 X 7 Support Go to www.edureka.co/apache-spark-scala-training Batch starts from 10th October (Weekend Batch)
  • 20. Thank You Questions/Queries/Feedback/Survey Recording and presentation will be made available to you within 24 hours