SlideShare a Scribd company logo
www.edureka.co/data-science
Top 5 Algorithms Used in Data Science
Slide 2 www.edureka.co/data-science
What are we going to learn today ?
At the end of the session you will be able to understand :
 What is Data Science
 What does Data Scientists do
 Top 5 Data Science Algorithms
 Decision Tree
 Random Forest
 Association Rule Mining
 Linear Regression
 K-Means Clustering
 Demo on K-Means Clustering algorithm
Slide 3 www.edureka.co/data-science
Data Science
Slide 4 www.edureka.co/data-science
What is Data Science ?
Data science is nothing but extracting meaningful and actionable knowledge from data
Slide 5 www.edureka.co/data-science
Who are Data Scientists ?
Basically data scientists are humans who have multitude of skills and who love playing with data
Slide 6 www.edureka.co/data-science
Data Science from 1000 feet
Data Science
Visualization
Data Engineering
Statistics
Advanced Computing
Domain Expertise
Slide 7 www.edureka.co/data-science
Arsenal of a Data Scientist
Data Science
Data Architecture
Tool: Hadoop
Machine Learning
Tool: Mahout, Weka, Spark MLlib
Analytics
Tool: R, Python
Note that evaluating different machine learning algorithms is a daily work of a
data scientist. So it becomes very important for a data scientist to have a good
grip over various machine learning algorithms.
Slide 8 www.edureka.co/data-science
Machine Learning
Machine Learning is a method of teaching computers to make and improve predictions based on data
Machine learning is a huge field, with hundreds of different algorithms for solving myriad different problems
Supervised Learning : The categories of the data is already known
Unsupervised Learning : The learning process attempts to find appropriate category for the data
Slide 9 www.edureka.co/data-science
Decision TreeDecision Tree
Slide 10 www.edureka.co/data-science
Decision Tree Example
Training
Data
Slide 11 www.edureka.co/data-science
Decision Tree, Root : Student
Step-1
Student
Slide 12 www.edureka.co/data-science
Decision Tree, Root : Student
Step-2
Student
Income
Income
Medium
Slide 13 www.edureka.co/data-science
Decision Tree, Root : Student
Step-3
Student
Income
Income
YES
YES
Medium
Slide 14 www.edureka.co/data-science
Decision Tree, Root : Student
Student
Income Income
Age CR
CR
YES YES31….40
Medium
Step-4
Slide 15 www.edureka.co/data-science
Decision Tree, Root : Student
Student
Income Income
Age CR
CR
No
Yes
Yes
Yes
Yes
31….40
Medium
Step-5
Slide 16 www.edureka.co/data-science
Decision Tree, Root : Student
Student
Income Income
Age CR
No
Yes
31….40
Age
Age
Yes No
No
Yes
31….40
CR
Age
Yes No
> 40
31….40
Yes
Yes Yes
Fair
Medium
Step-6
Slide 17 www.edureka.co/data-science
Decision Tree, Root : Student
 1. student(no)^income(high)^age(<=30) => buys_computer(no)
 2. student(no)^income(high)^age(31…40) => buys_computer(yes)
 3. student(no)^income(medium)^CR(fair)^age(>40) => buys_computer(yes)
 4. student(no)^income(medium)^CR(fair)^age(<=30) => buys_computer(no)
 5. student(no)^income(medium)^CR(excellent)^age(>40) => buys_computer(no)
 6. student(no)^income(medium)^CR(excellent)^age(31..40) =>buys_computer(yes)
 7. student(yes)^income(low)^CR(fair) => buys_computer(yes)
 8. student(yes)^income(low)^CR(excellent)^age(31..40) => buys_computer(yes)
 9. student(yes)^income(low)^CR(excellent)^age(>40) => buys_computer(no)
 10. student(yes)^income(medium)=> buys_computer(yes)
 11. student(yes)^income(high)=> buys_computer(yes)
Classification rules :
Slide 18 www.edureka.co/data-science
Random ForestRandom Forest
Slide 19 www.edureka.co/data-science
Random Forest : Example
Suppose you're very indecisive about
watching a movie.
“Edge of Tomorrow”
You can do one of the following :
1. Either you ask your best friend,
whether you will like the movie.
2. Or You can ask your group of friends.
Slide 20 www.edureka.co/data-science
Random Forest : Example
In order to answer, your best friend first needs
to figure out what movies you like, so you give
her a bunch of movies and tell her whether you
liked each one or not (i.e., you give her a
labelled training set)
Example:
Do you like movies starring Emily Blunt ?
Ask
Best
Friend
Is it based on a
true incident?
Does Emily
Blunt star in it?
No
Is she the
main lead?
Yes, You will like
the movie
No Yes
No, You will
not like the
movie
No, You will not
like the movie
Slide 21 www.edureka.co/data-science
Random Forest : Example
But your best friend might not always generalize your
preferences very well (i.e., she overfits)
In order to get more accurate recommendations, you'd like
to ask a bunch of your friends e.g. Friend#1, Friend#2, and
Friend#3 and they vote on whether you will like a movie
The majority of the votes will decide the final outcome
Slide 22 www.edureka.co/data-science
Random Forest : Example
You didn’t
like ‘Far and
away’
You liked
‘Oblivion’
You like action
movies
You like Tom
Cruise
You like his
pairing with
Emily Blunt
Yes, You will like
the movie
Yes, You will
like the movie
Yes, You will
like the
movie
Friend 2
You did not
like ‘Top
Gun’
You loved
‘Godzilla’
Friend 1
No, You will
not like the
movie
Yes, You will
like the
movie
You hate Tom
Cruise
Friend 3
No, You will not
like the movie
Slide 23 www.edureka.co/data-science
What is Random Forest ?
Random Forest is an ensemble classifier made using many decision tree models.
What are ensemble models?
 Ensemble models combine the results from different models.
 The result from an ensemble model is usually better than the result from one of the individual models.
Slide 24 www.edureka.co/data-science
Association Rule MiningAssociation Rule Mining
Slide 25 www.edureka.co/data-science
Association Rule Mining
Slide 26 www.edureka.co/data-science
Association Rule Mining
 Association Rule Mining is a popular and well researched method for discovering interesting
relations between variables in large data.
 The rule found in the sales data of a supermarket would indicate that if a customer buys onions
and potatoes together, he or she is likely to also buy hamburger meat.
Slide 27 www.edureka.co/data-science
Linear RegressionLinear Regression
Slide 28 www.edureka.co/data-science
Regression Analysis – Linear Regression
Regression analysis helps understand how value of dependent variable changes when any one of
independent variable changes, while other independent variables are kept fixed
Linear Regression is the most popular algorithm used for prediction and forecasting
Slide 29 www.edureka.co/data-science
K-Means ClusteringK-Means Clustering
Slide 30 www.edureka.co/data-science
K-Means Clustering
The process by which objects are classified into
a number of groups so that they are as much
dissimilar as possible from one group to another
group, but as much similar as possible within
each group.
The objects in group 1 should be as similar as
possible.
But there should be much difference between
objects in different groups
The attributes of the objects are allowed to
determine which objects should be grouped
together.
Total population
Group 1
Group 2 Group 3
Group 4
Slide 31 www.edureka.co/data-science
Hands-On
Demo K-Means Clustering
Slide 32 Course Url
Thank You …
Questions/Queries/Feedback
Recording and presentation will be made available to you within 24 hours
Ad

More Related Content

Viewers also liked (12)

Health care and big data with hadoop – Beacuse prevention is better than cure
Health care and big data with hadoop – Beacuse prevention is better than cureHealth care and big data with hadoop – Beacuse prevention is better than cure
Health care and big data with hadoop – Beacuse prevention is better than cure
Edureka!
 
Big Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersBig Data Analytics for Non-Programmers
Big Data Analytics for Non-Programmers
Edureka!
 
Big Data Processing with Spark and Scala
Big Data Processing with Spark and Scala Big Data Processing with Spark and Scala
Big Data Processing with Spark and Scala
Edureka!
 
Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analytics
Edureka!
 
Is Data Scientist still the sexiest job of 21st century? Find Out!
Is Data Scientist still the sexiest job of 21st century? Find Out!Is Data Scientist still the sexiest job of 21st century? Find Out!
Is Data Scientist still the sexiest job of 21st century? Find Out!
Edureka!
 
Mastering in data warehousing & BusinessIintelligence
Mastering in data warehousing & BusinessIintelligenceMastering in data warehousing & BusinessIintelligence
Mastering in data warehousing & BusinessIintelligence
Edureka!
 
Clare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science OnlineClare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science Online
sfdatascience
 
Power of Python with Big Data
Power of Python with Big DataPower of Python with Big Data
Power of Python with Big Data
Edureka!
 
R and Visualization: A match made in Heaven
R and Visualization: A match made in HeavenR and Visualization: A match made in Heaven
R and Visualization: A match made in Heaven
Edureka!
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
Edureka!
 
Machine Learning In Python | Python Machine Learning Tutorial | Deep Learning...
Machine Learning In Python | Python Machine Learning Tutorial | Deep Learning...Machine Learning In Python | Python Machine Learning Tutorial | Deep Learning...
Machine Learning In Python | Python Machine Learning Tutorial | Deep Learning...
Edureka!
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Edureka!
 
Health care and big data with hadoop – Beacuse prevention is better than cure
Health care and big data with hadoop – Beacuse prevention is better than cureHealth care and big data with hadoop – Beacuse prevention is better than cure
Health care and big data with hadoop – Beacuse prevention is better than cure
Edureka!
 
Big Data Analytics for Non-Programmers
Big Data Analytics for Non-ProgrammersBig Data Analytics for Non-Programmers
Big Data Analytics for Non-Programmers
Edureka!
 
Big Data Processing with Spark and Scala
Big Data Processing with Spark and Scala Big Data Processing with Spark and Scala
Big Data Processing with Spark and Scala
Edureka!
 
Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analytics
Edureka!
 
Is Data Scientist still the sexiest job of 21st century? Find Out!
Is Data Scientist still the sexiest job of 21st century? Find Out!Is Data Scientist still the sexiest job of 21st century? Find Out!
Is Data Scientist still the sexiest job of 21st century? Find Out!
Edureka!
 
Mastering in data warehousing & BusinessIintelligence
Mastering in data warehousing & BusinessIintelligenceMastering in data warehousing & BusinessIintelligence
Mastering in data warehousing & BusinessIintelligence
Edureka!
 
Clare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science OnlineClare Corthell: Learning Data Science Online
Clare Corthell: Learning Data Science Online
sfdatascience
 
Power of Python with Big Data
Power of Python with Big DataPower of Python with Big Data
Power of Python with Big Data
Edureka!
 
R and Visualization: A match made in Heaven
R and Visualization: A match made in HeavenR and Visualization: A match made in Heaven
R and Visualization: A match made in Heaven
Edureka!
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
Edureka!
 
Machine Learning In Python | Python Machine Learning Tutorial | Deep Learning...
Machine Learning In Python | Python Machine Learning Tutorial | Deep Learning...Machine Learning In Python | Python Machine Learning Tutorial | Deep Learning...
Machine Learning In Python | Python Machine Learning Tutorial | Deep Learning...
Edureka!
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Edureka!
 

Similar to Top 5 algorithms used in Data Science (20)

Data Science : Make Smarter Business Decisions
Data Science : Make Smarter Business DecisionsData Science : Make Smarter Business Decisions
Data Science : Make Smarter Business Decisions
Edureka!
 
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Edureka!
 
Business Analytics Decision Tree in R
Business Analytics Decision Tree in RBusiness Analytics Decision Tree in R
Business Analytics Decision Tree in R
Edureka!
 
Greg Wilson - We Know (but ignore) More Than We Think
Greg Wilson - We Know (but ignore) More Than We ThinkGreg Wilson - We Know (but ignore) More Than We Think
Greg Wilson - We Know (but ignore) More Than We Think
#DevTO
 
Data Science Isn't a Fad: Let's Keep it That Way
Data Science Isn't a Fad: Let's Keep it That WayData Science Isn't a Fad: Let's Keep it That Way
Data Science Isn't a Fad: Let's Keep it That Way
Melinda Thielbar
 
The Quest for Learner Engagement
The Quest for Learner EngagementThe Quest for Learner Engagement
The Quest for Learner Engagement
Karl Kapp
 
What is Data Science? A Complete Guide to Tools, Careers, AI, and Future Trends
What is Data Science? A Complete Guide to Tools, Careers, AI, and Future TrendsWhat is Data Science? A Complete Guide to Tools, Careers, AI, and Future Trends
What is Data Science? A Complete Guide to Tools, Careers, AI, and Future Trends
Dina G
 
Inferring networks of substitute and complementary products
Inferring networks of substitute and complementary productsInferring networks of substitute and complementary products
Inferring networks of substitute and complementary products
Turi, Inc.
 
Application of Clustering in Data Science using Real-life Examples
Application of Clustering in Data Science using Real-life Examples Application of Clustering in Data Science using Real-life Examples
Application of Clustering in Data Science using Real-life Examples
Edureka!
 
Data Science: The Product Manager's Primer
Data Science: The Product Manager's PrimerData Science: The Product Manager's Primer
Data Science: The Product Manager's Primer
Product School
 
Logistic Regression In Data Science
Logistic Regression In Data ScienceLogistic Regression In Data Science
Logistic Regression In Data Science
Edureka!
 
Module 1.3 data exploratory
Module 1.3  data exploratoryModule 1.3  data exploratory
Module 1.3 data exploratory
Sara Hooker
 
3_English 10_Q4 (Terms used in Research).pptx
3_English 10_Q4 (Terms used in Research).pptx3_English 10_Q4 (Terms used in Research).pptx
3_English 10_Q4 (Terms used in Research).pptx
ReverieArevalo
 
Sentiment Analysis In Retail Domain
Sentiment Analysis In Retail DomainSentiment Analysis In Retail Domain
Sentiment Analysis In Retail Domain
Edureka!
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
Hongwei Zhao
 
1 decisiontree dtree18[1]
1 decisiontree dtree18[1]1 decisiontree dtree18[1]
1 decisiontree dtree18[1]
翀莺 缪
 
learningIntro.doc
learningIntro.doclearningIntro.doc
learningIntro.doc
butest
 
learningIntro.doc
learningIntro.doclearningIntro.doc
learningIntro.doc
butest
 
[Webinar] How Big Data and Machine Learning Are Transforming ITSM
[Webinar] How Big Data and Machine Learning Are Transforming ITSM[Webinar] How Big Data and Machine Learning Are Transforming ITSM
[Webinar] How Big Data and Machine Learning Are Transforming ITSM
SunView Software, Inc.
 
Business Analytics with R
Business Analytics with RBusiness Analytics with R
Business Analytics with R
Edureka!
 
Data Science : Make Smarter Business Decisions
Data Science : Make Smarter Business DecisionsData Science : Make Smarter Business Decisions
Data Science : Make Smarter Business Decisions
Edureka!
 
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Random Forest Tutorial | Random Forest in R | Machine Learning | Data Science...
Edureka!
 
Business Analytics Decision Tree in R
Business Analytics Decision Tree in RBusiness Analytics Decision Tree in R
Business Analytics Decision Tree in R
Edureka!
 
Greg Wilson - We Know (but ignore) More Than We Think
Greg Wilson - We Know (but ignore) More Than We ThinkGreg Wilson - We Know (but ignore) More Than We Think
Greg Wilson - We Know (but ignore) More Than We Think
#DevTO
 
Data Science Isn't a Fad: Let's Keep it That Way
Data Science Isn't a Fad: Let's Keep it That WayData Science Isn't a Fad: Let's Keep it That Way
Data Science Isn't a Fad: Let's Keep it That Way
Melinda Thielbar
 
The Quest for Learner Engagement
The Quest for Learner EngagementThe Quest for Learner Engagement
The Quest for Learner Engagement
Karl Kapp
 
What is Data Science? A Complete Guide to Tools, Careers, AI, and Future Trends
What is Data Science? A Complete Guide to Tools, Careers, AI, and Future TrendsWhat is Data Science? A Complete Guide to Tools, Careers, AI, and Future Trends
What is Data Science? A Complete Guide to Tools, Careers, AI, and Future Trends
Dina G
 
Inferring networks of substitute and complementary products
Inferring networks of substitute and complementary productsInferring networks of substitute and complementary products
Inferring networks of substitute and complementary products
Turi, Inc.
 
Application of Clustering in Data Science using Real-life Examples
Application of Clustering in Data Science using Real-life Examples Application of Clustering in Data Science using Real-life Examples
Application of Clustering in Data Science using Real-life Examples
Edureka!
 
Data Science: The Product Manager's Primer
Data Science: The Product Manager's PrimerData Science: The Product Manager's Primer
Data Science: The Product Manager's Primer
Product School
 
Logistic Regression In Data Science
Logistic Regression In Data ScienceLogistic Regression In Data Science
Logistic Regression In Data Science
Edureka!
 
Module 1.3 data exploratory
Module 1.3  data exploratoryModule 1.3  data exploratory
Module 1.3 data exploratory
Sara Hooker
 
3_English 10_Q4 (Terms used in Research).pptx
3_English 10_Q4 (Terms used in Research).pptx3_English 10_Q4 (Terms used in Research).pptx
3_English 10_Q4 (Terms used in Research).pptx
ReverieArevalo
 
Sentiment Analysis In Retail Domain
Sentiment Analysis In Retail DomainSentiment Analysis In Retail Domain
Sentiment Analysis In Retail Domain
Edureka!
 
1 decisiontree dtree18[1]
1 decisiontree dtree18[1]1 decisiontree dtree18[1]
1 decisiontree dtree18[1]
翀莺 缪
 
learningIntro.doc
learningIntro.doclearningIntro.doc
learningIntro.doc
butest
 
learningIntro.doc
learningIntro.doclearningIntro.doc
learningIntro.doc
butest
 
[Webinar] How Big Data and Machine Learning Are Transforming ITSM
[Webinar] How Big Data and Machine Learning Are Transforming ITSM[Webinar] How Big Data and Machine Learning Are Transforming ITSM
[Webinar] How Big Data and Machine Learning Are Transforming ITSM
SunView Software, Inc.
 
Business Analytics with R
Business Analytics with RBusiness Analytics with R
Business Analytics with R
Edureka!
 
Ad

More from Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
Edureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
Edureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
Edureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
Edureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
Edureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
Edureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
Edureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
Edureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
Edureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
Edureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
Edureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
Edureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
Edureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
Edureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
Edureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
Edureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
Edureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
Edureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
Edureka!
 
What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
Edureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
Edureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
Edureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
Edureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
Edureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
Edureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
Edureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
Edureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
Edureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
Edureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
Edureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
Edureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
Edureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
Edureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
Edureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
Edureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
Edureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
Edureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
Edureka!
 
Ad

Recently uploaded (20)

Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Build 3D Animated Safety Induction - Tech EHS
Build 3D Animated Safety Induction - Tech EHSBuild 3D Animated Safety Induction - Tech EHS
Build 3D Animated Safety Induction - Tech EHS
TECH EHS Solution
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Build 3D Animated Safety Induction - Tech EHS
Build 3D Animated Safety Induction - Tech EHSBuild 3D Animated Safety Induction - Tech EHS
Build 3D Animated Safety Induction - Tech EHS
TECH EHS Solution
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 

Top 5 algorithms used in Data Science

  • 2. Slide 2 www.edureka.co/data-science What are we going to learn today ? At the end of the session you will be able to understand :  What is Data Science  What does Data Scientists do  Top 5 Data Science Algorithms  Decision Tree  Random Forest  Association Rule Mining  Linear Regression  K-Means Clustering  Demo on K-Means Clustering algorithm
  • 4. Slide 4 www.edureka.co/data-science What is Data Science ? Data science is nothing but extracting meaningful and actionable knowledge from data
  • 5. Slide 5 www.edureka.co/data-science Who are Data Scientists ? Basically data scientists are humans who have multitude of skills and who love playing with data
  • 6. Slide 6 www.edureka.co/data-science Data Science from 1000 feet Data Science Visualization Data Engineering Statistics Advanced Computing Domain Expertise
  • 7. Slide 7 www.edureka.co/data-science Arsenal of a Data Scientist Data Science Data Architecture Tool: Hadoop Machine Learning Tool: Mahout, Weka, Spark MLlib Analytics Tool: R, Python Note that evaluating different machine learning algorithms is a daily work of a data scientist. So it becomes very important for a data scientist to have a good grip over various machine learning algorithms.
  • 8. Slide 8 www.edureka.co/data-science Machine Learning Machine Learning is a method of teaching computers to make and improve predictions based on data Machine learning is a huge field, with hundreds of different algorithms for solving myriad different problems Supervised Learning : The categories of the data is already known Unsupervised Learning : The learning process attempts to find appropriate category for the data
  • 10. Slide 10 www.edureka.co/data-science Decision Tree Example Training Data
  • 11. Slide 11 www.edureka.co/data-science Decision Tree, Root : Student Step-1 Student
  • 12. Slide 12 www.edureka.co/data-science Decision Tree, Root : Student Step-2 Student Income Income Medium
  • 13. Slide 13 www.edureka.co/data-science Decision Tree, Root : Student Step-3 Student Income Income YES YES Medium
  • 14. Slide 14 www.edureka.co/data-science Decision Tree, Root : Student Student Income Income Age CR CR YES YES31….40 Medium Step-4
  • 15. Slide 15 www.edureka.co/data-science Decision Tree, Root : Student Student Income Income Age CR CR No Yes Yes Yes Yes 31….40 Medium Step-5
  • 16. Slide 16 www.edureka.co/data-science Decision Tree, Root : Student Student Income Income Age CR No Yes 31….40 Age Age Yes No No Yes 31….40 CR Age Yes No > 40 31….40 Yes Yes Yes Fair Medium Step-6
  • 17. Slide 17 www.edureka.co/data-science Decision Tree, Root : Student  1. student(no)^income(high)^age(<=30) => buys_computer(no)  2. student(no)^income(high)^age(31…40) => buys_computer(yes)  3. student(no)^income(medium)^CR(fair)^age(>40) => buys_computer(yes)  4. student(no)^income(medium)^CR(fair)^age(<=30) => buys_computer(no)  5. student(no)^income(medium)^CR(excellent)^age(>40) => buys_computer(no)  6. student(no)^income(medium)^CR(excellent)^age(31..40) =>buys_computer(yes)  7. student(yes)^income(low)^CR(fair) => buys_computer(yes)  8. student(yes)^income(low)^CR(excellent)^age(31..40) => buys_computer(yes)  9. student(yes)^income(low)^CR(excellent)^age(>40) => buys_computer(no)  10. student(yes)^income(medium)=> buys_computer(yes)  11. student(yes)^income(high)=> buys_computer(yes) Classification rules :
  • 19. Slide 19 www.edureka.co/data-science Random Forest : Example Suppose you're very indecisive about watching a movie. “Edge of Tomorrow” You can do one of the following : 1. Either you ask your best friend, whether you will like the movie. 2. Or You can ask your group of friends.
  • 20. Slide 20 www.edureka.co/data-science Random Forest : Example In order to answer, your best friend first needs to figure out what movies you like, so you give her a bunch of movies and tell her whether you liked each one or not (i.e., you give her a labelled training set) Example: Do you like movies starring Emily Blunt ? Ask Best Friend Is it based on a true incident? Does Emily Blunt star in it? No Is she the main lead? Yes, You will like the movie No Yes No, You will not like the movie No, You will not like the movie
  • 21. Slide 21 www.edureka.co/data-science Random Forest : Example But your best friend might not always generalize your preferences very well (i.e., she overfits) In order to get more accurate recommendations, you'd like to ask a bunch of your friends e.g. Friend#1, Friend#2, and Friend#3 and they vote on whether you will like a movie The majority of the votes will decide the final outcome
  • 22. Slide 22 www.edureka.co/data-science Random Forest : Example You didn’t like ‘Far and away’ You liked ‘Oblivion’ You like action movies You like Tom Cruise You like his pairing with Emily Blunt Yes, You will like the movie Yes, You will like the movie Yes, You will like the movie Friend 2 You did not like ‘Top Gun’ You loved ‘Godzilla’ Friend 1 No, You will not like the movie Yes, You will like the movie You hate Tom Cruise Friend 3 No, You will not like the movie
  • 23. Slide 23 www.edureka.co/data-science What is Random Forest ? Random Forest is an ensemble classifier made using many decision tree models. What are ensemble models?  Ensemble models combine the results from different models.  The result from an ensemble model is usually better than the result from one of the individual models.
  • 24. Slide 24 www.edureka.co/data-science Association Rule MiningAssociation Rule Mining
  • 26. Slide 26 www.edureka.co/data-science Association Rule Mining  Association Rule Mining is a popular and well researched method for discovering interesting relations between variables in large data.  The rule found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat.
  • 27. Slide 27 www.edureka.co/data-science Linear RegressionLinear Regression
  • 28. Slide 28 www.edureka.co/data-science Regression Analysis – Linear Regression Regression analysis helps understand how value of dependent variable changes when any one of independent variable changes, while other independent variables are kept fixed Linear Regression is the most popular algorithm used for prediction and forecasting
  • 29. Slide 29 www.edureka.co/data-science K-Means ClusteringK-Means Clustering
  • 30. Slide 30 www.edureka.co/data-science K-Means Clustering The process by which objects are classified into a number of groups so that they are as much dissimilar as possible from one group to another group, but as much similar as possible within each group. The objects in group 1 should be as similar as possible. But there should be much difference between objects in different groups The attributes of the objects are allowed to determine which objects should be grouped together. Total population Group 1 Group 2 Group 3 Group 4
  • 32. Slide 32 Course Url Thank You … Questions/Queries/Feedback Recording and presentation will be made available to you within 24 hours