Linear Regression Analysis | Linear Regression in Python | Machine Learning Algorithms | Simplilearn
Profit Estimation of a Company
A venture capital firm is trying to decide which companies it should invest in.
Idea
Based on companies' expenses, predict the profit each company makes, then decide which companies to invest in.
Calculate a company's profit based on its expenditure (R&D, Administration, Marketing) and its location (State).
For simplicity, let's consider a single variable (R&D) and find out which companies to invest in.
[Figure: profit plotted against R&D expenditure, with a prediction line to estimate profit]
Companies that spend more on R&D make a good profit, so let's invest in them.
What's in it for you?
Introduction to Machine Learning
Machine Learning Algorithms
Applications of Linear Regression
Understanding Linear Regression
Multiple Linear Regression
Use Case: Profit Estimation of Companies
Introduction to Machine Learning
Based on the amount of rainfall, how much would the crop yield be?
[Figure: crop field; predict crop yield based on rainfall]
Independent and Dependent Variables
Independent variable: A variable whose value is not changed by the effect of other variables and which is used to manipulate the dependent variable. It is often denoted as X.
Dependent variable: A variable whose value changes when the values of the independent variables are manipulated. It is often denoted as Y.
In our example, crop yield depends on the amount of rainfall received:
Rainfall: independent variable
Crop yield: dependent variable
Numerical and Categorical Values
Data can be numerical or categorical.
Numerical: Salary, Age, Height (for example 12345, 167891, 46920, 90984)
Categorical: Gender, Dog's Breed, Color (for example A, B, C, D, E)
Machine Learning Algorithms
Supervised
    Classification
    Regression
        Simple Linear Regression
        Polynomial Linear Regression
        Multiple Linear Regression
Unsupervised
Reinforcement
Applications of Linear Regression
Economic growth: Used to estimate the economic growth of a country or state in the coming quarter; it can also be used to predict a country's GDP.
Product price: Used to predict the future price of a product.
Housing sales: Used to estimate how many houses a builder will sell, and at what price, in the coming months.
Score prediction: Used to predict the number of runs a player will score in the coming matches, based on previous performance.
Understanding Linear Regression
Linear regression is a statistical model used to describe the relationship between independent and dependent variables and to predict the dependent variable from the independent ones.
It examines 2 factors:
1. Which variables in particular are significant predictors of the outcome variable?
2. How well does the regression line make predictions, with the highest possible accuracy?
Regression Equation
The simplest form of a linear regression equation, with one dependent and one independent variable, is:
$y = m x + c$
y ---> Dependent variable
x ---> Independent variable
m ---> Slope of the line, $m = \frac{y_2 - y_1}{x_2 - x_1}$ for any two points $(x_1, y_1)$ and $(x_2, y_2)$ on the line
c ---> Intercept of the line (the value of y when x = 0)
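As a small illustration of the slope formula above, the following sketch computes m and c for the line through two points. The function name `line_through` and the sample points are hypothetical and chosen only for this example.

```python
def line_through(p1, p2):
    """Return (m, c) for the line y = m*x + c through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite slope")
    m = (y2 - y1) / (x2 - x1)  # slope: change in y over change in x
    c = y1 - m * x1            # intercept: solve y1 = m*x1 + c for c
    return m, c


# Hypothetical sample points, not taken from the slides.
m, c = line_through((1, 3), (3, 7))
print("slope:", m, "intercept:", c)
```

Once m and c are known, a prediction for any x is simply `m * x + c`.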
Prediction using the Regression Line
[Figure: crop yield (Y) plotted against the amount of rainfall (X), with the regression line drawn through the data points]
The red point on the Y axis is the amount of crop yield you can expect for a given amount of rainfall (X), represented by the green dot.
Intuition behind the Regression Line
Let's consider a sample dataset with 5 rows and find out how to draw the regression line. X is the independent variable and Y is the dependent variable.
X    Y
1    2
2    4
3    5
4    4
5    5
First, plot the data points.
Next, calculate the mean of X and Y and plot it: the mean of X is 3 and the mean of Y is 4.
The least-squares regression line passes through the mean of X and Y, the point (3, 4).
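Deriving the equation of the regression line
The linear equation is represented as $Y = mX + c$.
X     Y     X²    Y²    XY
1     2     1     4     2
2     4     4     16    8
3     5     9     25    15
4     4     16    16    16
5     5     25    25    25
Sum:  $\sum X = 15$, $\sum Y = 20$, $\sum X^2 = 55$, $\sum Y^2 = 86$, $\sum XY = 66$

$$m = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2} = \frac{(5 \times 66) - (15 \times 20)}{(5 \times 55) - 225} = \frac{30}{50} = 0.6$$

$$c = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n \sum X^2 - (\sum X)^2} = \frac{(20 \times 55) - (15 \times 66)}{(5 \times 55) - 225} = \frac{110}{50} = 2.2$$

At the mean of X, the line gives $Y = mX + c = 0.6 \times 3 + 2.2 = 4$, which is the mean of Y.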
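The slope and intercept formulas can be checked with a few lines of Python. This is a minimal sketch using only the five-row sample dataset from the slides; the variable names are chosen here for illustration.

```python
# Sample dataset from the slides.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
sum_x = sum(X)                             # 15
sum_y = sum(Y)                             # 20
sum_xx = sum(x * x for x in X)             # 55
sum_xy = sum(x * y for x, y in zip(X, Y))  # 66

# Shared denominator: n * sum(X^2) - (sum(X))^2
denominator = n * sum_xx - sum_x ** 2      # 50

m = (n * sum_xy - sum_x * sum_y) / denominator       # slope
c = (sum_y * sum_xx - sum_x * sum_xy) / denominator  # intercept

print("m =", m, "c =", c)

# The fitted line should pass through the mean point (3, 4).
mean_x, mean_y = sum_x / n, sum_y / n
print("line at mean of X:", round(m * mean_x + c, 6), "mean of Y:", mean_y)
```

The same sums appear in the table above, so each intermediate value can be compared against the corresponding column total.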
Let's find the predicted values of Y for the corresponding values of X using the linear equation with m = 0.6 and c = 2.2:
$Y_{pred} = 0.6 \times 1 + 2.2 = 2.8$
$Y_{pred} = 0.6 \times 2 + 2.2 = 3.4$
$Y_{pred} = 0.6 \times 3 + 2.2 = 4$
$Y_{pred} = 0.6 \times 4 + 2.2 = 4.6$
$Y_{pred} = 0.6 \times 5 + 2.2 = 5.2$
In the plot, the blue points represent the actual Y values and the brown points represent the predicted Y values. The distance between an actual value and its predicted value is known as the residual or error. The best fit line should have the least sum of squares of these errors, also known as e square.
Residuals for the line with m = 0.6 and c = 2.2:
X     Y     Y_pred    (Y - Y_pred)    (Y - Y_pred)²
1     2     2.8       -0.8            0.64
2     4     3.4       0.6             0.36
3     5     4         1               1
4     4     4.6       -0.6            0.36
5     5     5.2       -0.2            0.04
Sum of (Y - Y_pred)² = 2.4
The sum of squared errors for this regression line is 2.4. We compute this error for each candidate line and choose as the best fit line the one with the least e square value.
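The residual table can be reproduced with a short script. This sketch assumes the values m = 0.6 and c = 2.2 derived earlier and uses the same five-row dataset; rounding is applied only when printing, to hide floating-point noise.

```python
# Sample dataset and fitted line from the slides.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
m, c = 0.6, 2.2

predictions = [m * x + c for x in X]                 # Y_pred for each X
residuals = [y - p for y, p in zip(Y, predictions)]  # Y - Y_pred
sse = sum(r ** 2 for r in residuals)                 # sum of squared errors

for x, y, p, r in zip(X, Y, predictions, residuals):
    print(x, y, round(p, 2), round(r, 2), round(r ** 2, 2))
print("Sum of squared errors:", round(sse, 2))
```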
Finding the Best Fit Line
Minimizing the distance: there are several ways to measure the distance between the line and the data points, such as the sum of squared errors, the sum of absolute errors, and the root mean square error.
We keep moving the line through the data points until we find the best fit line, the one with the least squared distance between the data points and the regression line.
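To make the idea of "moving the line" concrete, the sketch below tries a coarse grid of candidate slopes and intercepts on the five-row sample dataset and keeps the pair with the smallest sum of squared errors. The grid bounds and step size are arbitrary choices made for this illustration; in practice the closed-form formulas or a numerical solver are used instead of a brute-force search.

```python
# Sample dataset from the slides.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]


def sse(m, c):
    """Sum of squared errors of the line y = m*x + c on the sample data."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(X, Y))


# Candidate lines: slopes 0.0..2.0 and intercepts 0.0..4.0 in steps of 0.1.
candidates = [(mi / 10, ci / 10) for mi in range(21) for ci in range(41)]

# Keep the candidate with the least sum of squared errors.
best_m, best_c = min(candidates, key=lambda mc: sse(*mc))

print("best line on the grid: m =", best_m, "c =", best_c)
print("its sum of squared errors:", round(sse(best_m, best_c), 2))
```

On this grid the search lands on the same line that the formulas give, which is expected because the sum of squared errors has a single minimum.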
Multiple Linear Regression
Simple Linear Regression: $Y = m x + c$
Multiple Linear Regression: $Y = m_1 x_1 + m_2 x_2 + m_3 x_3 + \dots + m_n x_n + c$
Y: dependent variable (DV)
$x_1, x_2, \dots, x_n$: independent variables (IDVs)
$m_1, m_2, m_3, \dots, m_n$: slopes
c: intercept (the constant term)
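The multiple regression equation is a weighted sum of the inputs plus a constant. The sketch below evaluates it for one observation; the function name `predict` and all numeric values are hypothetical and serve only to show the arithmetic.

```python
def predict(slopes, intercept, features):
    """Evaluate Y = m1*x1 + m2*x2 + ... + mn*xn + c for one observation."""
    if len(slopes) != len(features):
        raise ValueError("need exactly one slope per independent variable")
    return sum(m * x for m, x in zip(slopes, features)) + intercept


# Hypothetical slopes, intercept, and inputs (not from the slides).
slopes = [2.0, 0.5, -1.0]
intercept = 10.0
features = [3.0, 4.0, 1.0]

y = predict(slopes, intercept, features)
print(y)  # 2.0*3.0 + 0.5*4.0 - 1.0*1.0 + 10.0
```

With a single slope and a single feature, the same function reduces to the simple linear regression equation.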
Use Case Implementation of Linear Regression
Let's understand how Multiple Linear Regression works by implementing it in Python.
Predicting the profit of 1000 companies based on their expenditure and location. The attributes are:
1. R&D Spend
2. Administration
3. Marketing Spend
4. State
These attributes are used to estimate and predict Profit.
Steps:
1. Import the libraries
2. Load the dataset and extract the independent and dependent variables
3. Data visualization
4. Encoding categorical data
5. Avoiding the dummy variable trap
6. Splitting the data into a training set and a test set
7. Fitting the Multiple Linear Regression model to the training set
8. Predicting the test set results
9. Calculating the coefficients and the intercept
10. Evaluating the model
An R-squared value of 0.91 indicates that the model explains about 91% of the variance in profit, which suggests a good fit.
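The code for these steps is not part of this transcript, so the following is only a sketch of how the workflow could look, not the presenter's implementation. It assumes NumPy is available and generates a synthetic dataset in place of the real one: the state labels, the coefficients used to generate profit, the noise level, and the 80/20 split are all made up for illustration, and the visualization step is omitted. The fit uses ordinary least squares via `numpy.linalg.lstsq`.

```python
import numpy as np  # Step 1: import the libraries

rng = np.random.default_rng(0)
n = 1000

# Step 2: synthetic stand-in for the dataset (every value here is made up).
rd_spend = rng.uniform(0, 150_000, n)
administration = rng.uniform(50_000, 200_000, n)
marketing = rng.uniform(0, 400_000, n)
state = rng.choice(["A", "B", "C"], n)  # hypothetical state labels
profit = (50_000 + 0.8 * rd_spend + 0.02 * administration
          + 0.05 * marketing + rng.normal(0, 10_000, n))

# Steps 4-5: one-hot encode State, then drop the first dummy column so the
# dummies are not perfectly collinear with the intercept (dummy variable trap).
categories = np.unique(state)
dummies = (state[:, None] == categories[None, :]).astype(float)[:, 1:]

X = np.column_stack([rd_spend, administration, marketing, dummies])
y = profit

# Step 6: shuffle the row indices and split 80% train / 20% test.
indices = rng.permutation(n)
split = int(0.8 * n)
train, test = indices[:split], indices[split:]

# Step 7: fit by least squares; the column of ones models the intercept.
A_train = np.column_stack([np.ones(len(train)), X[train]])
beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

# Step 9: separate the intercept from the slopes.
intercept, coefficients = beta[0], beta[1:]

# Step 8: predict profit for the held-out test rows.
y_pred = intercept + X[test] @ coefficients

# Step 10: R squared on the test set.
ss_res = np.sum((y[test] - y_pred) ** 2)
ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("intercept:", intercept)
print("coefficients:", coefficients)
print("R squared:", r_squared)
```

Because the data is synthetic, the printed numbers will not match the 0.91 reported for the real dataset. Libraries such as scikit-learn offer ready-made equivalents for the split, fit, and scoring steps.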
Use Case Summary
We successfully trained our model with certain predictors and estimated the profit of companies using linear regression.
Key Takeaways