Welcome to: Simple linear regression

The document discusses simple linear regression. It begins by introducing regression as a technique for modeling relationships between variables when the response variable is continuous. Simple linear regression involves modeling the relationship between a single predictor or independent variable and the response or dependent variable. The document then covers estimating regression coefficients using the least squares method and the maximum likelihood estimation approach. It also discusses assumptions of the linear regression model and evaluating model fit using the coefficient of determination.


IBM ICE (Innovation Centre for Education)

Welcome to:
Simple linear regression

Unit objectives
IBM Power Systems

After completing this unit, you should be able to:

• Understand the concept of supervised learning

• Understand regression and its variants, along with real-world problems solved using regression

• Understand the estimation of regression parameters

• Gain insight into significance testing

• Understand logistic regression analysis
Introduction
• Deterministic models are mathematical models in which outcomes are precisely determined
through known relationships among system states and events.

• There is no random variation in the outcomes:

– A given input will always produce the same output.
– A known chemical reaction always yields known chemical compounds.
– A computer program gives the expected output for a given set of inputs.

• Stochastic models give ranges of values for variables in the form of probability distributions.

• Learning a deterministic model is the process of learning a deterministic function that maps each input variable to an output variable.

• There are two main paradigms in deterministic learning:

– Supervised learning.
– Unsupervised learning.

• A third paradigm is reinforcement learning.


Supervised learning

• Accuracy is the fraction of test cases that are classified correctly:

– Accuracy = Number of correct classifications / Total number of test cases
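As an illustrative sketch (not from the slides), the accuracy measure can be computed with a small helper function:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, t in zip(predictions, labels) if p == t)
    return correct / len(labels)

# 4 of 5 test cases are classified correctly
print(accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))  # -> 0.8
```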
Regression

• Regression is a standard statistical technique for performing supervised learning when all
variables are continuous.

• It is just like classification except that the response variable is continuous.

• Given y values corresponding to x values, regression fits a curve that represents the y values as closely as possible (the best fit).
Regression examples
Regression models
Steps in regression analysis

• Regression analysis includes the following steps:


– Statement of the problem under consideration.
– Choice of relevant variables.
– Collection of data on relevant variables.
– Specification of model.
– Choice of method for fitting the data.
– Fitting of model.
– Model validation and criticism.
– Using the chosen model(s) for the solution of the posed problem and forecasting.
Linear regression

• Linear regression is used to study the linear relationship between a dependent variable Y (e.g. blood pressure) and one or more independent variables X (e.g. age, weight, sex).

• The dependent variable Y must be continuous, while the independent variables may be continuous (age), binary (sex), or categorical (social status).
Simple linear regression

• Consider a simple linear regression model

– y = β0 + β1X + ε

where y is termed the dependent or study variable and X is termed the independent or explanatory variable. The terms β0 and β1 are the parameters of the model. The parameter β0 is termed the intercept and the parameter β1 the slope. These parameters are usually called regression coefficients.

• The independent variable is viewed as controlled by the experimenter, so it is considered non-stochastic, whereas y is viewed as a random variable with

– E(y) = β0 + β1X and Var(y) = σ².
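As an illustrative sketch (the parameter values β0 = 2, β1 = 0.5, σ = 2 are assumptions, not from the slides), the following Python snippet simulates data from this model, showing that y varies randomly around the line E(y) = β0 + β1X:

```python
import random

random.seed(42)  # reproducible draws

beta0, beta1, sigma = 2.0, 0.5, 2.0  # assumed illustrative parameters

# Simulate n observations: x is fixed (non-stochastic),
# y = beta0 + beta1*x + eps with eps ~ N(0, sigma^2)
n = 1000
xs = [i / 100 for i in range(n)]  # x values in [0, 10)
ys = [beta0 + beta1 * x + random.gauss(0, sigma) for x in xs]

# The sample mean of y should be close to beta0 + beta1 * mean(x)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
print(round(mean_x, 2), round(mean_y, 2))
```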
Least squares estimation

• Suppose a sample of n pairs of observations (xi, yi), i = 1, 2, ..., n, is available. These observations are assumed to satisfy the simple linear regression model, so we can write
– yi = β0 + β1xi + εi, (i = 1, 2, ..., n).
Least squares regression: Line of best fit

• Our aim is to calculate the values m (slope) and b (y-intercept) in the equation of a line: y = mx + b

• Where:
‾ y = how far up
‾ x = how far along
‾ m = slope or gradient (how steep the line is)
‾ b = the y-intercept (where the line crosses the y axis)
Illustration

• Sam found how many hours of sunshine vs how many ice creams were sold at the shop
from Monday to Friday.

• Let us find the best m (slope) and b (y-intercept) that suits that data: y = mx + b

"x" "y"
Hours of Sunshine Ice Creams Sold

2 4
3 5
5 7
7 10
9 15
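The best m and b for this data can be computed directly from the standard least squares formulas m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and b = (Σy − mΣx) / n; a minimal Python sketch:

```python
xs = [2, 3, 5, 7, 9]    # hours of sunshine
ys = [4, 5, 7, 10, 15]  # ice creams sold

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Least squares formulas for slope m and intercept b
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n

print(round(m, 3), round(b, 3))  # -> 1.518 0.305
```

So the line of best fit for Sam's data is approximately y = 1.518x + 0.305.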
Direct regression method

• This method is also known as ordinary least squares estimation. Assume that a set of n paired observations (xi, yi), i = 1, 2, ..., n, is available which satisfies the linear regression model y = β0 + β1X + ε. So we can write the model for each observation as
– yi = β0 + β1xi + εi, (i = 1, 2, ..., n).

• The direct regression approach minimizes the sum of squares due to errors,
– S(β0, β1) = Σ εi² = Σ (yi − β0 − β1xi)²,
with respect to β0 and β1.

• The partial derivative of S(β0, β1) with respect to β0 is
– ∂S(β0, β1)/∂β0 = −2 Σ (yi − β0 − β1xi)

• and the partial derivative of S(β0, β1) with respect to β1 is
– ∂S(β0, β1)/∂β1 = −2 Σ (yi − β0 − β1xi) xi.

• The solutions for β0 and β1 are obtained by setting
– ∂S(β0, β1)/∂β0 = 0 and ∂S(β0, β1)/∂β1 = 0.

• The solutions of these two equations are called the direct regression estimators, or the ordinary least squares (OLS) estimators, of β0 and β1.
Maximum likelihood estimation

• We assume that the εi's (i = 1, 2, ..., n) are independent and identically distributed following a normal distribution N(0, σ²). Now we use the method of maximum likelihood to estimate the parameters of the linear regression model
– yi = β0 + β1xi + εi (i = 1, 2, ..., n).

• The observations yi (i = 1, 2, ..., n) are then independently distributed with N(β0 + β1xi, σ²) for all i = 1, 2, ..., n.

• The maximum likelihood estimates of β0, β1 and σ² can be obtained by maximizing the likelihood function L(xi, yi; β0, β1, σ²) of the observations (xi, yi), or equivalently its logarithm ln L(xi, yi; β0, β1, σ²).
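The likelihood itself is not reproduced on the slide; under the stated normality assumption it takes the standard form:

```latex
L(x_i, y_i; \beta_0, \beta_1, \sigma^2)
  = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}
    \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2\right),

\ln L = -\frac{n}{2}\ln(2\pi\sigma^2)
        - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2 .
```

Maximizing over β0 and β1 is equivalent to minimizing Σ(yi − β0 − β1xi)², so the maximum likelihood estimates of β0 and β1 coincide with the OLS estimators; the maximum likelihood estimate of σ² is SSE/n (rather than the unbiased SSE/(n − 2)).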
Matrix approach

• In matrix notation, the n observations can be collected into the model y = Xβ + ε, where

– y = (y1, y2, ..., yn)′ is the n × 1 vector of responses,
– X is the n × (k + 1) design matrix whose first column is all ones and whose remaining columns hold the explanatory variables (for simple linear regression, k = 1 and the second column is x1, x2, ..., xn),
– β = (β0, β1, ..., βk)′ is the vector of regression coefficients, and
– ε = (ε1, ε2, ..., εn)′ is the vector of random errors.
Regression assumptions and model properties
Coefficient of determination (R-squared)
• R-squared is a statistical measure used to assess the goodness of fit of a regression model.

• R-squared is given by R² = 1 − SSE/SST, where SSE is the error (residual) sum of squares, Σ(yi − ŷi)², and SST is the total sum of squares, Σ(yi − ȳ)².
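As a sketch, R² can be computed directly from its definition; here we reuse the illustrative sunshine/ice-cream data from the earlier slide (an assumption, since this slide's own example data is not reproduced here):

```python
xs = [2, 3, 5, 7, 9]
ys = [4, 5, 7, 10, 15]
n = len(xs)

# Fit by ordinary least squares
mean_x = sum(xs) / n
mean_y = sum(ys) / n
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
     sum((x - mean_x) ** 2 for x in xs)
b0 = mean_y - b1 * mean_x

# R-squared = 1 - SSE/SST
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - mean_y) ** 2 for y in ys)
r2 = 1 - sse / sst

print(round(r2, 3))  # -> 0.96
```

About 96% of the variation in ice creams sold is explained by hours of sunshine under the fitted line.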
Example

• Consider the following data


Testing for significance

• To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.

• Two tests are commonly used:

– The t test and the F test.

• Both the t test and F test require an estimate of σ², the variance of ε in the regression model.
• The mean square error (MSE) provides this estimate; the notation s² is also used:
– s² = MSE = SSE/(n − 2)
• where
– SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)².
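As a sketch of the t test on the slope, t = b1 / se(b1) with se(b1) = √(MSE / Sxx); the data below reuses the illustrative sunshine/ice-cream example from earlier (an assumption, not this slide's own example):

```python
import math

xs = [2, 3, 5, 7, 9]
ys = [4, 5, 7, 10, 15]
n = len(xs)

# OLS fit
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
b0 = mean_y - b1 * mean_x

# s^2 = MSE = SSE / (n - 2)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
mse = sse / (n - 2)

# t statistic for H0: beta1 = 0
se_b1 = math.sqrt(mse / sxx)
t = b1 / se_b1
print(round(t, 2))  # -> 8.43
```

Since 8.43 far exceeds the two-sided 5% critical value t(0.025, n − 2 = 3) ≈ 3.182, the slope would be judged significantly different from zero for this data.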
Testing hypothesis in simple linear regression

• Use of t-Tests
Illustration
Checking model adequacy

• Fitting a regression model requires several assumptions:

‾ Errors are uncorrelated random variables with mean zero,
‾ Errors have constant variance, and
‾ Errors are normally distributed.

• The analyst should always consider the validity of these assumptions to be doubtful and conduct analyses to examine the adequacy of the model.
• The residuals from a regression model are ei = yi - ŷi , where yi is an actual observation
and ŷi is the corresponding fitted value from the regression model.
• Analysis of the residuals is frequently helpful in checking the assumption that the errors
are approximately normally distributed with constant variance, and in determining whether
additional terms in the model would be useful.
Over-fitting

• Over-fitting occurs when a statistical model begins to describe the random error in the data rather than the relationships between variables.
– Over-fit models produce misleading R-squared values, regression coefficients, and p-values.

• Graphical illustration of over-fitting regression models.


Detecting over-fit models: Cross validation

• Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample.
• The general procedure:
– Shuffle the dataset randomly.
– Split the dataset into k groups.
– For each unique group:
• Take the group as a hold out or test data set.
• Take the remaining groups as a training data set.
• Fit a model on the training set and evaluate it on the test set.
• Retain the evaluation score and discard the model.
– Summarize the skill of the model using the sample of model evaluation scores.
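The procedure above can be sketched in a few lines of Python; this is an illustrative implementation, assuming the simple linear model from this unit and synthetic data (none of it comes from the slides):

```python
import random

def fit_slr(xs, ys):
    """Ordinary least squares fit; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1

def k_fold_cv(xs, ys, k=5, seed=0):
    """Return the test MSE of each of the k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)       # 1. shuffle the dataset randomly
    folds = [idx[i::k] for i in range(k)]  # 2. split the dataset into k groups
    scores = []
    for test_idx in folds:                 # 3. each group is held out once
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        b0, b1 = fit_slr([xs[i] for i in train_idx],
                         [ys[i] for i in train_idx])
        mse = sum((ys[i] - (b0 + b1 * xs[i])) ** 2
                  for i in test_idx) / len(test_idx)
        scores.append(mse)                 # 4. retain the evaluation score
    return scores                          # 5. summarize over all folds

# Illustrative data: y roughly 2 + 0.5x plus noise
random.seed(1)
xs = [float(i) for i in range(20)]
ys = [2 + 0.5 * x + random.gauss(0, 1) for x in xs]
scores = k_fold_cv(xs, ys, k=5)
print(len(scores), round(sum(scores) / len(scores), 2))
```

The mean of the per-fold MSEs summarizes the model's skill, as in the final step of the procedure.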
Cross validation: The ideal procedure

• Divide the data into three sets: training, validation and test sets.

• Find the optimal model on the training set, using the validation set to compare candidate models.

• See how well the chosen model can predict the test set.

• The test error gives an unbiased estimate of the predictive power of the model.
Logistic regression

• Linear regression can be generalised to two-class classifiers (binary classifiers).

• Recall that if the output y belongs to the set {0, 1}, the classifier is called a binary classifier.

• The classification procedure in which linear regression is generalised for binary classification, by replacing the Gaussian distribution with a Bernoulli distribution, is called logistic regression.

• It can be modelled as
– p(y | x) = Ber(y | μ(x)), where μ(x) = 1 / (1 + e^(−(β0 + β1x))) is the logistic (sigmoid) function of the linear predictor.

• Ber(·) stands for the Bernoulli distribution, which is more appropriate when the response is binary.
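A small illustrative sketch of the logistic model (the coefficients β0 = −4, β1 = 1 are assumptions, not from the slides): the sigmoid maps the linear predictor to a probability, and the class is predicted by thresholding at 0.5:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Assumed illustrative coefficients
beta0, beta1 = -4.0, 1.0

def predict_proba(x):
    """P(y = 1 | x) under the logistic regression model."""
    return sigmoid(beta0 + beta1 * x)

def predict(x):
    """Predicted class: 1 if P(y = 1 | x) >= 0.5, else 0."""
    return 1 if predict_proba(x) >= 0.5 else 0

print(round(predict_proba(4), 2), predict(2), predict(6))  # -> 0.5 0 1
```

At x = 4 the linear predictor is zero, so the model is exactly on the decision boundary; smaller x values are classified as 0 and larger ones as 1.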
Checkpoint (1 of 2)

Multiple choice questions:

1. The best fitting trend is one for which the sum of squares of error is
a) Zero
b) Minimum (Least)
c) Maximum
d) None

2. A coefficient of correlation computed to be −0.95 means that

a) The relationship between two variables is weak.
b) The relationship between two variables is strong and positive.
c) The relationship between two variables is strong but negative.
d) The correlation coefficient cannot have this value.

3. Suppose your model is over-fitting. Which of the following is NOT a valid way to try and
reduce the over-fitting?
a) Increase the amount of training data
b) Improve the optimization algorithm being used for error minimization.
c) Decrease the model complexity
d) Reduce the noise in the training data
Checkpoint solutions (1 of 2)

Multiple choice questions:

1. The best fitting trend is one for which the sum of squares of error is
Answer: b) Minimum (Least)

2. A coefficient of correlation computed to be −0.95 means that
Answer: c) The relationship between two variables is strong but negative

3. Suppose your model is over-fitting. Which of the following is NOT a valid way to try and reduce the over-fitting?
Answer: b) Improve the optimization algorithm being used for error minimization
Checkpoint (2 of 2)

Fill in the blanks:

1. ___________ is a learning from labeled data using classification and regression models.
2. ____________is considered as the number of observations in regression analysis.
3. The percent of total variation of the dependent variable Y explained by the set of
independent variables X is measured by ______________.
4. _________ is also known as the ordinary least squares estimation.

True or False:

1. The variance of the residuals is called the standard error of the regression analysis. True/False


2. The correlation coefficient is the geometric mean of two regression coefficients. True/False
3. The value of the coefficient of correlation r lies between 0 and 1. True/False
Checkpoint solutions (2 of 2)

Fill in the blanks:

1. Supervised learning is a learning from labeled data using classification and regression
models.
2. Degree of freedom is considered as the number of observations in regression analysis.
3. The percent of total variation of the dependent variable Y explained by the set of
independent variables X is measured by co-efficient of determination.
4. Direct regression method is also known as the ordinary least squares estimation.

True or False:

1. The variance of the residuals is called the standard error of the regression analysis. True


2. The correlation coefficient is the geometric mean of two regression coefficients. True
3. The value of the coefficient of correlation r lies between 0 and 1. False
Question bank

Two marks questions:

1. What is meant by linear regression? Illustrate.


2. Define SSE and SSR.
3. Define coefficient of determination with its importance.
4. Define adjusted R-squared with an example.

Four marks questions:

1. Define regression. What are its applications?


2. Explain simple linear regression with an example.
3. Discuss least squares estimations with an example.
4. Explain line of best fit with an example.

Eight marks questions:

1. Explain k-fold cross validation with an example.


2. Describe the procedure of significance testing in SLR.
Unit summary

Having completed this unit, you should be able to:

• Understand the concept of supervised learning

• Understand regression and its variants, along with real-world problems solved using regression

• Understand the estimation of regression parameters

• Gain insight into significance testing

• Understand logistic regression analysis
