Welcome to: Simple Linear Regression
9.1
Unit objectives
IBM ICE (Innovation Centre for Education)
IBM Power Systems
• Stochastic models give ranges of values for variables in the form of probability distributions.
• Learning a deterministic model is the process of learning a deterministic function that maps
each input variable to an output variable.
• Regression is a standard statistical technique for performing supervised learning when all
variables are continuous.
• Given y values corresponding to x values, regression fits a curve that represents the data as
closely as possible (the best fit).
Regression examples
Regression models
Steps in regression analysis
• Linear regression is used to study the linear relationship between a dependent variable Y
(e.g., blood pressure) and one or more independent variables X (e.g., age, weight, sex).
• The dependent variable Y must be continuous, while the independent variables may be
continuous (age), binary (sex), or categorical (social status).
Simple linear regression
• Suppose a sample of n paired observations (xi, yi), i = 1, 2, ..., n, is available. These
observations are assumed to satisfy the simple linear regression model, so we can write
– yi = β0 + β1xi + εi (i = 1, 2, ..., n).
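The model above can be illustrated by simulating data from it and recovering the parameters. A minimal sketch, assuming illustrative values β0 = 0.3, β1 = 1.5 and σ = 1 (these numbers are not from the slides):

```python
import numpy as np

# Sketch: simulate y_i = b0 + b1 * x_i + e_i with normal noise and check
# that a least-squares fit recovers the assumed parameters.
# b0 = 0.3, b1 = 1.5, sigma = 1.0 are illustrative choices.
rng = np.random.default_rng(0)
n = 1000
b0_true, b1_true, sigma = 0.3, 1.5, 1.0

x = rng.uniform(0, 10, size=n)
eps = rng.normal(0, sigma, size=n)   # the epsilon_i error terms
y = b0_true + b1_true * x + eps

b1_hat, b0_hat = np.polyfit(x, y, 1)  # degree-1 fit: [slope, intercept]
print(round(b0_hat, 2), round(b1_hat, 2))   # close to 0.3 and 1.5
```

With n = 1000 observations the fitted slope and intercept land very close to the values used to generate the data.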
Least squares regression: Line of best fit
• Our aim is to calculate the values m (slope) and b (y-intercept) in the equation of a line:
y = mx + b
• Where:
– y = how far up
– x = how far along
– m = slope or gradient (how steep the line is)
– b = the y-intercept (where the line crosses the Y axis)
Illustration
• Sam recorded how many hours of sunshine vs how many ice creams were sold at the shop
from Monday to Friday.
• Let us find the best m (slope) and b (y-intercept) that fit that data: y = mx + b

Hours of Sunshine (x) | Ice Creams Sold (y)
2 | 4
3 | 5
5 | 7
7 | 10
9 | 15
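The fit for this table can be sketched with the standard least-squares formulas, m = Sxy/Sxx and b = ȳ − m·x̄, using no libraries:

```python
# Least-squares fit for the ice cream example (sketch, from-scratch formulas).
xs = [2, 3, 5, 7, 9]          # hours of sunshine
ys = [4, 5, 7, 10, 15]        # ice creams sold
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# m = S_xy / S_xx, b = y_bar - m * x_bar
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

m = s_xy / s_xx
b = y_bar - m * x_bar
print(f"y = {m:.3f}x + {b:.3f}")   # y = 1.518x + 0.305
```

So the line of best fit for Sam's data is approximately y = 1.518x + 0.305.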
Direct regression method
• This method is also known as ordinary least squares (OLS) estimation. Assume a set of n
paired observations (xi, yi), i = 1, 2, ..., n, is available which satisfies the linear regression
model y = β0 + β1x + ε. So we can write the model for each observation as
– yi = β0 + β1xi + εi (i = 1, 2, ..., n).
• The direct regression approach minimizes the sum of squares due to errors,
– S(β0, β1) = Σ εi² = Σ (yi − β0 − β1xi)²,
with respect to β0 and β1, by setting the partial derivatives ∂S/∂β0 = 0 and ∂S/∂β1 = 0,
which gives two normal equations.
• The solutions of these two equations are called the direct regression estimators, usually
called the ordinary least squares (OLS) estimators of β0 and β1.
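As a sketch, the two normal equations can be written in matrix form as XᵀXβ = Xᵀy and solved directly; here the illustration data from the sunshine/ice-cream slide is reused:

```python
import numpy as np

# Sketch: solve the OLS normal equations X'X beta = X'y
# using the sunshine/ice-cream data from the illustration slide.
x = np.array([2, 3, 5, 7, 9], dtype=float)
y = np.array([4, 5, 7, 10, 15], dtype=float)

X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept column
beta = np.linalg.solve(X.T @ X, X.T @ y)    # beta = [b0, b1]
print(beta)   # approximately [0.305, 1.518]
```

This matches the slope and intercept obtained from the scalar formulas.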
Maximum likelihood estimation
• We assume that the εi's (i = 1, 2, ..., n) are independent and identically distributed following
a normal distribution N(0, σ²). We then use the method of maximum likelihood to estimate
the parameters of the linear regression model
– yi = β0 + β1xi + εi (i = 1, 2, ..., n).
• The observations yi (i = 1, 2, ..., n) are independently distributed as N(β0 + β1xi, σ²) for all
i = 1, 2, ..., n. The likelihood function of the given observations (xi, yi) and unknown
parameters β0, β1 and σ² is
– L(β0, β1, σ²) = (2πσ²)^(−n/2) exp[ −(1/(2σ²)) Σ (yi − β0 − β1xi)² ].
• Maximizing this likelihood gives the same estimates of β0 and β1 as OLS, together with the
variance estimate σ̂² = (1/n) Σ (yi − ŷi)².
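A minimal sketch of the point above: under normal errors the ML estimates of β0 and β1 coincide with OLS, and the ML variance estimate divides the SSE by n rather than the usual n − 2 (illustration data reused):

```python
import numpy as np

# Sketch: ML estimates of b0, b1 coincide with OLS under normal errors;
# the ML variance estimate divides SSE by n (not n - 2).
x = np.array([2, 3, 5, 7, 9], dtype=float)
y = np.array([4, 5, 7, 10, 15], dtype=float)
n = len(y)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

resid = y - b0 - b1 * x
sigma2_mle = np.sum(resid ** 2) / n            # ML estimate of error variance
sigma2_unbiased = np.sum(resid ** 2) / (n - 2)  # usual unbiased estimate
print(round(sigma2_mle, 3), round(sigma2_unbiased, 3))
```

The two variance estimates differ only in the divisor; for large n the difference vanishes.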
Regression assumptions and model properties
Coefficient of determination (R-squared)
• R-squared is a statistical measure used to assess the goodness of fit of a regression
model.
• R-squared is given by: R² = 1 − SSE/SST, where SSE is the sum of squared errors
(residuals) and SST is the total sum of squares of the dependent variable about its mean.
Example

SSE = Σ (yi − ŷi)² = Σ (yi − b0 − b1xi)²
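A short sketch computing R² = 1 − SSE/SST for the sunshine/ice-cream fit:

```python
# Sketch: R-squared for the sunshine/ice-cream fit (R2 = 1 - SSE/SST).
xs = [2, 3, 5, 7, 9]
ys = [4, 5, 7, 10, 15]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
sst = sum((y - y_bar) ** 2 for y in ys)                    # total sum of squares
r2 = 1 - sse / sst
print(round(r2, 2))   # about 0.96
```

An R² near 0.96 says the line explains roughly 96% of the variation in ice cream sales.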
Testing hypothesis in simple linear regression
• Use of t-tests: the hypothesis H0: β1 = 0 is tested with the statistic t = b1 / se(b1), which
follows a t distribution with n − 2 degrees of freedom under H0.
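A sketch of the slope t-test on the illustration data, using the standard formulas se(b1) = √(MSE/Sxx) and MSE = SSE/(n − 2):

```python
import math

# Sketch: t-test for the slope (H0: beta1 = 0) on the example data.
xs = [2, 3, 5, 7, 9]
ys = [4, 5, 7, 10, 15]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
mse = sse / (n - 2)                  # estimate of the error variance
se_b1 = math.sqrt(mse / s_xx)        # standard error of the slope
t_stat = b1 / se_b1
print(round(t_stat, 2))   # about 8.43: strong evidence the slope is nonzero
```

A t statistic this large (with 3 degrees of freedom) comfortably rejects H0: β1 = 0.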
Illustration
Checking model adequacy
• The analyst should always consider the validity of these assumptions to be doubtful and
conduct analyses to examine the adequacy of the model.
• The residuals from a regression model are ei = yi - ŷi , where yi is an actual observation
and ŷi is the corresponding fitted value from the regression model.
• Analysis of the residuals is frequently helpful in checking the assumption that the errors
are approximately normally distributed with constant variance, and in determining whether
additional terms in the model would be useful.
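The residual computation above can be sketched directly; note that residuals from an OLS fit with an intercept sum to (numerically) zero by construction, so what matters in adequacy checking is their pattern, not their sum:

```python
# Sketch: residuals e_i = y_i - yhat_i for the example fit.
xs = [2, 3, 5, 7, 9]
ys = [4, 5, 7, 10, 15]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print([round(e, 2) for e in residuals])   # inspect for trends or outliers
print(abs(sum(residuals)) < 1e-9)         # sums to ~0 by construction
```

In practice one would also plot the residuals against the fitted values and on a normal probability plot to check the constant-variance and normality assumptions.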
Over-fitting
• Over-fitting occurs when a statistical model begins to describe the random error in the data
rather than the relationships between variables.
– It produces misleading R-squared values, regression coefficients, and p-values.
• Divide the data into three sets: training, validation and test sets.
• Fit candidate models on the training set, and use the validation set to choose among them.
• See how well the chosen model can predict the test set.
• The test error gives an unbiased estimate of the predictive power of the model.
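The idea above can be sketched by comparing polynomial degrees on a train/validation split. The synthetic data, noise level and degrees below are all illustrative assumptions; the key point is that training error can only fall as complexity grows, while validation error exposes over-fitting:

```python
import numpy as np

# Sketch: over-fitting demo on synthetic data (straight line plus noise;
# all values here are illustrative assumptions).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 40)
y = 1.5 * x + 0.3 + rng.normal(0, 2.0, size=x.shape)

x_train, y_train = x[::2], y[::2]      # even indices -> training set
x_val, y_val = x[1::2], y[1::2]        # odd indices  -> validation set

def sse(coeffs, xs, ys):
    return float(np.sum((np.polyval(coeffs, xs) - ys) ** 2))

train_err, val_err = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err[degree] = sse(coeffs, x_train, y_train)
    val_err[degree] = sse(coeffs, x_val, y_val)
    print(degree, round(train_err[degree], 1), round(val_err[degree], 1))
# Training error never increases with degree; validation error typically
# rises again once the model starts fitting the noise.
```

A real workflow would hold out a third (test) set, untouched during model selection, to report the final unbiased error estimate.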
Logistic regression
• Linear regression can be generalised to two-class (binary) classification.
• Recall that if the output y belongs to the set {0, 1}, the problem is called binary
classification.
• Ber(·) stands for the Bernoulli distribution, which is the appropriate response distribution
when the response is binary.
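A minimal sketch of the Bernoulli model in action: logistic regression fitted by gradient descent, where sigmoid(wx + b) plays the role of P(y = 1 | x). The tiny data set, learning rate and iteration count are all illustrative assumptions:

```python
import math

# Sketch: logistic regression via gradient descent on a tiny made-up
# data set (x values, labels, learning rate are all assumptions).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]              # binary labels

w, b = 0.0, 0.0
lr = 0.1
for _ in range(10000):
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)     # P(y = 1 | x) under the Bernoulli model
        grad_w += (p - y) * x      # gradient of the negative log-likelihood
        grad_b += (p - y)
    w -= lr * grad_w
    b -= lr * grad_b

preds = [1 if sigmoid(w * x + b) > 0.5 else 0 for x in xs]
print(preds)   # separable data, so the fit classifies all points correctly
```

The gradient used here is exactly the derivative of the Bernoulli negative log-likelihood, which is why the update looks like the linear-regression one with sigmoid(wx + b) in place of the linear prediction.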
Checkpoint (1 of 2)
1. The best fitting trend is one for which the sum of squares of error is
a) Zero
b) Minimum (Least)
c) Maximum
d) None
3. Suppose your model is over-fitting. Which of the following is NOT a valid way to try and
reduce the over-fitting?
a) Increase the amount of training data
b) Improve the optimization algorithm being used for error minimization.
c) Decrease the model complexity
d) Reduce the noise in the training data
Checkpoint solutions (1 of 2)
1. The best fitting trend is one for which the sum of squares of error is
a) Zero
b) Minimum (Least) (correct)
c) Maximum
d) None
3. Suppose your model is over-fitting. Which of the following is NOT a valid way to try and
reduce the over-fitting?
a) Increase the amount of training data
b) Improve the optimization algorithm being used for error minimization (correct: this does
not address over-fitting)
c) Decrease the model complexity
d) Reduce the noise in the training data
Checkpoint (2 of 2)
1. ___________ is learning from labeled data using classification and regression models.
2. ____________ is considered as the number of observations in regression analysis.
3. The percent of total variation of the dependent variable Y explained by the set of
independent variables X is measured by ______________.
4. _________ is also known as the ordinary least squares estimation.
True or False:
1. Supervised learning is learning from labeled data using classification and regression
models.
2. Degree of freedom is considered as the number of observations in regression analysis.
3. The percent of total variation of the dependent variable Y explained by the set of
independent variables X is measured by the coefficient of determination.
4. Direct regression method is also known as the ordinary least squares estimation.