Engineering Analysis & Statistics: Lect. # 11
Regression Analysis 2
A statistical procedure used to find relationships
among a set of variables
For example, in a chemical process, suppose
that the yield of the product is related to the
process-operating temperature.
Regression analysis can be used to build a
model to predict yield at a given temperature
level.
2/7/2019
Regression Analysis

[Scatter plot of the sample data: X-data on the horizontal axis, Y-data on the vertical axis.]
Correlation
A correlation is a relationship between two variables. Typically, we take x to be the independent variable and y to be the dependent variable, so the data are represented by a collection of ordered pairs (x, y).
The strength and direction of a linear relationship between two variables is measured by the correlation coefficient.
Suppose that n ordered pairs (x, y) make up a sample from a population. The correlation coefficient r is given by:

r = [n∑xy − (∑x)(∑y)] / √{[n∑x² − (∑x)²] · [n∑y² − (∑y)²]}
Regression Analysis
In regression analysis, there is a dependent
variable, which is the one you are trying to
explain, and one or more independent variables
that are related to it.
Example: predicting sales as a function of
marketing spend, where sales is the dependent
variable (y) and marketing spend is the
independent variable (x).
You can express the relationship as a linear
equation, such as:
y = a + bx
Regression Analysis
y = a + bx
• y is the dependent variable
• x is the independent variable
• a is a constant
• b is the slope of the line
• For every increase of 1 in x, y changes by an amount equal
to b
• Some relationships are perfectly linear and fit this equation
exactly. Your cell phone bill, for instance, may be a fixed
monthly charge plus a fixed rate per minute of use.
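The perfectly linear cell-phone example can be sketched as a function y = a + bx; the $20.00 fixed charge and $0.10 per-minute rate below are made-up numbers for illustration:

```python
# A perfectly linear relationship: a hypothetical cell phone bill.
# Assumed (made-up) plan: a = $20.00 fixed monthly charge, b = $0.10 per minute.
def monthly_bill(minutes, a=20.00, b=0.10):
    """Total bill y = a + b*x for x minutes of calls."""
    return a + b * minutes

print(monthly_bill(0))    # just the fixed charge
print(monthly_bill(300))  # fixed charge plus 300 minutes at the per-minute rate
```

Every extra minute changes the bill by exactly b dollars, which is what "for every increase of 1 in x, y changes by b" means.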
Example Problem
The time x in years that an employee spent at a
company and the employee's hourly pay, y, for 5
employees are listed in the table below. Calculate
and interpret the correlation coefficient and
equation of regression line. Also predict the hourly
pay rate of an employee who has worked for 20
years.
x 5 3 4 10 15
y 25 20 21 35 38
Solution

x        y        x²       y²       xy
5        25       25       625      125
3        20       9        400      60
4        21       16       441      84
10       35       100      1225     350
15       38       225      1444     570
∑x = 37  ∑y = 139 ∑x² = 375 ∑y² = 4135 ∑xy = 1189

Calculate the numerator of r:
n∑xy − (∑x)(∑y) = 5(1189) − (37)(139) = 5945 − 5143 = 802
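As a sketch, the whole example can be checked in Python, completing the calculation of r, the least-squares line y = a + bx, and the prediction for 20 years of service:

```python
import math

# Worked example from the slides: years at company (x) vs. hourly pay (y).
x = [5, 3, 4, 10, 15]
y = [25, 20, 21, 35, 38]
n = len(x)

sx, sy = sum(x), sum(y)                      # 37, 139
sxx = sum(v * v for v in x)                  # 375
syy = sum(v * v for v in y)                  # 4135
sxy = sum(a * b for a, b in zip(x, y))       # 1189

# Correlation coefficient r.
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))

# Least-squares regression line y = a + b*x.
b = (n * sxy - sx * sy) / (n * sxx - sx**2)
a = (sy - b * sx) / n

print(round(r, 3))           # strength of the linear relationship
print(round(a, 2), round(b, 3))
print(round(a + b * 20, 2))  # predicted hourly pay after 20 years
```

An r close to +1 indicates a strong positive linear relationship between years of service and hourly pay.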
Problems
Problem 1
Consider the following set of points: {(-2 , -1) , (1 , 1) , (3 , 2)}
a) Find the least square regression line for the given data points.
b) Plot the given points and the regression line in the same rectangular system
of axes.
Problem 2
a) Find the least square regression line for the following set of data: {(-1, 0), (0, 2), (1, 4), (2, 5)}
b) Plot the given points and the regression line in the same rectangular system of axes.
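A minimal helper for checking answers to these problems, using the same least-squares formulas as above (shown here on the Problem 1 data):

```python
def least_squares_line(points):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sxx = sum(p[0] ** 2 for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    a = (sy - b * sx) / n                          # intercept
    return a, b

# Check an answer to Problem 1:
a, b = least_squares_line([(-2, -1), (1, 1), (3, 2)])
print(round(a, 4), round(b, 4))
```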
Regression Analysis

Regression analysis develops a procedure to find an equation that fits a set of data.
It is used to predict the value of a parameter that depends on other measured data.
In statistics terminology, the variable being predicted is called the dependent variable, and the variable(s) used to predict it are called independent variables.
Example: in tensile testing, force is the independent variable and elongation is the dependent variable.
Regression Equation

E(y) = β0 + β1x    (2)

The graph of the simple linear regression equation
is a straight line; β0 is the y-intercept of the
regression line, β1 is the slope, and E(y) is the
mean or expected value of y for a given value of x.
Regression Equation

POSITIVE LINEAR RELATIONSHIP: [graph of a regression line with positive slope β1]
NEGATIVE LINEAR RELATIONSHIP: [graph of a regression line with negative slope β1]
NO RELATIONSHIP: [graph of a horizontal regression line, slope β1 = 0]
Estimated Regression Equation

ŷ = b0 + b1x

where b0 and b1 are the sample estimates of β0 and β1, and ŷ is the estimated value of y.
Regression Analysis

If you take a sample of actual heights and weights, you might see something like the graph to the right.
[Scatter plot: Weight (dependent variable, y, roughly 100–160) against Height (roughly 60–75), showing variation that height does not explain.]
Regression Analysis

The line in the graph shows the average relationship described by the equation. Often, none of the actual observations lie on the line. The difference between the line and any individual observation is the error.
The equation is:

Weight = C + 5.7·Height + ε

This equation does not mean that people who are short enough will have a negative weight. The observations that contributed to this analysis were all for heights between 5' and 6'4". The model will likely provide a reasonable estimate for anyone in this height range.
[Scatter plot with fitted line: Weight (roughly 100–220) against Height (roughly 60–75).]
Regression Analysis
Regression finds the line that best fits the observations. It
does this by finding the line that results in the lowest sum of
squared errors.
Since the line describes the mean of the effects of the
independent variables, by definition, the sum of the actual
errors will be zero.
If you add up all of the values of the dependent variable and
you add up all the values predicted by the model, the sum is
the same.
That is, the sum of the negative errors (for points below the
line) will exactly offset the sum of the positive errors (for points
above the line).
Summing just the errors wouldn’t be useful because the sum
is always zero. So, instead, regression uses the sum of the
squares of the errors. An Ordinary Least Squares (OLS)
regression finds the line that results in the lowest sum of
squared errors.
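A small sketch of the property above, using made-up sample points: an OLS fit leaves residuals (actual minus predicted) that sum to zero.

```python
# Fit an OLS line to a few made-up (x, y) points and confirm that the
# residuals about the fitted line sum to zero (up to rounding).
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]  # illustrative sample data
n = len(points)
sx = sum(x for x, _ in points)
sy = sum(y for _, y in points)
sxx = sum(x * x for x, _ in points)
sxy = sum(x * y for x, y in points)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
a = (sy - b * sx) / n                          # intercept

residuals = [y - (a + b * x) for x, y in points]
print(round(sum(residuals), 10))  # zero: negative errors offset positive ones
```

Because the plain sum of errors is always zero, OLS instead minimizes the sum of the *squared* errors.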
Linear Regression
•Linear dependence: constant rate of increase of one
variable with respect to another.
•Regression analysis describes the relationship
between two (or more) variables.
•Examples:
– Income and educational level
– Demand for electricity and the weather
– Home sales and interest rates
•Our focus:
–Gain some understanding of the mechanics.
• The regression line
• Regression error
– Learn how to interpret and use the results.
– Learn how to set up a regression analysis.
Simple Linear Regression

b1 (slope) = ∆y/∆x
b0 = y-intercept
[Graph: the dependent variable plotted against the independent variable (x), showing an observation y, its prediction ŷ on the fitted line, and the prediction error ε between them.]

y = ŷ + ε
Actual = Explained + Error
Regression Analysis

Least squares (LS) is used to estimate the regression coefficients b0 and b1 in

y = b0 + b1x + ε

[Graph: sample points scattered about the fitted line, with intercept b0 and slope b1.]
Calculating SSR

[Graph: the dependent variable plotted against x, with the sample mean ȳ drawn as a horizontal reference line.]

The least-squares estimates of the regression coefficients are:

b1 = ∑(xi − x̄)(yi − ȳ) / ∑(xi − x̄)²    (4)
b0 = ȳ − b1x̄

For the data in the table that follows, n = 10, ∑x = 140, and ∑y = 1300, so x̄ = 14 and ȳ = 130.
Calculating SSR

Rest. i   xi    yi     xi − x̄   yi − ȳ   (xi − x̄)(yi − ȳ)   (xi − x̄)²
1         2     58     −12      −72      864                144
2         6     105    −8       −25      200                64
3         8     88     −6       −42      252                36
4         8     118    −6       −12      72                 36
5         12    117    −2       −13      26                 4
6         16    137    2        7        14                 4
7         20    157    6        27       162                36
8         20    169    6        39       234                36
9         22    149    8        19       152                64
10        26    202    12       72       864                144
Totals    140   1300                     2840               568

b1 = ∑(xi − x̄)(yi − ȳ) / ∑(xi − x̄)² = 2840/568 = 5
b0 = ȳ − b1x̄ = 130 − 5(14) = 60

so the estimated regression equation is ŷ = 60 + 5x.
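The table's calculation can be sketched in Python, with the x and y values taken from the ten sample points above:

```python
# Least-squares coefficients for the ten sample points in the table.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n  # 14 and 130

num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # 2840
den = sum((xi - xbar) ** 2 for xi in x)                       # 568
b1 = num / den
b0 = ybar - b1 * xbar
print(b1, b0)  # slope and intercept of yhat = b0 + b1*x
```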
Coefficient of Determination
The sum of squares due to error (SSE) is a measure of the error in using the estimated regression equation:

SSE = ∑(yi − ŷi)²

In the example, for xi = 2, yi = 58:
ŷi = 60 + 5xi = 60 + 5(2) = 70
yi − ŷi = 58 − 70 = −12
(yi − ŷi)² = 144
After computing and squaring each residual, sum them all to obtain SSE, the error made when using ŷ = 60 + 5x to predict sales.
Computation of SSE for the sample, using the estimated regression equation ŷi = 60 + 5xi:

Rest. i   xi    yi     Predicted Sales ŷi = 60 + 5xi   Error (yi − ŷi)   Squared Error (yi − ŷi)²
1         2     58     70                              −12               144
2         6     105    90                              15                225
3         8     88     100                             −12               144
4         8     118    100                             18                324
5         12    117    120                             −3                9
6         16    137    140                             −3                9
7         20    157    160                             −3                9
8         20    169    160                             9                 81
9         22    149    170                             −21               441
10        26    202    190                             12                144
SSE = 1530
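A minimal sketch of the SSE computation above:

```python
# Sum of squares due to error (SSE) for the sample,
# using the estimated regression equation yhat = 60 + 5x.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

sse = sum((yi - (60 + 5 * xi)) ** 2 for xi, yi in zip(x, y))
print(sse)  # total squared prediction error
```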
Regression Formulas

The total sum of squares (SST) is equal to SSR (sum of squares due to regression) plus SSE (sum of squares due to error).
Mathematically,
SSR = ∑(ŷi − ȳ)²  (measure of variation explained by the regression)
SSE = ∑(yi − ŷi)²  (measure of unexplained variation)
SST = SSR + SSE = ∑(yi − ȳ)²  (measure of total variation in y)
Coefficient of Determination

Computation of the total sum of squares: SST = ∑(yi − ȳ)², using ȳ = 130 for the example data.
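A sketch of the SST computation for the same ten sample values:

```python
# Total sum of squares (SST): variation of the yi around their mean
# ybar = 130, with no regression equation involved.
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
ybar = sum(y) / len(y)

sst = sum((yi - ybar) ** 2 for yi in y)
print(sst)
```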
The coefficient of determination measures how well the estimated regression equation fits the data: it is the fraction of the total variation in y (SST) that is explained by the regression (SSR):

r² = SSR/SST = (SST − SSE)/SST
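Putting the pieces together, a sketch computing r² for the example from SST and SSE:

```python
# Coefficient of determination r^2 = SSR/SST for the sample data,
# with predictions from the estimated regression equation yhat = 60 + 5x.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
ybar = sum(y) / len(y)

sst = sum((yi - ybar) ** 2 for yi in y)                       # total variation
sse = sum((yi - (60 + 5 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
ssr = sst - sse                                               # explained

r2 = ssr / sst
print(round(r2, 4))  # fraction of variation in y explained by x
```

An r² near 1 means the regression line accounts for most of the variation in y.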
Exercise Problems