
CHAPTER 2

LINEAR REGRESSION
ANALYSIS
Instructor: Lena Ahmadi

Contact: during the lecture — in person; after the lecture — office E6-2004, Skype/e-mail (l2ahmadi@u…), message, or call Ext. 37160


Course notes are copyrighted material. © Professor Alex Penlidis, 2019.
This copy is for individual use only in connection with this course.
It may not be resold or used to make additional copies.
Tel: (519) 888-4567 x36634, E-mail: [email protected]
1
Major Topics
• Chapter 1: Statistical Background
• Chapter 2: Regression Analysis
• Chapter 3: Statistical Design of Experiments
• Chapter 4: Design/Analysis of Single Factor Experiments
• Chapter 5: Blocking
• Chapter 6: Multifactor Experiments
• Chapter 7: Multifactor Experiments
• Chapter 8: Response Surface Methods/CCD
• Chapter 9: Response Surface Methods/BBD, Face Centered Designs, Nonlinear Regression
• Chapter 10: Data Transformations
• Chapter 11: The Analysis of Undesigned Data
• Chapter 12: Concluding Remarks
2
Learning Objectives
• Understand fitting linear regression models (LRMs)
• Understand basic regression model inference, including tests for significance of regression, tests on individual model parameters, and confidence intervals on the parameters
• Know how to use basic regression model diagnostics, such as residual plots, …

3
Previous Topic:
• Basic Stats
• Confidence Interval
• Hypothesis Testing
• P-value
• Correlation and more…
• Linear Reg. Analysis (basics)

4
Topic Outline
• A few general remarks on
LRMs
• Estimation of parameters in
LRMs
• Inferences on the parameters
• Regression model diagnostics
– Scaled residual, PRESS,
R-Student
– Testing for lack of fit

5
A few general remarks on
LRMs

6
A few general remarks…

• Quick review of some material from 2nd yr


Applied Stats, but also new material,
especially on diagnostic checks

• A more formal and generalized treatment, that


can be used in any situation/course and with
any data set

• LLS; OLS; OLLS

– Linear least squares
– Ordinary least squares
– Ordinary linear least squares (sometimes simply called ordinary regression)
– …

7
8
WHY IS MODELING
IMPORTANT?

• If we want to develop and apply strategies


which would increase productivity, improve
quality and/or give desired property
trajectories, it is important to have a valid
model capable of predicting at least the major
effects of the process variables.

• Several levels (of complexity) of mathematical


models.

• Ask questions of the model, i.e., pose "what if" scenarios to a simulation of the actual process!

9
USES OF MODELS

1. Process Understanding
– Direct further experimentation
– Reservoir/repository of knowledge
– Couplings/interactions, especially when
many factors change simultaneously!
– Transferability/inference engine

– Best way to find out what you do not


know about the process!

2. Process Simulation
– Parameter estimation
– Sensitivity analysis
– Design

10
USES OF MODELS (cont’d)

3. Process Optimization
– Grade changes (usually highly non-linear
problems)

4. Process Safety

5. Model-based Control
– Filtering/infer unmeasured properties
– Sensor development/selection
– Optimal sensor location

6. Education and Training

11
MODELING STAGES

• Step 1: Understand the process; specify


process model forms
• Step 2: Lab/plant experimentation and data
collection
• Step 3: Test model’s predictive powers
• Step 4: Extrapolate (with caution) to other
operating conditions

Steps interrelated

Iterative procedure (e.g., from Step 3 back to


Step 1, or from Step 4 back to Step 2?)

Sequential, iterative, optimal!

12
From 2nd year (review), to go over at home.
You are welcome to ask your questions during office hours, over coffee, in a Skype meeting, ….

GENERAL REGRESSION
MODEL

$$g(y_i) = f(\mathbf{x}_i, \boldsymbol{\beta}^*) + \varepsilon_i, \qquad i = 1, 2, \ldots, n$$

where:

$g(\cdot), f(\cdot)$ = known mathematical functions

$\boldsymbol{\beta}^*$ = $p \times 1$ vector of unknown parameters to be estimated

$^*$ denotes "true" values

$\varepsilon_i$ = "error": the discrepancy between the "true" value of $g(y_i)$ and the one actually measured

13
(Review)

• Error (aggregate term) results from:


– Failure of xi to represent the values of x
actually used.
– Unsuspected values of other independent
variables; stochastic factors; lurking
variables.
– Measurement errors; etc.

• Assumption is that the independent variables


(x) are perfectly known, which is often not
realizable; however, more or less valid if errors
in x are much less than errors in y.

• Examples:

$$y_i = \beta_1^* + \beta_2^* x_i + \varepsilon_i \qquad \text{(linear)}$$

$$y_i = \beta_1^* + \exp(\beta_2^* x_i) + \varepsilon_i \qquad \text{(nonlinear)}$$

$$y_i = \beta_1^* + \beta_2^* x_{1i} + \beta_3^* x_{2i} + \beta_4^* x_{1i} x_{2i} + \beta_5^* x_{1i}^2 + \beta_6^* x_{2i}^2 + \varepsilon_i \qquad \text{(linear)}$$

$$\ln(y_i) = \frac{\beta_1^* x_{1i}}{2.8 + \beta_2^* x_{2i}} + \varepsilon_i \qquad \text{(nonlinear)}$$

$$y_i = \beta_1^* \sin(x_{1i}) + \beta_2^* \exp\!\left(\frac{x_{1i} x_{2i}}{9.8}\right) + \varepsilon_i \qquad \text{(linear)}$$

14
(Review)

LINEAR REGRESSION

• Linear regression is concerned with models


linear in the parameters:

$$g(y_i) = \beta_1^* f_1(\mathbf{x}_i) + \beta_2^* f_2(\mathbf{x}_i) + \cdots + \beta_p^* f_p(\mathbf{x}_i) + \varepsilon_i, \qquad i = 1, 2, \ldots, n$$

• A linear-in-the-parameters model satisfies:

$$\frac{\partial g(y_i)}{\partial \beta_j} \neq h(\beta_1, \beta_2, \ldots, \beta_p) \qquad \forall j$$

(the partial derivative with respect to each parameter does not itself depend on the parameters)

• Linear model in matrix notation:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta}^* + \boldsymbol{\varepsilon}$$

where:

$\mathbf{y}$ ($n \times 1$): $i$th element is $g(y_i)$

$\mathbf{X}$ ($n \times p$): $(i,j)$th element is $f_j(\mathbf{x}_i)$

$\boldsymbol{\beta}^*$ ($p \times 1$): $j$th element is $\beta_j^*$

$\boldsymbol{\varepsilon}$ ($n \times 1$): $i$th element is $\varepsilon_i$
15
(Review)

Assumptions Inherent in Ordinary Linear Regression:

• The model is valid.

• The values of the independent variables (X) are


perfectly known at each trial.

• The errors, $\boldsymbol{\varepsilon}$, are additive to the true values of the quantity being measured (Y); they do not covary with each other (uncorrelated); and they are of constant, but possibly unknown, variance:

$$E(\boldsymbol{\varepsilon}) = \mathbf{0} \qquad V(\boldsymbol{\varepsilon}) = \mathbf{I}\sigma^2$$

• We have not assumed anything about the


distribution of the errors so far.

16
(Review)

Estimation of parameters in
LRMs

17
(Review)

MAXIMUM LIKELIHOOD
ESTIMATE OF THE LINEAR
REGRESSION PARAMETERS

• Now, assume:

$$\boldsymbol{\varepsilon} : N(\mathbf{0}, \mathbf{I}\sigma^2), \quad \mathbf{y} = \mathbf{X}\boldsymbol{\beta}^* + \boldsymbol{\varepsilon} \;\Rightarrow\; \mathbf{y} : N(\mathbf{X}\boldsymbol{\beta}^*, \mathbf{I}\sigma^2)$$

• The multivariate normal probability density function, for $\mathbf{z} : N(\boldsymbol{\mu}, \mathbf{V})$:

$$f(\mathbf{z}) = \frac{1}{(2\pi)^{n/2}\,|\mathbf{V}|^{1/2}} \exp\!\left[-\tfrac{1}{2}(\mathbf{z}-\boldsymbol{\mu})'\mathbf{V}^{-1}(\mathbf{z}-\boldsymbol{\mu})\right]$$

• For the dependent variable $\mathbf{y}$:

$$f(\mathbf{y}) = \frac{1}{(2\pi)^{n/2}\,\sigma^{n}} \exp\!\left[-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}^*)'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}^*)\right]$$

18
(Review)

• When $\mathbf{y}$, $\mathbf{X}$ and $\sigma^2$ are known, but $\boldsymbol{\beta}^*$ is not, the pdf for $\mathbf{y}$ becomes the likelihood function for the unknown values of $\boldsymbol{\beta}^*$:

$$L(\boldsymbol{\beta}) = \frac{1}{(2\pi)^{n/2}\,\sigma^{n}} \exp\!\left[-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\right]$$

Let:

$$Z(\boldsymbol{\beta}) = (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}), \qquad K_1 = \frac{1}{(2\pi)^{n/2}\,\sigma^{n}}, \qquad K_2 = \frac{1}{2\sigma^2}$$

$$L(\boldsymbol{\beta}) = K_1 \exp\!\left[-K_2\, Z(\boldsymbol{\beta})\right]$$

• Maximizing $L(\boldsymbol{\beta})$ is equivalent to finding the value of $\boldsymbol{\beta}$ which minimizes $Z(\boldsymbol{\beta})$. The minimum is found to be:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

We also call this the OLSE, meaning ordinary least squares estimator.

• This value of $\boldsymbol{\beta}$ is always the least squares (LS) estimator of $\boldsymbol{\beta}^*$. It is the maximum likelihood (ML) estimate if the errors are normally distributed.

19
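As a small illustration of the estimator above, here is a minimal sketch (not from the course notes) of computing $\hat{\boldsymbol{\beta}}$ numerically. Solving the normal equations with `np.linalg.solve` is usually preferred over forming the inverse explicitly, although the result is the same.

```python
import numpy as np

def ols_fit(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares: solve the normal equations (X'X) b = X'y.

    Numerically this is preferable to forming (X'X)^{-1} explicitly,
    but it returns the same beta_hat = (X'X)^{-1} X'y.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)
```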
(Review)

EXAMPLE

Model:

$$y_i = \beta_1^* + \beta_2^* x_i + \beta_3^* \frac{16.4}{x_i} + \varepsilon_i$$

Data:

y: 13.84, 11.70, 10.11, 10.82, 10.07
x: 10, 15, 20, 25, 30

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

$$\mathbf{y} = \begin{bmatrix} 13.84 \\ 11.70 \\ 10.11 \\ 10.82 \\ 10.07 \end{bmatrix} \qquad
\mathbf{X} = \begin{bmatrix} 1 & 10 & 1.640 \\ 1 & 15 & 1.093 \\ 1 & 20 & 0.820 \\ 1 & 25 & 0.656 \\ 1 & 30 & 0.546 \end{bmatrix}$$

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 5 & 100 & 4.756 \\ 100 & 2250 & 82.0 \\ 4.756 & 82.0 & 5.287 \end{bmatrix} \qquad
(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} 55.8147 & -1.4968 & -26.9965 \\ -1.4968 & 0.0412 & 0.7081 \\ -26.9965 & 0.7081 & 13.4929 \end{bmatrix}$$

$$\mathbf{X}'\mathbf{y} = \begin{bmatrix} 56.54 \\ 1088.70 \\ 56.38 \end{bmatrix} \qquad
\hat{\boldsymbol{\beta}} = \begin{bmatrix} 4.083 \\ 0.109 \\ 5.294 \end{bmatrix}$$
20
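A minimal numerical check of this example (assuming the data above), using NumPy:

```python
import numpy as np

# Data from the example: y_i = b1 + b2*x_i + b3*(16.4/x_i) + e_i
x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
y = np.array([13.84, 11.70, 10.11, 10.82, 10.07])

# Design matrix with columns [1, x, 16.4/x]
X = np.column_stack([np.ones_like(x), x, 16.4 / x])

XtX = X.T @ X                          # X'X
Xty = X.T @ y                          # X'y
beta_hat = np.linalg.solve(XtX, Xty)   # beta_hat = (X'X)^{-1} X'y

print(np.round(XtX, 3))       # approx. [[5, 100, 4.756], [100, 2250, 82.0], [4.756, 82.0, 5.287]]
print(np.round(beta_hat, 3))  # approx. [4.083, 0.109, 5.294]
```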
Ice Cream Example

• Tutorial, Question of Ch 2_ex 1, on


LRM

• Supposed to be review material from


2nd yr Stats

21
Inferences on the
parameters

22
INFERENCES ON THE
PARAMETERS

• The mean and covariance of $\hat{\boldsymbol{\beta}}$:

$$E(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}^* \qquad V(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-1}\sigma^2 \qquad V(\hat{\beta}_i) = (\mathbf{X}'\mathbf{X})^{-1}_{ii}\,\sigma^2$$

• Under the assumption of normality for the errors:

$$\hat{\boldsymbol{\beta}} : N\!\left(\boldsymbol{\beta}^*,\ (\mathbf{X}'\mathbf{X})^{-1}\sigma^2\right) \qquad \hat{\beta}_i : N\!\left(\beta_i^*,\ (\mathbf{X}'\mathbf{X})^{-1}_{ii}\,\sigma^2\right)$$

(Review)

23
(Review)

• Confidence intervals for the individual parameter estimates are given by:

$$\hat{\beta}_i \pm z_{\alpha/2}\sqrt{(\mathbf{X}'\mathbf{X})^{-1}_{ii}\,\sigma^2} \qquad \sigma^2 \text{ known}$$

$$\hat{\beta}_i \pm t_{\alpha/2,\,n-p}\; s\sqrt{(\mathbf{X}'\mathbf{X})^{-1}_{ii}} \qquad \sigma^2 \text{ unknown}$$

Recall:

$$Z(\boldsymbol{\beta}) = (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} + (\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})'\mathbf{X}'\mathbf{X}(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})$$

The minimum of $Z(\boldsymbol{\beta})$ occurs when $\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}$; this minimum is called the residual sum of squares:

$$RSS = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} \qquad s^2 = \frac{RSS}{n-p}$$

Note: in some books, instead of RSS (residual sum of squares) you will see SSE (sum of squared errors) or SS$_{\text{res}}$ (residual sum of squares); they are all the same quantity.

• For the previous example, the 95% confidence intervals for the parameters are:

$$\beta_1: 4.082 \pm 18.523 \qquad \beta_2: 0.109 \pm 0.503 \qquad \beta_3: 5.294 \pm 9.107$$

• Any concerns?
24
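A sketch of how the σ²-unknown intervals above could be computed for this example (assuming the same data; `scipy.stats.t` supplies the t quantile):

```python
import numpy as np
from scipy import stats

x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
y = np.array([13.84, 11.70, 10.11, 10.82, 10.07])
X = np.column_stack([np.ones_like(x), x, 16.4 / x])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)                 # s^2 = RSS / (n - p)
se = np.sqrt(s2 * np.diag(XtX_inv))          # standard error of each beta_hat_i

t_crit = stats.t.ppf(0.975, df=n - p)        # t_{alpha/2, n-p} with alpha = 0.05
for b, s in zip(beta_hat, se):
    print(f"{b:.3f} +/- {t_crit * s:.3f}")   # should roughly match the slide's intervals
```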
Ice Cream Example

• Tutorial, Inference of Question Ch


2_ex 1, on LRM

• Supposed to be review material from


2nd yr Stats

25
JOINT CONFIDENCE
REGIONS

• So far we have considered confidence


intervals on individual parameters only; this
was done as if the other parameters and
related regressors did not exist!

• It can be shown that a joint confidence


region (JCR) for all “p” parameters is given by:

$$(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})'\,\mathbf{X}'\mathbf{X}\,(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}}) \le \sigma^2\,\chi^2_{p,\alpha} \qquad \sigma^2 \text{ known}$$

$$(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})'\,\mathbf{X}'\mathbf{X}\,(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}}) \le p\,s^2\,F_{p,\,n-p,\,\alpha} \qquad \sigma^2 \text{ unknown}$$

26
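As an illustration of the σ²-unknown case, a minimal sketch (hypothetical helper, not from the notes) for checking whether a candidate β lies inside the joint confidence region:

```python
import numpy as np
from scipy import stats

def in_joint_confidence_region(beta, beta_hat, XtX, s2, n_obs, alpha=0.05):
    """True if (beta - beta_hat)' X'X (beta - beta_hat) <= p * s2 * F_{p, n-p, alpha}."""
    p = len(beta_hat)
    d = np.asarray(beta) - np.asarray(beta_hat)
    lhs = d @ XtX @ d
    rhs = p * s2 * stats.f.ppf(1 - alpha, p, n_obs - p)
    return lhs <= rhs
```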
MODEL PREDICTIONS

• Let w be a 1xp row vector having the same


format as a row of the X matrix.
• For a specific set of independent variables $\mathbf{x}_0$:

$$\hat{y}_0 = \mathbf{w}_0 \hat{\boldsymbol{\beta}}$$

• From this we can obtain information about:
– the mean response (confidence interval)
– the prediction of a single future measurement (prediction interval)

• Confidence interval (CI) on fitted values:

$$\hat{y}_0 \pm z_{\alpha/2}\,\sigma\sqrt{\mathbf{w}_0(\mathbf{X}'\mathbf{X})^{-1}\mathbf{w}_0'}$$

• Prediction interval (PI) on a single future occurrence:

$$\hat{y}_0 \pm z_{\alpha/2}\,\sigma\sqrt{\mathbf{w}_0(\mathbf{X}'\mathbf{X})^{-1}\mathbf{w}_0' + 1}$$
27
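A sketch of both intervals for a new regressor row w₀. Since σ² is usually unknown, this version substitutes s and the t distribution for σ and z (an adaptation, not the slide's exact formula):

```python
import numpy as np
from scipy import stats

def ci_and_pi(w0, beta_hat, XtX_inv, s2, n_obs, alpha=0.05):
    """Confidence interval on the mean response and prediction interval
    on a single future observation at the regressor row w0 (1 x p)."""
    p = len(beta_hat)
    y0_hat = w0 @ beta_hat
    t_crit = stats.t.ppf(1 - alpha / 2, df=n_obs - p)
    lev = w0 @ XtX_inv @ w0                     # w0 (X'X)^{-1} w0'
    ci_half = t_crit * np.sqrt(s2 * lev)        # mean response
    pi_half = t_crit * np.sqrt(s2 * (lev + 1))  # single future occurrence
    return (y0_hat - ci_half, y0_hat + ci_half), (y0_hat - pi_half, y0_hat + pi_half)
```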
EXTENSIONS IN LEAST
SQUARES REGRESSION
(and parting thoughts)
• Non-normality:
– The assumption $\varepsilon_i : N(0, \sigma^2)$ is needed for tests of significance and confidence intervals.
– Estimates are still "BLUE" (Best Linear Unbiased Estimates) if the other assumptions hold.
– Transformations or other likelihood functions can be used.

• Heterogeneous variance:
– The assumption $V(\boldsymbol{\varepsilon}) = \mathbf{I}\sigma^2$ no longer holds; this affects the minimum variance property of the parameter estimates (loss of precision).

28
Some remarks

• Correlated errors:
– loss of parameter precision; invalidates
significance tests
– occurs when data are collected in a time
sequence; pollutant emissions with time,
biological studies, batch reactors.
– Time Series Analysis.
– Generalized Least Squares.

• Influential data points and outliers:


– influential points have a greater impact on
the least squares results, yet all points are
weighted equally.
– Outliers are values of the dependent
variable that are inconsistent with the bulk
of the data.

29
– Little confidence can be placed in least
squares estimates dominated by such
points.
– Study the impact of such points.

• Model inadequacies:
– assume that the model is valid.
– The parameter estimates are unbiased only
if the model is correct.
• Collinearity
– solution is very unstable; small changes in
the independent or dependent variables
can cause drastic changes in the
parameter estimates.
– Variances of the parameter estimates
involved (near singularity situations)
become very large.
– Impact of collinearity very serious, if the
objective is to estimate parameters or
identify important variables; less serious, if
objective is simply fitting.
– What else can we do to protect ourselves
in practical cases?

30
Problems With Highly Correlated Parameter
Estimates

• Usually means higher variances; i.e.,


parameters are less precisely estimated.
• Individual confidence intervals are less meaningful. Testing $H_0\!: \beta_j = 0$ can lead to erroneous conclusions in the presence of high covariance.
• There is more uncertainty about which
independent variables are important.
• Correlations (covariances) between parameter
estimates become large if corresponding
columns of X matrix are highly non-orthogonal
(ill-conditioning).
• The area (volume) of the joint confidence region will be smaller for orthogonal designs.
• The lengths of individual confidence intervals are much smaller for orthogonal designs.
• Orthogonal designs (to be discussed in later
chapters on DOE) lead to circular joint
confidence regions.

31
Some Useful Equations

32
33
Var($\hat{\boldsymbol{\beta}}$), Var($\hat{\mathbf{y}}$), Var($\mathbf{e}$), Var($y_{\text{predicted}}$)
[the corresponding equations were shown as images on slides 33–34]
34
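The equations themselves did not survive extraction; the standard OLS expressions these labels usually refer to (an assumption on my part, with $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ the hat matrix and $\mathbf{w}_0$ a new regressor row; the last line is the prediction-error variance) are:

```latex
\begin{align*}
  \operatorname{Var}(\hat{\boldsymbol{\beta}}) &= \sigma^{2}(\mathbf{X}'\mathbf{X})^{-1} \\
  \operatorname{Var}(\hat{\mathbf{y}}) &= \sigma^{2}\mathbf{H} \\
  \operatorname{Var}(\mathbf{e}) &= \sigma^{2}(\mathbf{I}-\mathbf{H}) \\
  \operatorname{Var}(y_{\text{predicted}}) &= \sigma^{2}\left(1 + \mathbf{w}_0(\mathbf{X}'\mathbf{X})^{-1}\mathbf{w}_0'\right)
\end{align*}
```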
Ozone Example

• Tutorial, Ch 2_ex 2, on LRM

• Supposed to be review material from


2nd yr Stats

35
Regression model
diagnostics

36
Model Adequacy Checking

The fitting of a linear regression model, the estimation of its parameters, hypothesis testing, and the properties of the estimators are all based on the following major assumptions:
1. The relationship between the study
variable and explanatory variables
is linear, at least approximately.
2. The error term has zero mean.
3. The error term has constant
variance.
4. The errors are uncorrelated.
5. The errors are normally distributed.

37
What diagnostic methods are there to check for violations of the regression assumptions?

• The validity of these assumptions is needed for the results to be meaningful. If these assumptions are violated, the results can be incorrect and may have serious consequences.

• So such underlying assumptions have to be verified before attempting regression modeling. Such information is not available from summary statistics such as the t-statistic, F-statistic, or coefficient of determination.

• Several diagnostic methods for checking violations of the regression assumptions are based on the study of the model residuals, with the help of various types of graphics.

38
Diagnostic Methods

• Checking of linear relationship between


study and explanatory variables:

– Case of one regressor (explanatory


variable)
– Case of more than one regressor

• Residual analysis
1. Standardized residuals
2. Studentized residuals
3. PRESS residuals
4. R-student

39
Case of one regressor

• If there is only one explanatory variable in the model, then it is easy to check for the existence of a linear relationship between y and X with a scatter diagram of the available data.

40
Case of more than one
regressor

• To check the assumption of linearity between the study variable and the explanatory variables, the scatterplot matrix of the data can be used. A scatterplot matrix is a two-dimensional array of two-dimensional plots in which each frame contains a scatter diagram, except for the diagonal.
• Thus, each plot sheds some light on the relationship between a pair of variables. It gives more information than the correlation coefficient between each pair of variables, because it gives a sense of the linearity or nonlinearity of the relationship and some awareness of how the individual data points are arranged over the region. It is a scatter diagram of:

41
Another option…
(for Case of more than one regressor)
Another option for presenting the scatterplot matrix is to:
– Present the scatterplots in the upper triangular part of the plot matrix.
– Mention the corresponding correlation coefficients in the lower triangular part of the matrix.
Suppose there are only two explanatory variables and the model is $y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$; then the scatterplot matrix looks as follows:

42
But…

• It is to be kept in mind that we only get information on pairs of variables through the scatterplots of $(y, X_j)$, whereas the assumption of linearity is between $y$ and the explanatory variables jointly.

• If some of the explanatory variables are


themselves interrelated, then these scatter
diagrams can be misleading. Some other
methods of sorting out the relationships
between several explanatory variables and
a study variable are used.

43
Residual Analysis

• The residual is defined as the difference


between the observed and fitted value of
study variable.
• Residual can be viewed as the deviation
between the data and the fit. So it is also
a measure of the variability in the
response variable that is not explained by
the regression model.
• Residuals can be thought of as the observed values of the model errors. So it can be expected that any departure from the assumptions on the random errors should show up in the residuals. Analysis of the residuals helps in finding model inadequacies.
44
Some remarks

• Residuals have zero mean, since $\sum_{i=1}^{n} e_i = 0$ (for a model containing an intercept).

• The approximate average variance of the residuals is estimated by:

$$MS_{\text{res}} = \frac{\sum_{i=1}^{n} e_i^2}{n-p} = \frac{SS_{\text{res}}}{n-p}$$

(the mean square of the residuals, which is the same as the MSE in ANOVA)

• Sometimes it is easier to work with scaled residuals. We discuss four methods for scaling the residuals.

45
Standardized residuals

The residuals are standardized based on the concept of a residual minus its mean, divided by its standard deviation. Since $E(e_i) = 0$ and $MS_{\text{res}}$ estimates the approximate average variance, the standardized residual is:

$$d_i = \frac{e_i}{\sqrt{MS_{\text{res}}}}, \qquad i = 1, 2, \ldots, n$$

46
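A minimal sketch of this scaling (assuming the response and fitted values are already available as arrays):

```python
import numpy as np

def standardized_residuals(y, y_hat, p):
    """d_i = e_i / sqrt(MS_res), with MS_res = SS_res / (n - p)."""
    e = np.asarray(y) - np.asarray(y_hat)   # raw residuals
    ms_res = e @ e / (len(e) - p)
    return e / np.sqrt(ms_res)
```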
Studentized residuals

• The studentized residuals use the exact variance of $e_i$. (The standardized residuals use the approximate variance of $e_i$, namely $MS_{\text{res}}$.)

• Let's first find the variance of $e_i$. In the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, the OLSE (ordinary least squares estimator, page 19) of $\boldsymbol{\beta}$ is $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, and the residual vector is:

$$\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} = (\mathbf{I} - \mathbf{H})\mathbf{y}, \qquad \mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$

47
• Thus $\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{y} = (\mathbf{I} - \mathbf{H})\boldsymbol{\varepsilon}$, so the residuals are the same linear transformation of $\mathbf{y}$ and $\boldsymbol{\varepsilon}$.

• The matrix $(\mathbf{I} - \mathbf{H})$ is symmetric and idempotent but generally not diagonal. So the residuals have different variances and they are correlated.

• So $V(e_i) = \sigma^2(1 - h_{ii})$, where $h_{ii}$ is the $i$th diagonal element of $\mathbf{H}$.

• Therefore, the studentized residual is:

$$r_i = \frac{e_i}{\sqrt{MS_{\text{res}}\,(1 - h_{ii})}}, \qquad i = 1, 2, \ldots, n$$

48
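A sketch of the studentized residuals using the hat matrix defined above:

```python
import numpy as np

def studentized_residuals(X, y):
    """r_i = e_i / sqrt(MS_res * (1 - h_ii))."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    e = y - H @ y                          # residuals e = (I - H) y
    ms_res = e @ e / (n - p)
    h = np.diag(H)                         # leverages h_ii
    return e / np.sqrt(ms_res * (1 - h))
```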
Standardized or studentized
residual?

• In many situations, the variance of


residuals stabilizes (particularly in
large data sets) and there may be
little difference between standardized
and studentized residuals. In such
cases both residuals often convey
equivalent information.

• However, since any point with a large residual and a large $h_{ii}$ is potentially highly influential on the least-squares fit, examination of the studentized residuals is generally recommended.
49
PRESS
(prediction error sum of
squares)
• The PRESS residuals are defined as the observed value minus the fitted value of the $i$th response based on all the observations except the $i$th one:

$$e_{(i)} = y_i - \hat{y}_{(i)}$$

• Why PRESS? If $y_i$ is really unusual, then a regression model based on all the observations may be overly influenced by this observation. This could produce a fitted value close to the observed value, and consequently the ordinary residual $e_i$ would be small, making the outlier difficult to detect.

• For example, if $R^2_{\text{prediction}}$ is 0.89, then it indicates that the model is expected to explain about 89% of the variability in predicting new observations.

50
R-Student

• The studentized residual is often considered an outlier diagnostic. It is referred to as internal scaling of the residuals, because $MS_{\text{res}}$ is an internally generated estimate of $\sigma^2$ obtained from fitting the model to all $n$ observations.

• Another approach is to estimate $\sigma^2$ from a data set with the $i$th observation removed, giving $S^2_{(i)}$. This estimate of $\sigma^2$ is used instead of $MS_{\text{res}}$ to produce an externally studentized residual, usually called R-student. R-student offers a more formal procedure for outlier detection via hypothesis testing.

51
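A sketch of R-student; it relies on the standard closed-form leave-one-out estimate of $\sigma^2$ (an identity assumed here, rather than literally refitting without observation i):

```python
import numpy as np

def r_student(X, y):
    """Externally studentized residuals t_i = e_i / sqrt(S2_(i) * (1 - h_ii))."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    e = y - H @ y
    h = np.diag(H)
    ms_res = e @ e / (n - p)
    # Leave-one-out estimate of sigma^2, without refitting the model:
    s2_i = ((n - p) * ms_res - e**2 / (1 - h)) / (n - p - 1)
    return e / np.sqrt(s2_i * (1 - h))
```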
Testing for lack of fit

Consider a model with $n_i$ replicate observations $y_{ij}$ ($j = 1, \ldots, n_i$) at each level $x_i$ of the regressor, e.g.

$$y_{ij} = \beta_0 + \beta_1 x_i + \varepsilon_{ij}$$

Then the $(i,j)$th residual is:

$$y_{ij} - \hat{y}_i = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)$$

i.e., it splits into a pure-error part and a lack-of-fit part.
52
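A sketch of the corresponding F test for lack of fit with replicate observations (assuming x, y and the fitted values y_hat are NumPy arrays and p is the number of model parameters):

```python
import numpy as np
from scipy import stats

def lack_of_fit_test(x, y, y_hat, p):
    """Split SS_res into pure error and lack of fit; return (F0, p-value)."""
    n = len(y)
    ss_res = np.sum((y - y_hat)**2)
    levels = np.unique(x)
    m = len(levels)                                   # number of distinct x levels
    ss_pe = sum(np.sum((y[x == lv] - y[x == lv].mean())**2) for lv in levels)
    ss_lof = ss_res - ss_pe
    f0 = (ss_lof / (m - p)) / (ss_pe / (n - m))
    return f0, 1 - stats.f.cdf(f0, m - p, n - m)
```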
Concept Map

[concept-map figure: Modeling Stages → General Regression Model → Maximum Likelihood → Joint Confidence Regions → Model Diagnostics → Statistical Design of Experiments]
53
General Intro Reminder: Some Useful Equations
[slides 54–57: equations shown as images]
