MBAS901 Lecture 2
Causal Modelling: attempts to understand what events or actions actually influence others.
E.g.: “Did a particular drug help decrease blood pressure level in patients?”
Value Estimation: attempts to estimate or predict, for each individual, the numerical value of some variable for that individual.
E.g.: “Among all the customers, which are likely to respond to a given offer?”
Clustering: attempts to group individuals in a population together by their similarity, but not driven by any specific purpose.
E.g.: “Find people who are similar to you in terms of the products they have liked or have purchased.”
Network Link Prediction: attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link.
E.g.: “Since you and Karen share 10 friends, maybe you’d like to be Karen’s friend?”
Change Point Detection: attempts to quickly detect changes in time-series data that are significant enough either for action to be taken or as the result of an action taken.
E.g.: “Has there been an increase in global temperature over a period of time that should alarm us about global warming?”
Time Series Forecasting: involves taking models fit on historical data and using them to predict future observations.
E.g.: “What will the temperatures be for the next few days in a city?”
Lecture Outline
Triple A Construction

TABLE 4.1 – Triple A Construction sales and payroll data

Local Area   Sales ($100,000s)   Payroll ($100,000,000s)
1            6                   3
2            8                   4
3            9                   6
4            5                   4
5            4.5                 2
6            9.5                 5
Figure 4.1 – Scatter Diagram
[Scatter plot of the Triple A Construction data: Sales ($100,000s) on the Y axis vs. Payroll ($100 millions) on the X axis]
Y = β₀ + β₁X + ε

where
Y = dependent variable (response)
X = independent variable (predictor or explanatory)
β₀ = intercept (value of Y when X = 0)
β₁ = slope of the regression line
ε = random error
[Scatter plot with the fitted regression line Y = β₀ + β₁X: Sales ($100,000s) vs. Payroll ($100 millions)]
BUT the true values of the slope and intercept are not known; they are estimated using sample data. The estimated equation is a machine learning model:

Ŷ = b₀ + b₁X
where
Ŷ = predicted value of Y
b₀ = estimate of β₀, based on sample results
b₁ = estimate of β₁, based on sample results
Least-squares regression minimizes the sum of squared errors:
Σe² = Σ(Y − Ŷ)²
X̄ = ΣX / n = average (mean) of X values
Ȳ = ΣY / n = average (mean) of Y values

b₁ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²

b₀ = Ȳ − b₁X̄
Example: Triple A Construction
TABLE 4.2 – Regression calculations

Y     X     (X − X̄)²        (X − X̄)(Y − Ȳ)
6     3     (3 − 4)² = 1     (3 − 4)(6 − 7) = 1
8     4     (4 − 4)² = 0     (4 − 4)(8 − 7) = 0
9     6     (6 − 4)² = 4     (6 − 4)(9 − 7) = 4
5     4     (4 − 4)² = 0     (4 − 4)(5 − 7) = 0
4.5   2     (2 − 4)² = 4     (2 − 4)(4.5 − 7) = 5
9.5   5     (5 − 4)² = 1     (5 − 4)(9.5 − 7) = 2.5
ΣY = 42   ΣX = 24   Σ(X − X̄)² = 10   Σ(X − X̄)(Y − Ȳ) = 12.5

Ȳ = 42/6 = 7     X̄ = 24/6 = 4
b₁ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = 12.5 / 10 = 1.25

b₀ = Ȳ − b₁X̄ = 7 − (1.25)(4) = 2

Estimated regression equation: sales = 2 + 1.25(payroll)

If the payroll next year is $600 million (X = 6):
Ŷ = 2 + 1.25(6) = 9.5, i.e. predicted sales of $950,000.
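As a quick check of these hand calculations, here is a minimal Python sketch (assuming numpy is available) that reproduces the least-squares estimates and the prediction from the Table 4.1 data:

```python
import numpy as np

# Triple A Construction data (Table 4.1)
payroll = np.array([3, 4, 6, 4, 2, 5], dtype=float)  # X, in $100 millions
sales = np.array([6, 8, 9, 5, 4.5, 9.5])             # Y, in $100,000s

x_bar, y_bar = payroll.mean(), sales.mean()          # 4.0 and 7.0

# Least-squares estimates of slope and intercept
b1 = np.sum((payroll - x_bar) * (sales - y_bar)) / np.sum((payroll - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

print(b0, b1)        # 2.0 1.25
print(b0 + b1 * 6)   # 9.5 -> predicted sales if next year's payroll is $600 million
```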
• With the average error, positive and negative errors cancel each other out => the average error is always 0 => not a useful indicator!

Y     X     Ŷ                     e = Y − Ŷ
6     3     2 + 1.25(3) = 5.75    0.25
8     4     2 + 1.25(4) = 7.00    1
9     6     2 + 1.25(6) = 9.50    −0.5
5     4     2 + 1.25(4) = 7.00    −2
4.5   2     2 + 1.25(2) = 4.50    0
9.5   5     2 + 1.25(5) = 8.25    1.25
Σe = 0
• Sums of squares:
SST = Σ(Y − Ȳ)² = 22.5 (total sum of squares)
SSE = Σ(Y − Ŷ)² = 6.875 (sum of squared errors)
SSR = Σ(Ŷ − Ȳ)² = 15.625 (regression sum of squares)

These raw sums of squares are hard to interpret on their own => need something else!
r² = SSR / SST = 1 − SSE / SST

For Triple A Construction: r² = 15.625 / 22.5 ≈ 0.6944

About 69% of the variability of the revenue (Y) is explained by the equation based on payroll (X). The closer r² is to 1, the better.
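A minimal sketch (again assuming numpy) that reproduces these sums of squares and the r² value:

```python
import numpy as np

payroll = np.array([3, 4, 6, 4, 2, 5], dtype=float)
sales = np.array([6, 8, 9, 5, 4.5, 9.5])

y_hat = 2 + 1.25 * payroll                 # fitted values from the estimated model
sst = np.sum((sales - sales.mean()) ** 2)  # total sum of squares: 22.5
sse = np.sum((sales - y_hat) ** 2)         # sum of squared errors: 6.875
ssr = np.sum((y_hat - sales.mean()) ** 2)  # regression sum of squares: 15.625

print(ssr / sst, 1 - sse / sst)            # both ~0.6944
```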
Correlation Coefficient

|r| = √r², with r taking the sign of the slope b₁.

• A high r² indicates a strong linear relationship between X and Y.

Both the sum and the mean of the residuals are equal to zero. That is, Σe = 0 and ē = 0.
Residual Plots: Nonconstant error variance

[Residual plot: error plotted against X, illustrating nonconstant error variance]
Lecture Outline
What if:
- There is MORE THAN ONE independent variable?
- The relationship between the variables is NOT linear?
- The data is NOT normally distributed?
Multiple Linear Regression

• Extension of the simple linear model
• Models with more than one independent variable:

Y = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ + ε

where
Y = dependent variable (response variable)
Xᵢ = ith independent variable (predictor or explanatory variable)
β₀ = intercept (value of Y when all Xᵢ = 0)
βᵢ = coefficient of the ith independent variable
k = number of independent variables
ε = random error
More than 2 variables: finding a (hyper)plane

[3-D scatter plot: a plane fitted through points in (X₁, X₂, Y) space]
Example: 2 input variables

Develop a model to determine the suggested listing price for houses based on the size and age of the house. Select a sample of houses that have sold recently and record the data.

This is the model:

Ŷ = b₀ + b₁X₁ + b₂X₂

where
Ŷ = predicted value of the dependent variable (selling price)
b₀ = Y intercept
X₁, X₂ = values of the two independent variables (square footage and age) respectively
b₁, b₂ = slopes for X₁ and X₂ respectively
Adding a third input variable (condition), the model becomes:

Ŷ = b₀ + b₁X₁ + b₂X₂ + b₃X₃

where
Ŷ = predicted value of the dependent variable (selling price)
b₀ = Y intercept
X₁, X₂, X₃ = values of the three independent variables (square footage, age, condition) respectively
b₁, b₂, b₃ = slopes for X₁, X₂ and X₃ respectively
Selling Price ($) = 150 + 16.6 SquareFootage − 0.8 Age + 15.6 Excellent + 35 Mint + 5 Good
(the categorical condition variable enters the model as the dummy variables Excellent, Mint and Good)
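A hypothetical sketch of fitting such a model in Python (the column names and sample values below are made up for illustration; assumes pandas and scikit-learn):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical sample of recently sold houses (illustrative values only)
houses = pd.DataFrame({
    "square_footage": [1800, 2400, 1500, 2100, 1950, 2800],
    "age": [10, 3, 25, 8, 15, 2],
    "condition": ["Good", "Excellent", "Good", "Mint", "Excellent", "Mint"],
    "selling_price": [310000, 450000, 240000, 400000, 355000, 520000],
})

# One-hot encode the categorical condition variable into dummy columns
X = pd.get_dummies(houses[["square_footage", "age", "condition"]], columns=["condition"])
y = houses["selling_price"]

model = LinearRegression().fit(X, y)
print(model.intercept_, dict(zip(X.columns, model.coef_)))
```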
Evaluating Multiple Regression Models

• r² never decreases when another variable is added, even if that variable adds no real explanatory power. For this reason, the adjusted r² value is often used to determine the usefulness of an additional variable.
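For reference, the standard formula for the adjusted r², with n observations and k independent variables:

Adjusted r² = 1 − (1 − r²)(n − 1) / (n − k − 1)

An additional variable raises the adjusted value only if it improves the fit by more than would be expected by chance.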
Selling Price ($) = 150 + 0.7 Selling Price (AUD) + 16.6 SF – 0.8 Age + 15.6 Excellent + 35 Mint
• R² = 0.89
• Significance F < 0.00001
• All parameters significant
Selling Price ($) = 150 + 16.6 SF + 0.8 Age - 15.6 Excellent - 35 Mint
• R2 = 0.89
• Significance F < 0.00001
• All parameters significant
Backward selection (a worked sketch follows this list):
1. Start with all candidate variables.
2. Test the deletion of each variable using a chosen model fit criterion.
3. Delete the variable (if any) whose loss gives the most statistically insignificant deterioration of the model fit.
4. Return to step 2 until no further variables can be deleted without a statistically significant loss of fit.
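A minimal p-value-based sketch of backward selection (assumes numpy, pandas and statsmodels; real tools often use AIC/BIC or F-tests as the fit criterion instead):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_selection(X: pd.DataFrame, y, alpha: float = 0.05):
    """Repeatedly drop the least significant predictor until all p-values < alpha."""
    cols = list(X.columns)
    while cols:
        model = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = model.pvalues.drop("const")  # one p-value per remaining predictor
        worst = pvals.idxmax()
        if pvals[worst] < alpha:             # every remaining variable is significant
            break
        cols.remove(worst)                   # delete the least significant variable
    return cols

# Hypothetical demo: y depends on x1 and x2 only; x3 is pure noise
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
y = 2 * X["x1"] - X["x2"] + rng.normal(size=200)
print(backward_selection(X, y))              # expected: ['x1', 'x2']
```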
Variable selection methods

Fast Backward: Available for logistic regression models, this technique uses a numeric shortcut to compute the next selection iteration more quickly than backward selection.

Lasso: This method adds and removes candidate effects based on a version of ordinary least squares in which the sum of the absolute regression coefficients is constrained.
TABLE 4.6 – Automobile weight and fuel efficiency

MPG    Weight (1,000 lbs.)
12     4.58
13     4.66
15     4.02
18     2.53
19     3.09
19     3.11
20     3.18
23     2.68
24     2.65
33     1.70
36     1.95
42     1.92
[Scatter plots of MPG vs. Weight (1,000 lb.), annotated “Nonlinear relationship” and “Linear relationship”: the raw data trace a curve rather than a straight line]
New model: Ŷ = b₀ + b₁X + b₂X²

(alternatively, transform the response variable, e.g. model log(Y))
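A minimal sketch (assuming numpy) fitting this quadratic model to the Table 4.6 data:

```python
import numpy as np

# Table 4.6 data
mpg = np.array([12, 13, 15, 18, 19, 19, 20, 23, 24, 33, 36, 42], dtype=float)
weight = np.array([4.58, 4.66, 4.02, 2.53, 3.09, 3.11, 3.18, 2.68, 2.65, 1.70, 1.95, 1.92])

# Least-squares fit of MPG = b0 + b1*weight + b2*weight^2
b2, b1, b0 = np.polyfit(weight, mpg, deg=2)  # polyfit returns the highest degree first
print(b0, b1, b2)

y_hat = b0 + b1 * weight + b2 * weight ** 2
r2 = 1 - np.sum((mpg - y_hat) ** 2) / np.sum((mpg - mpg.mean()) ** 2)
print(r2)  # compare with the r² of a straight-line fit
```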
• Generalized linear models extend the theory and methods of linear models
to data that is not normally distributed. A link function is used to take into
account the distribution of the response variable.
• There is only one response variable (Response).
• Model effects or explanatory variables can be any of the following effects:
• continuous (Continuous effects)
• categorical (Classification effects)
• interaction terms (Interaction effects)
Generalized Linear Model (GLM)
We can write a general equation for a linear model as follows:

y = W0 + W1*X1 + W2*X2 + W3*X3 + ... + Wn*Xn + e

We generalize this model to obtain GLMs for modelling data originating from the exponential family of probability distributions, such as the Normal, Binomial and Poisson distributions, among others.

There are 3 components of a GLM (a short sketch follows this list):
1. Random Component: defines the response variable y and its probability distribution. One important assumption is that the responses y1 to yn are independent of each other.
2. Systematic Component: defines which explanatory variables we want to include in the model. It also allows interactions among explanatory variables, such as X1*X2, X1², etc. This is the part that we model. It is also called the linear predictor of the covariates X1, X2, …, Xn and the coefficients W1, W2, …, Wn.
3. Link Component: connects the Random and Systematic components. It is a function of the expected value of the response variable, E(Y), that enables linearity in the parameters and allows E(Y) to be non-linearly related to the explanatory variables. It is the link function that generalizes the linear model.
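For instance, a minimal Poisson GLM with a log link, sketched with statsmodels on hypothetical count data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical count data: E(y) = exp(0.5 + 0.8*x)
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=300)
y = rng.poisson(np.exp(0.5 + 0.8 * x))

X = sm.add_constant(x)  # systematic component: intercept + x
# Random component: Poisson response; link component: log (the Poisson default)
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)    # estimates should be close to [0.5, 0.8]
```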
Generalized Linear Model (GLM)

Generalized Linear Model results in SAS Viya:
• Summary Bar
• Residual plot
• Fit Plot
• Summary
• Assessment
Generalized Additive Model (GAM)

A GAM is a linear model with a key difference: a GAM is allowed to learn non-linear features.
• GAMs relax the restriction that the relationship must be a simple weighted sum.
• Instead, they assume that the outcome can be modelled by a sum of arbitrary functions of each feature.
• To do this, we simply replace each linear term from linear regression with a flexible function called a spline.
• Splines are complex functions that allow us to model non-linear relationships for each feature.
• The sum of many splines forms a GAM (a short sketch follows below).
• The basic linear regression equation is defined by the sum of a linear combination of variables; a GAM replaces each term in that sum with a spline.
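A minimal sketch, assuming the pygam package is installed (the data below are made up to show a non-linear relationship):

```python
import numpy as np
from pygam import LinearGAM, s  # assumes the pygam package is available

# Hypothetical data with a sine-shaped (non-linear) relationship plus noise
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=300)

# s(0) places a spline term on feature 0; a GAM is a sum of such terms
gam = LinearGAM(s(0)).fit(X, y)
gam.summary()  # prints fit statistics for the spline model
```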
Using Splines in a Generalized Additive Model

[Figure: splines of increasing flexibility fitted to the data; the simplest fit is annotated “Not complex enough”]
Video Tutorials on SAS Viya
Regression Models in SAS Viya