
AB1202

Statistics and Analysis


Lecture 10
Model Building

Chin Chee Kai


[email protected]
Nanyang Business School
Nanyang Technological University

Model Building
• Polynomial Models with One Random Variable
• Models with Qualitative Random Variables
• Multicollinearity
• Correlation Matrix
• Selecting “Better” Models
• Stepwise Modeling
• Stepwise Forward
• Stepwise Backward

Polynomial Models - One Random Variable

• We have seen the one-variable polynomial model of order 1 in Simple Linear Regression: 𝑦 = 𝑏0 + 𝑏1𝑥
• Order 2: 𝑦 = 𝑏0 + 𝑏1𝑥 + 𝑏2𝑥² (more interesting)
• Order 3: 𝑦 = 𝑏0 + 𝑏1𝑥 + 𝑏2𝑥² + 𝑏3𝑥³ (very interesting)
• In practice, this is about enough, since it gets increasingly difficult to explain higher order models.
• We will focus on order 2 models only in this course. Order 3 or higher order models are constructed in a similar manner.

Constructing 1-Var Polynomial Model

• Data captured is still (𝑥𝑖, 𝑦𝑖) for 𝑖 = 1, 2, …, 𝑛 samples.
• For order 2, the data for the 𝑥² term is pseudo-data formed by squaring each individual 𝑥𝑖 to give a set of 𝑥𝑖² values.
• Then we treat 𝑥𝑖² as if it were another variable and perform a multiple regression to explain variations of 𝑦 with 𝑥 and 𝑥².
• Eg:

  𝑦    8    7    9    12
  𝑥    2    3    5    8
  𝑥²   4    9    25   64

  Order 1: 𝑦 = 5.5714 + 0.7619𝑥, with 𝑅² = 0.8707, Adj-𝑅² = 0.8061
  Order 2: 𝑦 = 8.3182 − 0.6212𝑥 + 0.1364𝑥², with 𝑅² = 0.9459, Adj-𝑅² = 0.8377

• Note: Dropping the 𝑥² term from the Order 2 model does not make it the Order 1 regression model!
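A minimal R sketch that reproduces both fits above (R is also used later in this deck for stepwise modeling); lm() and I() are standard R:

y <- c(8, 7, 9, 12)
x <- c(2, 3, 5, 8)
fit1 <- lm(y ~ x)            # Order 1: y = 5.5714 + 0.7619 x
fit2 <- lm(y ~ x + I(x^2))   # Order 2: I(x^2) adds the squared pseudo-variable
summary(fit1)                # R-squared 0.8707, Adj R-squared 0.8061
summary(fit2)                # R-squared 0.9459, Adj R-squared 0.8377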

Models with Qualitative Random Variables

• Real-life data typically has plenty of qualitative variables whose values cannot be compared.
  ▫ Eg favorite sports, economy/business/first-class seat, type of housing, etc.
  ▫ Some values are sort-of comparable, like hotel rating (3, 4, 5, 6 stars) – we can say hotels with 3 stars provide facilities “less than” 6-star hotels. But we also do not say a 3-star hotel is less effective in serving customers – they cater to different customers.
• So it depends on how we use the (especially numerical) values. If we interpret/treat the values as qualitative, then we should use the following technique to build the model.

Coding Qualitative Random Variables

• The key is to encode qualitative values with 0’s and 1’s.
  ▫ Eg Has investment (1) vs no investment (0).
  ▫ Eg Male (1) vs female (0).
  ▫ Eg Lives in HDB (1) vs not in HDB (0).
• Binary qualitative values are common, and are easily encoded into 0’s and 1’s. The choice of which value is 0 or 1 is completely arbitrary, though we tend to use 1 for the value we are more interested in.

Coding Qualitative Random Variables

• What if there are 3 or more alternative values?
  ▫ Eg Economy/ Business/ First-class seat?
• The answer is not to use 0, 1 and 2. We still use ONLY 0’s and 1’s, but introduce more variables. One data variable becomes 2 model variables:

  𝑋𝐸 = 1 if Economy, 0 if not Economy
  𝑋𝐵 = 1 if Business, 0 if not Business

  𝑿             𝑿𝑬   𝑿𝑩
  Economy        1    0
  Economy        1    0
  Business       0    1
  Economy        1    0
  First-Class    0    0
  Business       0    1
  Economy        1    0

• There is no 𝑋𝐹 for First-Class: 𝑋𝐸 = 0 and 𝑋𝐵 = 0 implies First-Class!
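In R, this 0/1 coding can be generated automatically from a factor. A minimal sketch (the relevel() call makes First-Class the baseline level so the dummy columns match 𝑋𝐸 and 𝑋𝐵 above):

seat <- c("Economy", "Economy", "Business", "Economy",
          "First-Class", "Business", "Economy")
seat <- relevel(factor(seat), ref = "First-Class")  # baseline: First-Class
model.matrix(~ seat)  # 0/1 dummy columns for Economy and Business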

Coding Qualitative Random Variables

Raw data:

  Liking Y   Age A   Distance Flown D   Seat Class X
  8.5        42      230000             Economy
  6.5        37      100000             Economy
  7.1        23      85000              Business
  9.5        18      90000              Economy
  7.6        35      150000             First-Class
  3.4        56      60000              Business
  6.6        31      180000             Economy

Coded data (Distance D in hundred-thousand km):

  Liking Y   Age A   Distance D   X_E   X_B
  8.5        42      2.3          1     0
  6.5        37      1            1     0
  7.1        23      0.85         0     1
  9.5        18      0.9          1     0
  7.6        35      1.5          0     0
  3.4        56      0.6          0     1
  6.6        31      1.8          1     0

• We get a raw regression model encompassing all variables.
• Then we derive a set of models by turning “on” and “off” each encoded variable to assess the effect of each qualitative value.
• The effect of turning “on” and “off” the encoded variables is to change the y-intercept value.

Interpreting Coded Model

Raw Model: 𝑦 = 9.8994 − 0.1063 𝐴 + 0.948 𝐷 − 0.144 𝑋𝐸 − 1.1368 𝑋𝐵

  Class X       𝑿𝑬, 𝑿𝑩            Model
  Economy       𝑋𝐸 = 1, 𝑋𝐵 = 0    𝑦 = 9.7554 − 0.1063 𝐴 + 0.948 𝐷
  Business      𝑋𝐸 = 0, 𝑋𝐵 = 1    𝑦 = 8.7626 − 0.1063 𝐴 + 0.948 𝐷
  First-Class   𝑋𝐸 = 0, 𝑋𝐵 = 0    𝑦 = 9.8994 − 0.1063 𝐴 + 0.948 𝐷

• Being in Economy seat class decreases the liking, on average, by 0.144 points.
• Being in Business seat class decreases the liking, on average, by 1.1368 points.
• Being in First-Class gives the highest average liking, all other factors being the same.
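A minimal R sketch that reproduces the raw model from the coded data (column names follow the R code later in this deck):

d <- data.frame(
  Liking_Y   = c(8.5, 6.5, 7.1, 9.5, 7.6, 3.4, 6.6),
  Age_A      = c(42, 37, 23, 18, 35, 56, 31),
  Distance_D = c(2.3, 1, 0.85, 0.9, 1.5, 0.6, 1.8),
  X_E        = c(1, 1, 0, 1, 0, 0, 1),
  X_B        = c(0, 0, 1, 0, 0, 1, 0)
)
fit <- lm(Liking_Y ~ Age_A + Distance_D + X_E + X_B, data = d)
coef(fit)  # should recover 9.8994, -0.1063, 0.948, -0.144, -1.1368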

Multicollinearity

• Multicollinearity is said to occur whenever explanatory variables are dependent on one another.
• It is a bad thing to happen in a model. In serious cases, it can make the resulting model completely useless.
• Eg: Consider 𝑦 = 𝑏0 + 𝑏1𝑥1 + 𝑏2𝑥2.
• Suppose 𝑋2 is so correlated with 𝑋1 that in fact they are the same, 𝑋2 = 𝑋1 (but you didn’t realize it).
• Our model would degenerate into 𝑦 = 𝑏0 + 𝑏1′𝑥1. You can easily see that the gradient which we think belongs to 𝑥1 actually gets distorted from its true value (it should have been 𝑏1, but we only observe values which are closer to 𝑏1 + 𝑏2). The variance of this gradient also gets inflated.
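A minimal R sketch of this degenerate case, using simulated data of our own (not from the slides): when 𝑋2 is an exact copy of 𝑋1, R drops the aliased variable and the surviving gradient absorbs 𝑏1 + 𝑏2:

set.seed(1)
x1 <- rnorm(50)
x2 <- x1                                      # perfectly collinear duplicate
y  <- 2 + 3*x1 + 5*x2 + rnorm(50, sd = 0.5)   # true b1 = 3, b2 = 5
coef(lm(y ~ x1 + x2))  # x2 comes back NA; x1's slope is close to 8 = b1 + b2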

Multicollinearity – Can’t Avoid

• Yet we cannot avoid some level of multicollinearity, since practical data will always have a little bit of correlation with various other data.
• Thus, the key is not to eliminate multicollinearity, but to reduce serious cases of it.
• Two ways to detect multicollinearity:
  ▫ Correlation Matrix (or correlation diagrams)
  ▫ Variance Inflation Factor
• We will only look at the Correlation Matrix.

Correlation Matrix

• It is a table of correlations of all variables with all variables.
• It flags suspicious multicollinear variables when a cell has correlation magnitude close to 1.
• Use Excel’s Data Analysis → Correlation to get the correlation matrix. Use Conditional Formatting to color-highlight strong values (very red and very green).
• Note: the dependent variable is expected to have some correlation with the explanatory variables, so the Liking Y column is not important.

For the coded airline data (parentheses denote negative values):

              Liking Y   Age A      Distance D   X_E        X_B
  Liking Y    1.0000
  Age A       (0.7472)   1.0000
  Distance D  0.4331     0.0401     1.0000
  X_E         0.4836     (0.2560)   0.4531       1.0000
  X_B         (0.6312)   0.2687     (0.6204)     (0.7303)   1.0000
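The same matrix can be computed in R with cor(). A minimal sketch, reusing the data frame d from the earlier sketch:

round(cor(d), 4)  # correlations of all variables with all variables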

Multicollinearity – Can’t Avoid

• When two explanatory variables have strong correlation, we may want to remove one of them.
• Removing a variable is an exercise in both art and statistics:
  ▫ Contextual understanding: eg, remove the variable that is a derived variable rather than a source.
  ▫ Remove the variable giving the lower 𝑅² or Adjusted-𝑅².
  ▫ Remove the variable which correlates with several other variables.

Selecting “Better” Models

• We need an objective way to see if a model is good or not.
  ▫ Notice we say “a way”, not “the way”.
• An objective function is a formula that combines one or more outputs of a model into a single number so that the goodness of models can be compared (larger value better, or the other way around).
• We must agree on what the objective function is.
• Examples of objective functions:
  ▫ F-test statistic (larger better)
  ▫ p-value (smaller better)
  ▫ 𝑅² or Adjusted-𝑅² (larger better)
  ▫ AIC (Akaike Information Criterion, used by R; smaller better)
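A minimal R sketch showing where these objective values live for a fitted lm() model (here fit is the two-variable model from the earlier sketches):

fit <- lm(Liking_Y ~ Age_A + Distance_D, data = d)
summary(fit)$fstatistic     # F-test statistic, with its degrees of freedom
summary(fit)$r.squared      # R-squared (larger better)
summary(fit)$adj.r.squared  # Adjusted R-squared (larger better)
AIC(fit)                    # Akaike Information Criterion (smaller better)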

Stepwise Modeling

• When we have gathered many variables and lots of samples, often we might not know where to begin.
• We can (let the computer) “search” for the best model by incrementally trying one variable at a time.
• We use the selected objective function to adaptively zero in on the best model.
• This is cool! We get the best model automatically!
• This is dangerous! The “best” model we get may not make much practical sense (it may be best numerically, but the variables selected may be quite meaningless).

Forward Stepwise

1. Start with the null model: 𝑦 = 𝑏0 (ie no explanatory variables). Calculate its objective value (eg AIC).
2. For each of the remaining variables, try adding that variable to the existing model and calculate the objective value. The variable that results in the most improved objective value is actually added.
3. Repeat step 2 until no remaining variable can improve the objective value.

Forward Stepwise in R

• Consider again the airline seating class data set. We wonder which variable(s) we should use to best explain fluctuations in liking of the airline.
• R’s step() function does this in one line.

datatext = "Liking_Y Age_A Distance_D X_E X_B
8.5 42 2.3 1 0
6.5 37 1 1 0
7.1 23 0.85 0 1
9.5 18 0.9 1 0
7.6 35 1.5 0 0
3.4 56 0.6 0 1
6.6 31 1.8 1 0
"
d <- read.delim(textConnection(datatext),
                header=TRUE,
                sep="",
                strip.white=TRUE)
model_null = lm(d$Liking_Y ~ 1)
model_full = lm(d$Liking_Y ~ d$Age_A + d$Distance_D + d$X_E + d$X_B)
# scope must be a formula; formula() extracts it from the fitted full model
# (an lm object has no $formula component, so model_full$formula would be NULL)
model <- step(model_null, formula(model_full), direction="forward")

### model_back <- step(model_full, direction="backward")

Forward Stepwise Results in R

Start: AIC=10.09
d$Liking_Y ~ 1

                 Df Sum of Sq     RSS     AIC
+ d$Age_A         1   12.4121  9.8221  6.3711
+ d$X_B           1    8.8573 13.3770  8.5334
<none>                        22.2343 10.0901
+ d$X_E           1    5.2001 17.0342 10.2252
+ d$Distance_D    1    4.1709 18.0634 10.6358

Starting with the null model, R tries to add one of A, X_B, X_E and D and ranks their performance by AIC values. It seems adding A is best, reducing AIC from the current 10.09 to 6.3711.

Step: AIC=6.37
d$Liking_Y ~ d$Age_A

                 Df Sum of Sq    RSS    AIC
+ d$Distance_D    1    4.7750 5.0471 3.7103
+ d$X_B           1    4.4387 5.3835 4.1620
<none>                        9.8221 6.3711
+ d$X_E           1    2.0335 7.7887 6.7473

R tries to add one of the remaining variables X_B, X_E and D and ranks their performance by AIC values. It seems adding D is best, reducing AIC from the current 6.3711 to 3.7103.

Step: AIC=3.71
d$Liking_Y ~ d$Age_A + d$Distance_D

                 Df Sum of Sq    RSS    AIC
<none>                        5.0471 3.7103
+ d$X_B           1   0.79654 4.2506 4.5080
+ d$X_E           1   0.18538 4.8617 5.4484

The best model uses Age and Distance only:
Liking = 9.2235 − 0.1177 Age + 1.4647 Distance

Backward Stepwise

1. Start with the full model: 𝑦 = 𝑏0 + 𝑏1𝑥1 + ⋯ + 𝑏𝑘𝑥𝑘 (ie all explanatory variables). Calculate its objective value (eg AIC).
2. For each variable in the current model, test removing that variable from the existing model and calculate the objective value. The removal that results in the most improved objective value is actually carried out.
3. Repeat step 2 until removing a variable can no longer improve the objective value.
• This is very simply done in R: calling step() with direction=“backward” will do, as sketched below.
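A minimal sketch, reusing model_full from the forward stepwise code (with no scope given, step() may remove any term, down to the intercept-only model):

model_back <- step(model_full, direction="backward")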

Backward Stepwise Results in R

Start: AIC=6.48
d$Liking_Y ~ d$Age_A + d$Distance_D + d$X_E + d$X_B

                 Df Sum of Sq     RSS     AIC
- d$X_E           1    0.0164  4.2506  4.5080
- d$X_B           1    0.6276  4.8617  5.4484
- d$Distance_D    1    1.1392  5.3734  6.1488
<none>                         4.2341  6.4809
- d$Age_A         1    9.0560 13.2901 12.4878

Starting with the full model, R tries to REMOVE one of A, D, X_E and X_B and ranks their performance by AIC values. It seems removing X_E is best, reducing AIC from the current 6.48 to 4.5080.

Step: AIC=4.51
d$Liking_Y ~ d$Age_A + d$Distance_D + d$X_B

                 Df Sum of Sq     RSS     AIC
- d$X_B           1    0.7965  5.0471  3.7103
- d$Distance_D    1    1.1329  5.3835  4.1620
<none>                         4.2506  4.5080
- d$Age_A         1    9.0640 13.3146 10.5007

R tries to REMOVE one of the remaining variables A, D and X_B and ranks their performance by AIC values. It seems removing X_B is best, reducing AIC from the current 4.5080 to 3.7103.

Step: AIC=3.71
d$Liking_Y ~ d$Age_A + d$Distance_D

                 Df Sum of Sq     RSS     AIC
<none>                         5.0471  3.7103
- d$Distance_D    1     4.775  9.8221  6.3711
- d$Age_A         1    13.016 18.0634 10.6358

The best model uses Age and Distance only:
Liking = 9.2235 − 0.1177 Age + 1.4647 Distance
