BIOSTATISTICS
BEST MODELS
IN REGRESSION
Dr Minu Maria Rose
2nd Semester
JSS School of Public
Health
INTRODUCTION
• Model selection is an important part of any statistical analysis, and
indeed is central to the pursuit of science in general.
• For a good regression model, you want to include the variables that you
are specifically testing along with other variables that affect the response
in order to avoid biased results.
• Many tools for selecting the best model have been suggested in the
literature.
REGRESSION ANALYSIS
LINEAR REGRESSION
LOGISTIC REGRESSION
POLYNOMIAL REGRESSION
COX PROPORTIONAL HAZARDS REGRESSION
R SQUARED
• The most popular goodness-of-fit measure, used in linear regression.
• Square of the correlation coefficient.
• It is the proportion of the variation in Y that is accounted for by the variation in X.
• R2 varies between zero (no linear relationship) and one (perfect linear
relationship).
• R2, officially known as the coefficient of determination, is defined as the sum of
squares due to the regression divided by the total sum of squares of Y.
• A higher R-squared indicates a better fit; a lower R-squared indicates a poorer fit.
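The definition above can be checked by hand. The following is a minimal sketch, using made-up data, that fits a simple least-squares line and computes R² as 1 − SS_residual / SS_total, which for simple linear regression equals the squared correlation between X and Y:

```python
# Sketch: computing R-squared by hand for simple linear regression.
# The x and y values below are made up for illustration.
import statistics

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fit the least-squares line y = intercept + slope * x
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * xi for xi in x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # residual SS
ss_tot = sum((yi - mean_y) ** 2 for yi in y)                  # total SS
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))
```

Because the made-up data are nearly linear, R² here comes out close to 1, illustrating a strong fit.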
ADVANTAGES OF R2
• It represents the strength of the fit (on average, your predicted values do not deviate
much from your actual data).
DISADVANTAGES OF R2
• It does not tell you whether the model is good.
• It does not tell you whether the data you’ve chosen is biased.
• It does not tell you whether you’ve chosen the correct modelling method.
The R² value ranges from 0 to 1, with higher values denoting a stronger fit and lower
values denoting a weaker one. As a rough rule of thumb, R² < 0.5 indicates a weak fit.
• If the p-value of the overall model test is less than the significance level (usually
0.05), the model explains a statistically significant amount of the variation in the outcome.
AKAIKE INFORMATION CRITERION
• The Akaike information criterion (AIC) is a mathematical method for evaluating how
well a model fits the data it was generated from.
• In statistics, AIC is used to compare different possible models and determine which one is
the best fit for the data.
• AIC estimates the quality of each model, relative to each of the other models.
• AIC deals with both the risk of overfitting and the risk of underfitting.
• When a statistical model is used to represent the process that generated the data, the
representation will almost never be exact; so some information will be lost by using the
model to represent the process. AIC estimates the relative amount of information lost by a
given model: the less information a model loses, the higher the quality of that model.
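To make model comparison with AIC concrete, here is a hedged sketch. For least-squares regression models, AIC can be written (up to an additive constant) as n·ln(RSS/n) + 2k, where n is the sample size, RSS the residual sum of squares, and k the number of estimated parameters. The RSS values and parameter counts below are made up for illustration:

```python
# Sketch: comparing two hypothetical regression models with AIC.
# For least-squares fits, AIC = n * ln(RSS/n) + 2k, dropping constants.
import math

def aic(n, rss, k):
    """AIC for a least-squares fit (additive constants dropped)."""
    return n * math.log(rss / n) + 2 * k

n = 50
aic_simple  = aic(n, rss=120.0, k=2)  # hypothetical 1-predictor model
aic_complex = aic(n, rss=118.0, k=5)  # hypothetical 4-predictor model

# The model with the lower AIC is preferred: here the small drop in
# RSS does not justify the three extra parameters.
best = "simple" if aic_simple < aic_complex else "complex"
print(best)
```

This illustrates how AIC balances overfitting against underfitting: adding predictors always lowers the RSS, but the 2k penalty term charges the model for each extra parameter.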
ROOT MEAN SQUARED ERROR (RMSE), which measures the average error made by the model
in predicting the outcome for an observation. Mathematically, the RMSE is the square root of the mean
squared error (MSE), which is the average squared difference between the observed actual outcome
values and the values predicted by the model.
So, MSE = mean((observeds - predicteds)^2) and RMSE = sqrt(MSE).
The lower the RMSE, the better the model.
RESIDUAL STANDARD ERROR (RSE), also known as the model sigma, is a variant of the RMSE
adjusted for the number of predictors in the model. The lower the RSE, the better the model. In practice,
the difference between RMSE and RSE is very small, particularly for large multivariate data.
MEAN ABSOLUTE ERROR (MAE), like the RMSE, the MAE measures the prediction error.
Mathematically, it is the average absolute difference between observed and predicted outcomes, MAE =
mean(abs(observeds - predicteds)). MAE is less sensitive to outliers compared to RMSE.
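The RMSE and MAE formulas above translate directly into code. The following is a minimal sketch, with made-up observed and predicted values, mirroring MSE = mean((observeds - predicteds)^2), RMSE = sqrt(MSE), and MAE = mean(abs(observeds - predicteds)):

```python
# Sketch: RMSE and MAE from observed vs. predicted outcomes.
# The values below are made up for illustration.
import math

observed  = [3.0, 5.0, 2.5, 7.0, 4.5]
predicted = [2.8, 5.4, 2.9, 6.5, 4.4]

errors = [o - p for o, p in zip(observed, predicted)]
mse  = sum(e ** 2 for e in errors) / len(errors)   # mean squared error
rmse = math.sqrt(mse)                              # root mean squared error
mae  = sum(abs(e) for e in errors) / len(errors)   # mean absolute error

print(round(rmse, 3), round(mae, 3))
```

Note that RMSE squares each error before averaging, so one large miss inflates it more than it inflates MAE; this is why MAE is described above as less sensitive to outliers.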
Source: http://www.sthda.com/english/articles/38-regression-model-validation/158-regression-model-accuracy-metrics-r-square-aic-bic-cp-and-more/
Thank you