Comparison between Multiple Regression and Bayesian Regression with

Samira Muhamad
University of Sulaimani
Department of Statistics & Information
[email protected]
Abstract:
This research performs linear regression modeling with a Bayesian approach. Among the many kinds of regression analysis, the frequentist methods most often used for linear regression are Ordinary Least Squares (OLS) and Maximum Likelihood Estimation (MLE). The frequentist and Bayesian types of regression differ in how they view a parameter, and with the development of Bayesian methods, various studies have reported better modeling results than the frequentist method. After applying the two regression methods, we reach the following results: three independent variables significantly affect the number of eggs in infertility (the parameters of these three independent variables are significant), and the value explained by these variables is 0.698. The MSE of the two methods is equal. We conclude that the Bayesian regression method gives the best model for these data, and that the Bayesian approach can be used as an alternative method for the model. The Bayesian and frequentist (OLS) modeling results are compared using several criteria, namely RMSE, MAPE, AIC, BIC, MAD, and R2. The results show that the linear regression method using the Bayesian approach is better than the frequentist method using OLS.
Keywords: linear regression, Bayesian regression, ordinary least squares (OLS), prior distribution, posterior distribution.
Aim of research: This work compares two regression methods, Ordinary Least Squares (OLS) and Bayesian regression. The foremost objectives of this study are:
(1) To build a linear regression model using the Ordinary Least Squares (OLS) method.
(2) To build a linear regression model using the Bayesian approach.
(3) To compare the two methods and identify the best linear model.

Journal of Dijlah University College, Vol. 6, No. 1 (January 2023), Statistics and Information
Introduction:
In statistics, linear regression analysis is one of the most frequently used methods for analyzing data. This method involves two types of variables: the first, known as the dependent (response) variable, takes values that depend on other variables; the second, the independent or explanatory variables, are those that affect the dependent variable. In regression analysis there can be only one response variable, and its data scale must be interval/ratio, so the response can only be numerical data [22], while there can be more than one independent variable, with data scales that are either categorical (nominal/ordinal) or numerical (interval/ratio). If only one independent variable is used, the model is called simple linear regression; if there is more than one, it is called multiple linear regression [30]. Linear regression analysis has three goals: to describe the relationship between the response variable and the independent variables, to test whether the independent variables have an effect on the dependent variable, and to predict the value of the dependent variable from given values of the independent variables. The regression model is formed by estimating its parameters, which yields a regression coefficient for each independent variable [22]. The method most frequently used by researchers is the frequentist or classical method, using OLS (Ordinary Least Squares) or MLE (Maximum Likelihood Estimation) [3]. The first, also called the least squares method, works by minimizing the sum of errors of the regression equation; the parameters are obtained by minimizing the error function of the equation [21]. MLE, by contrast, works by maximizing the probability density function (likelihood) of the data. For both methods, classical assumptions must be satisfied by the results of the regression modeling: the errors must be independent, identically distributed, and normal [23]. In addition to these two methods, the parameters of a regression model can also be estimated with the Bayesian approach. The difference between the frequentist and Bayesian methods lies in their point of view of the parameters: the Bayesian approach views a parameter as a random variable, so its value is not a single fixed number as in the frequentist point of view [1].
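To make the contrast concrete, a minimal sketch (all data simulated, not taken from the paper's study): a frequentist analysis of a normal mean reports a single point estimate, while a Bayesian analysis with a conjugate normal prior returns a full posterior distribution for the same parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 50 draws from a normal with true mean 2.0 and known sigma = 1.0
sigma = 1.0
y = rng.normal(2.0, sigma, size=50)
n = len(y)

# Frequentist view: the parameter is a fixed unknown; report a single estimate.
mu_hat = y.mean()

# Bayesian view: the parameter is a random variable with a prior, here N(0, 10^2).
# With a normal prior and a normal likelihood the posterior is also normal (conjugacy).
prior_mean, prior_var = 0.0, 10.0 ** 2
post_var = 1.0 / (1.0 / prior_var + n / sigma ** 2)
post_mean = post_var * (prior_mean / prior_var + y.sum() / sigma ** 2)

print(mu_hat)                # a single point estimate
print(post_mean, post_var)   # a whole distribution N(post_mean, post_var)
```

Because the prior here is weak, the posterior mean nearly coincides with the frequentist estimate; a stronger prior would pull it toward the prior mean.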
Related work or literature reviews:
Many studies have been carried out on the Bayesian method, such as:
1. Edward R., et al. (2010): The main target of this study is to develop a Bayesian "sum-of-trees" model in which each tree is constrained by a regularization prior to be a weak learner; fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior.
2. Brodersen, et al. (2015): The main aim of this research is to infer causal impact on the basis of a diffusion-regression state-space model that predicts the counterfactual market response in a synthetic control that would have occurred had no intervention taken place.
3. Qian, H. (2018): The aim of this research is to establish the normal-inverse-gamma summation operator, which combines Bayesian regression results from different data sources and leads to a simple split-and-merge algorithm for big-data regressions.
4. Hubin, A., et al. (2018): The main objective of this study on Bayesian models is to adapt an advanced evolutionary algorithm called GMJMCMC (Genetically Modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models.
5. Neelon, B. (2018): The main intention of this study is to propose an efficient Bayesian approach to fitting ZINB models. The proposed data-augmented Gibbs sampler makes use of easily sampled Pólya-Gamma random variables; conditional on these latent variables, inference proceeds via straightforward Bayesian inference for linear models.
1. Linear Regression Model:
Regression analysis is one of the customary statistical techniques used to elucidate the relationship between the (independent) explanatory variables and the (response) dependent variable. An essential step of this analysis is therefore selecting a pertinent model for the available data. It has been used in many areas of application, including dental health, medical research, and economics [23]. Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. There are two notions of linearity: linearity in the parameters and linearity in the variables. A model is called linear if it is linear in the parameters; it remains a linear model as long as it is linear in the parameters, even if it is not linear in the variables [31],[13]. Multiple linear regression models can be used to assess the relationship between the dependent variable and two or more independent variables. Before fitting a linear regression model, it must first be ensured that there is a relationship or correlation between each explanatory variable and the response variable, and that the relationship between the outcome variable and all independent variables is linear. Simple linear and multiple linear regression are the most familiar models; nonlinear regression analysis is regularly used for more convoluted data sets in which the response and explanatory variables show a nonlinear relationship [24]. We start with multiple linear regression, the "simplest" of the regression analyses. It is defined by the least squares estimator of a regression model with two or more explanatory (independent) variables, in which a straight line is fit through the (n) points so that the sum of squared residuals (SSR) is as small as possible; the goal is to estimate the unknown parameters by minimizing ( $\sum \varepsilon_i^2$ ). The intention is to select parameters for which the residuals are "small", and there are various methods with which to minimize the residuals [11].
The model can be thought of as similar to the slope-intercept form of a line with an error (residual) term. The multiple linear regression model is given by:

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$ ... (1)

where i = 1, 2, ..., n; $y_i$ is the response (dependent) variable; the $x_{ij}$ are the explanatory/predictor (independent) variables; $(\beta_0, \dots, \beta_k)$ are the unknown parameters to be estimated; and $\varepsilon_i$ is the residual term [30]. The term $\varepsilon_i$ is independent and identically distributed (i.i.d.) with a normal distribution with mean (0) and unknown variance ($\sigma^2$) [12],[29]. With one predictor the multiple linear regression reduces to simple linear regression, of which it is a generalization. Using matrix notation the equation can be written as:

$Y = X\beta + \varepsilon$ ... (2)

By Bayes' theorem, the posterior distribution of the parameters is

$\text{posterior} = \dfrac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}}$ ... (4)
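Minimizing $\sum \varepsilon_i^2$ in the model above leads to the familiar normal-equations solution $\hat{\beta} = (X'X)^{-1}X'Y$; a minimal NumPy sketch on simulated data (the sample size and coefficient values are illustrative, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated design: n = 200 observations, an intercept column plus two predictors
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([42.0, -0.5, 0.3])
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)

# OLS: solve the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta_hat
sse = residuals @ residuals   # the minimized sum of squared residuals
print(beta_hat, sse)
```

With enough data the estimates land close to the coefficients used to simulate; in practice `np.linalg.lstsq` is the numerically safer routine for the same computation.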
Based on the probability density function (pdf) above, the likelihood function of these variables can be defined as follows [26],[4]:

$P(Y \mid X, \beta, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{1}{2\sigma^2}(y_i - x_i'\beta)^2\right]$

$P(Y \mid X, \beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\!\left[-\frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta)\right]$

$P(Y \mid X, \beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\!\left[-\frac{\nu S^2}{2\sigma^2}\right] \exp\!\left[-\frac{1}{2\sigma^2}(\beta - \hat{\beta})'X'X(\beta - \hat{\beta})\right]$ ... (6)

where $\nu S^2 = (Y - X\hat{\beta})'(Y - X\hat{\beta})$ is the residual sum of squares.
Several prior distributions can be used in the Bayesian approach to the linear regression model, one of which is the conjugate prior [8],[23]. Estimation of the regression model parameters with the Bayesian approach can be done by iterating over the marginal posteriors. The posterior distribution is calculated by multiplying the prior distribution and the likelihood function [3],[10],[20].
$p(\beta, \sigma^2 \mid Y, X) \propto P(Y \mid X, \beta, \sigma^2)\, P(\beta \mid \sigma^2)\, P(\sigma^2)$

$p(\beta, \sigma^2 \mid Y, X) \propto (\sigma^2)^{-n/2} \exp\!\left[-\frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta)\right] (\sigma^2)^{-(\nu_0/2 + 1)} \exp\!\left[-\frac{\nu_0 S_0^2}{2\sigma^2}\right] (\sigma^2)^{-k/2} \exp\!\left[-\frac{1}{2\sigma^2}(\beta - \mu)'A(\beta - \mu)\right]$ ... (7)
The estimates of the regression model parameters under the Bayesian approach can then be obtained by using the MCMC (Markov Chain Monte Carlo) algorithm [9],[21],[23].
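A minimal sketch of such an MCMC scheme is a two-block Gibbs sampler for the conjugate normal/scaled-inverse-chi-square model above (the data, prior settings, and chain length here are illustrative assumptions, not the paper's analysis):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data (illustrative)
n, k = 150, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)

# Conjugate prior: beta | sigma^2 ~ N(mu, sigma^2 A^{-1}),
# sigma^2 ~ scaled inverse chi-square(nu0, S0^2)
mu = np.zeros(k)
A = 0.01 * np.eye(k)          # weak prior precision
nu0, S0sq = 1.0, 1.0

V = np.linalg.inv(X.T @ X + A)
m = V @ (X.T @ y + A @ mu)

draws = []
sigma2 = 1.0
for _ in range(2000):
    # Full conditional: beta | sigma^2, y ~ N(m, sigma^2 V)
    beta = rng.multivariate_normal(m, sigma2 * V)
    # Full conditional: sigma^2 | beta, y is inverse gamma;
    # draw it as scale / Gamma(shape, 1)
    resid = y - X @ beta
    shape = (n + k + nu0) / 2.0
    scale = (resid @ resid + (beta - mu) @ A @ (beta - mu) + nu0 * S0sq) / 2.0
    sigma2 = scale / rng.gamma(shape)
    draws.append(beta)

post_mean = np.mean(draws[500:], axis=0)   # discard burn-in, average the rest
print(post_mean)
```

The posterior means recovered this way should sit close to the coefficients used to simulate the data; software such as JASP runs a refined version of the same idea internally.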
3. Priors:
Statistical modeling with Bayesian statistics requires the formulation of a set of prior distributions for any unknown parameters. These probability distributions express one's uncertainty about an unknown quantity, p, before the "data" are taken into account. They describe uncertainty about the quantity rather than randomness in it; the unknown quantity may be a parameter or a latent variable. A prior is often the purely subjective assessment of an experienced specialist, and the use of a prior distribution is the most controversial element of Bayesian analysis: criticized for introducing subjective information, a prior is essentially an educated guess and can vary from one scientist to another. There are two types of priors. The first, called the conjugate prior, occurs when the posterior distribution has the same form as the prior distribution. The second, called the non-informative prior, is used when we have very little knowledge or information about the prior distribution; it is used to "conform to the Bayesian model in a correct parametric form" [3],[30].
We choose a conjugate prior for $(\beta, \sigma^2)$ to ensure an analytic expression for the posterior, $p(\beta, \sigma^2) = p(\beta \mid \sigma^2)\, p(\sigma^2)$, where:

$\beta \mid \sigma^2 \sim N(\mu, \sigma^2 A^{-1})$

$\sigma^2 \sim \dfrac{\nu_0 S_0^2}{\chi^2_{\nu_0}}$

Hence $\beta$ has a normal prior and $\sigma^2$ has a scaled inverse chi-square prior. The joint posterior then takes the form:

$p(\beta, \sigma^2 \mid y, X) \propto p(y \mid X, \beta, \sigma^2)\, p(\beta \mid \sigma^2)\, p(\sigma^2)$

$\propto (\sigma^2)^{-n/2} \exp\!\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right] (\sigma^2)^{-k/2} \exp\!\left[-\frac{1}{2\sigma^2}(\beta - \mu)'A(\beta - \mu)\right] (\sigma^2)^{-(\nu_0/2 + 1)} \exp\!\left[-\frac{\nu_0 S_0^2}{2\sigma^2}\right]$ ... (8)
If the posterior distributions p(θ|x) are in the same family as the prior probability distribution p(θ), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood [16],[26].
For some likelihood functions, choosing a certain prior makes the posterior end up in the same distribution family as the prior; such a prior is called a conjugate prior. The conjugate prior simplifies the mathematics involved in Bayesian data analysis [22],[26]. Some commonly used conjugate priors are listed in Table (1).
$f_{X \mid Y=y}(x) = \dfrac{f_X(x)\, L_{X \mid Y=y}(x)}{\int f_X(x)\, L_{X \mid Y=y}(x)\, dx}$
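As a concrete instance of conjugacy (a standard textbook example, not one taken from the paper's Table (1)): a Beta prior combined with a binomial likelihood yields a Beta posterior, so the Bayesian update reduces to adding the observed counts to the prior parameters.

```python
# Beta prior + binomial likelihood -> Beta posterior (conjugate update)
a, b = 2.0, 2.0            # Beta(2, 2) prior on a success probability p
successes, failures = 7, 3  # observed data

# Conjugacy: the posterior is Beta(a + successes, b + failures)
a_post, b_post = a + successes, b + failures

post_mean = a_post / (a_post + b_post)
print(a_post, b_post, post_mean)   # 9.0 5.0 0.6428571428571429
```

No integral ever needs to be evaluated: the normalizing constant in the formula above is absorbed into the known Beta form, which is exactly why conjugate priors simplify the mathematics.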
4. MSE: Mean square error, $MSE = \dfrac{SSE}{d.f.}$
5. RMSE: Root mean square error, $RMSE = \sqrt{MSE}$
6. MAD: Mean absolute deviation, $MAD = \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
7. MAPE: Mean absolute percentage error, $MAPE = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{|x_i - \hat{x}_i|}{x_i}$
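The criteria above can be computed directly; a short sketch on illustrative actual and fitted values (the values and the degrees of freedom chosen here are for demonstration only):

```python
import numpy as np

# Illustrative actual and fitted values
x = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
x_hat = np.array([9.5, 12.5, 8.0, 14.0, 11.5])
n = len(x)
dof = n - 2   # residual degrees of freedom for a simple model (illustrative)

sse = np.sum((x - x_hat) ** 2)
mse = sse / dof                            # Mean square error (SSE / d.f.)
rmse = np.sqrt(mse)                        # Root mean square error
mad = np.mean(np.abs(x - np.mean(x)))      # Mean absolute deviation
mape = np.mean(np.abs(x - x_hat) / x)      # Mean absolute percentage error

print(mse, rmse, mad, mape)
```

Lower values of each criterion indicate a better-fitting model, which is how they are used in the comparison below.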
2. Application part:
In this section we discuss the practical part; all results of the applied study are presented, obtained using the statistical packages JASP and SYSTAT 12. To determine whether the independent variables affect the dependent variable, a multiple regression analysis according to the (OLS) method (the frequentist method) is carried out first.
Table (2): Fitting Multiple Regression Model (OLS)

variables | Estimate | Std. Error | t-value | Sig. | 95% CI lower | 95% CI upper | Tolerance | VIF
Intercept | 42.142 | 1.92  | 21.84  | 0.000 | 38.33  | 45.94  |       |
X1        | -0.573 | 0.038 | -14.90 | 0.000 | -0.649 | -0.497 | 0.939 | 1.06
X2        | 0.272  | 0.131 | 2.07   | 0.003 | 0.013  | 0.532  | 0.753 | 1.32
X3        | -0.094 | 0.078 | -1.20  | 0.228 | -0.248 | 0.060  | 0.663 | 1.50
X4        | -0.838 | 0.089 | -9.45  | 0.000 | -1.013 | -0.663 | 0.701 | 1.42
X5        | -0.102 | 0.034 | -3.03  | 0.003 | -0.196 | -0.036 | 0.980 | 1.02
X6        | 0.478  | 0.67  | 0.70   | 0.470 | -0.853 | 1.810  | 0.802 | 1.24

Table (4): Analysis of Variance

Source   | Sum of Squares | Df  | Mean Square | F-Ratio | P-Value
Model    | 5365.75        | 6   | 894.29      | 70.57   | 0.000
Residual | 2217.55        | 175 | 12.67       |         |
Total    | 7583.30        | 181 |             |         |
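The entries of Table (4) can be verified by the usual ANOVA arithmetic (mean square = sum of squares / df, F = model mean square / residual mean square):

```python
# Quick check of the ANOVA arithmetic behind Table (4)
model_ss, model_df = 5365.75, 6
resid_ss, resid_df = 2217.55, 175

model_ms = model_ss / model_df      # mean square for the model
resid_ms = resid_ss / resid_df      # mean square for the residuals
f_ratio = model_ms / resid_ms       # F statistic

# Coefficient of determination implied by the table
r2 = model_ss / (model_ss + resid_ss)

print(round(model_ms, 2), round(resid_ms, 2), round(f_ratio, 2), round(r2, 4))
```

The computed values (894.29, 12.67, 70.57) match the table, and the implied R² for the full OLS model is about 0.708.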
One of the basic assumptions to verify before the analysis is the normality of the residuals, which can be checked through Figure (1).

[Figure (1): Q-Q plot of the residuals]
10) The marginal posterior distributions of the regression coefficients show that although the variables (X3 and X6) do not have a significant effect on the model, the posterior model probabilities indicate that, if these variables are included in the model, the probability of inclusion is (0.879, 0.871) respectively; there is a large spike, and the intervals from the lower to the upper bound for the two variables are (-0.118, 0.013) and (-0.006, 1.141) respectively.
3: Comparison between the two methods through the model selection criteria.
From Table (7) we can see that, according to four criteria (AIC, BIC, MAD, and MAPE), the Bayesian regression approach is better than OLS, while according to two criteria (MSE, RMSE) the two methods are equal, and (R2) for ordinary least squares is larger than for the Bayesian method, noting that the value (R2 = 0.698) in the Bayesian model is obtained with three variables only. Therefore the Bayesian regression method gives the best model for these data.
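The AIC and BIC criteria used in this comparison penalize model complexity; under Gaussian errors a common form is AIC = n·ln(SSE/n) + 2k and BIC = n·ln(SSE/n) + k·ln(n). These are standard textbook definitions stated here as an assumption, since the paper does not give the formulas it used. A sketch:

```python
import math

def aic_bic(sse, n, k):
    """Gaussian-error AIC and BIC computed from the residual sum of squares.

    sse: residual sum of squares, n: number of observations,
    k: number of estimated parameters.
    """
    base = n * math.log(sse / n)
    return base + 2 * k, base + k * math.log(n)

# Illustrative comparison: equal fit quality, different model sizes
aic_small, bic_small = aic_bic(sse=100.0, n=50, k=3)
aic_large, bic_large = aic_bic(sse=100.0, n=50, k=7)

# With equal SSE, both criteria prefer the smaller model
print(aic_small < aic_large, bic_small < bic_large)
```

This is why a Bayesian model that achieves the same MSE with fewer effective variables, as reported above, can come out ahead on AIC and BIC.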
References:
1. Ali, Ahmed; Inglis, Alan N.; Prado, Estevão; Wundervald, Bruna (2013). "Bayesian Linear Regression". Statistics Solutions.
2. Hubin, A., Storvik, G., & Frommlet, F. (2018). "A novel algorithmic approach to Bayesian Logic Regression". Bayesian Analysis.
3. Ausin, M. Concepcion (2009). "Bayesian Inference". Universidad Carlos III de Madrid, Master in Business Administration and Quantitative Methods.
4. Ahmad, Sheikh P. (2011). "Bayesian Regression Analysis with Examples in S-PLUS and R". Journal of Modern Applied Statistical Methods, Vol. 10, Issue 1, Article 24.
5. Bagnell, D., Hua, Rushane, and Kambam, Dheeraj R. "Statistical Techniques in Robotics" (16-831, F14), Lecture #16, Thursday November 6.
6. Burnham, K. P., et al. (2004). "Multimodel inference: understanding AIC and BIC in model selection". Sociological Methods & Research, 33(2), pp. 261-304.
7. Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010). "Bayesian Additive Regression Trees". The Annals of Applied Statistics, Vol. 4, No. 1, pp. 266-298.
8. Clyde, Merlise, et al. (2021). "An Introduction to Bayesian Thinking: A Companion to the Statistics with R Course".
9. Congdon, Peter (2003). "Applied Bayesian Modelling". John Wiley & Sons, Ltd.
10. Congdon, Peter (2004). "Bayesian model choice based on Monte Carlo estimates of posterior model probabilities". Department of Geography, Queen Mary University of London, Mile End Road, London E1 4NS, UK.
11. Esian, Nelsy (2018). "An Introduction to Bayesian Linear Regression".
12. Evans, Sara (2012). "Bayesian regression analysis". M.A. thesis, University of Louisville.
13. Fox, John (2018). "Bayesian Estimation of Regression Models". An Appendix to Fox & Weisberg, An R Companion to Applied Regression, third edition.
14. Qian, H. (2018). "Big Data Bayesian Linear Regression and Variable Selection by Normal-Inverse-Gamma Summation". Bayesian Analysis, 13(4), 1007-1031.
15. Johnson, Nels (2011). "Bayesian Methods for Regression in R". Laboratory for Interdisciplinary Statistical Analysis.
16. Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L. (2015). "Inferring causal impact using Bayesian structural time-series models". The Annals of Applied Statistics, 9(1), 247-274.
17. Neelon, B. (2018). "Bayesian Zero-Inflated Negative Binomial Regression Based on Pólya-Gamma Mixtures". Bayesian Analysis.
18. Miroshnikov, Alexey, Savelev, Evgeny, and Conlon, Erin M. (2015). "An R package for Bayesian Linear Models for Big Data and Data Science".
19. Walker, M. H., and Tobler, K. J. (2020). "Female Infertility". StatPearls [Internet].
20. Minka, Thomas P. (2010). "Bayesian linear regression".
21. Permai, S. D., and Tanty, Heruna (2018). "Linear regression model using Bayesian approach for energy performance of residential building". 3rd International Conference on Computer Science and Computational Intelligence, ScienceDirect, Procedia Computer Science 135, pp. 671-677.
22. Rice, Ken (2017). "Bayesian Statistics for Genetics", Lecture 4: Linear Regression.
23. Smith, Adam N. (2021). "Notes on Bayesian Linear Regression", July 31.
24. Shekaramiz, Mohammad, and Moon, Todd K. (2019). "Note on Bayesian Linear Regression". Electrical and Computer Engineering Faculty Publications.
25. Wakefield, Jon (2013). "Bayesian and Frequentist Regression Methods". Springer, New York.
26. Lin, Meng-Yun (2013). "Bayesian Statistics". Technical Report No. 2, May 6.
27. Wallace, H. Matthew, and Tobler, J. Kyle (2021). The concept of infertility, available from https://www.ncbi.nlm.nil.gov/books/NBK556033/.
28. "Bayesian linear regression", ST440/540: Applied Bayesian Statistics, Spring 2018.
29. Murray, Iain, and Onken, Arno (2019). MLPR: w3b, www.inf.ed.ac.uk/teaching/couses/mlpr.
30. APPM 5720: Bayesian Computation.
31. http://www.inf.ed.ac.uk/teaching/courses/mlpr/2019/.