
Regression Tutorial 101 With NumXL

This is the first entry in what will become an ongoing series on regression analysis and modeling. In this tutorial, we will start with the general definition or topology of a regression model, and then use the NumXL program to construct a preliminary model. Next, we will closely examine the different output elements in an attempt to develop a solid understanding of regression, which will pave the way to a more advanced treatment in future issues.

Uploaded by

NumXL Pro
Copyright
© Attribution Non-Commercial (BY-NC)

Regression 101 Tutorial, Spider Financial Corp, 2013

Tutorial: Regression 101
In this tutorial, we will use a sample data set gathered from 20 different salespersons. The regression model attempts to explain and predict a salesperson's weekly sales (dependent variable) using two explanatory variables: intelligence (IQ) and extroversion.
Data Preparation
First, let's organize our input data. Although not necessary, it is customary to place all independent variables (X's) on the left, where each column represents a single variable. In the rightmost column, we place the response or the dependent variable values.

In this example, we have 20 observations and two independent (explanatory) variables. The amount of weekly sales is the response or dependent variable.
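
As a rough illustration of this layout outside a worksheet, the sketch below stores each observation as a row with the explanatory variables first and the response last, then splits the rows into X and Y. The numbers are invented for illustration and are not the tutorial's data set; the names `rows`, `X`, and `y` are hypothetical.

```python
# Each row is one salesperson: (IQ, extroversion, weekly sales).
# These values are invented for illustration only.
rows = [
    (110, 22, 3000.0),
    (95, 30, 3600.0),
    (102, 18, 2500.0),
    (120, 25, 3300.0),
]

# Independent variables on the left: one column per variable.
X = [row[:-1] for row in rows]
# Dependent (response) variable in the rightmost column.
y = [row[-1] for row in rows]

n_obs = len(rows)    # number of observations
n_vars = len(X[0])   # number of explanatory variables
print(n_obs, n_vars)
```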

Process
Now we are ready to conduct our regression analysis. First, select an empty cell in your worksheet where you wish the output to be generated, then locate and click on the regression icon in the NumXL tab (or toolbar).

Now the Regression Wizard will appear.

Select the cell range for the response/dependent variable values (i.e., weekly sales). Select the cell range for the explanatory (independent) variable values.
Notes:
1. The cell range may optionally include the heading (label) cell, which is then used in the output tables wherever those variables are referenced.
2. The explanatory variables (i.e., X) are already grouped by columns (each column represents a variable), so we don't need to change that.
3. Leave the Variable Mask field blank for now. We will revisit this field in later entries.
4. By default, the output cell range is set to the currently selected cell in your worksheet.
Finally, once we select the X and Y cell ranges, the Options, Forecast, and Missing Values tabs will become available (enabled).
Next, select the Options tab.


Initially, the tab is set to the following values:
- The regression intercept/constant is left blank. This indicates that the regression intercept will be estimated by the regression. To set the intercept to a fixed value (e.g., zero (0)), enter it there.
- The significance level (a.k.a. α) is set to 5%.
- In the output section, the most common regression analysis is selected.
- For auto-modeling, let's leave it unchecked. We will discuss this functionality in a later issue.
Now, click on the Missing Values tab.


In this tab, you can select the approach to handle missing values in the data set (X and Y). By default, any missing value found in X or in Y in any observation would exclude the observation from the analysis. This treatment is a good approach for our analysis, so let's leave it unchanged.
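
The default treatment described above amounts to dropping any observation that has a missing value in X or in Y. A minimal sketch of that idea follows, assuming missing cells are represented by `None`; the function name `drop_incomplete` and the sample values are hypothetical, and NumXL's internal handling is not described in this tutorial.

```python
def drop_incomplete(X, y):
    """Keep only observations with no missing value in X or in Y."""
    kept_X, kept_y = [], []
    for x_row, y_val in zip(X, y):
        if y_val is None or any(v is None for v in x_row):
            continue  # exclude the whole observation from the analysis
        kept_X.append(x_row)
        kept_y.append(y_val)
    return kept_X, kept_y


# Invented values; None marks a missing cell.
X = [(110, 22), (95, None), (102, 18), (120, 25)]
y = [3000.0, 3600.0, None, 3300.0]

X_clean, y_clean = drop_incomplete(X, y)
print(len(y_clean), "observations remain")
```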
Now, click OK to generate the output tables.

Analysis
Let's now examine the different output tables more closely.
1. Regression Statistics
In this table, a number of summary statistics for the goodness of fit of the regression model, given the sample, are displayed.
1. The coefficient of determination (R-square) describes the proportion of the variation in Y explained by the regression.
2. The adjusted R-square is an alteration of R-square that takes into account the number of explanatory variables.
3. The standard error (σ) is the regression error. In other words, the error in the forecast has a standard deviation of around $332.
4. The log-likelihood function (LLF), Akaike information criterion (AIC), and Schwarz/Bayesian information criterion (SBIC) are different probabilistic measures for the goodness of fit.
5. Finally, Observations is the number of non-missing observations used in the analysis.
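
To make the statistics above concrete, here is a small sketch that computes them from observed and fitted values using their textbook definitions. It is not NumXL's implementation: the function name `fit_statistics` is hypothetical, the toy data are invented, and the parameter count used for AIC and SBIC (slopes plus intercept) is an assumption, so the information criteria may differ from NumXL's output by a constant.

```python
import math


def fit_statistics(y, y_hat, p):
    """Goodness-of-fit summary for a fitted regression with p regressors."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual SS
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total SS
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    std_err = math.sqrt(sse / (n - p - 1))
    # Gaussian log-likelihood evaluated at the ML variance estimate SSE/N.
    llf = -0.5 * n * (math.log(2.0 * math.pi) + math.log(sse / n) + 1.0)
    k = p + 1  # assumed parameter count: slopes plus intercept
    return {
        "r2": r2,
        "adj_r2": adj_r2,
        "std_err": std_err,
        "llf": llf,
        "aic": 2.0 * k - 2.0 * llf,
        "sbic": k * math.log(n) - 2.0 * llf,
        "observations": n,
    }


# Toy one-variable example (invented data); the least-squares line
# for these five points is y = 2.2 + 0.6 x.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.2 + 0.6 * xi for xi in x]

stats = fit_statistics(y, y_hat, p=1)
for name, value in stats.items():
    print(name, value)
```

Note how the adjusted R-square is lower than R-square: the penalty grows with the number of explanatory variables relative to the number of observations.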


2. ANOVA
Before we can seriously consider the regression model, we must answer the following question:
Is the regression model statistically significant or a statistical data anomaly?
The regression model we have hypothesized is:

$$Y_i = \hat{Y}_i + e_i = \alpha + \beta_1 X_{1,i} + \beta_2 X_{2,i} + e_i$$
$$e_i \sim \text{i.i.d.}\ N(0, \sigma^2)$$

Where:
- $\hat{Y}_i$ is the estimated value for the i-th observation.
- $e_i$ is the error term for the i-th observation.
- $e_i$ is assumed to be independent and identically distributed (Gaussian).
- $\sigma^2$ is the regression variance (standard error squared).
- $\beta_1, \beta_2$ are the regression coefficients.
- $\alpha$ is the intercept or the constant of the regression.
Alternatively, the question can be stated as follows:

$$H_o: \beta_1 = \beta_2 = 0$$
$$H_1: \exists\, \beta_k \neq 0, \quad 1 \le k \le 2$$

The analysis of variance (ANOVA) table answers this question.

In the first row of the table (i.e., Regression), we compute the test score (F-Stat) and P-Value, then compare the P-Value against the significance level (α). In our case, the regression model is statistically significant, and it does explain some of the variation in the values of the dependent variable (weekly sales).
The remaining calculations in the table are simply there to help us get to this point. To be complete, we describe their computation here, but you can skip ahead to the next table.


- df is the degrees of freedom. For regression, it is the number of explanatory variables (p). For the total, it is the number of non-missing observations minus one (N - 1), and for residuals, it is the difference between the two (N - p - 1).
- Sum of Squares (SS):

$$SSR = \sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2$$
$$SST = \sum_{i=1}^{N} (Y_i - \bar{Y})^2$$
$$SSE = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$$

- Mean Square (MS):

$$MSR = \frac{SSR}{p}$$
$$MSE = \frac{SSE}{N - p - 1}$$

- Test Statistic:

$$F = \frac{MSR}{MSE} \sim F_{p,\, N-p-1}$$
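
The ANOVA quantities defined above can be traced step by step in code. The sketch below is an illustration under stated assumptions, not NumXL's implementation: the helper names `ols_fitted_values` and `anova` are hypothetical, the eight observations are invented, and the least-squares fit is done with a plain normal-equations solver. The P-value uses a closed form for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2, which matches the two-variable model in this tutorial.

```python
def ols_fitted_values(X, y):
    """Least-squares fit with an intercept; returns the fitted values."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Augmented normal equations [A'A | A'y], solved by Gauss-Jordan.
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(A, y))]
         for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    return [sum(b * a for b, a in zip(beta, row)) for row in A]


def anova(X, y):
    """ANOVA table entries for a regression with p explanatory variables."""
    n, p = len(y), len(X[0])
    y_hat = ols_fitted_values(X, y)
    y_bar = sum(y) / n
    ssr = sum((f - y_bar) ** 2 for f in y_hat)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - f) ** 2 for yi, f in zip(y, y_hat))
    df_res = n - p - 1
    f_stat = (ssr / p) / (sse / df_res)
    # Upper tail of F(2, d) is (1 + 2F/d)^(-d/2); valid only for p == 2.
    p_value = (1.0 + 2.0 * f_stat / df_res) ** (-df_res / 2.0) if p == 2 else None
    return {"df_regression": p, "df_residual": df_res, "df_total": n - 1,
            "ssr": ssr, "sse": sse, "sst": sst,
            "f_stat": f_stat, "p_value": p_value}


# Invented data: (IQ, extroversion) and weekly sales.
X = [(110, 22), (95, 30), (102, 18), (120, 25),
     (105, 28), (98, 15), (115, 20), (108, 26)]
y = [3000.0, 3600.0, 2500.0, 3300.0, 3500.0, 2300.0, 2900.0, 3400.0]

table = anova(X, y)
alpha = 0.05
print("F =", table["f_stat"], "P-value =", table["p_value"])
print("significant" if table["p_value"] < alpha else "not significant")
```

Because the model includes an intercept, SSR and SSE add up to SST, which is a handy sanity check on any hand-rolled computation.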

3. Residuals Diagnosis Table
Once we confirm that the regression model explains some of the variation in the values of the response variable (weekly sales), we can examine the residuals to make sure that the underlying model's assumptions are met.
$$Y_i = \hat{Y}_i + e_i = \alpha + \beta_1 X_{1,i} + \beta_2 X_{2,i} + e_i$$
$$e_i \sim \text{i.i.d.}\ N(0, \sigma^2)$$

Using the standardized residuals (i.e., $e_i / \sigma$), we perform a series of statistical tests on the mean, variance, skew, excess kurtosis and, finally, the normality assumption.


In this example, the standardized residuals pass the tests with 95% confidence.
Note: the standardized (a.k.a. studentized) residuals are computed using the prediction error ($S_{pred}$) for each observation. $S_{pred}$ takes into account the errors in the values of the regression coefficients, in addition to the general regression error (RMSE or σ).
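
The sample moments that these tests examine can be sketched as follows. This is a simplified illustration: it scales every residual by a single σ rather than by a per-observation $S_{pred}$, it does not reproduce the formal tests or their P-values, and the function name `standardized_moments` and the residual values are hypothetical.

```python
import math


def standardized_moments(residuals, sigma):
    """Mean, variance, skew and excess kurtosis of e_i / sigma."""
    z = [e / sigma for e in residuals]
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # population variance
    sd = math.sqrt(var)
    skew = sum(((v - mean) / sd) ** 3 for v in z) / n
    excess_kurt = sum(((v - mean) / sd) ** 4 for v in z) / n - 3.0
    return mean, var, skew, excess_kurt


# Invented residuals from a one-variable fit (p = 1) on five points.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
p = 1
# Regression standard error: sqrt(SSE / (N - p - 1)).
sigma = math.sqrt(sum(e * e for e in residuals) / (len(residuals) - p - 1))

mean, var, skew, excess_kurt = standardized_moments(residuals, sigma)
print(mean, var, skew, excess_kurt)
```

Under the model's assumptions, the standardized residuals should have a mean near zero, little skew, and excess kurtosis near zero; the formal tests quantify how far the sample values can stray before the assumption is rejected.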
4. Regression Coefficients Table
Once we establish that the regression model is significant, we can look closer at the regression coefficients.

Each coefficient (including the intercept) is shown on a separate row, and we compute the following statistics:
- Value (i.e., $\alpha, \beta_1, \ldots$)
- Standard error in the coefficient value.
- Test score (T-stat) for the following hypothesis:

$$H_o: \beta_k = 0$$
$$H_1: \beta_k \neq 0$$

- The P-Values of the test statistics (using Student's t-distribution)
- Upper and lower limits of the confidence interval for the coefficient value.
- A reject/accept decision for the significance of the coefficient value.
In our example, only the extroversion variable is found significant, while the intercept and the intelligence variable are not.
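
For readers who want to see where the first three columns of this table come from, the sketch below computes coefficient values, standard errors, and T-stats with the textbook formulas: the standard error of each coefficient is the square root of MSE times the matching diagonal entry of the inverse of A'A, where A is the data matrix with a leading column of ones. The function name `coefficient_table` and the data are hypothetical, and this is not NumXL's implementation. P-values and confidence limits are omitted because they require Student's t quantiles, which the Python standard library does not provide.

```python
import math


def coefficient_table(X, y):
    """Return (values, standard errors, T-stats), intercept first."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    n, k = len(A), len(A[0])
    # Augment A'A with the identity and reduce to obtain its inverse.
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)]
         for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    inv = [row[k:] for row in M]
    Aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    beta = [sum(inv[i][j] * Aty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * a for b, a in zip(beta, row))
             for row, yi in zip(A, y)]
    mse = sum(e * e for e in resid) / (n - k)  # SSE / (N - p - 1)
    se = [math.sqrt(mse * inv[i][i]) for i in range(k)]
    t_stat = [b / s for b, s in zip(beta, se)]
    return beta, se, t_stat


# Invented data: (IQ, extroversion) and weekly sales.
X = [(110, 22), (95, 30), (102, 18), (120, 25),
     (105, 28), (98, 15), (115, 20), (108, 26)]
y = [3000.0, 3600.0, 2500.0, 3300.0, 3500.0, 2300.0, 2900.0, 3400.0]

beta, se, t_stat = coefficient_table(X, y)
for name, b, s, t in zip(["intercept", "IQ", "extroversion"], beta, se, t_stat):
    print(name, b, s, t)
```

A T-stat that is small in absolute value means the coefficient is not distinguishable from zero given its standard error, which is the situation the tutorial reports for the intercept and the intelligence variable.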


Conclusion
In this example, we found that the regression model is statistically significant in explaining the variation in the values of the weekly sales variable and that it satisfies the model's assumptions, but the values of one or more regression coefficients are not significantly different from zero.
What do we do now?
There may be a number of reasons why this is the case, including possible multicollinearity between the variables or simply that one variable should not be included in the model. As the number of explanatory variables increases, answering such questions gets more involved, and we need further analysis.
We will cover this particular issue in a separate entry of our series.
