Regression Tutorial 201 With NumXL
Regression 201
This is the third entry in our regression analysis and modeling series. In this tutorial, we continue the analysis discussion we started earlier by leveraging a more advanced technique, influential data analysis, to help us improve the model and, as a result, the reliability of the forecast.
Again, we will use a sample dataset gathered from 20 different salespersons. The regression model attempts to explain and predict the weekly sales for each person (dependent variable) using two explanatory variables: intelligence (IQ) and extroversion.
Data Preparation
Similar to what we did in our earlier tutorial, we organize our sample data by placing the values of each variable in a separate column and each observation in a separate row.
Next, we introduce the mask. The mask is a Boolean array (0, 1) that chooses which variables are included in (or excluded from) the analysis.
Initially, at the top of the table, let's insert the array of mask cells, each with a value of 1 (i.e. included). The array is shown highlighted below:
In this example, we have 20 observations and two independent (explanatory) variables. The response, or dependent variable, is the weekly sales.
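To make the role of the mask concrete, here is a minimal Python sketch of the same idea outside the spreadsheet. The function name `apply_mask` and the sample values are hypothetical illustrations, not part of NumXL or of the tutorial's dataset.

```python
def apply_mask(rows, mask):
    """Keep only the columns whose mask entry is 1 (included)."""
    if any(len(row) != len(mask) for row in rows):
        raise ValueError("each row must have one value per mask entry")
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]


# Hypothetical observations: (IQ, extroversion) for three salespersons.
rows = [[110, 18], [95, 25], [120, 12]]

print(apply_mask(rows, [1, 1]))  # both variables included: [[110, 18], [95, 25], [120, 12]]
print(apply_mask(rows, [0, 1]))  # IQ excluded: [[18], [25], [12]]
```

Setting a mask entry to 0 removes that variable from the analysis without touching the data table itself, which is why the mask is convenient for trying different sets of explanatory variables.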
Process
Now we are ready to conduct our regression analysis. First, select an empty cell in your worksheet where you wish the output to be generated, then locate and click on the regression icon in the NumXL tab (or toolbar).
Regression 201 Tutorial
Spider Financial Corp, 2013
Now the Regression Wizard will appear.
Select the cells range for the response/dependent variable values (i.e. weekly sales). Select the cells range for the explanatory (independent) variables values. For Variables (X) Mask, select the cells at the top of the data table (Boolean array).
Notes:
1. The cells range includes (optionally) the heading (label) cell, which would be used in the output tables where it references those variables.
2. The explanatory variables (i.e. X) are already grouped by columns (each column represents a variable), so we don't need to change that.
3. By default, the output cells range is set to the currently selected cell in your worksheet.
Please note that, once we select the X and Y cells ranges, the Options, Forecast and Missing Values tabs become available (enabled).
Next, select the Options tab.
Now, click the Missing Values tab.
To assess the influence that each observation exerts on our model, we calculate a couple of statistical measures: leverage and Cook's distance.
Select the cell next to the response variable. In the formula bar, type in the MLR_FITTED function, then click the fx button.
The Function Wizard pops up. Select the input cells range, mask, and a Return type of 4 for the leverage statistics. Click OK.
Now, to calculate the Cook's distance, select the cell next to Leverage and repeat the same steps, but with the return type = 5.
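For readers who want to see what these two statistics measure, the following is a minimal pure-Python sketch based on the standard textbook definitions: leverage is the diagonal of the hat matrix, and Cook's distance combines each residual with its leverage. It is an assumption that NumXL's MLR_FITTED follows the same definitions; the helper names `solve` and `influence` and the sample data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def influence(xs, y):
    """Return (leverage, cooks_distance) for an OLS fit with an intercept.

    Assumes the fit is not perfect (non-zero residual variance).
    """
    design = [[1.0] + [float(v) for v in row] for row in xs]
    n, p = len(design), len(design[0])  # p counts the intercept too
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(design, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    # h_i = x_i' (X'X)^-1 x_i, the i-th diagonal entry of the hat matrix
    leverage = [sum(a * b for a, b in zip(r, solve(xtx, r))) for r in design]
    cooks = [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]
    return leverage, cooks


# Hypothetical data: (IQ, extroversion) and weekly sales for six salespersons.
xs = [[100, 10], [110, 14], [95, 20], [120, 12], [105, 30], [98, 16]]
y = [2000, 2400, 2600, 2300, 4200, 2500]
leverage, cooks = influence(xs, y)
for i, (h, d) in enumerate(zip(leverage, cooks), start=1):
    print(f"obs {i}: leverage={h:.3f}  Cook's D={d:.3f}")
```

Leverage depends only on the explanatory variables, so it flags observations with unusual inputs. Cook's distance also uses the residual, so it flags observations whose removal would noticeably shift the fitted coefficients.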
Analysis
Now that we have the leverage and Cook's distance statistics, let's interpret their findings.
Leverage (H)
To examine this assumption, let's remove Salesman 11 from our input data and examine the resulting regression. To do so, just insert an #N/A value in any input variable of this observation.
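The effect of marking an observation with #N/A can be sketched in Python as dropping every row that has a missing input before fitting. This is an illustration only: it assumes missing values are handled by excluding the whole observation, and the names `is_missing` and `complete_cases` are hypothetical. Here `None` or NaN stands in for the spreadsheet's #N/A.

```python
import math


def is_missing(value):
    """Treat None and NaN as the equivalent of a spreadsheet #N/A."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_cases(rows):
    """Return (kept_rows, kept_positions), skipping rows with any missing value.

    Positions are 1-based so they match the observation numbers in the table.
    """
    kept_rows, kept_positions = [], []
    for position, row in enumerate(rows, start=1):
        if not any(is_missing(v) for v in row):
            kept_rows.append(row)
            kept_positions.append(position)
    return kept_rows, kept_positions


# Hypothetical rows of (IQ, extroversion, weekly sales); the second row is knocked out.
rows = [[110, 18, 2600], [95, None, 3100], [120, 12, 2300]]
kept_rows, kept_positions = complete_cases(rows)
print(kept_positions)  # [1, 3]
```

Keeping the original positions makes it easy to line up per-observation statistics with the data table after an observation has been excluded.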
(Full dataset)
(Omitting salesman #11)
Cook's Distance (D)
[Chart: values from 0% to 70% for observations 1 to 20]
[...] 4/N as a threshold (which [...]
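The 4/N rule of thumb for Cook's distance can be sketched as follows. The function name `flag_influential` and the sample values are hypothetical; entries set to `None` stand in for observations whose statistic is #N/A, and N is taken as the number of observations actually used.

```python
def flag_influential(cooks_d):
    """Return (threshold, flagged_observations) using the 4/N rule of thumb.

    N counts only observations with an available Cook's distance.
    Flagged observation numbers are 1-based.
    """
    available = [d for d in cooks_d if d is not None]
    if not available:
        raise ValueError("no Cook's distance values available")
    threshold = 4 / len(available)
    flagged = [
        position
        for position, d in enumerate(cooks_d, start=1)
        if d is not None and d > threshold
    ]
    return threshold, flagged


# Hypothetical Cook's distances for 20 observations, one of them large.
cooks_d = [0.02] * 19 + [0.55]
threshold, flagged = flag_influential(cooks_d)
print(f"threshold = {threshold:.0%}, flagged = {flagged}")
```

With 20 observations the threshold is 4/20 = 20%; with 19 observations it rises slightly to 4/19, or about 21%.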
Note that the leverage statistics and Cook's distance return #N/A for this missing value. Let's now examine the regression statistics before and after we dropped the sixteenth observation.
(Full dataset)
(Without Salesman #16)
To help explain what makes an observation influential, let's examine the extroversion vs. weekly sales graph below:
(4/19 = 21%)
Conclusion
In this tutorial, we have shown that excluding observation #16 is beneficial to our modeling efforts, as it exerts significant influence on our coefficient calculation.
Next, using the remaining 19 observations, let's recalculate (SHIFT+F9) the regression statistics, ANOVA, residuals diagnosis, stepwise regression, etc.
The optimal set of input variables is the same as earlier. Let's drop the intelligence variable (by setting its value to 0 in the mask), and recalculate.
The regression error is $307 (vs. $332 before we removed salesman #16).
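As a reference point for how such an error figure is typically obtained, here is a small Python sketch of the standard error of the regression computed from the residuals. It is an assumption that the regression error quoted above is this statistic; the function name `regression_std_error` and the residual values are hypothetical and do not reproduce the $307 or $332 figures.

```python
import math


def regression_std_error(residuals, n_coefficients):
    """Standard error of the regression: sqrt(SSE / (n - p)).

    n_coefficients (p) counts every fitted coefficient, including the intercept.
    """
    dof = len(residuals) - n_coefficients
    if dof <= 0:
        raise ValueError("need more observations than coefficients")
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / dof)


# Hypothetical residuals (in dollars) from a model with an intercept and one variable.
residuals = [250.0, -310.0, 120.0, -80.0, 400.0, -380.0]
print(round(regression_std_error(residuals, n_coefficients=2), 1))
```

Because the statistic divides by the residual degrees of freedom, dropping one observation and one variable changes both the numerator and the denominator, so the comparison is only meaningful when both are recomputed together.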
[Chart: dollar values from $1,500 to $4,500 for observations 1 to 20]
The final question we may ask ourselves: is the regression stable over the sample dataset? We leave that for the next issue.