
Regression Tutorial 201 With NumXL

This is the third entry in our regression analysis and modeling series. In this tutorial, we continue the analysis discussion we started earlier by leveraging a more advanced technique, influential data analysis, to help us improve the model and, as a result, the reliability of the forecast.

Uploaded by

NumXL Pro
Copyright
© Attribution Non-Commercial (BY-NC)

Tutorial:

Regression 201
This is the third entry in our regression analysis and modeling series. In this tutorial, we continue the analysis discussion we started earlier by leveraging a more advanced technique, influential data analysis, to help us improve the model and, as a result, the reliability of the forecast.

Again, we will use a sample data set gathered from 20 different salespersons. The regression model attempts to explain and predict the weekly sales for each person (dependent variable) using two explanatory variables: intelligence (IQ) and extroversion.

Data Preparation
Similar to what we did in our earlier tutorial, we organize our sample data by placing the values of each variable in a separate column and each observation in a separate row.

Next, we introduce the mask. The mask is a Boolean array (0, 1) that selects which variables are included in (or excluded from) the analysis.

Initially, at the top of the table, let's insert the mask cells array, each cell with a value of 1 (i.e. included). The array is shown highlighted below:

[Figure: sample data table with the mask cells highlighted at the top]

In this example, we have 20 observations and two independent (explanatory) variables. The response or dependent variable is the weekly sales.
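To make the role of the mask concrete, here is a minimal Python sketch of the same idea outside Excel. The function name `apply_mask` and the data values are hypothetical and are not taken from the tutorial's data set or from NumXL.

```python
def apply_mask(rows, mask):
    """Keep only the columns whose mask entry is 1 (included)."""
    if any(len(row) != len(mask) for row in rows):
        raise ValueError("every row must have one value per mask entry")
    return [[value for value, keep in zip(row, mask) if keep == 1] for row in rows]


# Hypothetical values; columns are (intelligence, extroversion).
x_rows = [
    [89, 21],
    [93, 24],
    [91, 21],
]

both = apply_mask(x_rows, [1, 1])               # both variables included
extroversion_only = apply_mask(x_rows, [0, 1])  # intelligence excluded
```

Setting a mask entry to 0 removes that variable from the analysis without deleting its column from the data table.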

Process
Now we are ready to conduct our regression analysis. First, select an empty cell in your worksheet where you wish the output to be generated, then locate and click on the regression icon in the NumXL tab (or toolbar).

Regression 201 Tutorial

Spider Financial Corp, 2013

Now the Regression Wizard will appear.

Select the cells range for the response/dependent variable values (i.e. weekly sales). Select the cells range for the explanatory (independent) variables values. For Variables (X) Mask, select the cells at the top of the data table (Boolean array).

Notes:
1. The cells range includes (optionally) the heading (Label) cell, which would be used in the output tables where it references those variables.
2. The explanatory variables (i.e. X) are already grouped by columns (each column represents a variable), so we don't need to change that.
3. By default, the output cells range is set to the current selected cell in your worksheet.

Please note that, once we select the X and Y cells range, the Options, Forecast, and Missing Values tabs become available (enabled).

Next, select the Options tab.


Initially, the tab is set to the following values:
- The regression intercept/constant is left blank. This indicates that the regression intercept will be estimated by the regression. To set the intercept to a fixed value (e.g. zero (0)), enter it there.
- The significance level (a.k.a. α) is set to 5%.
- In the output section, the most common regression analysis is selected.
- For auto-modeling, check this option.

Now, click the Missing Values tab.


In this tab, you can select an approach to handle missing values in the data set (X and Y). By default, any missing value found in X or in Y in any observation would exclude the observation from the analysis. This treatment is a good approach for our analysis, so let's leave it unchanged.

Now, click OK to generate the output tables.
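The default treatment described above, where an observation is dropped if any of its X or Y values is missing, can be sketched in Python as follows. This is only an illustration under stated assumptions: `None` and NaN stand in for Excel's #N/A, the function names are hypothetical, and the data values are invented.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (stand-ins for Excel's #N/A)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete(x_rows, y_values):
    """Return (x, y) with every observation that has a missing value removed."""
    kept = [
        (row, y)
        for row, y in zip(x_rows, y_values)
        if not is_missing(y) and not any(is_missing(v) for v in row)
    ]
    return [row for row, _ in kept], [y for _, y in kept]


# Hypothetical data: observations 2 and 4 each have one missing input.
x_rows = [[89, 21], [93, None], [91, 21], [float("nan"), 30]]
y_values = [2625, 2700, 3100, 3150]

x_clean, y_clean = drop_incomplete(x_rows, y_values)
```

The same mechanism is what makes it possible to exclude a single observation later in the tutorial by placing #N/A in one of its input cells.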

To assess the influence that each observation exerts on our model, we calculate a couple of statistical measures: leverage and Cook's distance.


Select the cell next to the response variable. In the formula bar, type in the MLR_FITTED function, then click the fx button.

The Function Wizard pops up. Select the input cells range, mask, and a Return type of 4 for the leverage statistics. Click OK.

MLR_FITTED returns an array of values, but you will initially only see the first value. To display the full array, select all the cells below (to the end of the sample). Press F2, then press CTRL+SHIFT+ENTER to copy the array formula.


Now, to calculate the Cook's distance, select the cell next to Leverage and repeat the same steps, but with the return type = 5.
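For readers who want to see what these two statistics measure, the sketch below computes them in plain Python using the standard textbook definitions for a simple (one-predictor) regression with an intercept. It is a simplification: the tutorial's model has two predictors, and whether NumXL applies exactly the same scaling is an assumption. The function name and data are hypothetical.

```python
def influence_measures(x, y):
    """Return (leverage, cooks_distance) lists for the fit y = a + b*x."""
    n = len(x)
    p = 2  # number of estimated coefficients: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)

    # Leverage: how far each x value sits from the center of the data.
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]

    # Cook's distance: combines residual size with leverage.
    cooks = [
        (e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]
    return leverage, cooks


# Hypothetical data; the last point lies far from the other x values.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 25.0]

leverage, cooks = influence_measures(x, y)
```

Note that leverage depends only on the explanatory values, while Cook's distance also uses the residuals, which is why the two measures can point at different observations.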

Analysis
Now that we have the leverage and Cook's distance statistics, let's interpret their findings.

1. Leverage Statistics (H)


Leverage statistics measure the distance of an observation from the center of the data. In our example, the intelligence and extroversion values for Salesman 11 are furthest from the average. Does this mean Salesman 11 is an outlier? Does this mean he exerts influence on the calculation of the regression coefficients?
[Figure: Leverage (H) by observation, salesmen 1 to 20]

To examine this assumption, let's remove Salesman 11 from our input data and examine the resulting regression. To do so, just insert an #N/A value in any input variable of this observation.

[Regression output tables: (Full data set) vs. (Omitting Salesman #11)]

Dropping observation 11 made things at best the same as earlier. We opted to recover this observation back into the sample.

In sum, the leverage statistics do not necessarily imply an outlier, but merely a distant observation with few neighbors.

2. Cook's Distance (D)

The Cook's distance corrects for a weakness in the leverage statistics, and is thus more indicative of influential data. Furthermore, there are a few heuristics for the threshold value of Cook's distance used to detect an influential datum. For our analysis, we often use 4/N as a threshold (which translates to 20% for the 20 observations in our data set).

[Figure: Cook's Distance (D) by observation, salesmen 1 to 20]

Using the threshold or just looking at the Cook's distance plot, we detect that Salesman 16 exerts the highest influence on our regression, so let's void this observation (by setting #N/A in one of the input variables).
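The 4/N rule of thumb is simple enough to sketch in a few lines of Python. The function names and the distance values below are hypothetical; missing entries (`None`, standing in for #N/A) are skipped and do not count toward N, which is an assumption consistent with the tutorial using N = 19 after one observation is voided.

```python
def cooks_threshold(n_observations):
    """Rule-of-thumb cutoff 4/N for Cook's distance."""
    return 4 / n_observations


def flag_influential(cooks_distances):
    """Return 1-based observation numbers whose distance exceeds 4/N.

    Entries equal to None are treated as missing and ignored.
    """
    valid = [d for d in cooks_distances if d is not None]
    cutoff = cooks_threshold(len(valid))
    return [
        i
        for i, d in enumerate(cooks_distances, start=1)
        if d is not None and d > cutoff
    ]


# Hypothetical distances for five observations (cutoff = 4/5 = 0.8).
distances = [0.05, 0.10, 0.95, 0.02, 0.30]
flagged = flag_influential(distances)

threshold_20 = cooks_threshold(20)  # full sample of 20 observations
threshold_19 = cooks_threshold(19)  # after one observation is voided
```

As with any heuristic cutoff, a flagged observation is a candidate for closer inspection rather than an automatic exclusion.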

Note that the leverage statistics and Cook's distance return #N/A for this missing value.

Let's now examine the regression statistics before and after we dropped the sixteenth observation.

[Regression output tables: (Full data set) vs. (Without Salesman #16)]

As you may already have noticed, the regression improved significantly on every dimension (e.g. R-square, std error, etc.). Salesman #16 seems to be an influential outlier, so we'll drop him.

To help explain what makes an observation influential, let's examine the extroversion vs. weekly sales graph below:

We draw the linear trend as a proxy for our regression model. The black (circle) data point represents Salesman 16. Its location (extroversion and weekly sales value) is pulling the regression (dashed) line toward it, affecting the value of the regression slope and intercept.

Dropping this observation releases the regression line, adjusting it to better fit the remaining points.

Let's take another look at the Cook's distance plot (without Salesman 16, and with a threshold of 4/19 ≈ 21%).


The Cook's distance values for the different observations are distributed somewhat uniformly, and we may stop there.

Note: Bear in mind that our threshold rule is merely a heuristic (rule of thumb), and should not be taken rigidly, but rather as a guideline.
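The pulling effect described for Salesman 16 can be reproduced with a small Python sketch that fits a line with and without a single distant point. The data are invented for illustration and the function name is hypothetical; this is not the tutorial's data set.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical data: the first five points lie exactly on y = 2x;
# the last point sits far to the right and well below that line.
x = [1, 2, 3, 4, 5, 10]
y = [2, 4, 6, 8, 10, 5]

intercept_all, slope_all = fit_line(x, y)
intercept_rest, slope_rest = fit_line(x[:-1], y[:-1])
```

With the distant point included, the fitted slope is much flatter than the slope of the remaining five points; once it is dropped, the line returns to the trend they share.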

Conclusion
In this tutorial, we have shown that excluding observation #16 is beneficial to our modeling efforts, as it exerts significant influence on our coefficient calculation.

Next, using the remaining 19 observations, let's recalculate (SHIFT+F9) the regression statistics, ANOVA, residuals diagnosis, stepwise regression, etc.


The optimal set of the input variables is the same as earlier. Let's drop the intelligence variable (by setting its value to 0 in the mask), and recalculate.

The regression error is $307 (vs. $332 before we removed Salesman #16).




The final question we may ask ourselves: is the regression stable over the sample data set? We take this up in the next issue.
