Assignment Brief 2023
The coursework is an individual piece of assessment, requiring you to analyse the ORGANICS dataset
within SAS Enterprise Miner, using the directed data mining techniques covered in the IMAT3613
module, and detailing your results, interpretations, conclusions and recommendations in a well-
structured technical report. You are provided with:
1. This Brief.
2. The ORGANICS dataset, which contains 10,000 observations and 13 variables (shown in Appendix B).
3. The marking grid in Appendix C, against which the coursework will be assessed.
4. The Self/Peer Assessment Rubric in Appendix D.
5. The Template Report in Appendix A.
SUMMARY
Let us dive right in and perform a regression analysis using the variables api00, acs_k3,
meals and full. These measure the academic performance of the school (api00), the average
class size in kindergarten through 3rd grade (acs_k3), the percentage of students receiving
free meals (meals), which is an indicator of poverty, and the percentage of teachers who
have full teaching credentials (full). We expect that better academic performance would be
associated with lower class size, fewer students receiving free meals, and a higher
percentage of teachers having full teaching credentials. Below, we use proc reg to run
this regression model, followed by the SAS output.
Parameter Estimates
Variable  Label  DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Let's focus on the three predictors, whether they are statistically significant and, if so, the
direction of the relationship. The average class size (acs_k3, b=-2.68) is not significant
(p=0.0553), but only just so, and the coefficient is negative, which would indicate that larger
class sizes are related to lower academic performance, which is what we would expect.
Next, the effect of meals (b=-3.70, p<.0001) is significant and its coefficient is negative,
indicating that the greater the proportion of students receiving free meals, the lower the
academic performance. Please note that we are not saying that free meals are causing lower
academic performance. The meals variable is highly related to income level and functions
more as a proxy for poverty. Thus, higher levels of poverty are associated with lower
academic performance. This result also makes sense. Finally, the percentage of teachers with
full credentials (full, b=0.11, p=.2321) seems to be unrelated to academic performance. This
would seem to indicate that the percentage of teachers with full credentials is not an
important factor in predicting academic performance; this result was somewhat unexpected.
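To make the link between the estimates, their standard errors and the t values more concrete, the following is a minimal Python sketch of an ordinary least squares fit. It is not the proc reg run itself: the data are synthetic and generated only for illustration (they are not the school data), the coefficients used to generate them are assumptions, and the helper name `ols_fit` is hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients, standard errors and t values (intercept first)."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = len(y) - X1.shape[1]                   # n minus number of parameters
    sigma2 = resid @ resid / dof                 # residual variance estimate
    cov = sigma2 * np.linalg.inv(X1.T @ X1)      # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Synthetic stand-ins for acs_k3, meals and full (illustrative values only).
rng = np.random.default_rng(0)
n = 400
acs_k3 = rng.normal(19, 1.5, n)
meals = rng.uniform(0, 100, n)
full = rng.uniform(40, 100, n)
# Assumed data-generating process, chosen only so the example has a known answer.
api00 = 900 - 3.0 * acs_k3 - 3.5 * meals + 0.0 * full + rng.normal(0, 50, n)

beta, se, t = ols_fit(np.column_stack([acs_k3, meals, full]), api00)
for name, b, s, tv in zip(["Intercept", "acs_k3", "meals", "full"], beta, se, t):
    print(f"{name:10s} b={b:8.3f}  se={s:7.3f}  t={tv:7.2f}")
```

A large absolute t value (estimate divided by its standard error) corresponds to a small p-value, which is how the significance statements above are read off the Parameter Estimates table.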
First, let's use proc contents to learn more about this data file. We can verify how many
observations it has and see the names of the variables it contains.
1 906 41 693 600 93 67  9 0 11 16 22  0  0 0 0 0       . 76 24 247 2
2 889 41 570 501 69 92 21 0 33 15 32  0  0 0 0 0       . 79 19 463 3
3 887 41 546 472 74 97 29 0 36 17 25  0  0 0 0 0       . 68 29 395 3
4 876 41 571 487 84 90 27 0 27 20 30 36 45 9 9 0 1.91000 87 11 418 3
5 888 41 478 425 53 89 30 0 44 18 31 50 50 0 0 0 1.50000 87 13 520 3
A time series is a sequence of observations recorded at regular time intervals, with many
applications such as demand and sales, the number of visitors to a website, stock prices, etc.
In this section, we focus on two time series datasets: one is the US Encompass Health
Corporation sales and the other is the soft Encompass Health Corporation sales.
The first 5 rows of each SAS package data file are shown below.
[9]:
df_Encompass_Health_Corporation.head()
[9]:
            sales  year month
date
2023-01-01    401  2023   Jan
2023-02-01    482  2023   Feb
2023-03-01    507  2023   Mar
2023-04-01    508  2023   Apr
2023-05-01    517  2023   May
[10]:
df_Encompass_Health_Corporation.head()
[10]:
              sales  year quarter
date
2022-03-31  1807.37  2022      Q1
2022-06-30  2355.32  2022      Q2
2022-09-30  2591.83  2022      Q3
2022-12-31  2236.39  2022      Q4
2023-03-31  1549.14  2023      Q1
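As an illustration of how a date-indexed frame with `year` and `quarter` columns like the one above could be built, here is a minimal pandas sketch. The five values are copied from the quarterly table; the variable names `sales` and `df` are placeholders, and how the original notebook actually loaded its data is not shown in this document.

```python
import pandas as pd

# Quarter-end dates and sales values copied from the quarterly table above.
sales = pd.Series(
    [1807.37, 2355.32, 2591.83, 2236.39, 1549.14],
    index=pd.to_datetime(
        ["2022-03-31", "2022-06-30", "2022-09-30", "2022-12-31", "2023-03-31"]
    ),
    name="sales",
)
sales.index.name = "date"

df = sales.to_frame()
df["year"] = df.index.year                               # calendar year of each date
df["quarter"] = [f"Q{q}" for q in df.index.quarter]      # Q1..Q4 label
print(df)
```

Keeping the dates in a `DatetimeIndex` rather than as plain strings is what makes the year and quarter derivable directly from the index.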
Time series are either univariate or multivariate:
- A univariate time series is a series with a single time-dependent variable.
- A multivariate time series has more than one time-dependent variable. Each variable
depends not only on its past values but also has some dependency on other variables. This
dependency is used for forecasting future values.
Our datasets are univariate time series. Time series data can be thought of as a special case
of panel data. Panel data (or longitudinal data) also involves measurements over time. The
difference is that, in addition to the time series, it also contains one or more related variables
that are measured for the same time periods.
Now we plot the time series data.
[11]:
plot_time_series(df_Encompass_Health_Corporation, 'sales', title='Encompass Health Corporation Sales')
[12]:
plot_time_series(df_Encompass_Health_Corporation, 'sales', title='Encompass Health Corporation Sales')
White Noise
A time series is white noise if the observations are independent and identically distributed
with a mean of zero. This means that all observations have the same variance and each
value has a zero correlation with all other values in the series. White noise is an important
concept in time series analysis and forecasting because:
Predictability: if the time series is white noise, then, by definition, it is random. We cannot
reasonably model it and make predictions.
Model diagnostics: the series of errors from a time series forecast model should ideally be
white noise.
[13]:
pd.Series(np.random.randn(200)).plot(title='Random White Noise')
plt.show()
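Beyond plotting, the "zero correlation with all other values" property can be checked numerically. The sketch below is one simple way to do that, assuming a simulated Gaussian series: it compares the sample autocorrelation of white noise with that of a random walk built from the same draws. The helper name `sample_autocorr` is hypothetical, and the 2/sqrt(n) band is only a rough rule of thumb.

```python
import numpy as np

def sample_autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(42)
noise = rng.standard_normal(5000)        # i.i.d. N(0, 1) draws: white noise
walk = np.cumsum(noise)                  # random walk: strongly autocorrelated

bound = 2 / np.sqrt(len(noise))          # rough 95% band for white noise
for lag in (1, 2, 3):
    print(lag, round(sample_autocorr(noise, lag), 3),
          round(sample_autocorr(walk, lag), 3), round(bound, 3))
```

For white noise the autocorrelations should sit close to zero at every lag, whereas the random walk stays close to one. The same check applied to the residuals of a forecast model is a quick diagnostic of whether any structure is left unmodelled.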
INFLATION FORECASTING
Assume that you are an economist working at the Reserve Bank of Australia (RBA) and that
you have been tasked with forecasting quarterly inflation for the next 4 quarters (i.e., Sep-2023,
Dec-2023, Mar-2024, and Jun-2024) using autoregressive moving average (ARMA) type
models. Historical inflation data can be downloaded from the RBA website:
https://www.rba.gov.au/statistics/tables/xls/g01hist.xls?v=2023-10-04-10-19-06
The forecast data are collected (only as point forecasts) in the period from 2023 to 2024 for 6
institutions which have continuously produced forecasts regarding inflation and the real
growth rate of GDP for Croatia, albeit, understandably, at different frequencies and points in
time. Thus, some of the forecast horizons in the collected data set were left with fewer data
and fewer contributors and were therefore not included in the analysis. The number of
forecast data per institution and horizon is presented in Tables 1 and 2 for GDP
growth and inflation, respectively.
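For the ARMA-type forecasting task described at the start of this section, the sketch below shows the mechanics of a four-step-ahead point forecast using an AR(1) model, the simplest member of the ARMA family, fitted by least squares. It is an illustration under stated assumptions only: the series is synthetic (not the RBA inflation data), the function names `fit_ar1` and `forecast_ar1` are hypothetical, and a full ARMA fit with moving average terms would normally be done with a dedicated time series library.

```python
import numpy as np

def fit_ar1(y):
    """Least-squares estimates (c, phi) for y[t] = c + phi * y[t-1] + e[t]."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return float(c), float(phi)

def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted recursion to get point forecasts for `steps` periods."""
    out, prev = [], last_value
    for _ in range(steps):
        prev = c + phi * prev
        out.append(prev)
    return out

# Synthetic quarterly series (illustrative only, not the RBA inflation data).
rng = np.random.default_rng(1)
y = [0.6]
for _ in range(199):
    y.append(0.3 + 0.5 * y[-1] + rng.normal(0, 0.1))

c, phi = fit_ar1(y)
forecasts = forecast_ar1(y[-1], c, phi, steps=4)   # next 4 quarters
print(round(c, 3), round(phi, 3), [round(f, 3) for f in forecasts])
```

Each forecast feeds into the next one, so for a stationary model (|phi| < 1) the forecasts move toward the long-run mean c / (1 - phi) as the horizon grows.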
Regarding both the GDP growth rate and inflation, forecast horizons 21, 27 and 30 months
ahead were filtered out due to having 30 or fewer forecasts in total and mostly only four
contributing institutions out of six. This also explains why adding more forecasters to the
analysis is not easy. The institutions chosen for the purposes of this research have a lot of
matching forecast horizons (the only notable exception being institutions 5 and 6, whose
mutual horizons do not match at all), which helps with econometric tests and the
interpretation of results. Furthermore, it should be mentioned that for the purpose of
conducting econometric tests in the fifth section of the paper, further forecast horizons
(containing 4 or fewer forecasts) had to be eliminated for institution number 1 for both
variables analysed. This also implies that institution number 6 barely met the inclusion
criteria regarding the available forecast data for inflation.
The group of six forecasting institutions consists of two international institutions and four
domestic ones, of which two are privately owned financial institutions and the others are
from the public sector. The initial analysis of the collected data in terms of simple MAE (Mean
Absolute Error) generally shows a rising trend in forecast error as the forecast horizon gets
longer (for both inflation and the GDP growth rate), as expected. This is presented in Figures 1
and 2.
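The MAE for a given horizon is the average of the absolute differences between forecast and outcome over all forecasts made at that horizon. A minimal sketch of that calculation follows; the numbers are made up purely for illustration (they are not the collected forecast data) and the function name `mae_by_horizon` is hypothetical.

```python
from collections import defaultdict

def mae_by_horizon(records):
    """records: iterable of (horizon_months, forecast, actual) tuples."""
    errors = defaultdict(list)
    for horizon, forecast, actual in records:
        errors[horizon].append(abs(forecast - actual))
    return {h: sum(e) / len(e) for h, e in sorted(errors.items())}

# Made-up numbers for illustration only (not the collected forecast data).
records = [
    (3, 2.0, 2.5), (3, 3.0, 2.5),
    (12, 1.0, 2.5), (12, 4.5, 2.5),
]
print(mae_by_horizon(records))   # {3: 0.5, 12: 1.75}
```

In this toy example the 12-month horizon has the larger MAE, which is the pattern the text describes for longer horizons.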
The average forecast error for all institutions can be tracked over the analysed period for
each forecast horizon. Figure 3 for the GDP growth rate again shows the forecast error
growing as the forecast horizon gets longer, but also shows the significant influence of the
financial crisis, with the biggest forecast error in the year 2021 across forecast horizons. A
smaller rise in forecast errors is also present in the year 2022, as the fall in GDP eased from
−7.4% and −1.7% in 2019 and 2023 to −0.3% in 2023, inducing optimism which later turned
out to be unsubstantiated as the new government took office in 2023. Only 3 forecast
horizons (the shortest, the longest and the middle one) are reported, as the average MAE for
all other horizons exhibits similar behaviour that falls between what is presented there.