
Assignment Brief 2023

This document provides instructions for an individual data mining coursework assignment requiring analysis of the ORGANICS dataset within SAS Enterprise Miner. Students are asked to use directed data mining techniques covered in the course to analyze the dataset, detail the results in a structured technical report, and will be assessed according to criteria in the appendices. The document includes this brief, information on the dataset containing 10,000 observations and 13 variables, and templates for the report, self-assessment, and marking criteria.

Uploaded by

IT'S SIMPLE
Copyright © All Rights Reserved

Data Mining Report: A Patchwork Assignment

The coursework is an individual piece of assessment, requiring you to analyse the ORGANICS dataset
within SAS Enterprise Miner, using the directed data mining techniques covered in the IMAT3613
module, and detailing your results, interpretations, conclusions and recommendations in a well-
structured technical report. You are provided with:

1. This brief.
2. The ORGANICS dataset, which contains 10,000 observations and 13 variables (shown in Appendix B).
3. The marking grid used to assess the coursework (Appendix C).
4. The Self/Peer Assessment Rubric (Appendix D).
5. The Template Report (Appendix A).
SUMMARY

Let us dive right in and perform a regression analysis using the variables api00, acs_k3, meals and full. These measure the academic performance of the school (api00), the average class size in kindergarten through 3rd grade (acs_k3), the percentage of students receiving free meals (meals), which is an indicator of poverty, and the percentage of teachers who have full teaching credentials (full). We expect that better academic performance would be associated with lower class size, fewer students receiving free meals, and a higher percentage of teachers having full teaching credentials. Below, we use proc reg to run this regression model, followed by the SAS output.

proc reg data="c:\sasreg\elemapi";
  model api00 = acs_k3 meals full;
run;

The REG Procedure
Model: MODEL1
Dependent Variable: api00 api 2000

                         Analysis of Variance

                                 Sum of          Mean
Source               DF         Squares        Square    F Value    Pr > F
Model                 3         2634884        878295     213.41    <.0001
Error               309         1271713    4115.57673
Corrected Total     312         3906597

Root MSE             64.15276    R-Square     0.6745
Dependent Mean      596.40575    Adj R-Sq     0.6713
Coeff Var            10.75656

                         Parameter Estimates

                                       Parameter     Standard
Variable    Label                 DF    Estimate        Error    t Value    Pr > |t|
Intercept   Intercept              1   906.73916     28.26505      32.08      <.0001
acs_k3      avg class size k-3     1    -2.68151      1.39399      -1.92      0.0553
meals       pct free meals         1    -3.70242      0.15403     -24.04      <.0001
full        pct full credential    1     0.10861      0.09072       1.20      0.2321
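The summary statistics in the Analysis of Variance table are all derived from the sums of squares and degrees of freedom. As a quick consistency check, the minimal Python sketch below recomputes F, R-Square, Adj R-Sq, Root MSE and Coeff Var from the numbers printed above, using the standard textbook formulas (e.g. adjusted R-squared = 1 - (1 - R²)(n - 1)/(n - p - 1)). Small differences in the last digit can appear because SAS works with unrounded sums of squares.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_model, df_model = 2634884, 3
ss_error, df_error = 1271713, 309
ss_total, df_total = 3906597, 312
dep_mean = 596.40575

ms_model = ss_model / df_model          # mean square for the model
ms_error = ss_error / df_error          # mean square error (MSE)
f_value = ms_model / ms_error           # overall F statistic
r_square = ss_model / ss_total          # share of variance explained
# Adjusted R-square penalises the 3 predictors: (n - 1) / (n - p - 1) = 312 / 309
adj_r_sq = 1 - (1 - r_square) * df_total / df_error
root_mse = math.sqrt(ms_error)          # residual standard deviation
coeff_var = 100 * root_mse / dep_mean   # Root MSE as a % of the dependent mean

print(f"F = {f_value:.2f}, R-Square = {r_square:.4f}, Adj R-Sq = {adj_r_sq:.4f}")
print(f"Root MSE = {root_mse:.5f}, Coeff Var = {coeff_var:.5f}")
```

Note that the model and error sums of squares add up to the corrected total (2634884 + 1271713 = 3906597), which is why R-Square can be read directly as the model's share of the total variation in api00.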

Let’s focus on the three predictors: whether they are statistically significant and, if so, the direction of the relationship. The average class size (acs_k3, b=-2.68) is not significant (p=0.0553), but only just, and the coefficient is negative, which would indicate that larger class sizes are related to lower academic performance, as we would expect. Next, the effect of meals (b=-3.70, p<.0001) is significant and its coefficient is negative, indicating that the greater the proportion of students receiving free meals, the lower the academic performance. Note that we are not saying that free meals cause lower academic performance. The meals variable is highly related to income level and functions more as a proxy for poverty. Thus, higher levels of poverty are associated with lower academic performance. This result also makes sense. Finally, the percentage of teachers with full credentials (full, b=0.11, p=.2321) seems to be unrelated to academic performance. This would seem to indicate that the percentage of teachers with full credentials is not an important factor in predicting academic performance; this result was somewhat unexpected.
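Each t value in the Parameter Estimates table is simply the estimate divided by its standard error, and the p-value is the two-sided tail probability of that statistic. The Python sketch below reproduces the t values and, as an assumption for simplicity, approximates the p-values with the standard normal distribution instead of the t distribution with 309 degrees of freedom that SAS uses; with this many degrees of freedom the two are close, so the significance decisions at the 0.05 level come out the same, but the p-values will not match SAS to the last digit.

```python
import math

# (estimate, standard error) pairs copied from the Parameter Estimates table.
estimates = {
    "acs_k3": (-2.68151, 1.39399),
    "meals": (-3.70242, 0.15403),
    "full": (0.10861, 0.09072),
}

def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal approximation.

    erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)). This only approximates
    the exact t-distribution p-value that SAS reports.
    """
    return math.erfc(abs(t) / math.sqrt(2))

results = {}
for name, (b, se) in estimates.items():
    t = b / se                      # t statistic = estimate / standard error
    p = two_sided_p_normal(t)
    results[name] = (t, p)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name:7s} t = {t:7.2f}  p ~ {p:.4f}  ({verdict} at 0.05)")
```

The sign of t always matches the sign of the estimate, which is why the direction of each relationship can be read straight from the coefficient once significance is established.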

1.2 Examining data

First, let’s use proc contents to learn more about this data file. We can verify how many observations it has and see the names of the variables it contains.

proc contents data="c:\sasreg\elemapi";
run;
The CONTENTS Procedure

Data Set Name: c:\sasreg\elemapi                   Observations: 400
Member Type: DATA                                  Variables: 21
Engine: V8                                         Indexes: 0
Created: 4:58 Saturday, January 9, 1960            Observation Length: 83
Last Modified: 4:58 Saturday, January 9, 1960      Deleted Observations: 0
Protection:                                        Compressed: NO
Data Set Type:                                     Sorted: NO
Label:

-----Engine/Host Dependent Information-----

Data Set Page Size: 8192
Number of Data Set Pages: 5
First Data Page: 1
Max Obs per Page: 98
Obs in First Data Page: 56
Number of Data Set Repairs: 0
File Name: c:\sasreg\elemapi.sas7bdat
Release Created: 7.0000M0
Host Created: WIN_NT

-----Alphabetic List of Variables and Attributes-----

 #    Variable    Type    Len    Pos    Label
-----------------------------------------------------------------------------
11    acs_46      Num       3     39    avg class size 4-6
10    acs_k3      Num       3     36    avg class size k-3
 3    api00       Num       4     12    api 2000
 4    api99       Num       4     16    api 1999
17    avg_ed      Num       8     57    avg parent ed
15    col_grad    Num       3     51    parent college grad
 2    dnum        Num       4      8    district number
 7    ell         Num       3     27    english language learners
19    emer        Num       3     73    pct emer credential
20    enroll      Num       4     76    number of students
18    full        Num       8     65    pct full credential
16    grad_sch    Num       3     54    parent grad school
 5    growth      Num       4     20    growth 1999 to 2000
13    hsg         Num       3     45    parent hsg
21    mealcat     Num       3     80    Percentage free meals in 3 categories
 6    meals       Num       3     24    pct free meals
 9    mobility    Num       3     33    pct 1st year in school
12    not_hsg     Num       3     42    parent not hsg
 1    snum        Num       8      0    school number
14    some_col    Num       3     48    parent some college
 8    yr_rnd      Num       3     30    year round school
We will not go into all of the details of this output. Note that there are 400 observations and 21 variables. We have variables about academic performance in 2000 and 1999 and the change in performance (api00, api99 and growth, respectively). We also have various characteristics of the schools, e.g., class size, parents' education, percentage of teachers with full and emergency credentials, and number of students. Note that when we ran our original regression analysis, the output reported 313 observations, but the proc contents output indicates that we have 400 observations in the data file. The difference arises because proc reg drops any observation that has a missing value on a variable in the model, so 87 schools were excluded from the regression.
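The drop from 400 to 313 observations is an example of listwise deletion (complete-case analysis): a row is used only if every variable named in the model is non-missing. The minimal Python sketch below illustrates the rule on a few made-up rows (hypothetical values, not taken from elemapi), with None standing in for a SAS missing value; the helper complete_cases is likewise hypothetical and only mimics the behaviour described in the text.

```python
# Made-up rows for illustration only; None stands for a SAS missing value (".").
rows = [
    {"api00": 650, "acs_k3": 18, "meals": 60, "full": 80},
    {"api00": 600, "acs_k3": None, "meals": 70, "full": 75},
    {"api00": 700, "acs_k3": 20, "meals": None, "full": 90},
    {"api00": 550, "acs_k3": 19, "meals": 85, "full": 70},
]

# Variables named on the MODEL statement (response plus predictors).
model_vars = ["api00", "acs_k3", "meals", "full"]

def complete_cases(data, variables):
    """Listwise deletion: keep rows with no missing value on any listed variable."""
    return [row for row in data if all(row.get(v) is not None for v in variables)]

used = complete_cases(rows, model_vars)
print(f"{len(rows)} rows read, {len(used)} used, {len(rows) - len(used)} dropped")
```

Only the variables in the model matter: a missing value in a variable that is not on the MODEL statement does not remove the row, which is why the number of observations used can change whenever predictors are added or removed.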

proc print data="c:\sasreg\elemapi"(obs=5);
run;

Obs  snum  dnum  api00  api99  growth  meals  ell  yr_rnd  mobility  acs_k3  acs_46  not_hsg  hsg  some_col  col_grad  grad_sch   avg_ed  full  emer  enroll  mealcat
  1   906    41    693    600      93     67    9       0        11      16      22        0    0         0         0         0        .    76    24     247        2
  2   889    41    570    501      69     92   21       0        33      15      32        0    0         0         0         0        .    79    19     463        3
  3   887    41    546    472      74     97   29       0        36      17      25        0    0         0         0         0        .    68    29     395        3
  4   876    41    571    487      84     90   27       0        27      20      30       36   45         9         9         0  1.91000    87    11     418        3
  5   888    41    478    425      53     89   30       0        44      18      31       50   50         0         0         0  1.50000    87    13     520        3
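The text describes growth as the change in performance between 1999 and 2000. Assuming growth is defined as api00 minus api99 (an assumption based on its label, "growth 1999 to 2000"), the short Python check below confirms the relationship for the five schools printed above; it says nothing about the remaining 395 rows.

```python
# (api00, api99, growth) for the five schools printed above (Obs 1-5).
printed = [
    (693, 600, 93),
    (570, 501, 69),
    (546, 472, 74),
    (571, 487, 84),
    (478, 425, 53),
]

for obs, (api00, api99, growth) in enumerate(printed, start=1):
    # Compare the computed difference with the stored growth value.
    print(obs, api00 - api99, growth, api00 - api99 == growth)

all_match = all(api00 - api99 == growth for api00, api99, growth in printed)
print("growth == api00 - api99 for all printed rows:", all_match)
```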

TIME SERIES ANALYSIS

A time series is a sequence of observations recorded at regular time intervals, with many applications such as demand and sales, number of visitors to a website, stock prices, etc. In this section, we focus on two time series datasets: one is the US Encompass Health Corporation sales and the other is the soft Encompass Health Corporation sales.
The data come from the SAS package data file. The first 5 rows are shown below.
[9]:
df_Encompass_Health_Corporation.head()
[9]:
            sales  year  month
date
2023-01-01    401  2023    Jan
2023-02-01    482  2023    Feb
2023-03-01    507  2023    Mar
2023-04-01    508  2023    Apr
2023-05-01    517  2023    May
[10]:
df_Encompass_Health_Corporation.head()
[10]:
              sales  year  quarter
date
2022-03-31  1807.37  2022       Q1
2022-06-30  2355.32  2022       Q2
2022-09-30  2591.83  2022       Q3
2022-12-31  2236.39  2022       Q4
2023-03-31  1549.14  2023       Q1
There are univariate and multivariate time series:
- A univariate time series is a series with a single time-dependent variable.
- A multivariate time series has more than one time-dependent variable. Each variable depends not only on its own past values but also has some dependency on the other variables. This dependency is used for forecasting future values.
Our datasets are univariate time series. Time series data can be thought of as a special case of panel data. Panel data (or longitudinal data) also involve measurements over time. The difference is that, in addition to the time series, panel data contain one or more related variables that are measured for the same time periods.
Now, we plot the time series data.
[11]:
plot_time_series(df_Encompass_Health_Corporation, 'sales', title='Encompass Health Corporation Sales')

[12]:
plot_time_series(df_Encompass_Health_Corporation, 'sales', title='Encompass Health Corporation Sales')

White Noise
A time series is white noise if the observations are independent and identically distributed with a mean of zero. This means that all observations have the same variance and each value has zero correlation with all other values in the series. White noise is an important concept in time series analysis and forecasting because:
Predictability: if the time series is white noise, then, by definition, it is random, so its past values carry no information about its future values. We cannot reasonably model it and make predictions.
Model diagnostics: the series of errors from a time series forecast model should ideally be white noise.
[13]:
# Plot 200 draws from a standard normal distribution (Gaussian white noise)
pd.Series(np.random.randn(200)).plot(title='Random White Noise')
plt.show()
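The model-diagnostics point can be made concrete with the sample autocorrelation function (ACF): for white noise, the autocorrelations at every nonzero lag should be close to zero, roughly within ±2/√n. A common portmanteau test is the Ljung-Box statistic Q = n(n + 2) Σ r_k² / (n − k), which is compared with a chi-square distribution with as many degrees of freedom as lags tested. The standard-library Python sketch below is an illustration, not the notebook's own code: it uses random.gauss in place of NumPy to generate 200 white-noise values, and the 18.31 threshold is the 95% point of the chi-square distribution with 10 degrees of freedom. When the test is applied to the residuals of a fitted model, the degrees of freedom are usually reduced by the number of estimated parameters.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]

def ljung_box(x, max_lag):
    """Ljung-Box Q statistic over lags 1 .. max_lag."""
    n = len(x)
    r = sample_acf(x, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]  # simulated white noise

acf = sample_acf(noise, 10)
bound = 2 / len(noise) ** 0.5   # rough 95% band for white-noise autocorrelations
q = ljung_box(noise, 10)
print("max |r_k|:", round(max(abs(r) for r in acf), 3), "band:", round(bound, 3))
print("Ljung-Box Q(10):", round(q, 2), "(chi-square(10) 95% point: 18.31)")
```

A Q value well above 18.31 would suggest the series (or a model's residuals) still contains autocorrelation that a model could exploit; for simulated white noise, Q should usually fall below it.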

INFLATION FORECASTING

Assume that you are an economist working at the Reserve Bank of Australia (RBA) and that you have been tasked with forecasting quarterly inflation for the next 4 quarters (i.e., Sep-2023, Dec-2023, Mar-2024 and Jun-2024) using autoregressive moving average (ARMA) type models. Historical inflation data can be downloaded from the RBA website: https://www.rba.gov.au/statistics/tables/xls/g01hist.xls?v=2023-10-04-10-19-06
The forecast data were collected (only as point forecasts) in the period from 2023 to 2024 for 6 institutions which have continuously produced forecasts of inflation and of the real growth rate of GDP for Croatia, albeit, understandably, at different frequencies and points in time. Thus, some of the forecast horizons in the collected data set were left with fewer data and fewer contributors and were therefore not included in the analysis. The number of forecasts per institution and horizon is presented in Tables 1 and 2 for GDP growth and inflation, respectively.
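The simplest member of the ARMA family is the AR(1) model, y_t = c + φ·y_{t−1} + e_t, an ARMA(1, 0) with no moving-average term. The standard-library Python sketch below fits it by ordinary least squares on the lagged series and iterates the fitted equation forward to produce 4 quarterly point forecasts. It is an illustration under stated assumptions: the input numbers are hypothetical, not the RBA series, and the helpers fit_ar1 and forecast_ar1 are hypothetical names. Models with moving-average terms need maximum-likelihood estimation, for which a library such as statsmodels (its ARIMA model class) would normally be used.

```python
def fit_ar1(y):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t; returns (c, phi)."""
    x, z = y[:-1], y[1:]                 # lagged values (regressor) and response
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx
    c = mz - phi * mx
    return c, phi

def forecast_ar1(c, phi, last, steps):
    """Point forecasts obtained by iterating the fitted equation (errors set to 0)."""
    out = []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out

# Hypothetical quarterly inflation rates in % (NOT the RBA data).
quarterly_inflation = [0.8, 1.1, 1.6, 1.8, 1.4, 1.2, 0.9, 0.8]

c, phi = fit_ar1(quarterly_inflation)
forecasts = forecast_ar1(c, phi, quarterly_inflation[-1], steps=4)
for label, value in zip(["Sep-2023", "Dec-2023", "Mar-2024", "Jun-2024"], forecasts):
    print(f"{label}: {value:.2f}%")
```

When |φ| < 1 the multi-step forecasts decay geometrically towards the implied long-run mean c / (1 − φ), which is one reason forecast errors typically grow with the horizon.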
Regarding both the GDP growth rate and inflation, forecast horizons of 21, 27 and 30 months ahead were filtered out because they had 30 or fewer forecasts in total and, mostly, only four of the six institutions contributing. This also explains why adding more forecasters to the analysis is not easy. The institutions chosen for the purposes of this research have many matching forecast horizons (the only notable exception being institutions 5 and 6, whose mutual horizons do not match at all), which helps with the econometric tests and the interpretation of results. Furthermore, for the purpose of conducting the econometric tests in the fifth section of the paper, further forecast horizons (containing 4 or fewer forecasts) had to be eliminated for institution number 1 for both variables analysed. This also implies that institution number 6 barely met the inclusion criteria regarding the available forecast data for inflation.
The group of six forecasting institutions consists of two international institutions and four domestic ones, of which two are privately owned financial institutions and the others are from the public sector. The initial analysis of the collected data in terms of simple MAE (Mean Absolute Error) shows, as expected, a generally rising trend in forecast error as the forecast horizon gets longer, for both inflation and the GDP growth rate. This is presented in Figures 1 and 2.
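MAE for a horizon is the average of |forecast − actual| over all forecasts made at that horizon. The standard-library Python sketch below groups forecast records by horizon, computes the MAE for each, and applies an inclusion rule like the one described above (a minimum number of forecasts per horizon). The records and the helper mae_by_horizon are hypothetical; the text's rule of dropping horizons with 30 or fewer forecasts corresponds to min_forecasts=31, but a threshold of 2 is used here because the toy data are tiny.

```python
from collections import defaultdict

# Made-up records for illustration: (institution, horizon_months, forecast, actual).
records = [
    (1, 3, 2.1, 2.0), (2, 3, 1.8, 2.0),
    (1, 12, 2.9, 2.0), (2, 12, 1.2, 2.0),
    (1, 24, 3.5, 2.0),
]

def mae_by_horizon(records, min_forecasts):
    """Mean absolute error per horizon, skipping horizons with too few forecasts."""
    errors = defaultdict(list)
    for _institution, horizon, forecast, actual in records:
        errors[horizon].append(abs(forecast - actual))
    return {
        h: sum(errs) / len(errs)
        for h, errs in sorted(errors.items())
        if len(errs) >= min_forecasts
    }

# The text drops horizons with 30 or fewer forecasts (min_forecasts=31);
# the toy data are tiny, so a threshold of 2 is used here instead.
mae = mae_by_horizon(records, min_forecasts=2)
for horizon, value in mae.items():
    print(f"{horizon:>2} months ahead: MAE = {value:.2f}")
```

Because MAE uses absolute errors, over- and under-forecasts do not cancel out, so it measures accuracy rather than bias.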
The average forecast error for all institutions can also be tracked over the analysed period for each forecast horizon. Figure 3 for the GDP growth rate again shows the growing forecast error as the forecast horizon gets longer, but it also shows the significant influence of the financial crisis, with the biggest forecast error across forecast horizons occurring in the year 2021. A smaller rise in forecast errors is also present in the year 2022, as the fall in GDP narrowed from −7.4% and −1.7% in 2019 and 2023 to −0.3% in 2023, inducing optimism which later turned out to be unsubstantiated as the new government took office in 2023. Only 3 forecast horizons (the shortest, the longest and the middle one) are reported, as the average MAE for all other horizons exhibits similar behaviour that falls between the curves presented there.

