STA3022 Test2 Solutions
Question 1 [5 marks]
(a) What is test-retest reliability? (1)
(b) What is internal consistency reliability? (1)
(c) How do you measure internal consistency? Provide three formulas or explanations, not just the names of the methods. (3)
Answer to Q1
(a) Any one of:
A reliable measuring instrument in this context is one that gives consistent scores when used
repeatedly.
Or
There should be high correlations between test scores taken over multiple trials.
(b) A group of questions is internally consistent (reliable) if the questions measure the same underlying construct.
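One widely used measure of internal consistency is Cronbach's alpha, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, where $k$ is the number of items, $s_i^2$ the variance of item $i$ and $s_T^2$ the variance of the total score. The sketch below assumes each item's scores are stored as a list over the same respondents; the function name is illustrative only.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for k item-score lists (one list per question, same respondents)."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]       # total score per respondent
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var = sum(variance(scores) for scores in items)   # sum of item variances
    return k / (k - 1) * (1 - item_var / total_var)
```

Items that always agree give an alpha of 1; alpha falls as the items move less in step with each other.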
Answer to Q2
(a) $d_{12} = \sqrt{\sum_{j=1}^{p} (x_{1j} - x_{2j})^2} = \sqrt{(10 - 15)^2 + (8 - 16)^2 + (16 - 4)^2 + (3 - 14)^2} = \sqrt{354} \approx 18.81$
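The Euclidean distance in (a) can be checked with a short sketch; the two observation vectors are the values substituted in the worked answer, and the function name is only illustrative.

```python
import math

def euclidean_distance(x1, x2):
    """Euclidean distance d_12 = sqrt(sum_j (x_1j - x_2j)^2) between two observations."""
    if len(x1) != len(x2):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

# Values from the worked answer in (a): p = 4 variables.
x1 = [10, 8, 16, 3]
x2 = [15, 16, 4, 14]
d12 = euclidean_distance(x1, x2)
print(round(d12, 4))  # prints 18.8149 (= sqrt(354))
```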
(c) $\text{stress} = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} (d_{ij} - \delta_{ij})^2$
The aim of MDS is to find a representation of the samples so that the dissimilarities between them in the plot, given by $\delta_{ij}$, match the given dissimilarities $d_{ij}$ as closely as possible (optimally).
If the symbols are reversed, no marks are deducted as long as the descriptions are correct.
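A minimal sketch of the stress criterion in (c), assuming the squared-difference (raw stress) form summed over each pair $i < j$ once. The two matrices are hypothetical toy values, not data from the test.

```python
def raw_stress(d, delta):
    """Raw stress: sum over pairs i < j of (d_ij - delta_ij)^2.

    d and delta are symmetric n x n matrices (lists of lists) holding the given
    dissimilarities and the distances in the MDS plot.
    """
    n = len(d)
    total = 0.0
    for i in range(n - 1):          # i = 1, ..., n-1 in the formula
        for j in range(i + 1, n):   # j = i+1, ..., n
            total += (d[i][j] - delta[i][j]) ** 2
    return total

# Hypothetical 3 x 3 example: pairs (1,2) and (2,3) are each off by 1.
d = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
delta = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(raw_stress(d, delta))  # prints 2.0
```

A perfect MDS configuration reproduces every dissimilarity and gives a stress of 0.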
The current study aims to identify what factors make some people believe that they are lucky and others believe that they are unlucky. The study is based on a survey of 62 STA3022F students who answered the following questions in an online questionnaire (possible responses for categorical variables are given in brackets).
A discriminant analysis model has been constructed with the aim of identifying which, if any, of the four independent variables are able to distinguish between the two groups (labelled "Yes" and "No").
Questions:
b) Which groups is the discriminant model able to significantly discriminate between? Provide statistical evidence at the 5% level to support your answer. Clearly state all null and alternative hypotheses. (4)
c) Use the cut-off value rule to classify Respondent 4. Clearly indicate the classification rule. Is this a correct classification? (5.5)
d) Compare the overall hit rate with two chance criteria and use these comparisons to evaluate the overall quality of the discriminant model. (4)
e) Evaluate whether the discriminant model is better at predicting some groups than others. (Hint: Calculate the correct classification rate for each group.) (1.5)
Q3-b)
$d^2 = (\bar{Z}_1 - \bar{Z}_2)^2 = (-1.0242 - 1.0974)^2 = 4.501187$ (½ + ½)
$F = \frac{(n - 1 - p)\, n_1 n_2}{p\,(n - 2)(n_1 + n_2)}\, d^2 = \frac{(62 - 1 - 4) \times 34 \times 28}{4 \times (62 - 2) \times (34 + 28)} \times 4.501187 = 16.41481$ (½ for the ratio, ½ for the answer)
(or alternatively they can say that the F calculated is very high)
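The arithmetic for Q3-b can be reproduced with a small sketch; the centroids (−1.0242 and 1.0974), group sizes (34 and 28) and p = 4 are the values used above, while the function names are illustrative. The resulting F would then be compared with the critical value of F with p and n − 1 − p degrees of freedom, which this sketch does not compute.

```python
def squared_centroid_distance(z1_bar, z2_bar):
    """d^2 = (Z1_bar - Z2_bar)^2 between the two group centroids on the discriminant function."""
    return (z1_bar - z2_bar) ** 2

def two_group_f(d2, n1, n2, p):
    """F = (n - 1 - p) * n1 * n2 / (p * (n - 2) * (n1 + n2)) * d2, with n = n1 + n2."""
    n = n1 + n2
    ratio = (n - 1 - p) * n1 * n2 / (p * (n - 2) * (n1 + n2))
    return ratio * d2

d2 = squared_centroid_distance(-1.0242, 1.0974)
f_stat = two_group_f(d2, n1=34, n2=28, p=4)
print(round(d2, 6), round(f_stat, 5))  # prints 4.501187 16.41481
```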
Q3-c) First we need to calculate the cut-off value:
$\text{Cut-off} = \frac{n_1 \bar{Z}_2 + n_2 \bar{Z}_1}{n_1 + n_2} = \frac{34 \times 1.0974 + 28 \times (-1.0242)}{34 + 28} = 0.1392581$ (½ for the ratio, ½ for the answer)
Since the centroid for Yes is negative, the rule is: if Z < 0.1392581, then classify as "YES". (½)
Since Z4 < 0.1392581, classify Respondent 4 as "YES". (½ + ½)
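A minimal sketch of the cut-off rule in Q3-c, using the centroids and group sizes above. It assumes that a score exactly equal to the cut-off goes to "NO"; the function names and labels are illustrative, and Respondent 4's score is not reproduced here, so `classify` is shown only with arbitrary scores.

```python
def cutoff(z1_bar, z2_bar, n1, n2):
    """Weighted cut-off value (n1 * Z2_bar + n2 * Z1_bar) / (n1 + n2)."""
    return (n1 * z2_bar + n2 * z1_bar) / (n1 + n2)

def classify(z, cut, low_label="YES", high_label="NO"):
    """Scores below the cut-off go to the group with the negative centroid (Yes)."""
    return low_label if z < cut else high_label

c = cutoff(z1_bar=-1.0242, z2_bar=1.0974, n1=34, n2=28)
print(round(c, 7))            # prints 0.1392581
print(classify(-0.5, c))      # prints YES
```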
Q3-d)
$H_{max} = \max(34/62,\ 28/62) = 54.84\%$ (½ + ½)
$H_{prop} = (34/62)^2 + (28/62)^2 = 50.47\%$ (½ + ½)
The hit rate is greater than both $H_{max}$ and $H_{prop}$, so the model classifies better than chance; this indicates a good hit rate. (½ + ½)
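The two chance criteria can be computed from the group sizes alone. For the comparison, this sketch assumes the overall hit rate is the total of the correct classifications reported per group in Q3-e, (28 + 24)/62; the function name is illustrative.

```python
def chance_criteria(group_sizes):
    """Return (H_max, H_prop) for the given group sizes."""
    n = sum(group_sizes)
    props = [g / n for g in group_sizes]
    h_max = max(props)                    # maximum chance criterion
    h_prop = sum(p ** 2 for p in props)   # proportional chance criterion
    return h_max, h_prop

h_max, h_prop = chance_criteria([34, 28])
hit_rate = (28 + 24) / 62   # correct classifications for Yes and No (Q3-e)
print(f"Hmax = {h_max:.2%}, Hprop = {h_prop:.2%}, hit rate = {hit_rate:.2%}")
# prints Hmax = 54.84%, Hprop = 50.47%, hit rate = 83.87%
```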
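Q3-e) Evaluate the hit rate for each category:
$\text{Hit rate}(\text{Yes}) = 28/34 = 82.4\%$ (½)
$\text{Hit rate}(\text{No}) = 24/28 = 85.7\%$ (½)
Both correct classification rates are similar and very good, so the model is not noticeably better at predicting one group than the other. (½)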
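The per-group rates in Q3-e are simple ratios of correct classifications to group size; this sketch uses the counts given above.

```python
correct = {"Yes": 28, "No": 24}      # correctly classified respondents per group
group_size = {"Yes": 34, "No": 28}   # respondents per group
rates = {g: correct[g] / group_size[g] for g in correct}
for g, r in rates.items():
    print(f"{g}: {r:.1%}")
# prints Yes: 82.4% and No: 85.7%
```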
Q4-a) Interpret the Classification Tree and define an appropriate decision rule for selecting a
positive return.
(1) If CACL <= 1.1694 & ROCS <= 4.4486 & WCTA <= -0.3326, then classify as Not Fail. (½)
(2) If CACL <= 1.1694 & ROCS <= 4.4486 & WCTA > -0.3326, then classify as Fail. (½)
(3) If CACL <= 1.1694 & ROCS > 4.4486, then classify as Not Fail. (½)
(4) If CACL > 1.1694 & CLTA <= 0.70635 & Sales <= 3091.5, then classify as Fail. (½)
(5) If CACL > 1.1694 & CLTA <= 0.70635 & Sales > 3091.5, then classify as Not Fail. (½)
(6) If CACL > 1.1694 & CLTA > 0.70635, then classify as Fail. (½)
Q4-b)
Firm   SALES   ROCS    CLTA   CACL   WCTA    FAIL
2      16149   -1.07   1.22   0.62   -0.46   0
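Decision rules (1) to (6) from Q4-a can be written as nested conditions and applied to Firm 2 from Q4-b. This is a sketch: the function and argument names are hypothetical, and values exactly on a threshold follow the "<=" branches of the rules.

```python
def classify_firm(cacl, rocs, wcta, clta, sales):
    """Apply decision rules (1)-(6) from Q4-a; returns 'Fail' or 'Not Fail'."""
    if cacl <= 1.1694:
        if rocs <= 4.4486:
            return "Not Fail" if wcta <= -0.3326 else "Fail"   # rules (1), (2)
        return "Not Fail"                                      # rule (3)
    if clta <= 0.70635:
        return "Fail" if sales <= 3091.5 else "Not Fail"       # rules (4), (5)
    return "Fail"                                              # rule (6)

# Firm 2 from Q4-b: CACL, ROCS and WCTA all fall on the "<=" side, so rule (1) applies.
print(classify_firm(cacl=0.62, rocs=-1.07, wcta=-0.46, clta=1.22, sales=16149))  # prints Not Fail
```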
Q4-c)
OR
$DI_1 = 1 - \left[\left(\frac{30}{60}\right)^2 + \left(\frac{30}{60}\right)^2\right] = 0.5$
The variable is chosen according to the reduction in the DI: the variable that creates the maximum reduction in the index is chosen for splitting the node. (½)
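A sketch of the diversity index and the reduction used to pick a split, assuming the usual size-weighted average of the child-node indices. The function names and the split in the usage line are hypothetical.

```python
def gini(counts):
    """Diversity index DI = 1 - sum_k p_k^2 for the class counts in a node."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_reduction(parent, children):
    """Parent DI minus the size-weighted DI of the child nodes produced by a split."""
    n = sum(parent)
    weighted = sum(sum(child) / n * gini(child) for child in children)
    return gini(parent) - weighted

print(gini([30, 30]))  # prints 0.5, matching DI_1 above
# Hypothetical split of the 30/30 root into (25 Fail, 5 NotF) and (5 Fail, 25 NotF):
print(round(gini_reduction([30, 30], [[25, 5], [5, 25]]), 4))  # prints 0.2222
```

The variable whose best split gives the largest value of `gini_reduction` would be chosen for the node.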
Q4-d)
Bonsai techniques check several stopping criteria before letting the tree grow fully. (½)
Pruning techniques let the tree grow fully and then prune it back. (½)
Q4-e)
Classification Table
                            Predicted Groups
                            Fail          NotF          Total
Observed Groups   Fail      21+2+2=25     29-25=4       29
                  NotF      31-28=3       2+4+22=28     31
                  Total     28            32            60
OR
Classification Table
                            Predicted Groups
                            Fail          NotF          Total
Observed Groups   Fail      21+2+2=25     1+1+3=5       30
                  NotF      2+0+0=2       2+4+22=28     30
                  Total     27            33            60
OR
Classification Table
                            Predicted Groups
                            Fail          NotF          Total
Observed Groups   Fail      21+2+2=25     1+1+3=5       29
                  NotF      2+0+0=2       2+4+22=28     31
                  Total     27            33            60