
2015 No Memo Test 3

Uploaded by alutakaunda
Copyright © All Rights Reserved

UNIVERSITY OF CAPE TOWN

STATISTICAL SCIENCES DEPARTMENT


STA3022F: RESEARCH AND SURVEY STATISTICS

CLASS TEST 2

06 MAY 2015

TIME: 1 ½ hours Total Marks: 50


Answer ALL questions. (4 pages – 4 questions)
Marks are allocated for intermediate calculations.

QUESTION 1 [5 marks]
(a) What is test-retest reliability? (1)
(b) What is internal consistency reliability? (1)
(c) How do you measure internal consistency? Give three formulas or explanations, not just
the names of the methods. (3)
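As a minimal R sketch of two common internal-consistency measures: Cronbach's alpha, α = k/(k - 1) · (1 - Σ s_i² / s_T²), where k is the number of items, s_i² the variance of item i and s_T² the variance of the total score; and split-half reliability, where the correlation r between the scores on two halves of the scale is stepped up with the Spearman-Brown formula 2r/(1 + r). The function names and the 6 × 4 matrix of item scores are hypothetical, and the odd/even split is only one way of halving the items.

```r
# Cronbach's alpha for a respondents x items matrix of scores
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)                        # number of items
  item_var <- apply(items, 2, var)        # variance of each item
  total_var <- var(rowSums(items))        # variance of the summed scale score
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Split-half reliability: correlate odd- and even-numbered item halves,
# then apply the Spearman-Brown correction 2r / (1 + r)
split_half <- function(items) {
  items <- as.matrix(items)
  odd  <- rowSums(items[, seq(1, ncol(items), by = 2), drop = FALSE])
  even <- rowSums(items[, seq(2, ncol(items), by = 2), drop = FALSE])
  r <- cor(odd, even)
  2 * r / (1 + r)
}

# Hypothetical 5-point responses: 6 respondents x 4 items
scores <- matrix(c(4, 5, 4, 5,
                   2, 2, 3, 2,
                   3, 3, 3, 4,
                   5, 4, 5, 5,
                   1, 2, 1, 2,
                   3, 4, 3, 3), ncol = 4, byrow = TRUE)
cronbach_alpha(scores)
split_half(scores)
```

Both functions return values close to 1 when the items move together; identical items give exactly 1.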

QUESTION 2 [16 marks]


(a) The painters data set in the R package MASS gives subjective assessments, on a 0 to 20 integer
scale, of 54 classical painters. The painters were assessed on four characteristics:
composition, drawing, colour and expression. Calculate the Euclidean distance between the
following two samples:
> painters[1:2,]
         Composition Drawing Colour Expression
Da Udine          10       8     16          3
Da Vinci          15      16      4         14
(3)
(b) Why is there no need to scale the data set before calculating the Euclidean distance? (1)
(c) Define stress and explain how it is used. (5)
(d) Explain step by step how to perform hierarchical clustering with the centroid method. (7)
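The computations behind parts (a), (c) and (d) can be sketched in base R as below; this is an illustrative sketch, not a model answer. The scores for Da Udine and Da Vinci are typed in from the printout, while `stress1()`, `centroid_hclust()` and the simulated 10 × 4 data set `obs` are hypothetical. The stress function simplifies Kruskal's stress-1 by using the input dissimilarities directly instead of the monotone disparities of non-metric MDS, and the clustering function returns each merge height as the Euclidean distance between the two centroids being merged.

```r
# (a) Euclidean distance between the two painters in the printout
x <- rbind("Da Udine" = c(10, 8, 16, 3),
           "Da Vinci" = c(15, 16, 4, 14))
colnames(x) <- c("Composition", "Drawing", "Colour", "Expression")
sqrt(sum((x[1, ] - x[2, ])^2))  # sqrt(5^2 + 8^2 + 12^2 + 11^2) = sqrt(354)
dist(x)                         # same value from the built-in function

# (c) Kruskal's stress-1 between input dissimilarities and the distances
# in a fitted configuration. Simplification: the input dissimilarities are
# used in place of the monotone disparities of non-metric MDS.
stress1 <- function(delta, config) {
  d_fit <- as.vector(dist(config))
  delta <- as.vector(delta)
  sqrt(sum((delta - d_fit)^2) / sum(d_fit^2))
}

# (d) Agglomerative clustering with the centroid method, step by step.
# Returns the merge heights: Euclidean distances between merged centroids.
centroid_hclust <- function(data) {
  clusters <- as.list(seq_len(nrow(data)))  # start: each observation is a cluster
  heights <- numeric(0)
  while (length(clusters) > 1) {
    # centroid (mean vector) of each current cluster
    cents <- do.call(rbind, lapply(clusters, function(idx)
      colMeans(data[idx, , drop = FALSE])))
    # distances between all pairs of centroids; ignore self-distances
    d <- as.matrix(dist(cents))
    diag(d) <- Inf
    pair <- which(d == min(d), arr.ind = TRUE)[1, ]
    heights <- c(heights, min(d))
    # merge the closest pair; the next pass recomputes the centroids
    clusters[[pair[1]]] <- c(clusters[[pair[1]]], clusters[[pair[2]]])
    clusters[[pair[2]]] <- NULL
  }
  heights
}

# Usage on hypothetical simulated data (10 observations, 4 variables)
set.seed(1)
obs <- matrix(rnorm(40), ncol = 4)
delta <- dist(obs)
sapply(1:3, function(k) stress1(delta, cmdscale(delta, k = k)))  # falls as k grows
centroid_hclust(obs)
```

Comparing stress across numbers of dimensions is the typical use: the dimensionality is chosen where adding dimensions stops reducing stress appreciably. R's built-in `hclust()` also offers `method = "centroid"`; its help page pairs that method with squared Euclidean distances, e.g. `hclust(dist(obs)^2, method = "centroid")`, so its heights are on the squared scale.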

QUESTION 3 [17 marks]

This study aims to identify the factors that lead some people to believe they are lucky and others to
believe they are unlucky. The study is based on a survey of 62 STA3022F students who answered the
following questions in an online questionnaire (possible responses for categorical variables are given in
brackets).

1. Do you consider yourself to be a lucky person? (Yes/No)
2. What is your age?
3. What is your gender? (1 = Male; 0 = Female)
4. Have you ever won a competition before? (1 = Yes; 0 = No)
5. How many economics courses have you completed?

A discriminant analysis model has been constructed to identify which, if any, of the four
independent variables can distinguish between the two groups (labelled “Yes” and “No”).
Questions:

a) Write down the discriminant function. (2)

b) Is the discriminant model able to discriminate significantly between the two groups? Provide
statistical evidence at the 5% level to support your answer. Clearly state all null and alternative
hypotheses. (4)

c) Use the cut-off value rule to classify Respondent 4. Clearly indicate the classification rule. Is this a
correct classification? (5.5)

d) Compare the overall hit rate with two chance criteria, and use these comparisons to evaluate the
overall quality of the discriminant model. (4)

e) Evaluate whether the discriminant model is better at predicting some groups than others. (Hint:
Calculate the correct classification rate for each group) (1.5)

Data for the first 15 respondents

ID  Q1   Q2  Q3      Q4   Q5
 1  Yes  21  Female  No    3
 2  Yes  21  Male    Yes   3
 3  No   21  Female  No    3
 4  Yes  20  Male    No    2
 5  No   20  Male    No    2
 6  Yes  20  Female  No    2
 7  No   21  Male    Yes   2
 8  No   21  Female  No    3
 9  No   19  Male    No    2
10  Yes  21  Female  Yes   4
11  Yes  21  Male    Yes   2
12  No   20  Male    No    2
13  No   20  Male    Yes   2
14  No   20  Male    No    2
15  Yes  20  Female  Yes   2

> library(MASS)
> fit <- lda(Q1 ~ Q2 + Q3d + Q4d + Q5, data = luck, method = "moment")
> fit
Call:
lda(Q1 ~ Q2 + Q3d + Q4d + Q5, data = luck, method = "moment")

Prior probabilities of groups:
      Yes        No
0.4888889 0.5111111

Group means:
       Q2   Q3d       Q4d      Q5
Yes 20.75 0.428 0.2857143 3.20000
No  20.20 0.750 0.3636364 2.52273

Coefficients of linear discriminants:
            LD1
Constant  0.254
Q2       -2.948
Q3d       0.085
Q4d       1.383
Q5       -0.011

Classification Table
                      Predicted Groups
                      yes   no   Total
Observed Groups yes    28    6      34
                no      4   24      28
                Total  32   30      62

> centroidYes
[1] -1.0242

> centroidNo
[1] 1.0974
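The arithmetic for parts (c) to (e) can be sketched in R from the printed output. Assumptions, repeated in the comments: Q3d is the gender dummy coded as in question 3 (1 = Male), Q4d is the competition dummy (1 = Yes), the listed coefficients (including the constant) define the discriminant score Z, and the cut-off is the weighted one for unequal group sizes, Z_cut = (n_Yes · Z̄_No + n_No · Z̄_Yes)/(n_Yes + n_No), using the observed group sizes 34 and 28 from the classification table. The object names (`disc_score`, `z_cut`, `tab` and so on) are hypothetical. The maximum chance criterion is the proportion in the largest group, and the proportional chance criterion is the sum of squared group proportions.

```r
# Coefficients as printed in the output above
coefs <- c(Constant = 0.254, Q2 = -2.948, Q3d = 0.085, Q4d = 1.383, Q5 = -0.011)

# Discriminant score Z = a + b1*Q2 + b2*Q3d + b3*Q4d + b4*Q5
disc_score <- function(q2, q3d, q4d, q5) {
  unname(coefs["Constant"] + coefs["Q2"] * q2 + coefs["Q3d"] * q3d +
           coefs["Q4d"] * q4d + coefs["Q5"] * q5)
}

# Weighted cut-off for unequal group sizes (observed sizes from the table)
n_yes <- 34; n_no <- 28
z_yes <- -1.0242; z_no <- 1.0974            # printed group centroids
z_cut <- (n_yes * z_no + n_no * z_yes) / (n_yes + n_no)

# Respondent 4: age 20, male (assumed Q3d = 1), no win (assumed Q4d = 0), 2 courses
z4 <- disc_score(q2 = 20, q3d = 1, q4d = 0, q5 = 2)
# The Yes centroid lies below the No centroid, so scores below the cut-off go to Yes
pred4 <- if (z4 < z_cut) "Yes" else "No"

# Hit rate, chance criteria and per-group rates from the classification table
tab <- matrix(c(28, 6,
                4, 24), nrow = 2, byrow = TRUE,
              dimnames = list(observed = c("yes", "no"), predicted = c("yes", "no")))
hit_rate <- sum(diag(tab)) / sum(tab)       # overall proportion correctly classified
props <- rowSums(tab) / sum(tab)            # observed group proportions
c_max <- max(props)                         # maximum chance criterion
c_pro <- sum(props^2)                       # proportional chance criterion
group_rates <- diag(tab) / rowSums(tab)     # correct-classification rate per group

c(z_cut = z_cut, z4 = z4)
pred4
round(c(hit = hit_rate, c_max = c_max, c_pro = c_pro), 3)
round(group_rates, 3)
```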

QUESTION 4 [12 marks]

In a 2001 paper titled “Variable precision rough set theory and data discretisation: an application to corporate
failure prediction”, Beynon and Peel use a number of financial performance ratios to build a model that
discriminates between UK firms that fail and those that do not. Data on the following financial variables
were collected for 60 randomly chosen firms.

SALES   sales in thousands of pounds
ROCS    profit before tax/capital employed
FFTL    funds flow/total liabilities
GEAR    (current liabilities + long-term debt)/total assets
CLTA    current liabilities/total assets
CACL    current assets/current liabilities
QACL    (current assets - stock)/current liabilities
WCTA    (current assets - current liabilities)/total assets
AGE     number of years the company has been operating
CHAUD   coded 1 if the company changed auditor in the previous 3 years, 0 otherwise
BIG6    coded 1 if the company is audited by a Big 6 auditor, 0 otherwise
FAIL    coded 1 if the company failed, 0 otherwise

Refer to the attached classification tree and answer the following questions.

Questions:

a) Define a set of decision rules indicating the circumstances under which firms are predicted to
fail or not to fail. (3)

b) Which group would Firm 2 be classified to? Is this a correct classification? (2)

c) Calculate the diversity index for node 1 (the root node) and explain why the CACL variable is chosen
as the splitting variable. (2)

d) Briefly explain the differences between the bonsai and pruning techniques. (1)

e) Construct the classification matrix. (4)
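Assuming the diversity index is the Gini form DI = 1 - Σ_j p_j² (p_j being the proportion of a node's firms in class j), the root-node index and the reduction ΔDI = DI - WADI achieved by a split can be sketched as below. The function names are hypothetical, and the counts are taken exactly as printed in the classification tree (Fail/NotFail). Under this criterion the splitting variable chosen at a node is the one whose best split gives the largest reduction in diversity.

```r
# Gini diversity index of a node from its class counts: DI = 1 - sum(p_j^2)
gini_di <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Weighted average diversity index (WADI) of the child nodes of a split,
# each child weighted by its number of firms
wadi <- function(children) {
  n <- vapply(children, sum, numeric(1))
  sum(n * vapply(children, gini_di, numeric(1))) / sum(n)
}

root <- c(Fail = 29, NotFail = 31)   # node 1 counts as printed in the tree
gini_di(root)

# Reduction in diversity for the CACL split into node 2 (23/8) and node 3 (7/22)
gini_di(root) - wadi(list(c(23, 8), c(7, 22)))
```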

Data for the first 5 firms only


Firm  SALES   ROCS  FFTL  GEAR  CLTA  CACL  QACL   WCTA  AGE  CHAUD  BIG6  FAIL
   1   6762   7.54  0.15  0.62  0.62  1.55  0.74   0.34   74      0     0     0
   2  16149  -1.07  0.03  1.22  1.22  0.62  0.32  -0.46   29      0     1     0
   3   8086  15.20  0.62  0.33  0.33  2.36  1.75   0.45   51      0     1     0
   4   7646  31.22  0.63  0.52  0.48  1.64  1.49   0.31   25      0     0     0
   5  36067  10.96  0.35  0.38  0.38  1.59  1.16   0.22   33      0     1     0

Classification tree (node counts shown as Fail/NotFail):

Node 1  NotFail  29/31
  CACL <= 1.1694:  Node 2  Fail  23/8
    ROCS <= 4.4486:  Node 4  Fail  22/4
      WCTA <= -0.3326:  Node 8  NotFail  1/2
      WCTA > -0.3326:   Node 9  Fail  21/2
    ROCS > 4.4486:   Node 5  NotFail  1/4
  CACL > 1.1694:   Node 3  NotFail  7/22
    CLTA <= 0.70635:  Node 6  NotFail  5/22
      SALES <= 3091.5:  Node 10  Fail  2/0
      SALES > 3091.5:   Node 11  NotFail  3/22
    CLTA > 0.70635:   Node 7  Fail  2/0
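The classification tree can be read as nested if/else rules. The R sketch below encodes the printed splits, classifies Firm 2 from the data table, and builds a classification matrix from the terminal-node counts (read as Fail/NotFail, the order implied by the majority label at each node: terminal nodes 5, 7, 8, 9, 10 and 11). The function and object names are hypothetical.

```r
# Decision rules read off the classification tree (terminal nodes in comments)
classify_firm <- function(CACL, ROCS, WCTA, CLTA, SALES) {
  if (CACL <= 1.1694) {
    if (ROCS <= 4.4486) {
      if (WCTA <= -0.3326) "NotFail" else "Fail"   # node 8 / node 9
    } else {
      "NotFail"                                     # node 5
    }
  } else {
    if (CLTA <= 0.70635) {
      if (SALES <= 3091.5) "Fail" else "NotFail"    # node 10 / node 11
    } else {
      "Fail"                                        # node 7
    }
  }
}

# Firm 2 from the data table (observed FAIL = 0, i.e. NotFail)
classify_firm(CACL = 0.62, ROCS = -1.07, WCTA = -0.46, CLTA = 1.22, SALES = 16149)

# Classification matrix from the terminal-node counts
leaves <- data.frame(
  node      = c(5, 7, 8, 9, 10, 11),
  predicted = c("NotFail", "Fail", "NotFail", "Fail", "Fail", "NotFail"),
  fail      = c(1, 2, 1, 21, 2, 3),    # observed failing firms in the node
  notfail   = c(4, 0, 2, 2, 0, 22)     # observed non-failing firms in the node
)
pred_fail <- leaves$predicted == "Fail"
cm <- matrix(c(sum(leaves$fail[pred_fail]),    sum(leaves$fail[!pred_fail]),
               sum(leaves$notfail[pred_fail]), sum(leaves$notfail[!pred_fail])),
             nrow = 2, byrow = TRUE,
             dimnames = list(observed  = c("Fail", "NotFail"),
                             predicted = c("Fail", "NotFail")))
cm
sum(diag(cm)) / sum(cm)   # overall proportion correctly classified
```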

$\mathrm{WADI} = \dfrac{n_1\,\mathrm{DI}_1 + n_2\,\mathrm{DI}_2}{n_1 + n_2}$
$\Delta\mathrm{DI} = \mathrm{DI} - \mathrm{WADI}$

$F = \dfrac{(n - 1 - p)\,n_1 n_2}{p\,(n - 2)(n_1 + n_2)}\,D^2$
$Z_{\mathrm{cut}} = \dfrac{n_1 \bar{Z}_2 + n_2 \bar{Z}_1}{n_1 + n_2}$

𝐹, , . = 2.557
𝐹, , . = 2.513
=1− 𝜌
