SUSS BSBA: BUS105 Jul 2020 TOA Answers

The document provides sample data and statistical analysis on various topics: 1) It includes descriptive statistics like means, medians, modes, ranges etc. for male and female stress level data. 2) It performs statistical tests like confidence intervals, hypothesis testing to analyze brand recall times and determine if there are differences between male and female stress levels. 3) Regression analysis is conducted to determine the relationship between years of work experience and monthly salary. The R-squared value indicates a moderate positive correlation.

Uploaded by

Azido Azide

SUSS

BUS 105
July 2020 TOA – Sample Answers

Q1
Male

Mean 242.1522
Standard Error 6.102424
Median 230
Mode 185
Standard Deviation 41.38866
Sample Variance 1713.021
Kurtosis -1.18461
Skewness 0.359638
Range 131
Minimum 184
Maximum 315
Sum 11139
Count 46

Female

Mean 243.3529
Standard Error 6.278055
Median 240
Mode 193
Standard Deviation 44.83428
Sample Variance 2010.113
Kurtosis -1.19982
Skewness 0.376686
Range 155
Minimum 180
Maximum 335
Sum 12411
Count 51

A measure of location is a single value that describes the central tendency of a set
of data (Zheng, F., Soh, I. & Tan, C, 2020). The three common measures of location are
the mean, median and mode. The mean is the sum of all observations in a sample divided
by the total number of observations (Zheng, F., Soh, I. & Tan, C, 2020). “The median is
the midpoint of the values after all observations have been ordered either from
smallest to largest, or vice versa.” (Zheng, F., Soh, I. & Tan, C, 2020) The mode is the
value that occurs most often in the observations and corresponds to the peak of a
histogram.

A measure of dispersion shows how spread out the data are. “A small value for a measure
of dispersion would mean that the data are closely clustered” (Zheng, F., Soh, I. & Tan,
C, 2020) The three common measures of dispersion are the range, variance and standard
deviation. The range is the difference between the largest and smallest values in the
sample or data set, and is therefore the simplest measure of dispersion. “Variance is the
arithmetic mean of the squared deviations from the mean.” (Zheng, F., Soh, I. & Tan,
C, 2020) The standard deviation is the square root of the variance. It is expressed in
the same units of measurement as the original data, so it can be interpreted more easily
than the variance.
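
As a minimal sketch of the definitions above, the following Python snippet computes the three measures of location and the three measures of dispersion with the standard `statistics` module. The `scores` list is hypothetical illustration data, not the stress-level sample from the question.

```python
import statistics

# Hypothetical stress scores for illustration only (not the Q1 data set).
scores = [184, 185, 185, 210, 230, 262, 315]

# Measures of location
mean = statistics.mean(scores)      # sum of observations / number of observations
median = statistics.median(scores)  # midpoint of the ordered observations
mode = statistics.mode(scores)      # most frequent observation

# Measures of dispersion
data_range = max(scores) - min(scores)  # largest value minus smallest value
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # square root of the sample variance

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} variance={variance:.2f} std dev={std_dev:.2f}")
```

Excel's Descriptive Statistics output reports the sample variance, which divides by n - 1, so `statistics.variance` rather than `statistics.pvariance` is the matching function here.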

Male
The mean, median and mode are 242, 230 and 185 respectively.
The range, variance and standard deviation are 131, 1713.02 and 41.39 respectively.

Female
The mean, median and mode are 243, 240 and 193 respectively.
The range, variance and standard deviation are 155, 2010.11 and 44.83 respectively.

b)

The three statistical concerns are:

1. We are concerned with whether the samples are random samples. Adopting a sampling
method such as simple random sampling or systematic random sampling would help ensure
that the sample is random.

2. There is a high chance that the sample is biased because the participants (46 males
and 51 females) are people who volunteered. People who are under a lot of stress may be
too embarrassed to volunteer for the test, which would bias the results. The sample size
might also be too small, as 46 males and 51 females might not be enough to represent
the whole population.

3. We are concerned that the two samples, male and female, may not come from
populations that are normally distributed. We could check this by plotting a histogram
of each sample to see whether the distribution is bell-shaped. If it is, it is reasonable
to treat the population as approximately normally distributed.

If I had a free hand, I would start by using stratified random sampling to ensure that
two larger samples of males and females include data from subgroups based on some
characteristic, e.g. age, industry of work or income level. Then, I would plot the two
samples on separate histograms to compare their shape and skewness, and to check for a
bell shape as evidence of normality. Next, I would construct a 95% confidence interval
for the population mean stress level of each group and compare the intervals. In
addition, I could use a pooled t-test, since the population standard deviations are
unknown, to test whether the mean stress level of females (μ1) is greater than ( > ) the
mean stress level of males (μ2). If the test supports μ1 > μ2, then we can conclude that
females are more stressed in life than males.
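
The pooled t-test mentioned above can be sketched as follows. This is only an illustration: it reuses the summary statistics of the current samples from the Q1 output rather than the larger stratified samples proposed, and it assumes equal population variances, which is what pooling requires.

```python
import math

# Summary statistics taken from the Q1 descriptive output.
n_f, mean_f, var_f = 51, 243.3529, 2010.113  # female (population 1)
n_m, mean_m, var_m = 46, 242.1522, 1713.021  # male (population 2)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n_f - 1) * var_f + (n_m - 1) * var_m) / (n_f + n_m - 2)

# Standard error of the difference between the two sample means.
std_err = math.sqrt(pooled_var * (1 / n_f + 1 / n_m))

# Test statistic for H0: mu1 <= mu2 against H1: mu1 > mu2.
t_stat = (mean_f - mean_m) / std_err
df = n_f + n_m - 2

print(f"pooled variance = {pooled_var:.3f}, t = {t_stat:.4f}, df = {df}")
```

The resulting t statistic would then be compared with the one-tailed critical value for `df` degrees of freedom, taken from a t table or from Excel.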
Q2
a)

i) P(Male) = 300/520 = 0.5769 = 57.69%

ii) P(Female and Likes Swimming) = 110/520 = 0.2115 = 21.15%

iii) P(Male or Likes Swimming)
= P(Male) + P(Likes Swimming) – P(Male and Likes Swimming)
= (300/520) + (360/520) – (250/520) = 0.7885 = 78.85%

iv) P(Female | Likes Swimming) = (110/520)/(360/520) = 0.3056 = 30.56%
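
The four probabilities in part (a) can be re-derived with exact fractions. This is a small sketch that uses only the counts appearing in the workings above; the variable names are illustrative.

```python
from fractions import Fraction

# Counts implied by the fractions used in part (a).
total = 520
male = 300
likes_swimming = 360
male_and_likes = 250
female_and_likes = likes_swimming - male_and_likes  # 110

p_male = Fraction(male, total)
p_likes = Fraction(likes_swimming, total)
p_female_and_likes = Fraction(female_and_likes, total)

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_male_or_likes = p_male + p_likes - Fraction(male_and_likes, total)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_female_given_likes = p_female_and_likes / p_likes

results = {
    "P(Male)": p_male,
    "P(Female and Likes Swimming)": p_female_and_likes,
    "P(Male or Likes Swimming)": p_male_or_likes,
    "P(Female | Likes Swimming)": p_female_given_likes,
}
for label, p in results.items():
    print(f"{label} = {float(p):.4f}")
```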

b)

i)

μ = 600
σ = 25

Let X be the weight of the bag of grapes

Answer from Excel using NORM.DIST:

P(585 < X < 623) = P(X < 623) – P(X < 585)
= 0.8212 – 0.2743
= 0.5469
= 54.69%
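
The same probability can be cross-checked outside Excel. The sketch below uses `statistics.NormalDist` from the Python standard library in place of NORM.DIST, with the mean and standard deviation given above.

```python
from statistics import NormalDist

# Weight of one bag of grapes: mean 600, standard deviation 25.
weight = NormalDist(mu=600, sigma=25)

# Cumulative probabilities, equivalent to NORM.DIST(x, 600, 25, TRUE).
p_below_623 = weight.cdf(623)
p_below_585 = weight.cdf(585)

# Probability that the weight falls between the two limits.
p_between = p_below_623 - p_below_585

print(f"P(X < 623) = {p_below_623:.4f}")
print(f"P(X < 585) = {p_below_585:.4f}")
print(f"P(585 < X < 623) = {p_between:.4f}")
```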

ii)

n = 20
standard error = σ/√n = 25/√20 = 5.5902

Let X̄ be the mean weight of a sample of 20 bags of grapes

Answer from Excel using NORM.DIST:

P(X̄ > 610) = 1 – P(X̄ < 610) = 1 – 0.9632 = 0.0368 = 3.68%
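
For part (ii) the quantity of interest is the mean of 20 bags, so the normal distribution is rescaled by the standard error. A minimal sketch, again using the standard library rather than Excel:

```python
import math
from statistics import NormalDist

mu, sigma, n = 600, 25, 20

# Standard error of the sample mean: sigma / sqrt(n).
std_error = sigma / math.sqrt(n)

# Sampling distribution of the mean weight of n bags.
sample_mean = NormalDist(mu=mu, sigma=std_error)

# Upper-tail probability that the sample mean exceeds 610.
p_above_610 = 1 - sample_mean.cdf(610)

print(f"standard error = {std_error:.4f}")
print(f"P(sample mean > 610) = {p_above_610:.4f}")
```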

Q3
a)

BrandA

Mean 6.99
Standard Error 0.02414
Median 7
Mode 7
Standard Deviation 0.132222
Sample Variance 0.017483
Kurtosis -0.00575
Skewness -0.47565
Range 0.5
Minimum 6.7
Maximum 7.2
Sum 209.7
Count 30
Confidence Level(90.0%) 0.041018

BrandB

Mean 6.95
Standard Error 0.021841
Median 7
Mode 7
Standard Deviation 0.119626
Sample Variance 0.01431
Kurtosis 0.947897
Skewness -0.84169
Range 0.5
Minimum 6.6
Maximum 7.1
Sum 208.5
Count 30
Confidence Level(90.0%) 0.03711

Brand A
Lower Limit of Confidence Interval = 6.99 – 0.041 = 6.95 (2 d.p)
Upper Limit of Confidence Interval = 6.99 + 0.041 = 7.03 (2 d.p)

The 90% confidence interval for the mean time taken to recall Brand A is between 6.95
seconds and 7.03 seconds. This is an interval estimate of the population mean time taken
to recall Brand A: we estimate that the population mean lies between 6.95 seconds and
7.03 seconds. The 90% confidence level does not mean that the population mean will
always lie between 6.95 and 7.03. Rather, if we repeated the sampling 100 times and
built an interval each time, we would expect about 90 of those intervals to contain the
population mean.

Brand B
Lower Limit of Confidence Interval = 6.95 – 0.037 = 6.91 (2 d.p)
Upper Limit of Confidence Interval = 6.95 + 0.037 = 6.99 (2 d.p)

The 90% confidence interval for the mean time taken to recall Brand B is between 6.91
seconds and 6.99 seconds. This is an interval estimate of the population mean time taken
to recall Brand B: we estimate that the population mean lies between 6.91 seconds and
6.99 seconds. The 90% confidence level does not mean that the population mean will
always lie between 6.91 and 6.99. Rather, if we repeated the sampling 100 times and
built an interval each time, we would expect about 90 of those intervals to contain the
population mean.

We can conclude that the recall times for both brands are similar, as the confidence
intervals for Brand A and Brand B overlap with each other.
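
The two intervals and the overlap check can be reproduced from the summary statistics alone. In this sketch the critical value 1.699127 is copied from the "t Critical two-tail" row of the Excel output in part (b) (df = 29, α = 0.10) because the Python standard library has no t distribution; the function name is illustrative.

```python
import math

# Two-tailed critical t value for df = 29 at 90% confidence,
# copied from the Excel output in part (b).
T_CRIT_90 = 1.699127


def confidence_interval(mean, std_dev, n, t_crit=T_CRIT_90):
    """Return (lower, upper) limits of mean +/- t * s / sqrt(n)."""
    margin = t_crit * std_dev / math.sqrt(n)
    return mean - margin, mean + margin


brand_a = confidence_interval(6.99, 0.132222, 30)
brand_b = confidence_interval(6.95, 0.119626, 30)

# Two intervals overlap when each one starts before the other ends.
overlap = brand_a[0] <= brand_b[1] and brand_b[0] <= brand_a[1]

print(f"Brand A: {brand_a[0]:.2f} to {brand_a[1]:.2f}")
print(f"Brand B: {brand_b[0]:.2f} to {brand_b[1]:.2f}")
print(f"Intervals overlap: {overlap}")
```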

b)

t-Test: Paired Two Sample for Means

                               BrandA     BrandB
Mean                           6.99       6.95
Variance                       0.017483   0.01431
Observations                   30         30
Pearson Correlation            0.250709
Hypothesized Mean Difference   0
df                             29
t Stat                         1.418295
P(T<=t) one-tail               0.083381
t Critical one-tail            1.311434
P(T<=t) two-tail               0.166762
t Critical two-tail            1.699127

Step 1: State the null and alternate hypotheses

H0: μd = 0
H1: μd ≠ 0

where μd is the mean recall time difference between Brand A and Brand B

Step 2: Select the level of significance

The level of significance decided is α = 0.10.

Step 3: Decide on a test statistic


We will use the paired t-test since the population standard deviation is unknown and
the samples are not independent: this is a one-after-another (paired) situation.

Step 4: Develop a decision rule

If p-value < 0.10, we reject H0 and accept H1.

Step 5: Compute the value of the test statistic, make a decision regarding the null
hypothesis, and interpret the results

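From the Excel output table above, this is a two-tailed test with a p-value of 0.1668.
Since the p-value of 0.1668 is > 0.10, we do not reject H0.

Therefore, there is insufficient evidence at the 10% level of significance that the mean
recall times of the two brands differ, so we treat the recall times of both brands as
similar.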
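
The t statistic in the Excel output can be rebuilt from the summary rows of the table, which makes the paired calculation explicit. This is a sketch under the assumption that the reported variances and Pearson correlation are the sample values; the critical value is copied from the Excel output because the standard library has no t distribution.

```python
import math

# Values copied from the paired t-test output.
mean_a, mean_b = 6.99, 6.95
var_a, var_b = 0.017483, 0.01431
r = 0.250709                 # Pearson correlation between the paired readings
n = 30
t_crit_two_tail = 1.699127   # df = 29, alpha = 0.10

# Variance of the paired differences:
# Var(A - B) = Var(A) + Var(B) - 2 * r * sd(A) * sd(B)
var_d = var_a + var_b - 2 * r * math.sqrt(var_a * var_b)

# Standard error of the mean difference, then the test statistic.
std_err_d = math.sqrt(var_d / n)
t_stat = (mean_a - mean_b) / std_err_d

# Two-tailed decision rule: reject H0 only if |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_crit_two_tail

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

The critical-value rule and the p-value rule used in Step 4 lead to the same decision.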

c)

The conclusions should be similar because the results from Parts (a) and (b) use the
same set of data and the same significance level (α) of 10%.

Q4
a)

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.668571618
R Square 0.446988008
Adjusted R Square 0.417362366
Standard Error 4553.75398
Observations 60

ANOVA
             df   SS            MS            F             Significance F
Regression   3    938617133.9   312872378     15.08787562   2.585E-07
Residual     56   1161253817    20736675.31
Total        59   2099870951

              Coefficients   Standard Error   t Stat         P-value       Lower 95%    Upper 95%
Intercept     18451.05898    4671.938098      3.949337212    0.000221452   9092.0423    27810.075
FOODTOT       4.731679354    0.792841218      5.968003738    1.71358E-07   3.1434275    6.3199311
ROOMS         -2.510341861   0.902262789      -2.782273515   0.007343509   -4.317791    -0.7028923
PERFORMANCE   2243.661787    1204.870001      1.862160885    0.067830204   -169.9829    4657.3064

The linear equation is:

ŷ = 18451.059 + 4.732X1 – 2.510X2 + 2243.662X3

where X1 is the total amount spent on food, X2 is the total room occupancy, X3
indicates whether there were performances that month, and ŷ is the estimated mean
monthly beverage sales.

Other things unchanged:

For every additional $1,000 in amount spent on food, estimated mean monthly
beverage sales will increase by $4,732.
For every additional room occupancy, estimated mean monthly beverage sales will
decrease by $2,510.
If there were performances in the month, estimated mean monthly beverage sales will
increase by $2,243,662.

b)
ŷ = 18451.059 + 4.732(2500) – 2.510(5000) + 2243.662(1)
= 19974.721

My estimate of the beverage sales for that month is $19,974,721.
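
The substitution in part (b) can be checked with a short function that evaluates the fitted equation from part (a). The function and argument names are illustrative; the result is in the same units as the fitted equation.

```python
def predict_beverage_sales(food_total, rooms, performance):
    """Evaluate the fitted regression equation from part (a).

    performance is 1 if there were performances that month, else 0.
    """
    return 18451.059 + 4.732 * food_total - 2.510 * rooms + 2243.662 * performance


# Inputs used in part (b).
y_hat = predict_beverage_sales(food_total=2500, rooms=5000, performance=1)

print(f"y-hat = {y_hat:.3f}")
```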

c)

Relevant Excel output table is shown above in part (a).

The coefficient of multiple determination is 0.4470.


The adjusted coefficient of multiple determination is 0.4174.

This means 44.70% (or 41.74% after adjustment) of the variation in beverage sales can be
explained by the variation in the independent variables, which are the amount spent on
food, total room occupancy and whether there were performances.

The better measure in this case is the adjusted coefficient of multiple determination
because it accounts for both model simplicity and predictive ability. The unadjusted
coefficient accounts only for predictive ability and never decreases when a new
independent variable is added to the model, even if that variable contributes little.
Since there are already three independent variables here, the adjusted coefficient
guards against overloading the model: it penalises each extra variable, so it starts to
decrease once a newly added variable no longer improves the fit enough to justify the
added complexity.
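
The penalty for extra variables is visible in the standard formula, adjusted R² = 1 – (1 – R²)(n – 1)/(n – k – 1), where n is the number of observations and k the number of independent variables. A minimal sketch using the values from the regression output in part (a):

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# R Square, Observations and number of predictors from the Excel output.
adj = adjusted_r_squared(0.446988008, n=60, k=3)

print(f"adjusted R-squared = {adj:.6f}")
```

Because (n – 1)/(n – k – 1) grows with k, an added variable has to raise R² by enough to offset the larger multiplier, otherwise the adjusted value falls.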

d)

Step 1: State the null and alternate hypotheses

H0: β1 = 0
H1: β1 ≠ 0
where β1 is the coefficient of amount spent on food (X1)

H0: β2 = 0
H1: β2 ≠ 0
where β2 is the coefficient of total room occupancy (X2)

H0: β3 = 0
H1: β3 ≠ 0
where β3 is the coefficient of whether there were performances (X3)

Step 2: Select the level of significance

The level of significance is α = 0.05


Step 3: Decide on a test statistic

We will perform the individual t-tests.

Step 4: Develop a decision rule

Reject H0 if p-value < 0.05.

Step 5: Compute the value of the test statistic, make a decision regarding the null
hypothesis, and interpret the results

P-value for amount spent on food (X1) = 1.71 × 10^-7
Since P-value < 0.05, reject H0 and accept H1.

P-value for total room occupancy (X2) = 0.0073
Since P-value < 0.05, reject H0 and accept H1.

P-value for performances (X3) = 0.0678
Since P-value > 0.05, do not reject H0.

Therefore, the two independent variables (Amount Spent on Food and Total Room
Occupancy) are significant at the 5% level, while the independent variable (Whether
there were performances) is not significant and could therefore be dropped from the
model.
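
The decision rule from Step 4 can be applied to all three predictors at once. This sketch copies the coefficients, standard errors and p-values from the Excel regression output; it recomputes each t statistic as coefficient divided by standard error but relies on Excel's p-values for the decision.

```python
ALPHA = 0.05

# (coefficient, standard error, p-value) copied from the Excel regression output.
predictors = {
    "FOODTOT": (4.731679354, 0.792841218, 1.71358e-07),
    "ROOMS": (-2.510341861, 0.902262789, 0.007343509),
    "PERFORMANCE": (2243.661787, 1204.870001, 0.067830204),
}

significant = []
for name, (coef, std_err, p_value) in predictors.items():
    t_stat = coef / std_err  # individual t statistic for H0: beta = 0
    if p_value < ALPHA:
        decision = "reject H0"
        significant.append(name)
    else:
        decision = "do not reject H0"
    print(f"{name}: t = {t_stat:.4f}, p = {p_value:.4g}, {decision}")

print("Significant at 5%:", significant)
```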
