Hypothesis Testing Exercise 1st August 2023

This document discusses several statistical tests and analyses including independent and dependent sample t-tests, ANOVA, chi-square tests, and regression. It provides sample data and questions for students to conduct the relevant analyses and interpret the results.

Uploaded by

Shahzaib Ali

Exercise A

Independent Sample case - Parametric Approach


Q-1A)

A production engineer claims that there is no difference in the variance of the diameters of nuts
manufactured by two different methods. The data show the diameters (in centimeters) of nuts
produced by each method.
a) At the 5% level of significance, can we conclude that the nut diameters produced by the two
different methods are not normally distributed?
b) At the 5% level of significance, can you reject the production engineer’s claim?
c) At the 5% level of significance, can we conclude that the average nut diameter differs
between the two methods?
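The three parts of Q-1A can be sketched in Python with SciPy as an alternative to SPSS. This is a minimal sketch, not the exercise's prescribed method: the function name `compare_two_methods` and its return keys are hypothetical, the two samples must be loaded from the data file by the reader, and part (b) uses the classical two-tailed F test (SPSS reports Levene's test instead).

```python
import numpy as np
from scipy import stats


def compare_two_methods(x, y, alpha=0.05):
    """Normality, equal-variance and equal-mean tests for two independent samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # (a) Shapiro-Wilk test on each sample (H0: the sample comes from a normal population).
    _, p_normal_x = stats.shapiro(x)
    _, p_normal_y = stats.shapiro(y)

    # (b) Two-tailed F test (H0: the two population variances are equal).
    f_stat = np.var(x, ddof=1) / np.var(y, ddof=1)
    lower_tail = stats.f.cdf(f_stat, len(x) - 1, len(y) - 1)
    p_variance = 2 * min(lower_tail, 1 - lower_tail)

    # (c) Two-sample t test; variances are pooled only if (b) did not reject.
    pooled = p_variance >= alpha
    _, p_mean = stats.ttest_ind(x, y, equal_var=pooled)

    return {
        "p_normal_x": float(p_normal_x),
        "p_normal_y": float(p_normal_y),
        "p_variance": float(p_variance),
        "p_mean": float(p_mean),
        "reject_equal_variances": bool(p_variance < alpha),
        "reject_equal_means": bool(p_mean < alpha),
    }
```

If either normality test rejects, the t test in part (c) is questionable and a non-parametric test is the usual fallback.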

Independent Sample Case - Non-Parametric Approach


Q-1B) Mann-Whitney test
A private industry analyst claims that there is no difference in the median salaries earned by
workers in the wholesale trade and manufacturing industries. A random sample of 10
wholesale trade and 10 manufacturing workers and their salaries (in thousands of dollars) are
shown in the table. At the 5% level of significance, can you reject the analyst’s claim?
Assume that the two populations have the same shape.

Data file is given in excel file as Q-1B
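A minimal sketch of the Mann-Whitney test in Python, assuming the two salary samples have already been read from the Q-1B sheet. The function name `mann_whitney_decision` is hypothetical; `scipy.stats.mannwhitneyu` is used here in place of SPSS.

```python
from scipy import stats


def mann_whitney_decision(sample_a, sample_b, alpha=0.05):
    """Two-sided Mann-Whitney U test (H0: the two populations have the same median,
    given that they have the same shape)."""
    u_stat, p_value = stats.mannwhitneyu(sample_a, sample_b, alternative="two-sided")
    return float(u_stat), float(p_value), bool(p_value < alpha)
```

The third returned value is True when the analyst's claim of equal medians is rejected at the chosen significance level.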

Q-1C Homework
A nationwide shipping firm purchased a new computer system to track its shipments, pickups,
and deliveries. Employees were expected to need about 2 hours to learn how to use the system.
In fact, some employees could use the system in very little time, whereas others took
considerably longer. Someone suggested that the reason for this difference might be that only
some employees had experience with this kind of computer system. To test this suggestion,
independent samples of employees with and without such experience were randomly selected.
The times, in minutes, required for these employees to learn how to use the system are given in
Q-1C. At the 5% significance level, do the data provide sufficient evidence to conclude that the
mean learning time for all employees without experience exceeds the mean learning time for all
employees with experience? Assume that the two populations have the same shape.

• Normality test
• Equality of means (parametric or non-parametric approach)
Data file is given in excel file as Q-1C

Dependent Sample Case

Q-2A) (Paired Wilcoxon Test - Non-Parametric Approach)

The baseball coach suggests that a baseball clinic will help players raise their batting averages.
The data shows the batting averages of 14 players before participating in the clinic and two
months after participating in the clinic.

a) At the 5% significance level, do the data provide evidence to conclude that the differences in
batting averages are normally distributed?
As a first step, create a difference variable (After - Before) and check its normality. If the
differences are normally distributed, proceed with the parametric approach; otherwise, apply
the non-parametric method.
Ho: Differences in batting averages are normally distributed

Ha: Differences in batting averages are not normally distributed


Tests of Normality

        Kolmogorov-Smirnov(a)          Shapiro-Wilk
        Statistic   df   Sig.          Statistic   df   Sig.
Diff    .244        14   .024          .821        14   .009

a. Lilliefors Significance Correction

P-value < level of significance

0.009 < 0.05, so reject the null hypothesis (Shapiro-Wilk)
At the 5% significance level, the data provide evidence to conclude that the differences in
batting averages are not normally distributed.
Note: Because the difference variable fails the normality test, a non-parametric test is
appropriate here, namely the paired Wilcoxon test.
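The decision rule used above (test the differences for normality, then choose between the paired t test and the paired Wilcoxon test) can be sketched as follows. This is an illustrative Python/SciPy version, not the SPSS procedure itself; the function name `paired_test` is hypothetical, and the before/after values must come from the data file.

```python
import numpy as np
from scipy import stats


def paired_test(before, after, alpha=0.05, alternative="two-sided"):
    """Pick the paired test according to the normality of the differences."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    diff = after - before  # the difference variable (After - Before)

    # Shapiro-Wilk: H0 is that the differences are normally distributed.
    _, p_normal = stats.shapiro(diff)

    if p_normal >= alpha:
        name = "paired t test"
        _, p_value = stats.ttest_rel(after, before, alternative=alternative)
    else:
        name = "Wilcoxon signed-rank test"
        _, p_value = stats.wilcoxon(after, before, alternative=alternative)
    return name, float(p_normal), float(p_value)
```

Passing `alternative="greater"` tests whether the After values tend to exceed the Before values, which matches a claim that scores were raised.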
b) Based on your answer to part (a), can you conclude that the clinic helped the players
raise their batting averages? Which test is appropriate for testing the claim?
Ho: µ1=µ2

Ha: µ1< µ2
(Here µ1 refers to the batting averages before the clinic and µ2 to those after it.)

Test Statistics(a)

                          After - Before
Z                         -.175(b)
Asymp. Sig. (2-tailed)    .861

a. Wilcoxon Signed Ranks Test
b. Based on negative ranks.
P-value > level of significance
0.861 > 0.05, so do not reject the null hypothesis
At the 5% significance level, the data do not provide sufficient evidence to conclude that the
baseball clinic helped the players improve their batting averages.

Data file is given in excel file as Q-2
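As a quick check on the Wilcoxon output above, the asymptotic two-tailed p-value can be recovered from the reported Z statistic with the standard normal distribution. This is a small Python sketch using only the Z value printed by SPSS; the variable names are illustrative.

```python
from scipy.stats import norm

z = -0.175    # Z reported in the Wilcoxon Signed Ranks Test output
alpha = 0.05

# Two-tailed asymptotic p-value: twice the tail area beyond |Z|.
p_two_tailed = 2 * norm.cdf(-abs(z))
reject_h0 = bool(p_two_tailed < alpha)

print(round(p_two_tailed, 3), reject_h0)
```

The result agrees with the reported Asymp. Sig. of .861 to rounding. When the observed shift lies in the direction of Ha, the one-tailed p-value is half the two-tailed value, which here is still far above 0.05.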

Q2B (Paired t test – Parametric Approach)

A medical researcher claims that a new drug affects the number of headache hours experienced
by headache sufferers. The number of headache hours (per day) experienced by eight randomly
selected patients before and after taking the drug are shown in the table.
a) Use α = 0.05 to test whether the differences are normally distributed.
b) Based on your results from part (a), do you support the researcher’s claim?

Normality test
Ho : Differences in number of headache hours suffered are normally distributed

Ha: Differences in number of headache hours suffered are not normally distributed

Tests of Normality

        Kolmogorov-Smirnov(a)          Shapiro-Wilk
        Statistic   df   Sig.          Statistic   df   Sig.
diff    .201        8    .200*         .898        8    .279

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
P-value > level of significance
0.279 > 0.05, so do not reject the null hypothesis
At the 5% significance level, the data do not provide evidence to conclude that the differences
in the number of headache hours suffered are not normally distributed.

Note: Because the differences pass the normality test, the parametric approach is appropriate
(i.e., the paired-sample t test).

Ho: µ1=µ2

Ha: µ1> µ2
(Here µ1 is the mean number of headache hours before the drug and µ2 the mean after it.)
Paired Samples Test (Pair 1: Before - After)

Paired Differences
  Mean                                         .8500
  Std. Deviation                               .6655
  Std. Error Mean                              .2353
  90% Confidence Interval of the Difference    Lower .4042, Upper 1.2958
t                                              3.613
df                                             7
Sig. (2-tailed)                                .009

P-value < level of significance

0.009 < 0.05, so reject the null hypothesis (the one-tailed p-value, 0.009 / 2 = 0.0045, is smaller still)

At the 5% significance level, the data provide evidence to conclude that the drug is effective
in reducing the number of headache hours suffered.

c) Formulate and interpret the 90% confidence interval for the mean difference in the number
of headache hours suffered.
We are 90% confident that the mean difference (Before - After) in the number of headache
hours suffered lies between 0.4042 and 1.2958 hours.
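The paired t statistic and the 90% confidence interval can be recomputed from the summary values in the Paired Samples Test output (mean difference 0.8500, standard deviation 0.6655, n = 8). The sketch below is a Python cross-check, not part of the original SPSS workflow; the variable names are illustrative.

```python
import math
from scipy import stats

n = 8                 # number of patients (pairs)
mean_diff = 0.8500    # mean of Before - After
sd_diff = 0.6655      # standard deviation of the differences

se = sd_diff / math.sqrt(n)        # standard error of the mean difference
t_stat = mean_diff / se            # paired t statistic
df = n - 1

p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)
p_one_tailed = stats.t.sf(t_stat, df)      # for Ha: µ1 > µ2

# A 90% interval leaves 5% in each tail, so use the 0.95 quantile.
t_crit = stats.t.ppf(0.95, df)
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)

print(round(se, 4), round(t_stat, 3), round(ci[0], 4), round(ci[1], 4))
```

Because the inputs are rounded to four decimals, the results match the SPSS table only to rounding.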

Non-parametric approach to examine the association between two categorical variables

Chi Square test for independence

Q-3)

The data on gender and favorite way of eating ice cream are given. Test the hypothesis at
the 0.05 significance level that there is a significant association between gender and the
favorite way to eat ice cream.

Data file is given in excel file as Q-3

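Ho: Gender and favorite way to eat ice cream are independent (no association)

Ha: Gender and favorite way to eat ice cream are associated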

It is a test for categorical variables.

• It is a non-parametric test.
• SPSS does not allow you to change the level of significance for this test.
• To apply the chi-square test for independence, the frequencies of the categorical variables
must be given.
• To apply the chi-square test for independence in SPSS, you need to weight the cases by the
frequency variable.

E = (Row total * Column total) / Grand total

E11 = (1000 * 628) / 2200 = 285.45
You can examine the association between two categorical variables by comparing the cell
percentages by row or by column.
If the cell percentages are identical across rows (or across columns), we claim independence
of the two categorical variables.
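The expected-frequency formula above can be applied to a whole contingency table at once. The following Python sketch is illustrative only: the helper `expected_counts` is a hypothetical name, and the only numbers taken from the text are the row total 1000, column total 628 and grand total 2200 used for E11.

```python
import numpy as np


def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / grand total."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)   # shape (rows, 1)
    col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, cols)
    return row_totals * col_totals / observed.sum()


# Worked value from the text for the first cell.
e11 = 1000 * 628 / 2200
print(round(e11, 2))
```

Given the full observed table, `scipy.stats.chi2_contingency` returns the same expected counts together with the chi-square statistic and p-value.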

Non-parametric approach – Chi Square goodness of fit test

Q-4)

The distribution of the opinions of U.S. parents on whether a college education is worth the
expense is given. An economist believes that the distribution of the opinions of U.S. teenagers is
different from the distribution for U.S. parents. The economist randomly selects 200 U.S.
teenagers and asks each whether a college education is worth the expense. The results are shown
in the table. At the 5% level of significance, are the distributions different?

Data file is given in excel file as Q-4
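A minimal sketch of the chi-square goodness-of-fit test in Python, assuming the observed teenager counts and the parents' proportions have been read from the Q-4 sheet. The function name `goodness_of_fit` and its arguments are hypothetical.

```python
import numpy as np
from scipy import stats


def goodness_of_fit(observed_counts, claimed_proportions, alpha=0.05):
    """Chi-square goodness-of-fit test (H0: the sample follows the claimed distribution)."""
    observed = np.asarray(observed_counts, dtype=float)
    proportions = np.asarray(claimed_proportions, dtype=float)

    # Expected counts: claimed proportion of each category times the sample size.
    expected = proportions * observed.sum()

    chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
    return float(chi2_stat), float(p_value), bool(p_value < alpha)
```

The claimed proportions must sum to 1 so that the expected counts add up to the sample size (200 in this question).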

Analysis of Variance - Parametric Approach (Case 1)

Q-5A

a) The number of grams of fiber per serving for a random sample of three different
kinds of foods is listed. Is there sufficient evidence at the 0.05 level of significance to
conclude that there is a difference in mean fiber content among breakfast cereals,
fruits, and vegetables?

b) At the 0.05 level of significance, is there evidence that variances of fiber content
differ?

Q-5B The data describe the salaries of undergraduate workers from four regions. Test the
hypothesis at the 0.05 significance level that the mean salary of undergraduates differs across
the four regions.
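Both Q-5A and Q-5B combine a test of equal variances with a one-way ANOVA. A minimal Python sketch is shown below, assuming each group has been loaded as a separate list; the function name `one_way_anova` and the result keys are hypothetical, and Levene's test is used for the variance check.

```python
from scipy import stats


def one_way_anova(*groups, alpha=0.05):
    """Levene's test for equal variances followed by one-way ANOVA for equal means."""
    # Levene's test: H0 is that all groups have the same variance.
    _, p_levene = stats.levene(*groups)

    # One-way ANOVA: H0 is that all group means are equal.
    f_stat, p_anova = stats.f_oneway(*groups)

    return {
        "p_levene": float(p_levene),
        "f_stat": float(f_stat),
        "p_anova": float(p_anova),
        "reject_equal_variances": bool(p_levene < alpha),
        "reject_equal_means": bool(p_anova < alpha),
    }
```

For Q-5A the call would take three groups (cereals, fruits, vegetables) and for Q-5B four groups, one per region.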

Q-5c) Analysis of Variance - Non-parametric approach

Kruskal-Wallis (KWS) test

A researcher believes that the mean earnings of top-paid actors, athletes, and musicians are the
same. The earnings (in millions of dollars) for several randomly selected people from each
category are shown in the table. Assume that the populations are normally distributed,
the samples are independent, and the population variances are equal. At α = 0.10, can you reject
the claim that the mean earnings are the same for the three categories?
Data file is given in excel file as Q-5c
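The Kruskal-Wallis test named in the heading can be sketched in Python as follows, assuming the three earnings samples are loaded from the Q-5c sheet. The function name `kruskal_wallis` is hypothetical, and the default α of 0.10 follows the question.

```python
from scipy import stats


def kruskal_wallis(*groups, alpha=0.10):
    """Kruskal-Wallis H test (H0: all groups come from the same distribution)."""
    h_stat, p_value = stats.kruskal(*groups)
    return float(h_stat), float(p_value), bool(p_value < alpha)
```

The test ranks all observations together, so it does not rely on the normality assumption that one-way ANOVA needs.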

Inferential Methods in Regression and Correlation


Q-7)
6 marks
A critically important aspect of customer service in a supermarket is the waiting time at the
checkout (defined as the time the customer enters the line until he or she is served). Data were
collected during time periods in which a constant number of checkout counters were open. The
total number of customers in the store and the waiting times (in minutes) were recorded.

Data file is given in excel file as Q-7

a) At the 0.05 level of significance, is there evidence of a linear relationship between the
number of customers and the waiting time on the checkout line?
b) At the 0.05 level of significance, can we conclude that the number of customers is a
significant determinant of waiting time on the checkout line?
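A minimal sketch of the regression test in Python, assuming the customer counts and waiting times have been read from the Q-7 sheet. The function name `slope_test` and the result keys are hypothetical. In simple linear regression the t test on the slope and the t test on the correlation coefficient give the same p-value, so one call addresses both parts.

```python
from scipy import stats


def slope_test(customers, waiting_time, alpha=0.05):
    """Simple linear regression of waiting time on number of customers (H0: slope = 0)."""
    result = stats.linregress(customers, waiting_time)
    return {
        "slope": float(result.slope),
        "intercept": float(result.intercept),
        "r": float(result.rvalue),
        "p_value": float(result.pvalue),   # two-sided p-value for H0: slope = 0
        "reject_zero_slope": bool(result.pvalue < alpha),
    }
```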

One Population Variance case


Q-8) 4 marks
A researcher knows from past studies that the standard deviation of the time it takes to inspect a
car is 16.8 minutes. A sample of 24 cars is selected and inspected. The sample standard deviation
is 12.5 minutes. At α = 0.05, can it be concluded that the standard deviation has changed?
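One way to approach Q-8 is the chi-square test for a single variance, with statistic (n - 1)s² / σ0² on n - 1 degrees of freedom. The Python sketch below uses only the numbers stated in the question and assumes a two-tailed test, since the question asks whether the standard deviation has changed; the variable names are illustrative.

```python
from scipy import stats

sigma0 = 16.8   # standard deviation from past studies (minutes)
n = 24          # sample size
s = 12.5        # sample standard deviation (minutes)
alpha = 0.05

df = n - 1
chi2_stat = df * s**2 / sigma0**2

# Two-tailed critical values for H0: sigma = sigma0.
lower = stats.chi2.ppf(alpha / 2, df)
upper = stats.chi2.ppf(1 - alpha / 2, df)
reject_h0 = bool(chi2_stat < lower or chi2_stat > upper)

print(round(chi2_stat, 3), round(lower, 3), round(upper, 3), reject_h0)
```

The null hypothesis is rejected only if the statistic falls outside the interval between the two critical values.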
