
Assignment 4

The document defines statistical terms related to hypothesis testing using t-tests and z-tests. It includes: 1) Definitions of Cohen's d, confidence intervals, confidence limits, critical values of r and t, degrees of freedom, and the sampling distribution of t. 2) Characteristics of the sampling distribution of t and an example to explain degrees of freedom. 3) Assumptions underlying proper use of the t-test. 4) Similarities and differences between the z-test and t-test. 5) An explanation of why the z-test is more powerful than the t-test.

Uploaded by

Manya
Copyright
© All Rights Reserved

Statistics Assignment

(Student’s t Test for Single Samples)

Submitted by: Manya Krishna

MPCP Ist year, IBS-GFSU


Quesn 1: Define

1. Cohen's d: Cohen has provided a simple method for determining the magnitude of a real effect. Used with the t test, the method relies on the fact that there is a direct relationship between the size of the real effect and the size of the mean difference. Cohen's d is defined as

$$d = \frac{|\bar{X}_{obt} - \mu|}{\sigma}$$

When the population standard deviation σ is unknown, it is estimated by the sample standard deviation s, giving $\hat{d} = |\bar{X}_{obt} - \mu| / s$.

Cohen's d is an appropriate effect size for the comparison between two means. It can be used, for example, to accompany the reporting of t-test and ANOVA results. It is also widely used in meta-analysis.

2. Confidence interval: A confidence interval is a range of values, computed from sample data, within which a population parameter is estimated to lie. The confidence level states the proportion of such intervals that, over repeated random sampling, would contain the parameter. Confidence intervals therefore express the degree of certainty (or uncertainty) of an estimate. Any confidence level can be chosen; the most common are 95% and 99%.

3. Confidence limits: Confidence limits are the values (the lower and upper limits) that bound the confidence interval.

4. Critical value of r: The critical value of a statistic is the value that bounds the critical region. Critical values of r are cut-off values that define the region where the obtained r is unlikely to lie if the population correlation is zero. To determine whether a correlation exists in the population, it is necessary to test the significance of the obtained r by comparing it with the critical value.

5. Critical value of t: Critical values of t are cut-off values that define the region where tobt is unlikely to lie if the null hypothesis is true; if |tobt| ≥ tcrit, H0 is rejected.

6. The degrees of freedom: The degrees of freedom (df) for any statistic are the number of scores that are free to vary in calculating that statistic.

7. Sampling distribution of t: The sampling distribution of t is a probability distribution of the t values that would occur if all possible samples of a fixed size N were drawn from the null-hypothesis population. It gives:
a) all the possible different t values for samples of size N, and
b) the probability of getting each value if sampling is random from the null-hypothesis population.

8. Student's t test: Student's t test is a practical, quite powerful test widely used in the behavioural sciences. It is used for (a) testing hypotheses in single-sample experiments, (b) estimating the population mean by constructing confidence intervals, and (c) testing the significance of Pearson r.
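For the Cohen's d definition in item 1, a minimal Python sketch of the single-sample formula is shown below. The function name `cohens_d` is an illustrative choice, not a library API; the example values are taken from Questions 18 and 19 of this assignment.

```python
def cohens_d(sample_mean, mu, sd):
    """Single-sample effect size |sample_mean - mu| / sd.

    Pass sigma when the population SD is known; pass the sample SD s
    otherwise, which gives the estimate d-hat.
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return abs(sample_mean - mu) / sd


print(round(cohens_d(8, 6, 1.309), 2))    # 1.53 (values from Quesn 19)
print(round(cohens_d(6.1, 5.4, 2.7), 2))  # 0.26 (values from Quesn 18)
```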

Quesn 2: Assuming the assumptions underlying the t test are met, what are the characteristics of the sampling distribution of t?

The sampling distribution of t is the distribution of t values that would be obtained if a value of t were calculated for each sample mean for all possible random samples of a given size from a population.

Its characteristics are as follows:

a) It is symmetrical, unimodal and bell-shaped (similar to the normal curve), with a mean of 0 (for df > 1).
b) For small values of df, the tails are elevated (there is more area in the tails) and the peak is lower than that of the corresponding normal curve.
c) It gets closer and closer to the normal curve as the number of df increases.
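The characteristics listed above can be checked numerically. The sketch below evaluates the standard formula for the t density using Python's `math.gamma` and compares it with the standard normal density; the helper names `t_pdf` and `normal_pdf` are illustrative, not from any library.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x: float) -> float:
    """Density of the standard normal curve."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Compare the height at the centre (x = 0) and in the tail (x = 3).
for df in (2, 10, 30, 100):
    print(f"df={df:>3}  peak={t_pdf(0, df):.4f}  tail at 3={t_pdf(3, df):.4f}")
print(f"normal  peak={normal_pdf(0):.4f}  tail at 3={normal_pdf(3):.4f}")
```

As df grows, the peak rises and the tail height falls toward the normal values, which is characteristic (c).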

Quesn 3. Elaborate on what is meant by degrees of freedom. Use an example.

The degrees of freedom in a statistical calculation represent how many of the values involved in the calculation are free to vary. The degrees of freedom must be determined correctly for the results of chi-square tests, t tests and the more advanced F tests to be valid. These tests are commonly used to compare observed data with the data that would be expected according to a specific hypothesis.

For example, suppose a drug trial is conducted on a group of patients and it is hypothesized that the patients receiving the drug will show increased heart rates compared with those who did not receive the drug. The results can then be analyzed to determine whether the difference in heart rates is significant, and the degrees of freedom are part of that calculation. A simpler illustration: if 5 scores have a mean of 10, they must sum to 50; if four of them are 8, 12, 9 and 11, the fifth must be 10. Only 4 (that is, N − 1) of the scores are free to vary once the mean is fixed.

Because degrees of freedom identify how many values in the final calculation are allowed to vary, they contribute to the validity of the outcome. They depend on the sample size (the number of observations) and on the number of parameters to be estimated; in general, the degrees of freedom equal the number of observations minus the number of parameters estimated. This means there are more degrees of freedom with a larger sample size.
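A tiny Python sketch of the "free to vary" idea: once the mean of a set of scores is fixed, the last score is determined by the others. The five scores here are hypothetical values chosen for illustration.

```python
# Five scores whose mean is fixed at 10, so their sum must be 50.
n, mean = 5, 10
free_scores = [8, 12, 9, 11]              # these four could be any values
last_score = n * mean - sum(free_scores)  # the fifth is forced by the mean
scores = free_scores + [last_score]

deviations = [x - mean for x in scores]
print(last_score)         # 10
print(sum(deviations))    # 0: deviations from the mean always sum to zero
print("df =", n - 1)      # df = 4
```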

Quesn 4. What are the assumptions underlying the proper use of the t test?

1. The first assumption concerns the scale of measurement: the data should be measured on an interval or ratio (continuous) scale, such as scores on an IQ test.
2. The second assumption is that of a simple random sample: the data are collected from a representative, randomly selected portion of the population.
3. The third assumption is that the population of raw scores is normally distributed (a bell-shaped curve), so that the sampling distribution of the mean is also normal. When normality can be assumed, one can specify a level of probability (alpha level, level of significance, p) as the criterion for rejecting H0; in most cases α = 0.05 is used.
4. The fourth assumption is that a reasonably large sample is used when the population is not normal. With a larger sample, the sampling distribution of the mean approaches a normal, bell-shaped curve even if the raw scores are not normal.
5. The final assumption, which applies when two samples are compared, is homogeneity of variance. Homogeneous, or equal, variance exists when the standard deviations of the groups are approximately equal.

Quesn 5. Discuss the similarities and differences between the z and t tests.

Similarities:
1. Both test a hypothesis about a single population mean by comparing a sample mean with the mean specified by H0.
2. Both compute the obtained statistic in the same way: (sample mean − population mean) divided by the standard error of the mean.
3. Both assume random sampling and a normally distributed sampling distribution of the mean.
4. Both sampling distributions are symmetrical and bell-shaped and centred on 0, so the obtained value can be positive or negative.

Differences:
1. Standardization: the z test uses the population standard deviation σ; the t test uses the sample standard deviation s as an estimate of σ.
2. When used: the z test is used when σ is known (and, as a common rule of thumb, when N is above 30); the t test is used when σ is unknown, especially when the sample size is below 30.
3. Distribution: zobt is evaluated against the normal (z) distribution; tobt is evaluated against the t distribution with df = N − 1.
4. Spread: the z distribution has a standard deviation of 1; the t distribution is more spread out (standard deviation greater than 1, with elevated tails), so tcrit is larger than zcrit for the same alpha level.
5. Sample size: the z distribution is the same for every N, whereas the t distribution changes with df; as N increases, the t distribution approaches the normal curve and the two tests give nearly the same result.

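Quesn 6. Explain in a short paragraph why the z test is more powerful than the t test.

The z test is more powerful than the t test because with the z test we know σ (the population standard deviation), whereas with the t test we must estimate it from the sample. Estimating σ with s adds sampling variability to the denominator of the test statistic, so the t distribution has elevated tails and its critical values are larger than those of z for the same alpha level (for example, 2.064 for df = 24 versus 1.96 for z, two-tailed at α = 0.05). A given mean difference is therefore less likely to reach significance with the t test, so the z test has a higher probability of detecting a real effect. The difference shrinks as N increases, because df = N − 1 grows and the t distribution approaches the normal curve.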

Quesn 7. Which of the following two statements is technically more correct? (1) We are 95% confident that the population mean lies in the interval 80–90, or (2) We are 95% confident that the interval 80–90 contains the population mean. Explain.

The second statement, "We are 95% confident that the interval 80–90 contains the population mean," is technically more correct. The population mean is a fixed value; it does not vary from sample to sample. What varies is the interval: a 95% confidence interval is produced by a procedure that, over repeated random sampling, yields intervals containing the true population mean 95% of the time, so the confidence refers to the interval rather than to the mean. This is also not the same as a range that contains 95% of the individual values.
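The repeated-sampling interpretation in statement (2) can be illustrated with a small simulation. This is a sketch under stated assumptions: a hypothetical normal population with known σ, so the interval uses z rather than t. The population mean never changes; only the interval moves from sample to sample, and about 95% of the intervals capture it.

```python
import random
from statistics import NormalDist

random.seed(42)
MU, SIGMA, N, REPS = 85.0, 15.0, 25, 10000   # hypothetical population and design
z_crit = NormalDist().inv_cdf(0.975)          # about 1.96 for 95% confidence
half_width = z_crit * SIGMA / N ** 0.5

hits = 0
for _ in range(REPS):
    xbar = sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    # MU is fixed; only the interval (xbar - half_width, xbar + half_width) changes
    if xbar - half_width <= MU <= xbar + half_width:
        hits += 1

coverage = hits / REPS
print(f"{coverage:.3f} of the intervals contained mu")  # roughly 0.95
```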

Quesn 8. Explain why df = N − 1 when the t test is used with single samples.

The degrees of freedom (df) refer to the amount of information in our data that we can "spend" to estimate the values of unknown population parameters and to calculate the variability of these estimates. This value is determined by the number of observations in the sample: increasing the sample size provides more information about the population and thus increases the degrees of freedom.

The single-sample t test estimates only one parameter: the population mean. A sample of size N provides N pieces of information for estimating the population mean and its variability. One degree of freedom is spent estimating the mean (the deviations from the sample mean must sum to zero), and the remaining N − 1 degrees of freedom estimate variability, which is why s is computed as √(SS/(N − 1)). Therefore, a single-sample t test uses a t distribution with N − 1 degrees of freedom.
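One way to see why the N − 1 divisor is used: when deviations are taken about the sample mean, dividing SS by N systematically underestimates the population variance, while dividing by N − 1 does not. The simulation below is a sketch assuming a hypothetical normal population with variance 1 and samples of size 5.

```python
import random

random.seed(1)
N, TRIALS = 5, 20000
est_n_minus_1, est_n = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0, 1) for _ in range(N)]   # population variance = 1
    m = sum(sample) / N
    ss = sum((x - m) ** 2 for x in sample)             # deviations about the SAMPLE mean
    est_n_minus_1.append(ss / (N - 1))
    est_n.append(ss / N)

avg_n_minus_1 = sum(est_n_minus_1) / TRIALS
avg_n = sum(est_n) / TRIALS
print(round(avg_n_minus_1, 3), round(avg_n, 3))   # close to 1.0 and 0.8
```

The N divisor averages about (N − 1)/N = 0.8 of the true variance, reflecting the one degree of freedom spent on estimating the mean.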

Quesn 9. If the sample correlation coefficient has a value different from zero (e.g., r = 0.45), this automatically means that the correlation in the population is also different from zero. Is this statement correct? Explain.

No. A sample correlation coefficient different from zero does not necessarily mean that the correlation in the population is different from zero, because r varies from sample to sample and a nonzero r can arise by chance even when the population correlation ρ is zero. To verify this, we need to conduct a test of significance (a t test) of the obtained correlation coefficient (r = 0.45 in this case), using $t = r\sqrt{N-2}/\sqrt{1-r^2}$ with df = N − 2. A hypothesis test of the "significance of the correlation coefficient" helps to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.
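A minimal sketch of the significance test for r. The question does not give a sample size, so the N values below (20 and 10) are hypothetical; they show that the same r = 0.45 can be significant with one N and not with another. The helper name `t_for_r` is illustrative.

```python
import math


def t_for_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, evaluated with df = n - 2."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# r = 0.45 from the question; both sample sizes are hypothetical
print(round(t_for_r(0.45, 20), 2))  # 2.14, compare with tcrit = 2.101 (df = 18, alpha = 0.05 two-tailed)
print(round(t_for_r(0.45, 10), 2))  # 1.43, compare with tcrit = 2.306 (df = 8, alpha = 0.05 two-tailed)
```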

Quesn 10. For the same set of sample scores, is the 99% confidence interval for the population mean greater or smaller than the 95% confidence interval? Does this make sense? Explain.

Ans: The 99% confidence interval will be wider than the 95% confidence interval, because to be more confident that the true population value falls within the interval we need to allow more potential values within the interval.

From the sample mean $\bar{X}$ we estimate the population mean μ:

$$\mu = \bar{X} \pm z \cdot SE$$

The upper limit of the interval is $\mu_{upper} = \bar{X} + z \cdot SE$ and the lower limit is $\mu_{lower} = \bar{X} - z \cdot SE$, where:
μ is the population mean
$\bar{X}$ is the sample mean
SE is the standard error of the mean
z is the critical value, determined by the confidence level: 1.96 for 95% and 2.58 for 99%.

Since z is larger for 99% confidence, the interval is wider and therefore more likely to contain the population mean. This makes sense: greater confidence is obtained at the cost of a less precise (wider) interval.
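A short Python sketch of the z-based interval above, using `statistics.NormalDist` from the standard library to obtain the critical values. The sample mean of 50 and standard error of 2 are hypothetical; `z_interval` is an illustrative helper name.

```python
from statistics import NormalDist


def z_interval(xbar, se, level):
    """Return (lower, upper) = xbar -/+ z * SE, with z taken from the standard normal."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return xbar - z * se, xbar + z * se


# Hypothetical sample: mean 50 with a standard error of 2.
widths = {}
for level in (0.95, 0.99):
    lower, upper = z_interval(50, 2, level)
    widths[level] = upper - lower
    print(f"{level:.0%}: {lower:.2f} to {upper:.2f} (width {upper - lower:.2f})")
```

The same SE produces a wider interval at 99% because the critical value grows from about 1.96 to about 2.58.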
Quesn 11: A sample set of 30 scores has a mean equal to 82 and a standard deviation of 12.
Can we reject the hypothesis that this sample is a random sample from a normal population
with M= 85?
Ans 11: From the given question it can be inferred that:
n = 30, x̄ = 82, s = 12, μ = 85
df = n − 1 = 29

$$t_{obt} = \frac{\bar{X}_{obt} - \mu}{s/\sqrt{n}} = \frac{82 - 85}{12/\sqrt{30}} = \frac{-3}{2.19} = -1.37$$

|tobt| = 1.37, and tcrit for 29 df (α = 0.01, two-tailed) = 2.756.

Since |tobt| < 2.756, we retain H0. It is reasonable to consider the sample a random sample from a normal population with μ = 85.
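The single-sample t computation can be sketched in Python as follows. Python's standard library has no t distribution, so the critical value is copied from the t table value used above (2.756 corresponds to df = 29, α = 0.01 two-tailed); `one_sample_t` is an illustrative helper name.

```python
import math


def one_sample_t(xbar: float, s: float, n: int, mu: float) -> float:
    """t_obt = (xbar - mu) / (s / sqrt(n)), evaluated with df = n - 1."""
    return (xbar - mu) / (s / math.sqrt(n))


t_obt = one_sample_t(82, 12, 30, 85)
t_crit = 2.756  # table value: df = 29, alpha = 0.01 two-tailed
print(round(t_obt, 2))                                       # -1.37
print("reject H0" if abs(t_obt) >= t_crit else "retain H0")  # retain H0
```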

Quesn 12: A sample set of 29 scores has a mean of 76 and a standard deviation of 7. Can we
accept the hypothesis that the sample is a random sample from a population with a mean
greater than 72?
Ans 12: From the given question it can be inferred that:
n = 29, x̄ = 76, s = 7, μ = 72
df = n − 1 = 28

$$t_{obt} = \frac{\bar{X}_{obt} - \mu}{s/\sqrt{n}} = \frac{76 - 72}{7/\sqrt{29}} = \frac{4}{1.30} = 3.08$$

tobt = 3.08, and tcrit for 28 df (α = 0.01, one-tailed) = 2.467. Since tobt > 2.467, we reject H0, which specifies that the sample is a random sample from a population with a mean ≤ 72. Therefore, we can accept the hypothesis that the sample is a random sample from a population with a mean > 72.
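For the directional case, only a large positive t counts against H0. A minimal sketch, with the critical value copied from the t table value used above (2.467 corresponds to df = 28, α = 0.01 one-tailed):

```python
import math

xbar, s, n, mu0 = 76, 7, 29, 72
t_obt = (xbar - mu0) / (s / math.sqrt(n))
t_crit = 2.467  # table value: df = 28, alpha = 0.01 one-tailed (upper tail)

print(round(t_obt, 2))  # 3.08
# Directional H1 (mu > 72): compare the signed t with the upper-tail critical value.
print("reject H0" if t_obt >= t_crit else "retain H0")  # reject H0
```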

Quesn 13: Is it reasonable to consider a sample with N=22,Xobt=42, and s=9 to be a random
sample from a normal population with M=38?
From the given question it can be inferred that:

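n = 22, Xobt = 42, s = 9, μ = 38

$$t_{obt} = \frac{\bar{X}_{obt} - \mu}{s/\sqrt{n}} = \frac{42 - 38}{9/\sqrt{22}} = \frac{4}{1.9188} = 2.085$$

Now for df = 21, tcrit = 2.080 (α = 0.05, two-tailed, since no direction is specified).

Since tobt > tcrit, we reject H0; it is not reasonable to consider the sample a random sample from a normal population with μ = 38.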

Quesn 14: Using each of the following random samples, determine the 95% and 99%
confidence intervals for the population mean
a).Xobt =25,s=6,N=15

b).Xobt =120,s=8,N=30

c). Xobt = 30.6, s = 5.5, N = 24

d)Redo part a with N = 30. What happens to the confidence interval as N increases?

a)Xobt =25,s=6,N=15

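$\mu_{lower} = \bar{X}_{obt} - s_{\bar{X}}\, t_{crit}$ (general equation for the lower confidence limit)

$\mu_{upper} = \bar{X}_{obt} + s_{\bar{X}}\, t_{crit}$ (general equation for the upper confidence limit)

where $s_{\bar{X}} = s/\sqrt{N}$ is the estimated standard error of the mean and tcrit is the two-tailed critical value for df = N − 1.

sX̄ = 6/√15 = 1.55, df = 14

For 99% (α = 0.01 two-tailed, i.e. 0.005 in each tail), tcrit = 2.977:

μlower = 25 − 1.55(2.977) = 20.39

μupper = 25 + 1.55(2.977) = 29.61

For 95% (α = 0.05 two-tailed, i.e. 0.025 in each tail), tcrit = 2.145:

μlower = 25 − 1.55(2.145) = 21.68

μupper = 25 + 1.55(2.145) = 28.32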

d) N = 30

sX̄ = 6/√30 = 1.095, df = 29

For 99% (tcrit = 2.756):

μlower = 25 − 1.095(2.756) = 21.98

μupper = 25 + 1.095(2.756) = 28.02

For 95% (tcrit = 2.045):

μlower = 25 − 1.095(2.045) = 22.76

μupper = 25 + 1.095(2.045) = 27.24

With the increase in N from 15 to 30, the confidence interval becomes narrower, because the standard error decreases and, with more degrees of freedom, tcrit also decreases.

b) Xobt = 120, s = 8, N = 30

sX̄ = 8/√30 = 1.46, df = 29

For 99% (tcrit = 2.756):

μlower = 120 − 1.46(2.756) = 115.98

μupper = 120 + 1.46(2.756) = 124.02

For 95% (tcrit = 2.045):

μlower = 120 − 1.46(2.045) = 117.01

μupper = 120 + 1.46(2.045) = 122.99

c) Xobt = 30.6, s = 5.5, N = 24

sX̄ = 5.5/√24 = 1.12, df = 23

For 99% (tcrit = 2.807):

μlower = 30.6 − 1.12(2.807) = 27.46

μupper = 30.6 + 1.12(2.807) = 33.74

For 95% (tcrit = 2.069):

μlower = 30.6 − 1.12(2.069) = 28.28

μupper = 30.6 + 1.12(2.069) = 32.92
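The confidence-limit equations above can be sketched as a small Python helper. `t_interval` is a hypothetical name, and the critical values are copied from the worked answers because the standard library has no t distribution; the standard error is computed without intermediate rounding.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """(lower, upper) = xbar -/+ t_crit * s / sqrt(n), rounded to 2 decimals.

    t_crit must be the two-tailed table value for df = n - 1.
    """
    se = s / math.sqrt(n)
    return round(xbar - t_crit * se, 2), round(xbar + t_crit * se, 2)


print(t_interval(25, 6, 15, 2.145))   # part a, 95%: (21.68, 28.32)
print(t_interval(25, 6, 15, 2.977))   # part a, 99%: (20.39, 29.61)
print(t_interval(25, 6, 30, 2.045))   # part d, 95%: (22.76, 27.24)
print(t_interval(25, 6, 30, 2.756))   # part d, 99%: (21.98, 28.02)
```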

Quesn 15) In Problem 21 of Chapter 12, a student conducted an experiment on 25 schizophrenic patients to test the effect of a new technique on the amount of time schizophrenics need to stay institutionalized. The results showed that under the new treatment, the 25 schizophrenic patients stayed a mean duration of 78 weeks, with a standard deviation of 20 weeks. Previously collected data on a large number of schizophrenic patients showed a normal distribution of scores, with a mean of 85 weeks and a standard deviation of 15 weeks. For the present problem, assume that the standard deviation of the population is unknown. Also explain the difference in conclusion between Problem 21 and this one.

i) N = 25, Xobt = 78, s = 20, μ = 85 (σ = 15 is treated as unknown)

$$t_{obt} = \frac{\bar{X}_{obt} - \mu}{s/\sqrt{N}} = \frac{78 - 85}{20/\sqrt{25}} = \frac{-7}{4} = -1.75$$

Now for df = 24, tcrit = 2.064 (α = 0.05, two-tailed).

Since |tobt| < tcrit, we retain H0. Therefore, it cannot be concluded that the new technique reduces the amount of time schizophrenic patients need to stay institutionalized.

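ii) In Problem 21 of Chapter 12, σ = 15 was known and the z test was used: zobt = (78 − 85)/(15/√25) = −2.33, whose magnitude exceeds the critical value, so H0 was rejected there. The difference in conclusion arises because the z test is more sensitive: it uses the known population SD, whereas the t test must estimate it from the sample. Here the estimate (s = 20) is larger than σ = 15, which enlarges the standard error, and the t test also uses a larger critical value (2.064 instead of 1.96).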
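The contrast between the two analyses can be sketched by computing both statistics on the same data. The sketch assumes α = 0.05 two-tailed for the z test as well (1.96), matching the alpha used for the t test above; the Chapter 12 alpha is not restated in this problem.

```python
import math

xbar, mu, n = 78, 85, 25
sigma, s = 15, 20   # population SD (Problem 21) vs sample SD (this problem)

z_obt = (xbar - mu) / (sigma / math.sqrt(n))
t_obt = (xbar - mu) / (s / math.sqrt(n))
z_crit, t_crit = 1.96, 2.064   # assumed alpha = 0.05 two-tailed; t_crit for df = 24

print(round(z_obt, 2), abs(z_obt) >= z_crit)  # -2.33 True  -> reject H0
print(round(t_obt, 2), abs(t_obt) >= t_crit)  # -1.75 False -> retain H0
```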

Quesn 16: As the principal of a private high school, you are interested in finding out how the
training in mathematics at your school compares with that of the public schools in your area.
For the last 5 years, the public schools have given all graduating seniors a mathematics
proficiency test. The distribution has a mean of 78. You give all the graduating seniors in
your school the same mathematics proficiency test. The results show a distribution of 41
scores, with a mean of 83 and a standard deviation of 12.2.

a. What is the alternative hypothesis? Use a nondirectional hypothesis.

b. What is the null hypothesis?

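Alternative hypothesis (H1): The mathematics proficiency of graduating seniors at the private school differs from that of graduating seniors in the public schools; that is, the sample is a random sample from a population with μ ≠ 78.

Null hypothesis (H0): The mathematics proficiency of graduating seniors at the private school does not differ from that of the public schools; that is, the sample is a random sample from a population with μ = 78.

$$t_{obt} = \frac{\bar{X}_{obt} - \mu}{s/\sqrt{N}} = \frac{83 - 78}{12.2/\sqrt{41}} = \frac{5}{1.905} = 2.62$$

tcrit (df = 40, α = 0.05, two-tailed) = 2.021

Since tobt > tcrit, the null hypothesis is rejected: the graduating seniors at the private school score differently from (higher than) public school seniors on the mathematics proficiency test.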
Quesn 17) A college counselor wants to determine the average amount of time first-year
students spend studying. He randomly samples 61 students from the freshman class and asks
them how many hours a week they study. The mean of the resulting scores is 20 hours, and
the standard deviation is 6.5 hours.

a. Construct the 95% confidence interval for the population mean.

b. Construct the 99% confidence interval for the population mean.

a) Xobt = 20, N = 61, df = 60, s = 6.5, sX̄ = 6.5/√61 = 0.832

For 95% (tcrit = 2.000):

μlower = 20 − 0.832(2.000) = 18.34

μupper = 20 + 0.832(2.000) = 21.66

b) For 99% (tcrit = 2.660):

μlower = 20 − 0.832(2.660) = 17.79

μupper = 20 + 0.832(2.660) = 22.21
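A quick Python check of both intervals. The critical values are copied from a t table for df = 60 (an assumption of this sketch, since the standard library cannot compute them).

```python
import math

xbar, s, n = 20, 6.5, 61
se = s / math.sqrt(n)              # estimated standard error of the mean
t_table = {95: 2.000, 99: 2.660}   # two-tailed critical values for df = 60, from a t table

intervals = {}
for level, t_crit in t_table.items():
    intervals[level] = (round(xbar - t_crit * se, 2), round(xbar + t_crit * se, 2))
print(intervals)  # {95: (18.34, 21.66), 99: (17.79, 22.21)}
```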

Quesn 18: A professor in the women’s studies program believes that the amount of smoking
by women has increased in recent years. A complete census taken 2 years ago of women
living in a neighbouring city showed that the mean number of cigarettes smoked daily by the
women was 5.4 with a standard deviation of 2.5. To assess her belief, the professor
determined the daily smoking rate of a random sample of 200 women currently living in that
city. The data show that the number of cigarettes smoked daily by the 200 women has a mean
of 6.1 and a standard deviation of 2.7.
a. Is the professor's belief correct? Assume a directional H1. Be sure that the most sensitive test is used to analyze the data.
b. Assume the population standard deviation is unknown and reanalyze the data using the same alpha level. What is your conclusion this time?
c. Explain any differences between part a and part b.
d. Determine the size of the effect found in part b.

From the above question it can be inferred that Xobt = 6.1, μ = 5.4, n = 200, σ = 2.5, s = 2.7.

Null hypothesis (H0): There has been no increase in the amount of smoking by women of that city in recent years (μ ≤ 5.4).

Alternative hypothesis (H1): There has been an increase in the amount of smoking by women of that city in recent years (μ > 5.4).

a) Since σ is known, we use the z test, which is the most sensitive test in this case.

$$z_{obt} = \frac{\bar{X}_{obt} - \mu}{\sigma/\sqrt{n}} = \frac{6.1 - 5.4}{2.5/\sqrt{200}} = 3.96$$

For α = 0.05 (one-tailed), zcrit = 1.645. Since zobt > zcrit, we reject H0 and accept the directional hypothesis H1.

It can hence be inferred that there has been a significant increase in the amount of smoking by women of that city in recent years.

b) If we assume that the population SD is not given, we use the t test to test the null hypothesis.

$$t_{obt} = \frac{\bar{X}_{obt} - \mu}{s/\sqrt{n}} = \frac{6.1 - 5.4}{2.7/\sqrt{200}} = 3.67$$

For df = 199 and α = 0.05 (one-tailed), tcrit ≈ 1.653 (close to the z value of 1.645 because df is large). Since tobt > tcrit, we reject H0 and accept the directional hypothesis H1. It can hence be inferred that there has been a significant increase in the amount of smoking by women of that city in recent years.

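c) There is no difference in the conclusion: H0 is rejected both with the z test (part a) and with the t test (part b). The obtained value is smaller with the t test (3.67 versus 3.96) because s = 2.7 is larger than σ = 2.5 and tcrit is slightly larger than zcrit, but with n = 200 both values are far beyond their critical values, so the loss of sensitivity does not change the decision.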

d) Cohen's d is used to find the effect size (s is used because the population SD is assumed to be unknown):

$$\hat{d} = \frac{|\bar{X}_{obt} - \mu|}{s} = \frac{6.1 - 5.4}{2.7} = 0.26$$

According to Cohen's criteria, this is suggestive of a medium effect size.
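The three computations in this question (z with σ known, t with σ estimated by s, and the estimated Cohen's d) can be reproduced in a few lines of Python; this sketch only recomputes the statistics and leaves the decisions to the critical values quoted above.

```python
import math

xbar, mu, n = 6.1, 5.4, 200
sigma, s = 2.5, 2.7

z_obt = (xbar - mu) / (sigma / math.sqrt(n))   # part a: sigma known
t_obt = (xbar - mu) / (s / math.sqrt(n))       # part b: sigma estimated by s
d_hat = abs(xbar - mu) / s                     # part d: estimated Cohen's d

print(round(z_obt, 2), round(t_obt, 2), round(d_hat, 2))  # 3.96 3.67 0.26
```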


Quesn 19: A cognitive psychologist believes that a particular drug improves short-term
memory. The drug is safe, with no side effects. An experiment is conducted in which 8
randomly selected subjects are given the drug and then given a short time to memorize a list
of 10 words. The subjects are then tested for retention 15 minutes after the memorization
period. The number of words correctly recalled by each subject is as follows: 8, 9, 10, 6, 8, 7,
9, 7. Over the past few years, the psychologist has collected a lot of data using this task with
similar subjects. Although he has lost the original data, he remembers that the mean was 6
words correctly recalled and that the data were normally distributed.

a. On the basis of these data, what can we conclude about the effect of the drug on short-term memory?

b. Determine the size of the effect.

It can be inferred that N = 8, μ = 6

X     X − X̄     (X − X̄)²
8      0         0
9      1         1
10     2         4
6     −2         4
8      0         0
7     −1         1
9      1         1
7     −1         1
∑X = 64          SS = ∑(X − X̄)² = 12

Xobt = ∑X/N = 64/8 = 8, SS = 12

$$s = \sqrt{\frac{SS}{N-1}} = \sqrt{\frac{12}{7}} = 1.309$$
Solving for tobt:

$$t_{obt} = \frac{\bar{X}_{obt} - \mu}{s/\sqrt{N}} = \frac{8 - 6}{1.309/\sqrt{8}} = 4.32$$

Now for df = 7 and α = 0.05 (two-tailed), tcrit = 2.365.

Since tobt > tcrit, we reject H0. Therefore, it can be concluded that the drug improves short-term memory.

b) Calculating the size of the effect:

$$\hat{d} = \frac{|\bar{X}_{obt} - \mu|}{s} = \frac{8 - 6}{1.309} = 1.53$$

It can hence be inferred that the sample mean differs from the population mean by 1.53 standard deviation units. According to Cohen's criteria, this is a large effect, indicating that the drug has a strong effect on recall.
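Starting from the raw recall scores, the whole analysis can be sketched with Python's `statistics` module; `statistics.stdev` uses the N − 1 divisor, matching s = √(SS/(N − 1)) above. The critical value 2.365 is copied from the answer, not computed.

```python
import math
import statistics

recall = [8, 9, 10, 6, 8, 7, 9, 7]
mu = 6

xbar = statistics.mean(recall)
s = statistics.stdev(recall)                  # sqrt(SS / (N - 1)) = sqrt(12 / 7)
t_obt = (xbar - mu) / (s / math.sqrt(len(recall)))
d_hat = abs(xbar - mu) / s
t_crit = 2.365                                # df = 7, alpha = 0.05 two-tailed (table value)

print(f"mean={xbar:.2f} s={s:.3f} t={t_obt:.2f} d={d_hat:.2f}")  # mean=8.00 s=1.309 t=4.32 d=1.53
print("reject H0" if abs(t_obt) >= t_crit else "retain H0")      # reject H0
```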

Quesn 20: A physician employed by a large corporation believes that due to an increase in
sedentary life in the past decade, middle-age men have become fatter. In 1995, the
corporation measured the percentage of fat in their employees. For the middle-age men, the
scores were normally distributed, with a mean of 22%. To test her hypothesis, the physician
measures the fat percentage in a random sample of 12 middle-age men currently employed
by the corporation. The fat percentages found were as follows: 24, 40, 29, 32, 33, 25, 15, 22,
18, 25, 16, 27. On the basis of these data, can we conclude that middle-age men employed by
the corporation have become fatter? Assume a directional H1 is legitimate.
X     X²
24 576
40 1600
29 841
32 1024
33 1089
25 625
15 225
22 484
18 324
25 625
16 256
27 729

∑X = 306, ∑X² = 8398

Mean = 306/12 = 25.5

$$SS = \sum X^2 - \frac{(\sum X)^2}{N} = 8398 - \frac{306^2}{12} = 595$$

$$s = \sqrt{\frac{SS}{N-1}} = \sqrt{\frac{595}{11}} = 7.35$$

$$t_{obt} = \frac{\bar{X}_{obt} - \mu}{s/\sqrt{N}} = \frac{25.5 - 22}{7.35/\sqrt{12}} = \frac{3.5}{2.12} = 1.65$$

tcrit (df = 11, α = 0.05, one-tailed) = 1.796

Since tobt = 1.65 < 1.796, we retain H0. We cannot conclude that middle-age men employed by the corporation have become fatter.
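The computational formula for SS used above can be checked against the definitional formula in a short Python sketch; the one-tailed critical value 1.796 is copied from the answer.

```python
import math

fat = [24, 40, 29, 32, 33, 25, 15, 22, 18, 25, 16, 27]
mu0 = 22
n = len(fat)

sum_x = sum(fat)                            # 306
sum_x2 = sum(x * x for x in fat)            # 8398
ss = sum_x2 - sum_x ** 2 / n                # computational formula
mean = sum_x / n                            # 25.5
ss_def = sum((x - mean) ** 2 for x in fat)  # definitional formula, same value

s = math.sqrt(ss / (n - 1))
t_obt = (mean - mu0) / (s / math.sqrt(n))
t_crit = 1.796   # df = 11, alpha = 0.05 one-tailed (H1: mu > 22)

print(sum_x, sum_x2, ss, ss_def, round(t_obt, 2), t_obt >= t_crit)  # 306 8398 595.0 595.0 1.65 False
```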

Quesn 21: A local business school claims that their graduating seniors get higher-paying jobs
than the national average for business school graduates. Last year’s figures for salaries paid
to all business school graduates on their first job showed a mean of $10.20 per hour. A
random sample of 10 graduates from last year’s class of the local business school showed the
following hourly salaries for their first job: $9.40, $10.30, $11.20, $10.80, $10.40, $9.70,
$9.80, $10.60, $10.70, $10.90. You are skeptical of the business school claim and decide to
evaluate the salary of the business school graduates.

It can be inferred that N = 10, μ = 10.20

X      X − X̄     (X − X̄)²
9.4 -0.98 0.96
10.3 -0.08 0.01
11.2 0.82 0.67
10.8 0.42 0.18
10.4 0.02 0.00
9.7 -0.68 0.46
9.8 -0.58 0.34
10.6 0.22 0.05
10.7 0.32 0.10
10.9 0.52 0.27
∑X = 103.8, SS = ∑(X − X̄)² = 3.04

Xobt = 103.8/10 = 10.38

Substituting the values to calculate s:

$$s = \sqrt{\frac{SS}{N-1}} = \sqrt{\frac{3.04}{9}} = 0.58$$

Solving for tobt:

$$t_{obt} = \frac{\bar{X}_{obt} - \mu}{s/\sqrt{N}} = \frac{10.38 - 10.20}{0.58/\sqrt{10}} = 0.98$$

Now for df = 9 and α = 0.05 (two-tailed), tcrit = 2.262.

Since tobt < tcrit, we retain H0. Therefore, it cannot be concluded that the local business school graduates get higher-paying jobs than the national average for business school graduates.
Quesn 22: You wanted to estimate the mean number of vehicles crossing a busy bridge in
your neighbourhood each morning during rush hour for the past year. To accomplish this,
you stationed yourself and a few assistants at one end of the bridge on 18 randomly selected
mornings during the year and counted the number of vehicles crossing the bridge in a 10-
minute period during rush hour. You found the mean to be 125 vehicles per minute, with a
standard deviation of 32.

a. Construct the 95% confidence limits for the population mean (vehicles per minute).

b. Construct the 99% confidence limits for the population mean (vehicles per minute).

N = 18, Xobt = 125, s = 32, df = 17

sX̄ = 32/√18 = 7.542

a) For 95% (tcrit = 2.110):

μlower = 125 − 7.542(2.110) = 109.09

μupper = 125 + 7.542(2.110) = 140.91

b) For 99% (tcrit = 2.898):

μlower = 125 − 7.542(2.898) = 103.14

μupper = 125 + 7.542(2.898) = 146.86
