
The document discusses hypothesis testing, focusing on t-tests and non-parametric alternatives in research, particularly in the context of a study on the effects of a vitamin on baby development. It outlines the null and alternative hypotheses, significance levels, types of errors, effect sizes, and the power of statistical tests. Additionally, it explains different t-test procedures for independent and dependent means, as well as chi-square tests for nominal variables.

Uploaded by

Sakshi Sharma
Copyright
© All Rights Reserved

Hypothesis testing: t and non-parametric alternatives
HYPOTHESIS TESTING
A large research project has been going on for several years. In this project, new babies are
given a particular vitamin, and the research team then follows their development during the
first 2 years of life. So far, the vitamin has not speeded up the babies’ development. The ages
at which babies start walking have a mean of 14 months and a standard deviation of 3 months,
and they follow a normal curve. Only about 2% of babies start walking before 8 months of age;
these are the babies who are more than 2 standard deviations below the mean. A newborn in the
project is then randomly selected to take a highly purified version of the vitamin, and the
researchers follow this baby’s progress for 2 years. This is a hypothesis-testing problem. The
researchers want to draw a general conclusion about whether the purified vitamin allows babies
in general (a population of babies) to walk earlier. However, the conclusion will be based on
the results of studying a sample.
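A minimal Python sketch of the normal-curve arithmetic in this example. It assumes, as the text states, that walking ages are normally distributed with a mean of 14 months and an SD of 3 months. It uses the standard library's `statistics.NormalDist`. The helper name `z_score` is illustrative only.

```python
from statistics import NormalDist

# Babies in general: age (months) at which walking starts
MEAN_MONTHS = 14.0
SD_MONTHS = 3.0

def z_score(x, mean, sd):
    """How many standard deviations a score lies from the mean."""
    return (x - mean) / sd

z = z_score(8.0, MEAN_MONTHS, SD_MONTHS)
proportion_below = NormalDist().cdf(z)   # area of the normal curve below z

print(f"z = {z:.2f}")                                          # z = -2.00
print(f"proportion below 8 months = {proportion_below:.4f}")   # proportion below 8 months = 0.0228
```

The lower-tail area of about 0.023 is the "about 2%" quoted above. A baby walking at 8 months would therefore be unusual if the purified vitamin had no effect.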
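• Null hypothesis – babies who take the purified vitamin do not start walking earlier than babies in general.
• Alternative (research) hypothesis – babies who take the purified vitamin start walking earlier than babies in general.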

• In general, psychology researchers use a cutoff on the comparison
distribution at which the probability of a score being at least that
extreme, if the null hypothesis were true, is 5%. That is, researchers reject
the null hypothesis if the probability of getting a sample score this extreme
(if the null hypothesis were true) is less than 5%. This probability is usually
written as p < .05. However, in some areas of research, or when researchers
want to be especially cautious, they use a cutoff of 1% (p < .01). These are
called conventional levels of significance and are described as the .05
significance level and the .01 significance level.
ONE TAIL TEST
• Directional hypothesis - research hypothesis predicting a particular
direction of difference between populations; for example, a
prediction that the population like the sample studied has a higher
mean than the population in general.
• One-tailed test - hypothesis-testing procedure for a directional
hypothesis; situation in which the region of the comparison
distribution in which the null hypothesis would be rejected is all on
one side (tail) of the distribution. In the baby study example, the
tail for the predicted effect was at the low end.
TWO TAIL TEST
• Nondirectional hypothesis - research hypothesis that does not
predict a particular direction of difference between the population
like the sample studied and the population in general.
• Two-tailed test - hypothesis-testing procedure for a nondirectional
hypothesis; the situation in which the region of the comparison
distribution in which the null hypothesis would be rejected is divided
between the two sides (tails) of the distribution.
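The difference between one-tailed and two-tailed cutoffs can be sketched in Python with the standard library's `statistics.NormalDist`. This assumes a normal (z) comparison distribution. For the one-tailed case it also assumes the predicted effect is at the low end, as in the baby study. The function name `z_cutoffs` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def z_cutoffs(alpha, tails):
    """Cutoff z score(s) beyond which the null hypothesis is rejected.

    tails=1: all of alpha in the LOW tail (predicted effect at the low end).
    tails=2: alpha split equally between the two tails.
    """
    if tails == 1:
        return (std_normal.inv_cdf(alpha),)
    if tails == 2:
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tails must be 1 or 2")

for alpha in (0.05, 0.01):
    one = z_cutoffs(alpha, 1)
    two = z_cutoffs(alpha, 2)
    print(f"alpha={alpha}: one-tailed {one[0]:.3f}, two-tailed ±{two[1]:.3f}")
# alpha=0.05: one-tailed -1.645, two-tailed ±1.960
# alpha=0.01: one-tailed -2.326, two-tailed ±2.576
```

A one-tailed test puts the whole 5% in one tail, so its cutoff is less extreme than either two-tailed cutoff.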
DECISION ERRORS
Incorrect conclusions in hypothesis testing in relation
to the real (but unknown) situation

• Type I error - rejecting the null hypothesis when in fact it is true;
getting a statistically significant result when in fact the research
hypothesis is not true.
• Alpha (α) - probability of making a Type I error; same as the
significance level.
• Type II error - failing to reject the null hypothesis when in fact it is
false; failing to get a statistically significant result when in fact the
research hypothesis is true.
• Beta (β) - probability of making a Type II error.
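A small simulation sketch of why alpha equals the Type I error rate. It relies on assumptions not in the slides: a hypothetical population (mean 100, SD 15) in which the null hypothesis is true, and a two-tailed z test with known SD. Every rejection is therefore a Type I error, and the long-run proportion of rejections should come out close to α. The function name `type_i_error_rate` is illustrative.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=10, trials=10_000, seed=1):
    """Estimate how often a two-tailed z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    mu, sigma = 100.0, 15.0                 # hypothetical population; H0 (mu = 100) is true
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / (sigma / n ** 0.5)
        if abs(z) > cutoff:                 # "significant" result, which here is a Type I error
            rejections += 1
    return rejections / trials

print(type_i_error_rate(0.05))   # close to 0.05
print(type_i_error_rate(0.01))   # close to 0.01
```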
EFFECT SIZE
• Effect size is a measure of the difference between population means.
• You can think of effect size as how much something changes after a
specific intervention.
• Effect size increases with greater differences between means.
• The standardized effect size is the difference between the
population means divided by the population’s standard deviation.
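The standardized effect size defined above is a one-line formula. The Python sketch below uses a hypothetical population mean of 12 months for babies on the purified vitamin; the slides give no such figure. It is compared with 14 months for babies in general, with an SD of 3 months.

```python
def standardized_effect_size(mu1, mu2, sigma):
    """Difference between population means divided by the population SD."""
    return (mu1 - mu2) / sigma

# Hypothetical: vitamin babies walk at 12 months on average vs 14 months in general
d = standardized_effect_size(12.0, 14.0, 3.0)
print(round(d, 2))   # -0.67
```

The sign shows the direction (earlier walking), and the size says the means differ by about two-thirds of a standard deviation.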
POWER
Power – ability of a statistical test to detect a difference or
relationship.
• The probability of rejecting the null hypothesis when it is false.
It is influenced by –
1) EFFECT SIZE - The larger the effect size is, the greater the power is.
2) SAMPLE SIZE - The larger the sample size is, the greater the power
is.
3) LEVEL OF SIGNIFICANCE (alpha level) - The more lenient the
significance level (for example, .05 rather than .01), the greater the power is.
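The three influences on power can be sketched numerically. To keep the arithmetic simple, this sketch uses a one-tailed, one-sample z test with a known population SD rather than a t test. The effect sizes and sample sizes are hypothetical. The function name `power_one_tailed_z` is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_one_tailed_z(effect_size, n, alpha=0.05):
    """Approximate power of a one-tailed, one-sample z test.

    effect_size: standardized effect size, taken as positive in the predicted direction.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)       # cutoff if H0 is true
    shift = effect_size * n ** 0.5               # where z is centred if H1 is true
    return 1 - STD_NORMAL.cdf(z_crit - shift)    # P(z beyond cutoff | H1 true)

base = power_one_tailed_z(0.5, 20)
bigger_effect = power_one_tailed_z(0.8, 20)
bigger_sample = power_one_tailed_z(0.5, 50)
stricter_alpha = power_one_tailed_z(0.5, 20, alpha=0.01)

for label, value in [("baseline (d=0.5, N=20, alpha=.05)", base),
                     ("larger effect size (d=0.8)", bigger_effect),
                     ("larger sample (N=50)", bigger_sample),
                     ("stricter alpha (.01)", stricter_alpha)]:
    print(f"{label}: power = {value:.2f}")
```

Raising the effect size or the sample size increases power, and tightening alpha from .05 to .01 lowers it. This matches points 1 to 3 above.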
t test
• t test - hypothesis-testing procedure in which the
population variance is unknown; it compares t scores from
a sample to a comparison distribution called a t
distribution.
• t test for a single sample - hypothesis-testing procedure in
which a sample mean is being compared to a known
population mean and the population variance is unknown.
• $t = \frac{\bar{x} - \mu}{S/\sqrt{n - 1}}$ when N is less than 30
• When N is more than 30,
• In a population, the average IQ is 100. A team of scientists wants to test a
new medicine to see whether it has a positive effect, a negative effect, or no
effect at all on intelligence. A sample of 30 participants who have taken the
medicine has a mean of 140 with an SD of 20. Did the medication affect intelligence?

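• H0: µ = 100 (the medicine has no effect on intelligence)
• Ha: µ ≠ 100 (the medicine has some effect; two-tailed, because the effect could be positive or negative)
• df = N − 1 = 29
• Critical value (.05, two-tailed) = 2.045
• t = 10.77
• If the t value exceeds the critical value, the null hypothesis is rejected. Here 10.77 > 2.045, so the medication did affect intelligence.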
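A Python sketch of the single-sample t computation, using the small-sample formula given above. It assumes S is the SD computed with N in the denominator; in that case S/√(n − 1) equals the more familiar s/√n. The critical value 2.045 (df = 29, two-tailed .05) is typed in from a standard t table, since the Python standard library has no t distribution. The function name is illustrative.

```python
import math

def single_sample_t(sample_mean, pop_mean, sd, n):
    """t = (x̄ - µ) / (S / sqrt(n - 1)), with S assumed to use N in its denominator."""
    standard_error = sd / math.sqrt(n - 1)
    return (sample_mean - pop_mean) / standard_error

t = single_sample_t(sample_mean=140, pop_mean=100, sd=20, n=30)
df = 30 - 1
critical = 2.045   # two-tailed .05 cutoff for df = 29, from a t table
print(f"t = {t:.2f}, df = {df}")   # t = 10.77, df = 29
print("reject H0" if abs(t) > critical else "fail to reject H0")
```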
t test for dependent means
• hypothesis-testing procedure in which there are two scores for each person
and the population variance is not known; it determines the significance of
a hypothesis that is being tested using difference or change scores from a
single group of people.
• This kind of research situation is called a repeated measures design (also
known as a within-subjects design) or a matched group design.
• A common example is when you measure the same people before and
after some psychological or social intervention.
• Difference scores - difference between a person’s score on one testing and
the same person’s score on another testing; often an after-score minus a
before-score, in which case it is also called a change score.
• df = n − 1, where n is the number of difference scores (people or matched pairs)
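A minimal Python sketch of the t test for dependent means, working directly on change scores (after − before). The before and after scores are hypothetical, and the function name `dependent_t` is illustrative. The null hypothesis is taken to be that the population mean change is zero.

```python
import math
from statistics import mean, stdev

def dependent_t(before, after):
    """t test for dependent means, computed on change scores (after - before)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    sd = stdev(diffs)                       # estimate with n - 1 in the denominator
    t = mean(diffs) / (sd / math.sqrt(n))   # H0: population mean change = 0
    return t, n - 1

# Hypothetical scores for 5 people measured before and after an intervention
before = [10, 12, 9, 14, 11]
after = [13, 14, 10, 17, 12]
t, df = dependent_t(before, after)
print(f"t = {t:.2f}, df = {df}")   # t = 4.47, df = 4
```

If SciPy is available, `scipy.stats.ttest_rel(after, before)` should report the same t.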
t – test for independent means
• Hypothesis-testing procedure in which there are two separate
groups of people tested and in which the population variance is
not known.
• When the scores in one group are for different people than the
scores in the other group, what you can compare is the mean of
one group to the mean of the other group.
• The goal of a t test for independent means is to decide whether
the difference between the means of your two actual samples is
a more extreme difference than the cutoff difference on this
distribution of differences between means.
• df = df1 + df2 = (N1 − 1) + (N2 − 1) = N − 2, where N is the total number of people in both groups
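A Python sketch of the t test for independent means. It assumes the usual pooled-variance approach, in which each group's variance estimate is weighted by its df. The two groups' scores are hypothetical, and the function name is illustrative.

```python
import math
from statistics import mean, variance

def independent_t(group1, group2):
    """t test for independent means using a pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    df1, df2 = n1 - 1, n2 - 1
    # Pooled estimate of the population variance, weighted by each group's df
    pooled_var = (df1 * variance(group1) + df2 * variance(group2)) / (df1 + df2)
    # SD of the distribution of differences between means
    se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)
    t = (mean(group1) - mean(group2)) / se_diff
    return t, df1 + df2

# Hypothetical scores for two separate groups of people
experimental = [6, 8, 7, 9, 10]
control = [5, 6, 4, 7, 3]
t, df = independent_t(experimental, control)
print(f"t = {t:.2f}, df = {df}")   # t = 3.00, df = 8
```

With SciPy, `scipy.stats.ttest_ind(experimental, control)` (default `equal_var=True`) uses the same pooled formula.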
Questions from previous year papers
• In a matched two-group design with 30 subjects per group, the t test
would be based on how many degrees of freedom?
1) 58 2) 29 3) 30 4) 59

• A study employed two groups, matched on intelligence, each group
with 30 subjects. Each subject was required to learn a list of CVC
trigrams, and the number of trials required to learn the list was the
dependent variable. What would be the degrees of freedom if a
suitable t test is used to evaluate the mean difference?
a) 29 b) 30 c) 58 d) 60
PRACTICE QUESTIONS
• State whether tests of the following hypotheses might permit one- or two-
tailed tests.
a) Diabetics are more health-conscious than other people.
b) Extroverts and introverts differ in their ability to learn people’s names.
c) Job satisfaction correlates negatively with absenteeism.
d) Self-esteem correlates with outward confidence.

• A report claims that a t-value of 2.85 is significant (p < .01) when the
number of people in a repeated measures design was 11. What would be
the df?
• A researcher wanted to test the effect of a drug in reducing anxiety.
For this purpose he used two groups of individuals, experimental
(10) and control (10), matched in pairs. He used an anxiety
scale to measure anxiety among the subjects of the
groups. What would be the df?
• A group of 20 students was given an achievement test under two
conditions: when tense and when relaxed. What would be the df for testing
the significance of the difference between the means of the two conditions?
• Five students take AP Calculus AB one year and AP Calculus BC the next
year. Their overall course grades (%) are listed below for both courses.
Which of the following statistical procedures would be most appropriate
to test the claim that student overall course grades are the same in both
courses? Assume that any necessary normality requirements hold.
Student       1       2       3       4       5
AP CAL AB   80.0%   72.6%   99.0%   91.3%   68.9%
AP CAL BC   85.5%   71.0%   93.2%   93.0%   74.8%
a) paired/dependent t-test of means
b) independent t-test of means
c) z-test of means
d) one-sample t-test
• Which of the following exam scores is better relative to other students
enrolled in the course?
A. A psychology exam grade of 85; the mean grade for the psychology
exam is 92 with a standard deviation of 3.5
B. An economics exam grade of 87; the mean grade for the economics
exam is 79 with a standard deviation of 8
C. A chemistry exam grade of 62; the mean grade for the chemistry
exam is 62 with a standard deviation of 5.
a) The psychology exam score is relatively better
b) The economics exam score is relatively better
c) The chemistry exam score is relatively better
d) All of the exam scores are relatively equivalent
• In hypothesis testing, a Type 2 error occurs when
A. The null hypothesis is not rejected when the null hypothesis is true.
B. The null hypothesis is rejected when the null hypothesis is true.
C. The null hypothesis is not rejected when the alternative hypothesis
is true.
D. The null hypothesis is rejected when the alternative hypothesis is
true.
• Null and alternative hypotheses are statements about:
A. population parameters.
B. sample parameters.
C. sample statistics.
D. it depends - sometimes population parameters and sometimes
sample statistics.
• A prospective observational study of the relationship between sleep
deprivation and heart disease was done by Ayas et al. (Arch Intern
Med 2003), on the assumption that sleep deprivation leads to a risk of heart
disease. Women who slept at most 5 hours a night were compared with
women who slept 8 hours a night (reference group). After adjusting
for potential confounding variables such as smoking, the 95% confidence
interval for the relative risk of heart disease was 1.10 to 1.92.
Based on this confidence interval, a consistent conclusion would be
A. Sleep deprivation is associated with a modestly increased risk of
heart disease.
B. Sleep deprivation is associated with a modestly decreased risk of
heart disease.
C. There was no evidence of an association between sleep deprivation
and heart disease.
D. Lack of sleep causes the risk of heart disease to increase by 10% to
92%.
• It is known that for right-handed people, the dominant (right) hand
tends to be stronger. For left-handed people who live in a world
designed for right-handed people, the same may not be true. To test
this, muscle strength was measured on the right and left hands of a
random sample of 15 left-handed men and the difference (left - right)
was found. The alternative hypothesis is one-sided (left hand stronger).
The resulting t-statistic was 1.80.
1. This is an example of:
A. A two-sample t-test. B. A paired t-test.
C. A single sample t-test. D. Information is not adequate.
2. Assuming the conditions are met, based on the t-statistic of 1.80, the
appropriate conclusion for this test using α = .05 is: (CV = 1.761)
A. Df = 14, so p-value < .05 and the null hypothesis can be rejected.
B. Df = 14, so p-value > .05 and the null hypothesis cannot be rejected.
C. Df = 28, so p-value < .05 and the null hypothesis can be rejected.
D. Df = 28, so p-value > .05 and the null hypothesis cannot be rejected.
• Suppose we were interested in determining if there were differences in the average
prices among two local supermarkets. We randomly pick six items to compare at both
supermarkets. Which statistical procedure would be best to use for this study?
a) Matched-pairs t procedure
b) One-sample t test
c) Two-sample t test
d) None of the above
• Two groups of rats were selected for the study. One group (n = 12) was given a vitamin
supplement and the other group (n = 17) was not given any vitamin supplement. Their
weights were measured. What df would be used to test the significance of the
difference between the weights?
• In the course of an experiment on two samples, the following data were collected:
Difference between means – 4.20
Standard error of the difference of means – 2.80 (CV – 2.54)
Is the obtained difference between means significant at the .05 level?
CHI SQUARE TEST
• Hypothesis-testing procedures used when the variables of interest are
nominal variables.
• The chi-square test was originally developed by Karl Pearson, so it is also known
as Pearson’s chi-square.
• Chi-square test for goodness of fit - hypothesis-testing procedure that
examines how well an observed frequency distribution of a nominal
variable fits some expected pattern of frequencies. The test first figures a
number for the amount of mismatch between the observed and expected
frequencies, and then checks whether that number indicates a greater
mismatch than you would expect by chance. This test is relatively rare in research.
• Chi-square test for independence - used when there are two
nominal variables, each with several categories.
• Observed frequency in a chi-square test - number of individuals actually found in the study
to be in a category or cell.
• Expected frequency in a chi-square test - number of people in a category or cell expected if
the null hypothesis were true.
• A one-rupee coin is tossed in the air 100 times, and the recorded
results of these 100 throws show 40 heads and 60 tails. Using the
chi-square test, find out whether this result departs from what would be
expected by mere “chance”.
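A Python sketch of the chi-square goodness-of-fit arithmetic for the coin example. It assumes the null hypothesis of a fair coin, so the expected frequencies are 50 heads and 50 tails. The function name `chi_square` is illustrative.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [40, 60]             # heads, tails
expected = [100 / 2, 100 / 2]   # H0: fair coin -> 50 heads, 50 tails
chi2 = chi_square(observed, expected)
df = len(observed) - 1
print(f"chi-square = {chi2:.2f}, df = {df}")   # chi-square = 4.00, df = 1
```

The table value for df = 1 at the .05 level is 3.84. On this arithmetic, the null hypothesis of a fair coin would be rejected at .05.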
• 100 boys and 60 girls were asked to select one of five elective
subjects. The choices of the two genders were tabulated separately.
Do you think that the choice of subject depends on the gender of the students?
GENDER     SUBJECTS
           A    B    C    D    E    Total
Boys       25   30   10   25   10   100
Girls      10   15    5   15   15    60
• df = (r − 1)(c − 1)
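A Python sketch of the chi-square test for independence on the gender × subject table above. It uses the standard rule that a cell's expected frequency is its row total times its column total, divided by the grand total. The function names are illustrative.

```python
def expected_frequencies(table):
    """Expected cell frequency = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
    return chi2, df

# Rows: boys, girls; columns: subjects A to E
choices = [
    [25, 30, 10, 25, 10],
    [10, 15, 5, 15, 15],
]
chi2, df = chi_square_independence(choices)
print(f"chi-square = {chi2:.2f}, df = {df}")   # chi-square = 7.03, df = 4
```

The .05 table value for df = 4 is 9.49. On this arithmetic, the obtained chi-square would not be significant at the .05 level.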
QUESTIONS FROM PREVIOUS YEAR PAPERS

• The chi square computed for a contingency table, was based on six
degrees of freedom. If the contingency table had three rows, how
many columns would it have?
a) 2 b) 3 c) 4 d) 6
• In a contingency table, one of the cells has an obtained frequency of 20
and an expected frequency of 30. What would be the contribution of
this cell towards the total chi-square value?
1) 3.33 2) 5.00 3) 10.00 4) 20.00
• How many subjects would be required in a 3 x 3 x 2 factorial
design with subjects 10 per cell?
a) 10 b) 18 c) 90 d) 180

• A cell in a contingency table had an obtained frequency of 16 and
an expected frequency of 25. What would be the contribution of
this cell to the total chi-square value?
a) 3.24 b) 5.06 c) 9.00 d) 81.00
• A subject has a T score (Mean = 50, SD = 10) of 40 on an abstract
reasoning test. The corresponding percentile rank would be
a) 16 b) 34 c) 40 d) 84
• A test of Abstract Thinking had norms in terms of normalized T
scores (Mean = 50, SD = 10). If a subject received a T score of
60, what would be his percentile rank?
a) 16 b) 60 c) 66 d) 84
THE SIGN TEST FOR RELATED DATA
• Nominal-level test for the difference between two sets of paired/related data, using
only the direction of each difference.
• It is used for comparing two correlated samples paired off in some way.
• No assumption of normality is needed.
• The null hypothesis assumes that the median difference between pairs is zero.
• Data on a nominal scale.
• It takes into account only the direction of each difference, not its magnitude.
• For significance, S must be equal to or less than the critical value.
EXAMPLE
• Suppose that, in order to assess the effectiveness of therapy, a
psychotherapist investigates whether or not, after three months of
involvement, clients feel better or worse about themselves. If therapy
improves people’s evaluation of themselves, then we would expect
clients’ self-image ratings to be higher after three months’ therapy
than they were before. The table gives a fictitious set of clients’ self-image
ratings before and after three months’ therapy, on a scale of 1–20, where a
high value signifies a positive self-image.
Client   Self-image rating     Self-image rating    D (C − B)   SIGN OF
(A)      before therapy (B)    after therapy (C)                DIFFERENCE
a)        3                     7                   +4          +
b)       12                    18                   +6          +
c)        9                     5                   −4          −
d)        7                     7                    0          0 (tie, dropped)
e)        8                    12                   +4          +
f)        1                     5                   +4          +
g)       15                    16                   +1          +
h)       10                    12                   +2          +
i)       11                    15                   +4          +
j)       10                    17                   +7          +
S = 1 (the less frequent sign: one −, eight +)
N = total number of signs, ties excluded = 9
CV = 2
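A Python sketch that reproduces S and N for the therapy data, dropping tied pairs. As an extra check, it computes an exact binomial probability (p = 0.5 per sign), which is an alternative to the table lookup used in the slides. This one-tailed p is only meaningful here because the observed majority sign (+) is the predicted direction of improvement. The function name is illustrative.

```python
from math import comb

def sign_test(before, after):
    """Return S (count of the less frequent sign), N (non-tied pairs), one-tailed p."""
    diffs = [a - b for a, b in zip(after, before)]
    plus = sum(d > 0 for d in diffs)
    minus = sum(d < 0 for d in diffs)
    n = plus + minus                  # ties (d == 0) are dropped
    s = min(plus, minus)
    # Exact binomial probability of S or fewer of the rarer sign when p = 0.5
    p_one_tailed = sum(comb(n, k) for k in range(s + 1)) / 2 ** n
    return s, n, p_one_tailed

before = [3, 12, 9, 7, 8, 1, 15, 10, 11, 10]
after = [7, 18, 5, 7, 12, 5, 16, 12, 15, 17]
s, n, p = sign_test(before, after)
print(s, n, round(p, 4))   # 1 9 0.0195
```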
THE WILCOXON (T) MATCHED-PAIRS SIGNED-RANKS TEST
• Ordinal-level significance test for differences between two related
sets of data.
• A more efficient and powerful non-parametric test than the sign test, because it
takes into account both the magnitude and the direction of each difference.
• Suppose we ask students to rate two methods of learning that they
have experienced on two different modules. Method A is a traditional
lecture-based approach, while method B is an active, assignment-based method.
Participant   Rating of traditional   Rating of assignment-   D (B − A)   RANK
No.           lecture method (A)      based method (B)
1             23                      33                      +10         +12
2             14                      22                      +8          +9.5
3             35                      38                      +3          +3
4             26                      30                      +4          +5
5             28                      31                      +3          +3
6             19                      17                      −2          −1
7             42                      42                      0           (dropped)
8             30                      25                      −5          −6
9             26                      34                      +8          +9.5
10            31                      24                      −7          −8
11            18                      21                      +3          +3
12            25                      46                      +21         +14
13            23                      29                      +6          +7
14            31                      40                      +9          +11
15            30                      41                      +11         +13
• Sum of positive ranks = 90
• Sum of negative ranks = 15
• The smaller of the two sums is the value of T: T = 15
• N (pairs with a non-zero difference) = 14
• CV = 20
• Since T = 15 is less than the CV of 20, the null hypothesis is rejected.
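A Python sketch of the Wilcoxon steps for the learning-methods data. Zero differences are dropped, and the absolute differences are ranked, with tied values given the mean of the ranks they occupy. The positive and negative ranks are then summed, and T is the smaller sum. The function name is illustrative.

```python
def wilcoxon_signed_ranks(a, b):
    """Return (sum of + ranks, sum of - ranks, T, N) for differences b - a."""
    diffs = [y - x for x, y in zip(a, b) if y != x]   # D = B - A, zeros dropped
    ordered = sorted(abs(d) for d in diffs)
    rank_of = {}                                      # |D| -> (average) rank
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2         # mean of ranks i+1 .. j
        i = j
    positive = sum(rank_of[abs(d)] for d in diffs if d > 0)
    negative = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return positive, negative, min(positive, negative), len(diffs)

method_a = [23, 14, 35, 26, 28, 19, 42, 30, 26, 31, 18, 25, 23, 31, 30]
method_b = [33, 22, 38, 30, 31, 17, 42, 25, 34, 24, 21, 46, 29, 40, 41]
pos, neg, T, N = wilcoxon_signed_ranks(method_a, method_b)
print(pos, neg, T, N)   # 90.0 15.0 15.0 14
```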
THE MANN-WHITNEY U TEST
• Useful non-parametric alternative to the t test for assessing the
difference between two independent samples (uncorrelated data),
especially when the assumptions and conditions for applying the
t test are not met.
• This test is also used to find out whether or not the two independent
samples have been drawn from the same population.
• Data at least at ordinal level.
• In general, H0 is that the populations from which the two samples have
been randomly selected are identical. In most cases it is specifically
that the two population medians are equal.
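Stereotype scores for children whose mothers had full-time jobs or no jobs
(Count = number of scores in the other group that are higher; a tie counts as 0.5)
Full-time jobs        No jobs
Score   Count         Score   Count
17      9             19      6
32      7             63      0
39      6.5           78      0
27      8             29      4
58      6             39      1.5
25      8             59      0
31      7             77      0
                      81      0
                      68      0
TOTAL   U1 = 51.5             U2 = 11.5
U VALUE = 11.5 (the smaller of U1 and U2)
CV = 12
The CV is read from the table at n1 = 7 and n2 = 9.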
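A Python sketch that reproduces the per-score counts in the stereotype table by direct counting. For each score, it counts how many scores in the other group are higher, with a tie counting as 0.5. Summing these counts gives that group's U. Textbooks differ in which total they label U1 or U2, but the smaller value is the one compared with the table. The function name is illustrative.

```python
def mann_whitney_u(group1, group2):
    """Return (U1, U2, smaller U) by direct pairwise counting."""
    def count_higher(score, other):
        # 1 for each higher score in the other group, 0.5 for each tie
        return sum(1.0 if o > score else 0.5 if o == score else 0.0 for o in other)

    u1 = sum(count_higher(x, group2) for x in group1)
    u2 = sum(count_higher(y, group1) for y in group2)
    assert u1 + u2 == len(group1) * len(group2)   # sanity check: U1 + U2 = n1 * n2
    return u1, u2, min(u1, u2)

full_time = [17, 32, 39, 27, 58, 25, 31]
no_job = [19, 63, 78, 29, 39, 59, 77, 81, 68]
u1, u2, u = mann_whitney_u(full_time, no_job)
print(u1, u2, u)   # 51.5 11.5 11.5
```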
Questions from previous years’ papers
• A researcher wanted to test the hypothesis that a given drug would
adversely affect the rate of learning. A group of 40 subjects was employed
in the research. Each subject was required to learn a task up to two
errorless trials. Then each subject was given a specified drug dosage
(1 mg per kg of body weight) and, thirty minutes later, was
required to learn another equated learning task up to two errorless trials.
The number of trials required to learn each task followed a normal
distribution. The mean numbers of trials required to learn the tasks were 24
and 20 for the normal and ‘drugged’ conditions, respectively. The two
conditions had comparable standard deviations. The mean difference was
tested by a suitable statistical test and was found to be
significant at the .01 level.
• What can you conclude about the researcher’s hypothesis?
a). The hypothesis has been verified
b). The hypothesis has been rejected
c). The hypothesis has been partially verified
d). The data are inadequate to evaluate researcher’s hypothesis.

What statistical test appears to be suitable for evaluating mean


differences?
a). Independent samples t test
b). One-way ANOVA for independent groups
c). Paired samples t-test
d). Chi-square test
• The dependent variable in this experiment is:
a). Drug dosage
b). Trials required to learn the task
c). Body weight
d). Time interval between the drug administration and learning the
second task

• In a single-group pretest-posttest design, which one of the following
statistical methods can best be employed to evaluate the mean
differences between the pre- and post-tests?
a). Mann-Whitney U test
b). Randomized ANOVA
c). Chi-Square
d). Repeated measures ANOVA
