
Null hypothesis vs. the alternative hypothesis

We can illustrate the process of hypothesis testing with an example. Let's say we collect some data from 40 men and 40 women about their current mood. We do this by giving everyone a questionnaire that asks all sorts of questions about happiness and satisfaction. Each questionnaire is assessed by scoring the answers, where a higher score indicates poorer mood. Based on previous evidence, we might predict that women will report poorer mood scores than men. That prediction would be our (alternative) hypothesis. By contrast, the null hypothesis would be that there will be no difference in mood scores between men and women. To test our prediction, we must investigate whether we can reject the null hypothesis (or not) before we can say anything about the alternative hypothesis. Why? Well, it goes back to the point we were making about probability in statistical significance. By stating that there is less than a 5% probability that an outcome occurred by chance, we are actually saying that there is a less than 5% probability that the null hypothesis is ‘true’ (that there is no difference).

Once we have collected the data, we might observe that women have indeed reported higher mood scores than men. Statistical analyses might show that there is a 3% probability that the outcome occurred by chance. Because this is lower than the 5% cut-off point that we usually set for significance, it would appear that our prediction is correct. However, this is only half of the picture: the process of hypothesis testing must start with the null hypothesis. According to our results here, we can reject the null hypothesis because there is not enough evidence to support that it is true (the outcome was significant at p = .03). As a result, we can say that the null hypothesis is rejected in favour of the alternative hypothesis. Strictly speaking, we cannot say that we have ‘accepted the alternative hypothesis’ (although many people do this, even in the most prestigious journals).

Similarly, we might still find that women reported higher mood scores than men, but statistical analyses suggest that there is a 6% probability that the outcome occurred by chance (where p = .06, or p > .05). Because this is greater than the 5% cut-off point, we cannot reject the null hypothesis. This does not mean that the null hypothesis is true, but simply that there is no evidence that it is false. Once again, strictly speaking, we should not say that the alternative hypothesis is rejected (although, again, many researchers do say that); we should always phrase the outcome in terms of the null hypothesis.

When we make predictions, we should state them in our reports in terms of what we expect to find. However, when we report the findings we should say one of two things: ‘statistical analyses suggest that we can reject the null hypothesis, in favour of the alternative hypothesis’ or ‘statistical analyses suggest that we cannot reject the null hypothesis’.
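As a rough sketch, the decision process in this example can be expressed as a two-sample t-test in Python. The mood scores are simulated and the group means are hypothetical, chosen only so that the example produces a clear result; SciPy's ttest_ind is assumed to be available.

```python
import numpy as np
from scipy import stats

# Simulated mood scores (higher = poorer mood); the group means here are
# hypothetical, chosen only to illustrate the decision rule.
rng = np.random.default_rng(42)
women = rng.normal(loc=24, scale=5, size=40)
men = rng.normal(loc=18, scale=5, size=40)

t_stat, p_value = stats.ttest_ind(women, men)
alpha = 0.05
if p_value < alpha:
    decision = "reject the null hypothesis in favour of the alternative"
else:
    decision = "fail to reject the null hypothesis"
print(f"t = {t_stat:.2f}, p = {p_value:.4f}: {decision}")
```

Note that the conclusion is always phrased in terms of the null hypothesis, exactly as the text recommends: we reject it, or we fail to reject it, but we never ‘accept’ either hypothesis.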

Significance with one-tailed tests


When we test hypotheses we will (usually) set the significance level at 5%. If we employ a one-tailed test, we are predicting that our ‘outcome’ will reside in the outer 5% of one end of the sampling distribution (we will see more about sampling distributions later). If we predict that A will be greater than B, we would expect to find the outcome in the upper 5% of the sampling distribution (see Figure 4.2). For example, we might predict that mood scores will be higher for women than for men. If we find that women do report higher mood scores than men and statistical analyses indicate that there is a less than 5% probability that this happened by chance, we can reject the null hypothesis (in favour of the alternative hypothesis). If men score more highly than women (even if there is a less than 5% probability that this occurred by chance), we cannot reject the null hypothesis (because the outcome contradicts our prediction).

However, if we predict that X will be less than Y, we would expect to find the outcome in the lower 5% of the sampling distribution (as shown in Figure 4.3). For example, we might predict that IQ scores of cats might be less than for dogs. If we find that cats present lower IQ scores than dogs, and statistical analyses indicate that there is a less than 5% probability that this happened by chance, we can reject the null hypothesis.

Significance with two-tailed tests


Sometimes, we may not have enough evidence to make a specific prediction. However, we might be able to suggest that there will be a difference, without specifying the direction of that difference. For example, we could predict that there will be a difference in the hours spent in lectures across the student groups, but not predict which group will spend more time in lectures than the other. In this instance, we have made a two-tailed hypothesis. In a non-directional test, we still (usually) set the significance level at 5%, but we have to share that between the two tails of the distribution because the difference could reside at either end. Our significance level at either end is now 2.5%, as shown in Figure 4.4. If we find that there is a difference between the groups in respect of hours spent in lectures, and statistical analyses indicate that there is a less than 2.5% probability that this happened by chance, we can reject the null hypothesis.
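The sharing of the 5% between the two tails can be seen directly in SciPy: for the same (simulated, hypothetical) data, the two-tailed p value is twice the one-tailed p value when the observed difference lies in the tested tail.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(10, 3, 30)  # hypothetical hours-in-lectures data
group_b = rng.normal(12, 3, 30)

res_two = stats.ttest_ind(group_a, group_b)                       # non-directional
res_one = stats.ttest_ind(group_a, group_b, alternative="less")   # directional
# The 5% criterion is split across both tails in the two-tailed test,
# so its p value is double the one-tailed p for the same data.
print(f"two-tailed p = {res_two.pvalue:.4f}, one-tailed p = {res_one.pvalue:.4f}")
```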

The Purpose of Null Hypothesis Testing

As we have seen, psychological research typically
involves measuring one or more variables for a sample and computing descriptive statistics for
that sample. In general, however, the researcher’s goal is not to draw conclusions about that
sample but to draw conclusions about the population that the sample was selected from. Thus
researchers must use sample statistics to draw conclusions about the corresponding values in the
population. These corresponding values in the population are called parameters. Imagine, for
example, that a researcher measures the number of depressive symptoms exhibited by each of
50 clinically depressed adults and computes the mean number of symptoms. The researcher
probably wants to use this sample statistic (the mean number of symptoms for the sample) to
draw conclusions about the corresponding population parameter (the mean number of symptoms
for clinically depressed adults). Unfortunately, sample statistics are not perfect estimates of their
corresponding population parameters. This is because there is a certain amount of random
variability in any statistic from sample to sample. The mean number of depressive symptoms
might be 8.73 in one sample of clinically depressed adults, 6.45 in a second sample, and 9.44 in
a third—even though these samples are selected randomly from the same population. Similarly,
the correlation (Pearson’s r) between two variables might be +.24 in one sample, −.04 in a
second sample, and +.15 in a third—again, even though these samples are selected randomly
from the same population. This random variability in a statistic from sample to sample is called
sampling error. (Note that the term error here refers to random variability and does not imply
that anyone has made a mistake. No one “commits a sampling error.”) One implication of this is
that when there is a statistical relationship in a sample, it is not always clear that there is a
statistical relationship in the population. A small difference between two group means in a
sample might indicate that there is a small difference between the two group means in the
population. But it could also be that there is no difference between the means in the population
and that the difference in the sample is just a matter of sampling error. Similarly, a Pearson’s r
value of −.29 in a sample might mean that there is a negative relationship in the population. But
it could also be that there is no relationship in the population and that the relationship in the
sample is just a matter of sampling error. In fact, any statistical relationship in a sample can be
interpreted in two ways:

• There is a relationship in the population, and the relationship in the sample reflects this.
• There is no relationship in the population, and the relationship in the sample reflects only sampling error.

The purpose of null hypothesis testing is simply to help researchers decide between these two interpretations.
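Sampling error is easy to demonstrate by simulation. In this sketch the population and its mean are hypothetical; three random samples from the same population give three different sample means.

```python
import numpy as np

rng = np.random.default_rng(7)
# A hypothetical population of symptom counts with a true mean of 8
population = rng.poisson(lam=8, size=100_000)

# Three random samples of 50 give three different sample means, even
# though all three come from the same population: that is sampling error.
sample_means = [rng.choice(population, size=50).mean() for _ in range(3)]
print("sample means:", [round(m, 2) for m in sample_means])
print("population mean:", round(float(population.mean()), 2))
```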
The Logic of Null Hypothesis Testing

Null hypothesis testing is a formal approach to deciding between two interpretations of a


statistical relationship in a sample. One interpretation is called the null hypothesis (often
symbolized H0 and read as “H-naught”). This is the idea that there is no relationship in the
population and that the relationship in the sample reflects only sampling error. Informally, the
null hypothesis is that the sample relationship “occurred by chance.” The other interpretation is
called the alternative hypothesis (often symbolized as H1). This is the idea that there is a
relationship in the population and that the relationship in the sample reflects this relationship in
the population.

Again, every statistical relationship in a sample can be interpreted in either of these two ways: It
might have occurred by chance, or it might reflect a relationship in the population. So
researchers need a way to decide between them. Although there are many specific null hypothesis testing techniques, they are all based on the same general logic. The steps are as follows:

• Assume for the moment that the null hypothesis is true: there is no relationship between the variables in the population.
• Determine how likely the sample relationship would be if the null hypothesis were true.
• If the sample relationship would be extremely unlikely, then reject the null hypothesis in favour of the alternative hypothesis. If it would not be extremely unlikely, then retain the null hypothesis.

Following this logic, we can begin to
understand why Mehl and his colleagues concluded that there is no difference in talkativeness
between women and men in the population. In essence, they asked the following question: “If
there were no difference in the population, how likely is it that we would find a small difference
of d = 0.06 in our sample?” Their answer to this question was that this sample relationship
would be fairly likely if the null hypothesis were true. Therefore, they retained the null
hypothesis—concluding that there is no evidence of a sex difference in the population. We can
also see why Kanner and his colleagues concluded that there is a correlation between hassles
and symptoms in the population. They asked, “If the null hypothesis were true, how likely is it
that we would find a strong correlation of +.60 in our sample?” Their answer to this question
was that this sample relationship would be fairly unlikely if the null hypothesis were true.
Therefore, they rejected the null hypothesis in favour of the alternative hypothesis—concluding
that there is a positive correlation between these variables in the population.
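The general logic can be sketched with a simple permutation test, one of many techniques built on it. The data below are hypothetical, with a deliberately tiny mean difference echoing the Mehl et al. example; the three comments mark the three steps.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical talkativeness scores for two small groups
group1 = np.array([5.1, 6.3, 4.8, 5.9, 6.1, 5.4])
group2 = np.array([5.0, 6.2, 5.2, 5.8, 5.6, 6.0])
observed = abs(group1.mean() - group2.mean())

# Step 1: assume the null hypothesis -- group labels are interchangeable.
# Step 2: how likely is a difference at least this large under that assumption?
pooled = np.concatenate([group1, group2])
n_perm = 10_000
extreme = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    if abs(pooled[:6].mean() - pooled[6:].mean()) >= observed:
        extreme += 1
p_value = extreme / n_perm

# Step 3: retain the null hypothesis, because a difference this small
# is quite likely when the null hypothesis is true.
print(f"p = {p_value:.3f}")
```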

A crucial step in null hypothesis testing is finding the likelihood of the sample result if the null
hypothesis were true. This probability is called the p value. A low p value means that the sample
result would be unlikely if the null hypothesis were true and leads to the rejection of the null
hypothesis. A high p value means that the sample result would be likely if the null hypothesis
were true and leads to the retention of the null hypothesis. But how low must the p value be
before the sample result is considered unlikely enough to reject the null hypothesis? In null
hypothesis testing, this criterion is called α (alpha) and is almost always set to .05. If there is
less than a 5% chance of a result as extreme as the sample result if the null hypothesis were true,
then the null hypothesis is rejected. When this happens, the result is said to be statistically
significant. If there is greater than a 5% chance of a result as extreme as the sample result when
the null hypothesis is true, then the null hypothesis is retained. This does not necessarily mean
that the researcher accepts the null hypothesis as true—only that there is not currently enough
evidence to conclude that it is true. Researchers often use the expression “fail to reject the null
hypothesis” rather than “retain the null hypothesis,” but they never use the expression “accept
the null hypothesis.”

The Misunderstood p Value

The p value is one of the most misunderstood quantities in psychological research (Cohen,
1994). Even professional researchers misinterpret it, and it is not unusual for such
misinterpretations to appear in statistics textbooks! The most common misinterpretation is that
the p value is the probability that the null hypothesis is true—that the sample result occurred by
chance. For example, a misguided researcher might say that because the p value is .02, there is
only a 2% chance that the result is due to chance and a 98% chance that it reflects a real
relationship in the population. But this is incorrect. The p value is really the probability of a
result at least as extreme as the sample result if the null hypothesis were true. So a p value of .02
means that if the null hypothesis were true, a sample result this extreme would occur only 2% of
the time. You can avoid this misunderstanding by remembering that the p value is not the
probability that any particular hypothesis is true or false. Instead, it is the probability of
obtaining the sample result if the null hypothesis were true.
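The correct reading of the p value can be checked by simulation: under a true null hypothesis, the p value for a given result matches the long-run proportion of samples producing a result at least that extreme. The reference t value and sample sizes below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# The p value is the probability of a result at least as extreme as the
# observed one *if the null hypothesis were true* -- not the probability
# that the null hypothesis is true.
rng = np.random.default_rng(5)
t_ref, df = 2.0, 58            # a hypothetical observed t with 58 df
p_analytic = 2 * stats.t.sf(t_ref, df)

extreme = 0
n_sim = 20_000
for _ in range(n_sim):
    a = rng.normal(0, 1, 30)   # both groups from the same population: H0 true
    b = rng.normal(0, 1, 30)
    if abs(stats.ttest_ind(a, b).statistic) >= t_ref:
        extreme += 1

print(f"analytic p = {p_analytic:.3f}, simulated proportion = {extreme / n_sim:.3f}")
```

The two numbers agree closely, which is exactly what the definition of the p value predicts.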

Role of Sample Size and Relationship Strength


Recall that null hypothesis testing involves answering the question, “If the null hypothesis were
true, what is the probability of a sample result as extreme as this one?” In other words, “What is
the p value?” It can be helpful to see that the answer to this question depends on just two
considerations: the strength of the relationship and the size of the sample. Specifically, the
stronger the sample relationship and the larger the sample, the less likely the result would be if
the null hypothesis were true. That is, the lower the p value. This should make sense. Imagine a
study in which a sample of 500 women is compared with a sample of 500 men in terms of some
psychological characteristic, and Cohen’s d is a strong 0.50. If there were really no sex
difference in the population, then a result this strong based on such a large sample should seem
highly unlikely. Now imagine a similar study in which a sample of three women is compared
with a sample of three men, and Cohen’s d is a weak 0.10. If there were no sex difference in the
population, then a relationship this weak based on such a small sample should seem likely. And
this is precisely why the null hypothesis would be rejected in the first example and retained in
the second. Of course, sometimes the result can be weak and the sample large, or the result can
be strong and the sample small. In these cases, the two considerations trade off against each
other so that a weak result can be statistically significant if the sample is large enough and a
strong relationship can be statistically significant even if the sample is small. Table 13.1 shows
roughly how relationship strength and sample size combine to determine whether a sample
result is statistically significant. The columns of the table represent the three levels of
relationship strength: weak, medium, and strong. The rows represent four sample sizes that can
be considered small, medium, large, and extra large in the context of psychological research.
Thus each cell in the table represents a combination of relationship strength and sample size. If
a cell contains the word Yes, then this combination would be statistically significant for both
Cohen’s d and Pearson’s r. If it contains the word No, then it would not be statistically
significant for either. There is one cell where the decision for d and r would be different and
another where it might be different depending on some additional considerations, which are
discussed in Section 13.2, “Some Basic Null Hypothesis Tests.”
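The trade-off between relationship strength and sample size can be made concrete. For a two-group comparison, an observed Cohen's d with n participants per group implies t = d·√(n/2), from which the p value follows directly; the function below is an illustrative sketch of that relationship, not a substitute for running the actual test.

```python
import math
from scipy import stats

def p_from_d(d, n):
    """Two-tailed p implied by an observed Cohen's d with n participants per group."""
    t = d * math.sqrt(n / 2)            # t statistic for a two-group comparison
    return 2 * stats.t.sf(abs(t), df=2 * n - 2)

print(f"strong effect, large sample (d=0.50, n=500): p = {p_from_d(0.50, 500):.2e}")
print(f"weak effect, tiny sample    (d=0.10, n=3):   p = {p_from_d(0.10, 3):.2f}")
print(f"weak effect, huge sample    (d=0.10, n=800): p = {p_from_d(0.10, 800):.3f}")
```

The first result is overwhelmingly significant, the second is nowhere near significance, and the third shows how even a weak effect becomes statistically significant once the sample is large enough.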

Rejecting the null hypothesis when it is true is called a Type I error. This error means that we
have concluded that there is a relationship in the population when in fact there is not. Type I
errors occur because even when there is no relationship in the population, sampling error alone
will occasionally produce an extreme result. In fact, when the null hypothesis is true and α
is .05, we will mistakenly reject the null hypothesis 5% of the time. (This possibility is why α is
sometimes referred to as the “Type I error rate.”) Retaining the null hypothesis when it is false is
called a Type II error. This error means that we have concluded that there is no relationship in
the population when in fact there is. In practice, Type II errors occur primarily because the
research design lacks adequate statistical power to detect the relationship (e.g., the sample is too
small). We will have more to say about statistical power shortly.
