P Value
The p-value is an alternative to rejection points (critical values): it is the smallest
level of significance at which the null hypothesis would be rejected. A smaller p-value
means stronger evidence against the null hypothesis and in favor of the alternative
hypothesis.
The calculation of a p-value depends on the type of test performed. The three test types
describe which part of the probability distribution is examined: a lower-tailed test, an
upper-tailed test, or a two-tailed test. When the test statistic follows a distribution
such as Student’s t or chi-square, the degrees of freedom determine the shape of the
distribution and thus the calculation of the p-value.
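The three tail types can be sketched in a few lines of Python. This is a minimal
illustration, not a general-purpose routine: it assumes the test statistic is a z-score
that follows a standard normal distribution under the null hypothesis (a large-sample
assumption), so degrees of freedom do not enter. The function name p_value and its tail
argument are hypothetical names chosen for this sketch.

```python
from statistics import NormalDist


def p_value(z: float, tail: str = "two") -> float:
    """Return the p-value of a z-statistic, assuming a standard normal null distribution."""
    cdf = NormalDist().cdf(z)  # P(Z <= z) under the null
    if tail == "lower":
        # Lower-tailed test: probability of a statistic at least this small.
        return cdf
    if tail == "upper":
        # Upper-tailed test: probability of a statistic at least this large.
        return 1.0 - cdf
    if tail == "two":
        # Two-tailed test: double the smaller tail area.
        return 2.0 * min(cdf, 1.0 - cdf)
    raise ValueError("tail must be 'lower', 'upper', or 'two'")
```

For a test based on Student’s t or chi-square, the same three cases apply, but the
cumulative distribution function would also depend on the degrees of freedom.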
In a nutshell, the greater the difference between two observed values, the less likely it is
that the difference is due to simple random chance, and this is reflected by a lower p-
value.
The null hypothesis, also known as the conjecture, is the initial claim about a population
(or data-generating process). The alternative hypothesis states that the population
parameter differs from the value stated in the conjecture.
In practice, the significance level is stated in advance to determine how small the p-
value must be to reject the null hypothesis. Because different researchers use different
levels of significance when examining a question, a reader may sometimes have
difficulty comparing results from two different tests. P-values provide a solution to this
problem.
Even a low p-value is not proof that an effect is real, since there is still a
possibility that the observed data are the result of chance. Repeated experiments or
studies are needed to confirm that a relationship holds.
For example, suppose a study comparing returns from two particular assets was
undertaken by different researchers who used the same data but different significance
levels. The researchers might come to opposite conclusions regarding whether the assets
differ.
Suppose one researcher used a confidence level of 90% (a significance level of 0.10) and
the other required a confidence level of 95% (a significance level of 0.05) to reject the
null hypothesis. If the p-value of the observed difference between the two returns was
0.08 (corresponding to a confidence level of 92%), the first researcher would find a
statistically significant difference between the two assets, while the second would find
no statistically significant difference between the returns.
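The disagreement between the two researchers can be reproduced with a short sketch. It
assumes the usual decision rule of rejecting the null hypothesis when the p-value is at
or below the significance level, taken as one minus the confidence level. The function
name rejects_null is a hypothetical name used only for this illustration.

```python
def rejects_null(p_value: float, confidence_level: float) -> bool:
    """Reject the null when the p-value is at or below alpha = 1 - confidence level."""
    alpha = 1.0 - confidence_level
    return p_value <= alpha


p = 0.08  # p-value from the example in the text
decisions = {level: rejects_null(p, level) for level in (0.90, 0.95)}

for level, reject in decisions.items():
    verdict = "significant" if reject else "not significant"
    print(f"confidence {level:.0%}: {verdict}")
```

The same p-value of 0.08 is reported as significant at the 90% confidence level and not
significant at the 95% level.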
To avoid this problem, the researchers could report the p-value of the hypothesis test and
allow readers to interpret the statistical significance themselves. This is called a p-value
approach to hypothesis testing. Independent observers could note the p-value and decide
for themselves whether that represents a statistically significant difference or not.
Example of P-Value
An investor claims that their investment portfolio’s performance is equivalent to that of
the Standard & Poor’s (S&P) 500 Index. To test this claim, the investor conducts a
two-tailed test.
The null hypothesis states that the portfolio’s returns are equivalent to the S&P 500’s
returns over a specified period, while the alternative hypothesis states that the portfolio’s
returns and the S&P 500’s returns are not equivalent. If the investor conducted a
one-tailed test instead, the alternative hypothesis would state a single direction: either
that the portfolio’s returns are less than the S&P 500’s returns or that they are greater.
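The difference between the two-tailed and one-tailed alternatives can be sketched as
follows. The monthly return figures are made up purely for illustration and are not real
portfolio or index data. The sketch also assumes that the mean return difference can be
tested with a normal approximation; with only 12 observations, a t-test would normally be
the more appropriate choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical monthly returns in percent (invented for illustration only).
portfolio = [1.2, -0.4, 2.1, 0.8, -1.5, 1.9, 0.3, 1.1, -0.2, 2.4, 0.6, 1.0]
sp500 = [1.0, -0.6, 1.8, 0.9, -1.2, 1.5, 0.1, 1.3, -0.5, 2.0, 0.7, 0.8]

# Under the null hypothesis, the mean of the paired differences is zero.
diffs = [p - s for p, s in zip(portfolio, sp500)]
z = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

cdf = NormalDist().cdf(z)
p_two_tailed = 2.0 * min(cdf, 1.0 - cdf)  # alternative: returns are not equivalent
p_upper = 1.0 - cdf                       # alternative: portfolio returns are greater
p_lower = cdf                             # alternative: portfolio returns are less

print(f"z = {z:.3f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
print(f"upper-tailed p = {p_upper:.4f}, lower-tailed p = {p_lower:.4f}")
```

The two-tailed p-value is twice the smaller of the two one-tailed p-values, which is why
the choice of alternative hypothesis should be made before looking at the data.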
The p-value hypothesis test does not necessarily make use of a preselected confidence
level at which the investor should reject the null hypothesis that the returns are
equivalent.
Instead, it provides a measure of how much evidence there is to reject the null
hypothesis. The smaller the p-value, the greater the evidence against the null hypothesis.
Thus, if the investor finds that the p-value is 0.001, there is strong evidence against the
null hypothesis, and the investor can confidently conclude that the portfolio’s returns
and the S&P 500’s returns are not equivalent.
Although this does not provide an exact threshold at which the investor should reject the
null hypothesis, it does have another very practical advantage.
P-value hypothesis testing offers a direct way to compare the relative confidence that the
investor can have when choosing among multiple different types of investments or
portfolios relative to a benchmark such as the S&P 500.
For example, for two portfolios, A and B, whose performance differs from the S&P 500
with p-values of 0.10 and 0.01, respectively, the investor can be much more confident
that portfolio B, with the lower p-value, actually performs differently from the index.
A p-value of 0.001 indicates that if the null hypothesis tested were indeed true, then
there would be a one-in-1,000 chance of observing results at least as extreme. This leads
the observer to reject the null hypothesis because either a highly rare data result has been
observed or the null hypothesis is incorrect.
Beyond this simplified example, you could compare a 0.04 p-value to a 0.001 p-value.
Both are statistically significant at the conventional 0.05 level, but the 0.001 example
provides an even stronger case against the null hypothesis than the 0.04.
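To make the comparison of p-values more concrete, the sketch below converts several of the
p-values mentioned in this article into a one-in-N reading and into the size of test
statistic a two-tailed test would need to produce them. It assumes a standard normal test
statistic, which is an assumption of this sketch rather than something stated above, and
two_tailed_critical_z is a hypothetical helper name.

```python
from statistics import NormalDist


def two_tailed_critical_z(p: float) -> float:
    """Return the |z| whose two-tailed p-value equals p under a standard normal null."""
    return NormalDist().inv_cdf(1.0 - p / 2.0)


for p in (0.10, 0.04, 0.01, 0.001):
    one_in = round(1.0 / p)  # chance of a result at least this extreme if the null is true
    z = two_tailed_critical_z(p)
    print(f"p = {p}: about 1 in {one_in} under the null, |z| >= {z:.2f}")
```

Smaller p-values correspond to test statistics further out in the tails, which is the
sense in which 0.001 is a stronger case against the null hypothesis than 0.04.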