
P values

Definition of a P value

Consider an experiment where you've measured values in two samples, and the means are different. How sure are you that the population means are different as well? There are two possibilities:

- The populations have different means.
- The populations have the same mean, and the difference you observed is a coincidence of random sampling.

The P value is a probability, with a value ranging from zero to one. It is the answer to this question: if the populations really have the same mean overall, what is the probability that random sampling would lead to a difference between sample means as large as (or larger than) the one you observed?

How are P values calculated?

There are many methods, and you'll need to read a statistics text to learn about them. The choice of statistical test depends on how you express the results of an experiment (measurement, survival time, proportion, etc.), on whether the treatment groups are paired, and on whether you are willing to assume that measured values follow a Gaussian bell-shaped distribution.

Common misinterpretation of a P value

Many people misunderstand what question a P value answers. If the P value is 0.03, that means there is a 3% chance of observing a difference as large as the one you observed even if the two population means are identical. It is tempting to conclude, therefore, that there is a 97% chance that the difference you observed reflects a real difference between populations and a 3% chance that the difference is due to chance. Wrong. What you can say is that random sampling from identical populations would lead to a difference smaller than you observed in 97% of experiments and larger than you observed in 3% of experiments. You have to choose. Would you rather believe in a 3% coincidence? Or that the population means are really different?
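To make this concrete, here is a minimal simulation sketch (not the method of any particular statistics package): it repeatedly draws two samples from identical populations and counts how often random sampling alone produces a difference between sample means at least as large as the one observed. The sample sizes, standard deviation, and observed difference are made-up values for illustration.

import numpy as np

rng = np.random.default_rng(0)
n_per_group = 10        # subjects per group (made up)
sd = 1.0                # common standard deviation under the null hypothesis
observed_diff = 0.9     # the difference between sample means you actually observed

# Simulate many experiments in which the null hypothesis is true:
# both samples come from the same Gaussian population.
n_experiments = 100_000
a = rng.normal(0.0, sd, size=(n_experiments, n_per_group))
b = rng.normal(0.0, sd, size=(n_experiments, n_per_group))
diffs = a.mean(axis=1) - b.mean(axis=1)

# Two-tail P value: the fraction of null experiments whose difference is at
# least as large as the observed one, in either direction.
p_two_tail = np.mean(np.abs(diffs) >= observed_diff)
print(f"simulated two-tail P value: {p_two_tail:.3f}")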

"Extremely significant" results

Intuitively, you probably think that P=0.0001 is more statistically significant than P=0.04. Using strict definitions, this is not correct. Once you have set a threshold P value for statistical significance, every result either is statistically significant or is not statistically significant. Some statisticians feel very strongly about this. Many scientists are not so rigid, and refer to results as being "very significant" or "extremely significant" when the P value is tiny. Often, results are flagged with a single asterisk when the P value is less than 0.05, with two asterisks when it is less than 0.01, and with three asterisks when it is less than 0.001. This is not a firm convention, so when you see asterisks you need to check the figure legend to find the definitions the author used.

One- vs. two-tail P values

When comparing two groups, you must distinguish between one- and two-tail P values. Start with the null hypothesis that the two populations really are the same and that the observed discrepancy between sample means is due to chance.

The two-tail P value answers this question: assuming the null hypothesis, what is the chance that randomly selected samples would have means as far apart as observed in this experiment, with either group having the larger mean?

To interpret a one-tail P value, you must predict which group will have the larger mean before collecting any data. The one-tail P value answers this question: assuming the null hypothesis, what is the chance that randomly selected samples would have means as far apart as observed in this experiment, with the specified group having the larger mean?
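As a sketch of the two questions above, the snippet below computes both P values for two small invented samples using SciPy's independent-samples t test (the alternative argument requires SciPy 1.6 or later); the data and the predicted direction are assumptions made only for illustration.

from scipy import stats

control   = [4.1, 3.8, 4.5, 4.0, 4.2, 3.9]
treatment = [4.6, 4.9, 4.4, 5.1, 4.8, 4.7]

# Two-tail: a difference in either direction counts as "at least this extreme".
two_tail = stats.ttest_ind(treatment, control, alternative="two-sided")

# One-tail: only differences in the predicted direction (treatment > control) count.
# Legitimate only if that direction was predicted before the data were collected.
one_tail = stats.ttest_ind(treatment, control, alternative="greater")

print(f"two-tail P = {two_tail.pvalue:.4f}")
print(f"one-tail P = {one_tail.pvalue:.4f}")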

A one-tail P value is appropriate only when previous data, physical limitations or common sense tell you that a difference, if any, can only go in one direction. The issue is not whether you expect a difference to exist - that is what you are trying to find out with the experiment. The issue is whether you should interpret increases and decreases the same way. You should only choose a one-tail P value when you believe both of the following:

- Before collecting any data, you can predict which group will have the larger mean (if the means are in fact different).
- If the other group ends up with the larger mean, you should be willing to attribute that difference to chance, no matter how large the difference.

It is usually best to use a two-tail P value for these reasons:

- The relationship between P values and confidence intervals is clearer with two-tail P values.
- Some tests compare three or more groups, which makes the concept of tails inappropriate (more precisely, the P values have many tails). A two-tail P value is more consistent with the P values reported by these tests.
- Choosing a one-tail P value can pose a dilemma. What would you do if you chose a one-tail P value, but observed a large difference in the opposite direction to the experimental hypothesis? To be rigorous, you should conclude that the difference is due to chance and is not statistically significant. But most people would be tempted to switch to a two-tail P value or to reverse the direction of the experimental hypothesis. You avoid this situation by always using two-tail P values.
Statistical hypothesis testing

The P value is a fraction. In many situations, the best thing to do is report that number to summarize the results of a comparison. If you do this, you can totally avoid the term "statistically significant", which is often misinterpreted. In other situations, you'll want to make a decision based on a single comparison. In these situations, follow the steps of statistical hypothesis testing:

1. Set a threshold P value before you do the experiment. Ideally, you should set this value based on the relative consequences of missing a true difference or falsely finding a difference. In fact, the threshold value (called alpha) is traditionally almost always set to 0.05.
2. Define the null hypothesis. If you are comparing two means, the null hypothesis is that the two populations have the same mean.
3. Do the appropriate statistical test to compute the P value.
4. Compare the P value to the preset threshold value. If the P value is less than the threshold, state that you "reject the null hypothesis" and that the difference is "statistically significant". If the P value is greater than the threshold, state that you "do not reject the null hypothesis" and that the difference is "not statistically significant".

Note that statisticians use the term hypothesis testing very differently than scientists do.
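Here is a minimal sketch of those four steps in code, assuming a two-sample comparison and SciPy's t test; alpha, the choice of test, and the data are assumptions made only for illustration.

from scipy import stats

alpha = 0.05                                   # 1. threshold set before the experiment

# 2. Null hypothesis: the two populations have the same mean.
group_a = [23.1, 25.4, 24.8, 22.9, 26.0]       # invented measurements
group_b = [27.2, 28.1, 26.5, 29.0, 27.8]

# 3. Do the appropriate statistical test to compute the P value.
p_value = stats.ttest_ind(group_a, group_b).pvalue

# 4. Compare the P value to the preset threshold.
if p_value < alpha:
    print(f"P = {p_value:.4f} < {alpha}: reject the null hypothesis (statistically significant)")
else:
    print(f"P = {p_value:.4f} >= {alpha}: do not reject the null hypothesis (not statistically significant)")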

Statistical significance

The term significant is seductive, and it is easy to misinterpret it. A result is said to be statistically significant when the P value is less than a preset threshold value, that is, when the result would be surprising if the populations were really identical. It is easy to read far too much into the word significant because the statistical use of the word has a meaning entirely distinct from its usual meaning. Just because a difference is statistically significant does not mean that it is important or interesting. And a result that is not statistically significant (in the first experiment) may turn out to be very important. If a result is statistically significant, there are two possible explanations:

- The populations are identical, so there really is no difference. You happened to randomly obtain larger values in one group and smaller values in the other, and the difference was large enough to generate a P value less than the threshold you set. Finding a statistically significant result when the populations are identical is called making a Type I error.
- The populations really are different, so your conclusion is correct.

There are also two explanations for a result that is not statistically significant:

- The populations are identical, so there really is no difference. Any difference you observed in the experiment was a coincidence. Your conclusion of no significant difference is correct.
- The populations really are different, but you missed the difference due to some combination of small sample size, high variability and bad luck. The difference in your experiment was not large enough to be statistically significant. Finding results that are not statistically significant when the populations are different is called making a Type II error.
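The sketch below makes the two error types concrete by simulation: it runs many t tests on samples drawn either from identical populations (so every rejection is a Type I error) or from genuinely different populations (so every failure to reject is a Type II error). The sample size, effect size, and alpha are made-up values for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, n_sims = 0.05, 10, 5_000

def rejection_rate(true_diff):
    # Fraction of simulated experiments whose P value falls below alpha.
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_diff, 1.0, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

# Identical populations: any rejection is a Type I error (expect roughly 5%).
print("Type I error rate :", rejection_rate(true_diff=0.0))

# Truly different populations: any failure to reject is a Type II error.
print("Type II error rate:", 1 - rejection_rate(true_diff=1.0))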

Confidence intervals
Statistical calculations produce two kinds of results that help you make inferences about the populations from the samples. You've already learned about P values. The second kind of result is a confidence interval.

95% confidence interval of a mean

Although the calculation is exact, the mean you calculate from a sample is only an estimate of the population mean. How good is the estimate? It depends on how large your sample is and how much the values differ from one another. Statistical calculations combine sample size and variability to generate a confidence interval for the population mean. You can calculate intervals for any desired degree of confidence, but 95% confidence intervals are used most commonly. If you assume that your sample is randomly selected from some population, you can be 95% sure that the confidence interval includes the population mean. More precisely, if you generate many 95% CIs from many data sets, you expect the CI to include the true population mean in 95% of the cases and not to include the true mean value in the other 5%. Since you don't know the population mean, you'll never know for sure whether or not your confidence interval contains the true mean.
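As a sketch of the calculation, the snippet below computes a 95% confidence interval for a population mean from a small invented sample, using the t distribution via SciPy; the data are assumptions made only for illustration.

import numpy as np
from scipy import stats

sample = np.array([9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.9, 10.6])   # invented measurements

mean = sample.mean()
sem = stats.sem(sample)   # standard error of the mean (combines variability and sample size)

# 95% confidence interval for the population mean, based on the t distribution.
ci_low, ci_high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)

print(f"sample mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")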

Other situations

When comparing groups, calculate the 95% confidence interval for the difference between the population means. Again, the interpretation is straightforward. If you accept the assumptions, there is a 95% chance that the interval you calculate includes the true difference between population means. Methods exist to compute a 95% confidence interval for any calculated statistic, for example the relative risk or the best-fit value in nonlinear regression. The interpretation is the same in all cases: if you accept the assumptions of the test, you can be 95% sure that the interval contains the true population value. Or more precisely, if you repeat the experiment many times, you expect the 95% confidence interval to contain the true population value in 95% of the experiments.

Why 95%?

There is nothing special about 95%. It is just convention that confidence intervals are usually calculated for 95% confidence. In theory, confidence intervals can be computed for any degree of confidence. If you want more confidence, the intervals will be wider. If you are willing to accept less confidence, the intervals will be narrower.
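Here is a small simulation sketch of that "repeat the experiment many times" interpretation: draw many samples from a population with a known mean and count how often the computed interval contains it. The population parameters, sample size, and number of repeats are made-up values for illustration; note that the 99% intervals, being wider, contain the true mean more often.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_mean, n, n_experiments = 50.0, 12, 10_000

def coverage(confidence):
    # Fraction of experiments whose confidence interval contains the true mean.
    hits = 0
    for _ in range(n_experiments):
        sample = rng.normal(true_mean, 5.0, n)
        low, high = stats.t.interval(confidence, n - 1,
                                     loc=sample.mean(), scale=stats.sem(sample))
        hits += (low <= true_mean <= high)
    return hits / n_experiments

print("95% CI coverage:", coverage(0.95))   # expect about 0.95
print("99% CI coverage:", coverage(0.99))   # wider intervals, expect about 0.99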

What is "Statistical Significance" (p-value)?


The statistical significance of a result is the probability that the observed relationship (e.g., between variables) or difference (e.g., between means) in a sample occurred by pure chance ("luck of the draw"), and that in the population from which the sample was drawn, no such relationship or difference exists. In less technical terms, the statistical significance of a result tells us something about the degree to which the result is "true" (in the sense of being "representative of the population"). More technically, the p-value represents a decreasing index of the reliability of a result (see Brownlee, 1960). The higher the p-value, the less we can believe that the observed relation between variables in the sample is a reliable indicator of the relation between the respective variables in the population. Specifically, the p-value represents the probability of error involved in accepting our observed result as valid, that is, as "representative of the population." For example, a p-value of .05 (i.e., 1/20) indicates that there is a 5% probability that the relation between the variables found in our sample is a "fluke." In other words, assuming that in the population there was no relation between those variables whatsoever, and we repeated experiments such as ours one after another, we could expect that in approximately one of every 20 replications the relation between the variables in question would be equal to or stronger than in ours. (Note that this is not the same as saying that, given that there IS a relationship between the variables, we can expect to replicate the results 5% or 95% of the time; when there is a relationship between the variables in the population, the probability of replicating the study and finding that relationship is related to the statistical power of the design. See also Power Analysis.) In many areas of research, a p-value of .05 is customarily treated as a "borderline acceptable" error level.
