Unit 2
Few statistical estimates are as significant as the p-value. The p-value, or probability value, is a
number, calculated from a statistical test, that describes how likely your results would be
if the null hypothesis were true. A p-value less than 0.05 is typically considered statistically significant, while
a value higher than 0.05 means the data are consistent with the null hypothesis; hence the result is not statistically
significant. So, what is the p-value exactly, and why is it so important?
What Is P-Value?
In statistical hypothesis testing, the p-value, or probability value, can be defined as the
probability of obtaining a real-valued test statistic at least as extreme as the value actually observed,
assuming the null hypothesis is true. The p-value shows how likely it is that your set of observations could have occurred under the null
hypothesis, and p-values are used in statistical hypothesis testing to decide whether to reject the
null hypothesis. The smaller the p-value, the stronger the evidence that you should reject the
null hypothesis. P-values are expressed as decimals and can be converted into percentages. For
example, a p-value of 0.0237 is 2.37%, which means there is a 2.37% chance of obtaining results
at least this extreme if the null hypothesis were true. The smaller the p-value, the more significant your
results are. In a hypothesis test, you compare the p-value from your test with the alpha level
selected when running the test. Now, let's try to understand the p-value versus the alpha level.
If the p-value is greater than the alpha level (typically > 0.05), you should not reject the null hypothesis; if it is less than or equal to the alpha level, you reject the null hypothesis, as in the sketch below.
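As a minimal sketch of this decision rule, assuming SciPy is available (the sample data, the hypothesized mean of 5.0, and alpha = 0.05 are illustrative, not from the text):

# A minimal sketch of comparing a p-value with the alpha level;
# the sample data and hypothesized mean are illustrative.
from scipy import stats

sample = [5.1, 4.9, 5.3, 5.8, 4.7, 5.2, 5.0, 5.4]   # hypothetical measurements
alpha = 0.05                                         # significance level chosen in advance

# One-sample t-test of H0: population mean = 5.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value <= alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")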
P Values and Critical Values
In addition to the p-value, you can use other values produced by your test to decide whether to reject the null
hypothesis. For example, if you run an F-test to compare two variances in Excel, you will
obtain a p-value, an F-critical value, and an F-value. Compare the F-value with the F-critical value: if the
F-value is greater than the F-critical value, you should reject the null hypothesis.
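Outside Excel, the same comparison can be sketched in Python, assuming SciPy is available; the two samples below are illustrative:

# A sketch of an F-test comparing two variances; the samples are illustrative.
import statistics
from scipy import stats

sample1 = [21.0, 23.5, 19.8, 22.1, 24.3, 20.7]
sample2 = [20.2, 20.9, 21.1, 20.5, 21.4, 20.8]

f_value = statistics.variance(sample1) / statistics.variance(sample2)
df1, df2 = len(sample1) - 1, len(sample2) - 1

alpha = 0.05
f_critical = stats.f.ppf(1 - alpha, df1, df2)   # upper-tail critical value
p_value = stats.f.sf(f_value, df1, df2)         # upper-tail p-value

print(f"F = {f_value:.3f}, F critical = {f_critical:.3f}, p = {p_value:.4f}")
if f_value > f_critical:
    print("Reject the null hypothesis of equal variances")
else:
    print("Fail to reject the null hypothesis")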
How you calculate the p-value depends on which statistical test you are using to test your hypothesis.
Every statistical test makes different assumptions and generates a different test statistic.
Select the test method that best suits your data and matches the effect or relationship
being tested.
The number of independent variables included in your test determines how large or
small the test statistic must be in order to generate the same p-value.
Regardless of which statistical test you are using, the p-value always denotes the same thing:
how frequently you can expect to see a test statistic as extreme as, or more extreme than, the
one given by your test.
But if the p-value is lower than your threshold of significance, the null hypothesis
can be rejected; however, this does not mean that there is a 95% probability of the alternative
hypothesis being true.
A p-value > 0.05 is not statistically significant. It denotes weak evidence against the null
hypothesis, so we fail to reject the null hypothesis and do not accept the alternative
hypothesis. Note that we never accept the null hypothesis; we can only reject it or fail to reject it.
A statistically significant result does not prove a research hypothesis to be correct. Instead, it
provides support or evidence for the hypothesis.
Reporting P-Values
You should report exact p-values to two or three decimal places.
Do not use a zero before the decimal point, because p cannot be greater than 1. Write p = .001, not p =
0.001.
Make sure p is always italicized and that there is a space on either side of the = sign.
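As a small illustration of this convention, here is a hypothetical Python helper (the name report_p is ours, not part of any standard library) that drops the leading zero:

# A hypothetical helper that formats a p-value in the style described above:
# two or three decimals, no leading zero.
def report_p(p, decimals=3):
    text = f"p = {p:.{decimals}f}"        # e.g. "p = 0.024"
    return text.replace("0.", ".", 1)     # drop the leading zero: "p = .024"

print(report_p(0.0237))   # p = .024
print(report_p(0.001))    # p = .001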
Usually, the goal of the Six Sigma team is to find the level of variation of the output, not just the mean of
the population. Above all, the team wants to know how much variation the production process shows
about the target, to see what changes are needed to reach a defect-free process.
For a comparison between several sample variances, or a comparison between frequency proportions, the
standard test statistic called the Chi-Square (χ2) statistic is used. The distribution of the Chi-Square
statistic is called the Chi-Square distribution.
Chi-Square Test of Independence: Determines whether there is any association between two
categorical variables by comparing the observed and expected frequencies of test outcomes;
no population variance needs to be known.
Chi-Square Test of Variance: Compares a sample variance against a known (hypothesized)
population variance, as shown in the sketch after this list.
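A minimal sketch of the Chi-Square Test of Variance in Python, assuming SciPy is available; the sample data and the hypothesized population variance of 0.04 are illustrative:

# A sketch of a Chi-Square test of a single variance;
# the sample and sigma0_squared are illustrative.
import statistics
from scipy import stats

sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.4, 12.2, 11.9]
sigma0_squared = 0.04          # hypothesized population variance
n = len(sample)

# Test statistic: (n - 1) * sample variance / hypothesized variance
chi2_stat = (n - 1) * statistics.variance(sample) / sigma0_squared
p_upper = stats.chi2.sf(chi2_stat, df=n - 1)   # upper-tail p-value

print(f"chi-square = {chi2_stat:.3f}, p (upper tail) = {p_upper:.4f}")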
The Chi-Square Test of Independence determines whether there is an association between two categorical
variables (like gender and course choice). For example, the Chi-Square Test of Independence examines
the association between one category, like gender (male and female), and another category, like the
percentage of absenteeism in a school. The Chi-Square Test of Independence is a non-parametric test. In
other words, you do not need to assume a normal distribution to perform the test.
A Chi-Square test uses a contingency table to analyze the data. Each row shows the categories of one
variable. Similarly, each column shows the categories of another variable. Each variable must have two or
more categories. Each cell reflects the total number of cases for a specific pair of categories.
Assumptions of Chi-Square Test of Independence
The expected frequency for each cell is calculated as: expected frequency = (row total * column total) / n
Step 4: Calculate the degrees of freedom = (number of rows - 1) * (number of columns - 1) = (r - 1) * (c - 1)
Step 5: Calculate the Chi-Square test statistic, χ² = Σ (O − E)² / E, where O is the observed frequency and E is the expected frequency for each cell.
Step 6: Finally, draw the statistical conclusion: if the test statistic value is greater than the critical value,
reject the null hypothesis, and hence conclude that there is a significant association between the two
categorical variables.
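The whole procedure can be sketched in Python, assuming SciPy is available; the 2x2 contingency table below is illustrative:

# A sketch of the Chi-Square Test of Independence;
# the contingency table (e.g. gender by absenteeism) is illustrative.
from scipy.stats import chi2_contingency

observed = [[30, 10],    # row 1, e.g. male: absent / not absent
            [20, 40]]    # row 2, e.g. female: absent / not absent

chi2_stat, p_value, dof, expected = chi2_contingency(observed)

print(f"chi-square = {chi2_stat:.3f}, dof = {dof}, p = {p_value:.4f}")
print("Expected frequencies:", expected)
if p_value <= 0.05:
    print("Reject the null hypothesis: the variables are associated")
else:
    print("Fail to reject the null hypothesis")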
When we select a random sample from the population of interest, we expect the sample
proportion to be a good estimate of the population proportion. But we also know that
sample proportions vary, so we expect some error. (Remember that the error here is due
to chance. It is not due to a mistake that anyone made.)
For a given sample proportion, we will not know the amount of error, so we use the
standard error as an estimate for the average amount of error we expect in sample
proportions. (Recall that the standard error is the expected standard deviation of sample
proportions when we take many, many random samples.)
If a normal model is a good fit for the sampling distribution, then about 95% of sample
proportions estimate the population proportion within 2 standard errors. We say that we
are 95% confident that the following interval contains the population proportion:
sample proportion ± 2 * sqrt( p(1 - p) / n )
You may realize that this formula for the confidence interval is a bit odd, since our goal in
calculating the confidence interval is to estimate the population proportion p. Yet the formula
requires that we know p. In the section “Introduction to Statistical Inference,” we used an
estimate for p from a previous study when calculating the confidence interval. This is not the
usual way statisticians estimate the standard error, but it captured the main idea and allowed us
to practice finding and interpreting confidence intervals. Now, we develop a different way to
estimate standard error that is commonly used in statistical practice.
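A minimal sketch of this approach in Python, where the sample proportion itself is used to estimate the standard error; the counts below are illustrative:

# A sketch of a 95% confidence interval for a population proportion,
# estimating the standard error from the sample proportion itself.
import math

successes, n = 412, 1000
p_hat = successes / n                                  # sample proportion
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error

lower = p_hat - 2 * standard_error
upper = p_hat + 2 * standard_error
print(f"95% CI: ({lower:.3f}, {upper:.3f})")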
Output:
The average of list values is : 2
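The code that produced this output does not appear in the text; a minimal Python sketch that reproduces it, assuming an illustrative list whose mean is 2:

# The original code is not shown; this sketch reproduces the output above.
import statistics

values = [1, 2, 3]   # illustrative list with mean 2
print("The average of list values is :", statistics.mean(values))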
Output:
Median of data-set 1 is 5
Median of data-set 2 is 5.9
Median of data-set 3 is 2
Median of data-set 4 is -5
Median of data-set 5 is 0.0
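The code for these medians is likewise not shown; a sketch that reproduces the five values above, using illustrative data sets:

# The original code is not shown; this sketch reproduces the five medians above.
import statistics

data1 = [1, 3, 4, 5, 7, 9, 11]   # odd count: middle value is 5
data2 = [2.4, 5.9, 8.9]          # odd count: middle value is 5.9
data3 = [1, 2, 3]
data4 = [-7, -5, -2]
data5 = [-2, -1, 1, 2]           # even count: mean of -1 and 1 is 0.0

for i, data in enumerate([data1, data2, data3, data4, data5], start=1):
    print(f"Median of data-set {i} is {statistics.median(data)}")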
Output:
Maximum = 5, Minimum = 1 and Range = 4
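Again the source code is missing; a sketch that reproduces the output above with an illustrative list:

# The original code is not shown; this sketch reproduces the output above.
values = [1, 2, 3, 4, 5]   # illustrative list
maximum, minimum = max(values), min(values)
print(f"Maximum = {maximum}, Minimum = {minimum} and Range = {maximum - minimum}")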
Output:
The Standard Deviation of Sample1 is 3.9761191895520196
The Standard Deviation of Sample2 is 1.8708286933869707
The Standard Deviation of Sample3 is 7.8182478855559445
The Standard Deviation of Sample4 is 0.41967844833872525
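The code for these values is not shown either; the following sketch reproduces the four sample standard deviations above, with sample tuples reconstructed for illustration:

# The original code is not shown; these tuples are reconstructed so that
# statistics.stdev reproduces the four values printed above.
import statistics

sample1 = (1, 2, 5, 4, 8, 9, 12)          # natural numbers
sample2 = (-2, -4, -3, -1, -5, -6)        # negative integers
sample3 = (-9, -1, 0, 2, 1, 3, 4, 19)     # mixed range
sample4 = (1.23, 1.45, 2.1, 2.2, 1.9)     # floating-point values

for i, sample in enumerate([sample1, sample2, sample3, sample4], start=1):
    print(f"The Standard Deviation of Sample{i} is {statistics.stdev(sample)}")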