Unit 2

FUNDAMENTALS OF HEALTHCARE ANALYTICS REGULATION 2021


Statistical Parameters: P-Values

Few statistical estimates are as important as the p-value. The p-value, or probability value, is a
number, calculated from a statistical test, that describes how likely your observed results would be
if the null hypothesis were true. By convention, a p-value less than 0.05 is considered statistically
significant, while a larger value means the data do not provide strong evidence against the null
hypothesis; it does not prove the null hypothesis is true. So, what is the p-value exactly, and why
is it so important?

What Is P-Value?

In statistical hypothesis testing, the P-Value or probability value is the probability of obtaining a
test statistic at least as extreme as the value actually observed, assuming the null hypothesis is
true. The p-value therefore shows how likely your set of observations would be under the null
hypothesis, and it is used to decide whether to reject that hypothesis. The smaller the p-value, the
stronger the evidence against the null hypothesis. P-values are expressed as decimals and can be
converted into percentages. For example, a p-value of 0.0237 is 2.37%, meaning there is a 2.37%
chance of obtaining results at least this extreme purely by chance if the null hypothesis were true.
In a hypothesis test, you compare the p-value from your test with the alpha level selected before
running the test. Now, let’s try to understand the p-value versus the alpha level.

P Value vs Alpha Level


A P-value indicates the probability of observing an effect at least as extreme as the one actually
seen in the sample data, assuming the null hypothesis is true. An alpha level is the probability of
wrongly rejecting a true null hypothesis. The level is selected by the researcher and obtained by
subtracting your confidence level from 100%. For instance, if you want 95% confidence in your
research, the alpha level will be 5% (0.05).

When you run the hypothesis test, if you get:

 A small p value (<=0.05), you should reject the null hypothesis

 A large p value (>0.05), you should not reject the null hypothesis
P Values and Critical Values
In addition to the P-value, you can use other values given by your test to decide whether the null
hypothesis should be rejected. For example, if you run an F-test to compare two variances in
Excel, you will obtain a p-value, an f-value, and an f-critical value. Compare the f-value with the
f-critical value: if the f-value is larger than the f-critical value, you should reject the null
hypothesis.

How Is P-Value Calculated?


P-values are usually looked up in p-value tables or spreadsheets, or calculated automatically by
statistical software such as R or SPSS. Given the test statistic and the degrees of freedom of your
test (the number of observations minus the number of independent variables), the tables tell you
how frequently you would expect a test statistic at least that extreme under the null hypothesis.

How to calculate P-value depends on which statistical test you’re using to test your hypothesis.

 Every statistical test uses different assumptions and generates different statistics.
Select the test method that best suits your data and matches the effect or relationship
being tested.

 The number of independent variables in your test affects the degrees of freedom, and
therefore how large the test statistic must be to produce a given p-value.

Regardless of what statistical test you are using, the p-value will always denote the same thing –
how frequently you can expect to get a test statistic as extreme or even more extreme than the
one given by your test.
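As a concrete illustration, the two-sided p-value for a z-test can be computed directly from the standard normal distribution using only the Python standard library. This is a sketch, not part of any statistics package; the function name two_sided_p_value is illustrative:

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a standard-normal test statistic z:
    the probability of a value at least as extreme as |z| under H0."""
    return math.erfc(abs(z) / math.sqrt(2))

z = 1.96          # test statistic from some z-test
alpha = 0.05      # significance level chosen before running the test
p = two_sided_p_value(z)
print(round(p, 4))       # about 0.05 for z = 1.96
print(p <= alpha)        # True means: reject the null hypothesis
```

The more extreme the test statistic, the smaller the p-value, matching the rule above.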

P-Value in Hypothesis Testing


In the p-value approach to hypothesis testing, a calculated probability is used to decide if there’s
evidence to reject the null hypothesis, also known as the conjecture. The conjecture is the initial
claim about a data population, while the alternative hypothesis states that the observed
population parameter differs from the value assumed under the conjecture.
Effectively, the significance level is declared in advance to determine how small the p-value
needs to be for the null hypothesis to be rejected. Because significance levels vary from one
researcher to another, it can be difficult for readers to compare results from two different tests.
That is when reporting the exact p-value makes things easier.

P-Values and Statistical Significance


The level of statistical significance is usually represented as a P-value between 0 and 1. The
smaller the p-value, the more likely it is that you would reject the null hypothesis.

 A P-Value <= 0.05 is considered statistically significant. It denotes strong evidence
against the null hypothesis, since there is below a 5% probability of observing such data
if the null were correct. So, we reject the null hypothesis in favour of the alternative
hypothesis.

 However, a P-Value below your threshold of significance lets you reject the null
hypothesis, but it does not mean that there is a 95% probability of the alternative
hypothesis being true.

 A P-Value > 0.05 is not statistically significant. It means the data do not provide strong
evidence against the null hypothesis, so we retain (fail to reject) it. Note that we never
accept the null hypothesis; we can only reject it or fail to reject it.

A statistically significant result does not prove a research hypothesis to be correct. Instead, it
provides evidence in support of the hypothesis.

Reporting P-Values

 You should report exact P-values to two or three decimal places.

 For P-values less than .001, report as p < .001.

 Do not use 0 before the decimal point, since p cannot exceed 1. Write p = .001, not
p = 0.001.

 Make sure p is always italicized and there is a space on either side of the = sign.

 It is impossible to get p = .000; such a result should be written as p < .001.
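These conventions can be encoded in a small helper. The sketch below is illustrative only (format_p is not a standard library function, and italics cannot be represented in plain text):

```python
def format_p(p, decimals=3):
    """Format a p-value following the reporting rules above."""
    if p < 0.001:
        return "p < .001"                     # exact tiny values are not reported
    text = f"{p:.{decimals}f}".lstrip("0")    # no zero before the decimal point
    return f"p = {text}"

print(format_p(0.0237))   # p = .024
print(format_p(0.0004))   # p < .001
```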


Chi-Square (χ2) Hypothesis Test

Usually, the goal of the Six Sigma team is to find the level of variation of the output, not just the mean of
the population. Above all, the team wants to know how much variation the production process shows
around the target, to see what changes are needed to reach a process free of defects.

For a comparison between several sample variances, or a comparison between frequency proportions, the
standard test statistic used is the Chi-Square (χ2) statistic. The distribution of the Chi-Square statistic is
called the Chi-Square distribution.

Types of Chi-Square Hypothesis Tests

There are two types of Chi-Square tests:

 Chi-Square Test of Independence: Determines whether there is an association between two
categorical variables by comparing the observed and expected frequencies of the outcomes; it is
used when no population variance is specified.
 Chi-Square Test of Variance: Compares a sample variance with a known population variance.

Chi-Square Test of Independence

The Chi-Square Test of Independence determines whether there is an association between two categorical
variables (like gender and course choice). For example, it can examine the association between one
category, such as gender (male and female), and another, such as absenteeism (absent or not absent) in a
school. The Chi-Square Test of Independence is a non-parametric test; in other words, you do not need to
assume a normal distribution to perform the test.

A Chi-Square test uses a contingency table to analyze the data. Each row shows the categories of one
variable. Similarly, each column shows the categories of another variable. Each variable must have two or
more categories. Each cell reflects the total number of cases for a specific pair of categories.
Assumptions of Chi-Square Test of Independence

 Variables must be nominal or categorical


 Categories of variables are mutually exclusive
 The sampling method is a simple random sampling
 The data in the contingency table are frequencies or counts
Steps to Perform Chi-Square Test of Independence

Step 1: Define the Null Hypothesis and Alternative Hypothesis

 Null Hypothesis (H0): There is no association between two categorical variables


 Alternative Hypothesis (H1): There is a significant association between two categorical variables

Step 2: Specify the level of significance

Step 3: Compute the χ2 statistic:

χ2 = Σ (O − E)² / E, summed over every cell of the contingency table, where

 O is the observed frequency
 E is the expected frequency

The expected frequency is calculated for each cell = (row total * column total) / n

Step 4: Calculate the degrees of freedom = (number of rows - 1) * (number of columns - 1) = (r-1) * (c-1)

Step 5: Find the critical value based on degrees of freedom

Step 6: Finally, draw the statistical conclusion: If the test statistic value is greater than the critical value,
reject the null hypothesis, and hence, we can conclude that there is a significant association between two
categorical variables.
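The six steps above can be sketched in plain Python for a 2x2 contingency table. The counts and the critical value 3.841 (df = 1, alpha = 0.05, taken from a standard chi-square table) are illustrative assumptions:

```python
# Step 1-2: H0 = no association; alpha = 0.05 chosen in advance.
observed = [[30, 20],    # e.g. group A: outcome yes, outcome no
            [20, 30]]    # e.g. group B: outcome yes, outcome no

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Step 3: chi-square = sum over cells of (O - E)^2 / E
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected frequency per cell
        chi_sq += (o - e) ** 2 / e

# Step 4: degrees of freedom = (r - 1) * (c - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Step 5: critical value for df = 1, alpha = 0.05 (from a chi-square table)
critical = 3.841

# Step 6: statistical conclusion
print(f"chi-square = {chi_sq:.3f}, df = {df}")
if chi_sq > critical:
    print("Reject H0: the variables appear to be associated")
else:
    print("Fail to reject H0")
```

For these made-up counts the statistic is 4.0, which exceeds 3.841, so the null hypothesis of no association would be rejected.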

Hypothesis Testing - Single Population Proportion

In “Estimating a Population Proportion,” we continue our discussion of estimating a population
proportion with a confidence interval. Recall that the purpose of a confidence interval is to use a
sample proportion to construct an interval of values that we can be reasonably confident contains
the true population proportion.

The basic idea is summarized here:

 When we select a random sample from the population of interest, we expect the sample
proportion to be a good estimate of the population proportion. But we also know that
sample proportions vary, so we expect some error. (Remember that the error here is due
to chance. It is not due to a mistake that anyone made.)
 For a given sample proportion, we will not know the amount of error, so we use the
standard error as an estimate for the average amount of error we expect in sample
proportions. (Recall that the standard error is the expected standard deviation of sample
proportions when we take many, many random samples.)
 If a normal model is a good fit for the sampling distribution, then about 95% of sample
proportions estimate the population proportion within 2 standard errors. We say that we
are 95% confident that the following interval contains the population proportion:

sample proportion ± 2 √( p(1 − p) / n )

You may realize that this formula for the confidence interval is a bit odd, since our goal in
calculating the confidence interval is to estimate the population proportion p. Yet the formula
requires that we know p. In the section “Introduction to Statistical Inference,” we used an
estimate for p from a previous study when calculating the confidence interval. This is not the
usual way statisticians estimate the standard error, but it captured the main idea and allowed us
to practice finding and interpreting confidence intervals. Now, we develop a different way to
estimate standard error that is commonly used in statistical practice.
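That commonly used practice, substituting the sample proportion for p in the standard error, can be sketched with the standard library; the counts below are made up for illustration:

```python
import math

successes, n = 83, 200                     # hypothetical sample
p_hat = successes / n                      # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error

lower = p_hat - 2 * se    # "within 2 standard errors" gives roughly 95% confidence
upper = p_hat + 2 * se
print(f"p_hat = {p_hat:.3f}, 95% CI approx ({lower:.3f}, {upper:.3f})")
```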

A Program for finding Mean


import statistics
li = [1, 2, 3, 3, 2, 2, 2, 1]
print ("The average of list values is : ",end="")
print (statistics.mean(li))

Output:
The average of list values is : 2

Program for finding Median

from statistics import median
from fractions import Fraction as fr

data1 = (2, 3, 4, 5, 7, 9, 11)
data2 = (2.4, 5.1, 6.7, 8.9)
data3 = (fr(1, 2), fr(44, 12),
         fr(10, 3), fr(2, 3))
data4 = (-5, -1, -12, -19, -3)
data5 = (-1, -2, -3, -4, 4, 3, 2, 1)
print("Median of data-set 1 is %s" % (median(data1)))
print("Median of data-set 2 is %s" % (median(data2)))
print("Median of data-set 3 is %s" % (median(data3)))
print("Median of data-set 4 is %s" % (median(data4)))
print("Median of data-set 5 is %s" % (median(data5)))

Output:
Median of data-set 1 is 5
Median of data-set 2 is 5.9
Median of data-set 3 is 2
Median of data-set 4 is -5
Median of data-set 5 is 0.0

Program for finding Mode

from statistics import mode


from fractions import Fraction as fr
data1 = (2, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 7)
data2 = (2.4, 1.3, 1.3, 1.3, 2.4, 4.6)
data3 = (fr(1, 2), fr(1, 2), fr(10, 3), fr(2, 3))
data4 = (-1, -2, -2, -2, -7, -7, -9)
data5 = ("red", "blue", "black", "blue", "black", "black", "brown")
print("Mode of data set 1 is % s" % (mode(data1)))
print("Mode of data set 2 is % s" % (mode(data2)))
print("Mode of data set 3 is % s" % (mode(data3)))
print("Mode of data set 4 is % s" % (mode(data4)))
print("Mode of data set 5 is % s" % (mode(data5)))

Output:

Mode of data set 1 is 5


Mode of data set 2 is 1.3
Mode of data set 3 is 1/2
Mode of data set 4 is -2
Mode of data set 5 is black

Python code to calculate Range


arr = [1, 2, 3, 4, 5]
Maximum = max(arr)
Minimum = min(arr)
Range = Maximum-Minimum
print("Maximum = {}, Minimum = {} and Range = {}".format(Maximum, Minimum,
Range))

Output:
Maximum = 5, Minimum = 1 and Range = 4

Python code to calculate Variance


from statistics import variance
from fractions import Fraction as fr
sample1 = (1, 2, 5, 4, 8, 9, 12)
sample2 = (-2, -4, -3, -1, -5, -6)
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),
fr(5, 6), fr(7, 8))
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)
print("Variance of Sample1 is % s " % (variance(sample1)))
print("Variance of Sample2 is % s " % (variance(sample2)))
print("Variance of Sample3 is % s " % (variance(sample3)))
print("Variance of Sample4 is % s " % (variance(sample4)))
print("Variance of Sample5 is % s " % (variance(sample5)))
Output:
Variance of Sample1 is 15.80952380952381
Variance of Sample2 is 3.5
Variance of Sample3 is 61.125
Variance of Sample4 is 1/45
Variance of Sample5 is 0.17613000000000006

Python code to calculate Standard Deviation


from statistics import stdev

sample1 = (1, 2, 5, 4, 8, 9, 12)
sample2 = (-2, -4, -3, -1, -5, -6)
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
sample4 = (1.23, 1.45, 2.1, 2.2, 1.9)
print("The Standard Deviation of Sample1 is %s" % (stdev(sample1)))
print("The Standard Deviation of Sample2 is %s" % (stdev(sample2)))
print("The Standard Deviation of Sample3 is %s" % (stdev(sample3)))
print("The Standard Deviation of Sample4 is %s" % (stdev(sample4)))

Output:
The Standard Deviation of Sample1 is 3.9761191895520196
The Standard Deviation of Sample2 is 1.8708286933869707
The Standard Deviation of Sample3 is 7.8182478855559445
The Standard Deviation of Sample4 is 0.41967844833872525
