Statis-Methods Testing Hypothesis
1
STANDARD DISTRIBUTIONS
CONTENTS OF MODULE
Unit Structure
1.0 Objective
1.1 Introduction
1.2 Study Guidance
1.3 Standard Distributions
1.3.1 Random, Discrete and continuous variable
1.3.2 Probability Mass Function
1.3.3 Probability Density Function
1.3.4 Expectation
1.3.5 Variance
1.3.6 Cumulative Distribution Function
1.3.7 Reliability
1.4 Introduction and properties of following distributions
1.5 Binomial Distribution
1.6 Normal Distribution
1.7 Chi-square test
1.8 T-test
1.9 F-test
1.10 Summary
1.11 Unit End Questions
1.12 References
1.13 Further Readings
1.0 OBJECTIVES
Students will be able to:
Identify the types of random variables.
Understand the concept of Probability distribution.
Enable students to understand various types of distributions.
1.1 INTRODUCTION
The science of statistics deals with assessing the uncertainty of inferences
drawn from random samples of data. This chapter focuses on random
variables, their types and their probability distributions. To assess the
outcome of an experiment, it is desirable to associate a real number X with
each possible outcome of an event. The concept of “randomness” is
fundamental to the field of statistics. Probability is used not only to
calculate the outcome of one event but also to summarize the likelihood
of all possible outcomes. The relationship between each possible outcome
of a random variable and its probability is called a probability
distribution. Probability distributions are an important foundational concept
in probability, and the names and shapes of the common ones will become
familiar. The structure and type of a probability distribution vary with the
properties of the random variable, such as whether it is continuous or
discrete, and this, in turn, affects how the distribution is summarized and
how the most likely outcome and its probability are calculated.
$P(a \le X \le b) = \int_a^b f(x)\,dx$
(1) p(x) must be nonnegative for each value of the random variable
(2) The sum of the probabilities for each value of the random variable
must be equal to one.
i.e., $\sum_{i=1}^{n} p(x_i) = 1$
X 0 1 2
P[X=x] 1/4 2/4 1/4
Note: Please note that the probability mass function is different from the
probability density function. f(x) does not give any value of probability
directly hence the rules of probability do not apply to it.
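$f(x) = \begin{cases} x, & 0 \le x \le 1 \\ 2 - x, & 1 \le x \le 2 \end{cases}$
Find $P[0.2 < X < 1.2]$.
Solution:
$P(0.2 < X < 1.2) = \int_{0.2}^{1.2} f(x)\,dx$
$= \int_{0.2}^{1} x\,dx + \int_{1}^{1.2} (2 - x)\,dx$
$= \left[\frac{x^2}{2}\right]_{0.2}^{1} + \left[2x - \frac{x^2}{2}\right]_{1}^{1.2}$
$= \left(\frac{1}{2} - 0.02\right) + \left(2.4 - 0.72 - 2 + \frac{1}{2}\right)$
$= \frac{1}{2} - 0.02 + 1.68 - 2 + \frac{1}{2}$
$= 0.66$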
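The integral above can be checked numerically. The following is a minimal sketch, not part of the original solution: the function names `pdf` and `midpoint_integral` are illustrative, and the midpoint rule is used only because it is simple (it happens to be exact for the straight-line pieces of this density).

```python
def pdf(x):
    """Density from the example: x on [0, 1] and 2 - x on (1, 2], 0 elsewhere."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0


def midpoint_integral(f, a, b, steps=1000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Split the range at x = 1, where the formula for f(x) changes.
probability = midpoint_integral(pdf, 0.2, 1.0) + midpoint_integral(pdf, 1.0, 1.2)
print(round(probability, 4))  # 0.66
```

Splitting the range at x = 1 mirrors the hand calculation, which also integrates the two pieces separately.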
i.e. $f(x) \ge 0$ for all $x$.
E.g. Find the expected value of the following probability distribution from
the given probability distribution table
x     -1    -2    -3    0     1    2
P(x)  0.25  0.35  0.01  0.01  0.2  0.18
Solution:
Expected value,
$E(X) = \sum_{i=1}^{n} x_i P(x_i)$
$= (-1)(0.25) + (-2)(0.35) + (-3)(0.01) + (0)(0.01) + (1)(0.2) + (2)(0.18)$
$= -0.42$
For a discrete random variable:
$V(X) = E(X^2) - [E(X)]^2$
where E(X) is the expected value and
$E(X^2) = \sum x^2 p(x)$
For a continuous random variable:
$V(X) = E(X^2) - [E(X)]^2$
where E(X) is the expected value and
$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$
X     1    2     3    4    5     6
P(X)  0.2  0.15  0.1  0.2  0.15  0.2
$E(X) = \sum_{i=1}^{n} x_i p(x_i)$
$= 1(0.2) + 2(0.15) + 3(0.1) + 4(0.2) + 5(0.15) + 6(0.2)$
$= 3.55$
$E(X^2) = \sum x^2 P(x)$
$= 1^2(0.2) + 2^2(0.15) + 3^2(0.1) + 4^2(0.2) + 5^2(0.15) + 6^2(0.2)$
$= 15.85$
$V(X) = E(X^2) - [E(X)]^2$
$= 15.85 - (3.55)^2$
$= 15.85 - 12.6025$
$= 3.2475$
Mean = 3.55
Variance = 3.2475
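The same mean and variance can be reproduced in a few lines of code. This is a small sketch using the table from the example; the variable names are illustrative only.

```python
values = [1, 2, 3, 4, 5, 6]
probs = [0.2, 0.15, 0.1, 0.2, 0.15, 0.2]

# A valid probability mass function must sum to one.
assert abs(sum(probs) - 1.0) < 1e-9

mean = sum(x * p for x, p in zip(values, probs))               # E(X)
second_moment = sum(x ** 2 * p for x, p in zip(values, probs))  # E(X^2)
variance = second_moment - mean ** 2                            # V(X) = E(X^2) - [E(X)]^2

print(round(mean, 4), round(second_moment, 4), round(variance, 4))
```

The printed values should agree with the hand calculation: 3.55, 15.85 and 3.2475.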
Eg: The p.d.f. of a random variable X is $f(x) = 6(x - x^2)$, $0 \le x \le 1$. Find
the mean and variance.
Mean $= E(X) = \int_0^1 x f(x)\,dx$
$= \int_0^1 6x(x - x^2)\,dx$
$= 6\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^1$
$= 6\left(\frac{1}{3} - \frac{1}{4}\right)$
$= 6 \cdot \frac{1}{12}$
$= \frac{1}{2}$
$E(X^2) = \int_0^1 x^2 f(x)\,dx$
$= \int_0^1 x^2 \cdot 6(x - x^2)\,dx$
$= 6\int_0^1 (x^3 - x^4)\,dx$
$= 6\left[\frac{x^4}{4} - \frac{x^5}{5}\right]_0^1$
$= 6 \cdot \frac{1}{20}$
$= \frac{3}{10}$
$V(X) = E(X^2) - [E(X)]^2$
$= \frac{3}{10} - \frac{1}{4}$
$= \frac{1}{20}$
Mean $= \frac{1}{2}$
Variance $= \frac{1}{20}$
1.3.7 Reliability:
Reliability is measured and described in terms of probability. Let X be the
lifetime, or time to failure, of a component. The probability that the
component survives until some time t is called the reliability R(t) of the
component, i.e. $R(t) = P(X > t) = 1 - F(t)$, where F is the cumulative
distribution function of X.
Bernoulli's Trial:
Bernoulli's trials are events or experiments that result in one of two mutually
exclusive and exhaustive outcomes; one of them is termed success and the
other failure. For example, when an unbiased coin is tossed we can define
success as getting a tail, and hence getting a head is failure.
Consider n independent Bernoulli trials, each of which results in either
success or failure, with probability of success p and probability of failure q.
Let X be a discrete random variable denoting the number of successes in n
independent trials. The variate X is called a random variate, and the
probability distribution of X is called the Binomial distribution. It is defined
as
$P(X = x) = \binom{n}{x} p^x q^{n-x}$, where x = 0, 1, 2, ..., n, 0 < p < 1 and q = 1 - p
$= 0$ elsewhere
For example, assume an unbiased coin is tossed 10 times and the
probability of getting a head on one flip is ½. Since the probability of
getting a head on any throw is ½, the number of heads has a binomial
distribution with n = 10 and p = ½. “Success” would be “flipping a head” and
failure would be “flipping a tail”.
The equation of the normal curve is
$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}$, where $\mu$ = mean and $\sigma$ = standard deviation
We can transform the variable x to z, where $z = \frac{x - \mu}{\sigma}$; here z is called the
standard normal variate.
1. It is bell shaped and symmetrical in nature.
2. The mean, median, and mode of a normal distribution are identical.
3. The total area under the normal curve is unity.
4. Normal distributions are denser in the center and less dense in the
tails.
8. The position and shape of the normal curve depend upon $\mu$, $\sigma$ and $N$.
1.8 T - DISTRIBUTION
The t-distribution, also known as Student's t-distribution, is the probability
distribution used to estimate population parameters when the sample size
is small and the population standard deviation is unknown.
It resembles the normal distribution, and as the sample size increases the
t-distribution approaches the standard normal distribution, which has mean 0
and standard deviation 1.
Properties of t-Distribution:
1. The graph of the t distribution is also bell-shaped and symmetrical
with a mean zero.
2. The t-distribution is most useful for small sample sizes, when the
population standard deviation is not known, or both.
3. The Student's t-distribution ranges from $-\infty$ to $+\infty$.
4. The shape of the t-distribution changes with the change in the degrees
of freedom.
5. The variance is always greater than one and can be defined only when
the degrees of freedom $v \ge 3$:
Variance $= \frac{v}{v - 2}$
1.10 SUMMARY
We discussed random variables and their different types. There are two
types of probability distribution, discrete and continuous. A random variable
that assumes only a finite or countably infinite number of values is called a
discrete random variable. A continuous random variable can assume an
uncountable number of values. A discrete random variable is associated with
a probability mass function, and a continuous random variable with a
probability density function. The expected value and variance of discrete and
continuous distributions were defined. We learnt some standard distributions
and their properties; these distributions will be applicable in testing of
hypothesis. The application methods of probability
simulation algorithms, data mining, and speech recognition.
2x
(ii) f x where x = 3, 4, 5
3
1
(iii) f x where x = 1, 2
2
Value 0 1 2 3
x 2 4 6 8 10
P(x) 0.3 0.2 0.2 0.2 0.1
6. A random variable x has the following probability distribution:
x     0  1   2   3   4   5   6
P(x)  k  2k  3k  5k  4k  2k  k
7. A bag contains 4 Red and 6 White balls. A person draws two balls at
random and gets Rs. 10 for each red ball and Rs. 5 for each white ball. Find
his mathematical expectation.
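$f(x) = \begin{cases} \frac{1}{16}(3 + x)^2, & -3 \le x \le -1 \\ \frac{1}{16}(6 - 2x^2), & -1 \le x \le 1 \\ \frac{1}{16}(3 - x)^2, & 1 \le x \le 3 \end{cases}$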
1.12 REFERENCES
1. Probability and Statistics with Reliability, Queuing and Computer
Science Applications, Kishor S. Trivedi, 2016 by John Wiley &Sons,
Inc., 1946.
*****
UNIT II
2
HYPOTHESIS TESTING
Unit Structure
2.0 Objective
2.1 Introduction
2.2 Hypothesis Testing
2.3 Null Hypothesis (𝐻o)
2.4 Alternate Hypothesis (𝐻1)
2.5 Critical Region
2.6 P-Value
2.7 Tests based on T
2.8 Normal and F Distribution
2.9 Analysis of Variance
2.10 One Way analysis of variance
2.11 Two-way analysis of variance
2.12 Summary
2.13 Unit End Questions
2.14 References for Future Reading
2.0 OBJECTIVE
Statistics is referred to as a process of collecting, organizing and
analyzing data and drawing conclusions.
The statistical analysis gives significance to insignificant data or
numbers.
Statistics is “a branch of mathematics that deals with the collection,
analysis, interpretation, and presentation of masses of numerical data.”
2.1 INTRODUCTION
Statistics is the science of collecting, organizing, analyzing and
interpreting data in order to make decisions.
Statistics is used to describe a data set and to draw conclusions about
the population from the data set.
Inferential Method: This method uses confidence intervals and
significance tests, which are part of applied statistics.
Type I Error:
Rejecting a hypothesis when it should be accepted, i.e. rejecting a null
hypothesis that is true.
Type II Error:
Accepting a hypothesis when it should be rejected, i.e. accepting a null
hypothesis that is false.
◗ If the sample being tested falls into either of the critical areas, the
alternative hypothesis is accepted instead of the null hypothesis.
▸ One tail test: A one-tailed test is a statistical test in which the critical
area of a distribution is one-sided, so that it is either greater than or less
than a certain value, but not both.
◗ If the sample being tested falls into the one-sided critical area, the
alternative hypothesis will be accepted instead of the null hypothesis.
One-tailed tests are applied to answer questions such as: Is our finding
significantly greater than our assumed value? Or: Is our finding
significantly less than our assumed value?
Two-tailed tests are applied to answer the question: Are the findings
different from the assumed mean?
Level of Significance:
2.6 P-VALUE
Z Score:
◗ Mean Z = (X̄ − μ)/(σ/√N)
◗ Proportion Z = (P − p)/√(pq/N)
Steps for hypothesis testing
▹ If Z > Zc, reject Ho
Question:
Solution:
Step 1- Write given values
Population
Parameter
N = 50
◗ LOS = α = 0.01 = 1 %
Step 2- Propose H0
                 α=0.05 (5 %)   α=0.01 (1 %)
Two-tailed Test  Zc=1.96        Zc=2.58
One-tailed Test  Zc=1.645       Zc=2.33
μ 1800
σ 100
N 50
X̄ 1850
Step 6 – Inference
◗ Therefore, we can support the claim at 0.01 LOS. i.e., the cable
strength is increased.
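The Z score for this example is not shown in the text above, so the following sketch computes it from the listed values (μ = 1800, σ = 100, N = 50, X̄ = 1850) using the mean formula Z = (X̄ − μ)/(σ/√N). The function name `z_score_mean` is illustrative, and the one-tailed critical value 2.33 is taken from the table of critical values.

```python
import math


def z_score_mean(sample_mean, pop_mean, pop_sd, n):
    """Z = (sample mean - population mean) / (population SD / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


z = z_score_mean(sample_mean=1850, pop_mean=1800, pop_sd=100, n=50)
z_critical = 2.33  # one-tailed test at alpha = 0.01

reject_h0 = z > z_critical
print(round(z, 4), reject_h0)  # 3.5355 True
```

Since Z exceeds Zc, Ho is rejected, which agrees with the inference that the claim is supported at the 0.01 level of significance.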
N = 200
X̄ = 75.9
LOS = α = 0.05 = 5 %
α=0.05 (5 %) α=0.01 (1 %)
Z = 2.4748
μ 74.5
σ 8
N 200
X̄ 75.9
Step 6 – Inference
Z =2.4748, Zc=1.645
As Z > Zc, reject Ho.
Therefore, we can support the claim at 0.05 LOS. i.e., the performance of
the school is better than that of the population.
Zc = 1.96
Step 5:
Z = 2.4748
Step 6 : Inference
As Z > Zc, reject Ho.
Therefore, we can support the claim at 0.05 LOS. i.e., the performance of
the school is different from that of the population.
α=0.05 (5 %) α=0.01 (1 %)
Z Score
Mean
Z = (X̄ − μ)/(σ/√N)
Proportion
Z = (P − p)/√(pq/N)
3. Identify test-
6. Inference-
Question
N = 200
Step 2- Propose H0
Zc= 1.645
Z= (P -p)/√(pq/N)
Z=- 4.714
α=0.05 (5 %) α=0.01 (1 %)
p 0.9
q 0.1
P 0.8
N 200
Step 6 – Inference
Z = -4.714, Zc = -1.645
As Z < Zc, reject Ho.
Therefore, we cannot support the claim at 0.05 LOS. i.e., the medicine is
not 90% effective.
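The proportion test above can be reproduced with a short sketch. The values p = 0.9, P = 0.8 and N = 200 come from the example; the function name `z_score_proportion` is illustrative, and the left-tailed critical value −1.645 is the one quoted in the inference step.

```python
import math


def z_score_proportion(sample_prop, claimed_prop, n):
    """Z = (P - p) / sqrt(p * q / n), with q = 1 - p."""
    q = 1 - claimed_prop
    return (sample_prop - claimed_prop) / math.sqrt(claimed_prop * q / n)


z = z_score_proportion(sample_prop=0.8, claimed_prop=0.9, n=200)
z_critical = -1.645  # left-tailed test at alpha = 0.05

reject_h0 = z < z_critical
print(round(z, 3), reject_h0)  # -4.714 True
```

Because the test is left-tailed, Ho is rejected when Z falls below the negative critical value.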
n(E) = 6
n(S) = 36
p = n(E)/n(S) = 6/36 = 0.167
q = 1-p = 0.833
N = 100
P = 23/100 = 0.23
LOS = α = 0.05 = 5 %
Step 4: Get table value of Zc for LOS α=0.05 (5 %)
Zc = 1.96
α=0.05 (5 %) α=0.01 (1 %)
Z = (P − p)/√(pq/N)
Z = (0.23 − 0.167)/√(0.167 × 0.833/100) = 1.689
p 0.167
q 0.833
P 0.23
N 100
Step 6: Inference
Z = 1.689, Zc = 1.96
As Z < Zc, Accept Ho.
Therefore, we can support the claim at 0.05 LOS. i.e., the dice are fair.
z score, or z statistic is replaced by a suitable t score, or t statistic.
Q. 10 individuals are chosen at random from a population and their heights
(in inches) are found to be 63, 63, 64, 65, 66, 69, 69, 70, 70, 71. Find
Student's t by considering the population mean to be 65.
Solution:
Formula-
Given-
N = 10
μ = 65
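The remaining arithmetic for this question can be sketched as follows. This is an illustrative calculation rather than the original worked solution: it assumes the usual one-sample statistic t = (X̄ − μ)/(s/√N), where s is the sample standard deviation computed with divisor N − 1. That is algebraically the same as using the divisor-N standard deviation S with t = (X̄ − μ)/(S/√(N − 1)).

```python
import math
import statistics

heights = [63, 63, 64, 65, 66, 69, 69, 70, 70, 71]
pop_mean = 65

n = len(heights)
sample_mean = statistics.mean(heights)  # 67
sample_sd = statistics.stdev(heights)   # sample SD, divisor n - 1

t = (sample_mean - pop_mean) / (sample_sd / math.sqrt(n))
print(round(t, 2))  # 2.02
```

The statistic has N − 1 = 9 degrees of freedom and would be compared with the tabulated critical value of t at the chosen level of significance.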
Given:
μ = 0.050 in
N = 10
X̄ = 0.053 in
S = 0.003 (sample standard deviation)
Propose Hypothesis:
t = (X̄ − μ)/(S/√(N − 1)) = (0.053 − 0.050)/(0.003/√9) = 3
At 5% LOS
tc = 2.26
t = 3
As t > tc → Reject Ho at 5% LOS
At 1% LOS
tc = 3.25
t = 3
As t < tc → Accept Ho at 1% LOS
Where,
N1= Sample 1 size
N2= Sample 2 size
σ1 = Population 1 SD
σ2= Population 2 SD
S1= Sample 1 SD
S2= Sample 2 SD
Q. Two samples of sizes 9 and 12 are drawn from two normally distributed
populations having variances 16 and 25 respectively. If the sample
variances are 20 and 8, determine whether the first sample has a
significantly larger variance than the second sample at significance levels
of (a)0.05 (b) 0.01
(F0.95=2.95, F0.99=4.74)
Solution:
Given:
N1 = 9
N2 = 12
σ1^2 = Population 1 variance =16
σ2^2 = Population 2 variance = 25
S1^2 = Sample 1 variance = 20
S2^2 = Sample 2 variance = 8
At 5% LOS
Fc = 2.95
F = 4.03
As F > Fc → We can conclude that the variance of sample 1 is significantly
larger than that for sample 2.
At 1% LOS
Fc = 4.74
F = 4.03
As F < Fc → Variance of sample 1 is not significantly larger than that for
sample 2.
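The formula used to obtain F = 4.03 is not shown in the text above. The sketch below assumes that each sample variance is the divisor-N variance, so it is rescaled by N/(N − 1) and divided by the corresponding population variance before the ratio is taken; this assumption reproduces the value 4.03 from the given data. The function name `f_ratio` is illustrative.

```python
def f_ratio(n1, s1_sq, sigma1_sq, n2, s2_sq, sigma2_sq):
    """F = [N1*S1^2 / ((N1-1)*sigma1^2)] / [N2*S2^2 / ((N2-1)*sigma2^2)]."""
    scaled_1 = n1 * s1_sq / ((n1 - 1) * sigma1_sq)
    scaled_2 = n2 * s2_sq / ((n2 - 1) * sigma2_sq)
    return scaled_1 / scaled_2


f = f_ratio(n1=9, s1_sq=20, sigma1_sq=16, n2=12, s2_sq=8, sigma2_sq=25)
print(round(f, 2))  # 4.03

# Critical values F0.95 = 2.95 and F0.99 = 4.74 are given in the question.
for level, f_critical in (("5%", 2.95), ("1%", 4.74)):
    verdict = "significantly larger" if f > f_critical else "not significantly larger"
    print(level, verdict)
```

The degrees of freedom for the ratio are N1 − 1 = 8 and N2 − 1 = 11.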
2.9 ANALYSIS OF VARIANCE (ANOVA)
The ANOVA test allows a comparison of more than two groups at the same
time to determine whether a relationship exists between them. The result of
the ANOVA formula, the F statistic (also called the F-ratio), allows for the
analysis of multiple groups of data to determine the variability between
samples and within samples.
The one-way ANOVA compares the means of the groups you are
interested in and determines whether any of those means are statistically
significantly different from each other. Specifically, it tests the null
hypothesis:
H0: µ1 = µ2 = µ3 = … = µk
where µ = group mean and k = number of groups. If, however, the one-way
ANOVA returns a statistically significant result, we accept the alternative
hypothesis (HA), which is that there are at least two group means that are
statistically significantly different from each other.
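To make the F-ratio concrete, here is a minimal sketch that computes the one-way ANOVA F statistic by hand as the between-group mean square divided by the within-group mean square. The three groups are hypothetical data invented for illustration, and `one_way_anova_f` is an illustrative name.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of numbers."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical data
f_stat = one_way_anova_f(groups)
print(round(f_stat, 2))  # 13.0
```

The F statistic is then compared with the tabulated F value for k − 1 and N − k degrees of freedom; a larger computed F leads to rejecting the null hypothesis of equal group means.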
2.11 TWO-WAY ANALYSIS OF VARIANCE
A two-way ANOVA is used to estimate how the mean of a quantitative
variable changes according to the levels of two categorical variables. Use a
two-way ANOVA when you want to know how two independent variables,
in combination, affect a dependent variable.
Example: You are researching which type of fertilizer and planting density
produces the greatest crop yield in a field experiment. You assign different
plots in a field to a combination of fertilizer type (1, 2, or 3) and planting
density (1=low density, 2=high density), and measure the final crop yield in
bushels per acre at harvest time.
You can use a two-way ANOVA to find out if fertilizer type and planting
density influence average crop yield.
A two-way ANOVA with interaction tests three null hypotheses at the same
time:
◗ There is no difference in group means at any level of the first
independent variable.
◗ There is no difference in group means at any level of the second
independent variable.
◗ The effect of one independent variable does not depend on the effect of
the other independent variable (a.k.a. no interaction effect).
A two-way ANOVA without interaction (a.k.a. an additive two-way
ANOVA) only tests the first two of these hypotheses.
◗ Sum sq is the sum of squares (a.k.a. the variation between the group
means created by the levels of the independent variable and the overall
mean).
◗ Mean sq shows the mean sum of squares (the sum of squares divided
by the degrees of freedom).
◗ F value is the test statistic from the F-test (the mean square of each
independent variable divided by the mean square of the residuals).
◗ Pr(>F) is the p-value of the F statistic, and shows how likely it is that
the F-value calculated from the F-test would have occurred if the null
hypothesis of no difference was true.
2.12 SUMMARY
At the end of this chapter one can draw conclusions based on the available
data. Data can be processed and summarized, and the results can be
generated and displayed in graphs.
-4 -2 -2 0 2 2 3 3
Q4. Two samples of sizes 8 and 12 are drawn from two normally distributed
populations having variances 25 and 49, respectively. If the sample
variances are 36 and 60, determine whether
https://www.scribbr.com/statistics/two-way-anova/
*****
UNIT III
3
NON-PARAMETRIC TESTS
Unit Structure
3.0 Objective
3.1 Introduction
3.2 Non-Parametric Test Definition
3.3 Need of Non-Parametric Test Definition
3.4 Sign Test
3.5 Wilcoxon's Signed Rank Test
3.6 Run Test
3.7 Kruskal-Wallis Test
3.8 Post-hoc analysis of one-way analysis of variance
3.9 Duncan's test, Chi-square test of association
3.10 Summary
3.11 Unit End Questions
3.12 References for Future Reading
3.0 OBJECTIVE
This type of statistics can be used without the mean, sample size, standard
deviation, or the estimation of any other related parameters when none of
that information is available. Since nonparametric statistics makes fewer
assumptions about the sample data, its application is wider in scope than
parametric statistics.
3.1 INTRODUCTION
A non-parametric test (sometimes called a distribution-free test) does not
assume anything about the underlying distribution (for example, that the
data come from a normal distribution). This contrasts with a parametric test,
which makes assumptions about a population's parameters (for example,
the mean or standard deviation). When the word “non-parametric” is used
in statistics, it doesn't mean that you know nothing about the population. It
usually means that you know the population data does not have a normal
distribution.
Sign Test:
The sign test compares the sizes of two groups. It is a non-parametric or
“distribution free” test, which means the test doesn't assume the data
come from a particular distribution, like the normal distribution. The sign
test is an alternative to a one sample t test or a paired t test. It can also be
used for ordered (ranked) categorical data. The null hypothesis for the sign
test is that the difference between medians is zero.
H0: No difference in median of the signed differences.
H1: Median of the signed differences is less than zero.
Step 1: Subtract set 2 from set 1 and put the result in the third column.
Step 2: Count the number of positive and negative differences.
4 positives.
12 negatives.
Step 3: Add up the number of items in the sample and subtract any that had a
difference of zero (in column 3). The sample size in this question was
17, with one zero, so n = 16.
Step 4: Find the p-value using a binomial distribution table or use a
binomial calculator, with:
.5 for the probability. The null hypothesis is that there are an equal
number of signs (i.e., 50/50). Therefore, the test is a simple binomial
experiment with a .5 chance of the sign being negative and .5 of it being
positive (assuming the null hypothesis is true).
16 for the number of trials.
4 for the number of successes. “Successes” here is the smaller of either
the positive or negative signs from Step 2.
The p-value is 0.038, which is smaller than the alpha level of 0.05. We can
reject the null hypothesis and conclude that there is a significant difference.
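The binomial p-value in Step 4 can be computed directly instead of being read from a table. The sketch below sums the binomial probabilities of 0 to 4 successes in 16 trials with probability 0.5 (a one-sided p-value); the variable names are illustrative.

```python
from math import comb

n_trials = 16   # non-zero differences
successes = 4   # the smaller of the positive and negative counts

# P(X <= 4) for X ~ Binomial(16, 0.5); each outcome has probability 0.5 ** 16.
p_value = sum(comb(n_trials, k) for k in range(successes + 1)) / 2 ** n_trials

print(round(p_value, 3))  # 0.038
print(p_value < 0.05)     # True, so the null hypothesis is rejected
```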
Step 1: State the null and alternative hypotheses.
H0: The median difference between the two groups is zero.
HA: The median difference is negative. (e.g., the players make fewer free
throws before participating in the training program)
Step 2: Find the difference and absolute difference for each pair.
Step 3: Rank the absolute differences from smallest to largest, leaving out
pairs with a difference of 0.
Step 4: Find the sum of the positive ranks and the negative ranks.
Step 5: Reject or fail to reject the null hypothesis.
The test statistic, W, is the smaller of the absolute values of the sum of the
positive ranks and the sum of the negative ranks. In this case, the smaller
value is 29.5. Thus, our test statistic is W = 29.5.
If our test statistic, W, is less than or equal to the critical value in the table,
we can reject the null hypothesis. Otherwise, we fail to reject the null
hypothesis.
The critical value that corresponds to an alpha level of 0.05 and n = 13 (the
total number of pairs minus the two we didn't calculate ranks for since they
had an observed difference of 0) is 17.
Since our test statistic (W = 29.5) is not less than or equal to 17, we fail to
reject the null hypothesis.
Source: This Question and Solution is taken from the link: How to Perform
the Wilcoxon Signed Rank Test - Statology
the occurrence of similar events that are separated by events that are
different.
The runs test is also called the Wald–Wolfowitz runs test, which was
developed by mathematicians Abraham Wald and Jacob Wolfowitz.
A runs test is a statistical analysis that helps determine the randomness
of data by revealing any variables that might affect data patterns.
Technical traders can use a runs test to analyze statistical trends and
help spot profitable trading opportunities.
For example, an investor interested in analyzing the price movement
of a particular stock might conduct a runs test to gain insight into
possible future price action of that stock.
A nonparametric test for randomness is provided by the theory of runs.
To understand what a run is, consider a sequence made up of two
symbols, a and b, such as
Proceeding from left to right in sequence (10), the first run, indicated
by a vertical bar, consists of two a's; similarly, the second run consists
of three b's, the third run consists of one a, etc. There are seven runs
in all.
There seems to be a trend pattern, in which the a's and b's are
grouped (or clustered) together. In such a case there are too few runs,
and we would not consider the sequence to be random. Thus, a
sequence would be considered nonrandom if there are either too many
or too few runs, and random otherwise.
To quantify this idea, suppose that we form all possible sequences
consisting of N1 a's and N2 b's, for a total of N symbols in all (N1 +
N2 = N). The collection of all these sequences provides us with a
sampling distribution: Each sequence has an associated number of
runs, denoted by V. In this way we are led to the sampling distribution
of the statistic V. It can be shown that this sampling distribution has a
mean and variance given, respectively, by the formulas
$\mu_V = \frac{2 N_1 N_2}{N_1 + N_2} + 1$, $\sigma_V^2 = \frac{2 N_1 N_2 (2 N_1 N_2 - N_1 - N_2)}{(N_1 + N_2)^2 (N_1 + N_2 - 1)}$
By using these formulas, we can test the hypothesis of randomness at
appropriate levels of significance. It turns out that if both N1 and N2 are at
least equal to 8, then the sampling distribution of V is very nearly a normal
distribution. Thus, $z = \frac{V - \mu_V}{\sigma_V}$ is normally distributed with mean 0 and
variance 1.
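The runs test described above can be sketched in code. The sequence used here is hypothetical, chosen only so that it has seven runs with N1 = 12 and N2 = 8; the function name `runs_test` is illustrative. The mean and variance are the standard Wald–Wolfowitz expressions, and z is the standardized number of runs.

```python
import math


def runs_test(sequence):
    """Return (V, mean, variance, z) for a sequence made of two symbols."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("the sequence must contain exactly two distinct symbols")

    n1 = sequence.count(symbols[0])
    n2 = sequence.count(symbols[1])

    # A new run starts wherever a symbol differs from the one before it.
    runs = 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)

    mean = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mean) / math.sqrt(variance)
    return runs, mean, variance, z


sequence = "aabbbabbaaaaabbbaaaa"  # hypothetical sequence: aa|bbb|a|bb|aaaaa|bbb|aaaa
runs, mean, variance, z = runs_test(sequence)
print(runs, round(mean, 2), round(z, 2))  # 7 10.6 -1.73
```

For a two-tailed test at the 0.05 level, the hypothesis of randomness is rejected only if z lies outside −1.96 to 1.96, so this hypothetical sequence would not be judged nonrandom.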
Rank the data from 1 for the smallest value of the dependent variable,
2 for the next smallest, and so on, up to N for the largest (if any values
tie, it is advised to use the mid-point of their ranks).
Compute the test statistic
Determine critical value from Chi-Square distribution table
The test statistic for the Kruskal Wallis test denoted as H is given as
follows:
$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$
where $R_j$ is the sum of the ranks in group j, $n_j$ is the size of group j, k is
the number of groups and N is the total number of observations.
decreasing the smallest value will have zero effect on H. Hence, the
extreme outliers (higher and lower side) will not impact this test.
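The ranking and test-statistic steps can be sketched as follows. This is an illustrative implementation using the standard Kruskal-Wallis statistic H = 12/(N(N+1)) · Σ Rj²/nj − 3(N+1) with mid-ranks for ties and no tie correction; the data and the names `average_ranks` and `kruskal_wallis_h` are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mid-point of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no correction for ties)."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)

    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical data
h = kruskal_wallis_h(groups)
print(round(h, 2))  # 7.2
```

Because H depends only on ranks, replacing 9 by 900 in the hypothetical data leaves H unchanged, which illustrates why extreme outliers do not affect the test.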
Null Hypothesis H0: The distribution of operator scores is the same
Right tailed chi-square test with 95% confidence level, and df = 3,
critical χ2 value is 7.815
statistically significant difference in group means (i.e., a statistically
significant one-way ANOVA result). Post hoc tests attempt to control the
experimentwise error rate (usually alpha = 0.05) in the same manner that
the one-way ANOVA is used instead of multiple t-tests.
ϑ=(m-1)(n-1)
                 No Smokers  Moderate Smokers  Heavy Smokers
Hypertension     21          36                30
No hypertension  48          26                19
Solution:
Ho: Presence or absence of hypertension is independent of smoking.
H1: Presence or absence of hypertension is dependent on smoking.
Observed frequencies (o), where RT = Row Total and CT = Column Total:
                 No Smokers  Moderate Smokers  Heavy Smokers  Total
Hypertension     21          36                30             RT1 = 87
No hypertension  48          26                19             RT2 = 93
Total            CT1 = 69    CT2 = 62          CT3 = 49       Total = 180
Expected frequencies (e):
                 No Smokers         Moderate Smokers   Heavy Smokers
Hypertension     (RT1 x CT1)/Total  (RT1 x CT2)/Total  (RT1 x CT3)/Total
No hypertension  (RT2 x CT1)/Total  (RT2 x CT2)/Total  (RT2 x CT3)/Total
                 No Smokers  Moderate Smokers  Heavy Smokers
Hypertension     87*69/180   87*62/180         87*49/180
No hypertension  93*69/180   93*62/180         93*49/180
o    e       (o-e)^2/e
21   33.35   4.5734
36   29.967  1.2147
30   23.683  1.6849
48   35.65   4.2780
26   32.033  1.1363
19   25.317  1.5761
Total        14.46
χ^2 = 14.46
χ_tab^2 = 5.99
As χ^2 > χ_tab^2, Reject H0 at 5% LOS.
Therefore, we can conclude that presence or absence of hypertension is
dependent on smoking.
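The expected frequencies and the χ² value in this example can be recomputed with a short sketch. The observed counts are those of the hypertension table; the variable names are illustrative, and the critical value 5.99 is the tabulated value quoted in the solution for (2 − 1)(3 − 1) = 2 degrees of freedom at the 5% level.

```python
# Rows: hypertension, no hypertension.
# Columns: non-smokers, moderate smokers, heavy smokers.
observed = [[21, 36, 30],
            [48, 26, 19]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
        chi_square += (o - e) ** 2 / e

dof = (len(observed) - 1) * (len(observed[0]) - 1)
chi_critical = 5.99  # tabulated value for dof = 2 at 5% LOS

print(round(chi_square, 2), dof, chi_square > chi_critical)  # 14.46 2 True
```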
The Chi-square test of independence determines whether there is a
statistically significant relationship between categorical variables. It is a
hypothesis test that answers the question—do the values of one categorical
variable depend on the value of other categorical variables? This test is also
known as the chi-square test of association.
Null hypothesis: There are no relationships between the categorical
variables. If one variable is known, it does not help you predict the
value of another variable.
Alternative hypothesis: There are relationships between the
categorical variables. Knowing the value of one variable does help you
predict the value of another variable.
The Chi-square test of association works by comparing the distribution that
you observe to the distribution that you expect if there is no relationship
between the categorical variables.
For a Chi-square test, a p-value that is less than or equal to your significance
level indicates there is sufficient evidence to conclude that the observed
distribution is not the same as the expected distribution. You can conclude
that a relationship exists between the categorical variables.
We use a Chi-square test of independence to determine whether there is a
statistically significant association between shirt color and deaths. We need
to use this test because these variables are both categorical variables. Shirt
color can be only blue, gold, or red. Fatalities can be only dead or alive.
The problem discussed is from https://statisticsbyjim.com/hypothesis-
testing/chi-square-test-independence-example/
Eg - The color of the uniform represents each crewmember's work area. We
will statistically assess whether there is a connection between uniform color
and the fatality rate.
Both p-values are less than 0.05. We reject the null hypothesis and conclude
that there is a relationship between shirt color and deaths.
3.10 SUMMARY
In statistics, nonparametric tests are methods of statistical analysis that do
not require the data to follow a particular distribution (they are used
especially when the data are not normally distributed).
They are also referred to as distribution-free tests. Nonparametric tests serve
as an alternative to parametric tests such as the t-test or ANOVA, which can
be employed only if the underlying data satisfy certain criteria and
assumptions.
Given- χ_tab^2=9.49
Q2. The PQR Company claims that the lifetime of a type of battery that it
manufactures is more than 250 hours (h). A consumer advocate wishing to
determine whether the claim is justified measures the lifetimes of 24 of the
company's batteries; the results are listed below. Assuming the sample to
be random, determine whether the company's claim is justified at the 0.05
significance level. Work the problem first by hand, supplying all the details
for the sign test.
Q5. In 30 tosses of a coin the following sequence of heads (H) and tails
(T) is obtained:
HTTHTHHHTHHTTHT
HTHHTHTTHTHHTHT
(a) Determine the number of runs, V.
(b) Test at the 0.05 significance level whether the sequence is random.
Work the problem first by hand, supplying all the details of the runs test
for randomness.
https://www.statisticshowto.com/probability-and-statistics/statistics-
definitions/parametric-and-non-parametric-data/
https://www.statisticshowto.com/sign-test/
How to Perform the Wilcoxon Signed Rank Test - Statology
https://sphweb.bumc.bu.edu/otlt/mph-
modules/bs/bs704_nonparametric/bs704_nonparametric_print.html
https://www.statisticshowto.com/kruskal-wallis/
https://statistics.laerd.com/statistical-guides/one-way-anova-statistical-
guide-4.php
*****