unit-5

The document covers various probability distributions, including Binomial and Poisson distributions, along with their applications in R programming. It explains the functions available in R for generating these distributions and provides examples of calculating probabilities for specific scenarios. Additionally, it discusses the Normal distribution and its properties, including the transformation to standard scores.

STATISTICS WITH R PROGRAMMING (jntuk396.blogspot.com) Unit - V

UNIT-V: Probability Distributions, Normal Distribution, Binomial Distribution, Poisson Distribution, Other Distributions, Basic Statistics, Correlation and Covariance, T-Tests, ANOVA.

BINOMIAL DISTRIBUTION:- The binomial distribution is a discrete probability distribution. It describes the outcome of n independent trials in an experiment. Each trial is assumed to have only two outcomes, either success or failure. If the probability of a successful trial is p, then the probability of having x successful outcomes in an experiment of n independent trials is as follows.

$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \quad x = 0, 1, \dots, n$

R has four built-in functions for working with the binomial distribution. They are described below.
• dbinom(x, size, prob) :- This function gives the probability mass P(X = x) at each point x.
• pbinom(q, size, prob) :- This function gives the cumulative probability P(X ≤ q). It is a single value for each q.
• qbinom(p, size, prob) :- This function takes a probability value and gives the smallest number whose cumulative probability is at least that value.
• rbinom(n, size, prob) :- This function generates the required number of random values from the binomial distribution with the given size and prob.
Following is the description of the parameters used:
• x, q is a vector of numbers (counts of successes).
• p is a vector of probabilities.
• n is number of observations.
• size is the number of trials.
• prob is the probability of success of each trial.
Examples:
• rbinom(n=1,size=10,prob=0.4) - It generates 1 random number from the binomial distribution: the number of successes in 10 independent trials with probability 0.4.
• rbinom(n=5,size=10,prob=0.4) - It generates 5 random numbers from the binomial distribution, each the number of successes in 10 independent trials with probability 0.4.
• rbinom(n=5,size=1,prob=0.4) - Setting size to 1 turns the numbers into Bernoulli random variables, which can take only the value 1 (success) or 0 (failure).
• To visualize the binomial distribution we randomly generate 10,000 experiments, each with 10 trials and 0.3 probability of success.
library(ggplot2)
b <- data.frame(success=rbinom(n=10000,size=10,prob=0.3))
ggplot(b,aes(x=success))+geom_bar()

Problem: Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?
Solution: This is a binomial experiment in which the number of trials is equal to 5, the number of
successes is equal to 2, and the probability of success on a single trial is 1/6 or about 0.167.
Therefore, the binomial probability is:
$b(2; 5, 0.167) = \binom{5}{2} (0.167)^{2} (0.833)^{3}$
$b(2; 5, 0.167) = 0.161$
R Code:
> dbinom(2, size=5, prob=0.167)
[1] 0.1612
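
The hand calculation above can be reproduced step by step. The following is a minimal sketch, not part of the original notes; the helper name binom_pmf is an assumption introduced only for illustration. It also shows how little the answer changes when p is rounded to 0.167 instead of using 1/6.

```r
# Binomial pmf written out by hand: choose(n, x) * p^x * (1 - p)^(n - x)
# (binom_pmf is a hypothetical helper name, not an R built-in)
binom_pmf <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

p_exact   <- binom_pmf(2, n = 5, p = 1/6)    # unrounded success probability
p_rounded <- binom_pmf(2, n = 5, p = 0.167)  # p rounded as in the worked solution

print(c(exact = p_exact, rounded = p_rounded))

# The hand formula agrees with R's built-in dbinom
print(all.equal(p_exact, dbinom(2, size = 5, prob = 1/6)))
```

Writing the formula out once makes it easier to see that dbinom is simply evaluating the same expression.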

1 U.Padma Jyothi, CSE Dept , VITB



Problem: In a restaurant, seventy percent of people order Chinese food and thirty percent order Italian food. A group of three persons enters the restaurant. Find the probability that at least two of them order Italian food.
Solution:-
The probability of ordering Chinese food is 0.7 and the probability of ordering Italian food is 0.3. If at least two of them order Italian food, then either two or all three order Italian food.

Probability for two ordering Italian food,
$P(X=2) = \binom{3}{2} (0.3)^{2} (0.7)^{1} = 3 \times 0.09 \times 0.7 = 0.189$
Probability for all three ordering Italian food,
$P(X=3) = \binom{3}{3} (0.3)^{3} (0.7)^{0} = 1 \times 0.027 \times 1 = 0.027$
Hence, the probability for at least two persons ordering Italian food is,
$P(X \ge 2) = P(X=2) + P(X=3) = 0.189 + 0.027 = 0.216$
R code:-
> dbinom(2,size=3,prob=0.3)+
+ dbinom(3,size=3,prob=0.3)
[1] 0.216

Cumulative Binomial Probability:- A cumulative binomial probability refers to the probability that the
binomial random variable falls within a specified range (e.g., is greater than or equal to a stated lower
limit and less than or equal to a stated upper limit).

Problem: What is the probability of obtaining 45 or fewer heads in 100 tosses of a coin?

Solution: To solve this problem, we compute 46 individual probabilities (for x = 0, 1, ..., 45), using the binomial formula. The sum of all these probabilities is the answer we seek.
Thus,
$b(x \le 45; 100, 0.5) = b(x = 0; 100, 0.5) + b(x = 1; 100, 0.5) + \dots + b(x = 45; 100, 0.5) = 0.184$
R code:-
> pbinom(45,size=100,prob=0.5)
[1] 0.1841008

Problem: Suppose there are twelve multiple choice questions in an English class quiz. Each question has five possible answers, and only one of them is correct. Find the probability of having four or fewer correct answers if a student attempts to answer every question at random.
Solution:
Since only one out of five possible answers is correct, the probability of answering a question correctly at random is 1/5 = 0.2.
• The probability of having exactly 4 correct answers by random attempts is found as follows.
> dbinom(4, size=12, prob=0.2)
[1] 0.1329
• To find the probability of having four or fewer correct answers by random attempts, we apply the function dbinom with x = 0,…,4.
> dbinom(0, size=12, prob=0.2) + dbinom(1, size=12, prob=0.2) +
+ dbinom(2, size=12, prob=0.2) + dbinom(3, size=12, prob=0.2) +
+ dbinom(4, size=12, prob=0.2)
[1] 0.9274
• Alternatively, we can use the cumulative probability function for the binomial distribution, pbinom.
> pbinom(4, size=12, prob=0.2)
[1] 0.92744
Answer:- The probability of four or fewer questions answered correctly at random in a twelve question multiple choice quiz is 92.7%.
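
Because dbinom accepts a vector of counts, the five-term sum above can be written more compactly. The sketch below is an illustrative addition (variable names are assumptions); it also uses qbinom to go in the reverse direction, from a probability back to a count.

```r
n_q     <- 12    # number of quiz questions
p_guess <- 1/5   # chance of guessing one question correctly

# Sum of P(X = 0), ..., P(X = 4) using a vector of counts
p_sum <- sum(dbinom(0:4, size = n_q, prob = p_guess))

# Same quantity from the cumulative distribution function
p_cdf <- pbinom(4, size = n_q, prob = p_guess)

# Inverse lookup: smallest count x with P(X <= x) >= 0.90
x_90 <- qbinom(0.90, size = n_q, prob = p_guess)

print(c(p_sum = p_sum, p_cdf = p_cdf, x_90 = x_90))
```

The two probabilities agree, and qbinom returns 4 because P(X ≤ 3) is below 0.90 while P(X ≤ 4) is above it.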

Problem: Fit an appropriate binomial distribution to the following data and calculate the theoretical distribution.
x: 0 1 2 3 4 5
f: 2 14 20 34 22 8
Solution:
Here n = 5, N = ∑ fi = 100
Mean = ∑ xi fi / ∑ fi = 284/100 = 2.84
Equating the mean to np gives np = 2.84
p = 2.84/5 = 0.568
q = 1 − p = 0.432

$p(r) = \binom{5}{r} (0.568)^{r} (0.432)^{5-r}, \quad r = 0, 1, 2, 3, 4, 5$

The theoretical distribution and the expected frequencies are calculated as follows:
r    p(r)      N * p(r)                  Expected frequency
0    0.0150    100 * 0.0150 = 1.50       2
1    0.0989    100 * 0.0989 = 9.89       10
2    0.2601    100 * 0.2601 = 26.01      26
3    0.3420    100 * 0.3420 = 34.20      34
4    0.2248    100 * 0.2248 = 22.48      22
5    0.0591    100 * 0.0591 = 5.91       6
                                 Total = 100
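R code:-
> library(fitdistrplus)   # provides fitdist()
> x <- 0:5
> f <- c(2,14,20,34,22,8)
> df <-data.frame(x,f)
> # Note: "nbinom" fits a negative binomial to the frequency values f by maximum
> # likelihood; it is not the binomial(5, 0.568) fit worked out by hand above
> fitbin <- fitdist(df$f,"nbinom")
> summary(fitbin)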
Fitting of the distribution ' nbinom ' by maximum
likelihood
Parameters :
estimate Std. Error
size 2.192416 1.441296
mu 16.664004 4.886713
Loglikelihood: -22.387 AIC: 48.774 BIC: 48.35752
Correlation matrix:
size mu
size 1.0000000000 0.0003165092
mu 0.0003165092 1.0000000000

> plot(fitbin)
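
The hand-worked fit (estimate p from the mean, then compute expected frequencies) can be reproduced directly with dbinom. This is a minimal sketch under the same assumptions as the worked solution (n = 5 trials, p estimated by equating the sample mean to np); the variable names are introduced here for illustration.

```r
x <- 0:5                        # number of successes
f <- c(2, 14, 20, 34, 22, 8)    # observed frequencies
n_trials <- 5                   # trials per experiment (largest x)
N <- sum(f)                     # total number of experiments

# Method-of-moments estimate: mean = n * p
p_hat <- sum(x * f) / N / n_trials

# Theoretical probabilities and expected frequencies
p_r      <- dbinom(x, size = n_trials, prob = p_hat)
expected <- N * p_r

print(data.frame(r = x, observed = f,
                 p_r = round(p_r, 4),
                 expected = round(expected, 2)))
```

Comparing the observed and expected columns side by side gives a quick visual check of how well the binomial model describes the data.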

Poisson Distribution :- The Poisson distribution is the probability distribution of independent event occurrences in an interval. If λ is the mean occurrence per interval, then the probability of having x occurrences within a given interval is:

$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$

Examples:
1. The number of defective electric bulbs manufactured by a reputed company.
2. The number of telephone calls per minute at a switch board
3. The number of cars passing a certain point in one minute.
4. The number of printing mistakes per page in a large text.

R has four built-in functions for working with the Poisson distribution. They are described below.
• dpois(x, lambda, log = FALSE) :- This function gives the probability mass P(X = x) at each point x.
• ppois(q, lambda, lower.tail = TRUE, log.p = FALSE) :- This function gives the cumulative probability P(X ≤ q), or the upper tail P(X > q) when lower.tail = FALSE.
• qpois(p, lambda, lower.tail = TRUE, log.p = FALSE) :- This function takes a probability value and gives the smallest number whose cumulative probability is at least that value.
• rpois(n, lambda) :- This function generates the required number of random values from the Poisson distribution with the given mean.
Following is the description of the parameters used:
• x, q is a vector of numbers (counts of occurrences).
• p is a vector of probabilities.
• n is number of observations.
• lambda is the mean number of occurrences per interval.

Problem:- If there are twelve cars crossing a bridge per minute on average, find the probability of having seventeen or more cars crossing the bridge in a particular minute.
Solution:-
The probability of having sixteen or fewer cars crossing the bridge in a particular minute is given by the function ppois.
> ppois(16, lambda=12) # lower tail
[1] 0.89871
Hence the probability of having seventeen or more cars crossing the bridge in a minute is in the upper tail of the distribution.
> ppois(16, lambda=12, lower.tail=FALSE) # upper tail
[1] 0.10129
Answer:- If there are twelve cars crossing a bridge per minute on average, the probability of having seventeen or more cars crossing the bridge in a particular minute is 10.1%.

Problem:- The average number of homes sold by the Acme Realty company is 2 homes per day. What is the probability that exactly 3 homes will be sold tomorrow?
Solution: This is a Poisson experiment in which we know the following:
• μ = 2; since 2 homes are sold per day, on average.
• x = 3; since we want to find the likelihood that 3 homes will be sold tomorrow.
• e = 2.71828; since e is a constant equal to approximately 2.71828.
We plug these values into the Poisson formula as follows:
$P(x; \mu) = \frac{e^{-\mu} \mu^{x}}{x!}$
$P(3; 2) = \frac{(2.71828^{-2})(2^{3})}{3!} = \frac{(0.13534)(8)}{6} = 0.180$
R Code:-
> dpois(3,lambda = 2)
[1] 0.180447
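
As a cross-check, the Poisson formula can be typed in directly and compared with dpois. The helper name pois_pmf below is an assumption made for this sketch, not an R built-in.

```r
# Poisson pmf written out by hand: exp(-mu) * mu^x / x!
# (pois_pmf is a hypothetical helper name)
pois_pmf <- function(x, mu) {
  exp(-mu) * mu^x / factorial(x)
}

p_manual  <- pois_pmf(3, mu = 2)    # exactly 3 homes when the mean is 2 per day
p_builtin <- dpois(3, lambda = 2)

print(c(manual = p_manual, builtin = p_builtin))
```

Both values agree, which confirms that dpois evaluates the same expression used in the hand calculation.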

Cumulative Poisson Probability:- A cumulative Poisson probability refers to the probability that the
Poisson random variable is greater than some specified lower limit and less than some specified upper
limit.
Problem:- Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that tourists will see fewer than four lions on the next 1-day safari?
Solution: This is a Poisson experiment in which we know the following:
• μ = 5; since 5 lions are seen per safari, on average.
• x = 0, 1, 2, or 3; since we want to find the likelihood that tourists will see fewer than 4 lions; that is, we want the probability that they will see 0, 1, 2, or 3 lions.
• e = 2.71828; since e is a constant equal to approximately 2.71828.
To solve this problem, we need to find the probability that tourists will see 0, 1, 2, or 3 lions. Thus, we need to calculate the sum of four probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5). To compute this sum, we use the Poisson formula:
$P(x \le 3; 5) = P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5)$
$P(x \le 3; 5) = \frac{e^{-5} 5^{0}}{0!} + \frac{e^{-5} 5^{1}}{1!} + \frac{e^{-5} 5^{2}}{2!} + \frac{e^{-5} 5^{3}}{3!}$
$P(x \le 3; 5) = \frac{(0.006738)(1)}{1} + \frac{(0.006738)(5)}{1} + \frac{(0.006738)(25)}{2} + \frac{(0.006738)(125)}{6}$
$P(x \le 3; 5) = 0.006738 + 0.03369 + 0.084224 + 0.140375$
$P(x \le 3; 5) = 0.2650$
Thus, the probability of seeing no more than 3 lions is 0.2650.
R Code:-
> ppois(3,lambda = 5)
[1] 0.2650259
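
The cumulative probability can also be checked by simulation with rpois. This is an illustrative sketch only; the seed and the number of simulated safaris are arbitrary choices, not values from the notes.

```r
set.seed(42)                          # arbitrary seed so the run is repeatable
lions <- rpois(100000, lambda = 5)    # simulated lion counts for 100,000 safaris

p_sim   <- mean(lions < 4)            # share of safaris with fewer than 4 lions
p_exact <- ppois(3, lambda = 5)       # exact P(X <= 3)
p_terms <- sum(dpois(0:3, lambda = 5))  # same value as a sum of four terms

print(c(simulated = p_sim, exact = p_exact, sum_of_terms = p_terms))
```

The simulated share should land close to the exact value, with the small difference due to random sampling.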

Normal Distribution:- A continuous random variable X follows a normal distribution with mean μ and variance σ² if its probability density function is

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^{2}/(2\sigma^{2})}$, on the domain $x \in (-\infty, \infty)$.
Standard Normal Distribution
It is the distribution that occurs when a normal random variable has a mean of zero and a standard deviation of one.
The normal random variable of a standard normal distribution is called a standard score or a z score. Every normal random variable X can be transformed into a z score via the following equation:
Z = (X - μ) / σ
where X is a normal random variable, μ is the mean, and σ is the standard deviation, yielding the standard normal density
$\varphi(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^{2}/2}$

Standard Normal Curve:- One way of figuring out how data are distributed is to plot them in a graph. If the data are normally distributed, the plot resembles a bell curve. A bell curve has a small percentage of the points on both tails and the bigger percentage on the inner part of the curve. The standard normal curve is bell-shaped and has the following properties:

[Figure: standard normal (bell) curve]
• mean = median = mode
• symmetry about the center
• 50% of values less than the mean and 50% greater than the mean

R functions:
• dnorm(x, mean = 0, sd = 1, log = FALSE) :- This function gives the height of the probability density function at each point x.
• pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) :- This function gives the cumulative probability P(X ≤ q), or the upper tail P(X > q) when lower.tail = FALSE.
• qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) :- This function takes a probability value and gives the number whose cumulative probability matches that value.
• rnorm(n, mean = 0, sd = 1) :- This function generates the required number of random values from the normal distribution with the given mean and sd.

Procedure to find probability using positive Z-score table

Case     Region                                                             Probability
Case 1   Area between 0 and any z score                                     Area(z)
Case 2   Area in any tail                                                   0.5 - Area(z)
Case 3   Area between two z-scores on the same side of the mean             |Area(z2) - Area(z1)|
Case 4   Area between two z-scores on the opposite sides of the mean        Area(z1) + Area(z2)
Case 5   Area to the left of a positive Z score                             0.5 + Area(z)
Case 6   Area to the right of a negative Z score                            0.5 + Area(z)
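
The six table cases can be checked against pnorm. In the sketch below, area() is a hypothetical helper standing in for the positive z-table lookup (the area between 0 and |z|); the z values are arbitrary examples.

```r
# Area(z): area under the standard normal curve between 0 and |z|,
# which is what a positive z-score table lists (area is a hypothetical helper).
area <- function(z) pnorm(abs(z)) - 0.5

case1 <- area(1.25)                     # between 0 and z = 1.25
case2 <- 0.5 - area(2.5)                # tail beyond z = 2.5
case3 <- abs(area(2.5) - area(1.25))    # between z = 1.25 and z = 2.5
case4 <- area(-2.25) + area(1.25)       # between z = -2.25 and z = 1.25
case5 <- 0.5 + area(2.5)                # left of z = 2.5
case6 <- 0.5 + area(-2.25)              # right of z = -2.25

print(round(c(case1, case2, case3, case4, case5, case6), 4))
```

Each case matches the corresponding direct pnorm call, for example case5 equals pnorm(2.5) and case6 equals pnorm(-2.25, lower.tail = FALSE).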


Problem:- X is a normally distributed variable with mean μ = 30 and standard deviation σ = 4. Find
a) P(x < 40)
b) P(x > 21)
c) P(30 < x < 35)
Solution:
a) For x = 40,
z = (x − µ)/σ
⇒ z = (40 − 30)/4
= 2.5 (= z1 say)
Hence P(x < 40) = P(z < 2.5)
= 0.5 + A(z1) = 0.5 + 0.4938 = 0.9938
b) For x = 21,
z = (x − µ)/σ
⇒ z = (21 − 30)/4
= −2.25 (= −z1 say)
Hence P(x > 21) = P(z > −2.25)
= 0.5 + A(z1) = 0.5 + 0.4878 = 0.9878
c) For x = 30,
z = (x − µ)/σ
⇒ z = (30 − 30)/4 = 0, and
for x = 35,
z = (x − µ)/σ
⇒ z = (35 − 30)/4
= 1.25
Hence P(30 < x < 35) = P(0 < z < 1.25)
= [area to the left of z = 1.25] − [area to the left of 0]
= 0.8944 − 0.5 = 0.3944
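
The three table-based answers can be verified with pnorm, which accepts the mean and standard deviation directly so no manual standardisation is needed. This is a small illustrative sketch; the variable names are assumptions.

```r
mu    <- 30
sigma <- 4

p_a <- pnorm(40, mean = mu, sd = sigma)                      # P(x < 40)
p_b <- pnorm(21, mean = mu, sd = sigma, lower.tail = FALSE)  # P(x > 21)
p_c <- pnorm(35, mean = mu, sd = sigma) -
       pnorm(30, mean = mu, sd = sigma)                      # P(30 < x < 35)

# Standardising first gives the same result as passing mean and sd
z <- (40 - mu) / sigma
print(all.equal(p_a, pnorm(z)))

print(round(c(a = p_a, b = p_b, c = p_c), 4))
```

The rounded values agree with the table lookups in the solution.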

Problem:- The length of life of an instrument produced by a machine has a normal distribution with a mean of 12 months and a standard deviation of 2 months. Find the probability that an instrument produced by this machine will last
a) less than 7 months.
b) between 7 and 12 months.
Solution:
a) P(x < 7)
For x = 7,
z = (x − µ)/σ
⇒ z = (7 − 12)/2
= −2.5 (= −z1 say)
Hence P(x < 7) = P(z < −2.5)
= 0.5 − A(z1) = 0.5 − 0.4938 = 0.0062
b) P(7 < x < 12)
For x = 12,
z = (x − µ)/σ
⇒ z = (12 − 12)/2
= 0
and for x = 7, z = −2.5 from part (a).
Hence P(7 < x < 12) = P(−2.5 < z < 0)
= A(2.5) = 0.4938

Problem:- The Tahoe Natural Coffee Shop morning customer load follows a normal distribution with mean 45 and standard deviation 8. Determine the probability that the number of customers tomorrow will be less than 42.

Solution:-
We first convert the raw score to a z-score. We have
z = (x − µ)/σ
⇒ z = (42 − 45)/8 = −0.375
Next, we use the table to find the probability. The table gives 0.3520. (We have rounded the z-score to −0.38.)
We can conclude that
P(x < 42) = P(z < −0.38)
= 0.352
That is, there is about a 35% chance that there will be fewer than 42 customers tomorrow.
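
Since the table forces the z-score to be rounded to two decimals, it can be useful to see how much the rounding matters. The sketch below (an illustrative addition) compares the unrounded probability from pnorm with the table-style value.

```r
# Unrounded: pnorm standardises internally, using z = -0.375 exactly
p_unrounded <- pnorm(42, mean = 45, sd = 8)

# Table-style: z rounded to two decimals before the lookup
p_table <- pnorm(-0.38)

print(round(c(unrounded = p_unrounded, table = p_table), 4))
```

The two values differ only in the third decimal place, so the conclusion of roughly a 35% chance is unaffected.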

Example:
> x <- c(92,117,109,85,117,107,82,83,119,113,101,106,101,84,126,69,82,79,84,100,104,111,109,92,93,107,
81,118,81,133,111,82,120,103,115,89,74,110,83,110,96,102,108,110,140,106,111,98,98,99,74,101,107,104,
128,87,95,109,104,91,83,98,99,103,126,123,85,98,93,100)

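> h <- hist(x, col = "blue")   # histogram of the data; h stores the bin midpoints
> m <- mean(x)
> s <- sd(x)
> xf <- seq(min(x), max(x), length.out = 70)   # grid of points for the curve
> dis <- dnorm(xf, m, s)   # normal density with the sample mean and sd
> dis <- dis * diff(h$mids[1:2]) * length(x)   # scale density to counts: bin width * number of observations
> lines(xf, dis, col = "red", lwd = 3)   # overlay the fitted normal curve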

Problem:- Assume that the test scores of a college entrance exam fit a normal distribution. Furthermore, the mean test score is 72, and the standard deviation is 15.2. What is the percentage of students scoring 84 or more in the exam?
Solution:-
We apply the function pnorm of the normal distribution with mean 72 and standard deviation 15.2. Since we are looking for the percentage of students scoring 84 or more, we are interested in the upper tail of the normal distribution.
> pnorm(84, mean=72, sd=15.2, lower.tail=FALSE)
[1] 0.21492

Correlation:- A correlation is a relationship between two variables. Typically, we take x to be the independent variable and y to be the dependent variable. Data is represented by a collection of ordered pairs (x,y).

The strength of the linear relationship is measured by the correlation coefficient
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^{2} \, \sum (y_i - \bar{y})^{2}}}$
This will always be a number between -1 and 1 (inclusive).

• If r is close to 1, we say that the variables are positively correlated. This means there is likely a strong linear relationship between the two variables, with a positive slope.
• If r is close to -1, we say that the variables are negatively correlated. This means there is likely a strong linear relationship between the two variables, with a negative slope.
• If r is close to 0, we say that the variables are not correlated. This means that there is likely no linear relationship between the two variables; however, the variables may still be related in some other way.
To run a correlation test we type:
> cor.test(var1, var2, method = "method")
The default method is "pearson" so you may omit this if that is what you want. If you type "kendall" or "spearman" then you will get the appropriate significance test.

Problem:- The local ice cream shop keeps track of how much ice cream they sell versus the temperature on that day. Here are their figures for the last 12 days:

Temperature (°C)      14.2  16.4  11.9  15.2  18.5  22.1  19.4  25.1  23.4  18.1  22.6  17.2
Ice cream sales ($)   215   325   185   332   406   522   412   614   544   421   445   408

Solution:-

Formula for correlation coefficient: $r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}$, where $s_x$ and $s_y$ are the sample standard deviations of x and y.

R Code:-
> temp <- c(14.2,16.4,11.9,15.2,18.5,22.1,19.4,25.1,23.4,18.1,22.6,17.2)
> sales <- c(215,325,185,332,406,522,412,614,544,421,445,408)
> corr_coeff <- cor(temp,sales)
> corr_coeff
[1] 0.9575066
> cov(temp,sales)
[1] 484.0932
> plot(temp, sales, pch=16, col="red")   # scatter plot of sales against temperature
> abline(lm(sales~temp), col="blue")   # adds a line of best fit to the scatter plot
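
To connect cor() with its definition, the coefficient can be rebuilt from the covariance and the standard deviations, and from the sums of deviations. The sketch below is an illustrative addition that also runs cor.test on the same data to obtain a significance test.

```r
temp  <- c(14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2)
sales <- c(215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408)

# r as covariance divided by the product of the standard deviations
r_manual <- cov(temp, sales) / (sd(temp) * sd(sales))

# r from sums of deviations about the means
dx <- temp - mean(temp)
dy <- sales - mean(sales)
r_sums <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Pearson correlation test (default method)
ct <- cor.test(temp, sales)

print(c(r_manual = r_manual, r_sums = r_sums, r_builtin = cor(temp, sales)))
print(ct$p.value)
```

All three ways of computing r give the same number, and the p-value from cor.test indicates whether the correlation differs significantly from zero.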

T-test for single mean:- The one-sample t-test is used to compare the mean of a population to a specified theoretical mean (μ0).
Let X represent a sample of n values with sample mean x̄ and sample standard deviation s. The comparison of the observed sample mean x̄ with the theoretical value μ0 is performed with the formula below:
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
To evaluate whether the difference is statistically significant, you first have to read from the t-test table the critical value of Student's t distribution corresponding to the significance level alpha of your choice (for example 5%). The degrees of freedom (df) used in this test are: df = n − 1

Problem:- A professor wants to know if her introductory statistics class has a good grasp of basic math. Six students are chosen at random from the class and given a math proficiency test. The professor wants the class to be able to score above 70 on the test. The six students get scores of 62, 92, 75, 68, 83, and 95. Can the professor have 90 percent confidence that the mean score for the class on the test would be above 70?
Solution:-
• Null hypothesis H0: μ = 70
• Alternative hypothesis Ha: μ > 70
• Level of significance: α = 0.10 (90 percent confidence), one-tailed test
First, compute the sample mean and standard deviation:
$\bar{x} = \frac{62 + 92 + 75 + 68 + 83 + 95}{6} = \frac{475}{6} = 79.17$
$s = 13.17$
• The test statistic is
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{79.17 - 70}{13.17 / \sqrt{6}} = \frac{9.17}{5.38} = 1.71$ (calculated value of t)
To test the hypothesis, the computed t-value of 1.71 is compared to the critical value in the t-table. The one-tailed critical value with 5 df at the 10% level is 1.476. The calculated value of t is more than the table value of t, so the null hypothesis is rejected: the professor can be 90 percent confident that the class mean is above 70.
R code:-
> x <- c(62,92,75,68,83,95)
> t.test(x,alternative="two.sided",mu=70)

One Sample t-test

data: x
t = 1.7053, df = 5, p-value = 0.1489
alternative hypothesis: true mean is not equal to 70
95 percent confidence interval:
65.34888 92.98446
sample estimates:
mean of x
79.16667
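
The output above is for a two-sided alternative, whereas the question asks whether the mean is above 70. Because the t statistic is positive, the one-sided p-value is half of the two-sided 0.1489, about 0.074, which is below 0.10. The following sketch (variable names are assumptions) computes the statistic by hand and runs the one-sided test directly.

```r
scores <- c(62, 92, 75, 68, 83, 95)
mu0    <- 70
n      <- length(scores)

# Test statistic from the formula t = (mean - mu0) / (s / sqrt(n))
t_manual <- (mean(scores) - mu0) / (sd(scores) / sqrt(n))

# One-tailed critical value at the 10% level with n - 1 degrees of freedom
t_crit <- qt(0.90, df = n - 1)

# One-sided test matching the alternative hypothesis mu > 70
res <- t.test(scores, mu = mu0, alternative = "greater")

print(c(t_manual = t_manual, t_crit = t_crit, p_one_sided = res$p.value))
```

The statistic exceeds the critical value and the one-sided p-value is below 0.10, so both routes lead to rejecting H0 at the 10% level.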

Problem:- A sample of 26 bulbs gives a mean life of 990 hours with an S.D. of 20 hours. The manufacturer claims that the mean life of bulbs is 1000 hours. Does the sample meet the standard?
Solution: Here n = 26,
Sample mean x̅ = 990 hours
S.D. s = 20 hours
Population mean µ = 1000 hours
df = n − 1 = 26 − 1 = 25
• Null Hypothesis H0: The sample meets the standard, i.e. µ = 1000 hours
• Alternative Hypothesis HA: µ not equal to 1000
• Level of Significance: α = 0.05
• The test statistic is
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{990 - 1000}{20 / \sqrt{26}} = \frac{-10}{3.92} = -2.55$
so |t| = 2.55 (calculated value of t).
The two-tailed table value of t with 25 df at the 5% level is 2.060 (the one-tailed value is 1.708).
The calculated |t| is more than the table value of t, so the null hypothesis is rejected at the 5% level.

Paired comparisons (Paired t-test):- Sometimes data come from non-independent samples. An example might be testing "before and after" of cosmetics or consumer products. We could use a single random sample and do "before and after" tests on each person. A hypothesis test based on these data would be called a paired comparisons test. Since the observations come in pairs, we can study the difference, d, between the samples. The difference between each pair of measurements is called di.

Test statistic:- With a sample of n pairs of measurements, forming a simple random sample from a normally distributed population of differences, the mean of the differences, d̄, is tested using the following implementation of t.
$t = \frac{\bar{d} - \mu}{S / \sqrt{n}}$
where μ is the hypothesized mean of the population of differences (0 under the null hypothesis of no change) and S is the sample standard deviation of the differences.

Problem:- The blood pressure of 5 women before and after intake of a certain drug are given below. Test whether there is a significant change in blood pressure at the 1% level of significance.
Before   110  120  125  132  125
After    120  118  125  136  121

Solution: Let µ be the mean of the population of differences.
• Null Hypothesis H0: µ1 = µ2, i.e. no change in B.P.
• Alternative Hypothesis HA: µ1 ≠ µ2, i.e. there is a change in B.P.
• Level of Significance: α = 0.01
• Computation: The differences di (before minus after) are −10, 2, 0, −4, 4
$\bar{d} = \frac{-10 + 2 + 0 - 4 + 4}{5} = \frac{-8}{5} = -1.6$

$S^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (d_i - \bar{d})^{2} = \frac{1}{4} \sum_{i=1}^{5} (d_i - \bar{d})^{2}$
$= \frac{1}{4} \left[ (-10 + 1.6)^{2} + (2 + 1.6)^{2} + (0 + 1.6)^{2} + (-4 + 1.6)^{2} + (4 + 1.6)^{2} \right]$
$= \frac{123.20}{4} = 30.8$
$S = \sqrt{30.8} = 5.55$
• Test statistic: The test statistic is t, which is calculated as
$t = \frac{\bar{d} - \mu}{S / \sqrt{n}} = \frac{-1.6 - 0}{5.55 / \sqrt{5}} = -0.645$
Calculated |t| value is 0.645.
The tabulated two-tailed t0.01 with 5 − 1 = 4 degrees of freedom is 4.604 (the one-tailed value is 3.747).
Since calculated |t| < t0.01, we do not reject the null hypothesis and conclude that there is no significant change in blood pressure.
R code:-
> x <- c(110,120,125,132,125)
> y <- c(120,118,125,136,121)
> t.test(x,y,paired=TRUE)

Paired t-test

data: x and y
t = -0.64466, df = 4, p-value = 0.5543
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-8.490956 5.290956
sample estimates:
mean of the differences
-1.6
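
A paired t-test is the same as a one-sample t-test on the differences. The following sketch is an illustrative addition (variable names are assumptions) that rebuilds the statistic from the differences and compares it with t.test.

```r
before <- c(110, 120, 125, 132, 125)
after  <- c(120, 118, 125, 136, 121)

d <- before - after                  # differences: -10, 2, 0, -4, 4
n <- length(d)

# t = mean(d) / (S / sqrt(n)), with S the sample sd of the differences
t_manual <- mean(d) / (sd(d) / sqrt(n))

res_paired <- t.test(before, after, paired = TRUE)  # paired test
res_diff   <- t.test(d, mu = 0)                     # one-sample test on d

# Two-tailed critical value at the 1% level with n - 1 degrees of freedom
t_crit <- qt(1 - 0.01 / 2, df = n - 1)

print(c(mean_d = mean(d), var_d = var(d), t_manual = t_manual, t_crit = t_crit))
```

Since |t| is well below the critical value, the data give no evidence of a change in blood pressure at the 1% level.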

T-test for difference of two population means :-

With a two-sample t-test, we compare the population means to each other and again look at the difference. We expect that x̄ − ȳ would be close to μ1 − μ2. The test statistic will use both sample means, sample standard deviations, and sample sizes for the test.
A two-sample t-test proceeds as follows:
• Write the null and alternative hypotheses.
• State the level of significance and find the critical value. For the pooled test statistic below, the critical value from the Student's t-distribution has n1 + n2 − 2 degrees of freedom.
• Compute the test statistic.
• Compare the test statistic to the critical value and state a conclusion.

$t = \frac{\bar{x} - \bar{y}}{S \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$
where
$S^{2} = \frac{n_1 s_1^{2} + n_2 s_2^{2}}{n_1 + n_2 - 2}$ or $S^{2} = \frac{\sum (x_i - \bar{x})^{2} + \sum (y_i - \bar{y})^{2}}{n_1 + n_2 - 2}$
(in the first form, $s_1^{2}$ and $s_2^{2}$ are the sample variances computed with divisors $n_1$ and $n_2$).

Problem:- Two horses A and B were tested according to the time (in seconds) to run a particular track with the following results.
Horse A   28  30  32  33  33  29  34
Horse B   29  30  30  24  27  29
Test whether the two horses have the same running capacity.

Solution:- Given n1 = 7 and n2 = 6.
We first compute the sample means and the sums of squared deviations.
$\bar{x}$ = mean of the first sample $= \frac{1}{7}(28 + 30 + 32 + 33 + 33 + 29 + 34) = \frac{1}{7}(219) = 31.286$
$\bar{y}$ = mean of the second sample $= \frac{1}{6}(29 + 30 + 30 + 24 + 27 + 29) = \frac{1}{6}(169) = 28.16$

x      x − x̄      (x − x̄)²     y      y − ȳ     (y − ȳ)²
28     -3.286     10.8         29     0.84      0.7056
30     -1.286     1.6538       30     1.84      3.3856
32     0.714      0.51         30     1.84      3.3856
33     1.714      2.94         24     -4.16     17.3056
33     1.714      2.94         27     -1.16     1.3456
29     -2.286     5.226        29     0.84      0.7056
34     2.714      7.366
219               31.4358      169              26.8336

Now, $S^{2} = \frac{\sum (x_i - \bar{x})^{2} + \sum (y_i - \bar{y})^{2}}{n_1 + n_2 - 2} = \frac{31.4358 + 26.8336}{7 + 6 - 2} = \frac{58.2694}{11} = 5.30$
Therefore $S = \sqrt{5.30} = 2.3$

• Null Hypothesis H0: µ1 = µ2
• Alternative Hypothesis HA: µ1 ≠ µ2
• Level of Significance: α = 0.05
• Computation: $t = \frac{\bar{x} - \bar{y}}{S \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{31.286 - 28.16}{(2.3) \sqrt{\frac{1}{7} + \frac{1}{6}}} = 2.443$
The tabulated two-tailed t0.05 with 7 + 6 − 2 = 11 degrees of freedom at the 5% level of significance is 2.2.
Since calculated t > t0.05, we reject the null hypothesis and conclude that the two horses do not have the same running capacity.
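
The pooled two-sample test for the horse data can be reproduced in R. This is a minimal sketch, assuming equal population variances as the pooled formula does (hence var.equal = TRUE); the variable names are introduced here for illustration. With unrounded intermediate values the statistic comes out at about 2.44.

```r
horse_a <- c(28, 30, 32, 33, 33, 29, 34)
horse_b <- c(29, 30, 30, 24, 27, 29)
n1 <- length(horse_a)
n2 <- length(horse_b)

# Pooled variance from the sums of squared deviations
ss_a <- sum((horse_a - mean(horse_a))^2)
ss_b <- sum((horse_b - mean(horse_b))^2)
S2   <- (ss_a + ss_b) / (n1 + n2 - 2)

# Pooled t statistic
t_manual <- (mean(horse_a) - mean(horse_b)) / (sqrt(S2) * sqrt(1 / n1 + 1 / n2))

# Built-in equivalent: var.equal = TRUE requests the pooled test
res <- t.test(horse_a, horse_b, var.equal = TRUE)

# Two-tailed critical value at the 5% level
t_crit <- qt(0.975, df = n1 + n2 - 2)

print(c(t_manual = t_manual, t_crit = t_crit, p_value = res$p.value))
```

Note that t.test uses the Welch (unequal variance) version by default, so var.equal = TRUE is needed to match the hand calculation.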

ANOVA:- (ANALYSIS OF VARIANCE)


When we have only two samples we can use the t-test to compare the means of the samples, but repeated t-tests become unreliable when there are more than two samples, because the chance of a false positive grows with every extra comparison. If we only compare two means, then the t-test (independent samples, equal variances) will give the same result as the ANOVA. ANOVA is performed with the F-test.

Null hypothesis H0: There are no differences among the mean values of the groups being compared (i.e., the group means are all equal):
H0: µ1 = µ2 = µ3 = … = µk
Alternative hypothesis H1 (the conclusion if H0 is rejected): Not all group means are equal (i.e., at least one group mean is different from the rest).

ANOVA one-way classification:-

Step 1: Grand total of all observations
$T = \sum_{i} \sum_{j} X_{ij}$
Step 2: Correction factor
$cf = \frac{T^{2}}{N} = \frac{T^{2}}{rs}$, where N = rs is the total number of observations
Step 3: Total sum of squares
$TSS = S_T^{2} = \sum_{i} \sum_{j} X_{ij}^{2} - cf$
Step 4: Treatment sum of squares
$TrSS = S_{Tr}^{2} = \sum_{j} \frac{T_j^{2}}{n_j} - cf$, where $T_j$ is the total of the $n_j$ observations in treatment j
Step 5: Error sum of squares
$ESS = S_E^{2} = TSS - TrSS$

ANOVA table (k treatments, N observations in all):
Source of variation            d.f.     Sum of squares          Mean sum of squares      F-test
Treatment (between samples)    k − 1    S²Tr                    S²Tr / (k − 1)           Fcal = [S²Tr / (k − 1)] / [S²E / (N − k)]
Error                          N − k    S²E = TSS − TrSS        S²E / (N − k)
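
The five steps can be followed literally in R. The data below are hypothetical values made up for this sketch (three treatments with four observations each); they do not come from the notes. The hand-computed F is compared with oneway.test, which performs the classical one-way ANOVA F-test when var.equal = TRUE.

```r
# Hypothetical data: 3 treatments (A, B, C), 4 observations each
y <- c(20, 21, 23, 16,   18, 20, 17, 15,   25, 28, 22, 29)
g <- factor(rep(c("A", "B", "C"), each = 4))

N <- length(y)     # total number of observations
k <- nlevels(g)    # number of treatments

T_total <- sum(y)                  # Step 1: grand total
cf      <- T_total^2 / N           # Step 2: correction factor
TSS     <- sum(y^2) - cf           # Step 3: total sum of squares
Tj      <- tapply(y, g, sum)       # treatment totals
nj      <- tapply(y, g, length)    # observations per treatment
TrSS    <- sum(Tj^2 / nj) - cf     # Step 4: treatment sum of squares
ESS     <- TSS - TrSS              # Step 5: error sum of squares

# F = (treatment mean square) / (error mean square)
F_cal <- (TrSS / (k - 1)) / (ESS / (N - k))

# Built-in check: classical one-way ANOVA F-test
F_builtin <- unname(oneway.test(y ~ g, var.equal = TRUE)$statistic)

print(c(TSS = TSS, TrSS = TrSS, ESS = ESS, F_cal = F_cal, F_builtin = F_builtin))
```

The same table can be obtained with summary(aov(y ~ g)); the step-by-step version is shown here only to mirror the hand procedure.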
