unit-5

The document covers various probability distributions, including Binomial and Poisson distributions, along with their applications in R programming. It explains the functions available in R for generating these distributions and provides examples of calculating probabilities for specific scenarios. Additionally, it discusses the Normal distribution and its properties, including the transformation to standard scores.

Uploaded by

Dr.Kusuma Kumari B.M


STATISTICS WITH R PROGRAMMING jntuk396.blogspot.com Unit - V

UNIT-V: Probability Distributions, Normal Distribution, Binomial Distribution, Poisson Distributions, Other Distribution, Basic Statistics, Correlation and Covariance, T-Tests, ANOVA.

BINOMIAL DISTRIBUTION:- The binomial distribution is a discrete probability distribution. It describes the number of successes in n independent trials of an experiment. Each trial is assumed to have only two outcomes, either success or failure. If the probability of a successful trial is p, then the probability of having x successful outcomes in an experiment of n independent trials is as follows.

$$P(X = x) = \binom{n}{x}\, p^{x} (1-p)^{n-x}, \quad x = 0, 1, \dots, n$$

R has four built-in functions for the binomial distribution. They are described below.
• dbinom(x, size, prob) :- This function gives the probability mass P(X = x) at each point x.
• pbinom(q, size, prob) :- This function gives the cumulative probability P(X ≤ q). It is a single value for each q.
• qbinom(p, size, prob) :- This function takes a probability p and gives the smallest number x whose cumulative probability P(X ≤ x) is at least p.
• rbinom(n, size, prob) :- This function generates n random values from the binomial distribution with the given size and prob.
Following is the description of the parameters used:
• x and q are vectors of numbers (counts of successes).
• p is a vector of probabilities.
• n is number of observations.
• size is the number of trials.
• prob is the probability of success of each trial.
Examples:
• rbinom(n=1,size=10,prob=0.4) - It generates 1 random number from the binomial distribution: the number of successes in 10 independent trials with success probability 0.4.
• rbinom(n=5,size=10,prob=0.4) - It generates 5 random numbers from the binomial distribution, each the number of successes in 10 independent trials with success probability 0.4.
• rbinom(n=5,size=1,prob=0.4) - Setting size to 1 turns each number into a Bernoulli random variable, which can take only the value 1 (success) or 0 (failure).
• To visualize the binomial distribution we randomly generate 10,000 experiments, each with 10 trials and success probability 0.3.
library(ggplot2)  # provides ggplot(), aes() and geom_bar()
b <- data.frame(success=rbinom(n=10000,size=10,prob=0.3))
ggplot(b,aes(x=success))+geom_bar()
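
The relationship between the four binomial functions can be checked numerically. The following is a minimal sketch, not part of the original notes; the parameter values (10 trials, success probability 0.4) and the variable names are assumptions chosen only for illustration.

```r
# Illustrative parameters (assumed): 10 trials, success probability 0.4
size <- 10
prob <- 0.4

# dbinom: probability of exactly 3 successes, compared with the formula
p_exact   <- dbinom(3, size = size, prob = prob)
p_formula <- choose(size, 3) * prob^3 * (1 - prob)^(size - 3)

# pbinom: P(X <= 3) equals the sum of dbinom over 0, 1, 2, 3
p_cum <- pbinom(3, size = size, prob = prob)
p_sum <- sum(dbinom(0:3, size = size, prob = prob))

# qbinom: smallest x with P(X <= x) >= 0.5, i.e. the median
x_med <- qbinom(0.5, size = size, prob = prob)

# rbinom: five random draws; set.seed makes them reproducible
set.seed(1)
draws <- rbinom(5, size = size, prob = prob)

print(c(p_exact = p_exact, p_formula = p_formula, p_cum = p_cum, p_sum = p_sum))
print(x_med)
print(draws)
```

The d-function and the formula agree, and the p-function is just the running sum of the d-function, which is why pbinom can replace long sums of dbinom terms.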

Problem: Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?
Solution: This is a binomial experiment in which the number of trials is equal to 5, the number of successes is equal to 2, and the probability of success on a single trial is 1/6 or about 0.167.
Therefore, the binomial probability is:
b(2; 5, 0.167) = 5C2 * (0.167)^2 * (0.833)^3
b(2; 5, 0.167) = 0.161
R Code:
> dbinom(2, size=5, prob=0.167)
[1] 0.1612

1 U.Padma Jyothi, CSE Dept , VITB


Problem: In a restaurant seventy percent of people order Chinese food and thirty percent order Italian food. A group of three persons enters the restaurant. Find the probability that at least two of them order Italian food.
Solution:-
The probability of ordering Chinese food is 0.7 and the probability of ordering Italian food is 0.3. If at least two of them order Italian food, then either two or all three order Italian food.

Probability of exactly two ordering Italian food,

P(X=2) = 3C2 (0.3)^2 (0.7)^1
= 3×0.09×0.7
= 0.189
Probability of all three ordering Italian food,
P(X=3) = 3C3 (0.3)^3 (0.7)^0
= 1×0.027×1
= 0.027
Hence, the probability of at least two persons ordering Italian food is,
P(X ≥ 2) = P(X=2)+P(X=3) = 0.189+0.027=0.216
R code:-
> dbinom(2,size=3,prob=0.3)+
+ dbinom(3,size=3,prob=0.3)
[1] 0.216
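
The same "at least two" probability can also be read from the upper tail of the cumulative function. This is a small sketch added for illustration; the variable names are assumptions.

```r
# P(X >= 2) = P(X > 1); lower.tail = FALSE returns the upper tail P(X > q)
p_upper <- pbinom(1, size = 3, prob = 0.3, lower.tail = FALSE)

# The explicit sum used in the solution above
p_sum <- dbinom(2, size = 3, prob = 0.3) + dbinom(3, size = 3, prob = 0.3)

print(c(upper_tail = p_upper, explicit_sum = p_sum))
```

Using the upper tail avoids writing out every term, which matters when the number of trials is large.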

Cumulative Binomial Probability:- A cumulative binomial probability is the probability that the binomial random variable falls within a specified range (e.g., is greater than or equal to a stated lower limit and less than or equal to a stated upper limit).

Problem: What is the probability of obtaining 45 or fewer heads in 100 tosses of a coin?

Solution: To solve this problem, we compute 46 individual probabilities (for x = 0, 1, ..., 45), using the binomial formula. The sum of all these probabilities is the answer we seek.
Thus,
b(x ≤ 45; 100, 0.5) = b(x = 0; 100, 0.5) + b(x = 1; 100, 0.5) + . . . + b(x = 45; 100, 0.5)
= 0.184
R code:-
> pbinom(45,size=100,prob=0.5)
[1] 0.1841008
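
As a quick check, the 46-term sum described in the solution can be computed directly and compared with pbinom. This is an illustrative sketch; the variable names are assumptions.

```r
# Sum of the 46 individual binomial probabilities for x = 0, 1, ..., 45
p_sum <- sum(dbinom(0:45, size = 100, prob = 0.5))

# The cumulative function gives the same value in one call
p_cdf <- pbinom(45, size = 100, prob = 0.5)

print(c(sum_of_terms = p_sum, pbinom = p_cdf))
```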

Problem: Suppose there are twelve multiple choice questions in an English class quiz. Each question has five possible answers, and only one of them is correct. Find the probability of having four or fewer correct answers if a student attempts to answer every question at random.
Solution:
Since only one out of five possible answers is correct, the probability of answering a question correctly at random is 1/5=0.2.
• The probability of having exactly 4 correct answers by random attempts is found as follows.
> dbinom(4, size=12, prob=0.2)
[1] 0.1329
• To find the probability of having four or fewer correct answers by random attempts, we apply the function dbinom with x = 0,…,4.
> dbinom(0, size=12, prob=0.2) + dbinom(1, size=12, prob=0.2) +
+ dbinom(2, size=12, prob=0.2) + dbinom(3, size=12, prob=0.2) +
+ dbinom(4, size=12, prob=0.2)
[1] 0.9274
• Alternatively, we can use the cumulative probability function for the binomial distribution, pbinom.
> pbinom(4, size=12, prob=0.2)
[1] 0.92744
Answer:- The probability of four or fewer questions answered correctly at random in a twelve question multiple choice quiz is 92.7%.

Problem: Fit an appropriate binomial distribution to the following data and calculate the theoretical (expected) frequencies.
x: 0 1 2 3 4 5
f: 2 14 20 34 22 8
Solution:
Here n = 5 and N = ∑ fi = 100
Mean = ∑ xi fi / ∑ fi = 284/100 = 2.84
np = 2.84
p = 2.84/5 = 0.568
q = 1 - p = 0.432

p(r) = 5Cr (0.568)^r (0.432)^(5-r) , r = 0,1,2,3,4,5

The expected frequencies N * p(r) are calculated as follows
r    p(r)      N * p(r)                 Expected frequency
0    0.0150    100 * 0.0150 = 1.50      2
1    0.0989    100 * 0.0989 = 9.89      10
2    0.2601    100 * 0.2601 = 26.01     26
3    0.3420    100 * 0.3420 = 34.20     34
4    0.2248    100 * 0.2248 = 22.48     22
5    0.0591    100 * 0.0591 = 5.91      6
Total                                   100
R code:-
Note that the call below fits a negative binomial ("nbinom") to the frequency counts f themselves, so its estimates (size, mu) are not the binomial parameters n = 5, p = 0.568 derived above.
> library(fitdistrplus)  # provides fitdist()
> x <- 0:5
> f <- c(2,14,20,34,22,8)
> df <-data.frame(x,f)
> fitbin <- fitdist(df$f,"nbinom")
> summary(fitbin)
Fitting of the distribution ' nbinom ' by maximum
likelihood
Parameters :
estimate Std. Error
size 2.192416 1.441296
mu 16.664004 4.886713
Loglikelihood: -22.387 AIC: 48.774 BIC: 48.35752
Correlation matrix:
size mu
size 1.0000000000 0.0003165092
mu 0.0003165092 1.0000000000

> plot(fitbin)
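
The hand calculation of the binomial fit (estimate p from the mean, then multiply each probability by N) can be reproduced with dbinom. This is a minimal sketch under the method-of-moments approach used in the solution; the variable names are assumptions and it does not use fitdist.

```r
x <- 0:5                        # number of successes
f <- c(2, 14, 20, 34, 22, 8)    # observed frequencies
n <- 5                          # number of trials per experiment
N <- sum(f)                     # total number of experiments

# Method of moments: mean = n * p, so p = mean / n
p_hat <- sum(x * f) / (n * N)

# Binomial probabilities and expected frequencies N * p(r)
probs    <- dbinom(x, size = n, prob = p_hat)
expected <- N * probs

fit_table <- data.frame(x, observed = f,
                        p = round(probs, 4),
                        expected = round(expected, 2))
print(fit_table)
```

Comparing the observed and expected columns side by side shows how closely the fitted binomial follows the data.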

Poisson Distribution :- The Poisson distribution is the probability distribution of the number of independent event occurrences in an interval. If λ is the mean number of occurrences per interval, then the probability of having x occurrences within a given interval is:

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$$

Examples:
1. The number of defective electric bulbs manufactured by a reputed company.
2. The number of telephone calls per minute at a switch board
3. The number of cars passing a certain point in one minute.
4. The number of printing mistakes per page in a large text.

R has four built-in functions for the Poisson distribution. They are described below.
• dpois(x, lambda, log = FALSE) :- This function gives the probability mass P(X = x) at each point x.
• ppois(q, lambda, lower.tail = TRUE, log.p = FALSE) :- This function gives the cumulative probability P(X ≤ q); with lower.tail = FALSE it gives P(X > q).
• qpois(p, lambda, lower.tail = TRUE, log.p = FALSE) :- This function takes a probability p and gives the smallest number x whose cumulative probability P(X ≤ x) is at least p.
• rpois(n, lambda) :- This function generates n random values from the Poisson distribution with mean lambda.
Following is the description of the parameters used:
• x and q are vectors of numbers (counts of occurrences).
• p is a vector of probabilities.
• n is number of observations.
• lambda is the mean number of occurrences per interval.

Problem:- If there are twelve cars crossing a bridge per minute on average, find the probability of having seventeen or more cars crossing the bridge in a particular minute.
Solution:-
The probability of having sixteen or fewer cars crossing the bridge in a particular minute is given by the function ppois.
> ppois(16, lambda=12) # lower tail
[1] 0.89871
Hence the probability of having seventeen or more cars crossing the bridge in a minute is in the upper tail of the distribution.
> ppois(16, lambda=12, lower.tail=FALSE) # upper tail
[1] 0.10129
Answer:- If there are twelve cars crossing a bridge per minute on average, the probability of having seventeen or more cars crossing the bridge in a particular minute is 10.1%.

Problem:- The average number of homes sold by the Acme Realty company is 2 homes per day. What is the probability that exactly 3 homes will be sold tomorrow?
Solution: This is a Poisson experiment in which we know the following:
• μ = 2; since 2 homes are sold per day, on average.
• x = 3; since we want to find the likelihood that 3 homes will be sold tomorrow.
• e = 2.71828; since e is a constant equal to approximately 2.71828.
We plug these values into the Poisson formula as follows:
P(x; μ) = (e^-μ) (μ^x) / x!
P(3; 2) = (2.71828^-2) (2^3) / 3!
= (0.13534) (8) / 6
= 0.180
R Code:-
> dpois(3,lambda = 2)
[1] 0.180447

Cumulative Poisson Probability:- A cumulative Poisson probability is the probability that the Poisson random variable falls within a specified range, i.e. is greater than or equal to some lower limit and less than or equal to some upper limit.
Problem:- Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that tourists will see fewer than four lions on the next 1-day safari?
Solution: This is a Poisson experiment in which we know the following:
• μ = 5; since 5 lions are seen per safari, on average.
• x = 0, 1, 2, or 3; since we want to find the likelihood that tourists will see fewer than 4 lions; that is, we want the probability that they will see 0, 1, 2, or 3 lions.
• e = 2.71828; since e is a constant equal to approximately 2.71828.
To solve this problem, we need to find the probability that tourists will see 0, 1, 2, or 3 lions. Thus, we need to calculate the sum of four probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5). To compute this sum, we use the Poisson formula:
P(x ≤ 3; 5) = P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5)
P(x ≤ 3; 5) = [ (e^-5)(5^0) / 0! ] + [ (e^-5)(5^1) / 1! ] + [ (e^-5)(5^2) / 2! ] + [ (e^-5)(5^3) / 3! ]
P(x ≤ 3; 5) = [ (0.006738)(1) / 1 ] + [ (0.006738)(5) / 1 ] + [ (0.006738)(25) / 2 ] + [ (0.006738)(125) / 6 ]
P(x ≤ 3; 5) = [ 0.006738 ] + [ 0.03369 ] + [ 0.084224 ] + [ 0.140375 ]
P(x ≤ 3; 5) = 0.2650
Thus, the probability of seeing no more than 3 lions is 0.2650.
R Code:-
> ppois(3,lambda = 5)
[1] 0.2650259
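
The four-term sum in the solution can be reproduced term by term from the Poisson formula and compared with dpois and ppois. This is an illustrative sketch; the variable names are assumptions.

```r
lambda <- 5    # mean number of lions per safari
k <- 0:3       # "fewer than four" means 0, 1, 2 or 3

# Each term from the formula e^(-lambda) * lambda^k / k!
terms <- exp(-lambda) * lambda^k / factorial(k)
p_manual <- sum(terms)

# The same probability from the built-in functions
p_dpois <- sum(dpois(k, lambda = lambda))
p_cdf   <- ppois(3, lambda = lambda)

print(round(terms, 6))
print(c(manual = p_manual, dpois = p_dpois, ppois = p_cdf))
```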

Normal Distribution:- A continuous random variable X follows a normal distribution with mean μ and variance σ^2 if it has the probability density function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

on the domain $-\infty < x < \infty$.
Standard Normal Distribution
It is the distribution that occurs when a normal random variable has a mean of zero and a standard deviation of one.
The normal random variable of a standard normal distribution is called a standard score or a z score.
Every normal random variable X can be transformed into a z score via the following equation:
Z = (X - μ) / σ
where X is a normal random variable, μ is the mean, and σ is the standard deviation,
yielding the standard normal density

$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

Standard Normal Curve:- One way of figuring out how data are distributed is to plot them in a graph. If the data are normally distributed, the plot resembles a bell curve. A bell curve has a small percentage of the points in both tails and the larger percentage in the central part of the curve. The standard normal distribution has this bell shape, with the following properties:

• mean = median = mode
• symmetry about the center
• 50% of values less than the mean and 50% greater than the mean

R functions:
• dnorm(x, mean = 0, sd = 1, log = FALSE) :- This function gives the value of the probability density function at each point x.
• pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) :- This function gives the cumulative probability P(X ≤ q); with lower.tail = FALSE it gives P(X > q).
• qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) :- This function takes a probability p and gives the number x whose cumulative probability P(X ≤ x) equals p.
• rnorm(n, mean = 0, sd = 1) :- This function generates n random values from the normal distribution with the given mean and sd.

Procedure to find probability using positive Z-score table
Here Area(z) is the table value, i.e. the area between 0 and the (positive) z score.

Case 1: Area between 0 and any z score = Area(z)
Case 2: Area in any tail = 0.5 - Area(z)
Case 3: Area between two z-scores on the same side of the mean = |Area(z2) - Area(z1)|
Case 4: Area between two z-scores on opposite sides of the mean = Area(z1) + Area(z2)
Case 5: Area to the left of a positive z score = 0.5 + Area(z)
Case 6: Area to the right of a negative z score = 0.5 + Area(z)
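
The table rules can be checked against pnorm. In the sketch below, area() is a hypothetical helper (not a built-in R function) that mimics a positive z-score table by returning the area between 0 and |z|; the z values are assumptions chosen for illustration.

```r
# Hypothetical helper: area between 0 and |z|, as listed in a positive z table
area <- function(z) pnorm(abs(z)) - 0.5

z1 <- 1.25   # illustrative z scores (assumed values)
z2 <- 2.5

case1 <- area(z2)                    # between 0 and z2
case2 <- 0.5 - area(z2)              # tail beyond z2
case3 <- abs(area(z2) - area(z1))    # between z1 and z2, same side of the mean
case4 <- area(-z1) + area(z2)        # between -z1 and z2, opposite sides
case5 <- 0.5 + area(z2)              # left of the positive score z2
case6 <- 0.5 + area(-z1)             # right of the negative score -z1

print(round(c(case1, case2, case3, case4, case5, case6), 4))
```

Each case equals the corresponding difference of pnorm values, for example case 4 is pnorm(z2) - pnorm(-z1), so in R the table lookup is not needed.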


Problem:- X is a normally distributed variable with mean μ = 30 and standard deviation σ = 4. Find
a) P(x < 40)
b) P(x > 21)
c) P(30 < x < 35)
Solution:
a) For x = 40,
z = (x − µ) / σ
⇒ z = (40 - 30) / 4
= 2.5 (= z1 say)
Hence P(x < 40) = P(z < 2.5)
= 0.5 + A(z1) = 0.5 + 0.4938 = 0.9938
b) For x = 21,
z = (x − µ) / σ
⇒ z = (21 - 30) / 4
= -2.25 (= -z1 say)
Hence P(x > 21) = P(z > -2.25)
= 0.5 + A(z1) = 0.5 + 0.4878 = 0.9878
c) For x = 30,
z = (x − µ) / σ
⇒ z = (30 - 30) / 4 = 0 and
for x = 35,
z = (x − µ) / σ
⇒ z = (35 - 30) / 4
= 1.25
Hence P(30 < x < 35) = P(0 < z < 1.25)
= [area to the left of z = 1.25] - [area to the left of 0]
= 0.8944 - 0.5 = 0.3944
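
The three table lookups above can be reproduced with pnorm, which accepts the mean and standard deviation directly so no manual standardization is needed. This is an illustrative sketch; the variable names are assumptions.

```r
mu    <- 30
sigma <- 4

# a) P(x < 40): lower tail
p_a <- pnorm(40, mean = mu, sd = sigma)

# b) P(x > 21): upper tail
p_b <- pnorm(21, mean = mu, sd = sigma, lower.tail = FALSE)

# c) P(30 < x < 35): difference of two lower-tail probabilities
p_c <- pnorm(35, mean = mu, sd = sigma) - pnorm(30, mean = mu, sd = sigma)

print(round(c(a = p_a, b = p_b, c = p_c), 4))
```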

Problem:- The length of life of an instrument produced by a machine has a normal distribution with a mean of 12 months and standard deviation of 2 months. Find the probability that an instrument produced by this machine will last
a) less than 7 months.
b) between 7 and 12 months.
Solution:
a) P(x < 7)
For x = 7,
z = (x − µ) / σ
⇒ z = (7 - 12) / 2
= -2.5 (= -z1 say)
Hence P(x < 7) = P(z < -2.5)
= 0.5 - A(z1) = 0.5 - 0.4938 = 0.0062
b) P(7 < x < 12)
For x = 12,
z = (x − µ) / σ
⇒ z = (12 - 12) / 2
= 0
Hence P(7 < x < 12) = P(-2.5 < z < 0)
= 0.5 - 0.0062 = 0.4938

Problem:- The Tahoe Natural Coffee Shop morning customer load follows a normal distribution with mean 45 and standard deviation 8. Determine the probability that the number of customers tomorrow will be less than 42.

Solution:-
We first convert the raw score to a z-score. We have
z = (x − µ) / σ
⇒ z = (42 − 45) / 8 = −0.375
Next, we use the table to find the probability. The table gives 0.3520. (We have rounded the z-score to -0.38.)
We can conclude that
P(x < 42) = P(z < -0.38)
= 0.352
That is, there is about a 35% chance that there will be fewer than 42 customers tomorrow.

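Example: Draw a histogram of a sample and overlay the normal curve with the same mean and standard deviation.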
> x <- c(92,117,109,85,117,107,82,83,119,113,101,106,101,84,126,69,82,79,84,100,104,111,109,92,93,107,
81,118,81,133,111,82,120,103,115,89,74,110,83,110,96,102,108,110,140,106,111,98,98,99,74,101,107,104,
128,87,95,109,104,91,83,98,99,103,126,123,85,98,93,100)

> h<-hist(x,col = "blue")

> m <- mean(x)
> s <- sd(x)
> xf <- seq(min(x),max(x),length=70)
> dis <- dnorm(xf,m,s)
> # scale the density to counts: bin width times number of observations
> dis <- dis*diff(h$mids[1:2])*length(x)
> lines(xf,dis,col="red",lwd=3)

Problem:- Assume that the test scores of a college entrance exam fit a normal distribution. Furthermore, the mean test score is 72, and the standard deviation is 15.2. What is the percentage of students scoring 84 or more in the exam?
Solution:-
We apply the function pnorm of the normal distribution with mean 72 and standard deviation 15.2. Since we are looking for the percentage of students scoring 84 or more, we are interested in the upper tail of the normal distribution.
> pnorm(84, mean=72, sd=15.2, lower.tail=FALSE)
[1] 0.21492

Correlation:- A correlation is a relationship between two variables.

Typically, we take x to be the independent variable and y to be the dependent variable. Data is represented by a collection of ordered pairs (x,y).

The strength of the linear relationship is measured by the correlation coefficient r. This will always be a number between -1 and 1 (inclusive).

• If r is close to 1, we say that the variables are positively correlated. This means there is likely a strong
linear relationship between the two variables, with a positive slope.
•If r is close to -1, we say that the variables are negatively correlated. This means there is likely a strong
linear relationship between the two variables, with a negative slope.
•If r is close to 0, we say that the variables are not correlated. This means that there is likely no linear
relationship between the two variables, however, the variables may still be related in some other way.
To run a correlation test we type:
> cor.test(var1, var2, method = "method")
The default method is "pearson" so you may omit this if that is what you want. If you type "kendall" or
"spearman" then you will get the appropriate significance test.


Problem:- The local ice cream shop keeps track of how much ice cream they sell versus the temperature on that day. Here are their figures for the last 12 days:

Temperature (°C)   Ice cream sales
14.2               $215
16.4               $325
11.9               $185
15.2               $332
18.5               $406
22.1               $522
19.4               $412
25.1               $614
23.4               $544
18.1               $421
22.6               $445
17.2               $408

Solution:-

Formula for correlation coefficient:

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$$

R Code:-
> temp <- c(14.2,16.4,11.9,15.2,18.5,22.1,19.4,25.1,23.4,18.1,22.6,17.2)
> sales <- c(215,325,185,332,406,522,412,614,544,421,445,408)
> corr_coeff <- cor(temp,sales)
> corr_coeff
[1] 0.9575066
> cov(temp,sales)
[1] 484.0932
> # Scatter plot of sales against temperature
> plot(temp, sales, pch=16,col="red")
> # Adds a line of best fit to the scatter plot
> abline(lm(sales~temp),col="blue")
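
The correlation coefficient and covariance can also be computed from their definitions and compared with cor and cov, and cor.test adds a significance test. This is a minimal sketch on the ice cream data; the names dx, dy, r_manual, cov_manual and ct are assumptions.

```r
temp  <- c(14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2)
sales <- c(215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408)

# Deviations from the means
dx <- temp - mean(temp)
dy <- sales - mean(sales)

# Pearson correlation and sample covariance from their definitions
r_manual   <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
cov_manual <- sum(dx * dy) / (length(temp) - 1)

# Significance test for the correlation (Pearson is the default method)
ct <- cor.test(temp, sales)

print(c(r = r_manual, covariance = cov_manual))
print(ct$p.value)
```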


T-test for single mean:- One-sample t-test is used to compare the mean of a population to a specified theoretical mean (μ0).
Let X represent a sample of n values with sample mean x̄ and sample standard deviation s.
The comparison of the observed sample mean x̄ to the theoretical value μ0 is performed with the formula below:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

To evaluate whether the difference is statistically significant, you first have to read from the t-table the critical value of Student's t distribution corresponding to the significance level alpha of your choice (e.g. 5%). The degrees of freedom (df) used in this test are: df = n−1

Problem:- A professor wants to know if her introductory statistics class has a good grasp of basic math. Six students are chosen at random from the class and given a math proficiency test. The professor wants the class to be able to score above 70 on the test. The six students get scores of 62, 92, 75, 68, 83, and 95. Can the professor have 90 percent confidence that the mean score for the class on the test would be above 70?
Solution:-
• Null hypothesis H0: μ = 70
• Alternative hypothesis Ha: μ > 70
• Level of significance: α = 0.10 (90 percent confidence, one-tailed test)
First, compute the sample mean and standard deviation:
x̄ = (62 + 92 + 75 + 68 + 83 + 95) / 6 = 475 / 6 = 79.17
s = 13.17
• The test statistic is

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{79.17 - 70}{13.17 / \sqrt{6}} = \frac{9.17}{5.38} = 1.71 \quad \text{(calculated value of t)}$$

To test the hypothesis, the computed t-value of 1.71 is compared to the critical value in the t-table. For a one-tailed test with 5 df at α = 0.10 the critical value is 1.476. The calculated value of t is more than the table value of t, so the null hypothesis is rejected.
R code:-
The call below runs the two-sided test; for the one-sided alternative μ > 70 the p-value is half of the reported one, 0.1489/2 ≈ 0.074, which is below 0.10.
> x <- c(62,92,75,68,83,95)
> t.test(x,alternative="two.sided",mu=70)

One Sample t-test

data: x
t = 1.7053, df = 5, p-value = 0.1489
alternative hypothesis: true mean is not equal to 70
95 percent confidence interval:
65.34888 92.98446
sample estimates:
mean of x
79.16667
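
Since the professor's question is one-sided (is the mean above 70?), the test can also be run with a one-sided alternative. This is a sketch under the assumption that "90 percent confidence" corresponds to α = 0.10; the names res, t_crit and reject are illustrative.

```r
x <- c(62, 92, 75, 68, 83, 95)

# One-sided test of H0: mu = 70 against Ha: mu > 70 at the 90% level
res <- t.test(x, alternative = "greater", mu = 70, conf.level = 0.90)

# One-tailed critical value of Student's t with n - 1 degrees of freedom
t_crit <- qt(0.90, df = length(x) - 1)

# Reject H0 when the calculated t exceeds the critical value
reject <- unname(res$statistic) > t_crit

print(res$statistic)
print(res$p.value)
print(t_crit)
print(reject)
```

The decision from comparing t with the critical value always matches the decision from comparing the p-value with α.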

Problem:- A sample of 26 bulbs gives a mean life of 990 hours with an S.D. of 20 hours. The manufacturer claims that the mean life of bulbs is 1000 hours. Does the sample meet the standard?
Solution: Here n = 26,
Sample mean x̄ = 990 hours
S.D. s = 20 hours
Population mean µ = 1000 hours
df = n-1 = 26-1 = 25
• Null Hypothesis H0: The sample meets the standard, i.e. µ = 1000 hours
• Alternative Hypothesis HA: µ not equal to 1000
• Level of significance: α = 0.05
• The test statistic is

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{990 - 1000}{20 / \sqrt{26}} = -2.55$$

so |t| = 2.55 (calculated value of t).
The table value of t with 25 df for a two-tailed test at the 5% level is 2.060.
The calculated |t| is more than the table value of t, so the null hypothesis is rejected at the 5% level.
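
When only summary statistics are available, t.test cannot be used directly, but the statistic and critical value are easy to compute. This is a minimal sketch assuming the formula t = (x̄ − μ0)/(s/√n) and a two-sided test at the 5% level; the variable names are assumptions.

```r
n    <- 26      # sample size
xbar <- 990     # sample mean (hours)
s    <- 20      # sample standard deviation (hours)
mu0  <- 1000    # claimed population mean (hours)

# Test statistic from the summary values
t_stat <- (xbar - mu0) / (s / sqrt(n))
df <- n - 1

# Two-sided critical value and p-value from Student's t distribution
t_crit  <- qt(1 - 0.05 / 2, df = df)
p_value <- 2 * pt(-abs(t_stat), df = df)

print(c(t = t_stat, critical = t_crit, p = p_value))
```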

Paired comparisons (Paired t-test):- Sometimes data comes from non-independent samples. An example might be testing "before and after" of cosmetics or consumer products. We could use a single random sample and do "before and after" tests on each person. A hypothesis test based on these data would be called a paired comparisons test. Since the observations come in pairs, we can study the difference, d, between the samples. The difference between each pair of measurements is called di.

Test statistic:- With a sample of n pairs of measurements, forming a simple random sample from a normally distributed population, the mean of the differences, d̄, is tested using the following implementation of t.

$$t = \frac{\bar{d} - \mu}{S / \sqrt{n}}$$

Problem :- The blood pressure of 5 women before and after intake of a certain drug is given below. Test whether there is a significant change in blood pressure at the 1% level of significance.
Before 110 120 125 132 125
After 120 118 125 136 121

Solution: Let µ be the mean of the population of differences.

• Null Hypothesis H0: µ1 = µ2, i.e. no change in B.P.
• Alternative Hypothesis HA: µ1 ≠ µ2, i.e. a change in B.P.
• Level of significance: α = 0.01
• Computation: The differences di (before minus after) are -10, 2, 0, -4, 4

$$\bar{d} = \frac{-10 + 2 + 0 - 4 + 4}{5} = \frac{-8}{5} = -1.6$$

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (d_i - \bar{d})^2 = \frac{1}{4} \sum_{i=1}^{5} (d_i - \bar{d})^2$$

$$= \frac{1}{4}\left[(-10 + 1.6)^2 + (2 + 1.6)^2 + (0 + 1.6)^2 + (-4 + 1.6)^2 + (4 + 1.6)^2\right] = \frac{123.20}{4} = 30.8$$

$$S = \sqrt{30.8} = 5.55$$

• Test statistic: The test statistic is t, which is calculated as

$$t = \frac{\bar{d} - \mu}{S / \sqrt{n}} = \frac{-1.6 - 0}{5.55 / \sqrt{5}} = -0.645$$

Calculated |t| value is 0.645.
Tabulated t0.01 (two-tailed) with 5-1 = 4 degrees of freedom is 4.604.
Since calculated |t| < t0.01, we accept the null hypothesis and conclude that there is no significant change in blood pressure.
R code:-
> x <- c(110,120,125,132,125)
> y <- c(120,118,125,136,121)
> t.test(x,y,paired=TRUE)

Paired t-test

data: x and y
t = -0.64466, df = 4, p-value = 0.5543
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-8.490956 5.290956
sample estimates:
mean of the differences
-1.6
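
The paired t statistic reported by t.test can be rebuilt step by step from the differences, mirroring the hand calculation. This is an illustrative sketch; the variable names are assumptions, and the critical value shown is the two-sided one at the 1% level.

```r
before <- c(110, 120, 125, 132, 125)
after  <- c(120, 118, 125, 136, 121)

# Differences (before minus after), their mean and standard deviation
d     <- before - after
d_bar <- mean(d)
s_d   <- sd(d)          # uses divisor n - 1

# Paired t statistic under H0: mean difference = 0
t_stat <- d_bar / (s_d / sqrt(length(d)))

# Two-sided critical value at the 1% level with n - 1 degrees of freedom
t_crit <- qt(1 - 0.01 / 2, df = length(d) - 1)

print(d)
print(c(d_bar = d_bar, s_d = s_d, t = t_stat, critical = t_crit))
```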

T-test for difference of two population means :-

With a two-sample t-test, we compare the population means to each other and again look at the difference. We expect that x̄ − ȳ would be close to μ1 - μ2. The test statistic uses both sample means, sample standard deviations, and sample sizes.
A two-sample t-test follows these steps:
• Write the null and alternative hypotheses.
• State the level of significance and find the critical value. The critical value, from Student's t-distribution, has n1 + n2 - 2 degrees of freedom.
• Compute the test statistic.
• Compare the test statistic to the critical value and state a conclusion.

$$t = \frac{\bar{x} - \bar{y}}{S \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$$

where

$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} \quad \text{or} \quad S^2 = \frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2}$$

Here s1^2 and s2^2 are the sample variances computed with divisors n1 and n2, so that the two forms agree.

Problem:- Two horses A and B were tested according to the time (in seconds) to run a particular track with the following results.
Horse A 28 30 32 33 33 29 34
Horse B 29 30 30 24 27 29
Test whether the two horses have the same running capacity.

Solution:- Given n1 = 7 and n2 = 6

We first compute the sample means and the sums of squared deviations.
x̄ = mean of the first sample = (28 + 30 + 32 + 33 + 33 + 29 + 34) / 7 = 219 / 7 = 31.286
ȳ = mean of the second sample = (29 + 30 + 30 + 24 + 27 + 29) / 6 = 169 / 6 = 28.16

x     x − x̄     (x − x̄)^2    y     y − ȳ    (y − ȳ)^2
28    -3.286    10.8         29    0.84     0.7056
30    -1.286    1.6538       30    1.84     3.3856
32    0.714     0.51         30    1.84     3.3856
33    1.714     2.94         24    -4.16    17.3056
33    1.714     2.94         27    -1.16    1.3456
29    -2.286    5.226        29    0.84     0.7056
34    2.714     7.366
219             31.4358      169            26.8336

$$S^2 = \frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2} = \frac{31.4358 + 26.8336}{7 + 6 - 2} = \frac{58.2694}{11} = 5.30$$

Therefore S = √5.30 = 2.3

• Null Hypothesis H0: µ1 = µ2
• Alternative Hypothesis HA: µ1 ≠ µ2
• Level of significance: α = 0.05
• Computation:

$$t = \frac{\bar{x} - \bar{y}}{S \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{31.286 - 28.16}{(2.3) \sqrt{\dfrac{1}{7} + \dfrac{1}{6}}} = 2.443$$

Tabulated t0.05 (two-tailed) with 7+6-2 = 11 degrees of freedom at the 5% level of significance is 2.2.
Since calculated t > t0.05, we reject the null hypothesis and conclude that the two horses do not have the same running capacity.
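
The pooled two-sample test for the horse data can be run with t.test by setting var.equal = TRUE, which corresponds to the pooled S used in the hand calculation. This is an illustrative sketch; the variable names are assumptions, and R's unrounded result differs slightly from the hand value because the hand calculation rounds the means and S.

```r
horse_a <- c(28, 30, 32, 33, 33, 29, 34)
horse_b <- c(29, 30, 30, 24, 27, 29)

# Pooled (equal-variance) two-sample t-test, two-sided by default
res <- t.test(horse_a, horse_b, var.equal = TRUE)

# Two-sided critical value at the 5% level with n1 + n2 - 2 degrees of freedom
t_crit <- qt(1 - 0.05 / 2, df = length(horse_a) + length(horse_b) - 2)

print(res$statistic)
print(res$parameter)   # degrees of freedom
print(res$p.value)
print(t_crit)
```

Without var.equal = TRUE, t.test runs Welch's test, which does not pool the variances and uses a different number of degrees of freedom.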

ANOVA:- (ANALYSIS OF VARIANCE)

When we have only two samples we can use the t-test to compare the means of the samples, but it becomes unreliable with more than two samples, because repeated pairwise tests inflate the chance of a false positive. If we only compare two means, then the t-test (independent samples) gives the same result as the ANOVA. ANOVA is performed with the F-test.

Null hypothesis H0: There are no differences among the mean values of the groups being compared (i.e., the group means are all equal):
H0: µ1 = µ2 = µ3 = … = µk
Alternative hypothesis H1 (the conclusion if H0 is rejected):
Not all group means are equal (i.e., at least one group mean is different from the rest).

ANOVA one-way classification:-

Step 1: Grand total of all observations

$$T = \sum_i \sum_j X_{ij}$$

Step 2: Correction factor

$$cf = \frac{T^2}{N} = \frac{T^2}{rs}$$

Step 3: Total sum of squares

$$TSS = \sum_i \sum_j X_{ij}^2 - cf$$

Step 4: Treatment sum of squares

$$TrSS = \sum_j \frac{T_j^2}{n_j} - cf$$

where T_j is the total of the j-th treatment and n_j is the number of observations in it.
Step 5: Error sum of squares
ESS = TSS - TrSS

Source of variation           d.f.    Sum of squares      Mean square             F-test
Treatment (between samples)   k-1     TrSS                S²Tr = TrSS / (k-1)     Fcal = S²Tr / S²E
Error                         N-k     ESS = TSS - TrSS    S²E = ESS / (N-k)

Here k is the number of treatments and N is the total number of observations.
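
The five steps can be followed in R on a small data set and compared with the built-in ANOVA table. The data below are hypothetical (three treatments with four observations each, invented purely for illustration), and the variable names are assumptions.

```r
# Hypothetical data: three treatments, four observations each
scores <- c(12, 15, 14, 11,   18, 20, 17, 19,   10, 9, 12, 11)
group  <- factor(rep(c("A", "B", "C"), each = 4))

N <- length(scores)     # total number of observations
k <- nlevels(group)     # number of treatments

# Step 1: grand total, Step 2: correction factor
T_total <- sum(scores)
cf <- T_total^2 / N

# Step 3: total sum of squares
TSS <- sum(scores^2) - cf

# Step 4: treatment sum of squares from the treatment totals
Tj <- tapply(scores, group, sum)
nj <- tapply(scores, group, length)
TrSS <- sum(Tj^2 / nj) - cf

# Step 5: error sum of squares, then mean squares and F
ESS <- TSS - TrSS
F_cal <- (TrSS / (k - 1)) / (ESS / (N - k))

# Built-in ANOVA table for comparison
fit <- anova(lm(scores ~ group))
print(c(TSS = TSS, TrSS = TrSS, ESS = ESS, F = F_cal))
print(fit)
```

The manual F value matches the "F value" column of the built-in table, and the table also reports the p-value used to decide whether H0 is rejected.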

23 pages
Probability Distributions
No ratings yet
Probability Distributions
59 pages
Engineering Geology: Hyuck Jin Park, Jung Hyun Lee, Ik Woo
No ratings yet
Engineering Geology: Hyuck Jin Park, Jung Hyun Lee, Ik Woo
15 pages
Elementary Statistics 9th Edition Weiss Test Bank download
100% (3)
Elementary Statistics 9th Edition Weiss Test Bank download
46 pages
Gogte Institute of Technology: Karnatak Law Society'S
No ratings yet
Gogte Institute of Technology: Karnatak Law Society'S
33 pages
Apress Bayesian Optimization Theory and Practice Using Python 1484290623
No ratings yet
Apress Bayesian Optimization Theory and Practice Using Python 1484290623
243 pages
Introduction To Statistics
No ratings yet
Introduction To Statistics
53 pages
Intro To Statiscs Syllabus - Outlier
No ratings yet
Intro To Statiscs Syllabus - Outlier
2 pages
Fundamentals of Model Calibration: Theory & Practice
No ratings yet
Fundamentals of Model Calibration: Theory & Practice
31 pages
Cot 4-Nhize
No ratings yet
Cot 4-Nhize
5 pages
Eliciting Multivariate Probability Distributions
No ratings yet
Eliciting Multivariate Probability Distributions
23 pages
Phyto 1999 89 11 1088
No ratings yet
Phyto 1999 89 11 1088
16 pages
Ateneo de Davao University: Welcome To Statistics and Probability Subject!
No ratings yet
Ateneo de Davao University: Welcome To Statistics and Probability Subject!
13 pages
CBCS Syllabus
No ratings yet
CBCS Syllabus
34 pages
Module 5 - Stat. - Prob.
No ratings yet
Module 5 - Stat. - Prob.
4 pages
Download Full Navidi, W: ISE Elementary Statistics William Navidi PDF All Chapters
100% (2)
Download Full Navidi, W: ISE Elementary Statistics William Navidi PDF All Chapters
40 pages
STPM Mathematics T Syllabus 2010
No ratings yet
STPM Mathematics T Syllabus 2010
2 pages
Thermodynamics, Statmech PDF
No ratings yet
Thermodynamics, Statmech PDF
201 pages
Engineering Probability and Statistics
No ratings yet
Engineering Probability and Statistics
10 pages
Detection and Estimation of Signals in Noise
No ratings yet
Detection and Estimation of Signals in Noise
309 pages

unit-5

Uploaded by

unit-5

Uploaded by

1 U.Padma Jyothi, CSE Dept , VITB

Probability for two ordering Italian food,

Problem: What is the probability of obtaining 45 or fewer heads in 100 tosses of a coin?
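A minimal sketch of how this problem could be answered in R with `pbinom`, assuming a fair coin (prob = 0.5), which the problem statement does not say explicitly. The variable names are illustrative only.

```r
# P(X <= 45) for X ~ Binomial(size = 100, prob = 0.5)
# Assumption: the coin is fair, so prob = 0.5
p_le_45 <- pbinom(45, size = 100, prob = 0.5)

# Cross-check: the same probability as a sum of the point
# probabilities P(X = 0), P(X = 1), ..., P(X = 45)
p_sum <- sum(dbinom(0:45, size = 100, prob = 0.5))

print(p_le_45)                    # roughly 0.18
print(all.equal(p_le_45, p_sum))  # TRUE
```

The cumulative function saves adding up 46 separate `dbinom` values by hand.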

> pbinom(4, size=12, prob=0.2)
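The call above returns the lower-tail probability P(X <= 4) for 12 trials with success probability 0.2. As a hedged illustration (the variable names are not from the notes), the complementary upper tail P(X > 4) can be obtained either by subtraction or with the `lower.tail` argument:

```r
# Lower tail: P(X <= 4) for X ~ Binomial(size = 12, prob = 0.2)
p_lower <- pbinom(4, size = 12, prob = 0.2)

# Upper tail: P(X > 4), computed in two equivalent ways
p_upper_a <- 1 - p_lower
p_upper_b <- pbinom(4, size = 12, prob = 0.2, lower.tail = FALSE)

print(p_lower)
print(p_upper_a)
print(all.equal(p_upper_a, p_upper_b))  # TRUE
```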

$p(r) = \binom{5}{r}\,(0.568)^{r}\,(0.432)^{5-r}, \quad r = 0, 1, 2, 3, 4, 5$
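The fitted distribution above (n = 5, p = 0.568, q = 0.432) can be tabulated in R. The sketch below evaluates the formula directly with `choose` and compares it with `dbinom`; the variable names are illustrative.

```r
n <- 5       # number of trials, from the formula above
p <- 0.568   # probability of success, from the formula above
r <- 0:n     # possible numbers of successes

# Direct evaluation of nCr * p^r * q^(n - r)
by_formula <- choose(n, r) * p^r * (1 - p)^(n - r)

# The same probabilities from the built-in density function
by_dbinom <- dbinom(r, size = n, prob = p)

print(data.frame(r = r, probability = round(by_dbinom, 4)))
print(all.equal(by_formula, by_dbinom))  # TRUE
print(sum(by_dbinom))                    # the probabilities sum to 1
```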

Poisson Distribution :- The Poisson distribution is the probability distribution of independent

$P(x; \mu) = \dfrac{e^{-\mu}\,\mu^{x}}{x!}$
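As a quick check of the Poisson formula, the sketch below evaluates it term by term and compares the result with R's `dpois`. The mean `mu = 3` is a hypothetical value chosen only for illustration.

```r
mu <- 3     # hypothetical mean number of occurrences (not from the notes)
x <- 0:10   # counts at which to evaluate the probability

# Direct evaluation of e^(-mu) * mu^x / x!
by_formula <- exp(-mu) * mu^x / factorial(x)

# The same probabilities from the built-in density function
by_dpois <- dpois(x, lambda = mu)

print(round(by_dpois, 4))
print(all.equal(by_formula, by_dpois))  # TRUE

# Cumulative probability P(X <= 2)
print(ppois(2, lambda = mu))
```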

Normal Distribution:- A continuous random variable X follows a normal distribution

 mean = median = mode

Procedure to find probability using positive Z-score table

Case 2: Area in any tail = 0.5 - Area(z)

Case 3: Area between two z-values on the same side of the mean = |Area(z2) - Area(z1)|

Case 4: Area between two z-values on opposite sides of the mean = Area(z1) + Area(z2)

Case 5: Area to the left of a positive z-value = 0.5 + Area(z)

Case 6: Area to the right of a negative z-value = 0.5 + Area(z)
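The table-lookup cases can be cross-checked in R with `pnorm`. The sketch below assumes that Area(z) means the area between 0 and |z| under the standard normal curve, as in a positive Z-score table; that reading, the helper `area`, and the z-values used are assumptions made for illustration.

```r
# Assumed table value: area between 0 and |z| under the standard normal curve
area <- function(z) pnorm(abs(z)) - 0.5

# Area in a tail beyond z = 1.5
tail_table <- 0.5 - area(1.5)
tail_direct <- pnorm(1.5, lower.tail = FALSE)

# Area between two z-values on the same side of the mean (1 and 2)
same_table <- abs(area(2) - area(1))
same_direct <- pnorm(2) - pnorm(1)

# Area between two z-values on opposite sides of the mean (-1 and 2)
opp_table <- area(-1) + area(2)
opp_direct <- pnorm(2) - pnorm(-1)

# Area to the left of a positive z-value (1.5)
left_table <- 0.5 + area(1.5)
left_direct <- pnorm(1.5)

# Area to the right of a negative z-value (-1.5)
right_table <- 0.5 + area(-1.5)
right_direct <- pnorm(-1.5, lower.tail = FALSE)

print(c(tail_table, same_table, opp_table, left_table, right_table))
```

Each table-based value agrees with the direct `pnorm` computation, so the table is not needed when R is available.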

> h<-hist(x,col = "blue")

Correlation:- A correlation is a relationship between two variables.

This will always be a number between -1 and 1 (inclusive).

Formula for correlation coefficient:
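As an illustration, the sketch below computes the correlation coefficient by hand and compares it with R's `cor`. It assumes the Pearson product-moment formula, r = (n Σxy - Σx Σy) / sqrt((n Σx² - (Σx)²)(n Σy² - (Σy)²)); the data vectors are hypothetical.

```r
# Hypothetical paired observations (not from the notes)
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 7, 9, 12)
n <- length(x)

# Pearson correlation coefficient from the computational formula
numerator <- n * sum(x * y) - sum(x) * sum(y)
denominator <- sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))
r_manual <- numerator / denominator

# The same quantity from the built-in function (Pearson is the default method)
r_builtin <- cor(x, y)

print(r_manual)
print(all.equal(r_manual, r_builtin))  # TRUE

# Covariance of the same data, for comparison
print(cov(x, y))
```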

One Sample t-test
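A minimal sketch of a one-sample t-test in R. The sample values and the hypothesised mean `mu0` are hypothetical; the statistic is computed by hand as t = (mean - mu0) / (s / sqrt(n)) and then compared with `t.test`.

```r
# Hypothetical sample and hypothesised population mean (not from the notes)
x <- c(12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9)
mu0 <- 12
n <- length(x)

# Test statistic by hand: (sample mean - mu0) / (s / sqrt(n))
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# The same test with the built-in function (two-sided by default)
result <- t.test(x, mu = mu0)

print(t_manual)
print(unname(result$statistic))  # t statistic
print(unname(result$parameter))  # degrees of freedom, n - 1
print(result$p.value)
```

The null hypothesis is rejected at a chosen significance level when the reported p-value falls below that level.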

Solution: Let µ be the mean of population of differences.

T-test for difference of two population means :-

Solution:- Given n1=7 and n2 = 6
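A hedged sketch of how a two-sample comparison with n1 = 7 and n2 = 6 could be run in R. The data values below are hypothetical, and the equal-variance (pooled) form of the test is an assumption, chosen because it is the usual textbook version for small samples.

```r
# Hypothetical samples with the sizes given above (values are illustrative)
x1 <- c(28, 30, 32, 33, 33, 29, 34)  # n1 = 7
x2 <- c(29, 30, 30, 24, 27, 29)      # n2 = 6
n1 <- length(x1)
n2 <- length(x2)

# Pooled variance and t statistic by hand (assumes equal population variances)
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
t_manual <- (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# The same test with the built-in function; var.equal = TRUE requests pooling
result <- t.test(x1, x2, var.equal = TRUE)

print(t_manual)
print(unname(result$statistic))  # t statistic
print(unname(result$parameter))  # degrees of freedom, n1 + n2 - 2
print(result$p.value)
```

Without `var.equal = TRUE`, `t.test` applies the Welch correction, which gives different degrees of freedom.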

 Null Hypothesis H0: µ1= µ2

ANOVA:- (ANALYSIS OF VARIANCE)

ANOVA one-way classification:-
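A minimal sketch of a one-way ANOVA in R using `aov`. The three groups and their values are hypothetical; the F ratio is also computed by hand from the between-group and within-group sums of squares as a check.

```r
# Hypothetical observations for three groups (not from the notes)
scores <- data.frame(
  value = c(20, 21, 23, 16, 18, 19, 17, 14, 15, 22, 24, 25),
  group = factor(rep(c("A", "B", "C", "D"), each = 3))
)

# Fit the one-way model and print the ANOVA table
fit <- aov(value ~ group, data = scores)
print(summary(fit))
f_aov <- summary(fit)[[1]][["F value"]][1]

# F ratio by hand
k <- nlevels(scores$group)              # number of groups
N <- nrow(scores)                       # total number of observations
grand_mean <- mean(scores$value)
group_mean <- ave(scores$value, scores$group)  # each row's own group mean

ss_between <- sum((group_mean - grand_mean)^2)
ss_within <- sum((scores$value - group_mean)^2)
f_manual <- (ss_between / (k - 1)) / (ss_within / (N - k))

print(f_manual)
print(all.equal(f_manual, f_aov))  # TRUE
```

A large F ratio relative to the F distribution with (k - 1, N - k) degrees of freedom indicates that at least one group mean differs from the others.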
