Stat 235 Lab Assignment 2

STAT 235

Uploaded by

cornelfavour82
Copyright © All Rights Reserved

1.

Use either the built-in Excel functions or the instructional templates to help
answer the questions. Suppose that the number of new subscribers in a
single day follows a Poisson distribution with parameter λ = 4.75. This
implies that the expected number of new subscribers in a single day is
around 5. Answer the following questions. In parts (c) to (e), also give the
distribution(s) used and its parameters.

a.) What is the probability that there are no new subscribers in a randomly
selected day? In other words, what is the proportion of days with no
new subscribers?

P(X = 0) = λ^0 × e^(−λ)/0! = e^(−4.75) = 0.008652

b.) Suppose the application’s marketing is investigated if there are one or
less new subscribers in a day. Use the template to determine the
proportion of days that the application’s marketing is investigated.

P(X<=1) = P(X=0) + P(X=1)

P(X=0) = 0.008652
P(X=1) = (4.75^1 × e^(−4.75))/1! = 0.0411
Thus, P(X<=1) = 0.008652 + 0.0411 = 0.04975
In Excel: =POISSON.DIST(1,4.75,TRUE)
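As a cross-check on parts (a) and (b), here is a minimal Python sketch (not part of the Excel template) that evaluates the Poisson probability mass function directly. The helper names `poisson_pmf` and `poisson_cdf` are illustrative, and λ = 4.75 is taken from the question.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k), summing the pmf from 0 to k (the cumulative Poisson probability)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam = 4.75
p0 = poisson_pmf(0, lam)      # part (a): proportion of days with no new subscribers
p_le1 = poisson_cdf(1, lam)   # part (b): proportion of days that are investigated
print(f"P(X=0)  = {p0:.6f}")
print(f"P(X<=1) = {p_le1:.5f}")
```

Running it should print 0.008652 and 0.04975, matching the hand calculations; `poisson_cdf(1, lam)` plays the same role as Excel's `POISSON.DIST(1,4.75,TRUE)`.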

c.) Refer to part (b). What is the probability that the application’s marketing
is not investigated for a randomly selected period of three consecutive
weeks? (Assume the first day of the entire period is selected randomly
and each day is independent of the other days.)
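P(not investigated on a given day) = 1 – P(X<=1)
= 1 – 0.04975
= 0.95025
P(not investigated in 21 days) = 0.95025^(21) (three weeks = 21 days)
= 0.3424
Distributions and parameters: the daily number of new subscribers is Poisson with λ = 4.75, and the number of investigated days in the 21-day period is Binomial with n = 21 and p = 0.04975; the answer is the probability of zero investigated days.
In Excel: =(1-POISSON.DIST(1,4.75,TRUE))^21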

d.) Refer to part (b). What is the probability that the application’s marketing
is investigated for more than two days in a randomly selected period of
three consecutive weeks? (Again, assume the first day of the entire
period is selected randomly and independence between days.)

Let Y be the number of investigated days in the 21-day period, so Y ~ Binomial(n = 21, p = 0.04975).

P(Y>2) = 1 – P(Y<=2)
So, P(Y<=2) = P(Y = 0) + P(Y = 1) + P(Y = 2)

P(Y = 0) = C(21, 0) × 0.04975^0 × (1 − 0.04975)^21 = 0.3424486
P(Y = 1) = C(21, 1) × 0.04975^1 × (1 − 0.04975)^20 = 0.3765
P(Y = 2) = C(21, 2) × 0.04975^2 × (1 − 0.04975)^19 = 0.197117

Thus, P(Y<=2) = 0.9161

P(Y > 2) = 1 − 0.9161 = 0.0839
In Excel: =1-BINOM.DIST(2,21,0.04975,TRUE)
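The binomial calculations in parts (c) and (d) can be checked with a short Python sketch. It assumes the rounded value p = 0.04975 from part (b) and uses `math.comb` for the binomial coefficient; the function name `binom_pmf` is illustrative. Part (c) is the k = 0 term.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 21, 0.04975                 # 21 days; p = P(X <= 1) from part (b)
p_none = binom_pmf(0, n, p)        # part (c): no investigated day in 21 days
p_le2 = sum(binom_pmf(k, n, p) for k in range(3))
p_gt2 = 1 - p_le2                  # part (d): more than two investigated days

for k in range(3):
    print(f"P(Y={k}) = {binom_pmf(k, n, p):.7f}")
print(f"P(Y>2) = {p_gt2:.4f}")
```

With these inputs it gives P(Y = 0) ≈ 0.3424, P(Y = 1) ≈ 0.3765, P(Y = 2) ≈ 0.1971 and P(Y > 2) ≈ 0.0839. Note that the P(Y = 1) term raises (1 − p) to the power n − 1 = 20, not 21.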

e.) Refer to part (b). What are the expected value and standard deviation
for the number of days until the application’s marketing is investigated?
(Again, assume independence.)
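Let X be the number of days up to and including the first day on which the marketing is investigated. X follows a geometric distribution with p = 0.04975.
E(X) = 1/p = 1/0.04975 = 20.1005
SD(X) = √(1 − p)/p = √0.95025/0.04975 = 19.5941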
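A sketch of the geometric mean and standard deviation in Python, assuming (as above) that X counts the days up to and including the first investigated day, so X takes the values 1, 2, 3 and so on. The function name is illustrative.

```python
import math

def geometric_mean_sd(p: float) -> tuple[float, float]:
    """Mean 1/p and SD sqrt(1 - p)/p of the number of trials up to and including the first success."""
    return 1 / p, math.sqrt(1 - p) / p

mean, sd = geometric_mean_sd(0.04975)   # p = P(X <= 1) from part (b)
print(f"E(X) = {mean:.4f}, SD(X) = {sd:.4f}")
```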

2.) In this question, you can use either the built-in Excel functions or the
instructional templates to help answer the questions. In parts (a) and (b), also give
the distribution(s) used and its parameters.
a.) Suppose again that days are independent but now the number of new
subscribers in a single day follows a Poisson distribution with parameter λ = 5.
What is the probability that the twentieth consecutive day is the first one with no
new subscribers?

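P(X = 0) = e^(−5) = 0.0067
P(X > 0) = 1 – P(X = 0) = 0.9933
P(first day with no new subscribers is the 20th day) = P(X > 0)^19 × P(X = 0)
= 0.9933^19 × 0.0067
= 0.0059
Distributions and parameters: the daily count X is Poisson with λ = 5, and the number of days until the first day with no new subscribers is geometric with p = P(X = 0) = 0.0067.
In Excel: =(1-POISSON.DIST(0,5,FALSE))^19*POISSON.DIST(0,5,FALSE)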
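The same geometric probability, P(T = k) = (1 − p)^(k − 1) × p, can be sketched in Python. Here T is the (illustrative) name for the day of the first day with no new subscribers, and p = e^(−5) is computed rather than rounded.

```python
import math

def first_success_on_day(k: int, p: float) -> float:
    """P(T = k) = (1 - p)^(k - 1) * p: the first 'success' occurs on trial k."""
    return (1 - p) ** (k - 1) * p

p = math.exp(-5)   # P(X = 0) when X ~ Poisson(5): a day with no new subscribers
print(f"p = {p:.4f}, P(T = 20) = {first_success_on_day(20, p):.4f}")
```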

b.) If the application company has already identified five consecutive days with at
least one new subscriber in each day, what is the probability that the twenty-fifth
consecutive day is the first one with no new subscriber? Compare your results
with probability obtained in part (a). Give a justification for your results.
• The condition that there are already 5 consecutive days with at least one
subscriber tells us nothing about the behavior of the remaining days (since
the days are independent). So, after these 5 days, we are in the same
situation as in part (a).
• Essentially, we are now looking for the probability that the next 19
consecutive days (days 6 through 24) each have at least one new subscriber,
and the twenty-fifth day is the first day with no new subscribers.
So, P(25th is first no-subscriber day | first 5 days have subscribers) = 0.9933^19 × 0.0067
= 0.0059
The result in part (b) is the same as the result in part (a) because the
events on different days are independent of each other: the fact that the
first five days had subscribers does not affect the probabilities for the
following days. This is the memoryless property of the geometric
distribution. Therefore, the conditional probability that the 25th day is the
first day with no new subscribers equals the probability in part (a) that the
20th day is the first such day.
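The memoryless argument can also be checked numerically. This illustrative sketch (not part of the assignment) computes P(T = 25 | T > 5) = P(T = 25)/P(T > 5) for the geometric waiting time T and compares it with P(T = 20) from part (a).

```python
import math

p = math.exp(-5)            # P(no new subscribers on a day), X ~ Poisson(5)
q = 1 - p                   # P(at least one new subscriber on a day)

p_t20 = q ** 19 * p         # P(T = 20), part (a)
p_t25 = q ** 24 * p         # P(T = 25), unconditional
p_t_gt5 = q ** 5            # P(T > 5): the first five days all have subscribers
p_t25_given = p_t25 / p_t_gt5   # P(T = 25 | T > 5), part (b)

print(f"P(T=20)        = {p_t20:.6f}")
print(f"P(T=25 | T>5)  = {p_t25_given:.6f}")
```

Both lines print the same value, because dividing by q^5 cancels five of the q factors and leaves q^19 × p.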
3.) Open the worksheet Simulation. The worksheet allows you to simulate the
outcomes of new subscribers for 60 random samples of days. Use the Random
Number Generation feature (Poisson, seed 30) to generate 60 samples of size n =
20 from the Poisson distribution with λ = 4.75. This corresponds to selecting 60
samples, each consisting of 20 days. The data will be entered in the form of 60
columns, each consisting of 20 rows into the range B10:BI29. In other words, the
range contains the outcome of subscribers for 1200 days.
Note: The worksheet Simulation is not protected. The students should be careful
not to remove the formulas entered in the rows AVERAGE, COUNT, and the
summary statistics for AVERAGE.
Once the data are entered, the values of the variables AVERAGE and COUNT are
automatically displayed in rows 61 and 63, respectively. They are further
explained below, as needed.
a.) Use the COUNTIF function to determine the number and proportion of days
with no new subscribers among the 1200 days. Compare the value with the
probability obtained in Question 1, part (a). Should the values be identical?
Explain briefly. The COUNTIF function was discussed in the Lab 2
Instructions.
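To count how many days have zero new subscribers across all 1200 days, use
=COUNTIF(B10:BI29, 0). Dividing the count by 1200, =COUNTIF(B10:BI29, 0)/1200,
gives the proportion of days with no new subscribers: 0.01.
In 1(a), P(X = 0) = e^(−4.75) = 0.008652. The two values are close but not
identical, and they should not be expected to be identical: the simulated
proportion is computed from one random set of 1200 days and changes from
simulation to simulation, whereas 0.008652 is the exact probability.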
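One way to judge how close "close" is (an addition, not something the lab asks for) is to compare the difference with the standard error of a proportion, sqrt(p(1 − p)/N). This sketch assumes the reported 0.01 is the simulated proportion over N = 1200 days; because 0.01 is rounded, the resulting z value is only approximate.

```python
import math

def proportion_se(p: float, n: int) -> float:
    """Standard error of a sample proportion based on n independent days."""
    return math.sqrt(p * (1 - p) / n)

p0 = math.exp(-4.75)     # exact P(X = 0) from Question 1(a)
simulated = 0.01         # simulated proportion of zero-subscriber days (rounded)
se = proportion_se(p0, 1200)
z = (simulated - p0) / se
print(f"P(X=0) = {p0:.6f}, SE = {se:.5f}, z = {z:.2f}")
```

The difference is about half a standard error, which is well within what sampling variability alone would produce.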

b.) The variable COUNT in row 63 counts the number of days with one or less
new subscribers in each sample. Use the values to determine the number
and proportion of samples of 20 days all containing two or more new
subscribers. Using the probability from Question 1, part (b), calculate an
appropriate probability and compare to the simulated proportion. Should
the values be identical?
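The samples in which every day had two or more new subscribers are those with
COUNT = 0, so their proportion is =COUNTIF(B63:BI63, 0)/60 = 0.4.

The appropriate probability to compare with is not P(X<=1) = 0.04975 from 1(b),
which refers to a single day, but the probability that all 20 independent days
have two or more new subscribers: (1 − 0.04975)^20 = 0.3604. The simulated
proportion 0.4 is close to 0.3604 but not identical, because it is based on
only 60 random samples and is subject to sampling variability.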
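A minimal sketch of this comparison, assuming independent days and the rounded single-day probability p = 0.04975 from 1(b). It also computes the standard error of a proportion over 60 samples, which is an extra yardstick not required by the lab.

```python
import math

p_day = 0.04975                  # P(X <= 1) from Question 1(b)
n_days, n_samples = 20, 60

p_all = (1 - p_day) ** n_days    # P(no day in the sample has one or fewer subscribers)
se = math.sqrt(p_all * (1 - p_all) / n_samples)   # SE of a proportion over 60 samples
simulated = 0.4                  # simulated proportion reported above

print(f"theory = {p_all:.4f}, SE = {se:.4f}, difference = {simulated - p_all:.4f}")
```

The difference (about 0.04) is smaller than one standard error (about 0.06), consistent with the gap being due to sampling variability.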
c.) The variable AVERAGE in row 61 shows the average number of new
subscribers for each sample. Obtain and print a histogram of the average
number of new subscribers using the following bins: 3.75, 4.00, ..., 6.25. The
format of your histogram should be the same as the format from previous
labs and the Lab 1 Instructions. Describe the shape (modality, skewness,
outliers) of the histogram.
The histogram is unimodal and right-skewed, with no outliers.
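The bins 3.75, 4.00, ..., 6.25 act as upper edges. The sketch below mimics our reading of how Excel's Histogram tool assigns values: each bin holds values greater than the previous edge and less than or equal to its own edge, and anything above the last edge goes into "More". The function name and the sample means in the example call are made up for illustration.

```python
def excel_style_bins(values, upper_edges):
    """Count values per bin: (previous edge, edge]; the first bin takes everything <= its edge,
    and values above the last edge are counted in a final 'More' bin."""
    counts = [0] * (len(upper_edges) + 1)
    for v in values:
        for i, edge in enumerate(upper_edges):
            if v <= edge:
                counts[i] += 1
                break
        else:
            counts[-1] += 1   # the 'More' bin
    return counts

edges = [3.75 + 0.25 * i for i in range(11)]   # 3.75, 4.00, ..., 6.25

# Hypothetical sample means, only to show the call:
print(excel_style_bins([4.1, 4.6, 4.9, 5.3], edges))
```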

d.) The worksheet Simulation also displays Summary Statistics for the variable
AVERAGE. Use the feature to obtain the mean and standard deviation of the
sample means for the 60 samples. Compare them with the values predicted
by theory. Should the values be identical? Explain briefly.

In theory, the mean of the sample means is λ = 4.75 and their standard deviation is
sqrt(λ)/sqrt(n) = sqrt(4.75)/sqrt(20) = 0.4873. The simulated mean differs from the
theoretical mean by 1.55%, while the simulated standard deviation differs by %
(both are less than 8.61%), so there is little difference in the means and a slight
difference in the standard deviations. The values are not identical, and should not
be expected to be, because the simulated mean and standard deviation are computed
from only 60 random samples and are subject to sampling variability.
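The theoretical standard deviation of the sample mean follows from Var(X̄) = λ/n for n independent Poisson(λ) counts. A small sketch (the function name is illustrative):

```python
import math

def sd_of_sample_mean(lam: float, n: int) -> float:
    """SD of the mean of n independent Poisson(lam) counts: sqrt(lam) / sqrt(n)."""
    return math.sqrt(lam) / math.sqrt(n)

lam = 4.75
print(f"n = 20: mean = {lam}, SD = {sd_of_sample_mean(lam, 20):.4f}")
```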
4.) Now we repeat Question 3 with n = 50. First, clear the range B10:BI29 in the
worksheet Simulation. Use the Random Number Generation feature (Poisson,
seed 30) to generate 60 samples of size n = 50 from the Poisson distribution with
the parameter λ = 4.75. The data will be entered into the range B10:BI59 in the
form of 60 columns, each consisting of 50 rows. Use five decimal places for this
question.
Note: The worksheet Simulation is not protected. The students should be careful
not to remove the formulas entered in the rows AVERAGE, COUNT, and the
summary statistics for AVERAGE.
a.) Use the COUNTIF function to determine the number and proportion of days
with no new subscribers among the 3,000 days. Compare the value with the
probability obtained in Question 1, part (a). Should the values be identical?
Explain briefly.

To count how many days have zero new subscribers across all 3000 days:
Formula: =COUNTIF(B10:BI59, 0)
This counts all cells in the range where the number of new subscribers is zero.

To find the proportion of days with no new subscribers, divide the count from above by 3000:
Formula: =COUNTIF(B10:BI59, 0)/3000 = 0.00867

In 1(a), P(X = 0) = e^(−4.75) = 0.008652, which is very close to the simulated
proportion (a relative difference of about 0.2%). The values are still not
identical, and should not be expected to be: the simulated proportion is a random
estimate, although with 3000 days it tends to lie closer to the true probability
than an estimate based on 1200 days.
b.) The variable COUNT in row 63 counts the number of days with one or less
new subscribers in each sample. Use the values to determine the number
and proportion of samples of 50 days all containing two or more new
subscribers. Using the probability from Question 1, part (b), calculate an
appropriate probability and compare to the simulated proportion. Should
the values be identical?

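The proportion of samples in which every day had two or more new subscribers:
Formula: =COUNTIF(B63:BI63, 0)/60 = 0.0833.
The appropriate probability to compare with is that all 50 independent days have
two or more new subscribers, (1 − 0.04975)^50 = 0.07796, rather than the
single-day probability P(X<=1) = 0.04975 from 1(b). The simulated proportion
0.0833 is close to 0.07796 but not identical, because it is based on only 60
random samples.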
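The COUNT row can be imitated in Python to see how much the proportion of "all days OK" samples varies around (1 − 0.04975)^50. This is a sketch under stated assumptions: Poisson draws come from a hand-written Knuth multiplication method (Python's `random` module has no Poisson sampler), and Python's `random.Random(30)` does not reproduce the Excel seed-30 stream, so the simulated number will differ from the worksheet.

```python
import math
import random

def poisson_draw(lam: float, rng: random.Random) -> int:
    """One Poisson(lam) draw using Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_all_ok_fraction(lam: float, n_days: int, n_samples: int, rng: random.Random) -> float:
    """Fraction of samples whose COUNT (days with one or fewer new subscribers) is zero."""
    all_ok = 0
    for _ in range(n_samples):
        count = sum(1 for _ in range(n_days) if poisson_draw(lam, rng) <= 1)
        if count == 0:
            all_ok += 1
    return all_ok / n_samples

rng = random.Random(30)          # Python's seed 30, NOT Excel's seed-30 stream
theory = (1 - 0.04975) ** 50     # P(all 50 days have two or more new subscribers)
print(f"simulated = {simulate_all_ok_fraction(4.75, 50, 60, rng):.5f}, theory = {theory:.5f}")
```

Rerunning with other seeds shows the 60-sample proportion moving around 0.078 by a few hundredths, which is the kind of variation seen in the worksheet.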
c.) Obtain a histogram of the average number of new subscribers using the
following bins: 3.75, 4.00, ..., 6.25. Describe the shape of the histogram
(modality, skewness, outliers) and compare to the histogram obtained in
Question 3, part (c).

Bins Frequency
3.75 0
4 0
4.25 1
4.5 12
4.75 15
5 19
5.25 12
5.5 1
5.75 0
6 0
6.25 0
More 0

Compared to the histogram in 3(c), both are unimodal with no outliers, but this
histogram for n = 50 is less right-skewed than that for n = 20.

d.) Obtain the mean and standard deviation of the sample means for
the 60 samples. Compare them with the values predicted by theory
and relate to the similar comparison in Question 3, part (d). What do
you conclude? State your findings briefly.

The theoretical mean of the sample means is λ = 4.75 and the theoretical standard
deviation is sqrt(λ)/sqrt(n) = sqrt(4.75)/sqrt(50) = 0.30822. The mean is therefore
the same for n = 20 and n = 50. However, the standard deviations differ: it is
larger for n = 20 (0.4873) than for n = 50 (0.30822), so the sample means for
n = 20 have a wider spread. Increasing the sample size concentrates the sample
means more tightly around λ.
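To see the effect of n on the spread, one can simulate 60 sample means for each sample size and compare their standard deviation with sqrt(λ/n). This is a sketch: the Poisson sampler is a hand-written Knuth method, `sample_means` is an illustrative helper, and the seed does not reproduce Excel's generator, so individual numbers will differ from the worksheet while the pattern (a smaller SD for n = 50, roughly by a factor of sqrt(20/50) ≈ 0.63) should hold.

```python
import math
import random
import statistics

def poisson_draw(lam: float, rng: random.Random) -> int:
    """One Poisson(lam) draw using Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_means(lam: float, n: int, n_samples: int, rng: random.Random) -> list[float]:
    """Means of n_samples independent samples, each consisting of n Poisson(lam) days."""
    return [statistics.fmean([poisson_draw(lam, rng) for _ in range(n)]) for _ in range(n_samples)]

lam = 4.75
rng = random.Random(30)          # not Excel's seed-30 stream
for n in (20, 50):
    means = sample_means(lam, n, 60, rng)
    print(f"n = {n}: mean = {statistics.fmean(means):.5f}, "
          f"SD = {statistics.stdev(means):.5f}, theory SD = {math.sqrt(lam / n):.5f}")
```

`statistics.stdev` uses the n − 1 divisor, the same convention as the sample standard deviation reported by Excel's summary statistics.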
