
Homework 1 - Statistics

Nguyen Gia An - MAMAIU21036


Instructor: Mr Nguyen Minh Quan
October 23, 2023

Chapter 2
6. The table below gives the number of commercial airline accidents and the total number of resulting fatalities
in the United States in the years from 1985 to 2006.

Year  Accidents    Year  Accidents
1985  4            1996  3
1986  2            1997  3
1987  4            1998  1
1988  3            1999  2
1989  11           2000  2
1990  6            2001  6
1991  4            2002  0
1992  4            2003  2
1993  1            2004  1
1994  4            2005  3
1995  2            2006  2

Table 1

a) Represent the number of yearly airline accidents in a frequency table.


b) Give a frequency polygon graph of the number of yearly airline accidents.
c) Give a cumulative relative frequency plot of the number of yearly airline accidents.
d) Find the sample mean of the number of yearly airline accidents.
e) Find the sample median of the number of yearly airline accidents.
f) Find the sample mode of the number of yearly airline accidents.
g) Find the sample standard deviation of the number of yearly airline accidents.
- Solve:
a) The number of yearly airline accidents in a frequency table:

Number of accidents Frequency
0 1
1 3
2 6
3 4
4 5
5 0
6 2
7 0
8 0
9 0
10 0
11 1
Total 22

b) Give a frequency polygon graph of the number of yearly airline accidents:

[Figure: frequency polygon; horizontal axis: number of accidents (0 to 14), vertical axis: frequency (0 to 10)]

c) Cumulative relative frequency plot of the number of yearly airline accidents:

Number of accidents   Relative frequency   Cumulative relative frequency
0                     0.045                0.045
1                     0.136                0.182
2                     0.273                0.455
3                     0.182                0.636
4                     0.227                0.864
6                     0.091                0.955
11                    0.045                1.000

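[Figure: cumulative relative frequency plot; horizontal axis: number of accidents (0 to 11), vertical axis: cumulative relative frequency (0 to 1)]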

d) The sample mean of the number of yearly airline accidents:


$\bar{x} = \dfrac{0 \times 1 + 1 \times 3 + 2 \times 6 + 3 \times 4 + 4 \times 5 + 6 \times 2 + 11 \times 1}{22} = \dfrac{70}{22} \approx 3.18.$

e) The sample median of the number of yearly airline accidents:


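Since $22 \times 0.5 = 11$ is an integer, the median is the average of the 11th and 12th ordered values:
$\text{median} = \dfrac{a_{11} + a_{12}}{2} = \dfrac{3 + 3}{2} = 3.$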

f) The sample mode of the number of yearly airline accidents: mode = 2.
g) The sample standard deviation of the number of accidents:
$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{113.2727}{22 - 1}} \approx 2.3225.$
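The summary statistics of Problem 6 can be cross-checked with a short script. This is a minimal sketch using only Python's standard library; the variable names are illustrative and the data are typed in from Table 1.

```python
from collections import Counter
import statistics

# Yearly accident counts for 1985-2006, copied from Table 1.
accidents = [4, 2, 4, 3, 11, 6, 4, 4, 1, 4, 2,
             3, 3, 1, 2, 2, 6, 0, 2, 1, 3, 2]

n = len(accidents)
freq = Counter(accidents)  # maps each observed value to its frequency

# Frequency, relative frequency and cumulative relative frequency per value.
running = 0
for value in sorted(freq):
    running += freq[value]
    print(value, freq[value], round(freq[value] / n, 3), round(running / n, 3))

mean = statistics.mean(accidents)
median = statistics.median(accidents)
mode = statistics.mode(accidents)
stdev = statistics.stdev(accidents)  # sample standard deviation (divides by n - 1)

print(round(mean, 2), median, mode, round(stdev, 4))
```

`statistics.stdev` uses the n - 1 denominator, which matches the formula in part g; `statistics.pstdev` would divide by n instead.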

13. The following are the percentages of ash content in 12 samples of coal found in close proximity: 9.2, 14.1,
9.8, 12.4, 16.0, 12.6, 22.7, 18.9, 21.0, 14.5, 20.4, 16.9
Find the:
a. Sample mean, and
b. Sample standard deviation of these percentages
- Solve:

9 .2 .8
12 .4 .6
14 .1 .5
16 .0 .9
18 .9
20 .4
21 .0
22 .7

- The sample mean: x̄ = 15.7083.
- The standard deviation: S = 4.3953.

16. The following data represent the lifetimes (in hours) of a sample of 40 transistors:

112 121 126 108 141 104 136 134


121 118 143 116 108 122 127 140
113 117 126 130 134 120 131 133
118 125 151 147 137 140 132 119
110 124 132 152 135 130 136 128

Table 2

a. Determine the sample mean, median, and mode.


b. Give a cumulative relative frequency plot of these data.
- Solve:

10 4 8 8
11 0 2 3 6 7 8 8 9
12 0 1 1 2 4 5 6 6 7 8
13 0 0 1 2 2 3 4 4 5 6 6 7
14 0 0 1 3 7
15 1 2

a. The sample mean, median, and mode:


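Sample mean: $\bar{x} = 127.425$.
Sample median: since $40 \times 0.5 = 20$ is an integer, $\text{median} = \dfrac{a_{20} + a_{21}}{2} = \dfrac{127 + 128}{2} = 127.5$.
Sample mode: the values 108, 118, 121, 126, 130, 132, 134, 136, and 140 each occur twice, so the set of modes is {108, 118, 121, 126, 130, 132, 134, 136, 140}.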
b. Cumulative relative frequency plot of these data.

[Figure: cumulative relative frequency plot; horizontal axis: lifetime in hours (105 to 150), vertical axis: cumulative relative frequency (0 to 1)]
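Because several lifetimes tie for the highest frequency, it helps to confirm the mode set programmatically. The sketch below assumes Python 3.8 or later (for `statistics.multimode`); the list is typed in from Table 2.

```python
import statistics

# Transistor lifetimes in hours, copied row by row from Table 2.
lifetimes = [112, 121, 126, 108, 141, 104, 136, 134,
             121, 118, 143, 116, 108, 122, 127, 140,
             113, 117, 126, 130, 134, 120, 131, 133,
             118, 125, 151, 147, 137, 140, 132, 119,
             110, 124, 132, 152, 135, 130, 136, 128]

mean = statistics.mean(lifetimes)
median = statistics.median(lifetimes)
# multimode returns every value that attains the maximum frequency.
modes = sorted(statistics.multimode(lifetimes))

print(mean, median, modes)
```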

17. An experiment measuring the percent shrinkage on drying of 50 clay specimens produced the following
data:

18.2 21.2 23.1 18.5 15.6


20.8 19.4 15.4 21.2 13.4
16.4 18.7 18.2 19.6 14.3
16.6 24.0 17.6 17.8 20.2
17.4 23.6 17.5 20.3 16.6
19.3 18.5 19.3 21.2 13.9
20.5 19.0 17.6 22.3 18.4
21.2 20.4 21.4 20.3 20.1
19.6 20.6 14.8 19.7 20.5
18.0 20.8 15.8 23.1 17.0

Table 3

a. Draw a stem and leaf plot of these data.


b. Compute the sample mean, median, and mode.
c. Compute the sample variance.
d. Group the data into class intervals of size 1 percent starting with the value 13.0, and draw the resulting
histogram.
e. For the grouped data acting as if each of the data points in an interval was actually located at the midpoint
of that interval, compute the sample mean and sample variance and compare this with the results obtained
in parts (b) and (c). Why do they differ?
- Solve:

13 .4 .9
14 .3 .8
15 .4 .6 .8
16 .4 .6 .6
17 .0 .4 .5 .6 .6 .8
18 .0 .2 .2 .4 .5 .5 .7
19 .0 .3 .3 .4 .6 .6 .7
20 .1 .2 .3 .3 .4 .5 .5 .6 .8 .8
21 .2 .2 .2 .2 .4
22 .3
23 .1 .1 .6
24 .0

b. Sample mean = x̄ = 18.978


Sample median = 19.3
Sample mode = 21.2
c. Sample variance = 6.25271
d. [Figure: histogram of the shrinkage data with class intervals of width 1 percent from 13.0 to 24.0; vertical axis: frequency (0 to 10)]

e.

Interval Value of mid-point Frequency
13.0-14.0 13.5 2
14.0-15.0 14.5 2
15.0-16.0 15.5 3
16.0-17.0 16.5 3
17.0-18.0 17.5 6
18.0-19.0 18.5 7
19.0-20.0 19.5 7
20.0-21.0 20.5 10
21.0-22.0 21.5 5
22.0-23.0 22.5 1
23.0-24.0 23.5 4

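The sample mean:

$\bar{x} = \dfrac{(13.5 + 14.5) \times 2 + (15.5 + 16.5) \times 3 + 17.5 \times 6 + (18.5 + 19.5) \times 7 + 20.5 \times 10 + 21.5 \times 5 + 22.5 \times 1 + 23.5 \times 4}{50} = \dfrac{952}{50} = 19.04.$

The sample variance: $s^2 = \dfrac{1}{49} \sum_j f_j (m_j - \bar{x})^2 \approx 6.5392$, where $m_j$ and $f_j$ are the midpoint and frequency of class $j$.
The grouped mean and variance differ from the values in parts b and c (18.978 and 6.25271). The reason is that grouping replaces every observation by the midpoint of its class, while the actual observations are spread unevenly within each class. The information about where each value lies inside its interval is lost, so the grouped statistics are only approximations.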
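The grouped-data calculation in part e can be reproduced directly from the midpoint table. The following is a minimal sketch; the list names are illustrative and the frequencies are copied from the table above.

```python
# Class midpoints and frequencies from the table in part e.
midpoints = [13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5, 21.5, 22.5, 23.5]
freqs = [2, 2, 3, 3, 6, 7, 7, 10, 5, 1, 4]

n = sum(freqs)

# Treat every observation as if it sat at the midpoint of its class.
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
grouped_var = sum(f * (m - grouped_mean) ** 2
                  for m, f in zip(midpoints, freqs)) / (n - 1)

print(n, grouped_mean, round(grouped_var, 4))
```

Comparing these two numbers with the ungrouped mean and variance from parts b and c shows how much information is lost by grouping.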

26. The following are the grade point averages of 30 students recently admitted to the graduate program
in the Department of Industrial Engineering and Operations Research at the University of California at
Berkeley:
3.46, 3.72, 3.95, 3.55, 3.62, 3.80, 3.86, 3.71, 3.56, 3.49, 3.96, 3.90, 3.70, 3.61,
3.72, 3.65, 3.48, 3.87, 3.82, 3.91, 3.69, 3.67, 3.72, 3.66, 3.79, 3.75, 3.93, 3.74,
3.50, 3.83
a. Represent the preceding data in a stem and leaf plot.
b. Calculate the sample mean x̄.
c. Calculate the sample standard deviation s.
d. Determine the proportion of the data values that lies within x̄ ± 1.5s and compare with the lower
bound given by Chebyshev’s inequality.
e. Determine the proportion of the data values that lies within x̄ ± 2s and compare with the lower
bound given by Chebyshev’s inequality.
- Solve:

3.4 6 8 9
3.5 0 5 6
3.6 1 2 5 6 7 9
3.7 0 1 2 2 2 4 5 9
3.8 0 2 3 6 7
3.9 0 1 3 5 6

b. We have: n = 30 =⇒ x̄ = 3.721.
c. $s = \sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\dfrac{1}{30 - 1} \times 0.61539} \approx 0.146.$

d. With $\bar{x} = 3.721$ and $s = 0.146$, the interval is $(\bar{x} - 1.5s, \bar{x} + 1.5s) = (3.721 - 1.5 \times 0.146, 3.721 + 1.5 \times 0.146) = (3.502, 3.940)$. Because 24 of the 30 values lie in this interval (all except 3.46, 3.48, 3.49, 3.50, 3.95, and 3.96), the proportion is $\dfrac{24}{30} = 80\%$.
Chebyshev's inequality with $k = 1.5$ gives the lower bound $1 - \dfrac{1}{1.5^2} = 55.56\%$.
Chebyshev's inequality states that at least 55.56% of the data lie within the interval $(\bar{x} - 1.5s, \bar{x} + 1.5s)$, while, in actuality, 80% of the data fall within these limits.
e. With $\bar{x} = 3.721$ and $s = 0.146$, the interval is $(\bar{x} - 2s, \bar{x} + 2s) = (3.429, 4.013)$. All 30 values lie in this interval, so the proportion is 100%.
Chebyshev's inequality with $k = 2$ gives the lower bound $1 - \dfrac{1}{2^2} = 0.75 = 75\%$. Chebyshev's inequality states that at least 75% of
the data lie within the interval $(\bar{x} - 2s, \bar{x} + 2s)$, whereas, in actuality, 100% of the data fall within these limits.
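The comparison with Chebyshev's inequality in parts d and e can be sketched as follows. The helper names `proportion_within` and `chebyshev_bound` are hypothetical and introduced only for this illustration; the data are the 30 grade point averages from the problem statement.

```python
import statistics

gpas = [3.46, 3.72, 3.95, 3.55, 3.62, 3.80, 3.86, 3.71, 3.56, 3.49,
        3.96, 3.90, 3.70, 3.61, 3.72, 3.65, 3.48, 3.87, 3.82, 3.91,
        3.69, 3.67, 3.72, 3.66, 3.79, 3.75, 3.93, 3.74, 3.50, 3.83]

mean = statistics.mean(gpas)
s = statistics.stdev(gpas)  # sample standard deviation (n - 1 denominator)

def proportion_within(data, center, spread, k):
    """Fraction of data strictly inside (center - k*spread, center + k*spread)."""
    lo, hi = center - k * spread, center + k * spread
    return sum(lo < x < hi for x in data) / len(data)

def chebyshev_bound(k):
    """Chebyshev's lower bound 1 - 1/k^2 for the same fraction."""
    return 1 - 1 / k ** 2

for k in (1.5, 2):
    print(k, proportion_within(gpas, mean, s, k), round(chebyshev_bound(k), 4))
```

The observed proportions are well above the bounds, which is expected: Chebyshev's inequality holds for any data set, so it is usually far from tight.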

29. Use the data of Problem 16 (Table 2)


a. Compute the sample mean and sample median.
b. Are the data approximately normal?
c. Compute the sample standard deviation s.
d. Determine the percentage of the data that lies within x̄ ± 1.5s.
e. Compare your answer in part (d) to that given by the empirical rule.
f. Compare your answer in part (d) to the bound given by Chebyshev’s inequality.
- Solve:
a. Sample mean: $\bar{x} = 127.425$.
Sample median: $\dfrac{a_{20} + a_{21}}{2} = \dfrac{127 + 128}{2} = 127.5$.
b. The sample mean (127.425) and sample median (127.5) are fairly close in value, with the median
being slightly higher than the mean. This suggests that the data is approximately symmetric, which is a
characteristic of a normal distribution.
The sample standard deviation is
$s = \sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\dfrac{1}{40 - 1} \times 5497.775} \approx 11.873.$

With $\bar{x} = 127.425$ and $s = 11.873$, we can check whether the data are approximately normal by using the empirical rule.
The empirical rule states that approximately 68% of the data lie in $(\bar{x} - s, \bar{x} + s) = (115.552, 139.298)$; the actual
percentage is $\dfrac{27 \times 100}{40} = 67.5$. Similarly, the empirical rule gives that approximately 95% of the data lie in
$(\bar{x} - 2s, \bar{x} + 2s) = (103.679, 151.171)$, whereas the actual percentage is $\dfrac{39 \times 100}{40} = 97.5$. It also states that about 99.7% of the data lie
within $(\bar{x} - 3s, \bar{x} + 3s) = (91.806, 163.044)$, while the actual percentage is 100. Since all three percentages are close to the values given by the empirical rule, the
data are approximately normal.
c. $s = 11.873$.
d. The interval is $(\bar{x} - 1.5s, \bar{x} + 1.5s) = (109.62, 145.23)$. It contains 34 of the 40 values, so the percentage is $\dfrac{34}{40} = 85\%$.
e. By the empirical rule, approximately 68% of the data fall within one standard deviation (x̄ ± 1s) and approximately 95%
fall within two standard deviations (x̄ ± 2s), so the percentage within x̄ ± 1.5s should lie
between 68% and 95%. For an exactly normal distribution the value is about 86.6%, so the 85% found in part d
agrees well with the empirical rule for a
normally distributed data set.
f. Chebyshev's inequality with $k = 1.5$ gives the lower bound $1 - \dfrac{1}{1.5^2} = 55.56\%$.
Chebyshev's inequality states that at least 55.56% of the data lie within the interval (x̄ − 1.5s, x̄ + 1.5s),
while, in actuality, 85% of the data fall within these limits.

33. A random group of 12 high school juniors were asked to estimate the average number of hours they study
each week. The following give these hours along with the student’s grade point average. Find the sample
correlation coefficient between hours reported and GPA.
Solve:

Hours (xi)  GPA (yi)  xi − x̄  (xi − x̄)²  yi − ȳ     (yi − ȳ)²  (xi − x̄)(yi − ȳ)
6           2.8       -7       49          -0.44167   0.19507     3.09167
14          3.2       1        1           -0.04167   0.00174     -0.04167
3           3.1       -10      100         -0.14167   0.02007     1.41667
22          3.6       9        81          0.35833    0.12840     3.22500
9           3.0       -4       16          -0.24167   0.05840     0.96667
11          3.3       -2       4           0.05833    0.00340     -0.11667
12          3.4       -1       1           0.15833    0.02507     -0.15833
5           2.7       -8       64          -0.54167   0.29340     4.33333
18          3.1       5        25          -0.14167   0.02007     -0.70833
24          3.8       11       121         0.55833    0.31174     6.14167
15          3.0       2        4           -0.24167   0.05840     -0.48333
17          3.9       4        16          0.65833    0.43340     2.63333

$\bar{x} = 13$, $\bar{y} = 3.24167$, $\sum_{i=1}^{12} (x_i - \bar{x})^2 = 482$, $\sum_{i=1}^{12} (y_i - \bar{y})^2 = 1.54917$, $\sum_{i=1}^{12} (x_i - \bar{x})(y_i - \bar{y}) = 20.3$

- Sample correlation coefficient between hours reported and GPA is:


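$r = \dfrac{\sum_{i=1}^{12} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{12} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{12} (y_i - \bar{y})^2}} = \dfrac{20.3}{\sqrt{482 \times 1.549}} \approx 0.74.$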
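The sample correlation coefficient can be recomputed from the raw data with a few lines of code. This is a minimal sketch; the function name `sample_correlation` is hypothetical and simply implements the formula above.

```python
import math

# Hours studied per week and grade point average for the 12 students.
hours = [6, 14, 3, 22, 9, 11, 12, 5, 18, 24, 15, 17]
gpa = [2.8, 3.2, 3.1, 3.6, 3.0, 3.3, 3.4, 2.7, 3.1, 3.8, 3.0, 3.9]

def sample_correlation(xs, ys):
    """Sample correlation coefficient r of paired data xs, ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = sample_correlation(hours, gpa)
print(round(r, 4))

# Sanity check: data lying exactly on a line with negative slope give r = -1.
r_line = sample_correlation(hours, [5 - 2 * x for x in hours])
print(round(r_line, 6))
```

The second call uses made-up values y = 5 - 2x purely to illustrate the limiting case of a perfect negative linear relation.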

34. Prove property 3 of the sample correlation coefficient.

Property 3: If, for constants a and b with b < 0, yi = a + bxi for i = 1, 2, ..., n, then r = −1.
- Prove:
Let x̄ be the sample mean of the xi and ȳ the sample mean of the yi.
Since yi = a + bxi, the mean of the yi is ȳ = a + bx̄ (by linearity of the mean), so yi − ȳ = b(xi − x̄). Then

$r = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(a + b x_i - a - b\bar{x})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} b^2 (x_i - \bar{x})^2}} = \dfrac{b \sum_{i=1}^{n} (x_i - \bar{x})^2}{|b| \sum_{i=1}^{n} (x_i - \bar{x})^2} = \dfrac{b}{-b} = -1,$

because b < 0 implies |b| = −b. Hence property 3 is proved.

Chapter 3
23. Let X1 and X2 be independent normal random variables, each having mean 10 and variance σ². Which
probability is larger:
a. P(X1 > 15) or P(X1 + X2 > 25);
b. P(X1 > 15) or P(X1 + X2 > 30)?
c. Find x such that P(X1 + X2 > x) = P(X1 > 15).
- Solve:
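Since X1 and X2 are independent normal random variables,
E(X1) = E(X2) = 10 and σ(X1) = σ(X2) = σ.

Hence
$E(X_1 + X_2) = E(X_1) + E(X_2) = 10 + 10 = 20$ and $\mathrm{Var}(X_1 + X_2) = 2\sigma^2$, so $X_1 + X_2 \sim N(20, 2\sigma^2)$. Let Z denote a standard normal random variable.

a. $P(X_1 > 15) = P\left(Z > \dfrac{15 - 10}{\sigma}\right) = P\left(Z > \dfrac{5}{\sigma}\right)$ and
$P(X_1 + X_2 > 25) = P\left(Z > \dfrac{25 - 20}{\sqrt{2}\sigma}\right) \approx P\left(Z > \dfrac{3.54}{\sigma}\right).$
Since $\dfrac{3.54}{\sigma} < \dfrac{5}{\sigma}$, P(X1 + X2 > 25) > P(X1 > 15).
b. $P(X_1 + X_2 > 30) = P\left(Z > \dfrac{30 - 20}{\sqrt{2}\sigma}\right) \approx P\left(Z > \dfrac{7.07}{\sigma}\right).$
Since $\dfrac{7.07}{\sigma} > \dfrac{5}{\sigma}$, P(X1 > 15) > P(X1 + X2 > 30).
c. $P(X_1 > 15) = P\left(Z > \dfrac{5}{\sigma}\right)$ and $P(X_1 + X_2 > x) = P\left(Z > \dfrac{x - 20}{\sqrt{2}\sigma}\right).$
These are equal when $\dfrac{5}{\sigma} = \dfrac{x - 20}{\sqrt{2}\sigma} \iff x - 20 = 5\sqrt{2} \iff x = 20 + 5\sqrt{2} \approx 27.07.$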
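The three comparisons can be checked numerically for one concrete value of σ. The sketch below uses `statistics.NormalDist` from the standard library; the value σ = 5 is an assumption chosen only for illustration, since the problem leaves σ unspecified.

```python
from statistics import NormalDist
import math

sigma = 5.0  # hypothetical value; the problem does not fix sigma

x1 = NormalDist(mu=10, sigma=sigma)
# X1 + X2 has mean 20 and standard deviation sqrt(2) * sigma.
total = NormalDist(mu=20, sigma=math.sqrt(2) * sigma)

p_x1 = 1 - x1.cdf(15)         # P(X1 > 15)
p_sum_25 = 1 - total.cdf(25)  # P(X1 + X2 > 25)
p_sum_30 = 1 - total.cdf(30)  # P(X1 + X2 > 30)

# Part c: the threshold that makes the two tail probabilities equal.
x_equal = 20 + 5 * math.sqrt(2)
p_sum_x = 1 - total.cdf(x_equal)

print(round(p_sum_25, 4), round(p_x1, 4), round(p_sum_30, 4), round(x_equal, 2))
```

Changing `sigma` changes the individual probabilities but not their ordering, because every threshold scales with 1/σ in the same way.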

25. The annual rainfall (in inches) in a certain region is normally distributed with µ = 40, σ = 4. What is the
probability that in 2 of the next 4 years, the rainfall will exceed 50 inches? Assume that the rainfalls in
different years are independent.
- Solve:
The problem combines the normal and binomial distributions. First compute p = P(X > 50), where X is the rainfall in a given year. Define a success as
the event that the rainfall in a year exceeds 50 inches. Then compute the probability of exactly 2 successes in
4 independent trials.
We define:
p: the probability of success in one year.
q = 1 − p: the probability of failure in one year.
Standardizing, $p = P(X > 50) = P\left(Z > \dfrac{50 - 40}{4}\right) = P(Z > 2.5) = 1 - \Phi(2.5) \approx 0.0062$, so $q \approx 0.9938$.
The number of successes Y in n = 4 years is binomial with probability mass function
$P(Y = k) = \binom{n}{k} p^k q^{n - k}.$
Then $P(Y = 2) = \binom{4}{2} (0.0062)^2 (0.9938)^2 \approx 0.00023.$
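The two-step approach described at the start of this solution (a normal tail probability for a single year, followed by a binomial probability for 2 successes in 4 years) can be sketched numerically as follows. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from statistics import NormalDist
from math import comb

rain = NormalDist(mu=40, sigma=4)  # annual rainfall in inches

# Step 1: probability that a single year exceeds 50 inches.
p = 1 - rain.cdf(50)

# Step 2: probability of exactly 2 such years out of 4 independent years.
prob_two_of_four = comb(4, 2) * p ** 2 * (1 - p) ** 2

print(round(p, 4), round(prob_two_of_four, 5))
```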
29. Let $I = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx$.
a. Show that for any µ and σ: $\dfrac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{\infty} e^{-(x - \mu)^2/(2\sigma^2)}\,dx = 1$ is equivalent to $I = \sqrt{2\pi}$.
b. Show that $I = \sqrt{2\pi}$ by writing $I^2 = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\,dx\,dy$ and
then evaluating the double integral by means of a change of variables to polar coordinates. (That is, let
x = r cos θ, y = r sin θ, dx dy = r dr dθ.)
- Solve:
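a. We have:
$\dfrac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{\infty} e^{-(x - \mu)^2/(2\sigma^2)}\,dx = 1 \iff \int_{-\infty}^{\infty} e^{-(x - \mu)^2/(2\sigma^2)}\,dx = \sqrt{2\pi}\sigma.$

Let $z = \dfrac{x - \mu}{\sigma}$, so that $dx = \sigma\,dz$. Then
$\int_{-\infty}^{\infty} e^{-(x - \mu)^2/(2\sigma^2)}\,dx = \sigma \int_{-\infty}^{\infty} e^{-z^2/2}\,dz = \sigma I,$
so the condition becomes $\sigma I = \sqrt{2\pi}\sigma \iff I = \sqrt{2\pi}$.

b. In polar coordinates:
$I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\,dx\,dy = \int_{0}^{2\pi} \int_{0}^{\infty} e^{-r^2/2}\,r\,dr\,d\theta = \int_{0}^{2\pi} \left[-e^{-r^2/2}\right]_{0}^{\infty} d\theta = \int_{0}^{2\pi} 1\,d\theta = 2\pi \implies I = \sqrt{2\pi},$
taking the positive root because I > 0.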
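The value of I can also be confirmed numerically. The sketch below is not part of the proof; it approximates the integral with a midpoint rule on [-10, 10], an assumption that is harmless here because the integrand is smaller than e^(-50) outside that range. The helper name `midpoint_integral` is hypothetical.

```python
import math

def midpoint_integral(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# I = integral of exp(-x^2 / 2) over the real line, truncated to [-10, 10].
I = midpoint_integral(lambda x: math.exp(-x * x / 2), -10.0, 10.0)

print(I, math.sqrt(2 * math.pi))
```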

33. Value at risk (VAR) has become a key concept in financial calculations. The VAR of an investment is
defined as that value v such that there is only a 1 percent chance that the loss from the investment will
exceed v.
a. If the gain from an investment is a normal random variable with mean 10 and variance 49,
determine the value at risk. (If X is the gain, then −X is the loss).
b. Among a set of investments whose gains are all normally distributed, show that the one having the
smallest VAR is the one having the largest value of µ − 2.33σ, where µ and σ² are the mean and variance
of the gain from the investment.

- Solve:
a. $X \sim N(10, 7^2) \implies Z = \dfrac{X - 10}{7} \sim N(0, 1)$.
We need the value x with P(X < x) = 1%:
$P(X < x) = P\left(\dfrac{X - 10}{7} < \dfrac{x - 10}{7}\right) = P\left(Z < \dfrac{x - 10}{7}\right) = 0.01 \implies \dfrac{x - 10}{7} = -2.326 \implies x = -6.28.$

Thus P(X < −6.28) = 1%, that is, P(−X > 6.28) = 1% and P(−X ≤ 6.28) = 99%.
Thus, VAR = 6.28.
b. By the same calculation, for a gain X ∼ N(µ, σ²) the 1% quantile of the gain is x = µ − 2.33σ, and the value at risk is v = −x = −(µ − 2.33σ).
The larger µ − 2.33σ is, the smaller the VAR. Therefore, the investment with the smallest
VAR is the one having the largest value of µ − 2.33σ.
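The value at risk for a normally distributed gain can be computed from the 1% quantile of the gain. The following is a minimal sketch; the function name `value_at_risk` and the comparison investment N(12, 7²) are hypothetical and used only for illustration.

```python
from statistics import NormalDist

def value_at_risk(mu, sigma, level=0.01):
    """VAR for a gain X ~ N(mu, sigma^2).

    VAR is the value v with P(-X > v) = level, i.e. P(X < -v) = level,
    so v is minus the `level` quantile of the gain.
    """
    return -NormalDist(mu, sigma).inv_cdf(level)

var_a = value_at_risk(10, 7)       # part a: mean 10, variance 49
var_other = value_at_risk(12, 7)   # hypothetical investment with a larger mean

print(round(var_a, 2), round(var_other, 2))
```

With the same σ, the investment with the larger mean has the larger µ − 2.33σ and therefore the smaller VAR, in line with part b.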

36. An IQ test produces scores that are normally distributed with mean value 100 and standard deviation
14.2. The top 1 percent of all scores are in what range?
- Solve:
We have X ∼ N(100, 14.2²) and seek the score x with P(X > x) = 1%.
Then P(X < x) = 1 − 1% = 99%, or $P\left(Z < \dfrac{x - 100}{14.2}\right) = 99\%$, where $Z = \dfrac{X - 100}{14.2} \sim N(0, 1)$.
Since P(Z < 2.326) = 99%, $\dfrac{x - 100}{14.2} = 2.326 \implies x \approx 133.03$.
To be in the top 1%, a score must be at least about 133.03.

37. The time (in hours) required to repair a machine is an exponentially distributed random variable with
parameter λ = 1.
a. What is the probability that a repair time exceeds 2 hours?
b.What is the conditional probability that a repair takes at least 3 hours, given that its duration
exceeds 2 hours?
- Solve:
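Let X be the time (in hours) required to repair a machine; X is an exponential
random variable with λ = 1.
a. $P(X > 2) = \int_{2}^{\infty} e^{-x}\,dx = e^{-2} \approx 0.1353.$
b. By the memoryless property of the exponential distribution,
$P(X \ge 3 \mid X > 2) = \dfrac{P(X \ge 3)}{P(X > 2)} = \dfrac{e^{-3}}{e^{-2}} = e^{-1} \approx 0.3679.$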
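Both parts can be checked numerically from the survival function of the exponential distribution, P(X > t) = e^(−λt). This is a minimal sketch; the helper name `survival` is hypothetical.

```python
import math

lam = 1.0  # rate parameter given in the problem

def survival(t, rate=lam):
    """P(X > t) for an exponential random variable with the given rate."""
    return math.exp(-rate * t)

p_a = survival(2)                # P(X > 2)
p_b = survival(3) / survival(2)  # P(X >= 3 | X > 2), by definition of conditional probability

print(round(p_a, 4), round(p_b, 4))
```

The conditional probability equals `survival(1)`, which is the memoryless property: having already waited 2 hours does not change the distribution of the remaining repair time.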

42. When shooting at a target in a two-dimensional plane, suppose that the horizontal miss distance is
normally distributed with mean 0 and variance 4 and is independent of the vertical miss distance, which
is also normally distributed with mean 0 and variance 4. Let D denote the distance between the point at
which the shot lands and the target. Find E[D].
- Solve:
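We have:
$D^2 = X_1^2 + X_2^2$ and $Z_1 = \dfrac{X_1 - 0}{2} = \dfrac{X_1}{2}$, $Z_2 = \dfrac{X_2 - 0}{2} = \dfrac{X_2}{2}$.
If D is the distance and $X_i$, i = 1, 2, are the coordinates of the miss, then $Z_i = \dfrac{X_i}{2}$, i = 1, 2, are independent standard normal
random variables, and we obtain
$\dfrac{D^2}{4} = Z_1^2 + Z_2^2 = W$ with $W \sim \chi^2_2$, so $D = 2\sqrt{W}$.

The chi-square distribution with 2 degrees of freedom is the same as the exponential
distribution with parameter $\tfrac{1}{2}$. Therefore,
$E[D] = 2E[\sqrt{W}] = 2 \int_{0}^{\infty} \sqrt{w}\,\tfrac{1}{2} e^{-w/2}\,dw = 2 \cdot \tfrac{1}{2} \cdot \Gamma\left(\tfrac{3}{2}\right) \cdot 2^{3/2} = 2\sqrt{2} \cdot \dfrac{\sqrt{\pi}}{2} = \sqrt{2\pi} \approx 2.507.$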
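As a numerical cross-check on E[D], the sketch below integrates r·f(r), where f is the density of the distance D. It assumes the standard result that the distance from the origin of two independent N(0, σ²) coordinates has the Rayleigh density f(r) = (r/σ²)·e^(−r²/(2σ²)); the helper names are hypothetical.

```python
import math

sigma = 2.0  # standard deviation of each miss coordinate (variance 4)

def rayleigh_pdf(r):
    """Density of D = sqrt(X1^2 + X2^2) for independent N(0, sigma^2) coordinates."""
    return (r / sigma ** 2) * math.exp(-r * r / (2 * sigma ** 2))

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# E[D] = integral of r * f(r); the upper limit 40 truncates a negligible tail.
expected_d = midpoint_integral(lambda r: r * rayleigh_pdf(r), 0.0, 40.0)

print(expected_d, math.sqrt(2 * math.pi))
```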

43. If X is a chi-square random variable with 6 degrees of freedom, find


a.P(X ≤ 6);
b.P(3 ≤ X ≤ 9).

- Solve:
If X is a chi-square random variable with 6 degrees of freedom, we write $X \sim \chi^2_6$.
a. $P(X \le 6) = P(\chi^2_6 \le 6) = 0.57681$
b. $P(3 \le X \le 9) = P(\chi^2_6 \le 9) - P(\chi^2_6 \le 3) = 0.63527$

44. If X and Y are independent chi-square random variables with 3 and 6 degrees of freedom, respectively,
determine the probability that X + Y will exceed 10.
Solve:
Let Z = X + Y. Because X and Y are independent chi-square random variables with 3 and 6 degrees of
freedom, Z also follows a chi-square distribution, with 3 + 6 = 9 degrees of freedom.
The probability that X + Y will exceed 10:

$P(X + Y > 10) = P(\chi^2_9 > 10) = 0.35049$
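The chi-square probabilities in Problems 43 and 44 are normally read from tables or software. As an illustration, the sketch below integrates the chi-square density numerically with a composite Simpson rule, using only the standard library. The function names are hypothetical, and the code assumes at least 2 degrees of freedom so that the density is finite at 0.

```python
import math

def chi2_pdf(x, df):
    """Chi-square density with df degrees of freedom (df >= 2 assumed)."""
    k = df / 2
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.gamma(k))

def chi2_cdf(x, df, n=20_000):
    """P(X <= x) by the composite Simpson rule on [0, x]; n must be even."""
    h = x / n
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return total * h / 3

p_43a = chi2_cdf(6, 6)                   # P(X <= 6), 6 degrees of freedom
p_43b = chi2_cdf(9, 6) - chi2_cdf(3, 6)  # P(3 <= X <= 9), 6 degrees of freedom
p_44 = 1 - chi2_cdf(10, 9)               # P(X + Y > 10), 9 degrees of freedom

print(round(p_43a, 5), round(p_43b, 5), round(p_44, 5))
```

A statistics library with a built-in chi-square distribution would be the usual choice in practice; the hand-written integration is shown only to keep the example dependency-free.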



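45. Show $\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}$.
Solve:
We have:
$\Gamma\left(\tfrac{1}{2}\right) = \int_{0}^{\infty} e^{-x} x^{-1/2}\,dx.$

Let $x = \dfrac{y^2}{2}$, so that $dx = y\,dy$. Then
$\Gamma\left(\tfrac{1}{2}\right) = \int_{0}^{\infty} e^{-y^2/2} \dfrac{1}{\sqrt{y^2/2}}\,y\,dy = \sqrt{2} \int_{0}^{\infty} e^{-y^2/2}\,dy.$

Now let $I = \int_{0}^{\infty} e^{-y^2/2}\,dy$, so that $\Gamma\left(\tfrac{1}{2}\right) = \sqrt{2}\,I$.
We have: $I^2 = \int_{0}^{\infty} e^{-x^2/2}\,dx \int_{0}^{\infty} e^{-y^2/2}\,dy = \int_{0}^{\infty} \int_{0}^{\infty} e^{-(x^2 + y^2)/2}\,dx\,dy.$

Converting to polar coordinates (the first quadrant corresponds to 0 ≤ θ ≤ π/2), we have:

$I^2 = \int_{0}^{\pi/2} \int_{0}^{\infty} e^{-r^2/2}\,r\,dr\,d\theta = \int_{0}^{\pi/2} \left[-e^{-r^2/2}\right]_{0}^{\infty} d\theta = \int_{0}^{\pi/2} 1\,d\theta = \dfrac{\pi}{2}.$

So $I = \sqrt{\dfrac{\pi}{2}} = \dfrac{\sqrt{\pi}}{\sqrt{2}}$, and therefore $\Gamma\left(\tfrac{1}{2}\right) = \sqrt{2}\,I = \sqrt{\pi}$.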

46. If T has a t-distribution with 8 degrees of freedom, find:


a. P(T ≥ 1)
b. P(T ≤ 1)
c. P(−1 ≤ T ≤ 1)
Solve:
a. P(T ≥ 1) = 1 − P(T8 ≤ 1) = 0.1733
b. P(T ≤ 1) = 1 − 0.1733 = 0.8267
c. P(−1 ≤ T ≤ 1) = 1 − 2 × 0.1733 = 0.6534
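The t-distribution probabilities can be reproduced by integrating the t density numerically. The sketch below uses a composite Simpson rule and only the standard library; the function names `t_pdf` and `t_cdf` are hypothetical, and a statistics package would normally be used instead.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, n=2_000):
    """P(T <= t): 0.5 plus a Simpson integral of the density over [0, t]; n must be even."""
    h = t / n  # negative when t < 0, which makes the integral negative as required
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

p_a = 1 - t_cdf(1, 8)              # P(T >= 1)
p_b = t_cdf(1, 8)                  # P(T <= 1)
p_c = t_cdf(1, 8) - t_cdf(-1, 8)   # P(-1 <= T <= 1)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

The symmetry of the density about 0 is what allows the cumulative probability to be written as 0.5 plus the integral from 0 to t.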
