Statistics-1
Chapter 2
6. The table below gives the number of commercial airline accidents and the total number of resulting fatalities
in the United States in the years from 1985 to 2006.
Table 1
Number of accidents Frequency
0 1
1 3
2 6
3 4
4 5
5 0
6 2
7 0
8 0
9 0
10 0
11 1
Total 22
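[Figure: frequency plot of the number of yearly accidents; horizontal axis 0 to 14, vertical axis 0 to 10.]
[Figure: plot over the number of accidents 0 to 11; vertical axis 0 to 1.]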
f) The sample mode of the number of yearly airline accidents: mode = 2.
g) The standard deviation of the number of accidents:
\[ s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{113.2728}{22-1}} = 2.3224. \]
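The mode and standard deviation in parts f and g can be recomputed directly from the frequency table. The following is a minimal Python sketch; the variable names are illustrative and not part of the original solution.

```python
import math

# Frequency table from Table 1: number of accidents -> number of years
freq = {0: 1, 1: 3, 2: 6, 3: 4, 4: 5, 5: 0, 6: 2, 7: 0, 8: 0, 9: 0, 10: 0, 11: 1}

n = sum(freq.values())                                   # number of years
mean = sum(x * f for x, f in freq.items()) / n           # sample mean
ss = sum(f * (x - mean) ** 2 for x, f in freq.items())   # sum of squared deviations
s = math.sqrt(ss / (n - 1))                              # sample standard deviation
mode = max(freq, key=freq.get)                           # value with the highest frequency

print(n, round(mean, 4), round(ss, 4), round(s, 4), mode)
```

Weighting each value by its frequency avoids expanding the table into 22 separate observations.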
13. The following are the percentages of ash content in 12 samples of coal found in close proximity: 9.2, 14.1,
9.8, 12.4, 16.0, 12.6, 22.7, 18.9, 21.0, 14.5, 20.4, 16.9
Find the:
a. Sample mean, and
b. Sample standard deviation of these percentages
- Solve:
9 .2 .8
12 .4 .6
14 .1 .5
16 .0 .9
18 .9
20 .4
21 .0
22 .7
- The sample mean: x̄ = 15.7083.
- The standard deviation: S = 4.3953.
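Both values can be checked with Python's standard `statistics` module. This is a short sketch; `stdev` uses the divisor n − 1, which matches the sample standard deviation asked for here.

```python
import statistics

# Ash content (%) of the 12 coal samples from Problem 13
ash = [9.2, 14.1, 9.8, 12.4, 16.0, 12.6, 22.7, 18.9, 21.0, 14.5, 20.4, 16.9]

mean_ash = statistics.mean(ash)   # sample mean
sd_ash = statistics.stdev(ash)    # sample standard deviation (divisor n - 1)

print(round(mean_ash, 4), round(sd_ash, 4))
```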
16. The following data represent the lifetimes (in hours) of a sample of 40 transistors:
Table 2
10 4 8 8
11 0 2 3 6 7 8 8 9
12 0 1 1 2 4 5 6 6 7 8
13 0 0 1 2 2 3 4 4 5 6 6 7
14 0 0 1 3 7
15 1 2
[Figure: plot of the transistor lifetimes; horizontal axis 105 to 150 hours, vertical axis 0 to 1.]
17. An experiment measuring the percent shrinkage on drying of 50 clay specimens produced the following
data:
Table 3
13 .4 .9
14 .3 .8
15 .4 .6 .8
16 .4 .6 .6
17 .0 .4 .5 .6 .6 .8
18 .0 .2 .2 .4 .5 .5 .7
19 .0 .3 .3 .4 .6 .6 .7
20 .1 .2 .3 .3 .4 .5 .5 .6 .8 .8 .9
21 .2 .2 .2 .2 .4
22 .3
23 .1 .1 .6
24 .0
[Figure: histogram of Frequency over the class intervals 13.0-14.0, 14.0-15.0, ..., 23.0-24.0; vertical axis up to 10.]
e.
Interval Value of mid-point Frequency
13.0-14.0 13.5 2
14.0-15.0 14.5 2
15.0-16.0 15.5 3
16.0-17.0 16.5 3
17.0-18.0 17.5 6
18.0-19.0 18.5 7
19.0-20.0 19.5 7
20.0-21.0 20.5 10
21.0-22.0 21.5 5
22.0-23.0 22.5 1
23.0-24.0 23.5 4
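\[ \bar{x} = \frac{(13.5 + 14.5) \times 2 + (15.5 + 16.5) \times 3 + 17.5 \times 6 + (18.5 + 19.5) \times 7 + 20.5 \times 10 + 21.5 \times 5 + 22.5 \times 1 + 23.5 \times 4}{50} = \frac{952}{50} = 19.04. \]
The sample variance, computed from the midpoints with divisor n − 1 = 49, is 320.42/49 ≈ 6.5392.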
The mean and the variance obtained in this part differ from those in parts b and c. The main
reason is that every observation in a class is replaced by the class midpoint. The real data are scattered around
the midpoint, so the grouped calculation only approximates the values computed from the raw data.
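The grouped-data calculation can be sketched in Python by weighting each class midpoint by its frequency. The names below are illustrative, and the variance uses the divisor n − 1 (an assumption about the convention, chosen to match the sample variance used elsewhere in these solutions).

```python
# Class midpoints 13.5, 14.5, ..., 23.5 and their frequencies from the table
midpoints = [13.5 + k for k in range(11)]
freqs = [2, 2, 3, 3, 6, 7, 7, 10, 5, 1, 4]

n = sum(freqs)
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
grouped_var = sum(f * (m - grouped_mean) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)

print(n, grouped_mean, round(grouped_var, 4))
```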
26. The following are the grade point averages of 30 students recently admitted to the graduate program
in the Department of Industrial Engineering and Operations Research at the University of California at
Berkeley:
3.46, 3.72, 3.95, 3.55, 3.62, 3.80, 3.86, 3.71, 3.56, 3.49, 3.96, 3.90, 3.70, 3.61,
3.72, 3.65, 3.48, 3.87, 3.82, 3.91, 3.69, 3.67, 3.72, 3.66, 3.79, 3.75, 3.93, 3.74,
3.50, 3.83
a. Represent the preceding data in a stem and leaf plot.
b. Calculate the sample mean x̄.
c. Calculate the sample standard deviation s.
d. Determine the proportion of the data values that lies within x̄ ± 1.5s and compare with the lower
bound given by Chebyshev’s inequality.
e. Determine the proportion of the data values that lies within x̄ ± 2s and compare with the lower
bound given by Chebyshev’s inequality.
- Solve:
3.4 6 8 9
3.5 0 5 6
3.6 1 2 5 6 7 9
3.7 0 1 2 2 2 4 5 9
3.8 0 2 3 6 7
3.9 0 1 3 5 6
b. With n = 30, the sample mean is x̄ = 3.721.
c.
\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\frac{1}{30-1} \times 0.61539} = 0.146. \]
d. For k = 1.5, Chebyshev's inequality gives the bound 1 − 1/1.5² = 55.56%: at least 55.56% of the data lie within the interval (x̄ − 1.5s, x̄ + 1.5s),
while, in actuality, 24 of the 30 values (80%) fall within these limits.
e. All values lie in the range (3.429, 4.013), so
P(x̄ − 2s < X < x̄ + 2s) = P(3.429 < X < 4.013) = 1, with x̄ = 3.721, s = 0.146.
Chebyshev's inequality with k = 2 gives 1 − 1/2² = 0.75 = 75%: at least 75% of the
data lie within the interval (x̄ − 2s, x̄ + 2s), whereas, in actuality, 100% of the data fall within these limits.
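The comparison between the observed proportions and Chebyshev's lower bound 1 − 1/k² can be reproduced with a short Python sketch. The helper names `proportion_within` and `chebyshev_bound` are hypothetical, introduced only for this illustration.

```python
import statistics

# Grade point averages of the 30 students in Problem 26
gpa = [3.46, 3.72, 3.95, 3.55, 3.62, 3.80, 3.86, 3.71, 3.56, 3.49,
       3.96, 3.90, 3.70, 3.61, 3.72, 3.65, 3.48, 3.87, 3.82, 3.91,
       3.69, 3.67, 3.72, 3.66, 3.79, 3.75, 3.93, 3.74, 3.50, 3.83]

def proportion_within(data, k):
    """Fraction of values strictly inside (mean - k*s, mean + k*s)."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(abs(x - m) < k * s for x in data) / len(data)

def chebyshev_bound(k):
    """Chebyshev lower bound 1 - 1/k^2 for the same interval."""
    return 1 - 1 / k ** 2

for k in (1.5, 2):
    print(k, proportion_within(gpa, k), round(chebyshev_bound(k), 4))
```

Chebyshev's bound holds for any data set, which is why it is much lower than the proportions actually observed.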
A calculation gives x̄ = 127.425 and s = 11.873. We use the empirical rule to check whether the data are
approximately normal.
The empirical rule states that approximately 68% of the data lie in x̄ ± s = (115.552, 139.298); the actual
percentage is (27 × 100)/40 = 67.5. Similarly, approximately 95% of the data should lie in
x̄ ± 2s = (103.679, 151.171), whereas the actual percentage is (39 × 100)/40 = 97.5. Finally, about 99.7% of the data
should lie in x̄ ± 3s = (91.806, 163.044), while the actual percentage is 100. Since the observed percentages are
close to those given by the empirical rule, the data are approximately normal.
c. s = 11.873.
d. P(x̄ − 1.5s < X < x̄ + 1.5s) = P(109.62 < X < 145.234) = 34/40 = 85%.
e. By the empirical rule, approximately 68% of the data fall within one standard deviation (x̄ ± 1s) and
approximately 95% within two standard deviations (x̄ ± 2s). The proportion within x̄ ± 1.5s should therefore
lie roughly between 68% and 95%, so the 85% found in part d is consistent with the empirical rule for a
normally distributed data set.
f. With k = 1.5, Chebyshev's bound is 1 − 1/1.5² = 55.56%.
Chebyshev's inequality states that at least 55.56% of the data lie within the interval (x̄ − 1.5s, x̄ + 1.5s),
while, in actuality, about 85% of the data fall within these limits.
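The counts used for the transistor data (27, 34, 39 and 40 of the 40 lifetimes) can be checked by rebuilding the sample from the stem-and-leaf display in Table 2. This is a sketch; `stem_leaf` and `count_within` are illustrative names.

```python
import statistics

# Stem-and-leaf display of Table 2: stem = hundreds and tens, leaf = units
stem_leaf = {
    10: [4, 8, 8],
    11: [0, 2, 3, 6, 7, 8, 8, 9],
    12: [0, 1, 1, 2, 4, 5, 6, 6, 7, 8],
    13: [0, 0, 1, 2, 2, 3, 4, 4, 5, 6, 6, 7],
    14: [0, 0, 1, 3, 7],
    15: [1, 2],
}
lifetimes = [10 * stem + leaf for stem, leaves in stem_leaf.items() for leaf in leaves]

xbar = statistics.mean(lifetimes)
s = statistics.stdev(lifetimes)

def count_within(k):
    """Number of lifetimes strictly inside (xbar - k*s, xbar + k*s)."""
    return sum(abs(x - xbar) < k * s for x in lifetimes)

for k in (1, 1.5, 2, 3):
    print(k, count_within(k), 100 * count_within(k) / len(lifetimes))
```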
33. A random group of 12 high school juniors were asked to estimate the average number of hours they study
each week. The following give these hours along with the student’s grade point average. Find the sample
correlation coefficient between hours reported and GPA.
Solve:
Hours (x_i)  GPA (y_i)  x_i − x̄  (x_i − x̄)²  y_i − ȳ   (y_i − ȳ)²  (x_i − x̄)(y_i − ȳ)
6            2.8        -7       49          -0.44167  0.195       3.09167
14           3.2        1        1           -0.04167  0.00174     -0.04167
3            3.1        -10      100         -0.14167  0.02007     1.41667
22           3.6        9        81          0.35833   0.1284      3.2247
9            3.0        -4       16          -0.24167  0.0584      0.96667
11           3.3        -2       4           0.05833   0.0034      -0.11667
12           3.4        -1       1           0.15833   0.02507     -0.15833
5            2.7        -8       64          -0.54167  0.2934      4.3333
18           3.1        5        25          -0.14167  0.02007     -0.70833
24           3.8        11       121         0.55833   0.31174     6.14167
15           3.0        2        4           -0.24167  0.0584      -0.48334
17           3.9        4        16          0.65833   0.4334      2.6332
Totals (sums over i = 1, ..., 12): x̄ = 13, ȳ = 3.24167, Σ(x_i − x̄)² = 482, Σ(y_i − ȳ)² = 1.54914, Σ(x_i − x̄)(y_i − ȳ) = 20.3
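The sample correlation coefficient follows from the three sums in the table as r = Σ(x_i − x̄)(y_i − ȳ) / √(Σ(x_i − x̄)² · Σ(y_i − ȳ)²). A minimal Python sketch of that last step, with illustrative variable names, is:

```python
import math

# Hours studied per week and GPA for the 12 students in Problem 33
hours = [6, 14, 3, 22, 9, 11, 12, 5, 18, 24, 15, 17]
gpa = [2.8, 3.2, 3.1, 3.6, 3.0, 3.3, 3.4, 2.7, 3.1, 3.8, 3.0, 3.9]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(gpa) / n

sxx = sum((x - x_bar) ** 2 for x in hours)                       # sum of (x_i - x_bar)^2
syy = sum((y - y_bar) ** 2 for y in gpa)                         # sum of (y_i - y_bar)^2
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, gpa)) # cross-product sum

r = sxy / math.sqrt(sxx * syy)   # sample correlation coefficient
print(round(sxx, 4), round(syy, 5), round(sxy, 4), round(r, 4))
```

The divisor n − 1 cancels between numerator and denominator, so r can be computed from the raw sums.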
Chapter 3
23. Let X1 and X2 be independent normal random variables, each having mean 10 and variance σ². Which
probability is larger:
a. P(X1 > 15) or P(X1 + X2 > 25);
b. P(X1 > 15) or P(X1 + X2 > 30)?
c. Find x such that P(X1 + X2 > x) = P(X1 > 15).
- Solve:
X1 and X2 are independent normal random variables, therefore
E(X1) = E(X2) = 10 and σ(X1) = σ(X2) = σ.
Hence,
E(X1 + X2) = E(X1) + E(X2) = 10 + 10 = 20 and σ²(X1 + X2) = σ² + σ² = 2σ², so X1 + X2 ∼ N(20, 2σ²).
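b. Let α1 = P(X1 > 15) = P(Z > 5/σ) and α2 = P(X1 + X2 > 30), where Z ∼ N(0, 1). Standardizing X1 + X2,
\[ \alpha_2 = P\left(Z > \frac{30-20}{\sqrt{2}\,\sigma}\right) = P\left(Z > \frac{7.07}{\sigma}\right). \]
Since 5/σ < 7.07/σ, we have α1 > α2, that is, P(X1 > 15) > P(X1 + X2 > 30).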
c. P(X1 > 15) = P(Z > (15 − 10)/σ) = P(Z > 5/σ) and P(X1 + X2 > x) = P(Z > (x − 20)/(√2 σ)), where Z ∼ N(0, 1).
Since P(X1 > 15) = P(X1 + X2 > x), we need
\[ \frac{5}{\sigma} = \frac{x-20}{\sqrt{2}\,\sigma} \iff 5\sqrt{2} = x - 20 \iff x = 20 + 5\sqrt{2} \simeq 27.07. \]
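The value x = 20 + 5√2 does not depend on σ. The sketch below checks this numerically with `statistics.NormalDist`; the σ values tried are arbitrary test values, not part of the problem.

```python
import math
from statistics import NormalDist

def tail_probabilities(sigma):
    """Return (P(X1 > 15), P(X1 + X2 > 20 + 5*sqrt(2))) for a given sigma."""
    x = 20 + 5 * math.sqrt(2)
    p_single = 1 - NormalDist(10, sigma).cdf(15)                  # X1 ~ N(10, sigma^2)
    p_sum = 1 - NormalDist(20, math.sqrt(2) * sigma).cdf(x)       # X1 + X2 ~ N(20, 2 sigma^2)
    return p_single, p_sum

for sigma in (1.0, 3.0, 10.0):   # arbitrary test values
    print(sigma, tail_probabilities(sigma))
```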
25. The annual rainfall (in inches) in a certain region is normally distributed with µ = 40, σ = 4. What is the
probability that in 2 of the next 4 years, the rainfall will exceed 50 inches? Assume that the rainfalls in
different years are independent.
- Solve:
The problem combines the normal and binomial distributions. First compute p = P(X > 50), the probability
that the rainfall in a given year exceeds 50 inches, and call such a year a success. Then compute the
probability of exactly 2 successes in 4 independent trials.
Let X ∼ N(40, 4²) be the rainfall in one year. Then
p = P(X > 50) = P(Z > (50 − 40)/4) = P(Z > 2.5) = 1 − Φ(2.5) ≈ 0.0062: the probability of success in one year,
q = 1 − p ≈ 0.9938: the probability of failure in one year.
Let Y be the number of the next 4 years in which the rainfall exceeds 50 inches. Y is binomial with n = 4 and
success probability p, so its probability mass function is
\[ P(Y = k) = C_n^k \times p^k \times q^{n-k}. \]
Then
\[ P(Y = 2) = C_4^2 \times (0.0062)^2 \times (0.9938)^2 \approx 2.3 \times 10^{-4}. \]
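The two-stage calculation (a normal tail probability for one year, then a binomial probability over four years) can be sketched as follows. Here p is taken as P(X > 50) for X ∼ N(40, 4²), and the variable names are illustrative.

```python
from math import comb
from statistics import NormalDist

# Step 1: probability that one year's rainfall exceeds 50 inches
p = 1 - NormalDist(mu=40, sigma=4).cdf(50)

# Step 2: probability of exactly 2 such years among 4 independent years
n_years, k = 4, 2
prob_two_of_four = comb(n_years, k) * p ** k * (1 - p) ** (n_years - k)

print(p, prob_two_of_four)
```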
29. Let
\[ I = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx. \]
a. Show that for any µ and σ:
\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = 1 \]
is equivalent to I = √(2π).
b. Show that I = √(2π) by writing
\[ I^2 = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy \]
and then evaluating the double integral by means of a change of variables to polar coordinates. (That is, let
x = r cos θ, y = r sin θ, dx dy = r dr dθ.)
- Solve:
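a. We have:
\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = 1 \iff \int_{-\infty}^{\infty} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \sqrt{2\pi}\,\sigma. \]
Let z = (x − µ)/σ, so that dx = σ dz. Then
\[ \int_{-\infty}^{\infty} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \sigma \int_{-\infty}^{\infty} e^{-z^2/2}\,dz = \sigma I, \]
so the condition becomes
\[ \sigma I = \sqrt{2\pi}\,\sigma \iff I = \sqrt{2\pi}. \]
b. In polar coordinates:
\[ I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy = \int_0^{2\pi}\int_0^{\infty} e^{-r^2/2}\,r\,dr\,d\theta = \int_0^{2\pi} 1\,d\theta = 2\pi \implies I = \sqrt{2\pi}, \]
where the positive root is taken because I > 0.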
33. Value at risk (VAR) has become a key concept in financial calculations. The VAR of an investment is
defined as that value v such that there is only a 1 percent chance that the loss from the investment will
exceed v .
a. If the gain from an investment is a normal random variable with mean 10 and variance 49,
determine the value at risk. (If X is the gain, then −X is the loss).
b. Among a set of investments whose gains are all normally distributed show that the one having the
smallest VAR is the one having the largest value of µ − 2.33σ, where µ and σ² are the mean and variance
of the gain from the investment.
- Solve:
a. X ∼ N(10, 7²) ⇒ Z = (X − 10)/7 ∼ N(0, 1).
We look for x such that P(X < x) = 1%:
\[ P(X < x) = P\left(\frac{X-10}{7} < \frac{x-10}{7}\right) = P\left(Z < \frac{x-10}{7}\right) = 0.01 \implies \frac{x-10}{7} = -2.326 \implies x = -6.28. \]
P(X < −6.28) = 1% ⇒ P(−X > 6.28) = 1% and P(−X < 6.28) = 99%.
Thus, VAR = 6.28.
b. By the calculation in part a, the 1% quantile of the gain is x = µ − 2.33σ, so VAR = −x = −(µ − 2.33σ).
The greater µ − 2.33σ is, the lower the VAR. Therefore, the investment with the smallest
VAR is the one having the largest value of µ − 2.33σ.
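The value at risk for a normally distributed gain can be computed from the 1% quantile of the gain. The sketch below uses `statistics.NormalDist.inv_cdf`; the function name `value_at_risk` is hypothetical and introduced only for this illustration.

```python
from statistics import NormalDist

def value_at_risk(mu, sigma, level=0.01):
    """VAR v with P(loss > v) = level, where loss = -gain and gain ~ N(mu, sigma^2)."""
    quantile = NormalDist(mu, sigma).inv_cdf(level)   # equals mu - 2.326 * sigma for level = 0.01
    return -quantile

var_a = value_at_risk(10, 7)   # part a: mean 10, variance 49
print(round(var_a, 3))
```

Because the VAR equals −(µ − 2.326σ), ranking investments by smallest VAR is the same as ranking them by largest µ − 2.326σ.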
36. An IQ test produces scores that are normally distributed with mean value 100 and standard deviation
14.2. The top 1 percent of all scores are in what range?
- Solve:
We have X ∼ N(100, 14.2²) and look for x such that P(X > x) = 1%.
Then P(X < x) = 1 − 1% = 99%, or P(Z < (x − 100)/14.2) = 99%, where Z = (X − 100)/14.2 ∼ N(0, 1).
Hence (x − 100)/14.2 = 2.326 ⇒ x ≃ 133.0292.
To be in the top 1%, a score must be greater than or equal to 133.0292.
37. The time (in hours) required to repair a machine is an exponentially distributed random variable with
parameter λ = 1.
a. What is the probability that a repair time exceeds 2 hours?
b. What is the conditional probability that a repair takes at least 3 hours, given that its duration
exceeds 2 hours?
- Solve:
Let X be the time (in hours) required to repair a machine, it can be assumed that X is an exponential
random variable with λ = 1
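a. \[ P(X > 2) = \int_2^{\infty} e^{-x}\,dx = e^{-2} \simeq 0.1353. \]
b. Since P(X > t) = 1 − F(t) = 1 − (1 − e^{−t}) = e^{−t}, the conditional probability is P(X ≥ 3 | X > 2) = P(X ≥ 3)/P(X > 2) = e^{−3}/e^{−2} = e^{−1} ≃ 0.3679. This is the memoryless property of the exponential distribution: P(X ≥ 3 | X > 2) = P(X ≥ 1).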
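For an exponential random variable with rate λ, the survival function is P(X > t) = e^{−λt}, and a conditional probability such as P(X ≥ 3 | X > 2) is the ratio P(X ≥ 3)/P(X > 2). A minimal sketch, with the helper name `survival` chosen for illustration:

```python
import math

def survival(t, lam=1.0):
    """P(X > t) for an exponential random variable with rate lam."""
    return math.exp(-lam * t)

p_exceeds_2 = survival(2)                              # part a
p_at_least_3_given_2 = survival(3) / survival(2)       # part b, conditional probability

print(round(p_exceeds_2, 4), round(p_at_least_3_given_2, 4))
```

The ratio equals `survival(1)`, which is the memoryless property of the exponential distribution.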
42. When shooting at a target in a two-dimensional plane, suppose that the horizontal miss distance is
normally distributed with mean 0 and variance 4 and is independent of the vertical miss distance, which
is also normally distributed with mean 0 and variance 4. Let D denote the distance between the point at
which the shot lands and the target. Find E[D].
- Solve:
We have:
D² = X1² + X2², with Z1 = (X1 − 0)/2 = X1/2 and Z2 = (X2 − 0)/2 = X2/2.
Here D is the distance and X1, X2 are the horizontal and vertical miss distances, so Zi = Xi/2, i = 1, 2, are
independent standard normal random variables and we obtain
D²/4 = Z1² + Z2² ∼ χ²_2.
The chi-square distribution with 2 degrees of freedom is the same as the exponential
distribution with parameter 1/2. Writing W = D²/4, so that D = 2√W,
\[ E[D] = 2E[\sqrt{W}] = 2\int_0^{\infty} \sqrt{w}\,\tfrac{1}{2} e^{-w/2}\,dw = 2\sqrt{2}\,\Gamma(3/2) = \sqrt{2\pi} \approx 2.507, \]
using Γ(3/2) = (1/2)Γ(1/2) = √π/2.
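As a numerical cross-check, the distance D = √(X1² + X2²) with independent N(0, 4) coordinates follows a Rayleigh distribution with scale 2, whose mean is 2√(π/2) = √(2π). The sketch below compares that closed form with a seeded Monte Carlo estimate; the sample size and seed are arbitrary choices for the illustration.

```python
import math
import random

SIGMA = 2.0  # standard deviation of each miss distance (variance 4)

# Mean of a Rayleigh-distributed distance with scale SIGMA
closed_form = SIGMA * math.sqrt(math.pi / 2)

def simulate_mean_distance(n_shots=200_000, seed=0):
    """Monte Carlo estimate of E[D] from simulated shots."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_shots):
        total += math.hypot(rng.gauss(0, SIGMA), rng.gauss(0, SIGMA))
    return total / n_shots

print(closed_form, simulate_mean_distance())
```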
- Solve:
X is a chi-square random variable with 6 degrees of freedom, written X ∼ χ²_6.
a. P(X ≤ 6) = P(χ²_6 ≤ 6) = 0.57681
b. P(3 ≤ X ≤ 9) = P(χ²_6 ≤ 9) − P(χ²_6 ≤ 3) = 0.63527
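For an even number of degrees of freedom 2m, the chi-square distribution function has the closed form F(x) = 1 − e^{−x/2} Σ_{j<m} (x/2)^j / j!, so both probabilities can be checked without tables. The function name `chi2_cdf_even` is hypothetical.

```python
import math

def chi2_cdf_even(x, df):
    """Chi-square CDF for even df, using the closed form (Erlang/Poisson sum)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    partial = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return 1 - math.exp(-half) * partial

p_a = chi2_cdf_even(6, 6)                          # P(X <= 6)
p_b = chi2_cdf_even(9, 6) - chi2_cdf_even(3, 6)    # P(3 <= X <= 9)

print(round(p_a, 5), round(p_b, 5))
```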
44. If X and Y are independent chi-square random variables with 3 and 6 degrees of freedom, respectively,
determine the probability that X + Y will exceed 10.
Solve:
Let Z = X + Y. Because X and Y are independent chi-square random variables with 3 and 6 degrees of
freedom, Z also follows a chi-square distribution, with 3 + 6 = 9 degrees of freedom.
The probability that X + Y will exceed 10 is P(Z > 10) = P(χ²_9 > 10) ≈ 0.3505.
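The tail probability of a chi-square variable with 9 degrees of freedom has no simple closed form, but it can be approximated by integrating the density numerically. The sketch below uses Simpson's rule with the standard library only; the function names and the number of steps are illustrative choices, and the density as written assumes more than 2 degrees of freedom so that it is finite at 0.

```python
import math

def chi2_pdf(x, df):
    """Chi-square density with df degrees of freedom (df > 2 assumed here)."""
    return x ** (df / 2 - 1) * math.exp(-x / 2) / (2 ** (df / 2) * math.gamma(df / 2))

def chi2_tail(x, df, steps=2000):
    """P(chi-square_df > x) = 1 - integral of the density over [0, x], by Simpson's rule."""
    h = x / steps                      # steps must be even for Simpson's rule
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return 1 - total * h / 3

p_exceed = chi2_tail(10, 9)            # X + Y has 3 + 6 = 9 degrees of freedom
print(round(p_exceed, 4))
```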
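To evaluate Γ(1/2) = ∫_0^∞ e^{−x} x^{−1/2} dx, let x = y²/2, so that dx = y dy:
\[ \Gamma\left(\tfrac{1}{2}\right) = \int_0^{\infty} e^{-y^2/2}\,\frac{1}{\sqrt{y^2/2}}\,y\,dy = \sqrt{2}\int_0^{\infty} e^{-y^2/2}\,dy. \]
Now let I = ∫_0^∞ e^{−y²/2} dy, so that Γ(1/2) = √2 I.
We have:
\[ I^2 = \int_0^{\infty} e^{-y^2/2}\,dy \int_0^{\infty} e^{-x^2/2}\,dx = \int_0^{\infty}\int_0^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy. \]
In polar coordinates the first quadrant corresponds to 0 ≤ θ ≤ π/2, so
\[ I^2 = \int_0^{\pi/2}\int_0^{\infty} e^{-r^2/2}\,r\,dr\,d\theta = \int_0^{\pi/2} \left(-e^{-r^2/2}\right)\Big|_0^{\infty}\,d\theta = \int_0^{\pi/2} 1\,d\theta = \frac{\pi}{2}. \]
So I = √(π/2) = √π/√2, and therefore Γ(1/2) = √2 I = √π.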
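The identity Γ(1/2) = √π, together with the intermediate relation Γ(1/2) = √2 · ∫_0^∞ e^{−y²/2} dy, can be checked numerically. The sketch below uses `math.gamma` and a midpoint-rule approximation of the integral truncated at y = 12; the truncation point and step count are arbitrary choices for the illustration.

```python
import math

gamma_half = math.gamma(0.5)   # library value of Gamma(1/2)

def half_line_integral(upper=12.0, steps=100_000):
    """Midpoint-rule approximation of the integral of exp(-y^2/2) over [0, upper]."""
    h = upper / steps
    return h * sum(math.exp(-((i + 0.5) * h) ** 2 / 2) for i in range(steps))

I = half_line_integral()
print(gamma_half, math.sqrt(2) * I, math.sqrt(math.pi))
```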