Probability & Statistics
Inferential Statistics
References
1. Inferential Statistics
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where n is the sample size and xi represents the values of the random
variable.
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}}$$
Example 1
Calculate the mean and the standard deviation for the following
sample of x’s:
$x_i$     $x_i - \mu$     $(x_i - \mu)^2$
3         -1.33           1.77
5          0.67           0.45
2         -2.33           5.43
7          2.67           7.13
6          1.67           2.79
3         -1.33           1.77
$\mu$ = 4.33              sum = 19.34

$$\text{standard deviation} = \sqrt{\frac{19.34}{6 - 1}} \quad \text{or} \quad \sigma = 1.966.$$
Characteristics of the Normal Probability Distribution
1. The area under the normal probability distribution curve between two
values x1 and x2 represents the probability that a randomly selected value
would fall between x1 and x2. For example, the shaded area under the
curve in the following diagram is the probability that a randomly selected
variable would fall between x1 and x2.
[Figure: normal curve with the area between x1 and x2 shaded]
2. The area under the normal probability curve represents probability. The
maximum value that a probability function can assume is 1 (which means
the event will certainly happen). Therefore, the total area under the
normal probability distribution curve is equal to 1.
[Figure: normal curve with total area = 1]
3. The normal probability distribution curve is symmetrical around its
mean. This indicates that half of the total area under the curve (0.5) is to
the right of the mean and the other half is to the left of the mean.
[Figure: normal curve with an area of 0.5 on each side of the mean]
Step 1:
Calculate the value of Z1 from
$$Z_1 = \frac{x_1 - \mu}{\sigma}$$
Step 2:
Calculate the value of Z2 from
$$Z_2 = \frac{x_2 - \mu}{\sigma}$$
Step 3:
Table 3 (below), the table of areas for the normal probability
distribution, provides the areas (probabilities) under the curve between
Z = 0 and a given value of Z1.
Important Note: The areas under the normal distribution curve listed
in Table 3 are the areas under the curve between Z = 0 and a given
value of Z1.
[Figure: normal curve with the area between Z = 0 and Z1 shaded]
Step 4:
Since for the problem at hand we are interested in the area under the
curve between Z1 and Z2, subtract the smaller of the two areas from the
larger if Z1 and Z2 are both positive or both negative. This value
represents the area under the curve between Z1 and Z2, or the
probability that a randomly selected value would fall between x1 and
x2 (remember that Z1 and Z2 were calculated from x1 and x2). If Z1 and
Z2 have opposite signs (i.e., one is positive and the other is
negative), add the areas obtained for Z1 and Z2. Due to symmetry, the
area for a negative value of Z1 is the same as the area for the
corresponding positive value, except that a negative Z1 has its area to
the left of Z = 0.
Example 2
A population is normally distributed with a mean of 50 and a standard
deviation of 10. What is the probability that a randomly selected
variable from this population falls between 35 and 40?
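Step 1:
$$Z_1 = \frac{35 - 50}{10} = -1.5$$
Step 2:
$$Z_2 = \frac{40 - 50}{10} = -1.0$$
Step 3:
From Table 3, the area between Z = 0 and Z1 = -1.5 is 0.4332 and the
area between Z = 0 and Z2 = -1.0 is 0.3413 (by symmetry, these are the
table areas for Z = 1.5 and Z = 1.0). Since both values of Z are
negative, the area under the curve between Z1 and Z2 is:
0.4332 - 0.3413 = 0.0919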
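The four steps above can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes the table area can be computed from the error function in Python's standard library, and the function names `area_zero_to_z` and `prob_between` are hypothetical.

```python
import math

def area_zero_to_z(z):
    """Area under the standard normal curve between Z = 0 and |z|.

    This plays the role of a Table 3 lookup (assumption: computed with
    math.erf instead of read from the printed table).
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))

def prob_between(x1, x2, mu, sigma):
    """Probability that a normally distributed value falls between x1 and x2."""
    z1 = (x1 - mu) / sigma          # Step 1
    z2 = (x2 - mu) / sigma          # Step 2
    a1 = area_zero_to_z(z1)         # Step 3
    a2 = area_zero_to_z(z2)
    if z1 * z2 >= 0:                # Step 4: same sign, subtract the areas
        return abs(a2 - a1)
    return a1 + a2                  # Step 4: opposite signs, add the areas

# Example 2: mean 50, standard deviation 10, interval 35 to 40.
p = prob_between(35, 40, mu=50, sigma=10)
print(round(p, 3))
```

The computed value agrees with the table-based answer of 0.0919 to three decimal places; the small difference in the fourth decimal comes from the rounding of the table entries.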
Step 1:
Calculate the value of Z1 from
$$Z_1 = \frac{x_1 - \mu}{\sigma}$$
Step 2:
If Z1 has a positive value, two situations may arise:
1. The problem asks for the probability that a randomly selected value
is greater than x1 (to the right of Z1). In this case, subtract the
area obtained from the table from 0.5.
2. The problem asks for the probability that a randomly selected value
is smaller than x1 (to the left of Z1). In this case, add 0.5 to the
area obtained from the table.
Solution
$$Z_1 = \frac{250 - 200}{30} \approx 1.67$$
$$Z_1 = \frac{600 - 500}{50} = 2.0$$
From the table of the normal distribution, the area between Z = 0 and
Z1 = 2.0 is 0.4772. However, for this problem we need the area to the
right of Z1, which is:
0.5 - 0.4772 = 0.0228
which means that the probability that a randomly selected value is
larger than 600 is 0.0228 or 2.28 percent.
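The one-sided cases can be sketched in the same way. This is an illustrative sketch only: the mean of 500 and standard deviation of 50 are read off the Z calculation above, and the helper names `table_area`, `prob_greater_than`, and `prob_less_than` are hypothetical.

```python
import math

def table_area(z):
    """Area between Z = 0 and |z| under the standard normal curve."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))

def prob_greater_than(x, mu, sigma):
    """Probability that a normally distributed value exceeds x."""
    z = (x - mu) / sigma
    area = table_area(z)
    # Positive Z: the right tail is 0.5 minus the table area.
    # Negative Z: the right side is 0.5 plus the table area.
    return 0.5 - area if z >= 0 else 0.5 + area

def prob_less_than(x, mu, sigma):
    """Probability that a normally distributed value is below x."""
    return 1.0 - prob_greater_than(x, mu, sigma)

# Assumed population: mean 500, standard deviation 50 (taken from the
# Z1 = (600 - 500) / 50 calculation).
print(round(prob_greater_than(600, 500, 50), 4))  # 0.0228
```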
How to use Table 3:
The left column lists values of Z to one decimal place. The second
decimal place of Z is selected from the top row. For example, the area
between Z = 0 and Z1 = 1.23 (row 1.2, column 3) is equal to 0.3907.
z 0 1 2 3 4 5 6 7 8 9
0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0754
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2258 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2518 .2549
0.7 .2580 .2612 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2996 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
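Individual entries of Table 3 can be reproduced with a few lines of code. The sketch below assumes the entries are the standard normal areas between 0 and z rounded to four decimals; the function name `table3_entry` is hypothetical.

```python
import math

def table3_entry(z):
    """Area under the standard normal curve between 0 and z, to 4 decimals."""
    return round(0.5 * math.erf(z / math.sqrt(2.0)), 4)

print(table3_entry(1.23))  # 0.3907
print(table3_entry(1.50))  # 0.4332
print(table3_entry(2.00))  # 0.4772
```

A few printed tables differ from the computed values by one unit in the last decimal place because of how the original table was rounded.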
Example
What is the sample space for the experiment of drawing a card from a
deck of 52 cards?
Solution
Once again a sample space is the set of all possible outcomes of the
experiment. In this case, there are 52 possibilities, and therefore, the
sample space has 52 elements.
Example
Suppose a government agency must decide where to locate three new
nuclear research laboratories, and that (for a certain purpose) it is of
interest only how many of these facilities will be located in Colorado.
Solution
The set of all possible outcomes is {0, 1, 2, 3} which means that none,
one, two or three of the research facilities may be located in Colorado.
This sample space has four elements and can be shown graphically as
follows:
[Figure: sample space shown as the four points 0, 1, 2, 3]
Selecting elements
If sets M1, M2, ..., Mk contain, respectively, N1, N2, ..., Nk elements,
there are N1 × N2 × ... × Nk ways of selecting first an element from M1,
then an element from M2, ..., and finally an element from Mk. In other
words, the sample space for this sequence of selections has
N1 × N2 × ... × Nk elements.
Example
In an oil company the list of candidates for the President and Vice
President has been narrowed down to 15. In how many different ways can a
President and a Vice President be elected from the set of these 15
candidates?
Solution
The set for electing a president has 15 elements. However, once a
President has been elected, there are only 14 candidates remaining for
Vice President. Therefore, the set for the Vice President has 14
elements. Therefore, the number of different ways this election can be
carried out is:
15 × 14 = 210
In other words, the sample space for this election has 210 possible
outcomes.
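The counting rule can be verified by brute force. In this sketch the candidates are simply numbered 1 to 15 (an assumption for illustration), and `count_selections` is a hypothetical helper that multiplies the set sizes.

```python
import math
from itertools import permutations

def count_selections(sizes):
    """Number of ways to pick one element from each set in turn."""
    return math.prod(sizes)

# Rule: 15 choices for President, then 14 remaining for Vice President.
by_rule = count_selections([15, 14])

# Brute force: list every ordered (President, Vice President) pair.
candidates = range(1, 16)
by_listing = len(list(permutations(candidates, 2)))

print(by_rule, by_listing)  # 210 210
```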
Events
Probabilities are always associated with occurrence or nonoccurrence of
events; such as getting one head in four flips of a coin. Events can be
considered as a subset of a sample space. For example, if we are
interested in the event of getting a 6 in rolling one die, our sample space
has 6 elements {1, 2, 3, 4, 5, 6}, and the event of interest (getting a 6) is a
subset of this sample space with one element {6}.
Example
If we are interested in the event of drawing an ace of hearts from a
deck of 52 cards, how many elements does our event set have? How about
our sample space?
Solution
Our sample space has 52 elements (the set of all possible outcomes),
and our event set has only one element {ace of hearts}.
[Figure: sample space containing two non-overlapping events A and B]
In the above figure, there are no common elements between events A and
B, and therefore, they are mutually exclusive events.
In other words two events are considered to be mutually exclusive if
WHEN ONE EVENT OCCURS, THE OTHER CAN NOT OCCUR,
AND VICE VERSA. Two mutually exclusive events can not occur
simultaneously.
Example
Let A and B represent two events in rolling a die
A: {to get an odd number} i.e. {1, 3, 5}
B: {to get an even number} i.e. {2, 4, 6}
The two events A and B described above are said to be mutually
exclusive because in one roll of a die we can only get either an even or an
odd number but not both. If event A happens, B cannot happen and vice
versa.
Simple Events
An event that can not be decomposed is called a simple event.
Example
In order to more clearly understand the above definition, let’s go back to
our experiment of rolling a die. Let’s further consider the following
events:
A1: {getting an odd number}, i.e., {1, 3, 5}
A2: {getting an even number}, i.e., {2, 4, 6}
A3: observing a 1, i.e., {1}
A4: {2}
A5: {3}
A6: {4}
A7: {5}
A8: {6}
In this example we notice that there is a difference between events A1, A2
and events A3 through A8. Event A1 (getting an odd number) occurs if
any of the events A3 (getting a 1), A5 (getting a 3), or A7 (getting a 5)
occur. Therefore, we can decompose event A1 into simple events A3, A5,
and A7. Similarly, event A2 (getting an even number) can be decomposed
into simple events A4, A6, and A8.
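Events as subsets of a sample space map directly onto Python sets. The sketch below is illustrative only; the helper names `mutually_exclusive` and `simple_events` are hypothetical.

```python
# Sample space for one roll of a die, with the events A1 and A2 from the text.
sample_space = {1, 2, 3, 4, 5, 6}
odd = {1, 3, 5}    # A1
even = {2, 4, 6}   # A2

def mutually_exclusive(a, b):
    """Two events are mutually exclusive if they share no outcomes."""
    return a.isdisjoint(b)

def simple_events(event):
    """Decompose an event into its single-outcome (simple) events."""
    return [{outcome} for outcome in sorted(event)]

print(odd <= sample_space)            # True: an event is a subset of the sample space
print(mutually_exclusive(odd, even))  # True
print(simple_events(odd))             # [{1}, {3}, {5}]
```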
Independent Events
Two or more events are said to be independent of each other if the
occurrence (or non-occurrence) of one has no effect on the occurrence or
non-occurrence of the others.
Example
Consider tossing a coin and rolling a die. The events getting a head on
the coin and number 3 on the die are independent of each other because
the occurrence (or non-occurrence) of one event has no effect on the
occurrence (or non-occurrence) of the other event.
Probability
Now that we are familiar with fundamentals of set theory, and have
defined events and various types of events in a given sample space, we
can talk about the probability of one or more events.
The probability associated with an event is a measure of belief that the
event will occur on the next repetition of the experiment. For example
when we say that the probability of getting head in one toss of a coin is ½
(0.5 or 50 percent), what we are really saying is that there is a 50 percent
chance that the next toss of a coin will be a head.
Given a finite sample space S and an event A in that sample space we can
say:
1. The probability of event A is a number between zero and 1. Zero
means that event A can not happen and a probability of 1 means that
event A will certainly happen.
2. The probability of sample space (set of ALL possible outcomes of an
experiment) is 1.
If we show probability of event A with P(A), and probability of the
sample space with P(S), we can write:
$$0 \le P(A) \le 1$$
$$P(S) = 1$$
Calculation of Probabilities of Events in a Sample Space
The probability of an event A in a finite sample space is equal to the
number of simple events in A (see definition of simple events discussed
earlier) divided by the total number of simple events in the sample space.
Note that we are making the assumption that all simple events in the
sample space have the same probability of occurrence.
Example
What is the probability of getting a head in one flip of a coin?
Solution
First we have to find out how many simple events our sample space
has. Remember that sample space, by definition, is the set of all
possible outcomes of an experiment. In this case, our sample space S
has only 2 elements:
S: {head, tail}
Next we have to find out how many simple events there are in our
event of interest. In this case our event of interest (head) has only one
element:
A: {head}
In this case:
Probability of getting a head = $\frac{1}{2}$ = 0.5 or 50 percent
Example
What is the probability of drawing a 4 out of a 52 card deck?
Solution
Once again we have to find out how many simple events our sample
space and the event of our interest have. Our sample space, in this
case, is comprised of 52 simple events, and our event of interest is
comprised of 4 simple events (there are four 4’s in a deck of cards).
Therefore, we can say:
Probability of drawing a 4 = $\frac{4}{52} \approx 0.077$ or 7.7 percent
Example
What is the probability of getting an even number in one roll of a die?
Solution
Our sample space S is comprised of 6 simple events (because there are
only 6 numbers on a die). The event of getting an even number is
comprised of 3 simple events (2, 4, 6). Therefore, we can say:
Probability of getting an even number = $\frac{3}{6} = \frac{1}{2}$ = 0.5 or 50 percent
Example
What is the probability of drawing a king OR an ace from a deck of
cards?
Solution
There are 4 kings and 4 aces in a deck of cards. Therefore, the event
of interest to us (drawing an ace or a king) is made of 8 simple events.
Our sample space, on the other hand, is made of 52 simple events. The
probability in this case is:
$\frac{8}{52} \approx 0.154$ or 15.4 percent
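The examples above all apply the same rule: count the simple events in the event of interest and divide by the number of simple events in the sample space. A minimal sketch follows, assuming equally likely outcomes as the text does; the `probability` helper and the card encoding are hypothetical.

```python
from fractions import Fraction
from itertools import product

# A 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def probability(event, sample_space):
    """Favorable simple events divided by all simple events.

    `event` is a function that returns True for outcomes in the event.
    Assumes every outcome in `sample_space` is equally likely.
    """
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

p_head = probability(lambda side: side == "head", ["head", "tail"])
p_four = probability(lambda card: card[0] == "4", deck)
p_even = probability(lambda n: n % 2 == 0, range(1, 7))
p_king_or_ace = probability(lambda card: card[0] in ("K", "A"), deck)

for p in (p_head, p_four, p_even, p_king_or_ace):
    print(p, round(float(p), 3))
```

Using `Fraction` keeps the results exact (for example 8/52 reduces to 2/13) before they are rounded for display.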
Axiom
If A and B are two mutually exclusive events, the probability of
either A or B occurring is the SUM of the probabilities of A and B.
Example
If the proportion of voters favoring legislation is 0.38, and the proportion
of voters who are undecided is 0.22, what is the proportion of voters who
are either in favor of the legislation or undecided?
Solution
In this case, the two events are mutually exclusive because a voter
cannot be in favor of the legislation and, at the same time, be
undecided. The probabilities are additive.
0.38 + 0.22 = 0.60
Example
What is the probability of getting 1 or 6 in one roll of a die?
Solution
These two events are mutually exclusive because if we get 1 we can
not get 6 and vice versa. The probabilities, in this case, are additive.
The probability of getting 1 is $\frac{1}{6}$ (sample space has 6
elements, and the event has only one element). Similarly, the
probability of getting 6 is $\frac{1}{6}$. Therefore, the probability of
getting 1 or 6 is:
$\frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3} \approx 0.33$ or 33 percent
Example
What is the probability of drawing an ace of hearts or a king of spades or
a 4 of diamonds from a deck of 52 cards?
Solution
These events are obviously mutually exclusive because if one event
occurs, the other events can not occur. The probabilities are, therefore,
additive. In this case the probability is:
$\frac{1}{52} + \frac{1}{52} + \frac{1}{52} = \frac{3}{52} \approx 0.058$ or 5.8 percent
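The additive rule can be checked against a direct count of outcomes. This sketch assumes equally likely outcomes; the helpers `prob` and `prob_either` are hypothetical, and `prob_either` refuses events that overlap because the rule only applies to mutually exclusive events.

```python
from fractions import Fraction

def prob(event, space):
    """Probability of an event (a set of outcomes) in a finite sample space."""
    return Fraction(len(event & space), len(space))

def prob_either(a, b, space):
    """P(A or B) for mutually exclusive events: the sum of the probabilities."""
    if not a.isdisjoint(b):
        raise ValueError("events overlap; the additive rule does not apply")
    return prob(a, space) + prob(b, space)

die = {1, 2, 3, 4, 5, 6}
p = prob_either({1}, {6}, die)

print(p)                   # 1/3
print(prob({1, 6}, die))   # 1/3, the same result by direct counting
```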
Definition
Two events are said to be independent if the occurrence or non-
occurrence of one event has no effect on the outcome of the other event.
Example
If we roll a pair of dice and the events of our interest are getting 1 on one
die and 6 on the other die, we have two independent events; because
getting (or not getting) 1 on the first die has no effect on getting (or not
getting) 6 on the second die.
Example
If we draw two cards from two separate decks of cards and our event of
interest is getting two aces, we have two independent events; because
getting an ace from the first deck of cards has no effect on getting an ace
from the second deck of cards.
Axiom
If A and B are two independent events, the probability of both events
occurring is the product of probabilities of events A and B. If we
show probability of A with P(A) and probability of B with P(B), the
probability of A and B occurring is:
P(A and B) = P(A) × P(B)
Example
What is the probability that two cards drawn from two separate decks of
cards are both aces?
Solution
The events getting an ace from the first deck and an ace from the
second deck are independent. Let’s focus on the first deck of cards.
Our sample space has 52 elements and our event has 4 elements (there
are 4 aces in a deck of cards). The probability of drawing an ace from
the first deck is:
$\frac{4}{52}$
Similarly, the probability of drawing an ace from the second deck is:
$\frac{4}{52}$
Since these two events are independent, the probability that both cards
are aces is:
$\frac{4}{52} \times \frac{4}{52} = \frac{1}{169} \approx 0.0059$ or about 0.6 percent
A probability of about 0.6 percent means that we should expect the event
to occur about 0.6 times in 100 tries, or 1 time in 169 tries. In other
words, the result obtained above means that if we draw 169 times from
two separate decks of cards, we could EXPECT that one of these draws
would be two aces.
Example
What is the probability of tossing a coin 4 times and getting 4 tails in a
row?
Solution
The outcomes of successive tosses of a coin are independent events. For
each toss of a coin our sample space has 2 elements and our event
(getting a tail) has only one element. Therefore, the probability of
getting a tail 4 times in a row is:
$\frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{16} \approx 0.06$ or 6 percent
Example
An oil company has four plants in geographically different locations.
Each facility has a chlorine tank. The probabilities of release of chlorine
into the atmosphere at each facility in a given year are as follows:
Facility 1: $1 \times 10^{-6}$ (or 1 chance in a million)
Facility 2: $1 \times 10^{-7}$
Facility 3: $1 \times 10^{-8}$
Facility 4: $1 \times 10^{-5}$
What is the probability that all four facilities would have a release of
chlorine in a given year?
Solution
Here we are dealing with 4 independent events. The probability is:
$(1 \times 10^{-6}) \times (1 \times 10^{-7}) \times (1 \times 10^{-8}) \times (1 \times 10^{-5}) = 1 \times 10^{-26}$
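The product rule for independent events can be sketched as a single helper applied to the three examples above. The function name `prob_all` is hypothetical, and exact fractions are used so that very small probabilities are not affected by floating-point rounding.

```python
from fractions import Fraction
from functools import reduce
from operator import mul

def prob_all(probabilities):
    """Probability that all of several independent events occur."""
    return reduce(mul, probabilities, Fraction(1))

two_aces = prob_all([Fraction(4, 52), Fraction(4, 52)])
four_tails = prob_all([Fraction(1, 2)] * 4)
all_releases = prob_all([
    Fraction(1, 10**6),  # Facility 1
    Fraction(1, 10**7),  # Facility 2
    Fraction(1, 10**8),  # Facility 3
    Fraction(1, 10**5),  # Facility 4
])

print(two_aces)                       # 1/169
print(four_tails)                     # 1/16
print(all_releases == Fraction(1, 10**26))  # True
```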
$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$$
Example
What is the mean of the following set of data?
Yi : 3,4,2,6,1,2
Solution
n=6
$$\bar{Y} = \frac{3 + 4 + 2 + 6 + 1 + 2}{6}$$
$$\bar{Y} = 3$$
Median The median is the middle point, where half of the values fall
above and half of the values fall below that point.
Example
What is the median of the following set of data?
Yi : 5,4,6,7,9
Solution
Arranged in increasing order, the data are 4, 5, 6, 7, 9. Half of
the values (4 and 5) fall below and half of the values (7 and 9)
fall above the number 6. The median is 6.
Range The range is the difference between the high and low values of
a data set.
Example
What is the range of the following data set?
Yi : 2,17,19,25,18
Solution
Range = 25 - 2 = 23.
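The three examples above can be reproduced with Python's standard `statistics` module. This is a small sketch; `data_range` is a hypothetical helper, since the module has no range function of its own.

```python
import statistics

def data_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)

mean_value = statistics.mean([3, 4, 2, 6, 1, 2])      # equals 3
median_value = statistics.median([5, 4, 6, 7, 9])     # equals 6
range_value = data_range([2, 17, 19, 25, 18])         # equals 23

print(mean_value, median_value, range_value)
```

Note that `statistics.median` sorts the data internally, so the values do not need to be ordered first.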
$$S = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$
Where:
S = standard deviation
$Y_i$ = individual values
$\bar{Y}$ = arithmetic mean
n = sample size
Variance The square of the standard deviation is called the variance.
$$S^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$$
Example
Calculate the standard deviation of the following set of data:
Yi : 5,6,9,3,5,2
Solution
Calculate and fill the table below:
$Y_i$     $Y_i - \bar{Y}$     $(Y_i - \bar{Y})^2$
5          0                   0
6          1                   1
9          4                   16
3         -2                   4
5          0                   0
2         -3                   9
$\bar{Y} = 5$                  $\sum (Y_i - \bar{Y})^2 = 30$

$$S = \sqrt{\frac{30}{6 - 1}} \quad \text{or} \quad S = 2.45$$
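The table-based calculation can be written out step by step and compared with the standard library. In this sketch `sample_std` is a hypothetical helper that follows the formula for S with the n - 1 divisor; `statistics.stdev` uses the same divisor.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(y - mean) ** 2 for y in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

data = [5, 6, 9, 3, 5, 2]
s = sample_std(data)

print(round(s, 2))                       # 2.45
print(round(statistics.stdev(data), 2))  # 2.45
print(round(s ** 2, 2))                  # 6.0, the variance
```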
Pearson Coefficient of Correlation (r)
The Pearson Coefficient of Correlation is the most common measure of
correlation between two variables. It has a value between -1.0 and +1.0. A
value of r close to -1.0 indicates a strong negative linear relationship
between the two variables. In other words, a value of r close to -1.0
indicates that as the value of one variable increases, the value of the
other variable decreases. A value of r close to +1.0 indicates a strong
positive correlation between the two variables. This means that the value of
the dependent variable increases with an increase in the value of the
independent variable. A value of r close to 0.0 indicates that there is
little or no linear correlation between the two variables.
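The notes do not give the formula for r, so the following sketch uses the standard definition: the sum of the products of deviations divided by the square root of the product of the sums of squared deviations. The function name `pearson_r` and the sample data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson coefficient of correlation for two equal-length samples.

    Assumes both samples have some variation; a constant sample would
    make the denominator zero.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4]                      # made-up independent variable
print(pearson_r(x, [2, 4, 6, 8]))     # close to +1.0: y rises with x
print(pearson_r(x, [8, 6, 4, 2]))     # close to -1.0: y falls as x rises
```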