
Probability & Statistics

Maths

Uploaded by

Faizan waris

Probability & Statistics

 Inferential Statistics

 Set Theory, probability, and Statistics

 References
1. Inferential Statistics

The Normal Probability Distribution


A large number of random variables observed in everyday life and nature
form a frequency distribution that is approximated by a bell-shaped
curve called the normal probability distribution.

  

The equation representing a normal probability distribution is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where x is a value of the random variable in the population.

For example, if the height of people in a given city is normally
distributed, then x in the above equation represents a given height and
f(x) represents the relative frequency of people with that height
(strictly speaking, a probability density, since height is a continuous
variable). Let’s assume that the height of people in a small city of
10,000 people is normally distributed, and let’s further assume that 500
people are 6 feet tall. This means that for a value of x equal to 6,
f(x) is approximately 500/10,000 = 0.05.

In the equation representing the normal distribution, μ is the
population mean and σ is the population standard deviation.
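
The density equation above can be evaluated directly. The following is a minimal Python sketch, not part of the original notes; the function name `normal_pdf` and the parameter names `mean` and `std` are illustrative choices standing in for μ and σ.

```python
import math

def normal_pdf(x, mean, std):
    """Evaluate the normal density f(x) for a given mean and standard deviation."""
    exponent = -((x - mean) ** 2) / (2 * std ** 2)
    return math.exp(exponent) / (std * math.sqrt(2 * math.pi))

# The curve peaks at the mean and is symmetric around it.
print(round(normal_pdf(0, mean=0, std=1), 4))  # 0.3989
print(normal_pdf(40, 50, 10) == normal_pdf(60, 50, 10))  # True
```

The second line of output illustrates the symmetry of the curve around its mean: points equally far from the mean on either side have the same density.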

Although in solving the CSP examination problems dealing with the
normal distribution we do not use the above equation directly (as will
be discussed later), we do need to understand how to calculate the
mean (μ) and the standard deviation (σ) for a given population. These
can be calculated from the following relationships:
$$\mu = \frac{\sum_{i=1}^{n} x_i}{n}$$

where n is the sample size and xi represents the values of the random
variable.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}}$$

σ is called the standard deviation and σ² is called the variance of the
sample distribution.

Example 1
Calculate the mean and the standard deviation for the following
sample of x’s:

xi    xi − μ    (xi − μ)²
3     -1.33     1.77
5      0.67     0.45
2     -2.33     5.43
7      2.67     7.13
6      1.67     2.79
3     -1.33     1.77
μ = 4.33        sum = 19.34

Since our sample size is 6, we can write:

$$\sigma = \sqrt{\frac{19.34}{6 - 1}}$$

or

σ ≈ 1.97.
Characteristics of the Normal Probability Distribution
1. The area under the normal probability distribution curve between two
values x1 and x2 represents the probability that a randomly selected value
would fall between x1 and x2. For example, the shaded area under the
curve in the following diagram is the probability that a randomly selected
value would fall between x1 and x2.

[Figure: normal curve with the area between x1 and x2 shaded]

2. The area under the normal probability curve represents probability. The
maximum value that a probability function can assume is 1 (which means
the event will certainly happen). Therefore, the total area under the
normal probability distribution curve is equal to 1.

[Figure: normal curve with the total area under the curve equal to 1]
3. The normal probability distribution curve is symmetrical around its
mean. This indicates that half of the total area under the curve (0.5) is to
the right of the mean and the other half is to the left of the mean.

[Figure: normal curve divided at the mean, with an area of 0.5 on each side]

4. The shape of the normal distribution is determined by the value of its
standard deviation. Large values of standard deviation reduce the height
of the curve and increase its spread, while small values of standard
deviation increase the height of the curve and decrease its spread.

[Figure: a tall, narrow normal curve (small value of standard deviation)
and a low, wide normal curve (large value of standard deviation)]


5. Approximately 68% of the area under the normal distribution curve lies
within ± one standard deviation of the mean. About 95% of the area
under the curve falls within ± two standard deviations of the mean, and
almost all of it (more than 99%) within ± three standard deviations of
the mean.

[Figure: three normal curves showing 68% of the total area within μ ± σ,
95% within μ ± 2σ, and more than 99% within μ ± 3σ]
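
The percentages in point 5 can be checked numerically. The sketch below is an illustration only; it relies on the standard identity that the area within k standard deviations of the mean of a normal curve equals erf(k/√2), and the function name `area_within` is a hypothetical choice.

```python
import math

def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```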

  

Types of Problems Dealing With the Normal Probability Distribution
 One type of problem dealing with normal probability distribution is to
find the probability that a randomly selected variable from the
population (with known values of mean and standard deviation)
assumes a value between x1 and x2.
In order to solve this type of problem we proceed as follows:

Step 1:
Calculate the value of Z1 from

$$Z_1 = \frac{x_1 - \mu}{\sigma}$$

Step 2:
Calculate the value of Z2 from

$$Z_2 = \frac{x_2 - \mu}{\sigma}$$

Step 3:
Obtain from Table 3 (below) the area between Z = 0 and Z1, and the area
between Z = 0 and Z2. Table 3 is the table of areas for the normal
probability distribution. This table will be provided by the BCSP at the
time of the examination.
Important Note: The areas listed in Table 3 are the areas (probabilities)
under the normal distribution curve between Z = 0 and a given value of Z.

[Figure: normal curve with the area between Z = 0 and Z1 shaded]

Step 4:
Since for the problem at hand we are interested in the area under the
curve between Z1 and Z2, subtract the smaller of the two areas from the
larger if Z1 and Z2 are both positive or both negative. This value
represents the area under the curve between Z1 and Z2, or the
probability that a randomly selected value would fall between x1 and x2
(remember Z1 and Z2 were calculated from x1 and x2). If Z1 and Z2 have
opposite signs (i.e., one is positive and the other is negative), add
the areas obtained for Z1 and Z2. Due to symmetry, the area for a
negative value of Z1 is the same as that for the positive value of Z1,
except that a negative Z1 has its area to the left of Z = 0.

Example 2
A population is normally distributed with a mean of 50 and a standard
deviation of 10. What is the probability that a randomly selected
variable from this population falls between 35 and 40?

Step 1:

$$Z_1 = \frac{35 - 50}{10} = -1.5$$

Step 2:

$$Z_2 = \frac{40 - 50}{10} = -1.0$$

Step 3:
From Table 3, the area between Z = 0 and Z1 = -1.5 is 0.4332 (the same
as for Z = 1.5, by symmetry) and the area between Z = 0 and Z2 = -1.0 is
0.3413. Since both values of Z are negative, the area under the curve
between Z1 and Z2 is the difference:

0.4332 − 0.3413 = 0.0919

or the probability that a randomly selected value from this
population falls between 35 and 40 is 0.0919 or 9.19 percent. This
problem can be demonstrated graphically as follows.

[Figure: normal curve with μ = 50 and the area between 35 and 40 shaded]
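
Example 2 can be cross-checked without the table. The sketch below is an illustration, not the examination method: it computes the cumulative probability from the error function (the standard relation Φ(z) = ½[1 + erf(z/√2)]) rather than looking up areas. The names `phi` and `prob_between` are hypothetical.

```python
import math

def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_between(x1, x2, mean, std):
    """Probability that a normal value falls between x1 and x2 (x1 < x2)."""
    z1 = (x1 - mean) / std
    z2 = (x2 - mean) / std
    return phi(z2) - phi(z1)

p = prob_between(35, 40, mean=50, std=10)
print(round(p, 4))  # 0.0918
```

The result agrees with the table-based 0.0919 up to the rounding of the four-decimal table entries.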

 Another type of problem dealing with the normal probability distribution
is to find the probability that a randomly selected value would be
larger or smaller than a given value x1. In order to solve this type of
problem, we proceed as follows:

Step 1:
Calculate the value of Z1 from

$$Z_1 = \frac{x_1 - \mu}{\sigma}$$

and obtain the area between Z = 0 and Z1 from Table 3.

Step 2:
If Z1 has a positive value, two situations may arise:
1. The problem asks for the probability of a randomly selected value
being greater than x1. In this case, subtract the area obtained from the
table from 0.5.
2. The problem asks for the probability of a randomly selected value
being smaller than x1. In this case, add 0.5 to the area obtained from
the table.

If Z1 has a negative value:
1. For the probability of the random variable being greater than x1, add
0.5 to the area obtained from the table.
2. For the probability of the random variable being smaller than x1,
subtract the area obtained from the table from 0.5.
Example 3
For a population with a mean of 200 and a standard deviation of 30,
what is the probability that a randomly selected variable assumes a
value less than 250?

Solution

$$Z_1 = \frac{250 - 200}{30} \approx 1.67$$

From the table of normal distribution (Table 3), the area between Z = 0
and Z1 = 1.67 is 0.4525. Therefore, the probability that a randomly
selected value from this population is less than 250 is 0.5 plus 0.4525,
which is 0.9525 or 95.25 percent.
Example 4
A population has a mean of 500 and a standard deviation of 50. What
is the probability that a randomly selected variable from this
population has a value larger than 600?
Solution

$$Z_1 = \frac{600 - 500}{50} = 2.0$$

From the table of normal distribution, the area between Z = 0 and
Z1 = 2.0 is 0.4772. However, for this problem we need the area to the
right of Z1, which is:

0.5 − 0.4772 = 0.0228

which means that the probability that a randomly selected value is
larger than 600 is 0.0228 or 2.28 percent.
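
The "smaller than" and "larger than" cases of Examples 3 and 4 can be sketched in Python as follows. This is an illustration that computes the tail areas from the error function instead of Table 3; the function names are hypothetical. The first result is computed with the unrounded Z value (50/30), so it can differ in the third or fourth decimal from a table lookup at a rounded Z.

```python
import math

def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_less_than(x, mean, std):
    """Probability that a normal value is smaller than x."""
    return phi((x - mean) / std)

def prob_greater_than(x, mean, std):
    """Probability that a normal value is larger than x."""
    return 1.0 - prob_less_than(x, mean, std)

p3 = prob_less_than(250, mean=200, std=30)     # Example 3
p4 = prob_greater_than(600, mean=500, std=50)  # Example 4
print(round(p3, 3))
print(round(p4, 4))  # 0.0228
```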
How to use Table 3:

The left column lists values of Z to one decimal place. The second
decimal place of Z is selected from the top row. For example, to find
the area between Z = 0 and Z1 = 1.23, read across the row for 1.2 to the
column for a second decimal of 3: the area is 0.3907.

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0754
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2258 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2518 .2549
0.7 .2580 .2612 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2996 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990

Table 3. Table of The Normal Distribution; source: BCSP Candidate Handbook
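
Individual entries of Table 3 can be reproduced numerically. The sketch below is an illustration and not taken from the BCSP handbook: it uses the standard identity that the area between Z = 0 and a positive z equals ½·erf(z/√2). The function name `table_area` is hypothetical.

```python
import math

def table_area(z):
    """Area under the standard normal curve between Z = 0 and |z|."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))

for z in (1.0, 1.23, 1.5, 2.0):
    print(z, round(table_area(z), 4))
# 1.0 0.3413
# 1.23 0.3907
# 1.5 0.4332
# 2.0 0.4772
```

Taking the absolute value of z reflects the symmetry of the curve: the table is used the same way for negative values of Z.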


Set Theory, Probability and Statistics
Definitions and Concepts
 Sample space
A set of ALL possible outcomes of an experiment. For example the
sample space for the experiment tossing a coin is the set of all possible
outcomes shown as {head, tail}. This sample space has 2 elements. By
the same token, the sample space for the experiment rolling a die is the
set of all possible outcomes of the experiment, which is {1, 2, 3, 4, 5, 6}.
This sample space has 6 elements.

Example
What is the sample space for the experiment drawing a card from a deck
of 52 cards?

Solution
Once again a sample space is the set of all possible outcomes of the
experiment. In this case, there are 52 possibilities, and therefore, the
sample space has 52 elements.

Example
Suppose a government agency must decide where to locate three new
nuclear research laboratories, and that (for a certain purpose) it is of
interest only how many of these facilities will be located in Colorado.
What is the sample space?

Solution
The set of all possible outcomes is {0, 1, 2, 3} which means that none,
one, two or three of the research facilities may be located in Colorado.
This sample space has four elements and can be shown graphically as
follows:

0 1 2 3

Sample Space

 Selecting elements
If sets M1, M2, ..., Mk contain, respectively, N1, N2, ..., Nk elements,
there are N1 × N2 × ... × Nk ways of selecting first an element from M1,
then an element from M2, ..., and finally an element from Mk. In other
words, the sample space for this sequence of selections has
N1 × N2 × ... × Nk elements.

Example
In an oil company the list of candidates for President and Vice
President has been narrowed down to 15. In how many different ways can a
President and a Vice President be elected from the set of these 15
candidates?

Solution
The set for electing a President has 15 elements. However, once a
President has been elected, there are only 14 candidates remaining for
Vice President. Therefore, the set for the Vice President has 14
elements, and the number of different ways this election can be
carried out is:
15 × 14 = 210
In other words, the sample space for this election has 210 possible
outcomes.
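
The count of 210 can be confirmed by listing every possible (President, Vice President) pair. The sketch below is an illustration; the candidate labels are placeholders invented for the example.

```python
from itertools import permutations
from math import prod

# Placeholder labels for the 15 candidates (hypothetical names).
candidates = [f"candidate_{i}" for i in range(1, 16)]

# Every ordered pair of two different candidates: (President, Vice President).
outcomes = list(permutations(candidates, 2))
print(len(outcomes))   # 210

# The same count from the multiplication rule N1 x N2.
print(prod([15, 14]))  # 210
```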

 Events
Probabilities are always associated with occurrence or nonoccurrence of
events; such as getting one head in four flips of a coin. Events can be
considered as a subset of a sample space. For example, if we are
interested in the event of getting a 6 in rolling one die, our sample space
has 6 elements {1, 2, 3, 4, 5, 6}, and the event of interest (getting a 6) is a
subset of this sample space with one element {6}.
Example
If we are interested in the event of drawing an ace of hearts from a
deck of 52 cards, how many elements does our event set have? How about
our sample space?

Solution
Our sample space has 52 elements (the set of all possible outcomes),
and our event set has only one element {ace of hearts}.

 Mutually exclusive events


Events are mutually exclusive when they have no elements in common (see
Figure 2).

[Figure: events A and B shown as non-overlapping regions inside the
sample space]

Figure 2. Sample space and mutually exclusive events.

In the above figure, there are no common elements between events A and
B, and therefore, they are mutually exclusive events.
In other words, two events are considered to be mutually exclusive if
WHEN ONE EVENT OCCURS, THE OTHER CANNOT OCCUR,
AND VICE VERSA. Two mutually exclusive events cannot occur
simultaneously.

Example
Let A and B represent two events in rolling a die
A: {to get an odd number} i.e. {1, 3, 5}
B: {to get an even number} i.e. {2, 4, 6}
The two events A and B described above are said to be mutually
exclusive because in one roll of a die we can only get either an even or an
odd number but not both. If event A happens, B cannot happen, and vice
versa.

Simple Events
An event that cannot be decomposed is called a simple event.

Example
In order to more clearly understand the above definition, let’s go back to
our experiment of rolling a die. Let’s further consider the following
events:
A1: {getting an odd number}, i.e., {1, 3, 5}
A2: {getting an even number}, i.e., {2, 4, 6}
A3: observing a 1, i.e., {1}
A4: {2}
A5: {3}
A6: {4}
A7: {5}
A8: {6}
In this example we notice that there is a difference between events A1, A2
and events A3 through A8. Event A1 (getting an odd number) occurs if
any of the events A3 (getting a 1), A5 (getting a 3), or A7 (getting a 5)
occur. Therefore, we can decompose event A1 into simple events A3, A5,
and A7. Similarly, event A2 (getting an even number) can be decomposed
into simple events A4, A6, and A8.
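
The set-based definitions above map naturally onto Python sets. The following sketch is an illustration only; the helper names `mutually_exclusive` and `simple_events` are hypothetical.

```python
# Sample space and events for one roll of a die, written as Python sets.
SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}
ODD = {1, 3, 5}    # event A1
EVEN = {2, 4, 6}   # event A2

def mutually_exclusive(a, b):
    """Two events are mutually exclusive if they share no outcomes."""
    return a.isdisjoint(b)

def simple_events(event):
    """Decompose an event into its one-element (simple) events."""
    return [{outcome} for outcome in sorted(event)]

print(ODD <= SAMPLE_SPACE)            # True: an event is a subset of the sample space
print(mutually_exclusive(ODD, EVEN))  # True
print(simple_events(ODD))             # [{1}, {3}, {5}]
```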

 Independent Events
Two or more events are said to be independent of each other if the
occurrence (or non-occurrence) of one has no effect on the occurrence or
non-occurrence of the others.

Example
Consider tossing a coin and rolling a die. The events getting a head on
the coin and number 3 on the die are independent of each other because
the occurrence (or non-occurrence) of one event has no effect on the
occurrence (or non-occurrence) of the other event.
 Probability
Now that we are familiar with fundamentals of set theory, and have
defined events and various types of events in a given sample space, we
can talk about the probability of one or more events.
The probability associated with an event is a measure of belief that the
event will occur on the next repetition of the experiment. For example
when we say that the probability of getting head in one toss of a coin is ½
(0.5 or 50 percent), what we are really saying is that there is a 50 percent
chance that the next toss of a coin will be a head.

Given a finite sample space S and an event A in that sample space we can
say:
1. The probability of event A is a number between zero and 1. Zero
means that event A cannot happen and a probability of 1 means that
event A will certainly happen.
2. The probability of the sample space (the set of ALL possible outcomes
of an experiment) is 1.
If we show the probability of event A with P(A), and the probability of
the sample space with P(S), we can write:

$$0 \le P(A) \le 1$$

$$P(S) = 1$$
 Calculation of Probabilities of Events in a Sample Space
The probability of an event A in a finite sample space is equal to the
number of simple events in A (see definition of simple events discussed
earlier) divided by the total number of simple events in the sample space.
Note that we are making the assumption that all simple events in the
sample space have the same probability of occurrence.

Example
What is the probability of getting a head in one flip of a coin?

Solution
First we have to find out how many simple events our sample space
has. Remember that sample space, by definition, is the set of all
possible outcomes of an experiment. In this case, our sample space S
has only 2 elements:
S: {head, tail}
Next we have to find out how many simple events there are in our
event of interest. In this case our event of interest (head) has only one
element:
A: {head}

$$P(A) = \frac{\text{number of simple events in event } A}{\text{number of simple events in the sample space}}$$

In this case:

$$P(\text{getting a head}) = \frac{1}{2} = 0.5 \text{ or 50 percent}$$
Example
What is the probability of drawing a 4 out of a 52 card deck?

Solution
Once again we have to find out how many simple events our sample
space and our event of interest have. Our sample space, in this
case, is comprised of 52 simple events, and our event of interest is
comprised of 4 simple events (there are four 4’s in a deck of cards).
Therefore, we can say:

$$P(\text{drawing a 4}) = \frac{4}{52} \approx 0.077 \text{ or 7.7 percent}$$

Example
What is the probability of getting an even number in one roll of a die?

Solution
Our sample space S is comprised of 6 simple events (because there are
only 6 numbers on a die). The event of getting an even number is
comprised of 3 simple events (2, 4, 6). Therefore, we can say:

$$P(\text{getting an even number}) = \frac{3}{6} = \frac{1}{2} = 0.5 \text{ or 50 percent}$$

Example
What is the probability of drawing a king OR an ace from a deck of
cards?
Solution
There are 4 kings and 4 aces in a deck of cards. Therefore, the event
of interest to us (drawing an ace or a king) is made of 8 simple events.
Our sample space, on the other hand, is made of 52 simple events. The
probability in this case is:

$$\frac{8}{52} \approx 0.154 \text{ or 15.4 percent}$$
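
The counting rule used in the examples above (simple events in the event divided by simple events in the sample space) can be sketched in Python. This is an illustration that assumes, as the text does, that all simple events are equally likely; the names `DECK`, `DIE`, and `probability` are hypothetical.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = set(product(RANKS, SUITS))  # 52 equally likely simple events
DIE = {1, 2, 3, 4, 5, 6}

def probability(event, sample_space):
    """Simple events in the event divided by simple events in the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))

fours = {card for card in DECK if card[0] == "4"}
kings_or_aces = {card for card in DECK if card[0] in ("K", "A")}
even = {n for n in DIE if n % 2 == 0}

print(round(float(probability(fours, DECK)), 3))          # 0.077
print(round(float(probability(even, DIE)), 3))            # 0.5
print(round(float(probability(kings_or_aces, DECK)), 3))  # 0.154
```

Using `Fraction` keeps the intermediate result exact (for example 4/52 reduces to 1/13) and defers rounding to the final print.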

 Calculation of Probabilities of Mutually Exclusive Events


Once again, remember that two events are said to be mutually exclusive
if, when one event occurs, the other cannot, and vice versa. For example,
the event of getting an even number and the event of getting an odd
number in one roll of a die are mutually exclusive.

Axiom
If A and B are two mutually exclusive events, the probability of
either A or B occurring is the SUM of the probabilities of A and B.

Example
If the proportion of voters favoring legislation is 0.38, and the proportion
of voters who are undecided is 0.22, what is the proportion of voters who
are either in favor of the legislation or undecided?

Solution
In this case, the two events are mutually exclusive because a voter
cannot be in favor of the legislation and, at the same time, be
undecided. The probabilities are additive:
0.38 + 0.22 = 0.60
Example
What is the probability of getting 1 or 6 in one roll of a die?

Solution
These two events are mutually exclusive because if we get 1 we
cannot get 6, and vice versa. The probabilities, in this case, are additive.
The probability of getting 1 is 1/6 (the sample space has 6 elements, and
the event has only one element). Similarly, the probability of getting 6
is 1/6. Therefore, the probability of getting 1 or 6 is:

$$\frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3} \approx 0.33 \text{ or 33 percent}$$

Example
What is the probability of drawing an ace of hearts or a king of spades or
a 4 of diamonds from a deck of 52 cards?

Solution
These events are mutually exclusive because if one event occurs, the
other events cannot occur. The probabilities are, therefore, additive.
In this case the probability is:

$$\frac{1}{52} + \frac{1}{52} + \frac{1}{52} = \frac{3}{52} \approx 0.058 \text{ or 5.8 percent}$$
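
The addition rule for mutually exclusive events can be checked by comparing the sum of the individual probabilities with the probability of the combined event counted directly. The sketch below is an illustration under the equally-likely assumption used in the text; all names are hypothetical.

```python
from fractions import Fraction

DIE = {1, 2, 3, 4, 5, 6}

def probability(event, sample_space):
    """Simple events in the event divided by simple events in the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))

one, six = {1}, {6}

# The rule applies only when the events share no outcomes.
assert one.isdisjoint(six)

p_sum = probability(one, DIE) + probability(six, DIE)
p_union = probability(one | six, DIE)  # "1 or 6" counted directly
print(p_sum, p_union)  # 1/3 1/3
```

If the two events overlapped, the simple sum would count the shared outcomes twice and the two printed values would no longer agree.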

 Calculation of Probabilities of Independent Events

Definition
Two events are said to be independent if the occurrence or non-
occurrence of one event has no effect on the outcome of the other event.

Example
If we roll a pair of dice and the events of our interest are getting 1 on one
die and 6 on the other die, we have two independent events; because
getting (or not getting) 1 on the first die has no effect on getting (or not
getting) 6 on the second die.
Example
If we draw two cards from two separate decks of cards and our event of
interest is getting two aces, we have two independent events; because
getting an ace from the first deck of cards has no effect on getting an ace
from the second deck of cards.

Axiom
If A and B are two independent events, the probability of both events
occurring is the product of the probabilities of events A and B. If we
show the probability of A with P(A) and the probability of B with P(B),
the probability of A and B occurring is:
P(A) × P(B)
Example
What is the probability that two cards drawn from two separate decks of
cards are both aces?

Solution
The events of getting an ace from the first deck and an ace from the
second deck are independent. Let’s focus on the first deck of cards.
Our sample space has 52 elements and our event has 4 elements (there
are 4 aces in a deck of cards). The probability of drawing an ace from
the first deck is 4/52. Similarly, the probability of drawing an ace
from the second deck is 4/52.
Since these two events are independent, the probability that both cards
are aces is:

$$\frac{4}{52} \times \frac{4}{52} = \frac{1}{169} \approx 0.0059 \text{ or about 0.6 percent}$$

A probability of about 0.6 percent means that we should expect the event
to occur about 0.6 times in 100 tries, or 1 time in 169 tries. In other
words, the result obtained above means that if we draw 169 times from
two separate decks of cards, we could EXPECT that one of these draws
would be two aces.
Example
What is the probability of tossing a coin 4 times and getting 4 tails in a
row?

Solution
Each toss of a coin is independent of the other tosses. For each toss of
a coin our sample space has 2 elements and our event (getting a tail)
has only one element. Therefore, the probability of getting a tail 4
times in a row is:

$$\frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{16} = 0.0625 \text{ or about 6 percent}$$
Example
An oil company has four plants in geographically different locations.
Each facility has a chlorine tank. The probabilities of release of chlorine
into the atmosphere at each facility in a given year are as follows:
Facility 1: 1 × 10⁻⁶ (or 1 chance in a million)
Facility 2: 1 × 10⁻⁷
Facility 3: 1 × 10⁻⁸
Facility 4: 1 × 10⁻⁵
What is the probability that all four facilities would have a release of
chlorine in a given year?

Solution
Here we are dealing with 4 independent events. The probability is:

$$(1 \times 10^{-6}) \times (1 \times 10^{-7}) \times (1 \times 10^{-8}) \times (1 \times 10^{-5}) = 1 \times 10^{-26}$$

(indeed a very, very small number)
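
The product rule for independent events can be sketched in Python, using exact fractions so that very small probabilities are not distorted by rounding. This is an illustration; the function name `prob_all` is hypothetical, and the inputs are the probabilities from the three examples above.

```python
from fractions import Fraction
from math import prod

def prob_all(probabilities):
    """Probability that all of several independent events occur."""
    return prod(probabilities)

two_aces = prob_all([Fraction(4, 52), Fraction(4, 52)])
four_tails = prob_all([Fraction(1, 2)] * 4)
four_releases = prob_all([
    Fraction(1, 10**6),  # Facility 1
    Fraction(1, 10**7),  # Facility 2
    Fraction(1, 10**8),  # Facility 3
    Fraction(1, 10**5),  # Facility 4
])

print(two_aces)              # 1/169
print(four_tails)            # 1/16
print(float(four_releases))  # 1e-26
```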


 Mean, median, range, and standard deviations
 Mean The arithmetic mean is the sum of all individual values in a
sample divided by the number of values:

$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$$

Example
What is the mean of the following set of data?
Yi : 3,4,2,6,1,2
Solution
n=6

$$\bar{Y} = \frac{3 + 4 + 2 + 6 + 1 + 2}{6} = \frac{18}{6} = 3$$

 Median The median is the middle point, where half of the values fall
above and half of the values fall below that point.

Example
What is the median of the following set of data?

Yi : 5,4,6,7,9
Solution
Arranged in order, the values are 4, 5, 6, 7, 9. Half of the
remaining values (4 and 5) fall below and half (7 and 9) fall above
the number 6. The median is 6.
 Range The range is the difference between the high and low values of
a data set.

Example
What is the range of the following data set?
Yi : 2,17,19,25,18
Solution
Range = 25 − 2 = 23.

 Standard Deviation The standard deviation is a measure of data
dispersion. Larger values of standard deviation indicate more
dispersion of sample data around the mean.

$$S = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$

Where:
S = standard deviation
Yi = individual values
Ȳ = arithmetic mean
n = sample size
 Variance The square of the standard deviation is called the variance.

$$S^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$$

Example
Calculate the standard deviation of the following set of data:

Yi : 5,6,9,3,5,2
Solution
Calculate and fill the table below:

Yi    Yi − Ȳ    (Yi − Ȳ)²
5      0         0
6      1         1
9      4         16
3     -2         4
5      0         0
2     -3         9
Ȳ = 5           Σ (Yi − Ȳ)² = 30

$$S = \sqrt{\frac{30}{6 - 1}} \quad \text{or} \quad S \approx 2.45$$
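
The worked examples in this section can be reproduced with Python's standard `statistics` module. The sketch below is an illustration; note that `statistics.stdev` and `statistics.variance` use the n − 1 divisor, matching the formulas above. The variable names are hypothetical.

```python
import statistics

mean_data = [3, 4, 2, 6, 1, 2]
median_data = [5, 4, 6, 7, 9]
range_data = [2, 17, 19, 25, 18]
std_data = [5, 6, 9, 3, 5, 2]

print(statistics.mean(mean_data))               # 3
print(statistics.median(median_data))           # 6
print(max(range_data) - min(range_data))        # 23
print(statistics.variance(std_data))            # 6
print(round(statistics.stdev(std_data), 2))     # 2.45
```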
Pearson Coefficient of Correlation (r)
The Pearson Coefficient of Correlation is the most common measure of
linear correlation between two variables. It has a value between −1.0
and +1.0. A value of r close to −1.0 indicates a strong negative
relationship between the two variables. A negative relationship means
that the two variables tend to move in opposite directions. In other
words, a value of r close to −1.0 indicates that as the value of one
variable increases, the value of the other variable decreases. A value
of r close to +1.0 indicates a strong positive correlation between the
two variables. This means that the value of the dependent variable
increases with an increase in the value of the independent variable. A
value of r close to 0.0 indicates that there is little or no linear
correlation between the two variables.
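
The text does not give a formula for r. As an illustration, the sketch below uses the standard definition of the Pearson coefficient, the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the small data sets are hypothetical and chosen only to show the two extreme cases.

```python
import math

def pearson_r(xs, ys):
    """Pearson coefficient of correlation for two equal-length data sets."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]   # increases with x
y_falling = [10, 8, 6, 4, 2]  # decreases as x increases

print(pearson_r(x, y_rising))   # 1.0
print(pearson_r(x, y_falling))  # -1.0
```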
References
1. Dunn, Olive J., V. A. Clark; “Applied Statistics”; John Wiley & Sons; New York, NY.

2. Miller, Irwin, J. E. Freund; “Probability and Statistics for Engineers”; Prentice Hall; Englewood Cliffs,
NJ.

3. Mendenhall, William, R. J. Beaver; “Introduction to Probability and Statistics”; PWS-Kent Publishing
Company; Boston, MA.

4. Spiegel, M. R.; “Statistics”; Schaum Outline Services; McGraw Hill Book Co.; New York, NY.

5. Ott, L. An Introduction to Statistical Methods and Data Analysis, 2nd ed. Boston, MA: Duxbury Press.

6. Slote, Lawrence. Handbook of Occupational Safety and Health. New York: John Wiley and Sons,

7. Tapley, B. ed. Eshbach’s Handbook of Engineering Fundamentals, 4th ed. New York: John Wiley and
Sons.
