Probability and Statistics Engineering Data Analysis
COLLEGE OF ENGINEERING
Preface
Probability and Statistics / Engineering Data Analysis is an important subject because
so much of what we take for granted is based on it. Without probability theory, the insurance
and gaming industries would not be feasible. Much of modern finance relies on statistics
and probability. Much modern research, both physical and social, would grind to a halt
without the methods of statistical analysis.
Sadly, despite its value, many students find statistics a challenging course because they
try to get through it by sheer memorization. Statistics is a problem-solving subject, and
memorization does not work well where problem-solving skills are needed.
Statistics is a subject that seeks to get students to think in different ways, to see the
world around them from viewpoints they have not used before. When you understand the
concepts, you can see the world in a way you did not know existed.
We apologize in advance for any typos and errors in this module. We hope you will enjoy
learning more about Engineering Data Analysis.
UNIT I. OBTAINING DATA
Overview
The main aim of Module 1 is to address data collection methods; sample size
determination using Slovin’s formula and Gary's recommended sample sizes for each
type of study; and the distinction between Probability Sampling and Non-Probability Sampling.
Learning Objectives
Topics
Setting up
A general term for the sampling technique that gives element for the
represented by letter;
respondent
Lesson Proper
OBTAINING DATA
Data point to statistical facts, principles, opinion and varied items of different
There are two main methods of collecting data: the direct method and the indirect method.
a. Interview
In an informal interview the researcher is free to ask questions without using a guide
so that there is no limit to the information obtained during an interview.
b. Observation
In quantitative analysis, data on the number of events is typically collected over a given
period of time.
c. Documentary Analysis
When the researcher obtains the information directly from the source.
2. Indirect Methods
a. Survey
Example:
Pre-election polls, marketing surveys, etc. Surveys may be administered in a variety of
ways.
a. Telephone interview and
b. Self-Administered Questionnaire
Questionnaire design
Examples of questionnaire
____ Age
____ Sex
Sex
____ Male
____ Female
3. Experiments
It is used when the objective is to determine the cause and effect of a certain
phenomenon under some controlled conditions.
Sampling
Strategies that enable a researcher to pick a subgroup as a basis for making
judgments about a larger group.
Steps in sampling
Number of observation, n
1. Slovin’s Formula
$n = \dfrac{N}{1 + Ne^{2}}$
Where
N = population size
n = sample size
e = margin of error
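As an illustration of Slovin's formula, here is a minimal Python sketch. The function name `slovin_sample_size`, the example values N = 1000 and e = 0.05, and the choice to round up to the next whole respondent are assumptions made for this example; they are not taken from the module.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e^2).

    Rounding up to a whole respondent is an assumption of this sketch.
    """
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Hypothetical population of 1000 with a 5% margin of error.
sample_size = slovin_sample_size(1000, 0.05)
print(sample_size)
```

Here 1000 / (1 + 1000 × 0.0025) = 285.71, which rounds up to 286 respondents.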
2. Gary’s recommended number of observations for the following types of research.
Sampling plans
A sampling plan is simply a system or process for deciding how a sample should be
taken from a population.
Sampling Techniques
Probability Sampling
a. Simple Random Sampling
A sampling technique using the concept of the lottery or fish bowl method. For example,
if you need a sample of 50 out of a population of 200, you may write the names or numbers
of all 200 on slips of paper, place them in a box, and mix them thoroughly. Then draw
slips until you obtain the 50 samples.
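The lottery method above can be imitated in code. This is only a sketch: numbering the respondents 1 to 200 and fixing the seed (so the draw can be repeated) are assumptions of the example.

```python
import random

# Respondents are assumed to be numbered 1..200 for this illustration.
population = list(range(1, 201))

# A fixed seed is used only so the same draw can be reproduced.
rng = random.Random(42)

# Draw 50 respondents without replacement, like drawing slips from a box.
sample = rng.sample(population, 50)
print(sorted(sample))
```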
b. Systematic Sampling
$k = \dfrac{N}{n}$
If N = 100 and n = 20, then
$k = \dfrac{100}{20} = 5$
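A short sketch of how the interval k might be used to pick every k-th member. The helper name `systematic_sample` and the random starting point within the first interval are assumptions (a random start is common practice, but the module does not state it).

```python
import random


def systematic_sample(population, n, rng=random):
    """Select every k-th element, where k = N // n (hypothetical helper)."""
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # assumed: random start in the first interval
    return population[start::k][:n]


population = list(range(1, 101))  # N = 100
chosen = systematic_sample(population, 20, random.Random(1))
print(chosen)
```

With N = 100 and n = 20, consecutive selected members are always 5 positions apart.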
c. Stratified Sampling
A sampling technique in which the population is first divided into homogeneous
subsets called strata.
Downloaded by KENNETH DARYLL FIDELSON ([email protected])
lOMoARcPSD|24304008
d. Cluster Sampling
A sampling technique in which the members of a sample are selected in
clusters rather than as separate individuals. It is sampling where groups, not
individuals, are randomly selected.
NON-PROBABILITY SAMPLING
1. Purposive Sampling
The sample respondents are selected based on certain criteria laid down by the
researcher. For example, a researcher might want to find out whether the residential houses in
Subdivision ABC comply with building fire requirements. Instead of interviewing the owners
of all residential houses in Subdivision ABC, he can purposely choose to interview the
owners of 20 residential houses in Subdivision ABC.
2. Quota Sampling
The samples are selected using a quota system. For instance, in a survey of the
opinions of college students in Di Mapigilan City regarding the ABS-CBN teleserye
“Love My Woman”, the researcher can set a quota of, say, 150 college
students as respondents.
3. Convenience Sampling
The researcher picks the sample respondents from the population that he finds
convenient to interview due to their availability or accessibility. For example, a researcher
might want to find out the popularity of a certain candidate in Ms. Earth 2020. He might
choose his respondents within his barangay since this is more convenient for him.
References
Walpole, R.E., Myers, R.H., Myers, S.L., and Ye, K. (2002). Probability and Statistics for
Engineers and Scientists, 7th ed. Pearson Education, Inc.
Assessing Learning
Activity 1
AB
homogeneity.
technique.
persons.
Activity 2
Direction/s: Classify each sample as simple random, systematic, stratified, cluster, quota,
convenience, or purposive sampling.
1. Every 15th customer entering a shopping mall is asked to select his or her
favorite store.
2. Mrs. Cruz writes the name of each student in a card, shuffles the cards and
annual salaries.
determine whether they believe the students have higher grades now
Activity 3
1. Explain why there is a need to plan for the method of collecting data in any
investigation/ research.
2. Discuss the best method of collecting data in the research entitled Cultural
Practices of Aetas in Zambales.
3. Determine the sample size (number of observations) for each given
population size and corresponding margin of error.
a. N=500 ; e=5%
b. N=3600; e= 7%
c. N =20,000 ; e = 10%
4. There are 8000 students. With a margin of error of 5%, how many students
should be taken as respondents of the study?
UNIT II. Probability
Overview
combination. The knowledge in this topic is important in determining the number of ways of
arranging or selecting objects. Also discuss the Experiment, Sample Space, Events and the
Learning Objectives
Topics
2.3 Probability
Setting up
Directions: Multiple Choice. Write the letter of the correct answer in the space provided.
a. Infinite c. 1
b. 0 d. -2
a. Cannot be determined c. 0
b. 1 d. Infinite
Lesson Proper
The fundamental principle of counting states that if one choice can be made in $n_1$
ways and another choice can be made in $n_2$ ways, then together the choices can be made in
$n_1 n_2$ ways.
Example:
1. Students are classified according to gender (male or female), status (regular or irregular),
and field of specialization (mathematics, physics, or computer literate). How many possible
classifications are there?
Solution:
$n_1 = 2$ for gender
$n_2 = 2$ for regular or irregular
$n_3 = 3$ field of specialization
$n_1 \times n_2 \times n_3 = 2 \times 2 \times 3 = 12$ ways.
2. A contractor has 4 styles of homes, each with 3 styles of comports, 4 styles of decks, and 5
styles of roofing. How many possible classifications are there?
Solution:
$n_1 = 4$
$n_2 = 3$
$n_3 = 4$
$n_4 = 5$
$n_1 \times n_2 \times n_3 \times n_4 = (4)(3)(4)(5)$
= 240 ways
3. How many three-digit numbers can be formed from the digits 1, 2, 3, 4, and 5 if:
a. Repetition of digits is allowed?
b. Repetition of digits is not allowed?
c. The number must be odd (repetition not allowed)?
Solution:
a. There are five choices to fill each of the three spaces.
5 x 5 x 5 = 125 ways
b. 5 x 4 x 3 = 60 ways
c. If the number must be odd, then the last digit must be 1, 3, or 5 only. That leaves 4
choices for the first digit and 3 for the second.
4 x 3 x 3 = 36 ways
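The three counts in this example can be checked by listing every possibility. The sketch below uses Python's standard `itertools` module; the variable names are chosen for this illustration only, and part (c) is treated as "odd, with no repeated digits", which is the reading that yields 36.

```python
from itertools import permutations, product

digits = [1, 2, 3, 4, 5]

# a. Repetition allowed: every ordered triple of digits.
with_repetition = list(product(digits, repeat=3))

# b. Repetition not allowed: ordered triples of distinct digits.
without_repetition = list(permutations(digits, 3))

# c. Odd numbers with no repeated digit: the last digit must be 1, 3, or 5.
odd_without_repetition = [p for p in without_repetition if p[-1] % 2 == 1]

print(len(with_repetition), len(without_repetition), len(odd_without_repetition))
```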
PERMUTATION
Permutations:
$_nP_n = n!$
= 120 ways.
2. The number of arrangements of n distinct objects taken r at a time is
$_nP_r = \dfrac{n!}{(n-r)!}$
Example: In how many different ways can a chairperson and an assistant chairperson be
selected for a research project if there are 6 statisticians?
Solution:
$n = 6$
$r = 2$
$_6P_2 = \dfrac{6!}{(6-2)!} = \dfrac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{4 \cdot 3 \cdot 2 \cdot 1}$
= 30 ways.
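As a quick check of the permutation formula, the sketch below computes the count three ways: from the factorial formula, with `math.perm` (available in Python 3.8 and later), and by listing the ordered pairs. Labeling the statisticians 0 to 5 is an assumption of the example.

```python
import math
from itertools import permutations

n, r = 6, 2

# Direct use of nPr = n! / (n - r)!
count_formula = math.factorial(n) // math.factorial(n - r)

# Built-in equivalent (Python 3.8+).
count_builtin = math.perm(n, r)

# Enumerate ordered (chairperson, assistant) pairs of statisticians 0..5.
count_enumerated = len(list(permutations(range(n), r)))

print(count_formula, count_builtin, count_enumerated)
```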
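3. The number of distinct permutations of n objects, of which $n_1$ are of one kind, $n_2$ are of
a second kind, ..., and $n_k$ are of a kth kind, is
$P = \dfrac{n!}{n_1!\, n_2! \cdots n_k!}$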
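Example: How many arrangements can be made from the letters of the word
a. STATISTICS
b. PROBABILITY
Solution:
a. $n = 10$
$n_1 = 3$ S's
$n_2 = 3$ T's
$n_3 = 2$ I's
$P = \dfrac{10!}{3!\, 3!\, 2!}$
$= \dfrac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(3 \cdot 2 \cdot 1)(2 \cdot 1)}$
= 50,400
b. $n = 11$
$n_1 = 2$ B's
$n_2 = 2$ I's
$P = \dfrac{11!}{2!\, 2!}$
= 9,979,200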
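The formula for permutations with repeated objects can be sketched in a few lines. The function name `distinct_arrangements` is hypothetical; it counts each letter's repetitions and divides n! by the factorial of each count.

```python
from collections import Counter
from math import factorial


def distinct_arrangements(word: str) -> int:
    """n! / (n1! n2! ... nk!) for the letters of a word (hypothetical helper)."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)   # divide out repeats of each letter
    return total


print(distinct_arrangements("STATISTICS"))
print(distinct_arrangements("PROBABILITY"))
```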
Circular Permutation
Combination
$_8C_4 = \dfrac{8!}{(8-4)!\, 4!} = \dfrac{8!}{4!\, 4!}$
= 70 different ways.
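The combination above can be confirmed with a short sketch. It relies on `math.comb` and `math.perm` (Python 3.8+) and shows that a combination is a permutation count divided by r!, since order does not matter. The variable names are illustrative only.

```python
import math
from itertools import combinations

n, r = 8, 4

count_builtin = math.comb(n, r)

# Order does not matter, so divide the ordered count by r!.
count_from_permutations = math.perm(n, r) // math.factorial(r)

# Enumerate the unordered selections of 4 objects out of 8.
count_enumerated = sum(1 for _ in combinations(range(n), r))

print(count_builtin, count_from_permutations, count_enumerated)
```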
Probability
We define an experiment to be any process that yields data. The data obtained
may be numerical or non-numerical. The set S whose elements are all possible outcomes
of an experiment is called the sample space. Each element of the sample space is
called a sample point.
Example:
1. Toss a die. There are six possible outcomes, depending on the number that appears
on the upturned face of the die.
$S = \{1, 2, 3, 4, 5, 6\}$
2. Toss two coins.
$S = \{HH, HT, TH, TT\}$
Probability of an Event
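The probability of an event is the chance that the event will take place.
$P(E) = \dfrac{n(E)}{n(S)}$
where n(E) is the number of outcomes in the event E and n(S) is the number of outcomes in
the sample space S.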
Properties of Probability:
Example:
1. In tossing 2 coins, find the probability of getting:
a. All heads
b. At least one tail
Solution:
$S = \{HH, HT, TH, TT\}$
a. Let
$C = \{HH\}$
$P(C) = \dfrac{1}{4}$
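The two-coin example can be reproduced by listing the sample space and counting favorable outcomes. This is a minimal sketch assuming equally likely outcomes; the helper `probability` is a name introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two coins: HH, HT, TH, TT.
sample_space = ["".join(outcome) for outcome in product("HT", repeat=2)]


def probability(event):
    """Favorable outcomes divided by all outcomes (equally likely assumed)."""
    favorable = [o for o in sample_space if event(o)]
    return Fraction(len(favorable), len(sample_space))


p_all_heads = probability(lambda o: o == "HH")
p_at_least_one_tail = probability(lambda o: "T" in o)
print(p_all_heads, p_at_least_one_tail)
```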
$P(E) = \dfrac{12}{45}$
b. An English student
$P(E) = \dfrac{15}{45}$
c. A zoology student
$P(E) = \dfrac{18}{45}$
Rules on Probability
1. If A and B are any two events, then
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
If A and B are mutually exclusive events, then
$P(A \cup B) = P(A) + P(B)$.
2. If A' is the complement of an event A, then
$P(A') = 1 - P(A)$.
Example 1:
A box contains 5 red, 2 white, and 3 green balls. If a ball is drawn at random, what is
the probability that it is
a. Red.
b. Green.
c. Red or white.
d. Not green.
Solution:
a. $P(\text{red}) = \dfrac{5}{10}$
b. $P(\text{green}) = \dfrac{3}{10}$
c. $P(\text{red or white}) = \dfrac{5}{10} + \dfrac{2}{10} = \dfrac{7}{10}$
d. $P(\text{not green}) = 1 - P(\text{green}) = 1 - \dfrac{3}{10} = \dfrac{7}{10}$
Example 2:
Of 120 students, 70 are studying Calculus, 50 are studying Filipino, and 20 are
studying both Calculus and Filipino. Find the probability that a student
a. Is studying either Calculus or Filipino
b. Is studying neither Calculus nor Filipino
Solution:
Write the events
Let A = A student is studying Calculus
B = A student is studying Filipino
A ∩ B = A student is studying both Calculus and Filipino
P (A) = 70/120
P (B) = 50/120
P (A ∩ B) = 20/120
a. P (A U B) = P(A) + P(B) – P(A ∩ B)
$= \dfrac{70}{120} + \dfrac{50}{120} - \dfrac{20}{120} = \dfrac{100}{120} = \dfrac{5}{6}$ or 0.8333
b. P (A U B)' = 1 – P (A U B)
$= 1 - \dfrac{5}{6}$
$= \dfrac{1}{6}$
[Venn diagram: Calculus only = 50, both Calculus and Filipino = 20, Filipino only = 30, neither = 20]
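The addition rule and the complement rule used in Example 2 can be checked with exact fractions. This is a small sketch; the variable names are chosen for this illustration.

```python
from fractions import Fraction

total_students = 120
p_calculus = Fraction(70, total_students)
p_filipino = Fraction(50, total_students)
p_both = Fraction(20, total_students)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_either = p_calculus + p_filipino - p_both

# Complement rule: P(neither) = 1 - P(A or B)
p_neither = 1 - p_either

print(p_either, p_neither)
```

The result agrees with the Venn diagram counts: 50 + 20 + 30 = 100 students take at least one of the subjects, and 20 take neither.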
Example 3:
The probabilities that a student will receive an A, B, C, or Incomplete in a course are,
respectively, 0.05, 0.15, 0.20, and 0.25. Find the probability that the student will
a. Receive an A or an Incomplete
b. Get a B or a C
Solution:
a. P ( A or Inc.) = P (A) + P (Inc.)
= 0.05 + 0.25
= 0.30
b. P ( B or C ) = P (B) + P (C)
= 0.15 + 0.20
= 0.35
References
Walpole, R.E., Myers, R.H., Myers, S.L., and Ye, K. (2002). Probability and Statistics for
Engineers and Scientists, 7th ed. Pearson Education, Inc.
Assessing Learning
Activity 1
a. An odd number
a. A queen of heart
b. A red card
c. An ace?
Activity 2
Directions: Solve the following problems and show your complete solution.
1. License plate consist of three letters followed by three numbers. How many
different license plates are possible in the absence of repeat and the
a. Three-digit number must be even?
b. Three digit numbers must be greater than 600?
c. First digit in the three digit number cannot be 0?
3. A bowl has 4 blue, 5 red and 3 white balls. If a ball is picked randomly, find the
probability that:
a. it is a red ball
b. it is not a blue ball
c. it is a white or red ball
d. it is neither a red nor a blue ball.
Activity 3
Directions: Solve the following problems and show your complete solution.
2. Paulo wants to buy a special ink for his printer. The probability that the university
bookstore carries it is 0.4, and the probability that the MIG store carries it is 0.3. Assuming
the two stores stock the item independently, find the probability
a. That both stores carry the item.
b. That neither store carries the item.
3. The probability that a student studies is 0.68. Given that she studies, the probability
that she will pass a course is 0.75. Given that she does not study, the probability
that she will pass the course is 0.43. What is the probability that
a. She will study and pass the course.
UNIT III. Random Variables and Their Probability Distribution
Overview
Module III deals with random variables and the probability distributions of discrete and
continuous random variables, including the Binomial Distribution, the Poisson Distribution,
and the Normal Distribution.
Learning Objectives
a. Random Variables
3.1.1 Probability Mass Function
3.1.2 Expected Values of Random Variable
3.1.3 Types of Random Variables
b. Probability Distribution
3.2.1 Discrete Probability Distribution
3.2.1.1 The Binomial Distribution
3.2.1.2 The Poisson Distribution
3.2.2 Continuous Probability Distribution
3.2.2.1 Normal Distribution
LESSON 1. Random Variables
Setting up
3. Map each sequence to a real number if x represents the number of heads in the
sequence.
a. HH
b. HT
c. TH
d. TT
Lesson Proper
Definition:
A random variable is a variable (typically denoted by x) that has a single numerical
value (determined by chance) for each outcome of an experiment.
Illustration
1. Tossing three coins.
Suppose three coins are tossed. Let x be the random variable representing the number of
heads that occur. Find the values of the random variable x.
Steps:
1. Determine the sample space. Let H represent head and T represent tail.
2. Count the number of heads in each outcome in the sample space and assign this number
to the outcome.
Solution:
S = { HHH , HHT , HTH , THH ,TTH , HTT , THT , TTT }
In table
Possible outcomes X (value of the random variable)
HHH 3
HHT 2
THH 2
HTH 2
TTH 1
HTT 1
THT 1
TTT 0
therefore
X=0,1,2,3.
2. Two balls are drawn in succession without replacement from an urn containing 5 red
balls and 6 blue balls. Let Y be the random variable representing the number of blue balls.
Find the values of the random variable Y and show them in a table.
Solution:
S = { RR , RB , BR , BB }
Example:
A Continuous random variable has infinitely many values, and those values can be
associated with measurements on a continuous scale.
Example:
1. The measure of voltage for a smoke detector battery.
2. The distance of certain type of car will travel using 10 litre of gasoline over a
Probability Distribution
A Probability Distribution gives the probability for each value of the random
variable.
Example:
1. Suppose three coins are tossed. Let x be the random variable representing the
number of heads that occur. Find the probability of each of the values of the random
variable x.
Solution:
S = { HHH , HHT , THH , HTH , TTH , HTT , THT , TTT }
X = the number of heads that occur.
X = { 0 , 1 , 2 , 3 }
Number of heads ( x ) 0 1 2 3
Probability P ( x ) 1/8 3/8 3/8 1/8
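The probability distribution for the number of heads in three tosses can be built by counting outcomes. The sketch below assumes a fair coin, so that all eight outcomes are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of tossing three coins.
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each number of heads.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Convert the counts to probabilities.
distribution = {
    x: Fraction(count, len(outcomes)) for x, count in sorted(heads_counts.items())
}

for x, p in distribution.items():
    print(x, p)
```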
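2. Refer to example 2 on random variables. Find the probability distribution for the
number of blue balls.
Solution:
S = { RR , RB , BR , BB }
Y = 0, 1, 2
The four outcomes are not equally likely, because the urn holds 5 red and 6 blue balls and
the balls are drawn without replacement.
$P(Y = 0) = P(RR) = \dfrac{5}{11} \cdot \dfrac{4}{10} = \dfrac{2}{11}$
$P(Y = 1) = P(RB) + P(BR) = \dfrac{5}{11} \cdot \dfrac{6}{10} + \dfrac{6}{11} \cdot \dfrac{5}{10} = \dfrac{6}{11}$
$P(Y = 2) = P(BB) = \dfrac{6}{11} \cdot \dfrac{5}{10} = \dfrac{3}{11}$
The probability distribution
Number of blue balls ( y ) 0 1 2
Probability p ( y ) 2/11 6/11 3/11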
Properties of Probability Distribution
1. The probability of each value of the random variable must be between 0 and 1,
inclusive.
$0 \le P(x) \le 1$
2. The sum of the probabilities must be equal to 1.
$\sum P(x) = 1$
If P ( x ) = f ( x ), it follows that $\sum f(x) = 1$.
[Figure: probability histogram for x = 0, 1, 2, 3, with the vertical axis P(x) marked at 1/8, 2/8, and 3/8]
The probability histogram can give us insight into the shape of the
distribution. We can also find the mean, variance, and standard deviation of the data, which
provide insight into other characteristics.
Mean
$\mu = \sum x \cdot P(x)$
Variance
$\sigma^{2} = \sum \left[(x - \mu)^{2} \cdot P(x)\right]$
$\sigma^{2} = \sum x^{2} \cdot P(x) - \mu^{2}$
Standard Deviation
$\sigma = \sqrt{\left[\sum x^{2} \cdot P(x)\right] - \mu^{2}}$
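The three formulas above can be applied to any discrete distribution given as a table of values and probabilities. The following sketch uses the three-coin distribution (1/8, 3/8, 3/8, 1/8) as its input; the function name `mean_variance_sd` is an assumption of this example.

```python
from fractions import Fraction
from math import sqrt


def mean_variance_sd(distribution):
    """Return (mean, variance, standard deviation) of {x: P(x)} (hypothetical helper)."""
    mean = sum(x * p for x, p in distribution.items())
    # Shortcut formula: variance = sum(x^2 * P(x)) - mean^2
    variance = sum(x ** 2 * p for x, p in distribution.items()) - mean ** 2
    return mean, variance, sqrt(variance)


heads = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
mean, variance, sd = mean_variance_sd(heads)
print(mean, variance, sd)
```

For this distribution the mean is 3/2, the variance is 3/4, and the standard deviation is about 0.87.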
Example:
A recent survey by an insurance company showed the following probabilities for the
number of dependents each policy holder has. Find the mean and variance.
x 0 1 2 3 4
a. $\mu = \sum x \cdot p(x)$
= 22/10
= 2.2
b. $\sigma^{2} = \sum [x^{2} \cdot p(x)] - \mu^{2}$
$= [0^{2}(2/10) + 1^{2}(2/10) + 2^{2}(3/10) + 3^{2}(2/10) + 4^{2}(2/10)] - (22/10)^{2}$
$= [0 + 2/10 + 12/10 + 18/10 + 32/10] - (22/10)^{2}$
$= 64/10 - (22/10)^{2}$
$= 6.4 - (2.2)^{2}$
= 1.56
Example 2
For the given probability distribution below, find the variance and standard
deviation.
x 0 1 2 3
Solution:
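$\mu = \sum x \cdot p(x)$
= 0 (1/8) + 1 (3/8) + 2 (3/8) + 3 (1/8)
= 1.5
a. Variance
$\sigma^{2} = \sum [x^{2} \cdot p(x)] - \mu^{2}$
$= [0^{2}(1/8) + 1^{2}(3/8) + 2^{2}(3/8) + 3^{2}(1/8)] - (1.5)^{2}$
= 3 - 2.25
= 0.75
b. Standard deviation
$\sigma = \sqrt{\left[\sum x^{2} \cdot p(x)\right] - \mu^{2}}$
$\sigma = \sqrt{0.75}$
= 0.87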
References
Walpole, R.E., Myers, R.H., Myers, S.L., and Ye, K. (2002). Probability and Statistics for
Engineers and Scientists, 7th ed. Pearson Education, Inc.
Assessing Learning
Activity 1
Activity 2
2. Map each sequence to a real number if the random variable × represented in the
a. HTHTHTHTHT
b. HHTTHHTTHH
c. TTTTTHHTTT
d. HHHHHHHHHH
e. HHTTTTTHTT
Activity 3
1. Toss five (5) coins simultaneously and record the number of tails that will occur.
2. Find the mean, and standard deviation of the random variable R representing the
R 3 4 5 6 7
LESSON 2: Discrete Probability Distribution
Setting up
a. I only
b. II and IV
c. I and III
d. IV only
Lesson Proper
Binomial Distribution
A binomial trial can result in a success with probability p and a failure with
probability q = 1 - p. The probability distribution of the binomial random variable X, the
number of successes in n independent trials, is
$b(x; n, p) = \binom{n}{x} p^{x} q^{n-x}$
Where:
$x = 0, 1, 2, \ldots, n$
Mean: $\mu = np$
Solution:
Given:
n = 8
p = 95%
= 0.95
q = 1 - 0.95
= 0.05
$P(X \ge 6) = \binom{8}{6}(0.95)^{6}(0.05)^{2} + \binom{8}{7}(0.95)^{7}(0.05)^{1} + \binom{8}{8}(0.95)^{8}(0.05)^{0}$
= 0.051 + 0.279 + 0.663
= 0.99
c. At least 2
$P(X \ge 2) = 1 - P(X = 0, 1)$
$= 1 - \left[\binom{8}{0}(0.95)^{0}(0.05)^{8} + \binom{8}{1}(0.95)^{1}(0.05)^{7}\right]$
$= 1 - [3.9 \times 10^{-11} + 5.9 \times 10^{-9}]$
$\approx 1.0000$
d. The average number of shots that hit the target
$\mu = np$
= 8 (0.95)
= 7.6
e. $\sigma^{2} = npq$
= 8(0.95)(0.05)
= 0.38
f. $\sigma = \sqrt{0.38}$
= 0.616
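The binomial computations in this example can be reproduced with a short function. This is a sketch that takes n = 8 shots with p = 0.95 and q = 1 - p = 0.05; the function name `binomial_pmf` is introduced here for illustration.

```python
from math import comb, sqrt


def binomial_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 8, 0.95

p_at_least_6 = sum(binomial_pmf(x, n, p) for x in range(6, n + 1))
p_at_least_2 = 1 - sum(binomial_pmf(x, n, p) for x in range(0, 2))

mean = n * p                  # np
variance = n * p * (1 - p)    # npq
sd = sqrt(variance)

print(round(p_at_least_6, 4), p_at_least_2, mean, variance, round(sd, 3))
```

Because a miss is so unlikely here, the probability of at least 2 hits is practically 1.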
Poisson Distribution
This is useful in describing the number of events that will occur in a specific period
of time or in a specific area or volume. This distribution is often referred to as the “rare event”
distribution. Examples include:
1. The number of accidents in a given highway intersection per week.
2. The number of errors per page done by an encoder.
3. The parts per million of some substance found in air or water emission from a certain
manufacturing plant.
$p(x; \lambda) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}$;
$x = 0, 1, 2, \ldots$
Where:
λ= the average value of x
e= 2.71828
The mean:
$\mu = \lambda = np$
Example:
The number of people arriving at a particular bank teller is approximately
distributed as a Poisson random variable. Suppose that, on average, one person arrives to
be served by that teller per minute. What is the probability that in a given minute
a. Exactly 3 people will arrive to be served?
b. At most 4 people will arrive to be served?
Solution:
λ=1
a. X = 3
$P(X = 3) = \dfrac{(2.71828)^{-1}(1)^{3}}{3!}$
= 0.061
b. At most 4 people will arrive to be served:
$P(X \le 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)$
$= \sum_{x=0}^{4} \dfrac{e^{-1}(1)^{x}}{x!} = e^{-1}\left[\dfrac{1}{0!} + \dfrac{1}{1!} + \dfrac{1}{2!} + \dfrac{1}{3!} + \dfrac{1}{4!}\right]$
= 0.3679[1 + 1 + 0.5 + 0.1667 + 0.0417]
= 0.3679 (2.7084)
= 0.996
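The Poisson probabilities for the bank teller example can be computed directly from the formula. A minimal sketch, with λ = 1 arrival per minute; the function name `poisson_pmf` is an assumption of this illustration.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """p(x; lambda) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)


lam = 1  # average of one arrival per minute

p_exactly_3 = poisson_pmf(3, lam)
p_at_most_4 = sum(poisson_pmf(x, lam) for x in range(5))  # x = 0, 1, 2, 3, 4

print(round(p_exactly_3, 3), round(p_at_most_4, 3))
```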
Assessing Learning
Activity 1
Directions: Encircle the correct answer.
For numbers 1-3. If x is binomially distributed with 5 trials and the probability of success is
equal to 10%
1. The probability of failure is
a. 100%
b. 90%
c. 80%
d. infinite
2. The number of repeated trials “n”
a. 10
b. 9
c. 5
d. 0
3. The probability of getting exactly 4 successes
a. 2%
b. 3%
c. 1%
d. 5%
4. The mean of a binomial distribution is
a. $\mu = np$
b. $\mu = npq$
c. $\mu = 0$
d. $\mu = \sqrt{n}$
Name: Date: Course/Year/Section:
Activity 2
Directions: Solve the following question.
The table below shows the average number of births per day in RJC Hospital.
Monday 15
Tuesday 23
Wednesday 25
Thursday 30
Friday 13
Saturday 10
Sunday 14
Based on the data, find the probability that a randomly selected baby will be
a. born on Monday
b. born on a weekend
Name: Date: Course/Year/Section:
Activity 3
Directions: Solve the following question.
1. A production line manufactures 6-gal paint cans. The probability of any one can being out
of tolerance is 0.03. If six cans are selected at random, what is the probability
2. A traffic control engineer reports that 78% of the vehicles passing through a checkpoint are
from within the state. What is the probability that
a. Fewer than 4 of the next nine vehicles are from out of state.
b. Exactly 4 of the next nine vehicles are from out of state.
LESSON 3: Normal Distribution
Setting up
Which of the following is/are not true about the normal distribution?
a. The mean, median, and mode are equal.
b. The mean is zero and the standard deviation is one.
c. It is symmetrical about the mean value.
d. The standard deviation does not affect the height of the normal curve.
3. The standard deviation of a standard normal distribution is
a. 0
b. 1
c. 2
d. 3
4. The mean value of the standard normal distribution is
a. 0
b. 1
c. 2
d. 3
5. About what percentage of the distribution lies within 2 standard deviations of
the mean?
a. 65%
b. 95%
c. 99.7%
d. cannot be determined.
Lesson Proper
Normal Distribution
In statistics, many variables are approximately normally distributed. A
child’s IQ, a person’s height or weight, an individual's blood pressure, a family’s income, the
breaking strength of ropes, and the life expectancy of electric bulbs are typical examples. When
large samples are obtained and graphed in a histogram, such variables typically take on a
characteristic bell shape, which is widely known as the normal distribution.
The normal curve has a very important role in inferential statistics. It provides a
graphical representation of statistical values that are needed in describing the
characteristics of populations as well as in making decisions.
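[Figure: standard normal curve with the z-axis marked from -3 to 3]
[Table: area under the normal curve, with z from 0.0 to 3.0 in the first column and the second decimal place of z (0.00 to 0.09) across the first row]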
Example:
1. If the z-score is 1.26, the corresponding area under the normal curve is found where the
row for 1.2 in the first column and the column for 0.06 in the first row intersect. The value
at the intersection is 0.3962. The value 0.3962 or 39.62% represents the area under the
normal curve between the mean and z = 1.26. This is interpreted to mean that the
probability that z lies between the mean and z = 1.26 is 0.3962 or 39.62%.
2. Z = 1.48
3. Z = - 1.48
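The table areas can also be computed rather than looked up. The sketch below uses the error function from Python's `math` module to get the area between the mean (z = 0) and a given z; the function name `area_mean_to_z` is an assumption of this example.

```python
from math import erf, sqrt


def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z.

    The curve is symmetric, so -z gives the same area as z.
    """
    return 0.5 * erf(abs(z) / sqrt(2))


print(round(area_mean_to_z(1.26), 4))
print(round(area_mean_to_z(1.48), 4))
print(round(area_mean_to_z(-1.48), 4))
```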
A standard normal curve is a normal probability distribution that has a mean of 0 and
a standard deviation of 1.
Example: Find the area under the normal curve
a. To the right of Z = 1.79
b. To the left of Z = 1.79
This time the areas under the normal curve are given and we are to determine the
corresponding Z- score
Z = 1.55
�� �� = ��. ��
When a random variable x is not in standard normal form, we convert the x values into
z-scores by using the formula
$z = \dfrac{X - \mu}{\sigma}$ (z-score for population data)
$z = \dfrac{X - \bar{x}}{s}$ (z-score for sample data)
Where:
z - standard score
µ - population mean
σ - population standard deviation
s – sample standard deviation
$\bar{x}$ – sample mean
X – raw score/given measurement
Example:
1. The following are the final exam results of Paulo in his three subjects.
In which subject did he perform best?
Subject      Grade   Mean   SD (standard deviation)
English 1    84      81     4.5
Math 1       83      75     6
P.E. 1       90      92     6.4
Solution:
$z = \dfrac{X - \bar{x}}{s}$
English 1: $z = \dfrac{84 - 81}{4.5} = 0.67$
Math 1: $z = \dfrac{83 - 75}{6} = 1.33$
P.E. 1: $z = \dfrac{90 - 92}{6.4} = -0.31$
Interpretation:
The z-scores indicate that Paulo performed best in Math 1. He did not perform
well in P.E. 1.
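scores. Find the z-value that corresponds to a score X = 58, and the z-values that
correspond to scores between 48 and 55.
Solution:
Given:
µ = 50
σ = 4
X = 58
a. $z = \dfrac{58 - 50}{4} = \dfrac{8}{4} = 2$
b. $z_1 = \dfrac{48 - 50}{4} = \dfrac{-2}{4} = -0.5$
c. $z_2 = \dfrac{55 - 50}{4} = \dfrac{5}{4} = 1.25$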
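The comparison of Paulo's three subjects can be sketched with a small z-score function. The dictionary layout (grade, mean, standard deviation) and the names used here are assumptions of this illustration.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


# (grade, class mean, standard deviation) for each subject
subjects = {
    "English 1": (84, 81, 4.5),
    "Math 1": (83, 75, 6),
    "P.E. 1": (90, 92, 6.4),
}

z_scores = {name: z_score(*values) for name, values in subjects.items()}
best_subject = max(z_scores, key=z_scores.get)

for name, z in z_scores.items():
    print(name, z)
print("Best relative performance:", best_subject)
```

A higher z-score means a better standing relative to the class, even when the raw grade is lower.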
Solution:
a. P (z<1.0)
Steps:
1. Draw a normal curve
2. Locate the z – value
3. Draw the line through the z – value
4. Shade the required region. Consult the z – table and find the area that
corresponds to z = 1.0
5. Use probability notation to form an equation showing the appropriate operation
to get the required area
6. Make a statement indicating the required area
For example, suppose you got a score of 82 in a test in statistics and probability and you
want to know how you fared in comparison with your classmates. If your teacher tells you that you
scored at the 90th percentile, it means that 90% of the grades were lower than yours and
10% were higher.
Example:
What is the corresponding z-score of 95th percentile under the normal curve?
Solution:
Steps:
1. Express the given percentage as a probability: 95% = 0.9500
2. Split 0.9500 into 0.5000 and 0.4500
3. Refer to the z-table (locate the area 0.4500)
4. Find the z-score that corresponds to 0.4500 from the leftmost column and the top row to
get the z-value
5. Describe the shaded region
a. 95% = 0.9500
b. 0.5000 and 0.4500
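Z – value
From the table, an area of 0.4506 corresponds to z = 1.65 and an area of 0.4495
corresponds to z = 1.64. The area 0.4500 lies between them.
By interpolation
z       Area
1.65    0.4506
z       0.4500
1.64    0.4495
$\dfrac{z - 1.64}{0.01} = \dfrac{0.4500 - 0.4495}{0.4506 - 0.4495} = \dfrac{5 \times 10^{-4}}{1.1 \times 10^{-3}} = 0.4545$
$z - 1.64 = 0.4545(0.01)$
$z = 4.545 \times 10^{-3} + 1.64$
$z = 1.645$
a. The shaded region is 95% of the distribution.
[Figure: normal curve with the lower 95% shaded and the upper 5% lying to the right of z]
b. Find the upper 10% of the normal curve
[Figure: normal curve with the upper 10% shaded]
1. 10% = 0.1000
2. 0.5000 – 0.1000 = 0.4000
3. Locate 0.4000; the nearest table value is 0.3997, which corresponds to z = 1.28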
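Going from an area back to a z-score can also be done without a table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); its `inv_cdf` method takes the total area to the left of z, so the 95th percentile is entered as 0.95 and the cutoff for the upper 10% as 0.90.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# z-score with 95% of the distribution below it.
z_95th = standard_normal.inv_cdf(0.95)

# z-score that cuts off the upper 10% (so 90% lies below it).
z_upper_10 = standard_normal.inv_cdf(0.90)

print(round(z_95th, 3), round(z_upper_10, 2))
```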
Since we are interested in the percentile rank of 84, we need the percentage of
scores below 84, so we transform 84 to a z-score.
Solution:
$z = \dfrac{X - \bar{x}}{s} = \dfrac{84 - 80}{15} = 0.27$
P (Z<0.27)
= 0.6064
Example:
1. For a certain type of iPad, the length of time between charges of the battery is normally
distributed with a mean of 24 hours and a standard deviation of 4 hours. Migelle owns one
of these iPads and wants to know the probability that the length of time will be
a. Less than 16 hours.
b. More than 26 hours.
c. Exactly 18 hours.
Solution:
Given:
$\bar{x}$ = 24 hrs
s = 4
a. X < 16 hrs
$z = \dfrac{X - \bar{x}}{s} = \dfrac{16 - 24}{4} = \dfrac{-8}{4} = -2$
P (z < -2.0) = 0.5000 - 0.4772 = 0.0228
b. X > 26 hrs
$z = \dfrac{26 - 24}{4} = \dfrac{2}{4} = 0.5$
P (z > 0.5) = 0.5000 – 0.1915 = 0.3085
c. X = 18 hrs
$z = \dfrac{18 - 24}{4} = \dfrac{-6}{4} = -1.5$
The area between z = -1.5 and the mean is 0.4332. Because the normal distribution is
continuous, the probability of exactly 18 hours is P (X = 18) = 0.
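The 16-hour and 26-hour cases worked in the solution can be checked with `statistics.NormalDist` (Python 3.8+). This is a sketch; the variable names are chosen for this illustration.

```python
from statistics import NormalDist

# Time between charges: mean 24 hours, standard deviation 4 hours.
charge_time = NormalDist(mu=24, sigma=4)

p_less_than_16 = charge_time.cdf(16)        # area to the left of z = -2.0
p_more_than_26 = 1 - charge_time.cdf(26)    # area to the right of z = 0.5

print(round(p_less_than_16, 4), round(p_more_than_26, 4))
```

For a continuous distribution, only intervals have nonzero probability; the probability of any single exact value is 0.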
2. Engr. Cruz decides to exempt the upper 5% of the class from taking the final exam.
Exam marks are roughly normally distributed with $\bar{x}$ = 72 and s = 6. What mark must a
student make in order to be exempted?
Solution:
Given:
$\bar{x}$ = 72
s = 6
Required: X
[Figure: normal curve with 45% of the area between the mean and z, and 5% above z]
The area between the mean and z = 0.5000 – 0.05
= 0.4500
From the table of areas under the normal curve, the corresponding z value = 1.645
From $z = \dfrac{X - \bar{x}}{s}$, we get $X = zs + \bar{x}$
X = 1.645 (6) + 72
= 81.87
= 82, the mark a student must make in order to be exempted from taking the final exam
3. Two hundred college freshmen have grades that follow a normal distribution with a
mean of 2.2 and a standard deviation of 0.6. How many of these freshmen have
grades between 2.0 and 2.5?
Solution:
Given:
$\bar{x}$ = 2.2
s = 0.6
2.0 < x < 2.5
$z_1 = \dfrac{2.0 - 2.2}{0.6} = -0.33$
$z_2 = \dfrac{2.5 - 2.2}{0.6} = 0.5$
P (-0.33 < z < 0.5) = 0.1293 + 0.1915 = 0.3208
The number of students having grades between 2.0 and 2.5 =
(probability) (total number of students)
= 0.3208 (200)
= 64.16 = 64 students
Normal Approximation to the Binomial Distribution
If x is a binomial random variable with mean μ = np and variance σ^2 = npq, then the
limiting form of the distribution of
$Z = \dfrac{X - np}{\sqrt{npq}}$
as n becomes large is the standard normal distribution.
Example:
Let x be the number of times that a fair coin, flipped 30 times, lands tails. Find the
probability that x = 20. Use the normal approximation and then compare it to the exact
solution.
Solution:
Let x denote the number of tails
Then p = 1/2
$\mu = np = 30\left(\tfrac{1}{2}\right) = 15$
$\sigma^{2} = npq = 30\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) = 7.5$
$z_1 = \dfrac{19.5 - 15}{\sqrt{7.5}} = 1.64$
$z_2 = \dfrac{20.5 - 15}{\sqrt{7.5}} = 2.0$
P ( Z < 2.0) = .5 + .4772 = .9772
P ( Z < 1.64) = .5 + .4495 = .9495
P (1.64 < Z < 2.0) = .9772 – 0.9495
= 0.0277
The exact result is
$b\left(20; 30, \tfrac{1}{2}\right) = \binom{30}{20}\left(\tfrac{1}{2}\right)^{20}\left(\tfrac{1}{2}\right)^{10} = 0.0280$
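The comparison between the exact binomial probability and its normal approximation can be sketched as follows. The continuity correction (using 19.5 and 20.5 as the limits for x = 20) follows the worked solution; the variable names are assumptions of this example.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, x = 30, 0.5, 20

# Exact binomial probability of exactly 20 tails in 30 flips.
exact = comb(n, x) * p ** x * (1 - p) ** (n - x)

# Normal approximation with mean np and standard deviation sqrt(npq),
# using the continuity correction x - 0.5 to x + 0.5.
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(x + 0.5) - approx_dist.cdf(x - 0.5)

print(round(exact, 4), round(approx, 4))
```

The two values agree to about three decimal places, which is why the approximation is useful when n is large.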
Assessing Learning
Activity 1
Direction: Solve the following problems.
1. Find the area under the standard normal distribution curve
a. To the right of -0.44
3. Find the probabilities for each using the standard normal distribution.
a. P (z < - 1.8)
b. P (z > - 0.92)
c. P (-2.3 < z < 0.79)
Assessing Learning
Activity 2
Direction: Solve the following problems.
1. The average time a person spends in a zoo is 62 minutes. The standard deviation is 12
minutes. If a visitor is selected at random, find the probability that the person spends
a. At least 80 minutes.
b. More than 90 minutes.
probability that the height of a child, picked at random, is less than 104 cm?
b. How many children belong to the upper 15% of the group?
Assessing Learning
Activity 3
1. An Engineering student commutes daily from home to his midtown office. The
average time for an office trip is 26 minutes, with a standard deviation of 3.6 minutes.
Assume that the trip times are normally distributed.
b. If the office opens at 8:30 AM and he leaves his house at 8:10 AM daily, what percentage
of the time is he late for work?
c. Find the length of time above which we find the slowest 15% of the trips.
2. The probability that a patient recovers from a delicate heart operation is 0.85. Of the next
110 patients having this operation, what is the probability that
UNIT IV. Sampling Distribution
Overview
Learning Objectives
Topics
Setting up
1. Lorry wants to know, with 95% confidence, the proportion of households who like to
use detergent X. A previous survey showed that 42% like to use detergent X. Lorry
would like to be accurate within 2% of the true proportion. What sample size does Lorry
need?
2. Joshua wants to replicate a study where the lowest observed value is 12.4 and the
highest is 12.8. He wants to estimate the population mean to within an error of 0.025 of
its true value. Using a 99% confidence level, find the sample size that he needs.
Lesson Proper
The set of all possible values of the sample mean, along with the probabilities of
occurrence of those values, is called the sampling distribution of the sample mean.
Example:
Given:
City X
A 42
B 39
C 36
D 33
E 30
a. $\mu = \sum x \cdot p(x)$
= 30(.2) + 33(.2) + 36(.2) + 39(.2) + 42(.2)
= 36
b. The number of samples of size 3 from this population is equal to the
number of combinations possible when selecting 3 cities from five.
$_5C_3 = \dfrac{5!}{(5-3)!\, 3!} = \dfrac{5!}{2!\, 3!} = \dfrac{5 \cdot 4}{2 \cdot 1}$
= 10
Samples    X             Sample mean ($\bar{x}$)
A, B, C    42, 39, 36    39
A, B, D    42, 39, 33    38
A, B, E    42, 39, 30    37
A, C, D    42, 36, 33    37
A, C, E    42, 36, 30    36
A, D, E    42, 33, 30    35
B, C, D    39, 36, 33    36
B, C, E    39, 36, 30    35
C, D, E    36, 33, 30    33
B, D, E    39, 33, 30    34
c. In random sampling, each of the samples is equally likely to be selected. The probability
of selecting a sample with mean 39 is 0.1, since only one of the 10 samples has a mean of 39.
$\bar{x}$          33   34   35   36   37   38   39
$P(\bar{x})$     .1   .1   .2   .2   .2   .1   .1
Compute:
a. Mean of the sample mean
b. Standard error of the mean.
Solution:
a. $\mu_{\bar{x}} = \sum \bar{x} \cdot P(\bar{x})$
= 33(.1) + 34(.1) + 35(.2) + 36(.2) + 37(.2) + 38(.1) + 39(.1)
= 36
The Standard Error of the mean
$\sigma_{\bar{x}} = \sqrt{3}$
= 1.73
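The sampling distribution in this example can be generated by listing every sample of size 3 from the five cities. The sketch below uses the city values from the table; the variable names are assumptions of this illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

values = {"A": 42, "B": 39, "C": 36, "D": 33, "E": 30}

# Every possible sample of 3 cities (order does not matter).
samples = list(combinations(values, 3))
sample_means = [mean(values[city] for city in sample) for sample in samples]

# Mean of the sample means and standard error of the mean.
mean_of_means = mean(sample_means)
standard_error = sqrt(mean((m - mean_of_means) ** 2 for m in sample_means))

print(len(samples), mean_of_means, round(standard_error, 2))
```

The mean of the sample means equals the population mean of 36, and the standard error is the square root of 3.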
The relationship between the mean of the sample mean and the population mean is expressed
by $\mu_{\bar{x}} = \mu$
The relationship between the variance of the sample mean and the population variance is
expressed by
$\sigma^{2}_{\bar{x}} = \dfrac{\sigma^{2}}{n} \times \dfrac{N - n}{N - 1}$
Sampling Error
When the sample mean is used to estimate the population mean, an error is usually
made. This error, called the sampling error, is the absolute difference between the sample
mean and the population mean.
$|\bar{x} - \mu|$
Example: in example 1, give the sampling errors and probabilities associated with all the
different sample means.
��̅ Sampling error Probability
33 3 .1
34 2 .1
35 1 .2
36 0 .2
37 1 .2
38 2 .1
39 3 .1
It is seen that the probability of no sampling error in this scenario is 0.20. There is a
60% chance that the sampling error is 1 or less.
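The figures in Example 1 can be checked by brute-force enumeration. The sketch below is only an illustration in Python (the variable names are assumptions, not part of the module). It lists every sample of 3 cities, recomputes the mean of the sample means, the standard error, and the probability that the sampling error is 1 or less, and compares the variance of the sample means with the finite-population formula (σ²/n)(N − n)/(N − 1).

```python
from itertools import combinations
from math import sqrt

# Population values from Example 1 (cities A to E).
population = {"A": 42, "B": 39, "C": 36, "D": 33, "E": 30}
values = list(population.values())
N, n = len(values), 3

# Population mean and variance (each city has probability 1/N).
mu = sum(values) / N
sigma2 = sum((x - mu) ** 2 for x in values) / N

# Every possible sample of size n; each one is equally likely.
sample_means = [sum(s) / n for s in combinations(values, n)]
p = 1 / len(sample_means)

mean_of_means = sum(m * p for m in sample_means)
var_of_means = sum((m - mean_of_means) ** 2 * p for m in sample_means)
std_error = sqrt(var_of_means)

# Finite-population relationship between the two variances.
var_from_formula = (sigma2 / n) * (N - n) / (N - 1)

# Probability that the sampling error |x_bar - mu| is at most 1.
p_error_le_1 = sum(p for m in sample_means if abs(m - mu) <= 1)

print(len(sample_means), round(mean_of_means, 2), round(std_error, 2))
print(round(var_of_means, 4), round(var_from_formula, 4), round(p_error_le_1, 2))
```

Enumeration is practical here only because the population has five members; for larger populations the formulas are the usual route.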
Mean and Standard Deviation of the Sample Mean
Since the sample mean has a distribution, it is a random variable and has a mean
and a standard deviation. The mean of the sample mean is denoted $\mu_{\bar{x}}$, and the standard deviation of the
sample mean is referred to as the standard error of the mean:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Shape of the sampling distribution of the sample mean and the central limit theorem
If samples are selected from a population which is normally distributed with mean µ
and standard deviation σ, then the distribution of sample means is normally distributed,
the mean of the distribution is $\mu_{\bar{x}} = \mu$, and the standard deviation of this distribution is
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
The shape of the distribution of the sample means is normal, or bell shaped, regardless of the
sample size.
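The central limit theorem states that when sampling from a large population of any distribution shape, the
sample means have an approximately normal distribution whenever the sample size is 30 or more. The mean
of the distribution of sample means is $\mu_{\bar{x}} = \mu$ and the standard deviation of this distribution is
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
The corresponding z value is
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$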
Example:
1. The average time it takes a group of college students to complete a certain examination is
50 minutes. The standard deviation is 9 minutes. Assume that the variable is normally
distributed. If 50 randomly selected college students take the examination, what is the probability
that the mean time it takes the group to complete the test will be less than 44 minutes?
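Solution:
µ = 50
σ = 9
$\bar{x}$ = 44
n = 50
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{44 - 50}{9 / \sqrt{50}} = \frac{-6\sqrt{50}}{9} = -4.71$
$P(\bar{x} < 44) = P(Z < -4.71) \approx 0.000001$
The probability is essentially zero, since a sample mean of 44 minutes lies more than four standard errors below µ.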
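As a numerical check of the z value for a sample mean, the following sketch applies Z = (x̄ − µ)/(σ/√n) to the values stated in the example (µ = 50, σ = 9, n = 50, x̄ = 44). The function name is hypothetical, and `NormalDist` from Python's standard library stands in for the printed normal table.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_below(x_bar, mu, sigma, n):
    """Return (z, P(sample mean < x_bar)) for a normal sampling distribution."""
    standard_error = sigma / sqrt(n)
    z = (x_bar - mu) / standard_error
    return z, NormalDist().cdf(z)

z, p = prob_sample_mean_below(x_bar=44, mu=50, sigma=9, n=50)
print(round(z, 2), p)
```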
Estimation
An interval estimate is a range (or an interval) of values that is likely to contain the true value of the
population parameter. A degree of confidence can be assigned before an interval estimate is
made. The confidence level is the probability that the interval estimate will contain the
parameter.
Margin of error E
When sample data are used to estimate a population mean µ, the margin of error E is
the maximum likely (with probability 1 − α) difference between the observed sample mean $\bar{x}$
and the true value of the population mean µ:
$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
$\bar{x} - E < \mu < \bar{x} + E$
$\bar{x} - z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) \le \mu \le \bar{x} + z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$
Given:
$\bar{x} = 100$
$\sigma^2 = 25$
The Confidence Coefficient
$1 - \alpha = 0.90$
$\alpha = 0.10$
$\alpha/2 = 0.05$
$z_{\alpha/2} = 1.65$
Computing for the confidence interval
$\mu = \bar{x} \pm z_{\alpha/2} \left[ \frac{\sigma}{\sqrt{n}} \right]$
$= 100 \pm 1.65 \left[ \frac{5}{\sqrt{30}} \right]$
$= 100 \pm 1.51$
$= 98.49 \le \mu \le 101.51$
We are 90% confident that the true population mean of all the prices is found between 98.49
and 101.51.
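The interval can be recomputed with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, the critical value 1.65 is the rounded table value for 90% confidence, and n = 30 is assumed because it is the sample size that reproduces the limits 98.49 and 101.51.

```python
from math import sqrt

def z_interval(x_bar, sigma, n, z_crit):
    """Confidence interval for a mean when sigma is known."""
    margin = z_crit * sigma / sqrt(n)      # E = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# x_bar = 100, sigma = sqrt(25) = 5, assumed n = 30, table z = 1.65
lower, upper = z_interval(x_bar=100, sigma=5, n=30, z_crit=1.65)
print(round(lower, 2), round(upper, 2))
```

Using the more precise critical value 1.645 instead of 1.65 shifts the limits only in the second decimal place.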
Example:
A random sample of 40 students entering statistics major has the following GPAs.
4.0 3.5 3.0 3.3 3.8 3.1 3.6 4.0 3.9 3.5
3.2 3.0 3.5 3.2 3.0 3.2 4.0 3.0 3.4 3.0
3.0 2.8 3.6 3.0 3.2 3.5 3.2 2.8 3.3 3.1
3.2 2.9 3.0 2.8 4.0 3.7 3.0 3.3 3.2 2.8
Solution:
a. $\bar{x} = \frac{4.0 + 3.2 + 3.0 + 3.2 + 3.5 + 3.0 + \dots + 2.8}{40} = \frac{131.6}{40} = 3.29$
b. 95% confidence interval
We can say with 95% confidence that the interval between 3.18 and 3.40 contains the true
mean GPA of the population, based on the sample GPAs of 40 students entering the statistics major.
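The GPA interval can be reproduced from the 40 listed values. The sketch below assumes the usual large-sample approach, which the module does not show explicitly: the sample standard deviation s stands in for σ, and the 95% critical value is z = 1.96. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

gpas = [
    4.0, 3.5, 3.0, 3.3, 3.8, 3.1, 3.6, 4.0, 3.9, 3.5,
    3.2, 3.0, 3.5, 3.2, 3.0, 3.2, 4.0, 3.0, 3.4, 3.0,
    3.0, 2.8, 3.6, 3.0, 3.2, 3.5, 3.2, 2.8, 3.3, 3.1,
    3.2, 2.9, 3.0, 2.8, 4.0, 3.7, 3.0, 3.3, 3.2, 2.8,
]

n = len(gpas)
x_bar = mean(gpas)          # sample mean
s = stdev(gpas)             # sample standard deviation (n - 1 divisor)

z_crit = 1.96               # assumed table value for 95% confidence
margin = z_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(n, round(x_bar, 2), round(s, 3))
print(round(lower, 2), round(upper, 2))
```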
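For a small sample with σ unknown, the margin of error uses the t distribution with n − 1 degrees of freedom:
$E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
Where E is the margin of error, s is the sample standard deviation, and
n = number of observations in the sample.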
Example:
98.6 98.6 98 98 99 98.4 98.4 98.4 98.4 98.6
For these scores, n = 10, $\bar{x}$ = 98.44, and S = 0.3.
Solution:
Degrees of freedom = n – 1
= 10 – 1
=9
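With 9 degrees of freedom, $t_{\alpha/2} = t_{0.005} = 3.250$, so $E = 3.250 \left( \frac{0.3}{\sqrt{10}} \right) = 0.31$.
Based on the sample of 10 scores, we are 99% confident that the limits of 98.13 and 98.75
contain the true mean score.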
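A short check of this small-sample interval is sketched below. It first recomputes the summary statistics from the ten scores, then builds the interval with the rounded S = 0.3 used in the example. The critical value 3.250 (t for 9 degrees of freedom at 99% confidence) is copied from a standard t table rather than computed, and the names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

scores = [98.6, 98.6, 98, 98, 99, 98.4, 98.4, 98.4, 98.4, 98.6]

n = len(scores)
x_bar = mean(scores)
s = stdev(scores)                 # about 0.295, rounded to 0.3 in the example
degrees_of_freedom = n - 1

t_crit = 3.250                    # t table value: df = 9, 99% confidence
s_rounded = round(s, 1)           # follow the example and use S = 0.3
margin = t_crit * s_rounded / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(degrees_of_freedom, round(x_bar, 2), round(s, 3))
print(round(lower, 2), round(upper, 2))
```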
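$\hat{p} = \frac{x}{n}$
Where
$\hat{p}$ - sample proportion.
n - number of observations.
x - number of observations having the particular value of interest.
$E = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$ ; $\hat{q} = 1 - \hat{p}$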
Confidence interval for a population proportion p
$n = \frac{(z_{\alpha/2})^2 \, \hat{p}\hat{q}}{E^2}$
b. When no estimate of p is known
$n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2}$ ; $\sigma = \frac{R}{4}$ ; R is the range
Example:
2. When 500 college students are randomly selected and surveyed, it is found that 135
of them own personal computers.
a. Find the point estimate of the true proportion of all college students who own
personal computers.
b. Find a 95% confidence interval for the true proportion of all college students
who own personal computers.
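Solution:
a. $\hat{p} = \frac{x}{n} = \frac{135}{500} = 0.270$
$\hat{q} = 1 - 0.270 = 0.730$
$E = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}} = 1.96 \sqrt{\frac{(0.270)(0.730)}{500}}$
= 0.039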
Therefore
$\hat{p} - E < p < \hat{p} + E$
0.270 − 0.039 < p < 0.270 + 0.039
Or
0.231 < p < 0.309
Thus, with 95% confidence, we can state that the interval from 23.1% to 30.9% contains the
true percentage of all students who own personal computers.
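The proportion example can be verified with the short sketch below. The function name is an assumption made for illustration, and 1.96 is the usual table value for 95% confidence.

```python
from math import sqrt

def proportion_interval(x, n, z_crit):
    """Point estimate, margin of error, and interval for a proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    margin = z_crit * sqrt(p_hat * q_hat / n)   # E = z * sqrt(pq / n)
    return p_hat, margin, p_hat - margin, p_hat + margin

p_hat, margin, lower, upper = proportion_interval(x=135, n=500, z_crit=1.96)
print(round(p_hat, 3), round(margin, 3), round(lower, 3), round(upper, 3))
```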
References
Walpole, R.E., Myers, R.H., Myers, S.L., and Ye, K. (2002). Probability and Statistics for Engineers
and Scientists, 7th ed. Pearson Education, Inc.
Assessing Learning
Activity 1
2. The dean of ABC University wishes to estimate the average age of students presently
enrolled. From past studies, the standard deviation is known to be 2 years. A sample of
students is selected and the mean is found to be 19.2 years. Find the 99%
confidence interval.
3. An instructor wants to estimate the difference in grade point average between two
groups of college students accurate to within 0.2 grade point, with probability equal to
0.95. If the standard deviation of the grade point measurement is equal to 0.6, how
many students must he include in each group? Use 95% confidence.
4. It has been reported that the average daily intake of calories for young women is
1667. To see if this is true for nurses, a researcher sampled 15 nurses and found that
their average daily intake of calories was 1593, with a standard deviation of 36 calories.
Find the 90% confidence interval of the mean.
UNIT V. TESTING OF HYPOTHESIS
Overview
Module V deals with hypothesis testing using parametric tests: the z-test and the t-test. The z-test is
presented for the two-sample test of means, and the t-test for independent samples and for correlated samples. These
tests are used to test the significance of the difference between a single pair of samples.
Learning Objectives
Topics
5.1 Z-test
5.1.1 Z-test for a two sample mean test
5.2 T-test
5.2.1 T-test for independent samples
5.2.2 T-test for correlated samples
Setting up
4. Which of the following test statistic/s will be utilized when $H_0$ is non directional?
a. One tailed test
b. Two tailed test
c. Both a and b
d. Neither a nor b
Lesson Proper
TESTING OF HYPOTHESIS
Critical Values – the values of the test statistic that separate the rejection region from the non-rejection region.
One Tailed / Two Tailed Tests
A one tailed test is a hypothesis test for which the rejection region lies at only one tail of the
distribution. It is classified as a left or right tailed test. If the alternative hypothesis states that the
population mean is less than the specified value $\mu_0$, then it is a left tailed test, for which the
alternative hypothesis can be expressed as $\mu < \mu_0$. It is a right tailed test if the alternative
hypothesis states that the population mean ($\mu$) is greater than the specified value $\mu_0$, for which
the alternative hypothesis can be expressed as $\mu > \mu_0$.
A two tailed test is used when the alternative hypothesis is non-directional which
means that the values of two measures of the same kind are not equal. The rejection regions
lie on both end tails of the distribution.
Level of Significance
The level of significance, $\alpha$, corresponds to the area in the critical region. For hypothesis testing, it
is customary to use an $\alpha$ of 5% or 1%, which means that we are
willing to commit an $\alpha$ error of 5% or 1% as the case may be. It also implies that we are
95% or 99% confident in making decisions.
Type II error – is when we accept or fail to reject the null hypothesis when the alternative
hypothesis is true.
Note: Reject the null hypothesis when the computed value of z lies within the area of rejection.
Steps in Hypothesis Testing
1. State the null hypothesis $H_0$.
$H_0 : \mu = \mu_0$
2. Choose an appropriate alternative hypothesis $H_a$.
$H_a : \mu < \mu_0$ (one tailed test - left)
$\mu > \mu_0$ (one tailed test - right)
$\mu \ne \mu_0$ (two tailed test)
3. Specify the level of significance to be used.
4. Select an appropriate test statistic and determine the critical value of the test
statistic.
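5. Compute the value of the test statistic using the sample data.
z test: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
t test: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
Decision Rule
Reject $H_0$ if the computed value of the test statistic lies within the area of rejection.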
Examples:
A. The treasurer of a certain university claims that the mean salary of their college professors
Solution:
1. $H_0 : \mu = 21,750$
$H_a : \mu < 21,750$
2. Test Statistic
$\alpha = 0.05$
$n = 75$
$\bar{x} = 19,375$
$s = 6,000$
$z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
4. Computation
$z_c = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{19,375 - 21,750}{6,000 / \sqrt{75}}$
$z_c = -3.43$
5. Decision:
Reject $H_0$, since $z_c < -z_\alpha$ ($-3.43 < -1.645$), and conclude that the mean salary of the
professors is lower than 21,750.
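The computation and decision in this example can be sketched in Python as follows. The function name is hypothetical, and the left-tail critical value is taken from the standard library's `NormalDist` rather than a printed table, so it appears as about −1.645.

```python
from math import sqrt
from statistics import NormalDist

def left_tailed_z_test(x_bar, mu_0, s, n, alpha):
    """Return (computed z, critical z, reject H0?) for Ha: mu < mu_0."""
    z_computed = (x_bar - mu_0) / (s / sqrt(n))
    z_critical = NormalDist().inv_cdf(alpha)   # negative value in the left tail
    return z_computed, z_critical, z_computed < z_critical

z_c, z_crit, reject = left_tailed_z_test(
    x_bar=19_375, mu_0=21_750, s=6_000, n=75, alpha=0.05
)
print(round(z_c, 2), round(z_crit, 3), reject)
```

For a right-tailed or two-tailed alternative, the comparison and the critical value would change accordingly; this sketch covers only the left-tailed case used in the example.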