CE 204
UNCERTAINTY AND DATA
ANALYSIS
SUPPLEMENTARY
LECTURE NOTES
M. Semih YÜCEMEN
Department of Civil Engineering
Middle East Technical University
February, 2023
© 2023 Mehmet Semih Yücemen All Rights Reserved
All publication rights of this work belong to the author.
PREFACE
These Supplementary Lecture Notes are for the use of the students enrolled in the CE 204
Uncertainty and Data Analysis course given during the Spring 2022-2023 semester at the
Department of Civil Engineering, Middle East Technical University. They may be used for
educational purposes without the written permission of the author by citing the source, which
is:
“Yücemen, M. S, Supplementary Lecture Notes, CE 204 Uncertainty and Data Analysis, Spring
2022-2023 Semester, Department of Civil Engineering, Middle East Technical University,
Ankara, Turkey, February, 2023”.
Since the main reference book for the course is Probability Concepts in Engineering Planning
and Design, Vol. I, First edition, 1975/ Second edition, 2007, by A.H.-S. Ang and W.H. Tang,
Wiley, a few examples are directly taken from the two editions of this book.
I have taught this civil engineering-oriented basic statistics course together with a number of
colleagues for many years, in multiple sections. I thank all of them, especially Dr. Engin
Karaesmen, who authored Chapter 7, and Prof. Dr. Tuğrul Yılmaz, for his contributions and
suggestions to some of the example problems in Chapter 6. A number of teaching assistants
also contributed by carrying out the numerical calculations for a few of the examples.
I commemorate with gratitude my co-advisors Prof. Alfredo H.-S. Ang and the late Prof. Wilson
Tang, who were among the pioneers of Structural Reliability, and Risk and Decision Analysis
in systems planning and design, for leading me to focus on these topics.
M. Semih Yücemen
Ankara, February, 2023
CONTENTS
Chapter 1 INTRODUCTION
Chapter 2 BASIC PROBABILITY CONCEPTS AND RULES
Chapter 3 RANDOM VARIABLES
Chapter 4 IMPORTANT PROBABILITY DISTRIBUTIONS
Chapter 5 MULTIPLE (MULTIVARIATE) RANDOM VARIABLES
Chapter 6 FUNCTIONS OF RANDOM VARIABLES
Chapter 7 BRIEF INFORMATION ON STATISTICS**
Chapter 8 SOME BASIC CONCEPTS OF STATISTICAL INFERENCE
Chapter 9 FITTING PROBABILISTIC MODELS TO OBSERVED DATA
Chapter 10 BASIC CONCEPTS OF SIMPLE LINEAR REGRESSION
______________________
**This chapter was originally prepared by Dr. Engin Karaesmen within the scope of the undergraduate
course CE 204 Uncertainty and Data Analysis.
Chapter 1
INTRODUCTION
The rational treatment and assessment of uncertainties in civil engineering have received
particular attention in the last six decades. In most cases, loading conditions, material
properties, geometry, and various other parameters show considerable variations. Observations
and measurements of physical processes as well as parameters exhibit random characteristics.
On top of these, modelling, workmanship, and human errors (gross errors) create additional
uncertainties.
Uncertainties give rise to the risk of unsatisfactory performance and failure, which may cause loss
of life and property. Therefore, the management of risk is the most important issue, not only in
civil engineering but also in every other field, even in our daily lives. Statistical and
probabilistic procedures provide a sound and rational framework for processing these
uncertainties.
Uncertainties and their effect on the safety of engineering structures can only be evaluated
rationally through probabilistic and statistical methods. Accordingly, the design and analysis of
structural systems have to be based on “stochastic” concepts.
In the following, the classical (deterministic) and probabilistic (stochastic) approaches to civil
engineering problems are compared, emphasizing the merits of the probabilistic methods and
the deficiencies of the deterministic approach.
Classical (Deterministic) Approach
(i) It is not possible to quantify uncertainties explicitly.
(ii) Load (demand) and resistance (capacity) parameters are single-valued (deterministic) and
safety factors are used for achieving safety.
(iii) The risk (failure probability) associated with the design is unknown.
(iv) There is no systematic procedure for adjusting the safety factor based on the additional
information and data acquired.
VULNERABILITY: The expected degree of loss resulting from the occurrence of the
phenomenon.
RISK: The likelihood or probability of a given hazard of a given level causing a particular level
of loss or damage. The elements at risk are: populations, communities, the built environment,
the natural environment, economic activities, and services that are under threat of disaster in a
given area. Mathematically, total risk can be written as:
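One widely used form of this relationship expresses the total risk as the product of the hazard, the vulnerability, and the elements at risk:

Total Risk = Hazard × Vulnerability × Elements at Risk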
In Table 1.2, several examples of the different types of uncertainties encountered in civil
engineering are given.

Table 1.2 Examples of Different Types of Uncertainties
Uncertainties in Loads — Examples:
Accidental loads: fire, explosion, moving vehicle impact
Example 1.1
Compressive strength of concrete (fc): 21.55 MPa, 1.25, 0.105, 0.14, 0.18
Yield strength of BC III steel (fy): 365 MPa, 1.24, 0.038, 0.08, 0.09
1.4 MEASURING RISK USING BASIC RELIABILITY THEORY
Let R and S be two random variables describing capacity and demand, respectively. Referring
to the following figure, consider the following definitions of failure and limit state:

Failure: R ≤ S

P[LS] = P[R − S < 0] = pF

where, for normally distributed and statistically independent R and S (Φ denoting the standard normal CDF):

pF = 1 − Φ[(μR − μS) / (σ²R + σ²S)^(1/2)]
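As a small numerical sketch of this calculation (assuming the SciPy library is available and using purely illustrative values for the capacity and demand statistics):

# Sketch: failure probability pF = 1 - Phi[(muR - muS)/sqrt(sigR^2 + sigS^2)]
# for assumed, illustrative capacity (R) and demand (S) statistics.
from scipy.stats import norm

mu_R, sigma_R = 25.0, 3.0   # illustrative capacity mean and standard deviation
mu_S, sigma_S = 15.0, 4.0   # illustrative demand mean and standard deviation

beta = (mu_R - mu_S) / (sigma_R**2 + sigma_S**2) ** 0.5   # reliability index
p_F = 1.0 - norm.cdf(beta)                                # failure probability

print(f"beta = {beta:.2f}, pF = {p_F:.4f}")               # beta = 2.00, pF = 0.0228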
The purpose of quantitative risk assessment (QRA) is to calculate a value for the risk to enable
improved risk communication and decision-making.
Several frameworks for QRA have been proposed by many experts. The QRA frameworks have
a common intention to find answers to the following questions (Ho et al. 2000, Lee and Jones
2004):
1) Danger Identification [What are the probable dangers/problems?]
2) Hazard Assessment [What would be the magnitude of dangers/problems?]
3) Consequence/Elements at Risk Identification [What are the possible consequences and/or
elements at risk?]
4) Vulnerability Assessment [What might be the degree of damage in elements at risk?]
5) Risk Quantification/Estimation [What is the probability of damage?]
6) Risk Evaluation [What is the significance of estimated risk?]
7) Risk Management [What should be done?]
1. Ho, K., Leroi, E. and Roberds, B. (2000), “Quantitative Risk Assessment: Application,
Myths and Future Direction”, GeoEng 2000, Technomic Publishing, pp. 269-312.
2. Lee, E.M. and Jones, D.K.C. (2004), Landslide Risk Assessment, Thomas Telford
Publishing, London.
Chapter 2
BASIC PROBABILITY CONCEPTS AND RULES
2.1. BASIC CONCEPTS OF STATISTICS
Statistics is directly related to data and provides the scientific tools and methodology for the
collection of data (sampling and design of experiments), the description of data (descriptive
statistics), and the processing of data to derive the maximum information from it, utilizing the methods
of statistical inference (estimation and hypothesis testing). The basic steps of statistical data
analysis are summarized in Fig. 2.1.
Figure 2.1 Basic steps of statistical data analysis: a random sample of size n is drawn from the population (described by parameters, e.g. μ, σ); sample statistics (e.g. X̄, s) are computed from the sample and used in statistical inference (estimation and hypothesis testing) about the population parameters.
The basic aim of statistical data analysis is to estimate the unknown values of population
parameters. Population parameters are generally shown by Greek letters. For example, you may
be interested in the mean value (μX) and the variability (quantified by the standard deviation, σX) of the
yield stress of steel bars, X, produced by a certain steel plant. Since it is not possible to test all the
steel bars produced by this steel plant, which in this example forms the population, a random
sample, of size n, is taken from the population and is analyzed. In random sampling, all members
of the population have an equal chance of being selected. Based on this random sample, the sample
mean, X̄, and the sample standard deviation, sX, are computed using the following standard equations:

X̄ = (1/n) Σ(i=1 to n) Xi

sX = √[ (1/(n − 1)) Σ(i=1 to n) (Xi − X̄)² ]
The values derived from data, like the sample mean, X̄, and the sample standard deviation, sX, are
called statistics and are used to estimate the population parameters μX and σX, respectively. This
process is called statistical inference, where the methods of estimation and hypothesis testing are
implemented.
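As a short computational sketch of these two formulas (plain Python, with hypothetical yield stress values in MPa):

# Sketch: sample mean and sample standard deviation (n - 1 in the denominator)
# for a hypothetical random sample of yield stress measurements (MPa).
data = [412.0, 405.5, 398.2, 420.1, 415.3, 402.8, 409.9, 411.4]

n = len(data)
x_bar = sum(data) / n
s_x = (sum((x - x_bar) ** 2 for x in data) / (n - 1)) ** 0.5

print(f"sample mean = {x_bar:.2f} MPa, sample standard deviation = {s_x:.2f} MPa")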
Set theory is a convenient tool for performing operations on events. Therefore, some concepts and
notations from set theory are given below:
An event is a set of outcomes. An elementary event is a single outcome (a single point on the
sample space), whereas a compound event has multiple outcomes.
Sample space S, is a set of all possible experimental outcomes and it represents the sure event.
Sample space is sometimes referred to as the universal set and denoted by U. The sample space
may also be described by the outcome tree (tree diagram).
Events are usually denoted by capital letters. For example, failure and survival of a structural
element can be denoted by F and S, respectively.
Example 2.1 Assume there are two graders available at a construction site. Let the events F and S
be defined as follows: F: the grader is in a failed condition (cannot operate) and S: the grader is in
a satisfactory condition (can operate). The sample space, which is defined as: S: {SS, SF, FS, FF},
is shown in Fig. 2.2.
• SS (Both graders are in satisfactory condition).
• SF (First grader is in satisfactory condition and the other one failed).
• FS (First grader failed and the other one is in satisfactory condition).
• FF (Both graders failed).

Figure 2.2 Sample space, S, for Example 2.1
The event A, defined as “both graders failed”, A = {FF}, contains only one sample point and is a
simple event. On the other hand, the event B, defined as “at least one grader failed”, B = {SF, FS,
FF}, contains three sample points and is a compound event.
The sample space can also be described by an outcome tree (tree diagram) as shown in Fig. 2.3.
First Grader → Second Grader → Outcome
S → S : SS
S → F : SF
F → S : FS
F → F : FF

Figure 2.3 Outcome tree (tree diagram) for Example 2.1
The union of two events will be denoted as A ∪ B = C, where event C corresponds to the outcomes
of either A or B or both. The corresponding Venn diagram is shown in Fig. 2.4. Event C is the
crosshatched area.
A B
Figure 2.4 Venn diagram for the union of the events A and B
The intersection of two events will be denoted as: A∩B = AB = D, where D is the event with
outcomes common to both A and B. The corresponding Venn diagram is shown in Fig. 2.5. Event
D is the crosshatched area.
The complement of an event A, denoted by Ā, is the event that contains all the sample points in S that are not in A.
Notation for the probability of any event, say A is denoted either by P(A) or Pr(A) throughout
the course.
Subjective Definition of Probability: Based on a scale from 0 to 1 (or 0% to 100%), it is the degree
of one’s belief in the likelihood of occurrence or non-occurrence of an event. Although such a
measure is based on an individual’s judgment without any precise computation, it may still be used
if it is a reasonable assessment by an experienced and knowledgeable person, when other means
of obtaining the probability are not possible. Probability based on expert opinion is an example of
subjective probability.
Relative Frequency Definition of Probability: If an experiment is repeated N times and the event A is observed nA times, then

Pr(A) = lim(N→∞) (nA / N)
Axiomatic Definition of Probability: Although the above definitions of probability are valid, the
proper definition of probability, within the framework of mathematical probability theory, is
based on the following three axioms:
a) Pr(A) ≥ 0
b) Pr(S) = 1
c) Pr(A ∪ B) = Pr(A) + Pr(B), if A ∩ B = ∅, that is, when A and B are mutually exclusive
events. The Venn diagram corresponding to two mutually exclusive events A and B is shown
in Fig. 2.7.
Figure 2.7 Venn diagram for two mutually exclusive events A and B
All of the probability rules presented in the following sections are developed based on these three
basic axioms.
d) Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B), when A and B are not mutually exclusive events, that is,
when (A ∩ B) ≠ ∅.
For three events, A, B, and C, the above relationship takes the following form:

Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C) − Pr(A ∩ B) − Pr(A ∩ C) − Pr(B ∩ C) + Pr(A ∩ B ∩ C)

e) Pr(Ā) = 1 − Pr(A)

f) De Morgan’s theorem: the complement of (A ∪ B) is Ā ∩ B̄, and the complement of (A ∩ B) is Ā ∪ B̄.
Addition Rule: The keyword is OR; this rule gives the probability of occurrence of either A or B or both events. The corresponding
Venn diagram is shown in Fig. 2.8.
Figure 2.8 Venn diagram for the union of events A and B, showing the overlapping region A ∩ B
If A and B are mutually exclusive events, i.e. (A ∩ B) = ∅, then, since Pr(A ∩ B) = 0, Pr(A ∪ B) = Pr(A) + Pr(B).
The addition rule can be generalized to n mutually exclusive events as follows:
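Pr(A1 ∪ A2 ∪ ... ∪ An) = Pr(A1) + Pr(A2) + ... + Pr(An)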
Fig. 2.9 illustrates the case where the union of the mutually exclusive events A1, A2, ..., An
constitutes the sample space S. In such a case the set of events A1, A2, ..., An is called mutually
exclusive and exhaustive (MEE).
Figure 2.9 The sample space, S, corresponding to the union of the mutually exclusive
events, A1, A2, ..., An
Multiplication Rule: The keyword is AND; this rule gives the probability of occurrence of both events A and B. The corresponding Venn
diagram is shown in Fig. 2.5.
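Pr(A ∩ B) = Pr(A/B) Pr(B) = Pr(B/A) Pr(A)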
If A and B are statistically independent events, Pr(A/B) = Pr(A) and Pr(B/A) = Pr(B).
Accordingly,
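Pr(A ∩ B) = Pr(A) Pr(B)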
c) Statistical Independence
The events A and B are statistically independent if Pr(A/B) = Pr(A), or Pr(B/A) = Pr(B), or Pr(A ∩ B) = Pr(A) Pr(B).
Note: Mutually exclusive events cannot happen at the same time. Statistically independent events
can happen at the same time but they do not depend on each other.
d) Conditional Probability
Pr(A/B) = Pr(A ∩ B) / Pr(B)

Pr(B/A) = Pr(A ∩ B) / Pr(A)
Let B1, B2, ..., Bn be a mutually exclusive and exhaustive set of events partitioning the sample
space in such a way that the following properties are satisfied:

i) Bi ⊂ S, i = 1, 2, ..., n
ii) Bi ∩ Bj = ∅ for all i ≠ j (the events are mutually exclusive)
iii) B1 ∪ B2 ∪ ... ∪ Bn = S (the events are exhaustive)

Furthermore, it is assumed that Pr(Bi) > 0. For any event A in the sample space, S (see Fig. 2.10),

Pr(A) = Σ(i=1 to n) Pr(Bi) Pr(A/Bi)
Figure 2.10 Partition of the sample space S into the mutually exclusive and exhaustive events B1, B2, ..., Bn, together with an arbitrary event A
Proof:
It is observed in Fig. 2.10 that event A is the union of the mutually exclusive events (Bi ∩ A). In other words,

A = (B1 ∩ A) ∪ (B2 ∩ A) ∪ ... ∪ (Bn ∩ A)

Applying the probability rule for the union of a mutually exclusive set of events,

Pr(A) = Pr(B1 ∩ A) + Pr(B2 ∩ A) + ... + Pr(Bn ∩ A)

and, applying the multiplication rule to each term, finally

Pr(A) = Σ(i=1 to n) Pr(Bi) Pr(A/Bi)
This result is known as the Theorem of Total Probability in Statistics and is widely used in Civil
Engineering applications. For example, the Logic Tree Method used in Probabilistic Seismic
Hazard Analysis (PSHA) is based on this theorem.
g) Bayes’ Theorem
Consider again the same mutually exclusive and exhaustive set of events B1, B2, ..., Bn described
above. Let A be any event in the sample space, S, such that Pr(A) ≠ 0. Then,
Pr(Bk/A) = [Pr(A/Bk) Pr(Bk)] / [Σ(i=1 to n) Pr(A/Bi) Pr(Bi)],   k = 1, 2, ..., n
Proof: From the definition of conditional probability and the multiplication rule,

Pr(Bk/A) = Pr(Bk ∩ A) / Pr(A) = Pr(A/Bk) Pr(Bk) / [Σ(i=1 to n) Pr(Bi) Pr(A/Bi)]
where, the term Pr(A) in the denominator is replaced by the corresponding expression given by
the theorem of total probability.
Bayes’ Theorem is quite important in Statistics and forms the basis for the popular Bayesian
Statistics. It is also widely used in Civil Engineering applications, especially in combining expert
opinion with information based on observed data.
When the result is known, Bayes’ Theorem enables us to assess, in a consistent manner, the probability that a
specific event among all the candidate events is the one that produced the observed result. In other words,
Bayes’ Theorem works in the reverse direction, reasoning from the result back to the cause. The events
B1, B2, ..., Bn can be considered as hypotheses; it is only assumed that these events cannot happen
at the same time, and no further assumption is made. Pr(Bk) is called the prior
probability of the event Bk, and Pr(Bk/A) is called the posterior probability of the event Bk.
Pr(A/Bk) is the likelihood of event A, given that the hypothesis Bk is valid. Bayes’ Theorem, in
the light of new data, updates the prior probabilities systematically and consistently, leading to the
posterior probabilities.
The solution to probability problems can sometimes be possible by counting points in the sample
space. In this respect, the permutation and combination rules may be used.
Permutation: The number of permutations (ordered arrangements) of n different objects is n! (n factorial). The
number of permutations obtained by taking r objects at a time from n different objects is:

nPr = n! / (n − r)!

In a permutation, the order of the selected objects is taken into account.
Combination: In some cases, it is of interest how many different selections of r objects can be made out of n
objects without regard to their order. These selections are called combinations. The number of
combinations obtained by taking r objects at a time from n different objects is given
by the following equation:

nCr = n! / [r! (n − r)!]
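These counts can be evaluated directly with the Python standard library (math.perm and math.comb, available in Python 3.8 and later); a minimal sketch:

# Sketch: nPr and nCr with the standard library.
import math

n, r = 5, 3
print(math.perm(n, r))    # nPr = 5!/(5 - 3)! = 60 ordered selections
print(math.comb(n, r))    # nCr = 5!/(3! 2!) = 10 unordered selections
print(math.factorial(n))  # n! = 120 permutations of all five objects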
The water supply for two cities C and D comes from the two sources A and B as shown in Fig.
2.11. Water is transported by pipelines consisting of branches 1, 2, 3 and 4. Assume that either one
of the two sources, by itself, is sufficient to supply the water for both cities.
Denote:
E1 = failure of branch 1
E2 = failure of branch 2
E3 = failure of branch 3
E4 = failure of branch 4
Failure of a pipe branch means there is serious leakage or rupture of the branch.
Figure 2.11 The water supply system (from Ang and Tang, 2007)
Shortage of water in city C would be represented by the event (E1 ∩ E2) ∪ E3, and its complement
means that there is no shortage of water in city C. Applying De Morgan’s rule, we have

complement of [(E1 ∩ E2) ∪ E3] = (Ē1 ∪ Ē2) ∩ Ē3

The last event above means that there is no failure in branch 1 or branch 2, and also no failure
in branch 3.

Similarly, the shortage of water in city D would be the event (E1 ∩ E2) ∪ E3 ∪ E4. Therefore, no
shortage of water in city D is:

complement of [(E1 ∩ E2) ∪ E3 ∪ E4] = (Ē1 ∪ Ē2) ∩ Ē3 ∩ Ē4

which means that there is sufficient supply at the station, i.e., (Ē1 ∪ Ē2), and there are no
failures in branches 3 and 4, represented by (Ē3 ∩ Ē4).
Example 2.2 (Adopted from Ang and Tang, 2007; Example 2.19)
Consider the following chain system consisting of two links, as shown in Fig. 2.12, subjected to a
force F = 300 kg.
Figure 2.12 A two-link chain system (from Ang and Tang, 2007)
If the fracture strength of a link is less than 300 kg, it will fail by fracture. Suppose that the
probability of this happening to either of the two links is 0.05. The chain will fail if one or both
of the two links should fail by fracture. To determine the probability of failure of the chain, define:
E1 = fracture of link 1
E2 = fracture of Link 2
Then Pr(E1) = Pr(E2) = 0.05 and the probability of failure of the chain system is

Pr(E1 ∪ E2) = Pr(E1) + Pr(E2) − Pr(E1 ∩ E2)
            = 0.05 + 0.05 − Pr(E2/E1) Pr(E1)

We observe that the solution requires the value of the conditional probability Pr(E2/E1), which
is a function of the mutual dependence between E1 and E2. If there is no dependence, or they are
statistically independent, Pr(E2/E1) = Pr(E2) = 0.05. In this case, the probability of failure of the
chain system is:

Pr(E1 ∪ E2) = 0.10 − 0.05 × 0.05 = 0.0975
On the other hand, if there is complete or total dependence between E1 and E2, which means
that if one link fractures the other will also fracture, then Pr(E2/E1) = 1.0. In such a case, the
probability of failure of the chain system becomes:

Pr(E1 ∪ E2) = 0.10 − 0.05 × 1.0 = 0.05
In this latter case, we see that the failure probability of the chain system is the same as the failure
probability of a single link. Therefore, we can state that the probability of failure of the chain
system ranges between 0.05 and 0.0975.
Example 2.3
Two nuclear power plants A and B supply energy to a northeastern region of Japan. Normally,
plant A is functioning and it is replaced by plant B if it fails. The failed plant is immediately
repaired while the other one is functioning so that it can replace the other if it also fails. Assume
that the probabilities of failure of nuclear power plants A and B are 0.0001 and 0.0002,
respectively. Find the probability that energy will be supplied to the region.
Solution:
Let A be the event that the power plant A will be functioning and let B be the event that the power
plant B will be functioning and E be the event that energy will be supplied to the region.
Pr(E) = 0.9999 + (0.0001 × 0.0002) × 0.9999 + (0.0001 × 0.0002)² × 0.9999 + ...

      = 0.9999 × [1 / (1 − 0.0001 × 0.0002)] = 0.99990002
Note: c + cr + cr² + ... + crⁿ⁻¹ is a geometric series; when |r| < 1.0, the sum of the corresponding infinite geometric
series is c/(1 − r). Here r = 0.0001 × 0.0002, which is much smaller than 1.0.
Example 2.4
A certain steel factory produces steel bolts using three different machines labeled as A, B and C.
Machines A, B and C produce 40%, 25% and 35% of the daily production, respectively. At the
end of the day, all the bolts produced from these three machines are placed into the same storage
box.
a) If a bolt is selected randomly from the storage box what is the probability that it was produced
by Machine A? Machine B? Machine C?
b) Based on the previous production records it is estimated that machines A, B and C produce
defective bolts at a rate of 10%, 5% and 1%, respectively. What is the probability that the randomly
selected bolt is defective? Which probability rule is used?
c) If the bolt is defective, what is the probability that the bolt was produced by Machine A?
Solution:
a) Let A, B, C denote the events that the bolt was produced by Machines A, B and C, respectively
and D be the event that the bolt is defective. Then, using the relative frequency definition of
probability,
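Pr(A) = 0.40, Pr(B) = 0.25, Pr(C) = 0.35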
These are the prior probabilities that will be updated based on the observed data and information
on the rate of defectives.
b) Based on the information on defective percentages, the likelihood of observing defective items
for each machine is given by the following conditional probabilities:
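Pr(D/A) = 0.10, Pr(D/B) = 0.05, Pr(D/C) = 0.01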
The probability that the randomly selected bolt is defective is computed based on the theorem of
total probability as follows:
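Pr(D) = Pr(D/A) Pr(A) + Pr(D/B) Pr(B) + Pr(D/C) Pr(C) = 0.10 × 0.40 + 0.05 × 0.25 + 0.01 × 0.35 = 0.056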
c) The required probability is Pr(A/D) and is referred to as the posterior probability since it is
obtained by revising the prior probability after observing that the bolt is defective. The updating
will be done based on Bayes’ theorem as follows:
Pr(A/D) = Pr(D/A) Pr(A) / [Pr(D/A) Pr(A) + Pr(D/B) Pr(B) + Pr(D/C) Pr(C)]
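Pr(A/D) = (0.10 × 0.40) / 0.056 = 0.040 / 0.056 ≈ 0.71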
As expected, the probability that the defective bolt was produced by Machine A is increased from
the prior value of 0.40 to the posterior value of 0.71. This is expected since Machine A has the highest daily production and also has the
highest defective rate. However, the quantification of this increase by engineering judgment may
yield different results, whereas Bayes’ theorem achieves this consistently and systematically.
The problem can be solved without using any probabilistic method as follows: Assume that the
daily production is 10000 bolts. At the end of the day, the storage box will be like this:
Machine A: 4000 bolts
Machine B: 2500 bolts
Machine C: 3500 bolts
Total: 10000 bolts
This also corresponds to the original sample space and Pr(A) = 4000/10000 = 0.40.
Knowing the defective production rate and that the bolt is defective, the revised sample space
will take the following form:
Machine A: 400 defectives
Machine B: 125 defectives
Machine C: 35 defectives
Total: 560 defectives
As observed, Bayes’ theorem updates the sample space, reducing it from 10000 bolts to
560 defective bolts, consistent with the given information that the selected bolt is defective, and Pr(A/D) = 400/560 ≈ 0.71.
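The prior-to-posterior updating of Example 2.4 can also be scripted; the following is a minimal sketch in plain Python (the dictionary names are arbitrary):

# Sketch: theorem of total probability and Bayes' theorem for Example 2.4.
priors = {"A": 0.40, "B": 0.25, "C": 0.35}        # Pr(machine)
defect_rate = {"A": 0.10, "B": 0.05, "C": 0.01}   # Pr(D / machine)

# Theorem of total probability: Pr(D)
p_D = sum(priors[m] * defect_rate[m] for m in priors)

# Bayes' theorem: posterior Pr(machine / D)
posterior = {m: priors[m] * defect_rate[m] / p_D for m in priors}

print(round(p_D, 3))                                   # 0.056
print({m: round(p, 3) for m, p in posterior.items()})  # A: 0.714, B: 0.223, C: 0.062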
Example 2.5
The site of the new Corona Treatment hospital to be constructed in Istanbul is located in a moderately active
earthquake region. When a large magnitude earthquake occurs, the probability that the hospital
will experience structural damage (event D) is estimated to be 0.20. The probability of occurrence
of one or two large magnitude earthquakes in this region in one year is estimated as 0.15 and 0.10,
respectively, whereas the probability of occurrence of three or more earthquakes is negligible (i.e.
zero). Assume that the structural damages between earthquakes are statistically independent.
a) Obtain the probability mass function for the number of earthquakes, X, occurring at the site of
this hospital in one year and show it in a table.
b) What is the probability that there will be no structural damage in this hospital due to the
occurrence of earthquakes in this region during the next year?
c) What is the probability that there will be no structural damage in this hospital due to the
occurrence of earthquakes in this region in the next 5 years?
Solution:
a) The probability mass function for the number of earthquakes, X, occurring at the hospital site
in one year is:
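x 0 1 2
pX(x) = Pr(X=x) 0.75 0.15 0.10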
b) Alternative way (using the theorem of total probability):

Pr(No Damage) = Pr(D̄/X=0) Pr(X=0) + Pr(D̄/X=1) Pr(X=1) + Pr(D̄/X=2) Pr(X=2)
              = (1.0)(0.75) + (0.80)(0.15) + (0.80)²(0.10) = 0.934
c) Assume that the structural damage is statistically independent from year to year. Then:
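Pr(No Damage in the next 5 years) = (0.934)⁵ ≈ 0.71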
Chapter 3
RANDOM VARIABLES
The concepts of "event", "sample space" and "probability" were described in detail in the
previous chapter. This chapter will first describe the concept of a random variable, and then
show the probability distributions related to them and parameters that summarize the properties
of these distributions.
The set displaying all possible results of an experiment is called sample space. Generally, it is
more important to state the results of the experiment numerically than a detailed description of
each of these test results. The results of the experiment can sometimes be expressed directly
numerically, for example, earthquake magnitude, the length in mm of a screw selected
randomly from the daily production of a machine, etc. Sometimes the results of the experiment
are expressed in a non-numeric way, for example, the result of a coin toss: heads or tails,
whether it will rain on a given day or not, etc. In the second case, it is possible to give a different
number to each simple event (i.e. sample point) in the sample space.
In an experiment in which a fair coin was tossed three times, the sample space that shows the
results in the most detailed way is given in Fig. 3.1. If the actual aim is to record the number
of tails (T) in three tosses of the coin, then for each point in the sample space, one of the
numerical values 0, 1, 2 or 3 can be assigned.
Original detailed sample space: {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
Reduced sample space (values x of X): {0, 1, 2, 3}
Figure 3.1 Sample spaces for random variable X representing the number of tails in 3 tosses
of a fair coin (H = Heads; T = Tails)
The numbers 0, 1, 2 and 3 are values determined from the result of the experiment. In
other words, these numbers are the values that the random variable X takes. In this example,
X represents the number of tails in three tosses of a fair coin.
Definition: A random variable, X, is a function that assigns a real numerical value to each simple event in the sample
space; its domain is the sample space. Random variables are denoted by uppercase letters such as X, Y and Z, and their values by lowercase letters such
as x, y and z.
Example 3.1 – Two balls in a row are drawn from a bag containing three red (R) and two white
(W) balls. Possible outcomes and the corresponding values of the random variable Y, where
Y = number of red balls, are given in Table 3.1.
Table 3.1 The values that the random variable Y defined in Example 3.1 will attain
Simple event Y
WW 0
WR 1
RW 1
RR 2
If there is a limited (finite) number of sample points in the sample space, this sample space is
called a discrete sample space. The random variable defined on the discrete sample space is
called a discrete random variable.
Definition: If a random variable X can take on only a finite (or countably infinite) number of
specified values, then X is called a discrete random variable. On the other hand, in the
case where there are an infinite number of sample points in the sample space (such as points
on a line segment), the sample space is called a continuous sample space, and the random
variable defined on that sample space is called a continuous random variable.
A discrete random variable X will attain each value with a specified probability. If X attains
only the values denoted by x1, x2, ... , xn, the following two conditions must be satisfied:
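(i) 0 ≤ pX(xi) = Pr(X = xi) ≤ 1 for each xi
(ii) Σ(i=1 to n) Pr(X = xi) = 1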
In an experiment where a coin is tossed three times, if X shows the number of tails, X will
attain its various values with the probabilities given in Table 3.2. As seen, the above two
conditions are both satisfied.
Table 3.2 Probability distribution of the number of tails (x) in three tosses of a fair coin
x 0 1 2 3
Pr(X=x) 1/8 3/8 3/8 1/8
Often it is convenient to express the probability distribution of a random variable by an
equation. If we denote this equation by pX(x), then we can write pX(x) = Pr(X = x).
For example, pX(2) = Pr(X = 2). pX(x) is called the (discrete) probability function (or
probability mass function) of X.
Definition: An equation or a chart or a table that shows all the values that a discrete random
variable can take and their corresponding probabilities is referred to as a discrete probability
distribution.
Example 3.2 – Find the probability distribution of the sum of the number to be obtained in two
rolls of a die.
Solution: Let X be a random variable that shows the sum of the numbers displayed in two rolls
of a fair die. The value of X, denoted by x, can be any integer between 2 and 12. The two dice can
land in 6 × 6 = 36 different ways, each with probability 1/36. For example, Pr(X = 4) =
3/36, since this sum (i.e. x = 4) can be obtained in three different ways: (1,3), (3,1) and (2,2).
The desired probability distribution is given in Table 3.3.
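Table 3.3 Probability distribution of the sum (x) of the numbers obtained in two rolls of a fair die
x 2 3 4 5 6 7 8 9 10 11 12
Pr(X=x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36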
It is useful to show the probability distribution in a graphic form. The probability distribution
given in Table 3.2 is shown graphically in Fig. 3.2. The probability of each value is expressed
by the height of the corresponding bar. This graphical representation of the probability
distribution is called a bar diagram or bar chart.
Figure 3.2 Bar diagram (chart)
For a continuous random variable, the probability of attaining any single value is zero. Since
there will be an infinite number of points in the sample space of the continuous random
variable, the probability of selecting any one of these points is 1/∞ = 0. In this case, the
probability distribution of the continuous random variable cannot be shown by a table, but an
equation will be used. The probability distribution of a continuous random variable X will be
represented by fX(x) and will be called the probability density function (pdf). Since X is
defined on a continuous sample space, the graph of fX(x) will be a continuous curve, for example as
shown in Fig. 3.3.
Figure 3.3 Examples of probability density functions, (a)–(f)
Definition: If a function fX (x) complies with the following conditions, then it is called the
probability density function of the continuous random variable X.
(i) fX (x) ≥ 0
(ii) the total area under the fX (x) curve and constrained by the x-axis is equal to one. Expressed
mathematically:
∫(−∞ to +∞) fX(x) dx = 1.0
The probability of X attaining a value between a and b is equal to the area under the fX (x) curve
that is bounded by the x-axis, x=a, and x=b vertically. This area is shown as shaded in Fig. 3.4.
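Pr(a ≤ X ≤ b) = ∫(a to b) fX(x) dx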
Figure 3.4 The area (shaded) that corresponds to Pr(a ≤ X ≤ b)
Since probabilities correspond to areas, and probabilities cannot be negative, the entire
density function must lie on or above the x-axis.
The probability distribution of a discrete random variable can be expressed by the discrete
probability function (or probability mass function) and the probability distribution of a
continuous random variable with the probability density function. There is another useful
method for specifying the probability distributions of discrete and continuous random
variables. This involves cumulative probabilities. The cumulative probability that the random
variable X attains a value equal to or less than a specified x value is expressed as follows:
FX (x) = Pr (X ≤ 𝑥) (3.1)
where, the function, FX (x), is called cumulative distribution function (CDF). A cumulative
distribution function must satisfy the following requirements.
i) 0 ≤ FX(x) ≤ 1.0
ii) If a ≤ b, then FX(a) ≤ FX(b)
iii) FX(∞) = 1.0 and FX(−∞) = 0
From a given probability mass function or probability density function, the cumulative
distribution function can be derived. For a given value of X = a, if X is a discrete random
variable
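FX(a) = Pr(X ≤ a) = Σ(all xi ≤ a) pX(xi)     (3.2)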
and if X is continuous

FX(a) = Pr(X ≤ a) = ∫(−∞ to a) fX(x) dx     (3.3)
Example 3.3 – Let X be a discrete random variable that shows the number of heads in two
tosses of a fair coin. The probability mass function of X is as follows:
p(x) = 1/4   for x = 0 or x = 2
     = 1/2   for x = 1
     = 0     otherwise
As observed, X only gets the values 0, 1 and 2. For values of X equal to or greater than 2,
FX (x) = 1. Since X cannot be less than zero, for values of X less than zero, FX (x) = 0. The
corresponding cumulative distribution function is shown in Fig. 3.5.
Figure 3.5 Cumulative distribution function obtained for Example 3.3
FX(x) = 0     for x < 0
      = 1/4   for 0 ≤ x < 1
      = 3/4   for 1 ≤ x < 2
      = 1     for x ≥ 2
The probability distribution of each random variable X is fully described by its cumulative
distribution function (CDF) or by its probability mass function (pmf) if discrete or by
probability density function (pdf) if continuous. However sometimes, either because of a
lack of sufficient information or for simplicity we are satisfied with less information
provided by a number of numerical descriptors of the random variable. These are mainly
measures of central tendency, variability (dispersion, spread and uncertainty) and shape
(symmetry). In the following, we will consider the statistical parameters used to represent
these numerical descriptors summarizing the main characteristics of a random variable.
The main measures of central tendency are the expected value (mean, average), median and
mode.
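Definition: The expected value (mean value) of a discrete random variable X is defined as

E(X) = μX = Σ(all xi) xi pX(xi)     (3.4)

and, for a continuous random variable X,

E(X) = μX = ∫(−∞ to +∞) x fX(x) dx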
The summation in Eq. 3.4 above covers all the different values that X can take, and pX(xi) = Pr(X = xi).
E (aX + b) = aE(X) + b
Example 3.4 – If X is a random variable indicating the number observed when a fair die is
rolled, what is the expected value of X?
Solution: The numbers 1, 2, 3, 4, 5, and 6 will each occur with a probability of 1/6. Then, from
Eq. 3.4,
E(X) = 1×(1/6) + 2×(1/6) + 3×(1/6) + 4×(1/6) + 5×(1/6) + 6×(1/6) = 3.5
Example 3.5 – What is the expected value of the sum of numbers when a pair of dice is rolled?
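Solution: Let X and Y denote the numbers observed in the first and second rolls, respectively. Then, using the result of Example 3.4, E(X + Y) = E(X) + E(Y) = 3.5 + 3.5 = 7.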
If the expected value of the product of the numbers when a pair of dice are tossed was desired,
then from Theorem 3.4 and since X and Y are statistically independent,
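E(XY) = E(X) E(Y) = 3.5 × 3.5 = 12.25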
The expected (average, mean) value is considered the most important measure of central
tendency and is considered the best point estimator for the design parameters whose exact
values are unknown in engineering applications. Another measure of central tendency that can
be preferred after the expected value is the median. Apart from these two measures of central
tendency, the mode is the third option. However, it is not preferred because it is based on a
single value.
Definition: The value of random variable X, for which the probabilities of getting values larger
and smaller than itself are equal, is called the median. If mX represents the median value, then
FX(mX) = 0.5.
Definition: The most likely value of a random variable is called modal value or mode for
short. (The root of this term is the word fashion). In other words, mode corresponds to the value
of the random variable, which occurs most often or has the greatest frequency or the highest
probability density of the probability density function.
The measure for the average value of a distribution is the expected value, which only
summarizes the central tendency of the distribution. Another important feature of the
distribution is its spread, dispersion, or variability around the expected value. This property of
the distribution is summarized by variance and standard deviation. In addition, the coefficient
of variation, defined as the ratio of standard deviation to expected value is also widely used.
Definition: The variance of the random variable X, having any type of distribution, is defined
as:

VAR(X) = σ²X = E[(X − μX)²]

VAR(X) or σ²X denotes the variance of X. Variance is a measure of the dispersion and spread
of the distribution. If X can take only its expected value, then σ²X equals zero. As the values
of X move away from each other and from the expected value, the variance grows.
Definition: The standard deviation of the random variable X, is denoted by σX and defined
as follows:
σX = √VAR(X)     (3.7)
Theorem 3.5 The variance of the random variable X can be written as follows:

VAR(X) = E(X²) − μ²X

Theorem 3.6 If X is a random variable and b is a constant, then

VAR(X + b) = σ²X

Theorem 3.7 If X is a random variable and a is a constant, then

VAR(aX) = a² σ²X

Theorem 3.8 If X and Y are two statistically independent random variables, then

VAR(X + Y) = σ²X + σ²Y
Example 3.6 – Compute the variance of X, if X is a random variable showing the number
obtained in a roll of a fair die.
E(X²) = 1×(1/6) + 4×(1/6) + 9×(1/6) + 16×(1/6) + 25×(1/6) + 36×(1/6) = 91/6

σ²X = 91/6 − 3.5² = 35/12 = 2.92
Definition: The coefficient of variation (c.o.v.) of the random variable X is denoted by δX and
defined as follows:
δX = σX / μX     (3.9)
Example 3.7 – If X represents the number observed in the first roll and Y represents the number
observed in the second roll of a fair die, what is the variance, standard deviation, and coefficient
of variation of (X+Y)?
Solution: From Example 3.6, σ²X = σ²Y = 35/12.

Based on Theorem 3.8,

VAR(X + Y) = σ²X + σ²Y = 35/12 + 35/12 = 35/6 = 5.83

σ(X+Y) = √VAR(X + Y) = √5.83 = 2.42
Using the value computed in Example 3.5, E(X + Y) = 3.5 + 3.5 = 7, the c.o.v. of (X + Y) is found
as:

δ(X+Y) = 2.42 / 7 = 0.346
Example 3.8 – The probability density function of the continuous random variable, X is as
follows:
f(x) = c x³   for 0 ≤ x ≤ 5
     = 0      otherwise
Solution:

a) ∫(0 to 5) c x³ dx = 1.0  →  c = 4/625 = 0.0064

b) E(X) = μX = ∫(0 to 5) x (0.0064 x³) dx = 4.0

Median: FX(m) = m⁴/625 = 0.5  →  m = (0.5 × 625)^(1/4) = 4.205

Mode = 5.0

E(X²) = ∫(0 to 5) x² (0.0064 x³) dx = 16.67, so VAR(X) = 16.67 − 4.0² = 0.67

σX = √0.67 = 0.82

c.o.v.(X) = δX = 0.82 / 4.0 = 0.205

c) Pr(X > 2 | 1 < X < 4) = Pr(2 < X < 4) / Pr(1 < X < 4)
   = [∫(2 to 4) 0.0064 x³ dx] / [∫(1 to 4) 0.0064 x³ dx]
   = [0.0064 x⁴/4] evaluated from 2 to 4, divided by [0.0064 x⁴/4] evaluated from 1 to 4
   = (256 − 16) / (256 − 1) = 240/255 = 0.941
The cumulative distribution function of X is:

FX(x) = 0             for x < 0
      = (1/625) x⁴    for 0 ≤ x < 5
      = 1             for x ≥ 5

Check: FX(2) = (1/625) × 2⁴ = 16/625 = 0.0256
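The results of Example 3.8 can be verified numerically; the following is a minimal sketch assuming SciPy is available:

# Sketch: numerical check of Example 3.8.
from scipy.integrate import quad

c = 4 / 625                      # normalizing constant found in part (a)
f = lambda x: c * x**3           # pdf of X on [0, 5]

mean, _ = quad(lambda x: x * f(x), 0, 5)        # E(X) = 4.0
ex2, _ = quad(lambda x: x**2 * f(x), 0, 5)      # E(X^2) = 16.67
variance = ex2 - mean**2                        # 0.67
median = (0.5 * 625) ** 0.25                    # from F(m) = m^4/625 = 0.5

p_2_4, _ = quad(f, 2, 4)                        # Pr(2 < X < 4)
p_1_4, _ = quad(f, 1, 4)                        # Pr(1 < X < 4)
print(mean, variance, median, p_2_4 / p_1_4)    # 4.0, 0.667, 4.205, 0.941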
Besides the measures of central tendency and spread, two measures are available concerning
the shape of a distribution. Although the histogram provides a general view of the shape, it is
advisable to examine these two numerical measures of shape, which give more information.
These are the coefficients of skewness and kurtosis.
The skewness coefficient is an indicator of the degree and direction of skew, i.e. deviation
from horizontal symmetry. The measure of skewness or asymmetry is based on the third
central moment, μX⁽³⁾, defined for discrete and continuous random variables, respectively,
as follows:

μX⁽³⁾ = E[(X − μX)³] = Σ(all xi) (xi − μX)³ pX(xi)     (3.10)

μX⁽³⁾ = E[(X − μX)³] = ∫(−∞ to +∞) (x − μX)³ fX(x) dx     (3.11)
If the pdf or pmf of a random variable is symmetric about the mean value, μX, the third central
moment will be zero. If it is positive, then the distribution is skewed in the positive
direction, i.e. towards the right, and is called positively skewed. On the other hand, if it is
negative, the skewness is in the negative direction and the distribution is called
negatively skewed. A dimensionless measure of skewness is the skewness coefficient, γ1,
defined as:

γ1 = E[(X − μX)³] / σ³ = μX⁽³⁾ / σ³     (3.12)
The other measure related to the shape is the coefficient of kurtosis, which measures the
central peakedness (or flatness) of the distribution relative to the standard bell-shaped curve of
the normal distribution. Kurtosis is based on the fourth central moment, μX⁽⁴⁾, and is defined
for discrete and continuous random variables, respectively, as follows:

μX⁽⁴⁾ = E[(X − μX)⁴] = Σ(all xi) (xi − μX)⁴ pX(xi)     (3.13)

μX⁽⁴⁾ = E[(X − μX)⁴] = ∫(−∞ to +∞) (x − μX)⁴ fX(x) dx     (3.14)
One main reason why we are interested in these two coefficients is that most of the inferences
in statistics require that the distribution be normal or approximately normal. For a normal
distribution, the coefficients of skewness and kurtosis are 0 and 3, respectively. Therefore, if
the distribution under consideration has values close to these, then it is possible to justify the
normality assumption. The coefficient of kurtosis (𝛄2) is interpreted as follows: If 𝛄2 = 3,
normal kurtosis (i.e. equal to that of the normal distribution); 𝛄2 > 3, more peaked than the
normal distribution; 𝛄2 < 3, flatter than the normal distribution.
It is possible to make an analogy between the numerical descriptors of random variables and
moments in engineering mechanics. For this purpose, we define the nth general moment of X,
mX⁽ⁿ⁾, and the nth central moment of X, μX⁽ⁿ⁾, respectively, as follows:

mX⁽ⁿ⁾ = E(Xⁿ) = Σ(all xi) xiⁿ pX(xi)   or   mX⁽ⁿ⁾ = E(Xⁿ) = ∫(−∞ to +∞) xⁿ fX(x) dx

μX⁽ⁿ⁾ = E[(X − μX)ⁿ] = Σ(all xi) (xi − μX)ⁿ pX(xi)   or   μX⁽ⁿ⁾ = E[(X − μX)ⁿ] = ∫(−∞ to +∞) (x − μX)ⁿ fX(x) dx

For example, for n = 1, mX⁽¹⁾ = E(X) = μX. Also, for n = 1, μX⁽¹⁾ = 0 and, for n = 2, μX⁽²⁾ = VAR(X) = σ²X.
As will be observed from the definitions of the moments given above, the mean value is
analogous to the centroidal distance, and the variance to the moment of inertia, of a unit area, as
shown in Fig. 3.6. In this figure, an irregular-shaped unit area defined by the function y = f(x)
is considered. The centroidal distance, x0, of the unit area is:
Figure 3.6 An irregular-shaped unit area (adopted from Ang and Tang, 2007)
x0 = [∫(−∞ to +∞) x fX(x) dx] / Area = ∫(−∞ to +∞) x fX(x) dx = mX⁽¹⁾

which is also the first general moment of the irregular-shaped unit area and equals the mean
value, μX. The moment of inertia of the area about the vertical axis through the centroid, IY, is:
IY = ∫(−∞ to +∞) (x − x0)² fX(x) dx = μX⁽²⁾
which is also the second central moment of the irregular-shaped unit area that equals the
variance of X.
Example 3.9 – The useful life, T, of welding machines is assumed to be a random variable having an exponential
probability distribution. The pdf and CDF of T are, respectively, as follows:
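fT(t) = λ e^(−λt),   t ≥ 0

FT(t) = 1 − e^(−λt),   t ≥ 0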
Figure 3.7 Exponential (a) pdf and (b) CDF of useful life T of a welding machine for μT = 50
Compute mean, median, mode, variance, standard deviation and coefficient of variation of T.
Solution:
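E(T) = μT = ∫(0 to ∞) t λ e^(−λt) dt = 1/λ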
Therefore, the parameter λ of the exponential distribution is the reciprocal of the mean value;
i.e., λ = 1/E(T).
In this case, the mode is zero, whereas the median life, m is obtained as follows:
∫(0 to m) λ e^(−λt) dt = 0.50

m = (−ln 0.50) / λ = 0.693/λ

Therefore,

m = 0.693 μT

The variance of T is

VAR(T) = ∫(0 to ∞) (t − 1/λ)² λ e^(−λt) dt = 1/λ²

σT = 1/λ = μT

and, accordingly, the coefficient of variation is δT = σT/μT = 1.0.
For the exponential distribution of the useful life of welding machines, T, of Example 3.9, the
mean useful life of the machines is µT. Then, the third central moment of the pdf is (using µ for
µT),
E[(T − µ)³] = ∫(0 to ∞) (t − µ)³ (1/µ) e^(−t/µ) dt

            = (1/µ) ∫(0 to ∞) (t³ − 3t²µ + 3tµ² − µ³) e^(−t/µ) dt

            = 2µ³
We recall from Example 3.9 that the standard deviation of the exponential distribution is:
σT = µT. Therefore, the skewness coefficient of this distribution is:
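γ1 = E[(T − µ)³] / σ³T = 2µ³ / µ³ = 2.0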
EXERCISE PROBLEMS
3.1. Compute μX, σX and δX for the following discrete probability distribution of X.
x 2 3 8
p(x) = Pr(X=x) 1/4 1/2 1/4
3.2. The probability mass function of the random variable X is given as follows.
x −3 6 9
p(x) = Pr(X=x) 1/6 1/2 1/3
3.3. The probability mass function of the discrete random variable X is given as follows.
x −2 1 2 4
p(x) = Pr(X=x) 1/4 1/8 1/2 1/8
3.4. The probability density function of the random variable X is given as follows.
fX(x) = c x   for 0 ≤ x ≤ 1
      = 0     otherwise
a) What should be the value of c so that fX (x) is a proper probability density function?
b) Plot the function fX (x).
c) Compute E(X), median and mode of X.
d) Compute the variance, standard deviation and coefficient of variation of X.
3.6. The random variable X takes the value of 5 with a probability of 0.30 and has a triangular
distribution in the range [0, 20] as shown in the following figure (Fig. 3.6).
a) What should be the value of the coefficient k so that X has a valid probability density
function? Write down the equation that specifies the probability distribution of X.
b) Find the mean, median and mode values of X.
c) Find the variance, standard deviation and coefficient of variation of X.
d) If it is known that X will not get a value greater than 15, what is the probability that X will
be greater than 12?
Figure for Problem 3.6: a probability mass of 0.30 at x = 5 and a triangular density over the range 0 to 20 (x-axis values shown: 5, 10, 20)
3.7. The probability density function of the total load on a roof, denoted by S, is as follows:
fS(s) = c / s³   for 3 tons ≤ s ≤ 6 tons
      = 0        otherwise
a) What should be the value of the coefficient, c so that the given relationship corresponds to
a valid probability density function?
b) What is the expected value and median of the total load S?
c) Find the variance and coefficient of variation of the total load S.
d) If it is known that the roof can carry a maximum load of 5.5 tons, what is the probability of
collapse?
Chapter 4
SOME IMPORTANT PROBABILITY DISTRIBUTIONS
The most important continuous probability distribution in statistics and also in civil engineering
applications is the "normal distribution". The diagram of this distribution is a bell-shaped
curve (see Figs. 4.1a, b, c). The normal distribution is symmetrical and the mean, median, and
mode are equal to the same value. It has a wide range of applications in terms of describing
the distribution of many populations in nature. The equation of the normal distribution curve
was first derived by DeMoivre in 1733. Later, Gauss (1777-1855) obtained this distribution
function in a study investigating the errors in repeated measurements. For this reason, the
normal distribution is sometimes called the Gaussian distribution.
A random variable, X, with a normal distribution is called a normal random variable. The
function that gives the probability distribution of the normal random variable depends on only
two parameters, μX (−∞ < μX < ∞) and σ²X (σ²X > 0). Accordingly, the probability density
function of X will be shown as N(x; μX, σX).

Definition: For a normal random variable, X, with mean μ and variance σ², the equation of
the normal distribution curve is as follows:
f(x) = N(x; μ, σ) = [1 / (σ √(2π))] e^(−(1/2)[(x − μ)/σ]²),   −∞ < x < ∞     (4.1)
where π = 3.141... and e = 2.718... The normal curve is fully defined when the values of μ and σ
are given. Theoretically, X can take any value between −∞ and +∞. However, as shown in
Fig. 4.1c, the normal curve approaches the x-axis asymptotically but never cuts the x-axis.
Because the values in the tail sections are very close to zero, the range (spread) of X is often
taken as μ ± 3σ in practice, which covers 99.7% of the total area. The two normal distributions
shown in Fig. 4.2a have the same mean value; however, the distribution with the larger variance
is flatter and more spread out. The two distributions in Fig. 4.2b have identical shapes, but
because their mean values are different, they are located at different positions on the x-axis.
Figure 4.2a Comparison of two normal distributions with the same mean value but different
standard deviations (μ1 = μ2 and σ2 > σ1)

Figure 4.2b Comparison of two normal distributions with equal standard deviations but
different mean values (μ1 ≠ μ2 and σ1 = σ2)
For the normal distribution displayed in Fig. 4.3, Pr(x1 < X < x2) is shown by the shaded area.
For different values of μ and σ, different normal distributions result. Integration of the
normal curve equation to compute the areas under this curve, and hence probabilities, does not
yield a closed-form solution. This difficulty may be overcome by numerical integration, but
this is not a practical solution. For this reason, it is convenient to use tables. It would not be
practical to prepare a table for each normal distribution; therefore, a single table has been
prepared that can be used for all normal distributions. This table applies to the standard
normal random variable Z, which has a normal distribution with μ = 0 and σ = 1.
Figure 4.3 Pr(x1 < X < x2) = area of the shaded section
Any normal random variable X can be converted to the standard normal random variable, Z,
by utilizing the following relationship.
Z = (X − μX) / σX     (4.2)
As shown below, the mean value and the variance of Z are equal to zero and 1, respectively.

E(Z) = (1/σX) E(X − μX) = (1/σX) [E(X) − μX] = (1/σX)(μX − μX) = 0

and

σ²Z = VAR[(X − μX)/σX] = (1/σ²X) [VAR(X) + VAR(μX)] = (1/σ²X) σ²X = 1
Definition: The distribution of a normal random variable with a mean value of 0 and a standard
deviation of 1 is called a standard normal distribution.
When the random variable X is between X = x1 and X = x2, the random variable Z will fall
between z1 = (x1 − μ)/σ and z2 = (x2 − μ)/σ. This is illustrated in Fig. 4.4. Thus, the area below
the X curve between the lines x = x1 and x = x2 is equal to the area below the Z curve
between the lines z = z1 and z = z2, and

Pr(x1 ≤ X ≤ x2) = Pr(z1 ≤ Z ≤ z2)
Figure 4.4 The corresponding equivalent areas under the normal distribution curves of
random variables X and Z
In Figs. 4.5a, b, c and d, the CDF and pdf’s of the standard normal distribution, with areas covering
±1, ±2 and ±3 standard deviations, are shown, respectively.
Figure 4.5 (a) The CDF of the standard normal distribution and (b), (c), (d) pdf’s of a
standard normal distribution with areas covering, 1, 2 and 3 standard deviations
(Adopted from Ang and Tang, 2007)
Example 4.1 (a) – For a normal population with μX = 50 and σX = 10, calculate the z1 and z2
values that satisfy Pr(45 < X < 62) = Pr(z1 < Z < z2).

z1 = (45 − 50)/10 = −0.5 and z2 = (62 − 50)/10 = 1.2

Accordingly, Pr(45 < X < 62) = Pr(−0.5 < Z < 1.2).
The tables required for normal distributions are thus reduced to the table prepared for a single
standard normal distribution. Table 4.1 at the end of this section lists the areas that are below
the standard normal curve and correspond to Pr(Z<z). In this table, the Z value varies between
0 and 4.0. The following examples show how to use this table.
Example 4.1 (b) – In Example 4.1(a) it was shown that Pr (45<X<62) = Pr(− 0.5<Z<1.2). Now
the corresponding probability will be computed as follows:
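Pr(−0.5 < Z < 1.2) = FZ(1.2) − FZ(−0.5) = 0.8849 − 0.3085 = 0.5764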
Example 4.2 – Car tires produced by a factory last an average of 2 years and show a standard
deviation of 0.5 years. Assuming that the lifetime of these tires is normal, what is the
probability that a purchased tire will wear out before 1.5 years?
Solution: First, a figure is drawn (Fig. 4.7) and the desired area is marked. To find Pr(X<1.5),
the area to the left of x=1.5 must be calculated. This area is equal to the area to the left of the z
value, which is the equivalent of x=1.5. This z value is:
z = (1.5 − 2) / 0.5 = −1.0
From Table 4.1, Pr(X<1.5)=Pr(Z<−1.0) = 0.1587.
Example 4.3 − The average value of points taken in a statistics examination was 70 and the
standard deviation was 8. If 10% of the class is given the grade A, what is the smallest point
that is enough to get an A? Points taken in the examination will be assumed to show a normal
distribution.
Solution: In the previous examples the z values corresponding to the x values were found and
then the desired areas were obtained from Table 4.1. In this example, the opposite will be done.
The z value will be found from the given area (or probability) and using this z value, x will be
calculated from x=+z. In Fig. 4.8, the area of 0.10 is shown as shaded.
The desired z value should satisfy the requirement: Pr(Z > z) = 0.10 or equivalently Pr(Z < z)
= 0.90. From Table 4.1, it is observed that Pr(Z < 1.28) = 0.90, meaning that the desired z value
is 1.28. The smallest point, xA , required to get an A grade is calculated below.
xA = μ + zσ = 70 + 1.28 × 8 = 70 + 10.24 = 80.24
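If a statistical library is available, the table lookups in Examples 4.1 to 4.3 can be reproduced directly; a minimal sketch assuming SciPy:

# Sketch: standard normal probabilities and percentiles with scipy.stats.norm.
from scipy.stats import norm

# Example 4.1: Pr(45 < X < 62) for mu = 50, sigma = 10
print(norm.cdf(1.2) - norm.cdf(-0.5))      # about 0.576

# Example 4.2: Pr(X < 1.5) for mu = 2, sigma = 0.5
print(norm.cdf(1.5, loc=2, scale=0.5))     # 0.1587

# Example 4.3: z value leaving 10% of the area to its right
print(norm.ppf(0.90))                      # about 1.28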
Definition: If X is a random variable having a lognormal distribution, then the equation of the
probability density function of X is as follows:
f(x) = LN(x; λ, ξ) = [1 / (√(2π) ξ x)] e^(−(1/2)[(ln x − λ)/ξ]²),   0 ≤ x < ∞     (4.3)

Here, π = 3.141... and e = 2.718... λ = λX = E(ln X) and ξ² = ξ²X = VAR(ln X) denote, respectively,
the mean value and the variance of ln X. The shape of the lognormal distribution is
illustrated in Fig. 4.9 for different values of its parameter, ξ.
Figure 4.9 The lognormal probability density function corresponding to different 𝜉 values
(adopted from Ang and Tang, 2007)
Z = (ln X − λX) / ξX     (4.4)
The following relationships apply between the mean value, μX, and the standard deviation, σX, of
the random variable X and the parameters λX (the mean value of ln X) and ξX (the standard deviation of ln X):
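ξ²X = ln(1 + δ²X)

λX = ln μX − (1/2) ξ²X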
Here, δX = σX/μX denotes the coefficient of variation of X. For small values of the coefficient
of variation (δX ≤ 0.30), it is possible to assume ξX ≅ δX.
Often the median value is used as the measure of central tendency of a lognormally distributed random variable,
since λ = ln(median).
Example 4.4 – For a population having a lognormal distribution with, =50 and =10,
calculate Pr(45 < X <62).
Solution: Using the relationships given above, the parameters of the lognormal distribution are
calculated as shown below.
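ξX ≅ δX = 10/50 = 0.20

λX = ln 50 − 0.5 × (0.20)² = 3.91 − 0.02 = 3.89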
z1 = (ln 45 − 3.89) / 0.20 = −0.42

and

z2 = (ln 62 − 3.89) / 0.20 = 1.19
Therefore, Pr(45 < X < 62) = Pr(−0.42 < Z < 1.19). Using Table 4.1, the desired probability value is calculated as shown below.
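Pr(−0.42 < Z < 1.19) = FZ(1.19) − FZ(−0.42) = 0.8830 − 0.3372 = 0.5458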
Example 4.5 – The efficiency (E) of a company producing construction materials is estimated
based on the following relationship:
E = (Y / M) √(e^T e^(S/9))
where,
Y and M are lognormally distributed random variables with median values of 5000 hours and
250 TL, respectively and coefficients of variation of 0.20 and 0.15, respectively. T and S are
normally distributed random variables with mean values of 6 years and 45 hours, respectively,
and standard deviations of 2 years and 4.5 hours, respectively. T and S are dependent variables,
the coefficient that reflects the correlation between them is ρT,S = 0.75. All other variables are
independent of each other.
Solution:
a) The parameters of the lognormally distributed random variables Y and M are as follows:
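λY = ln 5000 = 8.52,  ξY ≅ δY = 0.20

λM = ln 250 = 5.52,  ξM ≅ δM = 0.15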
If we take the logarithm of both sides of the equation given for efficiency,
ln E = ln Y – ln M + 0.5 (T + S/9)
This is a linear function of ln Y, ln M, T and S, and the expected value and variance of ln E are calculated as follows:

E(ln E) = λY − λM + 0.5 [E(T) + E(S)/9] = 8.52 − 5.52 + 0.5 (6 + 45/9) = 8.5

VAR(ln E) = VAR(ln Y) + VAR(ln M) + 0.25 [VAR(T) + (1/81) VAR(S)] + 2 × 0.5² × (1/9) × ρT,S × σT × σS

ξ²E = 0.20² + 0.15² + 0.25 [2² + (1/81) × 4.5²] + 2 × 0.25 × (1/9) × 0.75 × 2 × 4.5 = 1.50
Example 4.6 − The safety factor, F, for a building element, is defined as follows:
F = R / S
Here,
a) Find the distribution parameters (λR, ξR and λS, ξS) of the lognormal random variables R and
S.
b) Since R and S are random variables, F will also be a random variable. Accordingly, obtain
the probability distribution of F and state the name of this distribution. At the same time, find
the mean value and coefficient of variation of F.
c) If the safety factor, F, is greater than 3.0, the structure is considered to be "safe". What is the
probability that the structure will be rated as safe?
Solution:
a) ξR ≅ δR = 0.15
ξS ≅ δS = 0.25
b) F = R/S
ln F = ln R – ln S
Since R and S are lognormally distributed, ln R and ln S will be normally distributed and ln F,
which is a linear function of these two variables, will also be normally distributed. If ln F is
normal, then F will have a lognormal distribution.
VAR (ln F) = ξ2F = VAR (ln R) + VAR (ln S) = ξ2R + ξ2S = 0.152 + 0.252 = 0.085
δF ≅ ξF ≅ 0.292
λF = ln μF – 0.5 x ξ2F
μF = e0.7566 = 2.13
i) The experiment can only result in two ways: success (S) or failure (F);
ii) The probability of success, p, remains constant from trial to trial;
iii) The trials are statistically independent.
The probability of observing a particular sequence containing r successes and n − r failures in n trials is
p·p·(1 − p)···(1 − p)·p···p = p^r (1 − p)^{n−r}
Since in n trials, r successes can occur in (n choose r) = n! / [(n − r)! r!] different ways, and since these events are mutually exclusive, the required probability can be written as follows:
B(r; n, p) = Pr(R = r; n, p) = [n! / ((n − r)! r!)] p^r (1 − p)^{n−r},   r = 0, 1, 2, … , n
This distribution, which gives the probability of observing r number of successes in a Bernoulli
experiment repeated n times is called the binomial distribution.
The quantity n! / [(n − r)! r!] = (n choose r) is called the binomial coefficient, and the failure probability, 1 – p, is generally denoted by q. Hence,
B(r; n, p) = (n choose r) p^r q^{n−r},   r = 0, 1, 2, ..., n   (4.7)
Example 4.7 – A soldier hits the target in 75% of his shots. What is the probability that he will
not be able to hit the target at least three times in his next five shots?
Solution: If we consider it a “success” not to hit the target, then p = 0.25, and the desired probability is
Pr(R ≥ 3) = Σ_{r=3}^{5} (5 choose r) (0.25)^r (0.75)^{5−r} = 0.0879 + 0.0146 + 0.0010 = 0.104
Example 4.8 – If a patient has a 0.90 chance of recovery as a result of heart surgery, what is
the probability of recovery of only 5 out of the 7 patients who will undergo heart surgery?
Solution:
Pr(R = 5) = [7! / ((7 − 5)! 5!)] (0.90)⁵ (0.10)² = 21 × 0.59049 × 0.01 = 0.124
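A short numerical sketch of Examples 4.7 and 4.8 with the binomial pmf is given below; it assumes SciPy is available and is only an illustrative check, not part of the original solution.

```python
from math import comb
from scipy.stats import binom

# Example 4.7: "success" = missing the target, p = 0.25, n = 5; want Pr(R >= 3)
p_miss3 = sum(comb(5, r) * 0.25**r * 0.75**(5 - r) for r in range(3, 6))
p_miss3_scipy = 1.0 - binom.cdf(2, n=5, p=0.25)     # survival form of the same sum

# Example 4.8: Pr(exactly 5 recoveries out of 7), p = 0.90
p_recover5 = binom.pmf(5, n=7, p=0.90)

print(round(p_miss3, 4), round(p_miss3_scipy, 4), round(p_recover5, 3))
# ~0.1035  ~0.1035  ~0.124
```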
Theorem 4.1 The expected value and variance of the binomial distribution are as follows:
E(R) = np   (4.8)
VAR(R) = np(1 − p) = npq
Proof: According to the definition of the expected value for a discrete random variable (see
Section 3.5)
E(R) = Σ_{r=0}^{n} r B(r; n, p) = Σ_{r=0}^{n} r [n! / ((n − r)! r!)] p^r (1 − p)^{n−r}
= Σ_{r=1}^{n} [n (n − 1)! / ((n − r)! (r − 1)!)] p · p^{r−1} (1 − p)^{n−r}
= np Σ_{r=1}^{n} [(n − 1)! / ((n − r)! (r − 1)!)] p^{r−1} (1 − p)^{n−r}
The remaining sum is the total probability of a binomial distribution with n − 1 trials and is therefore equal to 1. Hence,
E(R) = μ_R = np
To obtain the variance, we first compute E(R²):
E(R²) = Σ_{r=0}^{n} r² B(r; n, p) = Σ_{r=1}^{n} r [n (n − 1)! / ((n − r)! (r − 1)!)] p · p^{r−1} (1 − p)^{n−r}
= np Σ_{r=1}^{n} r [(n − 1)! / ((n − r)! (r − 1)!)] p^{r−1} (1 − p)^{n−r}
Let n – 1 = n′ and r – 1 = r′:
E(R²) = np Σ_{r′=0}^{n′} (r′ + 1) B(r′; n′, p)
= np [ Σ_{r′=0}^{n′} r′ B(r′; n′, p) + Σ_{r′=0}^{n′} B(r′; n′, p) ]
= np [n′ p + 1] = np [(n − 1) p + 1]
= n² p² − n p² + np
VAR(R) = E(R²) − [E(R)]² = n² p² − n p² + np − n² p² = np − n p²
= np(1 − p) = npq
Example 4.9 –What is the expected value and variance of the number of patients who will
recover among the 7 patients in Example 4.8?
Solution: E(R) = np = 7 × 0.90 = 6.3 patients and VAR(R) = npq = 7 × 0.90 × 0.10 = 0.63.
Example 4.10 – If the number of experiments (trials) is fixed, what is the p value that
maximizes the variance of the Binomial distribution?
Solution: Let f(p) = VAR(R) = np(1 − p) = np − np². To find the p value that maximizes the variance, the derivative of f(p) with respect to p is set equal to zero:
d f(p)/dp = n − 2np = 0   →   p = 1/2
In a Bernoulli experiment, the probability distribution of the random variable describing the number of trials repeated until the first occurrence of a specified event, N₁, is called the geometric probability distribution and is expressed as follows:
p_{N₁}(n) = Pr(N₁ = n) = p (1 − p)^{n−1} = p q^{n−1},   n = 1, 2, …
The average number of trials up to the first occurrence, μ_{N₁} = 1/p, is also called the mean recurrence number (duration) or the mean return period (interval).
Example 4.11 – A building is designed for a 500-year earthquake. What are the chances of an earthquake of this magnitude hitting the building for the first time within ten years after the building was constructed?
Solution: The annual probability of exceedance is p = 1/500 = 0.002. Therefore,
Pr(N₁ ≤ 10) = Σ_{n=1}^{10} 0.002 (0.998)^{n−1} = 1 − (0.998)^{10} = 0.0198
In a Bernoulli experiment, the probability distribution of the random variable describing the
number of trials repeated until the kth occurrence of a specified event, Nk, is called the negative
binomial probability distribution and is expressed as follows:
p_{N_k}(n) = Pr(N_k = n) = (n−1 choose k−1) p^k q^{n−k} = [(n − 1)! / ((n − k)! (k − 1)!)] p^k q^{n−k},   n = k, k+1, …   (4.13)
= 0   n < k
Example 4.12 – According to the information provided in Example 4.11, what is the
probability of exposure of this building to the second 500-year earthquake in the 10th year after
its construction?
Pr(N₂ = 10) = [(10 − 1)! / ((10 − 2)! (2 − 1)!)] (0.002)² (0.998)^{10−2} = 9 × 4×10⁻⁶ × 0.984 = 3.543×10⁻⁵
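Examples 4.11 and 4.12 can be checked with the geometric and negative binomial models as sketched below, assuming SciPy is available. Note that scipy.stats.nbinom counts the number of failures before the k-th success, so Pr(N_k = n) corresponds to nbinom.pmf(n − k, k, p).

```python
from scipy.stats import geom, nbinom

p = 1.0 / 500.0                       # annual exceedance probability of the 500-yr event

# Example 4.11: first occurrence within the next 10 years
p_first_10yr = geom.cdf(10, p)        # = 1 - (1 - p)**10 ~ 0.0198

# Example 4.12: second occurrence exactly in the 10th year
p_second_in_10 = nbinom.pmf(10 - 2, 2, p)   # ~ 3.54e-5

print(round(p_first_10yr, 4), f"{p_second_in_10:.3e}")
```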
Before defining the Poisson distribution, it is worth explaining the Poisson process. In a
Poisson process, the following properties are assumed to be satisfied:
i) The probability of exactly one event occurring over a short period of time, (t; t+Δt), is approximately equal to ν·Δt, where ν is the average number of events per unit time (Stationarity);
ii) The number of events in any time period is independent of the number of events in other
time intervals (Independence);
iii) The probability of more than one event occurring in a short period of time is negligible
compared to the probability of occurrence of one or zero events (Nonmultiplicity).
The above-mentioned assumptions are given for the time-dependent Poisson processes (e.g.
the number of earthquakes over a certain period of time, the number of planes landing at an
airport, the number of accidents at an intersection). However, the same assumptions apply to
Poisson processes related to space (e.g. the number of particles in a certain volume, the number
of errors on a page, the number of earthquakes in a region).
Definition: The probability distribution of a Poisson random variable X, which gives the
likelihood of the number of events that will occur within a certain time period or a certain space
interval (or region), is as follows:
P(x; μ) = Pr(X = x) = e^{−μ} μ^x / x!,   x = 0, 1, 2, ...   (4.16)
Here, μ = the average number of events that occur during a specified time period (or in a specified region). If the specified time (or space) interval is denoted by t and the average number of events that occur within unit time (or space) is ν, then μ = νt and the distribution is written as follows:
P(x; ν, t) = e^{−νt} (νt)^x / x!,   t ≥ 0,   x = 0, 1, 2, …   (4.17)
Example 4.13 – A secretary makes an average of 2 mistakes on a page. On the page, she is
currently typing,
a) What is the probability that she makes more than two mistakes?
b) What is the probability she makes no mistakes?
Solution:
a) Pr(X > 2) = 1 − Pr(X ≤ 2) = 1 − [e⁻² 2⁰/0! + e⁻² 2¹/1! + e⁻² 2²/2!] = 1 − 0.677 = 0.323
b) Pr(X = 0) = e⁻² = 0.135
Example 4.14 – There are an average of 3/7 traffic accidents per day at an intersection.
Compute the probability of occurrence of five accidents at this intersection in a week?
Solution:
Pr(X = 5) = e^{−(3/7)×7} [(3/7)×7]⁵ / 5! = e⁻³ 3⁵ / 5! = 0.1008
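The two Poisson examples above can be reproduced with a few lines of code; the following is an illustrative sketch assuming SciPy is available.

```python
from scipy.stats import poisson

# Example 4.13: mu = 2 mistakes per page
mu = 2.0
p_more_than_two = 1.0 - poisson.cdf(2, mu)   # Pr(X > 2) ~ 0.323
p_none = poisson.pmf(0, mu)                  # Pr(X = 0) = e^{-2} ~ 0.135

# Example 4.14: nu = 3/7 accidents/day, t = 7 days, so mu = nu * t = 3
p_five_accidents = poisson.pmf(5, (3.0 / 7.0) * 7.0)   # ~ 0.1008

print(round(p_more_than_two, 3), round(p_none, 3), round(p_five_accidents, 4))
```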
Theorem 4.2 The expected value and variance of a Poisson distribution are equal to μ, which is the parameter of the distribution. That is, E(X) = VAR(X) = μ.
Proof: a) According to the definition given for the expected value of discrete random variables
(See Section 3.5)
E(X) = Σ_{x=0}^{∞} x e^{−μ} μ^x / x!
= Σ_{x=1}^{∞} x e^{−μ} μ · μ^{x−1} / [x (x − 1)!]
= μ Σ_{x′=0}^{∞} e^{−μ} μ^{x′} / x′!   (with x′ = x − 1)
Since Σ_{x′=0}^{∞} P(x′; μ) = 1,   E(X) = μ_X = μ
Example 4.15 − An offshore platform will be constructed to withstand the forces created by
ocean waves.
a) The maximum annual wave height (relative to average sea level) is a Gaussian random
variable with an average value of 4.0 m and a coefficient of variation of 0.80. What is the
probability that wave height exceeds 6.0 m?
Solution:
Pr(H > 6) = 1 – Φ((6.0 – 4.0)/3.2) = 1 – Φ(0.625) = 1 − 0.734 = 0.266
b) If the platform is to be designed according to a wave height (above average sea level) where
there is an 80% chance that the waves will not overtop, in a 3-year period, how many meters
above the average sea level should this height be? Over the years, it has been assumed that the
events exceeding the design height of wave heights are statistically independent of each other.
Solution:
Pr(Not exceeding the design height in three consecutive years) = 0.80 = (1 − p)³
Here, p = Pr(exceeding the design height within a specified year), so 1 − p = (0.80)^{1/3} = 0.9283 and p = 0.0717.
Accordingly, Φ((h – 4.0)/3.2) = 0.9283, where h = design wave height. From Table 4.1, (h – 4.0)/3.2 ≅ 1.465, so h ≅ 4.0 + 1.465 × 3.2 ≅ 8.7 m above the average sea level.
c) It is assumed that ocean waves exceeding six meters occur according to a Poisson process,
and each of them has a 0.40 probability of damaging the platform. Based on these two
assumptions, what is the probability that the platform will experience damage in the next 3
years due to ocean waves? The annual events in which waves cause damage to the platform are
assumed to be statistically independent.
Solution:
If D is defined as the event that ocean waves cause damage to the platform, then the probability that a damage-inducing wave height occurs within a year is
Pr(D) = Pr(H > 6) × 0.40 = 0.266 × 0.40 = 0.1064
Accordingly, the mean rate of occurrence of wave heights that cause damage to the platform will be ν = 0.1064 per year.
Pr(No damage to the platform from ocean waves in 3 years) = Pr(No damage-inducing wave height occurs in 3 years) = e^{−0.1064×3} = e^{−0.3192} = 0.727
Therefore, Pr(Damage within the next 3 years) = 1 − 0.727 = 0.273
Example 4.16 − It is assumed that the capacity (C), of a building according to the equivalent
horizontal load coefficient is a lognormal variable with a median value of 6.5 and a coefficient
of variation of 0.20. It is estimated that the equivalent horizontal load coefficient due to the
largest earthquake to occur at the construction site will be 5.5.
a) What is the probability that the largest earthquake that can occur at the construction site will
cause damage to the building?
Solution: Seismic capacity, C, has a lognormal distribution, and its parameters are λ = ln(median) = ln 6.5 and ξ ≅ δ = 0.20. Therefore,
Pr(Damage to building) = Pr(C ≤ 5.5) = Φ((ln 5.5 − ln 6.5)/0.20) = Φ(− 0.835) = 1 – 0.7985 = 0.2015
b) If it is known that the building did not experience any damage when subjected to an
equivalent horizontal load coefficient equal to 4.0 created by a previous moderate earthquake,
what is the probability that it will not be damaged if it is subjected to the largest earthquake?
Solution:
According to the given information, what is asked is a conditional probability and is calculated
as follows:
Pr(No damage to the building given that no damage was experienced when subjected to an equivalent horizontal load coefficient of 4.0) = Pr(C ≥ 5.5 | C > 4.0)
= Pr(C > 4.0 ∩ C ≥ 5.5) / Pr(C > 4.0) = Pr(C ≥ 5.5) / Pr(C > 4.0)
= [1 − Φ((ln 5.5 − ln 6.5)/0.20)] / [1 − Φ((ln 4.0 − ln 6.5)/0.20)]
= (1 − 0.20) / [1 − Φ(− 2.43)] = 0.80 / (1 − 0.007) = 0.80 / 0.993 = 0.806
c) The frequency of large magnitude earthquakes that may occur in the future is modelled by a
Poisson process with an average return period of 500 years. If the damage caused by
earthquakes is assumed to be statistically independent among themselves, what is the
probability that this building will not experience any earthquake damage during its economic
lifetime set to 100 years?
Solution:
Mean annual occurrence rate of large magnitude earthquakes: ν = 1/500 = 0.002 earthquakes per year.
Mean annual rate of damage-causing earthquakes: ν_D = ν × Pr(Damage) = 0.002 × 0.2015 ≅ 0.0004 per year.
Pr(No earthquake damage to the building during the next 100 years) = e^{−0.0004×100} = e^{−0.04} = 0.961
d) The building in question is located at a site consisting of five buildings designed considering
the same earthquake magnitude. Compute the probability that at least four of these five
buildings will not experience earthquake damage in their 100-year economic lifetime. It will
be assumed that damages to the buildings are statistically independent events.
Solution:
Pr (At least four out of five buildings do not experience earthquake damage during 100 years)
= (5 choose 4) (0.8)⁴ (0.2)¹ + (5 choose 5) (0.8)⁵ (0.2)⁰ = 0.4096 + 0.3277 = 0.737
The mean annual occurrence rate of earthquakes that will cause damage at most three of the
five buildings:
If a series of events occurs according to a Poisson process, the random variable T₁, which symbolizes the time elapsed up to the first occurrence, will have an exponential distribution. The probability density function of the exponential distribution is as follows:
f_{T₁}(t) = ν e^{−νt},   t ≥ 0
The average time until the first occurrence, μ_{T₁} = 1/ν, is also called the mean recurrence time or mean return period.
Example 4.17 – In a seismically active region, 10 earthquakes with a magnitude greater than
6.0 have occurred between 1900 and 2015, according to the past earthquake records. Assuming
that large magnitude earthquakes in this region occur according to the Poisson process,
compute the probability of occurrence of earthquakes of this magnitude in the next 5 years.
How long is the mean return period?
Solution:
ν = 10 / (2015 − 1900) = 10 / 115 = 0.087 earthquakes/year
Pr(T₁ ≤ 5) = ∫₀⁵ f_{T₁}(t) dt = ∫₀⁵ ν e^{−νt} dt = ∫₀⁵ 0.087 e^{−0.087t} dt = 1 − e^{−0.087×5}
= 1 – 0.647 = 0.353
The mean return period is μ_{T₁} = 1/ν = 1/0.087 ≅ 11.5 years.
If a series of events occur according to the Poisson distribution, the random variable Tk, which
symbolizes the time passed up to the kth occurrence, will have a gamma distribution. The
probability density function of the gamma distribution is as follows:
f_{T_k}(t) = [ν (νt)^{k−1} / (k − 1)!] e^{−νt},   t ≥ 0   (4.21)
If the parameter k is equal to an integer, then the gamma probability distribution is also called
the Erlang distribution.
Example 4.18 – In the region considered in Example 4.17, write down the probability density
function of the time elapsed until the occurrence of the third earthquake greater than magnitude
6.0. Find the expected value and coefficient of variation of the corresponding distribution.
Solution:
If the time elapsed until the occurrence of the third earthquake greater than magnitude 6.0 is
denoted by T3, the probability distribution of the random variable T3 will be equal to a gamma
probability density function as given below.
f_{T₃}(t) = [0.087 (0.087t)^{3−1} / (3 − 1)!] e^{−0.087t} = 0.00033 t² e^{−0.087t},   t ≥ 0
E(T₃) = μ_{T₃} = k/ν = 3/0.087 = 34.5 years
VAR(T₃) = σ²_{T₃} = 3/ν² = 3/0.087² = 396.35, so σ_{T₃} = √396.35 = 19.91 years and δ_{T₃} = 19.91/34.5 = 0.577
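The waiting-time results of Examples 4.17 and 4.18 can be sketched numerically as follows, assuming SciPy is available (scipy parameterizes both expon and gamma with scale = 1/ν).

```python
from scipy.stats import expon, gamma

nu = 10.0 / 115.0                       # ~0.087 earthquakes per year

# Example 4.17: Pr(T1 <= 5) for the exponential waiting time to the first event
p_within_5yr = expon.cdf(5, scale=1.0 / nu)       # = 1 - exp(-nu*5) ~ 0.353
mean_return_period = 1.0 / nu                      # ~ 11.5 years

# Example 4.18: time to the third event follows a gamma distribution with k = 3
t3 = gamma(a=3, scale=1.0 / nu)
print(round(p_within_5yr, 3), round(mean_return_period, 1),
      round(t3.mean(), 1), round(t3.std() / t3.mean(), 3))   # mean ~34.5 yr, c.o.v. ~0.577
```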
Previous sections focused on the most commonly used probability distributions in civil
engineering applications. However, apart from these distributions, there are many other
distributions in statistics. Extreme value distributions are widely used in the modelling of live
and environmental loads such as wind, earthquake and snow. t, χ2 (chi-square) and F
distributions within the normal distribution family are widely used in point and interval
estimation and hypothesis testing in statistics. These distributions are not discussed here, but it
is possible to find information about them in any statistics book. Another important distribution
family is beta. The most important feature of this distribution family is that it allows defining
distributions in different forms. In particular, this distribution can be used efficiently in
quantifying uncertainties. For the sake of completeness, brief information on the beta
distribution is presented here.
If the two parameters of the beta distribution, denoted by n and r, meet the condition n>r> 0,
then the probability density function can be expressed as follows:
f(x) = [(n − 1)! / ((r − 1)! (n − r − 1)!)] x^{r−1} (1 – x)^{n−r−1},   0 ≤ x ≤ 1   (4.24)
= 0   otherwise
Although the equation of the beta distribution is similar in format to that of the binomial
distribution, the binomial distribution is discrete whereas beta is continuous. Another point to
be noted is that n and r do not need to be integers, although it is necessary to satisfy the
condition n > r > 0. When n and r are not integers, the terms (n – 1)!, (r – 1)! and (n – r – 1)!
taking place in Eq. 4.24 will be replaced by the corresponding gamma functions, Γ(n), Γ(r),
and Γ(n − r), respectively. For any variable, say y, the gamma function is defined as follows:
Γ(y) = ∫₀^∞ x^{y−1} e^{−x} dx   (4.25)
In case n and r are integers, Γ(y) = (y – 1)!, and hence the factorial terms taking place in
Eq. 4.24 will be valid.
The expected value and variance of the beta distribution are as follows, respectively:
E(X│r, n) = μ_X = r/n   (4.26)
VAR(X│r, n) = σ²_X = r(n – r) / [n² (n + 1)]   (4.27)
The shape of the beta distribution depends on the values of the parameters r and n. If r = n/2,
the distribution is symmetrical. If r > n/2, the distribution will be skewed to the left (negative
direction) and if r < n/2, it will be skewed to the right (positive direction). If r = 1 and n = 2,
the beta distribution will have a uniform distribution within the range of 0 to 1. The shapes of
the standard beta probability density functions corresponding to the different q(=n) and r values
are shown in Fig. 4.10.
Example 4.19 –The ratio of the defective steel rods produced in a steel manufacturing plant,
p, is modelled by a beta distribution with parameters, r = 1 and n = 20 items. Find the mean
value and variance of p.
Solution:
Based on Eq. 4.26, E(p) = μ_p = 1/20 = 0.05
Based on Eq. 4.27, VAR(p) = σ²_p = 1(20 – 1) / [20² (20 + 1)] = 0.0023
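As a quick check of Example 4.19, the standard beta distribution in scipy.stats.beta(a, b) corresponds to the notation of Eq. 4.24 with a = r and b = n − r; the sketch below assumes SciPy is available.

```python
from scipy.stats import beta

r, n = 1.0, 20.0
p_defective = beta(a=r, b=n - r)       # shape parameters a = r, b = n - r

print(round(p_defective.mean(), 3))    # r/n = 0.05
print(round(p_defective.var(), 4))     # r(n-r)/(n^2 (n+1)) ~ 0.0023
```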
Figure 4.10 The standard beta probability density functions with different q(=n) and r values
(From Ang and Tang, 2007)
Example 4.20 − The drainage from a community during a storm is a normal random variable estimated to have
a mean of 1.2 million gallons per day (mgd) and a standard deviation of 0.4 mgd; i.e.,
N(1.2, 0.4) mgd. If the storm drain system is designed with a maximum drainage capacity of
1.5 mgd, what is the underlying probability of flooding during a storm that is assumed in the
design of the drainage system?
Solution:
Flooding in the community will occur when the drainage load exceeds the capacity of the
drainage system; therefore, the probability of flooding is
Pr(X > 1.5) = 1 − Pr(X ≤ 1.5) = 1 − Pr(Z ≤ (1.5 − 1.2)/0.4) = 1 − Φ(0.75)
= 1 − 0.7734 = 0.227
Also, the following are of interest:
(i) The probability that the drainage during a storm will be between 1.0 mgd and 1.6 mgd, which is computed as follows:
Pr(1.0 < X ≤ 1.6) = Φ((1.6 − 1.2)/0.4) − Φ((1.0 − 1.2)/0.4) = Φ(1.0) − Φ(−0.5) = 0.8413 − 0.3085 = 0.533
(ii) The 90-percentile drainage load from the community during a storm. This is the value of the random variable at which the cumulative probability is 0.90, which we obtain as:
Pr(X ≤ x₀.₉₀) = Pr(Z ≤ (x₀.₉₀ − 1.2)/0.4) = Φ((x₀.₉₀ − 1.2)/0.4) = 0.90
Therefore,
(x₀.₉₀ − 1.2)/0.4 = Φ⁻¹(0.90) = 1.28, so x₀.₉₀ = 1.2 + 0.4 × 1.28 = 1.71 mgd
In Example 4.20, if the distribution of storm drainage from the community is a lognormal
random variable instead of normal, with the same mean and standard deviation, compute:
a) The probability that the drainage during a storm will be between 1.0 mgd and 1.6 mgd.
b) The 90-percentile drainage load from the community during a storm.
Solution:
a) With the same mean and standard deviation, δ = 0.4/1.2 = 0.333; therefore ξ = √[ln(1 + 0.333²)] = 0.324 and λ = ln 1.2 − ½ (0.324)² = 0.13.
Pr(1.0 < X ≤ 1.6) = Φ((ln 1.6 − 0.13)/0.324) − Φ((ln 1.0 − 0.13)/0.324)
= Φ(1.049) − Φ(−0.401) = 0.853 − 0.344 = 0.509
b) (ln x₀.₉₀ − 0.13)/0.324 = Φ⁻¹(0.90) = 1.28, so ln x₀.₉₀ = 0.13 + 0.324 × 1.28 = 0.545 and x₀.₉₀ = e^{0.545} = 1.72 mgd
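The percentile calculations for both the normal and the lognormal drainage models can be sketched as follows, assuming SciPy is available; norm.ppf is the inverse standard normal CDF.

```python
from math import log, sqrt, exp
from scipy.stats import norm

mu, sigma = 1.2, 0.4                      # mgd

# Normal model: x_0.90 = mu + sigma * Phi^{-1}(0.90)
x90_normal = mu + sigma * norm.ppf(0.90)          # ~ 1.71 mgd

# Lognormal model with the same mean and standard deviation
delta = sigma / mu
xi = sqrt(log(1 + delta**2))                      # ~ 0.324
lam = log(mu) - 0.5 * xi**2                       # ~ 0.13
x90_lognormal = exp(lam + xi * norm.ppf(0.90))    # ~ 1.72 mgd

print(round(x90_normal, 2), round(x90_lognormal, 2))
```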
EXERCISE PROBLEMS
4.1. If Z is the standard normal variable, compute the following probabilities using Table 4.1.
a) Pr(−1<Z<+1)
b) Pr(Z < 1.64)
c) Pr(−2 < Z)
d) Pr(Z < 2)
4.2. If Pr(− zo < Z < zo) = 0.95, what is the value of zo?
4.3. If the weight of 1000 students shows a normal distribution with an average weight of 68.5
kg and a standard deviation of 2.7 kg, what percentage of these students weigh:
4.4. In Problem 4.3, assume that the weights show the lognormal distribution and solve the
problem again according to this assumption.
4.5. The average voltage of a battery is 15 volts and its standard deviation is 0.2 volts. What
is the probability that the total voltage will be greater than 60.8 volts if such four batteries are
connected in series? Assume that the voltage of the batteries is normally distributed.
4.6. The weight of silver teaspoons manufactured by a company is normally distributed. The
mean value is 10.10 gm and the standard deviation is 0.04 gm. On the spoons, it is written that
they weigh 10 gm.
4.7 The columns of a high-rise building will be carried by pile groups consisting of two piles.
The load-carrying capacity of the piles, C, is equal to the sum of the friction resistance that will
develop along the total length of the pile, F, and the bearing capacity, B, at the tip of the pile.
The mean values of B and F are 20 and 30 tons, respectively, and the coefficients of variation
(c.o.v.) are 0.20 and 0.30, respectively. In addition, the random variables B and F are assumed
to be statistically independent and normally distributed.
a) Find the mean value and coefficient of variation of the load-carrying capacity C of a single
pile. What is the probability distribution of C? Why? Explain.
b) A certain number of piles can be connected to each other, creating pile groups to carry larger
external loads. It is assumed that the carrying capacity of the pile group is equal to the sum of
the carrying capacities of the piles that make up the group. In this problem, there is a group of
piles consisting of two piles. For various reasons, the capacities of these two piles are
dependent. The correlation coefficient reflecting this dependency is estimated to be ρ = 0.25.
If T represents the capacity of this pile group, consisting of two piles, find the mean value of T
and its c.o.v.
c) If the maximum load L this pile group will be exposed to is a normal random variable with
a mean value of 50 tons and a coefficient of variation of 0.30, what is the probability of this
pile group collapsing? The total carrying capacity, T, is assumed to be normally distributed and
statistically independent of L.
4.8. Between the years 1921 and 2000, four earthquakes with a magnitude value of 6.5 and
greater were recorded in a certain region. According to the assumption that earthquakes of this
magnitude in this region follow a Poisson distribution:
a) Compute the probability of occurrence of earthquakes of this magnitude in the next five
years. What is the mean recurrence period?
b) Write down the expression corresponding to the probability density function of the time
elapsed for the occurrence of the fourth earthquake of this magnitude. Find the expected value
and coefficient of variation.
c) A structure built in this region did not experience any damage during earthquakes with a
magnitude value of less than 6.5. Assuming that earthquakes with a magnitude value of 6.5 and
greater occur according to the Poisson process, what is the probability that this structure will
be damaged by earthquakes during its 50-year economic lifetime?
4.9. Prove the VAR(X) = μ relationship given for the Poisson distribution in Theorem 4.2.
4.10. A reinforced concrete tower is exposed to horizontal loads created by strong winds. One
of the considerations in the strengthening of the tower is the duration of strong winds. The
duration period of wind, T is assumed to be a normally distributed random variable with a mean
value of 4 hours and a standard deviation of 1 hour. According to this given information:
4.11. The depth from the ground surface to the rock layer, H, is not known for sure. Therefore,
it is treated as a normal random variable with a mean value of 10 m and a coefficient of variation
of 0.25. To create adequate support, the steel piles must be embedded 0.5 m into the rock.
a) What is the probability that a 14 m long steel pile will not anchor satisfactorily into the rock
layer?
b) If no rock layer up to 13 m depth was encountered during the driving of a 14 m long steel
pile to the ground, what is the probability that the pile will satisfactorily anchor in the rock
layer when an additional pile of 1 m is welded to the original length?
4.12. A column is designed to have a central safety factor of 1.6 (μR / μS = 1.6, where μR and
μS denote the mean strength and mean load values, respectively). The strength of the column
against axial loads is a random variable denoted by R with a coefficient of variation of 0.25.
The total axial load, S, acting on the column consists of the sum of the effects of live, dead, wind
and snow loads in the axial direction. It is assumed that the loads are independent of each other
and have the mean values and coefficients of variation given in the following table.
a) Calculate the expected value and coefficient of variation of the total axial load acting on the
column (S).
b) Calculate the failure probability of this column according to the assumptions that the axial
strength, R, also has a normal distribution and is statistically independent of the total load.
c) Compute the failure probability of this column if the axial strength and total load are
statistically dependent normal random variables. The correlation coefficient, which is a
measure of the degree of this dependence, is estimated as ρ = − 0.60.
4.13. In the construction of a dam, the contractor uses four pieces of construction equipment with the same characteristics. The operating lifetime of this construction equipment until the first failure (T) is assumed to be a lognormally distributed random variable with a mean value of 1200 hours and
a coefficient of variation of 0.25.
4.14. The safety margin for a building element, M, is defined as follows: M = R – S. Here,
R = the strength (carrying capacity) of the building element, and S = the load that the building
element is exposed to. R is a random variable with a mean value, μR = 40 kN, and a coefficient
of variation δR = 0.15. S is also a random variable with a mean value μS = 20 kN and a
coefficient of variation δS = 0.25. R and S are dependent variables, and the value of the
correlation coefficient, which is the measure of dependence between them, is given as
ρR,S = − 0.20. Both variables are normally distributed.
a) Since R and S are random variables, M will also be a random variable. Accordingly, obtain
the probability distribution of M and specify the name of the distribution. At the same time,
compute the mean value, μM, and coefficient of variation, δM of M.
(Answer: Normal distribution; μM = 20 kN; δM = 0.427)
b) Calculate the probability of failure of this building element according to the information
provided.
(Answer: pf = 0.0096)
c) If the probability of failure is less than 0.01, the building element is considered "safe". To
meet this condition, what should be the smallest value of the mean strength, μR , of this building
element?
(Answer: μR = 39.85 kN)
4.15. If the expected value and variance of a beta distribution are 2/3 and 1/72, respectively,
calculate the values of the parameters r and n of this distribution.
(Answer: r = 10, n = 15)
4.16. For the seismic design of ordinary structures, generally, the peak ground acceleration,
corresponding to an exceedance probability of 10% in 50 years is used in the calculations.
Compute the mean return period corresponding to this peak ground acceleration.
(Answer: 475 years)
TABLE 4.1 Standard normal probabilities: area under the standard normal density between 0 and z₀
z0 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.00000 0.00399 0.00798 0.01197 0.01595 0.01994 0.02392 0.02790 0.03188 0.03586
0.1 0.03983 0.04380 0.04776 0.05172 0.05567 0.05962 0.06356 0.06749 0.07142 0.07535
0.2 0.07926 0.08317 0.08706 0.09095 0.09483 0.09871 0.10257 0.10642 0.11026 0.11409
0.3 0.11791 0.12172 0.12552 0.12930 0.13307 0.13683 0.14058 0.14431 0.14803 0.15173
0.4 0.15542 0.15910 0.16276 0.16640 0.17003 0.17364 0.17724 0.18082 0.18439 0.18793
0.5 0.19146 0.19497 0.19847 0.20194 0.20540 0.20884 0.21226 0.21566 0.21904 0.22240
0.6 0.22575 0.22907 0.23237 0.23565 0.23891 0.24215 0.24537 0.24857 0.25175 0.25490
0.7 0.25804 0.26115 0.26424 0.26730 0.27035 0.27337 0.27637 0.27935 0.28230 0.28524
0.8 0.28814 0.29103 0.29389 0.29673 0.29955 0.30234 0.30511 0.30785 0.31057 0.31327
0.9 0.31594 0.31859 0.32121 0.32381 0.32639 0.32894 0.33147 0.33398 0.33646 0.33891
1.0 0.34134 0.34375 0.34614 0.34849 0.35083 0.35314 0.35543 0.35769 0.35993 0.36214
1.1 0.36433 0.36650 0.36864 0.37076 0.37286 0.37493 0.37698 0.37900 0.38100 0.38298
1.2 0.38493 0.38686 0.38877 0.39065 0.39251 0.39435 0.39617 0.39796 0.39973 0.40147
1.3 0.40320 0.40490 0.40658 0.40824 0.40988 0.41149 0.41308 0.41466 0.41621 0.41774
1.4 0.41924 0.42073 0.42220 0.42364 0.42507 0.42647 0.42785 0.42922 0.43056 0.43189
1.5 0.43319 0.43448 0.43574 0.43699 0.43822 0.43943 0.44062 0.44179 0.44295 0.44408
1.6 0.44520 0.44630 0.44738 0.44845 0.44950 0.45053 0.45154 0.45254 0.45352 0.45449
1.7 0.45543 0.45637 0.45728 0.45818 0.45907 0.45994 0.46080 0.46164 0.46246 0.46327
1.8 0.46407 0.46485 0.46562 0.46638 0.46712 0.46784 0.46856 0.46926 0.46995 0.47062
1.9 0.47128 0.47193 0.47257 0.47320 0.47381 0.47441 0.47500 0.47558 0.47615 0.47670
2.0 0.47725 0.47778 0.47831 0.47882 0.47932 0.47982 0.48030 0.48077 0.48124 0.48169
2.1 0.48214 0.48257 0.48300 0.48341 0.48382 0.48422 0.48461 0.48500 0.48537 0.48574
2.2 0.48610 0.48645 0.48679 0.48713 0.48745 0.48778 0.48809 0.48840 0.48870 0.48899
2.3 0.48928 0.48956 0.48983 0.49010 0.49036 0.49061 0.49086 0.49111 0.49134 0.49158
2.4 0.49180 0.49202 0.49224 0.49245 0.49266 0.49286 0.49305 0.49324 0.49343 0.49361
2.5 0.49379 0.49396 0.49413 0.49430 0.49446 0.49461 0.49477 0.49492 0.49506 0.49520
2.6 0.49534 0.49547 0.49560 0.49573 0.49585 0.49598 0.49609 0.49621 0.49632 0.49643
2.7 0.49653 0.49664 0.49674 0.49683 0.49693 0.49702 0.49711 0.49720 0.49728 0.49736
2.8 0.49744 0.49752 0.49760 0.49767 0.49774 0.49781 0.49788 0.49795 0.49801 0.49807
2.9 0.49813 0.49819 0.49825 0.49831 0.49836 0.49841 0.49846 0.49851 0.49856 0.49861
3.0 0.49865 0.49869 0.49874 0.49878 0.49882 0.49886 0.49889 0.49893 0.49896 0.49900
3.1 0.49903 0.49906 0.49910 0.49913 0.49916 0.49918 0.49921 0.49924 0.49926 0.49929
3.2 0.49931 0.49934 0.49936 0.49938 0.49940 0.49942 0.49944 0.49946 0.49948 0.49950
3.3 0.49952 0.49953 0.49955 0.49957 0.49958 0.49960 0.49961 0.49962 0.49964 0.49965
3.4 0.49966 0.49968 0.49969 0.49970 0.49971 0.49972 0.49973 0.49974 0.49975 0.49976
3.5 0.49977 0.49978 0.49978 0.49979 0.49980 0.49981 0.49981 0.49982 0.49983 0.49983
3.6 0.49984 0.49985 0.49985 0.49986 0.49986 0.49987 0.49987 0.49988 0.49988 0.49989
3.7 0.49989 0.49990 0.49990 0.49990 0.49991 0.49991 0.49992 0.49992 0.49992 0.49992
3.8 0.49993 0.49993 0.49993 0.49994 0.49994 0.49994 0.49994 0.49995 0.49995 0.49995
3.9 0.49995 0.49995 0.49996 0.49996 0.49996 0.49996 0.49996 0.49996 0.49997 0.49997
4.0 0.49997 0.49997 0.49997 0.49997 0.49997 0.49997 0.49998 0.49998 0.49998 0.49998
Chapter 5
MULTIPLE (MULTIVARIATE) RANDOM VARIABLES
If X and Y are two random variables, then their joint random characteristics and the
probabilities associated with given values of x and y can be described either by the joint
Cumulative Distribution Function, (CDF) or for the discrete random variables by the joint
probability distribution or joint probability mass function (pmf) and for the continuous
random variables by the joint probability density function (pdf) of X and Y. The basic
concepts and definitions are summarized in the following.
p(x, y) = Pr (X = x ∩ Y = y)
For the continuous case, the total volume under the joint pdf equals 1:
∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
The volume under the joint pdf over a region gives the corresponding probability (see Fig. 5.1):
Pr(a < X ≤ b, c < Y ≤ d) = ∫_c^d ∫_a^b f(x, y) dx dy
Moreover, the volume under the joint pdf up to the point (x, y) gives the joint CDF, F(x, y).
Figure 5.1 Volume under the joint pdf corresponding to Pr(a < X ≤ b, c < Y ≤ d)
(From Ang and Tang, 2007)
f(x, y) = ∂²F(x, y) / ∂x ∂y
Note that the joint cumulative distribution function of X and Y, F(x, y) = Pr(X ≤ x, Y ≤ y), satisfies the following conditions: F(−∞, −∞) = 0; F(+∞, +∞) = 1; F(x, −∞) = F(−∞, y) = 0; F(x, +∞) = F_X(x) and F(+∞, y) = F_Y(y). Hence, F(x, y) is a non-negative and non-decreasing function of x and y.
i) Discrete Case:
Marginal probability distribution of X and Y:
p(x) = ∑y p(x, y) ; p(y) = ∑x p(x, y)
ii) Continuous Case:
f(x) = ∫_{−∞}^{∞} f(x, y) dy ;   f(y) = ∫_{−∞}^{∞} f(x, y) dx
i) Discrete Case:
p(x/y) = p(x, y) / p(y),   p(y) ≠ 0
p(y/x) = p(x, y) / p(x),   p(x) ≠ 0
ii) Continuous Case:
f(x/y) = f(x, y) / f(y),   f(y) ≠ 0
f(y/x) = f(x, y) / f(x),   f(x) ≠ 0
E(aX ± bY) = aE(X) ± bE(Y) = aμ_X ± bμ_Y ;   where a and b are constants
E[g(X, Y)] = Σ_x Σ_y g(x, y) p(x, y)   (discrete)     E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy   (continuous)
When there are two random variables, the presence of a linear relationship between them is examined through the following quantities:
i) Joint first moment of X and Y:
E(XY) = Σ_x Σ_y x y p(x, y)   (discrete)     E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f(x, y) dx dy   (continuous)
ii) Covariance of X and Y:
COV(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E(XY) − μ_X μ_Y
If X and Y are statistically independent, then COV(X, Y) = 0. However, the converse is not
necessarily true, since the correlation coefficient detects only linear dependencies between
two variables.
𝐂𝐎𝐕(𝐗, 𝐘) is a measure of the degree of the linear relationship between two random variables
X and Y. For practical purposes, it is better to use normalized covariance, which is called the
correlation coefficient, defined as follows:
ρ = COV(X, Y) / (σ_X σ_Y)
ρ is a dimensionless measure of the linear dependence between two random variables. Its value
ranges between -1.0 and 1.0; i.e.
−1.0 ≤ ρ ≤ 1.0
As stated before, the covariance, and accordingly the correlation coefficient, measures only the degree of linear relationship; no causality information can be inferred from the correlation coefficient.
Comments:
a) Perfect positive
correlation. ρ = + 1.0
b) Perfect negative
correlation. ρ = – 1.0
c) No correlation.
d) Positively correlated.
e) ρ = 0. No linear
dependence, but a
functional (circular)
relationship.
f) ρ = 0. No linear
dependence, but a
functional (sinusoidal)
relationship.
Figure 5.2 Some examples of the degree of correlation and the corresponding values of the
correlation coefficient (From Ang and Tang, 2007)
Example 5.1
Consider the following joint probability mass function, which is given as a table:
Table 5.1 The joint probability mass function, p(x,y) given in Example 5.1 and the computed
marginal distributions, p(x) and p(y)
           x = 0     x = 1     x = 2     p(y)
 y = 0      1/6       1/3       1/12     7/12
 y = 1      2/9       1/6       0        7/18
 y = 2      1/36      0         0        1/36
 p(x)       5/12      1/2       1/12
Solution:
a) Using the equations given in Sect. 5.1.2, the marginal distributions, p(x) and p(y), are
computed and displayed on the margins of Table 5.1.
b) p_{X/1}(x) = p_{XY}(x, 1) / p_Y(1),   for x = 0, 1, 2
p_{0/1} = (2/9) / (7/18) = 4/7,   p_{1/1} = (1/6) / (7/18) = 3/7,   p_{2/1} = 0 / (7/18) = 0
c) Since p(0, 0) = 1/6 = 0.1667 ≠ p_X(0) p_Y(0) = (5/12)(7/12) = 0.243, X and Y are not statistically independent.
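The marginal, conditional and covariance calculations of Example 5.1 can be carried out directly from the joint pmf table; the following is a small sketch in plain Python (exact fractions are used to avoid rounding).

```python
from fractions import Fraction as F

p = {(0, 0): F(1, 6), (1, 0): F(1, 3), (2, 0): F(1, 12),
     (0, 1): F(2, 9), (1, 1): F(1, 6), (2, 1): F(0),
     (0, 2): F(1, 36), (1, 2): F(0),  (2, 2): F(0)}     # keys are (x, y)

px = {x: sum(p[(x, y)] for y in range(3)) for x in range(3)}   # marginal of X
py = {y: sum(p[(x, y)] for x in range(3)) for y in range(3)}   # marginal of Y
p_x_given_y1 = {x: p[(x, 1)] / py[1] for x in range(3)}        # p(x | y = 1)

ex = sum(x * px[x] for x in px)
ey = sum(y * py[y] for y in py)
cov = sum(x * y * p[(x, y)] for (x, y) in p) - ex * ey         # E(XY) - E(X)E(Y)

print(px, py, p_x_given_y1, cov, sep="\n")
# px = {0: 5/12, 1: 1/2, 2: 1/12}; py = {0: 7/12, 1: 7/18, 2: 1/36}
# p(x|y=1) = {0: 4/7, 1: 3/7, 2: 0}; COV(X, Y) = -7/54 (negative)
```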
Example 5.2
Consider the following joint probability density function:
f_{X,Y}(x, y) = 4xy   for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
= 0   elsewhere
a) Find the marginal density functions f(x) and f(y).
b) Find f(x/y).
Solution:
1 1
a) fX(x) = ∫0 fXY (x, y)dy = ∫0 4 xy dy = 2x for 0 ≤ 𝐱 ≤ 𝟏
=0 elsewhere
1 1
fY(y) = ∫0 fXY (x, y)dx = ∫0 4 xy dx = 2y for 0 ≤ 𝐲 ≤ 𝟏
=0 elsewhere
f(x,y) 4xy
b) f(x/y) = = = 2x for 0 ≤ 𝐱 ≤ 𝟏
f(y) 2𝑦
=0 elsewhere
Example 5.3
A cantilever beam carries two statistically independent random loads S₁ and S₂ applied at distances a and 2a from the fixed end, so that the shear force and bending moment at the fixed end are Q = S₁ + S₂ and M = aS₁ + 2aS₂. Find the correlation coefficient between Q and M.
Solution:
Q = S1 + S2
M = aS1 + 2aS2
E(Q) = μ_Q = μ₁ + μ₂     VAR(Q) = σ²_Q = σ₁² + σ₂²
E(M) = μ_M = aμ₁ + 2aμ₂     VAR(M) = σ²_M = a²σ₁² + 4a²σ₂²
COV(Q, M) = E(QM) ‒ μ_Q μ_M
ρ_QM = COV(Q, M) / (σ_Q σ_M)
E(QM) = E[(S₁ + S₂)(aS₁ + 2aS₂)] = aE(S₁²) + 3aE(S₁S₂) + 2aE(S₂²)
But, since S₁ and S₂ are statistically independent,
E(S₁S₂) = E(S₁) E(S₂) = μ₁ μ₂
E(S₁²) = σ₁² + μ₁²     E(S₂²) = σ₂² + μ₂²
Substituting these into the E(QM) expression and simplifying,
E(QM) = a(σ₁² + 2σ₂²) + μ_Q μ_M
COV(Q, M) = E(QM) ‒ μ_Q μ_M = a(σ₁² + 2σ₂²) + μ_Q μ_M ‒ μ_Q μ_M = a(σ₁² + 2σ₂²)
Note that, the correlation coefficient depends only on variances but not on mean values.
If, it is assumed that 𝛔𝟐𝟏 = 𝛔𝟐𝟐 = 𝛔𝟐 , then
ρ_QM = 3aσ² / √[(2σ²)(5a²σ²)] = 3aσ² / (√10 aσ²) = 3/√10 = 0.948
The computed value of 𝛒𝐐𝐌 indicates a high linear dependence between the shear force, Q
and bending moment, M at the fixed end of the beam. This is because both Q and M are
functions of the same loads S1 and S2.
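A quick Monte Carlo check of this correlation coefficient is sketched below, assuming NumPy is available. Only equal variances of S₁ and S₂ matter for the result, so standard normal samples are used here purely as an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 2.0                                   # any positive constant
s1 = rng.normal(size=200_000)             # independent loads with sigma1 = sigma2
s2 = rng.normal(size=200_000)
q = s1 + s2                               # shear force at the fixed end
m = a * s1 + 2 * a * s2                   # bending moment at the fixed end

print(round(np.corrcoef(q, m)[0, 1], 3))  # ~ 3/sqrt(10) = 0.949
```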
Example 5.4
A cantilever beam is subjected to a random concentrated load P as shown in Fig. 5.4, below.
The load P may take on values of 10 kN or 15 kN with probabilities of 0.8 and 0.2, respectively.
The moment capacity of the beam MR is assumed to be NORMAL (30 kN.m, 6 kN.m). The
shear capacity VR is also assumed to be NORMAL (19 kN, 5kN).
Solution:
On the other hand, if the two failure modes are assumed to be perfectly correlated, the weakest
mode will be considered,
As observed the bounds on the survival probability of the given cantilever beam is:
Chapter 6
FUNCTIONS OF RANDOM VARIABLES
6.1. INTRODUCTION
In this chapter the following problem will be answered: “If we know the random characteristics of a random variable, X, in terms of its CDF, pdf or pmf, and if another random variable, say Y, is related to X by a deterministic function, say Y = g(X), how can we obtain the corresponding CDF, pdf or pmf of Y?” Here, X is called the independent variable and Y the dependent variable. First, we will consider a function of a single random variable, and then extend the results to functions of multiple random variables.
Let
Y = g(X) (6.1)
Then when Y = y, X = g⁻¹(y), where g⁻¹(∗) is the inverse function of g(∗). If the inverse function g⁻¹(y) is single-valued, i.e. has a single root, then, for a discrete X,
p_Y(y) = p_X[g⁻¹(y)]   (6.2)
The relationship given by Eq. 6.2, is illustrated in Fig. 6.1(b) for the following function and the
pmf of X is displayed in Fig. 6.1(a):
Y = X2 for x ≥ 0
Figure 6.1 Illustration of Eq. 6.2 (a) pmf of X; (b) pmf of Y
Because X is discrete, the graphical representation given in Fig. 6.1 can also be displayed in a
table format:
Table 6.1 pY(y) derived from pX (x) according to the functional relationship Y=X2 for x ≥ 0
According to Eq. 6.2, as seen in Table 6.1 when y = 1, x = 1 and pY (1) = pX (1) = 0.25; similarly,
when y = 4, x = 2 and pY (4) = pX (2) = 0.50 and when y = 9, x = 3 and pY (9) = pX (3) = 0.25.
At all other values of Y, p_Y(y) = 0. The probability mass function of Y is shown in Fig. 6.1(b)
and given in Table 6.1.
FY(y) = FX[g-1(y)]
For the continuous case, with fX (x) denoting the probability density function of X,
F_Y(y) = ∫_{−∞}^{g⁻¹(y)} f_X(x) dx   (6.4)
By making a change of variable in this integration, according to the basic rules of calculus, we
get,
In Eq. 6.5, when y increases with x, dg⁻¹(y)/dy will be positive; however, when y decreases with x, F_Y(y) = 1 ‒ F_X(g⁻¹(y)) and accordingly f_Y(y) = ‒ f_X(g⁻¹(y)) dg⁻¹(y)/dy. But since dg⁻¹(y)/dy is also negative in this case, the derived pdf of Y for both the increasing and decreasing cases will be:
f_Y(y) = f_X(g⁻¹(y)) |dg⁻¹(y)/dy|
where, | ∗ | denotes the absolute value. The above results are summarized in the following
theorem.
Theorem 6.1
Suppose that X is a continuous random variable and g(∗) is a strictly monotonic differentiable
function. Let Y = g(X). Then the pdf of Y is given by:
f_Y(y) = f_X(g⁻¹(y)) |dg⁻¹(y)/dy|   where g(x) = y   (6.5)
= 0   if g(x) = y does not have a solution.
Example 6.1
The pmf of X is given in the following table. A new random variable dependent on X is defined
as follows:
Y = X2 ‒ X
Obtain the pmf of Y.
Solution:
The solution is presented in the following table:
X=x pX (x) Y = y (= x2 ‒ x)
-3 1/7 12
-2 1/7 6
-1 1/7 2
0 1/7 0
1 1/7 0
2 1/7 2
3 1/7 6
Y=y 𝐩𝐘 (y)
0 2/7
2 2/7
6 2/7
12 1/7
Example 6.2
Let X∼Uniform (−1, 1) and Y=X2. Find the CDF and pdf of Y.
Solution:
First, we note that the range of Y is [0, 1]. As usual, we start with the CDF. For y ∈ [0, 1], we have
F_Y(y) = Pr(Y ≤ y) = Pr(X² ≤ y) = Pr(−√y ≤ X ≤ √y)
= [√y − (−√y)] / [1 − (−1)]   since X ∼ Uniform(−1, 1)
= 2√y / 2 = √y,   0 < y < 1
Thus, F_Y(y) = 0 for y < 0;  √y for 0 ≤ y ≤ 1;  1 for y > 1.
Note that the CDF is a continuous function of Y, so Y is a continuous random variable. Thus,
we can find the PDF of Y by differentiating FY(y),
f_Y(y) = d F_Y(y)/dy = 1/(2√y)   for 0 ≤ y ≤ 1
=0 otherwise
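The derived CDF F_Y(y) = √y can be checked by simulation; the following is a short sketch assuming NumPy is available.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=100_000)   # X ~ Uniform(-1, 1)
y = x**2                                   # Y = X^2

for y0 in (0.1, 0.25, 0.5, 0.9):
    empirical = np.mean(y <= y0)           # empirical CDF of Y at y0
    print(y0, round(empirical, 3), round(np.sqrt(y0), 3))   # should be close to sqrt(y0)
```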
Consider a normal variate, X, with parameters µ and σ, i.e. X: N(µ, σ), with pdf:
f(x) = N(x; μ, σ) = [1/(√(2π) σ)] exp[−½ ((x − μ)/σ)²],   −∞ < x < ∞
If Z = (X − μ)/σ, determine the pdf of Z.
Solution:
First, we observe that the inverse function is g⁻¹(z) = σz + μ and dg⁻¹/dz = σ. Then, according to Theorem 6.1, the pdf of Z is:
f_Z(z) = [1/(√(2π) σ)] exp[−(σz + μ − μ)²/(2σ²)] |σ|
= (1/√(2π)) e^{−z²/2}
which is the standard normal density; i.e., Z: N(0, 1).
The random variable X has a lognormal distribution with parameters λ and ζ. Derive the pdf
of the random variable Y, where, Y = ln X.
Solution:
f_X(x) = [1/(√(2π) ζ x)] exp[−½ ((ln x − λ)/ζ)²]
g −1 (y) = ey
and
dg −1
= ey
dy
f_Y(y) = [1/(√(2π) ζ e^y)] exp[−½ ((y − λ)/ζ)²] |e^y| = [1/(√(2π) ζ)] exp[−½ ((y − λ)/ζ)²]
which means that the distribution of Y = ln X is normal with a mean of λ and a standard deviation of ζ; i.e., Y: N(λ, ζ). This result also shows that the parameters λ and ζ of the lognormal distribution are, respectively, the mean value and the standard deviation of ln X.
In certain cases, the inverse function g⁻¹(y) may not be single-valued, and for a given value of y there may be multiple roots. For example, if Y = X², then for y > 0 the inverse has the two roots x = +√y and x = −√y. If g⁻¹(y) has k roots, g₁⁻¹(y), …, g_k⁻¹(y), the pdf of Y is obtained by summing the contributions of all roots:
f_Y(y) = Σ_{i=1}^{k} f_X(g_i⁻¹(y)) |dg_i⁻¹(y)/dy|   (6.7)
The strain energy stored in a linearly elastic bar of length, L, subjected to an axial force, S, is
given by the following equation:
U = [L/(2AE)] S²
where A = cross-sectional area of the bar and E = modulus of elasticity of the material. Here, the only random variable is S, which is assumed to have a standard normal distribution, i.e.
S: N(0, 1). Obtain the pdf of U.
Solution:
Since all of the variables, except S (i.e. L, A and E) are deterministic quantities we introduce a
constant c defined as:
c=L/2AE,
then
U = c S2
The inverse will be s = ±√(u/c)
ds/du = ±1/(2√(cu)), but as explained above the absolute value will be taken: |ds/du| = 1/(2√(cu)).
Now based on Theorem 6.1 and because the inverse function has two roots,
f_U(u) = (1/√(2π)) e^{−½ (√(u/c))²} [1/(2√(cu))] + (1/√(2π)) e^{−½ (−√(u/c))²} [1/(2√(cu))]
f_U(u) = [1/√(2π c u)] e^{−u/(2c)}   for u ≥ 0
The resulting distribution is known as the Chi-square distribution with one degree-of-
freedom.
= 0   otherwise
Consider again Y = X², where X is uniformly distributed over (−1, 1), i.e. f_X(x) = 1/2 for −1 < x < 1 and 0 elsewhere. The pdf of Y can be obtained both by direct integration of the CDF and by using Eq. 6.7, as shown below.
Solution:
F_Y(y) = ∫_{−√y}^{√y} (1/2) dx = ½ [√y − (−√y)] = √y
f_Y(y) = d F_Y(y)/dy = ½ y^{−1/2} = 1/(2√y),   0 < y < 1
Therefore,
f_Y(y) = 1/(2√y)   0 < y < 1
= 0   otherwise
dx 1
=±
dy 2 √y
dx 1
but we will take the absolute value | |=
dy 2 √y
f_Y(y) = (1/2)·[1/(2√y)] + (1/2)·[1/(2√y)] = 1/(2√y)
We conclude that:
1
fY (y) = 0<y<1
2 √y
=0 otherwise
We will consider first the case of two independent random variables, X and Y. Let the function
be defined as follows:
Z = g(X, Y)
Then for the discrete case, the corresponding pmf and CDF will be, respectively:
p_Z(z) = Σ_{all (x, y) such that g(x, y) = z} p_{X,Y}(x, y)     F_Z(z) = Σ_{all (x, y) such that g(x, y) ≤ z} p_{X,Y}(x, y)
For functions of continuous multiple random variables, Theorem 6.1 should be extended. The resulting equations are more complex; however, there exist some general rules that simplify the derivations:
i) The sum of statistically independent Poisson random variables is also a Poisson random variable, with a mean rate equal to the sum of the individual mean rates.
ii) Any linear function of normally distributed random variables is also normally distributed.
iii) Products and quotients of lognormally distributed random variables are also lognormally distributed.
Example 6.7
A certain site is exposed to seismic hazard due to three active faults close to the site. Earthquake
occurrences on these three faults are assumed to be statistically independent events and
described by the Poisson distribution with mean annual rates as follows:
ν1 = 0.01 earthquakes/year, ν2= 0.04 earthquakes/year and ν3 = 0.05 earthquakes/year.
What is the probability distribution of the seismic activity at the site?
Solution:
Let Z, be the random variable representing the seismic activity at the site. According to the rule
stated above, Z will have Poisson distribution with a mean rate,
νZ = ∑3i=1 νi = 0.01 + 0.04 + 0.05 = 0.10 earthquakes/year
Example 6.8
Let Z = X1 + X2 and W = X1 ‒ X2 and let X1 and X 2 be two statistically independent normally
distributed random variables defined as X1 : N(20, 4) and X2 : N(10, 3). Derive the distributions
of Z and W.
Solution:
According to the rule stated above, Z and W will have normal distributions with the following
parameters.
μ_Z = 20 + 10 = 30 and σ_Z = √(4² + 3²) = 5, i.e. Z: N(30, 5); similarly, μ_W = 20 − 10 = 10 and σ_W = √(4² + 3²) = 5, i.e. W: N(10, 5).
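The rules used in Examples 6.7 and 6.8 are easy to exercise numerically; the sketch below assumes SciPy is available, and the probabilities it prints are illustrative quantities only, not values asked for in the examples.

```python
from math import sqrt
from scipy.stats import norm, poisson

# Example 6.7: the combined seismic activity is Poisson with rate 0.10 per year
nu_site = 0.01 + 0.04 + 0.05
print(poisson.pmf(0, nu_site * 10))     # e.g. Pr(no earthquake in 10 years) ~ e^{-1}

# Example 6.8: Z = X1 + X2 and W = X1 - X2 with X1 ~ N(20, 4), X2 ~ N(10, 3)
mu_z, sigma_z = 20 + 10, sqrt(4**2 + 3**2)     # Z: N(30, 5)
mu_w, sigma_w = 20 - 10, sqrt(4**2 + 3**2)     # W: N(10, 5)
print(norm.cdf(35, mu_z, sigma_z))             # Pr(Z <= 35) = Phi(1) ~ 0.841
print(norm.cdf(0, mu_w, sigma_w))              # Pr(W <= 0) = Phi(-2) ~ 0.023
```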
Example 6.9
The total axial load, Y that is acting on a column is the sum of the following load effects, which
are listed in the table given below, together with their statistical parameters. All load effects are
assumed to be statistically independent and normally distributed.
a) Find the mean value, standard deviation and coefficient of variation of the total load, Y.
b) If the coefficient of variation of the axial design capacity of the column, D, is 0.25 and if it is a normally distributed random variable statistically independent of Y, what is the reliability of this column if it is designed for a mean safety factor (S.F.) of 1.75?
c) If the coefficient of variation of the axial design capacity of the column, D is 0.25 and if it
is a normally distributed random variable correlated with Y with a correlation coefficient
ρ = 0.7, what is the reliability of this column, if it is designed for a mean safety factor (S.F.) of
1.75?
Solution:
a) E(Y) = 50 + 80 + 20 + 10 + 35 = 𝟏𝟗𝟓 𝐤𝐍 = 𝛍𝐘
VAR(Y) = (50 × 0.03)2 + (80 × 0.2)2 + (20 × 0.25)2+ (10 × 0.25)2 +(35 × 0.4)2
σ2Y = 485.5(kN)2
σ_Y = √485.5 = 22.03 kN and δ_Y = σ_Y/μ_Y = 22.03/195 = 0.113
b) Let the safe state be defined as “D > Y”, or alternatively “D − Y > 0”, where D is the design capacity.
μ_D = 1.75 × 195 = 341.25 kN and σ_D = 0.25 × 341.25 = 85.31 kN, so σ²_D = 7278.2 kN².
Reliability = Pr(D − Y > 0) = Pr(z > −(341.25 − 195) / √(7278.2 + 485.5)) ≅ Pr(z > −1.66) = 0.95
c) If D and Y are correlated with ρ = 0.7,
VAR(D − Y) = σ²_D + σ²_Y − 2ρ σ_D σ_Y = 7278.2 + 485.5 − 2 × 0.7 × 85.31 × 22.03 = 5132.3 kN², so σ_{D−Y} = 71.64 kN.
Reliability = Pr(z > −146.25/71.64) = Pr(z > −2.04) ≅ 0.98
Example 6.10
The water supply for a city comes from two reservoirs, A and B with a total capacity of X m 3.
The amount of water in reservoir A comes from three rivers. Each of the rivers feeding reservoir
A has a discharge X1 varying normally as N(250000 m3, 15000 m3) and the amount of water in
reservoir B comes from two rivers each having discharge X2 normally distributed as N(150000
m3, 35000 m3). The demand (D) by the city also fluctuates normally with a mean of 800000 m3
and a coefficient of variation of 0.2. It is assumed that the amount of water coming from the
rivers is statistically independent.
a) Determine the expected value of the total amount of water X in reservoirs A and B.
b) Determine the variance of the total amount of water X in reservoirs A and B. What is the
probability that there may be a water shortage in the city (Shortage: D > X).
Solution:
a) X = X₁ + X₁ + X₁ + X₂ + X₂
E(X) = 3 E(X₁) + 2 E(X₂) = 750 000 + 300 000 = 1 050 000 m³
b) VAR(X) = 3 VAR(X₁) + 2 VAR(X₂) = 3 × 15000² + 2 × 35000² = 3125 × 10⁶ m⁶
Shortage: D > X → Pr(D > X) = Pr(D ‒ X > 0)
Let Y = D ‒ X
E(Y) = μ_Y = E(D) – E(X) = 800 000 – 1 050 000 = ‒250 000 m³
VAR(Y) = σ²_Y = VAR(D) + VAR(X) = 160 000² + 3125 × 10⁶ m⁶
Since both X and D are normally distributed and statistically independent, Y will also be normally distributed.
Pr(Y > 0) = Pr((Y − μ_Y)/σ_Y > (0 − (−250 000)) / √(160 000² + 3125 × 10⁶)) = Pr(z > 1.48) = 1 − 0.931 = 0.069
Let Z = X₁ X₂ ⋯ Xₙ be the product of n statistically independent lognormally distributed random variables Xᵢ with parameters λᵢ and ξᵢ. Taking logarithms,
ln Z = Σ_{i=1}^{n} ln Xᵢ   (6.9)
Since the ln Xᵢ’s are statistically independent normally distributed random variables, ln Z, which is their sum, will also be normally distributed with mean λ_Z = Σ λᵢ and variance ξ²_Z = Σ ξ²ᵢ. Hence, Z will be lognormally distributed with the following parameters,
λ_Z = Σ_{i=1}^{n} λᵢ   (6.10a)
and
ξ²_Z = Σ_{i=1}^{n} ξ²ᵢ   (6.10b)
Similarly, if Z = X/Y is the quotient of two statistically independent lognormal variates X and Y, then ln Z = ln X − ln Y is normal and Z is lognormal with parameters
λ_Z = λ_X ‒ λ_Y   (6.12a)
and
ξ²_Z = ξ²_X + ξ²_Y   (6.12b)
Example 6.11
The efficiency (E) of a company producing construction materials is estimated based on the
following relationship:
E = (Y / M) √(e^T · e^{S/9})
where,
Y and M are lognormally distributed random variables with median values of 5000 hours and
250 TL, respectively, and coefficients of variation of 0.20 and 0.15, respectively. T and S are
normally distributed random variables with mean values of 6 years and 45 hours, respectively
and standard deviations of 2 years and 4.5 hours, respectively. T and S are dependent variables,
the coefficient that reflects the correlation between them is 𝛒𝐓,𝐒 = 0.75. All other variables are
independent of each other.
Solution:
a) The parameters of the lognormally distributed random variables Y and M are as follows:
λ_Y = ln 5000 = 8.517,  ξ_Y ≅ δ_Y = 0.20;   λ_M = ln 250 = 5.521,  ξ_M ≅ δ_M = 0.15
If we take the logarithm of both sides of the equation given for efficiency,
ln E = ln Y – ln M + 0.5 (T + S/9)
This is a linear function, and the expected value and variance of ln E are calculated as follows:
λ_E = E(ln E) = λ_Y − λ_M + 0.5 (μ_T + μ_S/9) = 8.517 − 5.521 + 0.5 (6 + 45/9) = 8.495
VAR(ln E) = VAR(ln Y) + VAR(ln M) + 0.25 [VAR(T) + (1/81) VAR(S)] + 2 × 0.5 × (0.5/9) × ρ_{T,S} σ_T σ_S
ξ²_E = 0.20² + 0.15² + 0.25 [2² + (1/81) 4.5²] + 2 × 0.25 × (1/9) × 0.75 × 2 × 4.5 = 1.50
ξ_E = 1.225
c) According to the results obtained in parts (a) and (b), E is a lognormally distributed random
variable with parameters λE = 8.495 and ξE = 1.225. Accordingly,
Pr(E > 90) = Pr(ln E > ln 90) = Pr(z > (ln 90 − 8.495)/1.225)
= Pr(z > (4.500 − 8.495)/1.225) = Pr(z > − 3.26) = 0.99944
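The parameters λ_E = 8.495 and ξ_E = 1.225 obtained above can be checked by a short Monte Carlo sketch, assuming NumPy is available; the given Y, M, T and S models (with ρ_T,S = 0.75) are simulated directly.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
lam_y, xi_y = np.log(5000.0), 0.20      # lognormal parameters of Y (median 5000)
lam_m, xi_m = np.log(250.0), 0.15       # lognormal parameters of M (median 250)

y = rng.lognormal(mean=lam_y, sigma=xi_y, size=n)
m = rng.lognormal(mean=lam_m, sigma=xi_m, size=n)

# Correlated normal pair (T, S) with rho = 0.75
cov_ts = np.array([[2.0**2, 0.75 * 2.0 * 4.5],
                   [0.75 * 2.0 * 4.5, 4.5**2]])
t, s = rng.multivariate_normal([6.0, 45.0], cov_ts, size=n).T

ln_e = np.log(y) - np.log(m) + 0.5 * (t + s / 9.0)
print(round(ln_e.mean(), 2), round(ln_e.std(), 2))   # ~ 8.50 and ~ 1.22
```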
a) Exact Results
Let Y depend on n random variables denoted by Xi, i = 1, 2,…, n, according to the following
functional relationship:
Y = g(x1 , … , xn ) (6.13)
Then the mathematical expectation of Y, E(Y), will be obtained from the following equation:
E(Y) = ∫ ⋯ ∫_{D_X} g(x₁, …, xₙ) f_{X₁…Xₙ}(x₁, …, xₙ) dx₁ … dxₙ   (6.14)
Here, Xi = i th independent variable (i = 1, …, n); fX1,…, Xn (x1,…, xn) = joint probability density
function; DX = the region over which the joint probability density function is defined.
In computing E(Y) using Eq. 6.14, two problems will be faced:
i) Computational difficulties due to n-tuple integrals.
ii) Lack of sufficient data to assess the joint probability density function.
On the other hand if g(x1 , … , xn ) is a linear function, then these difficulties will be of no concern
as illustrated in the following.
i) Let Y = a + bX, where a and b are constants. Then
E(Y) = a + bE(X) = a + bμ_X
VAR(Y) = b² VAR(X) = b² σ²_X
ii) Let Y = a₁X₁ + a₂X₂. Then
E(Y) = μ_Y = E(a₁X₁ + a₂X₂) = a₁E(X₁) + a₂E(X₂) = a₁μ₁ + a₂μ₂   (6.15)
VAR(Y) = E[(Y ‒ μ_Y)²] = E[(a₁X₁ + a₂X₂) ‒ (a₁μ₁ + a₂μ₂)]² = a₁²σ₁² + a₂²σ₂² + 2a₁a₂ COV(X₁, X₂)
If W = a1 X1 ‒ a2 X2
E(W) = μW = E(a1 X1 ‒ a2 X2 ) = a1 E(X1 ) ‒ a2 E(X2 ) = a1 μ1 ‒ a2 μ2
VAR(W) = σ2W = a21 σ12 + a22 σ22 ‒ 2 a1 a2 COV(X1 , X2 )
More generally, let Y = Σ_{i=1}^{n} aᵢXᵢ, where each Xᵢ has a mean value μᵢ and a standard deviation σᵢ, the aᵢ’s are constants, and the Xᵢ’s are correlated with correlation coefficients ρᵢⱼ (i = 1, 2, …, n; j = 1, …, n). Then the mean value and variance of Y will be:
μ_Y = Σ_{i=1}^{n} aᵢ μᵢ   (6.20)
σ²_Y = Σ_{i=1}^{n} aᵢ² σᵢ² + Σ Σ_{i≠j} aᵢ aⱼ ρᵢⱼ σᵢ σⱼ   (6.23)
In the multivariate case, the correlation structure can conveniently be described by the
covariance matrix, CX:
        ⎡ σ²₁₁  σ²₁₂  …  σ²₁ₙ ⎤
        ⎢ σ²₂₁  σ²₂₂  …  σ²₂ₙ ⎥
C_X =   ⎢  …    σ²ₖₖ      …  ⎥   (6.26)
        ⎣ σ²ₙ₁  σ²ₙ₂  …  σ²ₙₙ ⎦
The covariance matrix is square and symmetric. In this matrix, the σ2ij term corresponds to
variance if i = j and to the covariance between the random variables Xi and Xj if i ≠ j.
Similarly, let T = Σ_{i=1}^{n} aᵢXᵢ, where each Xᵢ has a mean value μᵢ, the aᵢ’s are constants, and the Xᵢ’s are statistically independent. Then the mean value and variance of T will be:
μ_T = Σ_{i=1}^{n} aᵢ μᵢ     σ²_T = Σ_{i=1}^{n} aᵢ² σᵢ²
Let X̃ = (X₁, X₂, …, Xₙ) be the vector of random variables involved in the problem at hand. These variables will be called basic variables. Let Y be a function of these n basic variables defined as follows:
Y = g(X̃) = g(X₁, X₂, …, Xₙ)
In the previous section, we have seen that for linear functions finding the expected value and
the variance is computationally simple. For nonlinear functions to obtain the exact results,
multiple integrations must be performed. This could be avoided by linearizing the function but
getting approximate results. This linearization can be done by applying the Taylor series expansion at the mean vector, μ̃ = (μ₁, μ₂, …, μₙ), and keeping only the linear terms as shown below:
Y = g(x̃) ≅ g(μ₁, μ₂, …, μₙ) + Σ_{i=1}^{n} (∂g/∂xᵢ)|_{μ̃} (xᵢ − μᵢ)
Taking the expectation of this linearized function gives the first-order approximation E(Y) = μ_Y ≅ g(μ₁, μ₂, …, μₙ), and its variance gives:
VAR(Y) = σ²_Y ≅ Σ_{i=1}^{n} (∂g/∂xᵢ)²|_{μ̃} σᵢ² + Σ Σ_{i≠j} (∂g/∂xᵢ)|_{μ̃} (∂g/∂xⱼ)|_{μ̃} ρᵢⱼ σᵢ σⱼ   (6.33)
Here, σᵢ = standard deviation of Xᵢ; ρᵢⱼ = correlation coefficient between Xᵢ and Xⱼ. In case the basic variables are statistically independent,
VAR(Y) = σ²_Y ≅ Σ_{i=1}^{n} (∂g/∂xᵢ)²|_{μ̃} σᵢ²   (6.34)
The above method is generally referred to as the First-Order Second Moment (FOSM)
approximation.
Note that for the one-variable case, i.e. Y = g(X), the following relationships will be valid:
E(Y) = μ_Y ≅ g(μ_X)   (6.35)
VAR(Y) = σ²_Y ≅ (dg/dx)²|_{μ_X} σ²_X   (6.36)
If we keep the second-order term, then the following, so-called second-order approximation results will be obtained:
E(Y) = μ_Y ≅ g(μ_X) + ½ (d²g/dx²)|_{μ_X} σ²_X   (6.37)
VAR(Y) = σ²_Y ≅ (dg/dx)²|_{μ_X} σ²_X ‒ ¼ [(d²g/dx²)|_{μ_X}]² σ⁴_X + E(X ‒ μ_X)³ (dg/dx)|_{μ_X} (d²g/dx²)|_{μ_X} + ¼ E(X ‒ μ_X)⁴ [(d²g/dx²)|_{μ_X}]²   (6.38)
The annual operational cost, C, for a waste treatment plant is a function of the weight of solid
waste, W, the unit cost factor, F, and an efficiency coefficient, E, as follows:
WF
C=
√E
where W, F, and E are statistically independent lognormal variates with the following
respective medians and coefficients of variation (c.o.v.):
As C is a function of the product and quotient of lognormal variates, its probability distribution
is also lognormal, which we can show as follows:
ln C = ln W + ln F − ½ ln E
Accordingly, ln C is normal with mean λ_C = λ_W + λ_F − ½ λ_E
and variance ζ²_C = ζ²_W + ζ²_F + (½ ζ_E)².
On the basis of the above, the probability that the annual cost of operating the waste treatment
plant will exceed $35 000 is:
Pr(C > 35 000) = 1 − Pr(C ≤ 35 000) = 1 – Φ((ln 35 000 − 10.36)/0.26) = 1 − Φ(0.397)
= 1 – 0.655 = 0.345
If the three loads are statistically independent, i.e., 𝛒𝐢𝐣 = 𝟎, the mean and standard deviation
of the total load T, where, T = D + L + E, are:
𝛍𝐓 = 2000 + 1500 + 2500 = 𝟔𝟎𝟎𝟎 𝐭𝐨𝐧𝐬
and
𝛔𝟐𝐓 = 2102 + 3502 + 4502 = 𝟑𝟔𝟗 𝟏𝟎𝟎 𝐭𝐨𝐧𝐬𝟐
Hence, the standard deviation is 𝛔𝐓 = 𝟔𝟎𝟕. 𝟓𝟎 𝐭𝐨𝐧𝐬
However, the dead load, D, and the earthquake load, E, may be correlated, say with a correlation
coefficient of 𝛒𝐢𝐣 = 𝟎. 𝟓, whereas the live load L is uncorrelated with D and E. Then, the
corresponding variance would be:
𝛔𝟐𝐓 = 2102 + 3502 + 4502 + 2x1x1(𝟎. 𝟓)(210)(450) = 𝟒𝟔𝟑 𝟔𝟎𝟎 𝐭𝐨𝐧𝐬𝟐
and the standard deviation becomes 𝛔𝐓 = 𝟔𝟖𝟎. 𝟖𝟖 𝐭𝐨𝐧𝐬.
Now, suppose the mean and standard deviation of the load-carrying capacity of column C are
𝛍C = 𝟏𝟎 𝟎𝟎𝟎 𝐭𝐨𝐧𝐬 and 𝛔𝐂 = 𝟏𝟓𝟎𝟎 𝐭𝐨𝐧𝐬. The probability that the column will be overloaded
is then,
Pr(C < T) = Pr(C − T < 0)
But E(C – T) = μ_{C−T} = 10 000 – 6 000 = 4 000 tons, and, for the uncorrelated case, its standard deviation is σ_{C−T} = √(1500² + 607.50²) ≅ 1618 tons.
Assuming that all the variables are Gaussian, and therefore, the difference (C – T) is also
Gaussian, the probability of overloading the column will be:
Pr((C − T) < 0) = Φ((0 − 4000)/1618) = Φ(−2.47) = 1 − Φ(2.47) = 1 − 0.9932 = 0.007
where:
M = applied bending moment, P = applied axial force, A = cross-sectional area of the beam and
Z = section modulus of the beam. The following statistical information is given for the
engineering parameters:
μM = 45 000 in-lb δM = 0.10
μZ = 100 in3 δZ = 0.20
μP = 5 000 lb δP = 0.10
A = 50 in2
Assume that M and P are correlated with a correlation coefficient of 𝛒𝐌,𝐏 = 0.75, whereas Z
is statistically independent of M and P.
Compute the mean and standard deviation of the applied stress S in the beam by first-order
approximation.
Solution:
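Since the expression for S is not reproduced above, the following is only a numerical sketch of how the first-order (FOSM) approximation of Eqs. 6.33–6.34 could be evaluated for the data of this example, under the assumption of the usual combined-stress relation S = P/A + M/Z; the relation, and therefore the printed values, are assumptions for illustration.

```python
from math import sqrt

mu_M, delta_M = 45_000.0, 0.10     # applied moment, in-lb
mu_Z, delta_Z = 100.0, 0.20        # section modulus, in^3
mu_P, delta_P = 5_000.0, 0.10      # axial force, lb
A = 50.0                           # cross-sectional area, in^2 (deterministic)
rho_MP = 0.75

sig_M, sig_Z, sig_P = mu_M * delta_M, mu_Z * delta_Z, mu_P * delta_P

# First-order mean: evaluate g = P/A + M/Z at the mean point
mu_S = mu_P / A + mu_M / mu_Z

# Partial derivatives of g evaluated at the means
dS_dM = 1.0 / mu_Z
dS_dP = 1.0 / A
dS_dZ = -mu_M / mu_Z**2

# Eq. 6.33 with only M and P correlated
var_S = (dS_dM * sig_M)**2 + (dS_dP * sig_P)**2 + (dS_dZ * sig_Z)**2 \
        + 2.0 * dS_dM * dS_dP * rho_MP * sig_M * sig_P
print(round(mu_S, 1), round(sqrt(var_S), 1))    # mean ~550, std ~104 (psi)
```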
Example 6.16
The following empirical equation is derived for the solution of an engineering problem:
Z = X Y2 √W
where:
X: Uniformly distributed between 2.0 and 4.0,
Y: Normally distributed with a median of 1.0 and Pr(Y ≤ 2.0) = 0.9207,
W: Exponentially distributed with a median of 1.0,
and X, Y and W are statistically independent.
a) Compute the mean values, variances and coefficients of variation of X, Y and W.
b) Compute the mean, standard deviation and coefficient of variation of Z using the first-order
approximation.
Solution:
Example 6.17
A column of a building is designed in such a way that its strength, R, has a median value of
336 kN and Pr(R ≤ 532 kN) = 0.99. The strength of the column, R, is assumed to be normally
distributed and uncorrelated with loads. The total column load, T, is the sum of live (L), dead
(D), wind (W) and snow (S) loads. Assume these loads to be mutually independent normal
variables with the following statistical parameters:
a) Compute the expected values and coefficients of variation of the strength of the column, R
and total column load, T.
b) Compute the reliability index defined by Cornell (i.e. βC) and the probability of failure of
the column, if the strength is also assumed to be a normal random variable and independent of
the total load.
c) Compute the reliability index defined by Cornell (i.e. βC) and the probability of failure of the
column, if the strength and total load are negatively correlated normal random variables with a
correlation coefficient of 𝛒 = − 0.6. Explain also what is meant by negative correlation.
d) Assume that the safety level found in part (b) is rated to be inadequate and the design should
be revised to improve the safety level. Assuming all statistical data associated with the total
load T and strength R are the same as you have computed in part (a) and all variables are
mutually uncorrelated, compute the revised mean strength, 𝛍𝐑∗ , in order to have a survival
probability of 0.97 (i.e. failure probability, pf = 0.03).
(Hint: For the definition of the reliability index, βC, please refer to Section 7.4).
Solution:
a) T = L + D + W+ S
E(T) = µT = E(L + D + W+ S) = E(L) + E(D) + E(W) + E(S)
= 70 + 90 + 30 + 20 = 210 kN
VAR(T) = 𝛔𝟐𝐓 = VAR(L + D + W+ S) = VAR(L) + VAR(D) + VAR(W) + VAR(S)
= 10.52 + 4.52 + 92 + 42 = 110.25 + 20.25 + 81 + 16 = 227.5 kN2
𝛔𝐓 = 15.08 kN 𝛅𝐓 = 15.08/210 = 0.0718
µR = Median(R) = 336 kN
Pr(R ≤ 532) = Pr(z ≤ (532 − 336)/σ_R) = 0.99
Since R is normal, Φ⁻¹(0.99) = 2.33; therefore (532 − 336)/σ_R = 2.33 and
σ_R = 196/2.33 = 84.12 kN ≅ 84 kN,   δ_R = 84.12/336 = 0.25
b) Define the safety margin M = R – T, with μ_M = μ_R – μ_T = 336 – 210 = 126 kN.
σ_M = √8803.56 = 93.83 kN
β_C = μ_M/σ_M = 126/93.83 = 1.343
p_f = 1 – Φ(1.343) = 1 – 0.9099 = 0.0901 ≅ 0.09
c) A negative linear dependence between the total load T and the resistance R implies that as one increases the other tends to decrease, and vice versa.
d) µM = μ∗R – 210
βC = (μ∗R – 210) / √[(μ∗R × 0.25)² + 15.08²] = 1.88
(μ∗R – 210)² = 1.88² [(μ∗R × 0.25)² + 15.08²]
𝛍∗𝐑 ≅ 400.23 kN
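The following is a brief computational sketch of parts (b)–(d), assuming the Cornell reliability index βC = µM/σM with safety margin M = R − T (consistent with the numbers worked out above); it is an illustration, not the author's solution.

# Minimal sketch of parts (b)-(d) of Example 6.17, assuming the Cornell index
# beta_C = mu_M / sigma_M with the safety margin M = R - T.
import math
from statistics import NormalDist

mu_R, sigma_R = 336.0, 84.0     # kN (sigma_R rounded as in the text)
mu_T, sigma_T = 210.0, 15.08    # kN
mu_M = mu_R - mu_T

def beta_and_pf(rho):
    # Variance of M = R - T for normal R and T with correlation coefficient rho
    var_M = sigma_R**2 + sigma_T**2 - 2.0 * rho * sigma_R * sigma_T
    beta = mu_M / math.sqrt(var_M)
    pf = 1.0 - NormalDist().cdf(beta)
    return beta, pf

print(beta_and_pf(0.0))    # part (b): uncorrelated R and T
print(beta_and_pf(-0.6))   # part (c): beta about 1.34, pf about 0.09 as above

# Part (d): revised mean strength giving pf = 0.03 (beta about 1.88), solving
# (mu - 210)^2 = beta^2 [(0.25 mu)^2 + 15.08^2] by simple fixed-point iteration.
beta_target = NormalDist().inv_cdf(0.97)
mu = mu_R
for _ in range(100):
    mu = mu_T + beta_target * math.sqrt((0.25 * mu)**2 + sigma_T**2)
print(mu)   # about 400 kN, consistent with the value reported above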
Chapter 7
BRIEF INFORMATION ON STATISTICS**
Many real-life problems that engineers deal with are formulated under conditions of uncertainty.
The engineering design of a physical problem may involve natural processes and phenomena that
are inherently random. The related information may be incomplete, inadequate, or unsatisfactory for the problem of concern. Therefore, the idealized prototype of the problem and/or its mathematical model (its formulated form) may involve such uncertainties, together with the uncertainties related to imperfections in the modelling and in the parameters used. In short, uncertainties may enter the problem at the input stage (physical uncertainties) and at the modelling phase (model uncertainties), resulting in output uncertainties (statistical uncertainties). Thus, decisions required for planning and design inevitably have to be made under conditions of uncertainty.
The above-mentioned uncertainties are in general grouped into two, namely: “aleatory” and
“epistemic”.
Aleatory uncertainties (random/stochastic uncertainties) are associated with the inherent randomness or unpredictability of an event, mostly reflecting the natural (inherent) variability in the system. They are uncertainties ascribed to the physical system and/or environment under consideration. They are irreducible, inherent, and stochastic (e.g., wind speed and direction involve aleatory (random) uncertainties).
Epistemic uncertainties (parameter uncertainties) reflect the possibility of errors in our general knowledge. Such uncertainties result from some level of ignorance or incomplete information about the system or the surrounding environment. They are subjective, model-form uncertainties and are related to the state-of-knowledge uncertainty. Since, generally, we do not know the correct values of the parameters in the constructed model, parameter uncertainties are of the epistemic type. Epistemic uncertainties are reducible (e.g., "I believe that the speed of the wind is less than 40 km/hr, but I am not sure of that").
The output or final uncertainty in the problem solution, over which decisions are to be made, may appear to be aleatory, but it usually results from both sources of uncertainty, aleatory and epistemic. Whatever the types are, the effects of uncertainty are important in both the design and planning of engineering systems and require quantification. For the scientific quantification of uncertainties, engineers use the concepts and methods of probability. On the other hand, to reduce the degree of epistemic uncertainty, more information/data must be obtained via observations, experiments and records, for which the concepts, tools and methods of statistics are almost indispensable.
7.2 STATISTICS
Before we proceed to the exercises, we need to define two main concepts: population and sample.
• A population is a well-defined set of distinct objects or elements; in other words, it is the whole set or group that we are interested in. Usually, we denote the population by the
set S. S can be finite (if so, the population size is usually denoted by N) or infinite in extent.
For several reasons, we may not be able to observe the population totally, but we may be
able to study only a portion of it, called the sample.
• When the observations are numerical values, the population is referred to as a quantitative
population. If the observations are on attributes (type of structure, level of damage, or
similar category) the population is a qualitative population.
• A sample is a subset of S (let it be denoted by A, A= {s1, … ,sn}). Usually, A is a small part
of the population that we draw from S to make observations on it to learn about S. The more
representative sample we collect from the population, the better we may learn about the
population.
• Raw data is the list of observations/ measurements in the sample whose values are not
manipulated at all.
• Statistics is concerned with data. If the population is quantitative, the data set will constitute
numbers. However, if the population is qualitative the observations will be non-numerical
and for a statistical study, numerical data can be artificially created. We call such data
nominal data since the numbers will represent arbitrary codes.
In engineering, most of the time the data we encounter is ratio data, for which the basic arithmetic operations (addition, subtraction, multiplication and division) are all meaningful.
For some types of data only addition and subtraction are meaningful; such data are scale-dependent, as in the case of temperature, and are called interval data (e.g. 0°, 20° and 40° Celsius correspond to 32°, 68° and 104° Fahrenheit, respectively). The ratios between temperature values differ in the two scales, but equal differences in one scale correspond to equal differences in the other. Therefore, temperature is interval data.
Data where no arithmetic operations are meaningful are called ordinal data. The numbers in such data represent an ordering relation, in other words a ranking in terms of importance, preference, strength, etc.
The elements of the sample, represented by a real-valued function X(si) = xi defined on the sample A = {s1, …, sn}, form the data set. The data set may be discrete or continuous depending on the physical characteristics of the xi. (If your sample consists of the number of vehicles or the types of vehicles crossing a bridge, you have a discrete data set, but if you are interested in the weights or lengths of cars, the data set will be continuous.)
• The number of elements in the data set is called the sample size and is usually denoted by
n.
• Collecting sample data from a larger population and using it to make predictions and/or decisions for the entire population is called statistical inference. The efficiency of the inference depends on selecting a representative and appropriate sample, in general a random sample.
• A random sample is one obtained in such a way that every item of the population has the same chance of being chosen as any other item.
• Random variable: Quantities that are measured or observed are termed variables and
because of the inherent randomness, they are called random variables (since the values of
measured or observed quantities depend on chance).
• Continuous Random Variable: Variables that can have any value on a continuous interval.
• Discrete Random Variable: Variables that can have countable isolated numbers (e.g.
integers).
• Distribution refers to the variability pattern of the random variable.
Descriptive statistics consists of methods used to organize, display and analyze data from some
population or sample.
• Data Collection: When the population size is large, it can be time-consuming, expensive or
impractical, and/or impossible to study its set values. In practice, it is, therefore, more
common and usually desirable to study a relatively small fraction of the population, which
we have defined as the sample which is required to be representative of the entire population
(so to be chosen at random).
• Organization of Data: We may organize data in several different ways to see if a pattern
exists for the characteristics of the data.
- One basic way is to list the data in numerical order (ascending or descending order).
When the observations are listed in ascending order, say for a sample of size n, as
{x(1) ≤ x(2) ≤ … ≤ x(n)}     (1)
the elements of the set in Eq. (1) are called the order statistics and x(i) is called the i'th order statistic. The i'th order statistic x(i) is the (i − 0.5)/n quantile. Sometimes it may be useful to obtain the d'th percentile value (Qd) of the ordered data (sorted in ascending form) by the following formula:
Qd = x(i) + [(n + 1) d − i] (x(i+1) − x(i))     (2)
where, n is the sample size and the subscript i is the largest integer such that i ≤ (n + 1) d.
x(1) = min{x1, …, xn}     (3)
x(n) = max{x1, …, xn}     (4)
The interquartile range (iqr) is the length of the interval that contains the middle half of
the data. The interquartile range is defined as
(iqr) = Q3 - Q1 (5)
where, (Q3 − Q1) is called the middle 50 percent range (Q1 and Q3 correspond to the 25th percentile (lower quartile) and the 75th percentile (upper quartile) of the data, respectively).
- In case n or the range of the existing numeric values is large, or the data come from a continuous set of numbers, one may arrange the data in categories or in the form of class intervals.
The number of class intervals k is usually chosen at least 5 (for small-sized data) but
not more than 20 (for large-sized data) and the intervals are to be non-overlapping. The
frequency estimation in interval form is usually for continuous data but if the sample
size is large one may prefer to represent the discrete set of data in intervals also.
The practical guide to choosing “k” can be based on the following empirical
mathematical models:
a) rule of thumb: choose the closest integer to √n as k, where n is the sample size.
b) due to Sturges (1926): k = 1 + 3.3 log10 n, where n is the sample size.
c) due to Freedman and Diaconis (1981): k = r n^(1/3) / [2(iqr)], where r is the range, n is the size and (iqr) is the interquartile range of the sample data, respectively.
Note that the intervals will be expressed in the form [a, b) (closed at the left end and open at the right). a and b are the class boundaries or class limits, and the class marks (or mid-points) are the mid-points of each class interval.
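As a quick illustration of the three rules just listed, a small sketch (the function names are ours, not standard terminology):

# Small sketch of the three empirical rules for the number of class intervals k.
import math

def k_rule_of_thumb(n):
    return round(math.sqrt(n))

def k_sturges(n):
    return round(1 + 3.3 * math.log10(n))

def k_freedman_diaconis(data_range, n, iqr):
    return round(data_range * n ** (1.0 / 3.0) / (2.0 * iqr))

# For instance, with n = 90, range about 100 and iqr = 30 (the situation of
# Class Exercise 1 later in this chapter) the rules give roughly 9, 7 and 7.
print(k_rule_of_thumb(90), k_sturges(90), k_freedman_diaconis(100, 90, 30))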
The number of observations (frequencies) fi for each number in the data set or for each
class interval can be counted and may be listed as frequency or class frequency in a
tabular form. Such a table will give a full summary of the population frequency
distribution or sample frequency distribution(or empirical frequency distribution)
depending on whether the data represent the entire population or a portion of it.
Usually, only samples will be available, thus population frequency distribution will
remain unknown.
Whenever some frequencies are large, the relative frequency (rel. freq.)
rel. freq. = fj / ∑(j=1 to k) fj = fj / n (or fj / N for the population), where j = 1, 2, …, k,
can be used on the vertical axis. The basic shape of a frequency distribution is displayed by its histogram.
A frequency polygon is obtained by joining the frequencies of the sample data points or
class marks linearly.
- Dot Diagram: When the sample size is small and/or data is continuous, one may
sometimes use a dot diagram. One dot per data is placed over the numerical value of the
data represented on the horizontal axis.
- Stem and Leaf Plot: Histograms may not be effective when n < 50 (may not give a clear
indication of variability and other characteristics). In such cases, we may prefer stem and
leaf plots that yield no loss of information since all magnitudes are represented. Such a
plot will also highlight extreme values and other characteristics and can be constructed
easily.
- Box Plot: A representation that shows the three quartiles Q1 (lower quartile), Q2 (median) and Q3 (upper quartile) on a rectangular box may be used to indicate the variability of the data.
- Scatter Diagram: If there are n pairs of data (x1, y1), (x2, y2), …, (xn, yn) a preliminary
indication of correlation between them is obtained by a scatter diagram in which the
horizontal axis is reserved for the independent (or the variable with least uncertainty)
and the vertical axis is for the dependent or uncertain variable.
Several other graphical representations are used in practice, but in engineering bar
charts, histograms, ogives, scatter diagrams, and sometimes stem and leaf plots are the
most commonly preferred visual tools.
You may easily note that graphic displays and frequency tables depend on the size and number of class intervals chosen (as does the stem and leaf plot). A good graphic description and display is partly art and partly science. Unless they are accompanied by the following statistical descriptors, they may not give sound ideas about the sample and the population.
In addition to the numerical descriptors defined above, like the sample range and the quartiles, several others exist to describe the location and variability characteristics of the data.
In the following, xi represents the data value in the ungrouped data or the class mark (mid-point value of the interval) in the grouped data, unless otherwise stated.
- Arithmetic Mean (Sample Mean):
x̄ = ∑ xi / n   or   x̄ = ∑ fj xj / n     (6)
where, fj is the frequency of xj, and xj is either the numerical value of the data or the class mark (class center) for the grouped data. The arithmetic mean is easy to find and it is unique for a given data set, but it is highly affected by the extreme values present in the data. In such a case the sample mean may not be a good representative of the data set.
- Harmonic Mean: When one needs to find the average of the reciprocals of a variable, or if the xi values are too large to yield a meaningful x̄, one may compute the harmonic mean as
x̄h = 1 / [(1/n)(1/x1 + 1/x2 + … + 1/xn)]     (7)
- Geometric Mean: When averaging values that represent a rate of change, one may use the geometric mean defined as:
x̄g = (x1 x2 ··· xn)^(1/n) = [∏(i=1 to n) xi]^(1/n)     (8)
- Median: When the data are ordered, the central value is defined as the median. The median also corresponds to the second quartile Q2 of the data set (xmed), so that half of the data are less than the median and the other half are larger. If the number of data points is odd, the median is the middle data point:
xmed = x((n+1)/2)     (9)
If the number of data points is even, the median is the average of the middle two data points:
xmed = [x(n/2) + x((n/2)+1)] / 2     (10)
- Mode: That value of the data set, which occurs most frequently, is defined as
the mode of the data (x mod).
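A short sketch of the central-tendency measures defined above, for ungrouped data; the sample values are illustrative only:

# Brief sketch of the central-tendency measures defined above (ungrouped data).
import math
from collections import Counter

def arithmetic_mean(x):
    return sum(x) / len(x)

def harmonic_mean(x):
    return 1.0 / (sum(1.0 / xi for xi in x) / len(x))

def geometric_mean(x):
    return math.prod(x) ** (1.0 / len(x))

def median(x):
    xs, n = sorted(x), len(x)
    if n % 2 == 1:
        return xs[(n - 1) // 2]                       # middle value, Eq. (9)
    return (xs[n // 2 - 1] + xs[n // 2]) / 2.0        # average of middle two, Eq. (10)

def mode(x):
    return Counter(x).most_common(1)[0][0]

sample = [2.0, 3.0, 3.0, 5.0, 8.0]   # illustrative values only
print(arithmetic_mean(sample), harmonic_mean(sample),
      geometric_mean(sample), median(sample), mode(sample))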
One should note that if, instead of a sample, the data cover all population values, we have population descriptors with N instead of n, and usually the population mean is denoted by μ:
μ = ∑ xi / N   or   μ = ∑ fi xi / N     (11)
The following quantities measure how far the data spread from a central value (the mean or the median).
- Mean Absolute Deviation: It is the average distance of the data points from the central value:
d = (|x1 − x̄| + |x2 − x̄| + … + |xn − x̄|)/n = ∑(i=1 to n) |xi − x̄| / n   or   d = ∑(j=1 to k) fj |xj − x̄| / n     (12)
- Variance: The sample variance is defined as
s² = (1/n) ∑(i=1 to n) (xi − x̄)²   for ungrouped data     (13)
and
s² = (1/n) ∑(j=1 to k) fj (xj − x̄)²   for grouped data     (14)
For reasons that will be explained later, the sample variance (with sample size n) is modified as
s² = [1/(n − 1)] ∑(i=1 to n) (xi − x̄)²     (15)
and
s² = [1/(n − 1)] ∑(j=1 to k) fj (xj − x̄)²     (16)
Computationally, the following forms of the variances defined above are preferable (since the number of computations is reduced):
s² = (1/n) ∑(i=1 to n) xi² − x̄²     (17)
and
s² = [1/(n − 1)] ∑(i=1 to n) xi² − [n/(n − 1)] x̄²     (18)
For the population variance or standard deviation (σ), the deviations are measured from the population mean μ and the divisor is the population size N.
- Coefficient of Variation: The unitless quantity defined as the ratio of the standard deviation s to the mean value is a measure of the relative variation of the data and is called the coefficient of variation:
cov = s / x̄     (19)
If the coefficient of variation is small (approximately less than 0.20 or so), the spread in the data around the mean is small and, in that case, the mean can represent the data more efficiently.
There are also other measures of variability and shape (based on higher-order moments of the deviations from the central values); some of them are the standard error of the mean,
s.e.(x̄) = s/√n     (20)
the coefficient of skewness,
g1 = ∑(i=1 to n) (xi − x̄)³/(n s³)   (for ungrouped data)   or   g1 = ∑(j=1 to k) fj (xj − x̄)³/(n s³)   (for grouped data)     (21)
and the coefficient of kurtosis,
g2 = ∑(i=1 to n) (xi − x̄)⁴/(n s⁴)   (for ungrouped data)   or   g2 = ∑(j=1 to k) fj (xj − x̄)⁴/(n s⁴)   (for grouped data)     (22)
If the coefficient of skewness is positive, it implies that the long tail of the distribution is on the right-hand side, and so on.
A small coefficient of kurtosis implies that the tail weight of the distribution is small and the data has a peak.
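A companion sketch of the dispersion and shape descriptors above for ungrouped data (grouped-data versions would weight each term by fj); the data values are illustrative only:

# Sketch of the dispersion and shape descriptors for ungrouped data.
import math

def sample_variance(x):
    n, xbar = len(x), sum(x) / len(x)
    return sum((xi - xbar) ** 2 for xi in x) / (n - 1)                     # Eq. (15)

def sample_variance_computational(x):
    n, xbar = len(x), sum(x) / len(x)
    return sum(xi ** 2 for xi in x) / (n - 1) - n * xbar ** 2 / (n - 1)    # Eq. (18)

def coefficient_of_variation(x):
    return math.sqrt(sample_variance(x)) / (sum(x) / len(x))               # Eq. (19)

def skewness(x):
    n, xbar = len(x), sum(x) / len(x)
    s = math.sqrt(sample_variance(x))
    return sum((xi - xbar) ** 3 for xi in x) / (n * s ** 3)                # g1

def kurtosis(x):
    n, xbar = len(x), sum(x) / len(x)
    s = math.sqrt(sample_variance(x))
    return sum((xi - xbar) ** 4 for xi in x) / (n * s ** 4)                # g2

data = [45.0, 50.0, 35.0, 95.0, 60.0, 70.0, 55.0]   # illustrative values only
print(sample_variance(data), sample_variance_computational(data))
print(coefficient_of_variation(data), skewness(data), kurtosis(data))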
- Covariance: When dealing with pairs of data (X and Y) as (x1 , y1), (x2 , y2),
…, (xn , yn) it is usually necessary to observe how two sets of data vary together.
The measure for the common variation of the sample is given by the sample
covariance sXY.
sXY = (1/n) ∑(i=1 to n) (xi − x̄)(yi − ȳ) = (∑ xi yi)/n − x̄ ȳ
or sXY = [1/(n − 1)] ∑(i=1 to n) (xi − x̄)(yi − ȳ) = (∑ xi yi − n x̄ ȳ)/(n − 1)     (23)
rXY = sXY/(sX sY) = (1/n) ∑(i=1 to n) [(xi − x̄)/sX][(yi − ȳ)/sY] = (∑ xi yi − n x̄ ȳ) / {√[∑ xi² − (1/n)(∑ xi)²] √[∑ yi² − (1/n)(∑ yi)²]}     (24)
The square of the correlation coefficient gives the degree of tightness for a linear fit of
the data set and is called the coefficient of determination.
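A brief sketch of Eqs. (23) and (24) with the (n − 1) convention; the paired values are illustrative only:

# Sketch of the sample covariance and correlation coefficient (Eqs. 23-24).
import math

def sample_covariance(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

def correlation_coefficient(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative pairs only; r close to +1 indicates a strong linear trend and
# r**2 is the coefficient of determination mentioned above.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
print(sample_covariance(x, y), correlation_coefficient(x, y))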
• Outliers
Sometimes extreme values exist in the data, which will highly affect the mean value. To detect outliers we may use the interquartile range or the z-scores of the data: if a data point is more than approximately 1.5(iqr) (where (iqr) is the interquartile range as defined previously) away from the nearest quartile, the data point is considered an outlier, and if it is more than approximately 3.0(iqr) away, it is an extreme outlier.
When outliers are present in the data, one may prefer to use trimmed means to obtain more robust averages. In the ordered data, T% of the observations (the potential outliers) are removed from each end, and then the sample mean of the remaining numbers is calculated. The resulting mean is the T% trimmed mean and it generally lies between the sample mean and the sample median.
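A sketch of the 1.5(iqr) outlier screen and the T% trimmed mean; note that the quartile routine below uses a simple interpolation convention that may differ slightly from Eq. (2):

# Sketch of the 1.5(iqr) outlier screen and the T% trimmed mean described above.
def quartiles(x):
    xs, n = sorted(x), len(x)
    def q(p):
        # simple interpolation between order statistics; other conventions exist
        h = (n - 1) * p
        lo = int(h)
        return xs[lo] + (h - lo) * (xs[min(lo + 1, n - 1)] - xs[lo])
    return q(0.25), q(0.50), q(0.75)

def outliers(x):
    q1, _, q3 = quartiles(x)
    iqr = q3 - q1
    mild = [v for v in x if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
    extreme = [v for v in x if v < q1 - 3.0 * iqr or v > q3 + 3.0 * iqr]
    return mild, extreme

def trimmed_mean(x, trim_percent):
    xs = sorted(x)
    k = int(len(xs) * trim_percent / 100.0)   # observations removed from EACH end
    kept = xs[k:len(xs) - k] if k > 0 else xs
    return sum(kept) / len(kept)

data = [7, 13, 15, 20, 21, 25, 30, 35, 40, 55, 60, 75, 95, 100]  # illustrative
print(outliers(data))
print(trimmed_mean(data, 10))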
7.4 EXERCISES
NOW PLEASE GO OVER THE FOLLOWING TWO EXERCISES AND MAKE THE
NECESSARY CORRECTIONS AND COMMENTS.
Class Exercise 1
45 50 35 95 60 70 55 95 43 65 60 58 75 62 65 90 95 60 75 60 30 100
55 50 60 60 35 60 35 53 60 55 45 85 95 50 55 69 45 45 25 55 43 26
30 21 50 55 17 30 35 25 23 27 20 07 20 55 15 21 13 30 30 38 15 40
50 75 80 80 75 85 40 55 60 55 85 65 65 47 41 28 35 36 25 23 30 40
55 13
The same data sorted in ascending order (n = 90):
07 13 13 15 15 17 20 20 21 21 23 23 25 25 25 26 27 28 30 30 30 30
30 30 35 35 35 35 35 36 38 40 40 40 41 43 43 45 45 45 45 47 50 50
50 50 50 53 55 55 55 55 55 55 55 55 55 55 58 60 60 60 60 60 60 60
60 60 62 65 65 65 65 69 70 75 75 75 75 80 80 85 85 85 90 95 95 95
95 100
The data can be summarized by a FREQUENCY TABLE using class intervals (which shows how
many data points are around the midpoint value of the interval). Note that in this example we chose
11 groups from 0 to 110 to have a nice interval size (as 10 here) and the smallest and largest values
of the data are included in the first and last intervals, respectively.
a) We may use one of the following graphical representations to see the frequency variations
of the data:
In this example, we draw the following histograms from the data given in the above table.
A histogram or frequency polygon may have the vertical frequency axis as the relative
frequencies. The basic shape of the plots will not change but relative frequencies may give a
better overall measure of the number of occurrences of the grades. The relative frequency
polygons show us the shape of the distribution of the variable in our data set.
[Histogram of the grades: frequency (0–20) versus grade class marks 5, 15, 25, …, 105]
The cumulative relative frequency diagram of the data (we assume the tabular form of the data
is given) is as follows:
[Cumulative relative frequency diagram: cumulative relative frequency (0–1.0) versus grades (0–110)]
Stem Leaves
0 7
10 3 3 5 5 7
20 0 0 1 1 3 3 5 5 5 6 7 8
30 0 0 0 0 0 0 5 5 5 5 5 6 8
40 0 0 0 1 3 3 5 5 5 5 7
50 0 0 0 0 0 3 5 5 5 5 5 5 5 5 5 5 8
60 0 0 0 0 0 0 0 0 0 2 5 5 5 5 9
70 0 5 5 5 5
80 0 0 5 5 5
90 0 5 5 5 5
100 0
We divide the ordered data into four quartiles and observe how different the extreme groups
are. Interquartile range is defined as, IQR = Q3 – Q1.
i) Assuming only grouped data is available (that is using the table) Q1 = 35, Q2 = 55
(corresponds to median) and Q3 = 65 ; IQR = 65-35 = 30.
Note that we don’t have any outliers (small or large) in the SGD.
Q1 = 35 (30), Q2 = 55 (50), where the values in parentheses are the quartiles obtained from the original (ungrouped) data.
• MEDIAN: xmed = 55 (from the table). From the ordered data (original, ungrouped data) the median is x̃ = (x(45) + x(46))/2 = 50.
• 60th percentile of the data from the ordered, original form: i ≤ (90 + 1) × 0.60 = 54.6 → i = 54
Q60 = 55 + [(90 + 1) × 0.60 − 54](55 − 55) = 55
• MODE: xmod = 55 (From the table). The original data mode is 55 also.
• VARIANCE (from the grouped data, with x̄ = 51.3333):
s² = ∑(j=1 to 11) fj xj²/(n − 1) − [n/(n − 1)] x̄² = 282050/89 − (90/89)(51.3333)² = 504.382, or
s² = ∑(j=1 to 11) fj (xj − x̄)²/(n − 1) = 44890/89 = 504.382
If we use the original (ungrouped) data: s² = ∑(i=1 to 90) (xi − x̄)²/(n − 1) = 44171.6/89 = 496.310
• STANDARD DEVIATION: s = √504.382 = 22.458 (from the ungrouped data, s = √496.31 = 22.278)
• COEFFICIENT OF SKEWNESS:
g1 = ∑(j=1 to 11) fj (xj − x̄)³/(n s³) = 0.244 (> 0: it implies the tail of the frequency distribution is longer on the positive side)
• COEFFICIENT OF KURTOSIS:
g2 = ∑(j=1 to 11) fj (xj − x̄)⁴/(n s⁴) = 2.400 (a sufficiently small number, implying some peakedness in the distribution)
• STANDARD z- SCORES: If you observe the z-scores on the table, approximately 75%
of the data lies within one and 95 % lies within two standard deviations from the mean.
• NUMBER OF CLASS INTERVALS: Note that we may decide on the number of class intervals (nc) as a rule of thumb between 5 and 15 depending on the size of the data, or from nc = √n = √90 = 9.49, or nc = 1 + 3.3 log10 n = 1 + 3.3 log10 90 = 7.45, or nc = r n^(1/3)/[2(iqr)] = (105 − 5)(90)^(1/3)/[2(65 − 35)] = 7.47, so the choice of nc as 11 is not a bad preference.
CLASS EXERCISE 2:
[Scatter diagram of the paired data: y (0–25) versus x (14.8–15.8)]
• Numerical Descriptors:
x̄ = 15.3, s²x = 0.052, sx = 0.228, vx = 0.015
ȳ = 15.3, s²y = 11.6, sy = 3.406, vy = 0.223
Note that the two data sets have the same mean but quite different coefficients of variation!
• COVARIANCE
sXY = [1/(n − 1)] ∑(i=1 to n) (xi − x̄)(yi − ȳ) = (∑ xi yi − n x̄ ȳ)/(n − 1) = −0.260
• CORRELATION COEFFICIENT
rXY = sXY/(sX sY) = −0.260/(0.228 × 3.406) = −0.335
Chapter 8
SOME BASIC CONCEPTS OF STATISTICAL INFERENCE
Population: A population consists of the “whole” of the observations or events with which we are concerned. The
probability distribution of the random variable is determined based on the observations sampled from
the population.
Sample: A sample is a subset of observations or events that are selected from the population. The
random variables {x1, x2, …, x n} form a simple random sample of size n if all xi’s are statistically
independent random variables with the same probability distribution.
Statistic: Any function of the observations in a random sample is called a statistic. The probability
distribution of a statistic is called a sampling distribution. For example, the probability distribution of X̄ is the sampling distribution of the mean.
Statistical Inference: Once the probability distributions related to events are known, one can identify
their uncertainties numerically (probability measure). The estimated probabilities are functions of the
parameter or parameters of the probability distribution (e.g. μ, σ for the normal and ν for the Poisson
distribution, λ, ξ for the lognormal distribution, etc.) These parameters are estimated from observational
data and are used to make estimations and generalizations on population characteristics by statistical
inference techniques. These techniques are either in the form of point or interval estimation of the
parameters or in the hypothesis testing form, both of which are based on the sample data. In short,
statistical inference can be defined as: the estimation of the statistical characteristics of a population
by using the sample data obtained from that population.
8.2. ESTIMATION
Point Estimate: Single numerical value θ̂ (theta hat) obtained from a random sample to estimate the
population parameter θ is called a point estimate. Point estimators must have some desirable
properties such as unbiasedness, efficiency, consistency and sufficiency. These desirable properties are
briefly explained in the following.
Unbiasedness: A point estimator θ̂ of θ is said to be unbiased if E(θ̂) = θ. For example,
s² = [1/(n − 1)] ∑(i=1 to n) (xi ‒ X̄)² is an unbiased estimator of σ².
Efficiency: Among the unbiased point estimators of θ the one with minimum variance is called the
most efficient estimator of θ. Note that the variance of such an estimator is inversely proportional to the sample size. X̄ is an efficient estimator of µ.
Consistency: Consistent point estimators should satisfy the following probabilistic requirement:
Pr [| θ̂ ‒ θ| < ε] → 1 as n → ∞ where ε is a small number.
For example, X̄ is a consistent estimator of μ, since lim(n→∞) σX̄ = lim(n→∞) σX/√n → 0
Sufficiency: Let {x1, x2,…, xn} be a simple random sample with n observations from a population having
a probability distribution with an unknown parameter θ. Then any statistic T= f (x1, x2,…, xn) is said
to be a sufficient statistic for the estimation of θ, if the joint distribution of {x1, x2,…, xn} conditional
on the statistic T is independent of θ.
Standard Error: The standard error of an estimator θ̂ is defined as its standard deviation:
σθ̂ = √VAR(θ̂).
The standard error of the sample mean X̄ is:
σX̄ = σ/√n   for n ≤ 0.1 N.
If the population size N is finite and n > 0.1 N, then
σX̄ = (σ/√n) √[(N − n)/(N − 1)]
Mean Square Error (MSE): The mean square error associated with an estimator θ̂ of the parameter
θ, is defined as MSE(θ̂) = E[(θ̂ ‒ θ)²].
Central Limit Theorem: Consider a population having mean μ and standard deviation σ. Let X̄ be the mean of n statistically independent random observations taken from this population. Then, as n → ∞, the sampling distribution of X̄ tends to a normal distribution with mean μ and standard deviation
σX̄ = σ/√n (or, loosely, ∑ xi approaches a normal distribution as n → ∞). So, the statistic
z = (X̄ − μ)/(σ/√n)
approaches the standard normal distribution; in other words, the sampling distribution of X̄ is a normal distribution if σ is known.
Corollary: The probability distribution of the number of successes x, of a binomial distribution with
parameters, n and p, given below:
p(x | n, p) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x)
tends to a normal distribution with mean np and standard deviation √[np(1 − p)]. Accordingly, the statistic z = (x − np)/√[np(1 − p)], or the sample proportion p̂ of successes (or failures) in a binomial distribution, can be described by a normal distribution with the statistic z = (p̂ − p)/√[p̂(1 − p̂)/n].
Interval Estimation: In interval estimation, instead of using a single value to estimate θ, we construct
a range, which includes θ with a certain probability level to express the degree of uncertainty associated
with the point estimate.
Confidence Intervals: A 100 (1 ‒ α ) percent confidence interval (level of confidence) for the parameter
θ is an interval of any one of the forms (that will be given below) introduced to express the degree of
confidence (belief) in terms of α, where 0 < α < 1. The confidence intervals constructed in this way will actually include the true value of θ for approximately 100 (1 ‒ α) percent of the samples taken from the same population. In the following interval definitions, the parameter θ is an unknown constant but the
estimator 𝛉 ̂ is a random variable.
Pr [L2 < θ < U2] = 1 ‒ α
where, L2 and U2 are the lower and upper confidence limits, respectively (for the two-sided confidence interval).
[Figure 8.1 Two-sided confidence limits L2 and U2 for the parameter θ, with probability (1 − α) between them and α/2 in each tail]
Pr [L1 < θ] = 1 ‒ α
where, L1 is the lower confidence limit (for the lower one-sided confidence interval).
Figure 8.2 Lower one-sided confidence limit for the parameter θ
Pr [θ < U1] = 1 ‒ α
where, U1 is the upper confidence limit (for the upper one-sided confidence interval).
[Figure: Upper one-sided confidence limit U1 for the parameter θ, with probability (1 − α) below U1 and α above]
The following are the sampling distributions that are used for the sample mean, sample variance, sample
proportion and for the goodness of fit test.
a) If θ = μ, we use the standard normal distribution with the statistic z = (X̄ − μ)/(σ/√n) for cases where σ of the population is known.
b) If θ = μ, we use the standard normal distribution with the statistic z = (X̄ − μ)/(s/√n) for cases where σ of the population is unknown but the sample size n is sufficiently large (n ≥ 30). Here, s denotes the sample standard deviation.
c) If θ = μ, we use Student's t distribution with the statistic t = (X̄ − μ)/(s/√n) with (n ‒ 1) degrees of freedom (d.f. or r or ν), where σ of the population is unknown and n < 30. The degrees of freedom equal n ‒ 1. The simple explanation for this is: from n observations x1, x2, …, xn we use "one degree of freedom" to compute X̄, leaving (n ‒ 1) independent observations.
d) If θ = σ, we use the Chi-square (χ²) distribution with the statistic χ² = (n − 1)s²/σ², with (n – 1) degrees of freedom.
e) If θ = p, the population proportion of occurrences among a sequence of n trials, the statistic z = (p̂ − p)/√[p̂(1 − p̂)/n] has a normal distribution when n is sufficiently large.
f) If θ = μ1 ‒ μ2, the statistic t = [x̄1 − x̄2 − (μ1 − μ2)]/√(s1²/n1 + s2²/n2) has a t distribution with n1 + n2 – 2 degrees of freedom.
8.4.1. Confidence Interval for the Population Mean, µ (σ known and population is normally
distributed)
µ: X̄ ± zα/2 (σ/√n)
where, X̄: sample mean; n: sample size; σ: standard deviation of the population. If σ is unknown but the sample size n ≥ 30, assume σ ≅ s, where s is the standard deviation of the sample.
Additional Comments:
i) If X̄ is used as an estimator of µ, we can be (1 ‒ α) 100 percent confident that the error, e, will be less than zα/2 σ/√n.
ii) If X̄ is used as an estimator of µ, we can be (1 ‒ α) 100 percent confident that the error will be less than a specified amount e* when the sample size n ≥ [zα/2 σ/e*]².
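A small sketch of the interval in Section 8.4.1 and of the sample-size relation in comment (ii); it is checked against the numbers of Example 8.3 below:

# Sketch of the two-sided confidence interval for the mean (sigma known or
# n >= 30 with sigma ~ s) and of the minimum sample size for a target error e*.
import math
from statistics import NormalDist

def mean_confidence_interval(xbar, sigma, n, alpha=0.05):
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def required_sample_size(sigma, e_star, alpha=0.05):
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * sigma / e_star) ** 2)

# Numbers of Example 8.3 below: xbar = 2.6, s = 0.30, n = 36
print(mean_confidence_interval(2.6, 0.30, 36, alpha=0.05))   # roughly (2.50, 2.70)
print(required_sample_size(0.30, 0.05, alpha=0.05))          # about 139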
Summary of the sampling distributions used:
• θ = σ²: χ² = (n − 1)s²/σ², with ν = n ‒ 1 (Chi-Square Distribution)
• θ = μ1 ‒ μ2: tcr = [x̄1 − x̄2 − (μ1 − μ2)] / √(s1²/n1 + s2²/n2), with ν = n1 + n2 ‒ 2 (Student's t Distribution)
• θ = p (success ratio; x: total number of successes): zcr = (x − np)/√[np̂(1 − p̂)] = (p̂ − p)/√[p̂(1 − p̂)/n] (Standard Normal Distribution)
• Goodness of fit: χ² = ∑(i=1 to k) (foi ‒ fei)²/fei, with ν = k ‒ u ‒ 1 (Chi-Square Distribution)
(*) In the book, instead of n > 30, n > 50 is given. However, please use the n > 30 recommendation.
Given the above information, the following two-sided confidence intervals for various population
parameters can be stated:
8.4.2. Confidence Interval for the Population Mean, µ (σ unknown, n < 30 and population is
normally distributed)
µ: X̄ ± tα/2,ν (s/√n)
where, X̄: sample mean; n: sample size; s: standard deviation of the sample; ν: degrees of freedom
and equals to (n ‒ 1) in this case; 𝐭 𝛂/𝟐,𝛎: value of the t distribution with ν = n ‒ 1 degrees of freedom
leaving an area of α/2 on the right tail of the distribution. The t values are to be obtained from the
t-table based on the values of α/2 and the degrees of freedom (i.e. ν = n ‒ 1).
8.4.3. Confidence Interval for the Difference of Two Population Means (µ1 ‒ µ2)
(σ1 and σ2 are known or n1 ≥ 30 and n2 ≥ 30 and the population is normally distributed)
(µ1 ‒ µ2): (X̄1 − X̄2) ± zα/2 √(σ1²/n1 + σ2²/n2)
where, X̅ i and σi: sample mean and population standard deviation corresponding to the ith
population, respectively; ni: sample size of the sample obtained from the ith population.
8.4.4. Confidence Interval for the Difference of Two Population Means (µ1 ‒ µ2)
(σ1 and σ2 are unknown and n1 < 30 and n2 < 30 and the population is normally
distributed)
(µ1 ‒ µ2): (X̄1 − X̄2) ± tα/2,ν sp √(1/n1 + 1/n2)
where, X̄i and si: sample mean and sample standard deviation corresponding to the ith population, respectively; ni: sample size of the sample obtained from the ith population; ν = degrees of freedom = n1 + n2 – 2; sp²: pooled variance expressed as follows:
sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
Note that the pooled variance is obtained by combining (pooling) the two sample data.
In cases (b) and (d), where the population variances are unknown and the sample sizes are less than 30,
sample variances, si²'s, are used instead of the population variances, σi²'s. To compensate for this
assumption, which creates additional uncertainty, the zα/2 values are replaced by the t α/2,ν values,
which yield larger confidence intervals.
Definition: If s 2 is the variance of a random sample of size n taken from a normal population having a
variance denoted by σ², then χ² = (n − 1)s²/σ² is the value of a Chi-square (χ²) distribution having
ν = n ‒ 1 degrees of freedom. Accordingly,
(n − 1)s² / χ²α/2,ν ≤ σ² ≤ (n − 1)s² / χ²(1−α/2),ν
where, s² = sample variance; n: sample size; ν: degrees of freedom = n – 1; χ²α/2,ν: χ² value leaving an area of α/2 to the right; χ²(1−α/2),ν: χ² value leaving an area of (1 − α/2) to the right. These χ² values are to be obtained from the χ² tables. Depending on the type of table, the χ²α/2,ν and χ²(1−α/2),ν values
may be interchanged. The key point here is: Among the two 𝜒 2 values the larger one should be placed
in the denominator of the lower bound of the confidence interval.
Let p denote the population proportion and p̂ the sample proportion. A (1 ‒ α) 100 percent confidence interval for p is (large sample size, n ≥ 30):
p: p̂ ± zα/2 √[p̂(1 − p̂)/n]
and
A (1 ‒ α) 100 percent confidence interval for p is (small sample size, n < 30):
p: p̂ ± tα/2,ν √[p̂(1 − p̂)/n]
For this case, a strictly conservative (1 ‒ α) 100 percent confidence interval for p is obtained by setting p̂ = 1/2, which is the value that maximizes the term √[p̂(1 − p̂)/n]. Accordingly, this strictly conservative confidence interval becomes:
p: p̂ ± zα/2 √[1/(4n)]
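A sketch of the large-sample and strictly conservative proportion intervals, checked against the numbers of Example 8.8 below:

# Sketch of the large-sample and strictly conservative confidence intervals
# for a population proportion p.
import math
from statistics import NormalDist

def proportion_ci(x, n, alpha=0.05):
    p_hat = x / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def conservative_proportion_ci(x, n, alpha=0.05):
    p_hat = x / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * math.sqrt(1.0 / (4.0 * n))   # p_hat(1 - p_hat) replaced by 1/4
    return p_hat - half_width, p_hat + half_width

# Numbers of Example 8.8 below: 160 positives out of 500
print(proportion_ci(160, 500))               # roughly 0.28 <= p <= 0.36
print(conservative_proportion_ci(160, 500))  # slightly wider, about 0.32 +/- 0.044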
This is quite a popular method in statistics used for point estimation. Brief information is provided in
the following and a more detailed explanation, together with an example, is given in the Appendix.
Maximum Likelihood Function (L): If {x1, x2, …, xn} are n statistically independent random observations from the same population with pdf f(x; θ1, θ2, …, θk), involving k parameters to be estimated, then the joint probability density function of all these n independent observations is the following likelihood function, L:
L(x1, x2, …, xn; θ1, θ2, …, θk) = f(x1; θ1, θ2, …, θk) f(x2; θ1, θ2, …, θk) … f(xn; θ1, θ2, …, θk) = ∏(i=1 to n) f(xi; θ1, θ2, …, θk)
Maximum Likelihood Estimator: The maximum likelihood point estimators, θ̂j’s are such that they
maximize the likelihood function L(xi, θ1, θ2 ,…, θk).
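A sketch of maximum likelihood estimation for an assumed exponential model (the distribution treated in Example 8.9 below); the data values are illustrative only:

# Sketch: maximum likelihood estimation by maximizing the log-likelihood,
# illustrated for an exponential model f(t) = nu * exp(-nu * t).
import math

def log_likelihood(nu, data):
    # ln L = n ln(nu) - nu * sum(t_i), valid for nu > 0
    return len(data) * math.log(nu) - nu * sum(data)

def mle_exponential(data):
    # Closed form: nu_hat = n / sum(t_i) = 1 / t_bar
    return len(data) / sum(data)

# Illustrative observations (e.g. times between events, in arbitrary units)
t = [0.8, 1.3, 0.4, 2.1, 0.9, 1.6]
nu_hat = mle_exponential(t)

# The closed-form estimate indeed maximizes ln L: values on either side are lower.
print(nu_hat)
print(log_likelihood(nu_hat, t),
      log_likelihood(0.8 * nu_hat, t),
      log_likelihood(1.2 * nu_hat, t))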
Example 8.1. Check whether X̄ is an unbiased estimator of µ.
Solution:
E(X̄) = E[(1/n)(X1 + X2 + … + Xn)] = (1/n)[E(X1) + E(X2) + … + E(Xn)]
= (1/n)[µ + µ + … + µ] = (1/n) n µ = µ
E(X̄) = µ
Therefore, we can conclude that X̄ is an unbiased estimator of µ.
Example 8.2. A sample of size n = 2 is taken from a population with mean, µ and standard deviation,
σ. Let these two observations be denoted by X1 and X2. Three estimators defined below are used to
estimate µ.
θ̂1 = (1/2)(X1 + X2)     θ̂2 = X1     θ̂3 = (1/3) X1 + (2/3) X2
Solution:
All three estimators satisfy the unbiasedness requirement and their variances are as follows:
VAR(θ̂1) = (1/4)[VAR(X1) + VAR(X2)] = (1/4)(σ² + σ²) = (1/2) σ²
VAR(θ̂2) = VAR(X1) = σ²
VAR(θ̂3) = (1/9) VAR(X1) + (4/9) VAR(X2) = (1/9) σ² + (4/9) σ² = (5/9) σ²
θ̂1 has the smallest variance. Therefore it is the most efficient estimator. Please note that, θ̂1 is
actually the sample mean.
Example 8.3. The GPA’s of students taking the CE 204 course, X, are estimated to follow a normal
distribution. From a random sample of size n = 36, ̅ X and s are computed as 2.6 and 0.30,
respectively.
a) Obtain the 95% and 99% confidence intervals for the population mean, µ.
b) What is the minimum sample size if the error (i.e. the difference between the estimate and the true value) is to be at most 0.05 with 95% confidence?
Solution:
a) For 95% confidence level α = 0.05. From the standard normal distribution table, z0.025 = 1.96, so the 95% confidence interval is: µ: 2.6 ± 1.96 (0.30/√36) = 2.6 ± 0.098 → 2.50 ≤ µ ≤ 2.70.
Note that since σ is unknown, we assumed 𝛔 = s but still used the z distribution, since
n= 36 > 30.
For 99% confidence interval α = 0.01 and z0.005 = 2.575 and the corresponding confidence interval
is: 2.48 ≤ µ ≤ 2.72. As expected the higher confidence level corresponds to a wider interval.
b) Solution:
The relationship derived earlier, n ≥ [zα/2 σ/e*]², is to be used. σ ≅ s = 0.30, e* = 0.05, α = 0.05.
n ≥ [1.96 × 0.30/0.05]² → n ≥ 11.76² = 138.3 ≅ 139
Example 8.4. Let X denote the lifetime of batteries, in hours, produced by a certain company. X is
normally distributed. Based on a random sample of size n = 81, X ̅ and s are computed as 600 hours
and 40 hours, respectively. Write down the 95% confidence interval for the mean lifetime of the
population, i.e. for µ.
Solution:
For 95% confidence level α = 0.05. From the standard normal distribution table. z0.05/2 = z0.025 = 1.96.
σ is unknown. Therefore we assume σ ≅ s = 40 hours and still use the z distribution since
n = 81 > 30. Accordingly, the 95% confidence interval is:
µ: 600 ± 1.96 (40/√81) → µ: 600 ± 8.71
Assume now the sample size n = 5 and you have observed the same sample values, i.e., ̅ X = 600
hours and s = 40 hours. Since σ is unknown, we will assume 𝛔 = s = 40 and use the t value, because
n = 5 < 30.
µ: 600 ± 2.776 (40/√5) → µ: 600 ± 50
If by mistake the z distribution were used, the 95% confidence interval would be much smaller
as computed below:
µ: 600 ± 1.96 (40/√5) → µ: 600 ± 35 → 565 ≤ µ ≤ 635
Example 8.5. The sample data obtained from two different populations are summarized below:
Population 1: µ1, σ1: n1 = 50; X̄1 = 76; s1 = 6
Population 2: µ2, σ2: n2 = 75; X̄2 = 82; s2 = 8
Obtain the 96% confidence interval for the difference between two population means (µ1 ‒ µ2).
Solution:
For 96% confidence level α = 0.04. From the standard normal distribution table
z0.04/2 = z0.02 = 2.054. Accordingly, the 96% confidence interval is:
(µ1 − µ2): (76 − 82) ± 2.054 √(6²/50 + 8²/75)
Example 8.6. The sample data obtained from two different populations are summarized as follows:
Population 1: µ1, σ1: n1 = 4; X̄1 = 74; s1 = √132.67
Population 2: µ2, σ2: n2 = 3; X̄2 = 60; s2 = √93
Obtain the 95% confidence interval for the difference between two population means (µ1 − µ2).
Solution:
sp = 10.81
t 0.025,5 = 2.5706
(µ1 − µ2): (74 − 60) ± 2.5706 × 10.81 √(1/4 + 1/3)
(µ1 − µ2): 14 ± 2.5706 × 10.81 √(1/4 + 1/3)
Example 8.7. From a normally distributed population, a random sample of size n = 11 is selected.
If the sample variance is computed as s2 = 3.6, construct a 95% confidence interval for the population
variance.
Solution:
(n − 1)s²/χ²α/2,ν ≤ σ² ≤ (n − 1)s²/χ²(1−α/2),ν
α = 0.05 and ν = 11 – 1 = 10. Accordingly, χ²0.025,10 = 20.48 and χ²0.975,10 = 3.25. Then,
(10 × 3.6)/20.48 ≤ σ² ≤ (10 × 3.6)/3.25 → 1.76 ≤ σ² ≤ 11.08
Example 8.8. In a certain district of Istanbul, 500 people are randomly selected and checked for
Coronavirus. For 160 people the test was positive. Write down the 95% confidence interval for the
proportion of Coronavirus carriers for the whole district.
Solution:
n = 500 and x = 160. Therefore, p̂ = 160/500 = 0.32 and α = 0.05. Accordingly, the 95% confidence interval for the population proportion becomes:
p: 0.32 ± 1.96 √(0.32 × 0.68/500) → p: 0.32 ± 0.041 → 0.28 ≤ p ≤ 0.36
Example 8.9. A random sample of size n (i.e. X1, X2, …, Xn) is taken from a population having a parameter denoted by ν and with the following probability density function:
f(t) = ν e^(−ν t),  t ≥ 0
a) Write down the likelihood function based on the information given above and find the maximum likelihood estimator (ν̂MLE) of the parameter ν.
b) Is the maximum likelihood estimator ν̂MLE of the parameter ν you have obtained in part (a) a biased or an unbiased estimator? Justify and prove your answer, showing all the details of your proof.
c) What is the name of the probability density function given in this problem? If the random variable T represents time, what physical interpretation can you give to ν?
Solution:
a) L(t1, …, tn; ν) = ∏(i=1 to n) ν e^(−ν ti) = ν^n e^(−ν ∑ ti),  so  ln L = n ln ν − ν ∑(i=1 to n) ti
∂ ln[L(t1, …, tn; ν)]/∂ν = n ν⁻¹ − ∑(i=1 to n) ti = 0
ν̂MLE = n / ∑(i=1 to n) ti = 1/T̄
b) E(ν̂MLE) = E(n/∑ ti) = n/∑(i=1 to n) E(ti) = n/[∑(i=1 to n) (1/ν)] = n/[n(1/ν)] = ν
Therefore, ν̂MLE is an unbiased estimator of ν.
c) The name of the probability density function given in this problem is Exponential. ν is the mean rate of occurrence and its reciprocal 1/ν is the mean recurrence time, i.e. the expected time to the first occurrence or the mean time interval between events.
Chapter 9
FITTING PROBABILISTIC MODELS TO OBSERVED DATA
9.1. INTRODUCTION
It is important to identify the appropriate probabilistic model that describes the observed data
best. This can be done through the tools of descriptive statistics by displaying the data in
graphical formats (e.g. histograms, bar charts, etc.) and then visually fitting a distribution to the
resulting plots. In Fig. 9.1, such a case is illustrated for sample data displaying normality. This
method is simple but quite subjective. Other simple tools are probability papers, P-P and Q-Q
plots. Probability papers are available for each type of probability distribution. To give an idea
the probability paper for normal distribution is shown in Fig. 9.2. If the empirical cumulative
distribution function derived from the observed data, when plotted on the corresponding
probability paper follows a linear trend, then the selected distribution is acceptable. The same
procedure can be implemented by using the P-P and Q-Q plot tools that are readily available in
most of the statistical software packages. The sample data displayed in Fig. 9.1 is transformed
into a Q-Q plot (Fig. 9.3). The resulting straight line indicates that the data does fit a normal
probability plot and hence the sample comes from a Normal distribution. More detailed
information on these methods can be found in Chapter 7 of the reference book by Ang and
Tang (2007). Here we will concentrate on the most popular statistical procedure, namely: The
Chi-Square Goodness of Fit Test. A nonparametric method, called the Kolmogorov-Smirnov
Test, will also be presented briefly.
χ² = ∑(i=1 to k) (foi − fei)²/fei
Example 9.1
As an example, assume you are planning to play backgammon with your friend at his home.
Since you do not trust him very much you want to be sure that the dice are not loaded. For this
purpose, you randomly select a die and roll it 120 times. Actually, you are taking a sample of
size n = 120. The outcome of 120 rolls is given in Table 9.1.
The sum of the values given in the last column is the appropriate statistic to check the fairness
of the die. At this point, we state the chi-square goodness of fit test more formally through the
following theorem.
Figure 9.3 Q-Q plot for the sample data described by the histogram given in Fig. 9.1
Theorem: A chi-square goodness of fit test between observed and expected frequencies is
based on the following 𝛘𝟐 test statistic:
χ² = ∑(i=1 to k) (foi − fei)²/fei
where, 𝛘𝟐 is a value of the random variable whose sampling distribution is approximately chi-
square. Note that the chi-square goodness of fit test is a right-tailed test and for a small 𝛘𝟐 value
the fit will be good (accept the null hypothesis, H0) whereas for a large 𝛘𝟐 value the fit will be
poor (reject the null hypothesis, H0). In selecting the 𝛘𝟐 value, besides the significance level, α,
the degrees of freedom, ν is needed. Here, the degrees of freedom will be computed from the
following relationship:
ν = k –1 – r
where, k: number of cells, r: number of parameters computed based on the sample data. The
subtraction of 1 results from the fact that 1 degree of freedom is lost due to ∑(i=1 to k) foi = ∑(i=1 to k) fei. For
different distributions the appropriate degrees of freedom values are given below:
i) Poisson: If λ is given, ν = k – 1; if not given and computed from the sample data,
ν = k – 1 – 1 = k –2.
ii) Binomial: If p is given, ν = k – 1; if not given and computed from the sample data,
ν = k – 1 – 1 = k –2.
iii) Normal: If μ and σ are given, ν = k – 1; if not given and computed from the sample data,
ν = k – 1 – 2 = k – 3.
The following are the basic steps to be followed in conducting the chi-square goodness of fit
test:
i) State the null hypothesis, H0, i.e. state the type of distribution proposed.
ii) Select the significance level, α, and compute the expected frequencies, fei, corresponding to H0.
iii) Compute the χ² = ∑(i=1 to k) (foi − fei)²/fei value. Denote this computed value by χ²c.
iv) Compute ν. Based on this computed value of ν and the selected α value, obtain χ²α,ν from the χ² distribution table. Denote this table value by χ²t.
v) If χ²c ≤ χ²t, accept H0 (the fit is acceptable); if χ²c > χ²t, reject H0.
In the application of the chi-square goodness of fit test, it is recommended that the total number
of observations be greater than 50 for a good approximation of the chi-square distribution. Also
for each cell both observed and expected frequencies should be at least 5; if not make grouping
by combining the neighbouring cells.
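A generic sketch of the test statistic and decision rule; the observed counts below are hypothetical and are not the Table 9.1 data:

# Generic sketch of the chi-square goodness-of-fit statistic and decision rule.
# The observed counts below are hypothetical, not the Example 9.1 table.
def chi_square_statistic(f_obs, f_exp):
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(f_obs, f_exp))

def degrees_of_freedom(k, r):
    # k cells, r distribution parameters estimated from the sample
    return k - 1 - r

f_obs = [22, 17, 21, 18, 19, 23]             # hypothetical counts for 120 rolls
f_exp = [120 / 6.0] * 6                      # uniform (fair die) model: 20 per face
chi2_c = chi_square_statistic(f_obs, f_exp)
nu = degrees_of_freedom(k=6, r=0)            # p_i given, none estimated
chi2_t = 11.07                               # table value for alpha = 0.05, nu = 5

print(chi2_c, nu, "accept H0" if chi2_c <= chi2_t else "reject H0")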
i) H0: The underlying distribution is uniform, with pi = 1/6, for i = 1, 2,…, 6 (i.e. Die is fair).
iv) Since, pi’s are given and the number of cells, k = 6, ν = 6 – 1= 5. Accordingly, from the
𝛘𝟐 distribution table we obtain, 𝛘𝟐𝟎.𝟎𝟓,𝟓 = 11.07.
v) Check whether 𝛘𝟐𝐜 > 𝛘𝟐𝐭 →1.70 < 11.07. Therefore, accept H0. The distribution is uniform.
We conclude that the die is fair. The graphical description of the test is shown in Fig. 9.4.
Within the context of this example, it will be illustrative to explain the meaning of the
significance level, α, which is the probability of committing a Type I error. Type I error is
defined as the probability of rejecting H0 when H0 is true. In other words, the probability of
rejecting the hypothesis that the distribution is uniform (die is fair) when it is actually uniform
(die is fair) is assigned quite a low value of 5%. In this way, you are trying to avoid an unjust evaluation of your friend by keeping the probability of blaming him for using a loaded die, although the die is actually fair, at a low value.
[Figure 9.4 Graphical description of the test: the rejection region lies to the right of χ²0.05,5 = 11.07]
Example 9.2
Consider the sample data given in Fig. 9.5, in the form of a bar chart, for the number of defective
items (X) observed in a sample of size, n = 60. Is it possible to claim that the underlying
population has a Poisson distribution (with λ=0.85 computed from the sample data) at a 5%
significance level?
[Figure 9.5 Bar chart: frequency versus the number of defective items, X = 0, 1, 2, 3]
Solution:
iv) Since λ is also computed from the sample data and the cell size k = 4, ν = 4 – 1 – 1 = 2.
Accordingly, from the 𝛘𝟐 distribution table we obtain, 𝛘𝟐𝐭 = 𝛘𝟐𝟎.𝟎𝟓,𝟐 = 5.99.
v) Since 2.3181 ≤ 5.99, the underlying probability model can be adopted as the Poisson
distribution at α =5% significance level. Therefore, accept H0 and conclude that the distribution
is Poisson.
Example 9.3
A sociologist studying various aspects of the personal lives of families living in Turkish villages
had a sample consisting of 150 families having four children. The distribution (i.e. observed
frequencies) of the number of girls in those families is summarized in Table 9.3a. From the
given data probability of occurrence of the event (in this case, having a girl), p, is computed as
0.32.
a) Find the expected frequencies of X (i.e., number of girls) assuming a binomial distribution
with p=0.32, Round the expected frequencies you computed to the nearest integer.
b) Perform chi-square goodness of fit test to check whether the Binomial distribution is a good
fit to the given data at the α = 5% significance level. Assume that p is not given but computed
from the data as stated above.
Solution:
a) P(X = x) = [4!/((4 − x)! x!)] (0.32)^x (0.68)^(4−x)
p(0) = 0.2138 fe = 0.2138 x 150 = 32.07 ≅ 32
x      foi             fei             (foi − fei)²    (foi − fei)²/fei
0      30              32              4               0.125
1      62              60              4               0.067
2      46              43              9               0.209
3+4    10 + 2 = 12     13 + 2 = 15     9               0.600
4      2               2               0               0
       Total = 150     Total = 150                     ∑ = 1.001
𝛘𝟐𝐜 = 1.001
ν = 4 – 1 − 1= 2
α = 0.05; from the χ² distribution table, χ²t = χ²0.05,2 = 5.99. Since χ²c = 1.001 < 5.99:
Therefore: Accept H0 at the α = 0.05 significance level, where H0: X is binomially distributed with
p = 0.32.
Example 9.4
Data obtained based on a sample size of n = 100 for the random variable, X, is summarized in
the first three columns of Table 9.4, below. Check the claim at α = 1% significance level that
the underlying population has a normal distribution.
For this example, k = 5, μ and σ are computed from the data; therefore the degrees of freedom
is: ν = k-r-1= 5-2-1 = 2 and α = 0.01. The corresponding table value, denoted by, 𝛘𝟐𝟎.𝟎𝟏,𝟐 is:
9.21. Since 16.95 > 9.21, the claim that the probability distribution is Normal will be rejected
at α = 1% significance level. The distribution is not normal.
This is a non-parametric goodness of fit test. In this test, we compare the observed cumulative
frequency with the cumulative frequency computed from the postulated theoretical distribution.
Assume n observations from an unknown probability distribution are taken. Let 𝐒𝐧 (𝐱 𝐢 ) be the
observed cumulative frequency distribution function and F(xi) be the computed cumulative
probability values obtained from the hypothesized (theoretical) probability distribution. The
steps involved in the implementation of the Kolmogorov-Smirnov test are as follows:
Sn(x) = 0 for x < x(1);  Sn(x) = k/n for x(k) ≤ x < x(k+1);  Sn(x) = 1 for x ≥ x(n)
The K-S test statistic is the maximum difference between the two cumulative functions, Dn = max |F(xi) ‒ Sn(xi)|, taken over all observed xi. If Dn is less than or equal to the critical value Dnα given in Table 9.5 for the chosen significance level α, the proposed distribution is accepted; otherwise it is rejected.
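A sketch of the computation of Dn for a hypothesized CDF; the data and the assumed exponential model are illustrative, and the 1.3581/√n critical value is the large-sample form used in Example 9.5:

# Sketch of the Kolmogorov-Smirnov statistic D_n = max |S_n(x_i) - F(x_i)|
# for a hypothesized CDF F; compare D_n with the critical value of Table 9.5.
import math

def ks_statistic(data, cdf):
    xs = sorted(data)
    n = len(xs)
    d_n = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # compare F(x_i) with the step function just before and just after x_i
        d_n = max(d_n, abs(f - i / n), abs(f - (i - 1) / n))
    return d_n

# Illustrative check of data against an assumed exponential CDF
lam = 1.0
data = [0.1, 0.3, 0.5, 0.9, 1.4, 2.2, 3.0, 0.7, 1.1, 0.2]
d_n = ks_statistic(data, lambda x: 1.0 - math.exp(-lam * x))
d_crit = 1.3581 / math.sqrt(len(data))   # 5% critical value, large-sample form
print(d_n, d_crit, "accept" if d_n <= d_crit else "reject")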
Table 9.5 Critical values of Dnα at significance level α for the Kolmogorov-Smirnov test
Example 9.5
Table 9.6 Computation of the K-S test statistic for the data given in Example 9.2
From Table 9.6, Dn = 0.0407. For the Kolmogorov-Smirnov test with n = 60 and α = 5%, from Table 9.5, D60(0.05) = 1.3581/√60 = 0.175. Since 0.0407 ≤ 0.175, the Poisson distribution can be
accepted as the probability model at α = 5%, significance level. Both chi-square goodness of fit
and the Kolmogorov-Smirnov tests confirmed the Poisson distribution at a significance level of
α = 5%.
Chapter 10
BASIC CONCEPTS OF SIMPLE LINEAR REGRESSION
10.1. INTRODUCTION
Regression analysis between two variables, which are called as the dependent and independent
variables, is carried out basically to achieve the following:
• To determine whether these two variables are related mathematically to each other and if so
what is the degree of relationship between them (quantified by the correlation coefficient).
• If there is a certain degree of correlation between them, then derive an equation expressing the
dependent variable in terms of the independent variable (usually by a straight line fitted to the
scatter diagram) which is called the prediction equation (regression line or prediction line if
the fitted function is a line).
• To apply the derived relationship to predict the value of the dependent variable corresponding
to any given value of the independent variable.
Assume n pairwise observations (xi, yi) are obtained randomly from a normal population. Let
the function describing the relationship between the two random variables X and Y be described
by the following equation:
Y = g(X) (10.1)
Here Y: dependent variable, X: independent (prediction) variable and g(x): prediction function.
If there is only one independent variable and the prediction equation is linear, then it is referred
to as simple linear regression. In the simple linear regression problem, let us express the equation
of the line as:
Y=α+βx (10.2)
where, α = intercept and β = slope of the regression line. Eq. 10.2 is transformed to the following
probabilistic expression:
E (Y|X) = α + βx (10.3)
In this form, we can interpret α and β as the intercept and slope of the population regression
line. The aim here is to derive from the sample data the sample regression line, defined as
follows:
̂ + β̂ x
ŷ = α (10.4)
where, α̂ and β̂ are the estimators of the parameters of the population regression line, α and β,
respectively.
For any Xi value the corresponding Yi value as estimated from the regression line can be written
as follows:
Yi = α + β xi + εi (10.5)
εi = Yi – E(Yi|Xi) (10.6)
Furthermore, εi is assumed to have a normal distribution with mean 0 and variance, σ2 , i.e.
εi: N(0, σ2 ).
The statistical procedure for finding the best-fitting straight line for a set of points is based on the
principle of least squares, which may be stated as follows: The best-fitting line is the one that
minimizes the sum of squares of deviations of the observed values of y from those predicted. Expressed
mathematically, the aim is to minimize the sum of the squared errors given by:
SSE = ∑(i=1 to n) [yi − (α̂ + β̂ xi)]²     (10.8)
The numerical values of α̂ and β̂ that minimize SSE are obtained using differential calculus, by differentiating SSE with respect to α̂ and β̂ and setting the derivatives equal to zero. From the simultaneous solution
of the resulting two equations, the following formulas are obtained for the least squares estimators of
the slope and the intercept:
β̂ = SSxy/SSxx   and   α̂ = Ȳ – β̂ X̄     (10.9)
where,
SSxy = ∑(i=1 to n) (xi – X̄)(yi – Ȳ) = ∑ xi yi ‒ (∑ xi)(∑ yi)/n     (10.10)
is the sum of cross products around the respective mean values, and
SSxx = ∑(i=1 to n) (xi – X̄)² = ∑ xi² ‒ (∑ xi)²/n     (10.11)
SSyy = ∑(i=1 to n) (yi – Ȳ)² = ∑ yi² ‒ (∑ yi)²/n     (10.12)
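A compact sketch of Eqs. 10.9–10.14, checked against the small (x, y) data set and answers of the exercise at the end of this chapter:

# Sketch of the least-squares formulas, Eqs. 10.9-10.14, applied to the small
# (x, y) data set of the exercise at the end of this chapter.
import math

def simple_linear_regression(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
    beta_hat = ss_xy / ss_xx                     # Eq. 10.9
    alpha_hat = ybar - beta_hat * xbar           # Eq. 10.9
    sse = ss_yy - ss_xy ** 2 / ss_xx             # Eq. 10.14
    s2_yx = sse / (n - 2)                        # Eq. 10.13
    return alpha_hat, beta_hat, s2_yx

x = [2.0, 3.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 4.0, 7.0, 10.0]
alpha_hat, beta_hat, s2_yx = simple_linear_regression(x, y)
print(alpha_hat, beta_hat, math.sqrt(s2_yx))   # about -1.34, 1.22 and 0.64
print(alpha_hat + beta_hat * 6.0)              # predicted y at x = 6, about 5.98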
The conditional sample variance of Y given X, s²Y/X (also denoted by s²), is used as an estimator of the population variance σ². Here, σ² measures the variation of the y values around the mean line E(Y|X) = α + βx. We estimate σ² by s²Y/X, which can be computed from the following relationship:
s²Y/X = SSE/(n − 2) = ∑(i=1 to n) (yi − ŷi)²/(n − 2)     (10.13)
where,
SSE = SSyy – β̂ SSxy = SSyy – (SSxy)²/SSxx     (10.14)
To illustrate the implementation of the equations introduced up to now, data on the advertising expenditures and sales volumes for a certain firm during 10 randomly selected months, which is given in Table 10.1 and also displayed in Fig. 10.1, is to be used. Here, α̂, β̂ and s²Y/X will be computed, the regression line will be constructed and finally, using this regression line, the sales volume (Y) corresponding to an advertising expenditure (X) of $1.0 x 10^4 in a month will be estimated. All the necessary computations are carried out as shown in Table 10.2.
Accordingly, β̂ = SSxy/SSxx = 23.34/0.444 = 52.57
α̂ = Ȳ – β̂ X̄ = 95.9 – 52.57 × 0.94 = 46.49
Now, using this regression equation, ŷ = 46.49 + 52.57 x, it is possible to estimate the sales volume corresponding to an advertising expenditure of $1.0 x 10^4 in a month as follows:
ŷ = 46.49 + 52.57 (1.0) = 99.06
For s²Y/X, first we need to compute SSyy = 93569 ‒ 959²/10 = 1600.9
Then, from Eq. 10.14, SSE = 1600.9 − 23.34²/0.444 = 373.97
s²Y/X = 373.97/(10 − 2) = 46.75
and sY/X = √46.75 = 6.84.
Example 10.1
[Table 10.1 Advertising expenditures and sales volumes for the 10 selected months]
[Figure 10.1 Scatter diagram of the data in Table 10.1]
a) Confidence Interval:
β̂ : N (β, σ/√SSxx).
The slope of the regression line is more important than the intercept, since it is an indicator of the degree
of a linear relationship. In this respect, we can construct confidence intervals and conduct hypothesis
tests for β, as summarized below:
β: β̂ ∓ zα/2 (σ/√SSxx)     (10.15)
If σ is unknown, the sample standard deviation sY/X, together with the t statistic, is to be used as given below:
β: β̂ ∓ tα/2,ν (sY/X/√SSxx)     (10.16)
Example 10.2
Write down the 95% confidence interval for β based on the data given in Example 10.1.
β: 52.57 ∓ 2.306 (6.84/√0.444)
β: 52.57 ∓ 23.67
b) Hypothesis Testing:
β = 0 indicates that Y does not depend (linearly) on X. Thus, generally, we test the following null hypothesis H0 against the alternative H1:
H0: β = 0
H1: β ≠ 0
The test statistic to be computed from the data is: z (or t) = (β̂ − 0)/(σ/√SSxx)  (or (β̂ − 0)/(sY/X/√SSxx))     (10.17)
Example 10.3
Consider the data given in Example 10.1 and check the validity of the null hypothesis H0 claiming that
β = 0, at a 5% significance level.
Ho: β = 0
H1: β ≠ 0
α = 0.05
t = (52.57 − 0)/(6.84/√0.444) = 5.12 > 2.306.
Therefore, reject H0, concluding that β ≠ 0. This conclusion indicates that there is a significant linear
relationship between X and Y.
A (1 ‒ α) 100 percent confidence interval for the expected (mean) value of Y at a given value xp is:
E(Y|xp): (α̂ + β̂ xp) ∓ tα/2, n−2 sY/X √[1/n + (xp − X̄)²/SSxx]     (10.18)
Example 10.4
Write down the 95% confidence interval for E(Y/xp = $ 1x104) based on the data given in Example
10.1.
Utilizing Eq. 10.18 and taking t 0.025,8 = 2.306 and sY/X = 6.84
A (1 ‒ α) 100 percent prediction interval for an individual (particular) value of Y at a given value xp is:
Y|xp: (α̂ + β̂ xp) ∓ tα/2, n−2 sY/X √[1 + 1/n + (xp − X̄)²/SSxx]     (10.19)
Example 10.5
Write down the 95% prediction interval for Y/xp = $ 1x104 based on the data given in Example 10.1.
Utilizing Eq. 10.19 and taking t 0.025,8 = 2.306 and sY/X = 6.84
It is to be noted that the prediction interval corresponding to a particular value is much larger than the
confidence interval corresponding to the expected value for the same level. This is because the
variability in the prediction of a particular value involves more uncertainty than the prediction of the
expected (mean) value. In both cases, the width of the interval (prediction/confidence band) is a
minimum when xp = X̄.
The correlation coefficient of the sample, denoted by R (or r) is an estimator of the population
correlation coefficient (ρ), which is a measure of the degree of linear dependence and bounded by: ±
1, i.e. ‒ 1 ≤ ρ ≤ 1.
The coefficient of determination is R² = RSS/SSyy = 1 − SSE/SSyy
where, RSS: regression sum of squares, which measures the total variability explained by the independent variable (due to regression) and which, when added to the sum of squared errors (SSE), yields the sum of squares of y (SSyy), representing the original total variability in the dependent random variable Y. Note that SSyy = RSS + SSE.
The correlation coefficient R measures the degree of linear dependence, whereas the coefficient of
determination gives the percent of the total variation explained by the independent (predictive) variable
through regression.
Example 10.6
Find the coefficient of determination and correlation coefficient associated with the variables involved
in Example 10.1 by using the computed values summarized in Table 10.2.
Thus we can conclude that 76.6% of the total variability in the total sales volumes (dependent variable)
is explained by (or attributed to) the advertising expenditures. On the other hand, we can quantify the
degree of linear dependence by the correlation coefficient, which equals: R = r = √0.766 = 0.875.
An alternative way to compute the correlation coefficient is based on the concept of covariance
expressed as follows:
For the sake of completeness the following relationships associated with the intercept of the regression
line, α are given:
α: α̂ ∓ zα/2 σ √(1/n + X̄²/SSxx)     (10.23)
If σ is unknown, the sample standard deviation, sY/X, together with the t statistic, is to be used as given below:
α: α̂ ∓ tα/2,ν sY/X √(1/n + X̄²/SSxx)     (10.24)
xi yi
2 1
3 3
5 4
7 7
9 10
c) Test the hypothesis that there is no linear relationship between x and y at α = 0.05 level (i.e. check
the hypothesis H0: β = 0 versus H1: β ≠ 0).
d) Find a 95% confidence interval for β.
e) Find a 95% confidence interval for E(y│x = 6).
f) Give the 95% prediction interval for the particular value of y when x = 6. Compare your result with
part (e) and comment.
g) Find the coefficient of determination and correlation coefficient and interpret your results.
ANSWERS:
a) ŷ = − 1.3415 + 1.2195 x ~ 𝐲̂ = − 1.34 + 1.22 x
When x = 6, 𝐲̂ = 5.98
b) s²Y/X = s² = 0.4065; sY/X = s = 0.6376 ≅ 0.64
e) E(y|x = 6): 5.98 ± 3.1824 × 0.64 √[(1/5) + (6 − 5.2)²/32.8]
E(y|x = 6): 5.98 ± 3.1824 × 0.2987 = 5.98 ± 0.95 → 5.03 ≤ E(y|x = 6) ≤ 6.93
f) (y|x = 6): 5.98 ± 3.1824 × 0.64 √[1 + (1/5) + (6 − 5.2)²/32.8]
Correlation coefficient, R is close to 1.0; therefore, there is a strong positive linear relationship between
random variables X and Y.
Coefficient of determination, R² = 0.9877² = 0.9756 ≅ 98%. This indicates that 98% of the total
variability in Y is explained by the random variable X.
Consider the following data on the number of hours which 10 students studied for the CE 204 final
examination and their grades on this examination:
Based on this data the following (least squares) equation is obtained: Y = 21.69 + 3.47 X
SXX = SSX = 376; SYY = SSY = 4752; SXY = SSXY =1305; SSE = 222.7
a)Test the null hypothesis that the slope of the regression line is equal to 3.0 versus the alternative
hypothesis that it is different than 3.0 at a 0.01 significance level.
(Answer: Accept H0. The slope of the regression line is equal to 3.0)
b) Obtain the 95% prediction interval for a student who has worked for 15 hours.
(Answer: 60.59 ≤ Ŷ15 ≤ 86.89)