
Spring Semester 2022-2023

CE 204
UNCERTAINTY AND DATA
ANALYSIS

SUPPLEMENTARY
LECTURE NOTES

M. Semih YÜCEMEN
Department of Civil Engineering
Middle East Technical University

February, 2023
© 2023 Mehmet Semih Yücemen All Rights Reserved
All publication rights of this work belong to the author.

PREFACE

These Supplementary Lecture Notes are for the use of the students enrolled in the CE 204
Uncertainty and Data Analysis course given during the Spring 2022-2023 semester at the
Department of Civil Engineering, Middle East Technical University. They may be used for
educational purposes without the written permission of the author by citing the source, which
is:
“Yücemen, M. S., Supplementary Lecture Notes, CE 204 Uncertainty and Data Analysis, Spring
2022-2023 Semester, Department of Civil Engineering, Middle East Technical University,
Ankara, Turkey, February, 2023”.
Since the main reference book for the course is Probability Concepts in Engineering Planning
and Design, Vol. I, First edition, 1975/ Second edition, 2007, by A.H.-S. Ang and W.H. Tang,
Wiley, a few examples are directly taken from the two editions of this book.
I have taught this civil engineering oriented basic statistics course together with a number of
colleagues for many years, in multiple sections. I thank all of them, especially Dr. Engin
Karaesmen, who authored Chapter 7, and Prof. Dr. Tuğrul Yılmaz for his contributions and
suggestions to some of the example problems in Chapter 6. A number of teaching assistants
also contributed by carrying out the numerical calculations of a few examples.
I commemorate with gratitude my co-advisors Prof. Alfredo H.-S. Ang and the late Prof. Wilson
Tang, who were among the pioneers of Structural Reliability, and Risk and Decision Analysis
in systems planning and design, for leading me to focus on these topics.

M. Semih Yücemen
Ankara, February, 2023
CE 204 (2022-23 Spring Semester) MSY

CONTENTS

Chapter 1 INTRODUCTION
Chapter 2 BASIC PROBABILITY CONCEPTS AND RULES
Chapter 3 RANDOM VARIABLES
Chapter 4 IMPORTANT PROBABILITY DISTRIBUTIONS
Chapter 5 MULTIPLE (MULTIVARIATE) RANDOM VARIABLES
Chapter 6 FUNCTIONS OF RANDOM VARIABLES
Chapter 7 BRIEF INFORMATION ON STATISTICS**
Chapter 8 SOME BASIC CONCEPTS OF STATISTICAL INFERENCE
Chapter 9 FITTING PROBABILISTIC MODELS TO OBSERVED DATA
Chapter 10 BASIC CONCEPTS OF SIMPLE LINEAR REGRESSION

______________________
**This chapter is originally prepared by Dr. Engin Karaesmen within the scope of the undergraduate
course, CE 204 Uncertainty and Data Analysis.


Chapter 1
INTRODUCTION

1.1. IMPORTANCE OF UNCERTAINTY ANALYSIS: PROBABILITY AND


STATISTICS IN CIVIL ENGINEERING

The rational treatment and assessment of uncertainties in civil engineering have received
particular attention in the last six decades. In most cases, loading conditions, material
properties, geometry, and various other parameters show considerable variations. Observations
and measurements of physical processes as well as parameters exhibit random characteristics.
On top of these, modelling, workmanship, and human errors (gross errors) create additional
uncertainties.
Uncertainties give rise to the risk of unsatisfactory performance and failure, which may cause loss
of life and property. Therefore, the management of risk is the most important issue, not only in
civil engineering but also in every other field, even in our daily lives. Statistical and
probabilistic procedures provide a sound and rational framework for processing these
uncertainties.
Uncertainties and their effect on the safety of engineering structures can only be evaluated
rationally through probabilistic and statistical methods. Accordingly, the design and analysis of
structural systems have to be based on “stochastic” concepts.

Historical Development and Main Contributors


The pioneers of Structural Reliability can be listed as follows. The list is not comprehensive,
and several names could be missing.
• Freudenthal, Columbia University, 1945
• Ferry Borges, Portugal, 1952
• Shinozuka, Columbia University, 1960
• Benjamin, Stanford University, 1960
• Bolotin, Russia, 1960
• Ang, Amin, Tang, Wen, University of Illinois
• Cornell, Crandall, Davenport, Vanmarcke, Veneziano, MIT
• Shah, Stanford University
• Lind, Turkstra, University of Waterloo, Canada
• Ditlevsen, Denmark
• Rackwitz, Germany
• Rosenblueth, Esteva, Mexico


In the following, the classical (deterministic) and probabilistic (stochastic) approaches to civil
engineering problems are compared, emphasizing the merits of the probabilistic methods and
the deficiencies of the deterministic approach.
Classical (Deterministic) Approach
(i) It is not possible to quantify uncertainties explicitly.
(ii) Load (demand) and resistance (capacity) parameters are single-valued (deterministic) and
safety factors are used for achieving safety.
(iii) The risk (failure probability) associated with the design is unknown.
(iv) There is no systematic procedure for adjusting the safety factor based on the additional
information and data acquired.

Probabilistic (Stochastic) Approach


(i) It is possible to directly assess, analyze and quantify uncertainties.
(ii) Load and resistance parameters are treated as random variables.
(iii) It is possible to assess the risk involved in the design.
(iv) It is possible to combine different sources of information and data (observational,
subjective/expert opinion, etc.) to update the level of uncertainties and select the parameter for
which additional information will be most useful.
(v) It is possible to assess the reliability of components by considering all of the failure modes
and evaluating the safety of structural systems based on the reliabilities of the components.
(vi) Risk-benefit tradeoffs could be taken into consideration through decision theory.

1.2. HAZARD, VULNERABILITY AND RISK


The key terms are defined in the following:
HAZARD: The probability of occurrence of a potentially damaging phenomenon.
Identification of hazards is the first step in performing a risk assessment, though risk assessment
may not be necessary in some cases.

VULNERABILITY: The expected degree of loss resulting from the occurrence of the
phenomenon.

RISK: The likelihood or probability of a given hazard of a given level causing a particular level
of loss or damage. The elements at risk are: populations, communities, the built environment,
the natural environment, economic activities, and services that are under threat of disaster in a
given area. Mathematically total risk can be written as:

TOTAL RISK = (Sum of the elements at risk) x (hazard x vulnerability)


Uncertainty and Risk


➢ Occurrence, intensity and system response to natural and man-made hazards are
uncertain.
➢ Uncertainty leads to risk.
➢ Risk reduction comes at a cost.
➢ Risk cannot be eliminated. It must be managed.
➢ Uncertainties and their effect on the reliability of engineering systems can only be
evaluated rationally through probabilistic and statistical methods.
➢ Accordingly, the design and analysis of systems have to be based on "stochastic"
concepts.

1.3. CLASSIFICATION OF UNCERTAINTIES


➢ Inherent randomness (aleatory): Uncertainty explicitly recognized by a stochastic
model (irreducible)
➢ Knowledge-based (epistemic): Uncertainty in the model itself and its descriptive
parameters (reducible)
The distinction is somewhat arbitrary. All sources of uncertainty should be properly accounted
for in the risk analysis and displayed as a part of the decision framework. In the following
(Table 1.1), the terms used in the literature to describe these two types of uncertainties are listed
together with the corresponding terms in Turkish.
Table 1.1 Terms Used in the Literature to Describe the Two Types of Uncertainties

Terms referring to uncertainty due to naturally variable phenomena in time or space — “Uncertainties of Nature” / “Doğal/Rassal Belirsizlik” (Doğal olarak zaman ve mekânda değişkenlik gösteren olaylardaki belirsizlikler için kullanılan terimler) | Terms referring to uncertainty due to lack of knowledge, data or understanding — “Knowledge Uncertainty” / “Bilgi Belirsizliği” (Bilgi, veri ya da anlama eksikliğinden kaynaklanan belirsizlikler için kullanılan terimler)

Aleatory uncertainty (Aleatory belirsizlik) | Epistemic uncertainty (Epistemik belirsizlik)
Natural/Inherent variability (Doğal/İşin doğasında olan değişkenlik) | Knowledge uncertainty (Bilgi belirsizliği)
Random or stochastic variation (Rassal ya da stokastik değişim/değişkenlik) | Functional uncertainty (İşlevsel belirsizlik)
Objective uncertainty (Objektif/Nesnel belirsizlik) | Subjective uncertainty (Sübjektif/Öznel belirsizlik)
Chance (Şans) | Systematic error (Sistematik hata)


In Table 1.2, several examples of the different types of uncertainties encountered in civil
engineering are given.

Table 1.2 Examples of Different Types of Uncertainties
Uncertainties in Loads | Examples
Operational loads | Self-weight of the structure (dimensions, unit weight of the building material); Live load (human, furniture, equipment weight)
Environmental loads | Wind load (wind speed); Earthquake load (ground motion parameter, ground motion prediction equation); Tsunami (peak coastal wave amplitude)
Accidental loads | Fire, explosion; Moving vehicle impact

Uncertainties in Resistance | Examples
Building materials | Strength of materials (steel, concrete, wood, soil), fatigue effect, dimensions
Properties of soil | Strength and other properties of soil

Uncertainties in Modeling | Examples (slope stability analysis)
 | Differences between laboratory and in-situ conditions of soil (rate of loading, dimensions, stress); Anisotropy; Determination of the critical slip surface; Progressive failure effect; Model of the ground profile; Assumptions on drainage conditions; Two-dimensional (planar deformation) analysis instead of three-dimensional slip analysis

Human Error | Examples
 | Construction and workmanship errors; Engineering errors

Statistical Uncertainty | Example
 | Inadequate number of samples taken from the construction site


Best Point Estimate and Measures of Uncertainty and Variability


The following statistical parameters are used as the best estimate of the unknown value of an
engineering parameter and the associated uncertainties:
Mean value: μ (Best point estimate)
Variance: σ2
Standard deviation: σ (Square root of variance)
Coefficient of Variation (c.o.v.): δ = σ/μ = (standard deviation)/(mean value)
c.o.v. is a dimensionless measure of uncertainty and variability.
In the following example, these statistical parameters are illustrated within the context of
reinforced concrete beams.

Example 1.1

Statistics of the basic variables involved in the calculation of flexural and shear capacities of
reinforced concrete beams:

Parameter | Nominal (Specified) Value | Mean-to-Nominal Ratio | Aleatory Uncertainty | Epistemic Uncertainty | Total Uncertainty
Compressive strength of concrete (fc) | 21.55 MPa | 1.25 | 0.105 | 0.14 | 0.18
Yield strength of BC III steel (fy) | 365 MPa | 1.24 | 0.038 | 0.08 | 0.09
Beam width (bw) | 250-600 mm | 0.998 | 0.045 | 0.03 | 0.054
Beam depth (h) | 300-1150 mm | 0.996 | 0.025 | 0.03 | 0.04
Effective depth (dbe) | 250-1100 mm | 1.00 | 0.024 | 0.07 | 0.074
Reinforcement area (As) | 100-4000 mm² | 1.00 | 0.012 | 0.03 | 0.03
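The Total Uncertainty values in Example 1.1 are consistent with combining the aleatory and epistemic coefficients of variation by the square root of the sum of squares; a minimal sketch, assuming that SRSS combination rule:

```python
import math

# Aleatory and epistemic c.o.v. values taken from Example 1.1
variables = {
    "fc":  (0.105, 0.14),
    "fy":  (0.038, 0.08),
    "bw":  (0.045, 0.03),
    "h":   (0.025, 0.03),
    "dbe": (0.024, 0.07),
    "As":  (0.012, 0.03),
}

for name, (aleatory, epistemic) in variables.items():
    total = math.sqrt(aleatory**2 + epistemic**2)  # SRSS combination
    print(f"{name}: total c.o.v. = {total:.3f}")
```

For fc this gives √(0.105² + 0.14²) = 0.175, which rounds to the tabulated 0.18.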

1.4. MEASURING RISK USING BASIC RELIABILITY THEORY
Let R and S be two random variables describing capacity and demand. Referring
to the following figure, consider the following definitions of failure and limit state:

[Figure: Schematic representation of the reliability problem]

Failure: R ≤ S

Limit state: LS = R – S < 0

Pr(LS < 0) = Pr(R – S < 0) = pF

where Pr(.) = probability. As a measure of safety, the reliability index, β, is defined as
follows:

β = (μR − μS) / √(σR² + σS²)

where LS = limit state; μ, σ = mean and standard deviation; β = reliability index.

If R and S are assumed to be independent normally distributed random variables, then:

pF = 1 − Φ[(μR − μS) / √(σR² + σS²)] = 1 − Φ(β)

where pF = failure probability; Φ = cumulative distribution function of the standard normal
distribution.
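For independent normal R and S, these formulas are straightforward to evaluate numerically. A sketch using only the standard library; the capacity and demand statistics below are assumed illustrative values, not data from the course:

```python
import math

def normal_cdf(x):
    """Standard normal CDF, Phi(x), computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def reliability_index(mu_r, sigma_r, mu_s, sigma_s):
    """beta = (mu_R - mu_S) / sqrt(sigma_R^2 + sigma_S^2)."""
    return (mu_r - mu_s) / math.sqrt(sigma_r**2 + sigma_s**2)

# Illustrative (assumed) values: capacity R ~ N(100, 15), demand S ~ N(60, 12)
beta = reliability_index(100.0, 15.0, 60.0, 12.0)
p_f = 1.0 - normal_cdf(beta)
print(f"beta = {beta:.3f}, pF = {p_f:.4f}")
```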

1.5. QUANTITATIVE RISK ASSESSMENT (QRA)

The purpose of quantitative risk assessment (QRA) is to calculate a value for the risk to enable
improved risk communication and decision-making.
Several frameworks for QRA have been proposed by many experts. The QRA frameworks have
a common intention to find answers to the following questions (Ho et al. 2000, Lee and Jones
2004):
1) Danger Identification [What are the probable dangers/problems?]
2) Hazard Assessment [What would be the magnitude of dangers/problems?]
3) Consequence/Elements at Risk Identification [What are the possible consequences and/or
elements at risk?]
4) Vulnerability Assessment [What might be the degree of damage in elements at risk?]
5) Risk Quantification/Estimation [What is the probability of damage?]
6) Risk Evaluation [What is the significance of estimated risk?]
7) Risk Management [What should be done?]


REFERENCES FOR CHAPTER 1

1. Ho, K., Leroi, E. and Roberds, B. (2000), “Quantitative risk assessment: application,
myths and future directions”, Proceedings of GeoEng 2000, Technomic Publishing, pp. 269-312.

2. Lee, E.M. and Jones, D.K.C. (2004), Landslide Risk Assessment, Thomas Telford
Publishing, London.


Chapter 2
BASIC PROBABILITY CONCEPTS AND RULES
2.1. BASIC CONCEPTS OF STATISTICS

Statistics is directly related to data and provides the scientific tools and methodology for the
collection of data (sampling and design of experiments), the description of data (descriptive
statistics), and the processing of data to derive the maximum information from it, utilizing methods
of statistical inference (estimation and hypothesis testing). The basic steps of statistical data
analysis are summarized in Fig. 2.1.

POPULATION  ──── RANDOM SAMPLING (sample size: n) ───▶  SAMPLE
Parameters: e.g., μ, σ                                  Statistics: e.g., X̄, s

SAMPLE  ──── STATISTICAL INFERENCE (estimation and hypothesis testing) ───▶  POPULATION

Figure 2.1 The basic steps of a statistical data analysis

The basic aim of statistical data analysis is to estimate the unknown values of population
parameters. Population parameters are generally shown by Greek letters. For example, you may
be interested in the mean value (μX) and variability (quantified by the standard deviation, σX) of the
yield stress of steel bars, X, produced by a certain steel plant. Since it is not possible to test all the
steel bars produced by this steel plant, which in this example forms the population, a random
sample, of size n, is taken from the population and is analyzed. In random sampling, all members
of the population have an equal chance of being selected. Based on this random sample, the sample
mean, X̄, and the sample standard deviation, sX, are computed using the following standard equations:

X̄ = (1/n) ∑ᵢ₌₁ⁿ Xᵢ

sX = √[ (1/(n−1)) ∑ᵢ₌₁ⁿ (Xᵢ − X̄)² ]

The values derived from data, like the sample mean, X̄, and the sample standard deviation, sX, are
called statistics and are used to estimate the population parameters μX and σX, respectively. This
process is called statistical inference, where the methods of estimation and hypothesis testing are
implemented.
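The two estimators above can be sketched directly; the yield-stress values below are hypothetical, for illustration only:

```python
import math

def sample_mean(data):
    """X-bar = (1/n) * sum of Xi."""
    return sum(data) / len(data)

def sample_std(data):
    """sX with the (n - 1) divisor, matching the equation above."""
    m = sample_mean(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / (len(data) - 1))

# Hypothetical yield-stress sample (MPa), n = 5 steel bars
yield_stress = [420.0, 435.0, 410.0, 428.0, 417.0]
print(sample_mean(yield_stress), sample_std(yield_stress))
```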


2.2. EVENTS AND PROBABILITY

2.2.1. Basic Definitions and Concepts of Set Theory

Set theory is a convenient tool for performing operations on events. Therefore, some concepts and
notations from set theory are given below:

An experiment is defined by:

a) A set of experimental outcomes. Outcomes are defined according to a problem or


context.
b) A collection or set of events resulting from these outcomes with a rule assigning
probabilities to them.

An event is a set of outcomes. An elementary event is a single outcome (a single point on the
sample space), whereas a compound event has multiple outcomes.

The sample space, S, is the set of all possible experimental outcomes and represents the sure event.
The sample space is sometimes referred to as the universal set and denoted by U. The sample space
may also be described by an outcome tree (tree diagram).

The null space (empty set), ∅, represents the impossible event.

Events are usually denoted by capital letters. For example, the failure and survival of a structural
element can be denoted by F and S, respectively.

Example 2.1 Assume there are two graders available at a construction site. Let the events F and S
be defined as follows: F: the grader is in a failed condition (cannot operate) and S: the grader is in
a satisfactory condition (can operate). The sample space, which is defined as: S: {SS, SF, FS, FF},
is shown in Fig. 2.2.

S
• SS (Both graders are in satisfactory condition).
• SF (First grader is in satisfactory condition and the other one failed).
• FS (First grader failed and the other one is in satisfactory condition).
• FF (Both graders failed)

Figure 2.2. Sample space for Example 2.1


The event A, defined as both graders failed A = {FF} contains only one sample point and is a
simple event. On the other hand, the event B defined as: at least one grader failed, B = {SF, FS,
FF}, contains three sample points, and is a compound event.

The sample space can also be described by an outcome tree (tree diagram) as shown in Fig. 2.3.

First Grader    Second Grader    Outcome
     S ──────────── S ─────────▶ SS
       └─────────── F ─────────▶ SF
     F ──────────── S ─────────▶ FS
       └─────────── F ─────────▶ FF

Figure 2.3 Outcome tree for Example 2.1
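The sample space and the two events of Example 2.1 can also be enumerated mechanically; a minimal sketch:

```python
from itertools import product

# Each grader is either Satisfactory ('S') or Failed ('F')
sample_space = {"".join(p) for p in product("SF", repeat=2)}
print(sorted(sample_space))  # ['FF', 'FS', 'SF', 'SS']

A = {"FF"}                                 # both graders failed (simple event)
B = {o for o in sample_space if "F" in o}  # at least one grader failed (compound event)
print(sorted(B))                           # ['FF', 'FS', 'SF']
```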

2.2.2 Operations with Events

The following are the most common operations with events:

The union of two events will be denoted as A ∪ B = C, where event C corresponds to the outcomes
of either A or B or both. The corresponding Venn diagram is shown in Fig. 2.4. Event C is the
crosshatched area.

A B

Figure 2.4 Venn diagram for the union of the events A and B

The intersection of two events will be denoted as: A∩B = AB = D, where D is the event with
outcomes common to both A and B. The corresponding Venn diagram is shown in Fig. 2.5. Event
D is the crosshatched area.


A B

Figure 2.5 Venn diagram for the intersection of events A and B

The complement of any event A is denoted by A̅ (or Aᶜ or A′), and contains all the
outcomes in S except the outcomes of A. The corresponding Venn diagram is shown in Fig. 2.6.
Event A̅ is the crosshatched area.

[Figure: Venn diagram with the region outside A crosshatched]

Figure 2.6 Venn diagram for the complementary event A̅

2.2.3. Definitions of Probability

Notation for the probability of any event, say A is denoted either by P(A) or Pr(A) throughout
the course.

Subjective Definition of Probability: Based on a scale from 0 to 1 (or 0% to 100%), it is the degree
of one’s belief in the likelihood of occurrence or non-occurrence of an event. Although such a
measure is based on an individual’s judgment without any precise computation, it may still be used
if it is a reasonable assessment by an experienced and knowledgeable person, when other means
of obtaining the probability are not possible. Probability based on expert opinion is an example of
subjective probability.

Frequency Definition of Probability: If an experiment is repeated N times and a certain event,
say A, is observed nA times, then the following ratio yields Pr(A):

Pr(A) = lim (N→∞) nA/N
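The frequency definition can be illustrated with a small simulation: as N grows, the relative frequency of a fair die showing 6 settles near 1/6. A sketch, with an arbitrary random seed:

```python
import random

random.seed(42)

# Event A: a fair die shows a 6; nA/N approaches Pr(A) = 1/6 as N grows
for n in (100, 10_000, 1_000_000):
    n_a = sum(1 for _ in range(n) if random.randint(1, 6) == 6)
    print(f"N = {n}: nA/N = {n_a / n:.4f}")
```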

Axiomatic Definition of Probability: Although the above definitions of probability are valid, the
proper definition of probability, within the framework of mathematical probability theory, is
based on the following three axioms:


a) Pr(A) ≥ 0
b) Pr (S) = 1
c) Pr(A∪B) = Pr(A) + Pr(B), if A ∩ B = ∅, that is, when A and B are mutually exclusive
events. The Venn diagram corresponding to two mutually exclusive events A and B is shown
in Fig. 2.7.

Accordingly, the axiomatic definition of probability is given as follows: Probability is a


mathematical measure, satisfying the three axioms stated above and defined on a scale of 0 to 1
showing the likelihood of occurrence of an event relative to a set of alternatives.

A B

Figure 2.7 Venn diagram for mutually exclusive events A and B

All of the probability rules presented in the following sections are developed based on these three
basic axioms.

d) Pr(A∪B) = Pr(A) + Pr(B) – Pr(A∩B), if A and B are not mutually exclusive events, that is,
(A∩B) ≠ ∅.

For three events, A, B, and C the above relationship takes the following form:

Pr(A∪B∪C) = Pr(A) + Pr(B) + Pr(C) – Pr(A∩B) – Pr(A∩C) – Pr(B∩C) + Pr(A∩B∩C)

e) Pr(A̅) = 1 – Pr(A)

f) De Morgan’s theorem: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

Assigning Probabilities to Events: We may assign either empirical or theoretical
(distributional) probabilities to aleatory and, if possible, to epistemic uncertainties. The
assignment of probabilities to events has evolved from the subjective to the axiomatic within
the history of science.

2.2.4 Some Rules of Probability

Let A and B be any two events.

a) Probability of Union of Events (Addition Rule)

Pr(A∪B ) = Pr(A) + Pr(B) – Pr(A∩B)


The keyword is OR. Probability of occurrence of either A or B or both events. The corresponding
Venn diagram is shown in Fig. 2.8.

A AB B

Figure 2.8 Venn diagram showing the events A and B

If A and B are mutually exclusive events, i.e. A ∩ B = ∅, then since Pr(A∩B) = 0,

Pr(A∪B) = Pr(A) + Pr(B)

The addition rule can be generalized for the probability of mutually exclusive n events as follows:

If A1, A2, ..., An are mutually exclusive events, then

Pr(A1 ∪ A2 ∪ ... ∪ An) = Pr(A1) + Pr(A2) + ... + Pr(An)

Fig. 2.9 illustrates the case where the union of the mutually exclusive events A1, A2, ..., An
constitutes the sample space S. In such a case, the set of events A1, A2, ..., An is called mutually
exclusive and exhaustive (MEE).

[Figure: the sample space S partitioned into the mutually exclusive events A1, A2, ..., An]

Figure 2.9 The sample space, S, corresponding to the union of mutually exclusive
events, A1, A2, ..., An

b) Probability of Intersection of Events (Multiplication Rule)

Pr(A∩B) = Pr(A/B) Pr(B) = Pr(B/A) Pr(A)

The keyword is AND. Probability of occurrence of both events A and B. The corresponding Venn
diagram is shown in Fig. 2.5.


If A and B are statistically independent events, Pr(A/B) = Pr(A) and Pr(B/A) = Pr(B).

Accordingly,

Pr(A∩B) = Pr(A) Pr(B)

c) Statistical Independence

The events A and B are statistically independent if Pr(A/B) = Pr(A) or Pr(B/A) = Pr(B) or

Pr(A∩B) = Pr(A) Pr(B)

Note: Mutually exclusive events cannot happen at the same time. Statistically independent events
can happen at the same time but they do not depend on each other.
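The distinction in the Note can be checked by enumeration on two fair dice; a sketch:

```python
from itertools import product
from fractions import Fraction

space = list(product(range(1, 7), repeat=2))  # two fair dice: 36 equally likely outcomes

def pr(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

A = lambda o: o[0] == 6  # first die shows 6
B = lambda o: o[1] == 6  # second die shows 6

# Statistically independent: they can occur together, and Pr(A ∩ B) = Pr(A) Pr(B)
print(pr(lambda o: A(o) and B(o)) == pr(A) * pr(B))  # True

C = lambda o: sum(o) == 2   # sum of the dice is 2
D = lambda o: sum(o) == 12  # sum of the dice is 12

# Mutually exclusive: they can never occur together, so Pr(C ∩ D) = 0
print(pr(lambda o: C(o) and D(o)))  # 0
```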

d) Conditional Probability

Pr(A/B) = Pr(A∩B) / Pr(B)

Pr(B/A) = Pr(A∩B) / Pr(A)

e) Probability of Complementary Events

If A̅ and A are complementary events, then (A ∪ A̅) = S and (A ∩ A̅) = ∅. The following rule
applies:

Pr(A̅) + Pr(A) = 1.0, therefore, Pr(A̅) = 1.0 − Pr(A)

f) Theorem of Total Probability

Let B1, B2, ..., Bn be a mutually exclusive and exhaustive set of events partitioning the sample
space in such a way that the following properties are satisfied:

i) Bi ⊂ S, i = 1, 2, ..., n

ii) Bi ∩ Bj = ∅ for all i ≠ j

Furthermore, it is assumed that Pr(Bi) ≠ 0. For any event A in the sample space S (see Fig. 2.10),

Pr(A) = ∑ᵢ₌₁ⁿ Pr(Bi) Pr(A/Bi)


[Figure: the event A overlapping the partition of the sample space S into B1, B2, ..., Bn]

Figure 2.10 Partitioning of the sample space S and the event A

Proof:

It is observed in Fig. 2.10 that event A is the union of the mutually exclusive events
B1 ∩ A, B2 ∩ A, ..., Bn ∩ A. In other words,

A = (B1 ∩ A) ∪ (B2 ∩ A) ∪ ... ∪ (Bn ∩ A)

Applying the probability rule for the union of a mutually exclusive set of events,

Pr(A) = Pr(B1 ∩ A) + Pr(B2 ∩ A) + ... + Pr(Bn ∩ A) = ∑ᵢ₌₁ⁿ Pr(Bi ∩ A)

Based on the probability rule for the intersection of events,

Pr(Bi ∩ A) = Pr(Bi) Pr(A/Bi)

and, finally,

Pr(A) = ∑ᵢ₌₁ⁿ Pr(Bi) Pr(A/Bi)

This result is known as the Theorem of Total Probability in Statistics and is widely used in Civil
Engineering applications. For example, the Logic Tree Method used in Probabilistic Seismic
Hazard Analysis (PSHA) is based on this theorem.
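The theorem is a one-line computation once the partition probabilities and the conditionals are known. A sketch with hypothetical numbers; the soil classes and settlement probabilities below are invented for illustration:

```python
def total_probability(priors, likelihoods):
    """Pr(A) = sum over i of Pr(Bi) * Pr(A/Bi), for an MEE partition B1..Bn."""
    assert abs(sum(priors) - 1.0) < 1e-9, "the partition must be exhaustive"
    return sum(p_b * p_a_given_b for p_b, p_a_given_b in zip(priors, likelihoods))

# Hypothetical partition: the soil at a site is soft, medium, or stiff
priors = [0.2, 0.5, 0.3]          # Pr(B1), Pr(B2), Pr(B3)
likelihoods = [0.30, 0.10, 0.02]  # Pr(A/Bi): excessive settlement given each class
print(total_probability(priors, likelihoods))  # ≈ 0.116
```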

g) Bayes’ Theorem

Consider again the same mutually exclusive and exhaustive set of events B1, B2, ..., Bn described
above. Let A be any event in the sample space, S, such that Pr(A) ≠ 0. Then,


Pr(Bk/A) = [Pr(A/Bk) Pr(Bk)] / [∑ᵢ₌₁ⁿ Pr(A/Bi) Pr(Bi)],   k = 1, 2, ..., n
Proof:

Based on the conditional probability rule,

Pr(Bk/A) = Pr(Bk ∩ A) / Pr(A) = [Pr(Bk) Pr(A/Bk)] / [∑ᵢ₌₁ⁿ Pr(Bi) Pr(A/Bi)]

where the term Pr(A) in the denominator is replaced by the corresponding expression given by
the theorem of total probability.

Bayes’ Theorem is quite important in Statistics and forms the basis for the popular Bayesian
Statistics. It is also widely used in Civil Engineering applications, especially in combining expert
opinion with information based on observed data.

When the result is known, Bayes’ Theorem allows us to consistently assess the probability that a
specific event, among all of the candidate events, produced the observed result. In other words,
Bayes’ Theorem works in the reverse direction, reasoning from the result back to its cause. The
events B1, B2, ..., Bn can be considered as hypotheses; it is assumed only that these events cannot
happen at the same time. Pr(Bk) is called the prior probability of the event Bk, and Pr(Bk/A) is
called the posterior probability of the event Bk. Pr(A/Bk) is the likelihood of event A, given that
the hypothesis Bk is valid. In the light of new data, Bayes’ Theorem systematically and
consistently updates the prior probabilities, leading to the posterior probabilities.
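The prior-to-posterior update can be written as a small function; a sketch, where the two hypotheses and their numbers are invented for illustration:

```python
def bayes_update(priors, likelihoods):
    """Posterior Pr(Bk/A) from priors Pr(Bk) and likelihoods Pr(A/Bk)."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # Pr(A), total probability
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Hypothetical hypotheses B1, B2 with priors 0.7 and 0.3;
# the observed data A is four times as likely under B2 as under B1
posteriors = bayes_update([0.7, 0.3], [0.1, 0.4])
print(posteriors)  # roughly [0.368, 0.632]
```

Note how the data shift the belief from the initially favoured B1 toward B2.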

h) De Morgan’s Rule: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

i) Permutation and Combination (Counting of Sample Points)

The solution to probability problems can sometimes be possible by counting points in the sample
space. In this respect, the permutation and combination rules may be used.

Permutation: The number of permutations of n different objects is n! (n factorial). The
number of permutations created by taking r objects at a time from n different objects is:

nPr = n! / (n − r)!

In a permutation, the order of the selected objects is taken into account.


Combination: In some cases, it is of interest how many different selections of r objects can be
made out of n objects without regard to their order. These selections are called combinations.
The number of combinations created by taking r objects at a time from n different objects is
given by the following equation:

nCr = C(n, r) = n! / [r!(n − r)!]

Note: n! = n(n-1) (n-2) ...1; 0! = 1; 1! = 1
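Python’s standard library exposes both counts directly (math.perm and math.comb, available from Python 3.8):

```python
import math

# Ordered selections (permutations) of r = 3 out of n = 5 objects: 5!/(5-3)!
print(math.perm(5, 3))  # 60

# Unordered selections (combinations) of r = 3 out of n = 5 objects: 5!/(3! 2!)
print(math.comb(5, 3))  # 10
```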

2.3. SOLVED EXAMPLES


Example 2.1 (Adapted from Ang and Tang, 2007; Example 2.12)

The water supply for two cities C and D comes from the two sources A and B as shown in Fig.
2.11. Water is transported by pipelines consisting of branches 1, 2, 3 and 4. Assume that either one
of the two sources, by itself, is sufficient to supply the water for both cities.

Denote:

E1 = failure of branch 1
E2 = failure of branch 2
E3 = failure of branch 3
E4 = failure of branch 4
Failure of a pipe branch means there is serious leakage or rupture of the branch.

Figure 2.11 The water supply system (from Ang and Tang, 2007)

Shortage of water in city C would be represented by (E1 ∩ E2) ∪ E3, and its complement,
[(E1 ∩ E2) ∪ E3]ᶜ, means that there is no shortage of water in city C. Applying De Morgan’s rule, we
have,


[(E1 ∩ E2) ∪ E3]ᶜ = (E1ᶜ ∪ E2ᶜ) ∩ E3ᶜ

The last event above means that there is no failure in branch 1 or branch 2 and also no failure
in branch 3.

Similarly, the shortage of water in city D would be the event (E1∩ E2)∪E3∪E4. Therefore, no
shortage of water in city D is:

[(E1 ∩ E2) ∪ E3 ∪ E4]ᶜ = (E1ᶜ ∪ E2ᶜ) ∩ E3ᶜ ∩ E4ᶜ

which means that there is sufficient supply at the station, i.e., (E1ᶜ ∪ E2ᶜ), and there are no
failures in branches 3 and 4, represented by (E3ᶜ ∩ E4ᶜ).
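The two De Morgan identities used above can be verified exhaustively over all 2⁴ failure states of the branches; a sketch:

```python
from itertools import product

# Enumerate every failure state of branches 1-4 (True means the branch failed)
for e1, e2, e3, e4 in product([False, True], repeat=4):
    shortage_c = (e1 and e2) or e3                 # (E1 ∩ E2) ∪ E3
    no_shortage_c = (not e1 or not e2) and not e3  # (E1c ∪ E2c) ∩ E3c
    assert no_shortage_c == (not shortage_c)       # De Morgan's rule for city C

    shortage_d = (e1 and e2) or e3 or e4           # (E1 ∩ E2) ∪ E3 ∪ E4
    no_shortage_d = (not e1 or not e2) and not e3 and not e4
    assert no_shortage_d == (not shortage_d)       # De Morgan's rule for city D

print("De Morgan identities hold for all 16 states")
```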

Example 2.2 (Adapted from Ang and Tang, 2007; Example 2.19)
Consider the following chain system, consisting of two links, as shown in Fig. 2.12, subjected to a
force F = 300 kg.

Figure 2.12 A two-link chain system (from Ang and Tang, 2007)

If the fracture strength of a link is less than 300 kg, it will fail by fracture. Suppose that the
probability of this happening to either of the two links is 0.05. The chain will fail if one or both
of the two links should fail by fracture. To determine the probability of failure of the chain, define:
E1 = fracture of link 1

E2 = fracture of Link 2

Then Pr(E1) = Pr(E2) = 0.05 and the probability of failure of the chain system is
Pr(E1 ∪ E2) = Pr(E1) + Pr(E2) − Pr(E1 ∩ E2)
            = 0.05 + 0.05 − Pr(E2/E1) Pr(E1)

We observe that the solution requires the value of the conditional probability Pr(E2/E1), which
is a function of the mutual dependence between E1 and E2. If there is no dependence, or they are
statistically independent, Pr(E2/E1) = Pr(E2) = 0.05. In this case, the probability of failure of the
chain system is:
Pr(E1 ∪ E2) = 0.10 − 0.05 x 0.05 = 0.0975


On the other hand, if there is complete or total dependence between E1 and E2, which means
that if one link fractures the other will also fracture, then Pr(E2/E1) = 1.0. In such a case, the
probability of failure of the chain system becomes:
Pr(E1 ∪ E2) = 0.10 − 0.05 x 1.0 = 0.05
In this latter case, we see that the failure probability of the chain system is the same as the failure
probability of a single link. Therefore, we can state that the probability of failure of the chain
system ranges between 0.05 and 0.0975.
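The dependence of the chain’s failure probability on Pr(E2/E1) can be sketched as a one-line function:

```python
def chain_failure_prob(p, p_cond):
    """Pr(E1 ∪ E2) = 2p - Pr(E2/E1) * p, with Pr(E1) = Pr(E2) = p."""
    return 2.0 * p - p_cond * p

p = 0.05
print(chain_failure_prob(p, p))    # independent links: ≈ 0.0975
print(chain_failure_prob(p, 1.0))  # fully dependent links: ≈ 0.05
```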

Example 2.3

Two nuclear power plants A and B supply energy to a northeastern region of Japan. Normally,
plant A is functioning and it is replaced by plant B if it fails. The failed plant is immediately
repaired while the other one is functioning so that it can replace the other if it also fails. Assume
that the probabilities of failure of nuclear power plants A and B are 0.0001 and 0.0002,
respectively. Find the probability that energy will be supplied to the region.

Solution:

Let A be the event that the power plant A will be functioning and let B be the event that the power
plant B will be functioning and E be the event that energy will be supplied to the region.

E = A ∪ (A̅ ∩ B̅ ∩ A) ∪ (A̅ ∩ B̅ ∩ A̅ ∩ B̅ ∩ A) ∪ …

Pr (E) = 0.9999 + 0.0001 x 0.0002 x 0.9999 + 0.0001 x 0.0002 x 0.0001 x 0.0002 x 0.9999 +….

Pr(E) = 0.9999 × [1/(1 − 0.0001 x 0.0002)] ≈ 0.99990002

Note: c + cr + cr² + … is a geometric series and, when |r| < 1, the sum of the infinite geometric
series is c/(1 − r).
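The closed-form sum used above can be checked against a direct partial sum; a sketch:

```python
r = 0.0001 * 0.0002  # probability that both plants fail in one replacement cycle
c = 0.9999           # probability that the currently functioning plant does not fail

closed_form = c / (1 - r)                       # c + cr + cr^2 + ... for |r| < 1
partial_sum = sum(c * r**k for k in range(10))  # converges after a few terms
print(closed_form, partial_sum)
```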

Example 2.4

A certain steel factory produces steel bolts using three different machines labeled as A, B and C.
Machines A, B and C produce 40%, 25% and 35% of the daily production, respectively. At the
end of the day, all the bolts produced from these three machines are placed into the same storage
box.

a) If a bolt is selected randomly from the storage box what is the probability that it was produced
by Machine A? Machine B? Machine C?


b) Based on the previous production records it is estimated that machines A, B and C produce
defective bolts at a rate of 10%, 5% and 1%, respectively. What is the probability that the randomly
selected bolt is defective? Which probability rule is used?

c) If the bolt is defective, what is the probability that the bolt was produced by Machine A?

Solution:

a) Let A, B, C denote the events that the bolt was produced by Machines A, B and C, respectively
and D be the event that the bolt is defective. Then, using the relative frequency definition of
probability,

Pr(A) = 0.40, Pr(B) = 0.25 and Pr(C) = 0.35

These are the prior probabilities that will be updated based on the observed data and information
on the rate of defectives.

b) Based on the information on defective percentages, the likelihood of observing defective items
for each machine is given by the following conditional probabilities:

Pr(D/A) = 0.10, Pr(D/B) = 0.05 and Pr(D/C) = 0.01

The probability that the randomly selected bolt is defective is computed based on the theorem of
total probability as follows:

Pr(D) = Pr(D/A) x Pr(A) + Pr(D/B) x Pr(B) + Pr(D/C) x Pr(C)

= 0.10 x 0.40 + 0.05 x 0.25 + 0.01 x 0.35 = 0.056

c) The required probability is Pr(A/D) and is referred to as the posterior probability since it is
obtained by revising the prior probability after observing that the bolt is defective. The updating
will be done based on Bayes’ theorem as follows:

Pr(A/D) = Pr(D/A) Pr(A) / [Pr(D/A) Pr(A) + Pr(D/B) Pr(B) + Pr(D/C) Pr(C)]

= (0.10 x 0.40) / (0.10 x 0.40 + 0.05 x 0.25 + 0.01 x 0.35) = 0.040 / 0.056 = 0.714

As expected, the probability that the defective bolt was produced by Machine A increases from the
prior value of 0.40 to the posterior value of 0.714. This is expected since Machine A has the highest
share of the daily production and also the highest defective rate. However, quantifying this increase
by engineering judgment alone may yield different results from one person to another, whereas
Bayes' theorem achieves it consistently and systematically.

The problem can also be solved by direct counting, without invoking any probability theorem, as
follows: Assume that the daily production is 10000 bolts. At the end of the day, the storage box
will contain:

Machine A: 4000 bolts
Machine B: 2500 bolts
Machine C: 3500 bolts
TOTAL:    10000 bolts

This also corresponds to the original sample space and Pr(A) = 4000/10000 = 0.40.
Knowing the defective production rate and that the bolt is defective, the revised sample space
will take the following form:

Machine A: 400 defectives
Machine B: 125 defectives
Machine C:  35 defectives
TOTAL:     560 defectives

Accordingly, Pr(A/D) = 400/560 = 0.714.

As observed, Bayes' theorem updates the sample space, reducing it from 10000 bolts to the 560
defective bolts, consistent with the given information that the selected bolt is defective.
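The same updating can be written in a few lines of code. The sketch below (the dictionary layout is ours) reproduces parts (b) and (c) with the theorem of total probability and Bayes' theorem:

```python
# Example 2.4: total probability and Bayes' theorem for the bolt factory.
priors = {"A": 0.40, "B": 0.25, "C": 0.35}       # Pr(machine), part (a)
p_def_given = {"A": 0.10, "B": 0.05, "C": 0.01}  # Pr(D | machine)

# Part (b): theorem of total probability.
p_defective = sum(p_def_given[m] * priors[m] for m in priors)

# Part (c): Bayes' theorem gives the posterior Pr(machine | D).
posterior = {m: p_def_given[m] * priors[m] / p_defective for m in priors}

print(round(p_defective, 3))     # 0.056
print(round(posterior["A"], 3))  # 0.714
```

Since the posterior probabilities are the renormalized joint terms, they necessarily sum to 1.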

Example 2.5

The site of the new Corona Treatment hospital to be constructed in Istanbul is a moderately active
earthquake region. When a large magnitude earthquake occurs, the probability that the hospital
will experience structural damage (event D) is estimated to be 0.20. The probability of occurrence
of one or two large magnitude earthquakes in this region in one year is estimated as 0.15 and 0.10,
respectively, whereas the probability of occurrence of three or more earthquakes is negligible (i.e.
zero). Assume that the structural damages between earthquakes are statistically independent.


a) Obtain the probability mass function for the number of earthquakes, X, occurring at the site of
this hospital in one year and show it in a table.

b) What is the probability that there will be no structural damage in this hospital due to the
occurrence of earthquakes in this region during the next year?

c) What is the probability that there will be no structural damage in this hospital due to the
occurrence of earthquakes in this region in the next 5 years?

Solution:
a) The probability mass function for the number of earthquakes, X, occurring at the hospital site
in one year is:

pX(x=0) = 1- 0.15 - 0.10 = 0.75; pX(x=1) = 0.15; pX(x=2) = 0.10; pX(x≥3) = 0


Expressed in tabular form:
Probability mass function for X
x p(x)
0 0.75
1 0.15
2 0.10
x ≥ 3 0

b) Pr(Damage) = Pr(D/x=0)*Pr(x=0) + Pr(D/x=1)*Pr(x=1) + Pr(D/x=2)*Pr(x=2)


= 0*0.75+0.2*0.15 + [0.2*0.2+0.2*0.8+0.8*0.2]*0.10
= 0 + 0.03 + 0.036 = 0.066
Pr(No Damage) =1– Pr(Damage) =1– 0.066 = 0.934
Note: If no earthquake occurs, the probability of structural damage due to earthquakes is 0.

Alternative way:

Pr(No Damage) = Pr(D̄/x=0)*Pr(x=0) + Pr(D̄/x=1)*Pr(x=1) + Pr(D̄/x=2)*Pr(x=2)

= 1*0.75 + 0.8*0.15 + 0.8*0.8*0.10

= 0.75 + 0.12 + 0.064

Pr(No Damage) = 0.934

c) Assume that the structural damage is statistically independent from year to year. Then:

Pr(No Damage in 5 years) = (0.934)⁵ = 0.711
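The bookkeeping of parts (b) and (c) can be made explicit in code; the probabilities below are those of the example, while the variable names are ours:

```python
# Example 2.5: no-damage probability via the theorem of total probability.
p_quakes = {0: 0.75, 1: 0.15, 2: 0.10}  # pmf of number of earthquakes, part (a)
p_damage = 0.20                          # Pr(damage | one earthquake)

# Pr(no damage | x earthquakes) = 0.8**x, by independence between earthquakes.
p_no_damage_year = sum((1 - p_damage) ** x * p for x, p in p_quakes.items())

print(round(p_no_damage_year, 3))       # 0.934
print(round(p_no_damage_year ** 5, 3))  # 0.711, assuming year-to-year independence
```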


Chapter 3
RANDOM VARIABLES

3.1. THE RANDOM VARIABLE CONCEPT

The concepts of "event", "sample space" and "probability" were described in detail in the
previous chapter. This chapter will first describe the concept of a random variable, and then
show the probability distributions related to them and parameters that summarize the properties
of these distributions.

The set displaying all possible results of an experiment is called the sample space. Generally, it is
more useful to state the results of the experiment numerically than to give a detailed description of
each of these test results. The results of the experiment can sometimes be expressed directly
numerically, for example, earthquake magnitude, the length in mm of a screw selected
randomly from the daily production of a machine, etc. Sometimes the results of the experiment
are expressed in a non-numeric way, for example, the result of a coin toss: heads or tails,
whether it will rain on a given day or not, etc. In the second case, it is possible to give a different
number to each simple event (i.e. sample point) in the sample space.

In an experiment in which a fair coin was tossed three times, the sample space that shows the
results in the most detailed way is given in Fig. 3.1. If the actual aim is to record the number
of tails (T) in three tosses of the coin, then for each point in the sample space, one of the
numerical values 0, 1, 2 or 3 can be assigned.

Original (detailed) sample space → reduced sample space (value x of X):

HHH → 0
HHT, HTH, THH → 1
HTT, THT, TTH → 2
TTT → 3
Figure 3.1 Sample spaces for random variable X representing the number of tails in 3 tosses
of a fair coin (H = Heads; T = Tails)

The numbers 0, 1, 2 and 3 are values determined from the result of the experiment. In
other words, these numbers are the values that the random variable X receives. In this example,
X represents the number of tails in three tosses of a fair coin.


Definition: Random variable X, is a function that assigns each simple event in the sample
space a real numerical value, and its domain is only the sample space. Random variables are
indicated by uppercase letters such as X, Y and Z and their values are in lowercase letters such
as x, y and z.

Example 3.1 – Two balls in a row are drawn from a bag containing three red (R) and two white
(W) balls. Possible outcomes and the corresponding values of the random variable Y, where
Y = number of red balls, are given in Table 3.1.

Table 3.1 The values that the random variable Y defined in Example 3.1 will attain
Simple event Y
WW 0
WR 1
RW 1
RR 2

If there is a limited (finite) number of sample points in the sample space, this sample space is
called a discrete sample space. The random variable defined on the discrete sample space is
called a discrete random variable.

Definition: If a random variable X can take only a finite (or countably infinite) number of
specified values, then X is called a discrete random variable. On the other hand, in the
case where there are an infinite number of sample points in the sample space (such as points
on a line segment), the sample space is called a continuous sample space, and the random
variable defined on that sample space is called a continuous random variable.

In the application problems encountered, continuous random variables represent measurable


data, such as weight, length, earthquake magnitude and temperature; and discrete random variables
represent countable data, such as annual traffic accidents and the annual number of
earthquakes.

3.2. DISCRETE PROBABILITY DISTRIBUTIONS

A discrete random variable X will attain each value with a specified probability. If X attains
only the values denoted by x1, x2, ... , xn, the following two conditions must be satisfied:

i) pX(xi) = Pr(X = xi) ≥ 0,  i = 1, 2, …, n

ii) ∑_{i=1}^{n} pX(xi) = ∑_{i=1}^{n} Pr(X = xi) = 1.0

In an experiment where a coin is tossed three times, if X shows the number of tails, X will
attain its various values with the probabilities given in Table 3.2. As seen, the above two
conditions are both satisfied.

Table 3.2 Probability distribution of the number of tails (x) in three tosses of a fair coin
x 0 1 2 3
Pr(X=x) 1/8 3/8 3/8 1/8

Often it may be appropriate to show the probability distributions of random variables with
equations. If we symbolize this equation by pX (x), then it can be written as pX (x) = Pr(X=x).
For example, pX(2) = Pr(X=2). pX(x) is called the (discrete) probability function (or
probability mass function) of X.

Definition: An equation or a chart or a table that shows all the values that a discrete random
variable can take and their corresponding probabilities is referred to as a discrete probability
distribution.

Example 3.2 – Find the probability distribution of the sum of the number to be obtained in two
rolls of a die.

Solution: Let X be a random variable that shows the sum of the numbers displayed in two rolls
of a fair die. The value of X, denoted by x can be any number between 2 and 12. Two dice can
come in 6x6 = 36 different ways, and the probability of each is 1/36. For example, Pr(x=4) =
3/36, since this sum (i.e. x = 4) can be obtained in three different ways: (1,3), (3,1) and (2,2).
The desired probability distribution is given in Table 3.3.

Table 3.3 Discrete probability distribution obtained for Example 3.2


x 2 3 4 5 6 7 8 9 10 11 12
Pr(X=x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
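Table 3.3 can be reproduced by brute-force enumeration of the 36 equally likely outcomes; a minimal sketch in Python:

```python
# Example 3.2: pmf of the sum of two fair dice, by enumerating all outcomes.
from collections import Counter
from fractions import Fraction

counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
pmf = {s: Fraction(n, 36) for s, n in sorted(counts.items())}

print(pmf[4])             # 1/12, i.e. 3/36, as in the text
print(sum(pmf.values()))  # 1, so the pmf conditions are satisfied
```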

It is useful to show the probability distribution in a graphic form. The probability distribution
given in Table 3.2 is shown graphically in Fig. 3.2. The probability of each value is expressed
by the height of the corresponding bar. This graphical representation of the probability
distribution is called a bar diagram or bar chart.

[Figure: bar chart of pX(x) with bars of height 1/8, 3/8, 3/8 and 1/8 at x = 0, 1, 2 and 3]
Figure 3.2 Bar diagram (chart)

3.3. CONTINUOUS PROBABILITY DISTRIBUTIONS

For a continuous random variable, the probability of attaining any single value is zero. Since
there will be an infinite number of points in the sample space of the continuous random
variable, the probability of selecting any one of these points is 1/∞ = 0. In this case, the
probability distribution of the continuous random variable cannot be shown by a table, but an
equation will be used. The probability distribution of a continuous random variable X will be
represented by fX(x) and will be called the probability density function (pdf). Since X is
defined on a continuous sample space, the graph of fX(x) will be continuous, for example, as
shown in Fig. 3.3.

[Figure: six typical pdf shapes, panels (a)-(f)]

Figure 3.3 Typical probability density functions

Definition: If a function fX (x) complies with the following conditions, then it is called the
probability density function of the continuous random variable X.

(i) fX (x) ≥ 0

(ii) the total area under the fX(x) curve and bounded by the x-axis is equal to one. Expressed
mathematically:

∫_{−∞}^{∞} fX(x) dx = 1.0

The probability of X attaining a value between a and b is equal to the area under the fX (x) curve
that is bounded by the x-axis, x=a, and x=b vertically. This area is shown as shaded in Fig. 3.4.

[Figure: pdf curve with the area between x = a and x = b shaded]

Figure 3.4 The area (shaded) that corresponds to the Pr(a ≤ X ≤ b) value


Since probabilities correspond to areas and since probabilities have positive values,
accordingly the entire density function will remain above the x-axis.

3.4. CUMULATIVE DISTRIBUTION FUNCTION

The probability distribution of a discrete random variable can be expressed by the discrete
probability function (or probability mass function) and the probability distribution of a
continuous random variable with the probability density function. There is another useful
method for specifying the probability distributions of discrete and continuous random
variables. This involves cumulative probabilities. The cumulative probability that the random
variable X attains a value equal to or less than a specified x value is expressed as follows:

FX (x) = Pr (X ≤ 𝑥) (3.1)

where, the function, FX (x), is called cumulative distribution function (CDF). A cumulative
distribution function must satisfy the following requirements.

i) 0 ≤ FX(x) ≤ 1.0
ii) If a ≤ b, then FX(a) ≤ FX(b)
iii) FX(∞) = 1.0 and FX(−∞) = 0

From a given probability mass function or probability density function, the cumulative
distribution function can be derived. For a given value of X = a, if X is a discrete random
variable

FX(a) = Pr(X ≤ a) = ∑_{x≤a} pX(x) = ∑_{x≤a} Pr(X = x)            (3.2)

and if X is continuous
FX(a) = Pr(X ≤ a) = ∫_{−∞}^{a} fX(x) dx                          (3.3)

Example 3.3 – Let X be a discrete random variable that shows the number of heads in two
tosses of a fair coin. The probability mass function of X is as follows:

p(x) = 1/4   for x = 0 or x = 2
     = 1/2   for x = 1
     = 0     otherwise
As observed, X only gets the values 0, 1 and 2. For values of X equal to or greater than 2,
FX (x) = 1. Since X cannot be less than zero, for values of X less than zero, FX (x) = 0. The
corresponding cumulative distribution function is shown in Fig. 3.5.


[Figure: step-function CDF rising from 0 to 1/4 at x = 0, to 3/4 at x = 1, and to 1 at x = 2]
Figure 3.5 Cumulative distribution function obtained for Example 3.3

This cumulative distribution function can also be expressed as follows:

FX(x) = 0      for x < 0
      = 1/4    for 0 ≤ x < 1
      = 3/4    for 1 ≤ x < 2
      = 1      for x ≥ 2

The cumulative distribution functions of discrete random variables, an example of which is


shown in Fig. 3.5, are step functions and are continuous from the right.
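This step behaviour is easy to mirror in code. The sketch below (function name ours) builds FX of Example 3.3 directly from the pmf, following Eq. 3.2:

```python
# Example 3.3: step CDF of the number of heads in two tosses of a fair coin.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

def cdf(x):
    """F_X(x) = Pr(X <= x): sum the pmf over all values not exceeding x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

print(cdf(-1))   # 0     (below the smallest value)
print(cdf(0))    # 0.25  (the jump at 0 is included: right-continuity)
print(cdf(1.5))  # 0.75  (constant between the jump points)
print(cdf(2))    # 1.0
```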

3.5. MAIN NUMERICAL DESCRIPTORS OF A RANDOM VARIABLE

The probability distribution of each random variable X is fully described by its cumulative
distribution function (CDF) or by its probability mass function (pmf) if discrete or by
probability density function (pdf) if continuous. However sometimes, either because of a
lack of sufficient information or for simplicity we are satisfied with less information
provided by a number of numerical descriptors of the random variable. These are mainly
measures of central tendency, variability (dispersion, spread and uncertainty) and shape
(symmetry). In the following, we will consider the statistical parameters used to represent
these numerical descriptors summarizing the main characteristics of a random variable.

3.5.1 Measures of Central Tendency

The main measures of central tendency are the expected value (mean, average), median and
mode.

The expected value of a random variable is an important concept in theoretical statistics.

Definition: The expected value of a random variable, X, having a discrete distribution is


defined as follows:

E(X) = μX = ∑_{i=1}^{n} xi Pr(X = xi) = ∑_{i=1}^{n} xi pX(xi)            (3.4)

The summation in Eq. 3.4 above covers all the different values that X can take and

∑_{i=1}^{n} pX(xi) = 1.0.

The expected value for a continuous random variable is:



E(X) = μX = ∫_{−∞}^{∞} x fX(x) dx                                (3.5)

and

∫_{−∞}^{∞} fX(x) dx = 1.0

The following theorems apply to the expected value.

Theorem 3.1 If g(X) is a function of X, then for discrete random variables,

E[g(X)] = ∑_{i=1}^{n} g(xi) pX(xi)

and for continuous random variables

E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx

Theorem 3.2 If a and b are constants,

E (aX + b) = aE(X) + b

Corollary 1: If a = 0 then E(b) = b

Corollary 2: If b = 0 then E(aX) = a E(X)

Theorem 3.3 If X and Y are two random variables

E(XY) = E(X)  E(Y)

Theorem 3.4 If X and Y are two statistically independent random variables

E(XY) = E(X) E(Y)

Example 3.4 – If X is a random variable indicating the number observed when a fair die is
rolled, what is the expected value of X?

Solution: The numbers 1, 2, 3, 4, 5, and 6 will each occur with a probability of 1/6. Then, from
Eq. 3.4,
E(X) = 1 x (1/6) + 2 x (1/6) + 3 x (1/6) + 4 x (1/6) + 5 x (1/6) + 6 x (1/6) = 3.5

Example 3.5 – What is the expected value of the sum of numbers when a pair of dice is rolled?

Solution: The random variables X and Y are defined as follows:

X = the number observed in the first roll of the die


Y = the number observed in the second roll of the die


From Theorem 3.3

E(X+Y) = E(X) + E(Y)

From Example 3.4, E(X) = 3.5 and E(Y) = 3.5.

E(X+Y) = 3.5 + 3.5 = 7.

If the expected value of the product of the numbers in the two rolls were desired, then, since X
and Y are statistically independent, Theorem 3.4 gives:

E(XY) = E(X) E(Y) = 3.5 x 3.5 = 12.25

The expected (average, mean) value is the most important measure of central tendency and
serves as the best point estimate for design parameters whose exact values are unknown in
engineering applications. The measure of central tendency that may be preferred after the
expected value is the median. Apart from these two, the mode is a third option; however, it is
less preferred because it is based on a single value.

Definition: The value of random variable X, for which the probabilities of getting values larger
and smaller than itself are equal, is called the median. If mX represents the median value, then
FX(mX) = 0.5.

Definition: The most likely value of a random variable is called modal value or mode for
short. (The root of this term is the word fashion). In other words, mode corresponds to the value
of the random variable, which occurs most often or has the greatest frequency or the highest
probability density of the probability density function.

3.5.2 Measure of Variability (Dispersion, Spread, Uncertainty)

The measure for the average value of a distribution is the expected value, which only
summarizes the central tendency of the distribution. Another important feature of the
distribution is its spread, dispersion, or variability around the expected value. This property of
the distribution is summarized by variance and standard deviation. In addition, the coefficient
of variation, defined as the ratio of standard deviation to expected value is also widely used.

Definition: The variance of the random variable X, having any type of distribution, is defined
as:

VAR(X) = σX² = E(X − μX)²                                        (3.6)

VAR(X) or σ2X denotes the variance of X. Variance is a measure of the dispersion and spread
of the distribution. If X can take only its expected value, then σX² equals zero. As the values
of X spread farther from each other and from the expected value, the variance grows.

Definition: The standard deviation of the random variable X, is denoted by σX and defined
as follows:
σX = √VAR(X)                                                     (3.7)

Theorem 3.5 The variance of the random variable X can be written as follows:

σX² = E(X²) − μX²                                                (3.8)

Theorem 3.6 If X is a random variable and b is a constant, then

VAR(X + b) = σX²

Theorem 3.7 If X is a random variable and a is a constant, then

VAR(aX) = a² σX²

Theorem 3.8 If X and Y are two statistically independent random variables, then

VAR(X + Y) = σX² + σY²

Corollary: If X and Y are two statistically independent random variables, then

VAR(X − Y) = σX² + σY²

Example 3.6 – Compute the variance of X, if X is a random variable showing the number
obtained in a roll of a fair die.

Solution: From Example 3.4, μX = 3.5. Accordingly,

E(X²) = 1 x (1/6) + 4 x (1/6) + 9 x (1/6) + 16 x (1/6) + 25 x (1/6) + 36 x (1/6) = 91/6

Employing Theorem 3.5,

σX² = 91/6 − 3.5² = 35/12 = 2.92

Definition: The coefficient of variation (c.o.v.) of the random variable X is denoted by δX and
defined as follows:
δX = σX / μX                                                     (3.9)

Example 3.7 – If X represents the number observed in the first roll and Y represents the number
observed in the second roll of a fair die, what is the variance, standard deviation, and coefficient
of variation of (X+Y)?
Solution: From Example 3.6, σX² = σY² = 35/12.

Based on Theorem 3.8,

VAR(X + Y) = σX² + σY² = 35/12 + 35/12 = 35/6 = 5.83

σ(X+Y) = √VAR(X + Y) = √5.83 = 2.42

Using the value computed in Example 3.5, E(X+Y) = 3.5 + 3.5 = 7, the c.o.v. of (X+Y) is
found as:

δ(X+Y) = 2.42 / 7 = 0.346
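The dice results of Examples 3.4–3.7 can all be confirmed by direct enumeration of the 36 outcomes (a sketch; variable names are ours):

```python
# Mean, variance, standard deviation and c.o.v. of the sum of two fair dice.
import math

outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
p = 1 / 36  # each outcome is equally likely

mean = sum((d1 + d2) * p for d1, d2 in outcomes)             # E(X+Y)
var = sum((d1 + d2 - mean) ** 2 * p for d1, d2 in outcomes)  # VAR(X+Y)
std = math.sqrt(var)
cov = std / mean                                             # c.o.v.

print(round(mean, 6))  # 7.0
print(round(var, 3))   # 5.833, i.e. 35/6
print(round(cov, 3))   # 0.345 (the text gets 0.346 by rounding 2.42/7)
```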

Example 3.8 – The probability density function of the continuous random variable, X is as
follows:
f(x) = c x³    0 ≤ x ≤ 5
     = 0      otherwise

a) Find the value of the coefficient, c.


b) Calculate the average value, median, mode, standard deviation and coefficient of variation
of X.
c) What is the probability that X is greater than 2.0, if X is known to be between 1.0 and 4.0?
d) Obtain the cumulative distribution function, FX(x), of X.

Solution:
a) ∫₀⁵ c x³ dx = 1.0 → c = 4/625 = 0.0064

b) E(X) = μX = ∫₀⁵ x (0.0064 x³) dx = 4.0

If m denotes the median, then, using the fact that ∫₀ᵐ 0.0064 x³ dx = 0.5, we compute

m = (0.5 x 625)^(1/4) = (312.5)^(1/4) = 4.204

mode = 5.0 (the pdf increases over [0, 5], so its highest density is at x = 5)

σX² = E(X²) − μX²

E(X²) = ∫₀⁵ x² (0.0064 x³) dx = 50/3 = 16.67

σX² = 16.67 − 4.0² = 0.67

σX = √0.67 = 0.82

c.o.v.(X) = δX = 0.82 / 4.0 = 0.205

c) Pr(X > 2 | 1 < X < 4) = Pr(2 < X < 4) / Pr(1 < X < 4)

= [∫₂⁴ 0.0064 x³ dx] / [∫₁⁴ 0.0064 x³ dx] = [0.0016 x⁴]₂⁴ / [0.0016 x⁴]₁⁴

= (256 − 16) / (256 − 1) = 240/255 = 0.941

d) Cumulative Distribution Function (CDF), FX(x): for 0 ≤ x < 5,

FX(x) = ∫₀ˣ fX(t) dt = ∫₀ˣ (4/625) t³ dt = (1/625) x⁴

FX(x) = 0               for x < 0
      = (1/625) x⁴      for 0 ≤ x < 5
      = 1               for x ≥ 5

Check: FX(2) = (1/625) 2⁴ = 16/625 = 0.0256
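All the numbers in this example follow from one antiderivative, ∫ xⁿ · cx³ dx = c·x^(n+4)/(n+4), so they can be checked with a few lines of code (the helper name `moment` is ours):

```python
# Example 3.8 check: moments of f(x) = (4/625) x^3 on [0, 5].
c = 4 / 625

def moment(n, a=0.0, b=5.0):
    """Integral of x**n * c*x**3 over [a, b], via the closed-form antiderivative."""
    return c * (b ** (n + 4) - a ** (n + 4)) / (n + 4)

total = moment(0)             # 1.0: f(x) is a valid pdf
mean = moment(1)              # 4.0
var = moment(2) - mean ** 2   # 2/3, about 0.67
median = (0.5 * 625) ** 0.25  # solves F(m) = m**4/625 = 0.5; about 4.204

# Part (c): Pr(X > 2 | 1 < X < 4) = Pr(2 < X < 4) / Pr(1 < X < 4)
p_cond = moment(0, 2, 4) / moment(0, 1, 4)  # 240/255, about 0.941
```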

3.5.3 Measures of Shape

Besides the measures of central tendency and spread, two measures are available concerning
the shape of a distribution. Although the histogram provides a general view of the shape, it is
advisable to examine these two numerical measures of shape, which give more information.
These are the coefficients of skewness and kurtosis.

The skewness coefficient is an indicator of the degree and direction of skew, i.e. deviation
from horizontal symmetry. The measure of skewness or asymmetry is based on the third
central moment, μX⁽³⁾, defined for the discrete and continuous random variables, respectively,
as follows:

μX⁽³⁾ = E(X − μX)³ = ∑_{all xi} (xi − μX)³ pX(xi)                (3.10)

μX⁽³⁾ = E(X − μX)³ = ∫_{−∞}^{∞} (x − μX)³ fX(x) dx               (3.11)

If the pdf or pmf of a random variable is symmetric about the mean value, μX, the third central
moment will be zero. If it is positive, then the distribution will be skewed in the positive
direction, i.e. towards the right, and will be called positively skewed. On the other hand, if it is
negative the skewness will be in the negative direction and the distribution will be called
negatively skewed. A dimensionless measure of skewness is the skewness coefficient, γ1,
defined as:

γ1 = E(X − μX)³ / σ³ = μX⁽³⁾ / σ³                                (3.12)

The other measure related to the shape is the coefficient of kurtosis, which measures the
central peakedness (or flatness) of the distribution relative to the standard bell-shaped curve of
the normal distribution. Kurtosis is based on the fourth central moment, μX⁽⁴⁾, and is defined
for the discrete and continuous random variables, respectively, as follows:

μX⁽⁴⁾ = E(X − μX)⁴ = ∑_{all xi} (xi − μX)⁴ pX(xi)                (3.13)

μX⁽⁴⁾ = E(X − μX)⁴ = ∫_{−∞}^{∞} (x − μX)⁴ fX(x) dx               (3.14)


A dimensionless measure of kurtosis is the coefficient of kurtosis, γ2, defined as:

γ2 = E(X − μX)⁴ / σ⁴ = μX⁽⁴⁾ / σ⁴                                (3.15)

One main reason why we are interested in these two coefficients is that most of the inferences
in statistics require that the distribution be normal or approximately normal. For a normal
distribution, the coefficients of skewness and kurtosis are 0 and 3, respectively. Therefore, if
the distribution under consideration has values close to these, then it is possible to justify the
normality assumption. The coefficient of kurtosis (𝛄2) is interpreted as follows: If 𝛄2 = 3,
normal kurtosis (i.e. equal to that of the normal distribution); 𝛄2 > 3, more peaked than the
normal distribution; 𝛄2 < 3, flatter than the normal distribution.
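As a sketch of how γ1 and γ2 are evaluated in practice from Eqs. 3.12 and 3.15, the code below uses the two-dice pmf of Example 3.2; that distribution is symmetric, so its skewness coefficient should vanish:

```python
# Skewness and kurtosis coefficients of a discrete distribution from its pmf.
pmf = {s: (6 - abs(s - 7)) / 36 for s in range(2, 13)}  # sum of two fair dice

mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())

def central_moment(n):
    return sum((x - mean) ** n * p for x, p in pmf.items())

gamma1 = central_moment(3) / var ** 1.5  # skewness coefficient, Eq. 3.12
gamma2 = central_moment(4) / var ** 2    # coefficient of kurtosis, Eq. 3.15

print(round(abs(gamma1), 6))  # 0.0: symmetric, no skew
print(round(gamma2, 3))       # 2.366: less than 3, flatter than the normal curve
```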

3.5.4 Moments of Distributions

It is possible to make an analogy between the numerical descriptors of random variables and
moments in engineering mechanics. For this purpose, we define the nth general moment of X,
mX⁽ⁿ⁾, and the nth central moment of X, μX⁽ⁿ⁾, respectively, as follows:

For discrete random variables,


mX⁽ⁿ⁾ = E(Xⁿ) = ∑_{all xi} xiⁿ pX(xi)

μX⁽ⁿ⁾ = E(X − μX)ⁿ = ∑_{all xi} (xi − μX)ⁿ pX(xi)

For continuous random variables,

mX⁽ⁿ⁾ = E(Xⁿ) = ∫_{−∞}^{∞} xⁿ fX(x) dx

μX⁽ⁿ⁾ = E(X − μX)ⁿ = ∫_{−∞}^{∞} (x − μX)ⁿ fX(x) dx

For example, for n = 1, mX⁽¹⁾ = E(X) = μX.

Also, for n = 1, μX⁽¹⁾ = 0 and, for n = 2, μX⁽²⁾ = VAR(X) = σX².

As it will be observed from the definitions of moments given above, the mean value is
analogous to the centroidal distance and the variance to the moment of inertia of a unit area as
shown in Fig. 3.6. In this figure, an irregular-shaped unit area defined by the function y = f(x)
is considered. The centroidal distance, 𝑥0 of the unit area is:


Figure 3.6 An irregular-shaped unit area (adapted from Ang and Tang, 2007)


x0 = [∫_{−∞}^{∞} x fX(x) dx] / Area = ∫_{−∞}^{∞} x fX(x) dx = mX⁽¹⁾

which is also the first general moment of the irregular-shaped unit area and equals the mean
value, μX. The moment of inertia of the area about the vertical axis through the centroid, IY, is:

IY = ∫_{−∞}^{∞} (x − x0)² fX(x) dx = μX⁽²⁾

which is also the second central moment of the irregular-shaped unit area that equals the
variance of X.

Example 3.9 – (Adapted from Ang and Tang, 2007)

The useful life, T of welding machines is assumed as a random variable having an exponential
probability distribution. The pdf and CDF of T are, respectively, as follows:

fT(t) = λ e^(−λt)  and  FT(t) = 1 − e^(−λt),   t ≥ 0

As an illustrative example, the corresponding graphs for μT = 50 are shown, respectively, in


Figs. 3.7a and 3.7b.

Figure 3.7 Exponential (a) pdf and (b) CDF of useful life T of a welding machine for μT = 50


Compute mean, median, mode, variance, standard deviation and coefficient of variation of T.

Solution:

The mean useful life of the welding machines is:



μT = E(T) = ∫₀^∞ t λ e^(−λt) dt

Performing the integration by parts, we obtain μT = 1/λ.

Therefore, the parameter λ of the exponential distribution is the reciprocal of the mean value;
i.e., λ = 1/E(T).

In this case, the mode is zero, whereas the median life, m, is obtained as follows:

∫₀ᵐ λ e^(−λt) dt = 0.50

m = (−ln 0.50) / λ = 0.693/λ

Therefore,

m = 0.693 μT

The variance of T is

VAR(T) = ∫₀^∞ (t − 1/λ)² λ e^(−λt) dt

Integration by parts yields

VAR(T) = 1/λ²

from which we obtain the standard deviation of T as

σT = 1/λ = μT

This shows that the c.o.v. of the exponential distribution is 100%.
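These closed-form results can also be verified numerically. The sketch below integrates the pdf with a crude midpoint rule for the μT = 50 case of Fig. 3.7 (the grid choices are ours):

```python
# Example 3.9 check: mean, median and c.o.v. of the exponential distribution.
import math

lam = 1 / 50  # so the mean useful life 1/lam is 50

def pdf(t):
    return lam * math.exp(-lam * t)

# Midpoint-rule integration; [0, 2000] = 40 mean lives captures all the mass.
dt = 0.01
ts = [(k + 0.5) * dt for k in range(int(2000 / dt))]
mean = sum(t * pdf(t) * dt for t in ts)
var = sum((t - mean) ** 2 * pdf(t) * dt for t in ts)

median = -math.log(0.5) / lam  # closed form: 0.693/lam

print(round(mean, 2))                   # 50.0 = 1/lam
print(round(math.sqrt(var) / mean, 2))  # 1.0, i.e. a c.o.v. of 100%
print(round(median, 1))                 # 34.7 = 0.693 x 50
```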

Example 3.10 – (From Ang and Tang, 2007)

For the exponential distribution of the useful life of welding machines, T, of Example 3.9, the
mean useful life of the machines is µT. Then, the third central moment of the pdf is (using µ for
µT),

E(T − µ)³ = ∫₀^∞ (t − µ)³ (1/µ) e^(−t/µ) dt

= (1/µ) ∫₀^∞ (t³ − 3t²µ + 3tµ² − µ³) e^(−t/µ) dt

= µ³ e^(−t/µ) { −[(t/µ)³ + 3(t/µ)² + 6(t/µ) + 6] + 3[(t/µ)² + 2(t/µ) + 2] − 3[(t/µ) + 1] + 1 }
evaluated from t = 0 to t = ∞

= 2µ³


We recall from Example 3.9 that the standard deviation of the exponential distribution is:
σT = µT. Therefore, the skewness coefficient of this distribution is:

γ1 = 2µ³/σ³ = 2µ³/µ³ = 2.0


EXERCISE PROBLEMS

3.1. Compute μX, σX and δX for the following discrete probability distribution of X.

x 2 3 8
p(x) = Pr(X=x) 1/4 1/2 1/4

3.2. The probability mass function of the random variable X is given as follows.

x −3 6 9
p(x) = Pr(X=x) 1/6 1/2 1/3

a) Compute the values of E(X) and E(X2).


b) Compute the value of E{(2X+1)2} by using the theorems related to the expected value.

3.3. The probability mass function of the discrete random variable X is given as follows.

x −2 1 2 4
p(x) = Pr(X=x) 1/4 1/8 1/2 1/8

Plot the cumulative distribution function, FX (x) of the random variable, X.

3.4. The probability density function of the random variable X is given as follows.

fX(x) = cx    0 ≤ x ≤ 1
      = 0     otherwise
a) What should be the value of c so that fX (x) is a proper probability density function?
b) Plot the function fX (x).
c) Compute E(X), median and mode of X.
d) Compute the variance, standard deviation and coefficient of variation of X.

3.5. Prove Theorem 3.7.

3.6. The random variable X takes the value of 5 with a probability of 0.30 and has a triangular
distribution in the range [0, 20] as shown in the following figure (Fig. 3.6).

a) What should be the value of the coefficient k so that X has a valid probability density
function? Write down the equation that specifies the probability distribution of X.
b) Find the mean, median and mode values of X.
c) Find the variance, standard deviation and coefficient of variation of X.
d) If it is known that X will not get a value greater than 15, what is the probability that X will
be greater than 12?


[Figure: fX(x) for Problem 3.6 — a probability spike of 0.3 at x = 5 and a triangular density
over [0, 20], with the x-axis marked at 5, 10 and 20]

Figure 3.6 Problem 3.6

3.7. The probability density function of the total load on a roof, denoted by S, is as follows:

fS(s) = c/s³    3 tons ≤ s ≤ 6 tons
      = 0       otherwise

a) What should be the value of the coefficient, c so that the given relationship corresponds to
a valid probability density function?
b) What is the expected value and median of the total load S?
c) Find the variance and coefficient of variation of the total load S.
d) If it is known that the roof can carry a maximum load of 5.5 tons, what is the probability of
collapse?


Chapter 4
SOME IMPORTANT PROBABILITY DISTRIBUTIONS

4.1. NORMAL (GAUSSIAN) DISTRIBUTION

The most important continuous probability distribution in statistics and also in civil engineering
applications is the "normal distribution". The diagram of this distribution is a bell-shaped
curve (see Figs. 4.1a, b, c). The normal distribution is symmetrical and the mean, median, and
mode are equal to the same value. It has a wide range of applications in terms of describing
the distribution of many populations in nature. The equation of the normal distribution curve
was first derived by DeMoivre in 1733. Later, Gauss (1777-1855) obtained this distribution
function after a study that investigated errors in a repeated measurement. For this reason, the
normal distribution is sometimes called the Gaussian distribution.

Figure 4.1 Normal (Gaussian) probability density function


(a) with varying σ values, (b) with varying μ values (after Ang and Tang, 2007)

A random variable, X, with a normal distribution is called a normal random variable. The
function that gives the probability distribution of the normal random variable depends on only
two parameters, μX (−∞ < μX < ∞) and σX (σX² > 0). Accordingly, the probability density
function of X will be shown as N(x; μX, σX).


Figure 4.1c Normal distribution (Gaussian distribution)

Definition: For a normal random variable, X, with mean µ and variance σ², the equation of
the normal distribution curve is as follows:

x - 2
1 -1/2 ( )
f ( x ) = N ( x ; ,  ) = e  -X  (4.1)
2 

where, π ≅ 3.1416 and e ≅ 2.7183. The normal curve is fully defined when the values of μ and σ
are given. Theoretically, X takes any value between −∞ and +∞. However, as shown in
Fig. 4.1c, the normal curve approaches the x-axis asymptotically but does not cut the x-axis.
Because the values in the tail sections are very close to zero, the range (spread) of X is often
taken as μ ± 3σ in practice, which covers 99.7% of the total area. The two normal distributions
shown in Fig. 4.2a have the same mean value. However, the distribution with the larger variance
is flatter and more spread out. The two distributions in Fig. 4.2b have identical shapes, but
because their mean values are different, they are located at different positions on the x-axis.

1

2

1= 2 x

Figure 4.2a Comparison of two normal distributions with the same mean value but different
standard deviations (μ1 = μ2 and σ2 > σ1)

1 2

1  x

Figure 4.2b Comparison of two normal distributions with equal standard deviations but
different mean values (μ1 ≠ μ2 and σ1 = σ2)

4.2. AREAS UNDER A NORMAL DISTRIBUTION CURVE

For the normal distribution displayed in Fig. 4.3, Pr(x1 < X < x2) is shown with the shaded area.
For different values of μ and σ, different normal distributions will result. Integration of the
normal curve equation to compute the areas under this curve, and hence probabilities, does not
yield a closed-form solution. This difficulty may be overcome by numerical integration, but
this is not a practical solution. For this reason, it is convenient to use tables. It would not be
practical to prepare a table for each normal distribution. Therefore, a single table has been
prepared that can be used for all normal distributions. This table applies to the standard
normal random variable Z, which has a normal distribution with μ = 0 and σ = 1.



Figure 4.3 Pr(x1 < X < x2) = area of the shaded section

Any normal random variable X can be converted to the standard normal random variable, Z,
by utilizing the following relationship.

Z = (X − μX)/σX        (4.2)

As shown below the mean value and the variance of Z are equal to zero and 1, respectively.

E(Z) = (1/σX) E(X − μX) = (1/σX) [E(X) − μX] = (1/σX)(μX − μX) = 0

and

σ²Z = VAR[(X − μX)/σX] = (1/σ²X) [VAR(X) + VAR(μX)] = (1/σ²X) σ²X = 1.

Definition: The distribution of a normal random variable with a mean value of 0 and a standard
deviation of 1 is called a standard normal distribution.

When the random variable X is between X = x1 and X = x2, the random variable Z will fall
between z1 = (x1 − μ)/σ and z2 = (x2 − μ)/σ. This is illustrated in Fig. 4.4. Thus, the area below
the X curve and between the x = x1 and x = x2 lines will be equal to the area below the Z curve
and between the z = z1 and z = z2 lines, and

Pr(x1 ≤ X ≤ x2) = Pr(z1 ≤ Z ≤ z2)

=1

x1 x2  x z1 z2 μ=0 z

Figure 4.4 The corresponding equivalent areas under the normal distribution curves of
random variables X and Z

In Figs. 4.5a, b, c and d, the CDF and pdf's of the standard normal distribution, with areas
covering 1, 2 and 3 standard deviations, are shown, respectively.

Figure 4.5 (a) The CDF of the standard normal distribution and (b), (c), (d) pdf’s of a
standard normal distribution with areas covering, 1, 2 and 3 standard deviations
(Adopted from Ang and Tang, 2007)

Example 4.1 (a) – For a normal population with X = 50 and X = 10, calculate the z1 and z2
values that will satisfy the following equality.

Pr(45 < X < 62) = Pr(z1 < Z <z2)



Solution: From Z transformation:

z1 = (45 − 50)/10 = −0.5   and   z2 = (62 − 50)/10 = 1.2

Accordingly, Pr(45 < X < 62) = Pr(−0.5 < Z < 1.2).

The tables required for normal distributions are thus reduced to the table prepared for a single
standard normal distribution. Table 4.1 at the end of this section lists the areas that are below
the standard normal curve and correspond to Pr(Z<z). In this table, the Z value varies between
0 and 4.0. The following examples show how to use this table.

Example 4.1 (b) – In Example 4.1(a) it was shown that Pr (45<X<62) = Pr(− 0.5<Z<1.2). Now
the corresponding probability will be computed as follows:

Pr(45 < X < 62) = Pr(− 0.5 < Z < 1.2)


= Pr(Z < 1.2) − Pr(Z < − 0.5)
= 0.8849 – 0.3085
= 0.5764


Figure 4.6 The required area in Example 4.1 (b)
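The table lookup above is easy to reproduce in code. The sketch below (Python, standard library only; the helper name phi is ours, not from the notes) builds Φ from the error function and checks the result of Example 4.1(b):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF: Phi(z) = 0.5*(1 + erf(z/sqrt(2)))."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Example 4.1(b): Pr(45 < X < 62) for X ~ N(mu = 50, sigma = 10)
mu, sigma = 50.0, 10.0
p = phi((62 - mu) / sigma) - phi((45 - mu) / sigma)
print(round(p, 4))  # 0.5764, matching the table-based result
```

The same function can stand in for Table 4.1 in the examples that follow.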

Example 4.2 – Car tires produced by a factory last an average of 2 years and show a standard
deviation of 0.5 years. Assuming that the lifetime of these tires is normal, what is the
probability that a purchased tire will wear out before 1.5 years?

Solution: First, a figure is drawn (Fig. 4.7) and the desired area is marked. To find Pr(X<1.5),
the area to the left of x=1.5 must be calculated. This area is equal to the area to the left of the z
value, which is the equivalent of x=1.5. This z value is:
z = (1.5 − 2)/0.5 = −1.0
From Table 4.1, Pr(X<1.5)=Pr(Z<−1.0) = 0.1587.


=

1.5  =  x

Figure 4.7 The required area in Example 4.2

Example 4.3 − The average value of points taken in a statistics examination was 70 and the
standard deviation was 8. If 10% of the class is given the grade A, what is the smallest point
that is enough to get an A? Points taken in the examination will be assumed to show a normal
distribution.

Solution: In the previous examples, the z values corresponding to the x values were found and
then the desired areas were obtained from Table 4.1. In this example, the opposite will be done.
The z value will be found from the given area (or probability) and, using this z value, x will be
calculated from x = μ + zσ. In Fig. 4.8, the area of 0.10 is shown as shaded.

=

0.10
 x

Figure 4.8 The area to be considered in Example 4.3

The desired z value should satisfy the requirement: Pr(Z > z) = 0.10 or equivalently Pr(Z < z)
= 0.90. From Table 4.1, it is observed that Pr(Z < 1.28) = 0.90, meaning that the desired z value
is 1.28. The smallest point, xA , required to get an A grade is calculated below.

xA = μ + zσ
   = 70 + 1.28 × 8 = 70 + 10.24
   = 80.24
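Reading Table 4.1 "backwards" corresponds to inverting Φ. A minimal sketch (Python, standard library; the bisection helper phi_inv is our own, not a library routine):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi_inv(p):
    """Invert the standard normal CDF by bisection on [-8, 8]."""
    lo, hi = -8.0, 8.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Example 4.3: top 10% get an A, scores ~ N(70, 8)
z = phi_inv(0.90)   # about 1.2816; the table value 1.28 is rounded
x_A = 70 + z * 8    # about 80.25 (the notes get 80.24 from the rounded z)
```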

4.3. LOGARITHMIC-NORMAL (LOGNORMAL) DISTRIBUTION

The lognormal distribution, which is used as much as the normal distribution in engineering
applications, is similar to the normal distribution; the most important difference is that it
does not permit negative values. In other words, the normal distribution is defined in the range
"−∞ to +∞", while the lognormal distribution is valid only within the range "0 to +∞". The


natural logarithmic transformation (base e) of a lognormal random variable X will
create a normally distributed random variable, Y (i.e., Y = ln X; Y is normal if X is lognormal).

Definition: If X is a random variable having a lognormal distribution, then the equation of the
probability density function of X is as follows:

f(x) = LN(x; λ, ξ) = [1/(√(2π) ξ x)] e^(−(1/2)((ln x − λ)/ξ)²)        x ≥ 0        (4.3)

Here, π ≅ 3.1416 and e ≅ 2.7183. λ = λX = E(ln X) and ξ² = ξ²X = VAR(ln X), respectively,
denote the mean value and the variance of ln X. The shape of the lognormal distribution is
illustrated in Fig. 4.9 for different values of its parameter, ξ.

Figure 4.9 The lognormal probability density function corresponding to different 𝜉 values
(adopted from Ang and Tang, 2007)

Since it is possible to convert a random variable with a lognormal distribution to a random


variable with a normal distribution by applying a logarithmic transformation, the standard
normal distribution table (Table 4.1) will also be used to compute the probabilities associated
with the lognormal distribution. For the conversion of a lognormal variate X to the equivalent
standard normal variate, the following relationship applies:

Z = (ln X − λX)/ξX        (4.4)


The following relationships apply between the mean value, μX, and the standard deviation, σX,
of the random variable X, and the mean value, λX, and standard deviation, ξX, of the random
variable ln X:

λX = ln μX − (1/2) ξ²X        (4.5)

ξ²X = ln[1 + (σ²X / μ²X)] = ln(1 + δ²X)        (4.6)

Here, δX = σX/μX denotes the coefficient of variation of X. For small values of the coefficient
of variation (≤ 0.30), it is possible to assume ξX ≅ δX.

The median is often used as the central value of a lognormally distributed random variable,
since λ = ln(median).
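Equations 4.5 and 4.6, and the quality of the ξX ≅ δX approximation, can be checked with a few lines (Python, standard library; the function name is our own):

```python
from math import log, sqrt

def lognormal_params(mu, sigma):
    """Exact lognormal parameters (lambda, xi) from mean and std (Eqs. 4.5-4.6)."""
    delta = sigma / mu               # coefficient of variation, delta_X
    xi2 = log(1.0 + delta ** 2)      # Eq. 4.6
    lam = log(mu) - 0.5 * xi2        # Eq. 4.5
    return lam, sqrt(xi2)

lam, xi = lognormal_params(50, 10)
# delta = 0.20: exact xi = 0.1980, so xi ~ delta holds well below 0.30
```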

Example 4.4 – For a population having a lognormal distribution with μ = 50 and σ = 10,
calculate Pr(45 < X < 62).

Solution: Using the relationships given above, the parameters of the lognormal distribution are
calculated as shown below.

ξX ≅ δX = σX/μX = 10/50 = 0.20 (< 0.30)

λX = ln μX − (1/2) ξ²X = ln 50 − (1/2)(0.20)² = 3.89

Then, from the z conversion,

z1 = (ln 45 − 3.89)/0.20 = −0.42
and
z2 = (ln 62 − 3.89)/0.20 = 1.19

Therefore,

Pr(45 < X < 62) = Pr(− 0.42 < Z < 1.19)

Using Table 4.1 the desired probability value is calculated as shown below.

Pr(45 < X < 62) = Pr(− 0.42 < Z < 1.19)


= Pr(Z < 1.19) – Pr(Z < – 0.42)
= 0.8830 – 0.3372
= 0.5458

Example 4.5 – The efficiency (E) of a company producing construction materials is estimated
based on the following relationship:


E = (Y/M) √(e^T · e^(S/9))

where,

Y = the fatigue life of the heavy machinery used,
S = the weekly working times of the workers,
T = the past experience periods of the engineers controlling the production, and
M = the unit costs of the materials used.

Y and M are lognormally distributed random variables with median values of 5000 hours and
250 TL, respectively and coefficients of variation of 0.20 and 0.15, respectively. T and S are
normally distributed random variables with mean values of 6 years and 45 hours, respectively,
and standard deviations of 2 years and 4.5 hours, respectively. T and S are dependent variables,
the coefficient that reflects the correlation between them is ρT,S = 0.75. All other variables are
independent of each other.

a) Calculate the expected value and coefficient of variation of E.


b) Find the probability distribution of E.
c) Calculate the probability that the efficiency is greater than 90.

Solution:

a) The parameters of the lognormally distributed random variables Y and M are as follows:

ξY ≅ 0.2 and ξM ≅ 0.15; λY = ln 5000 = 8.517 and λM = ln 250 = 5.522

If we take the logarithm of both sides of the equation given for efficiency,

ln E = ln Y – ln M + 0.5 (T + S/9)

This is a linear equation and the expected value and variance of E is calculated as follows:

E (ln E) = E (ln Y) – E (ln M) + 0.5 [E (T) + E (S/9)]


λE = λY – λM + 0.5 [μT + μS/9] = 8.517 – 5.522 + 0.5 [6 + 45/9] = 8.495

VAR(ln E) = VAR(ln Y) + VAR(ln M) + 0.25 [VAR(T) + (1/81) VAR(S)]
+ 2 × 0.5² × (1/9) × ρ × σT × σS

ξ²E = ξ²Y + ξ²M + 0.25 [σ²T + (1/81) σ²S] + 2 × 0.5² × (1/9) × ρ × σT × σS

ξ²E = 0.20² + 0.15² + 0.25 [2² + (1/81) 4.5²] + 2 × 0.25 × (1/9) × 0.75 × 2 × 4.5

= 0.04 + 0.0225 + 1.0625 + 0.375 = 1.5

ξE = √1.5 ≅ 1.225

b) Since Y and M have lognormal distributions, ln Y and ln M will be normally distributed. T


and S are given as normally distributed. Therefore, ln E, which is a linear function of these four
normally distributed random variables, will be normal and E will be lognormally distributed.

c) According to the results obtained in parts a and b, E is a lognormally distributed random


variable with parameters λE = 8.495 and ξE = 1.225. Accordingly,

Pr(E > 90) = Pr(ln E > ln 90) = Pr(Z > (ln 90 − 8.495)/1.225) = Pr(Z > (4.500 − 8.495)/1.225)

= Pr(Z > −3.995/1.225) = Pr(Z > −3.26) = 0.99944
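The moment algebra of part (a) can be cross-checked by Monte Carlo simulation. The sketch below (Python, standard library; the sample size and seed are arbitrary choices of ours) draws correlated normal T and S with the given ρ, and normal ln Y and ln M:

```python
import random
from math import sqrt

random.seed(1)
n = 200_000
rho, mu_T, s_T, mu_S, s_S = 0.75, 6.0, 2.0, 45.0, 4.5

samples = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    T = mu_T + s_T * z1
    S = mu_S + s_S * (rho * z1 + sqrt(1 - rho ** 2) * z2)  # corr(T, S) = 0.75
    ln_Y = random.gauss(8.517, 0.20)   # ln Y is normal since Y is lognormal
    ln_M = random.gauss(5.522, 0.15)
    samples.append(ln_Y - ln_M + 0.5 * (T + S / 9.0))      # ln E

m = sum(samples) / n
s = sqrt(sum((x - m) ** 2 for x in samples) / n)
# m approaches lambda_E = 8.495 and s approaches xi_E = 1.225
```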

Example 4.6 − The safety factor, F, for a building element, is defined as follows:

F = R/S

Here,

R = the strength (carrying capacity) of the building element


S = the load that the building element is exposed to.

R is a random variable with a mean value of μR = 40 kN and a coefficient of variation of


δR = 0.15. The random variable, S has an average value of μS = 20 kN and a coefficient of
variation of δS = 0.25. R and S are statistically independent and both are lognormally
distributed.

a) Find the distribution parameters (λR, ξR and λS, ξS) of the lognormal random variables R and
S.
b) Since R and S are random variables, F will also be a random variable. Accordingly, obtain
the probability distribution of F and state the name of this distribution. At the same time, find
the mean value and coefficient of variation of F.
c) If the safety factor, F, is greater than 3.0, the structure is considered to be "safe". What is the
probability that the structure will be rated as safe?

Solution:

a) ξR ≅ δR = 0.15

λR = ln 40 − 0.5 × 0.15² = 3.689 − 0.01125 = 3.678

ξS ≅ δS = 0.25

λS = ln 20 − 0.5 × 0.25² = 2.996 − 0.03125 = 2.964

b) F = R/S

ln F = ln R – ln S

Since R and S are lognormally distributed, ln R and ln S will be normally distributed and ln F,
which is a linear function of these two variables, will also be normally distributed. If ln F is
normal, then F will have a lognormal distribution.


E(ln F) = λF = E(ln R) – E(ln S) = λR – λS = 3.678 – 2.964 = 0.714

VAR(ln F) = ξ²F = VAR(ln R) + VAR(ln S) = ξ²R + ξ²S = 0.15² + 0.25² = 0.085

ξF = √0.085 = 0.2915 ≅ 0.292

δF ≅ ξF ≅ 0.292

λF = ln μF − 0.5 ξ²F

ln μF = λF + 0.5 ξ²F = 0.714 + 0.5 × 0.292² = 0.7566

μF = e^0.7566 = 2.13

c) Pr(F > 3.0) = Pr(ln F > ln 3.0) = Pr((ln F − λF)/ξF > (ln 3.0 − 0.714)/0.292)

= Pr(Z > (1.0986 − 0.714)/0.292) = Pr(Z > 1.317) = 1 − Φ(1.317) = 1 − 0.9066 = 0.0934

4.4. BINOMIAL DISTRIBUTION

The binomial distribution is used to compute the probability of achieving r number of


successes in a Bernoulli experiment repeated n times. A Bernoulli experiment shows the
following characteristics:

i) The experiment can only result in two ways: Success (S) or Failure (F);

ii) The probability of success, p, is the same in each experiment;

iii) Experiments are independent of each other.

If such an experiment is repeated n times, the probability of observing a specified order of


outcomes say, S, S, F, ... , F, S, …, S with r number of successes will be as follows:

p · p · (1 − p) ··· (1 − p) · p ··· p = p^r (1 − p)^(n−r)

Since in n trials, r successes can occur in C(n, r) = n!/[(n − r)! r!] different ways, and since
these events will be mutually exclusive, the required probability can be written as follows:

B(r; n, p) = Pr(R = r; n, p) = [n!/((n − r)! r!)] p^r (1 − p)^(n−r)        r = 0, 1, 2, …, n

This distribution, which gives the probability of observing r number of successes in a Bernoulli
experiment repeated n times is called the binomial distribution.


The term n!/[(n − r)! r!] is called the binomial coefficient, and the failure probability, 1 − p,
is generally shown by q.

Definition: In a Bernoulli experiment with the probability of success p, probability of failure


q=1-p and repeated n times, the probability distribution of the binomial random variable R,
showing the number of successes, is as follows:

B(r; n, p) = C(n, r) p^r q^(n−r)        r = 0, 1, 2, ..., n        (4.7)

Example 4.7 – A soldier hits the target in 75% of his shots. What is the probability that he will
not be able to hit the target at least three times in his next five shots?

Solution: If we consider it a success not to hit the target in this problem, then p = 0.25, and the
desired probability is computed as follows:

Pr (R≥ 3) = Pr(R=3) + Pr(R=4) + Pr(R=5)


Pr(R = 3) = [5!/((5 − 3)! 3!)] × 0.25³ × 0.75² = 0.088

In a similar way, Pr (R=4) = 0.0146 and Pr (R=5) = 0.00098. Thus,

Pr (R≥ 3) = 0.08800 + 0.01460 + 0.00098 = 0.10358 ≅ 0.1036
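The sum in Example 4.7 in code (Python; math.comb supplies the binomial coefficient). Summing the unrounded terms gives 0.10352; the 0.1036 above comes from rounding each term first:

```python
from math import comb

def binom_pmf(r, n, p):
    """B(r; n, p) = C(n, r) p^r (1 - p)^(n - r), Eq. 4.7."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Example 4.7: "success" = missing the target, so p = 0.25, n = 5
p_at_least_3 = sum(binom_pmf(r, 5, 0.25) for r in (3, 4, 5))
print(round(p_at_least_3, 4))  # 0.1035
```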

Example 4.8 – If a patient has a 0.90 chance of recovery as a result of heart surgery, what is
the probability of recovery of only 5 out of the 7 patients who will undergo heart surgery?

Solution:
Pr(R = 5) = [7!/((7 − 5)! 5!)] × 0.90⁵ × 0.10² = 0.124
Theorem 4.1 The expected value and variance of the binomial distribution are as follows:

E(R) = np (4.8)

VAR (R) = np(1– p) = npq (4.9)

Proof: According to the definition of the expected value for a discrete random variable (see
Section 3.5)
E(R) = Σ_{r=0}^{n} r B(r; n, p)

     = Σ_{r=0}^{n} r [n!/((n − r)! r!)] p^r (1 − p)^(n−r)

Since for r = 0, the value of the corresponding term will be zero,


E(R) = Σ_{r=1}^{n} [n (n − 1)!/((n − r)! r (r − 1)!)] r p · p^(r−1) (1 − p)^(n−r)

     = np Σ_{r=1}^{n} [(n − 1)!/((n − r)! (r − 1)!)] p^(r−1) (1 − p)^(n−r)

Let, n – 1 = n′ and r – 1 = r′,


E(R) = np Σ_{r′=0}^{n′} [n′!/((n′ − r′)! r′!)] p^(r′) (1 − p)^(n′−r′)

     = np Σ_{r′=0}^{n′} B(r′; n′, p)

Since Σ_{r′=0}^{n′} B(r′; n′, p) = 1,

E(R) = μR = np

According to Theorem 3.5

VAR(R) = E(R²) − μ²R = E(R²) − n²p²

From Theorem 3.1


E(R²) = Σ_{r=0}^{n} r² [n!/((n − r)! r!)] p^r (1 − p)^(n−r)

      = np Σ_{r=1}^{n} r [(n − 1)!/((n − r)! (r − 1)!)] p^(r−1) (1 − p)^(n−r)

Let, n − 1 = n′ and r − 1 = r′,

E(R²) = np Σ_{r′=0}^{n′} (r′ + 1) [n′!/((n′ − r′)! r′!)] p^(r′) (1 − p)^(n′−r′)

      = np [Σ_{r′=0}^{n′} r′ B(r′; n′, p) + Σ_{r′=0}^{n′} B(r′; n′, p)]

      = np [n′p + 1] = np [(n − 1)p + 1]

      = n²p² − np² + np

VAR(R) = n²p² − np² + np − n²p² = np − np²

       = np(1 − p) = npq

Example 4.9 – What is the expected value and variance of the number of patients who will
recover among the 7 patients in Example 4.8?

Solution:

n=7, p=0.90 and q=1− 0.90 = 0.10; From Theorem 4.1,

E(R) = np = 7x0.90 = 6.3 and VAR(R) = npq = 7x0.90x0.10 = 0.63
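Theorem 4.1 can also be verified numerically for the data of Example 4.9 by summing over the pmf directly (Python):

```python
from math import comb

n, p = 7, 0.90
pmf = [comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)]

mean = sum(r * pr for r, pr in enumerate(pmf))                   # equals n*p
var = sum(r ** 2 * pr for r, pr in enumerate(pmf)) - mean ** 2   # equals n*p*q
print(round(mean, 2), round(var, 2))  # 6.3 0.63
```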



Example 4.10 – If the number of experiments (trials) is fixed, what is the p value that
maximizes the variance of the Binomial distribution?

Solution: Based on Theorem 4.1 it is possible to write:

VAR(R) = npq = np(1 − p) = np − np² = f(p)

To find the p value that maximizes the variance, the derivative of the f(p) function relative to
p is taken and set equal to zero as shown below:

df(p)/dp = n − 2np = 0   →   p = 1/2

4.5. GEOMETRIC DISTRIBUTION

In a Bernoulli experiment, the probability distribution of the random variable describing the
number of trials repeated until the first occurrence of a specified event, N1, is called the
geometric probability distribution and is expressed as follows:

pN1(n) = Pr(N1 = n) = p q^(n−1)        n = 1, 2, …        (4.10)

The expected (average) value of N1 and its variance are, respectively,

E(N1) = μN1 = 1/p (4.11)

VAR(N1) = σ2N1 = q/p2 (4.12)

The average number of trials up to the first occurrence, μN1 is also called the mean recurrence
number (duration) or the mean return period (interval).

Example 4.11 – A building is designed for a 500-year earthquake. What are the chances of an
earthquake of this magnitude hitting the building for the first time within ten years after the
building was constructed?

Solution: What is meant by the 500-year earthquake is simply an earthquake having a


magnitude of a size that is repeated on the average every 500 years. The mean recurrence
interval of a 500-year earthquake is 500 years. Accordingly and by using Equation 4.11, the
probability of an earthquake of this magnitude occurring in any given year is p = 1/500 = 0.002.

Pr(N1 = 10) = 0.002 × (0.998)^(10−1) = 0.00196
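A sketch of the geometric pmf (Eq. 4.10), reproducing Example 4.11 (Python; the function name is our own):

```python
def geom_pmf(n, p):
    """Pr(N1 = n) = p * q**(n - 1), Eq. 4.10."""
    return p * (1.0 - p) ** (n - 1)

# Example 4.11: the 500-year earthquake striking for the first time in year 10
print(round(geom_pmf(10, 1 / 500), 5))  # 0.00196
```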

4.6. NEGATIVE BINOMIAL DISTRIBUTION

In a Bernoulli experiment, the probability distribution of the random variable describing the
number of trials repeated until the kth occurrence of a specified event, Nk, is called the negative
binomial probability distribution and is expressed as follows:


pNk(n) = Pr(Nk = n) = C(n − 1, k − 1) p^k q^(n−k) = [(n − 1)!/((n − k)! (k − 1)!)] p^k q^(n−k)        n = k, k+1, …        (4.13)

       = 0        n < k

The expected (average) value and variance of Nk are as follows, respectively:

E(Nk) = μNk = k/p (4.14)

VAR(Nk) = σ2Nk = k(1−p)/p2 = kq/p2 (4.15)

Example 4.12 – According to the information provided in Example 4.11, what is the
probability of exposure of this building to the second 500-year earthquake in the 10th year after
its construction?

Solution: Based on Eq. 4.13, with p = 1/500 = 0.002, n = 10 and k = 2,

Pr(N2 = 10) = [(10 − 1)!/((10 − 2)! (2 − 1)!)] × 0.002² × (0.998)^(10−2) = 9 × 4×10⁻⁶ × 0.984 = 3.543×10⁻⁵

4.7. POISSON DISTRIBUTION

Before defining the Poisson distribution, it is worth explaining the Poisson process. In a
Poisson process, the following properties are assumed to be satisfied:

i) The probability of an event occurring over a short time interval, (t, t + Δt), is approximately
equal to ν·Δt, where ν is the average number of events per unit time (Stationarity);
ii) The number of events in any time period is independent of the number of events in other
time intervals (Independence);
iii) The probability of more than one event occurring in a short period of time is negligible
compared to the probability of occurrence of one or zero events (Nonmultiplicity).

The above-mentioned assumptions are given for the time-dependent Poisson processes (e.g.
the number of earthquakes over a certain period of time, the number of planes landing at an
airport, the number of accidents at an intersection). However, the same assumptions apply to
Poisson processes related to space (e.g. the number of particles in a certain volume, the number
of errors on a page, the number of earthquakes in a region).

Definition: The probability distribution of a Poisson random variable X, which gives the
likelihood of the number of events that will occur within a certain time period or a certain space
interval (or region), is as follows:

P(x; μ) = Pr(X = x) = e^(−μ) μ^x / x!        x = 0, 1, 2, ...        (4.16)


Here, μ = the average number of events that occur during the specified time period (or in the
specified region). If the specified time (or space) interval is denoted by t and the average
number of events that occur within unit time (or space) is ν, then μ = νt and the distribution is
written as follows:

P(x; ν, t) = e^(−νt) (νt)^x / x!        t ≥ 0,  x = 0, 1, 2, …        (4.17)

Example 4.13 – A secretary makes an average of 2 mistakes per page. For the page she is
currently typing,

a) What is the probability that she makes more than two mistakes?
b) What is the probability she makes no mistakes?

Solution:

a)  = 2 mistakes/page; t = 1 page; Pr(X > 2) = ?

Pr(X > 2) = 1 – Pr(X ≤ 2)

Pr(X  2) = Pr(X=0) + Pr(X=1) + Pr(X=2)

Pr(X ≤ 2) = e⁻² 2⁰/0! + e⁻² 2¹/1! + e⁻² 2²/2! = 0.677

Pr(X > 2) = 1 – 0.677 = 0.323


b) Pr(X = 0) = e⁻² 2⁰/0! = 0.1353
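Both parts of Example 4.13 follow from Eq. 4.16 with μ = νt = 2; a quick check (Python):

```python
from math import exp, factorial

def pois_pmf(x, mu):
    """P(x; mu) = e**(-mu) * mu**x / x!, Eq. 4.16."""
    return exp(-mu) * mu ** x / factorial(x)

mu = 2.0                                    # 2 mistakes/page * 1 page
p_more_than_2 = 1.0 - sum(pois_pmf(x, mu) for x in range(3))
print(round(p_more_than_2, 3))    # 0.323  (part a)
print(round(pois_pmf(0, mu), 4))  # 0.1353 (part b)
```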

Example 4.14 – There is an average of 3/7 traffic accidents per day at an intersection.
Compute the probability of occurrence of five accidents at this intersection in a week.

Solution:

 = 3/7 accidents/day; t = 7 days; Pr(X = 5) = ?

Pr(X = 5) = e^(−(3/7)×7) ((3/7)×7)⁵ / 5! = e⁻³ 3⁵ / 5! = 0.1008

Theorem 4.2 The expected value and variance of a Poisson distribution are equal to μ, which
is the parameter of the distribution. That is, E(X) = VAR(X) = μ.

Proof: a) According to the definition given for the expected value of discrete random variables
(See Section 3.5)

E(X) = Σ_{x=0}^{∞} x e^(−μ) μ^x / x!

Since for x = 0 the corresponding term will be zero,

E(X) = Σ_{x=1}^{∞} x [e^(−μ) μ · μ^(x−1) / (x (x − 1)!)] = μ Σ_{x=1}^{∞} e^(−μ) μ^(x−1) / (x − 1)!

if we set x − 1 = x′, then

E(X) = μ Σ_{x′=0}^{∞} e^(−μ) μ^(x′) / x′!

Since Σ_{x′=0}^{∞} P(x′; μ) = 1  →  E(X) = μX = μ
x '= 0

b) VAR(X) = μ (see Exercise Problem 4.9).

Example 4.15 − An offshore platform will be constructed to withstand the forces created by
ocean waves.

a) The maximum annual wave height (relative to average sea level) is a Gaussian random
variable with an average value of 4.0 m and a coefficient of variation of 0.80. What is the
probability that wave height exceeds 6.0 m?

Solution:

a) If we denote the maximum annual wave height by H then, H: N (4; 3.2) m.

Pr(H > 6) = 1 − Φ((6.0 − 4.0)/3.2) = 1 − Φ(0.625) = 1 − 0.734 = 0.266

b) The platform is to be designed for a wave height (above average sea level) such that there
is an 80% chance that the waves will not overtop it during a 3-year period. How many meters
above the average sea level should this design height be? The events of exceeding the design
wave height in different years are assumed to be statistically independent of each other.

Solution:

The design criterion is defined as follows:

Pr(Not exceeding the design height in three consecutive years) = 0.80 = (1 − p)³

Here, p = Pr(exceeding the design height within a specified year).

Therefore, p = 1 − (0.8)^(1/3) = 1 − 0.9283 = 0.0717

Accordingly, Φ((h − 4.0)/3.2) = 0.9283
where, h = design wave height.


h = 3.2 Φ⁻¹(0.9283) + 4.0 = 3.2 × 1.46 + 4.0 = 8.67 m

c) It is assumed that ocean waves exceeding six meters occur according to a Poisson process,
and each of them has a 0.40 probability of damaging the platform. Based on these two
assumptions, what is the probability that the platform will experience damage in the next 3
years due to ocean waves? The annual events in which waves cause damage to the platform are
assumed to be statistically independent.

Solution:

If D is defined as an event in which ocean waves cause damage to the platform, then the
probability of a wave height that will cause damage to the platform occurring within a year is
as follows:

Pr(D) = Pr(D | H > 6.0) Pr(H > 6.0) = 0.266 × 0.40 = 0.1064

Accordingly, the mean rate of occurrence of wave heights that cause damage to the platform
will be ν = 0.1064 per year.

Pr(No damage to the platform from ocean waves in 3 years) = Pr(No damage-inducing wave
height occurs in 3 years) = e^(−0.1064×3) = e^(−0.3192) = 0.727

Hence, Pr(The platform experiences damage in the next 3 years) = 1 − 0.727 = 0.273.

Example 4.16 − It is assumed that the capacity (C), of a building according to the equivalent
horizontal load coefficient is a lognormal variable with a median value of 6.5 and a coefficient
of variation of 0.20. It is estimated that the equivalent horizontal load coefficient due to the
largest earthquake to occur at the construction site will be 5.5.

a) What is the probability that the largest earthquake that can occur at the construction site will
cause damage to the building?

Solution: Seismic capacity, C, will have a lognormal distribution, and its parameters, λ and ξ,
are calculated as follows:

λ = ln 6.5 = 1.872 and ξ ≅ δ = 0.20

Pr(Damage to building) = Pr(C ≤ 5.5) = Φ((ln 5.5 − ln 6.5)/0.20) = Φ(−0.835) = 1 − 0.7985 = 0.2015

b) If it is known that the building did not experience any damage when subjected to an
equivalent horizontal load coefficient equal to 4.0 created by a previous moderate earthquake,
what is the probability that it will not be damaged if it is subjected to the largest earthquake?

Solution:

According to the given information, what is asked is a conditional probability and is calculated
as follows:


Pr(No damage to building given that no damage was experienced when subjected to an
equivalent horizontal load coefficient of 4.0) = Pr(C ≥ 5.5 | C > 4.0)

= Pr(C ≥ 4.0 ∩ C ≥ 5.5) / Pr(C ≥ 4.0) = Pr(C ≥ 5.5) / Pr(C ≥ 4.0)

= [1 − Φ((ln 5.5 − ln 6.5)/0.20)] / [1 − Φ((ln 4.0 − ln 6.5)/0.20)]

= (1 − 0.2015)/(1 − Φ(−2.43)) ≅ 0.80/(1 − 0.007) = 0.80/0.993 = 0.806

c) The frequency of large magnitude earthquakes that may occur in the future is modelled by a
Poisson process with an average return period of 500 years. If the damage caused by
earthquakes is assumed to be statistically independent among themselves, what is the
probability that this building will not experience any earthquake damage during its economic
lifetime, set to 100 years?

Solution:

Mean annual occurrence rate of large magnitude earthquakes = ν =1/500 = 0.002 earthquakes
per year.

Mean annual occurrence rate of damaging earthquakes = Pr (Damage occurrence) x ν

= 0.2015 x 0.002 = 0.0004 earthquakes / year

Pr (No earthquake damage to the building during next 100 years) = e− 0.0004x100 = e− 0.04 = 0.961

d) The building in question is located at a site consisting of five buildings designed considering
the same earthquake magnitude. Compute the probability that at least four of these five
buildings will not experience earthquake damage in their 100-year economic lifetime. It will
be assumed that damages to the buildings are statistically independent events.

Solution:

Given that a large magnitude earthquake occurs, each building independently remains
undamaged with probability 1 − 0.2015 ≅ 0.8. Accordingly,

Pr(At least four out of five buildings are undamaged in a single large earthquake)

= C(5, 4) (0.8)⁴ (0.2)¹ + C(5, 5) (0.8)⁵ (0.2)⁰ = 0.4096 + 0.3277 = 0.737

The mean annual occurrence rate of earthquakes that will cause damage to at least two of the
five buildings:

= (1 − 0.737) × (1/500) = 0.000526 earthquakes / year

Pr(At least four out of five buildings do not experience earthquake damage during 100 years)

= e^(−0.000526×100) = e^(−0.0526) = 0.949



4.8. EXPONENTIAL DISTRIBUTION

If a series of events occurs according to a Poisson process, the random variable T1, which
denotes the time elapsed until the first occurrence, will have an exponential distribution.
The probability density function of the exponential distribution is as follows:

fT1(t) = ν e^(−νt)        t ≥ 0        (4.18)

The expected (average) value and variance of T1 are as follows, respectively:

E(T1) = μT1 = 1/ν (4.19)

VAR(T1) = σ2T1 = 1/𝜈 2 (4.20)

The average time until the first occurrence, μT1 is also called the mean recurrence time or
mean return period.

Example 4.17 – In a seismically active region, 10 earthquakes with a magnitude greater than
6.0 have occurred between 1900 and 2015, according to the past earthquake records. Assuming
that large magnitude earthquakes in this region occur according to the Poisson process,
compute the probability of occurrence of earthquakes of this magnitude in the next 5 years.
How long is the mean return period?

Solution:

ν = 10/(2015 − 1900) = 10/115 = 0.087 earthquakes/year

Pr(T1 ≤ 5) = ∫₀⁵ fT1(t) dt = ∫₀⁵ ν e^(−νt) dt = ∫₀⁵ 0.087 e^(−0.087t) dt = 1 − e^(−0.087×5)

= 1 − 0.647 = 0.353

Mean return period = 1/ν = 1/0.087 = 11.5 years.
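The integral in Example 4.17 has the closed form 1 − e^(−νt); a short check (Python):

```python
from math import exp

def expon_cdf(t, nu):
    """Pr(T1 <= t) = 1 - e**(-nu*t), the CDF corresponding to Eq. 4.18."""
    return 1.0 - exp(-nu * t)

nu = 10 / (2015 - 1900)            # 0.087 earthquakes/year
print(round(expon_cdf(5, nu), 3))  # 0.353
print(round(1 / nu, 1))            # 11.5 (mean return period, years)
```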

4.9. GAMMA DISTRIBUTION

If a series of events occurs according to a Poisson process, the random variable Tk, which
denotes the time elapsed until the kth occurrence, will have a gamma distribution. The
probability density function of the gamma distribution is as follows:

fTk(t) = [ν (νt)^(k−1) / (k − 1)!] e^(−νt)        t ≥ 0        (4.21)

Here, k > 0 and t ≥ 0.

The expected (average) value of Tk is


E(Tk) = μTk = k/ν (4.22)

and the variance is:

VAR(Tk) = σ2Tk = k/ν2 (4.23)

If the parameter k is equal to an integer, then the gamma probability distribution is also called
the Erlang distribution.

Example 4.18 – In the region considered in Example 4.17, write down the probability density
function of the time elapsed until the occurrence of the third earthquake greater than magnitude
6.0. Find the expected value and coefficient of variation of the corresponding distribution.

Solution:

If the time elapsed until the occurrence of the third earthquake greater than magnitude 6.0 is
denoted by T3, the probability distribution of the random variable T3 will be equal to a gamma
probability density function as given below.

fT3(t) = [0.087 (0.087t)^(3−1) / (3 − 1)!] e^(−0.087t)

= 0.00033 t² e^(−0.087t)        t ≥ 0

From Eq. 4.22,

E(T3) = μT3 = k/ν = 3/0.087 = 34.48 years

From Eq. 4.23,

VAR(T3) = σ2T3 = 3/ν2 = 3/0.0872 = 396.35 and σT3 = √396.35 = 19.91 years

Accordingly, the coefficient of variation will be: δT3 = 19.91/34.48 = 0.58.
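E(T3) = k/ν can be confirmed by numerically integrating t·fT3(t) (Python; the step size and cutoff are our own choices — the density is negligible beyond t = 600 years):

```python
from math import exp, factorial

def gamma_pdf(t, k, nu):
    """f_Tk(t) = nu * (nu*t)**(k-1) * e**(-nu*t) / (k-1)!, Eq. 4.21 (integer k)."""
    return nu * (nu * t) ** (k - 1) * exp(-nu * t) / factorial(k - 1)

k, nu, dt = 3, 0.087, 0.01
# Riemann sum of t * f(t) over [0, 600]
mean = sum(i * dt * gamma_pdf(i * dt, k, nu) * dt for i in range(1, 60_000))
# mean approaches k/nu = 34.48 years
```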

4.10. OTHER DISTRIBUTIONS

Previous sections focused on the most commonly used probability distributions in civil
engineering applications. However, apart from these distributions, there are many other
distributions in statistics. Extreme value distributions are widely used in the modelling of live
and environmental loads such as wind, earthquake and snow. t, χ2 (chi-square) and F
distributions within the normal distribution family are widely used in point and interval
estimation and hypothesis testing in statistics. These distributions are not discussed here, but it
is possible to find information about them in any statistics book. Another important distribution
family is beta. The most important feature of this distribution family is that it allows defining
distributions in different forms. In particular, this distribution can be used efficiently in


quantifying uncertainties. For the sake of completeness, brief information on the beta
distribution is presented here.

If the two parameters of the beta distribution, denoted by n and r, meet the condition n>r> 0,
then the probability density function can be expressed as follows:

f(x) = [(n − 1)! / ((r − 1)!(n − r − 1)!)] x^(r−1) (1 − x)^(n−r−1)    0 ≤ x ≤ 1
                                                                          (4.24)
= 0    otherwise

Although the equation of the beta distribution is similar in format to that of the binomial
distribution, the binomial distribution is discrete whereas beta is continuous. Another point to
be noted is that n and r do not need to be integers, although it is necessary to satisfy the
condition n > r > 0. When n and r are not integers, the terms (n – 1)!, (r – 1)! and (n – r – 1)!
taking place in Eq. 4.24 will be replaced by the corresponding gamma functions, Γ(n), Γ(r),
and Γ(n − r), respectively. For any variable, say y, the gamma function is defined as follows:

Γ(y) = ∫_0^∞ x^(y−1) e^(−x) dx    (4.25)

In case n and r are integers, Γ(y) = (y – 1)!, and hence the factorial terms taking place in
Eq. 4.24 will be valid.

The expected value and variance of the beta distribution are as follows, respectively:
E(X│r, n) = μX = r/n    (4.26)

VAR(X│r, n) = σ²X = r(n − r) / [n²(n + 1)]    (4.27)

The shape of the beta distribution depends on the values of the parameters r and n. If r = n/2,
the distribution is symmetrical. If r > n/2, the distribution will be skewed to the left (negative
direction) and if r < n/2, it will be skewed to the right (positive direction). If r = 1 and n = 2,
the beta distribution will have a uniform distribution within the range of 0 to 1. The shapes of
the standard beta probability density functions corresponding to the different q(=n) and r values
are shown in Fig. 4.10.

Example 4.19 –The ratio of the defective steel rods produced in a steel manufacturing plant,
p, is modelled by a beta distribution with parameters, r = 1 and n = 20 items. Find the mean
value and variance of p.

Solution:

Based on Eq. 4.26, E(p) = μp = 1/20 = 0.05

Based on Eq. 4.27, VAR(p) = σ²p = 1(20 − 1) / [20²(20 + 1)] = 0.0023
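These moments can be cross-checked numerically; a sketch that also integrates x·f(x) over [0, 1] with the midpoint rule, writing the pdf of Eq. 4.24 with gamma functions so it works for non-integer parameters as well:

```python
import math

r, n = 1, 20  # beta parameters from Example 4.19

# Closed-form moments, Eqs. 4.26 and 4.27
mean_p = r / n
var_p = r * (n - r) / (n**2 * (n + 1))

# Normalizing constant of the beta pdf (gamma-function form of Eq. 4.24)
c = math.gamma(n) / (math.gamma(r) * math.gamma(n - r))

def f(x):
    """Beta pdf of Eq. 4.24."""
    return c * x ** (r - 1) * (1 - x) ** (n - r - 1)

# Midpoint-rule estimate of E(p) = integral of x * f(x) over [0, 1]
m = 20_000
num_mean = sum(x * f(x) for x in ((i + 0.5) / m for i in range(m))) / m
```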


Figure 4.10 The standard beta probability density functions with different q(=n) and r values
(From Ang and Tang, 2007)

4.11. ADDITIONAL SOLVED PROBLEMS

Example 4.20 – (From Ang and Tang, 2007)

The drainage from a community during a storm is a normal random variable estimated to have
a mean of 1.2 million gallons per day (mgd) and a standard deviation of 0.4 mgd; i.e.,
N(1.2, 0.4) mgd. If the storm drain system is designed with a maximum drainage capacity of
1.5 mgd, what is the underlying probability of flooding during a storm that is assumed in the
design of the drainage system?

Solution:

Flooding in the community will occur when the drainage load exceeds the capacity of the
drainage system; therefore, the probability of flooding is
Pr(X > 1.5) = 1 − Pr(X ≤ 1.5) = 1 − Pr[Z ≤ (1.5 − 1.2)/0.4] = 1 − Φ(0.75)

= 1 − 0.7734 = 0.227
Also, the following are of interest:

(i) The probability that the drainage during a storm will be between 1.0 mgd and 1.6 mgd,
which is computed as follows:
Pr(1.0 < X ≤ 1.6) = Φ[(1.6 − 1.2)/0.4] − Φ[(1.0 − 1.2)/0.4] = Φ(1.0) − Φ(−0.5)

= 0.8413 − [1 − Φ(0.5)] = 0.8413 − (1 − 0.6915) = 0.533


(ii) The 90-percentile drainage load from the community during a storm. This is the value of
the random variable at which the cumulative probability is 0.90, which we obtain as:
Pr(X ≤ x0.90) = Pr[Z ≤ (x0.90 − 1.2)/0.4] = Φ[(x0.90 − 1.2)/0.4] = 0.90

Therefore,

(x0.90 − 1.2)/0.4 = Φ⁻¹(0.90)

From the z-Table, we obtain Φ⁻¹(0.90) = 1.28; thus,

x0.90 = 1.28(0.40) + 1.2 = 1.71 mgd
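These values can be reproduced with Python's statistics.NormalDist; a sketch of Example 4.20 (small differences from the tabulated values are only rounding):

```python
from statistics import NormalDist

X = NormalDist(mu=1.2, sigma=0.4)  # drainage load, N(1.2, 0.4) mgd

p_flood = 1 - X.cdf(1.5)           # Pr(X > 1.5), probability of flooding
p_range = X.cdf(1.6) - X.cdf(1.0)  # Pr(1.0 < X <= 1.6)
x90 = X.inv_cdf(0.90)              # 90-percentile drainage load (mgd)
```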

Example 4.21 – (From Ang and Tang, 2007)

In Example 4.20, if the distribution of storm drainage from the community is a lognormal
random variable instead of normal, with the same mean and standard deviation, compute:

a) The probability that the drainage during a storm will be between 1.0 mgd and 1.6 mgd.
b) The 90-percentile drainage load from the community during a storm.

Solution:

a) First, we obtain the parameters λ and ζ of the lognormal distribution as follows:

ζ² = ln[1 + (0.4/1.2)²] = ln(1.111) = 0.105

Thus, ζ = 0.324

and λ = ln(1.20) − (1/2)(0.324)² = 0.130

Pr(1.0 < X ≤ 1.6) = Pr(ln 1.0 < ln X ≤ ln 1.6) = Φ[(ln 1.6 − 0.13)/0.324] − Φ[(ln 1.0 − 0.13)/0.324]

= Φ(1.049) − Φ(−0.401)

= 0.8531 − [1 − Φ(0.401)] = 0.8531 − (1 − 0.6554) = 0.509


b) The 90% value of the drainage load from the community, with the lognormal distribution,
would be:

Pr(ln X ≤ ln x0.90) = Pr[Z ≤ (ln x0.90 − 0.13)/0.324] = Φ[(ln x0.90 − 0.13)/0.324] = 0.90

Therefore, (ln x0.90 − 0.13)/0.324 = Φ⁻¹(0.90)


From the z-Table, we obtain Φ⁻¹(0.90) = 1.28; thus,

ln x0.90 = 1.28 × 0.324 + 0.13 = 0.54472 → x0.90 = e^0.54472 = 1.724 mgd
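The lognormal case can be handled the same way by working with ln X; a sketch (results agree with the hand computation to within table rounding):

```python
import math
from statistics import NormalDist

mu, sigma = 1.2, 0.4  # mean and standard deviation of the drainage load (mgd)

# Lognormal parameters from the mean and c.o.v. (Example 4.21)
zeta = math.sqrt(math.log(1 + (sigma / mu) ** 2))
lam = math.log(mu) - 0.5 * zeta**2

Z = NormalDist()  # ln X is normal with parameters (lam, zeta); standardize

p_range = (Z.cdf((math.log(1.6) - lam) / zeta)
           - Z.cdf((math.log(1.0) - lam) / zeta))
x90 = math.exp(lam + zeta * Z.inv_cdf(0.90))
```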


EXERCISE PROBLEMS

4.1. If Z is the standard normal variable, compute the following probabilities using Table 4.1.

a) Pr(−1<Z<+1)
b) Pr(Z < 1.64)
c) Pr(−2 < Z)
d) Pr(Z < 2)

4.2. If Pr(− zo < Z < zo) = 0.95, what is the value of zo?

4.3. If the weight of 1000 students shows a normal distribution with an average weight of 68.5
kg and a standard deviation of 2.7 kg, what percentage of these students weigh:

a) less than 63 kg?
b) more than 74 kg?
c) between 63 kg and 74 kg?

4.4. In Problem 4.3, assume that the weights show the lognormal distribution and solve the
problem again according to this assumption.

4.5. The average voltage of a battery is 15 volts and its standard deviation is 0.2 volts. What
is the probability that the total voltage will be greater than 60.8 volts if such four batteries are
connected in series? Assume that the voltage of the batteries is normally distributed.

4.6. The weight of silver teaspoons manufactured by a company is normally distributed. The
mean value is 10.10 gm and the standard deviation is 0.04 gm. On the spoons, it is written that
they weigh 10 gm.

a) What percentage of spoons weigh less than 10 gm?
b) What should be the mean value (μ) of the spoons so that at most 1% will weigh less than
10 gm?

4.7 The columns of a high-rise building will be carried by pile groups consisting of two piles.
The load-carrying capacity of the piles, C, is equal to the sum of the friction resistance that will
develop along the total length of the pile, F, and the bearing capacity, B, at the tip of the pile.
The mean values of B and F are 20 and 30 tons, respectively, and the coefficients of variation
(c.o.v.) are 0.20 and 0.30, respectively. In addition, the random variables B and F are assumed
to be statistically independent and normally distributed.

a) Find the mean value and coefficient of variation of the load-carrying capacity C of a single
pile. What is the probability distribution of C? Why? Explain.
b) A certain number of piles can be connected to each other, creating pile groups to carry larger
external loads. It is assumed that the carrying capacity of the pile group is equal to the sum of
the carrying capacities of the piles that make up the group. In this problem, there is a group of
piles consisting of two piles. For various reasons, the capacities of these two piles are
dependent. The correlation coefficient reflecting this dependency is estimated to be ρ = 0.25.
If T represents the capacity of this pile group, consisting of two piles, find the mean value of T
and its c.o.v.


c) If the maximum load L this pile group will be exposed to is a normal random variable with
a mean value of 50 tons and a coefficient of variation of 0.30, what is the probability of this
pile group collapsing? The total carrying capacity, T, is assumed to be normally distributed and
statistically independent of L.

4.8. Between the years 1921 and 2000, four earthquakes with a magnitude value of 6.5 and
greater were recorded in a certain region. According to the assumption that earthquakes of this
magnitude in this region follow a Poisson distribution:

a) Compute the probability of occurrence of earthquakes of this magnitude in the next five
years. What is the mean recurrence period?
b) Write down the expression corresponding to the probability density function of the time
elapsed for the occurrence of the fourth earthquake of this magnitude. Find the expected value
and coefficient of variation.
c) A structure built in this region did not experience any damage during earthquakes with a
magnitude value of less than 6.5. Assuming that earthquakes with a magnitude value of 6.5 and
greater occur according to the Poisson process, what is the probability that this structure will
be damaged by earthquakes during its 50-year economic lifetime?

4.9. Prove the VAR(X) = ν relationship given for the Poisson distribution in Theorem 4.2.

4.10. A reinforced concrete tower is exposed to horizontal loads created by strong winds. One
of the considerations in the strengthening of the tower is the duration of strong winds. The
duration period of wind, T is assumed to be a normally distributed random variable with a mean
value of 4 hours and a standard deviation of 1 hour. According to this given information:

a) Compute the probability of wind blowing for more than 6 hours.
b) Compute the probability of wind blowing for more than 9 hours, if it is known that wind has
already blown for more than 5 hours.
c) Compute the value of t, where t = duration period of wind blow, which will satisfy the
condition: Pr(T ≤ t) = 2 Pr(T > t)

4.11. The depth from the ground surface to the rock layer, H, is not known for sure. Therefore,
it is treated as a normal random variable with a mean value of 10 m and a coefficient of variation
of 0.25. To create adequate support, the steel piles must be embedded 0.5 m into the rock.

a) What is the probability that a 14 m long steel pile will not anchor satisfactorily into the rock
layer?
b) If no rock layer up to 13 m depth was encountered during the driving of a 14 m long steel
pile to the ground, what is the probability that the pile will satisfactorily anchor in the rock
layer when an additional pile of 1 m is welded to the original length?

4.12. A column is designed to have a central safety factor of 1.6 (μR / μS = 1.6, where μR and
μS denote the mean strength and mean load values, respectively). The strength of the column
against axial loads is a random variable denoted by R with a coefficient of variation of 0.25.
The total axial load acting on column S consists of the sum of the effects of moving, dead, wind
and snow loads in the axial direction. It is assumed that the loads are independent of each other
and have the mean values and coefficients of variation given in the following table.


a) Calculate the expected value and coefficient of variation of the total axial load acting on the
column (S).
b) Calculate the failure probability of this column according to the assumptions that the axial
strength, R, also has a normal distribution and is statistically independent of the total load.
c) Compute the failure probability of this column if the axial strength and total load are
statistically dependent normal random variables. The correlation coefficient, which is a
measure of the degree of this dependence, is estimated as ρ = − 0.60.

Type of Load    Mean value (kN)    Coefficient of variation
Live                 70                    0.15
Dead                 90                    0.05
Wind                 30                    0.30
Snow                 20                    0.20

4.13. In the construction of a dam, the contractor uses four construction equipment with the
same characteristics. The operating lifetime of this construction equipment until the first failure
(T) is assumed a lognormally distributed random variable with a mean value of 1200 hours and
a coefficient of variation of 0.25.

a) Find the distribution parameters (λ and ζ) of the lognormal variate, T.
b) Assuming that the failures of this construction equipment are statistically independent, what
is the probability that at least three of these four pieces of construction equipment will fail
before 1000 hours?

4.14. The safety margin for a building element, M, is defined as follows: M = R – S. Here,
R = the strength (carrying capacity) of the building element, and S = the load that the building
element is exposed to. R is a random variable with a mean value, μR = 40 kN, and a coefficient
of variation δR = 0.15. S is also a random variable with a mean value μS = 20 kN and a
coefficient of variation δS = 0.25. R and S are dependent variables, and the value of the
correlation coefficient, which is the measure of dependence between them, is given as
ρR,S = − 0.20. Both variables are normally distributed.

a) Since R and S are random variables, M will also be a random variable. Accordingly, obtain
the probability distribution of M and specify the name of the distribution. At the same time,
compute the mean value, μM, and coefficient of variation, δM of M.
(Answer: Normal distribution; μM = 20 kN; δM = 0.427)
b) Calculate the probability of failure of this building element according to the information
provided.
(Answer: pf = 0.0096)
c) If the probability of failure is less than 0.01, the building element is considered "safe". To
meet this condition, what should be the smallest value of the mean strength, μR , of this building
element?
(Answer: μR = 39.85 kN)

4.15. If the expected value and variance of a beta distribution are 2/3 and 1/72, respectively,
calculate the values of the parameters r and n of this distribution.
(Answer: r = 10, n = 15)


4.16. For the seismic design of ordinary structures, generally, the peak ground acceleration,
corresponding to an exceedance probability of 10% in 50 years is used in the calculations.
Compute the mean return period corresponding to this peak ground acceleration.
(Answer: 475 years)

4.17. In the design of a building, the maximum wind speed, corresponding to an exceedance
probability of 0.50 in 50 years, is to be used in the calculation of the wind load. Compute the
mean return period corresponding to this maximum wind speed.
(Answer: 72.64 years)
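The last two answers follow from converting the stated exceedance probability p over an n-year life to an equivalent annual exceedance probability q = 1 − (1 − p)^(1/n); the mean return period is then 1/q. A quick Python check (a sketch):

```python
def mean_return_period(p, n):
    """Mean return period (years) for exceedance probability p in n years."""
    q = 1 - (1 - p) ** (1 / n)  # equivalent annual exceedance probability
    return 1 / q

T_seismic = mean_return_period(0.10, 50)  # Problem 4.16
T_wind = mean_return_period(0.50, 50)     # Problem 4.17
```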


TABLE 4.1

AREAS UNDER THE STANDARD NORMAL DISTRIBUTION CURVE


BETWEEN 0 and z0


z0 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.00000 0.00399 0.00798 0.01197 0.01595 0.01994 0.02392 0.02790 0.03188 0.03586
0.1 0.03983 0.04380 0.04776 0.05172 0.05567 0.05962 0.06356 0.06749 0.07142 0.07535
0.2 0.07926 0.08317 0.08706 0.09095 0.09483 0.09871 0.10257 0.10642 0.11026 0.11409
0.3 0.11791 0.12172 0.12552 0.12930 0.13307 0.13683 0.14058 0.14431 0.14803 0.15173
0.4 0.15542 0.15910 0.16276 0.16640 0.17003 0.17364 0.17724 0.18082 0.18439 0.18793
0.5 0.19146 0.19497 0.19847 0.20194 0.20540 0.20884 0.21226 0.21566 0.21904 0.22240
0.6 0.22575 0.22907 0.23237 0.23565 0.23891 0.24215 0.24537 0.24857 0.25175 0.25490
0.7 0.25804 0.26115 0.26424 0.26730 0.27035 0.27337 0.27637 0.27935 0.28230 0.28524
0.8 0.28814 0.29103 0.29389 0.29673 0.29955 0.30234 0.30511 0.30785 0.31057 0.31327
0.9 0.31594 0.31859 0.32121 0.32381 0.32639 0.32894 0.33147 0.33398 0.33646 0.33891
1.0 0.34134 0.34375 0.34614 0.34849 0.35083 0.35314 0.35543 0.35769 0.35993 0.36214
1.1 0.36433 0.36650 0.36864 0.37076 0.37286 0.37493 0.37698 0.37900 0.38100 0.38298
1.2 0.38493 0.38686 0.38877 0.39065 0.39251 0.39435 0.39617 0.39796 0.39973 0.40147
1.3 0.40320 0.40490 0.40658 0.40824 0.40988 0.41149 0.41308 0.41466 0.41621 0.41774
1.4 0.41924 0.42073 0.42220 0.42364 0.42507 0.42647 0.42785 0.42922 0.43056 0.43189
1.5 0.43319 0.43448 0.43574 0.43699 0.43822 0.43943 0.44062 0.44179 0.44295 0.44408
1.6 0.44520 0.44630 0.44738 0.44845 0.44950 0.45053 0.45154 0.45254 0.45352 0.45449
1.7 0.45543 0.45637 0.45728 0.45818 0.45907 0.45994 0.46080 0.46164 0.46246 0.46327
1.8 0.46407 0.46485 0.46562 0.46638 0.46712 0.46784 0.46856 0.46926 0.46995 0.47062
1.9 0.47128 0.47193 0.47257 0.47320 0.47381 0.47441 0.47500 0.47558 0.47615 0.47670
2.0 0.47725 0.47778 0.47831 0.47882 0.47932 0.47982 0.48030 0.48077 0.48124 0.48169
2.1 0.48214 0.48257 0.48300 0.48341 0.48382 0.48422 0.48461 0.48500 0.48537 0.48574
2.2 0.48610 0.48645 0.48679 0.48713 0.48745 0.48778 0.48809 0.48840 0.48870 0.48899
2.3 0.48928 0.48956 0.48983 0.49010 0.49036 0.49061 0.49086 0.49111 0.49134 0.49158
2.4 0.49180 0.49202 0.49224 0.49245 0.49266 0.49286 0.49305 0.49324 0.49343 0.49361
2.5 0.49379 0.49396 0.49413 0.49430 0.49446 0.49461 0.49477 0.49492 0.49506 0.49520
2.6 0.49534 0.49547 0.49560 0.49573 0.49585 0.49598 0.49609 0.49621 0.49632 0.49643
2.7 0.49653 0.49664 0.49674 0.49683 0.49693 0.49702 0.49711 0.49720 0.49728 0.49736
2.8 0.49744 0.49752 0.49760 0.49767 0.49774 0.49781 0.49788 0.49795 0.49801 0.49807
2.9 0.49813 0.49819 0.49825 0.49831 0.49836 0.49841 0.49846 0.49851 0.49856 0.49861
3.0 0.49865 0.49869 0.49874 0.49878 0.49882 0.49886 0.49889 0.49893 0.49896 0.49900
3.1 0.49903 0.49906 0.49910 0.49913 0.49916 0.49918 0.49921 0.49924 0.49926 0.49929
3.2 0.49931 0.49934 0.49936 0.49938 0.49940 0.49942 0.49944 0.49946 0.49948 0.49950
3.3 0.49952 0.49953 0.49955 0.49957 0.49958 0.49960 0.49961 0.49962 0.49964 0.49965
3.4 0.49966 0.49968 0.49969 0.49970 0.49971 0.49972 0.49973 0.49974 0.49975 0.49976
3.5 0.49977 0.49978 0.49978 0.49979 0.49980 0.49981 0.49981 0.49982 0.49983 0.49983
3.6 0.49984 0.49985 0.49985 0.49986 0.49986 0.49987 0.49987 0.49988 0.49988 0.49989
3.7 0.49989 0.49990 0.49990 0.49990 0.49991 0.49991 0.49992 0.49992 0.49992 0.49992
3.8 0.49993 0.49993 0.49993 0.49994 0.49994 0.49994 0.49994 0.49995 0.49995 0.49995
3.9 0.49995 0.49995 0.49996 0.49996 0.49996 0.49996 0.49996 0.49996 0.49997 0.49997
4.0 0.49997 0.49997 0.49997 0.49997 0.49997 0.49997 0.49998 0.49998 0.49998 0.49998


Chapter 5
MULTIPLE (MULTIVARIATE) RANDOM VARIABLES

5.1. TWO RANDOM VARIABLES - BIVARIATE DISTRIBUTIONS


If there is only one random variable involved, we refer to it as univariate; if two random
variables are involved, we refer to it as bivariate. If more than two random variables are of
concern, generally we call it the multivariate case. Here, we will start with the bivariate case.

If X and Y are two random variables, then their joint random characteristics and the
probabilities associated with given values of x and y can be described either by the joint
cumulative distribution function (CDF), or, for discrete random variables, by the joint
probability distribution or joint probability mass function (pmf), and, for continuous
random variables, by the joint probability density function (pdf) of X and Y. The basic
concepts and definitions are summarized in the following.

5.1.1 Joint Probability Distributions


i) Discrete Case:
a) Joint probability distribution (joint probability mass function, pmf) of X and Y: p(x,y)

p(x, y) = Pr (X = x ∩ Y = y)

p(x, y) ≥ 0 ;  ∑x ∑y p(x, y) = 1.0

b) Joint cumulative distribution function (CDF) of X and Y: F(x, y)

F(x, y) = Pr(X ≤ x ∩ Y ≤ y) = ∑s≤x ∑t≤y p(s, t)
ii) Continuous Case:
a) Joint probability density function (pdf) of X and Y: f(x, y)

Pr[(X, Y) ∈ A] = ∬A f(x, y) dx dy

A is any region in the xy plane.

f(x, y) ≥ 0 ;  ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.0

The volume under pdf gives the probability (see Fig. 5.1)

Pr(a < X ≤ b, c < Y ≤ d) = ∫_a^b ∫_c^d fX,Y(s, t) dt ds

Moreover, the volume under the joint pdf over the region X ≤ x, Y ≤ y gives the joint CDF.

Figure 5.1 Volume under the joint pdf corresponding to Pr(a < X ≤ b, c < Y ≤ d)
(From Ang and Tang, 2007)

b) Joint cumulative distribution function (CDF) of X and Y: F(x, y)


F(x, y) = Pr(X ≤ x ∩ Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f(s, t) ds dt

f(x, y) = ∂²F(x, y) / ∂x ∂y

Note that, the joint cumulative distribution function of X and Y, F(x, y), satisfies the following
conditions; hence, FX,Y (X ≤ x, Y ≤ y) is a non-negative and non-decreasing function of
x and y.

FX,Y (X ≤ −∞, Y ≤ −∞) = 0 FX,Y (X ≤ ∞, Y ≤ ∞) = 1.0


FX,Y (X ≤ −∞, Y ≤ y) = 0 FX,Y (X ≤ ∞, Y ≤ y) = FY (Y ≤ y)
FX,Y (X ≤ x, Y ≤ −∞) = 0 FX,Y (X ≤ x, Y ≤ ∞) = FX (X ≤ x)

5.1.2 Marginal Distributions

i) Discrete Case:
Marginal probability distribution of X and Y:
p(x) = ∑y p(x, y) ; p(y) = ∑x p(x, y)


ii) Continuous Case:


Marginal probability density function of X and Y:

f(x) = ∫_{−∞}^{∞} f(x, y) dy ;  f(y) = ∫_{−∞}^{∞} f(x, y) dx

5.1.3 Conditional Distributions

i) Discrete Case:

p(x/y) = p(x, y) / p(y)    p(y) ≠ 0

p(y/x) = p(x, y) / p(x)    p(x) ≠ 0

ii) Continuous Case:

f(x/y) = f(x, y) / f(y)    f(y) ≠ 0

f(y/x) = f(x, y) / f(x)    f(x) ≠ 0

5.1.4 Statistical Independence

Two random variables X and Y are statistically independent iff:


f(x, y) = f(x) · f(y)

For independent random variables:

f(x/y) = f(x) and f(y/x) = f(y)

5.1.5 Expectation and Variance

E(aX ± bY) = aE(X) ± bE(Y) = aμX ± bμY ; where a and b are constants

E[g(X, Y)] = ∑x ∑y g(x, y) p(x, y)  (discrete) ;  E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy  (continuous)

If X and Y are statistically independent:

VAR(aX ± bY) = a² VAR(X) + b² VAR(Y) = a² σx² + b² σy²

VAR(X ± Y) = σx² + σy²

COV(X, Y) = 0


5.1.6 Covariance and Correlation

When there are two random variables, the presence of a linear relationship is determined by:
i) Joint First Moment of X and Y
E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fX,Y(x, y) dx dy

If X and Y are statistically independent,


E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fX(x) fY(y) dx dy

E(XY) = ∫_{−∞}^{∞} x fX(x) dx ∫_{−∞}^{∞} y fY(y) dy = E(X) E(Y)

ii) Joint Second Central Moment of X and Y

COV(X, Y) = E[(X − μX)(Y − μY)] = E(XY) − E(X)E(Y) = E(XY) − μX μY

Proof:

COV(X, Y) = E[(X − μX)(Y − μY)] = E(XY) − μX E(Y) − E(X) μY + μX μY

= E(XY) − μX μY − μX μY + μX μY = E(XY) − μX μY = E(XY) − E(X)E(Y)

If X and Y are statistically independent, then COV(X, Y) = 0. However, the converse is not
necessarily true, since the covariance (and hence the correlation coefficient) detects only linear
dependencies between two variables.

𝐂𝐎𝐕(𝐗, 𝐘) is a measure of the degree of the linear relationship between two random variables
X and Y. For practical purposes, it is better to use normalized covariance, which is called the
correlation coefficient, defined as follows:

ρ = COV(X, Y) / (σX σY)
ρ is a dimensionless measure of the linear dependence between two random variables. Its value
ranges between -1.0 and 1.0; i.e.

−1.0 ≤ ρ ≤ 1.0

Some examples of the degree of correlation are shown in Fig. 5.2.

As stated before, the covariance, and accordingly the correlation coefficient, measure only the
degree of a linear relationship; no causality information can be inferred from the correlation
coefficient.


Comments:
a) Perfect positive
correlation. ρ = + 1.0
b) Perfect negative
correlation. ρ = – 1.0
c) No correlation.
d) Positively correlated.
e) ρ = 0. No linear
dependence, but a
functional (circular)
relationship.
f) ρ = 0. No linear
dependence, but a
functional (sinusoidal)
relationship.

Figure 5.2 Some examples of the degree of correlation and the corresponding values of the
correlation coefficient (From Ang and Tang, 2007)
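The zero-correlation-with-full-dependence case of Fig. 5.2(e) is easy to demonstrate numerically; a minimal Python sketch using points on a circle, where y is a deterministic function of x yet the sample correlation coefficient is essentially zero:

```python
import math

# Evenly spaced points on the unit circle: fully dependent, yet uncorrelated
n = 360
xs = [math.cos(2 * math.pi * i / n) for i in range(n)]
ys = [math.sin(2 * math.pi * i / n) for i in range(n)]

# Sample covariance and standard deviations
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
rho = cov / (sx * sy)  # essentially zero, as in Fig. 5.2(e)
```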

Example 5.1
Consider the following joint probability mass function, which is given as a table:

Table 5.1 The joint probability mass function, p(x,y) given in Example 5.1 and the computed
marginal distributions, p(x) and p(y)
 y \ x      0       1       2      p(y)
   0       1/6     1/3    1/12    7/12
   1       2/9     1/6     0      7/18
   2       1/36     0      0      1/36
  p(x)     5/12    1/2    1/12

a) Find the marginal probability mass functions of X and Y.


b) Find p(x/Y=1).
c) Are X and Y statistically independent?


Solution:
a) Using the equations given in Sect. 5.1.2, the marginal distributions, p(x) and p(y), are
computed and displayed on the margins of Table 5.1.

b) pX/1(x) = pXY(x, 1) / pY(1),  for x = 0, 1, 2

p0/1 = pXY(0, 1) / pY(1) = (2/9) / (7/18) = 4/7

p1/1 = pXY(1, 1) / pY(1) = (1/6) / (7/18) = 3/7

p2/1 = pXY(2, 1) / pY(1) = 0 / (7/18) = 0

Table 5.2 Conditional probability mass function, p(x/1)


x p(x/1)
0 4/7
1 3/7
2 0

c) Check pXY(0,0) =? pX(0) pY(0)

1/6 = (5/12) x (7/12)

0.1667 ≠ 0.243

Therefore, X and Y are not statistically independent.
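The computations of Example 5.1 can be reproduced exactly with Python fractions; a sketch over the pmf of Table 5.1:

```python
from fractions import Fraction as F

# Joint pmf of Table 5.1, indexed as p[(x, y)]
p = {(0, 0): F(1, 6), (1, 0): F(1, 3), (2, 0): F(1, 12),
     (0, 1): F(2, 9), (1, 1): F(1, 6), (2, 1): F(0),
     (0, 2): F(1, 36), (1, 2): F(0), (2, 2): F(0)}

# Marginal pmfs by summing over the other variable (Sect. 5.1.2)
px = {x: sum(p[(x, y)] for y in range(3)) for x in range(3)}
py = {y: sum(p[(x, y)] for x in range(3)) for y in range(3)}

# Conditional pmf of X given Y = 1 (Sect. 5.1.3)
px_given_1 = {x: p[(x, 1)] / py[1] for x in range(3)}

# Independence would require p(x, y) = p(x) p(y) in every cell
independent = all(p[(x, y)] == px[x] * py[y] for x in range(3) for y in range(3))
```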

Example 5.2
Consider the following joint probability density function:

fXY(x,y) = 4xy 0 ≤ 𝑥 ≤ 1 and 0 ≤ 𝑦 ≤ 1

=0 elsewhere

a) Find the marginal probability density functions of X and Y.

b) Find f(x/y).

c) Are X and Y statistically independent?


Solution:
a) fX(x) = ∫_0^1 fXY(x, y) dy = ∫_0^1 4xy dy = 2x    for 0 ≤ x ≤ 1
         = 0    elsewhere

fY(y) = ∫_0^1 fXY(x, y) dx = ∫_0^1 4xy dx = 2y    for 0 ≤ y ≤ 1
      = 0    elsewhere

b) f(x/y) = f(x, y) / f(y) = 4xy / 2y = 2x    for 0 ≤ x ≤ 1
          = 0    elsewhere

c) Check fXY(x,y) =? fX(x) fY(y)


4xy = (2x) × (2y) = 4xy
Since the equality is satisfied, we can conclude that X and Y are statistically independent.
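A quick numerical cross-check of Example 5.2, using midpoint-rule integration over the unit square (a sketch):

```python
# Midpoint-rule checks of Example 5.2 on the unit square
def f(x, y):
    return 4 * x * y  # joint pdf for 0 <= x, y <= 1

m = 400
pts = [(i + 0.5) / m for i in range(m)]

# Total probability should integrate to 1
total = sum(f(x, y) for x in pts for y in pts) / m**2

# Marginal of X at x = 0.5, integrating out y: should equal 2 * 0.5 = 1
fX_at_half = sum(f(0.5, y) for y in pts) / m
```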

Example 5.3 (From Ang and Tang, 2007)


Consider the cantilever beam shown in Fig. 5.3, subjected to statistically independent random
loads, S1 and S2, with respective means μ1, μ2 and variances σ1², σ2². Compute the correlation
coefficient between the shear force, Q, and the bending moment, M, at the fixed support of
the beam.

Figure 5.3 The cantilever beam subjected to two loads

Solution:
Q = S1 + S2
M = aS1 + 2aS2

E(Q) = μQ = μ1 + μ2 ;  VAR(Q) = σQ² = σ1² + σ2²
E(M) = μM = aμ1 + 2aμ2 ;  VAR(M) = σM² = a²σ1² + 4a²σ2²

COV(Q, M) = E(QM) − μQ μM

ρQM = COV(Q, M) / (σQ σM)

E(QM) = E[(S1 + S2)(aS1 + 2aS2)] = aE(S1²) + 3aE(S1S2) + 2aE(S2²)



But,

E(S1S2) = E(S1) E(S2) = μ1 μ2
E(S1²) = σ1² + μ1² ;  E(S2²) = σ2² + μ2²

Substituting these into the E(QM) expression and simplifying,

E(QM) = a(σ1² + 2σ2²) + μQ μM

COV(Q, M) = E(QM) − μQ μM = a(σ1² + 2σ2²) + μQ μM − μQ μM = a(σ1² + 2σ2²)

ρQM = COV(Q, M) / (σQ σM) = a(σ1² + 2σ2²) / √[(σ1² + σ2²)(a²σ1² + 4a²σ2²)]

Note that the correlation coefficient depends only on the variances, not on the mean values.
If it is assumed that σ1² = σ2² = σ², then

ρQM = 3aσ² / √[(2σ²)(5a²σ²)] = 3aσ² / (√10 aσ²) = 3/√10 = 0.949

The computed value of 𝛒𝐐𝐌 indicates a high linear dependence between the shear force, Q
and bending moment, M at the fixed end of the beam. This is because both Q and M are
functions of the same loads S1 and S2.
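The closing result ρQM = 3/√10 for σ1 = σ2 can be confirmed by Monte Carlo; the numerical values of a, the mean, and the common standard deviation below are arbitrary illustrations (the correlation does not depend on them):

```python
import math
import random

# Monte Carlo check of rho_QM for sigma1 = sigma2; a = 2, mu = 10, sigma = 3
# are illustrative assumptions, not values from the example
random.seed(0)
a, mu, sigma = 2.0, 10.0, 3.0
n = 100_000

qs, ms = [], []
for _ in range(n):
    s1, s2 = random.gauss(mu, sigma), random.gauss(mu, sigma)
    qs.append(s1 + s2)              # shear force Q at the fixed support
    ms.append(a * s1 + 2 * a * s2)  # bending moment M at the fixed support

# Sample correlation coefficient
mq, mm = sum(qs) / n, sum(ms) / n
cov = sum((q - mq) * (v - mm) for q, v in zip(qs, ms)) / n
sq = math.sqrt(sum((q - mq) ** 2 for q in qs) / n)
sm = math.sqrt(sum((v - mm) ** 2 for v in ms) / n)
rho = cov / (sq * sm)  # close to 3/sqrt(10)
```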

Example 5.4
A cantilever beam is subjected to a random concentrated load P as shown in Fig. 5.4, below.
The load P may take on values of 10 kN or 15 kN with probabilities of 0.8 and 0.2, respectively.
The moment capacity of the beam MR is assumed to be NORMAL (30 kN.m, 6 kN.m). The
shear capacity VR is also assumed to be NORMAL (19 kN, 5kN).

a) Compute the probability of failure due to bending moment.


b) Compute the probability of failure due to shear.
c) Compute the survival probability of the beam. What assumptions did you make in
computing this probability?

Figure 5.4 The cantilever beam subjected to load P



Solution:

a) Pr(Failure in bending) = Pr(MR < 2P)

= Pr(MR < 20) × 0.8 + Pr(MR < 30) × 0.2

= Pr[Z < (20 − 30)/6] × 0.8 + Pr[Z < (30 − 30)/6] × 0.2

= Pr(Z < −1.67) × 0.8 + Pr(Z < 0) × 0.2

= 0.0485 × 0.8 + 0.5 × 0.2 = 0.1388

b) Pr(Failure in shear) = Pr(VR < P)

= Pr(VR < 10) × 0.8 + Pr(VR < 15) × 0.2

= Pr[Z < (10 − 19)/5] × 0.8 + Pr[Z < (15 − 19)/5] × 0.2

= Pr(Z < −1.8) × 0.8 + Pr(Z < −0.8) × 0.2

= 0.0359 × 0.8 + 0.2119 × 0.2 = 0.0711

c) Assuming the two failure modes to be statistically independent,

Pr(Survival) = (1 − 0.1388)(1 − 0.0711) = 0.8612 × 0.9289 = 0.79997 ≅ 𝟎.𝟖𝟎

On the other hand, if the two failure modes are assumed to be perfectly correlated, the weakest
mode will be considered,

Pr(Survival) = Minimum of (0.8612,0.9289) = 0.8612

As observed, the bounds on the survival probability of the given cantilever beam are:

0.80 ≤ Pr(Survival) ≤ 0.86
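The total-probability computations of this example can be reproduced with statistics.NormalDist (a sketch; small differences from the table-based values are rounding):

```python
from statistics import NormalDist

MR = NormalDist(30, 6)      # moment capacity, kN.m
VR = NormalDist(19, 5)      # shear capacity, kN
loads = {10: 0.8, 15: 0.2}  # pmf of the concentrated load P, kN

# Total probability theorem over the two possible load values;
# the moment demand is 2P and the shear demand is P
p_moment = sum(MR.cdf(2 * p) * w for p, w in loads.items())
p_shear = sum(VR.cdf(p) * w for p, w in loads.items())

# Bounds on survival: independent vs. perfectly correlated failure modes
p_surv_indep = (1 - p_moment) * (1 - p_shear)
p_surv_perfect = min(1 - p_moment, 1 - p_shear)
```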

Example 5.5 Bivariate Normal Distribution

As an example of bivariate distributions, the popular Bivariate Normal Distribution is
considered. Its joint probability density function is

fX,Y(x, y) = [1 / (2π σX σY √(1 − ρ²))] exp{ −1/[2(1 − ρ²)] [((x − μX)/σX)² − 2ρ((x − μX)/σX)((y − μY)/σY) + ((y − μY)/σY)²] }

for −∞ < x < ∞ and −∞ < y < ∞, where ρ is the correlation coefficient between X and Y.
Its graphical representation is shown in Fig. 5.5.


Figure 5.5 Bivariate Normal Distribution


Chapter 6
FUNCTIONS OF RANDOM VARIABLES

6.1. INTRODUCTION

In this chapter the following problem will be answered: if we know the random characteristics
of a random variable, X, in terms of its CDF, pdf or pmf, and if another random variable, say
Y, is related to X by a deterministic function, Y = g(X), how can we obtain the corresponding
CDF, pdf or pmf of Y? Here, X is called the independent variable and Y the dependent
variable. First, we will consider a function of a single random variable, and then extend the
results to functions of multiple random variables.

6.2. DERIVED PROBABILITY DISTRIBUTIONS

6.2.1 Function of a Single Random Variable

Let

Y = g(X) (6.1)

Then when Y = y, X = g-1(y), where g-1(∗) is the inverse function of g(∗). If the inverse function
g-1(y) is single-valued, i.e. has a single root, then

Pr(Y = y) = Pr[X = g-1(y)]

Accordingly, the pmf of Y becomes

pY(y) = pX[g⁻¹(y)]    (6.2)

The relationship given by Eq. 6.2, is illustrated in Fig. 6.1(b) for the following function and the
pmf of X is displayed in Fig. 6.1(a):

Y = X2 for x ≥ 0

(a) (b)
Figure 6.1 Illustration of Eq. 6.2 (a) pmf of X; (b) pmf of Y


Because X is discrete, the graphical representation given in Fig. 6.1 can also be displayed in a
table format:

Table 6.1 pY(y) derived from pX (x) according to the functional relationship Y=X2 for x ≥ 0

X=x pX (x) Y = y (= x2) 𝐩𝐘 (y)


1 0.25 1 0.25
2 0.50 4 0.50
3 0.25 9 0.25

According to Eq. 6.2, as seen in Table 6.1, when y = 1, x = 1 and pY(1) = pX(1) = 0.25;
similarly, when y = 4, x = 2 and pY(4) = pX(2) = 0.50, and when y = 9, x = 3 and
pY(9) = pX(3) = 0.25. At all other values of Y, pY(y) = 0. The probability mass function of Y
is shown in Fig. 6.1(b) and given in Table 6.1.

Similar expressions can be written in terms of cumulative distribution functions, as follows:

FY(y) = Pr(Y ≤ y) = Pr[X ≤ g⁻¹(y)]

if g(x) is an increasing function of x. On the other hand, if g(x) is a decreasing function of x,

Pr(Y ≤ y) = Pr[X ≥ g⁻¹(y)]

Thus, when y increases with x, the CDF of Y will be

FY(y) = FX[g-1(y)]

Therefore, for the discrete cases,

FY(y) = ∑{all xi ≤ g−1 (y)} pX (xi ) (6.3)

For the continuous case, with fX (x) denoting the probability density function of X,

FY(y) = ∫_{−∞}^{g⁻¹(y)} fX(x) dx    (6.4)

By making a change of variable in this integration, according to the basic rules of calculus, we
get,

FY(y) = ∫_{−∞}^{g⁻¹(y)} fX(x) dx = ∫_{−∞}^{y} fX(g⁻¹(y)) [dg⁻¹(y)/dy] dy

From FY(y) we obtain,

fY(y) = dF(y)/dy = fX(g⁻¹(y)) dg⁻¹(y)/dy


In Eq. 6.5, when y increases with x, dg⁻¹(y)/dy will be positive; however, when y decreases
with x, FY(y) = 1 − FX(g⁻¹(y)) and accordingly

fY(y) = dF(y)/dy = − fX(g⁻¹(y)) dg⁻¹(y)/dy

but, since dg⁻¹(y)/dy is also negative in this case, the derived pdf of Y, for both the positive
and negative cases, will be:

fY(y) = fX(g⁻¹(y)) |dg⁻¹(y)/dy|

where, | ∗ | denotes the absolute value. The above results are summarized in the following
theorem.

Theorem 6.1

Suppose that X is a continuous random variable and g(∗) is a strictly monotonic differentiable
function. Let Y = g(X). Then the PDF of Y is given by:

fY(y) = fX(g⁻¹(y)) |dg⁻¹(y)/dy|    where g(x) = y
                                                      (6.5)
= 0    if g(x) = y does not have a solution.

Example 6.1
The pmf of X is given in the following table. A new random variable dependent on X is defined
as follows:
Y = X2 ‒ X
Obtain the pmf of Y.

Solution:
The solution is presented in the following table:

X=x pX (x) Y = y (= x2 ‒ x)
-3 1/7 12
-2 1/7 6
-1 1/7 2
0 1/7 0
1 1/7 0
2 1/7 2
3 1/7 6


The resulting 𝐩𝐘 (y) is shown in the following table:

Y=y 𝐩𝐘 (y)
0 2/7
2 2/7
6 2/7
12 1/7
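The tabulated mapping can be checked mechanically: for a discrete X, the many-to-one rule (Eq. 6.6) says to accumulate pX(x) over every x that lands on the same y. A minimal sketch in Python, using exact fractions:

```python
from fractions import Fraction

# pmf of X: the integers -3..3, each with probability 1/7
p_X = {x: Fraction(1, 7) for x in range(-3, 4)}

# Derive the pmf of Y = g(X) = X**2 - X by summing pX over every x
# that maps to the same y (the many-to-one case of Eq. 6.6).
p_Y = {}
for x, p in p_X.items():
    y = x * x - x
    p_Y[y] = p_Y.get(y, Fraction(0)) + p

for y in sorted(p_Y):
    print(y, p_Y[y])   # pY(0) = 2/7, pY(2) = 2/7, pY(6) = 2/7, pY(12) = 1/7
```

The dictionary accumulation is exactly the summation in Eq. 6.6, so the output reproduces the table above.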

Example 6.2
Let X∼Uniform (−1, 1) and Y=X2. Find the CDF and pdf of Y.
Solution:
First, we note that the range of Y is [0, 1]. As usual, we start with the CDF. For y ∈ [0, 1], we
have

FY(y) = Pr(Y ≤ y) = Pr(X² ≤ y) = Pr(−√y ≤ X ≤ √y)

      = [√y − (−√y)] / [1 − (−1)]     since X ∼ Uniform(−1, 1)

      = 2√y/2 = √y,   0 < y < 1

Thus, the CDF of Y is given by

FY(y) = 0     for y < 0
      = √y    for 0 ≤ y ≤ 1
      = 1     for y > 1

Note that the CDF is a continuous function of y, so Y is a continuous random variable. Thus,
we can find the pdf of Y by differentiating FY(y):

fY(y) = d/dy [FY(y)] = 1/(2√y)   for 0 < y ≤ 1

=0 otherwise
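The derived CDF FY(y) = √y can also be checked by simulation; a minimal sketch (the sample size and seed are arbitrary choices):

```python
import math
import random

random.seed(0)
n = 200_000
# samples of Y = X**2 with X ~ Uniform(-1, 1)
ys = [random.uniform(-1.0, 1.0) ** 2 for _ in range(n)]

for y in (0.25, 0.50, 0.81):
    emp = sum(1 for v in ys if v <= y) / n   # empirical CDF at y
    print(f"F_Y({y}) ~ {emp:.3f}  vs  sqrt(y) = {math.sqrt(y):.3f}")
```

The empirical fractions agree with √y to within Monte Carlo error.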

Example 6.3 (From Ang and Tang, 2007)

Consider a normal variate X with parameters µ and σ, i.e., X: N(µ, σ), with pdf:

fX(x) = N(x; μ, σ) = [1/(√(2π) σ)] exp[−(1/2)((x − μ)/σ)²],   −∞ < x < ∞


If Z = (X − μ)/σ, determine the pdf of Z.

Solution:
First, we observe that the inverse function is g⁻¹(z) = σz + μ and dg⁻¹/dz = σ. Then, according
to Theorem 6.1, the pdf of Z is:

fZ(z) = [1/(√(2π) σ)] exp[−(σz + μ − μ)²/(2σ²)] · |σ|

      = (1/√(2π)) e^(−z²/2)

which is the pdf of the standard normal distribution, 𝐍(𝟎, 𝟏).

Example 6.4 (From Ang and Tang, 2007)

The random variable X has a lognormal distribution with parameters λ and ζ. Derive the pdf
of the random variable Y, where, Y = ln X.

Solution:

The pdf of X is as follows:

fX(x) = [1/(√(2π) ζ x)] exp[−(1/2)((ln x − λ)/ζ)²],   x > 0

The inverse function is

g −1 (y) = ey

and

dg −1
= ey
dy

Therefore, according to Theorem 6.1,

fY(y) = [1/(√(2π) ζ e^y)] exp[−(1/2)((y − λ)/ζ)²] · |e^y| = [1/(√(2π) ζ)] exp[−(1/2)((y − λ)/ζ)²]

which means that the distribution of Y = ln X is normal with a mean of λ and a standard
deviation of ζ; i.e., 𝐘: 𝐍(𝛌, 𝛇). This result also shows that:

E(ln X) = λ and VAR(ln X) = ζ2


In certain cases, the inverse function, g −1 (y) may not be single-valued and for a given value
of y, there may be multiple values of g −1 (y). For example, if

g −1 (y) = x1, x2,…,xk, then, if X is discrete the pmf of Y is

pY (y) = ∑ki=1 pX (xi ) (6.6)

and if X is continuous the pdf of Y is

fY(y) = ∑ᵢ₌₁ᵏ fX(gᵢ⁻¹(y)) |dgᵢ⁻¹(y)/dy|                                        (6.7)

where, g i−1 (y) = xi is the ith root of g −1 (y).

Example 6.5 (From Ang and Tang, 1975)

The strain energy stored in a linearly elastic bar of length, L, subjected to an axial force, S, is
given by the following equation:
L
U= S2
2AE

where A = cross-sectional area of the bar and E = modulus of elasticity of the material. Here,
the only random variable is S, which is assumed to have a standard normal distribution, i.e.,
S: N(0, 1). Obtain the pdf of U.

Solution:

Since all of the variables, except S (i.e. L, A and E) are deterministic quantities we introduce a
constant c defined as:

c=L/2AE,

then

U = c S2

The inverse will be s = ±√(u/c).

As observed, the inverse function has two roots:

s₁ = √(u/c) and s₂ = −√(u/c), with the corresponding derivative

ds/du = ±1/(2√(cu)); but, as explained above, the absolute value will be taken: |ds/du| = 1/(2√(cu)).

The distribution of S is as follows:

fS(s) = (1/√(2π)) e^(−s²/2)


Now, based on Theorem 6.1 and because the inverse function has two roots,

fU(u) = (1/√(2π)) exp[−(1/2)(√(u/c))²] · 1/(2√(cu)) + (1/√(2π)) exp[−(1/2)(−√(u/c))²] · 1/(2√(cu))

fU(u) = [1/√(2πcu)] e^(−u/(2c))   for u ≥ 0

The resulting distribution is known as the Chi-square distribution with one degree-of-
freedom.
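A simulation sketch of this result, with an arbitrary assumed value for the constant c = L/(2AE), compares the empirical CDF of U = cS² against the derived chi-square(1) CDF, which integrates to Pr(U ≤ u) = 2Φ(√(u/c)) − 1:

```python
import math
import random

random.seed(1)

c = 2.5          # assumed numerical value of L/(2AE); any c > 0 works
n = 200_000
us = [c * random.gauss(0.0, 1.0) ** 2 for _ in range(n)]

def F_U(u):
    # derived CDF: Pr(U <= u) = Pr(-sqrt(u/c) <= S <= sqrt(u/c)) = 2*Phi(sqrt(u/c)) - 1,
    # and 2*Phi(z) - 1 = erf(z / sqrt(2)) for the standard normal CDF Phi
    return math.erf(math.sqrt(u / c) / math.sqrt(2.0))

for u in (0.5, 2.5, 10.0):
    emp = sum(1 for v in us if v <= u) / n
    print(f"u = {u}: empirical {emp:.3f}, chi-square(1) CDF {F_U(u):.3f}")
```

The agreement confirms the two-root derivation above.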

Example 6.6 X has the following pdf:

fX (x) = 1/2 ‒1≤𝑥≤1

=0 otherwise

If Y = X2 find the pdf of Y.

Solution:

We will obtain fY (y) in two different ways:

(i) FY(y) = Pr(Y ≤ y) = Pr(X² ≤ y) = Pr(−√y ≤ X ≤ √y)

          = ∫ from −√y to √y of (1/2) dx = (1/2)[√y − (−√y)] = √y

fY(y) = d[FY(y)]/dy = (1/2) y^(−1/2) = 1/(2√y),   0 < y < 1

Therefore,

1
fY (y) = 0<y<1
2 √y

=0 otherwise

(ii) If y = x², then the inverse function is x = g⁻¹(y) = ±√y and

dx/dy = ±1/(2√y)

but we will take the absolute value: |dx/dy| = 1/(2√y)


According to Theorem 6.1 and Eq. 6.7,

fY(y) = (1/2) · 1/(2√y) + (1/2) · 1/(2√y) = 1/(2√y)

We conclude that:
1
fY (y) = 0<y<1
2 √y

=0 otherwise

6.2.2 Functions of Multiple Random Variables


A dependent variable may be a function of two or more other random variables. In this case, the
pdf of the dependent variable is derived from the joint distribution of the input variables
through the function relating them.

We will consider first the case of two independent random variables, X and Y. Let the function
be defined as follows:

Z = g(X, Y)

Then for the discrete case, the corresponding pmf and CDF will be as follows, respectively:

pZ(z) = ∑ over {g(xᵢ, yᵢ) = z} of pX,Y(xᵢ, yᵢ)
                                                                               (6.8)
FZ(z) = ∑ over {g(xᵢ, yᵢ) ≤ z} of pX,Y(xᵢ, yᵢ)

For functions of multiple continuous random variables, Theorem 6.1 should be extended. The
resulting equations are more complex; however, there exist some general rules that simplify the
derivations. These are listed below:

Sum of Statistically Independent Poisson Variates


Let Z = ∑ni=1 Xi , where each Xi has a Poisson distribution with parameter, νi and Xi’s are
statistically independent. In this case, Z will also have a Poisson distribution with its parameter
denoted by νZ, where νZ = ∑𝐧𝐢=𝟏 𝛎𝐢 . It is important to note that this rule is valid for the sum but
not for the differences.

Example 6.7
A certain site is exposed to seismic hazard due to three active faults close to the site. Earthquake
occurrences on these three faults are assumed to be statistically independent events and
described by the Poisson distribution with mean annual rates as follows:
ν1 = 0.01 earthquakes/year, ν2= 0.04 earthquakes/year and ν3 = 0.05 earthquakes/year.
What is the probability distribution of the seismic activity at the site?


Solution:
Let Z, be the random variable representing the seismic activity at the site. According to the rule
stated above, Z will have Poisson distribution with a mean rate,
νZ = ∑3i=1 νi = 0.01 + 0.04 + 0.05 = 0.10 earthquakes/year
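The rule can be verified numerically by convolving the three individual Poisson pmfs and comparing the result with the single Poisson pmf with parameter ν1 + ν2 + ν3. The 10-year exposure period below is an arbitrary illustration choice, not part of the example:

```python
import math

def pois_pmf(k, nu):
    return math.exp(-nu) * nu**k / math.factorial(k)

nu = [0.01, 0.04, 0.05]     # mean annual rates of the three faults
t = 10.0                    # assumed exposure period (years), for illustration
mus = [v * t for v in nu]   # Poisson parameters over t years

# Numerically convolve the three Poisson pmfs over k1 + k2 + k3 <= K
K = 12
conv = [0.0] * (K + 1)
for k1 in range(K + 1):
    for k2 in range(K + 1 - k1):
        for k3 in range(K + 1 - k1 - k2):
            conv[k1 + k2 + k3] += (pois_pmf(k1, mus[0]) *
                                   pois_pmf(k2, mus[1]) *
                                   pois_pmf(k3, mus[2]))

for k in range(4):
    print(k, round(conv[k], 6), round(pois_pmf(k, sum(mus)), 6))
```

The convolved pmf matches Poisson(νZ t) term by term, as the rule states.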

Linear Function (Sum and Difference) of Statistically Independent Normal Variates


This is a very useful rule, which simplifies the computations involving linear functions of
statistically independent normal variates. The statement of this rule is given below:
If
Z = ∑ᵢ₌₁ⁿ aᵢXᵢ
where each Xᵢ has a normal distribution with parameters μᵢ and σᵢ, the Xᵢ's are statistically
independent, and the aᵢ's are constants, then Z will also have a normal distribution with its
parameters denoted by μZ and σZ, where

μZ = ∑ᵢ₌₁ⁿ aᵢμᵢ and σZ = √(∑ᵢ₌₁ⁿ aᵢ²σᵢ²).


It is important to note that this rule is valid for both the sum and differences.

Example 6.8
Let Z = X1 + X2 and W = X1 ‒ X2 and let X1 and X 2 be two statistically independent normally
distributed random variables defined as X1 : N(20, 4) and X2 : N(10, 3). Derive the distributions
of Z and W.
Solution:
According to the rule stated above, Z and W will have normal distributions with the following
parameters.

μZ = 20 + 10 = 30 and σZ = √(4² + 3²) = 5

μW = 20 − 10 = 10 and σW = √[(1)²4² + (−1)²3²] = 5


Thus, Z: N(30, 5) and W: N(10, 5)
This example shows that the rule is applicable for both the sum and difference. The standard
deviation is the same in both cases, but the mean differs.
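The same bookkeeping can be written as a two-line sketch of the rule (with a1 = 1, a2 = ±1):

```python
import math

mu1, s1 = 20.0, 4.0   # X1 : N(20, 4)
mu2, s2 = 10.0, 3.0   # X2 : N(10, 3)

# sum and difference of independent normals: means add/subtract,
# standard deviations combine in quadrature in both cases
mu_Z, s_Z = mu1 + mu2, math.hypot(s1, s2)   # Z = X1 + X2 : N(30, 5)
mu_W, s_W = mu1 - mu2, math.hypot(s1, s2)   # W = X1 - X2 : N(10, 5)
print(mu_Z, s_Z, mu_W, s_W)
```

Note that `math.hypot(s1, s2)` is just √(s1² + s2²), which is why the standard deviation is identical for the sum and the difference.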

Example 6.9
The total axial load, Y that is acting on a column is the sum of the following load effects, which
are listed in the table given below, together with their statistical parameters. All load effects are
assumed to be statistically independent and normally distributed.

a) Find the mean value, standard deviation and coefficient of variation of the total load, Y.
b) If the coefficient of variation of the axial design capacity of the column, D is 0.25 and if it


is a normally distributed random variable, statistically independent of Y, what is the
reliability (i.e., survival probability) of this column, if it is designed for a mean safety
factor (S.F.) of 1.75?

c) If the coefficient of variation of the axial design capacity of the column, D is 0.25 and if it
is a normally distributed random variable correlated with Y with a correlation coefficient
ρ = 0.7, what is the reliability of this column, if it is designed for a mean safety factor (S.F.) of
1.75?

Load Effect Expected Value (kN) Coefficient of Variation


Dead Load (D) 50 0.03
Live Load (L) 80 0.20
Equivalent Wind Load (W) 20 0.25
Snow Load (S) 10 0.25
Equivalent Earthquake Load (E) 35 0.4

Solution:

a) E(Y) = 50 + 80 + 20 + 10 + 35 = 𝟏𝟗𝟓 𝐤𝐍 = 𝛍𝐘

VAR(Y) = (50 × 0.03)² + (80 × 0.2)² + (20 × 0.25)² + (10 × 0.25)² + (35 × 0.4)²

σ²Y = 485.5 (kN)²

σY = √485.5 = 22.03 kN and δY = σY/μY = 22.03/195 = 0.113

b) Let the safe state be defined as "D > Y", or alternatively "D − Y > 0", where D is the
design capacity of the column.

Mean safety factor of 1.75 → E(D) = 1.75 E(Y) = 341.25 kN

VAR(D) = (0.25 × 341.25)² = 7278.22 (kN)²

Pr(D − Y > 0) = Pr( [D − Y − E(D − Y)] / √VAR(D − Y) > [0 − E(D) + E(Y)] / √[VAR(D) + VAR(Y)] )

= Pr(z > (−341.25 + 195)/√(7278.22 + 485.5)) ≅ Pr(z > −1.66)

= 0.95

c) VAR(D − Y) = VAR(D) + VAR(Y) − 2 ρDY √[VAR(D) VAR(Y)]

= 7278.22 + 485.5 − 2 × 0.7 × √(7278.22 × 485.5) = 5132.0 (kN)²

σD−Y ≅ 71.64 kN


Pr(z > −146.25/71.64) = Pr(z > −2.04) ≅ 0.98
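A numerical check of parts (a) and (b) using only the statistics in the table (the variable names are illustrative):

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

means = [50.0, 80.0, 20.0, 10.0, 35.0]   # D, L, W, S, E (kN)
covs  = [0.03, 0.20, 0.25, 0.25, 0.40]   # coefficients of variation

# part (a): mean and variance of the total load Y (independent normal loads)
mu_Y  = sum(means)                                      # 195 kN
var_Y = sum((m * c) ** 2 for m, c in zip(means, covs))  # 485.5 kN^2

# part (b): mean capacity = 1.75 * mean load, c.o.v. 0.25, independent of Y
mu_D  = 1.75 * mu_Y
var_D = (0.25 * mu_D) ** 2
z = -(mu_D - mu_Y) / math.sqrt(var_D + var_Y)
reliability = 1.0 - Phi(z)
print(round(mu_Y, 1), round(var_Y, 1), round(reliability, 3))
```

The computed reliability is about 0.95, matching the hand calculation.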

Example 6.10
The water supply for a city comes from two reservoirs, A and B, with a total capacity of X m3.
The amount of water in reservoir A comes from three rivers. Each of the rivers feeding reservoir
A has a discharge X1 varying normally as N(250000 m3, 15000 m3) and the amount of water in
reservoir B comes from two rivers each having discharge X2 normally distributed as N(150000
m3, 35000 m3). The demand (D) by the city also fluctuates normally with a mean of 800000 m3
and a coefficient of variation of 0.2. It is assumed that the amount of water coming from the
rivers is statistically independent.
a) Determine the expected value of the total amount of water X in reservoirs A and B.

b) Determine the variance of the total amount of water X in reservoirs A and B. What is the
probability that there may be a water shortage in the city (Shortage: D > X).

Solution:

X1: N (250000 m3, 15000 m3)


X2: N (150000 m3, 35000 m3)
D : N (800000 m3, 800000 * 0.2 m3)
a) In A: X1 + X1 + X1 → E(X1) + E(X1) + E(X1) = 3 E(X1) = 3*250000 = 750 000 m3
In B: X2 + X2 → E(X2) + E(X2) = 2 E(X2) = 2*150000 = 300 000 m3

b) X = X1 + X1 + X1 + X2 + X2
E(X) = 3 E(X1) + 2 E(X2) = 750000 + 300000 = 1 050 000 m3
VAR(X) = 3 VAR(X1) + 2 VAR(X2) = 3 × 15000² + 2 × 35000² = 3125 × 10⁶ m⁶
Shortage: D > X → Pr(D > X) = Pr(D ‒ X > 0)
Let Y = D ‒ X
E(Y) = μY = E(D) − E(X) = 800000 − 1050000 = −250 000 m3
VAR(Y) = σ²Y = VAR(D) + VAR(X) = 160000² + 3125 × 10⁶ m⁶
Since both X and D are normally distributed and statistically independent, Y will also be
normally distributed.

Pr(Y > 0) = Pr( (Y − μY)/σY > [0 − (−250000)] / √(160000² + 3125 × 10⁶) )

= Pr(z > 1.475) = 1 − Φ(1.475) = 1 − 0.93 = 0.07 = 7%


If the rivers are identical, i.e. perfectly correlated, then


X = 3 X1 + 2 X2 → E(X) = 1 050 000 m3

But, VAR(X) = 9 VAR(X1) + 4 VAR(X2) = 9 × 15000² + 4 × 35000² = 6925 × 10⁶ m⁶

VAR(D − X) = VAR(D) + VAR(X) = 160000² + 6925 × 10⁶ m⁶

Pr( (Y − μY)/σY > [0 − (−250000)] / √(160000² + 6925 × 10⁶) ) = Pr(z > 1.386) ≈ 0.08 = 8%
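Both the independent-rivers and identical-rivers cases can be reproduced in a few lines (a sketch using the statistics given in the problem):

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu_X = 3 * 250_000 + 2 * 150_000          # expected total supply: 1 050 000 m^3
mu_D, s_D = 800_000.0, 0.2 * 800_000.0    # demand: mean and standard deviation

# statistically independent rivers: variances add once per river
var_X = 3 * 15_000**2 + 2 * 35_000**2
p_short = 1.0 - Phi((mu_X - mu_D) / math.sqrt(s_D**2 + var_X))
print(f"independent rivers: Pr(shortage) = {p_short:.2f}")

# perfectly correlated (identical) rivers: X = 3*X1 + 2*X2, coefficients squared
var_Xc = 9 * 15_000**2 + 4 * 35_000**2
p_short_c = 1.0 - Phi((mu_X - mu_D) / math.sqrt(s_D**2 + var_Xc))
print(f"identical rivers:   Pr(shortage) = {p_short_c:.2f}")
```

This also illustrates numerically why correlation between supplies increases the shortage risk: the variance of the total grows from 3125 × 10⁶ to 6925 × 10⁶ m⁶.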

Products and Quotients of Statistically Independent Lognormal Random Variables


If
Z = ∏ᵢ₌₁ⁿ Xᵢ
where each Xᵢ has a lognormal distribution with parameters λᵢ and ζᵢ and the Xᵢ's are
statistically independent, then

ln Z = ∑ᵢ₌₁ⁿ ln Xᵢ                                                             (6.9)

Since the ln Xᵢ's are statistically independent normally distributed random variables, ln Z, which
is their sum, will also be normally distributed with mean λZ = ∑ᵢ₌₁ⁿ λᵢ and variance ζ²Z = ∑ᵢ₌₁ⁿ ζᵢ².
Hence, Z will be lognormally distributed with the following parameters,

λZ = ∑ᵢ₌₁ⁿ λᵢ                                                                  (6.10a)
and

ζZ = √(∑ᵢ₌₁ⁿ ζᵢ²)                                                              (6.10b)

On the other hand, if we have the following quotient:

W = X/Y, where X: LN(λX, ζX) and Y: LN(λY, ζY), and X and Y are statistically independent, then

ln W = ln X − ln Y                                                             (6.11)

Since ln X and ln Y are statistically independent normally distributed random variables, ln W,
which is their difference, will also be normally distributed with mean λW = λX − λY and
variance ζ²W = ζ²X + ζ²Y. Hence, W will be lognormally distributed with the following parameters:

λW = λX − λY                                                                   (6.12a)
and

ζW = √(ζ²X + ζ²Y)                                                              (6.12b)


Example 6.11
The efficiency (E) of a company producing construction materials is estimated based on the
following relationship:

E = (Y/M) √(e^T · e^(S/9))

where,

Y = the fatigue life of the heavy machinery used,


S = the weekly working times of the workers,
T = the past experience periods of the engineers controlling the production, and
M = the unit costs of the materials used.

Y and M are lognormally distributed random variables with median values of 5000 hours and
250 TL, respectively, and coefficients of variation of 0.20 and 0.15, respectively. T and S are
normally distributed random variables with mean values of 6 years and 45 hours, respectively
and standard deviations of 2 years and 4.5 hours, respectively. T and S are dependent variables;
the correlation coefficient between them is ρT,S = 0.75. All other variables are independent of
each other.

a) Calculate the expected value and coefficient of variation of E.


b) Find the probability distribution of E.
c) Calculate the probability that the efficiency is greater than 90.

Solution:

a) The parameters of the lognormally distributed random variables Y and M are as follows:

ξY ≅ 0.2 and ξM ≅ 0.15; λY = ln 5000 = 8.517 and λM = ln 250 = 5.522

If we take the logarithm of both sides of the equation given for efficiency,

ln E = ln Y – ln M + 0.5 (T + S/9)

This is a linear equation and the expected value and variance of E are calculated as follows:

E (ln E) = E (ln Y) – E (ln M) + 0.5 [E (T) + E (S/9)]


λE = λY – λM + 0.5 [μT + μS/9] = 8.517 – 5.522 + 0.5 [6 + 45/9] = 8.495

VAR(ln E) = VAR(ln Y) + VAR(ln M) + 0.25 [VAR(T) + (1/81) VAR(S)] + 2 × 0.5 × (0.5/9) × ρT,S σT σS

ξ²E = 0.2² + 0.15² + 0.25 [2² + (1/81) × 4.5²] + 2 × 0.5² × (1/9) × 0.75 × 2 × 4.5

   = 0.04 + 0.0225 + 1.0625 + 0.375 = 1.5

ξE = 1.225

b) Since Y and M have lognormal distributions, ln Y and ln M will be normally distributed. T


and S are given as normally distributed. Therefore, ln E, which is a linear function of these four
normally distributed random variables, will be normal and E will be lognormally distributed.

c) According to the results obtained in parts (a) and (b), E is a lognormally distributed random
variable with parameters λE = 8.495 and ξE = 1.225. Accordingly,

Pr(E > 90) = Pr(ln E > ln 90) = Pr(z > (ln 90 − 8.495)/1.225)

           = Pr(z > (4.500 − 8.495)/1.225) = Pr(z > −3.995/1.225)

           = Pr(z > −3.26) = 0.99944
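A numerical check of parts (a) and (c); the tiny differences from the hand calculation come only from rounding λY and λM to three decimals:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

lam_Y, xi_Y = math.log(5000.0), 0.20   # lognormal Y: median 5000 h, c.o.v. ~ xi
lam_M, xi_M = math.log(250.0), 0.15    # lognormal M: median 250 TL
mu_T, s_T = 6.0, 2.0                   # normal T (years)
mu_S, s_S = 45.0, 4.5                  # normal S (hours)
rho_TS = 0.75

# ln E = ln Y - ln M + 0.5*T + (0.5/9)*S is linear in normal variates
lam_E = lam_Y - lam_M + 0.5 * mu_T + (0.5 / 9.0) * mu_S
xi2_E = (xi_Y**2 + xi_M**2
         + 0.25 * s_T**2 + (0.5 / 9.0)**2 * s_S**2
         + 2 * 0.5 * (0.5 / 9.0) * rho_TS * s_T * s_S)
xi_E = math.sqrt(xi2_E)
print(round(lam_E, 3), round(xi_E, 3))

p = 1.0 - Phi((math.log(90.0) - lam_E) / xi_E)
print(round(p, 5))   # probability that the efficiency exceeds 90
```

The covariance term 2 × 0.5 × (0.5/9) × ρ σT σS is the same quantity written in the solution as 2 × 0.5² × (1/9) × ρ σT σS.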

6.3. APPROXIMATE MOMENTS OF FUNCTIONS OF RANDOM VARIABLES


6.3.1. Expected Value of Functions of Random Variables

a) Exact Results
Let Y depend on n random variables denoted by Xi, i = 1, 2,…, n, according to the following
functional relationship:
Y = g(x1 , … , xn ) (6.13)
Then the mathematical expectation of Y, E(Y), will be obtained from the following equation:

E(Y) = ∫∫…∫ over DX of g(x1, …, xn) fX1,…,Xn(x1, …, xn) dx1 … dxn              (6.14)

Here, Xᵢ = ith input variable (i = 1, …, n); fX1,…,Xn(x1, …, xn) = joint probability density
function; DX = the region over which the joint probability density function is defined.
In computing E(Y) using Eq. 6.14, two problems will be faced:
i) Computational difficulties due to n-tuple integrals.
ii) Lack of sufficient data to assess the joint probability density function.

On the other hand if g(x1 , … , xn ) is a linear function, then these difficulties will be of no concern
as illustrated in the following.
i) Let Y = a + bX, where a and b are constants:
E(Y) = a + bE(X) = a + bμX
VAR(Y) = b² VAR(X) = b² σ²X

ii) Let, Y = a1 X1 + a2 X2
E(Y) = μY = E(a1 X1 + a2 X2 ) = a1 E(X1 ) + a2 E(X2) = a1 μ1 + a2 μ2 (6.15)
VAR(Y) = E[(Y − μY)²] = E[(a1X1 + a2X2) − (a1μ1 + a2μ2)]²

= E[a1²(X1 − μ1)²] + E[a2²(X2 − μ2)²] + E[2a1a2(X1 − μ1)(X2 − μ2)]

= a1² E(X1 − μ1)² + a2² E(X2 − μ2)² + 2a1a2 E[(X1 − μ1)(X2 − μ2)]

VAR(Y) = σ²Y = a1²σ1² + a2²σ2² + 2a1a2 COV(X1, X2)                             (6.16)
Since, COV(X1 , X2 ) = ρ12 σ1 σ2 ; where ρ12 = correlation coefficient between X1 and X2 .

VAR(Y) = σ2Y = a21 σ12 + a22 σ22 + 2 a1 a2 ρ12 σ1 σ2 (6.17)

If X1 and X2 are statistically independent, ρ12 = 0, and

VAR(Y) = σ2Y = a21 σ12 + a22 σ22 (6.18)

If W = a1 X1 ‒ a2 X2
E(W) = μW = E(a1 X1 ‒ a2 X2 ) = a1 E(X1 ) ‒ a2 E(X2 ) = a1 μ1 ‒ a2 μ2
VAR(W) = σ2W = a21 σ12 + a22 σ22 ‒ 2 a1 a2 COV(X1 , X2 )

VAR(W) = σ2W = a21 σ12 + a22 σ22 ‒ 2 a1 a2 ρ12 σ1 σ2


The above results can be generalized to linear functions of multiple random variables as shown
below:
iii) Let, Y = ∑ni=1 ai Xi (6.19)

where each Xᵢ has a mean value μᵢ and a standard deviation σᵢ, the aᵢ's are constants, and the
Xᵢ's are correlated with correlation coefficients ρᵢⱼ (i = 1, 2, …, n; j = 1, …, n), then
the mean value and variance of Y will be:

μY = ∑ᵢ₌₁ⁿ aᵢμᵢ                                                                (6.20)

σ²Y = ∑ᵢ₌₁ⁿ aᵢ²σᵢ² + ∑ᵢ₌₁ⁿ ∑ⱼ₌₁, ⱼ≠ᵢⁿ aᵢaⱼ COV(Xᵢ, Xⱼ)                          (6.21)

Since COV(Xᵢ, Xⱼ) = ρᵢⱼσᵢσⱼ,

σ²Y = ∑ᵢ₌₁ⁿ aᵢ²σᵢ² + ∑ᵢ₌₁ⁿ ∑ⱼ₌₁, ᵢ≠ⱼⁿ aᵢaⱼρᵢⱼσᵢσⱼ                               (6.22)

If the Xᵢ's are statistically independent, then

μY = ∑ᵢ₌₁ⁿ aᵢμᵢ                                                                (6.23)

σ²Y = ∑ᵢ₌₁ⁿ aᵢ²σᵢ²                                                             (6.24)

σY = √(∑ᵢ₌₁ⁿ aᵢ²σᵢ²)                                                           (6.25)

In the multivariate case, the correlation structure can conveniently be described by the
covariance matrix, CX:


CX = [ σ²11  σ²12  …  σ²1n
       σ²21  σ²22  …  σ²2n
        …     …   σ²kk  …
       σ²n1  σ²n2  …  σ²nn ]                                                   (6.26)

The covariance matrix is square and symmetric. In this matrix, the σ2ij term corresponds to
variance if i = j and to the covariance between the random variables Xi and Xj if i ≠ j.
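In matrix form, Eq. 6.21 is simply Var(Y) = aᵀ CX a. A minimal sketch with a hypothetical 3 × 3 covariance matrix (the numbers are illustrative only):

```python
# Eq. 6.21 as a quadratic form: Var(Y) = a^T C_X a for Y = sum(a_i * X_i)
a = [2.0, -1.0, 3.0]          # hypothetical coefficients
C = [[4.0,  1.2,  0.0],       # hypothetical covariance matrix (symmetric;
     [1.2,  9.0, -0.6],       #  diagonal = variances, off-diagonal = covariances)
     [0.0, -0.6,  1.0]]

var_Y = sum(a[i] * C[i][j] * a[j] for i in range(3) for j in range(3))
print(var_Y)
```

The double sum over i and j reproduces both the variance terms (i = j) and the covariance terms (i ≠ j) of Eq. 6.21 in one expression.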

iv) Expected value of the product of statistically independent random variables

Let T = ∏ᵢ₌₁ⁿ aᵢXᵢ                                                             (6.27)

where each Xᵢ has a mean value μᵢ, the aᵢ's are constants, and the Xᵢ's are statistically
independent. Then the mean value of T will be:

E(T) = ∏ᵢ₌₁ⁿ aᵢ E(Xᵢ) = ∏ᵢ₌₁ⁿ aᵢμᵢ                                             (6.28)

Similarly,

E(T²) = ∏ᵢ₌₁ⁿ aᵢ² E(Xᵢ²)                                                       (6.29)

6.3.2 Expected Value and Variance of General Functions

Let X̃ = (X1, X2, …, Xn) be the vector of random variables involved in the problem at hand.
These variables will be called basic variables. Let Y be a function of these n basic variables
defined as follows:

Y = g(x̃) = g(x1, x2, …, xn)                                                   (6.30)

In the previous section, we have seen that for linear functions finding the expected value and
the variance is computationally simple. For nonlinear functions to obtain the exact results,
multiple integrations must be performed. This could be avoided by linearizing the function but
getting approximate results. This linearization can be done by applying the Taylor series
expansion about the mean vector, μ̃ = (μ1, μ2, …, μn), and keeping only the linear terms, as
shown below:

Y = g(x̃) = g(μ1, μ2, …, μn) + ∑ᵢ₌₁ⁿ (∂g/∂xᵢ)|μ̃ (xᵢ − μᵢ)

    + (second- and higher-order terms)                                         (6.31)

In Eq. 6.31 the partial derivatives are evaluated at the mean values of the random variables,
μ̃ = (μ1, μ2, …, μn). Keeping only the linear terms and ignoring the higher-order terms in the
Taylor series expansion, as displayed in Eq. 6.31, the following approximate relationships are
obtained for μY and σY (Ang and Tang, 1984):

E(Y) = μY ≅ g(μ1 , … , μn ) (6.32)


VAR(Y) = σ²Y ≅ ∑ᵢ₌₁ⁿ (∂g/∂xᵢ)²|μ̃ σᵢ² + ∑∑ᵢ≠ⱼ (∂g/∂xᵢ)|μ̃ (∂g/∂xⱼ)|μ̃ ρᵢⱼ σᵢ σⱼ   (6.33)

Here, σᵢ = standard deviation of Xᵢ; ρᵢⱼ = correlation coefficient between Xᵢ and Xⱼ. In case
the basic variables are statistically independent,

VAR(Y) = σ²Y ≅ ∑ᵢ₌₁ⁿ (∂g/∂xᵢ)²|μ̃ σᵢ²                                           (6.34)

The above method is generally referred to as the First-Order Second Moment (FOSM)
approximation.
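The FOSM recipe is easy to code once and reuse: evaluate g at the means for the mean of Y, and build the variance from numerically estimated partial derivatives (Eqs. 6.32 and 6.34, independent inputs). A sketch, checked against a simple linear function where the result is exact:

```python
def fosm(g, mu, sigma, h=1e-5):
    """First-order mean and variance of Y = g(X1,...,Xn) for
    statistically independent inputs (Eqs. 6.32 and 6.34)."""
    mean_Y = g(*mu)           # Eq. 6.32: g evaluated at the means
    var_Y = 0.0
    for i, s in enumerate(sigma):
        xp = list(mu); xp[i] += h
        xm = list(mu); xm[i] -= h
        dg = (g(*xp) - g(*xm)) / (2 * h)   # central-difference dg/dxi at the means
        var_Y += (dg * s) ** 2             # Eq. 6.34 term
    return mean_Y, var_Y

# quick check against the exact linear result for Y = 2*X1 + 3*X2
m, v = fosm(lambda x1, x2: 2 * x1 + 3 * x2, [1.0, 2.0], [0.5, 0.4])
print(m, v)   # mean 8.0; variance (2*0.5)^2 + (3*0.4)^2 = 2.44
```

For a nonlinear g the same function returns the first-order approximation rather than the exact moments, which is the point of Eqs. 6.32 to 6.34.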
Note that for one variable case, i.e. Y = g(X), the following relationships will be valid.

E(Y) = μY ≅ g(μX ) (6.35)

VAR(Y) = σ²Y ≅ (dg/dx)²|μX σ²X                                                 (6.36)

If we keep the second-order term, then the following, so-called second-order approximation
results will be obtained:

E(Y) = μY ≅ g(μX) + (1/2) (d²g/dx²)|μX σ²X                                     (6.37)

VAR(Y) = σ²Y ≅ (dg/dx)²|μX σ²X − (1/4) (d²g/dx²)²|μX σ⁴X

         + E(X − μX)³ (dg/dx)|μX (d²g/dx²)|μX + (1/4) E(X − μX)⁴ (d²g/dx²)²|μX (6.38)

Example 6.12 (From Ang and Tang, 2007)


The maximum impact pressure (in psf) of ocean waves on coastal structures may be determined
by,
pmax = (2.7ρ K U2)/D
where, U = random horizontal velocity of the advancing wave, with a mean of 4.5 fps and a
c.o.v. of 20%. The other parameters are all constants as follows: ρ = 1.96 slugs/cu ft, the density
of seawater, K = length of the hypothetical piston, D = thickness of air cushion. Assume a ratio
of K/D = 35.
The first-order mean and standard deviation of pmax, according to Eqs. 6.35 and 6.36 are
E(pmax) ≅ 2.7 (1.96) (35) (4.5)2 = 3750.7 psf = 26.05 psi
VAR(pmax) ≅ VAR(U) (2.7ρK/D)² (2μU)² = (0.20 × 4.5)² (2.7 × 1.96 × 35)² (2 × 4.5)²
Therefore, the standard deviation of pmax is
σpmax ≅ (0.20 × 4.5)(2.7 × 1.96 × 35)(2 × 4.5) = 1500.3 psf = 10.42 psi


Accordingly, the coefficient of variation will be:

δpmax = 10.42/26.05 = 0.40

which is twice that of the wave velocity.
For an improved mean value, we evaluate the second-order mean using Eq. 6.37 as follows:
E(pmax) ≅ 3750.7 + (1/2)(0.20 × 4.5)² (2.7ρK/D)(2)
        = 3750.7 + (1/2)(0.20 × 4.5)² (2.7 × 1.96 × 35) × 2
        = 3750.7 + 150.0 = 3900.7 psf = 27.09 psi
This shows that for this case, the first-order mean is about (27.09 – 26.05)/26.05 = 4% less than
the second-order mean.
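Since pmax is quadratic in U, the exact mean is E(c U²) = c(μU² + σU²), so the second-order mean of 3900.7 psf is in fact exact for this example. A Monte Carlo sketch confirms it (the sample size and seed are arbitrary choices):

```python
import math
import random

random.seed(2)

rho, K_over_D = 1.96, 35.0        # seawater density and K/D ratio from the example
mu_U, s_U = 4.5, 0.20 * 4.5       # mean and standard deviation of U
c = 2.7 * rho * K_over_D          # pmax = c * U**2

n = 400_000
mc_mean = sum(c * random.gauss(mu_U, s_U) ** 2 for _ in range(n)) / n

exact = c * (mu_U**2 + s_U**2)    # E[c*U^2] = c*(mu^2 + sigma^2), exact for a quadratic
print(round(mc_mean, 1), round(exact, 1))
```

The simulated mean lands near 3900.7 psf, about 4% above the first-order estimate of 3750.7 psf, as stated above.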

6.4. ADDITIONAL SOLVED PROBLEMS

Example 6.13 (From Ang and Tang, 2007)

The annual operational cost, C, for a waste treatment plant is a function of the weight of solid
waste, W, the unit cost factor, F, and an efficiency coefficient, E, as follows:
WF
C=
√E
where W, F, and E are statistically independent lognormal variates with the following
respective medians and coefficients of variation (c.o.v.):

Variable Median c.o.v


W 2000 tons/yr 20%
F $20 per ton 15%
E 1.6 12.50%

As C is a function of the product and quotient of lognormal variates, its probability distribution
is also lognormal, which we can show as follows:
ln C = ln W + ln F − (1/2) ln E

Accordingly, ln C is normal with mean λC = λW + λF − (1/2) λE

and variance ζ²C = ζ²W + ζ²F + ((1/2) ζE)².

Therefore, C is lognormal with


λC = ln 2000 + ln 20 − 0.5 x ln 1.6 = 10.36
and

ζC = √(0.20)2 + (0.15)2 + (1/2 × 0.125)2 = 0.26


On the basis of the above, the probability that the annual cost of operating the waste treatment
plant will exceed $35 000 is:
Pr(C > 35000) = 1 − Pr(C ≤ 35000)

             = 1 − Φ[(ln 35000 − 10.36)/0.26] = 1 − Φ(0.397)

             = 1 − 0.655 = 0.345

Example 6.14 (From Ang and Tang, 2007)


The maximum load on a column of a high-rise reinforced concrete building may be composed
of the dead load (D), the live load (L), and the earthquake-induced load (E). The total maximum
load carried by the column would be T=D+L+E. Suppose the statistics of the individual load
components are as follows:
μD = 2000 tons, σD = 210 tons

μL = 1500 tons, σL = 350 tons

μE = 2500 tons, σE = 450 tons

If the three loads are statistically independent, i.e., 𝛒𝐢𝐣 = 𝟎, the mean and standard deviation
of the total load T, where, T = D + L + E, are:
𝛍𝐓 = 2000 + 1500 + 2500 = 𝟔𝟎𝟎𝟎 𝐭𝐨𝐧𝐬
and
σ²T = 210² + 350² + 450² = 369 100 tons²
Hence, the standard deviation is σT = 607.54 tons
However, the dead load, D, and the earthquake load, E, may be correlated, say with a correlation
coefficient of 𝛒𝐢𝐣 = 𝟎. 𝟓, whereas the live load L is uncorrelated with D and E. Then, the
corresponding variance would be:
σ²T = 210² + 350² + 450² + 2(0.5)(210)(450) = 463 600 tons²
and the standard deviation becomes σT = 680.88 tons.
Now, suppose the mean and standard deviation of the load-carrying capacity of column C are
𝛍C = 𝟏𝟎 𝟎𝟎𝟎 𝐭𝐨𝐧𝐬 and 𝛔𝐂 = 𝟏𝟓𝟎𝟎 𝐭𝐨𝐧𝐬. The probability that the column will be overloaded
is then,
Pr(C < T) = Pr(C − T < 0)
But E(C − T) = μC − μT = 10000 − 6000 = 4000 tons, and its standard deviation is

σC−T = √(607.5² + 1500²) = 1618 tons


Assuming that all the variables are Gaussian, and therefore the difference (C − T) is also
Gaussian, the probability of overloading the column will be:

Pr[(C − T) < 0] = Φ(−4000/1618) = Φ(−2.47) = 1 − Φ(2.47) = 1 − 0.9932 = 0.0068 ≈ 0.007
1618
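The independent-load case can be reproduced in a few lines (a sketch; the values follow directly from the statistics above):

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

s_T = math.sqrt(210**2 + 350**2 + 450**2)   # total load s.d., independent loads
mu_diff = 10_000.0 - 6_000.0                # E(C - T)
s_diff = math.sqrt(s_T**2 + 1_500.0**2)     # s.d. of (C - T)
p_over = Phi(-mu_diff / s_diff)             # Pr(C - T < 0)
print(round(s_T, 2), round(s_diff, 1), round(p_over, 4))
```

The overload probability comes out near 0.007, matching the hand calculation; repeating the run with the correlated variance 463 600 tons² shows how correlation between D and E raises it.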

Example 6.15 (From Ang and Tang, 2007)


The applied stress, S, in a beam is calculated as:
S = M/Z + P/A

where:
M = applied bending moment, P = applied axial force, A = cross-sectional area of the beam and
Z = section modulus of the beam. The following statistical information is given for the
engineering parameters:
μM = 45 000 in-lb δM = 0.10
μZ = 100 in3 δZ = 0.20
μP = 5 000 lb δP = 0.10
A = 50 in2
Assume that M and P are correlated with a correlation coefficient of 𝛒𝐌,𝐏 = 0.75, whereas Z
is statistically independent of M and P.
Compute the mean and standard deviation of the applied stress S in the beam by first-order
approximation.
Solution:
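The worked numerical solution is not reproduced in these notes; a first-order (FOSM) computation following Eqs. 6.32 and 6.33 can be sketched as follows. The resulting numbers are my own calculation from the given statistics, not quoted from the reference:

```python
import math

mu = {'M': 45_000.0, 'Z': 100.0, 'P': 5_000.0}                 # mean values
sd = {'M': 0.10 * 45_000.0, 'Z': 0.20 * 100.0, 'P': 0.10 * 5_000.0}
A, rho_MP = 50.0, 0.75

# first-order mean: evaluate S = M/Z + P/A at the means (Eq. 6.32)
mu_S = mu['M'] / mu['Z'] + mu['P'] / A

# partial derivatives evaluated at the means
dM = 1.0 / mu['Z']                  # dS/dM
dZ = -mu['M'] / mu['Z'] ** 2        # dS/dZ
dP = 1.0 / A                        # dS/dP

# Eq. 6.33 with only the M-P pair correlated (Z independent of M and P)
var_S = (dM * sd['M']) ** 2 + (dZ * sd['Z']) ** 2 + (dP * sd['P']) ** 2 \
        + 2 * dM * dP * rho_MP * sd['M'] * sd['P']
print(mu_S, round(math.sqrt(var_S), 1))
```

Under these assumptions the first-order mean is 550 psi and the standard deviation is about 104 psi; the positive correlation between M and P adds the last term (675 psi²) to the variance.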

Example 6.16
The following empirical equation is derived for the solution of an engineering problem:

Z = X Y2 √W


where:
X: Uniformly distributed between 2.0 and 4.0,
Y: Normally distributed with a median of 1.0 and Pr(Y ≤ 2.0) = 0.9207,
W: Exponentially distributed with a median of 1.0,
and X, Y and W are statistically independent.
a) Compute the mean values, variances and coefficients of variation of X, Y and W.
b) Compute the mean, standard deviation and coefficient of variation of Z using the first-order
approximation.

Solution:
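The numerical solution is likewise not reproduced here. A sketch of parts (a) and (b), first recovering each distribution's moments from the stated information and then applying the first-order approximation (the values are computed below, not quoted from the notes):

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p, lo=-8.0, hi=8.0):
    """Inverse standard normal CDF by bisection (Phi is monotonic)."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Phi(mid) < p else (lo, mid)
    return 0.5 * (lo + hi)

# part (a): moments of the basic variables
mu_X, var_X = 3.0, (4.0 - 2.0) ** 2 / 12.0   # Uniform(2, 4)
mu_Y = 1.0                                    # normal: median = mean
s_Y = 1.0 / Phi_inv(0.9207)                   # from Pr(Y <= 2) = 0.9207
lam = math.log(2.0)                           # exponential, median 1 => rate ln 2
mu_W, s_W = 1.0 / lam, 1.0 / lam              # exponential mean = s.d. = 1/rate

# part (b): first-order approximation for Z = X * Y**2 * sqrt(W)
g = lambda x, y, w: x * y * y * math.sqrt(w)
dX = mu_Y**2 * math.sqrt(mu_W)                # dZ/dX at the means
dY = 2 * mu_X * mu_Y * math.sqrt(mu_W)        # dZ/dY at the means
dW = mu_X * mu_Y**2 / (2.0 * math.sqrt(mu_W)) # dZ/dW at the means

mu_Z = g(mu_X, mu_Y, mu_W)
var_Z = dX**2 * var_X + dY**2 * s_Y**2 + dW**2 * s_W**2
print(round(mu_Z, 2), round(math.sqrt(var_Z), 2))
```

Under these assumptions the first-order mean of Z is about 3.6 and its standard deviation about 5.5, giving a very large coefficient of variation, driven mainly by the Y² term.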


Example 6.17

A column of a building is designed in such a way that its strength, R, has a median value of
336 kN and Pr(R ≤ 532 kN) = 0.99. The strength of the column, R, is assumed to be normally
distributed and uncorrelated with loads. The total column load, T, is the sum of live (L), dead
(D), wind (W) and snow (S) loads. Assume these loads to be mutually independent normal
variables with the following statistical parameters:

Load type Mean value (kN) Coefficient of variation


Live (L) 70 0.15
Dead (D) 90 0.05
Wind (W) 30 0.30
Snow (S) 20 0.20

a) Compute the expected values and coefficients of variation of the strength of the column, R
and total column load, T.
b) Compute the reliability index defined by Cornell (i.e. βC) and the probability of failure of
the column, if the strength is also assumed to be a normal random variable and independent of
the total load.
c) Compute the reliability index defined by Cornell (i.e. βC) and the probability of failure of the
column, if the strength and total load are negatively correlated normal random variables with a
correlation coefficient of 𝛒 = − 0.6. Explain also what is meant by negative correlation.
d) Assume that the safety level found in part (b) is rated to be inadequate and the design should
be revised to improve the safety level. Assuming all statistical data associated with the total
load T and strength R are the same as you have computed in part (a) and all variables are
mutually uncorrelated, compute the revised mean strength, 𝛍𝐑∗ , in order to have a survival
probability of 0.97 (i.e. failure probability, pf = 0.03).
(Hint: For the definition of the reliability index, βC, please refer to Section 7.4).
Solution:
a) T = L + D + W+ S
E(T) = µT = E(L + D + W+ S) = E(L) + E(D) + E(W) + E(S)


= 70 + 90 + 30 + 20 = 210 kN
VAR(T) = σ²T = VAR(L + D + W + S) = VAR(L) + VAR(D) + VAR(W) + VAR(S)
= 10.5² + 4.5² + 9² + 4² = 110.25 + 20.25 + 81 + 16 = 227.5 kN²
σT = 15.08 kN, δT = 15.08/210 = 0.0718
µR = Median(R) = 336 kN
μR = Median(R) = 336 kN (since R is normal and therefore symmetric)

Pr(R ≤ 532) = Pr[z ≤ (532 − 336)/σR] = 0.99

Since R is normal and Φ⁻¹(0.99) = 2.33, we require (532 − 336)/σR = 2.33. Therefore,

σR = 196/2.33 = 84.12 kN ≅ 84 kN and δR = 84.12/336 = 0.25
2.33

b) σR = 84 kN, VAR(R) = 84² = 7056 kN²


Let M = R – T,
µM = 336 – 210 = 126 kN

σM = √(227.5 + 7056) = √7283.5 = 85.34 kN


βC = 126/85.34 = 1.476
pf = 1 − Φ(1.476) = 1 − 0.9305 = 0.0695 ≅ 0.07

c) µM = 336 – 210 = 126 kN


VAR(M) = VAR(R) + VAR(T) + 2 x (– 0.6) x (1) (–1) x 15.08 x 84
= 7056 + 227.5 + 1520.06 = 8803.56 kN2

𝛔𝐌 = √8803.56 = 93.83 kN
βC = 126/93.83 = 1.343
pf = 1 − Φ(1.343) = 1 − 0.9099 = 0.0901 ≅ 0.09
There is a negative linear dependence between the total load T and the resistance R, implying
that as one increases the other tends to decrease, and vice versa.

d) μM = μ*R − 210

σM = √[(0.25 μ*R)² + 15.08²]


pf = 0.03
Therefore, βC = Φ⁻¹(1 − pf) = Φ⁻¹(1 − 0.03) = Φ⁻¹(0.97) = 1.88


βC = (μ*R − 210) / √[(0.25 μ*R)² + 15.08²] = 1.88

(μ*R − 210)² = 1.88² [(0.25 μ*R)² + 15.08²]

(μ*R)² − 420 μ*R + 44100 = 0.2209 (μ*R)² + 803.75

0.7791 (μ*R)² − 420 μ*R + 43296.25 = 0

μ*R ≅ 400.23 kN
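Part (d) reduces to a quadratic in μ*R, which can be solved directly; the sketch below uses Φ⁻¹(0.97) rather than the rounded 1.88, so the larger root comes out slightly different but still near 400 kN:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p, lo=-8.0, hi=8.0):
    """Inverse standard normal CDF by bisection."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Phi(mid) < p else (lo, mid)
    return 0.5 * (lo + hi)

var_T, delta_R = 227.5, 0.25
beta = Phi_inv(0.97)                 # target reliability index, ~1.88

# solve (mu - 210)^2 = beta^2 * ((delta_R*mu)^2 + var_T) for the larger root
a = 1.0 - (beta * delta_R) ** 2
b = -2.0 * 210.0
c = 210.0 ** 2 - beta ** 2 * var_T
mu_R = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
print(round(beta, 4), round(mu_R, 1))
```

The smaller root of the quadratic is discarded because it would place the mean strength below the value that actually yields βC = 1.88 with the correct sign.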


Chapter 7
BRIEF INFORMATION ON STATISTICS**

7.1 INTRODUCTION AND TYPES OF UNCERTAINTIES

Many real-life problems that engineers deal with are formulated under conditions of uncertainty.
The engineering design of a physical problem may involve natural processes and phenomena that
are inherently random, and the related information may be incomplete, inadequate, or
unsatisfactory for the problem of concern. Therefore, the idealized prototype of the problem
and/or its mathematical model (formulated form) may involve such uncertainties, together with
uncertainties related to imperfections in the modelling and the parameters used. In short,
uncertainties may enter the problem at the input stage (physical uncertainties) and the
modelling phase (model uncertainties), resulting in output uncertainties (statistical
uncertainties). Thus, inevitably, decisions required for planning and design are made under
conditions of uncertainty.
The above-mentioned uncertainties are in general grouped into two, namely: “aleatory” and
“epistemic”.
Aleatory uncertainties (random/stochastic uncertainties) deal with the randomness or predictability
of an event, mostly reflecting external variability in the system. They are uncertainties ascribed to
the physical system and/or environment under consideration. They are irreducible, inherent, and
stochastic. (e.g., wind speed and direction are aleatory (random) uncertainties).
Epistemic uncertainties (parameter uncertainties) reflect the possibility of errors in our general
knowledge. Such uncertainties result from some level of ignorance or incomplete information about
the system or surrounding environment. They are subjective and model forms of uncertainties and
are related to the state of knowledge uncertainty. Since generally, we do not know the correct values
of the parameters in the model constructed, parameter uncertainties are of epistemic type. Epistemic
uncertainties are reducible. (e.g., I believe that the speed of the wind is less than 40 km/hr, but I am
not sure of that).
The output or final uncertainty in the problem solution, over which decisions are to be made, may
appear to be aleatory, but it usually results from both sources, aleatory and epistemic.
Whatever the types are, the effects of uncertainty are important in
both the design and planning of engineering systems and require quantification. For scientific
quantification of uncertainties, engineers use the concepts and methods of probability. On the other
hand, reducing the degree of epistemic uncertainty requires obtaining more information/data via
observations, experiments and records, where the concepts, tools and methods of statistics are
almost indispensable.

7.2 STATISTICS

Statistics is the science and art of collecting, displaying/tabulating/compiling/summarizing and
gaining insight to interpret/understand data obtained from an appropriately constructed experiment
(descriptive statistics) or study to test theories and make inferences/decisions on these theories
(inferential statistics).
(inferential statistics). In short, the scientific approach in statistics first requires an underlying theory
to be tested for which test data will be obtained from relevant observations or experiments. The very
basic point in data analysis is to establish robust mathematical models to organize and gather
information efficiently.
____________________
** This chapter is originally prepared by Dr. Engin Karaesmen within the scope of the undergraduate
course, CE 204 Uncertainty and Data Analysis.
7-1
CE 204 (2022-23 Spring Semester) MSY

Before we proceed to the exercises, we need to define two main concepts: population and sample.

• A population is a well-defined set of distinct objects; in other words, it is the whole set or
group that we are interested in. Usually, we denote the population by the set S. S can be
finite (if so, the population size is usually denoted by N) or infinite in extent.
For several reasons, we may not be able to observe the population totally, but we may be
able to study only a portion of it, called the sample.
• When the observations are numerical values, the population is referred to as a quantitative
population. If the observations are on attributes (type of structure, level of damage, or
similar category) the population is a qualitative population.
• A sample is a subset of S (let it be denoted by A, A= {s1, … ,sn}). Usually, A is a small part
of the population that we draw from S to make observations on it to learn about S. The more
representative sample we collect from the population, the better we may learn about the
population.
• Raw data is the list of observations/ measurements in the sample whose values are not
manipulated at all.
• Statistics is concerned with data. If the population is quantitative, the data set will constitute
numbers. However, if the population is qualitative the observations will be non-numerical
and for a statistical study, numerical data can be artificially created. We call such data
nominal data since the numbers will represent arbitrary codes.
In engineering, most of the time the data we encounter is ratio data, for which the basic
arithmetic operations (addition, subtraction, division and multiplication) are valid for such
data.
For some types of data only addition and subtraction are meaningful; such data are scale-dependent,
as in the case of temperature, and are called interval data (e.g. in Celsius 0o, 20o
and 40o correspond to 32o, 68o and 104o Fahrenheit, respectively). The ratios between
temperatures differ in the two scales, but equal intervals in one scale correspond to equal
intervals in the other. Therefore, temperature is interval data.
Data for which no arithmetic operations are meaningful are called ordinal data. The numbers
in such data represent an ordering relation, in other words a ranking in terms of importance,
preference, strength, etc.

The set of sample elements represented as a real-valued function X(si) = xi defined on
the sample A = {s1, …, sn} is called the data set. The data set may be discrete or continuous
depending on the physical characteristics of the xi. (If your sample is of the number of vehicles or
the type of vehicles crossing a bridge, you have a discrete sample, but if you are interested in the
weights or lengths of cars, the data set will be continuous.)

• The number of elements in the data set is called the sample size and is usually denoted by
n.
• Collecting sample data from a larger population and using it to make predictions and/or
decisions for the entire population is called statistical inference. The efficiency of the
inference lies in selecting a representative and appropriate, in general random, sample.
• A random sample is one obtained by giving every item of the population the same chance
of being chosen as any other item.
• Random variable: Quantities that are measured or observed are termed variables and
because of the inherent randomness, they are called random variables (since the values of
measured or observed quantities depend on chance).
• Continuous Random Variable: Variables that can have any value on a continuous interval.


• Discrete Random Variable: Variables that can have countable isolated numbers (e.g.
integers).
• Distribution refers to the variability pattern of the random variable.

7.3 DESCRIPTIVE STATISTICS

Descriptive statistics consists of methods used to organize, display and analyze data from some
population or sample.

• Data Collection: When the population size is large, it can be time-consuming, expensive or
impractical, and/or impossible to study its set values. In practice, it is, therefore, more
common and usually desirable to study a relatively small fraction of the population, which
we have defined as the sample which is required to be representative of the entire population
(so to be chosen at random).

• Organization of Data: We may organize data in several different ways to see if a pattern
exists for the characteristics of the data.

- One basic way is to list the data in numerical order (ascending or descending order).

When the observations are listed in ascending order, say for a sample of size n, as

{x(1) ≤ x(2) ≤ ... ≤ x(n)}     (1)

the elements of the set in Eq. (1) are called the order statistics and x(i) is called the i'th order
statistic. The i'th order statistic x(i) is the (i − 0.5)/n quantile. Sometimes it may be useful to
obtain the d'th percentile value (Qd) of the ordered data (sorted in ascending form) by the
following formula:

Qd = x(i) + [(n+1)d − i] (x(i+1) − x(i))     (2)

where n is the sample size and the subscript i is the largest integer such that i ≤ (n+1)d.
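As an illustration, Eq. (2) can be coded directly. The following Python sketch is ours, not part of the original notes; the function name and the clamping of i at the ends of the sample are assumptions:

```python
import math

def percentile(data, d):
    """d'th percentile of the data via Eq. (2):
    Q_d = x_(i) + [(n+1)d - i] (x_(i+1) - x_(i))."""
    x = sorted(data)                 # order statistics x_(1) <= ... <= x_(n)
    n = len(x)
    i = math.floor((n + 1) * d)      # largest integer i <= (n+1)d
    i = min(max(i, 1), n)            # keep the 1-based index inside the sample
    if i == n:
        return x[-1]
    return x[i - 1] + ((n + 1) * d - i) * (x[i] - x[i - 1])
```

Applied to the ordered grade data of Class Exercise 1 later in this chapter, this reproduces Q1 = 30, Q2 = 50 and Q3 = 60.5.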

Note that in general, x(i) ≠ xi, and

x(1) = min{x1, ..., xn}
x(n) = max{x1, ..., xn}     (3)

The range of the data is:


r = x(n) – x(1) (4)

The interquartile range (iqr) is the length of the interval that contains the middle half of
the data. The interquartile range is defined as

(iqr) = Q3 - Q1 (5)

where (Q3 − Q1) is called the middle 50 percent range (Q1 and Q3 correspond to the 25th
percentile (lower quartile) and the 75th percentile (upper quartile) of the data, respectively).


Symbolic Representation of Quartile Ranges (each box refers to 25 % of the data): the median
of the lower values is Q1, the overall median is Q2, and the median of the upper values is Q3.

The number of class intervals

- In case n or the range of the existing numeric values is large, or the data come from a continuous
set of numbers, one may arrange the data in categories or in the form of class intervals.
The number of class intervals k is usually chosen to be at least 5 (for small-sized data) but
not more than 20 (for large-sized data), and the intervals are to be non-overlapping. The
frequency estimation in interval form is usually for continuous data, but if the sample
size is large one may prefer to represent a discrete set of data in intervals also.
The practical guide to choosing "k" can be based on the following empirical
mathematical models:
a) rule of thumb: choose the integer closest to √n as k, where n is the sample size.
b) due to Sturges (1926): k = 1 + 3.3 log10 n, where n is the sample size.
c) due to Freedman and Diaconis (1981): k = r n^(1/3) / (2(iqr)), where r is the range, n is the
sample size and (iqr) is the interquartile range of the sample data.

Note that the intervals will be expressed in the form [a, b) (the left end is closed, i.e.
included, and the right end is open, i.e. excluded). a and b are the class boundaries or class
limits, and the class marks (or mid-points) are the mid-points of each class interval.
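The three empirical rules can be compared side by side; here is a small Python sketch (the function name, the rounding to the nearest integer, and returning a dictionary are our own choices):

```python
import math

def n_classes(n, r=None, iqr=None):
    """Empirical choices for the number of class intervals k."""
    rules = {"sqrt": round(math.sqrt(n)),                 # rule of thumb
             "sturges": round(1 + 3.3 * math.log10(n))}   # Sturges (1926)
    if r is not None and iqr is not None:
        # Freedman-Diaconis: k = r * n^(1/3) / (2 * iqr)
        rules["freedman_diaconis"] = round(r * n ** (1 / 3) / (2 * iqr))
    return rules
```

For the grade data of Class Exercise 1 (n = 90, r = 100, iqr = 30), the rules give about 9, 7 and 7 class intervals, respectively.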

The number of observations (frequencies) fi for each number in the data set or for each
class interval can be counted and may be listed as frequency or class frequency in a
tabular form. Such a table will give a full summary of the population frequency
distribution or the sample frequency distribution (or empirical frequency distribution),
depending on whether the data represent the entire population or a portion of it.
Usually, only samples will be available, thus population frequency distribution will
remain unknown.

The above arrangements or organizations of data are often easier to summarize, and make it
easier to draw conclusions from the data, when represented in tabular and especially in
graphical forms.

• Graphical Representation of Data


- The Histogram and Frequency Curve (Frequency Polygon), f̂ (x): A histogram is a bar
chart in which the vertical axis represents the number of observations (frequencies) and
the horizontal axis is for the observed values or corresponding class intervals. When the
data is discrete and not large in size, the plot of the data versus its frequency will be
referred to as a line diagram or bar chart. For large data or continuous data, each bar
showing the frequency is centered over the classmark or as a rectangle over the class
interval to obtain the histogram.


Whenever some frequencies are large, the relative frequency (rel. freq.)

rel. freq. = fj / (Σ fj) = fj / n (or fj / N),   j = 1, 2, …, k

can be used on the vertical axis. The basic shape of a histogram displays the shape of the
frequency distribution.

A frequency polygon is obtained by joining the frequencies of the sample data points or
class marks linearly.

Cumulative frequency (or relative frequency) distribution (ogive), F̂(x): constructed
by adding the frequency (or relative frequency) of each point or class interval of the
data to the frequencies (or relative frequencies) of the lower values or classes. It
gives us the "less than" frequencies of a specific data value or interval.

- Dot Diagram: When the sample size is small and/or data is continuous, one may
sometimes use a dot diagram. One dot per data is placed over the numerical value of the
data represented on the horizontal axis.

- Stem and Leaf Plot: Histograms may not be effective when n < 50 (may not give a clear
indication of variability and other characteristics). In such cases, we may prefer stem and
leaf plots that yield no loss of information since all magnitudes are represented. Such a
plot will also highlight extreme values and other characteristics and can be constructed
easily.

- Box Plot: A representation showing the three quartiles Q1 (lower quartile), Q2 (median) and
Q3 (upper quartile) on a rectangular box; it may be used to indicate the variability
of the data.

- Scatter Diagram: If there are n pairs of data (x1, y1), (x2, y2), …, (xn, yn) a preliminary
indication of correlation between them is obtained by a scatter diagram in which the
horizontal axis is reserved for the independent (or the variable with least uncertainty)
and the vertical axis is for the dependent or uncertain variable.

Several other graphical representations are used in practice, but in engineering bar
charts, histograms, ogives, scatter diagrams, and sometimes stem and leaf plots are the
most commonly preferred visual tools.

You may easily note that graphic displays and frequency tables depend on the size and
number of class intervals chosen (as does the stem and leaf plot). A good graphic description
and display is partly art and partly science. Unless they are accompanied by the
following statistical descriptors, they may not give sound ideas about the sample and
population.

• Numerical Statistical Descriptors

In addition to the numerical descriptors defined above, such as the sample range and quartiles,
several others exist to describe the location and variability characteristics of the data.

In the following, xi represents the data value in the ungrouped data, or the classmark (mid-point
value of the interval) in the grouped data, unless otherwise stated.

- Central Value Measures (Measures of Location):

- Sample Mean (sample average or arithmetic mean): The mean defines
the center of mass of the frequency distribution, and for a set of numerical
data {x1, x2, …, xn} the sample mean x̄ is defined by

x̄ = (Σ xi)/n  or  x̄ = (Σ fj xj)/n     (6)

where fj is the frequency of xj, and xj is either the numerical value of the data
or the classmark (class center) for the grouped data. The arithmetic
mean is easy to find and is unique for a given data set, but it is highly
affected by extreme values present in the data. In such a case the
sample mean may not be a good representative of the data set.

- Harmonic Mean: When one needs to find the average of the reciprocals of a
variable, or if the xi values are very large to get a meaningful x̄, one may
compute the harmonic mean as

x̄h = 1 / [(1/n)(1/x1 + 1/x2 + ... + 1/xn)]     (7)

- Geometric Mean: When averaging values that represent a rate of change one
may use the geometric mean defined as:

x̄g = (x1 x2 ... xn)^(1/n)     (8)

- Median: When the data is ordered, the central value is defined as the median.
The median also corresponds to the second quartile Q2 of the data set (xmed),
so that half of the data is less than the median and the other half is larger
than the median. If the number of data points is odd, the median is the middle data point:

xmed = x((n+1)/2)     (9)

If the number of data points is even, the median is the average of the middle two
data points:

xmed = [x(n/2) + x((n/2)+1)] / 2     (10)

- Mode: That value of the data set, which occurs most frequently, is defined as
the mode of the data (x mod).
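The central value measures above can be computed in a few lines; a Python sketch (the function name is ours; the harmonic and geometric means assume positive data):

```python
import math
from collections import Counter

def central_measures(x):
    """Sample mean, harmonic mean, geometric mean, median and mode."""
    n = len(x)
    mean = sum(x) / n                              # Eq. (6)
    harmonic = n / sum(1 / v for v in x)           # Eq. (7), positive data only
    geometric = math.prod(x) ** (1 / n)            # Eq. (8), positive data only
    s = sorted(x)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    mode = Counter(x).most_common(1)[0][0]         # most frequent value
    return mean, harmonic, geometric, median, mode
```

For example, for the data {1, 2, 2, 4} it returns a mean of 2.25, a median of 2 and a mode of 2.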

One should note that if, instead of the sample, the data covers all population values, we have
population descriptors with N instead of n, and usually the population mean will be denoted by
μ as:

μ = (Σ xi)/N  or  μ = (Σ fi xi)/N     (11)


- Measures of Dispersion (or Spread)

The following quantities measure how far the data spread from a central value (mean or median).

- Mean Absolute Deviation: It is the average distance of the data points from
the central value:

d = (|x1 − x̄| + |x2 − x̄| + ... + |xn − x̄|)/n = (Σ |xi − x̄|)/n  or  (Σ fj |xj − x̄|)/n     (12)

- Variance and Standard Deviation: The variance s² is defined as

s² = (1/n) Σ (xi − x̄)²   for ungrouped data     (13)

and

s² = (1/n) Σ fj (xj − x̄)²   for grouped data     (14)

For reasons that will be explained later, the sample variance (with sample size
n) is modified as

s² = (1/(n−1)) Σ (xi − x̄)²     (15)

and

s² = (1/(n−1)) Σ fj (xj − x̄)²     (16)

for ungrouped and grouped data, respectively.

Computationally, the following forms of the variances defined above are preferable
(since the number of computations is reduced):

s² = (1/n) Σ xi² − x̄²     (17)

and

s² = (1/(n−1)) Σ xi² − (n/(n−1)) x̄²     (18)

The variance is a positive quantity; physically it corresponds to the mass or area
moment of inertia about the centroidal axis (second moment of mass or area).
The square root of the variance is defined as the standard deviation (s) and has the
same units as the data quantities. The standard deviation measures
the spread from the mean value; a small s may correspond to a data set clustered
around the mean, but since it has units, the value of s depends on the magnitudes of the
data values and it may not be easy to judge the true variation in the data set. The
relative variation can be measured by the coefficient of variation.

For the population variance or standard deviation (σ), the deviations are measured
from the population mean μ in Equations 13, 14 or 17.

- Coefficient of Variation: The unitless quantity defined as the ratio of the standard
deviation s to the mean value is a measure of the relative variation of the
data and is called the coefficient of variation:

cov = s / x̄     (19)

If the coefficient of variation is small (approximately less than 0.20 or so), the
spread in the data around the mean is small, and in that case the mean can
represent the data more efficiently.
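Equations (15) and (19) translate directly into code; a minimal sketch (names are ours):

```python
def spread_measures(x):
    """Sample variance (n-1 form, Eq. 15), standard deviation and
    coefficient of variation (Eq. 19) for ungrouped data."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    std = var ** 0.5
    return var, std, std / mean        # the cov is unitless
```

For example, for the data {2, 4, 6, 8} the sample variance is 20/3 ≈ 6.67.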

There are also other measures of variability (including higher-order moments of the
deviations from the central values); some of them are:

- Standard Error of the Mean is defined as

s.e.(x̄) = s/√n     (20)

- Coefficient of Skewness: A coefficient that measures the symmetry/asymmetry
of the distribution, defined as

g1 = (Σ (xi − x̄)³)/(n s³)   (for ungrouped data)

g1 = (Σ fj (xj − x̄)³)/(n s³)   (for grouped data)     (21)

If the coefficient of skewness is positive, it implies that the long tail of the
distribution is on the right-hand side, and so on.

- Coefficient of Kurtosis: This coefficient is a measure of the peakedness of the
distribution, defined as

g2 = (Σ (xi − x̄)⁴)/(n s⁴)   (ungrouped data)

g2 = (Σ fj (xj − x̄)⁴)/(n s⁴)   (grouped data)     (22)

A small coefficient of kurtosis implies that the tail weight of the distribution is small
and the data has a peak.
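The two shape coefficients for ungrouped data can be sketched as follows (the helper computes s with the n − 1 definition of Eq. 15; the function name is ours):

```python
def shape_measures(x):
    """Sample coefficients of skewness g1 and kurtosis g2 (Eqs. 21-22)
    for ungrouped data."""
    n = len(x)
    mean = sum(x) / n
    s = (sum((v - mean) ** 2 for v in x) / (n - 1)) ** 0.5
    g1 = sum((v - mean) ** 3 for v in x) / (n * s ** 3)
    g2 = sum((v - mean) ** 4 for v in x) / (n * s ** 4)
    return g1, g2
```

A symmetric data set gives g1 = 0; the heavier the tails are relative to s, the larger g2 becomes.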

- Covariance: When dealing with pairs of data (X and Y) as (x1 , y1), (x2 , y2),
…, (xn , yn) it is usually necessary to observe how two sets of data vary together.
The measure for the common variation of the sample is given by the sample
covariance sXY.

sXY = (1/n) Σ (xi − x̄)(yi − ȳ) = (Σ xi yi)/n − x̄ ȳ

or

sXY = (1/(n−1)) Σ (xi − x̄)(yi − ȳ) = (Σ xi yi − n x̄ ȳ)/(n − 1)     (23)

- Correlation Coefficient: The correlation coefficient expresses numerically the strength of the
linear relation between X and Y, whereas on scatter diagrams the general trend of the
relation may only be observed roughly. Negative values refer to negative slopes and
positive values refer to positive slopes of the relation (−1 ≤ rXY ≤ 1). For a sample,
the correlation coefficient rXY is

rXY = sXY/(sX sY) = (Σ xi yi − n x̄ ȳ) / √[(Σ xi² − (1/n)(Σ xi)²)(Σ yi² − (1/n)(Σ yi)²)]     (24)

The square of the correlation coefficient gives the degree of tightness for a linear fit of
the data set and is called the coefficient of determination.
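Equations (23) and (24) can be checked with a short sketch (the function name is ours):

```python
def sample_corr(x, y):
    """Sample covariance (n-1 form, Eq. 23) and correlation
    coefficient (Eq. 24) for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = (sum((a - mx) ** 2 for a in x) / (n - 1)) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / (n - 1)) ** 0.5
    return sxy, sxy / (sx * sy)
```

For the data of Class Exercise 2 later in this chapter, this reproduces sXY = −0.260 and rXY = −0.335.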

• Outliers
Sometimes extreme values exist in the data, which highly affect the mean value. To detect
such outliers we may use interquartile ranges or z-scores of the data:

If a data point is more than approximately 1.5 (iqr) (where (iqr) is the interquartile range
as defined previously) away from the nearest quartile end, the data point is considered an
outlier, and if it is approximately 3.0 (iqr) away it is an extreme outlier.

MEAN AND STANDARD DEVIATION ARE GOOD DESCRIPTORS OF THE
SAMPLE IN CASE THERE ARE NO OUTLIERS IN THE DATA.

Z-Score: By definition, the z-score of a data point is the transformed variable z:

z = (x − x̄)/s     (25)

For data that are approximately normally distributed, about 68% of the data lie within one
standard deviation of the sample mean and about 95% lie within two standard deviations.
Values falling outside such ranges may be considered outliers.

When outliers are present in the data one may prefer to use trimmed means to obtain more
robust means. In the ordered data, T% of the observations (potential outliers) are removed
from each end, and then the sample mean of the remaining numbers is calculated. The
resulting mean is the T% trimmed mean, and it generally lies between the sample mean and
the sample median.
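Both outlier screens and the trimmed mean are easy to code; a Python sketch (the function names and the truncation of the trim count are our own choices):

```python
def iqr_outliers(data, q1, q3):
    """Flag points beyond 1.5*iqr (outliers) and 3.0*iqr (extreme
    outliers) from the quartile ends."""
    iqr = q3 - q1
    mild = [v for v in data if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
    extreme = [v for v in data if v < q1 - 3.0 * iqr or v > q3 + 3.0 * iqr]
    return mild, extreme

def trimmed_mean(data, t):
    """T% trimmed mean: drop the lowest and highest fraction t, then average."""
    x = sorted(data)
    k = int(len(x) * t)                 # number trimmed from each end
    kept = x[k:len(x) - k] if k else x
    return sum(kept) / len(kept)
```

For example, a 20% trim of {1, 2, 3, 4, 100} drops 1 and 100, giving a trimmed mean of 3 instead of the heavily pulled sample mean of 22.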


7.4 EXERCISES

NOW PLEASE GO OVER THE FOLLOWING TWO EXERCISES AND MAKE THE
NECESSARY CORRECTIONS AND COMMENTS.

Class Exercise 1

Data on Grades of Students: n = 90

45 50 35 95 60 70 55 95 43 65 60 58 75 62 65 90 95 60 75 60 30 100
55 50 60 60 35 60 35 53 60 55 45 85 95 50 55 69 45 45 25 55 43 26
30 21 50 55 17 30 35 25 23 27 20 07 20 55 15 21 13 30 30 38 15 40
50 75 80 80 75 85 40 55 60 55 85 65 65 47 41 28 35 36 25 23 30 40
55 13

- Let us arrange the above raw data in ascending order:

07 13 13 15 15 17 20 20 21 21 23 23 25 25 25 26 27 28 30 30 30 30
30 30 35 35 35 35 35 36 38 40 40 40 41 43 43 45 45 45 45 47 50 50
50 50 50 53 55 55 55 55 55 55 55 55 55 55 58 60 60 60 60 60 60 60
60 60 62 65 65 65 65 69 70 75 75 75 75 80 80 85 85 85 90 95 95 95
95 100

The data can be summarized by a FREQUENCY TABLE using class intervals (which shows how
many data points are around the midpoint value of the interval). Note that in this example we chose
11 groups from 0 to 110 to have a nice interval size (as 10 here) and the smallest and largest values
of the data are included in the first and last intervals, respectively.

Class Interval   Class Mark (Mid-point)   Frequency   Relative Frequency   Cumulative Relative Frequency   Z-score

0-10          5     1    0.011   0.011   -2.063
10-20        15     5    0.056   0.067   -1.618
20-30        25    12    0.133   0.200   -1.173
30-40        35    13    0.144   0.344   -0.727
40-50        45    11    0.122   0.467   -0.282
50-60        55    17    0.189   0.656    0.163
60-70        65    15    0.167   0.822    0.609
70-80        75     5    0.056   0.878    1.054
80-90        85     5    0.056   0.933    1.499
90-100       95     5    0.056   0.989    1.944
100-110     105     1    0.011   1.000    2.390

a) We may use one of the following graphical representations to see the frequency variations
of the data:


a.1) DOT PLOT (one student per dot)



• •
• •
• •
• • •
• • • • •
• • • • • • • • •
• • • • • • • • • • • •
• • • •• • • • • •• • • • • • • • •
• • • • • •• ••• • • •• •• •• •• • • • • • • •• • • • • • •
0 10 20 30 40 50 60 70 80 90 100 (grades)

a.2) HISTOGRAM / BAR CHART (and CUMULATIVE FREQUENCY DIAGRAM)

In this example, we draw the following histograms from the data given in the above table.
A histogram or frequency polygon may have the vertical frequency axis as the relative
frequencies. The basic shape of the plots will not change but relative frequencies may give a
better overall measure of the number of occurrences of the grades. The relative frequency
polygons show us the shape of the distribution of the variable in our data set.

[Histogram: frequency (0 to 20) on the vertical axis versus grades on the horizontal axis, with bars centered over the class marks 5, 15, …, 105]

The cumulative relative frequency diagram of the data (we assume the tabular form of the data
is given) is as follows:

[Frequency Polygon: frequency (0 to 20) versus grades (0 to 110). Cumulative Relative Frequency Diagram: cumulative relative frequency (0 to 1) versus grades (0 to 100)]


a.3) STEM and LEAF DIAGRAM


The stem is a column of leading digits and the leaves show the trailing digits of the data values.

Stem Leaves
0 7
10 3 3 5 5 7
20 0 0 1 1 3 3 5 5 5 6 7 8
30 0 0 0 0 0 0 5 5 5 5 5 6 8
40 0 0 0 1 3 3 5 5 5 5 7
50 0 0 0 0 0 3 5 5 5 5 5 5 5 5 5 5 8
60 0 0 0 0 0 0 0 0 0 2 5 5 5 5 9
70 0 5 5 5 5
80 0 0 5 5 5
90 0 5 5 5 5
100 0

a.4) BOX DIAGRAM

We divide the ordered data into four quartiles and observe how different the extreme groups
are. Interquartile range is defined as, IQR = Q3 – Q1.
i) Assuming only grouped data is available (that is using the table) Q1 = 35, Q2 = 55
(corresponds to median) and Q3 = 65 ; IQR = 65-35 = 30.

ii) If the original ordered data is available:


• i ≤ (90 +1)x 0.25 → i = 22 → Q0.25 = Q1 = 30 +[(90+1)x 0.25- 22](30-30) = 30
• i ≤ (90 +1)x 0.50 → i = 45 → Q0.50 = Q2 = 50 +[(90+1)x 0.5- 45](50-50) = 50
• i ≤ (90 +1)x 0.75 → i = 68 → Q0.75 = Q3 = 60 +[(90+1)x 0.75- 68](62-60) = 60.5
IQR: 60.5-30=30.5

Note that we don’t have any outliers (small or large) in the SGD.

[Box plot over the data range 0 to 100: Q1 = 35 (30), Q2 = 55 (50), Q3 = 65 (60.5); the values in parentheses are obtained from the original ungrouped data]

b) The numerical descriptors of the data are:

• SAMPLE MEAN (ARITHMETIC AVERAGE OR AVERAGE)

Assuming only grouped data is available:

x̄ = (Σ fj xj)/n = (1×5 + 5×15 + ... + 1×105)/90 = 4620/90 = 51.333

If we use the ordered original (ungrouped) data: x̄ = (Σ xi)/n = 4464/90 = 49.6


• MEDIAN: xmed = 55 (from the table). From the ordered (original, ungrouped) data the median is
x̃ = (x(45) + x(46))/2 = (50 + 50)/2 = 50.
2
• 60th percentile of the data from the ordered, original form: i ≤ (90 +1)×0.60 → i = 54
Q0.60 = 55 + [(90+1)×0.60 − 54](55 − 55) = 55

• MODE: xmod = 55 (From the table). The original data mode is 55 also.

• SAMPLE VARIANCE: s² for the grouped data

s² = (Σ fj xj²)/(n−1) − (n/(n−1)) x̄² = 282050/89 − (90/89)(51.3333)² = 504.382

or

s² = (Σ fj (xj − x̄)²)/(n−1) = 44890/89 = 504.382

If we use the original data: s² = (Σ (xi − x̄)²)/(n−1) = 44171.6/89 = 496.310

• STANDARD DEVIATION: s = 22.458 (from the original data: s = 22.278)

• COEFFICIENT OF VARIATION: cov = 22.458/51.333 = 0.437 (the spread is quite large;
though we don't have any outliers, the mean and standard deviation may still not be good
summary statistics).

• COEFFICIENT OF SKEWNESS:

g1 = (Σ fj (xj − x̄)³)/(n s³) = 0.244 ( > 0, implying that the tail of the frequency distribution
is longer on the positive side)

• COEFFICIENT OF KURTOSIS:

g2 = (Σ fj (xj − x̄)⁴)/(n s⁴) = 2.400 (a sufficiently small number, implying some
peakedness in the distribution)

• STANDARD ERROR OF THE MEAN:

s.e.(x̄) = s/√n = 22.458/√90 = 2.367


• STANDARD z- SCORES: If you observe the z-scores on the table, approximately 75%
of the data lies within one and 95 % lies within two standard deviations from the mean.
• NUMBER OF CLASS INTERVALS: Note that we may decide on the number of class
intervals (nc) as a rule of thumb between 5 and 15 depending on the size of the data, or
from nc = √n = √90 = 9.49, or nc = 1 + 3.3 log10 n = 1 + 3.3 log10 90 = 7.45, or
nc = r n^(1/3)/(2(iqr)) = (105 − 5)(90)^(1/3)/(2(65 − 35)) = 7.47, so the choice of nc as 11 is
not a bad preference.
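The ungrouped-data values quoted in parentheses above can be re-checked directly from the raw grades; a short Python verification (variable names are ours):

```python
grades = [45, 50, 35, 95, 60, 70, 55, 95, 43, 65, 60, 58, 75, 62, 65, 90, 95,
          60, 75, 60, 30, 100, 55, 50, 60, 60, 35, 60, 35, 53, 60, 55, 45, 85,
          95, 50, 55, 69, 45, 45, 25, 55, 43, 26, 30, 21, 50, 55, 17, 30, 35,
          25, 23, 27, 20, 7, 20, 55, 15, 21, 13, 30, 30, 38, 15, 40, 50, 75,
          80, 80, 75, 85, 40, 55, 60, 55, 85, 65, 65, 47, 41, 28, 35, 36, 25,
          23, 30, 40, 55, 13]

n = len(grades)                                          # 90 students
mean = sum(grades) / n                                   # 4464/90 = 49.6
var = sum((g - mean) ** 2 for g in grades) / (n - 1)     # 44171.6/89 = 496.31
std = var ** 0.5                                         # 22.278
print(n, round(mean, 1), round(var, 2), round(std, 3))
```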

CLASS EXERCISE 2:

Given the following two sets of data {X,Y}:


X: 15.5 15.6 15.1 15.3 15.3 15.0
Y: 12.3 13.3 11.3 19.3 17.3 18.3
• Scatter Plot: [scatter diagram of the (x, y) pairs; y from 0 to 25 on the vertical axis,
x from 14.8 to 15.8 on the horizontal axis]

• Numerical Descriptors:
x̄ = 15.3,  s²x = 0.052,  sx = 0.228,  vx = 0.015
ȳ = 15.3,  s²y = 11.6,  sy = 3.406,  vy = 0.223
Note that the two data sets have the same mean but quite different coefficients of
variation!

• COVARIANCE

sXY = (Σ (xi − x̄)(yi − ȳ))/(n − 1) = (Σ xi yi − n x̄ ȳ)/(n − 1) = −0.260

• CORRELATION COEFFICIENT

rXY = sXY/(sX sY) = −0.260/(0.228 × 3.406) = −0.335

(Negative correlation implies that y decreases as x increases; a correlation coefficient of
−0.335 gives rXY² = 0.112, which implies that the strength or degree of linear
relationship between X and Y is quite low.)

Chapter 8
SOME BASIC CONCEPTS OF STATISTICAL INFERENCE

8.1. RANDOM SAMPLING AND SAMPLING DISTRIBUTIONS

Population: A population consists of the "whole" of the observations or events with which we are
concerned. The probability distribution of the random variable is determined based on the observations
sampled from the population.

Sample: A sample is a subset of observations or events that are selected from the population. The
random variables {x1, x2, …, x n} form a simple random sample of size n if all xi’s are statistically
independent random variables with the same probability distribution.

Statistic: Any function of the observations in a random sample is called a statistic. The probability
distribution of a statistic is called a sampling distribution. For example, the probability distribution of
X̄ is the sampling distribution of the mean.

Statistical Inference: Once the probability distributions related to events are known, one can identify
their uncertainties numerically (probability measure). The estimated probabilities are functions of the
parameter or parameters of the probability distribution (e.g. μ, σ for the normal and ν for the Poisson
distribution, λ, ξ for the lognormal distribution, etc.) These parameters are estimated from observational
data and are used to make estimations and generalizations on population characteristics by statistical
inference techniques. These techniques are either in the form of point or interval estimation of the
parameters or in the hypothesis testing form, both of which are based on the sample data. In short,
statistical inference can be defined as: the estimation of the statistical characteristics of a population
by using the sample data obtained from that population.

8.2. ESTIMATION

8.2.1. Point Estimation

Point Estimate: Single numerical value θ̂ (theta hat) obtained from a random sample to estimate the
population parameter θ is called a point estimate. Point estimators must have some desirable
properties such as unbiasedness, efficiency, consistency and sufficiency. These desirable properties are
briefly explained in the following.

Unbiasedness: The point estimator, θ̂, is unbiased if E(θ̂) = θ.

For example, X̄ is an unbiased estimator of μ, whereas the mean squared deviation (MSD), defined as
MSD = (1/n) Σ (xi − X̄)², is a biased estimator of σ². On the other hand, the sample variance

s² = (1/(n−1)) Σ (xi − X̄)²

is an unbiased estimator of σ².
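The bias of the MSD can be illustrated numerically. The following Monte Carlo sketch is ours (the seed, sample size and trial count are arbitrary choices); it samples from a standard normal population, so σ² = 1 and E[MSD] = (n − 1)/n = 0.8 for n = 5:

```python
import random

# Average the two estimators over many repeated samples: msd_avg settles
# near (n-1)/n = 0.8, while the (n-1) sample variance settles near 1,
# illustrating the bias of MSD.
random.seed(7)
n, trials = 5, 20000
msd_avg = s2_avg = 0.0
for _ in range(trials):
    x = [random.gauss(0, 1) for _ in range(n)]
    m = sum(x) / n
    ss = sum((v - m) ** 2 for v in x)
    msd_avg += ss / n / trials          # biased estimator
    s2_avg += ss / (n - 1) / trials     # unbiased estimator
```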

Efficiency: Among the unbiased point estimators of θ, the one with minimum variance is called the
most efficient estimator of θ. Note that the variance of such an estimator typically decreases in
inverse proportion to the sample size (e.g. VAR(X̄) = σ²/n). X̄ is an efficient estimator of μ.

Consistency: Consistent point estimators should satisfy the following probabilistic requirement:
Pr [| θ̂ ‒ θ| < ε] → 1 as n → ∞ where ε is a small number.


For example, X̄ is a consistent estimator of μ, since σX̄ = σ/√n → 0 as n → ∞.

Sufficiency: Let {x1, x2,…, xn} be a simple random sample with n observations from a population having
a probability distribution with an unknown parameter θ. Then any statistic T= f (x1, x2,…, xn) is said
to be a sufficient statistic for the estimation of θ, if the joint distribution of {x1, x2,…, xn} conditional
on the statistic T is independent of θ.

Standard Error: The standard error of an estimator θ̂ is defined as its standard deviation:

σθ̂ = √VAR(θ̂).

Here, VAR(.) stands for variance.

The standard error of the sample mean X̄ is:

σx̄ = σ/√n   for n ≤ 0.1 N.

If the population size N is finite and n > 0.1 N, then

σx̄ = (σ/√n) √((N − n)/(N − 1))

or, if σ is unknown, the standard error of the sample mean X̄ can be estimated based on the
sample standard deviation, s, as follows:

σ̂x̄ = s/√n.

Note that the constraint (n ≤ 0.1 N) corresponds to (n << N), i.e. the sample size n is much smaller
than the population size N; and (n > 0.1 N) is the opposite case.
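The two cases can be combined in one helper; a sketch (the function name and the optional N argument are our own choices):

```python
def se_mean(s, n, N=None):
    """Standard error of the sample mean, with the finite-population
    correction applied when the sample exceeds about 10% of the population."""
    se = s / n ** 0.5
    if N is not None and n > 0.1 * N:
        se *= ((N - n) / (N - 1)) ** 0.5   # finite-population correction
    return se
```

For example, se_mean(10, 25) gives 2.0; with N = 100 the finite-population correction reduces it to about 1.74.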

Mean Square Error (MSE): The mean square error associated with an estimator θ̂ of the parameter
θ is defined as MSE(θ̂) = E[(θ̂ − θ)²].

Central Limit Theorem: Consider a population having mean μ and standard deviation σ. Let X̄ be the
mean of n statistically independent random observations taken from this population. Then as n → ∞,
the sampling distribution of X̄ tends to a normal distribution with mean μ and standard deviation
σx̄ = σ/√n (or, loosely, Σxi approaches a normal distribution as n → ∞). So the statistic

z = (x̄ − μ)/(σ/√n)

approaches the standard normal distribution; in other words, for large n the sampling distribution
of X̄ is approximately normal, and is fully specified if σ is known.
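The theorem can be illustrated by simulation; this sketch is ours (the seed, n and trial count are arbitrary) and averages uniform(0, 1) draws, for which μ = 0.5 and σ = √(1/12) ≈ 0.289:

```python
import random
import statistics

# Means of n uniform(0, 1) draws cluster around mu = 0.5 with a spread
# close to sigma/sqrt(n), as the Central Limit Theorem predicts.
random.seed(1)
n, trials = 30, 2000
means = [statistics.fmean(random.random() for _ in range(n))
         for _ in range(trials)]
mu_hat = statistics.fmean(means)     # close to 0.5
se_hat = statistics.stdev(means)     # close to (1/12)**0.5 / sqrt(30)
```

With n = 30, the spread of the simulated means comes out close to σ/√n ≈ 0.053.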

Corollary: The probability distribution of the number of successes x of a binomial distribution with
parameters n and p, given below:

p(x | n, p) = C(n, x) pˣ (1 − p)ⁿ⁻ˣ,   where C(n, x) is the binomial coefficient,


tends to a normal distribution with mean np and standard deviation √(np(1 − p)). Accordingly, the
statistic z = (x − np)/√(np(1 − p)), or the proportion of successes (or failures) in a binomial
distribution, can be described by a normal distribution with the statistic

z = (p̂ − p)/√(p̂(1 − p̂)/n)

8.2.2. Interval Estimation: Confidence Intervals

Interval Estimation: In interval estimation, instead of using a single value to estimate θ, we construct
a range, which includes θ with a certain probability level to express the degree of uncertainty associated
with the point estimate.

Confidence Intervals: A 100(1 ‒ α) percent confidence interval (level of confidence) for the parameter θ is an interval of one of the forms given below, introduced to express the degree of confidence (belief) in terms of α, where 0 < α < 1. If such intervals were constructed from repeated samples taken from the same population, about 100(1 ‒ α) percent of them would actually include θ. In the following interval definitions, the parameter θ is an unknown constant, but the estimator θ̂ is a random variable.

a) Two-sided confidence interval for the parameter θ:

Pr [L2 < θ < U2] = 1 ‒ α

where, L2 and U2 are the lower and upper confidence limits, respectively (for the two-sided confidence
interval).

Figure 8.1 Two-sided confidence interval for the parameter θ (area 1 − α between L2 and U2, with α/2 in each tail)

b) Lower one-sided confidence interval for the parameter θ:

Pr [L1 < θ] = 1 ‒ α

where, L1 is the lower confidence limit (for the lower one-sided confidence interval).

Figure 8.2 Lower one-sided confidence limit for the parameter θ (area 1 − α to the right of L1, with α in the left tail)


c) Upper one-sided confidence interval for the parameter θ:

Pr [θ < U1] = 1 ‒ α

where, U1 is the upper confidence limit (for the upper one-sided confidence interval).

Figure 8.3 Upper one-sided confidence limit for the parameter θ (area 1 − α to the left of U1, with α in the right tail)

8.3. SAMPLING DISTRIBUTIONS FOR SELECTED STATISTICS

The following are the sampling distributions that are used for the sample mean, sample variance, sample
proportion and for the goodness of fit test.

a) If θ = μ, we use the standard normal distribution with the statistic z = (x̄ − μ)/(σ/√n) for cases where σ of the population is known.

b) If θ = μ, we use the standard normal distribution with the statistic z = (x̄ − μ)/(s/√n) for cases where σ of the population is unknown but the sample size n is sufficiently large (n ≥ 30). Here, s denotes the sample standard deviation.

c) If θ = μ, we use Student's t distribution with the statistic t = (x̄ − μ)/(s/√n) with (n ‒ 1) degrees of freedom (d.f., r or ν), where σ of the population is unknown and n < 30. The degrees of freedom equal n ‒ 1. The simple explanation for this is: from the n observations x1, x2,…, xn we use "one degree of freedom" to compute X̄, leaving (n ‒ 1) independent observations.

d) If θ = σ, we use the Chi-square (χ²) distribution with the statistic χ² = (n − 1)s²/σ² with (n – 1) degrees of freedom.

e) If θ = p, where p is the probability of occurrence of an event in a trial (parameter of the Binomial distribution; probability of success), or in other words, the estimate p̂ of p is the proportion of occurrences among a sequence of n trials, then the statistic z = (p̂ − p)/√[p̂(1 − p̂)/n] has a standard normal distribution when n is sufficiently large.

f) If θ = μ1 ‒ μ2, the statistic t = [x̄1 − x̄2 − (μ1 − μ2)]/√(s1²/n1 + s2²/n2) has a t distribution with n1 + n2 – 2 degrees of freedom if X1 and X2 are independent.

g) If the goodness of fit of a probability distribution model to the observed data is to be tested: the statistic χ² = ∑ᵢ₌₁ᵏ (f_oi − f_ei)²/f_ei has a Chi-square (χ²) distribution with degrees of freedom r = k – u – 1, where k is the number of independent outcomes or number of groups and u is the number of unknown parameters estimated from the sample data. More information on the Chi-square goodness of fit test will be provided later.

The above information is summarized in the following table:

8.4. CONFIDENCE INTERVALS FOR SELECTED POPULATION PARAMETERS

8.4.1. Confidence Interval for the Population Mean, µ (σ known and population is normally
distributed)

A (1 ‒ α)100 percent confidence interval for µ is:

µ: X̄ ± zα/2 σ/√n

where X̄: sample mean; n: sample size; σ: standard deviation of the population. If σ is unknown but the sample size n ≥ 30, assume σ ≅ s, where s is the standard deviation of the sample.

Additional Comments:

i) If X̄ is used as an estimator of µ, we can be (1 ‒ α)100 percent confident that the error, e, will be less than zα/2 σ/√n.

ii) If X̄ is used as an estimator of µ, we can be (1 ‒ α)100 percent confident that the error will be less than a specified amount e* when the sample size n ≥ [zα/2 σ/e*]².


Table 8.1 Statistical Parameters and Governing Distributions

Parameter | Test Statistic | Degrees of Freedom | Sampling Distribution
μ | z_cr = (x̄ − μ0)/(σ/√n) | – | Standard Normal (σ known)
μ | z_cr = (x̄ − μ0)/(s/√n) | – | Standard Normal (σ unknown and n > 30*)
μ | t_cr = (x̄ − μ0)/(s/√n) | n ‒ 1 | Student's t (σ unknown and n < 30)
σ² | χ² = (n − 1)s²/σ² | n ‒ 1 | Chi-Square
μ1 ‒ μ2 | t_cr = [x̄1 − x̄2 − (μ1 − μ2)]/√(s1²/n1 + s2²/n2) | n1 + n2 ‒ 2 | Student's t
p (x: total number of successes; p: success ratio) | z_cr = (x − np)/√[np̂(1 − p̂)] = (p̂ − p)/√[p̂(1 − p̂)/n] | – | Standard Normal
Goodness of Fit | χ² = ∑ᵢ₌₁ᵏ (f_oi − f_ei)²/f_ei | k ‒ u ‒ 1 | Chi-Square

(*) In the book, instead of n > 30, n > 50 is given. However, please use the n > 30 recommendation.

Given the above information, the following two-sided confidence intervals for various population
parameters can be stated:

8.4.2. Confidence Interval for the Population Mean, µ (σ unknown, n < 30 and population is
normally distributed)

A (1 ‒ α)100 percent confidence interval for µ is:

µ: X̄ ± tα/2,ν s/√n

where X̄: sample mean; n: sample size; s: standard deviation of the sample; ν: degrees of freedom, equal to (n ‒ 1) in this case; tα/2,ν: value of the t distribution with ν = n ‒ 1 degrees of freedom leaving an area of α/2 on the right tail of the distribution. The t values are obtained from the t-table based on the value of α/2 and the degrees of freedom (i.e. ν = n ‒ 1).
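Both the z-based interval of Section 8.4.1 and the t-based interval above share the form X̄ ± (critical value)·s/√n, so one small helper covers both; a sketch (the function name is invented; the critical values 1.96 and 2.776 are the table values used in Example 8.4 further below):

```python
import math

def ci_mean(xbar, s, n, crit):
    """Two-sided CI for the mean: xbar +/- crit * s / sqrt(n).
    Pass crit = z_{alpha/2} (sigma known, or n >= 30) or
    crit = t_{alpha/2, n-1} (sigma unknown, n < 30)."""
    half = crit * s / math.sqrt(n)
    return xbar - half, xbar + half

lo, hi = ci_mean(600, 40, 81, 1.96)   # z case, as in Example 8.4
print(round(lo, 2), round(hi, 2))     # 591.29 608.71
lo, hi = ci_mean(600, 40, 5, 2.776)   # t case, t_{0.025,4} = 2.776
print(round(lo, 1), round(hi, 1))
```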


8.4.3. Confidence Interval for the Difference of Two Population Means (µ1 ‒ µ2)
(σ1 and σ2 are known or n1 ≥ 30 and n2 ≥ 30 and the population is normally distributed)

A (1 ‒ α)100 percent confidence interval for (µ1 ‒ µ2) is:

(µ1 ‒ µ2): (X̄1 − X̄2) ± zα/2 √(σ1²/n1 + σ2²/n2)

where X̄i and σi: sample mean and population standard deviation corresponding to the ith population, respectively; ni: size of the sample obtained from the ith population.

8.4.4. Confidence Interval for the Difference of Two Population Means (µ1 ‒ µ2)
(σ1 and σ2 are unknown and n1 < 30 and n2 < 30 and the population is normally
distributed)

A (1 ‒ α)100 percent confidence interval for (µ1 ‒ µ2) is:

(µ1 ‒ µ2): (X̄1 − X̄2) ± tα/2,ν sp √(1/n1 + 1/n2)

where X̄i and si: sample mean and sample standard deviation corresponding to the ith population, respectively; ni: size of the sample obtained from the ith population; ν: degrees of freedom = n1 + n2 – 2; sp²: pooled variance, expressed as follows:

sp² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2)

Note that the pooled variance is obtained by combining (pooling) the two sample data sets.

In the cases of Sections 8.4.2 and 8.4.4, where the population variances are unknown and the sample sizes are less than 30, the sample variances si² are used instead of the population variances σi². To compensate for the additional uncertainty this assumption creates, the zα/2 values are replaced by the tα/2,ν values, which yield wider confidence intervals.
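A sketch of the pooled-variance interval (the function name is invented; the data are those of Example 8.6 further below):

```python
import math

def ci_diff_means_pooled(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2 with unknown but equal variances: the pooled
    variance sp^2 combines the two samples, and t has n1 + n2 - 2 d.f."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    half = t_crit * math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) - half, (x1 - x2) + half

# Data of Example 8.6 with t_{0.025,5} = 2.5706
lo, hi = ci_diff_means_pooled(74, math.sqrt(132.67), 4,
                              60, math.sqrt(93), 3, 2.5706)
print(round(lo, 2), round(hi, 2))  # -7.22 35.22
```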

8.4.5. Confidence Interval for the Variance (𝛔𝟐 ) of a Normal Population

Definition: If s² is the variance of a random sample of size n taken from a normal population having variance σ², then χ² = (n − 1)s²/σ² is the value of a Chi-square (χ²) distribution having ν = n ‒ 1 degrees of freedom. Accordingly,

A (1 ‒ α)100 percent confidence interval for σ² of a normal population is:

(n − 1)s²/χ²α/2,ν ≤ σ² ≤ (n − 1)s²/χ²(1−α/2),ν

where s²: sample variance; n: sample size; ν: degrees of freedom = n – 1; χ²α/2,ν: the χ² value leaving an area of α/2 to the right; χ²(1−α/2),ν: the χ² value leaving an area of (1 − α/2) to the right. These χ² values are obtained from the χ² tables. Depending on the type of table, the χ²α/2,ν and χ²(1−α/2),ν values may be interchanged. The key point is: of the two χ² values, the larger one should be placed in the denominator of the lower bound of the confidence interval.
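A sketch of this interval (the function name is invented; the χ² table values are hard-coded from Example 8.7 further below, since the notes read them from tables rather than computing them):

```python
def ci_variance(s2, n, chi2_upper_tail, chi2_lower_tail):
    """CI for sigma^2: (n-1)s^2/chi2_{alpha/2} <= sigma^2 <=
    (n-1)s^2/chi2_{1-alpha/2}; the larger chi2 value goes in the
    denominator of the lower bound, as noted above."""
    return (n - 1) * s2 / chi2_upper_tail, (n - 1) * s2 / chi2_lower_tail

# Example 8.7: n = 11, s^2 = 3.6, chi2_{0.025,10} = 20.48, chi2_{0.975,10} = 3.25
lo, hi = ci_variance(3.6, 11, 20.48, 3.25)
print(round(lo, 2), round(hi, 2))  # 1.76 11.08
```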

8.4.6 Confidence Interval for the Proportion (p)

Let p denote the population proportion and p̂ the sample proportion.

A (1 ‒ α)100 percent confidence interval for p (large sample size) is:

p: p̂ ± zα/2 √[p̂(1 − p̂)/n]

and a (1 ‒ α)100 percent confidence interval for p (small sample size, n < 30) is:

p: p̂ ± tα/2,ν √[p̂(1 − p̂)/n]

For this case, a strictly conservative (1 ‒ α)100 percent confidence interval for p is obtained by setting p̂ = 1/2, which is the value that maximizes the term √[p̂(1 − p̂)/n]. Accordingly, this strictly conservative confidence interval becomes:

p: p̂ ± zα/2 √[1/(4n)]
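A sketch of the large-sample interval (the function name is invented; the numbers reproduce Example 8.8 further below):

```python
import math

def ci_proportion(x, n, z):
    """Large-sample CI for p: p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# Example 8.8: 160 positives among n = 500, 95% confidence (z = 1.96)
lo, hi = ci_proportion(160, 500, 1.96)
print(round(lo, 3), round(hi, 3))  # 0.279 0.361
```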

8.5. METHOD OF MAXIMUM LIKELIHOOD ESTIMATION

This is quite a popular method in statistics used for point estimation. Brief information is provided in
the following and a more detailed explanation, together with an example, is given in the Appendix.

Maximum Likelihood Function (L): If {x1, x2,…, xn} are n statistically independent random observations from the same population with pdf f(x; θ1, θ2,…, θk), where θ1, θ2,…, θk are the k parameters to be estimated, then the joint probability density function of these n independent observations is the following likelihood function, L:

L(x1, x2,…, xn; θ1, θ2,…, θk) = f(x1; θ1,…, θk) f(x2; θ1,…, θk) … f(xn; θ1,…, θk) = ∏ᵢ₌₁ⁿ f(xi; θ1,…, θk)

Maximum Likelihood Estimator: The maximum likelihood point estimators θ̂j are the values that maximize the likelihood function L(x1,…, xn; θ1,…, θk).


8.6. SOLVED EXAMPLES

̅ is an unbiased estimator of µ.
Example 8.1. Check whether X

Solution:

E(X̄) = E[(1/n)(X1 + X2 + … + Xn)]

     = (1/n)[E(X1) + E(X2) + … + E(Xn)]

     = (1/n)[µ + µ + … + µ] = (1/n)(nµ) = µ

E(X̄) = µ

̅ is an unbiased estimator of µ.
Therefore, we can conclude that X

Example 8.2. A sample of size n = 2 is taken from a population with mean, µ and standard deviation,
σ. Let these two observations be denoted by X1 and X2. Three estimators defined below are used to
estimate µ.

θ̂1 = (1/2)(X1 + X2);  θ̂2 = X1;  θ̂3 = (1/3)X1 + (2/3)X2

Which estimator is the most efficient?

Solution:

All three estimators satisfy the unbiasedness requirement and their variances are as follows:

VAR(θ̂1) = (1/4)[VAR(X1) + VAR(X2)] = (1/4)(σ² + σ²) = (1/2)σ²

VAR(θ̂2) = VAR(X1) = σ²

VAR(θ̂3) = (1/9)VAR(X1) + (4/9)VAR(X2) = (1/9)σ² + (4/9)σ² = (5/9)σ²

θ̂1 has the smallest variance. Therefore it is the most efficient estimator. Please note that, θ̂1 is
actually the sample mean.

Example 8.3. The GPA’s of students taking the CE 204 course, X, are estimated to follow a normal
distribution. From a random sample of size n = 36, ̅ X and s are computed as 2.6 and 0.30,
respectively.

a) Obtain the 95% and 99% confidence intervals for the population mean, µ.

b) What is the minimum sample size if the error (i.e. the difference between the estimate and the true value) is to be at most 0.05 with 95% confidence?

Solution:

a) For 95% confidence level α = 0.05. From the standard normal distribution table


z0.05/2 = z0.025 = 1.96. Accordingly, the 95% confidence interval is:

µ: 2.6 ± 1.96 (0.30/√36) → µ: 2.6 ± 0.098

The 95% confidence interval is: 2.5 ≤ µ ≤ 2.7

Note that since σ is unknown, we assumed 𝛔 = s but still used the z distribution, since
n= 36 > 30.

For 99% confidence interval α = 0.01 and z0.005 = 2.575 and the corresponding confidence interval
is: 2.48 ≤ µ ≤ 2.72. As expected the higher confidence level corresponds to a wider interval.

b) The relationship derived earlier, n ≥ [zα/2 σ/e*]², is to be used, with σ ≅ s = 0.30, e* = 0.05, α = 0.05:

n ≥ [1.96 × 0.30/0.05]² → n ≥ (11.76)² = 138.3 ≅ 139

The minimum sample size is n = 139

Example 8.4. Let X denote the lifetime of batteries, in hours, produced by a certain company. X is
normally distributed. Based on a random sample of size n = 81, X ̅ and s are computed as 600 hours
and 40 hours, respectively. Write down the 95% confidence interval for the mean lifetime of the
population, i.e. for µ.

Solution:

For 95% confidence level α = 0.05. From the standard normal distribution table. z0.05/2 = z0.025 = 1.96.
σ is unknown. Therefore we assume σ ≅ s = 40 hours and still use the z distribution since
n = 81 > 30. Accordingly, the 95% confidence interval is:

µ: 600 ± 1.96 (40/√81) → µ: 600 ± 8.71

The 95 % confidence interval is: 591.29 ≤ µ ≤ 608.71

Assume now that the sample size is n = 5 and you have observed the same sample values, i.e., X̄ = 600 hours and s = 40 hours. Since σ is unknown, we will assume σ = s = 40 and use the t value, because n = 5 < 30.

t α/2,ν = t 0.025.4 = 2.776

µ: 600 ± 2.776 (40/√5) → µ: 600 ± 50

The 95 % confidence interval is: 550 ≤ µ ≤ 650

If by mistake the z distribution were used, the 95% confidence interval would be much smaller
as computed below:
µ: 600 ± 1.96 (40/√5) → µ: 600 ± 35 → 565 ≤ µ ≤ 635


Example 8.5. The sample data obtained from two different populations are summarized below:

Population 1: µ1, σ1: n1 = 50; X̄1 = 76; s1 = 6

Population 2: µ2, σ2: n2 = 75; X̄2 = 82; s2 = 8

Obtain the 96% confidence interval for the difference between two population means (µ1 ‒ µ2).

Solution:

For 96% confidence level α = 0.04. From the standard normal distribution table
z0.04/2 = z0.02 = 2.054. Accordingly, the 96% confidence interval is:

(µ1 − µ2): (76 − 82) ± 2.054 √(6²/50 + 8²/75)

(µ1 − µ2): − 6 ± 2.58 → − 8.58 ≤ (µ𝟏 − µ𝟐 ) ≤ − 3.42

Example 8.6. The sample data obtained from two different populations are summarized as follows:

Population 1: µ1, σ1: n1 = 4; X̄1 = 74; s1 = √132.67

Population 2: µ2, σ2: n2 = 3; X̄2 = 60; s2 = √93

Obtain the 95% confidence interval for the difference between two population means (µ1 − µ2).

Solution:

Since σ1 and σ2 are unknown, we will assume σ1 = σ2 = σ and estimate this common σ by sp:

sp² = [(4 − 1)(132.67) + (3 − 1)(93)]/(4 + 3 − 2)

sp² = (398 + 186)/5 = 584/5 = 116.80

sp = 10.81

t0.025,5 = 2.5706

(µ1 − µ2): (74 − 60) ± 2.5706 × 10.81 × √(1/4 + 1/3)

(µ1 − µ2): 14 ± 21.22 → − 7.22 ≤ (µ1 − µ2) ≤ 35.22


Example 8.7. From a normally distributed population, a random sample of size n = 11 is selected.
If the sample variance is computed as s2 = 3.6, construct a 95% confidence interval for the population
variance.

Solution:

The expression to be used in this case is:

(n − 1)s²/χ²α/2,ν ≤ σ² ≤ (n − 1)s²/χ²(1−α/2),ν

α = 0.05 and ν = 11 – 1 = 10. Accordingly, χ²0.025,10 = 20.48 and χ²0.975,10 = 3.25. Then,

(11 − 1)(3.6)/20.48 ≤ σ² ≤ (11 − 1)(3.6)/3.25 → 1.76 ≤ σ² ≤ 11.08

Example 8.8. In a certain district of Istanbul, 500 people are randomly selected and checked for
Coronavirus. For 160 people the test was positive. Write down the 95% confidence interval for the
proportion of Coronavirus carriers for the whole district.

Solution:

n = 500 and x = 160. Therefore, p̂ = 160/500 = 0.32 and α = 0.05. Accordingly, the 95% confidence interval for the population proportion becomes:

p: 0.32 ± 1.96 √(0.32 × 0.68/500)

p: 0.32 ± 0.041 → 0.279 ≤ p ≤ 0.361

Example 8.9. A random sample of size n (i.e. X1, X2,…, Xn) is taken from a population
having a parameter denoted by  and with the following probability density function:

fX(x) = λ exp(−λx)   for x ≥ 0
      = 0            for x < 0

It is also given that E(X) = 1/λ.

a) Write down the likelihood function based on the information given above and find the maximum likelihood estimator (λ̂_MLE) of the parameter λ.

b) Is the maximum likelihood estimator λ̂_MLE of the parameter λ you obtained in part (a) a biased or an unbiased estimator? Justify and prove your answer, showing all the details of your proof.

c) What is the name of the probability density function given in this problem? If the random variable T represents time, what physical interpretation can you give to λ?


Solution:

a) fT(t) = λ exp(−λt), t ≥ 0

L(t1,…, tn; λ) = ∏ᵢ₌₁ⁿ λ e^(−λti) = λⁿ exp(−λ ∑ᵢ₌₁ⁿ ti)

Taking the natural logarithm of both sides:

ln[L(t1,…, tn; λ)] = n ln(λ) − λ ∑ᵢ₌₁ⁿ ti

Setting the derivative with respect to λ to zero:

∂ln[L(t1,…, tn; λ)]/∂λ = n/λ − ∑ᵢ₌₁ⁿ ti = 0

λ̂_MLE = n/∑ᵢ₌₁ⁿ ti = 1/T̄

Therefore, ̂ MLE is equal to the reciprocal of the sample mean T


̅.

b) Note that E(λ̂_MLE) = E(n/∑ᵢ₌₁ⁿ ti), and the expectation of a reciprocal is not the reciprocal of the expectation, so this is not simply n/E(∑ᵢ₌₁ⁿ ti) = λ. Since ∑ᵢ₌₁ⁿ ti follows a gamma distribution with parameters n and λ, it can be shown that E(λ̂_MLE) = nλ/(n − 1) ≠ λ. Therefore, λ̂_MLE is a biased estimator of λ, although the bias vanishes as n → ∞ (it is asymptotically unbiased).

c) The probability density function given in this problem is the Exponential distribution. λ is the mean rate of occurrence; its reciprocal is the mean recurrence time, i.e. the expected time to the first occurrence, or the mean time interval between events.
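The closed-form result λ̂_MLE = 1/T̄ is easy to verify numerically; a sketch with invented sample data, plus a brute-force check that the log-likelihood n ln(λ) − λ∑ti peaks at that value:

```python
import math

def exp_mle(times):
    """MLE of the exponential rate: n / sum(t_i), i.e. 1 / (sample mean)."""
    return len(times) / sum(times)

data = [0.5, 1.2, 0.8, 2.1, 0.4]   # invented observations; sample mean = 1.0
lam_hat = exp_mle(data)
print(round(lam_hat, 6))            # 1.0

# Brute-force check: the log-likelihood n*ln(lam) - lam*sum(t_i)
# is largest near lam_hat on a coarse grid of candidate rates
def loglik(lam):
    return len(data) * math.log(lam) - lam * sum(data)

best = max((0.05 * k for k in range(1, 61)), key=loglik)
print(abs(best - lam_hat) < 0.05)   # True
```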


Chapter 9
FITTING PROBABILISTIC MODELS TO OBSERVED DATA

9.1. INTRODUCTION

It is important to identify the appropriate probabilistic model that describes the observed data
best. This can be done through the tools of descriptive statistics by displaying the data in
graphical formats (e.g. histograms, bar charts, etc.) and then visually fitting a distribution to the
resulting plots. In Fig. 9.1, such a case is illustrated for sample data displaying normality. This
method is simple but quite subjective. Other simple tools are probability papers, P-P and Q-Q
plots. Probability papers are available for each type of probability distribution. To give an idea
the probability paper for normal distribution is shown in Fig. 9.2. If the empirical cumulative
distribution function derived from the observed data, when plotted on the corresponding
probability paper follows a linear trend, then the selected distribution is acceptable. The same
procedure can be implemented by using the P-P and Q-Q plot tools that are readily available in
most of the statistical software packages. The sample data displayed in Fig. 9.1 is transformed
into a Q-Q plot (Fig. 9.3). The resulting straight line indicates that the data does fit the normal
probability model and hence the sample comes from a Normal distribution. More detailed
information on these methods can be found in Chapter 7 of the reference book by Ang and
Tang (2007). Here we will concentrate on the most popular statistical procedure, namely: The
Chi-Square Goodness of Fit Test. A nonparametric method, called the Kolmogorov-Smirnov
Test, will also be presented briefly.

Figure 9.1 Histogram and the approximating Normal distribution

9.2. CHI-SQUARE GOODNESS OF FIT TEST


This test requires n observations from an unknown probability distribution. The n observations
will be given either by k independent outcomes or by k class intervals (or by bar charts or
histograms); let 𝐟𝐨𝐢 be the observed frequency of the i’th outcome or i’th class interval. Let the
expected frequency computed from the hypothesized (theoretical) probability distribution of
the corresponding i’th observation be 𝐟𝐞𝐢 ; then the test statistic:


χ² = ∑ᵢ₌₁ᵏ (f_oi − f_ei)²/f_ei

can approximately be described by a chi-square distribution with ν = k − r − 1 degrees of freedom, where r is the number of parameters of the theoretical distribution estimated from the sample data. The hypothesis claiming the validity of the proposed probability distribution will be rejected if the value of the test statistic calculated from the sample data, χ²_c, is greater than the table value at a significance level of α (i.e. χ²_c > χ²α,(k−r−1)).

Figure 9.2 Normal probability paper

Example 9.1
As an example, assume you are planning to play backgammon with your friend at his home.
Since you do not trust him very much you want to be sure that the dice are not loaded. For this
purpose, you randomly select a die and roll it 120 times. Actually, you are taking a sample of
size n = 120. The outcome of 120 rolls is given in Table 9.1.

Table 9.1 The outcome of 120 rolls of the selected die

Face of the die | Observed frequency (f_oi) | Expected frequency (f_ei) | (f_oi − f_ei)²/f_ei
1 | 20 | 20 | 0
2 | 22 | 20 | 0.20
3 | 17 | 20 | 0.45
4 | 18 | 20 | 0.20
5 | 19 | 20 | 0.05
6 | 24 | 20 | 0.80
Total: χ² = 1.70


The sum of the values given in the last column is the appropriate statistic to check the fairness
of the die. At this point, we state the chi-square goodness of fit test more formally through the
following theorem.

Figure 9.3 Q-Q plot for the sample data described by the histogram given in Fig. 1

Theorem: A chi-square goodness of fit test between observed and expected frequencies is
based on the following 𝛘𝟐 test statistic:

χ² = ∑ᵢ₌₁ᵏ (f_oi − f_ei)²/f_ei

where, 𝛘𝟐 is a value of the random variable whose sampling distribution is approximately chi-
square. Note that the chi-square goodness of fit test is a right-tailed test and for a small 𝛘𝟐 value
the fit will be good (accept the null hypothesis, H0) whereas for a large 𝛘𝟐 value the fit will be
poor (reject the null hypothesis, H0). In selecting the 𝛘𝟐 value, besides the significance level, α,
the degrees of freedom, ν is needed. Here, the degrees of freedom will be computed from the
following relationship:

ν = k –1 – r

where, k: number of cells, r: number of parameters computed based on the sample data. The subtraction of 1 results from the fact that 1 degree of freedom is lost due to the constraint ∑ᵢ₌₁ᵏ f_oi = ∑ᵢ₌₁ᵏ f_ei. For different distributions the appropriate degrees of freedom values are given below:

i) Poisson: If λ is given, ν = k – 1; if not given and computed from the sample data,
ν = k – 1 – 1 = k –2.

ii) Binomial: If p is given, ν = k – 1; if not given and computed from the sample data,
ν = k – 1 – 1 = k –2.

iii) Normal: If μ and σ are given, ν = k – 1; if not given and computed from the sample data,
ν = k – 1 – 2 = k – 3.

The following are the basic steps to be followed in conducting the chi-square goodness of fit
test:

i) State the null hypothesis, H0, i.e. state the type of distribution proposed.

ii) Select the significance level, α. Generally α = 0.01, 0.05, 0.10.

iii) Compute the value of χ² = ∑ᵢ₌₁ᵏ (f_oi − f_ei)²/f_ei. Denote this computed value by χ²_c.

iv) Compute ν. Based on this computed value of ν and the selected α value, obtain, 𝛘𝟐𝛂,𝛎 from
the 𝛘𝟐 distribution table. Denote this table value by, 𝛘𝟐𝐭 .

v) If 𝛘𝟐𝐜 > 𝛘𝟐𝐭 , then reject H0, otherwise accept H0.

In the application of the chi-square goodness of fit test, it is recommended that the total number of observations be greater than 50 for a good approximation of the chi-square distribution. Also, for each cell both the observed and expected frequencies should be at least 5; if not, combine neighbouring cells (grouping).

Now, by applying these steps to Example 9.1 described above, we obtain:

i) H0: The underlying distribution is uniform, with pi = 1/6, for i = 1, 2,…, 6 (i.e. Die is fair).

ii) Select significance level as, α = 0.05.

iii) 𝛘𝟐𝐜 = 1.70 (computed in Table 9.1).

iv) Since, pi’s are given and the number of cells, k = 6, ν = 6 – 1= 5. Accordingly, from the
𝛘𝟐 distribution table we obtain, 𝛘𝟐𝟎.𝟎𝟓,𝟓 = 11.07.

v) Check whether 𝛘𝟐𝐜 > 𝛘𝟐𝐭 →1.70 < 11.07. Therefore, accept H0. The distribution is uniform.

We conclude that the die is fair. The graphical description of the test is shown in Fig. 9.4.
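The computation of Table 9.1 and the acceptance check can be sketched as follows (the function name is invented; the table value 11.07 is hard-coded, since the notes read it from the χ² table):

```python
def chi2_statistic(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (fo - fe)^2 / fe."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Example 9.1: 120 rolls, fair-die expectation of 20 per face
f_obs = [20, 22, 17, 18, 19, 24]
f_exp = [20] * 6
chi2_c = chi2_statistic(f_obs, f_exp)
print(round(chi2_c, 2))   # 1.7
print(chi2_c < 11.07)     # True -> accept H0 (the die is fair)
```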

Within the context of this example, it will be illustrative to explain the meaning of the
significance level, α, which is the probability of committing a Type I error. Type I error is
defined as the probability of rejecting H0 when H0 is true. In other words, the probability of
rejecting the hypothesis that the distribution is uniform (die is fair) when it is actually uniform
(die is fair) is assigned quite a low value of 5%. In this way, you are trying to avoid an unjust
evaluation of your friend by keeping the probability of blaming him for using a loaded die
although the die is fair.


Figure 9.4 Shape of the χ² distribution for Example 9.1 (significance level α = 0.05; acceptance region to the left of χ²0.05,5 = 11.07, rejection region to the right)

Example 9.2

Consider the sample data given in Fig. 9.5, in the form of a bar chart, for the number of defective
items (X) observed in a sample of size, n = 60. Is it possible to claim that the underlying
population has a Poisson distribution (with λ=0.85 computed from the sample data) at a 5%
significance level?

Figure 9.5 Bar chart of the observed frequency of the number of defective items, X (= 0, 1, 2, 3), for Example 9.2

Solution:

Again following the steps outlined earlier,

i) H0: The underlying distribution is Poisson with λ = 0.85.


ii) α = 0.05.


Table 9.2 Sample Data and Computations for Example 9.2

Number of defective items (xi) | Observed frequency (f_oi) | Poisson probability pX(xi) (λ = 0.85) | Expected frequency (f_ei) = 60 pX(xi) | (f_oi − f_ei)²/f_ei
0 | 28 | 0.4274 | 25.6449 | 0.2163
1 | 17 | 0.3633 | 21.7982 | 1.0562
2 | 11 | 0.1544 | 9.2642 | 0.3252
3 (or more) | 4 | 0.0437 | 2.6249 | 0.7204
Total | 60 | – | ≈ 60 | χ²_c = 2.3181

iii) Based on the computations summarized in Table 9.2, χ²_c = 2.3181.

iv) Since λ is also computed from the sample data and the cell size k = 4, ν = 4 – 1 – 1 = 2.
Accordingly, from the 𝛘𝟐 distribution table we obtain, 𝛘𝟐𝐭 = 𝛘𝟐𝟎.𝟎𝟓,𝟐 = 5.99.

v) Since 2.3181 ≤ 5.99, the underlying probability model can be adopted as the Poisson
distribution at α =5% significance level. Therefore, accept H0 and conclude that the distribution
is Poisson.

Example 9.3

A sociologist studying various aspects of the personal lives of families living in Turkish villages
had a sample consisting of 150 families having four children. The distribution (i.e. observed
frequencies) of the number of girls in those families is summarized in Table 9.3a. From the
given data probability of occurrence of the event (in this case, having a girl), p, is computed as
0.32.

Table 9.3a Sample data for Example 9.3

X (Number of girls) | Observed frequencies
0 | 30
1 | 62
2 | 46
3 | 10
4 | 2
Total = 150

a) Find the expected frequencies of X (i.e., the number of girls) assuming a binomial distribution with p = 0.32. Round the expected frequencies you computed to the nearest integer.


b) Perform chi-square goodness of fit test to check whether the Binomial distribution is a good
fit to the given data at the α = 5% significance level. Assume that p is not given but computed
from the data as stated above.

Solution:

a) P(X = x) = [4!/((4 − x)! x!)] (0.32)^x (0.68)^(4−x)

p(0) = 0.2138  f_e = 0.2138 × 150 = 32.07 ≅ 32
p(1) = 0.4025  f_e = 0.4025 × 150 = 60.37 ≅ 60
p(2) = 0.2841  f_e = 0.2841 × 150 = 42.61 ≅ 43
p(3) = 0.0891  f_e = 0.0891 × 150 = 13.37 ≅ 13
p(4) = 0.0105  f_e = 0.0105 × 150 = 1.57 ≅ 2

Table 9.3b Computation of the χ² value for Example 9.3

X (Number of girls) | Observed frequencies (f_oi) | Expected frequencies (f_ei) | (f_oi − f_ei)² | (f_oi − f_ei)²/f_ei
0 | 30 | 32 | 4 | 0.125
1 | 62 | 60 | 4 | 0.067
2 | 46 | 43 | 9 | 0.209
3 + 4 (grouped) | 10 + 2 = 12 | 13 + 2 = 15 | 9 | 0.600
– | Total = 150 | Total = 150 | – | χ²_c = 1.001

b) Combine cells 3 and 4 so that frequencies are ≥ 5 (Grouping).

𝛘𝟐𝐜 = 1.001

ν = 4 – 1 − 1= 2

α = 0.05

𝛘𝟐𝐭𝐚𝐛𝐥𝐞 = 𝛘𝟐𝟎.𝟎𝟓,𝟐 = 5.991

𝛘𝟐𝐜 = 1.001 < 5.991.

Therefore: Accept H0 at α =0.05 significance level, where, H0: X is binomially distributed with
p = 0.32.
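Part (a) can be reproduced directly from the binomial pmf; a minimal sketch (the function name is invented):

```python
from math import comb

def binomial_expected_freqs(n_trials, p, sample_size):
    """Expected cell frequencies sample_size * P(X = x) for a
    Binomial(n_trials, p) model, x = 0 .. n_trials."""
    return [sample_size * comb(n_trials, x) * p**x * (1 - p)**(n_trials - x)
            for x in range(n_trials + 1)]

fe = binomial_expected_freqs(4, 0.32, 150)
print([round(f) for f in fe])  # [32, 60, 43, 13, 2], as in Example 9.3(a)
```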


Example 9.4

Data obtained based on a sample size of n = 100 for the random variable, X, is summarized in
the first three columns of Table 9.4, below. Check the claim at α = 1% significance level that
the underlying population has a normal distribution.

Table 9.4 Sample Data and Computations for Example 9.4

Class Interval | Class Mark (x_i,m) | Observed Freq. (f_oi) | f_oi x_i,m | f_oi (x_i,m − 51.8)² | z-value: (x_i,m − x̄)/s | Theoretical Probability P(l < z < u)* | Expected Frequency f_ei = n P | (f_oi − f_ei)²/f_ei
0–20 | 10 | 9 | 90 | 15725.2 | −1.8225 | 0.034 | 3.4 | 9.2235
20–40 | 30 | 22 | 660 | 10455.3 | −0.9505 | 0.137 | 13.7 | 5.0285
40–60 | 50 | 32 | 1600 | 103.7 | −0.0785 | 0.300 | 30.0 | 0.1333
60–80 | 70 | 25 | 1750 | 8281.0 | 0.7936 | 0.328 | 32.8 | 1.8549
80–100 | 90 | 12 | 1080 | 17510.9 | 1.6656 | 0.153 | 15.3 | 0.7118
Sum | – | n = 100 | 5180 (x̄ = 5180/100 = 51.8) | 52076 (s² = 52076/99 = 526.02; s = 22.94) | – | – | – | χ²_c = 16.9512

(*) l and u are the lower and upper bounds of the class interval, respectively.

For this example, k = 5, μ and σ are computed from the data; therefore the degrees of freedom
is: ν = k-r-1= 5-2-1 = 2 and α = 0.01. The corresponding table value, denoted by, 𝛘𝟐𝟎.𝟎𝟏,𝟐 is:
9.21. Since 16.95 > 9.21, the claim that the probability distribution is Normal will be rejected
at α = 1% significance level. The distribution is not normal.

9.3. KOLMOGOROV-SMIRNOV (K-S) TEST

This is a non-parametric goodness of fit test. In this test, we compare the observed cumulative
frequency with the cumulative frequency computed from the postulated theoretical distribution.
Assume n observations from an unknown probability distribution are taken. Let 𝐒𝐧 (𝐱 𝐢 ) be the
observed cumulative frequency distribution function and F(xi) be the computed cumulative
probability values obtained from the hypothesized (theoretical) probability distribution. The
steps involved in the implementation of the Kolmogorov-Smirnov test are as follows:


i) Arrange the data in increasing order (i.e. from smallest to largest).

ii) Take Sn(x) as defined below:

Sn(x) = 0 for x < x1
Sn(x) = k/n for xk ≤ x < xk+1
Sn(x) = 1 for x ≥ xn

iii) Compute the F(xi) values from the proposed theoretical distribution.

iv) Compute Dn = max |F(xi) − Sn(xi)|.
v) Based on the sample size, n, and the selected significance level, α, obtain the critical value Dnα from Table 9.5. If Dn ≤ Dnα, accept the proposed distribution; otherwise reject it.
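For grouped data such as Table 9.6 below, the K-S statistic reduces to a running comparison of the two cumulative functions; a sketch (the function name is invented):

```python
def ks_statistic_grouped(observed_counts, model_probs):
    """Dn = max |F(x_i) - S_n(x_i)| over the cells, where F accumulates
    the model probabilities and S_n the observed relative frequencies."""
    n = sum(observed_counts)
    F = S = d_max = 0.0
    for fo, p in zip(observed_counts, model_probs):
        F += p
        S += fo / n
        d_max = max(d_max, abs(F - S))
    return d_max

# Example 9.2 data with Poisson(0.85) cell probabilities (x = 0, 1, 2, 3+)
fo = [28, 17, 11, 4]
probs = [0.4274, 0.3633, 0.1544, 0.0437]
print(round(ks_statistic_grouped(fo, probs), 4))  # 0.0407
```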

Table 9.5 Critical values of 𝐃𝐧𝛂 at significance level  for the Kolmogorov-Smirnov test


Example 9.5

Answer Example 9.2 based on Kolmogorov-Smirnov (K-S) Test.

The computations are summarized in Table 9.6.

Table 9.6 Computation of the K-S test statistic for the data given in Example 9.2

Number of defective items (xi) | Observed Frequency (f_oi) | Theoretical PX(xi) (λ = 0.85) | Theoretical Cumulative Frequency Function FX(xi) | Observed Cumulative Frequency Function SX(xi) | Absolute difference |FX(xi) − SX(xi)|
0 | 28 | 0.4274 | 0.4274 | 0.4667 | 0.0393
1 | 17 | 0.3633 | 0.7907 | 0.7500 | 0.0407
2 | 11 | 0.1544 | 0.9451 | 0.9333 | 0.0118
3 (or more) | 4 | 0.0437 | 0.9888 | 1.0000 | 0.0112
Total | 60 | – | – | – | –

From Table 9.6, Dn = 0.0407. For the Kolmogorov-Smirnov test with n = 60 and α = 5%, from Table 9.5, D60,0.05 = 1.3581/√60 = 0.175. Since 0.0407 ≤ 0.175, the Poisson distribution can be accepted as the probability model at the α = 5% significance level. Both the chi-square goodness of fit and the Kolmogorov-Smirnov tests confirmed the Poisson distribution at a significance level of α = 5%.


Chapter 10
BASIC CONCEPTS OF SIMPLE LINEAR REGRESSION

10.1. INTRODUCTION

Regression analysis between two variables, called the dependent and the independent variable, is carried out basically to achieve the following:

• To determine whether these two variables are mathematically related to each other and, if so, to what degree (quantified by the correlation coefficient).

• If there is a certain degree of correlation between them, then derive an equation expressing the
dependent variable in terms of the independent variable (usually by a straight line fitted to the
scatter diagram) which is called the prediction equation (regression line or prediction line if
the fitted function is a line).

• To apply the derived relationship to predict the value of the dependent variable corresponding
to any given value of the independent variable.

Assume n pairwise observations (xi, yi) are obtained randomly from a normal population. Let
the relationship between the two random variables X and Y be described by the following
equation:

Y = g(X) (10.1)

Here Y: dependent variable, X: independent (prediction) variable and g(x): prediction function.
If there is only one independent variable and the prediction equation is linear, then it is referred
to as simple linear regression. In the simple linear regression problem, let us express the equation
of the line as:

Y=α+βx (10.2)

where, α = intercept and β = slope of the regression line. Eq. 10.2 is transformed to the following
probabilistic expression:

E (Y|X) = α + βx (10.3)

In this form, we can interpret α and β as the intercept and slope of the population regression
line. The aim here is to derive from the sample data the sample regression line, defined as
follows:

ŷ = α̂ + β̂ x (10.4)

where, α̂ and β̂ are the estimators of the parameters of the population regression line, α and β,
respectively.

For any given value xi, the corresponding observed value Yi can be expressed in terms of the
population regression line as follows:

Yi = α + β xi + εi (10.5)

where, εi is the error term defined as:

10-1
CE 204 (2022-23 Spring Semester) MSY

εi = Yi – E(Yi|Xi) (10.6)

Furthermore, εi is assumed to have a normal distribution with mean 0 and variance, σ2 , i.e.

εi: N(0, σ2 ).

The following are the basic assumptions of simple linear regression:


i) The relationship is linear,
ii) The variance, 𝜎 2 is constant along the regression line,
iii) Yi’s are statistically independent,
iv) εi is normally distributed.

10.2. ESTIMATION OF THE REGRESSION LINE

The statistical procedure for finding the best-fitting straight line for a set of points is based on the
principle of least squares, which may be stated as follows: The best-fitting line is the one that
minimizes the sum of squares of deviations of the observed values of y from those predicted. Expressed
mathematically, the aim is to minimize the sum of the squared errors given by:

SSE = ∑ni=1(yi - ŷi )2 (10.7)

Since the predicted value of y corresponding to x = xi is α̂ + β̂ xi, this quantity can be substituted
for ŷi in Eq. 10.7 to obtain the following expression:

SSE = ∑ni=1 [yi – (α̂ + β̂ xi)]²   (10.8)

The numerical values of α̂ and β̂ that minimize SSE are obtained by differential calculus: SSE is
differentiated with respect to α̂ and β̂ and the derivatives are set equal to zero. From the simultaneous
solution of the resulting two equations, the following formulas are obtained for the least squares
estimators of the slope and the intercept:

β̂ = SSxy/SSxx  and  α̂ = Ȳ – β̂ X̄   (10.9)

where,
SSxy = ∑ni=1(xi – X̄)(yi – Ȳ) = ∑ni=1 xi yi – (∑ni=1 xi)(∑ni=1 yi)/n   (10.10)

is the sum of cross products around the respective mean values, and

SSxx = ∑ni=1(xi – X̄)² = ∑ni=1 xi² – (∑ni=1 xi)²/n   (10.11)

SSyy = ∑ni=1(yi – Ȳ)² = ∑ni=1 yi² – (∑ni=1 yi)²/n   (10.12)

are the sum of squares around the respective mean values.


The conditional sample variance of Y given X, s²Y/X (also denoted by s²), is used as an estimator of
the population variance σ². Here, σ² measures the variation of the y values around the mean line
E(Y|X) = α + βx. We estimate σ² by s²Y/X, which can be computed from the following relationship:

s²Y/X = SSE/(n – 2) = ∑ni=1(yi – ŷi)²/(n – 2)   (10.13)

where,

SSE = SSyy – β̂ SSxy = SSyy – (SSxy)²/SSxx   (10.14)

(The two forms are identical, since β̂ SSxy = (SSxy)²/SSxx when β̂ = SSxy/SSxx.)

To illustrate the implementation of the equations introduced so far, data on the advertising
expenditures and sales volumes of a certain firm during 10 randomly selected months, given in
Table 10.1 and also displayed in Fig. 10.1, will be used. Here, α̂, β̂ and s²Y/X will be computed, the
regression line will be constructed and, finally, using this regression line, the sales volume (Y)
corresponding to an advertising expenditure (X) of $1.0 x 10⁴ in a month will be estimated. All the
necessary computations are carried out as shown in Table 10.2.

The least squares estimators of α̂ and β̂ are obtained from Eq. 10.9 as follows:

SSxy = 924.8 – (9.4 x 959)/10 = 23.34  and  SSxx = 9.28 – 9.4²/10 = 0.444

Accordingly, β̂ = SSxy/SSxx = 23.34/0.444 = 52.57

α̂ = Ȳ – β̂ X̄ = 95.9 – 52.57 x 0.94 = 46.49

The regression line is y = 52.57 x + 46.49

Now using this regression equation it is possible to estimate the sales volume corresponding to an
advertising expenditure of $1.0 x 104 in a month as follows:

Y = 52.57 x 1 + 46.49 = $ 99.06 (in 1x104)

For s²Y/X, first we need to compute SSyy = 93569 – 959²/10 = 1600.9

Then, from Eq. 10.14, SSE = 1600.9 – 23.34²/0.444 = 373.97

s²Y/X = 373.97/(10 – 2) = 46.75

and

sY/X = √46.75 = 6.84


Example 10.1

Table 10.1

Figure 10.1

Table 10.2 Calculations for the Data of Table 10.1
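The hand computations above can be checked with a short script working only from the column totals of Table 10.2. One caveat: the text reports SSxx = 0.444, which corresponds to ∑xi² = 9.28; that value is assumed here.

```python
import math

# Column totals of Table 10.2 (n = 10 months); sum_x2 = 9.28 is the value
# consistent with SSxx = 0.444 reported in the text (assumption).
n = 10
sum_x, sum_y = 9.4, 959.0
sum_xy, sum_x2, sum_y2 = 924.8, 9.28, 93569.0

ss_xy = sum_xy - sum_x * sum_y / n            # Eq. 10.10
ss_xx = sum_x2 - sum_x ** 2 / n               # Eq. 10.11
ss_yy = sum_y2 - sum_y ** 2 / n               # Eq. 10.12

beta_hat = ss_xy / ss_xx                      # slope, Eq. 10.9
alpha_hat = sum_y / n - beta_hat * sum_x / n  # intercept, Eq. 10.9

sse = ss_yy - ss_xy ** 2 / ss_xx              # Eq. 10.14
s_yx = math.sqrt(sse / (n - 2))               # Eq. 10.13

print(round(beta_hat, 2), round(alpha_hat, 2), round(s_yx, 2))  # 52.57 46.49 6.84
```

Running the script reproduces β̂ = 52.57, α̂ = 46.49 and sY/X = 6.84 as obtained by hand.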


10.3. INFERENCES CONCERNING THE SLOPE OF THE REGRESSION LINE (β)

a) Confidence Interval:

We use β̂ to estimate β. β̂ is a statistic like X̄; thus it has a distribution, with a mean value and a
standard deviation. In fact, β̂ is normally distributed with E(β̂) = β and VAR(β̂) = σ²/SSxx. In other
words, β̂ : N(β, σ/√SSxx).

The slope of the regression line is more important than the intercept, since it is an indicator of the degree
of a linear relationship. In this respect, we can construct confidence intervals and conduct hypothesis
tests for β, as summarized below:

A (1 – α) confidence interval for β is as follows:

β: β̂ ∓ zα/2 σ/√SSxx   (10.15)

If σ is unknown, the sample standard deviation sY/X together with the t statistic is to be used, as
given below:

β: β̂ ∓ tα/2,ν sY/X/√SSxx   (10.16)

where ν = degrees of freedom = n – 2. Two is subtracted because both parameters, α̂ and β̂, are
estimated from the data.

Example 10.2

Write down the 95% confidence interval for β based on the data given in Example 10.1.

t 0.025,8 = 2.306 (α = 0.05, ν = 10 – 2 = 8)

β: 52.57 ∓ 2.306 x 6.84/√0.444

β: 52.57 ∓ 23.67
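The interval can be checked numerically; the sketch below plugs the rounded Example 10.1 values into Eq. 10.16.

```python
import math

# 95% confidence interval for the slope, Eq. 10.16, Example 10.1 values
beta_hat, s_yx, ss_xx, t_crit = 52.57, 6.84, 0.444, 2.306

half_width = t_crit * s_yx / math.sqrt(ss_xx)
print(round(half_width, 2))                              # 23.67
print(round(beta_hat - half_width, 2), round(beta_hat + half_width, 2))
```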

b) Hypothesis Testing:

β = 0 indicates that Y does not depend on X. Thus generally we test the following null hypothesis Ho:

Ho: β = 0 (i.e. No linear relationship between X and Y)

H1: β ≠ 0

Specify a significance level α


The test statistic to be computed from the data is:

z (or t) = (β̂ – 0) / [σ (or sY/X)/√SSxx]   (10.17)

Rejection of Ho implies the existence of a linear relationship.

Example 10.3

Consider the data given in Example 10.1 and check the validity of the null hypothesis H0 claiming that
β = 0, at a 5% significance level.

The hypothesis testing problem is restated as follows:

Ho: β = 0

H1: β ≠ 0

α = 0.05

tcr = t 0.025,8 = 2.306

From Eq. 10.17,

t = (52.57 – 0)/(6.84/√0.444) = 5.12 > 2.306.

Therefore, reject H0, concluding that β ≠ 0. This conclusion indicates that there is a significant linear
relationship between X and Y.
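The same test can be scripted; the critical value t0.025,8 = 2.306 is hard-coded from the t table rather than computed.

```python
import math

# Two-sided t-test of H0: beta = 0 at alpha = 0.05 (Eq. 10.17),
# using the rounded summary values of Example 10.1
beta_hat, s_yx, ss_xx = 52.57, 6.84, 0.444
t_crit = 2.306                                  # t_{0.025, 8} from the t table

t_stat = (beta_hat - 0) / (s_yx / math.sqrt(ss_xx))
print(round(t_stat, 2), abs(t_stat) > t_crit)   # 5.12 True -> reject H0
```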

10.4. CONFIDENCE INTERVAL ESTIMATE OF THE EXPECTED VALUE OF Y FOR A GIVEN VALUE OF X = xp [E(Y/xp)]

A (1 – α) confidence interval for E(Y/xp):

E(Y/xp): (α̂ + β̂ xp) ∓ tα/2,n-2 sY/X √[1/n + (xp – X̄)²/SSxx]   (10.18)

Example 10.4

Write down the 95% confidence interval for E(Y/xp = $ 1x104) based on the data given in Example
10.1.

Utilizing Eq. 10.18 and taking t0.025,8 = 2.306, sY/X = 6.84 and xp = 1 (x in $1x10⁴ units):

E(Y/1x10⁴): (52.57 x 1 + 46.49) ∓ 2.306 x 6.84 √[1/10 + (1 – 0.94)²/0.444]

99.06 ∓ 5.19 (in $1x10⁴)


10.5. PREDICTION INTERVAL ESTIMATE FOR A PREDICTED VALUE OF Y CORRESPONDING TO A PARTICULAR VALUE OF xp (Y/xp)

A (1 – α) prediction interval for Y/xp:

Y/xp: (α̂ + β̂ xp) ∓ tα/2,n-2 sY/X √[1/n + (xp – X̄)²/SSxx + 1]   (10.19)

Example 10.5

Write down the 95% prediction interval for Y/xp = $ 1x104 based on the data given in Example 10.1.

Utilizing Eq. 10.19 and taking t0.025,8 = 2.306, sY/X = 6.84 and xp = 1 (x in $1x10⁴ units):

Y/1x10⁴: (52.57 x 1 + 46.49) ∓ 2.306 x 6.84 √[1/10 + (1 – 0.94)²/0.444 + 1]

99.06 ∓ 16.6 (in $1x10⁴)

It is to be noted that the prediction interval corresponding to a particular value is much wider than
the confidence interval corresponding to the expected value at the same level. This is because the
prediction of a particular value involves more uncertainty than the prediction of the expected
(mean) value. In both cases, the width of the interval (prediction/confidence band) is a minimum
when xp = X̄.
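The comparison is easy to see numerically. The sketch below evaluates the half-widths of Eqs. 10.18 and 10.19 side by side with the Example 10.1 values (x in $1x10⁴ units); the extra "+1" under the square root is the only difference between the two formulas.

```python
import math

# Half-widths of the confidence interval (Eq. 10.18) and prediction
# interval (Eq. 10.19) at x_p = 1.0, Example 10.1 values
n, x_bar, ss_xx = 10, 0.94, 0.444
s_yx, t_crit, x_p = 6.84, 2.306, 1.0

core = 1 / n + (x_p - x_bar) ** 2 / ss_xx
half_ci = t_crit * s_yx * math.sqrt(core)       # mean response E(Y|x_p)
half_pi = t_crit * s_yx * math.sqrt(core + 1)   # single future observation
print(round(half_ci, 2), round(half_pi, 1))     # the PI is about 3x wider here
```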

10.6. CORRELATION ANALYSIS AND CORRELATION COEFFICIENT

The correlation coefficient of the sample, denoted by R (or r), is an estimator of the population
correlation coefficient ρ, which is a measure of the degree of linear dependence and is bounded by
±1, i.e. –1 ≤ ρ ≤ 1.

The square of R is the coefficient of determination, denoted by R² and defined as:

R² = (Total Variation Explained by Regression)/(Total Variation in Y)
   = SSR/SSyy = (SSyy – SSE)/SSyy = 1 – SSE/SSyy   (10.20)

where SSR is the regression sum of squares, which measures the total variability explained by the
independent variable (due to regression) and, when added to the sum of squared errors (SSE),
yields the sum of squares of y (SSyy), which represents the original total variability in the dependent
random variable Y. Note:

SSR + SSE = SSyy (10.21)

The correlation coefficient R measures the degree of linear dependence, whereas the coefficient of
determination gives the percent of the total variation explained by the independent (predictive) variable
through regression.


Example 10.6

Find the coefficient of determination and correlation coefficient associated with the variables involved
in Example 10.1 by using the computed values summarized in Table 10.2.

From Example 10.1, SSyy = 1600.9 and SSE = 373.97. Therefore,

From Eq. 10.20: R² = SSR/SSyy = 1 – SSE/SSyy = 1 – 373.97/1600.9 = 1 – 0.234 = 0.766

Thus we can conclude that 76.6% of the total variability in the total sales volumes (dependent variable)
is explained by (or attributed to) the advertising expenditures. On the other hand, we can quantify the
degree of linear dependence by the correlation coefficient, which equals: R = r = √0.766 = 0.875.

Note that SSR = 1600.9 ‒373.97 = 1226.93.

An alternative way to compute the correlation coefficient is based on the concept of covariance
expressed as follows:

R = COV(X,Y)/(σX σY) = SSxy/√(SSxx SSyy)
  = ∑ni=1(xi – X̄)(yi – Ȳ) / √[∑ni=1(xi – X̄)² ∑ni=1(yi – Ȳ)²]   (10.22)

Using the data given in Example 10.1, R = 23.34/√(0.444 x 1600.9) = 23.34/√710.80 = 23.34/26.66 = 0.875;

and R² = 0.875² = 0.766, the same answer as before.
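Both routes to R, Eq. 10.22 and Eq. 10.20, can be checked against each other with the Example 10.1 summary values.

```python
import math

# Correlation coefficient via Eq. 10.22 and coefficient of determination
# via Eq. 10.20, using the summary values of Example 10.1
ss_xy, ss_xx, ss_yy, sse = 23.34, 0.444, 1600.9, 373.97

r = ss_xy / math.sqrt(ss_xx * ss_yy)   # Eq. 10.22
r2_from_sse = 1 - sse / ss_yy          # Eq. 10.20
print(round(r, 3), round(r * r, 3), round(r2_from_sse, 3))  # 0.875 0.766 0.766
```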

For the sake of completeness, the following relationships associated with the intercept of the
regression line, α, are given:

A (1 – α) confidence interval for α is as follows:

α: α̂ ∓ zα/2 σ √(1/n + X̄²/SSxx)   (10.23)

If σ is unknown, the sample standard deviation sY/X together with the t statistic is to be used, as
given below:

α: α̂ ∓ tα/2,ν sY/X √(1/n + X̄²/SSxx)   (10.24)

Example 10.7 (Exercise Problem with Answers)

Consider the following data involving paired observations (xi, yi):

xi yi
2 1
3 3
5 4
7 7
9 10


a) Find the best fitting line ŷ = α̂ + β̂ x. What is the value of ŷ when x = 6?
b) Calculate sY/X as an estimator of σ.

c) Test the hypothesis that there is no linear relationship between x and y at α = 0.05 level (i.e. check
the hypothesis H0: β = 0 versus H1: β ≠ 0).
d) Find a 95% confidence interval for β.
e) Find a 95% confidence interval for E(y│x = 6).
f) Give the 95% prediction interval for the particular value of y when x = 6. Compare your result with
part (e) and comment.
g) Find the coefficient of determination and correlation coefficient and interpret your results.

ANSWERS:
a) ŷ = − 1.3415 + 1.2195 x ~ 𝐲̂ = − 1.34 + 1.22 x
When x = 6, 𝐲̂ = 5.98

b) s²Y/X = s² = 0.4065; sY/X = s = 0.6376 ≅ 0.64

c) sY/X = s = 0.64; Sxx = 32.8; tcr = t(0.025, 3) = 3.1824

t = (1.22 – 0)/(0.64/√32.8) = 10.92 > 3.1824

Reject H0 and conclude that there is a linear relationship between x and y.

d) A 95% confidence interval for β is: 0.87 ≤ 𝛃 ≤ 1.57

e) E(y│x = 6): 5.98 ± 3.1824 x 0.64 √[1/5 + (6 – 5.2)²/32.8]

E(y│x = 6): 5.98 ± 3.1824 x 0.2987 = 5.98 ± 0.95 ~ 5.03 ≤ E(y│x = 6) ≤ 6.93

f) (y│x = 6): 5.98 ± 3.1824 x 0.64 √[1 + 1/5 + (6 – 5.2)²/32.8]

(y│x = 6): 5.98 ± 2.25 ~ 3.73 ≤ (y│x = 6) ≤ 8.23

Prediction interval for the actual value of y at x=6 is wider.

g) Correlation coefficient, R = Sxy/√(Sxx Syy) = 40.0/√(32.8 x 50.0) = 40.0/40.50 = 0.9877 ≅ 99%

Correlation coefficient, R is close to 1.0; therefore, there is a strong positive linear relationship between
random variables X and Y.


Coefficient of determination, R2 = 0.98772 = 0.9756 ≅ 98%. This indicates that 98% of the total
variability in Y is explained by the random variable X.
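All of the quoted answers can be recomputed from the five raw data pairs; the sketch below reproduces the key quantities of parts (a), (b) and (g).

```python
import math

# Recomputation of Example 10.7 from the raw (x, y) pairs
xs = [2, 3, 5, 7, 9]
ys = [1, 3, 4, 7, 10]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

beta_hat = s_xy / s_xx                    # slope
alpha_hat = y_bar - beta_hat * x_bar      # intercept
y_at_6 = alpha_hat + beta_hat * 6         # fitted value at x = 6

sse = s_yy - s_xy ** 2 / s_xx
s = math.sqrt(sse / (n - 2))              # s_{Y/X}
r = s_xy / math.sqrt(s_xx * s_yy)         # correlation coefficient

print(round(alpha_hat, 2), round(beta_hat, 2), round(y_at_6, 2),
      round(s, 2), round(r, 4))           # -1.34 1.22 5.98 0.64 0.9877
```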

Example 10.8 (Exercise Problem with Answers)

Consider the following data on the number of hours which 10 students studied for the CE 204 final
examination and their grades on this examination:

Hours Studied (X)   4   9  10  14   4   7  12  22   1  17
Grade (Y)          31  58  65  73  37  44  60  91  21  84

Based on this data the following (least squares) equation is obtained: Y = 21.69 + 3.47 X

In addition, the following values are computed:

SXX = SSX = 376; SYY = SSY = 4752; SXY = SSXY =1305; SSE = 222.7

a) Test the null hypothesis that the slope of the regression line is equal to 3.0 versus the alternative
hypothesis that it is different from 3.0, at a 0.01 significance level.
(Answer: Accept H0. The slope of the regression line can be taken equal to 3.0.)

b) Obtain the 95% prediction interval for a student who has worked for 15 hours.
(Answer: 60.59 ≤ Ŷ15 ≤ 86.89)

c) Compute the values of the coefficient of correlation and coefficient of determination.


(Answers: R = 0.976; R2 = 0.9531)
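Part (a) can be verified with a short computation from the given summary values; the critical value t0.005,8 = 3.355 is hard-coded from the t table.

```python
import math

# Two-sided test of H0: beta = 3.0 vs H1: beta != 3.0 at alpha = 0.01,
# using the summary values given in Example 10.8
s_xx, s_xy, sse, n = 376.0, 1305.0, 222.7, 10

beta_hat = s_xy / s_xx                          # 1305/376 = 3.47
s_yx = math.sqrt(sse / (n - 2))
t_stat = (beta_hat - 3.0) / (s_yx / math.sqrt(s_xx))
t_crit = 3.355                                  # t_{0.005, 8} from the t table
print(round(beta_hat, 2), round(t_stat, 2),
      abs(t_stat) < t_crit)                     # |t| < t_crit -> cannot reject H0
```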
