The document discusses random phenomena in civil and environmental engineering, emphasizing the variability and unpredictability of certain processes. It introduces key statistical concepts such as random variables, populations, samples, and measures of central tendency and dispersion, which are essential for analyzing data. Additionally, it explains how statistical methods can quantify uncertainty and draw inferences about populations based on sample data.

Uploaded by

Mrs Aamir

RANDOM VARIABLES AND EXPLORATORY DATA ANALYSIS

1 RANDOM PHENOMENA
Many processes that are encountered in civil and environmental engineering disciplines are
subject to chance in that they exhibit substantial variability in time and/or space that cannot be
fully explained by physical laws. Variability means that successive observations of a system do
not produce the same results. Often, when we refer to these phenomena, we use the term random.
The term random is common in geophysical sciences and engineering and it conveys the idea of
the occurrence of a phenomenon that is uncertain; that is, its occurrences are not predictable
with certainty. For instance, the speed of a vehicle passing a highway location is random, since
the speed of any individual vehicle cannot be determined in advance with certainty. Likewise, the
flow volume and discharge of a river cannot be determined with certainty at any time or location.
In order to properly describe and analyze random phenomena mathematically, it is necessary to
define additional terminology such as random events and random variables. To talk about
random events, it is useful to introduce the concept of random experiments and sample space.
Consider the measurement of the speed of vehicles passing a specific location at a given time as
an experiment. The outcome varies from measurement to measurement. Thus, the measurements
can be considered to have a random component. An experiment that can result in different
outcomes, even though it is repeated in the same manner every time, is called a random
experiment. The set of all possible outcomes of an experiment is called the sample space for the
experiment.
In a random experiment, a variable whose value can change from one replicate of the experiment
to another is referred to as a random variable. A random variable is discrete if its possible values
come from a discrete set. For example, gender and race are discrete random variables. Note that
the set of possible values for a discrete random variable may be infinite, e.g., the set of all
integers is a discrete set. A random variable is continuous if its values come from interval(s)
(either finite or infinite) of real numbers. For example, speed of vehicles at a highway location is
a continuous random variable with possible values that may vary in the [10, 140] mph range.
In an experiment, a measurement is usually denoted by a variable written as an uppercase letter,
e.g., X. In the traffic example, X = speed of vehicles passing the specified location. A measured
value of the variable is denoted by a lowercase letter, e.g., x. In the traffic data example shown
in Fig 1, the first observation is 𝑋1 = 65.7 mph, and the sample of 35 measurements may be denoted
as 𝑋 = {𝑋1, 𝑋2, …, 𝑋35} = {65.7, 66.7, …, 54.9} mph. The number of random measurements (or
observations) is called the sample size and is denoted by N.

CIVE 203 Class Notes - For resident students only - Do not distribute
Fig 1. Illustration of N random observations of speed of vehicles

2 POPULATION VERSUS SAMPLE

The field of statistics deals with methods for drawing inferences about the properties of a
population based on the properties of a sample from the population. However, statistics goes
beyond merely representing the properties of the population. Statistical measures are also used to
quantify uncertainty in knowledge about the population. Statistical methods are employed to
collect and analyze data to make decisions, solve problems, and design systems. In simple terms,
statistics is the science of data.
The fundamental concept in statistics is the population, which refers to a set of events (or
objects) whose measurable outcomes and properties are of interest. A population consists of the
set of all possible outcomes for a random variable. A sample is a subset of the population that is
collected via laboratory experiments or monitoring. Populations of interest can be finite and
enumerated explicitly. For example, we may be interested in the number of vehicles passing
through a certain highway intersection per minute. Populations can also be infinite, as in the
speed of vehicles passing through a road intersection.

3 DEFINITION OF STATISTICS AND PROBABILITY

Probability provides a theoretical underpinning for statistical methods. Probability deals with
methods for quantifying the likelihood of an event given known properties of the population. For
example, one may use probability methods to compute the likelihood of annual maximum speed
of vehicles at a highway location exceeding 100 mph, given that mean and standard deviation of
annual maximum speed at the location are 90 mph and 12 mph, respectively.
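
As a rough illustration of the probability calculation described above, the sketch below assumes that the annual maximum speed follows a normal distribution. The notes give only the mean and standard deviation, so the choice of distribution (and the helper name `normal_exceedance`) is an assumption made here for illustration only.

```python
import math

def normal_exceedance(threshold, mean, std):
    """Return P(X > threshold) for X ~ Normal(mean, std).

    Normality is an assumption of this sketch; the notes do not state
    which distribution the annual maximum speed follows.
    """
    z = (threshold - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z
    return 1.0 - cdf

# Values from the text: mean 90 mph, standard deviation 12 mph, threshold 100 mph
p_exceed = normal_exceedance(100.0, 90.0, 12.0)
print(f"P(annual maximum speed > 100 mph) = {p_exceed:.3f}")
```

Under the normality assumption, the exceedance probability is about 0.20. A different distribution for the annual maximum would give a different value.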
Conversely, statistics deals with methods for drawing inferences about the properties (e.g., mean
or variance) of a population based on a given sample. For example, one may postulate that the mean
chloride concentration in a drinking water well exceeds the maximum safe drinking water level
(e.g., 50 milligrams per liter). Suppose one had collected 35 samples of water from the well and
measured the chloride concentration in each sample. One could ask whether, based on the
information in the sample, the mean of the population is greater than 50 mg/L; answering this
question is a hypothesis test.

4 BASIC CONCEPTS OF STATISTICS

When independent experiments are conducted repeatedly, as in flipping a coin, the relative
frequency of events often appears to approach a limit even though the outcomes of individual
experiments remain uncertain and are defined by chance. This effect is called statistical
regularity. In laboratory experiments, one can develop confidence that statistical regularity is
present by repeating experiments under nearly identical conditions. However, much of the data
that arise in civil and environmental engineering are observational rather than experimental. For
these data, statistical regularity cannot be demonstrated by repetition of the same experiment. For
example, we cannot repeat the experiment of a severe drought or flood. Thus, the justification for
the use of statistics and probability in civil and environmental engineering rests, in most cases,
on the insight that statistical methods provide into the expected magnitude and variability of
future observations.
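
Statistical regularity can be illustrated with a short simulation of the coin-flip experiment mentioned above. This is only a sketch: the fair coin, the seed, and the function name `relative_frequency_of_heads` are assumptions chosen for illustration.

```python
import random

def relative_frequency_of_heads(n_flips, seed=0):
    """Simulate n_flips of a fair coin and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return heads / n_flips

# Individual flips are uncertain, but the relative frequency settles near 0.5
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))

freq_large = relative_frequency_of_heads(100_000)
```

The relative frequency for small numbers of flips can be far from 0.5, while for large numbers of flips it stays close to it.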
The temporal and spatial variability of random processes that are commonly encountered in
engineering systems may be characterized by statistical analysis of empirical data (observations).
For this purpose, several statistical methods are available to measure various properties of a
random sample:
• Central tendency: mean, median, mode
• Dispersion: range, standard deviation, variance, coefficient of variation
• Asymmetry: skewness coefficient, kurtosis coefficient
For instance, the sample variance is one of the most important statistical characteristics of a
random sample, which provides some relevant information about the variability of the data.
Other statistics such as temporal and spatial correlations are important for describing the degree
of association and dependence that observations taken at various points in time and space may
possess. While sample statistics are important, often the frequency distribution is also needed in
order to observe numerically or graphically how the data is distributed and to make frequency
(probability) statements about the data.

5 MEASURES OF CENTRAL TENDENCY

Sample Mean
The sample mean measures the central tendency of a given sample. If 𝑋 = {𝑋1 , 𝑋2 , … , 𝑋𝑁 }
represents a sample or a sequence (series) of observations, where N is the sample size or the
number of observations, the sample mean (𝑋̅) can be determined by:

$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$$

The sample mean 𝑋̅ is also referred to as the sample arithmetic mean.

Alternative measures of the mean are the geometric mean, the harmonic mean, and the root mean
square. The sample geometric mean (𝑋̅𝐺 ) is estimated as:

$$\bar{X}_G = (X_1 X_2 \cdots X_N)^{1/N} = \left( \prod_{i=1}^{N} X_i \right)^{1/N}$$

Likewise, the sample harmonic mean (𝑋̅𝐻 ) is estimated as:

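$$\bar{X}_H = \frac{1}{\frac{1}{N}\left( \frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_N} \right)} = \frac{N}{\sum_{i=1}^{N} \frac{1}{X_i}}, \qquad X_i > 0$$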

The sample root mean square (𝑋̅𝑅 ) is determined as:

$$\bar{X}_R = \left[ \frac{1}{N} \left( X_1^2 + X_2^2 + \cdots + X_N^2 \right) \right]^{1/2} = \left[ \frac{1}{N} \sum_{i=1}^{N} X_i^2 \right]^{1/2}$$

For positive data it may be shown that 𝑋̅𝐻 ≤ 𝑋̅𝐺 ≤ 𝑋̅ ≤ 𝑋̅𝑅, with equality only when all observations
are equal. Also note that the geometric mean is equal to zero if at least one observation is zero,
and the harmonic mean is undefined if any observation is zero.
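
To make the ordering of the means concrete, consider a small illustrative sample (hypothetical values chosen for easy arithmetic, not data from these notes) with 𝑋1 = 2 and 𝑋2 = 8:

$$\bar{X} = \frac{2 + 8}{2} = 5, \qquad \bar{X}_G = (2 \times 8)^{1/2} = 4, \qquad \bar{X}_H = \frac{2}{\frac{1}{2} + \frac{1}{8}} = 3.2, \qquad \bar{X}_R = \left[ \frac{2^2 + 8^2}{2} \right]^{1/2} = \sqrt{34} \approx 5.83$$

In this case the harmonic mean is the smallest and the root mean square is the largest, with the geometric and arithmetic means in between.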

Sample Weighted Arithmetic Mean

Samples of discrete random variables may contain repeated values. In these cases, each value is
weighted by the number of observations of each value (𝑁𝑗 ):

$$\bar{X} = \frac{1}{N} \sum_{j=1}^{K} N_j X_j$$

where 𝐾 is the number of discrete options (distinct values) with $\sum_{j=1}^{K} N_j = N$.
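
The weighted form can be checked numerically. The sketch below groups the speed data of Example 1 by distinct value and confirms that the weighted mean matches the ordinary arithmetic mean. Speed is a continuous variable, so the repeated values here arise only from rounding to 0.1 mph; the data are used purely to illustrate the formula.

```python
from collections import Counter

# Speed observations from Example 1 (mph)
speeds = [
    65.7, 66.7, 67.8, 72.2, 67.0, 68.2, 68.6, 65.5, 67.4, 64.4, 70.2, 66.7,
    68.9, 70.1, 70.2, 70.6, 69.0, 70.3, 67.4, 68.8, 67.4, 66.5, 61.5, 69.1,
    71.0, 66.4, 68.6, 68.3, 70.9, 70.6, 72.5, 66.9, 57.4, 54.4, 54.9,
]

counts = Counter(speeds)        # maps each distinct value X_j to its count N_j
n_total = sum(counts.values())  # N, the sum of all N_j

# Weighted arithmetic mean: (1/N) * sum over distinct values of N_j * X_j
weighted_mean = sum(n_j * x_j for x_j, n_j in counts.items()) / n_total
direct_mean = sum(speeds) / len(speeds)

print(f"K = {len(counts)} distinct values, N = {n_total}")
print(f"weighted mean = {weighted_mean:.1f} mph, direct mean = {direct_mean:.1f} mph")
```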

Example 1: Compute the sample arithmetic mean, geometric mean, harmonic mean, and root mean
square for the following 35 random observations of speed of vehicles at a road segment:

𝑋 = {65.7,66.7,67.8,72.2,67.0,68.2,68.6,65.5,67.4,64.4,70.2,66.7,68.9,70.1,70.2,70.6,69.0,70.3,
67.4, 68.8,67.4,66.5,61.5,69.1,71.0,66.4,68.6,68.3,70.9,70.6,72.5,66.9,57.4,54.4,54.9} mph

Solution:

$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i = \frac{1}{35} (65.7 + 66.7 + \cdots + 54.9) = 67.2 \text{ mph}$$

$$\bar{X}_G = \left( \prod_{i=1}^{N} X_i \right)^{1/N} = (65.7 \times 66.7 \times \cdots \times 54.9)^{1/35} = 67.1 \text{ mph}$$

$$\bar{X}_H = \frac{N}{\sum_{i=1}^{N} \frac{1}{X_i}} = \frac{35}{\frac{1}{65.7} + \frac{1}{66.7} + \cdots + \frac{1}{54.9}} = 66.9 \text{ mph}$$

$$\bar{X}_R = \sqrt{\frac{1}{N} \sum_{i=1}^{N} X_i^2} = \sqrt{\frac{1}{35} (65.7^2 + 66.7^2 + \cdots + 54.9^2)} = 67.3 \text{ mph}$$
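
The four means of Example 1 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import math

# Speed observations from Example 1 (mph)
speeds = [
    65.7, 66.7, 67.8, 72.2, 67.0, 68.2, 68.6, 65.5, 67.4, 64.4, 70.2, 66.7,
    68.9, 70.1, 70.2, 70.6, 69.0, 70.3, 67.4, 68.8, 67.4, 66.5, 61.5, 69.1,
    71.0, 66.4, 68.6, 68.3, 70.9, 70.6, 72.5, 66.9, 57.4, 54.4, 54.9,
]
n = len(speeds)

arithmetic = sum(speeds) / n
# Geometric mean via logarithms, which avoids overflow of the raw product
geometric = math.exp(sum(math.log(x) for x in speeds) / n)
harmonic = n / sum(1.0 / x for x in speeds)
rms = math.sqrt(sum(x * x for x in speeds) / n)

print(f"arithmetic = {arithmetic:.1f} mph")
print(f"geometric  = {geometric:.1f} mph")
print(f"harmonic   = {harmonic:.1f} mph")
print(f"rms        = {rms:.1f} mph")
```

The geometric mean is computed as the exponential of the mean logarithm, which is algebraically the same as the N-th root of the product.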

Sample Median
The median is another measure of central tendency of a given sample. The sample median,
denoted by 𝑋𝑚, is the value such that half of the values of the sample lie on either side of 𝑋𝑚.
Let 𝑌1 ≤ 𝑌2 ≤ ⋯ ≤ 𝑌𝑁 denote the ordered values (smallest to largest) of the random sample
𝑋1, 𝑋2, …, 𝑋𝑁. The sample median is determined as:

$$X_m = \begin{cases} Y_{(N+1)/2} & \text{if } N \text{ is odd} \\ \frac{1}{2} \left[ Y_{N/2} + Y_{(N/2)+1} \right] & \text{if } N \text{ is even} \end{cases}$$

Often the sample median is preferred over the sample mean because the median is far less
sensitive to outlier observations.

Example 2: Compute sample median for the speed data in example 1.

Solution:
Sort the observations in an ascending order (smallest to largest) and find the middle
value(s):
𝑌 = {54.4, 54.9, 57.4, 61.5, 64.4, 65.5,65.7,66.4,66.5,66.7,66.7,66.9,67.0,67.4,67.4,67.4,
67.8,68.2,68.3,68.6,68.6,68.8,68.9,69.0,69.1,70.1,70.2,70.2,70.3,70.6,70.6,70.9,71.0,
72.2,72.5}
Since N = 35 is odd:

$$X_m = Y_{(N+1)/2} = Y_{18} = 68.2 \text{ mph}$$
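
The odd and even cases of the median definition can be sketched as a small function. The function name `sample_median` is illustrative.

```python
def sample_median(values):
    """Median of a sample using the odd/even rule on the ordered values."""
    y = sorted(values)  # ordered values Y_1 <= Y_2 <= ... <= Y_N
    n = len(y)
    mid = n // 2
    if n % 2 == 1:
        return y[mid]  # Y_{(N+1)/2} in 1-based indexing
    return 0.5 * (y[mid - 1] + y[mid])  # average of Y_{N/2} and Y_{N/2+1}

# Speed observations from Example 1 (mph)
speeds = [
    65.7, 66.7, 67.8, 72.2, 67.0, 68.2, 68.6, 65.5, 67.4, 64.4, 70.2, 66.7,
    68.9, 70.1, 70.2, 70.6, 69.0, 70.3, 67.4, 68.8, 67.4, 66.5, 61.5, 69.1,
    71.0, 66.4, 68.6, 68.3, 70.9, 70.6, 72.5, 66.9, 57.4, 54.4, 54.9,
]
median_speed = sample_median(speeds)
print(f"median = {median_speed} mph")
```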

Sample Mode
The sample mode (𝑋̂) is the most frequent observation. For continuous random variables, the sample
mode may be obtained from the histogram of the empirical data.

Example 3: Compute sample mode for the speed data in example 1.

Solution:
Since speed of vehicles is a continuous random variable, compute the histogram of the
observations, and then find the center of the bin with the highest frequency:
→ 𝑋̂ = 68.0 𝑚𝑝ℎ
More information about histograms is presented in Section 8 (Statistical Visualization).
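
The histogram-based mode can be sketched as follows. The notes do not state which bins were used, so this example assumes six equal-width classes spanning the minimum to the maximum observation; a different binning may give a slightly different mode.

```python
# Speed observations from Example 1 (mph)
speeds = [
    65.7, 66.7, 67.8, 72.2, 67.0, 68.2, 68.6, 65.5, 67.4, 64.4, 70.2, 66.7,
    68.9, 70.1, 70.2, 70.6, 69.0, 70.3, 67.4, 68.8, 67.4, 66.5, 61.5, 69.1,
    71.0, 66.4, 68.6, 68.3, 70.9, 70.6, 72.5, 66.9, 57.4, 54.4, 54.9,
]

n_bins = 6  # assumption: six equal-width classes from min to max
lo, hi = min(speeds), max(speeds)
width = (hi - lo) / n_bins

# Count the observations falling in each class; the maximum goes in the last class
counts = [0] * n_bins
for x in speeds:
    k = min(int((x - lo) / width), n_bins - 1)
    counts[k] += 1

# The mode estimate is the center of the class with the highest frequency
best = max(range(n_bins), key=lambda k: counts[k])
mode_estimate = lo + (best + 0.5) * width
print(f"class counts = {counts}")
print(f"mode estimate = {mode_estimate:.1f} mph")
```

With this binning the most populated class is centered near 68.0 mph, which agrees with the value reported in Example 3.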

6 MEASURES OF DISPERSION
The sample standard deviation (𝑠) measures the dispersion of sample values around the sample
mean. The sample variance is the square of the standard deviation and is denoted by 𝑠². The
sample standard deviation, based on the unbiased estimator of the population variance, is:

$$s = \left[ \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})^2 \right]^{1/2}$$

where 𝑁 is the sample size and 𝑋̅ denotes the sample mean. Dividing by 𝑁 − 1 rather than 𝑁 makes
𝑠² an unbiased estimator of the population variance.
Samples of discrete random variables may contain repeated values. In these cases, each value is
weighted by the number of observations of each value (𝑁𝑗):

$$s = \left[ \frac{1}{N-1} \sum_{j=1}^{K} N_j (X_j - \bar{X})^2 \right]^{1/2}$$

where 𝐾 is the number of discrete options with $\sum_{j=1}^{K} N_j = N$.

The sample coefficient of variation is a dimensionless dispersion statistic that is equal to the
ratio of the sample standard deviation (𝑠) to the sample mean (𝑋̅), i.e.

$$\hat{\eta} = C_v = \frac{s}{\bar{X}}$$

The coefficient of variation gives a measure of the uncertainty of a sample relative to the mean.
When an ordered set of data is divided into four equal parts, the division points are called
quartiles. The first quartile (𝑄1), or lower quartile, is a value that has approximately 25% of the
observations below it and approximately 75% of the observations above it. The third quartile (𝑄3),
or upper quartile, has approximately 75% of the observations below its value. Similar to the sample
median, the first and third quartiles of a sample may be obtained from the ordered sample values.
Other measures of dispersion or variability of sample data include the range (R), interquartile
range (IQR), and mean absolute deviation (MAD). The range, the difference between the
maximum and the minimum, is a crude measure of dispersion. Instead, the range between
specific quantiles such as the 25% and 75% quantiles (i.e., the first and third quartiles,
respectively) may be used. The mean absolute deviation is the average of the absolute deviations
of the sample values from the sample mean. These measures of dispersion are summarized below:
$$R = X_{max} - X_{min}$$

$$IQR = Q_3 - Q_1$$

$$MAD = \frac{1}{N} \sum_{i=1}^{N} |X_i - \bar{X}|$$
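
These three measures can be sketched for the speed data of Example 1 using Python's standard library. Quartile values depend on the interpolation convention; this sketch uses the default ("exclusive") method of `statistics.quantiles`, which is an assumption since the notes do not prescribe a specific convention.

```python
import statistics

# Speed observations from Example 1 (mph)
speeds = [
    65.7, 66.7, 67.8, 72.2, 67.0, 68.2, 68.6, 65.5, 67.4, 64.4, 70.2, 66.7,
    68.9, 70.1, 70.2, 70.6, 69.0, 70.3, 67.4, 68.8, 67.4, 66.5, 61.5, 69.1,
    71.0, 66.4, 68.6, 68.3, 70.9, 70.6, 72.5, 66.9, 57.4, 54.4, 54.9,
]

mean = statistics.fmean(speeds)

# Range: maximum minus minimum
data_range = max(speeds) - min(speeds)

# Quartiles with the default "exclusive" method (an assumed convention)
q1, q2, q3 = statistics.quantiles(speeds, n=4)
iqr = q3 - q1

# Mean absolute deviation about the sample mean
mad = sum(abs(x - mean) for x in speeds) / len(speeds)

print(f"R = {data_range:.1f} mph")
print(f"Q1 = {q1:.1f}, Q3 = {q3:.1f}, IQR = {iqr:.1f} mph")
print(f"MAD = {mad:.2f} mph")
```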

7 MEASURES OF ASYMMETRY
The sample skewness coefficient indicates the degree of asymmetry of the frequency distribution
of the sample data. It may be computed by:

$$\hat{\gamma} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^3}{N s^3}$$

where 𝑁 and 𝑠 are the sample size and standard deviation, respectively. Division by the cube of
the sample standard deviation (s) gives a dimensionless measure. However, this equation is a
biased estimator of the population skewness coefficient. An unbiased sample skewness
coefficient is:

$$\hat{\gamma} = \frac{N \sum_{i=1}^{N} (X_i - \bar{X})^3}{(N-1)(N-2)\, s^3}$$

Samples of discrete random variables may contain repeated values. In these cases, each value is
weighted by the number of observations of each value (𝑁𝑗):

$$\hat{\gamma} = \frac{\sum_{j=1}^{K} N_j (X_j - \bar{X})^3}{N s^3}$$

or (for the unbiased estimator):

$$\hat{\gamma} = \frac{N \sum_{j=1}^{K} N_j (X_j - \bar{X})^3}{(N-1)(N-2)\, s^3}$$

where 𝐾 is the number of discrete options with $\sum_{j=1}^{K} N_j = N$.

The skewness coefficient has an important meaning since it gives an indication of the symmetry
of the distribution of the data. Symmetrical frequency distributions have small or negligible
sample skewness coefficients, while asymmetrical distributions have large positive (skewed to the
right, i.e., a longer right tail) or negative (skewed to the left, i.e., a longer left tail)
coefficients. A small value of |𝛾̂| may indicate that the frequency distribution of the sample may
be approximated by the normal distribution function, since 𝛾 = 0 for the normal distribution.

Fig 2. Illustration of the frequency distribution of random variable with different skewness coefficients (panels: no skew, |𝛾̂| ≈ 0; positive skew, 𝛾̂ ≫ 0; negative skew, 𝛾̂ ≪ 0)

The sample kurtosis coefficient measures the peakedness or the flatness of the frequency
distribution near its mean. It can be estimated by:

$$\hat{\kappa} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^4}{N s^4}$$

where 𝑁 and 𝑠 are the sample size and standard deviation, respectively. Division by 𝑠⁴ gives a
dimensionless coefficient. This equation gives a biased estimator of the population kurtosis
coefficient. An unbiased estimator of the population kurtosis coefficient is:

$$\hat{\kappa} = \frac{N^2 \sum_{i=1}^{N} (X_i - \bar{X})^4}{(N-1)(N-2)(N-3)\, s^4}$$

Figure 3 illustrates the frequency distribution of random variables with different kurtosis
coefficients.

Fig 3. Illustration of the frequency distribution of random variable with different kurtosis coefficients (curves labeled: positive kurtosis, normal, negative kurtosis)

A related coefficient called the excess coefficient is defined by 𝜀̂ = 𝜅̂ − 3. For the Gaussian
(normal) distribution, 𝜅 = 3 and 𝜀 = 0. Positive values of 𝜀̂ indicate that a frequency distribution
is more peaked around its mean than the Gaussian distribution, while negative values indicate that
the frequency distribution is flatter around its mean than the normal distribution.

Example 4: Compute the sample standard deviation, variance, coefficient of variation, skewness
coefficient, and kurtosis coefficient for the speed data in example 1.
Solution:
Sample standard deviation:

$$s = \left[ \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})^2 \right]^{1/2}$$

$$= \left[ \frac{1}{35-1} \left( (65.7 - 67.2)^2 + (66.7 - 67.2)^2 + \cdots + (54.9 - 67.2)^2 \right) \right]^{1/2} = 4.26 \text{ mph}$$

Sample variance:

$$s^2 = 18.17 \text{ mph}^2$$

Sample coefficient of variation:

$$\hat{\eta} = C_v = \frac{s}{\bar{X}} = \frac{4.26}{67.2} = 0.06$$

Sample skewness coefficient:

$$\hat{\gamma} = \frac{N \sum_{i=1}^{N} (X_i - \bar{X})^3}{(N-1)(N-2)\, s^3}$$

$$= \frac{35}{34 \times 33 \times 4.26^3} \left[ (65.7 - 67.2)^3 + (66.7 - 67.2)^3 + \cdots + (54.9 - 67.2)^3 \right] = -1.82$$

→ The sample distribution is heavily negatively skewed.

Sample kurtosis coefficient:

$$\hat{\kappa} = \frac{N^2 \sum_{i=1}^{N} (X_i - \bar{X})^4}{(N-1)(N-2)(N-3)\, s^4}$$

$$= \frac{35^2}{34 \times 33 \times 32 \times 4.26^4} \left[ (65.7 - 67.2)^4 + (66.7 - 67.2)^4 + \cdots + (54.9 - 67.2)^4 \right] = 6.47$$

→ The sample distribution is highly peaked.
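
The statistics of Example 4 can be reproduced with the sketch below, which applies the unbiased-form equations given earlier in the notes (division by 𝑁 − 1 for the variance, and the 𝑁/((𝑁 − 1)(𝑁 − 2)) and 𝑁²/((𝑁 − 1)(𝑁 − 2)(𝑁 − 3)) factors for skewness and kurtosis). Variable names are illustrative.

```python
import math

# Speed observations from Example 1 (mph)
speeds = [
    65.7, 66.7, 67.8, 72.2, 67.0, 68.2, 68.6, 65.5, 67.4, 64.4, 70.2, 66.7,
    68.9, 70.1, 70.2, 70.6, 69.0, 70.3, 67.4, 68.8, 67.4, 66.5, 61.5, 69.1,
    71.0, 66.4, 68.6, 68.3, 70.9, 70.6, 72.5, 66.9, 57.4, 54.4, 54.9,
]
n = len(speeds)
mean = sum(speeds) / n
dev = [x - mean for x in speeds]  # deviations from the sample mean

variance = sum(d ** 2 for d in dev) / (n - 1)  # sample variance s^2
s = math.sqrt(variance)                        # sample standard deviation
cv = s / mean                                  # coefficient of variation

# Skewness and kurtosis in the bias-adjusted forms used in the notes
skew = n * sum(d ** 3 for d in dev) / ((n - 1) * (n - 2) * s ** 3)
kurt = n ** 2 * sum(d ** 4 for d in dev) / ((n - 1) * (n - 2) * (n - 3) * s ** 4)
excess = kurt - 3.0                            # excess coefficient

print(f"s = {s:.2f} mph, s^2 = {variance:.2f} mph^2, Cv = {cv:.2f}")
print(f"skewness = {skew:.2f}, kurtosis = {kurt:.2f}, excess = {excess:.2f}")
```

The negative skewness reflects the three low observations (54.4, 54.9, and 57.4 mph), which form a long left tail.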

8 STATISTICAL VISUALIZATION

Scatter Plot
A scatter plot depicts values for two variables for a set of data. Data points are typically
displayed as markers with no line segments connecting them. Fig 4 shows an example of a
scatter plot for 20 observations of concrete compressive strength (y-axis) versus concrete density
(x-axis).

Fig 4. Scatter plot of compressive strength versus density of concrete

Time Series Plot
A time series plot is a graph in which the observations are displayed in the order in which they
occur (in time): the y-axis denotes the observed values and the x-axis denotes the time (which
could be minutes, days, years, etc.).

Bar Graph
The occurrences of a discrete variable can be displayed on a bar graph. In this type of graph, the
horizontal axis gives the values of the discrete variable and the number of occurrences of each
value is represented by the height of a vertical bar.

Fig 5. Bar graph of speed of vehicles in Example 1

Histogram
If there are at least, say, 25 observations, one of the most common graphical forms to depict the
frequency of observations is a histogram. To construct a histogram, the data are divided into
groups according to their magnitudes. The horizontal axis (x-axis) of the graph gives the
magnitude of classes while the vertical axis (y-axis) represents the number of observations in
each class (i.e., frequency). Histograms are used to determine the most common values (or
ranges) and symmetry in observed data. It is also common to re-scale the y-axis to show relative
frequency instead of number of occurrences. For each class, relative frequency is the number of
occurrences in the class divided by total number of observations.
Care should be given to the number of classes used for constructing a histogram. Too many classes
will not give a clear picture, while too few classes will cause omission of important features. As
a rule of thumb, the number of classes should be between 5 and 25. An appropriate number of
classes can be obtained as follows:

$$N_c = 1 + 3.322 \log_{10}(N)$$

where N is the sample size. The number of classes may be rounded down to the nearest integer.
For example, for 𝑁 = 35, 𝑁𝑐 = 1 + 3.322 × 1.544 = 6.13, which gives 𝑁𝑐 = 6.
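
The class-count rule and the resulting frequency table can be sketched as below. The equal-width classes spanning the minimum to the maximum observation, and the function names `number_of_classes` and `histogram`, are assumptions of this sketch; the histogram in the notes may use different class limits.

```python
import math

def number_of_classes(n):
    """Number of histogram classes from N_c = 1 + 3.322 log10(N), rounded down."""
    return math.floor(1 + 3.322 * math.log10(n))

def histogram(values, n_classes):
    """Return class edges and counts for equal-width classes from min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for x in values:
        k = min(int((x - lo) / width), n_classes - 1)  # maximum goes in last class
        counts[k] += 1
    edges = [lo + k * width for k in range(n_classes + 1)]
    return edges, counts

# Speed observations from Example 1 (mph)
speeds = [
    65.7, 66.7, 67.8, 72.2, 67.0, 68.2, 68.6, 65.5, 67.4, 64.4, 70.2, 66.7,
    68.9, 70.1, 70.2, 70.6, 69.0, 70.3, 67.4, 68.8, 67.4, 66.5, 61.5, 69.1,
    71.0, 66.4, 68.6, 68.3, 70.9, 70.6, 72.5, 66.9, 57.4, 54.4, 54.9,
]

n_c = number_of_classes(len(speeds))
edges, counts = histogram(speeds, n_c)
rel_freq = [c / len(speeds) for c in counts]  # relative frequency per class

for k in range(n_c):
    print(f"{edges[k]:5.1f} to {edges[k + 1]:5.1f} mph: "
          f"count = {counts[k]:2d}, relative frequency = {rel_freq[k]:.3f}")
```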

Fig 6. Histogram of speed of vehicles in Example 1

Boxplot
A boxplot (Fig 7) shows the three quartiles on a rectangular box, aligned either horizontally or
vertically. The box encloses the interquartile range (IQR) with the left (or lower) edge at the first
quartile (Q1) and the right (or upper) edge at the third quartile (Q3). A line is drawn at the
second quartile (or median, which is the 50th percentile). Note on the figure below how the upper
and lower whisker lines are drawn and how outliers are determined.
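
The text defers to the figure for the whisker and outlier rules, so the sketch below assumes the widely used convention in which whiskers extend to the most extreme observations within 1.5 × IQR of the box and points beyond that are flagged as outliers. The quartile method (the default of `statistics.quantiles`) is likewise an assumption.

```python
import statistics

# Speed observations from Example 1 (mph)
speeds = [
    65.7, 66.7, 67.8, 72.2, 67.0, 68.2, 68.6, 65.5, 67.4, 64.4, 70.2, 66.7,
    68.9, 70.1, 70.2, 70.6, 69.0, 70.3, 67.4, 68.8, 67.4, 66.5, 61.5, 69.1,
    71.0, 66.4, 68.6, 68.3, 70.9, 70.6, 72.5, 66.9, 57.4, 54.4, 54.9,
]

q1, median, q3 = statistics.quantiles(speeds, n=4)  # box edges and median line
iqr = q3 - q1

# Assumed convention: fences at 1.5 * IQR beyond the box edges
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences
inside = [x for x in speeds if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Observations outside the fences are flagged as outliers
outliers = sorted(x for x in speeds if x < lower_fence or x > upper_fence)

print(f"Q1 = {q1:.1f}, median = {median:.1f}, Q3 = {q3:.1f}")
print(f"whiskers: {whisker_low} to {whisker_high} mph")
print(f"outliers: {outliers}")
```

Under these assumptions, the three lowest speeds are flagged as outliers, which is consistent with the strong negative skewness of the sample.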

Fig 7. Boxplot explanation (from Montgomery and Runger, Applied Statistics and Probability for Engineers, 7th edition)

The boxplot of the data for example 1 is shown in Fig 8.

Fig 8. Boxplot of speed of vehicles in Example 1

9 CROSS-CORRELATION COEFFICIENT
Consider two paired random samples 𝑋 = {𝑋1 , 𝑋2 , … , 𝑋𝑁 } and 𝑌 = {𝑌1 , 𝑌2 , … , 𝑌𝑁 }. For instance,
the 𝑋’s may represent annual precipitation over a drainage area and the 𝑌’s annual runoff at the
drainage outlet. The linear relationship between them may be investigated using cross-
correlation analysis. Specifically, the sample cross-correlation coefficient denoted by 𝜌̂
measures the linear association (dependence) between the samples 𝑋 and 𝑌 and is estimated by

$$\hat{\rho} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\, \sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$

where 𝑋̅ and 𝑌̅ are the sample means of 𝑋 and 𝑌, respectively. Often 𝑟 is used to denote the
sample cross-correlation coefficient. The cross-correlation coefficient is bounded by -1 and +1.
If 𝜌̂ is one in absolute value, then there is a perfect linear dependence between 𝑋 and 𝑌. A value
of zero, on the other hand, means no linear dependence. A positive 𝜌̂ value indicates that the
value of 𝑌 tends to increase as the value of 𝑋 increases. Conversely, a negative 𝜌̂ value indicates
that the value of 𝑌 tends to decrease as the value of 𝑋 increases. When |𝜌̂| < 0.3, the dependence
is weak, while the dependence may be deemed strong when |𝜌̂| > 0.7.

Example 5. Compute the correlation coefficient for the concrete data below:

Observation   Density (kg/m^3)   Compressive Strength (N/mm^2)
 1            145.4              27.7
 2            265.0              48.8
 3            507.3               5.3
 4            491.9              72.6
 5             83.3               4.5
 6            269.6              22.8
 7            339.8              37.2
 8            279.2              52.6
 9            411.3              34.4
10            395.4              77.7
11            210.6              12.7
12            287.4              40.3
13             58.5               7.8
14            591.2              63.8
15            141.9              19.8
16            108.4              14.5
17            254.8               9.1
18            159.0               4.1
19            319.3              63.8
20            236.7               8.1

Solution:
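$$N = 20; \qquad \bar{X} = 277.8 \text{ kg/m}^3; \qquad \bar{Y} = 31.38 \text{ N/mm}^2$$

$$S_{XX} = \sum_{i=1}^{N} (X_i - \bar{X})^2 = 405714$$

$$S_{YY} = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = 11423$$

$$S_{XY} = \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}) = 41817$$

$$\hat{\rho} = \frac{S_{XY}}{\sqrt{S_{XX}}\, \sqrt{S_{YY}}} = \frac{41817}{\sqrt{405714 \times 11423}} = 0.61$$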
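
The calculation in Example 5 can be reproduced with the sketch below, which implements the cross-correlation formula directly. The function name `cross_correlation` is illustrative.

```python
import math

def cross_correlation(x, y):
    """Sample cross-correlation coefficient of two paired samples."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))

# Concrete data from Example 5
density = [
    145.4, 265.0, 507.3, 491.9, 83.3, 269.6, 339.8, 279.2, 411.3, 395.4,
    210.6, 287.4, 58.5, 591.2, 141.9, 108.4, 254.8, 159.0, 319.3, 236.7,
]
strength = [
    27.7, 48.8, 5.3, 72.6, 4.5, 22.8, 37.2, 52.6, 34.4, 77.7,
    12.7, 40.3, 7.8, 63.8, 19.8, 14.5, 9.1, 4.1, 63.8, 8.1,
]

rho = cross_correlation(density, strength)
print(f"rho = {rho:.2f}")
```

The result falls between the 0.3 and 0.7 thresholds given above, so the linear dependence between density and compressive strength in this sample is moderate.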
