ESci 117

Engineering Data Analysis

Module 2. Descriptive Statistics STUDENT LEARNING GUIDE
TP-IMD-02 v0 No. CET.ESC SLG20-06

Dr. Jacqueline M. Guarte

College of
ENGINEERING AND
TECHNOLOGY

Department of
AGRICULTURAL AND
BIOSYSTEMS ENGINEERING
2020

Student Learning Guide in

ESci 117: Engineering

Data Analysis
Lesson 2.3: Measures of Variability

Lesson Summary
In addition to measures of location, the collected data can be further
described in terms of their spread from the center. Such measures are
referred to as measures of variability or dispersion. The boxplot is also
included to gauge the symmetry in the data and to identify outliers.

Learning Outcomes
At the end of the lesson, the students should be able to use the measures of
variability appropriately and interpret them correctly.

Motivation Question
Why are measures of variability necessary in providing a complete description
of our collected data?

Discussion
We will not get a complete picture of our data if we only use measures of
location to describe or summarize the observed values. We must also have
measures that can tell us how different or how variable the observed values
are from each other. This becomes even more important when we want to
know whether our collected data on a characteristic of interest can be
considered homogeneous or not, or when we are comparing several
characteristics of interest and want to know which is the most variable and
which is the least variable in the group.

Almeda et al. (2010) point out that a measure of dispersion (or variability)
determines the degree of dispersion or spread from the center of the
distribution (which can be represented by the measures of central tendency).
For at least ordinal-level data, a small value of such a measure will indicate
that “the observations are not too different from each other so that there is a
concentration of observations about the center” (Almeda et al., 2010). In
contrast, the same authors state that a large value will “indicate that the
observations are very different from each other so that they are widely spread
out from the center.”

For nominal-level data, we can use the relative frequencies or
proportions of the categories studied to gauge whether the characteristic of
interest is homogeneous or heterogeneous based on the collected data. For
example, suppose we have a yes/no response to a question in a survey
conducted among coconut farmers in a certain village and the relative
frequencies are 0.9 for “yes” and 0.1 for “no.” This 0.9-0.1 (or 0.1-0.9) pair
tells us that the responses of the farmers are “relatively homogeneous.” The
same description can be made for the pairs 0.8-0.2 (or 0.2-0.8) and 0.7-0.3
(or 0.3-0.7), since a 1.0-0 (or 0-1.0) pair reflects “perfectly homogeneous”
responses. For the pair 0.6-0.4 (or 0.4-0.6), we can say “relatively
heterogeneous” responses, since the pair 0.5-0.5 indicates the most
heterogeneous responses.
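
As a rough illustration of the nominal-level case, the sketch below computes
the relative frequencies of a yes/no response. The response list is
hypothetical (18 "yes" and 2 "no"), chosen only to reproduce the 0.9-0.1 pair
from the example above; the function name is likewise illustrative.

```python
from collections import Counter

def relative_frequencies(responses):
    """Return {category: proportion} for a list of nominal responses."""
    counts = Counter(responses)
    n = len(responses)
    return {category: count / n for category, count in counts.items()}

# Hypothetical survey responses (not from the source): 18 "yes", 2 "no".
responses = ["yes"] * 18 + ["no"] * 2
freqs = relative_frequencies(responses)
print(freqs)  # {'yes': 0.9, 'no': 0.1}

# The larger proportion is the one compared against the pairs discussed above.
largest = max(freqs.values())
print(largest)  # 0.9
```

The closer the largest proportion is to 1.0, the more homogeneous the
responses; the closer it is to 0.5 (for two categories), the more
heterogeneous they are.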

For at least ordinal-level data, we can use the range (the difference
between the largest and smallest observed values) but only if the values are
close to each other. It will be a misleading measure for highly variable data.
Its usual application is in quality control, where it is used “to determine if
the production lines in a manufacturing company are in control” (Almeda et al.,
2010). Otherwise, the median absolute deviation (MAD) will be more
appropriate as all observed values are considered in the calculation. We will
adapt Ott and Longnecker’s (2016) definition of the MAD as the median of the
absolute deviations of a set of measurements about their median $\tilde{x}$:

$MAD = \operatorname{median}\{|x_1 - \tilde{x}|, |x_2 - \tilde{x}|, \ldots, |x_n - \tilde{x}|\}$

Example: (Ordinal data)

Suppose 15 randomly selected household beneficiaries of a solar home
system on a certain island were asked to rate their unit’s performance after
one year of use based on the following rating scale: 1 - poor, 2 - fair, 3 -
satisfactory, 4 - very satisfactory, and 5 - excellent. Suppose further that the
sample responses were: 4, 3, 5, 3, 3, 3, 3, 3, 3, 2, 3, 3, 1, 3, 1. The range of
this data set is simply $5 - 1 = 4$ and indicates a relatively heterogeneous set
of ratings. Now, we proceed to find the MAD.

First, we find the median of the (ordered) values 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 4, 5. This is $\tilde{x} = x_{\left(\frac{15+1}{2}\right)} = x_{(8)} = 3$. Next, we determine the absolute
difference between each value and their median (using the raw data): 1, 0, 2,
0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 2. Then, we find the median of these (ordered)
absolute deviations about the median: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2.
This is just the eighth value in the array so that $MAD = 0$. Thus, our
computed measure of variability indicates that the ratings given by the 15
sample household beneficiaries are not too different from each other (as the
15 ratings are concentrated about the center, the median rating of 3). This is
the opposite of what the range tells us.
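
The range and MAD for the 15 ratings can be checked with a few lines of
Python. This is only a sketch using the standard `statistics` module; the
variable names are illustrative, and the MAD here is the plain median of the
absolute deviations about the median, as defined in the prose above.

```python
from statistics import median

ratings = [4, 3, 5, 3, 3, 3, 3, 3, 3, 2, 3, 3, 1, 3, 1]

# Range: largest minus smallest observed value.
data_range = max(ratings) - min(ratings)

# MAD: median of the absolute deviations about the median.
center = median(ratings)
abs_devs = [abs(r - center) for r in ratings]
mad = median(abs_devs)

print(data_range)  # 4
print(center)      # 3
print(mad)         # 0
```

The range reacts to the few ratings of 1 and 5, while the MAD reflects that
most ratings sit exactly at the median.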

For at least interval-level data, we can use the measures of variability that
are based on the median and the mean of the data. We present these
measures in Table 8 for easy reference. Our choice of the measure of
variability to use will depend on which measure of central tendency is more
appropriate for the data. We recall from Lesson 2.2 (measures of location)
that the mean is not a good measure of central tendency if there are values
that are very much different from the rest. In this case, the median is
preferred as it is not affected by such values. On the other hand, we can also
use these variation measures to decide on the appropriate measure of central
tendency to use. For example, a large variance would indicate that the
observations are far or very different from the mean so that we cannot
consider the mean as a good measure of central tendency in this case
(Almeda et al, 2010).
For the measures based on the mean, the sample variance, $s^2$, is used to
estimate the value of the population variance, $\sigma^2$. We use $n - 1$
instead of $n$ for the divisor to improve the estimation of $\sigma^2$
(Almeda et al., 2010). Otherwise, we will tend to underestimate the
population variance.
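
To see the effect of the divisor, the sketch below compares the two
calculations on a small hypothetical sample (the five values are made up for
illustration). In Python's `statistics` module, `pvariance` divides the sum of
squared deviations by n, while `variance` divides by n - 1.

```python
from statistics import pvariance, variance

# Hypothetical sample of five measurements (illustrative only).
sample = [2.0, 4.0, 4.0, 5.0, 7.0]
n = len(sample)

divide_by_n = pvariance(sample)         # sum of squared deviations / n
divide_by_n_minus_1 = variance(sample)  # sum of squared deviations / (n - 1)

print(divide_by_n, divide_by_n_minus_1)

# The two results always differ by the factor n / (n - 1).
print(divide_by_n_minus_1 / divide_by_n, n / (n - 1))
```

Dividing by n - 1 always gives the larger value, which is what offsets the
tendency to underestimate the population variance.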

For the measures based on the median, the formulas for the sample and the
population are just the same. Ott and Longnecker (2016) explain why the
median of the absolute deviations is divided by the value 0.6745 as follows:
“In a population having a normal distribution (recall the symmetric unimodal
histogram in Lesson 2.1, Figure 6) with standard deviation $\sigma$, the expected
value (or mean) of the absolute deviation about the median is $0.6745\sigma$. By
dividing the MAD by 0.6745, the expected value of MAD in a population
having a normal distribution is equal to $\sigma$.” At this point in time, we will
understand this as telling us that the MAD can also be used to estimate $\sigma$
when the population has a normal distribution since the population mean and
median are equal in this type of distribution.
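
One way to see where the constant 0.6745 comes from: for a standard normal
variable Z, half of the absolute deviations |Z| fall below the 75th percentile
of Z. The sketch below, which assumes only Python's standard
`statistics.NormalDist`, recovers that percentile.

```python
from statistics import NormalDist

# For a standard normal Z, P(|Z| < c) = 0.5 exactly when c is the
# 75th percentile of Z, so c is the median of |Z|.
constant = NormalDist(mu=0, sigma=1).inv_cdf(0.75)

print(round(constant, 4))  # 0.6745
```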

Table 8. Summary of the different measures of variability for at least
interval-level data

Measures based on the mean
  Data requirement: Values are close to each other.

  Variance
    Population: $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$, average squared
      difference of each observation from the mean
    Sample: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$

  Standard deviation
    Population: $\sigma = \sqrt{\sigma^2}$, average “distance” of each
      observation from the mean
    Sample: $s = \sqrt{s^2}$

Measures based on the median

  Median absolute deviation (MAD)
    $MAD = \frac{\operatorname{median}\{|x_1 - \tilde{x}|, |x_2 - \tilde{x}|, \ldots, |x_n - \tilde{x}|\}}{0.6745}$
    Data requirement: Histogram represents a normal distribution.

  Average deviation
    $AD = \frac{\sum_{i=1}^{n}|x_i - \tilde{x}|}{n}$, average distance of each
      observation from the median
    Data requirement: Presence of extreme values; histogram is skewed.

Example:

Consider again the population data on the average monthly high
temperatures (°C) for 2019 in Manila, Philippines: 29.5, 30.2, 31.9, 33.3, 33.4,
32.1, 31.2, 30.4, 30.6, 30.9, 30.5, 29.6. The population mean and median can
be computed to be $\mu \approx 31.13$ and $\tilde{x} = 30.75$, respectively. Since
the stem-and-leaf display indicates a positively skewed histogram, the median
is the appropriate measure of central tendency and we can choose the average
deviation (based on the median) to describe the variability of the
temperatures. The MAD is only useful for a symmetric unimodal or normal
distribution. For illustration and comparison purposes, we will compute all
four measures of variability.

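1. Measures based on the mean:

a. Population variance:

$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N} = \frac{(29.5 - 31.13)^2 + (30.2 - 31.13)^2 + \cdots + (29.6 - 31.13)^2}{12} \approx 1.54$ (°C)$^2$

b. Population standard deviation:

$\sigma = \sqrt{\sigma^2} \approx \sqrt{1.54} \approx 1.24$ °C, not representative of the average
difference from the center of the data

2. Measures based on the median:

a. Median absolute deviation:

$MAD = \frac{\operatorname{median}\{|x_1 - \tilde{x}|, |x_2 - \tilde{x}|, \ldots, |x_{12} - \tilde{x}|\}}{0.6745}$

$= \frac{\operatorname{median}\{|29.5 - 30.75|, |30.2 - 30.75|, \ldots, |29.6 - 30.75|\}}{0.6745}$

$= \frac{\operatorname{median}\{1.25, 0.55, \ldots, 1.15\}}{0.6745}$

$= \frac{(0.55 + 1.15)/2}{0.6745} = \frac{0.85}{0.6745} \approx 1.26$ °C, not useful as the
distribution is not normal

b. Average deviation:

$AD = \frac{\sum_{i=1}^{12}|x_i - \tilde{x}|}{12} = \frac{12.0}{12} = 1.0$ °C, appropriate value to report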

Interpretation: On the average, the 12 average monthly high
temperatures in 2019 were 1.0 °C away from their median.
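
The four measures in this example can be checked with a short Python sketch
using the standard `statistics` module. The variable names are illustrative,
and the division by 0.6745 follows the explanation from Ott and Longnecker
(2016) quoted earlier; both the unscaled and the scaled median absolute
deviation are shown.

```python
from math import sqrt
from statistics import mean, median, pvariance

temps = [29.5, 30.2, 31.9, 33.3, 33.4, 32.1,
         31.2, 30.4, 30.6, 30.9, 30.5, 29.6]

mu = mean(temps)     # population mean
med = median(temps)  # population median

# Measures based on the mean (population formulas, divisor N).
pop_var = pvariance(temps)
pop_sd = sqrt(pop_var)

# Measures based on the median.
abs_devs = [abs(t - med) for t in temps]
median_abs_dev = median(abs_devs)       # unscaled median of |x - median|
scaled_mad = median_abs_dev / 0.6745    # scaled as in the quoted explanation
avg_dev = sum(abs_devs) / len(temps)    # average deviation about the median

print(round(mu, 2), round(med, 2))                     # 31.13 30.75
print(round(pop_var, 2), round(pop_sd, 2))             # 1.54 1.24
print(round(median_abs_dev, 2), round(scaled_mad, 2))  # 0.85 1.26
print(round(avg_dev, 2))                               # 1.0
```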

The measures of variability that we have discussed so far are called absolute
measures of variability as they are used to describe the variability of a single
characteristic. To compare the variability in two or more characteristics, we
need a relative measure. For at least interval level of measurement, the
standard deviation can also be a relative measure but only when the means
are equal and the units of measurement are the same. There is a relative
measure, called the coefficient of variation, that can be used even if these
two conditions are not met since it is unitless. Almeda et al. (2010) define the
coefficient of variation as the ratio of the standard deviation to the mean,
expressed as a percentage. The formulas for the population and for the
sample are:

$CV = \frac{\sigma}{\mu} \times 100\%$ and $CV = \frac{s}{\bar{x}} \times 100\%$

provided the mean is not zero or negative. These restrictions make sense as
Almeda et al. (2010) point out that the $CV$ expresses the standard deviation
as a percentage of the mean. A large $CV$, which indicates high variability in
the data set, results whenever the standard deviation is large compared to the
size of the mean. In contrast, a small $CV$, indicating low variability in the
data set, results whenever the standard deviation is small relative to the size
of the mean. The assumption here, however, is that the mean is a good measure
of central tendency. When comparing the variability of two or more
characteristics using the $CV$, this assumption is ignored. For the population
data on the average monthly high temperatures in Manila, Philippines, the
$CV$ is determined as follows:

$CV = \frac{\sigma}{\mu} \times 100\% = \frac{1.2425}{31.1333} \times 100\% \approx 3.99\%$

Note that we will not find this value useful to interpret for this characteristic
since the appropriate measure of variability is the average deviation based on
the median. However, we can use the $CV$ to compare the variability of two or
more years’ data on this characteristic.
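
A minimal sketch of the coefficient of variation for the temperature data
follows. The function name is illustrative, and the check on the mean reflects
the restriction stated above (the mean must not be zero or negative).

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population CV in percent: (sigma / mu) * 100, for a positive mean."""
    mu = mean(values)
    if mu <= 0:
        raise ValueError("CV is not used when the mean is zero or negative")
    return pstdev(values) / mu * 100

temps = [29.5, 30.2, 31.9, 33.3, 33.4, 32.1,
         31.2, 30.4, 30.6, 30.9, 30.5, 29.6]

cv = coefficient_of_variation(temps)
print(round(cv, 2))  # 3.99
```

Because the CV is unitless, the same function could be applied to another
year's temperatures and the two results compared directly.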

Describing the variability of a characteristic of interest measured in at least
interval scale can include identifying the presence of “unusual values” or
outliers as this will lend support to the extent that values can be different
from each other. Also, this will give insights on the possible range of values
the characteristic of interest can take in the population under study. For these
purposes, we will use the box-and-whisker plot, or simply, the boxplot, an
important tool in exploratory data analysis. We present the method, based on
Almeda et al. (2010) and Ott and Longnecker (2016), which makes use of the
quartiles. The basic steps can be stated as follows:

1. Construct a rectangle with one end at the first quartile ($Q_1$) and the other
end at the third quartile ($Q_3$). This can be drawn vertically (y-axis is the
measurement scale) or horizontally (x-axis is the measurement scale). This
rectangle indicates where the middle 50% of the data set lie.

2. Put a line across the interior of the rectangle at the median.

3. Let $IQR = Q_3 - Q_1$ be the interquartile range. This is also a measure of
dispersion. Based on this, we compute the following “fences”:

lower inner fence: $Q_1 - 1.5(IQR)$
upper inner fence: $Q_3 + 1.5(IQR)$
lower outer fence: $Q_1 - 3(IQR)$
upper outer fence: $Q_3 + 3(IQR)$

These fences are cutoffs for outliers. Ott and Longnecker (2016) qualify
that “any data value beyond an inner fence on either side is a mild outlier,
and any data value beyond an outer fence on either side is an extreme
outlier. The smallest and largest data values that are not outliers are
called the lower adjacent value and upper adjacent value, respectively.”

4. Draw a line from each quartile to its adjacent value. These lines are
referred to as the “whiskers.”

5. Mark each mild outlier with a closed circle, ●.

6. Mark each extreme outlier with an open circle, ○.
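
The fence and outlier rules in the steps above can be sketched as two small
Python functions. This assumes the conventional multipliers of 1.5 (inner
fences) and 3 (outer fences) times the interquartile range; the function
names and the quartile values in the usage lines are hypothetical.

```python
def fences(q1, q3):
    """Return the four fences computed from the quartiles."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }

def classify(value, q1, q3):
    """Label a value as an extreme outlier, a mild outlier, or neither."""
    f = fences(q1, q3)
    if value < f["lower_outer"] or value > f["upper_outer"]:
        return "extreme outlier"
    if value < f["lower_inner"] or value > f["upper_inner"]:
        return "mild outlier"
    return "not an outlier"

# Hypothetical quartiles: IQR = 10, inner fences at -5 and 35,
# outer fences at -20 and 50.
q1, q3 = 10.0, 20.0
for v in (12.0, 38.0, 55.0):
    print(v, classify(v, q1, q3))
```

A value is tested against the outer fences first, so that anything beyond an
outer fence is reported as extreme rather than mild.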

A boxplot with outliers is shown in Figure 9. We note that the four largest
observations are extreme outliers while the next four largest are mild
outliers. Since there are no negative observations, there are no outliers on
the lower end or left part of the data. This boxplot clearly shows the high
level of variability present in this sample data on total nitrogen loads (kg
N/day) from a particular Chesapeake Bay location in the United States. The
data were collected as part of a study to determine if the water in this bay is
“fishable and swimmable” (Devore, 2012).

Figure 9. A boxplot of the nitrogen load data showing mild and extreme
outliers

Source: Taken from J. Devore’s Probability and Statistics for Engineering
and the Sciences, 8th edn., Brooks/Cole, Cengage Learning, Boston, MA, USA,
2012, p. 41.

We can examine the degree and direction of symmetry in the data by the
relative position of the line inside the rectangle to its sides as this shows the
respective distances of the median from the two quartiles (Almeda et al,
2010). Specifically, if the median line is in the middle of the rectangle, the
distribution is symmetric; if the median line is closer to the lower quartile, the
distribution is positively skewed or skewed to the right (as shown in Figure 9);
if the median line is closer to the upper quartile, the distribution is negatively
skewed or skewed to the left.

Additional information about skewness can be obtained from the lengths of
the whiskers: the longer one whisker is relative to the other, the more
skewness there is in the tail with the longer whisker (Ott and Longnecker,
2016). This is illustrated in Figure 9. However, this whisker-based
assessment may not always agree with that based on the median line. In
such situations, we follow that based on the median line. With a lot of
information supplied by a boxplot, we just need to remember that a skewed
distribution is categorically heterogeneous.

Example:

Consider again the population data on the average monthly high
temperatures (°C) for 2019 in Manila, Philippines: 29.5, 30.2, 31.9, 33.3, 33.4,
32.1, 31.2, 30.4, 30.6, 30.9, 30.5, 29.6. The median is , with
and Our interquartile range is then . Before
constructing the boxplot, we will first find the four fences to determine any
outlier in the data:

( )
( )
( )
( )

The boxplot for the data is shown in Figure 10. We see a positively skewed
distribution with the median line closer to the lower quartile. The 12 monthly
high temperatures comprise a heterogeneous population with no outliers.
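
The conclusions for the temperature data can be checked with the sketch
below. It assumes one common quartile convention, taking $Q_1$ and $Q_3$ as
the medians of the lower and upper halves of the ordered data; other
conventions can give slightly different quartiles, so the numbers it prints
are illustrative rather than the guide's own values.

```python
from statistics import median

temps = sorted([29.5, 30.2, 31.9, 33.3, 33.4, 32.1,
                31.2, 30.4, 30.6, 30.9, 30.5, 29.6])
half = len(temps) // 2

# Assumption: quartiles are the medians of the lower and upper halves.
q1 = median(temps[:half])
q3 = median(temps[-half:])
med = median(temps)
iqr = q3 - q1

# Inner fences; any value outside them would be at least a mild outlier.
lower_inner = q1 - 1.5 * iqr
upper_inner = q3 + 1.5 * iqr
outliers = [t for t in temps if t < lower_inner or t > upper_inner]

# A median line closer to Q1 than to Q3 suggests positive skew.
closer_to_q1 = (med - q1) < (q3 - med)

print(round(q1, 2), round(med, 2), round(q3, 2), round(iqr, 2))
print(outliers, closer_to_q1)
```

Under this convention no value falls outside the inner fences and the median
sits closer to the lower quartile, which agrees with the description of the
boxplot above.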

Figure 10. A boxplot of the average monthly high temperature in Manila,
Philippines: 2019.

Average monthly high temperature (°C)

Devore (2012) shares that “a comparative or side-by-side boxplot is a very
effective way of revealing similarities and differences between two or more
data sets consisting of observations on the same characteristic or variable:
fuel efficiency for four different types of automobiles, crop yields for three
different varieties, and so on.” This is best done with the vertical axis as the
measurement scale as this will allow us to construct two or more boxplots,
one after the other, for each of the data sets to be compared. To better
comprehend and appreciate this possibility, two examples are presented as
shown in Figures 11 and 12 (Ott and Longnecker, 2016).

Figure 11. A boxplot of impurities removed using three filter types.

Source: Taken from R.L. Ott and M. Longnecker’s An Introduction to
Statistical Methods and Data Analysis, 7th edn., Cengage Learning, Boston,
MA, USA, 2016, p. 109.

Figure 12. A boxplot of math and reading scores for each grade.

Source: Taken from R.L. Ott and M. Longnecker’s An Introduction to
Statistical Methods and Data Analysis, 7th edn., Cengage Learning, Boston,
MA, USA, 2016, p. 120.

Learning Activity
Consider the data on crack length, given its stem-and-leaf display below, from
Lesson 2.2. Perform as indicated. Follow the guide.

1. Compute the appropriate measure of variability, justify choice, and
interpret.
2. Compute a relative measure of variability.
3. Construct a boxplot and characterize further the differences among the
data values.

SALD:

0H 89 96
1L 03 18 27 40 46
1H 61 85
2L 04 12 33 42 49
2H 53 58 71 85
3L 02 24
3H
4L
4H 50

Stem: tens digit, H-high, L-low
Leaf: ones and tenths digits
or Unit = 0.1

1. Appropriate measure of variability:

Justification:
Interpretation:

2. Relative measure of variability:

3. Summary measures needed for the boxplot:

The boxplot for this data set is:

References
ALMEDA, J.V., T.S. CAPISTRANO, and G.M.F. SARTE. 2010. Elementary
Statistics. The University of the Philippines Press, Quezon City. pp.
231-233, 236-238, 253, 430-434.

DEVORE, J. 2012. Probability and Statistics for Engineering and the
Sciences, 8th edn. Brooks/Cole, Cengage Learning, Boston, MA, USA.
p. 41.

OTT, R.L. and M. LONGNECKER. 2016. An Introduction to Statistical
Methods and Data Analysis, 7th edn. Cengage Learning, Boston, MA,
USA. pp. 98-100, 106-109, 120.

Module Posttest
Instruction: Answer the following questions to the best of your ability.

1. Why does a stem-and-leaf display resemble a frequency distribution?

2. How does a relative frequency histogram summarize data?

3. What are quantiles?

4. How does a boxplot show outliers?

DEPARTMENT OF
AGRICULTURAL AND BIOSYSTEMS ENGINEERING
College of Engineering and Technology

For inquiries, contact:

ENGR. ELDON P. DE PADUA

[email protected] • [email protected]
+63 53 565 0600 Local 1015

Use this code when referring to this material:

TP-IMD-02 v0 07-15-20 • No. CET.ABE SLG20-06

Visca, Baybay City, Leyte

Philippines 6521
[email protected]
+63 53 565 0600
