Nanodegree

Uploaded by

Eric Djagam

Lesson Overview

In this lesson, we will continue to cover more topics related to analyzing
quantitative variables, and you will learn to use **measures of spread**.
Measures of spread give us an idea of how spread out our data are from
one another.

In this lesson you will:

• Evaluate measures of spread
    • Range
    • Interquartile Range (IQR)
    • Standard Deviation
    • Variance
• Analyze outliers
• Evaluate descriptive and inferential statistics

Throughout this lesson, you will learn how to calculate these, as well as
why we would use one measure of spread over another.

Histograms

Histograms are super useful for understanding the different aspects of data and they are the
most common visual used for quantitative data. In the upcoming concepts, you will see
histograms used all the time to help you understand the four aspects we outlined earlier
regarding a quantitative variable:

• center
• spread
• shape
• outliers

How are histograms constructed?

First, we need to bin our data. Each bin represents a range of values in a dataset. The number
of values that fall in the range of each bin determines the height of each histogram bar. As
shown in the video above, changing the range of our bins can result in slightly different
visuals. However, there is no right or wrong answer in choosing how to bin, and in most
cases, the software you use will choose the appropriate bins for you.
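
To make the binning step concrete, here is a minimal Python sketch that counts how many values fall into each bin. The data values, the two bin widths, and the helper name `bin_counts` are illustrative assumptions and are not taken from the lesson or its video.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many values fall into each bin of the given width.

    Each bin covers [start, start + bin_width); the count is the bar height.
    Bins that contain no values are simply absent from the result.
    """
    counts = Counter()
    for v in values:
        start = (v // bin_width) * bin_width  # left edge of the bin holding v
        counts[start] += 1
    return dict(sorted(counts.items()))

# Hypothetical data, made up for illustration only.
data = [1, 2, 2, 3, 5, 6, 6, 7, 9, 12]

narrow = bin_counts(data, 2)  # bins of width 2
wide = bin_counts(data, 5)    # bins of width 5

# Print a rough text histogram for the wider bins.
for start, count in wide.items():
    print(f"[{start}, {start + 5}): {'#' * count}")
```

The same ten values produce differently shaped bar heights depending on the bin width, which is why changing the bins changes the visual.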

Weekdays vs. Weekends: What is the Difference

Weekdays vs. Weekends

The two histograms below illustrate the number of dogs Josh saw on weekdays versus
weekends. The measures of center for both histograms (mean, median, mode) are basically
the same and centered about the highest bin for both histograms, 13.

Visually, the difference between the histograms is the range or spread of dogs Josh sees
during each time period. In the upcoming lessons, we will discuss the most common ways
to measure the spread of our data.

Introduction to Five Number Summary

Five Number Summary

Calculating the 5 Number Summary
The five-number summary consists of 5 values:

1. Minimum: The smallest number in the dataset.

2. Q1: The value such that 25% of the data fall below.
3. Q2: The value such that 50% of the data fall below.
4. Q3: The value such that 75% of the data fall below.
5. Maximum: The largest value in the dataset.

In the above video, we saw that calculating each of these values was essentially just finding
the median of a bunch of different datasets. Because we are essentially calculating a bunch of
medians, the calculation depends on whether we have an odd or even number of values.

Range
The range is then calculated as the difference between the maximum and the minimum.

IQR
The interquartile range is calculated as the difference between Q3 and Q1.

In the upcoming sections, you will practice this with Katie and on your own.

Quiz: 5 Number Summary Practice

Do you know your 5 Number Summary?

Quiz Question
Identify the following for this dataset:

1, 5, 10, 3, 8, 12, 4, 1, 2, 8

Match each item to a number.

Items: Range, First Quartile, Third Quartile, Median
Numbers to choose from: 10, 9, 11, 8, 2, 5, 4.5

Quiz Question
Identify the following for this dataset:

5, 10, 3, 8, 12, 4, 1, 2, 8

Match each item to a number.

Items: Range, First Quartile, Third Quartile, Median
Numbers to choose from: 5, 4.5, 9, 9.5, 2.5, 11, 10
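
One way to check your answers to the two quiz datasets is to sketch the calculation in Python. The function names `five_number_summary` and `range_and_iqr` are hypothetical helpers written for this illustration. The sketch assumes the "median of each half" approach described earlier, with the middle value left out of both halves when the number of values is odd; other quartile conventions used by some software can give slightly different Q1 and Q3 values.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum) using medians of the two halves.

    Assumes at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values before the median position
    upper = data[half + n % 2:]    # skips the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]

def range_and_iqr(values):
    """Return (range, IQR) computed from the five-number summary."""
    minimum, q1, _, q3, maximum = five_number_summary(values)
    return maximum - minimum, q3 - q1

first = [1, 5, 10, 3, 8, 12, 4, 1, 2, 8]   # even number of values
second = [5, 10, 3, 8, 12, 4, 1, 2, 8]     # odd number of values

for dataset in (first, second):
    print(five_number_summary(dataset), range_and_iqr(dataset))
```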

What if We Only Want One Number?

Looking back at the histograms Josh created for the number of dogs he recorded seeing on
weekdays and weekends, we can use the histograms to mark the values of the 5 number
summary and create a box plot.

• Box plots are useful for quickly comparing the spread of two data sets across some
key metrics, like quartiles, maximum, and minimum.

How do we create the box plot?

1. The beginning of the line to the left of the box and the end of the line to the right of
the box represent the minimum and maximum values in a dataset.
2. The visual distance between these markings is an indication of the range of the values.
3. The box itself represents the IQR. The box begins at the Q1 value, ends at the Q3
value, and Q2, or the median, is represented by a line within the box.
From both the histograms and box plots, we can see that the number of dogs seen on
weekends varies much more than on weekdays.

However, instead of depending on a visual of the 5 number summary to compare our data, in
the next lesson, we will learn about using a single value to compare the two distribution
spreads - standard deviation.

Introduction to Standard Deviation and Variance

Standard Deviation and Variance
The standard deviation is one of the most common measures for talking about the spread of
data. Informally, it describes the average distance of each observation from the mean;
formally, it is the square root of the average squared distance from the mean.

In the above video, we saw this as how far individuals were from the average distance from
work. (The example distances shown are a subset of the full dataset; the mean of just those
4 numbers is 38.5. The mean of 18 shown later in the video is the mean of the full dataset,
which is not shown in the video.) In the next video, you will see exactly how this is
calculated.

Standard Deviation Calculation

Note: at 2:00 the 4 in (14-10)^2 = 4 = 16 should be squared. So it should be (14-10)^2 = 4^2 = 16

How to Calculate Standard Deviation

Dataset = 10, 14, 10, 6

1. Calculate the mean

$$ \left( \sum\limits_{i = 1}^4 x_i \right) / n $$ = 40/4 = 10

2. Calculate the distance of each observation from the mean and square the value

$$ (x_i - \overline{x})^2 $$ =

(10 - 10)^2 = 0

(14 - 10)^2 = 16

(10 - 10)^2 = 0

(6 - 10)^2 = 16

3. Calculate the **variance**, the average squared difference of each observation from the mean

$$ \frac{1}{n} \sum\limits_{i = 1}^n (x_i - \overline{x})^2 $$ = (0 + 16 + 0 + 16)/4 = 8

4. Calculate the standard deviation, the square root of the variance

$$ \sqrt{\frac{1}{n} \sum\limits_{i = 1}^n (x_i - \overline{x})^2} $$ = $$ \sqrt{8} $$ ≈ 2.83

2.83 is, on average, how far each point in our dataset is from the mean.
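
The four steps above can be sketched in Python as follows. This is only an illustration of the arithmetic for the dataset 10, 14, 10, 6; the variable names are chosen here for readability and are not part of the lesson.

```python
import math

data = [10, 14, 10, 6]

# Step 1: the mean
mean = sum(data) / len(data)

# Step 2: squared distance of each observation from the mean
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: the variance, the average of the squared distances (divides by n)
variance = sum(squared_diffs) / len(data)

# Step 4: the standard deviation, the square root of the variance
std_dev = math.sqrt(variance)

print(mean, squared_diffs, variance, round(std_dev, 2))
# prints: 10.0 [0.0, 16.0, 0.0, 16.0] 8.0 2.83
```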

Introduction to the Standard Deviation and Variance

Other Measures of Spread

5 Number Summary
In the previous sections, we have seen how to calculate the values
associated with the five-number summary (min, Q1, Q2, Q3, max), as
well as the measures of spread associated with these values (range
and IQR).

For datasets that are not symmetric, the five-number summary and a
corresponding box plot are a great way to get started with
understanding the spread of your data. Although I still prefer a
histogram in most cases, box plots can make it easier to compare two
or more groups. You will see this in the quizzes towards the end of
this lesson.

Variance and Standard Deviation

Two additional measures of spread that are used all the time are
the variance and standard deviation. At first glance, the variance and
standard deviation can seem overwhelming. If you do not understand
the expressions below, don't panic! In this section, I just want to give
you an overview of what the next sections will cover. We will walk
through each of these parts thoroughly in the next few sections, but the
big picture goal is to generally understand the following:

1. How the mean, variance, and standard deviation are calculated.

2. Why the measures of variance and standard deviation make sense
to capture the spread of our data.

3. Fields where you might see these values used.

4. Why we might use the standard deviation or variance as opposed
to the values associated with the 5 number summary for a
particular dataset.

Calculation
We calculate the variance in the following way:

$$ \frac{1}{n} \sum\limits_{i = 1}^n (x_i - \overline{x})^2 $$

The variance is the average squared difference of each observation
from the mean.

To calculate the variance of a set of 10 values in a spreadsheet
application, with our 10 data points in column A, we would create a new
column B by typing in something like =A1-AVERAGE(A$1:A$10) and
copying this down for all 10 rows. This would find us the difference
between each data point and the mean of all the data. Then we
create a new column C having the square of these differences, using the
formula =B1^2 in cell C1, and copying that down for all rows. Then in
the cell below this new column, cell C11, type in =SUM(C1:C10). This
adds up all these values in column C. Finally in cell C12, we divide this
sum by the number of data points we have, in this case, ten: =C11/10.
This cell C12 now contains the variance for our 10 data points.

More detailed guidance on using spreadsheets like this may be included
in a future lesson in your program.

The standard deviation is the square root of the variance. Therefore, the
formula for the standard deviation is the following:

$$ \sqrt{\frac{1}{n} \sum\limits_{i = 1}^n (x_i - \overline{x})^2} $$

In the same spreadsheet as above, to find the standard deviation of our
same set of 10 data values, we would use another cell like C13 to take
the square root of our variance measure, by typing in =sqrt(C12).

The standard deviation is a measurement that has the same units as
our original data, while the units of the variance are the square of the
units in our original data. For example, if the units in our original data
were dollars, then units of the standard deviation would also be dollars,
while the units of the variance would be dollars squared.

Again, this section is designed as background knowledge for the
following sections. If it doesn't make sense on this first pass, do not
worry. You will be guided in future sections in performing these
calculations, and building your intuition, as you work through an
example using the salary data. Then we will provide context about why
these calculations are important, and where you might see them!

Why the Standard Deviation?

Standard deviation is a common metric used to compare the spread
of two datasets. The benefits of using a single metric instead of the 5
number summary are:

• It simplifies the amount of information needed to give a measure
of spread

• It is useful for inferential statistics
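
As a rough illustration of comparing spread with a single number, the sketch below computes the standard deviation of two groups that share the same mean. The dog counts are hypothetical values invented for this example and are not Josh's actual data. It uses `mean` and `pstdev` from Python's standard `statistics` module; `pstdev` divides by n, matching the formula used in this lesson.

```python
from statistics import mean, pstdev

# Hypothetical dog counts, invented for illustration only.
weekdays = [12, 13, 13, 13, 14]
weekends = [8, 11, 13, 15, 18]

for label, counts in (("weekdays", weekdays), ("weekends", weekends)):
    # Same center for both groups, but a different spread.
    print(f"{label}: mean={mean(counts)}, std={pstdev(counts):.2f}")
```

Both groups have a mean of 13, so the centers cannot tell them apart; the larger standard deviation for the weekend values is what captures the difference.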

Important Final Points

1. The variance is used to compare the spread of two different
groups. A set of data with higher variance is more spread out than
a dataset with lower variance. Be careful though, there might just
be an outlier (or outliers) that is increasing the variance when
most of the data are actually very close.
2. When comparing the spread between two datasets, the units of
each must be the same.
3. When data are related to money or the economy, higher variance
(or standard deviation) is associated with higher risk.
4. The standard deviation is used more often in practice than the
variance because it shares the units of the original dataset.
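
The caution in point 1 about outliers can be seen in a small Python sketch. The values below are hypothetical and chosen only to make the effect obvious; `pvariance` is the divide-by-n variance from Python's standard `statistics` module.

```python
from statistics import pvariance

# Hypothetical values for illustration only.
tight = [9, 10, 10, 10, 11]    # data that sit very close together
with_outlier = tight + [60]    # the same data plus a single outlier

var_tight = pvariance(tight)
var_outlier = pvariance(with_outlier)

print(var_tight, var_outlier)
```

Five of the six values in the second list are identical to the first list, yet the one extreme value makes the variance hundreds of times larger. A high variance alone therefore does not tell you whether most of the data are spread out.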
Use in the World
The standard deviation is associated with risk in finance, assists in
determining the significance of drugs in medical studies, and measures
the error of our results for predicting anything from the amount of
rainfall we can expect tomorrow to your predicted commute time
tomorrow.

These applications are beyond the scope of this lesson as they pertain
to specific fields, but know that understanding the spread of a particular
set of data is extremely important to many areas. In this lesson, you
mastered the calculation of the most common measures of spread.
Measures of Center and Spread Summary

Recap

Variable Types
We have covered a lot up to this point! We started with identifying data
types as either categorical or quantitative. We then learned we could identify
quantitative variables as either continuous or discrete. We also found we could
identify categorical variables as either ordinal or nominal.

Categorical Variables
When analyzing categorical variables, we commonly just look at the
count or percent of a group that falls into each level of a category. For
example, if we had two levels of a dog category: lab and not lab. We might
say, 32% of the dogs were lab (percent), or we might say 32 of the 100
dogs I saw were labs (count).
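
A minimal Python sketch of the count and percent summaries for a categorical variable follows. The list is constructed to match the example above (32 labs among 100 dogs); the variable names are illustrative.

```python
from collections import Counter

# Constructed to match the example above: 32 labs among 100 dogs.
dogs = ["lab"] * 32 + ["not lab"] * 68

# Count of observations in each level of the category.
counts = Counter(dogs)

# Percent of observations in each level of the category.
percents = {level: 100 * n / len(dogs) for level, n in counts.items()}

print(counts["lab"], percents["lab"])
```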

However, the 4 aspects associated with describing quantitative variables
are not used to describe categorical variables.

Quantitative Variables
Then we learned there are four main aspects used to
describe quantitative variables:

1. Measures of Center

2. Measures of Spread

3. Shape of the Distribution

4. Outliers

We looked at calculating measures of Center

1. Means

2. Medians
3. Modes

We also looked at calculating measures of Spread

1. Range

2. Interquartile Range

3. Standard Deviation

4. Variance

Calculating Variance
We saw that we could calculate the variance as:

$$ \frac{1}{n} \sum\limits_{i = 1}^n (x_i - \overline{x})^2 $$

You will also see:

$$ \frac{1}{n-1} \sum\limits_{i = 1}^n (x_i - \overline{x})^2 $$

The reason for this is beyond the scope of what we have covered thus
far, but you can find an explanation here.
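
To see that the two formulas give different numbers for the same data, here is a short Python sketch using the dataset 10, 14, 10, 6 from the standard deviation calculation. It relies on Python's standard `statistics` module, where `pvariance` divides by n and `variance` divides by n - 1 (the version commonly called the sample variance). The variable names are illustrative.

```python
from statistics import pvariance, variance

data = [10, 14, 10, 6]

divide_by_n = pvariance(data)          # sum of squared differences / n
divide_by_n_minus_1 = variance(data)   # sum of squared differences / (n - 1)

print(divide_by_n, divide_by_n_minus_1)
```

The sum of squared differences is 32 in both cases; only the divisor changes (4 versus 3), so the second result is larger.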

You can commonly find answers to your questions with a quick Google
search. Now is a great time to get started with this practice! This
answer should make more sense at the completion of this lesson.

Standard Deviation vs. Variance

The standard deviation is the square root of the variance. In practice,
you usually use the standard deviation rather than the variance. The
reason for this is because the standard deviation shares the same units
with our original data, while the variance has squared units.

What Next?
In the next sections, we will be looking at the last two aspects of
quantitative variables: shape and outliers. What we know about
measures of center and measures of spread will assist in your
understanding of these final two aspects.
Supporting Materials

• Calculating Variance
