Applied Statistics and Data Analysis

Descriptive Statistics Review

Oksana Chernova, Ph.D.

Technical University of Munich

16/10/2023

Oksana Chernova, Ph.D. Descriptive Statistics 16/10/2023 1 / 33

Applied Statistics and Data Analysis [CIT5130001] is offered in both
semesters:
in Freising during winter semesters,
at Garching Forschungszentrum during summer semesters.

The class sessions will not be recorded. All the materials will be posted
on Moodle.

Your final course grade will be determined solely by your performance
in the final exam; no grade bonuses are available.

Exam: 12.02.2024 (registration until 15.01.2024). The retake is in the next
semester.

No late registrations are allowed, either for the course or the exam.

Only emails from a TUM address will be read.

Motivation

Figure: https://www.google.de/books/edition/Even_You_Can_Learn_Statistics_and_Analyt/5y2tBQAAQBAJ?hl=en&gbpv=1
Overview

1 Introduction

2 Measures of Central Tendency

3 Measures of Dispersion

4 Graphical Data Analysis

5 Descriptive Statistics in R


Introduction

Applied statistics can be divided into two areas:

descriptive statistics (methods for organizing, displaying, and
describing data by using tables, graphs, and summary measures)
inferential statistics (methods that use sample results to
make decisions or predictions about a population)

Today we give a gentle introduction to univariate descriptive statistics.


Population vs sample

A population is the entire group that you want to draw conclusions
about.

A sample is the specific group that you will collect data from.

Figure: https://www.omniconvert.com/what-is/sample-size/



Figure: https://www.questionpro.com/blog/population-vs-sample/


Example 1.

A market researcher surveys 85 people on their coffee-drinking habits.

The aim is to know whether people in the local region are willing to
switch their regular drink to something new. What is the sample?
The sample is the 85 people surveyed, while the population is all the
people in the local region.


Example 2.

The market researcher analyzes the data and finds that 61% of survey
respondents are willing to switch their regular drink to something new.
What is the 61% referred to as?
a) Parameter
b) Statistic
c) Sampling error
d) Standard error

Answer: b) Statistic

The 61% is referred to as a statistic because it is a measure computed from
the sample. A parameter, by contrast, would describe the whole population.


Measures of Central Tendency

To gain intuition for any data set, one can use numerical summary
measures.

There are three main measures of central tendency: the mean, the
median, and the mode.


Mean

$$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of all values}}$$

The arithmetic mean calculated for sample data is denoted by $\bar{x}$,
and the mean for population data is denoted by $\mu$.

        sample                            population
size    $n$                               $N$
mean    $\bar{x} = \frac{\sum x}{n}$      $\mu = \frac{\sum x}{N}$
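
As a quick check of the formula for the mean, the following R sketch computes it once by hand and once with the built-in mean(). The data vector is made up for illustration and does not come from the lecture.

```r
# Made-up sample of five observations (illustrative values only)
x <- c(2, 4, 4, 5, 10)

n <- length(x)               # sample size n
mean_manual <- sum(x) / n    # sum of all values divided by the number of values
mean_builtin <- mean(x)      # R's built-in arithmetic mean

print(mean_manual)
print(mean_builtin)
```

Both variants should agree; the manual version only mirrors the definition.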


Median
The median is the value of the middle term in a data set that has been
ranked in increasing order.
Odd sample size: the median is the single middle value.

Even sample size: the median is the average of the two middle values.
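
The two cases can be tried out in R. This is a minimal sketch with made-up numbers; the variable names are chosen here for illustration only.

```r
# Made-up data: one sample of odd size, one of even size
x_odd  <- c(7, 1, 5, 3, 9)        # n = 5
x_even <- c(7, 1, 5, 3, 9, 11)    # n = 6

# Odd n: the median is the middle value of the sorted data
sorted_odd <- sort(x_odd)
manual_odd <- sorted_odd[(length(sorted_odd) + 1) / 2]

# Even n: the median is the average of the two middle values
sorted_even <- sort(x_even)
n <- length(sorted_even)
manual_even <- (sorted_even[n / 2] + sorted_even[n / 2 + 1]) / 2

print(c(manual_odd, median(x_odd)))
print(c(manual_even, median(x_even)))
```

The built-in median() handles both cases, so the manual steps are only there to make the rule visible.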


Mean vs Median
The median is barely influenced by outliers, whereas the mean can be pulled
strongly toward them. Consequently, the median is
preferred over the mean as a measure of central tendency for data sets
that contain outliers.
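
A small experiment makes the difference visible. The sketch below uses made-up values and adds a single extreme observation to see how each measure reacts.

```r
# Made-up data without and with one extreme value
x     <- c(10, 12, 13, 15, 16)
x_out <- c(x, 200)               # same data plus one outlier

mean_before   <- mean(x)
mean_after    <- mean(x_out)     # shifts strongly toward the outlier
median_before <- median(x)
median_after  <- median(x_out)   # moves only slightly

print(c(mean_before, mean_after))
print(c(median_before, median_after))
```

In this toy example the mean more than triples, while the median moves by one unit.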


Mode

The mode is the most common value in a data set, that is, the value
that occurs with the highest frequency. A data set can have more than
one mode.
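
Base R has no function for the statistical mode (its mode() reports the storage type of an object), so one possible approach is to count the values with table(). The data below are made up for illustration.

```r
# Made-up data in which the value 5 occurs most often
x <- c(3, 5, 5, 2, 5, 3, 8)

freq <- table(x)    # absolute frequency of each distinct value
print(freq)

# Keep every value that reaches the highest frequency (there may be ties)
mode_values <- as.numeric(names(freq)[freq == max(freq)])
print(mode_values)
```

Returning all tied values, rather than only the first one, keeps multimodal data sets visible.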


Figure: The Flaw of Averages; Sam L. Savage, 2009

The measures of central tendency do not reveal the whole picture of the
distribution of a data set.
Measures of Dispersion

We also need a measure that can provide some information about the
variation among data values.

Therefore, to get the full picture we need to consider both kinds of
measures: central tendency and dispersion.


Range

Range = Largest value − Smallest value

The range generally gives you a good indicator of variability when you
have a distribution without extreme values.

But the range can be misleading when you have outliers in your data
set.


IQR
The interquartile range gives the range of the middle half of a data set:
$$\mathrm{IQR} = Q_3 - Q_1,$$
$Q_1$ = 1st quartile or 25th percentile,
$Q_3$ = 3rd quartile or 75th percentile.
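
The range and the interquartile range can be computed in R as sketched below, using made-up data. Note that several conventions for sample quartiles exist; quantile() uses its default (type 7) here, so a hand calculation with another rule may give slightly different quartiles.

```r
# Made-up, already sorted sample of nine observations
x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)

range_width <- max(x) - min(x)   # Range = largest value - smallest value

q1 <- quantile(x, 0.25)          # 1st quartile (25th percentile)
q3 <- quantile(x, 0.75)          # 3rd quartile (75th percentile)
iqr_manual  <- unname(q3 - q1)   # IQR = Q3 - Q1
iqr_builtin <- IQR(x)            # built-in equivalent

print(range_width)
print(c(q1, q3))
print(c(iqr_manual, iqr_builtin))
```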

Figure: https://www.scribbr.com/statistics/interquartile-range/
Variance and Standard Deviation

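The variance is the average squared deviation from the mean (for sample
data the sum of squares is divided by $n - 1$ rather than $n$). The
variance for population data is
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
and the variance calculated for sample data is
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}.$$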


The standard deviation for population data is
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
and the standard deviation for sample data is
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}.$$
The quantity $x_i - \mu$ or $x_i - \bar{x}$ in the above formulas is called the
deviation of the $x_i$ value from the mean.
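
To see the two divisors at work, the following R sketch evaluates the variance and standard deviation formulas on a made-up data vector. R's var() and sd() use the sample versions (divisor n - 1); the population versions are computed by hand under the assumption that the vector is the whole population.

```r
# Made-up data (illustrative values only)
x <- c(2, 4, 4, 5, 10)
n <- length(x)

dev <- x - mean(x)          # deviations from the mean
ss  <- sum(dev^2)           # sum of squared deviations

s2 <- ss / (n - 1)          # sample variance
s  <- sqrt(s2)              # sample standard deviation

sigma2 <- ss / n            # population variance, treating x as the full population
sigma  <- sqrt(sigma2)      # population standard deviation

print(c(s2, var(x)))        # var() matches the sample formula
print(c(s, sd(x)))          # sd() matches the sample formula
print(c(sigma2, sigma))
```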


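                     sample                                         population
size                 $n$                                            $N$
mean                 $\bar{x} = \frac{\sum x}{n}$                   $\mu = \frac{\sum x}{N}$
variance             $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$   $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
standard deviation   $s = \sqrt{s^2}$                               $\sigma = \sqrt{\sigma^2}$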


The value of the standard deviation tells how closely the values of a
data set are clustered around the mean.

Figure: https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/


Graphical Data Analysis

We've covered statistics that provide a summary of data using a single
value to describe either its central tendency or its variability. Exploring
the distribution of data is also valuable. To do this, you can use:
Boxplot
Histogram
Frequency table
Density plot


Boxplot

A boxplot, or a box-and-whisker plot, summarizes a data set visually
using a five-number summary: Lowest value, Q1, Median, Q3, Highest
value.

Figure:
https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51


Boxplot (Explanation)

A rectangle is drawn from the lower quartile Q1 (i.e., the 1st quartile)
to the upper quartile Q3 (the 3rd quartile), calculated from the data.
The line inside the rectangle represents the median.

The whiskers extending from the box mark the range of data that is
not considered outliers. The upper whisker corresponds to the largest
non-outlier, and the lower one to the smallest. Each individual data
point outside this range is depicted as a separate point and is
considered an outlier.
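
The five numbers behind a boxplot can be obtained directly in R. The sketch below uses made-up data and quantile() with its default quartile convention; the plotting routine itself may use a slightly different rule for the box edges.

```r
# Made-up sample of nine observations
x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)

# Lowest value, Q1, median, Q3, highest value
five_numbers <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
print(five_numbers)

# summary() reports the same five numbers plus the mean
print(summary(x))
```

Calling boxplot(x) on the same vector would draw the corresponding plot.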


Outliers

Observations that do not lie in

[Q1 − 1.5 × IQR ; Q3 + 1.5 × IQR]

are potential outliers.

Why 1.5?
John W. Tukey: Because 1 is too small and 2 is too large.
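
The 1.5 × IQR rule can be applied by hand as in the following R sketch. The data are made up, and the quartiles come from quantile() with its default settings, which is one of several possible conventions.

```r
# Made-up data with one suspiciously large value
x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 60)

q1  <- unname(quantile(x, 0.25))   # first quartile
q3  <- unname(quantile(x, 0.75))   # third quartile
iqr <- q3 - q1

lower <- q1 - 1.5 * iqr            # lower fence
upper <- q3 + 1.5 * iqr            # upper fence

# Observations outside [lower, upper] are flagged as potential outliers
outliers <- x[x < lower | x > upper]

print(c(lower, upper))
print(outliers)
```

Here only the value 60 falls outside the fences; whether it is an error or a genuine observation still has to be judged from context.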


Histogram
Observe $(X_1, \ldots, X_n)$. We define an interval $[a, b]$ that encompasses all
observed values. This interval is divided into $K$ subintervals $A_1, \ldots, A_K$,
each with the same width $h = (b - a)/K$:
$A_1 = [t_0, t_1]$ and $A_i = (t_{i-1}, t_i]$ for $i = 2, \ldots, K$, where $t_i = a + ih$, $i = 0, \ldots, K$.
$$n_i = \sum_{j=1}^{n} I\{X_j \in A_i\}$$

This represents the number of observations falling within interval $A_i$.

The quantity $n_i$ is known as the absolute frequency of interval $A_i$
within the sample. The value $\nu_i = n_i / n$ is the relative frequency:
$$\nu_i = \frac{1}{n} \sum_{j=1}^{n} I\{X_j \in A_i\} \approx P\{X_1 \in A_i\} = \int_{t_{i-1}}^{t_i} f(t)\,dt,$$
where $f$ denotes the density of $X_1$.
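
The construction above can be reproduced step by step in R. This is a sketch on made-up data; the interval [0, 10] and K = 5 are arbitrary choices for illustration.

```r
# Made-up observations (illustrative values only)
x <- c(1.2, 2.5, 2.7, 3.1, 4.8, 5.0, 6.3, 7.9, 8.4, 9.6)
n <- length(x)

a <- 0; b <- 10; K <- 5          # interval [a, b] split into K classes
h <- (b - a) / K                 # common class width
breaks <- a + (0:K) * h          # t_0, ..., t_K with t_i = a + i * h

# include.lowest = TRUE closes the first class on the left: A_1 = [t_0, t_1]
classes <- cut(x, breaks = breaks, include.lowest = TRUE)

n_i  <- table(classes)           # absolute frequencies
nu_i <- n_i / n                  # relative frequencies
dens <- nu_i / h                 # bar heights when the total area equals 1

print(n_i)
print(nu_i)

# hist() uses the same class convention by default; plot = FALSE skips the drawing
h_obj <- hist(x, breaks = breaks, plot = FALSE)
print(h_obj$counts)
```

Dividing the relative frequencies by h gives the bar heights of a histogram drawn on the density scale, which is what makes it comparable to the density f.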


R and RStudio

http://www.r-project.org/

Once you have installed R you are ready to go; however, we highly
recommend installing RStudio as well. RStudio is an open-source
integrated development environment for R, which includes a console, a
syntax-highlighting editor that supports direct code execution, as well
as tools for plotting, history, debugging and workspace management.

https://posit.co/downloads/

Getting started - Installing R and RStudio

https://www.geo.fu-berlin.de/en/v/soga-r/Introduction-to-R/Getting-Started/index.html


Descriptive Statistics in R


Height survey example

A survey of adult heights was conducted, and the results have been
compiled in the file Height_Survey.csv, where each row includes
observational data for height in centimeters (Height_cm) and gender
(Sex).
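
One possible way to load and summarize such a file is sketched below. Because the real file is not reproduced here, the sketch writes a few made-up rows to a temporary file as a stand-in; the coding of Sex as "F" and "M" is an assumption, and height.f is simply a name for the heights of the female respondents.

```r
# Stand-in for Height_Survey.csv: made-up rows written to a temporary file
# so that the sketch runs on its own. The "F"/"M" coding is an assumption.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Height_cm,Sex",
             "162,F", "170,F", "158,F",
             "181,M", "175,M", "190,M"),
           csv_path)

survey <- read.csv(csv_path)   # with the real data: read.csv("Height_Survey.csv")

# Heights of the female respondents
height.f <- survey$Height_cm[survey$Sex == "F"]

print(summary(height.f))       # minimum, quartiles, median, mean, maximum
print(sd(height.f))            # sample standard deviation

# Mean height per group
group_means <- tapply(survey$Height_cm, survey$Sex, mean)
print(group_means)
```

With the actual survey file, only the read.csv() call and possibly the group label would need to change.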


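# height.f is assumed to hold the Height_cm values of the female respondents
hist(height.f, freq = FALSE)                    # histogram on the density scale
lines(density(height.f), lwd = 3, col = 'red')  # overlay a kernel density estimate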


Boxplot


References

∗ Hartmann, K., Krois, J., Waske, B. (2018): E-Learning Project
SOGA: Statistics and Geospatial Data Analysis. Department of
Earth Sciences, Freie Universitaet Berlin.
https://www.geo.fu-berlin.de/en/v/soga-r/Basics-of-statistics/index.html
∗ http://flawofaverages.com/
∗ https://www.scribbr.com/statistics/descriptive-statistics/
∗ https://towardsdatascience.com/understanding-descriptive-statistics-c9c2b0641291


