Topic 4 Descriptive Statistics

Uploaded by

racieanhdao5203

Topic 4 Descriptive statistics

Vincent Hoang (2022), Lecture 4

Camn et al (2016), Chapter 2
Three main goals
• Measures of central tendency
• Measures of dispersion and shape
• Measures of association
Review
Dichotomous: two categories/levels, e.g. ‘yes’ and ‘no’.

Source: slightly edited from https://studyonline.unsw.edu.au/sites/default/files/UNSW2.png

Descriptive statistics
• Used to describe and summarise a variable or
variables for a sample of data.
◦ For categorical or grouped data: the proportion.
◦ Measures of central tendency: mean, median, and mode
◦ Measures of dispersion: range, interquartile range, standard
deviation, coefficient of variation, percentiles, z-scores
◦ Measures of shape: skewness, kurtosis
◦ Measures of association: covariance, correlation
You can transform
data into new variables
The Proportion
• The proportion, p, is the fraction (often expressed as a percentage) of observations that have a certain characteristic
• Very useful for categorical or grouped data
• Take the number of observations with a characteristic (X) and divide it by the total number of observations (N)

Example 4.31 (textbook, p 178)
What is the proportion of orders that are placed with Spacetime Technologies?

Supplier                  Orders
Alum Sheeting             8
Durrable Products         13
Fast-Tie Aerospace        15
Hulkey Fasteners          15
Manley Valve              11
Pylon Accessories         5
Spacetime Technologies    12
Steelpin Inc.             15
Total                     94

$p = \frac{X}{N} = \frac{12}{94} \approx 0.128 = 12.8\%$

12.8% of orders are placed with Spacetime Technologies.
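The same calculation can be sketched in Python. This is only an illustration: the dictionary copies the order counts from Example 4.31, and the helper name `proportion` is made up for this sketch.

```python
# Order counts per supplier, copied from Example 4.31.
orders = {
    "Alum Sheeting": 8,
    "Durrable Products": 13,
    "Fast-Tie Aerospace": 15,
    "Hulkey Fasteners": 15,
    "Manley Valve": 11,
    "Pylon Accessories": 5,
    "Spacetime Technologies": 12,
    "Steelpin Inc.": 15,
}


def proportion(counts, category):
    """Return X / N: the count for one category divided by the total count."""
    total = sum(counts.values())
    return counts[category] / total


p = proportion(orders, "Spacetime Technologies")
print(f"p = {p:.3f} = {p:.1%}")
```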

Individual assessment
Measures of central tendency
• Three different measures of the “typical” or “representative” value in a dataset
◦ Arithmetic mean: total value of all observations / number of observations. Excel: =AVERAGE(datarange)
◦ Median: the middle observation, after ordering the data from smallest to largest. Excel: =MEDIAN(datarange)
◦ Mode: the most frequently occurring observation. Excel: =MODE.MULT(datarange)
Mean vs Median vs Mode
• Mean is often used for quantitative data unless outliers
exist or data is skewed.
• Median is often used in conjunction with the mean
since it is not affected by outliers. Comparing mean
with median gives us an idea of skewness.
• Mode is mainly used for qualitative data, rarely used for
numerical data. There may be no mode, multiple
modes, or the mode may not be close to the centre of
the data.
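A small Python sketch can make the comparison concrete. The salary figures below are hypothetical (they are not from the lecture data), and the last value is a deliberate outlier so that the mean is pulled away from the median.

```python
import statistics

# Hypothetical salaries in $000; the last value is a deliberate outlier.
salaries = [48, 50, 50, 52, 55, 58, 60, 250]

mean = statistics.mean(salaries)        # comparable to =AVERAGE(datarange)
median = statistics.median(salaries)    # comparable to =MEDIAN(datarange)
modes = statistics.multimode(salaries)  # roughly comparable to =MODE.MULT(datarange)

print(f"mean = {mean}, median = {median}, mode(s) = {modes}")

# A mean well above the median hints at right (positive) skew.
print("mean > median:", mean > median)
```

Here the single large salary moves the mean far more than the median, which is why the median is usually reported alongside the mean when outliers are present.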
Individual assessment
• Consider investment in stock markets. For each stock,
◦ Daily return = closing price – opening price
◦ Over a year (a period) = price on the ending date – price on the beginning date
◦ Returns = price when re-selling – price on purchase

• What measures of central tendency would you use?

Excel’s AGGREGATE function
• Reference form: AGGREGATE(function_num, options, ref1, [ref2], …)
• Array form: AGGREGATE(function_num, options, array, [k])
Individual assessment
Individual assessment
Three main goals
• Measures of central tendency
• Measures of dispersion and shape
• Measures of association
Skewness
• Measures symmetry relative to a bell-shaped (normal) distribution.
• Normal distribution: bell shape; mean = median = mode; no skewness
• If the mean is different to the median, this implies skewness. As a general rule, a value for skewness:
◦ < -1 or > 1 is highly skewed
◦ Between -1 and -0.5 or between 0.5 and 1 is moderately skewed
◦ Between -0.5 and 0.5 is approximately symmetric
• Note: This rule may not apply to discrete or bimodal data.
• Excel function: =SKEW(datarange)
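As a rough illustration, the rule of thumb above can be applied in Python. The sketch assumes the adjusted sample skewness formula, n / ((n-1)(n-2)) × Σ((x − mean)/s)³, which is understood to be what Excel's SKEW computes. The function names and the two small datasets are invented for this example.

```python
import math


def sample_skewness(data):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)**3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(data) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("all observations are identical")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


def describe_skew(value):
    """Apply the rule of thumb from the slide to a skewness value."""
    size = abs(value)
    if size > 1:
        return "highly skewed"
    if size >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"


symmetric = [1, 2, 3, 4, 5]        # hypothetical, perfectly symmetric
right_skewed = [1, 2, 3, 4, 10]    # hypothetical, one large value

for name, data in (("symmetric", symmetric), ("right_skewed", right_skewed)):
    g = sample_skewness(data)
    print(f"{name}: skewness = {g:.2f} ({describe_skew(g)})")
```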
Income distribution

https://ourworldindata.org/global-economic-inequality
“income inequality in Australia/Vietnam
has been increasing recently”
• What would you show in your analysis?
◦ Think of a specific context:
◦ (global vs) national vs state level
◦ by socio-economic demographic factors: gender, ethnicity, skills, education & qualification,
efforts etc.
◦ Think of a specific data set
◦ entire population vs income groups
◦ Think of specific measures (metrics/ indicators/ variables)
◦ mean – median – mode – skewness etc.
Measures of variation
Dispersion = Variation = Spread: refers to the degree of variation in the data
Five key measures:
1. Range
2. Interquartile Range
3. Percentiles
4. Standard deviation
5. Coefficient of variation
What can we say about the variation in income?
Range and Interquartile Range
• Range: the difference between the minimum and maximum value in the data – sensitive to
outliers
• Interquartile Range: the range of the middle 50% of the data – the difference between the
third quartile and first quartile in the data (Q3 minus Q1) – not sensitive to outliers
Percentiles
• The position in the dataset where p% of observations are below it and (100-p)% are above it, when ordered from smallest to largest
◦ Useful for analysing specific points along the distribution
◦ Most common percentiles are quartiles (i.e. 25th, 50th, 75th percentiles) or deciles (i.e. 10th, 20th,…, 90th percentiles)
◦ More extreme percentiles are affected by outliers
• =PERCENTILE.EXC(datarange, percentile)
◦ Make sure you put the percentile in as a fraction (e.g. 20th percentile is 0.2)
• Notes:
1. Interpretation of a percentile: the Xth percentile means that X% of the sample lies below it and (100-X)% of the sample lies above it.
2. Example: in this case the 10th percentile is 12990, so 10% of Australian taxpayers have an income at or below 12990 and 90% of Australian taxpayers have an income above 12990.
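To show what an exclusive percentile does mechanically, here is a minimal Python sketch. It assumes the usual exclusive method (find position p × (n + 1) in the sorted data and interpolate between neighbours), which is intended to mirror PERCENTILE.EXC rather than reproduce Excel exactly. The function name `percentile_exc` is made up, and the data are the ten NVL annual returns listed in the outlier-analysis table of these slides.

```python
def percentile_exc(data, p):
    """Exclusive percentile: interpolate at position p * (n + 1) in sorted data."""
    xs = sorted(data)
    n = len(xs)
    if n == 0 or not (1 / (n + 1) <= p <= n / (n + 1)):
        raise ValueError("p must lie between 1/(n+1) and n/(n+1)")
    rank = p * (n + 1)                       # 1-based position, may be fractional
    lower = min(max(int(rank), 1), n)        # whole part, kept inside 1..n
    fraction = max(rank - lower, 0.0)        # distance towards the next value
    if lower == n:                           # position is the largest value
        return float(xs[-1])
    return xs[lower - 1] + fraction * (xs[lower] - xs[lower - 1])


# NVL annual returns (%) from the outlier-analysis table in these slides.
nvl = [0, 2, 5, 5, 5, 6, 6, 7, 8, 9]

q1 = percentile_exc(nvl, 0.25)   # percentile passed as a fraction
q2 = percentile_exc(nvl, 0.50)
q3 = percentile_exc(nvl, 0.75)
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {q3 - q1}")
```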
Example: Gender pay gap
• If asked to use data to show current trends of gender pay
gap, what would you show?
• Consider GPG =

• https://data.wgea.gov.au/home
Standard deviation
• Difficult to interpret on its own, but assuming
the data is approximately bell-shaped (normally
distributed):
◦ 68% of observations are situated within ± 1 standard
deviation from the mean
◦ 95% of observations are situated within ± 2 standard deviations from the mean
◦ 99.7% of observations are situated within ± 3 standard deviations from the mean
= STDEV.S(datarange)
Note: use the coefficient of variation to measure the volatility of a stock.
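The 68%, 95% and 99.7% figures come from the normal distribution itself. As a quick check, and only under the assumption of an exactly normal distribution, Python's standard library can compute the share of observations within ± k standard deviations. The helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution lying within +/- k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
```

Real data are rarely exactly normal, so these percentages are best read as approximations.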

Real world business uses of SD

• Banking and finance:
◦ Standard deviation is often used as a measure of the relative riskiness of an asset.
◦ A volatile stock has a high standard deviation, while the standard deviation of a stable stock is usually rather low.
• Actuaries calculate the standard deviation of healthcare usage to know how much variation in usage to expect in a given period (month, quarter, or year).
• Real estate agents calculate the standard deviation of house prices in a particular area to inform their clients of the type of variation in house prices they can expect.
• Human Resource managers often calculate the standard deviation of salaries in a certain field to know what type of variation in salaries to offer to new employees.
Coefficient of Variation
• The coefficient of variation (CV) expresses the standard deviation of data relative to (divided by) its mean
• Useful for comparisons of variation across different sets of data (e.g. between returns on different investments)
NVL:
• Average = 5.3%
• SD = 2.67%
• CV = 2.67/5.3 = 0.50
VCB:
• Average = 5.3%
• SD = 0.95%
• CV = 0.95/5.3 = 0.18
Therefore, NVL has more variation in its returns (higher risk) given the same “average” return.
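The NVL and VCB figures can be reproduced with a short Python sketch. The two lists are the annual returns shown in the outlier-analysis table of these slides; the function name `coefficient_of_variation` is made up, and the sample standard deviation (as in =STDEV.S) is assumed.

```python
import statistics

# Annual returns (%) from the outlier-analysis table in these slides.
nvl = [0, 2, 5, 5, 5, 6, 6, 7, 8, 9]
vcb = [4, 4, 5, 5, 5, 5, 6, 6, 6, 7]


def coefficient_of_variation(returns):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(returns) / statistics.mean(returns)


for name, returns in (("NVL", nvl), ("VCB", vcb)):
    print(
        f"{name}: mean = {statistics.mean(returns):.1f}, "
        f"SD = {statistics.stdev(returns):.2f}, "
        f"CV = {coefficient_of_variation(returns):.2f}"
    )
```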
Combining Mean and Standard Deviation
Individual assessment

• This shows top five stocks.

• You can calculate Coefficient of Variation to compare volatility across stocks.
Standardized Values (Z-scores)
• Sometimes we are interested in seeing where individual observations sit
relative to the mean.
• The Z-score tells us how many standard deviations away from the mean
an observation sits
• Use the =STANDARDIZE(x,mean,stdev) function in Excel
◦ a z-score of 1.0 (a positive value) means that the observation is one standard
deviation above the mean;
◦ a z-score of -1.5 means that the observation is 1.5 standard deviations below the
mean.
• Useful for checking if individual observations are outliers.
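A minimal Python sketch of the same idea follows. The function `standardize` is a hand-written stand-in for Excel's STANDARDIZE, not a library call, and the data are the NVL annual returns from the outlier-analysis table of these slides, using the sample standard deviation.

```python
import statistics


def standardize(x, mean, stdev):
    """Z-score: how many standard deviations x sits from the mean."""
    return (x - mean) / stdev


# NVL annual returns (%) from the outlier-analysis table in these slides.
nvl = [0, 2, 5, 5, 5, 6, 6, 7, 8, 9]

mean = statistics.mean(nvl)
sd = statistics.stdev(nvl)   # sample standard deviation, as in =STDEV.S

z_scores = [round(standardize(x, mean, sd), 2) for x in nvl]
print(z_scores)

# Rule of thumb: flag observations more than 3 standard deviations away.
flagged = [x for x in nvl if abs(standardize(x, mean, sd)) > 3]
print("possible outliers:", flagged)
```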
Outliers
• Skewness can indicate the presence of outliers.
• No standard definition of what constitutes an outlier.
• Several good rules of thumb are:
◦ Z-scores greater than +3 or less than −3
◦ Extreme outliers: more than 3*IQR to the left of Q1 or right of Q3
◦ Mild outliers: between 1.5*IQR and 3*IQR to the left of Q1 or right of Q3
◦ Visual: where an individual data point sits relative to the rest of the data
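The IQR rules of thumb can be sketched in Python as follows. The function name `classify_outlier` is made up; the quartiles are computed with `statistics.quantiles` using the exclusive method, which is assumed to correspond to QUARTILE.EXC. The data are the NVL annual returns from the outlier-analysis table of these slides.

```python
import statistics


def classify_outlier(x, q1, q3):
    """Label x as 'extreme', 'mild' or 'none' using the IQR rules of thumb."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild"
    return "none"


# NVL annual returns (%) from the outlier-analysis table in these slides.
nvl = [0, 2, 5, 5, 5, 6, 6, 7, 8, 9]

# Quartiles with the exclusive method; the middle value (the median) is unused.
q1, _, q3 = statistics.quantiles(nvl, n=4, method="exclusive")

labels = {x: classify_outlier(x, q1, q3) for x in sorted(set(nvl))}
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
print(labels)
```

With these quartiles the mild fences sit at -0.25 and 11.75, so even the return of 0 is not flagged.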
Outliers: Remove or not?
• Whether to remove outliers is contentious and depends on the context.
◦ Consider income or wealth inequality issues: we definitely do not remove (mild) outliers.
◦ But if we assess whether education affects income, then it is reasonable to remove outliers, and extreme outliers should definitely be removed.
Excel’s add-in: Toolpak vs RealStatistics
Outlier analysis

Visual approach: This value stands out a little.

Z-score approach: =STANDARDIZE(x, mean, standard deviation)

Annual Return (%)      Z-scores
NVL    VCB             NVL      VCB
0      4               -1.99    -1.37
2      4               -1.24    -1.37
5      5               -0.11    -0.32
5      5               -0.11    -0.32
5      5               -0.11    -0.32
6      5               0.26     -0.32
6      6               0.26     0.74
7      6               0.64     0.74
8      6               1.01     0.74
9      7               1.39     1.79

For example, =STANDARDIZE(0, 5.3, 2.67) gives -1.99 for NVL and =STANDARDIZE(7, 5.3, 0.95) gives 1.79 for VCB.

None of the observations are more than 3 standard deviations from the mean.
Measures of dispersion
• Dispersion = Variation = Spread: refers to the degree of variation in the data; that is, the numerical spread (or compactness) of the data.
• Variance: the average of all the squared deviations from the mean
◦ Very difficult and often meaningless to interpret on its own
◦ Affected by outliers
◦ Excel formula: =VAR.S(datarange)
◦ In ToolPak: Yes
• Standard deviation: the square root of the variance
◦ Difficult to interpret on its own, expressed in the same unit of measurement as the variable of interest (e.g. dollars, metres)
◦ Affected by outliers
◦ Excel formula: =STDEV.S(datarange)
◦ In ToolPak: Yes
• Coefficient of variation: the standard deviation relative to (divided by) the mean
◦ Useful for comparing variation across variables when means are different (e.g. between returns on different stocks)
◦ In ToolPak: No
Measures of dispersion
• Range: the difference between the maximum and minimum values in the data
◦ Affected by outliers
◦ Excel formula: =MAX(datarange)-MIN(datarange)
◦ In ToolPak: Yes
• Interquartile range (IQR): the range of the middle 50% of the data
◦ Calculated as Quartile 3 minus Quartile 1
◦ Not affected by outliers
◦ Excel formula: =QUARTILE.EXC(datarange,3)-QUARTILE.EXC(datarange,1)
◦ In ToolPak: No
• Percentile: the position in the dataset where p% of observations are below and (100-p)% are above
◦ More extreme percentiles are affected by outliers
◦ Most common percentiles are quartiles (i.e. 25th, 50th, 75th percentiles) or deciles (i.e. 10th, 20th,…, 90th percentiles)
◦ Excel formula: =PERCENTILE.EXC(datarange, percentile). Make sure you put the percentile in as a fraction (e.g. 20th percentile is 0.2)
◦ In ToolPak: No
Three main goals
• Measures of central tendency
• Measures of dispersion and shape
• Measures of association
Real-world questions
• Is it true that…
◦ bottled water sales increase as temperature increases?
◦ older houses are worth less?
◦ those that earn more consume more?
• We can gain insights by looking at measures of association: covariance and correlation
Using Bottledwater Data
Measures of association
• Covariance measures the direction of a relationship between two quantitative variables.
• Correlation measures both the direction and strength of the relationship between two quantitative
variables.
• A plot can be used to gauge correlation by looking at how close the data points sit to the line of best fit.
Linear or Non-Linear Relationship
Measures of Association
• Two variables have a strong statistical relationship with one another
if they appear to move together.
• When two variables appear to be related, you might suspect a
cause-and-effect relationship.
• Sometimes, however, statistical relationships exist even though a
change in one variable is not caused by a change in the other.
Measures of Association: Covariance
• Covariance is a measure of the linear association between two variables, X and Y. Like
the variance, different formulas are used for populations and samples.

• Population covariance:

◦ Excel function: =COVARIANCE.P(array1,array2)

• Sample covariance:

◦ Excel function: =COVARIANCE.S(array1,array2)

• The covariance between X and Y is the average of the product of the deviations of each
pair of observations from their respective means.
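For reference, the verbal definition above corresponds to the standard textbook formulas below. The notation is an assumption of this sketch: N and n are the population and sample sizes, μ_x and μ_y the population means, and x̄ and ȳ the sample means.

```latex
% Population covariance (divide by N)
\sigma_{xy} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)

% Sample covariance (divide by n - 1)
s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

As with the variance, the sample version divides by n − 1 rather than n.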
Measures of Association: Correlation
• Correlation is a measure of the linear relationship between two variables, X and Y, which does not depend
on the units of measurement.
• Correlation is measured by the correlation coefficient, also known as the Pearson product moment
correlation coefficient.
• Correlation coefficient for a population:

• Correlation coefficient for a sample:

• The correlation coefficient is scaled between -1 and 1.

• Excel function: =CORREL(array1,array2)
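For reference, the standard definitions of the correlation coefficient divide the covariance by the product of the two standard deviations. The symbols are an assumption of this sketch: σ_xy and s_xy are the population and sample covariances, and σ_x, σ_y, s_x, s_y are the corresponding standard deviations.

```latex
% Population correlation coefficient
\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y}

% Sample correlation coefficient
r_{xy} = \frac{s_{xy}}{s_x \, s_y}
```

Dividing by the standard deviations removes the units of measurement, which is why the result always lies between -1 and 1.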
Examples of Correlation
Notes on the CORREL Function
• When using the CORREL function, it does not matter if the data represent
samples or populations. In other words,

CORREL(array1,array2) =
COVARIANCE.P(array1,array2) / (STDEV.P(array1)*STDEV.P(array2))

and

CORREL(array1,array2) =
COVARIANCE.S(array1,array2) / (STDEV.S(array1)*STDEV.S(array2))
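The reason the two versions agree is that the divisor (N or n − 1) appears in both the covariance and the product of the standard deviations, so it cancels. A short Python sketch can confirm this numerically. The temperature and sales figures are hypothetical, and the helper functions are hand-written stand-ins for the Excel functions.

```python
import math


def covariance(x, y, sample=True):
    """Sample (divide by n - 1) or population (divide by n) covariance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return total / (n - 1 if sample else n)


def stdev(x, sample=True):
    """Standard deviation as the square root of the covariance of x with itself."""
    return math.sqrt(covariance(x, x, sample))


# Hypothetical data: daily temperature (°C) and bottled water sales (units).
temperature = [18, 21, 24, 27, 30, 33]
sales = [110, 135, 150, 180, 205, 230]

r_population = covariance(temperature, sales, sample=False) / (
    stdev(temperature, sample=False) * stdev(sales, sample=False)
)
r_sample = covariance(temperature, sales, sample=True) / (
    stdev(temperature, sample=True) * stdev(sales, sample=True)
)

print(f"population version: {r_population:.4f}")
print(f"sample version:     {r_sample:.4f}")
```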
Excel Correlation Tool

Data >
Data Analysis >
Correlation

• Excel computes the correlation coefficient between all pairs of variables in the Input Range. Input Range data must be in contiguous columns.
Excel’s ToolPak add-in for multiple
variables
• Data > Data Analysis >
Correlation
• Can also use =CORREL(datarange1, datarange2)

• The function for covariance is

=COVARIANCE.S (datarange1, datarange2)

• The Real-Statistics add-in allows analysis of only two variables.
Interpreting Correlation Coefficient
• Direction of the relationship: positive or negative
• Strength of the relationship: no, weak, moderate, strong, very strong, perfect.
• For example:
◦ Correlation of 0.4 indicates a moderate and positive linear relationship
◦ Correlation of -0.72 indicates a strong and negative linear relationship

r (absolute value)    Interpretation
0                     No relationship
< 0.3                 Weak
0.3 - 0.7             Moderate
> 0.7                 Strong
1                     Perfect relationship
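The interpretation table can be written as a small Python helper. This is a sketch only: the function name `interpret_correlation` is made up, and treating a value of exactly 0.7 as moderate is an assumption, since the table leaves that boundary open.

```python
def interpret_correlation(r):
    """Describe the strength and direction of a correlation coefficient."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return "no relationship"
    if size == 1:
        strength = "perfect"
    elif size > 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"   # assumption: 0.3 and 0.7 themselves count as moderate
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


for r in (0.4, -0.72, 0.1, 0):
    print(r, "->", interpret_correlation(r))
```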
A word of caution…
• When two variables appear to be related,
you might suspect a cause-and-effect
relationship.
• Sometimes, however, statistical
relationships exist even though a change
in one variable is not caused by a change
in the other.
• Correlation does NOT imply CAUSATION
◦ More on this in week 6
Summaries
• Key descriptive statistics, dispersion, and association
◦ What are they?
◦ Their meanings, pros and cons.
◦ How to calculate these in Excel.
◦ How to apply these metrics in analysis.
