Statistics For Economists: Lecturer: DR Omid Mazdak Email: Omid - Mazdak@kcl - Ac.uk
Lecture 1
Lecturer: Dr Omid Mazdak
Lecture 1 – Descriptive Statistics
Purpose of Statistics
Concept: Random Variable
Concept: Sample Vs. Population
• A sample: a subset of observations of one or more variables drawn from the population.
- E.g. an election exit poll is based on a sample of the voter population.
Concept: Sample Vs. Population (2)
• Sample size is usually denoted by n and the population size by N, with n < N.
• A parameter: a numerical measure that describes a specific characteristic of a population.
• A sample statistic (estimator): a numerical measure that describes a specific characteristic of a sample and is used to estimate the corresponding population parameter. The sample mean is an example of a sample statistic; it is used to estimate the population mean.
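To make the distinction between a parameter and a sample statistic concrete, the Python sketch below draws a simple random sample of size n from a population of size N and compares the sample mean with the population mean. The population (the integers 1 to 1000), the sample size of 50 and the random seed are arbitrary illustrative assumptions, not data from the lecture.

```python
import random
import statistics

# Hypothetical population of N = 1000 values (illustrative only, not real data).
population = list(range(1, 1001))
N = len(population)

# Population parameter: the population mean, mu.
mu = statistics.mean(population)

# Draw a simple random sample of size n < N, without replacement.
rng = random.Random(42)  # fixed seed so the example is reproducible
n = 50
sample = rng.sample(population, n)

# Sample statistic: the sample mean, x_bar, used to estimate mu.
x_bar = statistics.mean(sample)

print(f"N = {N}, mu = {mu}")
print(f"n = {n}, x_bar = {x_bar:.2f}")
```

The sample mean will generally differ from mu; a different seed gives a different sample and a different estimate.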
Random Samples
Types of Economic Data
• Variables
  - Categorical variables (defined categories or groups, e.g. male/female)
  - Numerical variables
    - Discrete variables (counted items)
    - Continuous variables (measured characteristics)
• Data
  - Cross-sectional data
  - Time series data
  - Panel data
Types of Economic Data – Cross Sectional Data
Cross-sectional data: observations of multiple variables at a given moment in time.
Types of Economic Data – Cross Sectional Data (2)
Example 2 of cross-sectional data: Countries with Largest Trade Surpluses (2019)
[Figure: Trade Balance (2019) by country: China; Germany; Russian Federation; Saudi Arabia; Ireland; Netherlands; Italy; Australia; United Arab Emirates; Brazil; Qatar; Taipei, Chinese; Iraq]
Types of Economic Data – Time Series Data
Time series data: a set of observations of a single variable over a period of time.
E.g. UK GDP, 1998-2019.
[Figure: UK GDP, 1998-2019, US$ trillions. Source: World Bank]
Types of Economic Data – Panel Data
Panel data: a set of observations of multiple variables over a period of time.
E.g. GDP of the UK, China and the US, 1998-2019.
[Figure: UK, China, US GDP (1998-2019), current US$ trillions; x-axis: Year]
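The three data types can be seen as slices of one structure. In this hypothetical Python sketch, panel data is stored as a mapping from (country, year) to a value, following the slide's example of GDP for the UK, China and the US: fixing the year gives a cross-section, and fixing the country gives a time series. The function names and all numbers are made-up placeholders, not real GDP figures.

```python
# Panel data: one variable (e.g. GDP) observed for several countries over several years,
# stored as a mapping (country, year) -> value.
# All numbers are made-up placeholders, NOT real GDP figures.
panel = {
    ("UK", 2018): 1.0, ("UK", 2019): 1.1,
    ("China", 2018): 2.0, ("China", 2019): 2.2,
    ("US", 2018): 3.0, ("US", 2019): 3.3,
}


def cross_section(data, year):
    """Fix the year: one observation per country at a single point in time."""
    return {country: v for (country, y), v in data.items() if y == year}


def time_series(data, country):
    """Fix the country: observations of one country over time, ordered by year."""
    return dict(sorted((y, v) for (c, y), v in data.items() if c == country))


print(cross_section(panel, 2019))  # {'UK': 1.1, 'China': 2.2, 'US': 3.3}
print(time_series(panel, "UK"))    # {2018: 1.0, 2019: 1.1}
```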
Measures of Central Tendency
The mean: arithmetic mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Add up all observations of variable x, from i = 1 to n, and divide by the number of observations, n.
Measures of Central Tendency: Arithmetic Mean
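A direct translation of the mean formula into Python, applied to the small data set $x_i$ = 1, 2, 5, 5, 6, 9, 11, 15 used in this lecture's median example. The function name `arithmetic_mean` is illustrative.

```python
def arithmetic_mean(xs):
    """Arithmetic mean: x_bar = (1/n) * (x_1 + x_2 + ... + x_n)."""
    n = len(xs)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(xs) / n


x = [1, 2, 5, 5, 6, 9, 11, 15]
print(arithmetic_mean(x))  # 54 / 8 = 6.75
```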
Measures of Central Tendency: Median
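The median is also known as the 50th percentile: 50% of the observations are below or equal to this value.
E.g. for $x_i$ = 1, 2, 5, 5, 6, 9, 11, 15 (already ordered, n = 8), the median is the average of the 4th and 5th ordered values: (5 + 6)/2 = 5.5.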
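A minimal Python sketch of the median rule: sort the observations, then take the middle value when n is odd, or the average of the two middle values when n is even. The helper name `median` is hypothetical; Python's built-in `statistics.median` follows the same rule.

```python
def median(xs):
    """Median (50th percentile): middle ordered value, or mean of the two middle values."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2   # even n: average of the two middle values


x = [1, 2, 5, 5, 6, 9, 11, 15]
print(median(x))  # (5 + 6) / 2 = 5.5
```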
Measures of Central Tendency – Grouped data
UK income survey:
Class (£)    Mid-point x (£k)   Number f (thousands)   fx
0-10k        5                  2448                   12240
10-25k       17.5               1823                   31902.5
25-40k       32.5               1375                   44687.5
40-50k       45                 480                    21600
50-60k       55                 665                    36575
60-80k       70                 1315                   92050
80-100k      90                 1640                   147600
100-150k     125                2151                   268875
150-200k     175                2215                   387625
200-300k     250                1856                   464000
300-500k     400                1057                   422800
500-1000k    750                439                    329250
1000-2000k   1500               122                    183000
2000k+       3000               50                     150000
Total                           17636                  2592205
Mean: $\bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i} = \frac{2592205}{17636} \approx 146.98$ (£ thousand), i.e. Mean ≈ £147k
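The grouped-data mean in the table can be reproduced in a few lines of Python by treating every observation in a class as if it sat at the class mid-point (the same assumption behind the fx column). Mid-points are in £ thousand and frequencies in thousands, as in the table; `grouped_mean` is an illustrative name.

```python
# (mid-point x_i in £ thousand, frequency f_i in thousands), from the UK income survey table.
classes = [
    (5, 2448), (17.5, 1823), (32.5, 1375), (45, 480), (55, 665),
    (70, 1315), (90, 1640), (125, 2151), (175, 2215), (250, 1856),
    (400, 1057), (750, 439), (1500, 122), (3000, 50),
]


def grouped_mean(groups):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i), using class mid-points."""
    total_f = sum(f for _, f in groups)
    total_fx = sum(f * x for x, f in groups)
    return total_fx / total_f


print(sum(f for _, f in classes))       # 17636
print(sum(f * x for x, f in classes))   # 2592205.0
print(round(grouped_mean(classes), 3))  # 146.984 (£ thousand), i.e. about £147k
```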
Measures of Central Tendency – Grouped data (2)
UK income survey:
Class (£)    Number (thousands)   Frequency   Cumulative freq.
0-10k        2448                 13.88%      13.88%
10-25k       1823                 10.34%      24.22%
25-40k       1375                 7.80%       32.01%
40-50k       480                  2.72%       34.74%
50-60k       665                  3.77%       38.51%
60-80k       1315                 7.46%       45.96%
80-100k      1640                 9.30%       55.26%
100-150k     2151                 12.20%      67.46%
150-200k     2215                 12.56%      80.02%
200-300k     1856                 10.52%      90.54%
300-500k     1057                 5.99%       96.54%
500-1000k    439                  2.49%       99.02%
1000-2000k   122                  0.69%       99.72%
2000k+       50                   0.28%       100.00%
Total        17636                100%
• Mode ≈ 5k: 0-10k is the modal class (highest frequency), and 5k is its mid-point.
• Median ≈ 80k: the cumulative frequency first exceeds 50% in the 80-100k class.
• Mean ≈ 147k.
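The modal and median classes can be located from the frequencies programmatically. The Python sketch below also estimates the median by linear interpolation within the median class, assuming incomes are spread evenly across that class; this is a common textbook convention and not necessarily the lecture's method, which reads the median off as roughly the lower class boundary (≈ 80k). The open-ended 2000k+ class is given no upper bound, and the function names are illustrative.

```python
# (lower bound, upper bound, frequency): bounds in £ thousand, frequencies in thousands.
# The open-ended top class "2000k+" has no upper bound (None).
classes = [
    (0, 10, 2448), (10, 25, 1823), (25, 40, 1375), (40, 50, 480),
    (50, 60, 665), (60, 80, 1315), (80, 100, 1640), (100, 150, 2151),
    (150, 200, 2215), (200, 300, 1856), (300, 500, 1057),
    (500, 1000, 439), (1000, 2000, 122), (2000, None, 50),
]


def modal_class(groups):
    """The class with the highest frequency."""
    return max(groups, key=lambda g: g[2])


def median_class_and_estimate(groups):
    """Find the class where cumulative frequency first reaches N/2, then interpolate
    linearly inside it (assumes values are spread evenly within the class)."""
    half = sum(f for _, _, f in groups) / 2
    cum = 0  # cumulative frequency of all classes below the current one
    for lower, upper, f in groups:
        if cum + f >= half:
            if upper is None:
                raise ValueError("median falls in the open-ended class")
            estimate = lower + (half - cum) / f * (upper - lower)
            return (lower, upper), estimate
        cum += f
    raise ValueError("empty data")


print(modal_class(classes)[:2])   # (0, 10): mode ≈ mid-point, 5k
cls, est = median_class_and_estimate(classes)
print(cls, round(est, 1))         # (80, 100) 88.7
```

Under this interpolation assumption the median estimate is about £88.7k, which lies in the 80-100k class.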
Measures of Central Tendency – Grouped data (3)
UK Income Survey:
[Figure: Histogram of the UK income survey data (relative frequency), with the mode, the median and the mean marked]
• Here the mode < the median < the mean, so the distribution is skewed to the right (positively skewed). If the reverse were true (mean < median < mode), it would be skewed to the left.
• If the mode = the median = the mean, the distribution would typically be symmetrical.
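The same comparison can be run on the small data set $x_i$ = 1, 2, 5, 5, 6, 9, 11, 15 using Python's standard `statistics` module. The `skew_direction` helper is a hypothetical rule-of-thumb classifier based on the ordering of mode, median and mean described above; it is not a formal skewness measure.

```python
import statistics

x = [1, 2, 5, 5, 6, 9, 11, 15]

mode = statistics.mode(x)      # most frequent value
median = statistics.median(x)  # 50th percentile
mean = statistics.mean(x)      # arithmetic mean


def skew_direction(mode, median, mean):
    """Rule of thumb: the ordering of mode, median and mean hints at the skew."""
    if mode < median < mean:
        return "right-skewed"
    if mean < median < mode:
        return "left-skewed"
    if mode == median == mean:
        return "symmetric"
    return "no clear ordering"


print(mode, median, mean, skew_direction(mode, median, mean))
# 5 5.5 6.75 right-skewed
```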
Percentiles
The pth percentile is the value at or below which p percent of the observations lie.
To calculate the pth percentile (for any p), first order the observations from lowest to highest. Then:
pth percentile = the value located in the $\frac{p}{100}(n + 1)$th ordered position.
So, for example, the 25th percentile (also known as the first, or lower, quartile, Q1):
Q1 = the value in the $0.25(n + 1)$th ordered position.
The 75th percentile (also known as the third, or upper, quartile, Q3):
Q3 = the value in the $0.75(n + 1)$th ordered position.
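A Python sketch of the (p/100)(n + 1) position rule. The rule above does not say what to do when the position is not a whole number; the sketch assumes linear interpolation between the two neighbouring ordered values, and clamps positions outside 1..n to the smallest or largest observation. Both are assumptions, and `percentile` is an illustrative name.

```python
import math
import statistics


def percentile(data, p):
    """p-th percentile using the (p/100)(n + 1)th ordered position rule.

    Assumptions: a fractional position is handled by linear interpolation between
    neighbouring ordered values; positions below 1 or above n are clamped.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("percentile of empty data")
    pos = (p / 100) * (n + 1)  # 1-based ordered position
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    k = math.floor(pos)        # whole part of the position
    frac = pos - k             # fractional part, used as the interpolation weight
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])  # xs is 0-indexed


x = [1, 2, 5, 5, 6, 9, 11, 15]
q1, q2, q3 = (percentile(x, p) for p in (25, 50, 75))
print(q1, q2, q3)                    # 2.75 5.5 10.5
print(statistics.quantiles(x, n=4))  # [2.75, 5.5, 10.5]
```

For this data (n = 8) the positions are 2.25, 4.5 and 6.75. Python's `statistics.quantiles` uses the same (n + 1) positioning by default (`method='exclusive'`), which gives a convenient cross-check.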
Box Plot
Measures of dispersion: Variance, Standard Deviation, Coefficient of Variation
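Variance is a measure of the dispersion of the data around the mean. The standard deviation is the square root of the variance, so the larger the variance, the larger the standard deviation and (for a given mean) the larger the coefficient of variation.
Sample standard deviation: $s = \sqrt{s^2}$; population standard deviation: $\sigma = \sqrt{\sigma^2}$.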
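The Python sketch below assumes the standard textbook definitions: sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, population variance $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, and coefficient of variation $CV = \frac{s}{\bar{x}} \times 100\%$. It uses the data set $x_i$ = 1, 2, 5, 5, 6, 9, 11, 15 and cross-checks against Python's `statistics.variance` and `statistics.stdev`, which use the n - 1 denominator. The helper names are illustrative.

```python
import math
import statistics


def sample_variance(xs):
    """Sample variance: s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def population_variance(xs):
    """Population variance: sigma^2 = sum((x_i - mu)^2) / N, for a full population."""
    N = len(xs)
    if N == 0:
        raise ValueError("empty population")
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N


x = [1, 2, 5, 5, 6, 9, 11, 15]
s2 = sample_variance(x)   # 153.5 / 7
s = math.sqrt(s2)         # s = sqrt(s^2)
x_bar = sum(x) / len(x)   # 6.75
cv = s / x_bar * 100      # coefficient of variation, as % of the mean

print(round(s2, 4), round(s, 4), round(cv, 1))  # 21.9286 4.6828 69.4
# Cross-check with the standard library (n - 1 denominator).
print(statistics.variance(x), statistics.stdev(x))
```

Treating the same eight values as a whole population instead gives $\sigma^2$ = 153.5 / 8 = 19.1875, slightly smaller than $s^2$ because of the larger denominator.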
Summary
• Statistics can be used both to test theoretical hypotheses and to create new theory from empirical observation.
• Data can be in the form of cross-sectional, time series or panel data.
• There are three main measures of central tendency: the mean, the median and the mode.
• Observations of a variable can be divided into percentiles to give an idea of their dispersion, and the data distribution can be summarized using a box plot.
• The variance and the standard deviation of a variable give a formal measure of the dispersion of the observations around the mean.
• The next lecture covers some additional descriptive statistics (covariance and correlation) and introduces probability theory.