Probabilities and Statistics
KT Wong
Faculty of Social Sciences, HKU
2024-08-27
Motivation
we need a framework for communicating our uncertainty about the inferences we draw in our empirical work
probability theory
▶ helps us do inference in modeling
▶ is the root of social statistics
it might even be helpful for case selection and small-n inference in qualitative research
Basic Concepts
some set theory
union: the union of two sets contains all the elements that belong to either set
intersection: the intersection of two sets contains only those elements found in both
sets
complement: the complement of a given set is the set that contains all elements not
in the original set
[Venn diagrams illustrating intersection, union, and complement]
Basic Concepts
probability space
a probability space is a triple (Ω, ℱ, P): a sample space Ω of possible outcomes, a collection ℱ of events (subsets of Ω), and a probability measure P assigning each event a probability
Exercises
Consider the universal set Ω = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Given the sets A = {2, 4, 6, 8, 10} and B = {3, 6, 9}, find A ∪ B, A ∩ B, and the complements of A and B
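One quick way to check your answers is Python's built-in set type; a minimal sketch:

    # Sets copied from the exercise above
    Omega = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    A = {2, 4, 6, 8, 10}
    B = {3, 6, 9}

    print(A | B)      # union: {2, 3, 4, 6, 8, 9, 10}
    print(A & B)      # intersection: {6}
    print(Omega - A)  # complement of A: {1, 3, 5, 7, 9}
    print(Omega - B)  # complement of B: {1, 2, 4, 5, 7, 8, 10}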
Basic Concepts
probability measure
a probability measure P assigns a number in [0, 1] to each event, with the properties:
1. P(∅) = 0
2. P(Ω) = 1
3. If A1, A2, … are disjoint, then P(⋃_{i=1}^{∞} Ai) = ∑_{i=1}^{∞} P(Ai)
Basic Concepts
Law of Total Probability
if A1, …, An are disjoint events whose union is Ω (a partition), then for any event B,
P(B) = ∑_{i=1}^{n} P(B ∩ Ai)
Basic Concepts
Conditional Probability
the conditional probability of A given B, defined whenever P(B) > 0, is
P(A | B) = P(A ∩ B) / P(B)
we can use conditional probability statements to derive the law of total probability:
P(B) = ∑_{i=1}^{n} P(B | Ai)P(Ai)
Basic Concepts
Bayes’s rule
for events A and B with P(B) > 0,
P(A | B) = P(B | A)P(A) / P(B)
Exercises
Suppose you work in a building that has a fire alarm system. The fire alarm is designed
to go off when there is a fire, and it’s also known that sometimes the alarm can go off
due to smoke from a malfunctioning HVAC system.
the alarm system works pretty well and there is a 95% chance it goes off when there
is an actual fire
▶ P(Alarm goes off | Fire) = 0.95
there is a 10% chance that the alarm goes off due to smoke without a fire
▶ P(Alarm goes off | No Fire) = 0.1
what is the probability that there is actually a dangerous fire, given that the alarm goes off? (hint: you will also need a prior probability of fire, P(Fire))
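A minimal sketch of the Bayes computation in Python. The exercise does not state the prior P(Fire), so the 1% value below is a hypothetical assumption purely for illustration:

    # Hypothetical prior: assume 1% of alarm-relevant situations involve a real fire
    p_fire = 0.01
    p_alarm_given_fire = 0.95      # P(Alarm | Fire), from the exercise
    p_alarm_given_no_fire = 0.10   # P(Alarm | No Fire), from the exercise

    # Law of total probability: P(Alarm)
    p_alarm = p_alarm_given_fire * p_fire + p_alarm_given_no_fire * (1 - p_fire)

    # Bayes's rule: P(Fire | Alarm)
    p_fire_given_alarm = p_alarm_given_fire * p_fire / p_alarm
    print(round(p_fire_given_alarm, 4))  # ~0.0876 under the assumed prior

Even with a reliable alarm, the low prior makes a true fire unlikely given an alarm; the answer is sensitive to the assumed prior.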
Basic Concepts
Independence
Intuition: Information about the outcome of event A doesn’t change the probability of
event B happening
P(A ∩ B) = P(A)P(B)
When there are more than two events, we say that they are mutually independent if
every subset of the events is independent
Reminder: equivalently, when P(B) > 0, independence of A and B means P(A | B) = P(A)
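A quick simulation sketch of the product rule, with A and B the outcomes of two independent fair coin flips (numpy assumed available):

    import numpy as np

    # A = "first flip is heads", B = "second flip is heads"
    rng = np.random.default_rng(0)
    flips = rng.integers(0, 2, size=(100_000, 2))
    a = flips[:, 0] == 1
    b = flips[:, 1] == 1

    print((a & b).mean())       # empirical P(A ∩ B), close to 0.25
    print(a.mean() * b.mean())  # P(A) * P(B), also close to 0.25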
Basic Concepts
random variable
a random variable X is a function from the sample space Ω to the real numbers ℝ; for a set A ⊆ ℝ,
Prob(X ∈ A) = ∫_B P(ω) dω
where B is the subset of Ω for which X(ω) ∈ A
Probability Distributions
A probability distribution Prob(X ∈ A) can be described by its cumulative distribution
function (CDF)
F_X(x) = Prob{X ≤ x}.
A continuous-valued random variable can be described by a density function f(x) that is related to its CDF by
Prob{X ∈ B} = ∫_{t∈B} f(t) dt
F(x) = ∫_{−∞}^{x} f(t) dt
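As a sanity check of the CDF-density relation, one can integrate a pdf numerically and compare with the closed-form CDF; a sketch for the standard normal using scipy:

    import numpy as np
    from scipy import stats
    from scipy.integrate import quad

    # F(x) = ∫_{−∞}^{x} f(t) dt for the standard normal at x = 1.5
    x = 1.5
    integral, _ = quad(stats.norm.pdf, -np.inf, x)
    print(integral)           # ~0.9332
    print(stats.norm.cdf(x))  # ~0.9332, matches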
Probability Distributions
Common distributions
Discrete distributions
A discrete distribution is defined by a set of numbers S = {x1 , … , xn } and a probability
mass function (pmf) on S, which is a function p from S to [0, 1] with the property
∑_{i=1}^{n} p(xi) = 1
Common distributions
Discrete distributions
𝔼[X] = ∑_{i=1}^{n} xi p(xi)
𝕍[X] = ∑_{i=1}^{n} (xi − 𝔼[X])² p(xi)
Variance is often called the second central moment of the distribution in statistics
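A short sketch applying both formulas to a fair six-sided die:

    import numpy as np

    x = np.arange(1, 7)    # support S = {1, ..., 6}
    p = np.full(6, 1 / 6)  # uniform pmf on S

    mean = np.sum(x * p)               # E[X] = 3.5
    var = np.sum((x - mean) ** 2 * p)  # V[X] = 35/12 ≈ 2.917
    print(mean, var)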
Common distributions
Discrete distributions
F(x) = ℙ{X ≤ x} = ∑_{i=1}^{n} 𝟙{xi ≤ x} p(xi)
Hence the sum picks out all the xi ≤ x and adds up their probabilities
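The same indicator sum in code, reusing the die pmf from the previous sketch (F(4) should be 4/6):

    import numpy as np

    x_support = np.arange(1, 7)
    p = np.full(6, 1 / 6)

    x = 4
    F = np.sum((x_support <= x) * p)  # boolean array plays the role of 𝟙{xi ≤ x}
    print(F)  # 0.666...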
Common distributions
the uniform distribution on S puts equal probability on each point: p(xi) = 1/n for each xi ∈ S
Bernoulli distribution
p(i) = θ^i (1 − θ)^{1−i}, i = 0, 1
θ ∈ [0, 1] is a parameter
▶ p(1) = θ means that the trial succeeds (takes value 1) with probability θ
▶ the mean is θ; the variance is θ(1 − θ)
Common distributions
Binomial distribution
the binomial distribution on X = {0, …, n} has pmf
p(x) = (n choose x) θ^x (1 − θ)^{n−x}
θ ∈ [0, 1] is a parameter
▶ the mean is nθ
▶ the variance is nθ(1 − θ)
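A sketch verifying the stated means and variances with scipy's bernoulli and binom distributions:

    from scipy import stats

    theta, n = 0.3, 10

    print(stats.bernoulli.stats(theta, moments="mv"))  # (0.3, 0.21) = (θ, θ(1 − θ))
    print(stats.binom.stats(n, theta, moments="mv"))   # (3.0, 2.1) = (nθ, nθ(1 − θ))
    print(stats.binom.pmf(4, n, theta))                # (10 choose 4) 0.3⁴ 0.7⁶ ≈ 0.2001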
Common distributions
Poisson distribution
The Poisson distribution on X = {0, 1, …} with parameter λ > 0 has pmf
p(x) = (λ^x / x!) e^{−λ}
▶ the mean and the variance are both λ
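A quick scipy check of the pmf and of the fact that the mean and variance are both λ:

    from scipy import stats

    lam = 4.0
    print(stats.poisson.stats(lam, moments="mv"))  # (4.0, 4.0): mean = variance = λ
    print(stats.poisson.pmf(2, lam))               # λ² e^{−λ} / 2! ≈ 0.1465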
Common distributions
Normal distribution
the most famous distribution is the normal distribution, which has density
p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))
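Hand-coding the density and comparing it with scipy's implementation is a useful sanity check; a sketch:

    import numpy as np
    from scipy import stats

    def normal_pdf(x, mu=0.0, sigma=1.0):
        # density of N(mu, sigma²), written out as in the formula above
        return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

    print(normal_pdf(1.0))            # ~0.2420
    print(stats.norm.pdf(1.0, 0, 1))  # same value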
Common distributions
Continuous distributions
A continuous distribution is represented by a probability density function (pdf), which
is a function p over ℝ such that p(x) ≥ 0 for all x and
∫_{−∞}^{∞} p(x) dx = 1
ℙ{a < X < b} = ∫_{a}^{b} p(x) dx
for all a ≤ b
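Both displayed properties can be checked numerically for the standard normal (a sketch with scipy.integrate.quad):

    import numpy as np
    from scipy import stats
    from scipy.integrate import quad

    total, _ = quad(stats.norm.pdf, -np.inf, np.inf)
    print(total)  # ~1.0: the density integrates to one

    a, b = -1.0, 1.0
    prob, _ = quad(stats.norm.pdf, a, b)
    print(prob)   # ~0.6827 = P(−1 < X < 1)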
Common distributions
Lognormal distribution
The lognormal distribution is a distribution on (0, ∞) with density
p(x) = (1 / (σx√(2π))) exp(−(log x − μ)² / (2σ²))
Common distributions
Gamma distribution
The gamma distribution is a distribution on (0, ∞) with density
p(x) = (β^α / Γ(α)) x^{α−1} exp(−βx)
Common distributions
Beta distribution
The beta distribution is a distribution on (0, 1) with density
p(x) = (Γ(α + β) / (Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1}
where Γ is the gamma function
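A sketch checking the beta density by hand against scipy (as an aside, scipy parameterizes the gamma distribution by a scale equal to 1/β):

    from math import gamma as Gamma  # the gamma function Γ
    from scipy import stats

    alpha, beta_par, x = 2.0, 3.0, 0.4

    coef = Gamma(alpha + beta_par) / (Gamma(alpha) * Gamma(beta_par))
    by_hand = coef * x ** (alpha - 1) * (1 - x) ** (beta_par - 1)
    print(by_hand)                             # 1.728
    print(stats.beta.pdf(x, alpha, beta_par))  # same value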
Joint Distributions
consider two discrete random variables X ∈ {0, …, I − 1} and Y ∈ {0, …, J − 1} with joint pmf
fij = P{X = i, Y = j} ≥ 0, where ∑_i ∑_j fij = 1
the marginal distributions are obtained by summing out the other variable:
P{X = i} = ∑_{j=0}^{J−1} fij = μi, i = 0, …, I − 1
P{Y = j} = ∑_{i=0}^{I−1} fij = νj, j = 0, …, J − 1
Exercises
Given a joint pmf table fij for binary X and Y [table not reproduced], compute the marginal probabilities:
P{X = 0} =
P{X = 1} =
P{Y = 0} =
P{Y = 1} =
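Since the joint table is not reproduced above, the sketch below uses a hypothetical 2 × 2 table purely to illustrate how the marginals are computed:

    import numpy as np

    # Hypothetical joint pmf: f[i, j] = P{X = i, Y = j}
    f = np.array([[0.1, 0.3],
                  [0.2, 0.4]])

    mu = f.sum(axis=1)  # P{X = i}: [0.4, 0.6]
    nu = f.sum(axis=0)  # P{Y = j}: [0.3, 0.7]
    print(mu, nu, f.sum())  # f.sum() should be 1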
Summary Statistics
Suppose we have an observed sample with values {x1, …, xn}, drawn i.i.d. from a distribution with mean μ; the sample mean is X̄n = (1/n) ∑_{i=1}^{n} xi
the (weak) law of large numbers says that the sample mean converges in probability to μ:
P(|X̄n − μ| ≥ ε) → 0 as n → ∞, ∀ε > 0
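A small simulation illustrating the law of large numbers with Bernoulli(0.5) draws (μ = 0.5):

    import numpy as np

    rng = np.random.default_rng(1)
    for n in (10, 1_000, 100_000):
        draws = rng.integers(0, 2, size=n)  # n fair coin flips
        print(n, draws.mean())              # sample mean approaches 0.5 as n grows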