Mod 1 Stats
● Relative frequency: the fraction of the total number of items that belong to a category (eg. 102/808 = 0.1262)
● Percent frequency: relative frequency × 100%
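The two definitions above can be checked with a few lines of Python. This is a minimal sketch: the helper names `relative_frequency` and `percent_frequency` are hypothetical, the 102/808 figures come from the example above, and the category labels in the frequency table are made up for illustration.

```python
from collections import Counter


def relative_frequency(count: int, total: int) -> float:
    """Fraction of all items that fall in one category."""
    return count / total


def percent_frequency(count: int, total: int) -> float:
    """Relative frequency expressed as a percentage."""
    return relative_frequency(count, total) * 100


# The example from the notes: 102 of 808 items belong to the category
print(round(relative_frequency(102, 808), 4))  # 0.1262
print(round(percent_frequency(102, 808), 2))   # 12.62

# A full frequency table for hypothetical category labels
responses = ["A", "B", "A", "C", "A", "B"]
counts = Counter(responses)
total = sum(counts.values())
table = {cat: relative_frequency(n, total) for cat, n in counts.items()}
# Relative frequencies across all categories add up to 1
print(table)
```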
Histograms
● Categories on x-axis
● Frequency, relative frequency, percent frequency on y-axis
Probability theory
● Random variable (r.v.): a variable whose value is determined by the outcome of a random process
● Population: the complete set of all possible observations of a random variable
● Sample: a randomly selected subset of a given size drawn from the population
Probability distribution
● Probability distribution: describes how probability is spread over the values that a random variable may take (the general shape of the probabilities)
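For a discrete r.v., a distribution can be estimated from observed data by the relative frequency of each value. The sketch below assumes made-up observations of X (number of children in a household); it is an empirical estimate, not the true population distribution.

```python
from collections import Counter

# Hypothetical observations of X = number of children in a household
observations = [0, 1, 2, 2, 1, 0, 3, 2, 1, 2]

counts = Counter(observations)
n = len(observations)

# Map each observed value x to its estimated probability P(X = x)
distribution = {x: counts[x] / n for x in sorted(counts)}
print(distribution)  # {0: 0.2, 1: 0.3, 2: 0.4, 3: 0.1}
```

The probabilities of all possible values always sum to 1, which is a quick sanity check on any distribution.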
Notation
● Random variable denoted by X, Y (capital letters)
- Eg. X: number of children in household
- Eg. Y: amount of time spent by husband on
housework per day
● Realisations/observations of a random variable are denoted by xᵢ, yᵢ (lowercase letters with a subscript i that indexes the observation)
- Eg. x₁: number of children in the first household observed
- Eg. y₁₃₇: amount of time per day spent on housework by the husband in the 137th observation
● N and n denote the size, i.e. the number of observations
- N refers to the population size
- n refers to the sample size
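The relationship between a population of size N and a sample of size n can be sketched with Python's `random.sample`, which draws without replacement. The population values here are hypothetical (randomly generated housework times), and the seed is fixed only so the run is reproducible.

```python
import random

random.seed(42)  # fixed seed so repeated runs give the same numbers

# Hypothetical population: daily housework hours for N = 1000 husbands
population = [round(random.uniform(0, 3), 2) for _ in range(1000)]
N = len(population)

# Simple random sample of size n, drawn without replacement
n = 50
sample = random.sample(population, n)

print(N, len(sample))  # 1000 50
```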
Descriptive Statistics
Central tendency
● A measure of central tendency yields information about the centre of a set of numbers (the distribution of a r.v.); it does not describe the span of the dataset or how far values are from the middle
● It gives an idea of the typical, middle, or average value that a r.v. takes
● sometimes called measures of location
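The three usual measures of central tendency (mean, median and mode, which these notes refer to under Skewness) are available in Python's standard `statistics` module. The data below are hypothetical.

```python
import statistics

# Hypothetical observations, e.g. number of children per household
data = [0, 1, 1, 2, 2, 2, 3, 4]

mean = statistics.mean(data)      # sum of the values divided by n
median = statistics.median(data)  # middle of the sorted data (average of the two middle values when n is even)
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)  # 1.875 2.0 2
```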
Variability formulas
Variance
● It computes the average squared distance between data points and their mean; the formula differs depending on whether the data are a sample or a population
● Population variance
- Finite population (size N)
- Denoted by σ² (sigma squared) or Var(X), the variance of X
● Sample variance
- Denoted by s²
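The variance formulas themselves did not survive in these notes. The sketch below uses the standard definitions, σ² = (1/N) Σ (xᵢ − µ)² for a finite population and s² = (1/(n − 1)) Σ (xᵢ − x̄)² for a sample (the n − 1 is Bessel's correction), and cross-checks them against `statistics.pvariance` and `statistics.variance`. The data and function names are hypothetical.

```python
import statistics


def population_variance(xs):
    """sigma^2 = (1/N) * sum((x_i - mu)^2) for a finite population of size N."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N


def sample_variance(xs):
    """s^2 = (1/(n-1)) * sum((x_i - x_bar)^2) for a sample of size n."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, mean = 5

print(population_variance(data))  # 4.0
print(sample_variance(data))      # 32/7, slightly larger than the population version

# The standard library gives the same results
print(statistics.pvariance(data), statistics.variance(data))
```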
Standard deviation
● Standard deviation solves the problem of squared units: it is the square root of the variance, so it has the same unit as the original data
● Population standard deviation
- Denoted by σ (sigma) or std(X)
● Sample standard deviation
- Denoted by s
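A short sketch of the unit argument: with hypothetical housework times in minutes, the variance is in minutes squared, and taking its square root returns to minutes. `statistics.pstdev` (population) and `statistics.stdev` (sample) compute σ and s directly.

```python
import math
import statistics

# Hypothetical daily housework times, in minutes
minutes = [30, 45, 60, 45, 70]

var = statistics.pvariance(minutes)  # units: minutes squared
sigma = math.sqrt(var)               # units: minutes, same as the data

print(sigma, statistics.pstdev(minutes))  # population standard deviation, two ways
print(statistics.stdev(minutes))          # sample standard deviation s (divides by n - 1)
```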
Coefficient of variation
● Measures standard deviation per unit of
mean
● In finance, when the r.v. X denotes asset returns, CV measures risk per unit of expected return
● It is unit-free: the numerator (standard deviation) and the denominator (mean) have the same unit as the original data, so the units cancel
● Population CV = σ / µ
- when σ increases, CV increases
- when µ increases, CV decreases (for µ > 0)
- Ratio between risk and expected return
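A minimal sketch of the population CV, σ / µ, assuming µ > 0. The two return series are hypothetical and chosen to have the same mean, so the asset with the larger spread has more risk per unit of expected return. Rescaling the data (percent to basis points) leaves CV unchanged, which illustrates that it is unit-free.

```python
import statistics


def coefficient_of_variation(xs):
    """Population CV = sigma / mu (assumes mu > 0)."""
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)
    return sigma / mu


# Hypothetical returns (in %) for two assets, both with mean 2.0
asset_a = [1.0, 2.0, 3.0, 2.0]
asset_b = [0.5, 3.5, 4.0, 0.0]

cv_a = coefficient_of_variation(asset_a)
cv_b = coefficient_of_variation(asset_b)
print(round(cv_a, 4), round(cv_b, 4))  # 0.3536 0.8839

# Same returns expressed in basis points: the units cancel, CV is unchanged
cv_a_bp = coefficient_of_variation([x * 100 for x in asset_a])
print(round(cv_a_bp, 4))  # 0.3536
```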
Skewness
Shape
● Central tendency and variability are useful to describe and summarise data or the
distribution of r.v.’s
● Skewness - a measure of asymmetry
● Mode: value on the horizontal axis where the highest point of the curve occurs
● Mean: pulled towards the tail of the distribution (drawn towards the extreme values)
● Median: generally located somewhere between the mode and the mean
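The ordering mode < median < mean for a right-skewed distribution can be checked on hypothetical data with a long right tail. These notes do not give a skewness formula; the sketch assumes the moment coefficient of skewness g₁ = m₃ / m₂^(3/2) (population moments), one common choice, which is positive for a long right tail and zero for symmetric data.

```python
import statistics


def skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data: mostly small values, a few large ones in the tail
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

print(statistics.mode(data), statistics.median(data), statistics.mean(data))  # 2 3.0 4.6
print(skewness(data) > 0)  # True: positive skew, the mean is pulled towards the right tail
```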
Law of addition
Joint vs marginal probabilities
● Joint and marginal probabilities are distinguished when outcomes have several dimensions
● Joint probability: the relative frequency when asking about all dimensions at once
- Eg. what is the relative frequency that a customer bought a $49 plan on a weekday?
● Marginal probability: the relative frequency when asking about only a single dimension
- Eg. what is the relative frequency that a customer bought a $49 plan, on any day?
● Bayes rule: P(A|B) = P(B|A) P(A) / P(B)
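The ideas in this part (joint and marginal probabilities, the law of addition P(A or B) = P(A) + P(B) − P(A and B), and Bayes rule) can all be read off one cross-tabulated table of counts. The counts and the second plan ("$99") are hypothetical, and `prob` is a helper made up for this sketch.

```python
# Hypothetical purchase counts, cross-classified by plan and by day type
counts = {
    ("$49", "weekday"): 30,
    ("$49", "weekend"): 10,
    ("$99", "weekday"): 40,
    ("$99", "weekend"): 20,
}
total = sum(counts.values())  # 100 purchases in total


def prob(plan=None, day=None):
    """Relative frequency of purchases matching the given plan and/or day.

    Giving both dimensions yields a joint probability; giving only one sums
    over the other dimension and yields a marginal probability.
    """
    matching = sum(
        c for (p, d), c in counts.items()
        if (plan is None or p == plan) and (day is None or d == day)
    )
    return matching / total


p_joint = prob(plan="$49", day="weekday")  # joint P($49 and weekday) = 0.3
p_49 = prob(plan="$49")                    # marginal P($49) = 0.4
p_weekday = prob(day="weekday")            # marginal P(weekday) = 0.7

# Law of addition: P(A or B) = P(A) + P(B) - P(A and B), approximately 0.8
p_49_or_weekday = p_49 + p_weekday - p_joint

# Conditional probability: P(weekday | $49) = P($49 and weekday) / P($49)
p_weekday_given_49 = p_joint / p_49

# Bayes rule: P($49 | weekday) = P(weekday | $49) * P($49) / P(weekday)
p_49_given_weekday = p_weekday_given_49 * p_49 / p_weekday

print(round(p_49_or_weekday, 4), round(p_49_given_weekday, 4))  # 0.8 0.4286
```

Counting directly, 80 of the 100 purchases are either $49 plans or on a weekday, and 30 of the 70 weekday purchases are $49 plans, which matches both results.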
Implications of formulas
Binomial experiments
● Eg. toss a coin 3 times in a row and you are interested in how likely it is that you get
exactly two heads
● A binomial experiment counts the number of times a certain outcome occurs in a fixed number of repeated independent trials
● Each trial has two possible outcomes (eg. heads or tails, success or failure), and the probability of success is the same on every trial
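The coin example can be solved by brute force: list every sequence of three tosses and count those with exactly two heads. This sketch assumes a fair coin, so all 2³ = 8 sequences are equally likely.

```python
from itertools import product

# All sequences of three tosses of a fair coin (assumed equally likely)
outcomes = list(product("HT", repeat=3))  # 2**3 = 8 sequences

# Sequences with exactly two heads: HHT, HTH, THH
two_heads = [o for o in outcomes if o.count("H") == 2]

p_two_heads = len(two_heads) / len(outcomes)
print(len(two_heads), len(outcomes), p_two_heads)  # 3 8 0.375
```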
Binomial tree
● When two events are independent, P(A|B) = P(A)
● Suppose we have three products, each of which can be defective (D) with probability p or functional (F) with probability q = 1 − p
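The binomial tree for three products can be enumerated branch by branch. Because the products are independent, each branch probability is the product of p's and q's along it. Adding up the branches with the same number of defectives gives the same numbers as the binomial formula C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ. The value p = 0.1 is a hypothetical choice for illustration.

```python
from itertools import product
from math import comb

p = 0.1    # hypothetical probability that a product is defective (D)
q = 1 - p  # probability that a product is functional (F)

# Each branch of the tree is one D/F sequence for the three products;
# independence means its probability is the product along the branch.
branches = {}
for path in product("DF", repeat=3):
    prob = 1.0
    for outcome in path:
        prob *= p if outcome == "D" else q
    branches["".join(path)] = prob


def prob_k_defective(k: int) -> float:
    """Sum the probabilities of all branches with exactly k defectives."""
    return sum(pr for path, pr in branches.items() if path.count("D") == k)


# Tree result vs. binomial formula for n = 3; both columns give 0.729, 0.243, 0.027, 0.001
for k in range(4):
    formula = comb(3, k) * p**k * q**(3 - k)
    print(k, round(prob_k_defective(k), 4), round(formula, 4))
```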
Binomial distribution
● A r.v. X taking values in {0, 1, ..., n} is said to follow the binomial distribution, denoted by X ~ Bin(n, p), if
$P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}$
● $p^{x}$: the probability of x successes
● $(1-p)^{n-x}$: the probability of n − x failures, so in total we have n trials
● The factor (combinatorial operator)