
Chapter 1

Variable: 1) Categorical (ordinal = ordered; nominal = not ordered) → bar chart (count) // pie chart (percentage) // two-way (contingency) table
[Area principle for bar chart & pie chart: the area of a plot that shows data should be PROPORTIONAL TO the amount of data] — violated by decorative effects or a baseline not at 0
Variable: 2) Numerical → histogram // boxplot // scatterplot | Boxplot shows Md, IQR and whiskers at ±1.5·IQR beyond the quartiles (longer lower tail = left-skewed; longer upper tail = right-skewed)
Normal curve: symmetric, bell-shaped // Unimodal // Bimodal // Left-skewed // Right-skewed || Mean > Median → right-skewed // Mean < Median → left-skewed
Time series v. cross-sectional data || Independent (explanatory) variable v. dependent (response) variable
Q0 = Min // Q1 = (n+1)/4-th ordered value // Q2 = (n+1)/2-th // Q3 = 3(n+1)/4-th // Q4 = Max → 5-number summary | IQR = Q3 − Q1
Robust = insensitive to a few extreme observations | Robust: median (centre); IQR (spread) | Not robust: mean (centre); SD (spread)
Sample: mean x̄ = Σxᵢ/n ; variance s² = Σ(xᵢ − x̄)²/(n − 1) ; sd = √var ; cov sₓᵧ = Σ(xᵢ − x̄)(yᵢ − ȳ)/(n − 1) || cor r = sₓᵧ/(sₓ·sᵧ), −1 ≤ r ≤ 1 *r is unitless; covariance is not*
Population: mean μ = Σ x·p(x) ; variance σ² = Σ(x − μ)²·p(x) ; sd = √var ; cov σₓᵧ = Σₓ,ᵧ (x − μₓ)·(y − μᵧ)·P(X = x ∩ Y = y) || cor ρ = σₓᵧ/(σₓ·σᵧ), −1 ≤ ρ ≤ 1
Scatterplot: shows the relationship between 2 QUANTITATIVE variables measured on the SAME individuals | 1) trend up/down 2) linear/curved 3) clustered/scattered 4) outliers?
Two-way (contingency) table: describes the relationship between two categorical variables; cells contain counts or proportions
Cells = combinations of values of the two variables; joint distribution (% of the grand total); marginal distribution (row/column totals); conditional distribution (condition as denominator)
Simpson's Paradox = a change in the direction of association between two variables when the data are separated into groups defined by a third variable (lurking variable)
Lurking variable = a variable that has an important effect but was overlooked
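A minimal sketch (Python with NumPy, made-up data) of the sample summaries above — five-number summary, IQR and outlier fences, and the n − 1 versions of variance, covariance and correlation. Note that np.percentile interpolates, so its quartiles can differ slightly from the (n+1)/4 rule on small samples.

```python
import numpy as np

x = np.array([12.0, 15.0, 14.0, 10.0, 18.0, 20.0, 11.0, 16.0])
y = np.array([3.1, 4.0, 3.6, 2.8, 4.5, 5.2, 3.0, 4.1])

# Five-number summary and IQR
q0, q1, q2, q3, q4 = np.percentile(x, [0, 25, 50, 75, 100])
iqr = q3 - q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # boxplot whisker limits

# Sample statistics (ddof=1 gives the n - 1 denominator used for samples)
mean_x = x.mean()
var_x = x.var(ddof=1)
sd_x = x.std(ddof=1)
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r_xy = np.corrcoef(x, y)[0, 1]              # unitless, between -1 and 1

print(q0, q1, q2, q3, q4, iqr, fences)
print(mean_x, var_x, sd_x, cov_xy, r_xy)
```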
Chapter 2

Experiment; Outcome; Sample Space; Event; Probability | Assigning probability: 1) classical method (assumes all outcomes equally likely) 2) long-run relative frequency 3) subjective – assessment based on experience / expertise
Law of Large Numbers: the relative frequency of an outcome converges to a number, i.e. the probability of that outcome, as the number of observed outcomes increases
Law of Total Probability; Mutually exclusive; Independent; Dependent | Joint prob = P(A ∩ B) | Marginal (unconditional) prob = P(A) | Conditional prob = P(A|B)
When P(A) & P(B) > 0 → two events cannot be mutually exclusive and independent at the same time
Rules:
1) Addition rule → P(A ∪ B) = P(A) + P(B) − P(A ∩ B), where P(A ∩ B) = 0 for mutually exclusive events
2) Complement rule → P(A) = 1 − P(Aᶜ) // P(A|B) = 1 − P(Aᶜ|B)
3) Multiplication rule (also a test for independence) → P(A ∩ B) = P(A) × P(B) [independent] // P(A ∩ B) = P(A|B) × P(B) = P(B|A) × P(A) [dependent]
* Conditional probability: P(A|B) = P(A ∩ B)/P(B), where P(B) ≠ 0
Bayes' rule: P(A|B) = P(A ∩ B)/P(B) = P(B|A)·P(A) / [P(B|A)·P(A) + P(B|Aᶜ)·P(Aᶜ)], where you need 1) P(A) 2) P(Aᶜ) 3) P(B|A) 4) P(B|Aᶜ) || Contingency table → practice
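A short sketch (plain Python, illustrative numbers) chaining the complement rule, the law of total probability and Bayes' rule; the events and probabilities are hypothetical.

```python
# Bayes' rule with illustrative numbers: A = "has condition", B = "test positive"
p_a = 0.02            # P(A), prior
p_b_given_a = 0.95    # P(B|A)
p_b_given_ac = 0.10   # P(B|A^c)

p_ac = 1 - p_a                                   # complement rule
p_b = p_b_given_a * p_a + p_b_given_ac * p_ac    # law of total probability
p_a_given_b = p_b_given_a * p_a / p_b            # Bayes' rule
print(round(p_a_given_b, 4))                     # ~0.1624
```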

Chapter 3

Discrete RV → probability mass function p(x) | Continuous RV → probability density function f(x) (continued in Chapter 4)
1) E(aX) = a·E(X) | Var(aX) = a²·Var(X) | Cov(aX, bY) = ab·Cov(X, Y)
2) E(X + b) = E(X) + b | Var(X + b) = Var(X) | Cov(X + a, Y + b) = Cov(X, Y)
3) E(X + Y) = E(X) + E(Y) | Var(X) = E(X²) − [E(X)]² | Cov(X, Y) = E(X·Y) − E(X)·E(Y)
*** Var(aX + bY) = a²·Var(X) + b²·Var(Y) + 2ab·Cov(X, Y) *** (the covariance term vanishes when X and Y are independent or merely uncorrelated)
↳ Independence implies uncorrelatedness → uncorrelated means cov = 0 and cor = 0 (the converse does not hold)
Coefficient of variation: CV = σ/μ (or s/x̄) → measures risk relative to mean return without depending on units (contrast with the Sharpe ratio) | Higher ratio = higher risk
Rate of return (for CV): R is computed from the cash flow X [a uniform RV, more in Chapter 4] and the cost of the project c [a constant]
Sharpe ratio: S(X) = (x̄ − r_f)/s, where r_f = risk-free rate | Higher ratio = higher average rate of return relative to s.d.
Bernoulli distribution: 1) 2 possible outcomes; 2) fixed probability; 3) independence; 4) n = 1
  Probability P(X = 1) = p ; P(X = 0) = 1 − p = q | Mean E(X) = p | Variance Var(X) = p·(1 − p) = p·q | Standard deviation √(p·q)
Binomial distribution: 1) 2 possible outcomes; 2) fixed probability; 3) independence; 4) n identical trials
  Probability p(r) = C(n, r)·pʳ·(1 − p)ⁿ⁻ʳ = C(n, r)·pʳ·qⁿ⁻ʳ | Mean E(X) = n·p | Variance σ² = n·p·(1 − p) = n·p·q | Standard deviation √(n·p·q)
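A small sketch (plain Python, illustrative n and p) of the Binomial formulas above, using math.comb for C(n, r).

```python
from math import comb, sqrt

# Binomial(n, p): e.g. 10 independent trials, success probability 0.3
n, p = 10, 0.3
q = 1 - p

def binom_pmf(r: int) -> float:
    """P(X = r) = C(n, r) * p^r * q^(n - r)."""
    return comb(n, r) * p**r * q**(n - r)

mean = n * p        # E(X) = np
var = n * p * q     # Var(X) = npq
sd = sqrt(var)

print(binom_pmf(3))                          # P(X = 3)
print(sum(binom_pmf(r) for r in range(4)))   # P(X <= 3)
print(mean, var, sd)                         # 3.0, 2.1, ~1.449
```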
Chapter 4

PDF (continuous): f(x) is a continuous function with f(x) ≥ 0 for all x; uniform, normal or other shapes; the total area under f(x) equals 1
1) X ~ Uniform[c, d] → f(x) = 1/(d − c) for c ≤ x ≤ d (0 otherwise)
  Mean μ = (c + d)/2 | Standard deviation σ = (d − c)/√12
  Probability P(c ≤ X ≤ d) = width · height = (d − c) · 1/(d − c) = 1; for a sub-interval, P(a ≤ X ≤ b) = (b − a)/(d − c)
2) X ~ N(μ, σ²) → Standard normal: Z ~ N(0, 1) || Mean = Median = Mode || The area under the normal curve over (−∞, +∞) is 1
Empirical rule: 68–95–99.7 (within 1, 2, 3 s.d. of the mean)
Standard score: z = (x − x̄)/s (or (x − μ)/σ) || Reverse, to recover a specific value: x = x̄ + z·s (or μ + z·σ)
Quantile–quantile (QQ) plot → a graphical method to compare two probability distributions by plotting their quantiles against each other (used here to check normality)
Normal approximation to the Binomial: Y ~ Binomial(n, p) → X ~ N(np, npq), i.e. X ~ N(μ, σ²) || The approximation is good only when np ≥ 5 and nq ≥ 5
t-distribution: 1) Mean = 0 for df > 1; 2) Median = Mode = 0; 3) symmetric and bell-shaped, with fatter tails than the normal [higher probability in the tails]; 4) t tends to z as df → ∞
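A sketch of the normal calculations above using scipy.stats.norm (one possible tool; the μ, σ, n and p values are made up): standard score, reverse lookup, and the normal approximation to a Binomial that satisfies np ≥ 5 and nq ≥ 5.

```python
from scipy.stats import norm   # standard normal CDF and inverse CDF

mu, sigma = 50.0, 8.0          # illustrative N(mu, sigma^2)

# Standard score and a tail probability
x = 62.0
z = (x - mu) / sigma                       # z = (x - mu) / sigma
p_above = 1 - norm.cdf(z)                  # P(X > 62) via the standard normal

# Reverse: the value with 95% of the distribution below it
x_95 = mu + norm.ppf(0.95) * sigma         # x = mu + z * sigma

# Normal approximation to Binomial(n, p), valid here since np and nq are both >= 5
n, p = 100, 0.3
mu_b, sd_b = n * p, (n * p * (1 - p)) ** 0.5
p_at_most_25 = norm.cdf((25 - mu_b) / sd_b)   # approximate P(Y <= 25)

print(round(z, 3), round(p_above, 4), round(x_95, 2), round(p_at_most_25, 4))
```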


Chapter 5

Central Limit Theorem (CLT): as the sample size increases, the sampling distribution of the sample mean (mean μ, s.d. σ/√n) approaches the normal distribution, X̄ ~ N(μ, σ²/n) → taken to hold whenever n ≥ 30

Which distribution to use for X̄:
Population s.d. σ | Population distribution | Sample size n | Result
Known   | Normal     | ≥ 30 | Z = (X̄ − μ)/(σ/√n) ~ N(0, 1)
Known   | Normal     | < 30 | Z ~ N(0, 1)
Known   | Not normal | ≥ 30 | Z ~ N(0, 1)
Known   | Not normal | < 30 | Nil
Unknown | Normal     | ≥ 30 | Z ~ N(0, 1)
Unknown | Normal     | < 30 | t = (X̄ − μ)/(s/√n) ~ t(n−1)
Unknown | Not normal | ≥ 30 | Z ~ N(0, 1)
Unknown | Not normal | < 30 | Nil
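A quick CLT illustration (Python with NumPy, simulated data from a deliberately non-normal population): the means of samples of size n = 30 land close to N(μ, σ²/n).

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample means from a clearly non-normal population (exponential with mean 2)
pop_mean, n, n_samples = 2.0, 30, 10_000
sample_means = rng.exponential(scale=pop_mean, size=(n_samples, n)).mean(axis=1)

# For Exponential(scale=2): mu = 2 and sigma = 2, so sd of the mean ~ 2/sqrt(30)
print(sample_means.mean())        # ~2.0
print(sample_means.std(ddof=1))   # ~0.365 = 2/sqrt(30)
```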
Chapter 6

Confidence level 95% → Motive: 95% confident that μ is within the interval // Mechanism: 95 in 100 such intervals cover μ
Steps:
1) Identify whether it is a sample proportion (binomial) or a sample mean
2) If proportion → check the good-approximation requirement → z // If mean → z or t? [ref. the Topic 5 table]
3) α = 1 − x% → get α
4) Get df = n − 1 if needed for a t-interval
5) Find the critical value z(α/2) or t(α/2, n−1)
6) Find the margin of error L = z(α/2)·√(p̂·q̂/n) [proportion] or z(α/2)·σ/√n or t(α/2, n−1)·s/√n [mean]
7) Find the interval = [point estimate − L, point estimate + L]
Sample size requirement calculation:
n ≥ z(α/2)²·p̂·(1 − p̂)/L² [proportion] or n ≥ z(α/2)²·σ²/L² [mean] or others [NEVER USE t IN A SAMPLE SIZE REQUIREMENT CALCULATION]
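A sketch of the Chapter 6 steps (Python with scipy.stats; all numbers are illustrative): a t-interval for a mean with σ unknown and n < 30, a z-interval for a proportion, and a sample size calculation that uses z, never t.

```python
from math import sqrt
from scipy.stats import norm, t

alpha = 0.05

# 95% CI for a mean with sigma unknown and n < 30 -> t interval (Topic 5 table)
n, xbar, s = 25, 103.2, 9.5
t_crit = t.ppf(1 - alpha / 2, df=n - 1)
L = t_crit * s / sqrt(n)             # margin of error
print((xbar - L, xbar + L))

# 95% CI for a proportion (z only), with the usual n*p-hat, n*q-hat >= 5 check
n, p_hat = 400, 0.36
z_crit = norm.ppf(1 - alpha / 2)
L = z_crit * sqrt(p_hat * (1 - p_hat) / n)
print((p_hat - L, p_hat + L))

# Sample size needed for a proportion CI with margin of error 0.03 (never use t here)
L_target = 0.03
n_req = (z_crit ** 2) * p_hat * (1 - p_hat) / L_target ** 2
print(n_req)                         # round UP to the next integer
```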
Chapter 7

Hypothesis situations:
One-sided | "less than" Ha    | H0: μ ≥ k vs Ha: μ < k
One-sided | "greater than" Ha | H0: μ ≤ k vs Ha: μ > k
Two-sided | "not equal to" Ha | H0: μ = k vs Ha: μ ≠ k

Error:
Type I error → rejecting a true H0 || α = P(Reject H0 | H0 is true) || Significance level: α
Type II error → NOT rejecting a false H0 || β = P(Not reject H0 | H0 is false) || Power: 1 − β

Hypothesis testing:
1) Identify whether it is a sample proportion (binomial) or a sample mean
2) If proportion → good-approximation requirement → z // If mean → z or t? [ref. the Topic 5 table]
3) Identify which of the three hypothesis situations applies → then follow the steps below

Critical value approach:
4) Get df = n − 1 if needed for a t-test
5) Find the critical point using significance level α: greater than (upper tail) → z(α) or t(α, n−1) // less than (lower tail) → −z(α) or −t(α, n−1) [negative] // not equal to (two-sided) → z(α/2) or t(α/2, n−1)
6) Find the z- or t-statistic: z or t = (x̄ − μ0)/(σ/√n) or (x̄ − μ0)/(s/√n); for a proportion, z = (p̂ − p0)/√(p0·q0/n)
7) Apply the rejection rule to reject H0 (in favour of Ha): upper tail → if z > z(α) or t > t(α, n−1) // lower tail → if z < −z(α) or t < −t(α, n−1) // two-sided → if z > z(α/2) or z < −z(α/2), or t > t(α/2, n−1) or t < −t(α/2, n−1)

p-value approach (usually a z-test, because t-tail areas are hard to read from tables):
4) Find the z-statistic as above
5) Find the p-value: upper tail → P(Z ≥ z | H0 is true) = area to the right of the z-statistic // lower tail → P(Z ≤ z | H0 is true) = area to the left // two-sided → 2 × P(Z ≥ |z| | H0 is true) = double the right-tail area beyond |z|
6) Apply the rejection rule to reject H0 (in favour of Ha): if p < α

Remember for a sample proportion p̂: the standard error under H0 is √(p0·q0/n) (a standard deviation, not a variance)
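A sketch of Chapter 7 applied to a "greater than" test on a proportion (Python with scipy.stats, made-up counts), showing that the critical value approach and the p-value approach agree.

```python
from math import sqrt
from scipy.stats import norm

# H0: p <= 0.5  vs  Ha: p > 0.5 ("greater than" situation), alpha = 0.05
n, x = 200, 116
p0, alpha = 0.5, 0.05
p_hat = x / n

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # use p0 in the SE under H0

# Critical value approach: reject if z > z(alpha)
z_alpha = norm.ppf(1 - alpha)
reject_cv = z > z_alpha

# p-value approach: reject if p-value < alpha (right tail here)
p_value = 1 - norm.cdf(z)
reject_p = p_value < alpha

print(round(z, 3), round(z_alpha, 3), round(p_value, 4), reject_cv, reject_p)
```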
Chi-square test for independence:
[H0: the Y variable is independent of the X variable] vs [Ha: the Y variable is NOT independent of the X variable]
Requirements for the test:
1) The observed frequencies are obtained from a Simple Random Sample (SRS)
2) The expected frequencies are all ≥ 5
Expected table: E = (column total × row total)/grand total, for every cell || Degrees of freedom = (C − 1)(R − 1)
Reject H0 when χ² > χ²(α, df), where χ² = Σ (Oᵢ − Eᵢ)²/Eᵢ
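A sketch of the chi-square test for independence (Python with NumPy/SciPy, hypothetical counts): expected counts from row total × column total / grand total, the χ² statistic, and the rejection rule against χ²(0.05, df); scipy.stats.chi2_contingency is shown only as a cross-check.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Illustrative 2x3 contingency table of observed counts (rows = Y, columns = X)
observed = np.array([[30, 20, 10],
                     [20, 30, 40]])

# Expected counts: row total * column total / grand total, for every cell
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row @ col / observed.sum()          # all entries should be >= 5

chi2_stat = ((observed - expected) ** 2 / expected).sum()
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
critical = chi2.ppf(0.95, df)                  # reject H0 if chi2_stat > critical

print(round(chi2_stat, 3), df, round(critical, 3), chi2_stat > critical)
print(chi2_contingency(observed)[:2])          # same statistic, plus its p-value
```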
Chapter 8

Simple linear regression model (only 1 explanatory variable): μ(y|x) = β0 + β1·x // ŷ = b0 + b1·x
Population model: Y = β0 + β1·X + ε, where the ε are i.i.d. N(0, σ²) (i.i.d. = independently and identically distributed)

Build up the regression model with b0 and b1 by LSE (Least Squares Estimation): Ŷ = b0 + b1·X
b1 = sₓᵧ/sₓ² = Σ(xᵢ − X̄)(yᵢ − Ȳ) / Σ(xᵢ − X̄)² ; b0 = Ȳ − b1·X̄ || where 1) X̄ = Σxᵢ/n ; 2) sₓ² = Σ(xᵢ − X̄)²/(n − 1) ↳ same for Y

Model assumptions:
1) Linearity → E(ε) = 0 → mean-zero assumption → check the scatterplot (Y vs X) and the residual plot (e vs X)
2) Independence (of errors) → not a time series
3) Normality → errors are normal RVs → check with a QQ plot
4) Equal / constant variance of errors (MSE) → check the scatterplot (Y vs X) and the residual plot (e vs X) → no fanning out / funnelling in (heteroscedastic errors) but an even spread (homoscedastic errors)
*Danger of extrapolation when X is outside the experimental region (range of the dataset)*

SSE (unexplained / error sum of squares) = Σeᵢ² = Σ(Yᵢ − Ŷᵢ)²
SSR (model / explained sum of squares) = Σ(Ŷᵢ − Ȳ)²
SST (total sum of squares; overall variability in Y) = Σ(Yᵢ − Ȳ)² = ΣYᵢ² − n·Ȳ²
MSE (mean square error) = s² = SSE/(n − 2) | SE (standard error) = s = √MSE
Note that SST = SSR + SSE

Coefficient of determination: R² = SSR/SST = 1 − SSE/SST
Interpretation: about (R² × 100)% of the sample variation in Y can be explained by the simple linear regression model in which X is used to predict Y
Coefficient of correlation (or simply [correlation]): r = sₓᵧ/(sₓ·sᵧ) = ±√R², taking the sign of b1 | Interpretation: see Topic 1 (strong/weak, +ve/−ve linear relationship) || r² = R²

Standard deviations of the estimates: s(b1) = s/(sₓ·√(n − 1)) ; s(b0) = s·√(1/n + X̄²/((n − 1)·sₓ²))
Hypothesis testing (determine z or t simply by n ≥ 30 or not; df = n − 2 because 2 parameters are estimated; normally assume β0 = 0 and β1 = 0 and use a two-sided test):
z or t = (b1 − β1)/s(b1) ; z or t = (b0 − β0)/s(b0)
Interval estimates (only z): b1 ± z(α/2)·s(b1) ; b0 ± z(α/2)·s(b0)

CI (estimate the average / mean value of Y at X*): Ŷ ± z(α/2)·s·√(1/n + (X* − X̄)²/((n − 1)·sₓ²))
PI (predict an individual value of Y at X*): Ŷ ± z(α/2)·s·√(1 + 1/n + (X* − X̄)²/((n − 1)·sₓ²))

Distinguish: s, sₓ, sᵧ, sₓᵧ, sₓ², sᵧ², s², s(b0), s(b1), X̄, Ȳ, SSE, SSR, SST, r, r², R²
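A sketch of the Chapter 8 formulas on a small made-up dataset (Python with NumPy): least squares b0 and b1, SSE/SSR/SST, R², s = √MSE, s(b1), and the t-statistic for H0: β1 = 0.

```python
import numpy as np

# Simple linear regression by least squares, illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9, 7.2])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
e = y - y_hat                          # residuals

SSE = np.sum(e ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)
SST = np.sum((y - y.mean()) ** 2)      # SST = SSR + SSE
MSE = SSE / (n - 2)
s = np.sqrt(MSE)                       # standard error of the regression
R2 = SSR / SST

s_x = x.std(ddof=1)
s_b1 = s / (s_x * np.sqrt(n - 1))      # standard deviation of b1
t_stat = b1 / s_b1                     # test H0: beta1 = 0 against t(n - 2)

print(round(b0, 3), round(b1, 3), round(R2, 4), round(s, 3), round(t_stat, 2))
```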
