1. basicQuants 2024
• Interest rates are set by the forces of supply and demand, where investors supply funds and borrowers
demand their use
• Nominal Interest Rate = Real risk-free interest rate + Inflation premium + Default risk premium
+ Liquidity premium + Maturity premium.
Example 1: Inflation is 5%, Real RFR is 3%; Nominal Rate will be (1 + 5%) * (1 + 3%) - 1 = 8.15%
(Close to 3% + 5% = 8%, however geometric chaining makes it 8.15%)
Example 2: Nominal RFR is 8.15%, Real RFR is 3%; Inflation will be (1 + 8.15%) / (1 + 3%) - 1 = 5%
(Again close to 8.15% - 3% = 5.15%)
FinTree © 2024 FinTree Education Private Ltd
Example 1: A $100 investment becomes $105 in some time (time doesn’t matter for HPR) and also gives you a
dividend of $3.
Example 2: Let’s say the return earned in the first 3 months is 4%, in the next 7 months is 3% and in the last 2 months
of the year is 6%. What will be the one-year holding period return?
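A minimal sketch of the two holding period return examples above, assuming the usual definitions (HPR as price change plus income over the beginning value, and geometric chaining of sub-period returns). The function names are illustrative, not part of the original notes.

```python
def holding_period_return(begin_value, end_value, income=0.0):
    """HPR = (ending value - beginning value + income) / beginning value."""
    return (end_value - begin_value + income) / begin_value


def chain_returns(period_returns):
    """Compound sub-period returns geometrically into one holding period return."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


# Example 1: 100 grows to 105 and pays a dividend of 3.
hpr_example_1 = holding_period_return(100, 105, income=3)

# Example 2: 4% over 3 months, 3% over 7 months, 6% over 2 months.
hpr_example_2 = chain_returns([0.04, 0.03, 0.06])

print(f"Example 1 HPR: {hpr_example_1:.2%}")
print(f"Example 2 HPR: {hpr_example_2:.2%}")
```

Under these assumptions Example 1 gives 8% and Example 2 gives about 13.55%, slightly more than the simple sum of 13% because of compounding.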
• AM * HM = GM^2 (this identity is exact for two observations)
• Unless all observations are same values, AM > GM > HM
• Harmonic Mean is calculated as: HM = n / (1/X1 + 1/X2 + ... + 1/Xn)
Example 1: Calculate AM, GM and HM of PE Ratios
Stock   PE Ratio
A       3
B       4
C       5
D       20
• AM = (3 + 4 + 5 + 20) / 4 = 8
• GM = (3 * 4 * 5 * 20)^(1/4) = 5.89
• HM = 4 / (1/3 + 1/4 + 1/5 + 1/20) = 4.8
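The three means for the PE ratio example can be checked with Python's standard `statistics` module. This is only a sketch to confirm the arithmetic; the variable names are illustrative.

```python
from statistics import mean, geometric_mean, harmonic_mean

pe_ratios = [3, 4, 5, 20]  # stocks A, B, C, D

am = mean(pe_ratios)            # arithmetic mean
gm = geometric_mean(pe_ratios)  # fourth root of the product
hm = harmonic_mean(pe_ratios)   # n divided by the sum of reciprocals

print(f"AM = {am:.2f}, GM = {gm:.2f}, HM = {hm:.2f}")
```

The ordering AM > GM > HM holds here because the observations are positive and not all equal.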
• Trimmed mean removes a small defined percentage of the largest and smallest values from a dataset
before calculating the mean by averaging the remaining observations.
• Winsorized mean is calculated after replacing extreme values at both ends with the values of their
nearest observations, and then calculating the mean by averaging all the observations.
• In summary: HM, Winsorized mean and Trimmed mean are ways of dealing with extreme outlier values
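A minimal sketch of the trimmed and winsorized means, assuming the trimming is specified as a count `k` of observations at each end rather than a percentage (an assumption made for simplicity). It reuses the PE ratios 3, 4, 5, 20 from the earlier example.

```python
def trimmed_mean(values, k):
    """Drop the k smallest and k largest observations, then average the rest."""
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, k):
    """Replace the k smallest and k largest observations with their nearest
    remaining neighbours, then average all observations."""
    ordered = sorted(values)
    n = len(ordered)
    low, high = ordered[k], ordered[n - k - 1]
    clipped = [min(max(v, low), high) for v in ordered]
    return sum(clipped) / n


pe_ratios = [3, 4, 5, 20]
trimmed = trimmed_mean(pe_ratios, k=1)        # averages 4 and 5
winsorized = winsorized_mean(pe_ratios, k=1)  # averages 4, 4, 5, 5

print(trimmed, winsorized)
```

Both results sit far below the arithmetic mean of 8, which shows how strongly the single outlier (20) pulls the ordinary average.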
FinTree Fruit 9 : Time Weighted Rate of Return (TWRR) & Money Weighted Rate of Return (MWRR)
MWRR can be computed by calculating IRR of Net Cash Flow (Last Column)
Example 1: Convert quoted rates to Annualized Rates using I-Conversion Function of the Table
To access the Interest Conversion function, press 2nd and 2 on the TI BA II Plus calculator
Example 1: You buy a stock worth $100 by borrowing $80 and investing $20 of your own equity.
At the end of the year, the stock value is $120. Interest rate on borrowing is 12.5%.
Calculate Asset Return (unlevered) and Leveraged Return.
Unlevered Return
= Profit / Total Value of Asset at the time of investment
= 20 / 100 = 20%
Leveraged Return
= Profit made after interest payment / Equity Invested
= (20 - 80 * 12.5%) / 20 = 50%
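The same leverage example as a short sketch. The function names are illustrative; the inputs are the ones stated in the example.

```python
def unlevered_return(begin_value, end_value):
    """Profit divided by the total asset value at the time of investment."""
    return (end_value - begin_value) / begin_value


def leveraged_return(begin_value, end_value, debt, borrowing_rate):
    """Profit after interest divided by the equity invested."""
    equity = begin_value - debt
    profit_after_interest = (end_value - begin_value) - debt * borrowing_rate
    return profit_after_interest / equity


asset_return = unlevered_return(100, 120)
equity_return = leveraged_return(100, 120, debt=80, borrowing_rate=0.125)

print(f"Unlevered: {asset_return:.0%}, Leveraged: {equity_return:.0%}")
```

Leverage magnifies the return here because the asset earns 20% while the borrowed money costs only 12.5%.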
FinTree
• A bond is a fixed-income instrument that represents a loan made by an investor to a borrower
• The face value of a bond is the price that the issuer pays at the time of maturity, also referred
to as “par value”
• The coupon rate is the interest rate paid on a bond by its issuer for the term of the security.
• The coupon payment is calculated as Face Value * Coupon Rate
• The term yield to maturity (YTM) refers to the total return anticipated on a bond if the bond is held
until it matures.
• A bond’s market value is how much someone will pay for the bond on the free market. It is calculated as
Present Value of future cash flows of the bond discounted at YTM.
Solution
• A perpetual bond is a less common type of coupon bond with no stated maturity date.
• PV(Perpetual Bond) = PMT / Discount Rate
• Amortization Schedule = Schedule that displays how a loan is repaid over a period of time.
Example 1: Build an Amortization Schedule for a loan of $100, to be repaid over 5 years with
level payments, at an interest rate of 10%.
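A minimal sketch of the amortization schedule for this loan, assuming annual payments made at the end of each year and the standard level-payment annuity formula. The function name and column layout are illustrative.

```python
def amortization_schedule(principal, rate, periods):
    """Return rows of (period, payment, interest, principal repaid, ending balance)."""
    # Level payment from the ordinary annuity formula.
    payment = principal * rate / (1 - (1 + rate) ** -periods)
    balance = principal
    rows = []
    for period in range(1, periods + 1):
        interest = balance * rate            # interest on the opening balance
        principal_paid = payment - interest  # remainder reduces the loan
        balance -= principal_paid
        rows.append((period, payment, interest, principal_paid, balance))
    return rows


schedule = amortization_schedule(100, 0.10, 5)
for period, payment, interest, principal_paid, balance in schedule:
    print(f"{period}  {payment:7.2f}  {interest:6.2f}  {principal_paid:6.2f}  {abs(balance):7.2f}")
```

The interest portion falls and the principal portion rises each year, and the balance reaches zero after the fifth payment.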
Example 2: A stock is expected to pay a dividend of $10 next year. Expected return on the
stock is 10% (discount rate). Dividends are expected to grow at 4%.
Example 4: A stock paid a dividend of $10 last year. Dividends are expected to grow at 15% for the
first 3 years, then at 10% for the next two years and 4% thereafter in perpetuity. Discount rate is 10%.
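A sketch of how these two dividend examples could be valued, assuming the constant-growth (Gordon) formula for Example 2 and a multistage dividend discount model with a Gordon terminal value for Example 4. The function names are illustrative and not taken from the notes.

```python
def gordon_growth_value(next_dividend, discount_rate, growth):
    """Value = D1 / (r - g); only meaningful when r > g."""
    return next_dividend / (discount_rate - growth)


def multistage_ddm_value(last_dividend, growth_path, terminal_growth, discount_rate):
    """Discount each high-growth dividend, then add the discounted terminal value."""
    value, dividend = 0.0, last_dividend
    for year, g in enumerate(growth_path, start=1):
        dividend *= 1 + g
        value += dividend / (1 + discount_rate) ** year
    # Terminal value at the end of the high-growth phase, using the next dividend.
    terminal_value = gordon_growth_value(
        dividend * (1 + terminal_growth), discount_rate, terminal_growth
    )
    value += terminal_value / (1 + discount_rate) ** len(growth_path)
    return value


example_2_value = gordon_growth_value(10, 0.10, 0.04)
example_4_value = multistage_ddm_value(
    10, growth_path=[0.15, 0.15, 0.15, 0.10, 0.10],
    terminal_growth=0.04, discount_rate=0.10,
)

print(f"Example 2: {example_2_value:.2f}")
print(f"Example 4: {example_4_value:.2f}")
```

Note that Example 2 starts from next year's dividend (D1), while Example 4 starts from last year's dividend (D0), so the first step in Example 4 is to grow the dividend before discounting it.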
Example 1: A stock is trading at $100. It is expected to give a dividend of $10 next year.
Cost of equity is 15%. Calculate implied growth rate.
Example 2 : A stock is trading at $ 100. It is expected to give a dividend of $10 next year.
g = 5%. Calculate implied cost of equity.
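Both questions can be answered by rearranging the constant-growth relation P0 = D1 / (Ke - g). The sketch below assumes that model applies; the function names are illustrative.

```python
def implied_growth(price, next_dividend, cost_of_equity):
    """g = Ke - D1 / P0."""
    return cost_of_equity - next_dividend / price


def implied_cost_of_equity(price, next_dividend, growth):
    """Ke = D1 / P0 + g (dividend yield plus growth)."""
    return next_dividend / price + growth


g = implied_growth(100, 10, 0.15)
ke = implied_cost_of_equity(100, 10, 0.05)

print(f"Implied growth: {g:.2%}, implied cost of equity: {ke:.2%}")
```

The two examples are mirror images: a 10% dividend yield with Ke of 15% implies g of 5%, and a 10% dividend yield with g of 5% implies Ke of 15%.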
Example 1: A company has EPS of $10 and pays Dividends Per Share (DPS) of $4. It has Ke = 10%
and g = 4%. Calculate the justified Leading and Trailing PE Ratios.
Justified Leading PE Ratio = Payout Ratio / (Ke - g) = 40% / (10% - 4%) = 6.67
Justified Trailing PE Ratio = Payout Ratio * (1 + g) / (Ke - g) = 40% * 1.04 / (10% - 4%) = 6.93
• Under cash flow additivity, the present value of any future cash flow stream indexed at the same point in time
equals the sum of the present values of the individual cash flows
• This principle ensures that market prices reflect the condition of no arbitrage (assuming no transaction costs).
Example 1: Based on the stream of Cash Flows presented in Investment 1 & Investment 2,
recommend which investment strategy to choose by Comparing Net Present Value (NPV).
Required rate of return = 10%
Investment 1
Investment 2
Solution: To recommend an investment strategy, compare NPV of two options and select
strategy with higher positive NPV.
Since both the investments are producing same NPV, we will be indifferent between the two.
• A forward rate is an interest rate applicable to a financial transaction that will take place in the future
• Let’s say the one-year rate is 5%, whereas the two-year rate is 7%. The one-year forward rate
one year from now (F1,1) is calculated as {(1 + 7%)^2 / (1 + 5%)^1} - 1 = 9.038%
• Interpretation of this rate: investing 100 for two years at 7% (PV = 100, N = 2, I/Y = 7) gives FV = 114.49,
the same as investing 100 for one year at 5% and then for one more year at 9.038% (100 * 1.05 * 1.09038 = 114.49)
Example 1: A 1-year zero coupon bond (STRIP) is trading at 96 and a 2-year zero coupon bond (STRIP)
is trading at 88. Calculate the one-year rate one year from now (F1,1)
STEP 1: Calculate one-year (spot) rate. PV = 96, FV = -100, N = 1, CPT I/Y = 4.17%
STEP 2: Calculate two-year (spot) rate. PV = 88, FV = -100, N = 2, CPT I/Y = 6.6%
STEP 3: Calculate F1,1 = (1 + 6.6%)^2 / (1 + 4.17%) - 1 = 9.09%
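The three steps can be reproduced without a financial calculator. This sketch assumes annual compounding and a face value of 100 for both zero coupon bonds; the function names are illustrative.

```python
def spot_rate(zero_price, years, face=100.0):
    """Annualized spot rate implied by a zero coupon bond price."""
    return (face / zero_price) ** (1 / years) - 1


def forward_rate_1y1y(spot_1y, spot_2y):
    """One-year rate one year from now implied by the 1-year and 2-year spot rates."""
    return (1 + spot_2y) ** 2 / (1 + spot_1y) - 1


s1 = spot_rate(96, 1)
s2 = spot_rate(88, 2)
f11 = forward_rate_1y1y(s1, s2)

print(f"1y spot: {s1:.2%}, 2y spot: {s2:.2%}, F(1,1): {f11:.2%}")
```

A quick cross-check: the forward rate is also the return from buying the 2-year STRIP at 88 and, one year later, holding what is then priced like a 1-year STRIP at 96, that is 96 / 88 - 1.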
Forward Exchange Rate (DC/FC) = Spot Rate * (1 + Interest Rate of DC)^n / (1 + Interest Rate of FC)^n
1 Year Forward Rate = 50 * (1 + 8%)^1 / (1 + 3%)^1 = 50 * 1.08 / 1.03 = 52.43
Same example using continuous compounding, with maturity of 3 months will be as follows:
3 Month Forward Rate = 50 * e^(8% * 0.25) / e^(3% * 0.25) = INR 50.63/$
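A short sketch of both forward exchange rate calculations, using the figures that appear in the example (spot of 50, domestic rate of 8%, foreign rate of 3%). The function names are illustrative.

```python
import math


def forward_fx_discrete(spot, rate_dc, rate_fc, years):
    """Forward (DC/FC) with periodic compounding."""
    return spot * ((1 + rate_dc) / (1 + rate_fc)) ** years


def forward_fx_continuous(spot, rate_dc, rate_fc, years):
    """Forward (DC/FC) with continuous compounding."""
    return spot * math.exp((rate_dc - rate_fc) * years)


one_year_forward = forward_fx_discrete(50, 0.08, 0.03, years=1)
three_month_forward = forward_fx_continuous(50, 0.08, 0.03, years=0.25)

print(f"1-year forward: {one_year_forward:.2f}")
print(f"3-month forward (continuous): {three_month_forward:.2f}")
```

The currency with the higher interest rate trades at a forward discount, so more units of the domestic currency are needed per unit of foreign currency in the future.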
• A call option gives the buyer the right to buy the underlying asset at a specific price (strike price)
• A call buyer profits when the underlying asset increases in price
Let’s say a stock price is 100. In one year the price can either go up to 150 or down to 90.
Let’s assume a call option has a strike price of 120 (i.e., the right to buy the asset at 120).
Stock Price at time 0 = 100
Up-tick: Stock Price at time 1 = 150, Option Price at time 1 = 150 - 120 = 30
Down-tick: Stock Price at time 1 = 90, Option Price at time 1 = 0 (investor will not exercise)
• Hedge Ratio can be calculated as = (Option Price @ up-tick - Option Price @ down-tick) / (Stock Price @ up-tick - Stock Price @ down-tick) = (30 - 0) / (150 - 90) = 0.5
Measures of Location
[Figure: box and whisker plot, labelled from top to bottom: Highest Value, Upper Boundary for Q3, Median, Mean (X), Lower Boundary for Q2, Lowest Value]
• In a box and whisker plot, the box spans from the lower bound of the second quartile to the upper bound
of the third quartile, with the median or arithmetic average noted as a measure of central tendency of the
entire distribution
• The whiskers are the lines that run from the box and are bounded by the “fences”, which represent the
lowest and highest values of the distribution.
• Another form of box and whisker plot typically uses 1.5 times the interquartile range for the fences.
• Thus, the upper fence is 1.5 times the interquartile range added to the upper bound of Q3, and the lower
fence is 1.5 times the interquartile range subtracted from the lower bound of Q2.
Measures of Dispersion
Dispersion is the variability around the
central tendency
1. Sample Mean
2. Sample SD and Sample Variance
3. Population SD and Population Variance
4. Range
5. Mean Absolute Deviation
6. Semi-deviation (mean = target)
7. Target Semi-deviation (Target = 13)
8. Coefficient of Variation
Solution:
6. Semi-deviation
7. Target Semi-deviation (Target = 13)
8. Coefficient of Variation
= Sample SD / Mean
= 8.72 / 10
= 0.872
• Kurtosis is a measure of the combined weight of the tails of a distribution relative to the rest of the
distribution; that is, the proportion of the total probability that is outside of, say, 2.5 standard
deviations of the mean
• A distribution that has fatter tails than the normal distribution is referred to as leptokurtic or
fat-tailed
• A distribution that has thinner tails than the normal distribution is referred to as being platykurtic
or thin-tailed
• Distribution similar to the normal distribution as it concerns relative weight in the tails is called
mesokurtic
• Excess Kurtosis: Kurtosis of the distribution - 3
• Why? Because normal distribution has a kurtosis of 3
• So, if K > 3, then distribution is Leptokurtic
• If K = 3, then distribution is Mesokurtic (Normal Distribution)
• If K < 3, then distribution is Platykurtic
• A scatter plot displays potential relationships between two variables
• Pattern of the scatter plot may indicate no apparent relationship, a linear association, or a
non-linear relationship
• It provides a quick sense of the data range.
• Inspecting the scatter plot can help to spot extreme values (i.e., outliers).
• Correlation is a measure of the linear relationship between two random variables
• The first step in considering how two variables vary together, however, is constructing their
covariance. It measures how two variables in a sample move together.
• Covariance is a measure of the joint variability of two random variables
• If the random variables vary in the same direction—for example, X tends to be above its mean
when Y is above its mean, and X tends to be below its mean when Y is below its mean—then their
covariance is positive
• The size of the covariance measure is difficult to interpret as it involves squared units of measure
and so depends on the magnitude of the variables
• The sample correlation coefficient is a standardized measure of how two variables in a sample
move together.
• Correlation coefficient expresses the strength of the linear relationship between the two random
variables
• Correlation (X,Y) = Covariance(X,Y) / (SDx * SDy)
• Correlation ranges from −1 to +1.
• A correlation of 0, termed uncorrelated, indicates an absence of any linear relationship between
the variables
• A positive correlation close to +1 indicates a strong positive linear relationship. A correlation of +1
indicates a perfect positive linear relationship.
• A negative correlation close to −1 indicates a strong negative (i.e., inverse) linear relationship. A
correlation of −1 indicates a perfect inverse linear relationship.
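A minimal sketch of sample covariance and correlation, using small made-up data sets (hypothetical, chosen only for illustration). The second data set shows that a perfect but non-linear relationship can still have a correlation of zero.

```python
import math


def sample_covariance(xs, ys):
    """Sum of cross-products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)


x = [-2, -1, 0, 1, 2]          # hypothetical observations
linear_y = [2 * v + 1 for v in x]  # exact linear relationship
curved_y = [v ** 2 for v in x]     # exact but non-linear relationship

corr_linear = sample_correlation(x, linear_y)
corr_curved = sample_correlation(x, curved_y)

print(f"linear: {corr_linear:.2f}, curved: {corr_curved:.2f}")
```

Dividing by the two standard deviations removes the units, which is why correlation is easier to interpret than covariance.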
[Figure: scatter plots of y against x. Panels: Perfect Positive Relationship (Correlation = +1); Perfect Negative Relationship (Correlation = −1); Correlation between 0 and +1; Correlation between 0 and −1; No Linear Relationship (Correlation = 0); Non-Linear Relationship (Correlation = 0). Correlation captures only linear relationships]
• Correlation does not capture non-linear relationships
• Correlation may be quite sensitive to outliers
• Correlation does not imply causation
• Spurious correlation can be of three types:
1. correlation between two variables that reflects chance relationships in a particular dataset;
2. correlation induced by a calculation that mixes each of two variables with a third variable; and
- For example, consider a cross-sectional sample of companies’ dividends and total assets. While
there may be a low correlation between these two variables, dividing each by market
capitalization may increase the correlation.
3. correlation between two variables arising not from a direct relation between them but from their
relation to a third variable.
- For example, height may be positively correlated with the extent of a person’s vocabulary, but the underlying
relationships are between age and height and between age and vocabulary.
• The expected value of a random variable is the probability-weighted average of the possible
outcomes of the random variable. For a random variable X, the expected value of X is denoted
E(X).
• The variance of a random variable is the expected value (the probability-weighted average) of
squared deviations from the random variable’s expected value:
Solution:
• The same question can also be solved using the STAT function of the calculator with the following
steps:
STEP 1: 2nd 7, 2nd CLR WRK
STEP 2: X01 = 5, Y01 = 20 (not 20%)
X02 = 10, Y02 = 30
X03 = 20, Y03 = 40
X04 = 30, Y04 = 10
STEP 3: Press 2nd and 8, press 2nd SET multiple times till you reach the 1-V display
STEP 4: Navigate downwards; mean will be 15 and population sigma will be 7.74596 ~ 7.75
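The calculator steps can be mirrored in a few lines, assuming (from the data entry above) that the X values are the possible outcomes and the Y values are their probabilities in percent. This is a sketch of the probability-weighted mean and standard deviation, not the calculator's internal method.

```python
import math

outcomes = [5, 10, 20, 30]
probabilities = [0.20, 0.30, 0.40, 0.10]  # the Y entries, expressed as decimals

# Expected value: probability-weighted average of the outcomes.
expected_value = sum(p * x for x, p in zip(outcomes, probabilities))

# Variance: probability-weighted average of squared deviations from E(X).
variance = sum(p * (x - expected_value) ** 2 for x, p in zip(outcomes, probabilities))
std_dev = math.sqrt(variance)

print(f"E(X) = {expected_value:.2f}, variance = {variance:.2f}, SD = {std_dev:.5f}")
```

The population sigma (not the sample SD) is the right figure to read off the calculator because the probabilities describe the whole distribution.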
Example 1: If the probability of studying for the exam P(S) is 70%, the probability of not studying
for the exam P(Sc) = 30% (called the complement).
If the probability of passing if a student studies P(P/S) = 80% (called conditional probability)
and the probability of passing if a student does not study P(P/Sc) is 40%, what is the total
probability of passing?
Example 2 : The earnings of HDFC Bank are interest rate sensitive, benefiting from a
declining interest rate environment. Suppose there is a 60% probability that HDFC Bank
will operate in a declining interest rate environment and a 40% probability that it will
operate in a stable interest rate environment. If a declining interest rate environment occurs,
the probability that EPS will be $ 3 is estimated at 25%, and the probability that EPS will be
USD 4 is estimated at 75%. In a stable interest rate environment P( EPS = 2) is 70% and
P(EPS= 1) is 30%. Calculate expected value of EPS and Standard Deviation.
Solution: The question can be simplified by building a probability tree diagram as below:
Interest rate environment     EPS   Conditional probability   Joint Probability
Declining (probability 60%)   4     75%                       60% * 75% = 45%
Declining (probability 60%)   3     25%                       60% * 25% = 15%
Stable (probability 40%)      2     70%                       40% * 70% = 28%
Stable (probability 40%)      1     30%                       40% * 30% = 12%
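Once the joint probabilities are known, the expected EPS and its standard deviation follow from the usual probability-weighted formulas. The sketch below assumes the four branches of the tree are the only possible outcomes; the variable names are illustrative.

```python
import math

# (probability of environment, conditional probability of EPS, EPS)
branches = [
    (0.60, 0.75, 4),  # declining rates
    (0.60, 0.25, 3),
    (0.40, 0.70, 2),  # stable rates
    (0.40, 0.30, 1),
]

# Joint probability of each branch = environment probability * conditional probability.
joint = [(p_env * p_eps, eps) for p_env, p_eps, eps in branches]

expected_eps = sum(p * eps for p, eps in joint)
variance_eps = sum(p * (eps - expected_eps) ** 2 for p, eps in joint)
sd_eps = math.sqrt(variance_eps)

print(f"E(EPS) = {expected_eps:.2f}, SD = {sd_eps:.4f}")
```

The joint probabilities sum to 100%, which is a useful check that the tree has been built correctly.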
• Bayes' Theorem is a mathematical formula that helps us update the probability of a hypothesis
based on new evidence.
• It's named after Reverend Thomas Bayes, who introduced the concept.
Example 1: If the probability of studying for the exam P(S) is 70%, the probability of not studying for
the exam P(Sc) = 30% (called the complement). The probability of passing if a student studies is
P(P/S) = 80% (called conditional probability) and the probability of passing if a student does not
study P(P/Sc) is 40%. If you get to know that the student eventually passed the exam, what is the
updated probability of studying? (Note: Updated Probability is also called Posterior Probability).
Studied?                         Result   Conditional probability   Joint Probability
Studying (probability 70%)       PASS     80%                       70% * 80% = 56%
Studying (probability 70%)       FAIL     20%                       70% * 20% = 14%
Not studying (probability 30%)   PASS     40%                       30% * 40% = 12%
Not studying (probability 30%)   FAIL     60%                       30% * 60% = 18%
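A short sketch of the total probability rule and Bayes' Theorem for this example, using only the four inputs given in the question. The variable names are illustrative.

```python
p_study = 0.70
p_not_study = 1 - p_study
p_pass_given_study = 0.80
p_pass_given_not_study = 0.40

# Total probability of passing: sum of the joint probabilities of the PASS branches.
p_pass = p_study * p_pass_given_study + p_not_study * p_pass_given_not_study

# Bayes' Theorem: P(S | Pass) = P(Pass | S) * P(S) / P(Pass).
p_study_given_pass = p_study * p_pass_given_study / p_pass

print(f"P(Pass) = {p_pass:.2%}")
print(f"P(Studied | Pass) = {p_study_given_pass:.2%}")
```

Learning that the student passed raises the probability of having studied from the prior of 70% to a posterior of about 82%.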
Portfolio Mathematics
FinTree Fruit 1 : Portfolio Return and Standard Deviation
• Expected return on the portfolio (E(Rp)) is a weighted average of the expected returns
• portfolio variance (two assets) is calculated as follows:
Example 1 : Calculate portfolio expected return and standard deviation based on following
Example 1: Calculate covariance between A and B based on following joint probability function
STEP1: Convert the Joint Probability Distribution into a simple tabular structure:
STEP 2: Calculate the Expected Value of Returns A, E(A), the Expected Value of Returns B, E(B), and the
Expected Value of (Returns A * Returns B), E(AB), as follows:
• Mean-variance analysis is the process of weighing risk, expressed as variance, against expected
return.
• Investors use mean-variance analysis to make investment decisions.
• Investors weigh how much risk they are willing to take on in exchange for different levels of
reward
• Mean–variance analysis (MVA) holds exactly when investors are risk averse
• MVA holds when : 1. Returns are normally distributed or 2. Investors have quadratic utility
functions
• Mean–variance analysis, however, can still be useful—that is, it can hold approximately—when
either assumption 1 or 2 is violated
Example 1: An investor wants to choose between following two portfolios, recommend a portfolio
based on Roy’s Safety First Ratio and Shortfall Risk
Solution:
STEP 1: Calculate Safety First Ratio and select the portfolio based on highest SFR.
STEP 2 : Calculate Shortfall Risk by using normal distribution tables.
Shortfall risk is the probability of not earning a minimum of 5%.
Therefore, we will look for probability on the left of -1 and -2 for portfolio A and B respectively.
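Instead of reading a normal distribution table, the shortfall probabilities can be computed with the standard library. The sketch assumes, as the text above implies, safety-first ratios of 1 for portfolio A and 2 for portfolio B, and normally distributed returns; shortfall risk is then the standard normal probability to the left of minus the SFR.

```python
from statistics import NormalDist


def shortfall_risk(safety_first_ratio):
    """P(return < threshold) = N(-SFR) when returns are normally distributed."""
    return NormalDist().cdf(-safety_first_ratio)


sfr_a, sfr_b = 1.0, 2.0  # implied by looking left of -1 and -2 respectively

shortfall_a = shortfall_risk(sfr_a)
shortfall_b = shortfall_risk(sfr_b)

print(f"Shortfall risk A: {shortfall_a:.2%}, B: {shortfall_b:.2%}")
```

The portfolio with the higher safety-first ratio always has the lower shortfall risk, so both criteria lead to the same choice.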
• Two main ways of managing financial risk are value at risk (VaR) and stress testing/scenario
analysis.
• Stress testing and scenario analysis refer to a set of techniques for estimating losses in extremely
unfavorable combinations of events or scenarios.
• Value at risk (VaR) is a money measure of the minimum value of losses expected over a specified
time period (e.g., a day, a quarter, or a year) at a given level of probability (often 0.05 or 0.01)
Simulation Methods
FinTree Fruit 1 : Log-normal Distribution
• Use Normal Distribution - Asset returns
• Log-normal Distribution - Asset Prices
• The Black–Scholes–Merton model assumes that the price of the asset underlying the option is
log-normally distributed.
• Log-normal Distribution - “its log is normal”
• The two most noteworthy observations about the log-normal distribution are that it is bounded
below by 0 and it is skewed to the right (it has a long right tail).
• If a stock’s continuously compounded return is normally distributed, then future stock price is
log-normally distributed.
• Stock price may be described by the log-normal distribution even when continuously
compounded returns do not follow a normal distribution (Central Limit Theorem)
• A key assumption in many investment applications is that returns are independently and
identically distributed (i.i.d.)
• Independence captures the proposition that investors cannot predict future returns using past
returns.
• Identical distribution captures the assumption of stationarity, a property implying that the
mean and variance of return do not change from period to period.
• If the one-period continuously compounded returns are normally distributed, then the T
holding period continuously compounded return, r(0,T), is also normally distributed with mean μT
and variance σ²T.
• This is because a linear combination of normal random variables is also a normal random
variable.
• Sigma Scaling rule: if one day σ = 2%, one year σ will be 2%* √250 (Assuming 250 trading days
in a year). Sigma can be scaled by multiplying with square root of time
• Expected Return (mean) Scaling rule : if one day return = 2%, one year return will be
2%* 250 (Assuming 250 trading days in a year). returns can be scaled by multiplying with time
Example 1: Based on Following data, calculate:
1. Daily Historical return (Continuously Compounded)
2. Daily σ
3. Annual Expected Return (assume 250 trading days/year)
4. Annual σ (assume 250 trading days/year)
Solution:
STEP 1: Calculate continuously compounded daily returns by first calculating
Closing Price / Opening Price and then pressing the LN button on the calculator.
Day 1: LN (120/100) = 18.23%
Day 2: LN (140/120) = 15.41%
Day 3: LN (170/140) = 19.41%
Day 4: LN (200/170) = 16.25%
STEP 2: Insert daily returns in STAT function and calculate mean and Sample Standard
Deviation (Keep calculator on LIN mode)
Mean Return (daily) = 17.33%
Sample SD (daily) (Sx) =1.825%
STEP 3: Annualize returns and SD by multiplying with 250 and √250 respectively.
Expected Annual Return = 17.33% * 250 = 4332.5%
Annual SD = 1.825% * √250 = 28.86%
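The three steps can be reproduced in code. The price path 100, 120, 140, 170, 200 is taken from the LN calculations in STEP 1; the scaling by 250 and by the square root of 250 assumes 250 trading days and i.i.d. daily returns, as stated above.

```python
import math
from statistics import mean, stdev

prices = [100, 120, 140, 170, 200]

# STEP 1: continuously compounded daily returns, ln(P_t / P_{t-1}).
daily_log_returns = [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]

# STEP 2: mean and sample standard deviation of the daily returns.
daily_mean = mean(daily_log_returns)
daily_sd = stdev(daily_log_returns)

# STEP 3: scale the mean by time and the SD by the square root of time.
trading_days = 250
annual_return = daily_mean * trading_days
annual_sd = daily_sd * math.sqrt(trading_days)

print(f"Daily mean = {daily_mean:.4%}, daily SD = {daily_sd:.4%}")
print(f"Annual return = {annual_return:.2%}, annual SD = {annual_sd:.2%}")
```

Small differences from the hand calculation come from rounding the daily figures before annualizing.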
• Monte Carlo simulation is like rolling dice or flipping a coin many times to understand the
range of possible results.
• In finance, it helps assess the potential outcomes of an investment or financial strategy by
considering various uncertain factors.
Process:
1. Identify Variables: Determine the key factors that affect your financial scenario, such as
investment returns, inflation rates, or interest rates.
2. Define Ranges: Specify the possible values or ranges each variable can take. For instance,
investment returns might range from -10% to +15%.
3. Run Simulations: Randomly generate values for each variable within their defined ranges and
calculate the financial outcome. Repeat this process multiple times (e.g., 1,000 simulations) to
see a range of possible results.
4. Analyze Results: Examine the outcomes of all simulations to understand the likelihood of
different scenarios. This provides a probability distribution of potential financial outcomes.
Strengths: Monte Carlo simulation can be used to price complex securities for which no analytic
expression (formula) is available, particularly American-style options or Complex Securities.
Weaknesses: Monte Carlo simulation provides only statistical estimates, not exact results.
Analytic methods, when available, provide more insight into cause-and-effect relationships than
does Monte Carlo simulation.
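The four-step process above can be sketched in a few lines. Everything numeric here is a hypothetical assumption for illustration: a starting price of 100, a one-year continuously compounded return that is normally distributed with mean 5% and standard deviation 20%, and 10,000 simulated paths.

```python
import math
import random


def simulate_terminal_prices(start_price, mu, sigma, n_paths, seed=0):
    """Draw one normally distributed log return per path and convert it to a price."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [start_price * math.exp(rng.gauss(mu, sigma)) for _ in range(n_paths)]


# Steps 1 and 2: the uncertain variable is the annual log return, assumed N(5%, 20%).
terminal_prices = simulate_terminal_prices(100.0, mu=0.05, sigma=0.20, n_paths=10_000)

# Step 4: summarize the simulated distribution of outcomes.
ordered = sorted(terminal_prices)
mean_price = sum(terminal_prices) / len(terminal_prices)
percentile_5 = ordered[int(0.05 * len(ordered))]

print(f"Mean terminal price: {mean_price:.2f}")
print(f"5th percentile terminal price: {percentile_5:.2f}")
```

Because the log return is normal, the simulated prices are log-normal: they can never fall below zero and the distribution has a long right tail. The results are statistical estimates and will change with the seed and the number of paths.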
• The idea behind bootstrap is to mimic the process of performing random sampling from a
population to construct the sampling distribution
• The difference lies in the fact that we have no knowledge of what the population looks like,
except for a sample with size n drawn from the population.
• In bootstrap, we repeatedly draw samples from the original sample, and each re-sample is of
the same size as the original sample. Note that each item drawn is replaced for the next draw
(i.e., the identical element is put back into the group so that it can be drawn more than once).
• Both the bootstrap and the Monte Carlo simulation build on repetitive sampling.
• Bootstrapping re-samples a dataset as the true population, and infers from the sampling
statistical distribution parameter values (i.e., mean, variance, skewness, and kurtosis) for the
population. Monte Carlo simulation builds on generating random data with certain known
statistical distribution of parameter values.
• Bootstrap simulation is a complement to analytical methods.
• Analytical methods, where available, provide more insight into cause-and-effect relationships.
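A minimal sketch of bootstrap re-sampling, estimating the standard error of the sample mean. The ten return observations are hypothetical, and the number of re-samples (2,000) is an arbitrary choice made for illustration.

```python
import random
from statistics import mean, stdev


def bootstrap_standard_error(sample, n_resamples=2000, seed=0):
    """Standard deviation of the means of re-samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    # Each re-sample has the same size as the original sample.
    resample_means = [mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return stdev(resample_means)


# Hypothetical sample of periodic returns.
sample_returns = [0.012, -0.004, 0.021, 0.007, -0.015, 0.030, 0.002, 0.011, -0.008, 0.016]

se_bootstrap = bootstrap_standard_error(sample_returns)
se_analytic = stdev(sample_returns) / len(sample_returns) ** 0.5

print(f"Bootstrap SE: {se_bootstrap:.5f}, analytic SE: {se_analytic:.5f}")
```

The bootstrap estimate should land close to the analytic value s/√n, which illustrates why bootstrap is described as a complement to analytical methods rather than a replacement.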
Sampling Methods
• A simple random sample is a subset of a larger population created in such a way that each
element of the population has an equal probability of being selected to the subset.
• Simple random sampling is particularly useful when data in the population is homogeneous
• Systematic sampling can be used when we cannot code (or even identify) all the members of a
population. With systematic sampling, we select every kth member until we have a sample of
the desired size. The sample that results from this procedure should be approximately random.
• Sampling error is the difference between the observed value of a statistic and the quantity it is
intended to estimate as a result of using subsets of the population.
• Sampling distribution of a statistic is the distribution of all the distinct possible values that the
statistic can assume when computed from samples of the same size randomly drawn from the
same population.
• Cluster sampling also requires the division or classification of the population into
subpopulation groups, called clusters
• Then certain clusters are chosen as a whole using simple random sampling
• If all the members in each sampled cluster are sampled, this sample plan is referred to as
one-stage cluster sampling
• If a subsample is randomly selected from each selected cluster, then the plan is referred as
two-stage cluster sampling.
• A major difference between cluster and stratified random samples is that in cluster sampling, a
whole cluster is regarded as a sampling unit and only sampled clusters are included.
• In stratified random sampling, however, all the strata are included and only specific elements
within each stratum are then selected as sampling units.
• Cluster sampling is commonly used for broad market surveys, and the most popular version
identifies clusters based on geographic parameters
• Cluster sampling usually yields lower accuracy because a sample from a cluster might be less
representative of the entire population.
• Its major advantage, however, is offering the most time-efficient and cost-efficient probability
sampling plan for analyzing a vast population.
• Convenience Sampling: an element is selected from the population based on whether or not it is
accessible to a researcher or on how easy it is for a researcher to access the element.
• Samples are not necessarily representative of the entire population
• Level of sampling accuracy could be limited
• Advantage: data can be collected quickly at a low cost ( time-efficient and cost-effective)
• Judgmental Sampling: involves selectively handpicking elements from the population based on
a researcher’s knowledge and professional judgment
• Affected by the bias of the researcher
• Might lead to skewed results
• For example, when auditing financial statements, seasoned auditors can apply their sound
judgment to select accounts or transactions that can provide sufficient audit coverage
• The sampling distribution of the sample mean (samples drawn from any population with mean = µ, variance = σ²)
will have the following 3 properties (Large Sample Size)
1. The mean of the sample means will approach the Population Mean (µ)
2. The standard deviation of the sample means (standard error) will approach σ/√n
3. The sampling distribution will approach the Normal Distribution.
• When sample size n is greater than or equal to 30, we can assume that the sample mean is
approximately normally distributed
• Re-sampling (Bootstrapping): we repeatedly draw samples from the original sample, and each
re-sample is of the same size as the original sample
• Each item drawn is replaced for the next draw
• It is often called model-free re-sampling or non-parametric re-sampling
• Standard Error of re-sampled distribution is calculated as follows:
• Unlike bootstrap, which repeatedly draws samples with replacement, jackknife samples are
selected by taking the original observed data sample and leaving out one observation at a time
from the set (and not replacing it).
• According to its computation procedure, we can conclude that jackknife produces similar
results for every run, whereas bootstrap usually gives different results because bootstrap
re-samples are randomly drawn.
• For a sample of size n, jackknife usually requires n repetitions, whereas with bootstrap, we are
left to determine how many repetitions are appropriate.
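A minimal sketch of the jackknife, using a hypothetical sample of six returns. Each jackknife sample leaves out exactly one observation, so a sample of size n produces n estimates and no random numbers are involved.

```python
from statistics import mean


def jackknife_means(sample):
    """Mean of each leave-one-out sample (observation i is dropped on pass i)."""
    return [mean(sample[:i] + sample[i + 1:]) for i in range(len(sample))]


# Hypothetical sample of periodic returns.
sample_returns = [0.012, -0.004, 0.021, 0.007, -0.015, 0.030]

first_run = jackknife_means(sample_returns)
second_run = jackknife_means(sample_returns)

print(len(first_run), first_run == second_run)
```

Running the procedure twice gives identical results, in contrast with bootstrap, where each run draws new random re-samples.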
Hypothesis Testing
FinTree Fruit 1 : Hypothesis Basics
• Hypothesis testing is part of the branch of statistics known as statistical inference
• A hypothesis is a statement about one or more populations that we test using sample statistics
• Six Step standard approach to hypothesis testing is as below:
• For each hypothesis test, we always state two hypotheses: the null hypothesis (or null),
designated H0, and the alternative hypothesis, designated Ha
• The null hypothesis is what we want to reject.
• The null and alternative hypotheses are stated in terms of population parameters, and we use
sample statistics to test these hypotheses
• The null and alternative hypotheses must be mutually exclusive and collectively exhaustive; in
other words, all possible values are contained in either the null or the alternative hypothesis
• If we are performing a hypothesis test at 95% level of confidence, the remaining 5% is called
significance level.
• Type I Error : False Positive, i.e. Rejecting a True Null
• Type II Error: False Negative, i.e. Failing to reject a False Null
• Type I Error Probability = Significance Level
• (1- Significance Level) = Confidence Level, therefore
• (1- Prob. of Type I Error) = Confidence Level
• (1- Prob. of Type II Error) = Power of test
Decision        True Null          False Null
Not Rejecting   Correct Decision   TYPE II Error
Rejecting       TYPE I Error       Correct Decision
• The critical value or values we choose are based on the level of significance and the probability
distribution associated with the test statistic (selected from probability tables).
• If the calculated value (test statistic) > critical value, we reject the null hypothesis
• When we reject null hypothesis, we say the result is statistically significant.
• p-value: smallest level of significance at which the null hypothesis can be rejected
There are total 5 types of Hypothesis tests in this Learning Module
[Figure: types of hypothesis tests, including Test of a Single Variance (Chi-Square Test) and Test of Differences in Variances (DOV) (F-Test)]
Example 1: Suppose you are analyzing FinTree Equity Fund. During the past 24 months, it has
achieved a mean monthly return of 1.40 percent, with a sample standard deviation of monthly returns
of 3.80 percent. Given its level of market risk and according to a pricing model, this mutual fund
was expected to have earned a 1.20 percent mean monthly return during that time period.
Assuming returns are normally distributed, are the actual results consistent with a population
mean monthly return of 1.20 percent?
Formulate and test a hypothesis that the fund's performance was different than the mean return
of 1.20 percent inferred from the pricing model. Use a 5 percent level of significance.
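A sketch of the test statistic for this example, assuming a two-tailed t-test of a single mean against the pricing-model value of 1.20 percent with n - 1 = 23 degrees of freedom. The critical value of 2.069 is the standard table value for a 5% two-tailed test with 23 degrees of freedom and is hard-coded here rather than computed.

```python
import math

sample_mean = 0.0140        # mean monthly return
hypothesized_mean = 0.0120  # return implied by the pricing model
sample_sd = 0.0380
n = 24

# t = (sample mean - hypothesized mean) / (s / sqrt(n))
standard_error = sample_sd / math.sqrt(n)
t_statistic = (sample_mean - hypothesized_mean) / standard_error

critical_value = 2.069  # t-table, 5% two-tailed, 23 degrees of freedom
reject_null = abs(t_statistic) > critical_value

print(f"t = {t_statistic:.4f}, reject H0: {reject_null}")
```

With a test statistic well inside the critical range, the null hypothesis is not rejected under these assumptions; the observed 1.40 percent is consistent with a population mean of 1.20 percent.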
FinTree Fruit 5: Test concerning differences between means with independent samples (DOM)
• We often want to know whether a mean value (for example, a mean return) differs for two
groups. Is an observed difference due to chance or to different underlying values for the
mean?
• We test this by drawing a sample from each group.
• The samples have to be drawn from populations that are approximately normally distributed, and
the samples must also be independent of each other.
• Our focus in discussing the test of the difference of means is using the assumption that the
population variances are equal (Pooled Variance Method)
Example 1: Suppose we want to test whether the returns of the FinTree High Yield Index, shown
below, are different for two different time periods, Period 1 and Period 2 (independent Sample)
                     Period 1   Period 2
Mean                 1.2%       1.5%
Standard Deviation   5%         6%
Sample Size (days)   400        500
Note that these periods are of different lengths and the samples are independent; that is,
there is no pairing of the days for the two periods.
Is there a difference between the mean daily returns in Period 1 and in Period 2,
using a 5% level of significance?
Define Hypothesis
H0: µ1 - µ2 = 0, Ha: µ1 - µ2 ≠ 0
Calculate Test Statistic
Pooled variance = (399 * 0.05^2 + 499 * 0.06^2) / (399 + 499) = 0.003111
t = [(1.2% - 1.5%) - 0] / sqrt(0.003111 / 400 + 0.003111 / 500) = -0.8017
Compare with Critical Values
Degrees of Freedom = 400 + 500 - 2 = 898
Look up 5% two-tailed t-distribution value: -1.96 and +1.96
Since the test statistic (-0.8017) falls between the critical value range of -1.96 and +1.96,
we FAIL TO REJECT the null hypothesis.
We conclude that there is insufficient evidence to indicate that the returns are different for the
two time periods
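The pooled-variance calculation can be checked with a short sketch. The inputs (means of 1.2% and 1.5%, standard deviations of 5% and 6%, sample sizes of 400 and 500) are the figures used in the solution; the critical value of 1.96 is the one quoted there.

```python
import math


def pooled_t_statistic(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """t-statistic for a difference in means, assuming equal population variances."""
    pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    standard_error = math.sqrt(pooled_variance / n_1 + pooled_variance / n_2)
    return (mean_1 - mean_2) / standard_error, pooled_variance


t_statistic, pooled_variance = pooled_t_statistic(0.012, 0.05, 400, 0.015, 0.06, 500)
degrees_of_freedom = 400 + 500 - 2
reject_null = abs(t_statistic) > 1.96

print(f"pooled variance = {pooled_variance:.6f}, t = {t_statistic:.4f}, df = {degrees_of_freedom}")
print(f"reject H0: {reject_null}")
```

With 898 degrees of freedom the t-distribution is practically indistinguishable from the standard normal, which is why the critical values are ±1.96.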
FinTree Fruit 6: Test Concerning Differences between Means with Dependent Samples (paired comparison test)
• The test of paired comparisons is more powerful than the test of the difference in the means
(pooled variance) because by using the common element (such as the same periods or
companies), we eliminate the variation between the samples that could be caused by
something other than what we are testing.
Suppose we want to compare the returns of the FinTree High Yield Index with those of the
FinTree BBB Index. We collect data over the same 1200 days for both indexes and calculate their
means and standard deviations as shown below
Define Hypothesis Calculate Test Statistics Compare with Critical Values
• Example 1: FinTree Equity Fund is a small-cap growth fund that has been in existence for only
24 months. During this period, FinTree Equity achieved a mean monthly return of 2% and a
standard deviation of monthly returns of 3.50%.
• Using a 5 percent level of significance, test whether the standard deviation of returns is less
than 4 percent. Recall that the standard deviation is the square root of the variance; hence a
standard deviation of 4 percent, or 0.04, corresponds to a variance of 0.0016.
Solution:
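A minimal sketch of the calculation, assuming the usual chi-square statistic for a single variance, (n - 1) * s² / σ0², with H0: σ² ≥ 0.0016 and Ha: σ² < 0.0016 (a left-tailed test). The function name is illustrative, and the critical value of 13.091 is an assumed table lookup (lower 5% point, 23 degrees of freedom), not something computed here.

```python
def chi_square_variance_stat(n, sample_sd, hypothesized_variance):
    """Chi-square statistic for a test of a single variance."""
    return (n - 1) * sample_sd**2 / hypothesized_variance

n = 24              # months of returns
sample_sd = 0.035   # 3.50% monthly standard deviation
chi_sq = chi_square_variance_stat(n, sample_sd, 0.0016)
df = n - 1

critical_value = 13.091  # assumed table lookup: lower 5% point, 23 df
# Left-tailed test: reject only if the statistic falls below the critical value.
reject_null = chi_sq < critical_value

print(f"chi-square = {chi_sq:.3f}, df = {df}, reject H0: {reject_null}")
```

Under these assumptions the statistic (about 17.61) lies to the right of the critical value, so the null hypothesis is not rejected.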
• You are investigating whether the population variance of returns on a FinTree Broad Market
index changed after a change in market regulation. The first 121 weeks occurred before the
regulation change, and the second 61 weeks occurred after the regulation change. The
variance before the change was 8 and the variance after was 3.9.
• Are the variances of returns different before the regulation change versus after the regulation
change?
Solution:
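F stat = 8 / 3.9 = 2.05
Critical value = 1.47, with the rejection area to its right. Since the F stat (2.05) falls in the
rejection area, we reject the null hypothesis that the variances are equal.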
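The F statistic for this example is simply the ratio of the two sample variances, with the larger one in the numerator. The sketch below is illustrative only: the helper name is made up, and the critical value of 1.47 is taken as given from the solution above rather than derived.

```python
def f_stat(var_a, var_b):
    """Ratio of sample variances, larger over smaller, so that F >= 1."""
    return max(var_a, var_b) / min(var_a, var_b)

var_before, n_before = 8.0, 121
var_after, n_after = 3.9, 61

f_value = f_stat(var_before, var_after)
df_numerator = n_before - 1    # the larger variance came from the 121-week sample
df_denominator = n_after - 1

critical_value = 1.47  # taken as given from the solution, not computed here
reject_null = f_value > critical_value

print(f"F = {f_value:.2f}, df = ({df_numerator}, {df_denominator}), reject H0: {reject_null}")
```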
• Parametric tests (all tests done so far) have two important characteristics: they are concerned
with parameters, and their validity depends on a definite set of assumptions.
• The nonparametric test will frequently involve the conversion of observations (or a function of
observations) into ranks according to magnitude, and sometimes it will involve working with
only “greater than” or “less than” relationships (using the + and − signs to denote those
relationships).
• One must refer to specialized statistical tables to determine the rejection points of the test
statistic.
• If the assumptions of the parametric test are met, the parametric test (where available) is
generally preferred over the nonparametric test because the parametric test may have more
power, that is, a greater ability to reject a false null hypothesis.
Tests Parametric Tests Non-Parametric Tests
• The parametric pairwise correlation coefficient is often referred to as Pearson correlation, the
bivariate correlation, or simply the correlation.
• Correlation is calculated as CovarianceXY / (SDX * SDY)
• Positive Covariance → Positive Correlation → Positive Slope Coefficient (beta)
• If the two variables are normally distributed, we can test to determine whether the null
hypothesis (H0: ρ = 0) should be rejected using the sample correlation, r. The formula for the
t-test is as follows:
t = r * √(n - 2) / √(1 - r²), with n - 2 degrees of freedom
• As sample sizes increase (as ever-larger datasets are examined), the null hypothesis is
almost always rejected (the power of the test increases), and other tools of data analysis must be
applied.
Example 1: Correlation between two variables is 0.5, perform a two tailed hypothesis test
assuming sample size of a) 12
Solution: a) n = 12
Example 2: Correlation between two variables is 0.5, perform a two tailed hypothesis test
assuming sample size of b) 32
Solution: b) n = 32
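Both examples use the same correlation of 0.5 and differ only in sample size, so they can be worked side by side. The sketch below assumes the standard test statistic t = r * √(n - 2) / √(1 - r²) with n - 2 degrees of freedom. The critical values (2.228 for 10 df and 2.042 for 30 df) are assumed two-tailed 5% table lookups, and the function name is illustrative.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: population correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

r = 0.5
# Assumed two-tailed 5% critical values from the t table, keyed by sample size.
critical_values = {12: 2.228, 32: 2.042}

results = {}
for n, critical in critical_values.items():
    t_stat = correlation_t_stat(r, n)
    results[n] = (t_stat, abs(t_stat) > critical)
    print(f"n = {n}: t = {t_stat:.4f}, df = {n - 2}, reject H0: {results[n][1]}")
```

With the same correlation, the smaller sample fails to reject the null while the larger sample rejects it, which is the point of comparing the two sample sizes.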
• When we believe that the population departs from normality, we can use a test based on the
Spearman rank correlation coefficient.
• It is calculated based on ranks of the variables
• STEPS to calculate Spearman rank correlation:
1. Rank the observations on X from largest to smallest. Assign the number 1 to the observation
with the largest value, the number 2 to the observation with second largest value, and so on.
In case of ties, assign to each tied observation the average of the ranks that they jointly
occupy. For example, if the third and fourth largest values are tied, we assign both observations
the rank of 3.5 (the average of 3 and 4).
2. Perform the same procedure for the observations on Y.
3. Calculate the difference, di, between the ranks for each pair of observations, then calculate
di² (square the differences in ranks).
4. Calculate Spearman rank correlation as follows:
rs = 1 - (6 * ∑di²) / (n * (n² - 1))
Solution: Spearman rank correlation can be calculated as follows (sum of squared rank differences = 30, n = 7):
rs = 1 - (6 * 30) / (7 * (7² - 1)) = 0.4643
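The ranking steps and the final formula can be sketched as below. The helper names are made up for illustration. The last lines plug in only the two figures visible in the solution (a sum of squared rank differences of 30 and n = 7), since the underlying observations are not reproduced here.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks

def spearman_from_sum_d2(sum_d2, n):
    """Spearman rank correlation from the sum of squared rank differences."""
    return 1 - (6 * sum_d2) / (n * (n**2 - 1))

def spearman(x, y):
    """Spearman rank correlation computed from raw observations."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return spearman_from_sum_d2(sum_d2, len(x))

rs = spearman_from_sum_d2(30, 7)
print(f"rs = {rs:.4f}")
```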
The hypothesis test of Spearman rank correlation depends on sample size.
• When faced with categorical or discrete data, we cannot use the methods that we have
discussed up to this point to test whether the classifications of such data are independent.
• When the data are classified into discrete categories, we cannot use correlation to assess the
relationship between two variables.
• For example, if we have 50 funds categorized based on size (Small, Mid and Large Caps) and
style (Value, Growth), the contingency table will look as follows:
          Small   Mid   Large   TOTAL
Value       5      5     20      30
Growth     10      6      4      20
TOTAL      15     11     24      50
• If we want to test whether a relationship exists between the size and investment type, we can
perform a test of independence using a nonparametric test statistic that is chi-square
distributed:
χ² = ∑ (Oij - Eij)² / Eij, summed over all m cells, where
• m = the number of cells in the table, which is the number of groups in the first class
multiplied by the number of groups in the second class;
• Oij = the number of observations in each cell of row i and column j (i.e., observed
frequency); and
• Eij = the expected number of observations in each cell of row i and column j, assuming
independence (i.e., expected frequency).
• Degrees of freedom = (r − 1)(c − 1), where r is the number of rows and c is the number of
columns.
Example 1: Perform the test of independence using contingency table created in FF 4 for 50
funds.
STEP 1: Calculate m as the number of rows multiplied by the number of columns. As we have 2 rows and
3 columns, m = 2*3 = 6
STEP 2: Calculate the expected value for each cell as (Total of Row * Total of Column) / Total of Table
          Small   Mid   Large   TOTAL
Value       9     6.6   14.4     30
Growth      6     4.4    9.6     20
TOTAL      15     11     24      50
STEP 4: For each cell, calculate (Actual Frequency - Expected Frequency)² / Expected Frequency
STEP 6: Test Statistic (chi-squared) is calculated as the total of these scaled squared deviations,
which will be 1.78 + 0.39 + 2.18 + 2.67 + 0.58 + 3.27 = 10.87 (10.86 using unrounded terms)
STEP 7: Degrees of Freedom are calculated as (no. of columns - 1) * (no. of rows - 1)
= (3-1)*(2-1) = 2
Define Hypothesis
H0: Fund size and investment type are not related, so these classifications are independent.
Ha: Fund size and investment type are related, so these classifications are not independent.
Calculate Test Statistic
Chi-squared = 10.86 (STEP 6)
Compare with Critical Values
Critical value (5% level of significance, 2 degrees of freedom) = 5.99
Since the test statistic (10.86) falls to the right of the critical value of 5.99,
we REJECT the null hypothesis.
This means that fund size and investment type are not independent.
Notice the difference in how we read the table for the two chi-squared tests we have learnt so far.
When we tested for variance in one of the earlier learning modules, we did a left-tailed test.
Therefore, for a 5% level of significance, we used the 95% column.
However, this test is designed to be a right-tailed test. Therefore, for a 5% level of significance,
we use the 5% column.
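The steps of the test of independence can be sketched end to end from the observed counts in the contingency table (Value: 5, 5, 20; Growth: 10, 6, 4). The function name is illustrative, and the critical value of 5.99 is taken from the table lookup rather than computed. Working from the observed counts directly, the Growth cell with 6 observed funds contributes (6 - 4.4)² / 4.4 ≈ 0.58, and the statistic comes out at roughly 10.86.

```python
def chi_square_independence(table):
    """Return (expected frequencies, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return expected, chi_sq, df

observed = [
    [5, 5, 20],   # Value funds, columns in the order shown in the table
    [10, 6, 4],   # Growth funds
]
expected, chi_sq, df = chi_square_independence(observed)

critical_value = 5.99  # table lookup: 5% right tail, 2 df
reject_null = chi_sq > critical_value

print(f"expected = {expected}")
print(f"chi-square = {chi_sq:.2f}, df = {df}, reject H0: {reject_null}")
```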
• We can visualize the contingency table in a graphic referred to as a mosaic.
• In a mosaic, a grid reflects the comparison between the observed and expected frequencies.
Basics of Regression:
• Why: Regression helps predict and understand relationships between variables.
• How: It makes a math model showing the connection between them.
• Use: Used in many fields for predictions and decision-making
• Dependent Variable: Variable you are seeking to explain (y Variable)
• Independent Variable: Variable you are using to explain changes in the dependent variable
(x variable)
• Simple Linear Regression looks like: y = a + b*x1 + ε
where y is the dependent variable, a is the intercept, b is the slope coefficient, x1 is the
independent variable and ε is the error term.
• The variation of Y is often referred to as the sum of squares total (SST), or the total sum of
squares.
• The goal is to fit a line to the observations on Y and X to minimize the squared deviations
from the line; this is the least squares criterion, hence, the name least squares regression.
• Because of its common use, linear regression is often referred to as ordinary least squares
(OLS) regression.
• Slope Coefficient (b) = Covariancex,y / Variancex
• Alternate Formula: Slope Coefficient (b) = Correlationx,y * (SDy / SDx)
• The intercept is the value of the dependent variable if the value of the independent variable
is zero
• The slope is the change in the dependent variable for a one-unit change in the independent
variable
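The two slope formulas and the intercept can be checked against each other numerically. The sketch below uses a small, purely hypothetical dataset (the x and y values are invented for illustration) and plain Python rather than a calculator. The intercept is obtained from the fact that the fitted line passes through the point of means.

```python
import math

# Hypothetical observations, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sample covariance and variances (n - 1 in the denominator).
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)

slope = cov_xy / var_x                  # b = Covariance(x, y) / Variance(x)
intercept = mean_y - slope * mean_x     # the line passes through (mean x, mean y)

# Alternate formula: b = Correlation * (SD of y / SD of x)
sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
correlation = cov_xy / (sd_x * sd_y)
slope_alt = correlation * (sd_y / sd_x)

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, alternate slope = {slope_alt:.4f}")
```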
• A cross-sectional regression involves many observations of X and Y for the same time period
• Time-series data use many observations from different time periods for the same company,
asset class etc
1. Linearity: The relationship between the dependent variable, Y, and the independent variable,
X, is linear.
If the relationship between the independent and dependent variables is nonlinear in the
parameters, estimating that relation with a simple linear regression model will produce invalid
results: the model will be biased, because it will both under- and overestimate the dependent
variable.
• Another implication of this assumption is that the independent variable, X, must not be random;
that is, it is non-stochastic. (However, residuals should be random.)
• If the independent variable is random, there would be no linear relation between the dependent
and independent variables.
CFA Curriculum, Volume 1, pg no. 274
2. Homoskedasticity: The variance of the regression residuals is the same for all observations.
If the residuals are not homoskedastic, that is, if the variance of residuals differs across
observations, then we refer to this as heteroskedasticity.
3. Independence: The observations, pairs of Ys and Xs, are independent of one another (read
observation as “error terms”).
This implies the regression residuals are uncorrelated across observations.
4. Normality: The regression residuals are normally distributed. For large sample sizes, we may
be able to drop the assumption of normality by appealing to the central limit theorem.
Source of Variation: Regression (Explained)
  Sum of Squares: ∑(Predicted Y - Mean Y)², called the sum of squares regression (SSR)
  Degrees of Freedom: number of independent variables (k), which is 1 for simple linear regression
  Mean Square: SSR / k, called the mean sum of regression (MSR)
  F Value: MSR / MSE (the F value is used for hypothesis testing)
Source of Variation: Error
  Sum of Squares: ∑(Actual Y - Predicted Y)², called the sum of squares error (SSE)
  Degrees of Freedom: n - k - 1
  Mean Square: SSE / (n - k - 1), called the mean sum of error (MSE)
• Standard Error of Estimate (se) is also known as the standard error of the regression
or the root mean square error.
• The se is a measure of the distance between the observed values of the dependent variable and those
predicted from the estimated regression; the smaller the se, the better the fit of the model.
• The se, along with the coefficient of determination and the F-statistic, is a measure of the goodness
of the fit of the estimated regression line.
• Unlike the coefficient of determination and the F-statistic, which are relative measures of fit,
the standard error of the estimate is an absolute measure of the distance of the observed dependent
variable from the regression line
• Se = √ MSE
• The coefficient of determination (R²), also referred to as the R-squared, is the percentage of
the variation of the dependent variable that is explained by the independent variable:
• R² = Explained Variation / Total Variation
• For Simple Linear Regression, it can also be calculated as the square of the correlation coefficient.
• To see if our regression model is statistically meaningful, we will need to construct an
F-distributed test statistic.
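The ANOVA quantities above fit together as shown in the sketch below. The dataset is hypothetical (invented for illustration), the regression is simple linear so k = 1, and the variable names are not taken from the notes.

```python
import math

# Hypothetical observations, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n, k = len(x), 1   # k = number of independent variables

mean_x, mean_y = sum(x) / n, sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

ssr = sum((p - mean_y) ** 2 for p in predicted)          # explained variation
sse = sum((yi - p) ** 2 for yi, p in zip(y, predicted))  # unexplained variation
sst = sum((yi - mean_y) ** 2 for yi in y)                # total variation = SSR + SSE

msr = ssr / k
mse = sse / (n - k - 1)
f_value = msr / mse
r_squared = ssr / sst
see = math.sqrt(mse)   # standard error of estimate

print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_value:.2f}")
print(f"R-squared = {r_squared:.4f}, SEE = {see:.4f}")
```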
Example 1: An analyst regressed returns on a stock with returns on the index with resulting
regression equation Returns on stock = 5% + 1.2*Returns on Index + ε.
The standard error of the slope coefficient is 0.2520. Sample Size is 25. Perform a test of
significance at 95% Level of Confidence.
Solution:
H0: Slope (b1) = 0 (NULL hypothesis)
t stat = (Slope Coeff. - Hypothesized Value) / S.E. = (1.2 - 0) / 0.2520 = 4.76
Degrees of Freedom = 25 - 2 = 23
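A short sketch of the test for this example, using the slope of 1.2, the standard error of 0.2520 and the sample size of 25 given above. The function name is illustrative, and the critical value of 2.069 is an assumed table lookup (two-tailed 5%, 23 degrees of freedom).

```python
def slope_t_stat(slope, hypothesized_value, std_error):
    """t statistic for a hypothesis test on a regression slope."""
    return (slope - hypothesized_value) / std_error

t_stat = slope_t_stat(1.2, 0.0, 0.2520)
df = 25 - 2

critical_value = 2.069  # assumed table lookup: two-tailed 5%, 23 df
reject_null = abs(t_stat) > critical_value

print(f"t stat = {t_stat:.4f}, df = {df}, reject H0: {reject_null}")
```

Under these assumptions the test statistic is well outside the critical range, so the slope coefficient is statistically different from zero.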
• Hypothesis Test for the intercept is done in the same way. We just replace standard error of
slope coefficient with standard error of the intercept.
• An interesting feature of simple linear regression is that if the slope coefficient between two
variables, x and y, is statistically significant, we can also conclude that their correlation coefficient
is statistically significant. Test Statistic of Slope Coefficient = Test Stat. of Correlation
Coefficient.
• Another interesting feature of simple linear regression is that the test statistic used to test the
fit of the model (i.e., the F-distributed test statistic) is related to the calculated t-statistic used
to test whether the slope coefficient is equal to zero: t² = F.
• An indicator variable, or dummy variable, takes on only the values 0 or 1 and can be used as the
independent variable.
• We perform hypothesis testing in the same manner as if the independent variable were a
continuous variable.
FinTree Fruit 9 : Prediction Using Simple Linear Regression and Prediction Intervals
Example 1: Let’s assume we have a regression equation as y = 10 + 2*x + ε, assume that value of
x= 3, calculate value of y.
Y = 10 + 2*3 = 16
Example 2: Let’s assume, that sample size is 22, and standard error of forecast is 3. Build a 95%
prediction interval.
Degrees of Freedom = 22-2 = 20
Critical Value from the t-distribution will be 2.086
Margin of Error will be calculated as (tc * Standard Error of forecast) = 2.086*3 = 6.258
Subtracting and adding margin of error from forecasted value: 16 ± 6.258 = 9.742 to 22.258
Interpretation: There is a 95% probability that the true value of the Y variable (given x = 3) will be
between 9.742 and 22.258.
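Examples 1 and 2 can be combined into one small calculation. The sketch below uses the equation y = 10 + 2*x, the standard error of forecast of 3 and the critical t value of 2.086 given above. The function name is illustrative, and the critical value is passed in as a table lookup rather than computed.

```python
def prediction_interval(intercept, slope, x_value, t_critical, std_error_forecast):
    """Return (forecast, lower bound, upper bound) for a simple linear regression."""
    forecast = intercept + slope * x_value
    margin_of_error = t_critical * std_error_forecast
    return forecast, forecast - margin_of_error, forecast + margin_of_error

forecast, lower, upper = prediction_interval(
    intercept=10, slope=2, x_value=3,
    t_critical=2.086,          # table lookup: two-tailed 5%, 22 - 2 = 20 df
    std_error_forecast=3,
)

print(f"forecast = {forecast}, interval = {lower:.3f} to {upper:.3f}")
```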
Example 1: Based on the data below
a) Build equation for simple Linear Regression
b) Generate ANOVA Table
c) Calculate R-Squared and Standard Error of Regression (Standard Error of Estimate SEE)
d) Perform two tailed hypothesis test on slope coefficient @ 95% CL
e) Perform one tailed hypothesis test on slope coefficient with hypothesized value of 1 @95%
CL
f) Perform F test @ 95% CL
g) Predict value of y , assuming forecasted value of x is 9.
h) Calculate the standard error of the forecast
i) Build a 95% prediction interval around the predicted value of y.
Solution a) Insert data in data function of the calculator (2nd 6) and go to stat function (2nd 7),
make sure the calculator is on the LIN mode (if not, press 2nd set) and look for a and b value.
a is intercept and b is slope coefficient. The regression equation should be:
Y = 2.18 + 3.05 * x + ε
b) ANOVA Table
c) R Squared = Regression Sum of Squares/ Total Sum of Squares
= 159.63 / 162.8 = 98.06%
Also, note that the correlation coefficient between the X and Y variables is 0.9902; the square of the
correlation, 0.9902², also produces an R² value of 98.06%.
Standard Error of Regression (Standard Error of Estimate, SEE) is calculated as the square root of the
Mean Sum of Error (MSE), which is 1.054 as per the ANOVA Table.
Therefore, SEE = √1.054 = 1.026
d) H0: Slope (b1) = 0 (NULL hypothesis)
t stat = (Slope Coeff. - Hypothesized Value) / S.E.
e) H0: Slope (b1) ≤ 1 (NULL hypothesis)
Ha: Slope (b1) > 1
(Since the sign in the alternative hypothesis points towards the right (→), it indicates a right-tailed test.
Rejection will be on the right-hand side of the critical value.)
t stat = (Slope Coeff. - Hypothesized Value) / S.E. = 8.26
Degrees of Freedom = 5 - 2 = 3
Critical value (one-tailed, 5%, 3 degrees of freedom) = 2.353
Since the test statistic (8.26) falls to the right of 2.353,
we REJECT the null hypothesis.
This means the slope coefficient is greater than 1 at 5% significance.
f) H0: Slope (b1) = 0 (NULL hypothesis)
F stat = MSR / MSE = 151.42
Degrees of Freedom Numerator = 1
Degrees of Freedom Denominator = 3
Critical value = 10.1
Since we need a one-tailed 5% test, we look at the 5% F table with Degrees of
Freedom of Numerator (column headings df1) of 1 and Degrees of Freedom
of Denominator (row headings df2) of 3.
Since the F stat (151.42) is greater than the critical value of 10.1, we reject the null hypothesis.
g) Predicted Value of Y, assuming forecasted value of x is 9.
17.20
i) 95% Prediction interval
Forecasted value of y = 29.60 (subpart g)
Standard Error of forecast is 1.5658
t critical value for 5% significance, with 5 - 2 = 3 Degrees of Freedom, is 3.182 (subpart d)
Margin of Error = 3.182 * 1.5658 = 4.98
Subtracting and adding the margin of error from the forecasted value: 29.60 ± 4.98 = 24.61 to 34.58
Interpretation: There is a 95% probability that the true value of the Y variable (given x = 9) will be
between 24.61 and 34.58.
• Financial and economic data can exhibit complex relationships that may not align perfectly
with a linear model.
• To address this, we often explore modifications to the variables.
• These adjustments aim to enhance the model's ability to capture and represent the
underlying patterns in the data accurately.
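• Three often-used functional forms, each of which involves log transformation, are as follows:
the log-lin model (ln Y regressed on X), the lin-log model (Y regressed on ln X) and the log-log
model (ln Y regressed on ln X).
• The log-log model is useful in calculating elasticities.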
FinTree Fruit 2 : Big Data
FinTree Fruit 6 : Text Analytics, NLP & Algorithmic Trading