
BA_CH03

The document discusses investment strategies, focusing on growth versus value investing, and outlines the importance of calculating and interpreting returns and risks associated with mutual funds. It also covers statistical concepts such as measures of central location (mean, median, mode), measures of dispersion, and the Sharpe ratio to evaluate investment performance. Additionally, it introduces concepts of skewness and kurtosis to assess the distribution of investment returns.

Uploaded by

andrewhermanto99
Copyright
© © All Rights Reserved

3 Summary Measures

Introductory Case: Investment Decision

¨ Dorothy Brennan works as a financial advisor at a large investment
firm.
¨ She meets with an inexperienced investor who has some questions
regarding two approaches to mutual fund investing: growth investing
versus value investing.
¨ The investor has heard that growth funds invest in companies whose
stock prices are expected to grow at a faster rate, relative to the overall
stock market.
¨ On the other hand, value funds invest in companies whose stock prices
are below their true worth.
¨ The investor has also heard that the main component of investment
return is through capital appreciation in growth funds and through
dividend income in value funds.

Introductory Case: Investment Decision

¨ The investor shows Dorothy the annual return data for Fidelity’s
Growth Index Fund (Growth) and Fidelity’s Value Index Fund (Value).

¨ It is difficult for the investor to draw any conclusions from the data in
their present form.
¨ Dorothy will use the sample information for the following tasks.
1. Calculate and interpret the typical return for these two mutual funds.
2. Calculate and interpret the investment risk for these two mutual funds.
3. Determine which mutual fund provides the greater return relative to risk.

Farmer’s two sons

¨ An old farmer gives 50 newborn piglets to his two sons as a test.
¨ Who is better at raising pigs?
¨ The heavier the better, and the less unequal the better.

¨ The first son takes 30 pigs, and the second takes the remaining 20.
¨ One year later, they measure their pigs’ weights.
¨ How should we compare their performance?


Brainstorming
(Assume all the baby pigs were equal in weight)

Comparisons of aggregate value and equality

¨ The heavier the better
¨ The less unequal the better



Measures of Location

¨ The term central location refers to how numerical data
tend to cluster around some middle or central value.
¨ Measures of central location attempt to find a typical or
central value that describes a variable.
¨ We will examine the three most widely used measures of
central location: the mean, the median, and the mode.
¨ Then we discuss the percentile, a measure of relative
position.


Arithmetic Mean

¨ The arithmetic mean is the primary measure of central location.
¤ Referred to as the mean or the average
¤ Simply add up all the observations and divide by the number of
observations.
¨ The only thing that differs between a population mean and a
sample mean is the notation.
¨ The population mean is denoted as $\mu$.
¤ $N$ observations in the population: $x_1, x_2, \ldots, x_N$
¤ $\mu = \frac{\sum x_i}{N}$
¤ $\mu$ is a parameter
¨ The sample mean is denoted as $\bar{x}$.
¤ $n$ observations in the sample: $x_1, x_2, \ldots, x_n$
¤ $\bar{x} = \frac{\sum x_i}{n}$
¤ $\bar{x}$ is a statistic

Median

¨ The mean can give a misleading description of the center in
the presence of extremely small or large observations, or
outliers.
¨ The median is another measure of central location; it is not
affected by outliers.
¨ It is the middle value of a data set: an equal number
of observations lie above and below the median.
¤ Arrange the data in ascending order
¤ The middle value if the number of observations is odd
¤ The average of the two middle values if the number of observations
is even
¨ If the mean and the median differ substantially, it is likely that
the variable contains outliers.

Median – US household income

¨ In 2022:
¤ Average US income: $102,310
¤ Median US income: $70,181

¨ Percentile
¤ $70,181: 50th percentile
¤ $102,310: 66th percentile

¨ Example: five employees earn $4K, $4K, $5K, $5K, and $50K
¤ Average: $13.6K
¤ Median: $5K
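
The five-employee example can be checked with a short sketch. Python's standard `statistics` module is used here purely for illustration (the slides themselves rely on Excel and R); the variable names are not from the slides.

```python
import statistics

# Salaries of the five employees in the example, in thousands of dollars
salaries = [4, 4, 5, 5, 50]

mean_salary = statistics.mean(salaries)      # pulled upward by the single $50K salary
median_salary = statistics.median(salaries)  # middle value of the sorted data
modes = statistics.multimode(salaries)       # every value tied for most frequent

print(mean_salary, median_salary, modes)
```

Only one of the five employees earns more than the mean, which is why the median is the more representative "typical" salary here.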



Mode

¨ The mode of a variable is the observation that occurs most
frequently.
¨ A variable can have one mode, more than one mode, or no mode.
¤ One mode: unimodal
¤ Two modes: bimodal
¤ Two or more modes: multimodal
¨ The mode is less useful when there are more than three
modes.
¨ The mode is a useful summary for a categorical variable.


Measures of Location
¨ Example: The mean and median for the Growth and Value
variables from the introductory case.



Descriptive Statistics with Excel


¨ Example continued with Excel


Descriptive Statistics with Excel
¨ Example continued with Excel



Descriptive Statistics with R

¨ Example continued with R


Median

¨ The median is the middle observation: half of the
observations fall below it and half fall above it.
¨ The median is also called the 50th percentile.
¨ A percentile is technically a measure of location; however, it
is also used as a measure of relative position.
¨ The pth percentile divides a variable into two parts.
¤ Approximately p percent of the observations are less than the pth
percentile.
¤ Approximately (100−p) percent of the observations are greater than
the pth percentile.
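
As a rough illustration of percentiles, the sketch below computes the three quartiles (the 25th, 50th, and 75th percentiles) in Python. The data are hypothetical, and `method="inclusive"` is chosen on the assumption that it interpolates the same way as Excel's PERCENTILE.INC; other software may use a different convention and return slightly different values.

```python
import statistics

# Hypothetical returns (in percent), invented only to illustrate the calculation
returns = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 requests the three cut points that split the data into quarters
q1, q2, q3 = statistics.quantiles(returns, n=4, method="inclusive")

print(q1, q2, q3)  # q2 is the median, i.e. the 50th percentile
```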



Quartiles

¨ Example: The quartiles of the Growth and Value variables.


Subset

¨ As discussed in Chapter 2, it is sometimes useful to
subset the observations in a sample or a
population.
¨ This process often reveals important information
that would not be uncovered if the variable were
analyzed for the entire data set.
¨ With Excel: use the AVERAGEIF function
¨ With R: use the function tapply
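
For readers working outside Excel or R, the same group-wise average can be sketched in plain Python. The records below are hypothetical and exist only to show the mechanics of subsetting before averaging.

```python
from collections import defaultdict

# Hypothetical (sex, spending) records, invented for illustration
records = [("Female", 120.0), ("Male", 80.0), ("Female", 100.0), ("Male", 60.0)]

# Collect the spending values belonging to each subset
groups = defaultdict(list)
for sex, spending in records:
    groups[sex].append(spending)

# Mean spending within each subset
group_means = {sex: sum(values) / len(values) for sex, values in groups.items()}
print(group_means)
```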



Subset - Example

¨ Example: Compute the average spending for each of the
product categories by female and male customers.


Subset - Example

¨ Example continued

¨ With Excel:

¨ With R:



Measures of Dispersion, Shape, and Association

¨ Measures of central location reflect the typical or central
value.
¨ Measures of dispersion gauge the underlying variability of
the variable.
¨ Measures of shape reveal whether the distribution of the
variable is symmetric or if the tails are more or less extreme
than the normal distribution.
¨ Measures of association show whether two numeric
variables have a linear relationship.


Dispersion

¨ Measures of dispersion are numerical values.
¤ 0 indicates all the observations are identical
¤ Increases as the observations become more diverse
¨ The range is the simplest measure.
¤ Difference between the maximum and minimum
¤ A weak measure because it depends solely on the two extreme observations
¨ The interquartile range (IQR) is the difference between the
third quartile and the first quartile.
¤ $IQR = Q_3 - Q_1$
¤ The range of the middle 50% of the variable
¤ Does not depend on the extreme observations
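
The contrast between the range and the IQR can be seen in a small sketch. The data are hypothetical, and the quartiles use Python's "inclusive" interpolation, which is an assumption about the convention rather than something the slides specify.

```python
import statistics

# Hypothetical observations with one extreme value (40)
data = [2, 4, 4, 5, 6, 7, 8, 9, 40]

data_range = max(data) - min(data)  # driven entirely by the two extremes

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                       # spread of the middle 50% of the data

print(data_range, iqr)
```

Replacing 40 with 400 would inflate the range tenfold but leave the IQR unchanged, since neither quartile moves.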


Mean Absolute Deviation (MAD)

¨ A good measure of dispersion should consider the
differences of all observations from the mean (or the
median).
¨ Averaging all differences from the mean yields a value of
zero (positives and negatives cancel out).
¨ The mean absolute deviation (MAD) is the average of
the absolute differences between the observations and
the mean.
¤ Population: $MAD = \frac{\sum |x_i - \mu|}{N}$
¤ Sample: $MAD = \frac{\sum |x_i - \bar{x}|}{n}$
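
A brief sketch of why absolute values are needed, using hypothetical data: the raw differences from the mean cancel out, while their absolute values do not.

```python
# Hypothetical observations, invented for illustration
data = [2, 4, 6, 8, 10]

mean = sum(data) / len(data)
differences = [x - mean for x in data]

# Positive and negative differences cancel, so this average is zero
mean_difference = sum(differences) / len(data)

# Averaging the absolute differences gives the MAD
mad = sum(abs(d) for d in differences) / len(data)

print(mean_difference, mad)
```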

Variance and Standard Deviation

¨ The variance and the standard deviation are the two
most widely used measures of dispersion.
¤ Compute the average of the squared differences
¤ The squaring of the differences emphasizes larger differences
¨ The population variance is denoted $\sigma^2$: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$.
¨ The sample variance is denoted $s^2$: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$.
¨ The units of each are the units of the underlying
variable squared.
¨ The standard deviation of each is the positive square
root.
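
The difference between the two denominators can be illustrated with Python's `statistics` module, whose `pvariance` divides by the number of observations and whose `variance` divides by one less. The data are hypothetical.

```python
import statistics

# Hypothetical observations; the squared differences from the mean sum to 40
data = [2, 4, 6, 8, 10]

population_variance = statistics.pvariance(data)  # 40 / 5
sample_variance = statistics.variance(data)       # 40 / (5 - 1)
sample_sd = statistics.stdev(data)                # positive square root of the sample variance

print(population_variance, sample_variance, sample_sd)
```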


Measures of Dispersion

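¨ With Excel: MIN, MAX, PERCENTILE.INC, AVEDEV,
VAR.S, STDEV.S, VAR.P, STDEV.P
¨ With R: min, max, quantile, var, sd; for the MAD use
mean(abs(x - mean(x))), because R's built-in mad function returns
the median absolute deviation instead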
¨ Example: Dispersion statistics for the Growth variable.
¤ Range: 120.38
¤ IQR: 34.1125
¤ MAD: 17.491
¤ Variance: 566.406
¤ Standard deviation: 23.799


Coefficient of Variation (CV)

¨ In some instances, analysis entails comparing the
variability of two or more variables that have different
means or units of measurement.
¨ The coefficient of variation (CV) is a relative measure of
dispersion and adjusts for differences in the magnitudes of
the means.
¨ The coefficient of variation (CV) for a variable is calculated
by dividing its standard deviation by its mean.
¤ Sample: $CV = \frac{s}{\bar{x}}$
¤ Population: $CV = \frac{\sigma}{\mu}$



Coefficient of Variation

¨ Example: Calculate and interpret the coefficient of variation
(CV) for the Growth and Value variables.
¨ Growth: $CV = \frac{s}{\bar{x}} = \frac{23.799}{15.755} = 1.511$
¨ Value: $CV = \frac{s}{\bar{x}} = \frac{17.979}{12.005} = 1.498$
¨ As measured by the CV, the Growth variable has slightly more
relative dispersion than the Value variable.
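
The two coefficients of variation can be reproduced from the summary statistics in this example: means of 15.755 and 12.005 and standard deviations of 23.799 and 17.979 for Growth and Value, as read from the slide. The dictionary layout below is only an illustrative choice.

```python
# Sample means and standard deviations of the annual returns (in percent)
funds = {
    "Growth": {"mean": 15.755, "sd": 23.799},
    "Value": {"mean": 12.005, "sd": 17.979},
}

# CV = standard deviation divided by the mean
cv = {name: stats["sd"] / stats["mean"] for name, stats in funds.items()}

for name, value in cv.items():
    print(f"{name}: {value:.3f}")
```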


Sharpe Ratio

¨ In general, investments with higher returns also carry
higher risk.
¨ The average return represents an investor’s reward,
whereas variance, or equivalently standard deviation,
corresponds to risk.
¨ The Sharpe ratio is the “reward-to-variability” ratio.
¤ Calculated as $\frac{\bar{x}_I - \bar{R}_f}{s_I}$, where $\bar{x}_I$ and $s_I$ are the
mean return and the standard deviation of the investment
¤ $\bar{R}_f$ is the mean return for a risk-free asset such as a
Treasury bill (T-bill)
¤ The numerator measures the extra reward for the added
risk; this difference is the excess return
¨ The higher the Sharpe ratio, the better the investment
compensates its investors for risk.


Sharpe Ratio

¨ Example: Compute the Sharpe ratios for the Growth and
Value funds, assuming $\bar{R}_f = 2\%$.
¨ Because the standard deviation of Growth is greater than
the standard deviation of Value (23.799 > 17.979), Growth is
considered riskier than Value.
¨ Growth Sharpe ratio: $\frac{15.755 - 2}{23.799} = 0.58$
¨ Value Sharpe ratio: $\frac{12.005 - 2}{17.979} = 0.56$
¨ Growth provides a higher Sharpe ratio than Value (0.58 >
0.56); therefore, Growth offered more reward per unit of
risk.
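
The Sharpe ratios in this example follow directly from the reported means (15.755 and 12.005), standard deviations (23.799 and 17.979), and the assumed 2% risk-free return. The helper name `sharpe_ratio` below is an illustrative choice, not something defined in the slides.

```python
def sharpe_ratio(mean_return, risk_free_return, sd):
    """Excess return per unit of risk (all inputs in percent)."""
    return (mean_return - risk_free_return) / sd

growth_sharpe = sharpe_ratio(15.755, 2.0, 23.799)
value_sharpe = sharpe_ratio(12.005, 2.0, 17.979)

print(f"Growth: {growth_sharpe:.2f}, Value: {value_sharpe:.2f}")
```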

Skewness

¨ A symmetric distribution is one that is a mirror image of
itself on both sides of its center.
¨ The skewness coefficient measures the degree to which a
distribution is not symmetric about its mean.
¤ Calculated as $\frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$
¤ Symmetric: coefficient of 0 (for example, the normal distribution)
¤ Positively skewed: positive coefficient
¤ Negatively skewed: negative coefficient
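
A minimal sketch of the skewness coefficient, assuming the adjusted sample formula $\frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$ with $s$ the sample standard deviation. The function name and both data sets are hypothetical.

```python
import statistics

def skewness(data):
    """Adjusted sample skewness coefficient; needs at least 3 observations."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

symmetric = [1, 2, 3, 4, 5]      # mirror image around 3
right_tailed = [1, 2, 3, 4, 20]  # one large value stretches the right tail

print(skewness(symmetric), skewness(right_tailed))
```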



Kurtosis Coefficient and tails

¨ The kurtosis coefficient is a summary measure that tells us whether
the tails of the distribution are more or less extreme than the normal
distribution.
¨ A distribution that has tails that are more extreme than the normal
distribution is leptokurtic (lepto from the Greek word for slender).
¨ A return distribution is often leptokurtic, which means that its tails are
longer than the normal distribution—implying the existence of outliers.
¨ If a return distribution is in fact leptokurtic, but we assume that it is
normally distributed in statistical models, then we will underestimate
the likelihood of very bad or very good returns.
¨ A platykurtic (platy from the Greek word for broad) distribution is one
that has shorter tails, or tails that are less extreme, than the normal
distribution.

Kurtosis Coefficient and tails

¨ The kurtosis coefficient is calculated as $\frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^4$.
¨ The kurtosis coefficient of a normal distribution is 3.
¤ Kurtosis more than three: more extreme tails than a normal
distribution
¤ Kurtosis less than three: less extreme tails than a normal
distribution
¨ The excess kurtosis is the kurtosis coefficient minus $\frac{3(n-1)^2}{(n-2)(n-3)}$.
¤ Positive: more extreme tails than a normal distribution
¤ Negative: less extreme tails than a normal distribution
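
The excess kurtosis can be sketched the same way, assuming the sample formula $\frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$. The function name and the data are hypothetical.

```python
import statistics

def excess_kurtosis(data):
    """Sample excess kurtosis; needs at least 4 observations."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    fourth_powers = sum(((x - mean) / s) ** 4 for x in data)
    kurtosis = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth_powers
    return kurtosis - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

flat = [1, 2, 3, 4, 5]                        # evenly spread, short tails
heavy_tailed = [-10, -1, 0, 0, 0, 0, 1, 10]   # mostly near zero, two extremes

print(excess_kurtosis(flat), excess_kurtosis(heavy_tailed))
```

A negative result points to tails less extreme than the normal distribution, and a positive result to more extreme tails.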



Shape - Example

¨ Example: Interpret the skewness and the kurtosis
coefficients for the Growth and Value variables.
¨ The skewness coefficient and the (excess) kurtosis
coefficient for Growth are −0.029 and 0.974, respectively.
¨ These values imply that the return distribution for Growth
is slightly negatively skewed, and the distribution has
longer tails than the normal distribution.
¨ With a skewness coefficient of −1.024 and an (excess)
kurtosis coefficient of 1.853, the return distribution for
Value is also negatively skewed, and it too has longer tails
than the normal distribution.

Covariance

¨ Measures of association quantify the direction and strength of
the linear relationship between two numeric variables.
¨ It is important to point out that these measures are not
appropriate when the underlying relationship between the
variables is nonlinear.
¨ Covariance measures the direction of the linear relationship.
¤ Population: $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
¤ Sample: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}$
¤ Negative: negative linear relationship
¤ Positive: positive linear relationship
¤ Zero: no linear relationship
¨ Covariance is hard to interpret because it is sensitive to the units
of measurement, so it does not tell us the strength of the
linear relationship.
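
The unit sensitivity of covariance can be seen in a small sketch. The data and the function name are hypothetical; the calculation follows the sample formula with the $n-1$ denominator.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of cross-products of deviations over n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Hypothetical paired observations
x_dollars = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# The same x values expressed in cents instead of dollars
x_cents = [100 * value for value in x_dollars]

cov_dollars = sample_covariance(x_dollars, y)
cov_cents = sample_covariance(x_cents, y)
print(cov_dollars, cov_cents)
```

The relationship between the two variables is identical in both cases, yet the covariance is 100 times larger when x is measured in cents.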


Correlation Coefficient

¨ The correlation coefficient describes both the direction and
strength of the linear relationship between x and y.
¤ Population: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
¤ Sample: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
¤ Negative: negative linear relationship
¤ Positive: positive linear relationship
¤ Zero: no linear relationship
¨ The correlation is unit-free.
¨ The correlation is between −1 and 1.
¤ Correlation is −1: perfect negative linear relationship
¤ Correlation is 0: not linearly related
¤ Correlation is 1: perfect positive linear relationship
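
A short sketch of the sample correlation coefficient, computed as the sample covariance divided by the product of the two sample standard deviations. The data and the function name are hypothetical.

```python
import statistics

def sample_correlation(x, y):
    """Sample correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return covariance / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = sample_correlation(x, y)
r_rescaled = sample_correlation([100 * value for value in x], y)  # change of units
r_perfect = sample_correlation(x, [2 * value + 1 for value in x])  # exact straight line

print(r, r_rescaled, r_perfect)
```

Rescaling x leaves the correlation unchanged, which is what being unit-free means in practice.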

Correlation Coefficient

¨ Example: The correlation between the Growth and Value
variables.
¨ With Excel: CORREL
¨ With R:

¨ The result indicates that the variables have a moderate, positive
linear relationship.


Outliers

¨ Extremely large or small observations for a variable are
referred to as outliers.
¨ Outliers can unduly influence summary statistics, such as
the mean or the standard deviation.
¨ In a small sample, the impact of outliers is particularly
pronounced.
¨ Sometimes, outliers may just be due to random variations,
in which case the relevant observations should remain in
the data set.
¨ Alternatively, outliers may indicate bad data due to
incorrectly recorded observations or incorrectly included
observations in the data set.
¨ In such cases, the relevant observations should be corrected
or simply deleted from the data set.

Outliers

¨ There are no universally agreed upon methods for treating
outliers.
¨ It is important to be able to identify potential outliers so
that one can take corrective actions, if needed.
¨ We first construct a boxplot, which is an effective tool for
identifying outliers.
¨ A series of boxplots is also useful when comparing similar
information for a variable gathered at another place or
time.
¨ Another method for detecting outliers is to calculate
z-scores.


Boxplot and Outlier

¨ A common way to quickly summarize a variable is to use a five-number
summary.
¨ A five-number summary shows the minimum, the quartiles (Q1, Q2, and Q3),
and the maximum.
¨ A boxplot, also referred to as a box-and-whisker plot, is a way to graphically
display a five-number summary.
¤ Draw a box encompassing the first and third quartiles.
¤ Draw a dashed vertical line in the box at the median.
¤ Calculate the IQR. Draw a whisker that extends from Q1 to the minimum value that is
not farther than 1.5*IQR from Q1.
¤ Similarly, draw a whisker that extends from Q3 to the maximum value that is not farther
than 1.5*IQR from Q3.
¤ Use an asterisk (or another symbol) to indicate observations that are farther than
1.5*IQR from the box. These observations are considered outliers.
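
The boxplot construction steps can be sketched numerically. The data are hypothetical, and the quartiles use Python's "inclusive" interpolation as an assumed convention; charting software may compute quartiles slightly differently.

```python
import statistics

# Hypothetical observations with one suspiciously large value
data = [2, 4, 4, 5, 6, 7, 8, 9, 40]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Limits beyond which an observation is flagged as an outlier
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations still inside the limits
whisker_low = min(x for x in data if x >= lower_limit)
whisker_high = max(x for x in data if x <= upper_limit)
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```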


Boxplot and Outlier

¨ A boxplot is also used to informally gauge the shape of the distribution.


¨ Symmetry is implied if the median is in the center of the box and the
left/right whiskers are equidistant from their respective quartiles.
¨ If the median is left of center and the right whisker is longer than the
left whisker, then the distribution is positively skewed.
¨ Similarly, if the median is right of center and the left whisker is longer
than the right whisker, then the distribution is negatively skewed.
¨ If outliers exist, we need to include them when comparing the lengths
of the left and right whiskers.



Boxplot

¨ Example: Construct a boxplot for the Growth and Value variables from
the introductory case.
¨ Excel: use the Box and Whisker function
¨ With R:


Empirical Rule

¨ The empirical rule makes statements regarding the approximate percentage
of observations that fall within a specified number of standard
deviations from the mean.
¨ Assume the observations are drawn from a relatively symmetric and
bell-shaped distribution, as judged perhaps by an inspection of its histogram.
¤ Approximately 68% of all observations fall in the interval $\bar{x} \pm s$.
¤ Approximately 95% of all observations fall in the interval $\bar{x} \pm 2s$.
¤ Almost all (approximately 99.7%) observations fall in the interval $\bar{x} \pm 3s$.
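
The percentages in the empirical rule come from the normal distribution. As a quick check, and assuming an exactly normal variable rather than real sample data, Python's `statistics.NormalDist` gives the share of observations within one, two, and three standard deviations of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Share of a normal variable lying within k standard deviations of its mean
share_within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)
}

for k, share in share_within.items():
    print(f"within {k} sd: {share:.4f}")
```

Real data that are only roughly bell-shaped will match these shares approximately, not exactly.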



Detecting Outliers

¨ It is often instructive to use the mean and the standard
deviation to find the relative location of an observation.
¨ We use the z-score to find the relative position of an
observation by dividing the difference of the observation
from the mean by the standard deviation: $z = \frac{x - \bar{x}}{s}$.
¨ A z-score is a unitless measure.
¨ It measures the distance of an observation from the mean in
terms of standard deviations.
¨ Converting observations into z-scores is also called
standardizing the observations.

Standardization

¨ Standardization is a common technique used in data
analytics when dealing with variables measured using
different scales.
¨ If the distribution of a variable is relatively symmetric and
bell-shaped, we can also use z-scores to detect outliers.
¤ Since almost all observations fall within three standard deviations of
the mean, it is common to treat an observation as an outlier if its
z-score is more than 3 or less than −3.
¤ Such observations must be reviewed to determine if they should
remain in the data set.



Z score

¨ Example: What are the z-scores for the minimum and
maximum values of the Growth and Value variables?
¨ Growth minimum: $z = \frac{-40.90 - 15.755}{23.7993} = -2.38$
¨ Growth maximum: $z = 2.68$
¨ Value minimum: $z = -3.28$
¨ Value maximum: $z = 1.78$
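
The Growth minimum z-score can be reproduced from the figures in this example (a minimum return of −40.90, a mean of 15.755, and a standard deviation of 23.7993, as read from the slide). The sketch also applies the ±3 rule of thumb to the four reported z-scores; the function and variable names are illustrative.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd

growth_min_z = z_score(-40.90, 15.755, 23.7993)
print(f"Growth minimum: {growth_min_z:.2f}")

# z-scores reported in the example
reported = {
    "Growth minimum": -2.38,
    "Growth maximum": 2.68,
    "Value minimum": -3.28,
    "Value maximum": 1.78,
}

# Flag observations more than three standard deviations from the mean
flagged = [name for name, z in reported.items() if abs(z) > 3]
print(flagged)
```

By this rule only the Value minimum would be treated as a potential outlier, and it should be reviewed before deciding whether it stays in the data set.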

