Curriculum Module 2 Questions
PRACTICE PROBLEMS
1. Published ratings on stocks ranging from 1 (strong sell) to 5 (strong buy) are
examples of which measurement scale?
A. Ordinal
B. Continuous
C. Nominal
2. Data values that are categorical and not amenable to being organized in a logical
order are most likely to be characterized as:
A. ordinal data.
B. discrete data.
C. nominal data.
B. Nominal
C. Continuous
B. discrete data.
C. continuous data.
B. discrete data.
C. nominal data.
6. Each individual column of data in the table can be best characterized as:
A. panel data.
B. time-series data.
C. cross-sectional data.
7. Each individual row of data in the table can be best characterized as:
A. panel data.
B. time-series data.
C. cross-sectional data.
B. time-series data.
C. cross-sectional data.
B. represents the actual number of observations counted for each unique value
of the variable.
10. An investment fund has the return frequency distribution shown in the following
exhibit.
Return Interval (%)    Absolute Frequency
−10.0 to −7.0           3
−7.0 to −4.0            7
−4.0 to −1.0           10
−1.0 to +2.0           12
+2.0 to +5.0           23
+5.0 to +8.0            5
11. An analyst is using the data in the following exhibit to prepare a statistical report.
The cumulative relative frequency for the bin −1.71% ≤ x < 2.03% is closest to:
A. 0.250.
B. 0.333.
C. 0.583.
                          Bond Rating
Sector                    A      AA     AAA
Communication Services    25     32     27
Consumer Staples          30     25     25
Energy                    100    85     30
Health Care               200    100    63
Utilities                 22     28     14
B. 85.
C. 215.
© CFA Institute. For candidate use only. Not for distribution.
Learning Module 2 Organizing, Visualizing, and Describing Data
13. The relative frequency of AA rated energy bonds, based on the total count, is
closest to:
A. 10.5%.
B. 31.5%.
C. 39.5%.
14. The following is a frequency polygon of monthly exchange rate changes in the US
dollar/Japanese yen spot exchange rate for a four-year period. A positive change
represents yen appreciation (the yen buys more dollars), and a negative change
represents yen depreciation (the yen buys fewer dollars).
[Exhibit: frequency polygon. Horizontal axis: Return Interval Midpoint (%), with midpoints at −5, −3, −1, 1, and 3.]
15. A bar chart that orders categories by frequency in descending order and includes
a line displaying cumulative relative frequency is referred to as a:
A. Pareto Chart.
C. frequency polygon.
16. Which visualization tool works best to represent unstructured, textual data?
A. Tree-Map
B. Scatter plot
C. Word cloud
18. A line chart with two variables—for example, revenues and earnings per share—
is best suited for visualizing:
A. the joint variation in the variables.
20. Which visualization tool is recommended if the goal is to make comparisons
of three or more variables over time?
A. Heat map
[Exhibit: histogram. Vertical axis: Frequency. Horizontal axis: Return Intervals (%), in 5-percentage-point bins from −37 to 38.]
B. 8% to 13%.
C. 13% to 18%.
22. Based on the previous histogram, the distribution is best described as being:
A. unimodal.
B. bimodal.
C. trimodal.
23. The annual returns for three portfolios are shown in the following exhibit.
Portfolios P and R were created in Year 1, Portfolio Q in Year 2.
B. Portfolio Q is 4.0%.
24. At the beginning of Year X, an investor allocated his retirement savings in the
asset classes shown in the following exhibit and earned a return for Year X as also
shown.
Asset Class    Asset Allocation (%)    Asset Class Return for Year X (%)
B. 5.3%.
C. 6.3%.
25. The following exhibit shows the annual returns for Fund Y.
Fund Y (%)
Year 1 19.5
Year 2 −1.9
Year 3 19.7
Year 4 35.0
Year 5 5.7
B. 15.6%.
C. 19.5%.
26. A portfolio manager invests €5,000 annually in a security for four years at the
prices shown in the following exhibit.
Year 1 62.00
Year 2 76.00
Year 3 84.00
Year 4 90.00
27. When analyzing investment returns, which of the following statements is
correct?
A. The geometric mean will exceed the arithmetic mean for a series with
non-zero variance.
Year Return
1 4.5%
2 6.0%
3 1.5%
4 −2.0%
5 0.0%
6 4.5%
7 3.5%
8 2.5%
9 5.5%
10 4.0%
28. The arithmetic mean return over the 10 years is closest to:
A. 2.97%.
B. 3.00%.
C. 3.33%.
29. The geometric mean return over the 10 years is closest to:
A. 2.94%.
B. 2.97%.
C. 3.00%.
30. The harmonic mean return over the 10 years is closest to:
A. 2.94%.
B. 2.97%.
C. 3.00%.
B. 2.53%.
C. 7.58%.
32. The target semideviation of the returns over the 10 years if the target is 2% is
closest to:
A. 1.42%.
B. 1.50%.
C. 2.01%.
[Exhibit: box and whisker plot on a vertical scale from 40 to 160, with labeled values 154.45, 114.25, 100.49, 79.74, and 51.51.]
B. 100.49.
C. 102.98.
B. 25.74.
C. 34.51.
35. The fourth quintile return for the MSCI World Index is closest to:
A. 20.65%.
B. 26.03%.
C. 27.37%.
36. For Year 6–Year 10, the mean absolute deviation of the MSCI World Index total
returns is closest to:
A. 10.20%.
B. 12.74%.
C. 16.40%.
37. Annual returns and summary statistics for three funds are listed in the following
exhibit:
38. The average return for Portfolio A over the past twelve months is 3%, with a
standard deviation of 4%. The average return for Portfolio B over this same period
is also 3%, but with a standard deviation of 6%. The geometric mean return of
Portfolio A is 2.85%. The geometric mean return of Portfolio B is:
A. less than 2.85%.
B. equal to 2.85%.
39. The mean monthly return and the standard deviation for three industry sectors
are shown in the following exhibit.
B. materials.
C. industrials.
B. 0.42.
C. 2.41.
B. having no skewness.
C. positively skewed.
42. Compared to the normal distribution, this sample’s distribution is best described
as having tails of the distribution with:
A. less probability than the normal distribution.
43. An analyst calculated the excess kurtosis of a stock’s returns as −0.75. From this
information, we conclude that the distribution of returns is:
A. normally distributed.
44. A correlation of 0.34 between two variables, X and Y, is best described as:
A. changes in X causing changes in Y.
B. Spurious correlation
Returns (%)
46. Without calculating the correlation coefficient, the correlation of the portfolio
returns and the bond index returns is:
A. negative.
B. zero.
C. positive.
47. Without calculating the correlation coefficient, the correlation of the portfolio
returns and the real estate index returns is:
A. negative.
B. zero.
C. positive.
48. Consider two variables, A and B. If variable A has a mean of −0.56, variable B
has a mean of 0.23, and the covariance between the two variables is positive, the
correlation between these two variables is:
A. negative.
B. zero.
C. positive.
SOLUTIONS
1. A is correct. Ordinal scales sort data into categories that are ordered with respect
to some characteristic and may involve numbers to identify categories but do not
assure that the differences between scale values are equal. The buy rating scale
indicates that a stock ranked 5 is expected to perform better than a stock ranked
4, but it tells us nothing about the performance difference between stocks ranked
4 and 5 compared with the performance difference between stocks ranked 1 and
2, and so on.
2. C is correct. Nominal data are categorical values that are not amenable to being
organized in a logical order. A is incorrect because ordinal data are categorical
data that can be logically ordered or ranked. B is incorrect because discrete data
are numerical values that result from a counting process; thus, they can be
ordered in various ways, such as from highest to lowest value.
3. B is correct. Categorical data (or qualitative data) are values that describe a
quality or characteristic of a group of observations and therefore can be used as labels
to divide a dataset into groups to summarize and visualize. The two types of
categorical data are nominal data and ordinal data. Nominal data are categorical
values that are not amenable to being organized in a logical order, while ordinal
data are categorical values that can be logically ordered or ranked. A is incorrect
because discrete data would be classified as numerical data (not categorical data).
C is incorrect because continuous data would be classified as numerical data (not
categorical data).
4. C is correct. Continuous data are data that can be measured and can take on
any numerical value in a specified range of values. In this case, the analyst is
estimating bankruptcy probabilities, which can take on any value between 0 and
1. Therefore, the set of bankruptcy probabilities estimated by the analyst would
likely be characterized as continuous data. A is incorrect because ordinal data
are categorical values that can be logically ordered or ranked. Therefore, the
set of bankruptcy probabilities would not be characterized as ordinal data. B is
incorrect because discrete data are numerical values that result from a counting
process, and therefore the data are limited to a finite number of values. The
proprietary model used can generate probabilities that can take any value between 0
and 1; therefore, the set of bankruptcy probabilities would not be characterized
as discrete data.
5. A is correct. Ordinal data are categorical values that can be logically ordered or
ranked. In this case, the classification of sentences in the earnings call transcript
into three categories (negative, neutral, or positive) describes ordinal data, as the
data can be logically ordered from positive to negative. B is incorrect because
discrete data are numerical values that result from a counting process. In this
case, the analyst is categorizing sentences (i.e., unstructured data) from the
earnings call transcript as having negative, neutral, or positive sentiment. Thus, these
categorical data do not represent discrete data. C is incorrect because nominal
data are categorical values that are not amenable to being organized in a logical
order. In this case, the classification of unstructured data (i.e., sentences from
the earnings call transcript) into three categories (negative, neutral, or positive)
describes ordinal (not nominal) data, as the data can be logically ordered from
positive to negative.
such as daily, weekly, monthly, annually, and quarterly. In this case, each column
is a time series of data that represents annual total return (the specific variable)
for a given country index, and it is measured annually (the discrete interval of
time). A is incorrect because panel data consist of observations through time
on one or more variables for multiple observational units. The entire table of
data is an example of panel data showing annual total returns (the variable) for
three country indexes (the observational units) by year. C is incorrect because
cross-sectional data are a list of the observations of a specific variable from
multiple observational units at a given point in time. Each row (not column) of data in
the table represents cross-sectional data.
10. A is correct. The relative frequency is the absolute frequency of each bin divided
by the total number of observations. Here, the relative frequency is calculated as:
(12/60) × 100 = 20%. B is incorrect because the relative frequency of this bin is
(23/60) × 100 = 38.33%. C is incorrect because the cumulative relative frequency
of the last bin must equal 100%.
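The relative-frequency arithmetic in Solution 10 can be checked with a short Python sketch. The bin labels and counts are copied from the Problem 10 exhibit; the variable names are illustrative only.

```python
# Absolute frequencies for the six return bins in the Problem 10 exhibit.
bins = ["-10.0 to -7.0", "-7.0 to -4.0", "-4.0 to -1.0",
        "-1.0 to +2.0", "+2.0 to +5.0", "+5.0 to +8.0"]
counts = [3, 7, 10, 12, 23, 5]

total = sum(counts)  # total number of observations

# Relative frequency: each bin's count divided by the total count.
relative = [c / total for c in counts]

# Cumulative relative frequency: running count divided by the total count.
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(running / total)

for label, rel, cum in zip(bins, relative, cumulative):
    print(f"{label:>14}  relative={rel:7.2%}  cumulative={cum:8.2%}")
```

The last cumulative value is exactly 1, which is why a cumulative relative frequency other than 100% for the final bin cannot be correct.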
11. C is correct. The cumulative relative frequency of a bin identifies the fraction of
observations that are less than the upper limit of the given bin. It is determined
by summing the relative frequencies from the lowest bin up to and including the
given bin. The following exhibit shows the relative frequencies for all the bins of
The bin −1.71% ≤ x < 2.03% has a cumulative relative frequency of 0.583.
12. C is correct. The marginal frequency of energy sector bonds in the portfolio is
the sum of the joint frequencies across all three levels of bond rating, so 100
+ 85 + 30 = 215. A is incorrect because 27 is the relative frequency for energy
sector bonds based on the total count of 806 bonds, so 215/806 = 26.7%, not the
marginal frequency. B is incorrect because 85 is the joint frequency for AA rated
energy sector bonds, not the marginal frequency.
13. A is correct. The relative frequency for any value in the table based on the total
count is calculated by dividing that value by the total count. Therefore, the
relative frequency for AA rated energy bonds is calculated as 85/806 = 10.5%.
B is incorrect because 31.5% is the relative frequency for AA rated energy bonds,
calculated based on the marginal frequency for all AA rated bonds, so 85/(32 +
25 + 85 + 100 + 28), not based on total bond counts. C is incorrect because 39.5%
is the relative frequency for AA rated energy bonds, calculated based on the
marginal frequency for all energy bonds, so 85/(100 + 85 + 30), not based on total
bond counts.
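Solutions 12 and 13 use three different denominators for the same joint frequency. The Python sketch below reproduces them from the bond exhibit; the dictionary layout and variable names are assumptions made for illustration.

```python
# Joint frequencies from the bond exhibit: sector -> counts by rating.
ratings = ["A", "AA", "AAA"]
table = {
    "Communication Services": [25, 32, 27],
    "Consumer Staples": [30, 25, 25],
    "Energy": [100, 85, 30],
    "Health Care": [200, 100, 63],
    "Utilities": [22, 28, 14],
}

aa = ratings.index("AA")
joint_energy_aa = table["Energy"][aa]                  # joint frequency
marginal_energy = sum(table["Energy"])                 # row (sector) total
marginal_aa = sum(row[aa] for row in table.values())   # column (rating) total
total = sum(sum(row) for row in table.values())        # grand total

print(f"Share of all bonds:        {joint_energy_aa / total:.1%}")
print(f"Share of AA rated bonds:   {joint_energy_aa / marginal_aa:.1%}")
print(f"Share of energy bonds:     {joint_energy_aa / marginal_energy:.1%}")
```

Only the first ratio answers Problem 13, because the question specifies the total count as the base.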
14. A is correct. Twenty observations lie in the interval “0.0 to 2.0,” and six
observations lie in the “2.0 to 4.0” interval. Together, they represent 26/48, or 54.17%, of
all observations, which is more than 50%.
15. A is correct. A bar chart that orders categories by frequency in descending order
and includes a line displaying cumulative relative frequency is called a Pareto
Chart. A Pareto Chart is used to highlight dominant categories or the most
important groups. B is incorrect because a grouped bar chart or clustered bar chart
is used to present the frequency distribution of two categorical variables. C is
incorrect because a frequency polygon is used to display frequency distributions.
16. C is correct. A word cloud, or tag cloud, is a visual device for representing
unstructured, textual data. It consists of words extracted from text with the size
of each word being proportional to the frequency with which it appears in the
given text. A is incorrect because a tree-map is a graphical tool for displaying
and comparing categorical data, not for visualizing unstructured, textual data. B
is incorrect because a scatter plot is used to visualize the joint variation in two
numerical variables, not for visualizing unstructured, textual data.
17. C is correct. A tree-map is a graphical tool used to display and compare
categorical data. It consists of a set of colored rectangles to represent distinct groups,
and the area of each rectangle is proportional to the value of the corresponding
group. A is incorrect because a line chart, not a tree-map, is used to display the
change in a data series over time. B is incorrect because a scatter plot, not a
tree-map, is used to visualize the joint variation in two numerical variables.
changes in the data and underlying trends in a clear and concise way. Often a line
chart is used to display the changes in data series over time. A is incorrect
because a scatter plot, not a line chart, is used to visualize the joint variation in two
numerical variables. C is incorrect because a heat map, not a line chart, is used to
visualize the values of joint frequencies among categorical variables.
19. B is correct. A heat map is commonly used for visualizing the degree of
correlation between different variables. A is incorrect because a word cloud, or tag
cloud, not a heat map, is a visual device for representing textual data with the size
of each distinct word being proportional to the frequency with which it appears
in the given text. C is incorrect because a histogram, not a heat map, depicts the
shape, center, and spread of the distribution of numerical data.
20. B is correct. A bubble line chart is a version of a line chart where data points
are replaced with varying-sized bubbles to represent a third dimension of the
data. A line chart is very effective at visualizing trends in three or more variables
over time. A is incorrect because a heat map differentiates high values from low
values and reflects the correlation between variables but does not help in making
comparisons of variables over time. C is incorrect because a scatterplot matrix is
a useful tool for organizing scatterplots between pairs of variables, making it easy
to inspect all pairwise relationships in one combined visual. However, it does not
help in making comparisons of these variables over time.
21. C is correct. Because 50 data points are in the histogram, the median return
would be the mean of the 50/2 = 25th and (50 + 2)/2 = 26th positions. The sum of
the return bin frequencies to the left of the 13% to 18% interval is 24. As a result,
the 25th and 26th returns will fall in the 13% to 18% interval.
22. C is correct. The mode of a distribution with data grouped in intervals is the
interval with the highest frequency. The three intervals of 3% to 8%, 18% to 23%,
and 28% to 33% all have a high frequency of 7.
23. C is correct. The median of Portfolio R is 0.8% higher than the mean for Portfolio
R.
24. C is correct. The portfolio return must be calculated as the weighted mean
return, where the weights are the allocations in each asset class:
(0.20 × 8%) + (0.40 × 12%) + (0.25 × −3%) + (0.15 × 4%) = 6.25%, or ≈ 6.3%.
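The weighted mean above can be reproduced in a few lines of Python. The weights and returns are taken from the calculation in Solution 24; the asset class names are not available here, so the lists are unlabeled.

```python
# Allocation weights and Year X returns (in percent), in the order used in Solution 24.
weights = [0.20, 0.40, 0.25, 0.15]
returns = [8.0, 12.0, -3.0, 4.0]

# The weights of a fully invested portfolio should sum to 1.
assert abs(sum(weights) - 1.0) < 1e-9

# Weighted mean: sum of weight times return across asset classes.
portfolio_return = sum(w * r for w, r in zip(weights, returns))
print(f"Portfolio return: {portfolio_return:.2f}%")
```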
25. A is correct. The geometric mean return for Fund Y is found as follows:
$$R_G = [(1 + 0.195)(1 - 0.019)(1 + 0.197)(1 + 0.350)(1 + 0.057)]^{1/5} - 1 = 14.9\%.$$
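As a check on Solution 25, the following Python sketch compounds the five Fund Y returns from the exhibit and takes the fifth root. Variable names are illustrative.

```python
import math

# Annual returns for Fund Y from the exhibit, as decimals.
fund_y = [0.195, -0.019, 0.197, 0.350, 0.057]

# Compound growth factor over the five years.
growth = math.prod(1 + r for r in fund_y)

# Geometric mean: the nth root of the growth factor, minus 1.
geometric_mean = growth ** (1 / len(fund_y)) - 1
print(f"Geometric mean return: {geometric_mean:.1%}")
```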
26. A is correct. The harmonic mean is appropriate for determining the average price
per unit. It is calculated by summing the reciprocals of the prices, then averaging
that sum by dividing by the number of prices, then taking the reciprocal of the
average:
4/[(1/62.00) + (1/76.00) + (1/84.00) + (1/90.00)] = €76.48.
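One way to see why the harmonic mean is the right average here is to compute the average cost per share directly. The Python sketch below does both, using the prices and the €5,000 annual investment from Problem 26; the variable names are illustrative.

```python
# Purchase prices for Years 1 to 4 and the fixed amount invested each year.
prices = [62.00, 76.00, 84.00, 90.00]
annual_investment = 5000.0

# Harmonic mean: n divided by the sum of reciprocals.
harmonic_mean = len(prices) / sum(1 / p for p in prices)

# Direct check: total money spent divided by total shares bought.
shares_bought = sum(annual_investment / p for p in prices)
average_cost = annual_investment * len(prices) / shares_bought

print(f"Harmonic mean price: {harmonic_mean:.2f}")
print(f"Average cost/share:  {average_cost:.2f}")
```

The two figures agree because investing a fixed amount each period buys more shares when the price is low, which is exactly the weighting the harmonic mean applies.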
27. B is correct. The geometric mean compounds the periodic returns of every
period, giving the investor a more accurate measure of the terminal value of an
investment.
28. B is correct. The sum of the returns is 30.0%, so the arithmetic mean is 30.0%/10
= 3.0%.
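29. B is correct.
Year    Return    1 + Return
1       4.5%      1.045
2       6.0%      1.060
3       1.5%      1.015
4       −2.0%     0.980
5       0.0%      1.000
6       4.5%      1.045
7       3.5%      1.035
8       2.5%      1.025
9       5.5%      1.055
10      4.0%      1.040
The geometric mean is the 10th root of the product of the (1 + Return) values, minus 1,
which is approximately 2.97%.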
30. A is correct.
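The three means in Solutions 28 to 30 can be compared side by side with the Python sketch below. Because the series contains a zero and a negative return, the harmonic mean is computed here on the (1 + return) values with 1 subtracted afterward; that treatment is an assumption, chosen because it reproduces the 2.94% answer.

```python
import math

# The 10 annual returns from the exhibit, as decimals.
returns = [0.045, 0.060, 0.015, -0.020, 0.000,
           0.045, 0.035, 0.025, 0.055, 0.040]
n = len(returns)

# Arithmetic mean: simple average of the returns.
arithmetic = sum(returns) / n

# Geometric mean: nth root of the compounded growth factor, minus 1.
geometric = math.prod(1 + r for r in returns) ** (1 / n) - 1

# Harmonic mean of (1 + r), minus 1 (assumed treatment, see the note above).
harmonic = n / sum(1 / (1 + r) for r in returns) - 1

print(f"Arithmetic: {arithmetic:.2%}")
print(f"Geometric:  {geometric:.2%}")
print(f"Harmonic:   {harmonic:.2%}")
```

For a series with non-zero variance, the ordering harmonic < geometric < arithmetic holds, which matches the three answers.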
31. B is correct.
The standard deviation is the square root of the sum of the squared deviations
divided by n − 1:
$$s = \sqrt{\frac{0.005750}{9}} = 2.5276\%.$$
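The sum of squared deviations used in Solution 31 can be rebuilt from the 10 returns. This Python sketch does so and cross-checks the result against the standard library's `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics

# The 10 annual returns from the exhibit, as decimals.
returns = [0.045, 0.060, 0.015, -0.020, 0.000,
           0.045, 0.035, 0.025, 0.055, 0.040]
n = len(returns)
mean = sum(returns) / n

# Sum of squared deviations from the arithmetic mean.
sum_sq_dev = sum((r - mean) ** 2 for r in returns)

# Sample standard deviation: divide by n - 1, then take the square root.
sample_std = math.sqrt(sum_sq_dev / (n - 1))

print(f"Sum of squared deviations: {sum_sq_dev:.6f}")
print(f"Sample standard deviation: {sample_std:.4%}")
print(f"statistics.stdev check:    {statistics.stdev(returns):.4%}")
```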
32. B is correct.
Year    Return    Squared Deviation below Target of 2%
1       4.5%
2       6.0%
3       1.5%      0.000025
4       −2.0%     0.001600
5       0.0%      0.000400
6       4.5%
7       3.5%
8       2.5%
9       5.5%
10      4.0%
Sum               0.002025
The target semi-deviation is the square root of the sum of the squared deviations
from the target, divided by n − 1:
$$s_{\text{Target}} = \sqrt{\frac{0.002025}{9}} = 1.5\%.$$
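The target semideviation differs from the standard deviation in two ways: deviations are measured from the target rather than the mean, and only observations at or below the target contribute. A minimal Python sketch of Solution 32, with illustrative variable names:

```python
import math

# The 10 annual returns from the exhibit, as decimals, and the 2% target.
returns = [0.045, 0.060, 0.015, -0.020, 0.000,
           0.045, 0.035, 0.025, 0.055, 0.040]
target = 0.02

# Keep only squared deviations for returns at or below the target.
squared_shortfalls = [(r - target) ** 2 for r in returns if r <= target]
sum_sq = sum(squared_shortfalls)

# Divide by n - 1, where n counts ALL observations, then take the square root.
target_semideviation = math.sqrt(sum_sq / (len(returns) - 1))

print(f"Sum of squared shortfalls: {sum_sq:.6f}")
print(f"Target semideviation:      {target_semideviation:.2%}")
```

Note that the divisor uses all 10 observations, not just the three that fall below the target.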
33. B is correct. The median is indicated within the box, which is the 100.49 in this
diagram.
34. C is correct. The interquartile range is the difference between 114.25 and 79.74,
which is 34.51.
35. B is correct. Quintiles divide a distribution into fifths, with the fourth quintile
occurring at the point at which 80% of the observations lie below it. The fourth
quintile is equivalent to the 80th percentile. To find the yth percentile (Py),
we first must determine its location. The formula for the location (Ly) of a yth
percentile in an array with n entries sorted in ascending order is Ly = (n + 1) ×
(y/100). In this case, n = 10 and y = 80%, so
L80 = (10 + 1) × (80/100) = 11 × 0.8 = 8.8.
With the data arranged in ascending order (−40.33%, −5.02%, 9.57%, 10.02%,
12.34%, 15.25%, 16.54%, 20.65%, 27.37%, and 30.79%), the 8.8th position would
be between the 8th and 9th entries, 20.65% and 27.37%, respectively. Using linear
interpolation, P80 = 20.65% + (8.8 − 8) × (27.37% − 20.65%) = 26.03%.
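The location-and-interpolation procedure in Solution 35 can be sketched in Python as follows. The function name `percentile` and its handling of locations outside the data range (clamping to the first or last value) are assumptions for illustration; the source only describes the interior case.

```python
import math

# MSCI World Index returns (%) from Solution 35, sorted in ascending order.
returns = sorted([-40.33, -5.02, 9.57, 10.02, 12.34,
                  15.25, 16.54, 20.65, 27.37, 30.79])

def percentile(sorted_values, y):
    """Return the yth percentile using the location L = (n + 1) * y / 100."""
    n = len(sorted_values)
    location = (n + 1) * y / 100      # 1-based position in the sorted array
    lower = math.floor(location)      # position of the entry just below
    fraction = location - lower       # how far to move toward the next entry
    if lower < 1:                     # assumed: clamp below the first entry
        return sorted_values[0]
    if lower >= n:                    # assumed: clamp above the last entry
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)

fourth_quintile = percentile(returns, 80)
print(f"Fourth quintile (80th percentile): {fourth_quintile:.2f}%")
```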
Year    Return (Column 1)    |Xi − X̄| (Column 2)
37. C is correct. The mean absolute deviation (MAD) of Fund ABC’s returns is
greater than the MAD of both of the other funds.
$$\text{MAD} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n},$$
where $\bar{X}$ is the arithmetic mean of the series.
MAD for Fund ABC:
$$\frac{|-20 - (-4)| + |23 - (-4)| + |-14 - (-4)| + |5 - (-4)| + |-14 - (-4)|}{5} = 14.4\%.$$
MAD for Fund XYZ:
$$\frac{|-33 - (-10.8)| + |-12 - (-10.8)| + |-12 - (-10.8)| + |-8 - (-10.8)| + |11 - (-10.8)|}{5} = 9.8\%.$$
MAD for Fund PQR:
$$\frac{|-14 - (-5)| + |-18 - (-5)| + |6 - (-5)| + |-2 - (-5)| + |3 - (-5)|}{5} = 8.8\%.$$
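The three MAD figures can be reproduced with a small helper. In this Python sketch the fund returns are read off the numerators in Solution 37; the function name `mean_absolute_deviation` is illustrative.

```python
# Annual returns (%) for each fund, taken from the MAD calculations in Solution 37.
funds = {
    "ABC": [-20, 23, -14, 5, -14],
    "XYZ": [-33, -12, -12, -8, 11],
    "PQR": [-14, -18, 6, -2, 3],
}

def mean_absolute_deviation(values):
    """Average absolute distance of each value from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

mad = {name: mean_absolute_deviation(values) for name, values in funds.items()}

for name, value in mad.items():
    print(f"Fund {name}: MAD = {value:.2f}%")
print("Highest MAD:", max(mad, key=mad.get))
```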
A and B are incorrect because the range and variance of the three funds are as
follows:
The numbers shown for variance are understood to be in “percent squared” terms
so that when taking the square root, the result is standard deviation in percentage
terms. Alternatively, by expressing standard deviation and variance in decimal
form, one can avoid the issue of units. In decimal form, the variances for Fund
ABC, Fund XYZ, and Fund PQR are 0.0317, 0.0243, and 0.0110, respectively.
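38. A is correct. The more dispersed a distribution, the greater the difference between
the arithmetic mean and the geometric mean. Portfolio B has the same arithmetic mean as
Portfolio A but a higher standard deviation, so its geometric mean must be less than 2.85%.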
39. B is correct. The coefficient of variation (CV) is the ratio of the standard
deviation to the mean, where a higher CV implies greater risk per unit of return.
$$CV_{\text{UTIL}} = \frac{s}{\bar{X}} = \frac{1.23\%}{2.10\%} = 0.59.$$
$$CV_{\text{MATR}} = \frac{s}{\bar{X}} = \frac{1.35\%}{1.25\%} = 1.08.$$
$$CV_{\text{INDU}} = \frac{s}{\bar{X}} = \frac{1.52\%}{3.01\%} = 0.51.$$
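The ranking in Solution 39 can be confirmed with a short Python sketch. The sector keys follow the abbreviations used in the solution, and the figures are the means and standard deviations shown there. The industrials ratio computes to about 0.505, which the solution reports as 0.51; the ranking is unaffected.

```python
# (mean monthly return %, standard deviation %) for each sector, from Solution 39.
sectors = {
    "UTIL": (2.10, 1.23),
    "MATR": (1.25, 1.35),
    "INDU": (3.01, 1.52),
}

# Coefficient of variation: standard deviation divided by the mean.
cv = {name: std / mean for name, (mean, std) in sectors.items()}

for name, value in cv.items():
    print(f"CV {name}: {value:.3f}")

# The sector with the highest CV has the most risk per unit of return.
print("Highest CV:", max(cv, key=cv.get))
```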
40. B is correct. The coefficient of variation is the ratio of the standard deviation to
the arithmetic average, or $\sqrt{0.001723} / 0.09986 = 0.416$.
42. C is correct. The excess kurtosis is positive, indicating that the distribution is
“fat-tailed”; therefore, there is more probability in the tails of the distribution
relative to the normal distribution.
44. B is correct. The correlation coefficient is positive, indicating that the two series
move together.
45. C is correct. Both outliers and spurious correlation are potential problems with
interpreting correlation coefficients.
46. C is correct. The correlation coefficient is positive because the covariation is
positive.
47. A is correct. The correlation coefficient is negative because the covariation is
negative.
48. C is correct. The correlation coefficient is positive because the covariance is
positive. The fact that one or both variables have a negative mean does not affect the
sign of the correlation coefficient.
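The point in Solution 48 can be illustrated numerically. The Python sketch below uses hypothetical data, invented here so that variable A has a mean of −0.56 and variable B a mean of 0.23; the helper functions are hand-written for illustration. Correlation is covariance divided by the product of two standard deviations, which are never negative, so the sign comes from the covariance alone, and shifting a variable's mean changes nothing.

```python
import math

def covariance(x, y):
    """Sample covariance: average cross-product of deviations, dividing by n - 1."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Sample correlation: covariance scaled by both standard deviations."""
    sx = math.sqrt(covariance(x, x))
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

# Hypothetical observations (not from the source): mean(a) = -0.56, mean(b) = 0.23.
a = [-1.2, -0.9, -0.4, -0.3, 0.0]
b = [-0.1, 0.1, 0.3, 0.2, 0.65]

# Shift A upward by 10 so its mean becomes positive; deviations are unchanged.
a_shifted = [v + 10 for v in a]

print(f"Covariance:            {covariance(a, b):.4f}")
print(f"Correlation:           {correlation(a, b):.4f}")
print(f"Correlation (shifted): {correlation(a_shifted, b):.4f}")
```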