
Curriculum Module 2 Questions

1. This document provides 19 multiple choice practice problems relating to organizing, visualizing, and describing data. The problems cover topics such as data types (nominal, ordinal, discrete, continuous), frequency distributions, contingency tables, and different visualization tools (frequency polygons, bar charts, tree-maps, heat maps). 2. The problems ask students to identify examples of different data types and scales, calculate frequencies and relative frequencies from tables and charts, determine what type of data various visualizations are best suited for, and choose the best visualization for different types of data. 3. The document provides context, tables, and charts to help students practice applying concepts relating to data analytics.

Uploaded by

Emin Salmanov

© CFA Institute. For candidate use only. Not for distribution.

Practice Problems 151

PRACTICE PROBLEMS
1. Published ratings on stocks ranging from 1 (strong sell) to 5 (strong buy) are
examples of which measurement scale?
A. Ordinal

B. Continuous

C. Nominal

2. Data values that are categorical and not amenable to being organized in a logical
order are most likely to be characterized as:
A. ordinal data.

B. discrete data.

C. nominal data.

3. Which of the following data types would be classified as being categorical?


A. Discrete

B. Nominal

C. Continuous

4. A fixed-income analyst uses a proprietary model to estimate bankruptcy
probabilities for a group of firms. The model generates probabilities that can take any
value between 0 and 1. The resulting set of estimated probabilities would most
likely be characterized as:
A. ordinal data.

B. discrete data.

C. continuous data.

5. An analyst uses a software program to analyze unstructured data, specifically
management’s earnings call transcript for one of the companies in her research
coverage. The program scans the words in each sentence of the transcript and
then classifies the sentences as having negative, neutral, or positive sentiment.
The resulting set of sentiment data would most likely be characterized as:
A. ordinal data.

B. discrete data.

C. nominal data.

The following information relates to questions 6-7
An equity analyst gathers total returns for three country equity indexes over the
past four years. The data are presented below.
152 Learning Module 2 Organizing, Visualizing, and Describing Data

Time Period   Index A   Index B   Index C
Year t–3       15.56%    11.84%    −4.34%
Year t–2       −4.12%    −6.96%     9.32%
Year t–1       11.19%    10.29%   −12.72%
Year t          8.98%     6.32%    21.44%

6. Each individual column of data in the table can be best characterized as:
A. panel data.

B. time-series data.

C. cross-sectional data.

7. Each individual row of data in the table can be best characterized as:
A. panel data.

B. time-series data.

C. cross-sectional data.

8. A two-dimensional rectangular array would be most suitable for organizing a
collection of raw:
A. panel data.

B. time-series data.

C. cross-sectional data.

9. In a frequency distribution, the absolute frequency measure:


A. represents the percentages of each unique value of the variable.

B. represents the actual number of observations counted for each unique value
of the variable.

C. allows for comparisons between datasets with different numbers of total
observations.

10. An investment fund has the return frequency distribution shown in the following
exhibit.

Return Interval (%) Absolute Frequency

−10.0 to −7.0 3
−7.0 to −4.0 7
−4.0 to −1.0 10
−1.0 to +2.0 12
+2.0 to +5.0 23
+5.0 to +8.0 5

Which of the following statements is correct?



A. The relative frequency of the bin “−1.0 to +2.0” is 20%.

B. The relative frequency of the bin “+2.0 to +5.0” is 23%.

C. The cumulative relative frequency of the bin “+5.0 to +8.0” is 91.7%.

11. An analyst is using the data in the following exhibit to prepare a statistical report.

Portfolio’s Deviations from Benchmark Return for a 12-Year Period (%)

Year 1    2.48     Year 7    −9.19
Year 2   −2.59     Year 8    −5.11
Year 3    9.47     Year 9     1.33
Year 4   −0.55     Year 10    6.84
Year 5   −1.69     Year 11    3.04
Year 6   −0.89     Year 12    4.72

The cumulative relative frequency for the bin −1.71% ≤ x < 2.03% is closest to:
A. 0.250.

B. 0.333.

C. 0.583.

The following information relates to questions 12-13
A fixed-income portfolio manager creates a contingency table of the number of
bonds held in her portfolio by sector and bond rating. The contingency table is
presented here:

  Bond Rating

Sector A AA AAA

Communication Services 25 32 27
Consumer Staples 30 25 25
Energy 100 85 30
Health Care 200 100 63
Utilities 22 28 14

12. The marginal frequency of energy sector bonds is closest to:


A. 27.

B. 85.

C. 215.

13. The relative frequency of AA rated energy bonds, based on the total count, is
closest to:
A. 10.5%.

B. 31.5%.

C. 39.5%.

14. The following is a frequency polygon of monthly exchange rate changes in the US
dollar/Japanese yen spot exchange rate for a four-year period. A positive change
represents yen appreciation (the yen buys more dollars), and a negative change
represents yen depreciation (the yen buys fewer dollars).

Exhibit 1: Monthly Changes in the US Dollar/Japanese Yen Spot Exchange Rate

[Figure: frequency polygon. x-axis: Return Interval Midpoint (%), with midpoints −5, −3, −1, 1, 3; y-axis: Frequency.]

Based on the chart, yen appreciation:


A. occurred more than 50% of the time.

B. was less frequent than yen depreciation.

C. in the 0.0 to 2.0 interval occurred 20% of the time.

15. A bar chart that orders categories by frequency in descending order and includes
a line displaying cumulative relative frequency is referred to as a:
A. Pareto Chart.

B. grouped bar chart.

C. frequency polygon.

16. Which visualization tool works best to represent unstructured, textual data?
A. Tree-Map

B. Scatter plot

C. Word cloud

17. A tree-map is best suited to illustrate:


A. underlying trends over time.

B. joint variations in two variables.

C. value differences of categorical groups.

18. A line chart with two variables—for example, revenues and earnings per share—
is best suited for visualizing:
A. the joint variation in the variables.

B. underlying trends in the variables over time.

C. the degree of correlation between the variables.

19. A heat map is best suited for visualizing the:


A. frequency of textual data.

B. degree of correlation between different variables.

C. shape, center, and spread of the distribution of numerical data.

20. Which visualization tool is recommended if the goal is to make comparisons
of three or more variables over time?
A. Heat map

B. Bubble line chart

C. Scatter plot matrix

The following information relates to questions 21-22
The following histogram shows a distribution of the S&P 500 Index annual
returns for a 50-year period:

[Figure: histogram. x-axis: Return Intervals (%), 15 bins of width 5 from −37 to 38; y-axis: Frequency, 0 to 8.]

21. The bin containing the median return is:


A. 3% to 8%.

B. 8% to 13%.

C. 13% to 18%.

22. Based on the previous histogram, the distribution is best described as being:
A. unimodal.

B. bimodal.

C. trimodal.

23. The annual returns for three portfolios are shown in the following exhibit.
Portfolios P and R were created in Year 1, Portfolio Q in Year 2.

                 Annual Portfolio Returns (%)

              Year 1   Year 2   Year 3   Year 4   Year 5
Portfolio P    −3.0      4.0      5.0      3.0      7.0
Portfolio Q             −3.0      6.0      4.0      8.0
Portfolio R     1.0     −1.0      4.0      4.0      3.0

The median annual return from portfolio creation to Year 5 for:


A. Portfolio P is 4.5%.

B. Portfolio Q is 4.0%.

C. Portfolio R is higher than its arithmetic mean annual return.

24. At the beginning of Year X, an investor allocated his retirement savings in the
asset classes shown in the following exhibit and earned a return for Year X as also
shown.

Asset Class                  Asset Allocation (%)   Asset Class Return for Year X (%)
Large-cap US equities                20.0                          8.0
Small-cap US equities                40.0                         12.0
Emerging market equities             25.0                         −3.0
High-yield bonds                     15.0                          4.0

The portfolio return for Year X is closest to:


A. 5.1%.

B. 5.3%.

C. 6.3%.

25. The following exhibit shows the annual returns for Fund Y.

  Fund Y (%)

Year 1 19.5
Year 2 −1.9
Year 3 19.7
Year 4 35.0
Year 5 5.7

The geometric mean return for Fund Y is closest to:


A. 14.9%.

B. 15.6%.

C. 19.5%.

26. A portfolio manager invests €5,000 annually in a security for four years at the
prices shown in the following exhibit.

  Purchase Price of Security (€ per unit)

Year 1 62.00
Year 2 76.00
Year 3 84.00
Year 4 90.00

The average price is best represented as the:


A. harmonic mean of €76.48.

B. geometric mean of €77.26.

C. arithmetic average of €78.00.



27. When analyzing investment returns, which of the following statements is
correct?
A. The geometric mean will exceed the arithmetic mean for a series with
non-zero variance.

B. The geometric mean measures an investment’s compound rate of growth
over multiple periods.

C. The arithmetic mean measures an investment’s terminal value over multiple
periods.

The following information relates to questions 28-32
A fund had the following experience over the past 10 years:

Year Return

1 4.5%
2 6.0%
3 1.5%
4 −2.0%
5 0.0%
6 4.5%
7 3.5%
8 2.5%
9 5.5%
10 4.0%

28. The arithmetic mean return over the 10 years is closest to:
A. 2.97%.

B. 3.00%.

C. 3.33%.

29. The geometric mean return over the 10 years is closest to:
A. 2.94%.

B. 2.97%.

C. 3.00%.

30. The harmonic mean return over the 10 years is closest to:
A. 2.94%.

B. 2.97%.

C. 3.00%.

31. The standard deviation of the 10 years of returns is closest to:


A. 2.40%.

B. 2.53%.

C. 7.58%.

32. The target semideviation of the returns over the 10 years if the target is 2% is
closest to:
A. 1.42%.

B. 1.50%.

C. 2.01%.

The following information relates to questions 33-34

[Figure: box-and-whisker plot. Vertical axis from 40 to 180; labeled values, from top to bottom: 154.45, 114.25, 100.49, 79.74, 51.51.]

33. The median is closest to:


A. 34.51.

B. 100.49.

C. 102.98.

34. The interquartile range is closest to:


A. 13.76.

B. 25.74.

C. 34.51.

The following information relates to questions 35-36
The following exhibit shows the annual MSCI World Index total returns for a
10-year period.

Year 1    15.25%     Year 6    30.79%
Year 2    10.02%     Year 7    12.34%
Year 3    20.65%     Year 8    −5.02%
Year 4     9.57%     Year 9    16.54%
Year 5   −40.33%     Year 10   27.37%

35. The fourth quintile return for the MSCI World Index is closest to:
A. 20.65%.

B. 26.03%.

C. 27.37%.

36. For Year 6–Year 10, the mean absolute deviation of the MSCI World Index total
returns is closest to:
A. 10.20%.

B. 12.74%.

C. 16.40%.

37. Annual returns and summary statistics for three funds are listed in the following
exhibit:

  Annual Returns (%)

Year Fund ABC Fund XYZ Fund PQR

Year 1 −20.0 −33.0 −14.0


Year 2 23.0 −12.0 −18.0
Year 3 −14.0 −12.0 6.0
Year 4 5.0 −8.0 −2.0
Year 5 −14.0 11.0 3.0
       
Mean −4.0 −10.8 −5.0
Standard deviation 17.8 15.6 10.5

The fund with the highest absolute dispersion is:


A. Fund PQR if the measure of dispersion is the range.

B. Fund XYZ if the measure of dispersion is the variance.

C. Fund ABC if the measure of dispersion is the mean absolute deviation.

38. The average return for Portfolio A over the past twelve months is 3%, with a
standard deviation of 4%. The average return for Portfolio B over this same period
is also 3%, but with a standard deviation of 6%. The geometric mean return of
Portfolio A is 2.85%. The geometric mean return of Portfolio B is:
A. less than 2.85%.

B. equal to 2.85%.

C. greater than 2.85%.

39. The mean monthly return and the standard deviation for three industry sectors
are shown in the following exhibit.

Sector               Mean Monthly Return (%)   Standard Deviation of Return (%)
Utilities (UTIL)              2.10                         1.23
Materials (MATR)              1.25                         1.35
Industrials (INDU)            3.01                         1.52

Based on the coefficient of variation, the riskiest sector is:


A. utilities.

B. materials.

C. industrials.

The following information relates to questions 40-42
An analyst examined a cross-section of annual returns for 252 stocks and
calculated the following statistics:

Arithmetic Average 9.986%


Geometric Mean 9.909%
Variance 0.001723
Skewness 0.704
Excess Kurtosis 0.503

40. The coefficient of variation is closest to:


A. 0.02.

B. 0.42.

C. 2.41.

41. This distribution is best described as:


A. negatively skewed.

B. having no skewness.

C. positively skewed.

42. Compared to the normal distribution, this sample’s distribution is best described
as having tails of the distribution with:
A. less probability than the normal distribution.

B. the same probability as the normal distribution.

C. more probability than the normal distribution.

43. An analyst calculated the excess kurtosis of a stock’s returns as −0.75. From this
information, we conclude that the distribution of returns is:
A. normally distributed.

B. thin-tailed compared to the normal distribution.

C. fat-tailed compared to the normal distribution.

44. A correlation of 0.34 between two variables, X and Y, is best described as:
A. changes in X causing changes in Y.

B. a positive association between X and Y.

C. a curvilinear relationship between X and Y.

45. Which of the following is a potential problem with interpreting a correlation
coefficient?
A. Outliers

B. Spurious correlation

C. Both outliers and spurious correlation

The following information relates to questions 46-47
An analyst is evaluating the tendency of returns on the portfolio of stocks she
manages to move along with bond and real estate indexes. She gathered monthly
data on returns and the indexes:

                                    Returns (%)

                     Portfolio Returns   Bond Index Returns   Real Estate Index Returns
Arithmetic average          5.5                 3.2                      7.8
Standard deviation          8.2                 3.4                     10.3

              Portfolio Returns and    Portfolio Returns and
              Bond Index Returns       Real Estate Index Returns
Covariance           18.9                      −55.9

46. Without calculating the correlation coefficient, the correlation of the portfolio
returns and the bond index returns is:
A. negative.

B. zero.

C. positive.

47. Without calculating the correlation coefficient, the correlation of the portfolio
returns and the real estate index returns is:
A. negative.

B. zero.

C. positive.

48. Consider two variables, A and B. If variable A has a mean of −0.56, variable B
has a mean of 0.23, and the covariance between the two variables is positive, the
correlation between these two variables is:
A. negative.

B. zero.

C. positive.

SOLUTIONS
1. A is correct. Ordinal scales sort data into categories that are ordered with respect
to some characteristic and may involve numbers to identify categories but do not
assure that the differences between scale values are equal. The buy rating scale
indicates that a stock ranked 5 is expected to perform better than a stock ranked
4, but it tells us nothing about the performance difference between stocks ranked
4 and 5 compared with the performance difference between stocks ranked 1 and
2, and so on.

2. C is correct. Nominal data are categorical values that are not amenable to being
organized in a logical order. A is incorrect because ordinal data are categorical
data that can be logically ordered or ranked. B is incorrect because discrete data
are numerical values that result from a counting process; thus, they can be
ordered in various ways, such as from highest to lowest value.

3. B is correct. Categorical data (or qualitative data) are values that describe a
quality or characteristic of a group of observations and therefore can be used as labels
to divide a dataset into groups to summarize and visualize. The two types of
categorical data are nominal data and ordinal data. Nominal data are categorical
values that are not amenable to being organized in a logical order, while ordinal
data are categorical values that can be logically ordered or ranked. A is incorrect
because discrete data would be classified as numerical data (not categorical data).
C is incorrect because continuous data would be classified as numerical data (not
categorical data).

4. C is correct. Continuous data are data that can be measured and can take on
any numerical value in a specified range of values. In this case, the analyst is
estimating bankruptcy probabilities, which can take on any value between 0 and
1. Therefore, the set of bankruptcy probabilities estimated by the analyst would
likely be characterized as continuous data. A is incorrect because ordinal data
are categorical values that can be logically ordered or ranked. Therefore, the
set of bankruptcy probabilities would not be characterized as ordinal data. B is
incorrect because discrete data are numerical values that result from a counting
process, and therefore the data are limited to a finite number of values. The
proprietary model used can generate probabilities that can take any value between 0
and 1; therefore, the set of bankruptcy probabilities would not be characterized
as discrete data.

5. A is correct. Ordinal data are categorical values that can be logically ordered or
ranked. In this case, the classification of sentences in the earnings call transcript
into three categories (negative, neutral, or positive) describes ordinal data, as the
data can be logically ordered from positive to negative. B is incorrect because
discrete data are numerical values that result from a counting process. In this
case, the analyst is categorizing sentences (i.e., unstructured data) from the
earnings call transcript as having negative, neutral, or positive sentiment. Thus, these
categorical data do not represent discrete data. C is incorrect because nominal
data are categorical values that are not amenable to being organized in a logical
order. In this case, the classification of unstructured data (i.e., sentences from
the earnings call transcript) into three categories (negative, neutral, or positive)
describes ordinal (not nominal) data, as the data can be logically ordered from
positive to negative.

6. B is correct. Time-series data are a sequence of observations of a specific variable
collected over time and at discrete and typically equally spaced intervals of time,
such as daily, weekly, monthly, annually, and quarterly. In this case, each column
is a time series of data that represents annual total return (the specific variable)
for a given country index, and it is measured annually (the discrete interval of
time). A is incorrect because panel data consist of observations through time
on one or more variables for multiple observational units. The entire table of
data is an example of panel data showing annual total returns (the variable) for
three country indexes (the observational units) by year. C is incorrect because
cross-sectional data are a list of the observations of a specific variable from
multiple observational units at a given point in time. Each row (not column) of data in
the table represents cross-sectional data.

7. C is correct. Cross-sectional data are observations of a specific variable from
multiple observational units at a given point in time. Each row of data in the table
represents cross-sectional data. The specific variable is annual total return, the
multiple observational units are the three countries’ indexes, and the given point
in time is the time period indicated by the particular row. A is incorrect because
panel data consist of observations through time on one or more variables for
multiple observational units. The entire table of data is an example of panel data
showing annual total returns (the variable) for three country indexes (the
observational units) by year. B is incorrect because time-series data are a sequence of
observations of a specific variable collected over time and at discrete and
typically equally spaced intervals of time, such as daily, weekly, monthly, annually, and
quarterly. In this case, each column (not row) is a time series of data that
represents annual total return (the specific variable) for a given country index, and it
is measured annually (the discrete interval of time).

8. A is correct. Panel data consist of observations through time on one or more
variables for multiple observational units. A two-dimensional rectangular array,
or data table, would be suitable here as it is comprised of columns to hold the
variable(s) for the observational units and rows to hold the observations through
time. B is incorrect because a one-dimensional (not a two-dimensional
rectangular) array would be most suitable for organizing a collection of data of the
same data type, such as the time-series data from a single variable. C is incorrect
because a one-dimensional (not a two-dimensional rectangular) array would
be most suitable for organizing a collection of data of the same data type, such
as the same variable for multiple observational units at a given point in time
(cross-sectional data).

9. B is correct. In a frequency distribution, the absolute frequency, or simply the
raw frequency, is the actual number of observations counted for each unique
value of the variable. A is incorrect because the relative frequency, which is
calculated as the absolute frequency of each unique value of the variable divided by
the total number of observations, presents the absolute frequencies in terms of
percentages. C is incorrect because the relative (not absolute) frequency provides
a normalized measure of the distribution of the data, allowing comparisons
between datasets with different numbers of total observations.

10. A is correct. The relative frequency is the absolute frequency of each bin divided
by the total number of observations. Here, the relative frequency is calculated as:
(12/60) × 100 = 20%. B is incorrect because the relative frequency of this bin is
(23/60) × 100 = 38.33%. C is incorrect because the cumulative relative frequency
of the last bin must equal 100%.
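
The arithmetic in Solution 10 can be checked with a short Python sketch. The variable names are illustrative and not part of the curriculum; the counts are copied from the exhibit in Problem 10.

```python
# Absolute frequencies per return bin, copied from the exhibit in Problem 10.
bins = ["-10.0 to -7.0", "-7.0 to -4.0", "-4.0 to -1.0",
        "-1.0 to +2.0", "+2.0 to +5.0", "+5.0 to +8.0"]
absolute = [3, 7, 10, 12, 23, 5]

total = sum(absolute)  # total number of observations
relative = [count / total for count in absolute]

# A running sum of the counts, divided by the total, gives the
# cumulative relative frequency of each bin.
cumulative = []
running = 0
for count in absolute:
    running += count
    cumulative.append(running / total)

for label, rel, cum in zip(bins, relative, cumulative):
    print(f"{label:>14}  relative {rel:6.1%}  cumulative {cum:6.1%}")
```

The last cumulative value is always 100%, which is why option C cannot be correct.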

11. C is correct. The cumulative relative frequency of a bin identifies the fraction of
observations that are less than the upper limit of the given bin. It is determined
by summing the relative frequencies from the lowest bin up to and including the
given bin. The following exhibit shows the relative frequencies for all the bins of
the data from the previous exhibit:

Lower Limit (%)   Upper Limit (%)   Absolute Frequency   Relative Frequency   Cumulative Relative Frequency
−9.19 ≤           < −5.45                   1                  0.083                    0.083
−5.45 ≤           < −1.71                   2                  0.167                    0.250
−1.71 ≤           < 2.03                    4                  0.333                    0.583
2.03 ≤            < 5.77                    3                  0.250                    0.833
5.77 ≤            ≤ 9.47                    2                  0.167                    1.000

The bin −1.71% ≤ x < 2.03% has a cumulative relative frequency of 0.583.
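
As a rough check of the table above, the following Python sketch assigns the 12 deviations from Problem 11 to the bins and accumulates the frequencies. The bin limits are taken from the solution table rather than derived; using `bisect_right` to pick the bin is one possible approach, not the curriculum's method.

```python
from bisect import bisect_right

# Deviations from benchmark (%), Years 1-12, from the exhibit in Problem 11.
deviations = [2.48, -2.59, 9.47, -0.55, -1.69, -0.89,
              -9.19, -5.11, 1.33, 6.84, 3.04, 4.72]

# Bin limits as listed in the solution table (assumed, not recomputed).
edges = [-9.19, -5.45, -1.71, 2.03, 5.77, 9.47]

counts = [0] * (len(edges) - 1)
for x in deviations:
    # Counting how many interior limits are <= x yields the index of the
    # bin with lower <= x < upper; the maximum lands in the last bin.
    counts[bisect_right(edges[1:-1], x)] += 1

n = len(deviations)
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(running / n)

print(counts)                            # absolute frequencies per bin
print([round(c, 3) for c in cumulative])  # cumulative relative frequencies
```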

12. C is correct. The marginal frequency of energy sector bonds in the portfolio is
the sum of the joint frequencies across all three levels of bond rating, so 100
+ 85 + 30 = 215. A is incorrect because 27 is the relative frequency for energy
sector bonds based on the total count of 806 bonds, so 215/806 = 26.7%, not the
marginal frequency. B is incorrect because 85 is the joint frequency for AA rated
energy sector bonds, not the marginal frequency.

13. A is correct. The relative frequency for any value in the table based on the total
count is calculated by dividing that value by the total count. Therefore, the
relative frequency for AA rated energy bonds is calculated as 85/806 = 10.5%.
B is incorrect because 31.5% is the relative frequency for AA rated energy bonds,
calculated based on the marginal frequency for all AA rated bonds, so 85/(32 +
25 + 85 + 100 + 28), not based on total bond counts. C is incorrect because 39.5%
is the relative frequency for AA rated energy bonds, calculated based on the
marginal frequency for all energy bonds, so 85/(100 + 85 + 30), not based on total
bond counts.
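
The joint, marginal, and relative frequencies used in Solutions 12 and 13 can be reproduced with a small Python sketch. The nested dictionary is only one convenient way to hold the contingency table; the names are illustrative.

```python
# Contingency table from Problems 12-13: bond counts by sector and rating.
table = {
    "Communication Services": {"A": 25, "AA": 32, "AAA": 27},
    "Consumer Staples":       {"A": 30, "AA": 25, "AAA": 25},
    "Energy":                 {"A": 100, "AA": 85, "AAA": 30},
    "Health Care":            {"A": 200, "AA": 100, "AAA": 63},
    "Utilities":              {"A": 22, "AA": 28, "AAA": 14},
}

total = sum(sum(row.values()) for row in table.values())
energy_marginal = sum(table["Energy"].values())          # row total
aa_marginal = sum(row["AA"] for row in table.values())   # column total
joint = table["Energy"]["AA"]                            # single cell

print(f"total count: {total}")
print(f"marginal frequency, Energy: {energy_marginal}")
print(f"relative to total count:   {joint / total:.1%}")
print(f"relative to AA column:     {joint / aa_marginal:.1%}")
print(f"relative to Energy row:    {joint / energy_marginal:.1%}")
```

The three percentages correspond to options A, B, and C of Problem 13; only the first uses the total count as the denominator.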

14. A is correct. Twenty observations lie in the interval “0.0 to 2.0,” and six
observations lie in the “2.0 to 4.0” interval. Together, they represent 26/48, or 54.17%, of
all observations, which is more than 50%.

15. A is correct. A bar chart that orders categories by frequency in descending order
and includes a line displaying cumulative relative frequency is called a Pareto
Chart. A Pareto Chart is used to highlight dominant categories or the most
important groups. B is incorrect because a grouped bar chart or clustered bar chart
is used to present the frequency distribution of two categorical variables. C is
incorrect because a frequency polygon is used to display frequency distributions.

16. C is correct. A word cloud, or tag cloud, is a visual device for representing
unstructured, textual data. It consists of words extracted from text with the size
of each word being proportional to the frequency with which it appears in the
given text. A is incorrect because a tree-map is a graphical tool for displaying
and comparing categorical data, not for visualizing unstructured, textual data. B
is incorrect because a scatter plot is used to visualize the joint variation in two
numerical variables, not for visualizing unstructured, textual data.

17. C is correct. A tree-map is a graphical tool used to display and compare
categorical data. It consists of a set of colored rectangles to represent distinct groups,
and the area of each rectangle is proportional to the value of the corresponding
group. A is incorrect because a line chart, not a tree-map, is used to display the
change in a data series over time. B is incorrect because a scatter plot, not a
tree-map, is used to visualize the joint variation in two numerical variables.

18. B is correct. An important benefit of a line chart is that it facilitates showing
changes in the data and underlying trends in a clear and concise way. Often a line
chart is used to display the changes in data series over time. A is incorrect
because a scatter plot, not a line chart, is used to visualize the joint variation in two
numerical variables. C is incorrect because a heat map, not a line chart, is used to
visualize the values of joint frequencies among categorical variables.

19. B is correct. A heat map is commonly used for visualizing the degree of
correlation between different variables. A is incorrect because a word cloud, or tag
cloud, not a heat map, is a visual device for representing textual data with the size
of each distinct word being proportional to the frequency with which it appears
in the given text. C is incorrect because a histogram, not a heat map, depicts the
shape, center, and spread of the distribution of numerical data.

20. B is correct. A bubble line chart is a version of a line chart where data points
are replaced with varying-sized bubbles to represent a third dimension of the
data. A line chart is very effective at visualizing trends in three or more variables
over time. A is incorrect because a heat map differentiates high values from low
values and reflects the correlation between variables but does not help in making
comparisons of variables over time. C is incorrect because a scatterplot matrix is
a useful tool for organizing scatterplots between pairs of variables, making it easy
to inspect all pairwise relationships in one combined visual. However, it does not
help in making comparisons of these variables over time.

21. C is correct. Because 50 data points are in the histogram, the median return
would be the mean of the 50/2 = 25th and (50 + 2)/2 = 26th positions. The sum of
the return bin frequencies to the left of the 13% to 18% interval is 24. As a result,
the 25th and 26th returns will fall in the 13% to 18% interval.

22. C is correct. The mode of a distribution with data grouped in intervals is the
interval with the highest frequency. The three intervals of 3% to 8%, 18% to 23%,
and 28% to 33% all have a high frequency of 7.

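23. C is correct. The median return of Portfolio R (3.0%) is 0.8 percentage points
higher than its arithmetic mean return (2.2%). A is incorrect because the median
for Portfolio P is 4.0%. B is incorrect because the median for Portfolio Q is 5.0%.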
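
A quick way to check all three options in Problem 23 is to compute each portfolio's median and mean directly. The sketch below assumes, as the problem states, that Portfolio Q has no Year 1 return because it was created in Year 2.

```python
from statistics import mean, median

# Annual returns (%) from portfolio creation to Year 5.
returns = {
    "P": [-3.0, 4.0, 5.0, 3.0, 7.0],
    "Q": [-3.0, 6.0, 4.0, 8.0],        # created in Year 2, so four returns
    "R": [1.0, -1.0, 4.0, 4.0, 3.0],
}

for name, values in returns.items():
    print(f"Portfolio {name}: median {median(values):.1f}%, "
          f"mean {mean(values):.1f}%")
```

With an even number of observations, as for Portfolio Q, the median is the average of the two middle values.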

24. C is correct. The portfolio return must be calculated as the weighted mean
return, where the weights are the allocations in each asset class:
(0.20 × 8%) + (0.40 × 12%) + (0.25 × −3%) + (0.15 × 4%) = 6.25%, or ≈ 6.3%.
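
The same weighted mean can be written as a short Python sketch; the list names are illustrative.

```python
# Allocation weights and Year X returns (%) from the exhibit in Problem 24,
# in the order: large-cap, small-cap, emerging market, high-yield.
weights = [0.20, 0.40, 0.25, 0.15]
asset_returns = [8.0, 12.0, -3.0, 4.0]

# Weighted mean: each return is scaled by its share of the portfolio.
portfolio_return = sum(w * r for w, r in zip(weights, asset_returns))
print(f"portfolio return: {portfolio_return:.2f}%")
```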

25. A is correct. The geometric mean return for Fund Y is found as follows:

Fund Y $= \left[(1 + 0.195) \times (1 - 0.019) \times (1 + 0.197) \times (1 + 0.350) \times (1 + 0.057)\right]^{1/5} - 1 = 14.9\%$.
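
For readers who want to verify the figure, here is a minimal Python sketch of the same calculation. It also prints the arithmetic mean, which matches distractor B (15.6%).

```python
import math

# Annual returns for Fund Y, Years 1-5, as decimals.
returns = [0.195, -0.019, 0.197, 0.350, 0.057]

# Compound the growth factors, then take the n-th root and subtract 1.
growth = math.prod(1 + r for r in returns)
geometric_mean = growth ** (1 / len(returns)) - 1

arithmetic_mean = sum(returns) / len(returns)

print(f"geometric mean:  {geometric_mean:.1%}")
print(f"arithmetic mean: {arithmetic_mean:.1%}")
```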

26. A is correct. The harmonic mean is appropriate for determining the average price
per unit. It is calculated by summing the reciprocals of the prices, then averaging
that sum by dividing by the number of prices, then taking the reciprocal of the
average:
4/[(1/62.00) + (1/76.00) + (1/84.00) + (1/90.00)] = €76.48.
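
One way to see why the harmonic mean applies is to compute the average cost directly: total euros spent divided by total units bought. The Python sketch below does both; the variable names are illustrative.

```python
from statistics import harmonic_mean

prices = [62.00, 76.00, 84.00, 90.00]   # purchase price per unit, Years 1-4
annual_investment = 5000.0               # same euro amount invested each year

# Units bought each year shrink as the price rises.
units = [annual_investment / p for p in prices]
average_cost = annual_investment * len(prices) / sum(units)

print(f"average cost per unit: {average_cost:.2f}")
print(f"harmonic mean of prices: {harmonic_mean(prices):.2f}")
```

Because the same amount is invested each period, the two results coincide; with equal unit purchases instead, the arithmetic mean would be the relevant average.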

27. B is correct. The geometric mean compounds the periodic returns of every
period, giving the investor a more accurate measure of the terminal value of an
investment.

28. B is correct. The sum of the returns is 30.0%, so the arithmetic mean is 30.0%/10
= 3.0%.

29. B is correct.

Year   Return   1 + Return
1       4.5%    1.045
2       6.0%    1.060
3       1.5%    1.015
4      −2.0%    0.980
5       0.0%    1.000
6       4.5%    1.045
7       3.5%    1.035
8       2.5%    1.025
9       5.5%    1.055
10      4.0%    1.040

The product of the (1 + Return) values is 1.3402338.

Therefore, $\bar{X}_G = \sqrt[10]{1.3402338} - 1 = 2.9717\%$.

30. A is correct.

Year   Return   1 + Return   1/(1 + Return)
1       4.5%    1.045        0.957
2       6.0%    1.060        0.943
3       1.5%    1.015        0.985
4      −2.0%    0.980        1.020
5       0.0%    1.000        1.000
6       4.5%    1.045        0.957
7       3.5%    1.035        0.966
8       2.5%    1.025        0.976
9       5.5%    1.055        0.948
10      4.0%    1.040        0.962
Sum                          9.714

The harmonic mean return = (n/Sum of reciprocals) − 1 = (10 / 9.714) − 1.

The harmonic mean return = 2.9442%.
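
Solutions 28 through 30 apply three different means to the same ten annual returns, so a side-by-side sketch may help. The variable names are assumptions. Note that the harmonic figure computed from unrounded reciprocals comes out near 2.943%, marginally below the 2.9442% obtained from the rounded sum of 9.714.

```python
import math

# The ten annual returns from the tables in solutions 29 and 30 (decimal form).
returns = [0.045, 0.060, 0.015, -0.020, 0.000,
           0.045, 0.035, 0.025, 0.055, 0.040]
n = len(returns)
growth = [1.0 + r for r in returns]   # the "1 + Return" column

arithmetic = sum(returns) / n
geometric = math.prod(growth) ** (1.0 / n) - 1.0
harmonic = n / sum(1.0 / g for g in growth) - 1.0

for label, value in [("Arithmetic", arithmetic),
                     ("Geometric", geometric),
                     ("Harmonic", harmonic)]:
    print(f"{label:<10} {value:.4%}")
```

For a series that is not constant, the three results are ordered arithmetic > geometric > harmonic, which matches the figures in the solutions.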

31. B is correct.

Year   Return   Deviation   Deviation Squared
1       4.5%     0.0150     0.000225
2       6.0%     0.0300     0.000900
3       1.5%    −0.0150     0.000225
4      −2.0%    −0.0500     0.002500
5       0.0%    −0.0300     0.000900
6       4.5%     0.0150     0.000225
7       3.5%     0.0050     0.000025
8       2.5%    −0.0050     0.000025
9       5.5%     0.0250     0.000625
10      4.0%     0.0100     0.000100
Sum              0.0000     0.005750

The standard deviation is the square root of the sum of the squared deviations
divided by n − 1:

$$s = \sqrt{\frac{0.005750}{9}} = 2.5276\%.$$
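
The table and formula in solution 31 can be reproduced with the sketch below, which uses the n − 1 divisor exactly as the solution does. The variable names are illustrative assumptions.

```python
import math

# The ten annual returns from solution 31 (decimal form).
returns = [0.045, 0.060, 0.015, -0.020, 0.000,
           0.045, 0.035, 0.025, 0.055, 0.040]
n = len(returns)

mean_return = sum(returns) / n
squared_deviations = [(r - mean_return) ** 2 for r in returns]

# Sample standard deviation: divide by n - 1, not n.
sample_std = math.sqrt(sum(squared_deviations) / (n - 1))
print(f"Sum of squared deviations: {sum(squared_deviations):.6f}")
print(f"Sample standard deviation: {sample_std:.4%}")
```

`statistics.stdev(returns)` from the standard library uses the same n − 1 divisor and should agree with `sample_std`.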

32. B is correct.

Year   Return   Deviation Squared below Target of 2%
1       4.5%
2       6.0%
3       1.5%    0.000025
4      −2.0%    0.001600
5       0.0%    0.000400
6       4.5%
7       3.5%
8       2.5%
9       5.5%
10      4.0%
Sum             0.002025

The target semi-deviation is the square root of the sum of the squared deviations
from the target, divided by n − 1:

$$s_{\text{Target}} = \sqrt{\frac{0.002025}{9}} = 1.5\%.$$
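
A minimal sketch of the target semi-deviation in solution 32 follows. Only returns at or below the 2% target contribute to the sum, while the divisor stays at n − 1 for the full sample. The variable names are assumptions.

```python
import math

# The ten annual returns from solution 32 (decimal form) and the 2% target.
returns = [0.045, 0.060, 0.015, -0.020, 0.000,
           0.045, 0.035, 0.025, 0.055, 0.040]
target = 0.02

# Keep squared deviations only for observations at or below the target.
squared_shortfalls = [(r - target) ** 2 for r in returns if r <= target]

# The divisor is n - 1 for the whole sample, not the count of shortfalls.
target_semideviation = math.sqrt(sum(squared_shortfalls) / (len(returns) - 1))
print(f"Target semi-deviation: {target_semideviation:.2%}")
```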

33. B is correct. The median is indicated within the box, which is the 100.49 in this
diagram.

34. C is correct. The interquartile range is the difference between 114.25 and 79.74,
which is 34.51.

35. B is correct. Quintiles divide a distribution into fifths, with the fourth quintile
occurring at the point at which 80% of the observations lie below it. The fourth
quintile is equivalent to the 80th percentile. To find the yth percentile (Py),
we first must determine its location. The formula for the location (Ly) of a yth
percentile in an array with n entries sorted in ascending order is Ly = (n + 1) ×
(y/100). In this case, n = 10 and y = 80%, so
L80 = (10 + 1) × (80/100) = 11 × 0.8 = 8.8.
With the data arranged in ascending order (−40.33%, −5.02%, 9.57%, 10.02%,
12.34%, 15.25%, 16.54%, 20.65%, 27.37%, and 30.79%), the 8.8th position would
be between the 8th and 9th entries, 20.65% and 27.37%, respectively. Using linear
interpolation, P80 = X8 + (Ly − 8) × (X9 − X8),

P80 = 20.65 + (8.8 − 8) × (27.37 − 20.65)
= 20.65 + (0.8 × 6.72) = 20.65 + 5.38
= 26.03%.
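
The location formula and linear interpolation in solution 35 can be sketched as a small function. The function name and the clamping at the ends of the array are assumptions added for illustration; the solution itself only covers the interior case.

```python
def percentile(sorted_values, y):
    """y-th percentile using the location L_y = (n + 1) * y / 100.

    Positions are 1-based, as in the solution. Locations that fall outside
    the array are clamped to the first or last value (an assumption here).
    """
    n = len(sorted_values)
    location = (n + 1) * y / 100.0
    lower = int(location)            # position of the entry just below L_y
    fraction = location - lower      # distance toward the next entry
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    x_low = sorted_values[lower - 1]
    x_high = sorted_values[lower]
    return x_low + fraction * (x_high - x_low)


# Returns in percent, sorted in ascending order as in the solution.
returns_pct = sorted([-40.33, -5.02, 9.57, 10.02, 12.34,
                      15.25, 16.54, 20.65, 27.37, 30.79])
p80 = percentile(returns_pct, 80)
print(f"Fourth quintile (80th percentile): {p80:.2f}%")
```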

36. A is correct. The formula for mean absolute deviation (MAD) is

$$\text{MAD} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}.$$

Column 1: Sum annual returns and divide by n to find the arithmetic mean ($\bar{X}$)
of 16.40%.
Column 2: Calculate the absolute value of the difference between each year’s
return and the mean from Column 1. Sum the results and divide by n to find the
MAD.
These calculations are shown in the following exhibit:

           Column 1          Column 2
Year       Return            $|X_i - \bar{X}|$
Year 6     30.79%            14.39%
Year 7     12.34%            4.06%
Year 8     −5.02%            21.42%
Year 9     16.54%            0.14%
Year 10    27.37%            10.97%

Sum:       82.02%            Sum: 50.98%
n:         5                 n: 5
$\bar{X}$: 16.40%            MAD: 10.20%

37. C is correct. The mean absolute deviation (MAD) of Fund ABC’s returns is
greater than the MAD of both of the other funds.

$$\text{MAD} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n},$$

where $\bar{X}$ is the arithmetic mean of the series.

MAD for Fund ABC =
$$\frac{|-20 - (-4)| + |23 - (-4)| + |-14 - (-4)| + |5 - (-4)| + |-14 - (-4)|}{5} = 14.4\%.$$

MAD for Fund XYZ =
$$\frac{|-33 - (-10.8)| + |-12 - (-10.8)| + |-12 - (-10.8)| + |-8 - (-10.8)| + |11 - (-10.8)|}{5} = 9.8\%.$$

MAD for Fund PQR =
$$\frac{|-14 - (-5)| + |-18 - (-5)| + |6 - (-5)| + |-2 - (-5)| + |3 - (-5)|}{5} = 8.8\%.$$

A and B are incorrect because the range and variance of the three funds are as
follows:

            Fund ABC   Fund XYZ   Fund PQR
Range       43%        44%        24%
Variance    317        243        110

The numbers shown for variance are understood to be in “percent squared” terms
so that when taking the square root, the result is standard deviation in percentage
terms. Alternatively, by expressing standard deviation and variance in decimal
form, one can avoid the issue of units. In decimal form, the variances for Fund
ABC, Fund XYZ, and Fund PQR are 0.0317, 0.0243, and 0.0110, respectively.
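
The MAD and range comparison in solution 37 can be checked with the sketch below. The return series are read off the MAD expressions in the solution, since the question data are not reproduced here; the helper names are hypothetical.

```python
# Annual returns in percent, inferred from the MAD expressions in solution 37.
funds = {
    "ABC": [-20, 23, -14, 5, -14],
    "XYZ": [-33, -12, -12, -8, 11],
    "PQR": [-14, -18, 6, -2, 3],
}


def mean_absolute_deviation(values):
    """Average absolute distance of each observation from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def value_range(values):
    """Difference between the largest and smallest observations."""
    return max(values) - min(values)


summary = {name: (mean_absolute_deviation(v), value_range(v))
           for name, v in funds.items()}

for name, (mad, rng) in summary.items():
    print(f"Fund {name}: MAD = {mad:.1f}%, range = {rng}%")

largest_mad = max(summary, key=lambda name: summary[name][0])
print(f"Largest MAD: Fund {largest_mad}")
```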

38. A is correct. The more disperse a distribution, the greater the difference between
the arithmetic mean and the geometric mean.

39. B is correct. The coefficient of variation (CV) is the ratio of the standard
deviation to the mean, where a higher CV implies greater risk per unit of return.

$$\text{CV}_{\text{UTIL}} = \frac{s}{\bar{X}} = \frac{1.23\%}{2.10\%} = 0.59.$$

$$\text{CV}_{\text{MATR}} = \frac{s}{\bar{X}} = \frac{1.35\%}{1.25\%} = 1.08.$$

$$\text{CV}_{\text{INDU}} = \frac{s}{\bar{X}} = \frac{1.52\%}{3.01\%} = 0.51.$$

40. B is correct. The coefficient of variation is the ratio of the standard deviation to
the arithmetic average, or $\sqrt{0.001723} / 0.09986 = 0.416$.
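
The coefficient of variation calculations in solutions 39 and 40 can be sketched together. The dictionary keys follow the subscripts used in solution 39, and all variable names are assumptions. Because the inputs are the rounded figures quoted in the solutions, the last displayed digit may differ slightly from the text.

```python
import math

# Solution 40: variance and arithmetic mean as quoted (decimal form).
variance = 0.001723
mean_return = 0.09986
cv_40 = math.sqrt(variance) / mean_return   # standard deviation / mean

# Solution 39: (standard deviation, mean) pairs in percent.
sectors = {
    "UTIL": (1.23, 2.10),
    "MATR": (1.35, 1.25),
    "INDU": (1.52, 3.01),
}
cv_39 = {name: s / mean for name, (s, mean) in sectors.items()}

# The highest CV marks the greatest risk per unit of return.
highest_cv = max(cv_39, key=cv_39.get)

print(f"Solution 40 CV: {cv_40:.3f}")
for name, cv in cv_39.items():
    print(f"CV_{name}: {cv:.3f}")
print(f"Highest CV: {highest_cv}")
```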

41. C is correct. The skewness is positive, so it is right-skewed (positively skewed).

42. C is correct. The excess kurtosis is positive, indicating that the distribution is
“fat-tailed”; therefore, there is more probability in the tails of the distribution
relative to the normal distribution.

43. B is correct. The distribution is thin-tailed relative to the normal distribution
because the excess kurtosis is less than zero.

44. B is correct. The correlation coefficient is positive, indicating that the two series
move together.

45. C is correct. Both outliers and spurious correlation are potential problems with
interpreting correlation coefficients.

46. C is correct. The correlation coefficient is positive because the covariation is
positive.

47. A is correct. The correlation coefficient is negative because the covariation is
negative.

48. C is correct. The correlation coefficient is positive because the covariance is
positive. The fact that one or both variables have a negative mean does not affect the
sign of the correlation coefficient.
