Machine Learning MCQ
Although Alef’s investment performance has been good, its Chief Investment Officer, Paul Moresanu, is contemplating a
change in the investment process aimed at achieving even better returns. After attending multiple workshops and being
approached by data vendors, Moresanu feels that data science should play a role in the way Alef selects its investments.
He has also noticed that much of Alef’s past outperformance is due to stocks that became takeover targets. After some
research and reflection, Moresanu writes the following email to Alef’s CEO.
I have been thinking about modernizing the way we select stock investments. Given that our past success has put Alef
Associates in an excellent financial position, now seems to be a good time to invest in our future. What I propose is that
we continue managing a portfolio of 100 global small-cap stocks but restructure our process to benefit from machine
learning (ML). Importantly, the new process will still allow a role for human insight, for example, in providing domain
knowledge. In addition, I think we should make a special effort to identify companies that are likely to be acquired.
Specifically, I suggest the following four steps, which would be repeated every quarter.
Step 1: We apply ML techniques to a model including fundamental and technical variables (features) to predict
next quarter’s return for each of the 100 stocks currently in our portfolio. Then, the 20 stocks with the lowest estimated
return are identified for replacement.
Step 2: We utilize ML techniques to divide our investable universe of about 10,000 stocks into 20 different groups,
based on a wide variety of the most relevant financial and non-financial characteristics. The idea is to prevent unintended
portfolio concentration by selecting stocks from each of these distinct groups.
Step 3: For each of the 20 different groups, we use labeled data to train a model that will predict the five stocks (in
any given group) that are most likely to become acquisition targets in the next one year.
Step 4: Our five experienced securities analysts are each assigned four of the groups, and then each analyst selects
their one best stock pick from each of their assigned groups. These 20 “high-conviction” stocks will be
added to our portfolio (in replacement of the 20 relatively underperforming stocks to be sold in Step 1).
Comment 1: The ML algorithms will require large amounts of data. We would first need to explore using free or
inexpensive historical datasets and then evaluate their usefulness for the ML-based stock selection processes before
deciding on using data that requires subscription.
Comment 2: As time passes, we expect to find additional ways to apply ML techniques to refine Alef’s investment
processes.
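The bookkeeping in the four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tickers and return forecasts are randomly generated placeholders, and the helper names `select_replacements` and `assign_groups` are hypothetical, not part of Moresanu's proposal. The sketch checks that the 20 lowest forecasts are flagged for sale and that five analysts covering four groups each yield 20 replacement picks.

```python
import random


def select_replacements(predicted_returns, n_replace=20):
    """Return the tickers with the n_replace lowest predicted returns."""
    ranked = sorted(predicted_returns, key=predicted_returns.get)
    return ranked[:n_replace]


def assign_groups(n_groups=20, n_analysts=5):
    """Split group ids evenly across analysts (four groups each here)."""
    per_analyst = n_groups // n_analysts
    return {
        analyst: list(range(analyst * per_analyst, (analyst + 1) * per_analyst))
        for analyst in range(n_analysts)
    }


rng = random.Random(0)
# Placeholder forecasts for the 100 current holdings (made-up numbers).
predicted = {f"STOCK_{i:03d}": rng.gauss(0.02, 0.05) for i in range(100)}

to_sell = select_replacements(predicted)  # Step 1: 20 lowest forecasts
assignments = assign_groups()             # Step 4: 4 groups per analyst
# One high-conviction pick per assigned group.
n_new_picks = sum(len(groups) for groups in assignments.values())
```

The number of new picks matches the number of stocks sold, so the portfolio stays at 100 names each quarter.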
1. The machine learning techniques appropriate for executing Step 1 are most
likely to be based on: *
regression
classification
clustering
K-Means Clustering
10,000, the eligible universe of small-cap stocks in which Alef can potentially invest.
20, the number of different groups (i.e. clusters) into which the eligible universe of small-cap
stocks will be divided.
5. The target variable for the labelled training data to be used in Step 3 is most
likely which one of the following? *
Statement I only.
Using regularization
Using a fitting curve to select a model with low bias error and high variance error.
Discarding CART and using the predictions of a Support Vector Machine (SVM) model instead.
Discarding CART and using the predictions of a K-Nearest Neighbor (KNN) model instead.
Combining the predictions of the CART model with the predictions of other models – such
as logistic regression, SVM, and KNN – via ensemble learning.
9. Regarding Comment #2, Moresanu has been thinking about the applications
of neural networks (NNs) and deep learning (DL) to investment management.
Which statement(s) best describe(s) the tasks for which NNs and DL are
well-suited?
Statement I: NNs and DL are well-suited for image and speech recognition, and
natural language processing.
Statement II: NNs and DL are well-suited for developing single variable
ordinary least squares regression models.
Statement III: NNs and DL are well-suited for modelling non-linearities and
complex interactions among many features. *
Statement II only.
He tasks his team with building a supervised machine learning (ML) model for making predictions based on the selected
input features which include company-specific and macroeconomic factors. One of the team members asks Myers to
describe the difference between supervised and unsupervised ML.
Myers moves on to state that ML algorithms have several advantages over structured statistical approaches when exploring
and analyzing the structure of large data sets. When asked to elaborate on these advantages, Myers states:
Myers decides to employ the classification and regression tree (CART) model for making the required prediction. One of
his reasons for employing the model is its ability to provide a visual guide for predictions, making it highly favorable for
communicating results and providing investment recommendations to clients. He also favors the model because of its
ability to handle complex, non-linear relationships.
His team proceeds to build the diversification prediction model by training it on a labeled dataset of 20 manufacturing
companies in the automobile sector.
After building the model, Myers’s subordinate voices his concern that CARTs can perfectly learn the training data and that
the possibility of this occurrence will need to be addressed by regularization.
A few weeks following its construction, Myers works on improving model accuracy by employing a combination of models
to make the predictions. His selected model is the random forest classifier. In explaining the model to his subordinates,
Myers makes the following statements:
Statement 1: The model represents a collection of a large number of decision trees trained via voting classifier
techniques.
Statement 2: A greater number of individual predictions can be generated with greater diversity by increasing the
number of input features used during training.
Statement 3: By incorporating the output of a collection of models, the random forest technique produces
classifications that have better noise to signal ratios than the individual classifiers.
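As an illustration of how the outputs of several classifiers can be combined into one prediction, the sketch below applies a simple majority vote to class labels. It is a minimal example under stated assumptions: the three prediction lists are made-up values standing in for individual decision trees, and the function names are hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common class label among the models' votes."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(model_outputs):
    """Combine per-model prediction lists (all the same length) by vote."""
    return [majority_vote(votes) for votes in zip(*model_outputs)]


# Made-up class predictions from three trees for five observations.
tree_a = [1, 0, 1, 1, 0]
tree_b = [1, 1, 0, 1, 0]
tree_c = [0, 0, 1, 1, 1]

combined = ensemble_predict([tree_a, tree_b, tree_c])
```

With an odd number of binary classifiers there are no ties, so each observation receives a single combined label.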
algorithms seek to discover structure within the data themselves without using a target
variable.
Advantage 1
Advantage 2
Advantage 3
12. Is Myers accurate with respect to the assumed advantages of the CART model?
*
Yes.
13. Which of the following regularization techniques will not be available to Myers
for the CART? *
Pruning
Dimension reduction
14. With respect to his description of random forests, Myers is most accurate
regarding: *
Statement 1.
Statement 2.
Statement 3.
Stone enhances the basic model further to rank the weakest performing stocks on a scale of 1 to 10 based on the likelihood
of corporate failure in light of the global recession. Stocks ranked as 10 are issued by corporations which are
highly likely to fail. The input variables constitute a variety of financial and non-financial factors.
Stone discusses the basic model with a colleague who points out that some of the chosen variables are highly correlated,
which will increase the probability of producing misleading conclusions concerning stock performance. Stone sits down
to determine how she can further modify the model.
Stone’s discussion with her colleague also features reinforcement learning (RL). Based on her preliminary understanding
of the technique, Stone makes the following statements:
Statement 1: The RL algorithm relies on direct labeled data for each observation in a similar vein to supervised
machine learning (ML).
Statement 2: Any subsequent learning in the algorithm occurs through trial and error.
Statement 3: RL algorithms focus on maximizing rewards over time taking into consideration the constraints of
the environment.
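To make the notions of trial and error and cumulative reward concrete, the sketch below runs an epsilon-greedy agent on a three-action problem. It is an illustrative toy under stated assumptions, not a method taken from the vignette: the reward probabilities, step count, and the function name `epsilon_greedy` are all hypothetical.

```python
import random


def epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn action values by trial and error on Bernoulli rewards."""
    rng = random.Random(seed)
    n_actions = len(true_probs)
    counts = [0] * n_actions     # times each action was tried
    values = [0.0] * n_actions   # running mean reward per action
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore at random
        else:
            # Exploit the action with the best current estimate.
            action = max(range(n_actions), key=values.__getitem__)
        reward = 1.0 if rng.random() < true_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the running mean for this action.
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward
    return counts, values, total_reward


# Assumed reward probabilities, unknown to the agent.
counts, values, total_reward = epsilon_greedy([0.1, 0.5, 0.9])
best_action = max(range(len(values)), key=values.__getitem__)
```

The agent only observes the reward that follows each action and gradually shifts its choices toward the action with the highest estimated payoff.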
15. Which type of technique is most suitable for Scott’s analysis in the basic
model? *
Deep learning
Supervised ML
Unsupervised ML
16. Which ML technique is suitable for grouping stock returns in the basic model?
*
17. An advantage of using SVM over the K-nearest neighbor (KNN) for predicting
corporate issuer failure is that: *
KNN is sensitive to the inclusion of correlated features while the SVM is not.
non-linear SVMs rely on a small number of features reducing model complexity compared to
the KNN.
18. In light of Stone’s discussion with her colleague, which of the following
techniques can be employed to reduce the presence of highly correlated
variables? *
Deep learning
Penalized regression
Dimension reduction
19. With respect to her discussion on RL, Stone is least accurate regarding: *
Statement 1.
Statement 2.
Statement 3.
To build the training set with 50 defined features, 300 randomly selected working-age clients will be asked a set of open-ended
questions by Lucia Fernandez, a market researcher. The resulting answers will include demographic data, information
about risk preferences, and other retirement details. A Jubilación analyst will assign each individual in the sample
to one of the five portfolios. Martin initially plans to perform machine learning analysis and use the model to assign new
clients to the appropriate portfolio based on their responses to the questions.
Fernandez brings a sample set of responses back to Martin for further discussion. She tells him that in the interview sessions,
many of the responses she has obtained are complex and subjective. For example, most individuals she interviews
are not clear about the concept of risk tolerance and provide comparisons or abstract concepts rather than specific
numbers or levels. In some cases, their fear of loss seems to increase at an increasing rate when some scenarios are presented.
Martin decides he will have to review these risk tolerance responses and use a model that groups them into risk
categories.
Fernandez delivers the completed set of interview data to Martin. After some preliminary analysis, Martin decides that he
is ready to develop the algorithm the chatbot will use to advise clients as to which of its five strategic investment portfolios
is best for meeting their retirement goals. Martin notes that the final dataset has 50 features, and he is concerned
that some of them are likely to be correlated, which may lead to model misstatement. He considers three methods to
address this issue:
20. Martin’s initial planned machine learning analysis is best described as a form
of: *
categorical learning.
supervised learning.
unsupervised learning.
21. If Martin were to use a k-nearest neighbor model, the value for k would
be closest to: *
50
300
22. The most appropriate model for Martin to use in analyzing the responses to
the risk tolerance questions is a: *
Method 1
Method 2
Method 3