
Machine Learning MCQ

Uploaded by

Thùy Chi Phạm

Review: Machine Learning

Respondent: 1 (Anonymous)
Time to complete: 10:16
Score: 96%

Alef Associates Case Scenario


Alef Associates manages a long-only fund specializing in global small-cap equities. Since its founding a decade ago, Alef
has maintained a portfolio of 100 stocks (out of an eligible universe of about 10,000 stocks). Some of these holdings are
the result of screening the universe for attractive stocks based on several ratios that use readily available market and
accounting data; others are the result of investment ideas generated by Alef’s professional staff of five securities
analysts and two portfolio managers.

Although Alef’s investment performance has been good, its Chief Investment Officer, Paul Moresanu, is contemplating a
change in the investment process aimed at achieving even better returns. After attending multiple workshops and being
approached by data vendors, Moresanu feels that data science should play a role in the way Alef selects its investments.
He has also noticed that much of Alef’s past outperformance is due to stocks that became takeover targets. After some
research and reflection, Moresanu writes the following email to Alef’s CEO.

Subject: Investment Process Reorganization

I have been thinking about modernizing the way we select stock investments. Given that our past success has put Alef
Associates in an excellent financial position, now seems to be a good time to invest in our future. What I propose is that
we continue managing a portfolio of 100 global small-cap stocks but restructure our process to benefit from machine
learning (ML). Importantly, the new process will still allow a role for human insight, for example, in providing domain
knowledge. In addition, I think we should make a special effort to identify companies that are likely to be acquired.
Specifically, I suggest the following four steps, which would be repeated every quarter.

Step 1: We apply ML techniques to a model including fundamental and technical variables (features) to predict
next quarter’s return for each of the 100 stocks currently in our portfolio. Then, the 20 stocks with the lowest
estimated return are identified for replacement.

Step 2: We utilize ML techniques to divide our investable universe of about 10,000 stocks into 20 different groups,
based on a wide variety of the most relevant financial and non-financial characteristics. The idea is to prevent
unintended portfolio concentration by selecting stocks from each of these distinct groups.

Step 3: For each of the 20 different groups, we use labeled data to train a model that will predict the five stocks (in
any given group) that are most likely to become acquisition targets within the next year.

Step 4: Our five experienced securities analysts are each assigned four of the groups, and then each analyst
selects their one best stock pick from each of their assigned groups. These 20 “high-conviction” stocks will be
added to our portfolio (in replacement of the 20 relatively underperforming stocks to be sold in Step 1).

A couple of additional comments related to the above:

Comment 1: The ML algorithms will require large amounts of data. We would first need to explore using free or
inexpensive historical datasets and then evaluate their usefulness for the ML-based stock selection processes
before deciding on using data that requires a subscription.

Comment 2: As time passes, we expect to find additional ways to apply ML techniques to refine Alef’s investment
processes.

What do you think?


Paul Moresanu
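
The bookkeeping behind Steps 1 and 4 can be sketched as below. This is only an illustration of the arithmetic in the email (20 stocks out, 5 analysts x 4 groups x 1 pick = 20 stocks in). The tickers, analyst names, group names, and predicted returns are all hypothetical placeholders, and no ML model is fitted here.

```python
# Hypothetical sketch of the quarterly rotation in Steps 1 and 4.
# Predicted returns are made-up placeholders; no real model or data is used.

def select_for_replacement(predicted_returns, n_replace=20):
    """Return the tickers with the lowest predicted next-quarter return."""
    ranked = sorted(predicted_returns, key=predicted_returns.get)
    return ranked[:n_replace]

def assign_groups(analysts, groups, per_analyst=4):
    """Give each analyst a disjoint block of groups."""
    return {
        analyst: groups[i * per_analyst:(i + 1) * per_analyst]
        for i, analyst in enumerate(analysts)
    }

# 100 hypothetical holdings with distinct made-up predicted returns (percent).
portfolio = {f"STOCK_{i:03d}": (i * 37 % 100) / 10 - 5 for i in range(100)}
to_sell = select_for_replacement(portfolio)

analysts = [f"analyst_{i}" for i in range(1, 6)]
groups = [f"group_{i}" for i in range(1, 21)]
assignment = assign_groups(analysts, groups)

# One pick per assigned group gives 5 * 4 = 20 new stocks.
n_new = sum(len(assigned) for assigned in assignment.values())
portfolio_size_after = len(portfolio) - len(to_sell) + n_new
print(len(to_sell), n_new, portfolio_size_after)
```

Under these assumptions the portfolio stays at 100 names after each quarterly rotation, as the email intends.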

Correct 0 / 0 pts
Auto-graded

1. The machine learning techniques appropriate for executing Step 1 are most
likely to be based on: *

regression

classification

clustering
Correct 0 / 0 pts
Auto-graded

2. Which of the following ML models would be least appropriate to avoid overfitting? *

Regression tree with pruning.

LASSO with lambda (λ) equal to 0.

LASSO with lambda (λ) between 0.5 and 1.

Correct 0 / 0 pts
Auto-graded

3. Which of the following machine learning techniques is most appropriate for executing Step 2? *

K-Means Clustering

Principal Components Analysis (PCA)

Classification and Regression Trees (CART)

Correct 0 / 0 pts
Auto-graded

4. The hyperparameter in the ML model to be used for accomplishing Step 2 is: *

100, the number of small-cap stocks in Alef’s portfolio.

10,000, the eligible universe of small-cap stocks in which Alef can potentially invest.

20, the number of different groups (i.e., clusters) into which the eligible universe of
small-cap stocks will be divided.

Correct 0 / 0 pts
Auto-graded

5. The target variable for the labelled training data to be used in Step 3 is most
likely which one of the following? *

A continuous target variable.

A categorical target variable.

An ordinal target variable.


Correct 0 / 0 pts
Auto-graded

6. Comparing two ML models that could be used to accomplish Step 3, which statement(s) best describe(s) the
advantages of using Classification and Regression Trees (CART) instead of K-Nearest Neighbor (KNN)?

Statement I: For CART there is no requirement to specify an initial hyperparameter (like K).
Statement II: For CART there is no requirement to specify a similarity (or distance) measure.
Statement III: For CART the output provides a visual explanation for the prediction. *

Statement I only.

Statement III only.

Statements I, II and III.

Correct 0 / 0 pts
Auto-graded

7. Assuming a Classification and Regression Tree (CART) model is used to accomplish Step 3, which of the
following is most likely to result in model overfitting? *

Using regularization

Including an overfitting penalty (i.e., regularization term).

Using a fitting curve to select a model with low bias error and high variance error.

Correct 0 / 0 pts
Auto-graded

8. Assuming a Classification and Regression Tree (CART) model is initially used to accomplish Step 3, as a
further step which of the following techniques is most likely to result in more accurate predictions? *

Discarding CART and using the predictions of a Support Vector Machine (SVM) model instead.

Discarding CART and using the predictions of a K-Nearest Neighbor (KNN) model instead.

Combining the predictions of the CART model with the predictions of other models – such
as logistic regression, SVM, and KNN – via ensemble learning.
Correct 0 / 0 pts
Auto-graded

9. Regarding Comment #2, Moresanu has been thinking about the applications
of neural networks (NNs) and deep learning (DL) to investment management.
Which statement(s) best describe(s) the tasks for which NNs and DL are well-
suited?

Statement I: NNs and DL are well-suited for image and speech recognition, and
natural language processing.
Statement II: NNs and DL are well-suited for developing single variable
ordinary least squares regression models.
Statement III: NNs and DL are well-suited for modelling non-linearities and
complex interactions among many features. *

Statement II only.

Statements I and III.

Statements I, II and III.

Oxi-Naught Case Scenario


Sigmund Myers is an analyst at Oxi-Naught, a quantitative sell-side research firm specializing in equities research. Myers
is attempting to forecast whether selected companies under evaluation will enter into manufacturing diversification
based on incentives provided by the government. His research extends to all manufacturing company stocks in the auto
industry.

He tasks his team with building a supervised machine learning (ML) model for making predictions based on the selected
input features which include company-specific and macroeconomic factors. One of the team members asks Myers to
describe the difference between supervised and unsupervised ML.

Myers moves on to state that ML algorithms have several advantages over structured statistical approaches when
exploring and analyzing the structure of large data sets. When asked to elaborate on these advantages, Myers states:

Advantage 1: Less susceptible to overfitting problems.
Advantage 2: Can capture non-linear relationships and recognize and predict structural changes between the features
and target.
Advantage 3: Capable of processing massive amounts of data rapidly.

Myers decides to employ the classification and regression tree (CART) model for making the required prediction. One of
his reasons for employing the model is its ability to provide a visual guide for predictions, which makes it well suited
to communicating results and providing investment recommendations to clients. He also favors the model because of its
ability to handle complex, non-linear relationships.

His team proceeds to build the diversification prediction model by training it on a labeled dataset of 20 manufacturing
companies in the automobile sector.

After building the model, Myers’s subordinate voices his concern that CARTs can perfectly learn the training data and that
the possibility of this occurrence will need to be addressed by regularization.

A few weeks following its construction, Myers works on improving model accuracy by employing a combination of
models to make the predictions. His selected model is the random forest classifier. In explaining the model to his
subordinates, Myers makes the following statements:

Statement 1: The model represents a collection of a large number of decision trees trained via voting classifier
techniques.
Statement 2: A greater number of individual predictions can be generated with greater diversity by increasing the
number of input features used during training.
Statement 3: By incorporating the output of a collection of models, the random forest technique produces
classifications that have better noise to signal ratios than the individual classifiers.
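
The idea of combining many models' outputs can be illustrated with a small simulation of majority voting. This is a sketch under a strong, stated assumption: each hypothetical classifier is right 60% of the time and its errors are independent of the others. Trees in a real random forest are trained on overlapping data and are correlated, so the gain in practice is smaller. The function name and all parameter values are illustrative only.

```python
import random

def simulate_majority_vote(n_models=101, p_correct=0.6, n_cases=2000, seed=0):
    """Compare one classifier's accuracy with a majority vote of many.

    Assumes every classifier is correct with probability p_correct and that
    errors are independent across classifiers (an idealization).
    """
    rng = random.Random(seed)
    single_hits = 0
    ensemble_hits = 0
    for _ in range(n_cases):
        # True means "this classifier labeled the case correctly".
        votes = [rng.random() < p_correct for _ in range(n_models)]
        single_hits += votes[0]
        ensemble_hits += sum(votes) > n_models / 2
    return single_hits / n_cases, ensemble_hits / n_cases

single_acc, ensemble_acc = simulate_majority_vote()
print(f"single: {single_acc:.3f}  majority vote: {ensemble_acc:.3f}")
```

Under the independence assumption the vote is noticeably more accurate than any single classifier, because individual errors tend to cancel out.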

Correct 0 / 0 pts
Auto-graded

10. Myers’ best response to the subordinate’s question is that, relative to unsupervised ML, supervised ML: *

makes use of labeled data.

focuses on organizing observations into groups known as clusters.

algorithms seek to discover structure within the data themselves without using a target
variable.
Correct 0 / 0 pts
Auto-graded

11. With respect to the relative advantages of ML models over structured statistical approaches, Myers is
least accurate regarding: *

Advantage 1

Advantage 2

Advantage 3

Correct 0 / 0 pts
Auto-graded

12. Is Myers accurate with respect to the assumed advantages of the CART model? *

Yes.

No, with respect to the model generating a visual guide.

No, with respect to the model’s ability to handle non-linear data.

Correct 0 / 0 pts
Auto-graded

13. Which of the following regularization techniques will not be available to Myers
for the CART? *

Pruning

Dimension reduction

Adding the maximum depth of the tree as a parameter

Correct 0 / 0 pts
Auto-graded

14. With respect to his description of random forests, Myers is most accurate
regarding: *

Statement 1.

Statement 2.

Statement 3.

Stone Asset Management (SAM) Case Scenario


Lisa Scott is a quantitative analyst at Coopers Financials. Scott has been hired by Stone Asset Management (SAM) to
provide investment advice with respect to its global equity fund. Scott evaluates the fund’s holdings in an attempt to
determine the impact of a global recession on stock returns. She builds a basic model which will collect returns projected
by top analysts that incorporate key input factors related to the global economy and local (country) environment. Based
on these returns, the model will group the stocks which are most similar to each other based on the issuer’s operating
and financial characteristics.

Scott enhances the basic model further to rank the weakest performing stocks on a scale of 1 to 10 based on the
likelihood of corporate failure in light of the global recession. Stocks ranked as 10 are issued by corporations which are
highly likely to fail. The input variables constitute a variety of financial and non-financial factors.

Scott discusses the basic model with a colleague, who points out that some of the chosen variables are highly correlated,
which will increase the probability of producing misleading conclusions concerning stock performance. Scott sits down
to determine how she can further modify the model.
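
A simple first check for the colleague's concern is to compute pairwise correlations between input variables and flag the pairs above a cutoff. The sketch below does this with the Pearson correlation coefficient. The feature names, the six data points per feature, and the 0.9 threshold are all hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def highly_correlated_pairs(features, threshold=0.9):
    """List feature pairs whose absolute correlation meets the threshold."""
    names = sorted(features)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(features[first], features[second])
            if abs(r) >= threshold:
                pairs.append((first, second, round(r, 3)))
    return pairs

# Hypothetical input variables for six issuers.
features = {
    "debt_to_equity": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
    "debt_to_assets": [0.30, 0.48, 0.62, 0.66, 0.72, 0.74],
    "sales_growth": [0.08, -0.02, 0.05, 0.01, 0.07, -0.03],
}
flagged = highly_correlated_pairs(features)
print(flagged)
```

In this made-up sample only the two leverage ratios are flagged, which is the kind of overlap between variables the colleague is warning about.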

Scott’s discussion with her colleague also features reinforcement learning (RL). Based on her preliminary understanding
of the technique, Scott makes the following statements:

Statement 1: The RL algorithm relies on direct labeled data for each observation in a similar vein to supervised
machine learning (ML).

Statement 2: Any subsequent learning in the algorithm occurs through trial and error.

Statement 3: RL algorithms focus on maximizing rewards over time taking into consideration the constraints of
the environment.
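
For readers unfamiliar with RL, a very small example is an epsilon-greedy agent choosing among a few actions with unknown payoff rates. The sketch below is a generic textbook-style illustration, not a model from the scenario. The three reward probabilities, the step count, and the exploration rate are hypothetical. The agent is only told the reward of the action it just took and adjusts its estimates from that feedback.

```python
import random

def run_bandit(reward_probs, n_steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit with 0/1 rewards."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms        # times each action was tried
    estimates = [0.0] * n_arms   # running average reward per action
    total_reward = 0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            arm = max(range(n_arms), key=estimates.__getitem__)  # exploit
        reward = 1 if rng.random() < reward_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the average reward observed for this action.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

estimates, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(3), key=estimates.__getitem__)
print(best_arm, counts, total_reward)
```

The agent ends up choosing the highest-paying action most of the time, having learned its value purely from the rewards it received along the way.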

Correct 0 / 0 pts
Auto-graded

15. Which type of technique is most suitable for Scott’s analysis in the basic
model? *

Deep learning

Supervised ML

Unsupervised ML

Correct 0 / 0 pts
Auto-graded

16. Which ML technique is suitable for grouping stock returns in the basic model? *

Hierarchical clustering [unknown number of categories]

Support vector machine (SVM) [supervised machine learning]

Principal component analysis (PCA) [dimension reduction]


Correct 0 / 0 pts
Auto-graded

17. An advantage of using SVM over the K-nearest neighbor (KNN) for predicting
corporate issuer failure is that: *

KNN is used for regression while SVM is used for classification

KNN is sensitive to the inclusion of correlated features while the SVM is not.

non-linear SVMs rely on a small number of features reducing model complexity compared to
the KNN.

Correct 0 / 0 pts
Auto-graded

18. In light of Scott’s discussion with her colleague, which of the following
techniques can be employed to reduce the presence of highly correlated
variables? *

Deep learning

Penalized regression

Dimension reduction

Correct 0 / 0 pts
Auto-graded

19. With respect to her discussion on RL, Scott is least accurate regarding: *

Statement 1.

Statement 2.

Statement 3.

Jubilación S.L., Case Scenario


Carlos Martin, a recent graduate of the financial engineering program at a well-known university, has just been hired by
Jubilación S.L., a Madrid-based firm that specializes in retirement planning. He has been asked to develop a machine
learning (ML) tool to help assign each client to one of the firm’s five strategic investment portfolios.

To build the training set with 50 defined features, 300 randomly selected working-age clients will be asked a set of
open-ended questions by Lucia Fernandez, a market researcher. The resulting answers will include demographic data,
information about risk preferences, and other retirement details. A Jubilación analyst will assign each individual in the
sample to one of the five portfolios. Martin initially plans to perform machine learning analysis and use the model to
assign new clients to the appropriate portfolio based on their responses to the questions.
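
The assignment task described above can be illustrated with a k-nearest neighbor classifier, one of the models the questions below refer to. This is a minimal sketch under loud assumptions: the two numeric features, the six labeled clients, the two portfolio names (the scenario has five), and k = 3 are all hypothetical, and the choice of k here says nothing about the right value for Martin's data.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    nearest_labels = [train_y[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical labeled clients: (age scaled to 0-1, risk appetite scaled to 0-1).
# In the scenario, the labels would come from the analyst's portfolio assignments.
train_X = [(0.20, 0.90), (0.25, 0.80), (0.30, 0.85),
           (0.70, 0.20), (0.80, 0.10), (0.75, 0.30)]
train_y = ["growth", "growth", "growth", "income", "income", "income"]

new_client = (0.28, 0.75)
predicted = knn_predict(train_X, train_y, new_client)
print(predicted)
```

Because the prediction depends on a distance measure, features on very different scales would need to be rescaled first, which is why both hypothetical features are already on a 0 to 1 range.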

Fernandez brings a sample set of responses back to Martin for further discussion. She tells him that in the interview
sessions, many of the responses she has obtained are complex and subjective. For example, most individuals she interviews
are not clear about the concept of risk tolerance and provide comparisons or abstract concepts rather than specific
numbers or levels. In some cases, their fear of loss seems to increase at an increasing rate when some scenarios are
presented. Martin decides he will have to review these risk tolerance responses and use a model that groups them into risk
categories.

Fernandez delivers the completed set of interview data to Martin. After some preliminary analysis, Martin decides that he
is ready to develop the algorithm the chatbot will use to advise clients as to which of its five strategic investment
portfolios is best for meeting their retirement goals. Martin notes that the final dataset has 50 features, and he is
concerned that some of them are likely to be correlated, which may lead to model misstatement. He considers three methods
to address this issue:

1. Combine variables using the ensemble model.
2. Use the bootstrap aggregating (bagging) method.
3. Employ principal components analysis.
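
To show what principal components analysis does with correlated features, the sketch below works through the two-feature case by hand. It builds the 2x2 covariance matrix, computes its eigenvalues in closed form, and reports the share of total variance captured by the first component. The two feature series are hypothetical and are assumed to be on comparable scales, so no standardization step is included.

```python
from math import sqrt

def first_component_share(x, y):
    """Share of total variance explained by the first principal component
    of two features, using the closed-form eigenvalues of a 2x2 matrix."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / (n - 1)
    var_y = sum((v - mean_y) ** 2 for v in y) / (n - 1)
    cov_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y)) / (n - 1)
    # Eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]].
    mid = (var_x + var_y) / 2
    half_gap = sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    largest, smallest = mid + half_gap, mid - half_gap
    return largest / (largest + smallest)

# Two hypothetical, strongly correlated features.
feature_a = [1, 2, 3, 4, 5, 6]
feature_b = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
share = first_component_share(feature_a, feature_b)
print(f"first component explains {share:.1%} of the variance")
```

When two features move together this closely, a single component carries almost all of their combined variance, so the pair can be replaced by one uncorrelated composite variable with little loss of information.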

Correct 0 / 0 pts
Auto-graded

20. Martin’s initial planned machine learning analysis is best described as a form
of: *

categorical learning.

supervised learning.

unsupervised learning.

Correct 0 / 0 pts
Auto-graded

21. If Martin were to use a k-nearest neighbor model, the value for k would
be closest to: *

50

300
Incorrect 0 / 0 pts
Auto-graded

22. The most appropriate model for Martin to use in analyzing the responses to
the risk tolerance questions is a: *

neural network (NN) model.

support vector machine (SVM) model.

least absolute shrinkage and selection operator (LASSO) model.

Correct 0 / 0 pts
Auto-graded

23. Which of the methods Martin considers to address potential feature correlation is the most suitable? *

Method 1

Method 2

Method 3
