Reading 4 Big Data Projects - Answers

Freja Karlsson is a bond analyst who developed a machine learning model to predict bond defaults. She evaluated the model's performance using metrics including precision (91%), recall (93%), F1 score (92%), and accuracy (89%). These metrics are calculated based on the model's confusion matrix, which shows it correctly predicted 307 bond defaults and 113 non-defaults out of 474 total bonds. Karlsson is using these metrics to understand how well the model identifies true positives and avoids false negatives, to improve its ability to predict bond defaults.

Uploaded by

tristan.riols


Question #1 of 11 Question ID: 1472233

Which of the following uses of data is most accurately described as curation?

A) An investor creates a word cloud from financial analysts’ recent research
   reports about a company.
B) An analyst adjusts daily stock index data from two countries for their different
   market holidays.
C) A data technician accesses an offsite archive to retrieve data that has been
   stored there.

Explanation

Curation is ensuring the quality of data, for example by adjusting for bad or missing data.
Word clouds are a visualization technique. Moving data from a storage medium to where
they are needed is referred to as transfer.

(Module 4.1, LOS 4.a)
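
As an illustration of the curation in choice B, here is a minimal sketch of one possible holiday adjustment: keeping only the dates on which both markets traded. The dates, index levels, and the function name `align_on_common_dates` are hypothetical and are not taken from the question; other adjustments, such as carrying the last close forward, would be equally valid.

```python
# Hypothetical closing levels keyed by ISO date.
# A market holiday simply has no entry for that date.
index_a = {"2024-05-01": 100.0, "2024-05-02": 101.5, "2024-05-03": 102.0}
index_b = {"2024-05-02": 250.0, "2024-05-03": 248.0, "2024-05-06": 251.0}


def align_on_common_dates(a, b):
    """Return (date, level_a, level_b) rows for dates present in both series."""
    common = sorted(a.keys() & b.keys())
    return [(d, a[d], b[d]) for d in common]


aligned = align_on_common_dates(index_a, index_b)
for row in aligned:
    print(row)
```

Only the two dates on which both hypothetical markets were open survive the alignment, so daily returns computed from the result compare like with like.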

Freja Karlsson is a bond analyst with Storbank AB. Over the past several months, Karlsson
has been working to develop her own machine learning (ML) model that she plans to use to
predict default of the various bonds that she covers. The inputs to the model are various
pieces of financial data that Karlsson has compiled from multiple sources.

After Karlsson has constructed the model using her knowledge of appropriate variables,
Karlsson runs the model on the training set. Each firm's bonds are classified as
predicted-to-default or predicted-not-to-default. When Karlsson's model predicts that a bond will default
and the bond actually defaults, Karlsson considers this to be a true positive. Karlsson then
evaluates the performance of her model using error analysis. The confusion matrix that
results is shown in Exhibit 1.

Exhibit 1: Confusion Matrix (N = 474)

                                      Actual Bond Status
                                  Bond Default    No Default
Model Prediction   Bond Default       307             31
                   No Default          23            113

Question #2 - 5 of 11 Question ID: 1472240

Based on Exhibit 1, Karlsson's model's precision is closest to:

A) 81%.
B) 91%.
C) 71%.

Explanation

Precision, the ratio of correctly predicted positive classes (true positives) to all predicted
positive classes, is calculated as:

Precision (P) = TP / (TP + FP) = 307 / (307 + 31) = 0.9083 (91%)

In the context of this default classification, high precision would help us avoid the situation
where a bond is incorrectly predicted to default when it actually is not going to default.

(Module 4.3, LOS 4.c)
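
The precision arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name `precision` is chosen here for illustration, and the counts come from Exhibit 1.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that were actually positive."""
    return tp / (tp + fp)


# Exhibit 1: 307 predicted defaults that defaulted (TP),
# 31 predicted defaults that did not default (FP).
p = precision(tp=307, fp=31)
print(f"Precision = {p:.4f}")  # Precision = 0.9083
```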

Question #3 - 5 of 11 Question ID: 1472241

Karlsson is especially concerned about the possibility that her model may indicate that a
bond will not default, but then the bond actually defaults. Karlsson decides to use the
model's recall to evaluate this possibility. Based on the data in Exhibit 1, the model's recall is
closest to:

A) 93%.
B) 83%.
C) 73%.

Explanation

Recall (R) = TP / (TP + FN) = 307 / (307 + 23) = 0.9303 (93%)

Recall is useful when the cost of a false negative is high, such as when we predict that a
bond will not default but it actually will. In cases like this, high recall indicates that false
negatives will be minimized.

(Module 4.3, LOS 4.c)
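
A short sketch of the same calculation follows, together with the share of actual defaults the model misses (1 minus recall), which is the quantity Karlsson is worried about. The names `recall` and `miss_rate` are illustrative only; the counts come from Exhibit 1.

```python
def recall(tp: int, fn: int) -> float:
    """Share of actual positives that the model predicted as positive."""
    return tp / (tp + fn)


# Exhibit 1: 307 defaults caught (TP), 23 defaults predicted as no-default (FN).
r = recall(tp=307, fn=23)
miss_rate = 1 - r  # fraction of actual defaults the model failed to flag

print(f"Recall    = {r:.4f}")          # Recall    = 0.9303
print(f"Miss rate = {miss_rate:.4f}")  # Miss rate = 0.0697
```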

Question #4 - 5 of 11 Question ID: 1472242

Karlsson would like to gain a sense of her model's overall performance. In her research,
Karlsson learns about the F1 score, which she hopes will provide a useful measure. Based on
Exhibit 1, Karlsson's model's F1 score is closest to:

A) 82%.
B) 92%.
C) 72%.

Explanation

The model's F1 score, which is the harmonic mean of precision and recall, is calculated as:

F1 score = (2 × P × R) / (P + R) = (2 × 0.9083 × 0.9303) / (0.9083 + 0.9303) = 0.9192 (92%)

Like accuracy, the F1 score is an overall performance measure that gives equal weight to
FP and FN.

(Module 4.3, LOS 4.c)
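
The sketch below computes F1 two ways: as the harmonic mean of precision and recall, and directly from the counts using the algebraically equivalent form 2TP / (2TP + FP + FN). The second form makes the equal weighting of FP and FN visible, since swapping the two counts leaves the result unchanged. The function name `f1_from_counts` is an illustrative choice, not something from the reading.

```python
import math
from statistics import harmonic_mean


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 written directly in terms of confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)


tp, fp, fn = 307, 31, 23  # counts from Exhibit 1
p = tp / (tp + fp)        # precision
r = tp / (tp + fn)        # recall

f1 = harmonic_mean([p, r])
assert math.isclose(f1, f1_from_counts(tp, fp, fn))

# Swapping FP and FN does not change F1: both errors carry equal weight.
assert math.isclose(f1_from_counts(tp, fp, fn), f1_from_counts(tp, fn, fp))

print(f"F1 = {f1:.4f}")  # F1 = 0.9192
```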

Question #5 - 5 of 11 Question ID: 1472243

Karlsson also learns about accuracy as a measure of model performance. Based on Exhibit 1,
Karlsson's model's accuracy metric is closest to:

A) 79%.
B) 89%.
C) 69%.

Explanation

The model's accuracy is the percentage of correctly predicted classes out of total
predictions. Model accuracy is calculated as:

Accuracy = (TP + TN) / (TP + FP + TN + FN) = (TP + TN) / N
= (307 + 113) / (307 + 31 + 113 + 23) = (307 + 113) / 474
= 0.8861 (89%)

(Module 4.3, LOS 4.c)
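
To show where the four cells come from, the sketch below rebuilds label lists that reproduce Exhibit 1, tallies the confusion matrix from them, and then computes accuracy. The 1/0 encoding (1 = default, 0 = no default) and the helper name `confusion_counts` are assumptions made for illustration.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Tally TP, FP, FN, TN from parallel lists of 1/0 labels (1 = default)."""
    counts = Counter(zip(predicted, actual))
    return {
        "TP": counts[(1, 1)],  # predicted default, actually defaulted
        "FP": counts[(1, 0)],  # predicted default, did not default
        "FN": counts[(0, 1)],  # predicted no default, actually defaulted
        "TN": counts[(0, 0)],  # predicted no default, did not default
    }


# (predicted, actual) pairs constructed to match the cells of Exhibit 1.
pairs = [(1, 1)] * 307 + [(1, 0)] * 31 + [(0, 1)] * 23 + [(0, 0)] * 113
predicted = [p for p, _ in pairs]
actual = [a for _, a in pairs]

cm = confusion_counts(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)

print(cm)
print(f"Accuracy = {accuracy:.4f}")  # Accuracy = 0.8861
```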

Question #6 of 11 Question ID: 1472234

Big data is most likely to suffer from low:

A) veracity.
B) velocity.
C) variety.

Explanation

Big data is defined as data with high volume, velocity, and variety. Big data often suffers
from low veracity, because it can contain a high percentage of meaningless data.

(Module 4.1, LOS 4.a)

Question #7 of 11 Question ID: 1472236

In big data projects, data exploration is least likely to encompass:

A) feature design.
B) feature engineering.
C) feature selection.

Explanation

Data exploration encompasses exploratory data analysis, feature selection, and feature
engineering.

(Module 4.2, LOS 4.d)

Question #8 of 11 Question ID: 1472237

Under which of these conditions is a machine learning model said to be underfit?

A) The input data are not labelled.

B) The model treats true parameters as noise.
C) The model identifies spurious relationships.

Explanation
Underfitting describes a machine learning model that is not complex enough to describe
the data it is meant to analyze. An underfit model treats true parameters as noise and fails
to identify the actual patterns and relationships. A model that is overfit (too complex) will
tend to identify spurious relationships in the data. Labelling of input data is related to the
use of supervised or unsupervised machine learning techniques.

(Module 4.3, LOS 4.f)
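
A toy illustration of underfitting, under assumptions chosen purely for this example: the data follow y = x² exactly, and a straight line is fitted by ordinary least squares. The line is too simple for the data, so the real curvature ends up in the residuals as if it were noise, even on the training data.

```python
# Hypothetical training data with a purely quadratic relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


slope, intercept = fit_line(xs, ys)
linear_mse = mse([slope * x + intercept for x in xs], ys)
quadratic_mse = mse([x * x for x in xs], ys)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"linear MSE={linear_mse:.2f}, quadratic MSE={quadratic_mse:.2f}")
```

The fitted line is flat (slope of zero) and leaves a large training error, while a model with a squared term reproduces the data exactly.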

Question #9 of 11 Question ID: 1472235

The process of splitting a given text into separate words is best characterized as:

A) stemming.
B) tokenization.
C) bag-of-words.

Explanation

Text is considered to be a collection of tokens, where a token is equivalent to a word.
Tokenization is the process of splitting a given text into separate tokens. Bag-of-words
(BOW) is a collection of a distinct set of tokens from all the texts in a sample dataset.
Stemming is the process of converting inflected word forms into a base word.

(Module 4.1, LOS 4.g)
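
The three terms can be contrasted in a short sketch. The sample sentences are hypothetical, lowercasing before tokenizing is an added normalization assumption, and `toy_stem` is a deliberately naive suffix stripper written for this example, not a real stemming algorithm such as Porter's.

```python
import re

# Hypothetical sample texts.
texts = [
    "The issuer defaulted on its bonds",
    "Analysts expect more defaults",
]


def tokenize(text):
    """Tokenization: split a text into separate lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def toy_stem(word):
    """Naive stemming: strip a few common suffixes (illustration only)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


tokens_per_text = [tokenize(t) for t in texts]

# Bag-of-words: the distinct set of tokens across all texts in the sample.
bag_of_words = sorted({tok for toks in tokens_per_text for tok in toks})

# After stemming, "defaulted" and "defaults" collapse to one base word.
stemmed_bow = sorted({toy_stem(tok) for tok in bag_of_words})

print(tokens_per_text[0])
print(bag_of_words)
print(stemmed_bow)
```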

Question #10 of 11 Question ID: 1472232

An executive describes her company's "low latency, multiple terabyte" requirements for
managing Big Data. To which characteristics of Big Data is the executive referring?

A) Volume and variety.

B) Volume and velocity.
C) Velocity and variety.

Explanation

Big Data may be characterized by its volume (the amount of data available), velocity (the
speed at which data are communicated), and variety (degrees of structure in which data
exist). "Terabyte" is a measure of volume. "Latency" refers to velocity.

(Module 4.1, LOS 4.a)

Question #11 of 11 Question ID: 1472238

When evaluating the fit of a machine learning algorithm, it is most accurate to state that:

A) accuracy is the ratio of correctly predicted positive classes to all predicted
   positive classes.
B) recall is the ratio of correctly predicted positive classes to all actual positive
   classes.
C) precision is the percentage of correctly predicted classes out of total
   predictions.

Explanation

Recall (also called sensitivity) is the ratio of correctly predicted positive classes to all actual
positive classes. Precision is the ratio of correctly predicted positive classes to all predicted
positive classes. Accuracy is the percentage of correctly predicted classes out of total
predictions.

(Module 4.3, LOS 4.c)
