Machine learning Assignment

The document outlines an exam for valuing machine learning's potential in real-time fraud prediction for a bank's credit card transactions. It details specific tasks, including identifying data leakage, ethical considerations, feature engineering, and modeling procedures, while also emphasizing the importance of a structured report. Deliverables are divided into two sections, with the first focusing on specific questions and the second on a comprehensive report of findings and methodologies.

Uploaded by

Sonal Katiyar

EXAM DESCRIPTION

You work for a bank that issues credit cards. The bank has asked you to value machine learning’s potential use
for real-time fraud prediction. You have received some sample data and a data dictionary, all available in this
folder:

https://drive.google.com/drive/folders/18wSh3FxhwVrDA9EIzNtmBDAmVPN9UA8O?usp=sharing

Your job is as follows. Use the provided data and data dictionary to build a machine learning model predicting
credit card fraud. Estimate the annual profitability of deploying your ML model.

For purposes of the exam, assume:

• Transactions are in dollars
• These are transactions for two years (and those years are representative of all other years)
• The bank receives 2% of every completed transaction as a processing fee
• The bank is 100% liable for any fraudulent transaction (i.e., the bank has to cover fraud losses)
• 10% of false positives will call, speak with customer service, and complete their transaction
  o These calls take an average of 15 minutes
• Customer service personnel make 30 dollars per hour
• There is no customer goodwill cost or benefit associated with predicting fraud
• There are no costs, beyond those mentioned above, for predicted frauds (i.e., no further auditing,
reporting, or other potential costs for transactions predicted/found to be fraudulent). There are no
benefits, beyond those mentioned above, for the bank in its credit card business.
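
The assumptions above can be turned into a per-transaction payoff table. The following is a minimal sketch,
not a prescribed solution. It assumes the intervention is to decline every transaction the model flags, that
a declined legitimate transaction earns a fee only if the cardholder calls and completes it, and that the
bank earns no fee on an approved fraudulent transaction. The function and constant names are hypothetical.

```python
FEE_RATE = 0.02        # processing fee on every completed transaction
CALL_RATE = 0.10       # share of false positives who call and complete the purchase
CALL_MINUTES = 15      # average length of a customer service call
WAGE_PER_HOUR = 30.0   # customer service wage in dollars
YEARS_IN_SAMPLE = 2    # the sample covers two years

CALL_COST = WAGE_PER_HOUR * CALL_MINUTES / 60  # dollars per call


def payoff(amount, is_fraud, flagged):
    """Expected payoff to the bank for one transaction, in dollars.

    Assumption (not stated in the exam): flagged transactions are declined.
    """
    if flagged:
        if is_fraud:
            return 0.0  # true positive: fraud blocked, nothing gained or lost
        # false positive: only callers complete, and each call has a cost
        return CALL_RATE * (FEE_RATE * amount - CALL_COST)
    if is_fraud:
        return -amount  # false negative: the bank covers the full loss (fee ignored by assumption)
    return FEE_RATE * amount  # true negative: fee on a completed transaction


def annual_profit(total_profit_in_sample):
    """Scale a profit computed on the full two-year sample to one year."""
    return total_profit_in_sample / YEARS_IN_SAMPLE
```

Under these assumptions, a flagged legitimate 100-dollar transaction has an expected payoff of
0.10 * (2.00 - 7.50) = -0.55 dollars, compared with +2.00 dollars had it been approved.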

If you need to make other assumptions, feel free to do so. Just note the assumption and any references you
use to ground the assumption (if necessary).

DELIVERABLES

Answer the following questions 1 to 5 (max 275 words altogether: number your responses; comprising 25% of
your exam grade):

1. Is there leakage in the dataset? If yes, is it a case of data leakage, target leakage or time leakage, and
which variable(s) is (are) responsible?
2. What are your ethical considerations when building this machine learning model? How would you
safeguard against these potential issues?
3. How could you combine the customer's and merchant's locations to derive a new feature? Why might that
feature hold predictive power?
4. What would be the rationale for including a new feature, measuring the frequency of purchases by
a cardholder, in your model to predict fraud?
5. Time_since_last_trans has many missing values. Use the count of the missing values and the summary
statistics of other variables (and the fact that these are time differences) to make an educated guess
about what the missing values correspond to. What is the fraud rate for these values? What would be the
rationale for an abnormal fraud rate for these missing values? How can you improve the model's
predictive performance by incorporating this insight?
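
Questions 3 to 5 each point at a derived feature. The sketch below shows one way such features could be
computed. It is an illustration under assumptions, not the expected answer: it assumes locations are given
as latitude and longitude in degrees, that timestamps are numeric and sorted per cardholder, and all function
names are hypothetical. Whether a missing Time_since_last_trans has a particular meaning must be checked
against the data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def is_missing(value):
    """Indicator for a missing time difference (None or NaN)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def prior_purchase_counts(timestamps, window):
    """For each transaction of one cardholder, count earlier transactions
    that fall within `window` time units before it.

    `timestamps` must be sorted in ascending order.
    """
    counts = []
    start = 0
    for i, t in enumerate(timestamps):
        # advance the left edge until it is inside the trailing window
        while timestamps[start] < t - window:
            start += 1
        counts.append(i - start)
    return counts
```

Using only earlier transactions in the count keeps the feature available at prediction time, which matters
for a real-time model.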

75% of your exam grade will be determined by the following deliverables:

Write a report based on your findings (5 pages max and 1650 words max; “and” means both limits apply).
These limits include the front page, table of contents, references, and figure list, as well as any
appendices (no appendix is necessary). The report should include:

1. A concrete and valid recommendation (specifying the baseline, an appropriate action or actions, and
   well-motivated prediction thresholds) and the associated total profit
2. The action/intervention on which you base your profit calculation
3. Your data management procedures, consisting of:
   a. Your review of the data;
   b. Any preprocessing and feature engineering steps (do these in Excel, Python, or another appropriate
      program). Note that a basic level of preprocessing and feature engineering is required. Motivating
      the rationale and implications of the consequential decisions you make here is required and
      valuable; e.g., these could be motivated by summary statistics and data exploration.
      i. To be specific, a basic level of preprocessing and feature engineering would constitute some
         of the following (among others referred to in class), where appropriate:
         1. Feature creation
         2. Feature exclusion (redundant features, etc.)
         3. Encoding (if appropriate)
         4. Row exclusions (if appropriate)
         5. Outlier removal (if appropriate)
         6. Numerical transformations (if appropriate)
         7. Dealing with excess zeros, missing values, etc.
   c. Your data partitioning and the justification for it;
   d. The composition of the profit components (e.g., what are your high-level profit
      components, and how does each contribute to the total?). Be sure to specify a qualitative and
      numerical representation of each component of the profit matrix (the numerical
      representation could be a formula or a number; you decide how to represent it).
   e. Technical/data issues you think might affect results
4. Your modeling procedures:
   a. Your model selection process, which:
      i. Searches across possible models and hyperparameters (this can be done using
         automated tools, such as DataRobot or TPOT)
      ii. Searches across meaningful sampling routines (e.g., downsampling or SMOTE)
      iii. Searches across potential prediction decision thresholds
      iv. Searches across potential actions/targets
5. Your evaluation process, which:
   a. Executes the above without compromising the holdout, overfitting, or failing to address any
      leakage
   b. Describes which features hold the most signal and motivates the reasoning
   c. Makes correct use and interpretation of partitioning decisions
   d. Addresses other key modelling issues
6. CRUCIAL NOTE: You must begin all paragraphs with a 1- to 7-word title that describes the
   paragraph (the purpose of this is to provide more structure, cohesion, and flow to your writing). An
   example would be: "On data splitting: We split the dataset into three subsets: training, validation,
   and holdout. We used 60% of the data for training, 20% for validation, and 20% for the holdout."
7. Clear and organized reporting of machine learning model prediction and evaluation steps
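
The search across prediction thresholds and the profit calculation fit together naturally: each candidate
threshold implies a set of flagged transactions and therefore a total profit. The following is a minimal
sketch of that search. It assumes the threshold is tuned on a validation partition only (the holdout is
scored once, at the end), that flagged transactions are declined, and that a transaction is flagged when its
score is at or above the threshold. All names are hypothetical, and the payoff rules are one reading of the
exam assumptions rather than the required one.

```python
FEE_RATE = 0.02               # processing fee on completed transactions
CALL_RATE = 0.10              # share of false positives who call and complete
CALL_COST = 30.0 * 15 / 60    # 15-minute call at 30 dollars per hour


def payoff(amount, is_fraud, flagged):
    """Expected payoff of one transaction when flagged transactions are declined."""
    if flagged:
        return 0.0 if is_fraud else CALL_RATE * (FEE_RATE * amount - CALL_COST)
    return -amount if is_fraud else FEE_RATE * amount


def total_profit(scores, labels, amounts, threshold):
    """Total profit on a partition when every score >= threshold is flagged."""
    return sum(
        payoff(amount, bool(label), score >= threshold)
        for score, label, amount in zip(scores, labels, amounts)
    )


def best_threshold(scores, labels, amounts, candidates):
    """Return (threshold, profit) for the most profitable candidate threshold."""
    return max(
        ((t, total_profit(scores, labels, amounts, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Toy validation data: predicted fraud probabilities, true labels, dollar amounts
val_scores = [0.9, 0.1, 0.8, 0.2]
val_labels = [1, 0, 0, 0]
val_amounts = [500.0, 100.0, 50.0, 200.0]

# A threshold above 1 flags nothing, which gives the "approve everything" baseline
baseline = total_profit(val_scores, val_labels, val_amounts, 1.01)
chosen, profit = best_threshold(val_scores, val_labels, val_amounts, [0.0, 0.5, 0.85, 1.01])
```

Reporting the chosen threshold's profit next to the baseline makes the incremental value of the model
explicit.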

TIPS, TRICKS, and NOTES

1. I strongly suggest using the same structure as the enumerated list above. That means using a larger font
heading for sections corresponding to words in bold font in the “deliverable” instructions above belonging
to numbers 1 to 6, smaller headings for subsections corresponding to words in bold belonging to a)
to e), and yet smaller headings for the subsubsections, i) to iv).
2. The words in bold font in the “deliverable” instructions above are great (not exhaustive) suggestions for
the titles (or parts of titles) with which each paragraph in the deliverables is to start.
3. You may use Python (in conjunction with Excel), DataRobot (in conjunction with Excel), or some
combination for the exam. If using DataRobot, choosing “Quick” as the modeling mode (under Start), will
be adequate.
4. You will not be graded based on your solution’s profit. If one student’s model yields expected profit X, and
another student’s model yields expected profit 10*X, the 10*X exam will not necessarily receive a higher
score. Exams will be graded on their model building, model evaluation, and model valuation processes.
Solution profitability may spuriously correlate with exam grades to the extent that a more thorough
modeling procedure may yield a more profitable solution.
5. The data contain ~50K rows. Be aware of your time limit, taking into consideration the modeling platform
(Python, DataRobot) and models that you build, and your resource limitations (hardware, memory, etc).
6. You are not to submit Python code (references to code are fine). However, where Python is used for
modelling, explaining the modelling procedure(s) is a requirement.
7. I strongly recommend that you do not rely on online solutions for similar code. Using existing code runs a
risk of yielding a good solution without demonstrating that you know how to properly execute the
modeling process. Because you will not be graded on your model’s profitability (the goal of many online
codes), but will be graded based on the thoroughness of your modeling process and explanation of model
value, adapting existing solutions online can often become a minefield of shortcuts and misunderstandings
that hurt exams.
8. Many Python machine learning models and methods may require your data to be encoded (e.g., one-hot
encoded instead of categorical values). In that case, either implement the correct encoding or be sure to
find and apply the equivalent method for your data type. Be sure to do so in DataRobot too. You can find
the appropriate methods online (by googling or searching the documentation of the packages we have
introduced you to).
9. If you modify the dataset in Excel, it is possible that automated formatting changes could make the dataset
un-loadable into DataRobot. This is usually a function of semi-colons, commas, and decimals being used
differently around the world. If you find that a dataset you’ve modified in Excel cannot be loaded to
DataRobot, you may need to come up with a workaround. If this is a problem for you, I often find it easiest
to load Excel data into Python, save data from Python into .txt or .csv, and then load that into DataRobot.
10. A recommendation given this exam has a time limit: In exams, work projects, or your thesis project, always
get 1 “OK” solution first. Even if you immediately envision a grand solution and the “OK” solution seems
like a wasted interim step, start with a simple but informative approach! You’ll learn something from this
process, and you’ll ensure that you’ve got something to hand in by the deadline.
11. You will be graded on the first 1650 words (and 5 pages) of your report. Title words do count towards the
total word count. The limits are meant to help replicate reports written for technically oriented business
managers and data scientists in firms.
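
For the Excel formatting problem described in tip 9, a workaround along the following lines may help. This
is a minimal sketch using only Python's standard library. It assumes the file Excel produced uses semicolons
as delimiters and commas as decimal separators, which depends on regional settings and should be checked
first. The function name is hypothetical.

```python
import csv
import io
import re

# A field that looks like a number written with a decimal comma, e.g. "12,50"
DECIMAL_COMMA = re.compile(r"^-?\d+,\d+$")


def normalize_excel_csv(text):
    """Convert semicolon-delimited text with decimal commas into
    comma-delimited text with decimal points."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=",", lineterminator="\n")
    for row in reader:
        # Only rewrite fields that are purely numeric; text fields are left alone
        writer.writerow(
            [field.replace(",", ".") if DECIMAL_COMMA.match(field) else field for field in row]
        )
    return out.getvalue()


converted = normalize_excel_csv("amount;merchant\n12,50;Shop A\n")
```

Text fields that contain a comma are quoted automatically by the writer, so they remain a single column
after conversion.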
