CC Fraud Analytics Capstone

This document outlines a new credit card fraud detection system based on machine learning. It gives background on building a model to detect fraud early and reduce losses. Key insights show that transaction amount, category and gender are the most important variables. The current monthly fraud cost is $213,392. The new model could detect 1,720 fraudulent transactions at a customer support cost of $2,580 while missing 68 frauds at a cost of $35,908, which would save $174,904 per month, an 82% reduction in losses.

Uploaded by

Rohit Vora

New Credit Card Fraud Detection System
- Krunal Nagda
Agenda
• Objective
• Background
• Key Insights
• Cost Benefit Analysis
• Appendix:
o Data Attributes
o Data Methodology
o Attached Files
Objective
• Put a credit card fraud detection system in place to reduce incurred costs
• Huge costs are being incurred due to fraud and a manual detection system
Background
• A machine learning model has been built to detect fraud early and mitigate losses
• A cost-benefit analysis has been done for deploying the model
Key Insights
• Transaction amount, category and gender are the most important variables
• Gas and transport, grocery and shopping are the top three categories
Current Incurred Losses
• 77,183 credit card transactions per month
• 402 fraudulent transactions per month
• $530.66 average amount per fraudulent transaction
• Total cost incurred from fraudulent transactions is $213,392.22 per month
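The loss figure above is the number of frauds multiplied by the average amount per fraud. A minimal sketch of that arithmetic follows; the function and constant names are illustrative and do not come from the attached spreadsheet. With the rounded inputs shown on the slide the product comes to roughly $213,325, slightly below the reported $213,392.22, presumably because the slide rounds its monthly averages (an assumption).

```python
# Figures as printed on the slide (rounded for presentation).
TRANSACTIONS_PER_MONTH = 77_183
FRAUDS_PER_MONTH = 402
AVG_FRAUD_AMOUNT = 530.66  # dollars per fraudulent transaction


def monthly_fraud_loss(frauds: int, avg_amount: float) -> float:
    """Monthly loss = number of frauds x average amount per fraud."""
    return frauds * avg_amount


def fraud_rate(frauds: int, transactions: int) -> float:
    """Share of all transactions that are fraudulent."""
    return frauds / transactions


loss = monthly_fraud_loss(FRAUDS_PER_MONTH, AVG_FRAUD_AMOUNT)
rate = fraud_rate(FRAUDS_PER_MONTH, TRANSACTIONS_PER_MONTH)

print(f"Estimated monthly loss: ${loss:,.2f}")
print(f"Fraud rate: {rate:.2%}")
```

The fraud rate of about half a percent also shows how imbalanced the two classes are.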
After New Model Deployment
• 1,720 fraudulent transactions detected by the model
• $1.50 customer support cost per detected transaction, $2,580.38 in total
• 68 fraudulent transactions not detected by the model, amounting to a loss of $35,908.09
• Total cost incurred after new model deployment is $38,488.46
• Final savings after new model deployment are $174,903.76, a reduction in losses of ~82%
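The cost-benefit figures above can be reproduced, to within a cent of rounding, from the three reported dollar totals. The sketch below is illustrative only; `CostBenefit` and `cost_benefit` are hypothetical names, not taken from the attached spreadsheet. Note that the 1,720 detected transactions exceed the 402 monthly frauds, so that count presumably covers every transaction the model flags, false positives included; this reading is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CostBenefit:
    total_cost_after: float  # support cost + loss from missed frauds
    savings: float           # current loss - total cost after deployment
    reduction: float         # savings as a fraction of the current loss


def cost_benefit(current_loss: float,
                 support_cost: float,
                 missed_fraud_loss: float) -> CostBenefit:
    """Compare the current monthly loss with the cost after deployment."""
    total_after = support_cost + missed_fraud_loss
    savings = current_loss - total_after
    return CostBenefit(total_after, savings, savings / current_loss)


# Dollar totals as reported on the slides.
result = cost_benefit(current_loss=213_392.22,
                      support_cost=2_580.38,
                      missed_fraud_loss=35_908.09)

print(f"Total cost after deployment: ${result.total_cost_after:,.2f}")
print(f"Monthly savings: ${result.savings:,.2f}")
print(f"Reduction in losses: {result.reduction:.0%}")
```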
Appendix: Data Attributes
• Snapshot of the data:
o index - Unique Identifier for each row
o transdatetrans_time - Transaction DateTime
o cc_num - Credit Card Number of Customer
o merchant - Merchant Name
o category - Category of Merchant
o amt - Amount of Transaction
o first - First Name of Credit Card Holder
o last - Last Name of Credit Card Holder
o gender - Gender of Credit Card Holder
o street - Street Address of Credit Card Holder
o city - City of Credit Card Holder
o state - State of Credit Card Holder
o zip - Zip of Credit Card Holder
o lat - Latitude Location of Credit Card Holder
o long - Longitude Location of Credit Card Holder
o city_pop - Credit Card Holder's City Population
o job - Job of Credit Card Holder
o dob - Date of Birth of Credit Card Holder
o trans_num - Transaction Number
o unix_time - UNIX Time of transaction
o merch_lat - Latitude Location of Merchant
o merch_long - Longitude Location of Merchant
o is_fraud - Fraud Flag (target class)
Appendix: Data Methodology
• A random forest classifier was built on a simulated Kaggle dataset
• Class imbalance was adjusted using the Adaptive Synthetic (ADASYN) sampling method
• Hyperparameters were tuned manually because Grid Search Cross Validation took too long to compute
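To illustrate the idea behind ADASYN, here is a minimal standard-library sketch. It is not the code from the attached notebook; the function names, the choice of `k`, and the toy points are all assumptions made for illustration. ADASYN weights each minority (fraud) point by the share of majority points among its nearest neighbours, then creates more synthetic points around the harder-to-learn ones by interpolating towards nearby minority points. Because the per-point counts are rounded, the result is only approximately balanced.

```python
import math
import random


def adasyn_weights(minority, majority, k=3):
    """Normalised 'hardness' of each minority point: the share of
    majority points among its k nearest neighbours."""
    labelled = [(p, True) for p in minority] + [(p, False) for p in majority]
    hardness = []
    for i, p in enumerate(minority):
        # Minority points come first in `labelled`, so index i is p itself.
        others = [item for j, item in enumerate(labelled) if j != i]
        others.sort(key=lambda item: math.dist(p, item[0]))
        nearest = others[:k]
        hardness.append(sum(1 for _, is_min in nearest if not is_min) / k)
    total = sum(hardness)
    if total == 0:  # no majority neighbours anywhere: weight evenly
        return [1 / len(minority)] * len(minority)
    return [h / total for h in hardness]


def adasyn_oversample(minority, majority, k=3, seed=0):
    """Return synthetic minority points, more of them near hard points."""
    n_new = len(majority) - len(minority)  # gap to close between classes
    if n_new <= 0 or len(minority) < 2:
        return []
    rng = random.Random(seed)
    weights = adasyn_weights(minority, majority, k)
    synthetic = []
    for i, p in enumerate(minority):
        neighbours = sorted(
            (q for j, q in enumerate(minority) if j != i),
            key=lambda q: math.dist(p, q),
        )[:k]
        for _ in range(round(weights[i] * n_new)):
            q = rng.choice(neighbours)
            t = rng.random()  # position on the segment from p to q
            synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return synthetic


# Toy 2-D data: the third fraud point sits inside a majority cluster.
fraud = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0)]
legit = [(1.1, 1.0), (0.9, 1.1), (1.0, 0.8),
         (3.0, 3.0), (3.2, 2.9), (2.8, 3.1)]

weights = adasyn_weights(fraud, legit, k=3)
synthetic = adasyn_oversample(fraud, legit, k=3, seed=0)
print("weights:", [round(w, 2) for w in weights])
print("synthetic points generated:", len(synthetic))
```

In practice a maintained library implementation would normally be used instead of a hand-written one, and oversampling should be applied to the training split only, so that synthetic points do not leak into evaluation data.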
Attached Files
• Cost Benefit Analysis:
o Cost Benefit Analysis.xlsx
• Random Forest Classifier Model:
o CC Fraud Analytics Capstone.ipynb
