© 2014 MapR Technologies
Contact Information
Ted Dunning
Chief Applications Architect at MapR Technologies
Committer & PMC for Apache’s Drill, Zookeeper & others
VP of Incubator at Apache Foundation
Email tdunning@apache.org tdunning@maprtech.com
Twitter @ted_dunning
Goals for Today
• Explore the state of the art for deep-learning and fraud detection
• Separate at least some of the wheat from the chaff
• Provide some realistic guidance for getting results
• Play with cool stuff!
Agenda
• Motivation
• What are neural networks and deep learning?
• It can be simpler than you think
• But, no free lunch / you get what you pay for / other clever aphorism
• Some experiments
• Where to go from here
Motivation For Advanced Modeling in Fraud
• Neural networks have completely dominated credit card fraud detection since the late 1980s
– Random forests and tree ensembles are often used in other kinds of fraud and churn models
• The reason is that rule-based systems simply don’t work
– Well, they do work at first
– Fraudsters change tactics, you add rules, interaction mayhem ensues
• And learning algorithms really do work
– Fraudsters change tactics, you add features and retrain
So learning is good
But good learning is hard
And finding good features is really hard
Some Sample Features
• Charge size relative to previous averages for card
• Charge size relative to previous average for merchant
• Known merchant or not
• Doubled transaction
• Address Verification System (AVS) or Card Verification Value (CVV2) mismatch
• Unusual region for card
• Unusual time-of-day relative to history
• Magstripe use if chip available
• (hundreds more)
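As a rough illustration of the first feature in the list, here is a minimal Python sketch that compares each charge to the running average of the same card's earlier charges. The function name, the tuple layout, and the sample amounts are all made up for this example; a production system would likely use time-windowed or decayed averages instead.

```python
from collections import defaultdict

def relative_charge_features(transactions):
    """Return (card_id, amount, ratio) tuples, where ratio compares the charge
    to the average of that card's previous charges (None if there is no usable history)."""
    totals = defaultdict(float)   # running sum of amounts per card
    counts = defaultdict(int)     # number of earlier charges per card
    features = []
    for card_id, amount in transactions:
        ratio = None
        if counts[card_id] > 0:
            average = totals[card_id] / counts[card_id]
            if average > 0:
                ratio = amount / average
        features.append((card_id, amount, ratio))
        totals[card_id] += amount
        counts[card_id] += 1
    return features

# Hypothetical history: card-A usually spends 20 to 30, then charges 500.
history = [("card-A", 20.0), ("card-A", 30.0), ("card-B", 100.0), ("card-A", 500.0)]
features = relative_charge_features(history)
```

The last card-A charge comes out at 20 times that card's previous average, which is the kind of signal a model can pick up on.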
Sequence Based Features
• Plausible pattern matching (rent a car, pay for gas at airport)
• Probe transactions (gas in wrong place, pizza, big charge)
• Previous transaction at compromised merchant
• Card velocity
Key Problems
• Good guys need data … which means that fraudsters get the first turn at bat
• Good guys are careful and test systems before releasing
• Bad guys have many low-risk transactions and can change methods quickly
• In some areas, fraudsters adapt techniques in hours
Making up features is easy
Finding features that add real lift is very hard
What are neural networks and deep learning?
• Start simple … imagine we have 20 features, 0 or 1
– Let’s yell “Fraud” if any of the features is a 1
– Houston, we have a model
• But this model isn’t any better than a rule
• Also doesn’t have any interesting Greek letters
Real-world Intrudes
• We assumed all features are equally good
– What if some are kind of poor or weak?
• Can we weight different features more or less?
– Can we learn these weights from data?
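A small sketch of what weighting features could look like, assuming 20 binary features as in the earlier slide. The "yell Fraud if any feature is 1" rule is the special case where every weight is 1 and the threshold is 1; the tuned weights below are invented purely for illustration.

```python
def weighted_score(features, weights):
    """Sum of each feature multiplied by its weight."""
    return sum(w * f for w, f in zip(weights, features))

def flag(features, weights, threshold):
    """Yell "Fraud" when the weighted score reaches the threshold."""
    return weighted_score(features, weights) >= threshold

# The any-feature rule: every feature counts equally.
or_weights = [1.0] * 20

# Hypothetical tuned weights: the last feature is weak evidence on its own.
tuned_weights = [1.0] * 19 + [0.3]

only_weak_feature = [0] * 19 + [1]
strong_feature = [1] + [0] * 19
```

With the equal weights, the weak feature alone triggers an alert; with the tuned weights it does not, while a strong feature still does.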
Learning Works
• Yes. We can learn these models
• How we measure error is important
• We must have good features
• Even good features may need transformation
– Take logs of times and monetary values
– Subtract means, scale, bin values
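The points above can be sketched in a few lines of plain Python: a log-and-scale transformation for monetary values, and gradient descent on log-loss to learn feature weights. This is a minimal sketch under assumptions, not the method used in the talk; the data, the learning rate, and the function names are invented for the example.

```python
import math

def log_standardize(values):
    """Take logs, subtract the mean, and scale to unit variance."""
    logs = [math.log(v) for v in values]
    mean = sum(logs) / len(logs)
    std = math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs))
    return [(x - mean) / std for x in logs]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, rate=0.5, epochs=2000):
    """Learn per-feature weights and a bias by gradient descent on log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - rate * g / n for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / n
    return weights, bias

def predict(x, weights, bias):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) > 0.5

# Made-up data: feature 0 tracks fraud, feature 1 is mostly noise.
rows = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1]]
labels = [1, 1, 0, 0, 1, 0]
weights, bias = train_logistic(rows, labels)

# Amounts spanning two orders of magnitude become evenly spaced after the log.
amounts = log_standardize([5.0, 50.0, 500.0])
```

The learned weight for the informative feature ends up well above the weight for the noisy one, which is the "learn the weights from data" step in miniature.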
Not Good Enough
• We need combinations of models
• Simple linear combinations aren’t subtle enough
• Enter multi-level models
– Can we learn a model that uses combinations of inputs
– Where each of those combinations is a model that we learn?
Yes, Virginia, There IS a Santa Claus
Each circle is a sum and a (soft) threshold
Arrows are multiplication by a learned weight
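The picture can be written out as a short forward pass. In this sketch the soft threshold is assumed to be the logistic sigmoid, each unit gets a bias term (not shown on the slide), and the weights are set by hand rather than learned. They make the network fire when exactly one of two signals is on, an interaction that a single weighted sum cannot express.

```python
import math

def soft_threshold(z):
    """Logistic sigmoid: a smooth stand-in for a hard 0/1 threshold."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Each unit (circle) sums its weighted inputs (arrows), then applies the soft threshold."""
    return [soft_threshold(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
            for unit_weights, b in zip(weights, biases)]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)[0]

# Hand-set, hypothetical weights: hidden unit 0 acts like OR, hidden unit 1 like AND,
# and the output fires for "OR but not AND".
hidden_w = [[10.0, 10.0], [10.0, 10.0]]
hidden_b = [-5.0, -15.0]
output_w = [[10.0, -10.0]]
output_b = [-5.0]

def score(a, b):
    return forward([a, b], hidden_w, hidden_b, output_w, output_b)
```

Calling `score(1, 0)` or `score(0, 1)` gives a value close to 1, while `score(0, 0)` and `score(1, 1)` stay close to 0.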
Errors on Output Can Propagate
Each circle sends error to each arrow
Arrows weight back-propagating errors
[Figure labels: Inputs, Hidden layer]
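A minimal sketch of that backward flow for a network with two inputs, two hidden units, and one output. Squared error, sigmoid units, no bias terms, and the particular weights are all assumptions made for brevity. The finite-difference helper is only there to check the backpropagated gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2):
    """W1[j][i] connects input i to hidden unit j; W2[j] connects hidden unit j to the output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)))
    return hidden, output

def loss(x, target, W1, W2):
    _, output = forward(x, W1, W2)
    return 0.5 * (output - target) ** 2

def backprop(x, target, W1, W2):
    hidden, output = forward(x, W1, W2)
    # Error at the output circle, measured before its soft threshold.
    delta_out = (output - target) * output * (1.0 - output)
    # Gradient for each output arrow: the error times what flowed along the arrow.
    grad_W2 = [delta_out * h for h in hidden]
    # Arrows weight the error on its way back to each hidden circle.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(W2, hidden)]
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    return grad_W1, grad_W2

def numeric_grad_W1(x, target, W1, W2, j, i, eps=1e-6):
    """Central finite difference for one weight, used to sanity-check backprop."""
    up = [row[:] for row in W1]
    down = [row[:] for row in W1]
    up[j][i] += eps
    down[j][i] -= eps
    return (loss(x, target, up, W2) - loss(x, target, down, W2)) / (2 * eps)

# Hypothetical starting point.
x, target = [1.0, 0.5], 1.0
W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [0.5, -0.6]
grad_W1, grad_W2 = backprop(x, target, W1, W2)
```

Training then amounts to repeatedly nudging every weight a small step against its gradient.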
Success!
Triumph!
World domination!
With some reservations because features are hard
Turtles All the Way Down – We Wish
• This learning works well for just a few layers
• This is still a big deal …
– with cool features, we can build real systems
• With many layers, the learning no longer converges
• Well … until recently
Model Learning in an Ideal World
• If we could just learn the features
– Maybe unsupervised, maybe supervised
– And at the same time learn the model
• Presumably we could build models quicker
• And more easily
• And we wouldn’t have to dirty our minds with pedestrian domain knowledge
Example 1 – (not very) Deep Auto-encoder
• Let’s take an example where we can learn features
• Data is EKG traces
• We want to find anomalies
– No supervised training
Spot the Anomaly
Anomaly?
Maybe not!
Where’s Waldo?
This is the real anomaly
Normal Isn’t Just Normal
• What we want is a model of what is normal
• What doesn’t fit the model is the anomaly
• For simple signals, the model can be simple …
$x \sim m(t) + \mathcal{N}(0, \epsilon)$
• The real world is rarely so accommodating
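For the simple-signal case, the idea fits in a few lines: subtract the model of normal from the observation and flag anything the noise term cannot plausibly explain. In this sketch the sinusoidal m(t), the noise scale, and the four-sigma cutoff are all assumptions chosen for illustration, and a deterministic wiggle stands in for random noise so the result is repeatable.

```python
import math

def model(t):
    """Hypothetical known model of normal behaviour, m(t)."""
    return math.sin(2 * math.pi * t / 50.0)

def anomalies(samples, noise_scale, n_sigmas=4.0):
    """Return the time indices whose residual from m(t) is too large to be noise."""
    return [t for t, x in enumerate(samples)
            if abs(x - model(t)) > n_sigmas * noise_scale]

# Normal samples wiggle by 0.01 around the model; one sample is pushed far off.
signal = [model(t) + 0.01 * (-1) ** t for t in range(200)]
signal[120] += 1.0

flagged = anomalies(signal, noise_scale=0.01)
```

Only the perturbed sample at index 120 is flagged. The hard part in practice is getting a good m(t), which is what the windowing and clustering slides address.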
We Do Windows
Windows on the World
• The set of windowed signals is a nice model of our original signal
• Clustering can find the prototypes
– Fancier techniques available using sparse coding
• The result is a dictionary of shapes
• New signals can be encoded by shifting, scaling and adding shapes from the dictionary
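A stripped-down sketch of the windows-plus-clustering idea in plain Python. It is not the code from the talk: the signal is a made-up sine wave rather than an EKG, the k-means start is deterministic for repeatability, and the sketch skips the window weighting, scaling, and overlap-adding that a fuller reconstruction would need.

```python
import math

def windows(signal, width, step):
    """Cut the signal into overlapping fixed-width windows."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(window, centroids):
    distances = [squared_distance(window, c) for c in centroids]
    return distances.index(min(distances))

def kmeans(data, k, iterations=10):
    # Deterministic start for the sketch: k evenly spaced windows as initial centroids.
    centroids = [list(data[i * len(data) // k]) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for w in data:
            groups[nearest(w, centroids)].append(w)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(column) / len(group) for column in zip(*group)]
    return centroids

def reconstruction_error(window, centroids):
    """Root-mean-square gap between a window and its closest dictionary shape."""
    best = centroids[nearest(window, centroids)]
    return math.sqrt(squared_distance(window, best) / len(window))

# Made-up periodic signal standing in for an EKG trace.
signal = [math.sin(2 * math.pi * t / 20) for t in range(400)]
training = windows(signal, width=20, step=5)
dictionary = kmeans(training, k=4)

odd_window = list(training[10])
odd_window[7] += 2.0   # inject a spike the dictionary has never seen
```

Windows drawn from the normal signal reconstruct almost exactly from the four learned shapes, while the spiked window leaves a large residual. That residual is what the anomaly detector looks at.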
Most Common Shapes (for EKG)
Reconstructed signal
[Figure: original signal, reconstructed signal, and reconstruction error; the error is < 1 bit / sample]
An Anomaly
The original technique for finding 1-d anomalies works against the reconstruction error
Close-up of anomaly
Not what you want your heart to do.
And not what the model expects it to do.
A Different Kind of Anomaly
Some k-means Caveats
• But Eamonn Keogh says that k-means can’t work on time-series
• That is silly … and kind of correct: k-means does have limits
– Other kinds of auto-encoders are much more powerful
• More fun and code demos at
– https://github.com/tdunning/k-means-auto-encoder
http://www.cs.ucr.edu/~eamonn/meaningless.pdf
The Limits of Clustering as Auto-encoder
• Clustering is like trying to tile your sample distribution
• Can be used to approximate a signal
• Filling a d-dimensional region with k clusters should give an error of roughly
$\epsilon \approx 1 / k^{1/d}$
• If d is large, this is no good
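A quick bit of arithmetic shows why large d hurts. Assuming the estimate is read as an error of about k^(-1/d) for a unit-volume region (constants ignored), the number of clusters needed for a fixed error grows exponentially with the dimension. The function name below is made up for the example.

```python
def clusters_needed(error, d):
    """Clusters k such that k ** (-1 / d) is about `error` in d dimensions."""
    return (1.0 / error) ** d

# Clusters needed for a 10% error at several dimensionalities.
for d in (1, 2, 10, 32):
    print(d, clusters_needed(0.1, d))
```

Under this reading, a 32-sample window would need on the order of 10^32 clusters for 10% error if the data really filled all 32 dimensions. Clustering works on signals like the EKG only because the windows occupy a much lower-dimensional set.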
[Figure: Time series training data (first 2000 samples); x-axis: Time; traces: test data, reconstruction, error]
[Figure: Reconstruction error for time-series data; x-axis: Centroids; y-axis: MAV Error; series: training data, held-out data]
Moral For Auto-encoders
• The simplest auto-encoders can be good models
• For more complex spaces/signals, more elaborate models may be required
– Winner take (absolutely) all may be problematic
– In particular, models that allow sparse linear combination may be better
• Consider deep learning, recurrent networks, denoising
How Does Clustering Do Reconstruction?
[Figure: input layer x1, x2, ..., xn-1, xn]
For normalized cluster centroids, dot-product and distance are equivalent
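One way to see the equivalence is a one-line expansion. For an input x and a unit-length centroid c (symbols chosen here for illustration, not taken from the slide):

```latex
\[
\lVert x - c \rVert^2
  = \lVert x \rVert^2 - 2\, x \cdot c + \lVert c \rVert^2
  = \lVert x \rVert^2 + 1 - 2\, x \cdot c
\]
```

Since the first two terms are the same for every centroid, the centroid at the smallest distance from x is exactly the one with the largest dot product with x.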
Winner takes all with k-means
[Figure: input x1 ... xn feeds a hidden layer (clusters), which produces the reconstruction x'1 ... x'n]
Dot-product scales centroid to reconstruct
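Read as code, the diagram is an encoder and a decoder that share the same centroids. The sketch below assumes unit-length centroids and a winner-take-all hidden layer; the three centroids and the input are invented for the example.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def encode(x, centroids):
    """Hidden layer: dot the input with every centroid and keep only the winner."""
    scores = [dot(x, c) for c in centroids]
    winner = scores.index(max(scores))
    return winner, scores[winner]

def decode(winner, activation, centroids):
    """Reconstruction: the winning centroid scaled by its dot product with the input."""
    return [activation * a for a in centroids[winner]]

# Hypothetical dictionary of three unit-length shapes.
centroids = [normalize([1, 1, 0, 0]),
             normalize([0, 0, 1, 1]),
             normalize([1, -1, 1, -1])]

x = [3.0, 3.0, 0.1, -0.1]
winner, activation = encode(x, centroids)
reconstruction = decode(winner, activation, centroids)
residual = [a - b for a, b in zip(x, reconstruction)]
```

The reconstruction comes out as roughly [3, 3, 0, 0], and the leftover [0, 0, 0.1, -0.1] is the part of the input the dictionary could not explain.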
AKA - Neural Network
[Figure: network with input x1 ... xn, hidden layer (clusters), and reconstruction x'1 ... x'n]
What If … We Had More Layers?
[Figure: multi-layer network with layers labeled A, B, A']
Other Thoughts
• What if we allow more than one cluster to be active?
– k-sparse learning!
• Well, almost
The Point of Deep Learning
• It isn’t just many hidden layers in a neural network
• The goal is to eliminate feature engineering by learning features as well as the classifier
Experiment 3 – Card Velocity
• Most features so far are inherent in the data
• Few are true sequence features
• Card velocity is a pure combination
– Starting point can be anywhere
– The issue is where the next point is relative to starting point
Card Velocity
Non-fraud steps are reasonable in terms of distance and time
Fraudulent use of card by multiple attackers results in big, fast jumps
Synthetic Data Example
• Generate random point
• Take four small steps
• If fraud, second step can be large
• Result is five positions, each in 3-d on surface of a sphere
– Data shape is N x (5 x 3)
• Add secondary features containing step size … N x 4
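The recipe above could be sketched as follows in plain Python. This is an assumed reading of the slide, not the generator used for the experiment: the angle ranges, the fraud rate, and the function names are invented, and fraud examples here always take a large second step, where the slide only says the step can be large.

```python
import math
import random

def random_point(rng):
    """Uniform random point on the unit sphere (a normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def tangent_step(position, angle, rng):
    """Rotate `position` by `angle` radians in a random direction along the sphere."""
    while True:
        r = [rng.gauss(0.0, 1.0) for _ in range(3)]
        along = sum(a * b for a, b in zip(r, position))
        t = [a - along * b for a, b in zip(r, position)]   # drop the radial part
        norm = math.sqrt(sum(a * a for a in t))
        if norm > 1e-6:
            break
    t = [a / norm for a in t]
    return [math.cos(angle) * p + math.sin(angle) * u for p, u in zip(position, t)]

def make_example(is_fraud, rng, small=0.05, large=(0.5, 1.5)):
    positions = [random_point(rng)]
    for i in range(4):
        if is_fraud and i == 1:            # the second step is the big jump
            angle = rng.uniform(*large)
        else:
            angle = rng.uniform(0.0, small)
        positions.append(tangent_step(positions[-1], angle, rng))
    steps = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    return positions, steps

def make_dataset(n, rng, fraud_rate=0.1):
    X, step_sizes, labels = [], [], []
    for _ in range(n):
        is_fraud = rng.random() < fraud_rate
        positions, steps = make_example(is_fraud, rng)
        X.append([c for p in positions for c in p])   # 5 x 3 = 15 raw coordinates
        step_sizes.append(steps)                      # 4 secondary features
        labels.append(int(is_fraud))
    return X, step_sizes, labels

rng = random.Random(0)
X, step_sizes, labels = make_dataset(200, rng)
```

Each row of `X` holds the 15 raw coordinates and each row of `step_sizes` holds the four derived step sizes, matching the N x (5 x 3) and N x 4 shapes on the slide.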
The Truth is Out There
• With the right feature (step-size), it is trivial to spot the fraud
• Here we show the step size between positions
• Fraud cases take a big jump that others don’t
• But they can be anywhere
But Dimensionality Bites Hard
• With the step-size feature, learning succeeds instantly with the simplest models and gets nearly perfect accuracy
• Without the step-size feature, learning with TensorFlow gets modest accuracy after substantial learning cost (work in progress, could do better with lots more tuning)
• The problem is that there are too many combinations of 15 variables; we need a very specific combination of three pair-wise diffs combined non-linearly into a distance
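To make the target concrete, write consecutive positions as p_i = (x_i, y_i, z_i); this notation is assumed here and does not come from the slides. The step size the network has to discover from the 15 raw coordinates is then:

```latex
\[
s_i = \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2 + (z_{i+1} - z_i)^2},
\qquad i = 1, \dots, 4
\]
```

Each s_i uses only 6 of the 15 inputs, paired up exactly by coordinate and by adjacent position, then squared, summed, and square-rooted. A generic network has no prior reason to prefer that pairing over any other.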
[Figure: AUC and precision versus data size; x-axis: Data Size (10^4 to 10^6); y-axis: AUC or Precision (0 to 1)]
We have a bona fide revolution
But old tricks still pay
Greenfield Problem Landscape
Mature Problem Landscape
Summary
• There is too much to say in 40 minutes, so let’s talk some more at the MapR booth
• Deep learning, especially with systems like TensorFlow, has huge promise
• Deep learning trades learning architecture engineering for feature engineering
• There are powerful middle grounds
Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014 - 2016
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
http://bit.ly/ebook-real-world-hadoop
http://bit.ly/mapr-tsdb-ebook
http://bit.ly/ebook-anomaly
http://bit.ly/recommendation-ebook
Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)
Free copies at book signing today
http://bit.ly/mapr-ebook-streams
Thank You!
Q&A
@mapr maprtech
tdunning@maprtech.com
Engage with us!
MapR
maprtech
mapr-technologies
Ad

More Related Content

What's hot (20)

ACFE Presentation on Analytics for Fraud Detection and Mitigation
ACFE Presentation on Analytics for Fraud Detection and MitigationACFE Presentation on Analytics for Fraud Detection and Mitigation
ACFE Presentation on Analytics for Fraud Detection and Mitigation
Scott Mongeau
 
Healthcare fraud detection
Healthcare fraud detectionHealthcare fraud detection
Healthcare fraud detection
Mahdi Esmailoghli
 
Fraud Analytics with Machine Learning and Big Data Engineering for Telecom
Fraud Analytics with Machine Learning and Big Data Engineering for TelecomFraud Analytics with Machine Learning and Big Data Engineering for Telecom
Fraud Analytics with Machine Learning and Big Data Engineering for Telecom
Sudarson Roy Pratihar
 
Fraud Detection Techniques
Fraud Detection TechniquesFraud Detection Techniques
Fraud Detection Techniques
Vhena Pilongo
 
Artificial Intelligence and Digital Banking - What about fraud prevention ?
Artificial Intelligence and Digital Banking - What about fraud prevention ?Artificial Intelligence and Digital Banking - What about fraud prevention ?
Artificial Intelligence and Digital Banking - What about fraud prevention ?
Jérôme Kehrli
 
Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry
Capgemini
 
Practical Crypto Asset Predictions rev
Practical Crypto Asset Predictions revPractical Crypto Asset Predictions rev
Practical Crypto Asset Predictions rev
Jesus Rodriguez
 
Evolution of Digital Bank 4.0
Evolution of Digital Bank 4.0Evolution of Digital Bank 4.0
Evolution of Digital Bank 4.0
Connected Futures
 
Digital strategy for Financial Institutions
Digital strategy for Financial InstitutionsDigital strategy for Financial Institutions
Digital strategy for Financial Institutions
Sameer Singh Jaini
 
Data driven approach to KYC
Data driven approach to KYCData driven approach to KYC
Data driven approach to KYC
Pankaj Baid
 
Tictaclabs Managed Cyber Security Services
Tictaclabs Managed Cyber Security ServicesTictaclabs Managed Cyber Security Services
Tictaclabs Managed Cyber Security Services
TicTac Data Recovery
 
Explain the Value of your Splunk Deployment Breakout Session
Explain the Value of your Splunk Deployment Breakout SessionExplain the Value of your Splunk Deployment Breakout Session
Explain the Value of your Splunk Deployment Breakout Session
Splunk
 
The Journey to Digital Transformation with Touch Bank
The Journey to Digital Transformation with Touch BankThe Journey to Digital Transformation with Touch Bank
The Journey to Digital Transformation with Touch Bank
Backbase
 
Cybersecurity: Cyber Risk Management for Banks & Financial Institutions
Cybersecurity: Cyber Risk Management for Banks & Financial InstitutionsCybersecurity: Cyber Risk Management for Banks & Financial Institutions
Cybersecurity: Cyber Risk Management for Banks & Financial Institutions
Shawn Tuma
 
Inypay Pitch Deck - March 2023-Latest copy 2.pdf
Inypay Pitch Deck - March 2023-Latest copy 2.pdfInypay Pitch Deck - March 2023-Latest copy 2.pdf
Inypay Pitch Deck - March 2023-Latest copy 2.pdf
Mustafa Kuğu
 
DeFi 101
DeFi 101DeFi 101
DeFi 101
Manish Jain
 
Overview of Data Analytics in Lending Business
Overview of Data Analytics in Lending BusinessOverview of Data Analytics in Lending Business
Overview of Data Analytics in Lending Business
Sanjay Kar
 
BaaS-platforms and open APIs in fintech l bank-as-a-service.com
BaaS-platforms and open APIs in fintech l bank-as-a-service.comBaaS-platforms and open APIs in fintech l bank-as-a-service.com
BaaS-platforms and open APIs in fintech l bank-as-a-service.com
Vladislav Solodkiy
 
Trends in AML Compliance and Technology
Trends in AML Compliance and TechnologyTrends in AML Compliance and Technology
Trends in AML Compliance and Technology
SAS Institute India Pvt. Ltd
 
Digital Insurance - Opportunities in India
Digital Insurance - Opportunities in IndiaDigital Insurance - Opportunities in India
Digital Insurance - Opportunities in India
The Digital Insurer
 
ACFE Presentation on Analytics for Fraud Detection and Mitigation
ACFE Presentation on Analytics for Fraud Detection and MitigationACFE Presentation on Analytics for Fraud Detection and Mitigation
ACFE Presentation on Analytics for Fraud Detection and Mitigation
Scott Mongeau
 
Fraud Analytics with Machine Learning and Big Data Engineering for Telecom
Fraud Analytics with Machine Learning and Big Data Engineering for TelecomFraud Analytics with Machine Learning and Big Data Engineering for Telecom
Fraud Analytics with Machine Learning and Big Data Engineering for Telecom
Sudarson Roy Pratihar
 
Fraud Detection Techniques
Fraud Detection TechniquesFraud Detection Techniques
Fraud Detection Techniques
Vhena Pilongo
 
Artificial Intelligence and Digital Banking - What about fraud prevention ?
Artificial Intelligence and Digital Banking - What about fraud prevention ?Artificial Intelligence and Digital Banking - What about fraud prevention ?
Artificial Intelligence and Digital Banking - What about fraud prevention ?
Jérôme Kehrli
 
Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry Big Data Analytics in light of Financial Industry
Big Data Analytics in light of Financial Industry
Capgemini
 
Practical Crypto Asset Predictions rev
Practical Crypto Asset Predictions revPractical Crypto Asset Predictions rev
Practical Crypto Asset Predictions rev
Jesus Rodriguez
 
Evolution of Digital Bank 4.0
Evolution of Digital Bank 4.0Evolution of Digital Bank 4.0
Evolution of Digital Bank 4.0
Connected Futures
 
Digital strategy for Financial Institutions
Digital strategy for Financial InstitutionsDigital strategy for Financial Institutions
Digital strategy for Financial Institutions
Sameer Singh Jaini
 
Data driven approach to KYC
Data driven approach to KYCData driven approach to KYC
Data driven approach to KYC
Pankaj Baid
 
Tictaclabs Managed Cyber Security Services
Tictaclabs Managed Cyber Security ServicesTictaclabs Managed Cyber Security Services
Tictaclabs Managed Cyber Security Services
TicTac Data Recovery
 
Explain the Value of your Splunk Deployment Breakout Session
Explain the Value of your Splunk Deployment Breakout SessionExplain the Value of your Splunk Deployment Breakout Session
Explain the Value of your Splunk Deployment Breakout Session
Splunk
 
The Journey to Digital Transformation with Touch Bank
The Journey to Digital Transformation with Touch BankThe Journey to Digital Transformation with Touch Bank
The Journey to Digital Transformation with Touch Bank
Backbase
 
Cybersecurity: Cyber Risk Management for Banks & Financial Institutions
Cybersecurity: Cyber Risk Management for Banks & Financial InstitutionsCybersecurity: Cyber Risk Management for Banks & Financial Institutions
Cybersecurity: Cyber Risk Management for Banks & Financial Institutions
Shawn Tuma
 
Inypay Pitch Deck - March 2023-Latest copy 2.pdf
Inypay Pitch Deck - March 2023-Latest copy 2.pdfInypay Pitch Deck - March 2023-Latest copy 2.pdf
Inypay Pitch Deck - March 2023-Latest copy 2.pdf
Mustafa Kuğu
 
Overview of Data Analytics in Lending Business
Overview of Data Analytics in Lending BusinessOverview of Data Analytics in Lending Business
Overview of Data Analytics in Lending Business
Sanjay Kar
 
BaaS-platforms and open APIs in fintech l bank-as-a-service.com
BaaS-platforms and open APIs in fintech l bank-as-a-service.comBaaS-platforms and open APIs in fintech l bank-as-a-service.com
BaaS-platforms and open APIs in fintech l bank-as-a-service.com
Vladislav Solodkiy
 
Digital Insurance - Opportunities in India
Digital Insurance - Opportunities in IndiaDigital Insurance - Opportunities in India
Digital Insurance - Opportunities in India
The Digital Insurer
 

Viewers also liked (13)

Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
Alexey Grishchenko
 
Architectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop DistributionArchitectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop Distribution
mcsrivas
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
Databricks
 
Hands on MapR -- Viadea
Hands on MapR -- ViadeaHands on MapR -- Viadea
Hands on MapR -- Viadea
viadea
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
MapR Technologies
 
MapR Tutorial Series
MapR Tutorial SeriesMapR Tutorial Series
MapR Tutorial Series
selvaraaju
 
MapR M7: Providing an enterprise quality Apache HBase API
MapR M7: Providing an enterprise quality Apache HBase APIMapR M7: Providing an enterprise quality Apache HBase API
MapR M7: Providing an enterprise quality Apache HBase API
mcsrivas
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
MapR Technologies
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
Anton Kirillov
 
MapR Data Analyst
MapR Data AnalystMapR Data Analyst
MapR Data Analyst
selvaraaju
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
Databricks
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark Internals
Pietro Michiardi
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
Alexey Grishchenko
 
Architectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop DistributionArchitectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop Distribution
mcsrivas
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
Databricks
 
Hands on MapR -- Viadea
Hands on MapR -- ViadeaHands on MapR -- Viadea
Hands on MapR -- Viadea
viadea
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
MapR Technologies
 
MapR Tutorial Series
MapR Tutorial SeriesMapR Tutorial Series
MapR Tutorial Series
selvaraaju
 
MapR M7: Providing an enterprise quality Apache HBase API
MapR M7: Providing an enterprise quality Apache HBase APIMapR M7: Providing an enterprise quality Apache HBase API
MapR M7: Providing an enterprise quality Apache HBase API
mcsrivas
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
Anton Kirillov
 
MapR Data Analyst
MapR Data AnalystMapR Data Analyst
MapR Data Analyst
selvaraaju
 
Apache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and SmarterApache Spark 2.0: Faster, Easier, and Smarter
Apache Spark 2.0: Faster, Easier, and Smarter
Databricks
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark Internals
Pietro Michiardi
 
Ad

Similar to Deep Learning for Fraud Detection (20)

How to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detectionHow to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detection
DataWorks Summit
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
MapR Technologies
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
Ted Dunning
 
How to tell which algorithms really matter
How to tell which algorithms really matterHow to tell which algorithms really matter
How to tell which algorithms really matter
DataWorks Summit
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
Ted Dunning
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly Detection
Ted Dunning
 
How to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterHow to Determine which Algorithms Really Matter
How to Determine which Algorithms Really Matter
DataWorks Summit
 
Hadoop and R Go to the Movies
Hadoop and R Go to the MoviesHadoop and R Go to the Movies
Hadoop and R Go to the Movies
DataWorks Summit
 
Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
DataWorks Summit
 
HUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_DunningHUG_Ireland_Streaming_Ted_Dunning
HUG_Ireland_Streaming_Ted_Dunning
John Mulhall
 
Practical Computing With Chaos
Practical Computing With ChaosPractical Computing With Chaos
Practical Computing With Chaos
DataWorks Summit
 
Practical Computing with Chaos
Practical Computing with ChaosPractical Computing with Chaos
Practical Computing with Chaos
MapR Technologies
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
DataWorks Summit
 
Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down Internet
MapR Technologies
 
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
Strata 2014-tdunning-anomaly-detection-140211162923-phpapp01
MapR Technologies
 
Realistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure DevelopmentRealistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure Development
DataWorks Summit
 
Realistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure DevelopmentRealistic Synthetic Generation Allows Secure Development
Realistic Synthetic Generation Allows Secure Development
MapR Technologies
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
MapR Technologies
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data Securely
Ted Dunning
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really Matter
Ted Dunning
 
How to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detectionHow to find what you didn't know to look for, oractical anomaly detection
How to find what you didn't know to look for, oractical anomaly detection
DataWorks Summit
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
MapR Technologies
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
Ted Dunning
 
How to tell which algorithms really matter
How to tell which algorithms really matterHow to tell which algorithms really matter
How to tell which algorithms really matter
DataWorks Summit
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Deep Learning for Fraud Detection

  • 1. © 2014 MapR Technologies
  • 2. Contact Information • Ted Dunning • Chief Applications Architect at MapR Technologies • Committer & PMC for Apache’s Drill, Zookeeper & others • VP of Incubator at Apache Foundation • Email tdunning@apache.org tdunning@maprtech.com • Twitter @ted_dunning
  • 3. Goals for Today • Explore the state of the art for deep learning and fraud detection • Separate at least some of the wheat from the chaff • Provide some realistic guidance for getting results
  • 4. Goals for Today • Explore the state of the art for deep learning and fraud detection • Separate at least some of the wheat from the chaff • Provide some realistic guidance for getting results • Play with cool stuff!
  • 5. Agenda • Motivation • What are neural networks and deep learning? • It can be simpler than you think • But, no free lunch / you get what you pay for / other clever aphorism • Some experiments • Where to go from here
  • 6. Motivation For Advanced Modeling in Fraud • Neural networks have completely dominated credit card fraud detection since the late ’80s – Random forests and tree ensembles are often used in other kinds of fraud and churn models • The reason is that rule-based systems simply don’t work – Well, they do work at first – Fraudsters change tactics, you add rules, interaction mayhem ensues • And learning algorithms really do work – Fraudsters change tactics, you add features and retrain
  • 7. So learning is good
  • 8. So learning is good But good learning is hard
  • 9. So learning is good But good learning is hard And finding good features is really hard
  • 10. Some Sample Features • Charge size relative to previous averages for card • Charge size relative to previous average for merchant • Known merchant or not • Doubled transaction • AVS or CVV2 mismatch
  • 11. Some Sample Features • Charge size relative to previous averages for card • Charge size relative to previous average for merchant • Known merchant or not • Doubled transaction • Address Verification System or CVV2 mismatch
  • 12. Some Sample Features • Charge size relative to previous averages for card • Charge size relative to previous average for merchant • Known merchant or not • Doubled transaction • Address Verification System or Card Verification Value mismatch
  • 13. Some Sample Features • Charge size relative to previous averages for card • Charge size relative to previous average for merchant • Known merchant or not • Doubled transaction • Address Verification System or Card Verification Value mismatch • Unusual region for card • Unusual time-of-day relative to history • Magstripe use if chip available • (hundreds more)
  • 14. Sequence Based Features • Plausible pattern matching (rent a car, pay for gas at airport) • Probe transactions (gas in wrong place, pizza, big charge) • Previous transaction at compromised merchant • Card velocity
  • 15. Key Problems • Good guys need data … that means that fraudsters get first chance at bat • Good guys are careful and test systems before releasing • Bad guys have many low-risk transactions and can change methods quickly • In some areas, fraudsters adapt techniques in hours
  • 16. Making up features is easy Finding features that add real lift is very hard
  • 17. What are neural networks and deep learning? • Start simple … imagine we have 20 features, 0 or 1 – Let’s yell “Fraud” if any of the features is a 1 – Houston, we have a model • But this model isn’t any better than a rule • Also doesn’t have any interesting Greek letters
  • 18–19. Real-world Intrudes • We assumed all features are equally good – What if some are kind of poor or weak? • Can we weight different features more or less? – Can we learn these weights from data?
  • 20. Learning Works • Yes. We can learn these models • How we measure error is important • We must have good features • Even good features may need transformation – Take logs of times and monetary values – Subtract means, scale, bin values
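
The step from "yell Fraud if any feature is 1" to learned per-feature weights can be sketched in a few lines of Python. This is an illustration only, not code from the talk: the toy data, the learning rate, and the names `any_rule` and `train_logistic` are all assumptions. It fits a logistic model by gradient descent on log-loss, one common way to measure error.

```python
import math
import random

def any_rule(features):
    """The rule-like model: flag fraud if any binary feature fires."""
    return int(any(features))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, epochs=100, rate=0.1):
    """Learn one weight per feature (plus a bias) by gradient descent on log-loss."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of log-loss with respect to the weighted sum
            b -= rate * err
            w = [wi - rate * err * xi for wi, xi in zip(w, x)]
    return w, b

# Hypothetical toy data: feature 0 is informative, feature 1 is pure noise.
random.seed(0)
rows, labels = [], []
for _ in range(400):
    fraud = int(random.random() < 0.3)
    strong = int(random.random() < (0.9 if fraud else 0.05))
    noise = int(random.random() < 0.5)
    rows.append([strong, noise])
    labels.append(fraud)

weights, bias = train_logistic(rows, labels)
print("learned weights:", weights, "bias:", bias)
```

On data like this, the informative feature should end up with a much larger weight than the noisy one, which is exactly what the any-feature rule cannot express.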
  • 21. Not Good Enough • We need combinations of models • Simple linear combinations aren’t subtle enough • Enter multi-level models – Can we learn a model that uses combinations of inputs – Where each of those combinations is a model that we learn?
  • 22. Yes, Virginia, There IS a Santa Claus • Each circle is a sum and a (soft) threshold • Arrows are multiplication by a learned weight
  • 23. Errors on Output Can Propagate • Each circle sends error to each arrow • Arrows weight back-propagating errors [Diagram labels: Inputs, Hidden layer]
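
The "sum and soft threshold" circles and the backward flow of error can be made concrete with a tiny one-hidden-layer network. The sketch below is an assumption-laden illustration, not the talk's code: it uses a logistic sigmoid as the soft threshold, squared error as the loss, and made-up sizes, inputs, and learning rate.

```python
import math
import random

def soft_threshold(z):
    """Logistic sigmoid: the 'soft threshold' applied at each circle."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Each unit sums its weighted inputs and applies the soft threshold."""
    hidden = [soft_threshold(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = soft_threshold(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def backprop_step(x, target, w_hidden, w_out, rate=0.5):
    """One gradient step on the squared error 0.5 * (output - target)**2."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output unit (the sigmoid's derivative is s * (1 - s)).
    delta_out = (output - target) * output * (1.0 - output)
    # Each arrow weights the error flowing back to its hidden unit.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    new_w_out = [w - rate * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w - rate * d * xi for w, xi in zip(row, x)]
                    for row, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out

random.seed(1)
x, target = [1.0, 0.0, 1.0], 1.0  # hypothetical input and label
w_hidden = [[random.uniform(-1, 1) for _ in x] for _ in range(4)]
w_out = [random.uniform(-1, 1) for _ in range(4)]

_, before = forward(x, w_hidden, w_out)
w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
_, after = forward(x, w_hidden, w_out)
print("output before:", before, "after one step:", after)
```

A single step should move the output slightly toward the target; real training repeats this over many examples.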
  • 24. Success! Triumph! World domination!
  • 25. World domination! With some reservations because features are hard
  • 26. Turtles All the Way Down – We Wish • This learning works well for just a few layers • This is still a big deal … – with cool features, we can build real systems • With many layers, the learning no longer converges • Well … until recently
  • 27. Model Learning in an Ideal World • If we could just learn the features – Maybe unsupervised, maybe supervised – And at the same time learn the model • Presumably we could build models quicker • And more easily • And we wouldn’t have to dirty our minds with pedestrian domain knowledge
  • 28. Example 1 – (not very) Deep Auto-encoder • Let’s take an example where we can learn features • Data is EKG traces • We want to find anomalies – No supervised training
  • 29. Spot the Anomaly Anomaly?
  • 30. Maybe not!
  • 31. Where’s Waldo? This is the real anomaly
  • 32. Normal Isn’t Just Normal • What we want is a model of what is normal • What doesn’t fit the model is the anomaly • For simple signals, the model can be simple: $x \sim m(t) + N(0, \epsilon)$ • The real world is rarely so accommodating
  • 33–41. We Do Windows
  • 42. Windows on the World • The set of windowed signals is a nice model of our original signal • Clustering can find the prototypes – Fancier techniques available using sparse coding • The result is a dictionary of shapes • New signals can be encoded by shifting, scaling and adding shapes from the dictionary
  • 43. Most Common Shapes (for EKG)
  • 44. Reconstructed signal [Figure: original signal and reconstructed signal] Reconstruction error < 1 bit / sample
  • 45. An Anomaly • Original technique for finding 1-d anomaly works against reconstruction error
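
The window, cluster, reconstruct, and flag pipeline can be sketched end to end with the standard library alone. This is a deliberately simplified illustration rather than the method used for the EKG figures: it assumes a noisy sine wave in place of an EKG trace, non-overlapping windows, no shifting or scaling of dictionary shapes, and a plain Lloyd's k-means with a deterministic farthest-point start. All names and constants are hypothetical.

```python
import math
import random

WINDOW = 16  # hypothetical window length (half a period of the toy signal)

def make_signal(n_windows, rng, noise=0.05):
    """Noisy sine wave standing in for an EKG trace."""
    return [math.sin(2 * math.pi * t / (2 * WINDOW)) + rng.gauss(0, noise)
            for t in range(n_windows * WINDOW)]

def windows(signal):
    """Cut the signal into consecutive, non-overlapping windows."""
    return [signal[i:i + WINDOW] for i in range(0, len(signal) - WINDOW + 1, WINDOW)]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(k), key=lambda j: dist2(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def reconstruction_errors(signal, centroids):
    """Encode each window as its nearest centroid; return the residual norm."""
    return [math.sqrt(min(dist2(w, c) for c in centroids)) for w in windows(signal)]

rng = random.Random(42)
dictionary = kmeans(windows(make_signal(80, rng)), k=4)  # learn shapes from clean data

test_signal = make_signal(40, rng)
for t in range(10 * WINDOW + 4, 10 * WINDOW + 8):  # inject a glitch into window 10
    test_signal[t] += 1.5

errors = reconstruction_errors(test_signal, dictionary)
suspect = max(range(len(errors)), key=errors.__getitem__)
print("window with the largest reconstruction error:", suspect)
```

The point of the sketch is that the dictionary models what is normal, so the window that reconstructs worst is the one the model did not expect.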
  • 46. Close-up of anomaly • Not what you want your heart to do. And not what the model expects it to do.
  • 47. A Different Kind of Anomaly
  • 48. Some k-means Caveats • But Eamonn Keogh says that k-means can’t work on time-series (http://www.cs.ucr.edu/~eamonn/meaningless.pdf) • That is silly … and kind of correct; k-means does have limits – Other kinds of auto-encoders are much more powerful • More fun and code demos at – https://github.com/tdunning/k-means-auto-encoder
  • 49. The Limits of Clustering as Auto-encoder • Clustering is like trying to tile your sample distribution • Can be used to approximate a signal • Filling a d-dimensional region with k clusters should give $\epsilon \approx 1/\sqrt[d]{k}$ • If d is large, this is no good
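
A quick calculation shows why large d hurts. The sketch below assumes the slide's formula reads as an error of roughly k^(-1/d), the usual scaling when k cells tile a d-dimensional region; the function name `approx_error` is made up for illustration.

```python
def approx_error(k, d):
    """Rough tiling error when k clusters fill a d-dimensional region."""
    return k ** (-1.0 / d)

# The same dictionary size gives very different accuracy as dimension grows.
for d in (2, 5, 20):
    print(f"d={d:2d}  k=100  error ~ {approx_error(100, d):.3f}")
```

Under this reading, 100 clusters give an error near 0.1 in 2 dimensions but close to 0.8 in 20, and getting back to 0.1 in 20 dimensions would take on the order of 10^20 clusters.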
  • 50. [Figure: Time series training data (first 2000 samples); x-axis: Time; panels: Test data, Reconstruction Error]
  • 51. [Figure: Reconstruction error for time-series data; x-axis: Centroids; y-axis: MAV Error; series: Training data, Held-out data]
  • 52. Moral For Auto-encoders • The simplest auto-encoders can be good models • For more complex spaces/signals, more elaborate models may be required – Winner take (absolutely) all may be problematic – In particular, models that allow sparse linear combination may be better • Consider deep learning, recurrent networks, denoising
  • 53. How Does Clustering Do Reconstruction? [Diagram: input x1 … xn] • For normalized cluster centroids, dot-product and distance are equivalent
  • 54. How Does Clustering Do Reconstruction? [Diagram: input x1 … xn] • Winner takes all with k-means
  • 55. How Does Clustering Do Reconstruction? [Diagram: input x1 … xn, hidden layer (clusters), reconstruction x'1 … x'n] • Dot-product scales centroid to reconstruct
  • 56. AKA - Neural Network [Diagram: input x1 … xn, hidden layer (clusters), reconstruction x'1 … x'n]
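
The equivalence claimed for normalized centroids follows from ||x − c||² = ||x||² − 2 x·c + 1 when ||c|| = 1: the centroid with the largest dot product is also the nearest one. The sketch below, using made-up centroids and input, checks that and shows the reconstruction as the winning centroid scaled by its dot product, which is the encoder/decoder structure of a one-hidden-layer network.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def encode(x, centroids):
    """Hidden layer: one dot product per centroid, then winner takes all."""
    activations = [dot(x, c) for c in centroids]
    winner = max(range(len(centroids)), key=activations.__getitem__)
    return winner, activations[winner]

def decode(winner, scale, centroids):
    """Output layer: the winning centroid scaled by its activation."""
    return [scale * c for c in centroids[winner]]

# Hypothetical unit-length centroids and input.
centroids = [normalize(c) for c in ([1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 2.0])]
x = [0.9, 1.1, 0.1]

winner, scale = encode(x, centroids)
nearest = min(range(len(centroids)),
              key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])))
reconstruction = decode(winner, scale, centroids)
print("winner:", winner, "nearest:", nearest, "reconstruction:", reconstruction)
```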
  • 57. What If … We Had More Layers? [Diagram: stacked layers labeled A, B, A']
  • 58–60. Other Thoughts • What if we allow more than one cluster to be active? – k-sparse learning!
  • 61. Other Thoughts • What if we allow more than one cluster to be active? – k-sparse learning! • Well, almost
  • 62. The Point of Deep Learning • It isn’t just many hidden layers in a neural network • The goal is to eliminate feature engineering by learning features as well as the classifier
  • 63. Experiment 3 – Card Velocity • Most features so far are inherent in the data • Few are true sequence features • Card velocity is a pure combination – Starting point can be anywhere – The issue is where the next point is relative to starting point
  • 64. Card Velocity • Non-fraud steps are reasonable in terms of distance and time • Fraudulent use of card by multiple attackers results in big, fast jumps
  • 65. Synthetic Data Example • Generate random point • Take four small steps • If fraud, second step can be large • Result is five positions, each in 3-d on surface of a sphere – Data shape is N x (5 x 3) • Add secondary features containing step size … N x 4
  • 66. The Truth is Out There • With the right feature (step-size), it is trivial to spot the fraud • Here we show the step size between positions • Fraud cases take a big jump that others don’t • But they can be anywhere
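
The synthetic card-velocity data is easy to reproduce in outline. The generator below is a guess at the recipe, not the talk's actual code: the step scales, the 10% fraud rate, the Gaussian perturbation, and the 0.1 threshold are all assumptions. It builds the raw N x 15 positions and the N x 4 step-size feature, then shows that a one-line threshold on step size separates the classes.

```python
import math
import random

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def random_point(rng):
    """Uniform random point on the unit sphere (normalised Gaussian vector)."""
    return unit([rng.gauss(0, 1) for _ in range(3)])

def step(p, size, rng):
    """Perturb p by Gaussian noise of the given scale, then project back onto the sphere."""
    return unit([x + rng.gauss(0, size) for x in p])

def make_trace(fraud, rng, small=0.01, large=0.5):
    """Five positions; for fraud cases the second step is large (scales are assumptions)."""
    positions = [random_point(rng)]
    for i in range(4):
        size = large if (fraud and i == 1) else small
        positions.append(step(positions[-1], size, rng))
    return positions

def step_sizes(positions):
    """The hand-built feature: straight-line distance between consecutive positions."""
    return [math.dist(a, b) for a, b in zip(positions, positions[1:])]

rng = random.Random(7)
labels = [int(rng.random() < 0.1) for _ in range(1000)]
traces = [make_trace(y, rng) for y in labels]
raw = [[c for p in t for c in p] for t in traces]   # N x 15 raw coordinates
steps = [step_sizes(t) for t in traces]             # N x 4 step-size feature
predicted = [int(max(s) > 0.1) for s in steps]      # trivial threshold rule
accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
print("accuracy of the step-size threshold:", accuracy)
```

A model given only `raw` has to discover the differences between coordinates and the nonlinear distance on its own, which is the hard part discussed next.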
  • 67. But Dimensionality Bites Hard • With the step-size feature, learning succeeds instantly with the simplest models and gets nearly perfect accuracy • Without the step-size feature, learning with TensorFlow gets modest accuracy after substantial learning cost (work in progress, could do better with lots more tuning) • The problem is that there are too many combinations of 15 variables; we need a very specific combination of three pair-wise diffs combined non-linearly into a distance
  • 68. [Figure: AUC and Precision versus Data Size (10^4 to 10^6)]
  • 69. We have a bona fide revolution But old tricks still pay
  • 70. Greenfield Problem Landscape
  • 71. Mature Problem Landscape
  • 72. Summary • There is too much to say in 40 minutes; let’s talk some more at the MapR booth • Deep learning, especially with systems like TensorFlow, has huge promise • Deep learning trades learning architecture engineering for feature engineering • There are powerful middle grounds
  • 74. Short Books by Ted Dunning & Ellen Friedman • Published by O’Reilly in 2014 - 2016 • For sale from Amazon or O’Reilly • Free e-books currently available courtesy of MapR: http://bit.ly/ebook-real-world-hadoop, http://bit.ly/mapr-tsdb-ebook, http://bit.ly/ebook-anomaly, http://bit.ly/recommendation-ebook
  • 75. Streaming Architecture by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly) • Free copies at book signing today • http://bit.ly/mapr-ebook-streams
  • 76. Thank You!
  • 77. Q&A • Engage with us! @mapr, maprtech, MapR, mapr-technologies