Using Sequence Statistics to Fight Advanced Persistent Threats
© 2016 MapR Technologies
Contact Information
Ted Dunning
Chief Applications Architect at MapR Technologies
Committer & PMC for Apache’s Drill, Zookeeper & others
VP of Incubator at the Apache Software Foundation
Email tdunning@apache.org tdunning@maprtech.com
Twitter @ted_dunning
Hashtags today: #hs16dublin #mapr
Agenda
• What’s this persistent threat stuff?
– What attackers do
– How they do it
• Examples
• Sequence statistics
– Really geeking with gas now!
• Detection techniques
• Specifics
• Summary
Agenda of All Security Talks
• Terror
• Faint hope
• More terror
• Practical suggestions
• Summary
Operation Ababil – Brobots on Parade
• Dork attack finds unpatched, default Joomla sites
– Especially web servers with high-bandwidth connections
– Basically just Google searches for default strings
– Compromised Joomla sites are turned into attack Brobots
• The C&C network checks in occasionally
– Note: C&C arrives as incoming requests that look like normal web requests
• Later, on command, multiple Brobots together direct 50-75 Gb/s of attack traffic
– Attacks come from white-listed sites
Attack Sequence
[Figure: diagram, built up over four slides, showing the Source, first-level C&C, second-level C&C, Google, the Brobots and the Target.]
Outline of an Advanced Persistent Threat
• Advanced
– Zero-day exploits are commonly used for the preliminary attacks
– Often attributed to state-level actors
– Modern privateers blur the line
• Persistent
– Result of first attack is heavily muffled, no immediate exploit
– A remote access toolset (RAT) is installed
• Threat
– On command, data is exfiltrated covertly or en masse
– Or the compromised host is used for other nefarious purposes
APT in Summary
• Attack, penetrate, pivot, exfiltrate or exploit
• If you are a high-value target, attack is likely and stealthy
– High-value = telecom, banks, utilities, retail targets, web100
– … and all their vendors
– Conventional multi-factor auth is easily breached
• Penetration and pivot are critical counter-measure opportunities
– In 2010, the RAT would contact command and control (C&C)
– In 2016, C&C looks like normal traffic
• Once exfiltration or exploit starts, you may no longer have a business
So are we totally screwed?
Not entirely!
Event Sequences Provide Clues
• Event sequences appear in many places
• Headers
– Header types, ordering in requests
• IP address accesses
– Source and destination, sequences of either
• TLS options
– Which options, which values, which algorithms
• Incoming component request ordering and timing
– Body first, CSS, scripts and images next
– But which are cached, what is round-trip time?
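To make the first of these sources concrete, here is a minimal Python sketch (not code from the talk). It turns the header block of a raw HTTP request into a symbol sequence: the header names in the order they appear, plus ordered pairs of neighbouring headers. The helper names `header_symbols` and `adjacent_pairs`, and the choice to lower-case header names, are illustrative assumptions.

```python
def header_symbols(raw_request: str) -> list[str]:
    """Return the request's header names, lower-cased, in the order they appear."""
    lines = raw_request.strip().splitlines()
    names = []
    for line in lines[1:]:  # skip the request line, e.g. "GET / HTTP/1.1"
        if not line.strip():
            break  # a blank line ends the header block
        name, sep, _value = line.partition(":")
        if sep:
            names.append(name.strip().lower())
    return names


def adjacent_pairs(symbols: list[str]) -> list[tuple[str, str]]:
    """Ordered pairs of neighbouring symbols, so header ordering is preserved."""
    return list(zip(symbols, symbols[1:]))


request = """GET /index.html HTTP/1.1
Host: example.com
User-Agent: demo
Accept: */*
"""
print(header_symbols(request))
print(adjacent_pairs(header_symbols(request)))
```

Treating the header names themselves as symbols captures both which headers a client sends and their ordering. Both of these can differ between real browsers and attack tooling.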
Sequences and Cooccurrences
• All of these characteristics form symbolic sequences
• Current systems use hand-crafted rules about particular states
– But hand-crafting depends on human knowledge
• We can do much, much better by considering cooccurrence and
ordering of symbols in these sequences
• Log-likelihood ratio test (jargon alert) is a key tool
A core technique
• Many of these easy problems reduce to finding interesting
coincidences
• This can be summarized as a 2 x 2 table
• Actually, many of these tables
A Other
B k11 k12
Other k21 k22
How do you do that?
• This is handled well by the G-test (a log-likelihood ratio test)
– See Wikipedia
– See http://bit.ly/surprise-and-coincidence
• Original application in linguistics, now cited more than 2000 times
• Available in Elasticsearch, in Solr, in Mahout
• Available in R, C, Java, Python
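Below is a minimal Python sketch of the 2 x 2 LLR. It mirrors the entropy formulation in Apache Mahout's `LogLikelihood` class, but the function names here are illustrative and are not any library's API. Applied to the four example tables that follow, the square root of the score comes out at roughly 0.90, 4.52, 14.3 and 1.95 respectively.

```python
import math


def x_log_x(x: float) -> float:
    """x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: float) -> float:
    """Unnormalized entropy: N ln N - sum(k ln k), where N = sum(counts)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11: float, k12: float, k21: float, k22: float) -> float:
    """Log-likelihood ratio (G statistic) of a 2 x 2 contingency table."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values that floating-point round-off can produce.
    return max(0.0, 2.0 * (row_entropy + col_entropy - mat_entropy))


def root_llr(k11: float, k12: float, k21: float, k22: float) -> float:
    """Square root of the LLR, which reads more naturally as a score."""
    return math.sqrt(llr(k11, k12, k21, k22))


if __name__ == "__main__":
    for table in [(13, 1000, 1000, 100_000), (1, 0, 0, 10_000),
                  (10, 0, 0, 100_000), (1, 0, 0, 2)]:
        print(table, round(root_llr(*table), 2))
```

The entropy form avoids computing expected counts explicitly. It only needs `x ln x` on the cells and on their row, column and grand totals.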
Which one is the anomalous co-occurrence?
          A      not A
B        13       1000
not B  1000    100,000
Root LLR score: 0.90
          A      not A
B         1          0
not B     0     10,000
Root LLR score: 4.52
          A      not A
B        10          0
not B     0    100,000
Root LLR score: 14.3
          A      not A
B         1          0
not B     0          2
Root LLR score: 1.95
Dunning, Ted. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics 19(1), 1993.
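The LLR in these examples is the G statistic of the table. It can also be computed from expected counts. The notation below is introduced here for illustration: r_i and c_j are the row and column sums and N is the total. Worked through for the last table (k11 = 1, k12 = 0, k21 = 0, k22 = 2, so N = 3, r = (1, 2), c = (1, 2)):

```latex
\begin{aligned}
G &= 2\sum_{i,j} k_{ij}\,\ln\frac{k_{ij}}{E_{ij}}, \qquad E_{ij} = \frac{r_i\,c_j}{N} \quad (0 \ln 0 = 0)\\
E_{11} &= \tfrac{1 \cdot 1}{3} = \tfrac{1}{3}, \qquad E_{22} = \tfrac{2 \cdot 2}{3} = \tfrac{4}{3}\\
G &= 2\left(1 \cdot \ln 3 + 2 \ln \tfrac{3}{2}\right) \approx 2\,(1.0986 + 0.8109) \approx 3.819\\
\sqrt{G} &\approx 1.95
\end{aligned}
```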
How to Count (header-like documents)
For each “document”:
    For each “word” A:
        For each “word” B after that (within window):
            count[A,B]++
            left[A]++
            right[B]++
            total++
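Here is a runnable Python version of this counting loop, as a minimal sketch. The name `count_pairs` and the default window of 3 are illustrative assumptions. Every marginal, including `left[A]`, is incremented once per pair. As a result, `left`, `right` and `total` are all pair counts, and the 2 x 2 tables derived from them stay consistent.

```python
from collections import Counter


def count_pairs(documents, window=3):
    """Count ordered (A, B) pairs where B follows A by at most `window` positions.

    Returns (count, left, right, total). Every marginal is incremented once per
    pair, so left[A] equals the sum of count[A, B] over all B (likewise right).
    """
    count, left, right = Counter(), Counter(), Counter()
    total = 0
    for doc in documents:
        for i, a in enumerate(doc):
            for b in doc[i + 1 : i + 1 + window]:
                count[a, b] += 1
                left[a] += 1
                right[b] += 1
                total += 1
    return count, left, right, total


docs = [["host", "user-agent", "accept"], ["host", "accept"]]
count, left, right, total = count_pairs(docs, window=2)
print(dict(count))
```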
• We wanted this 2 x 2 table for each A,B
• But we only counted k11 directly
• We did, however, count the marginals:
k*1 = k11 + k21 (how many A’s we saw)
k1* = k11 + k12 (how many B’s we saw)
k** = k11 + k21 + k12 + k22 (how many pairs in total)
A Other
B k11 k12
Other k21 k22
How to Count (continued)
Map<PriorityQueue> queue
for each pair (A,B):
    k11 = count[A,B]
    k1x = left[A]
    kx1 = right[B]
    kxx = total
    k12 = k1x - k11
    k21 = kx1 - k11
    k22 = kxx - k11 - k12 - k21
    queue.add(A, (LLR(k11,k12,k21,k22), B))
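Here is a Python sketch of this scoring step. It assumes counts in the shape produced by the counting loop: `Counter` objects keyed by symbol, or by `(A, B)` tuple. For each A, a bounded min-heap keeps the `top_n` partners B with the largest LLR, playing the role of the `Map<PriorityQueue>` above. The LLR here is computed directly as G = 2 Σ k ln(k/E). `top_indicators` and `top_n` are hypothetical names.

```python
import heapq
import math
from collections import Counter, defaultdict


def llr(k11, k12, k21, k22):
    """G statistic of a 2 x 2 table: 2 * sum(k * ln(k / expected))."""
    n = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    g = 0.0
    for k, row, col in ((k11, row1, col1), (k12, row1, col2),
                        (k21, row2, col1), (k22, row2, col2)):
        if k > 0:  # empty cells contribute 0 (0 * ln 0 = 0)
            g += k * math.log(k * n / (row * col))
    return 2.0 * g


def top_indicators(count, left, right, total, top_n=10):
    """Return {A: [(score, B), ...]} with the top_n highest-scoring B per A."""
    heaps = defaultdict(list)  # A -> min-heap holding at most top_n items
    for (a, b), k11 in count.items():
        k12 = left[a] - k11             # A paired with something other than B
        k21 = right[b] - k11            # B paired with something other than A
        k22 = total - k11 - k12 - k21   # pairs with neither A on the left nor B on the right
        item = (llr(k11, k12, k21, k22), b)
        if len(heaps[a]) < top_n:
            heapq.heappush(heaps[a], item)
        else:
            heapq.heappushpop(heaps[a], item)  # drop the lowest score
    return {a: sorted(h, reverse=True) for a, h in heaps.items()}


count = Counter({("a", "b"): 5, ("a", "c"): 1, ("x", "c"): 20, ("x", "y"): 20})
left = Counter({"a": 6, "x": 40})
right = Counter({"b": 5, "c": 21, "y": 20})
print(top_indicators(count, left, right, total=46, top_n=1))
```

A bounded heap keeps memory per A at `top_n` entries, no matter how many partners B are seen.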
How to Count (cooccurrence)
for each (C,B) = (“context”, “word”):
    if (!filter(C) && !filter(B)):
        right[B]++
        for each A in history(C):
            count[A,B]++
            left[A]++
        history(C) += B
        total++
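Here is the same loop in Python, as a minimal sketch. `cooccurrence_counts`, the `keep` predicate (standing in for `!filter(...)`) and the optional `max_history` cap are illustrative assumptions, not part of the talk. As in the pseudocode, `right[B]` and `total` count events while `left[A]` counts pairs. If a symbol can repeat within one context's history, `right[B] - count[A,B]` can go negative. Incrementing every marginal once per pair avoids that.

```python
from collections import Counter, defaultdict, deque


def cooccurrence_counts(events, keep=lambda symbol: True, max_history=None):
    """Count (A, B) pairs where A was seen earlier than B in the same context.

    events      -- iterable of (context, symbol) tuples in time order
    keep        -- predicate; events whose context or symbol fails it are
                   skipped (the role of !filter(...) in the pseudocode)
    max_history -- hypothetical cap on remembered symbols per context;
                   None means unbounded, as in the pseudocode
    """
    count, left, right = Counter(), Counter(), Counter()
    history = defaultdict(lambda: deque(maxlen=max_history))
    total = 0
    for context, b in events:
        if keep(context) and keep(b):
            right[b] += 1
            for a in history[context]:
                count[a, b] += 1
                left[a] += 1
            history[context].append(b)
            total += 1
    return count, left, right, total


events = [("u1", "login"), ("u1", "read"), ("u2", "login"),
          ("u1", "logout"), ("u2", "read")]
count, left, right, total = cooccurrence_counts(events)
print(count.most_common(3))
```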
Seriously...
It really can be that simple
Basic techniques
• Counting – often the hardest part
• LLR – the basic tool
• Order models
– Ordered cooccurrences
– Transition probabilities
– Recurrent neural networks
• Ploughing a quiet field
– Reimage servers often
– Force attackers to pivot repeatedly
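Of the order models listed above, transition probabilities are the simplest to sketch. The class below is a hypothetical illustration, not code from the talk. It estimates first-order transition probabilities with add-alpha smoothing and scores a new sequence by its mean log-probability per transition. Unusually low scores flag unusual orderings. The start/end markers, the smoothing constant and the single "unseen symbol" bucket are assumptions.

```python
import math
from collections import Counter, defaultdict

START, END = "<start>", "<end>"  # hypothetical boundary markers


class TransitionModel:
    """First-order Markov (transition probability) model with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # previous symbol -> Counter of next symbols
        self.vocab = set()

    def fit(self, sequences):
        for seq in sequences:
            symbols = [START, *seq, END]
            self.vocab.update(symbols)
            for prev, nxt in zip(symbols, symbols[1:]):
                self.counts[prev][nxt] += 1
        return self

    def prob(self, prev, nxt):
        row = self.counts.get(prev, Counter())
        buckets = len(self.vocab) + 1  # one extra bucket for never-seen symbols
        return (row[nxt] + self.alpha) / (sum(row.values()) + self.alpha * buckets)

    def mean_log_prob(self, seq):
        """Average log-probability per transition; low values mean unusual ordering."""
        symbols = [START, *seq, END]
        logs = [math.log(self.prob(p, n)) for p, n in zip(symbols, symbols[1:])]
        return sum(logs) / len(logs)


model = TransitionModel().fit([["host", "user-agent", "accept"]] * 50)
print(model.mean_log_prob(["host", "user-agent", "accept"]))
print(model.mean_log_prob(["accept", "host", "user-agent"]))
```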
Example 1 - Ababil
[Figure: the attack-sequence diagram (Source, first-level C&C, second-level C&C, Brobots, Target), annotated "Defense has to happen here".]
Spot the Important Difference?
Attacker request:
  GET /personal/comparison-table.jsp?iODg2OQ=51a90 HTTP/1.1
  Host: www.sometarget.com
  User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
  Accept-Encoding: deflate
  Accept-Charset: UTF-8
  Accept-Language: fr
  Cache-Control: no-cache
  Pragma: no-cache
  Connection: Keep-Alive
Real request:
  GET /photo.jpg HTTP/1.1
  Host: lh4.googleusercontent.com
  User-Agent: Mozilla/5.0 (Macint
  Accept: image/png,image/*;q=0.8
  Accept-Language: en-US,en;q=0.5
  Accept-Encoding: gzip, deflate,
  Referer: https://www.google.com
  Connection: keep-alive
  If-None-Match: "v9"
  Cache-Control: max-age=0
This could only be found at scale
Overall Outline Again
[Figure: the attack-sequence diagram (Source, first-level C&C, second-level C&C, Brobots, Target), annotated "Tradecraft error!"]
Large-corpus analysis of source IPs wins big
Example 2 - Common Point of Compromise
• Scenario:
– Merchant 0 is compromised, leaks account data during compromise
– Fraud committed elsewhere during exploit
– High background level of fraud
– Limited detection rate for exploits
• Goal:
– Find merchant 0
• Meta-goal:
– Screen algorithms for this task without leaking sensitive data
[Figure: a skim exploit steals card data from Merchant 0; that skimmed data is then used in frauds at other merchants (Merchant n).]
Simulation Setup
[Figure: count of compromises and frauds versus day (0 to 100), with the compromise period and the exploit period marked.]
Simulation Strategy
• For each consumer
– Pick consumer parameters such as transaction rate, preferences
– Generate transactions until end of sim-time
• If the transaction is at merchant 0 during the compromise period, possibly mark the account as compromised
• For all transactions, possibly mark as fraud; the probability depends on history
• Merchants are selected using a hierarchical Pitman-Yor process
• Restate data
– Flatten transaction streams
– Sort by time
• Tunables
– Compromise probability, transaction rates, background fraud, detection probability
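The slides name hierarchical Pitman-Yor for merchant selection but give no details. As a simplified, single-level sketch (an assumption, not the talk's simulator), the Pitman-Yor "Chinese restaurant" rule below does one of two things. It either revisits an existing merchant with weight (visits - d), or opens a new merchant with weight θ + d·K. Here d is the discount, θ the concentration and K the number of merchants so far. This gives the heavy-tailed merchant popularity such a simulation needs. A hierarchical version would, for example, give each consumer its own restaurant whose new merchants are drawn from a shared one. `PitmanYorSelector` is a hypothetical name.

```python
import random


class PitmanYorSelector:
    """Single-level Pitman-Yor (Chinese restaurant process) merchant chooser.

    With K merchants and n past transactions, an existing merchant m is chosen
    with probability (count[m] - d) / (n + theta) and a new merchant with
    probability (theta + d*K) / (n + theta).
    """

    def __init__(self, discount=0.5, concentration=10.0, rng=None):
        if not (0 <= discount < 1) or concentration <= -discount:
            raise ValueError("need 0 <= discount < 1 and concentration > -discount")
        self.d = discount
        self.theta = concentration
        self.counts = []  # counts[m] = transactions so far at merchant m
        self.n = 0
        self.rng = rng or random.Random()

    def next_merchant(self):
        k = len(self.counts)
        # With no merchants yet, always open one (also avoids dividing by n + theta = 0).
        p_new = 1.0 if k == 0 else (self.theta + self.d * k) / (self.n + self.theta)
        if self.rng.random() < p_new:
            self.counts.append(1)
            m = k  # id of the newly opened merchant
        else:
            weights = [c - self.d for c in self.counts]  # all > 0 because d < 1
            m = self.rng.choices(range(k), weights=weights)[0]
            self.counts[m] += 1
        self.n += 1
        return m


selector = PitmanYorSelector(discount=0.5, concentration=5.0, rng=random.Random(42))
visits = [selector.next_merchant() for _ in range(2000)]
print(len(selector.counts), "merchants; busiest:", sorted(selector.counts, reverse=True)[:5])
```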
[Figure: LLR score for real data. Number of merchants (log scale, 10^0 to 10^6) versus BreachScore (LLR, 0 to 80), with the "real truly bad guys" and the "really truly bad guys" annotated.]
Historical cooccurrence gives high S/N (signal-to-noise ratio)
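The slides do not spell out how BreachScore is computed. One plausible reading is sketched here under stated assumptions. For each merchant, build a 2 x 2 table over cards: did the card visit the merchant before any fraud, and did the card later see fraud? Score that table with LLR, keeping only merchants whose visitors show more fraud than average. The input format, the function names and the positive-association filter are all hypothetical.

```python
import math
from collections import defaultdict


def llr(k11, k12, k21, k22):
    """G statistic of a 2 x 2 table: 2 * sum(k * ln(k / expected))."""
    n = k11 + k12 + k21 + k22
    cells = ((k11, k11 + k12, k11 + k21), (k12, k11 + k12, k12 + k22),
             (k21, k21 + k22, k11 + k21), (k22, k21 + k22, k12 + k22))
    return 2.0 * sum(k * math.log(k * n / (row * col)) for k, row, col in cells if k > 0)


def breach_scores(transactions, fraud_day):
    """Rank merchants by how strongly an earlier visit is associated with later fraud.

    transactions -- iterable of (card, merchant, day) tuples
    fraud_day    -- dict mapping card -> day its fraud was detected (absent = no fraud)
    """
    visited = defaultdict(set)  # merchant -> cards that shopped there before any fraud
    cards = set()
    for card, merchant, day in transactions:
        cards.add(card)
        if day < fraud_day.get(card, math.inf):
            visited[merchant].add(card)
    bad = {c for c in cards if c in fraud_day}
    n_bad, n_good = len(bad), len(cards) - len(bad)
    scores = {}
    for merchant, seen in visited.items():
        k11 = len(seen & bad)   # visited merchant, later fraud
        k12 = len(seen) - k11   # visited merchant, no fraud
        k21 = n_bad - k11       # did not visit, fraud
        k22 = n_good - k12      # did not visit, no fraud
        # Keep only merchants whose visitors see more fraud than the average card.
        if k11 * (n_bad + n_good) > len(seen) * n_bad:
            scores[merchant] = llr(k11, k12, k21, k22)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


cards = [f"c{i}" for i in range(10)]
txns = [(c, "m1", 2) for c in cards]
txns += [(c, "m0", 1) for c in cards[:5]]
txns += [(c, "m2", 3) for c in cards[5:]]
txns.append(("c0", "m3", 12))  # after c0's fraud, so it should not count
frauds = {c: 10 for c in cards[:5]}
print(breach_scores(txns, frauds))
```

Only visits before a card's fraud are counted. That matches the scenario in which data is stolen at the compromised merchant and exploited later, elsewhere.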
Summary
• The world can be seen as sequences of symbols
• We can find patterns
• Those patterns can nail opponents
• Many patterns only appear at scale
• You can do this
Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014 and 2015
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
http://bit.ly/ebook-real-world-hadoop
http://bit.ly/mapr-tsdb-ebook
http://bit.ly/ebook-anomaly
http://bit.ly/recommendation-ebook
Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)
Free copies at book signing today
(oops… that was earlier)
http://bit.ly/mapr-ebook-streams
Thank You!
Q&A
@mapr maprtech
tdunning@maprtech.com
Engage with us!
MapR
maprtech
mapr-technologies
InData Labs
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Ad

Using Sequence Statistics to Fight Advanced Persistent Threats

11. APT in Summary
• Attack, penetrate, pivot, exfiltrate or exploit
• If you are a high-value target, attack is likely and stealthy
– High-value = telecom, banks, utilities, retail targets, web100
– … and all their vendors
– Conventional multi-factor auth is easily breached
• Penetration and pivot are critical countermeasure opportunities
– In 2010, the RAT would contact command and control (C&C)
– In 2016, C&C looks like normal traffic
• Once exfiltration or exploit starts, you may no longer have a business
12. So are we totally screwed?

13. So are we totally screwed?
Not entirely!
14. Event Sequences Provide Clues
• Event sequences appear in many places
• Headers
– Header types, ordering in requests
• IP address accesses
– Source and destination, sequences of either
• TLS options
– Which options, which values, which algorithms
• Incoming component request ordering and timing
– Body first, CSS, scripts and images next
– But which are cached, and what is the round-trip time?
15. Sequences and Cooccurrences
• All of these characteristics form symbolic sequences
• Current systems use hand-crafted rules about particular states
– But hand-crafting depends on human knowledge
• We can do much, much better by considering the cooccurrence and ordering of symbols in these sequences
• The log-likelihood ratio test (jargon alert) is a key tool
16. A core technique
• Many of these easy problems reduce to finding interesting coincidences
• This can be summarized as a 2 x 2 table
• Actually, many of these tables

         A      Other
B        k11    k12
Other    k21    k22
17. How do you do that?
• This is well handled using the G-test
– See Wikipedia
– See http://bit.ly/surprise-and-coincidence
• Original application in linguistics, now cited > 2000 times
• Available in Elasticsearch, in Solr, in Mahout
• Available in R, C, Java, Python
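The G-test named on the slide is the log-likelihood ratio for a contingency table. For reference, its standard form for the 2 x 2 table on slide 16 is given below, with r_i the row totals, c_j the column totals and N the grand total; cells with k_ij = 0 contribute nothing to the sum.

```latex
\[
G^2 = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} k_{ij} \ln \frac{k_{ij}}{E_{ij}},
\qquad E_{ij} = \frac{r_i \, c_j}{N}
\]
\[
r_1 = k_{11} + k_{12}, \quad r_2 = k_{21} + k_{22}, \quad
c_1 = k_{11} + k_{21}, \quad c_2 = k_{12} + k_{22}, \quad
N = r_1 + r_2
\]
```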
18. Which one is the anomalous co-occurrence?

         A      not A
B        13     1,000
not B    1,000  100,000

         A      not A
B        1      0
not B    0      10,000

         A      not A
B        10     0
not B    0      100,000

         A      not A
B        1      0
not B    0      2
19. Which one is the anomalous co-occurrence?

         A      not A
B        13     1,000
not B    1,000  100,000
Score: 0.90

         A      not A
B        1      0
not B    0      10,000
Score: 4.52

         A      not A
B        10     0
not B    0      100,000
Score: 14.3

         A      not A
B        1      0
not B    0      2
Score: 1.95

Dunning, Ted. "Accurate Methods for the Statistics of Surprise and Coincidence." Computational Linguistics, vol. 19, no. 1 (1993).
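The four scores on slide 19 appear to be the square root of the G-test statistic (sometimes called root LLR). The minimal Python sketch below computes both quantities for a 2 x 2 table of counts; the function names `llr` and `root_llr` are illustrative, not any particular library's API.

```python
import math


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """G-test statistic (log-likelihood ratio) for a 2 x 2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    g = 0.0
    for k, i, j in cells:
        if k > 0:  # by convention 0 * log(0) contributes nothing
            expected = rows[i] * cols[j] / n
            g += k * math.log(k / expected)
    return max(0.0, 2.0 * g)  # clamp tiny negative round-off to zero


def root_llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Square root of the LLR, which keeps scores on a more readable scale."""
    return math.sqrt(llr(k11, k12, k21, k22))


# The four tables from slides 18 and 19, as (k11, k12, k21, k22)
tables = {
    "13 / 1,000 / 1,000 / 100,000": (13, 1000, 1000, 100000),
    "1 / 0 / 0 / 10,000": (1, 0, 0, 10000),
    "10 / 0 / 0 / 100,000": (10, 0, 0, 100000),
    "1 / 0 / 0 / 2": (1, 0, 0, 2),
}
for name, table in tables.items():
    print(f"{name:32s} root LLR = {root_llr(*table):.2f}")
```

Run as-is, it should print roughly 0.90, 4.52, 14.29 and 1.95, matching slide 19 after rounding. The statistic is two-sided: a pair seen far less often than chance would also score high, which is why some implementations attach a sign to the root.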
20. How to Count (header-like documents)
for each "document":
    for each "word" A:
        left[A]++
        for each "word" B after that (within window):
            count[A,B]++
            right[B]++
            total++
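A minimal Python sketch of the windowed counting loop above, assuming each document is already a list of symbols (for example, header names in request order). One deliberate deviation from the slide: `left[A]` is incremented once per pair rather than once per occurrence of A, so every marginal is a true row or column sum of the pair counts and the derived cells k12, k21 and k22 can never go negative. The name `count_pairs` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_pairs(documents: Iterable[Sequence[str]], window: int = 3):
    """Count ordered (A, B) pairs where B follows A within `window` positions."""
    left: Counter = Counter()    # number of pairs with A on the left
    right: Counter = Counter()   # number of pairs with B on the right
    count: Counter = Counter()   # joint count k11 for each (A, B)
    total = 0                    # total number of pairs
    for doc in documents:
        for i, a in enumerate(doc):
            for b in doc[i + 1 : i + 1 + window]:  # the next `window` symbols
                count[(a, b)] += 1
                left[a] += 1
                right[b] += 1
                total += 1
    return left, right, count, total
```

For example, `count_pairs([["Host", "User-Agent", "Accept"]], window=2)` yields the pairs (Host, User-Agent), (Host, Accept) and (User-Agent, Accept).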
21.
• We wanted this 2 x 2 table for each A, B
• But we only counted k11 directly
• We did, however, count the marginals:
– k*1 = k11 + k21 (how many A’s we saw)
– k1* = k11 + k12 (how many B’s we saw)
– k** = k11 + k21 + k12 + k22 (how many pairs in total)
• The remaining cells follow by subtraction

         A      Other
B        k11    k12
Other    k21    k22
22. How to Count (continued)
Map<PriorityQueue> queue
for each pair (A,B):
    k11 = count[A,B]
    k1x = left[A]
    kx1 = right[B]
    kxx = total
    k12 = k1x - k11
    k21 = kx1 - k11
    k22 = kxx - k11 - k12 - k21
    queue.add(A, (LLR(k11,k12,k21,k22), B))
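The scoring loop can be sketched in Python as follows. `top_indicators` is a hypothetical name; it expects the four count tables from a counting pass with pair-based marginals, and symbols that compare with each other (for example strings) so ties in score break deterministically. Here the table rows are indexed by A (k12 = left[A] - k11), the transpose of the layout on slide 21; the G-test statistic is unchanged by transposition, so the scores are the same.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Mapping, Tuple


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """G-test statistic (log-likelihood ratio) for a 2 x 2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g = 0.0
    for k, r, c in ((k11, rows[0], cols[0]), (k12, rows[0], cols[1]),
                    (k21, rows[1], cols[0]), (k22, rows[1], cols[1])):
        if k > 0:  # 0 * log(0) contributes nothing
            g += k * math.log(k * n / (r * c))
    return max(0.0, 2.0 * g)


def top_indicators(left: Mapping[Hashable, int], right: Mapping[Hashable, int],
                   count: Mapping[Tuple[Hashable, Hashable], int], total: int,
                   n_best: int = 10) -> Dict[Hashable, List[Tuple[float, Hashable]]]:
    """For each A, keep the n_best partners B with the highest LLR score."""
    queues: Dict[Hashable, list] = defaultdict(list)  # A -> min-heap of (score, B)
    for (a, b), k11 in count.items():
        k12 = left[a] - k11            # A followed by something other than B
        k21 = right[b] - k11           # B preceded by something other than A
        k22 = total - k11 - k12 - k21  # neither A on the left nor B on the right
        heapq.heappush(queues[a], (llr(k11, k12, k21, k22), b))
        if len(queues[a]) > n_best:
            heapq.heappop(queues[a])   # evict the weakest candidate
    return {a: sorted(q, reverse=True) for a, q in queues.items()}
```

A bounded min-heap per A plays the role of `Map<PriorityQueue>`: memory stays proportional to `n_best` per symbol rather than to the number of partners it has.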
23. How to Count (cooccurrence)
for each (C,B) = ("context", "word"):
    if (!filter(C) && !filter(B)):
        right[B]++
        for each A in history(C):
            count[A,B]++
            left[A]++
        history(C) += B
        total++
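The context-keyed variant can be sketched the same way. In this Python illustration (not the author's code), a context might be a source IP and a symbol a requested URL. Assumptions: `history(C)` is a bounded deque holding the last `max_history` symbols seen in context C; `keep` is a hypothetical predicate returning True for things to count (the inverse of the slide's `filter`); and, as in a pair-based counting pass, marginals and the total are counted per pair rather than per event.

```python
from collections import Counter, defaultdict, deque
from typing import Callable, Hashable, Iterable, Tuple


def count_by_context(events: Iterable[Tuple[Hashable, Hashable]],
                     keep: Callable[[Hashable], bool] = lambda x: True,
                     max_history: int = 5):
    """Count (A, B) pairs where A preceded B within the same context."""
    histories = defaultdict(lambda: deque(maxlen=max_history))
    left, right, count = Counter(), Counter(), Counter()
    total = 0
    for context, b in events:
        if not (keep(context) and keep(b)):
            continue                       # the slide's filter(C) / filter(B)
        for a in histories[context]:       # recent symbols in this context
            count[(a, b)] += 1
            left[a] += 1
            right[b] += 1
            total += 1
        histories[context].append(b)       # history(C) += B; oldest drops off
    return left, right, count, total
```

Bounding the history keeps memory per context constant, at the cost of ignoring cooccurrences farther apart than `max_history` events.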
24. Seriously...
It really can be that simple
25. Basic techniques
• Counting – often the hardest part
• LLR – the basic tool
• Order models
– Ordered cooccurrences
– Transition probabilities
– Recurrent neural networks
• Ploughing a quiet field
– Reimage servers often
– Force attackers to pivot repeatedly
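Of the order models listed, transition probabilities are the simplest to illustrate. A minimal sketch, assuming a first-order Markov model over symbols (such as header names) with add-alpha smoothing: it learns how often each symbol follows another and scores a new sequence by its average surprise in bits per transition. The class and method names are hypothetical, and deciding what level of surprise is alarming would need a baseline from real traffic.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Sequence, Set


class TransitionModel:
    """First-order Markov model over symbols with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.transitions: DefaultDict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def fit(self, sequences: Iterable[Sequence[str]]) -> "TransitionModel":
        for seq in sequences:
            self.vocab.update(seq)
            for prev, nxt in zip(seq, seq[1:]):
                self.transitions[prev][nxt] += 1
        return self

    def prob(self, prev: str, nxt: str) -> float:
        """Smoothed P(nxt | prev), treating unseen symbols as one extra slot."""
        row = self.transitions.get(prev, Counter())
        slots = len(self.vocab) + 1
        return (row[nxt] + self.alpha) / (sum(row.values()) + self.alpha * slots)

    def surprise(self, seq: Sequence[str]) -> float:
        """Average -log2 P per transition; higher means a less familiar ordering."""
        pairs = list(zip(seq, seq[1:]))
        if not pairs:
            return 0.0
        return -sum(math.log2(self.prob(p, n)) for p, n in pairs) / len(pairs)
```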
26. Example 1 - Ababil
Diagram: Source, First level C&C, Second level C&C, Brobot (×3), Target
Callout: "Defense has to happen here"
27. Spot the Important Difference?

Attacker request:
GET /personal/comparison-table.jsp?iODg2OQ=51a90 HTTP/1.1
Host: www.sometarget.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
Accept-Encoding: deflate
Accept-Charset: UTF-8
Accept-Language: fr
Cache-Control: no-cache
Pragma: no-cache
Connection: Keep-Alive

Real request:
GET /photo.jpg HTTP/1.1
Host: lh4.googleusercontent.com
User-Agent: Mozilla/5.0 (Macint
Accept: image/png,image/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate,
Referer: https://www.google.com
Connection: keep-alive
If-None-Match: "v9"
Cache-Control: max-age=0
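To feed requests like these into the counting and scoring steps, each request first has to become a symbol sequence. A minimal Python sketch, assuming raw HTTP/1.1 request text and using header names, in their original order and capitalization, as the symbols; `header_sequence` and `header_bigrams` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

ATTACKER = """GET /personal/comparison-table.jsp?iODg2OQ=51a90 HTTP/1.1
Host: www.sometarget.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
Accept-Encoding: deflate
Accept-Charset: UTF-8
Accept-Language: fr
Cache-Control: no-cache
Pragma: no-cache
Connection: Keep-Alive
"""


def header_sequence(raw_request: str) -> List[str]:
    """Header names in the order sent; case is kept since it can fingerprint a client."""
    lines = raw_request.strip().splitlines()[1:]  # skip the request line
    names = []
    for line in lines:
        if not line.strip():
            break  # a blank line ends the header block
        name, sep, _ = line.partition(":")
        if sep:
            names.append(name.strip())
    return names


def header_bigrams(names: Sequence[str]) -> List[Tuple[str, str]]:
    """Adjacent header pairs, ready for cooccurrence counting."""
    return list(zip(names, names[1:]))


print(header_sequence(ATTACKER))
```

Applied to the attacker request, the sequence starts Host, User-Agent, Accept-Encoding and has no Accept header at all, whereas the real request sends Accept right after User-Agent.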
29. This could only be found at scale

30. Overall Outline Again
Diagram: Source, First level C&C, Second level C&C, Brobot (×3), Target
Callout: "Tradecraft error!"

31. Large corpus analysis of source IPs wins big
33. Example 2 - Common Point of Compromise
• Scenario:
– Merchant 0 is compromised and leaks account data during the compromise
– Fraud is committed elsewhere during the exploit
– High background level of fraud
– Limited detection rate for exploits
• Goal:
– Find merchant 0
• Meta-goal:
– Screen algorithms for this task without leaking sensitive data
34. Example 2 - Common Point of Compromise
Diagram: Merchant 0 (skim) → skimmed data → Merchant n (exploit)
• Card data is stolen from Merchant 0
• That data is used in frauds at other merchants
35. Simulation Setup
[Figure: daily counts of compromises and frauds (count axis 0 to 500) over simulated days 0 to 100, with the compromise period and the exploit period marked]
36. Simulation Strategy
• For each consumer
– Pick consumer parameters such as transaction rate and preferences
– Generate transactions until the end of sim-time
• If the merchant is merchant 0 during the compromise period, possibly mark the card as compromised
• For all transactions, possibly mark as fraud; the probability depends on history
• Merchants are selected using a hierarchical Pitman-Yor process
• Restate data
– Flatten transaction streams
– Sort by time
• Tunables
– Compromise probability, transaction rates, background fraud, detection probability
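Merchant selection with a Pitman-Yor process can be illustrated through its sequential form, the two-parameter Chinese restaurant process. The Python sketch below is deliberately simplified: it is a single, non-hierarchical Pitman-Yor draw, whereas the slide describes a hierarchical version (for example, per-consumer preferences drawn around a shared base). The function name and the default `discount` and `concentration` values are assumptions for illustration.

```python
import random
from typing import List


def pitman_yor_choices(n: int, discount: float = 0.5, concentration: float = 1.0,
                       seed: int = 0) -> List[int]:
    """Draw n merchant ids from a two-parameter Chinese restaurant process."""
    if not (0.0 <= discount < 1.0 and concentration > -discount):
        raise ValueError("need 0 <= discount < 1 and concentration > -discount")
    rng = random.Random(seed)
    counts: List[int] = []   # counts[k] = transactions so far at merchant k
    choices: List[int] = []
    for i in range(n):
        # Total weight is concentration + i; a new merchant gets
        # concentration + discount * (number of merchants seen so far).
        new_weight = concentration + discount * len(counts)
        u = rng.random() * (concentration + i)
        if not counts or u < new_weight:
            counts.append(1)
            choices.append(len(counts) - 1)
            continue
        u -= new_weight
        for k, c in enumerate(counts):
            u -= c - discount    # existing merchant k has weight counts[k] - discount
            if u < 0:
                counts[k] += 1
                choices.append(k)
                break
        else:
            # guard against floating-point round-off at the upper end
            counts[-1] += 1
            choices.append(len(counts) - 1)
    return choices
```

A positive discount yields heavy-tailed merchant popularity: a few merchants see most transactions while new merchants keep appearing.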
38. LLR score for real data
[Figure: breach score (LLR, 0 to 80) plotted against number of merchants (log scale, 10^0 to 10^6); the real compromised merchants, labeled "Really truly bad guys", stand out from the rest]
39. Historical cooccurrence gives a high signal-to-noise ratio (S/N)
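One plausible way to set up the historical cooccurrence for Example 2 is a 2 x 2 table per merchant: cards that used the merchant versus cards that did not, crossed with cards later flagged for fraud versus cards that were not. The Python sketch below is an assumption about the setup, not the method behind slide 38. `breach_scores` is a hypothetical name, `visits` maps each card to the merchants it used during a look-back window, and only merchants whose fraud rate exceeds the rate elsewhere keep a nonzero score, because the G-test alone is two-sided.

```python
import math
from typing import Dict, List, Set, Tuple


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """G-test statistic (log-likelihood ratio) for a 2 x 2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g = 0.0
    for k, r, c in ((k11, rows[0], cols[0]), (k12, rows[0], cols[1]),
                    (k21, rows[1], cols[0]), (k22, rows[1], cols[1])):
        if k > 0:
            g += k * math.log(k * n / (r * c))
    return max(0.0, 2.0 * g)


def breach_scores(visits: Dict[str, Set[str]],
                  fraud_cards: Set[str]) -> List[Tuple[float, str]]:
    """Rank merchants by how strongly a visit predicts later fraud on the card."""
    merchants = set().union(*visits.values())
    n_cards = len(visits)
    n_fraud = sum(1 for card in visits if card in fraud_cards)
    ranked = []
    for m in merchants:
        at_m = [card for card, used in visits.items() if m in used]
        k11 = sum(1 for card in at_m if card in fraud_cards)  # visited, later fraud
        k12 = len(at_m) - k11                                  # visited, no fraud
        k21 = n_fraud - k11                                    # not visited, fraud
        k22 = n_cards - k11 - k12 - k21                        # neither
        score = llr(k11, k12, k21, k22)
        # keep only positive association: fraud rate at m above the rate elsewhere
        if k11 * (k21 + k22) <= k21 * (k11 + k12):
            score = 0.0
        ranked.append((score, m))
    return sorted(ranked, reverse=True)
```

The nested loop is O(merchants × cards), fine for a toy example; at scale the tables would be counted in a single pass over the transactions.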
40. Summary
• The world can be seen as sequences of symbols
• We can find patterns
• Those patterns can nail opponents
• Many patterns only appear at scale
• You can do this
42. Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014 and 2015
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
http://bit.ly/ebook-real-world-hadoop
http://bit.ly/mapr-tsdb-ebook
http://bit.ly/ebook-anomaly
http://bit.ly/recommendation-ebook
43. Streaming Architecture by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)
Free copies at book signing today (oops… that was earlier)
http://bit.ly/mapr-ebook-streams
44. Thank You!
45. Q&A
Engage with us! @mapr · maprtech · MapR · maprtech · mapr-technologies