Risk-Based Attack Surface Approximation:
How Much Data is Enough?
Chris Theisen, Brendan Murphy, Kim Herzig, Laurie Williams
North Carolina State University
Microsoft Research
Risk-Based Attack Surface Approximation: How Much Data is Enough? [ICSE - SEIP 2017]
Introduction
What is the “Attack Surface”? Quoting the Open Web Application
Security Project…
• All paths for data and commands in a software system
• The data that travels these paths
• The code that implements and protects both
Concept used for security effort prioritization.
Introduction | Background | Methodology | Results | Conclusion
Crashes represent activity that puts the system under stress.
Stack traces tell us what happened.
foo!foobarDeviceQueueRequest+0x68
foo!fooDeviceSetup+0x72
foo!fooAllDone+0xA8
bar!barDeviceQueueRequest+0xB6
bar!barDeviceSetup+0x08
bar!barAllDone+0xFF
center!processAction+0x1034
center!dontDoAnything+0x1030
Risk-Based Attack Surface Approximation
(RASA)
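A minimal sketch of how frames like the ones above might be parsed, assuming every frame has the shape module!function+0xOFFSET. The regular expression and the helper names (parse_frame, modules_on_traces) are illustrative and are not taken from the study's tooling.

```python
import re
from collections import Counter

# Assumed frame shape: module!function+0xOFFSET, as in the example trace.
FRAME_RE = re.compile(
    r"^(?P<module>[^!\s]+)!(?P<function>[^+\s]+)\+0x(?P<offset>[0-9A-Fa-f]+)$"
)


def parse_frame(line):
    """Return (module, function, offset) for one frame, or None if it does not match."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return match["module"], match["function"], int(match["offset"], 16)


def modules_on_traces(traces):
    """Count, for each module, the number of traces in which it appears at least once."""
    counts = Counter()
    for trace in traces:
        frames = (parse_frame(line) for line in trace)
        counts.update({frame[0] for frame in frames if frame is not None})
    return counts


example_trace = [
    "foo!foobarDeviceQueueRequest+0x68",
    "foo!fooDeviceSetup+0x72",
    "foo!fooAllDone+0xA8",
    "bar!barDeviceQueueRequest+0xB6",
    "bar!barDeviceSetup+0x08",
    "bar!barAllDone+0xFF",
    "center!processAction+0x1034",
    "center!dontDoAnything+0x1030",
]

seen = modules_on_traces([example_trace])
```

Under this reading, any module that shows up in at least one crash trace (here foo, bar, and center) would be treated as part of the approximated attack surface.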
Previously…
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
[SEIP ‘15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces”, in
Companion Proceedings of the 37th International Conference on Software Engineering (2015).
                     [SEIP ‘15] Crashes
% binaries           48.4%
% vulnerabilities    94.6%
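The two percentages can be read as "share of binaries seen on crash traces" and "share of known vulnerabilities that fall inside those binaries". The sketch below shows that arithmetic on a toy example. It is an assumption about how the numbers are derived, the function name coverage is hypothetical, and it counts vulnerable units rather than individual vulnerabilities for simplicity.

```python
def coverage(on_surface, all_units, vulnerable_units):
    """Return (% of units on the approximated surface, % of vulnerable units it contains).

    Assumes all_units and vulnerable_units are non-empty.
    """
    all_units = set(all_units)
    on_surface = set(on_surface) & all_units
    vulnerable = set(vulnerable_units) & all_units
    pct_units = 100.0 * len(on_surface) / len(all_units)
    pct_vulnerable = 100.0 * len(on_surface & vulnerable) / len(vulnerable)
    return pct_units, pct_vulnerable


# Toy data: four binaries, two seen on crash traces, two with known vulnerabilities.
toy = coverage(
    on_surface={"a.dll", "b.dll"},
    all_units={"a.dll", "b.dll", "c.dll", "d.dll"},
    vulnerable_units={"a.dll", "c.dll"},
)
```

A useful approximation keeps the first number small and the second number large, which is the pattern the table reports.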
Great! All done, right?
Practitioner Problems
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
• Practitioners had some issues with it…
– “Binary prioritization isn’t actionable.”
– “We don’t have that much data!”
– “We don’t store every crash we received, we don’t
see the value in that.”
– “We don’t have historical vulnerabilities to use as a
goodness measure.”
Research Questions
• RQ1: Can the RASA approach be implemented at the
source code file level with actionable results?
• RQ2: How does random sampling of crash dump stack
traces affect RASA?
Data Sources
• Mozilla Firefox
– ~1M crashes
– Vulnerability data from Mozilla Security
Blog and bug tracker
• Windows 8.1
– ~9M crashes
– Vulnerability data from internal data
sources
Methodology - RASA
Methodology - Sampling
10% of…
20% of…
• Sample at each “level”
• Record the standard deviation of files and
vulnerabilities covered
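The sampling procedure can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's implementation: each crash is represented as the list of source files on its stack trace, and the function names, the repeats count, and the fixed seed are all hypothetical choices.

```python
import random
import statistics


def files_in(crashes):
    """Union of all files that appear in the given crashes."""
    files = set()
    for crash in crashes:
        files.update(crash)
    return files


def sampling_experiment(crashes, vulnerable_files, fractions, repeats=10, seed=0):
    """For each sampling fraction, draw `repeats` random samples of the crashes
    and record the mean and standard deviation of files and vulnerable files covered.

    `repeats` must be at least 2 so that a sample standard deviation exists.
    """
    rng = random.Random(seed)
    vulnerable_files = set(vulnerable_files)
    results = {}
    for fraction in fractions:
        size = max(1, round(fraction * len(crashes)))
        file_counts, vuln_counts = [], []
        for _ in range(repeats):
            covered = files_in(rng.sample(crashes, size))
            file_counts.append(len(covered))
            vuln_counts.append(len(covered & vulnerable_files))
        results[fraction] = {
            "files_mean": statistics.mean(file_counts),
            "files_stdev": statistics.stdev(file_counts),
            "vulns_mean": statistics.mean(vuln_counts),
            "vulns_stdev": statistics.stdev(vuln_counts),
        }
    return results


# Toy data: one frequently crashing pair of files and one file that crashed once.
toy_crashes = [["a.cpp", "b.cpp"]] * 50 + [["c.cpp"]]
toy_results = sampling_experiment(toy_crashes, {"a.cpp"}, fractions=[0.1, 0.2, 1.0])
```

A low standard deviation at a given level would suggest that the result does not depend much on which crashes happened to be drawn.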
[Figure: files and vulnerabilities covered versus random sample size. Series: Files, Vulnerabilities. Axis ranges: 12% to 17% and 70% to 75%.]
[Figure: files and vulnerabilities covered versus random sample size. Series: Files, Vulnerabilities. Axis ranges: 10% to 26% and 30% to 46%.]
Why Does Sampling Work?
• Crashes tend not to happen in isolation.
– If something crashes once, it will likely crash again.
• For Firefox, only 6 files with a vulnerability appeared in
just one crash in the data set.
– Against ~300 vulnerable files, 50,000 total files
• If foo.cpp crashes many times, random sampling is unlikely
to remove every foo.cpp occurrence from the data set.
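The intuition in the last bullet can be made concrete with a small calculation. Assuming crashes are sampled uniformly without replacement, the chance that a file drops out of the sample is the chance that none of the crashes containing it are drawn. The function name and the numbers in the example are illustrative only.

```python
def prob_file_missed(total_crashes, crashes_with_file, sample_size):
    """Probability that a uniform sample (without replacement) of `sample_size`
    crashes contains none of the `crashes_with_file` crashes that touch a file.

    Equivalent to C(N - k, n) / C(N, n), computed as a running product
    to avoid very large intermediate integers.
    """
    prob = 1.0
    for i in range(crashes_with_file):
        remaining_unsampled = total_crashes - sample_size - i
        if remaining_unsampled <= 0:
            return 0.0
        prob *= remaining_unsampled / (total_crashes - i)
    return prob


# Illustrative numbers: a 10% sample of 1,000,000 crashes.
missed_if_one_crash = prob_file_missed(1_000_000, 1, 100_000)
missed_if_fifty_crashes = prob_file_missed(1_000_000, 50, 100_000)
```

With these numbers, a file seen in a single crash is missed 90% of the time, while a file seen in 50 crashes is missed well under 1% of the time, roughly 0.9 raised to the 50th power.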
Future Work
• We have a list of vulnerable files; now what?
– Further prioritization to assist developers.
• We’re looking at:
– How the attack surface changes over time.
– How the complexity of the attack surface predicts
vulnerabilities.
– How proximity to the boundary of a software
system predicts vulnerabilities.
Conclusions
• “Binary prioritization isn’t actionable.”
– RASA can prioritize security effort effectively at the
source code file level.
• “We don’t have that much data!”
– Orders of magnitude less data required compared
to previous studies.
• “We don’t store every crash we received, we don’t see
the value in that.”
– A naïve approach like random sampling still works.
• “We don’t have historical vulnerabilities to use as a
goodness measure.”
– Satisfied previous complaints with less data, naïve
sampling; evidence it will work on new systems.
crtheise@ncsu.edu
@theisencr
theisencr.github.io
Expected Graduation: May 2018
Data Science, Security Analytics,
Security Education
Ad

More Related Content

Similar to Risk-Based Attack Surface Approximation: How Much Data is Enough? [ICSE - SEIP 2017] (20)

Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
confluent
 
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareUsing Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Lastline, Inc.
 
My flings with data analysis
My flings with data analysisMy flings with data analysis
My flings with data analysis
Venkatesh Prasad Ranganath
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
Tao Xie
 
Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...
Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...
Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...
Camille Maumet
 
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
HostedbyConfluent
 
Building an application security program
Building an application security programBuilding an application security program
Building an application security program
Outpost24
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and Visualization
Raffael Marty
 
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
Mike Spaulding
 
Machine Learning Application Development
Machine Learning Application DevelopmentMachine Learning Application Development
Machine Learning Application Development
LARCA UPC
 
How to break web applications
How to break web applicationsHow to break web applications
How to break web applications
Dinis Cruz
 
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Denim Group
 
Experience Sharing on School Pentest Project
Experience Sharing on School Pentest ProjectExperience Sharing on School Pentest Project
Experience Sharing on School Pentest Project
eLearning Consortium 電子學習聯盟
 
501 ch 8 risk management tools
501 ch 8 risk management tools501 ch 8 risk management tools
501 ch 8 risk management tools
gocybersec
 
A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...
A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...
A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...
Kevin Moran
 
Webinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS DeploymentsWebinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS Deployments
Storage Switzerland
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
DaveEdwards12
 
Policy 2012 presentation
Policy 2012 presentationPolicy 2012 presentation
Policy 2012 presentation
bdemchak
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
Jen Aman
 
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion StoicaRISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
Spark Summit
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
confluent
 
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareUsing Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in Firmware
Lastline, Inc.
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
Tao Xie
 
Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...
Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...
Supporting image-based meta-analysis with NIDM: Standardized reporting of neu...
Camille Maumet
 
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
Automating Speed: A Proven Approach to Preventing Performance Regressions in ...
HostedbyConfluent
 
Building an application security program
Building an application security programBuilding an application security program
Building an application security program
Outpost24
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and Visualization
Raffael Marty
 
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
Mike Spaulding
 
Machine Learning Application Development
Machine Learning Application DevelopmentMachine Learning Application Development
Machine Learning Application Development
LARCA UPC
 
How to break web applications
How to break web applicationsHow to break web applications
How to break web applications
Dinis Cruz
 
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Monitoring Application Attack Surface and Integrating Security into DevOps Pi...
Denim Group
 
501 ch 8 risk management tools
501 ch 8 risk management tools501 ch 8 risk management tools
501 ch 8 risk management tools
gocybersec
 
A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...
A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...
A Large-Scale Empirical Comparison of Static and DynamicTest Case Prioritizat...
Kevin Moran
 
Webinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS DeploymentsWebinar: Five Problems Facing Business-Critical NFS Deployments
Webinar: Five Problems Facing Business-Critical NFS Deployments
Storage Switzerland
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
DaveEdwards12
 
Policy 2012 presentation
Policy 2012 presentationPolicy 2012 presentation
Policy 2012 presentation
bdemchak
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
Jen Aman
 
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion StoicaRISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica
Spark Summit
 

More from Chris Theisen (7)

Public Key Cryptosystems and RSA
Public Key Cryptosystems and RSAPublic Key Cryptosystems and RSA
Public Key Cryptosystems and RSA
Chris Theisen
 
Prioritizing Security Efforts with a Risk-Based Attack Surface Approximation
Prioritizing Security Efforts with a Risk-Based Attack Surface ApproximationPrioritizing Security Efforts with a Risk-Based Attack Surface Approximation
Prioritizing Security Efforts with a Risk-Based Attack Surface Approximation
Chris Theisen
 
Software Security Education at Scale
Software Security Education at ScaleSoftware Security Education at Scale
Software Security Education at Scale
Chris Theisen
 
Automated Attack Surface Approximation [FSE - SRC 2015]
Automated Attack Surface Approximation [FSE - SRC 2015]Automated Attack Surface Approximation [FSE - SRC 2015]
Automated Attack Surface Approximation [FSE - SRC 2015]
Chris Theisen
 
Attack Surface Analytics [ISSRE-DSW 15]
Attack Surface Analytics [ISSRE-DSW 15]Attack Surface Analytics [ISSRE-DSW 15]
Attack Surface Analytics [ISSRE-DSW 15]
Chris Theisen
 
Science of Security Industry Day - October 2015
Science of Security Industry Day - October 2015Science of Security Industry Day - October 2015
Science of Security Industry Day - October 2015
Chris Theisen
 
Approximating Attack Surfaces with Stack Traces [ICSE 15]
Approximating Attack Surfaces with Stack Traces [ICSE 15]Approximating Attack Surfaces with Stack Traces [ICSE 15]
Approximating Attack Surfaces with Stack Traces [ICSE 15]
Chris Theisen
 
Public Key Cryptosystems and RSA
Public Key Cryptosystems and RSAPublic Key Cryptosystems and RSA
Public Key Cryptosystems and RSA
Chris Theisen
 
Prioritizing Security Efforts with a Risk-Based Attack Surface Approximation
Prioritizing Security Efforts with a Risk-Based Attack Surface ApproximationPrioritizing Security Efforts with a Risk-Based Attack Surface Approximation
Prioritizing Security Efforts with a Risk-Based Attack Surface Approximation
Chris Theisen
 
Software Security Education at Scale
Software Security Education at ScaleSoftware Security Education at Scale
Software Security Education at Scale
Chris Theisen
 
Automated Attack Surface Approximation [FSE - SRC 2015]
Automated Attack Surface Approximation [FSE - SRC 2015]Automated Attack Surface Approximation [FSE - SRC 2015]
Automated Attack Surface Approximation [FSE - SRC 2015]
Chris Theisen
 
Attack Surface Analytics [ISSRE-DSW 15]
Attack Surface Analytics [ISSRE-DSW 15]Attack Surface Analytics [ISSRE-DSW 15]
Attack Surface Analytics [ISSRE-DSW 15]
Chris Theisen
 
Science of Security Industry Day - October 2015
Science of Security Industry Day - October 2015Science of Security Industry Day - October 2015
Science of Security Industry Day - October 2015
Chris Theisen
 
Approximating Attack Surfaces with Stack Traces [ICSE 15]
Approximating Attack Surfaces with Stack Traces [ICSE 15]Approximating Attack Surfaces with Stack Traces [ICSE 15]
Approximating Attack Surfaces with Stack Traces [ICSE 15]
Chris Theisen
 
Ad

Recently uploaded (20)

Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
1. Briefing Session_SEED with Hon. Governor Assam - 27.10.pdf
Simran112433
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Ad

Risk-Based Attack Surface Approximation: How Much Data is Enough? [ICSE - SEIP 2017]

  • 1. Risk-Based Attack Surface Approximation: How Much Data is Enough? Chris Theisen, Brendan Murphy, Kim Herzig, Laurie Williams North Carolina State University Microsoft Research
  • 3. Introduction What is the “Attack Surface”? Quoting the Open Web Application Security Project… • All paths for data and commands in a software system • The data that travels these paths • The code that implements and protects both Concept used for security effort prioritization. 3 Introduction | Background | Methodology | Results | Conclusion
  • 4. 4 Crashes represent activity that put the system under stress. Stack Traces tell us what happened. foo!foobarDeviceQueueRequest+0x68 foo!fooDeviceSetup+0x72 foo!fooAllDone+0xA8 bar!barDeviceQueueRequest+0xB6 bar!barDeviceSetup+0x08 bar!barAllDone+0xFF center!processAction+0x1034 center!dontDoAnything+0x1030 Risk-Based Attack Surface Approximation (RASA) Introduction | Background | Methodology | Results | Conclusion
  • 5. • Previous RASA study used tens of millions of crashes. • Previous study was per binary. Previously… 5 [SEIP ‘15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces”, in Companion Proceedings of the 37th International Conference on Software Engineering (2015). [SEIP ‘15] Crashes %binaries 48.4% %vulnerabilities 94.6% Introduction | Background | Methodology | Results | Conclusion
  • 6. • Previous RASA study used tens of millions of crashes. • Previous study was per binary. Previously… 6 [SEIP ‘15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces”, in Companion Proceedings of the 37th International Conference on Software Engineering (2015). [SEIP ‘15] Crashes %binaries 48.4% %vulnerabilities 94.6% Great! All done, right? Introduction | Background | Methodology | Results | Conclusion
  • 7. Practitioner Problems • Previous RASA study used tens of millions of crashes. • Previous study was per binary.
  • 8. Practitioner Problems • Previous RASA study used tens of millions of crashes. • Previous study was per binary. • Practitioners had some issues with it… – “Binary prioritization isn’t actionable.”
  • 9. Practitioner Problems • Previous RASA study used tens of millions of crashes. • Previous study was per binary. • Practitioners had some issues with it… – “Binary prioritization isn’t actionable.” – “We don’t have that much data!”
  • 10. Practitioner Problems • Previous RASA study used tens of millions of crashes. • Previous study was per binary. • Practitioners had some issues with it… – “Binary prioritization isn’t actionable.” – “We don’t have that much data!” – “We don’t store every crash we received, we don’t see the value in that.”
  • 11. Practitioner Problems • Previous RASA study used tens of millions of crashes. • Previous study was per binary. • Practitioners had some issues with it… – “Binary prioritization isn’t actionable.” – “We don’t have that much data!” – “We don’t store every crash we received, we don’t see the value in that.” – “We don’t have historical vulnerabilities to use as a goodness measure.”
  • 12. Research Questions • RQ1: Can the RASA approach be implemented at the source code file level with actionable results? • RQ2: How does random sampling of crash dump stack traces affect RASA?
  • 13. Data Sources • Mozilla Firefox – ~1M crashes – Vulnerability data from Mozilla Security Blog and bug tracker • Windows 8.1 – ~9M crashes – Vulnerability data from internal data sources
  • 14. Methodology - RASA
  • 15. Methodology - RASA
  • 16. Methodology - RASA
  • 17. Methodology - Sampling: 10% of…
  • 18. Methodology - Sampling: 10% of… 20% of…
  • 19. Methodology - Sampling: 10% of… 20% of… • Sample at each “level” • Record stdev of files, vulnerabilities covered
  • 20. [Chart: percentage of files and percentage of vulnerabilities covered, plotted against random sample size. Series: Files, Vulnerabilities.]
  • 21. [Chart: percentage of files and percentage of vulnerabilities covered, plotted against random sample size. Series: Files, Vulnerabilities.]
  • 22. Why Does Sampling Work? • Crashes tend not to happen in isolation. – If something crashes once, it will likely crash again. • For Firefox, only 6 files in the data set with a vulnerability had only one crash occurrence. – Against ~300 vulnerable files, 50,000 total files • If foo.cpp crashes many times, random sampling is unlikely to remove every foo.cpp occurrence from the dataset.
  • 23. Future Work • We have a list of vulnerable files; now what? – Further prioritization to assist developers. • We’re looking at: – How the attack surface changes over time. – How the complexity of the attack surface predicts vulnerabilities. – How proximity to the boundary of a software system predicts vulnerabilities.
  • 24. Conclusions • “Binary prioritization isn’t actionable.” – RASA can prioritize security effort effectively at the source code file level.
  • 25. Conclusions • “Binary prioritization isn’t actionable.” – RASA can prioritize security effort effectively at the source code file level. • “We don’t have that much data!” – Orders of magnitude less data required compared to previous studies.
  • 26. Conclusions • “We don’t store every crash we received, we don’t see the value in that.” – A naïve approach like random sampling still works.
  • 27. Conclusions • “We don’t store every crash we received, we don’t see the value in that.” – A naïve approach like random sampling still works. • “We don’t have historical vulnerabilities to use as a goodness measure.” – Satisfied previous complaints with less data, naïve sampling; evidence it will work on new systems.
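
The sampling methodology (slides 17 to 19) and the argument on the "Why Does Sampling Work?" slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tooling: each crash is assumed to be already reduced to a set of source file names, and the helper names `attack_surface`, `sample_coverage`, and `miss_probability` are hypothetical.

```python
import random
import statistics
from math import comb


def attack_surface(crashes):
    """Union of all files seen in the given crashes (each crash is a set of file names)."""
    surface = set()
    for files in crashes:
        surface |= files
    return surface


def sample_coverage(crashes, vulnerable_files, fraction, runs=10, seed=0):
    """Draw `runs` random samples of the crashes at one sampling level.

    Returns the mean and (population) standard deviation of the share of
    vulnerable files that land on the approximated attack surface.
    """
    rng = random.Random(seed)
    size = max(1, round(len(crashes) * fraction))
    coverages = []
    for _ in range(runs):
        surface = attack_surface(rng.sample(crashes, size))
        coverages.append(len(surface & vulnerable_files) / len(vulnerable_files))
    return statistics.mean(coverages), statistics.pstdev(coverages)


def miss_probability(total_crashes, crashes_with_file, sample_size):
    """Chance that a uniform sample (without replacement) contains none of
    the crashes that mention a given file."""
    return comb(total_crashes - crashes_with_file, sample_size) / comb(total_crashes, sample_size)
```

For example, with 100 crashes and a 10-crash sample, `miss_probability(100, 1, 10)` is 0.9 for a file that appears in a single crash, while for a file that appears in 20 crashes the value falls below `0.9 ** 20` (about 0.12). That is the intuition behind the slide: files that crash repeatedly are unlikely to drop out of a random sample.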

Editor's Notes

  • #3: In a 2010 paper, Tom Zimmermann compared finding vulnerabilities in code to finding “a needle in a haystack.” Based on my experience as a security engineer, it can be more like finding a needle in a field of them! Most difficult part of my job.
  • #4: One prioritization technique is to identify the attack surface. Commonly used for prioritization of effort, but typically based on the common knowledge of the team; imperfect.
  • #5: We developed an approach called Risk-Based Attack Surface Approximation, or RASA… Hypothesis was that code that crashes shares properties with vulnerable code. Attempting to crash systems is one of the primary tools in forensics/red-team activity; we’re reverse-engineering what attackers do.
  • #8: So we evangelized this approach at industry days at NCSU, tech talks, etc. Got great feedback on making RASA actionable. I want to highlight a few of the words from the previous slide.
  • #12: When we follow up with, “well, just run it and see!” the response we got was… So we need to run studies at a lower level of granularity, with a lot less data, while still having historical vulnerabilities to compare against.
  • #15: Mine out individual code artifacts (files, in this case) from the stack traces in crash dumps.
  • #16: Map code mined from each crash to code in source control; use binary/function mapping for files if files aren’t in the crash.
  • #17: The resultant pairing between code in crashes and code in source control is our approximation of the attack surface. Simple by design; not language limited, works in multiple domains, just need crashes with stack traces. Highly flexible!
  • #18: To limit data, we do random sampling of crashes placed into the process at the first step. 10% of crashes, 20% of crashes, et cetera.
  • #20: Do multiple samples at each “level” (10%, 20%, etc.); also look at how the attack surface changes from sample to sample at each sampling level. Trying to answer: does our approximation change greatly with different samplings of crash dump stack traces?
  • #21: Chart of percentage of files on the attack surface vs. vulnerabilities covered by attack surface. Vulnerabilities are 5 times as likely to be in code that crashes as in code that does not! Great place to start to run other tools, like static analysis, vulnerability prediction models, etc. Limits the work you need to do. Tiny variations; can use a quarter of the data available and our metrics change less than a percentage point, with no change between samples.
  • #22: Comparison against previous study; we see a similar 2:1 ratio from vulnerabilities to files, so vulnerabilities are twice as dense in crashing code across all samples.
  • #24: Shorten or identify keywords
  • #31: Make a chart of this data
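
The three RASA steps described in notes #15 to #17 (mine code artifacts from stack traces, map them to source control, treat the result as the attack surface) could look roughly like the Python sketch below. It assumes frames in the `module!function+0xOFFSET` form shown on the RASA slide and a hypothetical `function_to_file` lookup standing in for the binary/function mapping; none of the function names come from the authors' implementation.

```python
import re

# Frame format as shown on the slides: module!function+0xOFFSET
FRAME = re.compile(
    r"^(?P<module>[^!\s]+)!(?P<function>[^+\s]+)\+0x(?P<offset>[0-9A-Fa-f]+)$"
)


def parse_frame(line):
    """Split one stack frame into (module, function, offset); None if it does not match."""
    match = FRAME.match(line.strip())
    if match is None:
        return None
    return match["module"], match["function"], int(match["offset"], 16)


def approximate_attack_surface(stack_traces, function_to_file):
    """Collect every source file reachable from the frames of the given stack traces.

    `function_to_file` is an assumed lookup from (module, function) to a
    source file path; frames without a mapping are skipped.
    """
    surface = set()
    for trace in stack_traces:
        for line in trace:
            parsed = parse_frame(line)
            if parsed is None:
                continue
            module, function, _offset = parsed
            source_file = function_to_file.get((module, function))
            if source_file is not None:
                surface.add(source_file)
    return surface


def coverage(surface, all_files, vulnerable_files):
    """Share of all files, and share of vulnerable files, on the attack surface."""
    file_share = len(surface & all_files) / len(all_files)
    vuln_share = len(surface & vulnerable_files) / len(vulnerable_files)
    return file_share, vuln_share
```

Dividing the two shares returned by `coverage` gives the kind of density ratio quoted in the notes. Using the figures reported for the previous study, 94.6% of vulnerabilities in 48.4% of binaries, the ratio is 94.6 / 48.4, or roughly 1.95, which matches the 2:1 ratio mentioned in note #22.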