Jakub Sanojca (Researcher) & João Da Silva (Data Engineer), Avast
AI on Spark for Malware
Analysis and Anomalous Threat
Detection
Goal
Demonstrate how Avast leverages AI and big data to burn malware.
Agenda
• What Avast does
• Malware research
• Structured Streaming
• AI anomaly detection
• Demo
Thank you
• Big Data Systems
• AI team - especially Yura, Olga and Dmitry
• Threat researchers and analysts
Avast is dedicated to creating a world
that provides safety and privacy for all,
no matter who you are, where you are,
or how you connect.
Global reach
#UnifiedDataAnalytics #SparkAISummit
Portfolio of security, privacy
and utility applications
World’s Largest Detection Network
• 300M+ new files monthly
• 10,000+ globally distributed servers
• 200B+ URLs
Training the Avast Machine Learning Engine
Purpose-built approach that takes < 12 hours to add
new features, train, and deploy into production
Malware classification
Data
● >500 features handcrafted by our experts from binary files
Task
● Classify files as clean, malware, or PUP (potentially unwanted program)
Two step ML Pipeline:
● Cluster data with custom k-means
● Classify within each cluster with Random Forest
Infrastructure: Underlying data lake - Burger
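The two-step idea (cluster first, then classify inside the cluster) can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: the k-means here is textbook Lloyd's algorithm rather than Avast's custom variant, a per-cluster majority vote stands in for the Random Forest, and the two-dimensional points and labels are made up.

```python
from collections import Counter, defaultdict
from math import dist


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Textbook Lloyd's k-means, starting from the given centroids."""
    for _ in range(iterations):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep empty ones in place.
        centroids = [
            tuple(sum(axis) / len(groups[i]) for axis in zip(*groups[i]))
            if groups[i] else centroids[i]
            for i in range(len(centroids))
        ]
    return centroids


def fit_two_step(points, labels, initial_centroids):
    """Step 1: cluster the data. Step 2: fit one classifier per cluster."""
    centroids = kmeans(points, initial_centroids)
    votes = defaultdict(Counter)
    for p, label in zip(points, labels):
        votes[nearest(p, centroids)][label] += 1
    # Stand-in classifier (assumption): the majority label of each cluster.
    per_cluster = {i: c.most_common(1)[0][0] for i, c in votes.items()}
    return centroids, per_cluster


def predict(point, centroids, per_cluster):
    """Route the sample to its cluster, then ask that cluster's classifier."""
    return per_cluster[nearest(point, centroids)]


# Made-up feature vectors and labels, for illustration only.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels = ["clean", "clean", "pup", "malware", "malware", "malware"]
centroids, per_cluster = fit_two_step(points, labels, [(0.0, 0.0), (5.0, 5.0)])
```

Routing by cluster keeps each classifier focused on a narrower slice of the data, which is presumably why the pipeline clusters first.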
Architecture: Malware classification
[Figure: pipeline stages Data, Features, Clustering, Training, Validation, Production; timings shown: 3h, 4.5h, 24 h, 24 h, 24 h, 6 h]
● ~700TB of binary files
● patented tailor-made solution
Custom application
• optimised & performant
• takes months to develop
• not that easy to change
Spark
• slower
• easy to experiment with
• very fast development
Threat Detections Streaming
3 step threat approach
1. Identify - threat researcher
2. Block - operator
3. Analyze and automate - data / AI researcher + engineers
Time series of detections
• Thousands of detection time series
• Where should the operator focus?
Short response time is necessary
First idea - custom streaming app
• Python because of ML models
• A big part of the code dealt with already-solved problems
• POC written by researchers
• Gets the job done, but is not easy to maintain or experiment with
Adopted solution:
Spark Structured Streaming
Structured Streaming
Advantages of Structured Streaming for fast threat detection
• Unified processing engine
• End to end AI with multiple sinks
• Window aggregations and Watermarking out of the box
• Resilient streams
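To make "window aggregations and watermarking" concrete, here is a small pure-Python simulation of the semantics, not Spark code. It assumes tumbling windows, integer event times in seconds, and a watermark that advances after every event (Spark advances it between micro-batches); the detection name and timestamps are made up. In PySpark the equivalent building blocks are `DataFrame.withWatermark` and `pyspark.sql.functions.window`.

```python
from collections import Counter


def window_start(event_time, window_seconds):
    """Start of the tumbling window that contains event_time."""
    return event_time - event_time % window_seconds


def windowed_counts(events, window_seconds, delay_seconds):
    """Count events per (window start, name), dropping events behind the watermark.

    events: iterable of (event_time_seconds, detection_name) in arrival order.
    Simplification (assumption): the watermark is recomputed after every event.
    """
    counts = Counter()
    dropped = []
    max_event_time = None
    for event_time, name in events:
        if max_event_time is not None:
            watermark = max_event_time - delay_seconds
            if event_time < watermark:
                # Too late: the window this event belongs to is already closed.
                dropped.append((event_time, name))
                continue
        counts[(window_start(event_time, window_seconds), name)] += 1
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
    return counts, dropped


# Made-up detections; the last one arrives far too late and is dropped.
events = [(5, "pup_x"), (12, "pup_x"), (61, "pup_x"), (130, "pup_x"), (20, "pup_x")]
counts, dropped = windowed_counts(events, window_seconds=60, delay_seconds=60)
```

The watermark bounds how long window state is kept, which is what allows per-window counts over an unbounded stream to run with finite memory.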
Structured Streaming Adoption
• Unbounded table
• Triggers
>>> writer = sdf.writeStream.trigger(processingTime='5 seconds')
>>> writer = sdf.writeStream.trigger(once=True)
>>> writer = sdf.writeStream.trigger(continuous='5 seconds')
• Micro Batch Processing vs Continuous processing
– org.apache.spark.sql.execution.streaming.MicroBatchExecution
– org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution (experimental)
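The three trigger variants differ only in the keyword passed to `DataStreamWriter.trigger()`. A minimal sketch of wrapping that choice follows; the helper names and mode strings are hypothetical, and the console sink is chosen only to keep the example short.

```python
def trigger_options(mode, interval=None):
    """Map a (hypothetical) mode name to keyword arguments for trigger()."""
    if mode == "micro_batch":
        return {"processingTime": interval}   # run a micro-batch every `interval`
    if mode == "once":
        return {"once": True}                 # process available data once, then stop
    if mode == "continuous":
        return {"continuous": interval}       # here `interval` is the checkpoint interval
    raise ValueError(f"unknown trigger mode: {mode}")


def start_console_query(sdf, mode, interval=None):
    """Start a streaming query on the streaming DataFrame `sdf`, printing to the console."""
    return (
        sdf.writeStream
        .format("console")
        .trigger(**trigger_options(mode, interval))
        .start()
    )
```

Assuming an active `SparkSession` named `spark`, a quick way to try it is `start_console_query(spark.readStream.format("rate").load(), "micro_batch", "5 seconds")`. Continuous mode is experimental and supports only a limited set of sources, sinks, and operations.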
[Figure: Before vs. After]
AI driven anomaly detection on time series
How to quickly identify campaigns of malware and potentially unwanted programs:
• Traditional approaches - find outliers
• Machine learning - predict and compare
– Neural networks - LSTMs vs CNNs
– Other - auto-regressive models etc.
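"Predict and compare" means forecasting each point from its past and flagging points that land far from the forecast. The sketch below shows that loop in plain Python. As a loudly labeled assumption, a moving average of the last few values stands in for the LSTM, CNN, or auto-regressive predictor, and the threshold, warm-up length, and detection counts are made up.

```python
from statistics import mean, pstdev


def forecast_next(history, order=3):
    """Stand-in predictor (assumption): mean of the last `order` observations."""
    return mean(history[-order:])


def find_anomalies(series, order=3, threshold=3.0, warmup=5):
    """Return indices whose residual exceeds `threshold` residual std-devs."""
    history = list(series[:order])   # values the predictor is allowed to see
    residuals, anomalies = [], []
    for t in range(order, len(series)):
        predicted = forecast_next(history, order)
        residual = series[t] - predicted
        # Only judge points once enough residuals exist to estimate their spread.
        spread = pstdev(residuals) if len(residuals) >= warmup else 0.0
        if spread > 0 and abs(residual) > threshold * spread:
            anomalies.append(t)
            history.append(predicted)    # keep the spike out of future forecasts
        else:
            residuals.append(residual)
            history.append(series[t])
    return anomalies


# Made-up hourly detection counts with one obvious spike at index 10.
detections = [10, 12, 11, 13, 12, 11, 12, 13, 11, 12, 60, 12, 11]
anomalies = find_anomalies(detections)
```

Swapping `forecast_next` for a trained model leaves the comparison logic unchanged, which is the appeal of the predict-and-compare framing.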
Threat anomaly detection: training
• Sequential
• Parallel! mapPartitions / pandas_udf
• Distributed - TensorflowOnSpark
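With thousands of independent time series, the "parallel" option amounts to fitting one small model per series and letting Spark spread the groups across executors. A hedged sketch follows: `fit_baseline` is a toy model (mean and standard deviation), the column names `series_id` and `count` are hypothetical, and `GroupedData.applyInPandas` (the Spark 3.x grouped-map API) is used here in place of the grouped-map `pandas_udf` named on the slide.

```python
from statistics import mean, pstdev


def fit_baseline(counts):
    """Toy per-series 'model' (assumption): mean and population std-dev."""
    return {"mean": float(mean(counts)), "std": float(pstdev(counts))}


def fit_group(pdf):
    """Grouped-map function: one series in as a pandas DataFrame, one row out."""
    import pandas as pd  # imported lazily so fit_baseline works without pandas

    params = fit_baseline(pdf["count"].tolist())
    return pd.DataFrame([{"series_id": pdf["series_id"].iloc[0], **params}])


def fit_all_series(sdf):
    """Fit every series in parallel; `sdf` needs `series_id` and `count` columns."""
    return sdf.groupBy("series_id").applyInPandas(
        fit_group, schema="series_id string, mean double, std double"
    )
```

Because `fit_baseline` is plain Python, it can be unit-tested without a cluster, for example `fit_baseline([2, 4, 4, 4, 5, 5, 7, 9])`.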
Threat anomaly detection: stream serving
• pandas_udf for parallel predictions
• super easy to test on already stored data as a batch job
Demo + Code Walkthrough
Challenges
• Multiple potential incompatibility surfaces
• Unexpected behavior / Unknowns
• Silent failures
Takeaways
• Easier collaboration between Science and Engineering teams
• An excellent toolbox to do anomaly detection in near real time
• Easy ML/AI/DL integration
• Parallelism
Questions?