Deploying Python Machine Learning Models with Apache Spark
Brandon Hamric & Alex Meyer, Eventbrite
#SAISDS2
Introduction
About Eventbrite
• Global ticketing and event technology platform that gives creators of events of all shapes and sizes the tools and resources to seamlessly plan, promote, and produce live experiences around the world
• Accessible online or via mobile apps; scales from basic registration and ticketing to a fully featured event management platform
• 203 million tickets processed in 2017
• Powered 3 million events in 170+ countries in 2017
• 700k creators supported in 2017
About Us
• Eventbrite
• We're data engineers
• We ship models for Eventbrite data scientists
• Started out at Eventbrite on Discovery - Event Recommendations
• Built new data infrastructure to support all business needs
• Our team creates, maintains, and supports the data infrastructure, tasks, and pipelines that serve other engineers, business insights, and product
Brandon Hamric - bhamric@eventbrite.com
• Principal Data Engineer/Architect @ Eventbrite
• Co-founded Rescue Forensics (YC W15)
• 10 years of experience in data engineering
• Worked with Spark since 2014
Alex Meyer - alexm@eventbrite.com
• Senior Data Engineer @ Eventbrite
• MS in Computer Science - Distributed Systems (Vanderbilt University)
• 4 years of experience in data engineering
• Worked with Spark since 2014
Structured Predictors
Common Predictor Workflow
• High coupling between engineers and data scientists
• Mostly serial workflow
• High barrier to entry
• Too many contributors
• Code duplication
Improved Predictor Workflow
• Low coupling between engineers and data scientists
• Independent workflows
• Data scientists own their models end-to-end
• Data Engineering isn't a bottleneck
Predictor Code
Data prep and cleanup
● Training and prediction code can be inconsistent
● Sample data prep can be different from prod data prep
Feature Extraction, Prediction, and Model Management
● Training and prediction code can be inconsistent
● Mostly written for vertical scaling
● Version management is hard
● Can use a lot of memory
● It can be hard to switch between models
● Bulk vs. single-item prediction
Predictor Deployment Problems
• Shared code between dev, batch, and streaming is an afterthought
• Most models are written for vertical scaling first
• Deployment is ad-hoc without a common structure
• Model iteration is slow because of a lack of automation
• Model versioning isn't consistent without a library
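Predictor Structure
How each stage changes as a predictor moves from a notebook to offline (batch) prediction and then to streaming prediction:
• Data prep and cleanup
– Notebook: query to a local CSV
– Offline prediction: convert to an incremental query
– Streaming prediction: convert to a read stream
• Feature extraction
– Notebook: pandas DataFrames and Python functions
– Offline prediction: convert to Spark DataFrame or RDD operations
– Streaming prediction: convert to DataFrame operations, or use foreachBatch (Spark 2.4+)
• Load model
– Notebook: load from a local pickle
– Offline prediction: load from S3 or HDFS onto the executors
– Streaming prediction: load from S3 or HDFS onto the executors
• Predict
– Notebook: mixed into the scoring logic
– Offline prediction: mapper or UDF on features
– Streaming prediction: UDF on feature rows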
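The "Load model" and "Predict" stages for offline prediction can be sketched without Spark as a per-partition function: load the model once per worker process, extract features for the whole partition, then score in bulk. This is a minimal stdlib-only sketch under stated assumptions: the linear "model" (a pickled dict of `weights` and `bias`), the two features, and the helper names (`local_model_path`, `save_model`, `load_model`, `predict_partition`) are all hypothetical, and a temp directory stands in for a local copy pulled from S3 or HDFS.

```python
import functools
import os
import pickle
import tempfile
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def local_model_path(filename: str) -> str:
    """Hypothetical: where a worker keeps its local copy of a model file
    (in practice, a file fetched from S3 or HDFS)."""
    return os.path.join(tempfile.gettempdir(), filename)


def save_model(model: Dict[str, Any], path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(model, f)


@functools.lru_cache(maxsize=None)
def load_model(path: str) -> Dict[str, Any]:
    # Cached per Python process, so a worker unpickles the model once
    # instead of once per row.
    with open(path, "rb") as f:
        return pickle.load(f)


def extract_features(rows: List[Dict[str, Any]]) -> List[List[float]]:
    """Hypothetical bulk features: word count and number of '!' in the description."""
    return [
        [float(len(r["description"].split())), float(r["description"].count("!"))]
        for r in rows
    ]


def predict_partition(
    rows: Iterable[Dict[str, Any]], model_path: str
) -> Iterator[Tuple[str, float]]:
    """Score one partition: load the model once, extract all features, then predict."""
    model = load_model(model_path)
    rows = list(rows)
    weights, bias = model["weights"], model["bias"]
    for row, x in zip(rows, extract_features(rows)):
        yield row["event_id"], sum(w * xi for w, xi in zip(weights, x)) + bias
```

In PySpark, one way to apply it is `df.rdd.mapPartitions(lambda it: predict_partition((r.asDict() for r in it), path))`, with the model file shipped via `SparkContext.addFile` and located on each executor with `SparkFiles.get`. The `lru_cache` only avoids repeated loads within a single Python worker process, so how often the model is actually reloaded depends on your Spark configuration (for example, Python worker reuse).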
Predictor Class
• Manages Model
– Versioning
– Storage
– Loading
• Outlines structure
– Data loading
– Feature extraction
– Prediction
• Batch and streaming
• Enables Automation
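One way to read the Predictor Class slide is as an abstract base class: model versioning, storage, and loading live in the base class, while each concrete predictor fills in data loading, feature extraction, and prediction. The sketch below is hypothetical and stdlib-only (it is not Eventbrite's library); a local directory stands in for S3 or HDFS, versions are plain integers, and `KeywordPredictor` is a toy subclass whose "model" is a dict of keyword weights.

```python
import abc
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional


class Predictor(abc.ABC):
    """Hypothetical base class: model management plus a fixed
    load -> extract -> predict structure shared by batch and streaming."""

    name = "predictor"  # subclasses override; used as the storage prefix

    def __init__(self, store_root: Optional[str] = None, version: Optional[int] = None):
        # A local directory stands in for an S3 or HDFS prefix.
        self.store_root = Path(store_root or tempfile.mkdtemp())
        self.version = version  # None means "use the latest saved version"
        self._model: Any = None

    # ---- Model management: versioning, storage, loading ----
    def _model_dir(self) -> Path:
        return self.store_root / self.name

    def latest_version(self) -> Optional[int]:
        if not self._model_dir().exists():
            return None
        versions = [int(p.name) for p in self._model_dir().iterdir() if p.name.isdigit()]
        return max(versions, default=None)

    def save_model(self, model: Any) -> int:
        """Store the model under the next integer version and return that version."""
        new_version = (self.latest_version() or 0) + 1
        path = self._model_dir() / str(new_version)
        path.mkdir(parents=True)
        with open(path / "model.pkl", "wb") as f:
            pickle.dump(model, f)
        return new_version

    @property
    def model(self) -> Any:
        """Load the pinned (or latest) version on first use, then reuse it."""
        if self._model is None:
            version = self.version if self.version is not None else self.latest_version()
            if version is None:
                raise FileNotFoundError(f"no saved versions of {self.name!r}")
            with open(self._model_dir() / str(version) / "model.pkl", "rb") as f:
                self._model = pickle.load(f)
            self.version = version
        return self._model

    # ---- Structure that each concrete predictor fills in ----
    @abc.abstractmethod
    def load_data(self, source: Any) -> Iterable[Dict[str, Any]]: ...

    @abc.abstractmethod
    def extract_features(self, records: List[Dict[str, Any]]) -> List[Any]: ...

    @abc.abstractmethod
    def predict(self, features: List[Any]) -> List[Any]: ...

    # ---- Shared entry points for batch jobs and streaming micro-batches ----
    def predict_records(self, records: Iterable[Dict[str, Any]]) -> List[Any]:
        return self.predict(self.extract_features(list(records)))

    def run_batch(self, source: Any) -> List[Any]:
        return self.predict_records(self.load_data(source))


class KeywordPredictor(Predictor):
    """Toy example: the model is a dict mapping keywords to weights."""

    name = "keyword"

    def load_data(self, source):
        return source  # here, already an in-memory list of records

    def extract_features(self, records):
        return [set(r["description"].lower().split()) for r in records]

    def predict(self, features):
        weights = self.model
        return [sum(weights.get(word, 0.0) for word in words) for words in features]
```

Because `predict_records` takes an in-memory list of records, the same method could be called from a batch job, from inside a per-partition mapper, or on each streaming micro-batch, while pinning `version` makes it straightforward to switch between saved models.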
Example Predictor and Demo
Demo - Latent Dirichlet Allocation (LDA)
• Generate topics on Eventbrite's event description corpus
• Get topic probabilities per event
• We can use topics to improve search, browse, and personalization
• LDA on Wikipedia
• scikit-learn LDA model (sklearn.decomposition.LatentDirichletAllocation)
<open notebook>
Takeaways
• A consistent predictor structure makes it easy to automate deployment of distributed prediction
• Streaming and batch prediction can share code
• Favor bulk (rather than single-item) feature extraction and prediction
• We may open-source our predictor library
• We're hiring!
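The "streaming and batch prediction can share code" and "bulk prediction" takeaways can be illustrated with a minimal stdlib-only sketch: one bulk scoring function is called once over a full dataset for batch, and once per micro-batch for streaming. The names `micro_batches`, `score_batch`, `run_batch`, and `run_streaming` are hypothetical, the word-count "score" is a placeholder for a real model, and the stream is just a Python iterable standing in for Spark's micro-batches.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group a (possibly unbounded) stream into lists of batch_size items; the last may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def score_batch(descriptions: List[str]) -> List[int]:
    """Hypothetical bulk scorer: one call handles a whole list of items."""
    return [len(d.split()) for d in descriptions]


def run_batch(descriptions: List[str]) -> List[int]:
    # Offline path: score everything with one bulk call.
    return score_batch(descriptions)


def run_streaming(
    stream: Iterable[str], batch_size: int, sink: Callable[[int, List[int]], None]
) -> None:
    # Streaming path: reuse the same bulk scorer on each micro-batch and
    # hand (batch_id, scores) to a sink, similar in shape to foreachBatch.
    for batch_id, batch in enumerate(micro_batches(stream, batch_size)):
        sink(batch_id, score_batch(batch))
```

In Spark 2.4+ Structured Streaming, the corresponding hook is `df.writeStream.foreachBatch(func)`, where `func(batch_df, batch_id)` receives each micro-batch as an ordinary DataFrame, so the batch code path can be called on it directly.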
Thanks!
Questions? Feel free to reach out!