robertwdempsey.com
Building a Production-Level Machine Learning Pipeline
Robert Dempsey, CEO
Atlantic Dominion Solutions
robertwdempsey.com Production ML Pipelines
Robert Dempsey
Professional: Entrepreneur, Software Engineer
Author: Books and online courses
Instructor: Lotus Guides, District Data Labs
Owner: Atlantic Dominion Solutions, LLC
We’ve mastered three jobs so you can
focus on one - growing your business.
The Three Jobs
At Atlantic Dominion Solutions we perform three functions for our
customers:
Consulting: we assess and advise in the areas of technology, team and
process to determine how machine learning can have the biggest impact on
your business.
Implementation: after a strategy session to determine the work you need we
get to work using our proven methodology and begin delivering smarter
applications.
Training: continuous improvement requires continuous learning. We provide
both on-premises and online training.
Writing the Book
Co-authoring the book Building
Machine Learning Pipelines.
Written for software developers and
data scientists, Building Machine
Learning Pipelines teaches the skills
required to create and use the
infrastructure needed to run modern
intelligent systems.
machinelearningpipelines.com
What’s your biggest issue?
Technology is LEAST important
The REPORT Framework™
REPORT Framework™
Risk Tolerance
Expectations
Product
Operations
Results
Team
Risk Tolerance
Question: How risk averse are you?
Some companies happily deploy beta and release candidate versions of
cutting-edge open source software. Others enjoy the freedom of open source
but look only for mature applications. A third category swears off open source
altogether and only buys software that comes with a license and a support
contract. Where does your company sit on the risk aversion spectrum?
Question: What are your non-technology risks?
Technology aside, what happens if your project fails? Do you get fired? Does
the entire team get fired? Do the naysayers get to say “I told you so” in a
meeting?
Expectations
Question: What are the expectations around the project?
Here are a few questions to get you started:
• Non-Technical
  • How long do you think the project will take? How much do you
    expect it to cost?
  • What are others expecting the system will be able to do?
• Technical
  • How much volume does the system need to be able to process? In
    what amount of time?
  • What level of downtime can you absorb?
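The volume and time questions above can be turned into a single number early on. What follows is a minimal sketch, not something from the talk: the helper name `required_throughput_mb_per_s` and the example figures (500 GB in a 6-hour window) are assumptions chosen purely for illustration.

```python
def required_throughput_mb_per_s(volume_gb: float, window_hours: float) -> float:
    """Average throughput needed to process volume_gb within window_hours."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    volume_mb = volume_gb * 1000  # decimal units: 1 GB = 1000 MB
    return volume_mb / (window_hours * 3600)


# Hypothetical figures for illustration only.
rate = required_throughput_mb_per_s(volume_gb=500, window_hours=6)
print(f"{rate:.1f} MB/s sustained")
```

A number like this gives the team something concrete to test candidate tools against before any architecture is chosen.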
Product
Question: What does the product roadmap say?
At a minimum a bullet point list will help set the expectations of others,
and allow you to make trade-offs as the project moves forward. It also
helps you measure results - discussed later - on an incremental basis,
which will help your team know if they are making progress, or not.
Question: What’s the budget and estimated ROI?
As with expectations and the product roadmap, whether formalized or not,
there is always, or should always be, a budget as well as an estimated
ROI. Write it down and use it as one of your metrics.
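One way to write the estimate down is as a small calculation that can be rerun as the figures change. This is a sketch using the common definition of ROI, (gain - cost) / cost. The function name and the dollar amounts are hypothetical, not taken from the talk.

```python
def estimated_roi(expected_gain: float, total_cost: float) -> float:
    """Return ROI as a fraction: (gain - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (expected_gain - total_cost) / total_cost


# Hypothetical figures for illustration only.
roi = estimated_roi(expected_gain=250_000, total_cost=100_000)
print(f"Estimated ROI: {roi:.0%}")
```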
Operations
Question: Got DevOps?
DevOps, sometimes called TechOps, is a group that manages
and maintains the technology infrastructure of the organization.
Just because you have a DevOps team doesn’t mean you want
to put additional strain on them by firing up more servers.
Even with cloud providers like AWS, you still have to do some
infrastructure support and maintenance. The larger your
business, the more support work there will be.
Results
Question: What does the end result look like?
Here’s a very partial list of results we’ve seen measured:
• The project was completed on X date by X time.
• The project cost $X amount of money to complete.
• The team worked no more than 40 hours each week to get
the project done.
• X, Y and Z features are in the product and have 90%
automated test coverage.
Team
Question: Are the right people on the bus to get the project completed?
Having the right people with the right skills, both hard and soft, can
make or break a project.
Question: Does each team member have the tools and support they
need to be successful?
• Does the team have the support of senior leadership?
• Are they going to encounter a deluge of bureaucratic red tape that
will slow their progress?
• Are development and testing environments available?
ML Pipeline Toolbox
The “Standard” ML Pipeline
Collect -> Store -> Enrich -> Train / Apply -> Visualize
Infrastructure
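The stages in the diagram can be sketched as functions that each take and return a list of records, chained in order. This is only an illustration under stated assumptions: every function body is a stand-in (hard-coded sample records, a toy rule instead of a trained model, `print` instead of a dashboard), and none of the names come from the talk. It needs Python 3.9 or later for the built-in generic type hints.

```python
from functools import reduce
from typing import Callable, Iterable

Record = dict
Stage = Callable[[list[Record]], list[Record]]


def collect(_: list[Record]) -> list[Record]:
    # Stand-in for an agent or scraper: returns hard-coded sample records.
    return [{"text": "spark processes data in memory"}, {"text": "kafka is durable"}]


def store(records: list[Record]) -> list[Record]:
    # Stand-in for writing to a datastore: here we only copy the records.
    return [dict(r) for r in records]


def enrich(records: list[Record]) -> list[Record]:
    # Add a derived field to every record.
    return [{**r, "n_words": len(r["text"].split())} for r in records]


def apply_model(records: list[Record]) -> list[Record]:
    # Toy rule in place of a trained model.
    return [{**r, "label": "long" if r["n_words"] > 3 else "short"} for r in records]


def visualize(records: list[Record]) -> list[Record]:
    # Stand-in for a dashboard: print one line per record.
    for r in records:
        print(f'{r["label"]:>5}  {r["text"]}')
    return records


def run_pipeline(stages: Iterable[Stage]) -> list[Record]:
    # Feed the output of each stage into the next, starting from no data.
    return reduce(lambda data, stage: stage(data), stages, [])


result = run_pipeline([collect, store, enrich, apply_model, visualize])
```

Keeping every stage behind the same signature is what lets a tool in one box of the diagram be swapped without touching the others.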
Infrastructure
• Servers
  • Amazon EC2
  • Data center
• Container Technologies
  • Docker
  • Amazon Elastic Container Service (ECS)
Collect
• Programming Languages
  • Python
  • Scala
  • Go
  • R
• Pre-Built Tools
  • Pentaho Data Integration
  • Various web scraping tools
Store
• Elasticsearch
• Apache Kafka
• Redis
• Cassandra
• MongoDB
• SQL
• Amazon S3
• HDFS
• Many others
Enrich
• Apache Storm
• Apache Spark
• Amazon Elastic MapReduce (EMR)
• Apache NiFi
• Airflow (Airbnb)
Train / Apply
• Python Libraries
  • Scikit-learn
  • Pandas
• Spark Libraries
  • MLlib
• Deep Learning
  • TensorFlow
  • PyTorch
Visualize
• Kibana
• Grafana
• Amazon Athena (for S3)
• Flask
• D3.js
Machine Learning Pipeline Architectures
Architecture 1
[Diagram, stages 1 to 3. Components in the order shown: Agent, File System, Apache Spark, File System, Agent, ES]
Architecture 1 Choices
This pipeline was built at a company building a new platform
using all leading-edge technologies, and was a temporary
solution until another pipeline was built.
• Risk Aversion: not an issue.
• Expectations: the pipeline needed to be run in production
and be able to handle the amount of data the company had
in a timely fashion.
• Product: this was a short-term solution to process data until
the desired pipeline was ready to be deployed into
production.
• Operations: due to its simplicity and limited functionality,
the solution became a one-server solution deployed by an
engineer working in unison with an internal DevOps team
member.
• Results: the pipeline was deployed on time and was able to
process all the data within the parameters.
• Team: after a consultant built the first version of the
application an internal team member took over and
deployed it into production.
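The file-based handoff in Architecture 1 can be sketched in plain Python. This assumes the components run in the order the diagram lists them (agent, file system, Spark, file system, agent, ES). The talk does not show any code, so everything here is a stand-in: an ordinary function replaces the Spark job, a dict replaces Elasticsearch, and all names and sample records are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def collect_agent(landing_dir: Path) -> Path:
    """Stage 1 stand-in: an agent drops raw records as JSON lines."""
    raw = landing_dir / "raw.jsonl"
    events = [{"id": 1, "msg": "hello"}, {"id": 2, "msg": "production pipelines"}]
    raw.write_text("\n".join(json.dumps(e) for e in events) + "\n")
    return raw


def process(raw: Path, out_dir: Path) -> Path:
    """Stage 2 stand-in for the Spark job: read, enrich, write back to disk."""
    out = out_dir / "enriched.jsonl"
    with raw.open() as src, out.open("w") as dst:
        for line in src:
            record = json.loads(line)
            record["msg_length"] = len(record["msg"])
            dst.write(json.dumps(record) + "\n")
    return out


def load_agent(enriched: Path, index: dict) -> None:
    """Stage 3 stand-in: a second agent loads results into a search index.

    A dict keyed by document id stands in for Elasticsearch.
    """
    with enriched.open() as src:
        for line in src:
            record = json.loads(line)
            index[record["id"]] = record


index: dict = {}
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    load_agent(process(collect_agent(tmp_path), tmp_path), index)
print(index)
```

Because each stage only reads and writes files, the whole chain fits on one server, which matches the one-server deployment described above.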
Architecture 2
[Diagram, stages 1 to 3. Components: Agent (x3), Apache Kafka, Apache Storm, ES, S3, HDFS]
Architecture 2 Choices
This pipeline was built at a startup focused on data collection
and was core to the product.
• Risk Aversion: this was the second version of a previously
developed and well-proven pipeline, so risk aversion was low.
• Expectations: as a core product the pipeline was expected to
be continuously evolving, able to be horizontally scaled, able
to handle a growing amount of data, and have 100% uptime.
• Product: the functionality built was in line with a product
roadmap that was reviewed on a monthly basis.
• Operations: an internal DevOps team managed the
infrastructure while engineers were expected to support the
associated applications and data processors.
• Results: the pipeline could be horizontally scaled, handled
between 1 and 2 TB of data per day, and had 99.9% uptime.
• Team: the DevOps and engineering teams worked together
to produce and support it.
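The gap between the 100% uptime that was expected and the 99.9% that was measured is easier to discuss as hours of downtime. This is a small sketch of that arithmetic. The helper name is hypothetical, and the year is assumed to be 365 days.

```python
def downtime_budget_hours(uptime_pct: float, period_hours: float = 365 * 24) -> float:
    """Hours of downtime allowed in a period at the given uptime percentage."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    return period_hours * (1 - uptime_pct / 100)


per_year = downtime_budget_hours(99.9)
per_month = downtime_budget_hours(99.9, period_hours=30 * 24)
print(f"99.9% uptime allows about {per_year:.2f} h/year, "
      f"or {per_month * 60:.0f} min per 30-day month")
```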
Architecture 3
[Diagram, stages 1 to 3. Components: Agent (x3), S3 (x2), Apache Spark, Athena]
Architecture 3 Choices
This pipeline was built at a company building a new platform
using all leading-edge technologies, and was a temporary
solution until another pipeline was built.
• Risk Aversion: this system was mission critical for delivering
data in real time to customers. Failure was not an option, so
best-in-class practices needed to be implemented, including
using hosted solutions such as Databricks and S3.
• Expectations: this system would scale as data collection
efforts grew and would be extremely fault tolerant.
• Product: this system would be extended to accommodate
additional product offerings so flexibility was important.
• Operations: this system was maintained by the engineers
who built it, as there was no separate DevOps team.
• Results: the system processed several TBs of data per hour
(this figure still needs to be double-checked) with minimal downtime.
• Team: the team supporting the pipeline set up monitoring
and alerting to ensure uptime and worked with other
engineering groups to deconflict deployments that might
impact the pipeline.
Architecture 4
[Diagram, stages 1 to 3. Components: Agent (x3), Apache Kafka, Apache Spark, HBase, ES, S3, HDFS]
Architecture 4 Choices
This pipeline was built at a company building a new platform using all
leading-edge technologies, and was a temporary solution until another
pipeline was built.
• Risk Aversion: this system supported a key customer and was being
implemented as a means to resolve data loss and data discrepancies
that had plagued a legacy system.
• Expectations: this system would be resilient in the event of an outage
so that no data would be lost.
• Product: this system would ultimately be replaced by a more general
system designed to support multiple customers, so it was considered
extremely critical yet a one-off.
• Operations: this system was maintained by the engineers
who built it, as there was no technical operations team in
place at the time.
• Results: the system processed hundreds of GBs of data per
day with infrequent outages.
• Team: once deployed, the team of developers who built this
pipeline began work on incorporating its features into a
more generalized stream processing platform.
Q&A
Free Guide
robertwdempsey.com/machineryai
Where to Find Me
Website: robertwdempsey.com
Lotus Guides: lotusguides.com
LinkedIn: robertwdempsey
Twitter: rdempsey
GitHub: rdempsey
Thank You!
Webinar: Ten Ways to Enhance Your Salesforce.com Application in 2013
Emtec Inc.
 
Software architecture in a DevOps world
Software architecture in a DevOps worldSoftware architecture in a DevOps world
Software architecture in a DevOps world
Bert Jan Schrijver
 
Ad

Building A Production-Level Machine Learning Pipeline

  • 1. robertwdempsey.com Building a Production-Level Machine Learning Pipeline Robert Dempsey, CEO Atlantic Dominion Solutions
  • 2. robertwdempsey.com Production ML Pipelines: Robert Dempsey
      • Professional: Entrepreneur, Software Engineer
      • Author: Books and online courses
      • Instructor: Lotus Guides, District Data Labs
      • Owner: Atlantic Dominion Solutions, LLC
  • 3. We’ve mastered three jobs so you can focus on one - growing your business.
  • 4. The Three Jobs: At Atlantic Dominion Solutions we perform three functions for our customers:
      • Consulting: we assess and advise in the areas of technology, team and process to determine how machine learning can have the biggest impact on your business.
      • Implementation: after a strategy session to determine the work you need, we get to work using our proven methodology and begin delivering smarter applications.
      • Training: continuous improvement requires continuous learning. We provide both on-premises and online training.
  • 5. Writing the Book: Co-authoring the book Building Machine Learning Pipelines. Written for software developers and data scientists, Building Machine Learning Pipelines teaches the skills required to create and use the infrastructure needed to run modern intelligent systems. machinelearningpipelines.com
  • 6. What’s your biggest issue?
  • 7. Technology is LEAST important
  • 8. The REPORT Framework™
  • 9. REPORT Framework™: Risk Tolerance, Expectations, Product, Operations, Results, Team
  • 10. Risk Tolerance
      • Question: How risk averse are you? Some companies happily deploy beta and release-candidate versions of cutting-edge open source software. Others enjoy the freedom of open source but look only for mature applications. A third category swears off open source altogether and buys only software that comes with a license and a support contract. Where does your company sit on the risk aversion spectrum?
      • Question: What are your non-technology risks? Technology aside, what happens if your project fails? Do you get fired? Does the entire team get fired? Do the naysayers get to say “I told you so” in a meeting?
  • 11. Expectations
      Question: What are the expectations around the project? Here are a few questions to get you started:
      • Non-Technical
          • How long do you think the project will take? How much do you expect it to cost?
          • What are others expecting the system will be able to do?
      • Technical
          • How much volume does the system need to be able to process? In what amount of time?
          • What level of downtime can you absorb?
  • 12. Product
      • Question: What does the product roadmap say? At a minimum, a bullet point list will help set the expectations of others and allow you to make trade-offs as the project moves forward. It also helps you measure results (discussed later) on an incremental basis, which tells your team whether they are making progress.
      • Question: What’s the budget and estimated ROI? As with expectations and the product roadmap, whether formalized or not, there is always, or always should be, a budget as well as an estimated ROI. Write it down and use it as one of your metrics.
  • 13. Operations
      Question: Got DevOps? DevOps, sometimes called TechOps, is a group that manages and maintains the technology infrastructure of the organization. Just because you have a DevOps team doesn’t mean you want to put additional strain on them by firing up more servers. Even with cloud providers like AWS you still have to do some infrastructure support and maintenance. The larger your business, the more support work there will be.
  • 14. Results
      Question: What does the end result look like? Here’s a very partial list of results we’ve seen measured:
      • The project was completed on X date by X time.
      • The project cost $X to complete.
      • The team worked no more than 40 hours each week to get the project done.
      • X, Y and Z features are in the product and have 90% automated test coverage.
  • 15. Team
      Question: Are the right people on the bus to get the project completed? Having the right people with the right skills, both hard and soft, can make or break a project.
      Question: Does each team member have the tools and support they need to be successful?
      • Does the team have the support of senior leadership?
      • Are they going to encounter a deluge of bureaucratic red tape that will slow their progress?
      • Are development and testing environments available?
  • 16. ML Pipeline Toolbox
  • 17. The “Standard” ML Pipeline (diagram). Stages: Collect, Store, Enrich, Train / Apply, Visualize. Also labeled: Infrastructure.
  • 18. Infrastructure
      • Servers: Amazon EC2, Data center
      • Container Technologies: Docker, Amazon Elastic Container Service (ECS)
  • 19. Collect
      • Programming Languages: Python, Scala, Go, R
      • Pre-Built Tools: Pentaho Data Integration, Various web scraping tools
  • 20. Store
      • Elasticsearch
      • Apache Kafka
      • Redis
      • Cassandra
      • MongoDB
      • SQL
      • Amazon S3
      • HDFS
      • Many others
  • 21. Enrich
      • Apache Storm
      • Apache Spark
      • Amazon Elastic MapReduce (EMR)
      • Apache NiFi
      • Airflow (Airbnb)
  • 22. Train / Apply
      • Python Libraries: Scikit-learn, Pandas
      • Spark Libraries: MLlib
      • Deep Learning: TensorFlow, PyTorch
  • 23. Visualize
      • Kibana
      • Grafana
      • Amazon Athena (for S3)
      • Flask
      • D3.js
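The five stages on the “standard” pipeline slide (collect, store, enrich, train/apply, visualize) can be sketched as plain Python functions chained together. This is only a minimal illustration, not the author's implementation: the function names, the in-memory list standing in for a datastore, the sample records, and the keyword rule standing in for a trained model are all assumptions made for the example. In a real pipeline each stage would be backed by one of the tools listed for it.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Dict[str, object]


def collect() -> List[Record]:
    """Collect: gather raw records (hard-coded here in place of a scraper or agent)."""
    return [
        {"id": 1, "text": "great product"},
        {"id": 2, "text": "terrible support"},
        {"id": 3, "text": "great support"},
    ]


def store(records: List[Record], sink: List[Record]) -> List[Record]:
    """Store: persist records (a Python list stands in for a real datastore)."""
    sink.extend(records)
    return sink


def enrich(records: List[Record]) -> List[Record]:
    """Enrich: add derived fields, here a simple whitespace tokenization."""
    return [{**r, "tokens": str(r["text"]).split()} for r in records]


def apply_model(records: List[Record]) -> List[Record]:
    """Train / Apply: a keyword rule stands in for a trained model (assumption)."""
    return [
        {**r, "label": "positive" if "great" in r["tokens"] else "negative"}
        for r in records
    ]


def visualize(records: List[Record]) -> str:
    """Visualize: render label counts as a tiny text bar chart."""
    counts = Counter(r["label"] for r in records)
    return "\n".join(f"{label}: {'#' * n}" for label, n in sorted(counts.items()))


def run_pipeline() -> Tuple[List[Record], str]:
    """Run the stages in order and return the scored records plus the chart."""
    datastore: List[Record] = []
    stored = store(collect(), datastore)
    scored = apply_model(enrich(stored))
    return scored, visualize(scored)


if __name__ == "__main__":
    _, chart = run_pipeline()
    print(chart)
```

Keeping each stage behind its own function is one way to let a stage be swapped (for example, a list for a real store) without touching the others.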
  • 24. Machine Learning Pipeline Architectures
  • 25. Architecture 1 (diagram). Components as labeled: Agent, File System, Apache Spark, File System, Agent, ES. Steps numbered 1 to 3.
  • 26. Architecture 1 Choices: This pipeline was built at a company building a new platform using all leading-edge technologies, and was a temporary solution until another pipeline was built.
      • Risk Aversion: not an issue.
      • Expectations: the pipeline needed to run in production and handle the amount of data the company had in a timely fashion.
      • Product: this was a short-term solution to process data until the desired pipeline was ready to be deployed into production.
  • 27. Architecture 1 Choices
      • Operations: due to its simplicity and limited functionality, the solution became a one-server solution deployed by an engineer working in unison with an internal devops team member.
      • Results: the pipeline was deployed on time and was able to process all the data within the parameters.
      • Team: after a consultant built the first version of the application, an internal team member took over and deployed it into production.
  • 28. Architecture 2 (diagram). Components as labeled: three Agents, ES, S3, HDFS, Apache Kafka, Apache Storm. Steps numbered 1 to 3.
  • 29. Architecture 2 Choices: This pipeline was built at a startup focused on data collection and was core to the product.
      • Risk Aversion: this was the second version of a previously developed and well-proven pipeline, so risk aversion was low.
      • Expectations: as a core product, the pipeline was expected to evolve continuously, scale horizontally, handle a growing amount of data, and have 100% uptime.
      • Product: the functionality built was in line with a product roadmap that was reviewed on a monthly basis.
  • 30. Architecture 2 Choices
      • Operations: an internal devops team managed the infrastructure, while engineers were expected to support the associated applications and data processors.
      • Results: the pipeline could be horizontally scaled, handled between 1 and 2 TB of data per day, and had 99.9% uptime.
      • Team: the devops and engineering teams worked together to produce and support it.
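The gap between the Architecture 2 expectation (100% uptime) and its result (99.9% uptime), and the 1 to 2 TB per day figure, are easier to reason about as a downtime budget and a sustained throughput. The sketch below does that arithmetic. The helper names are hypothetical, and it assumes a 365-day year, decimal units (1 TB = 1,000,000 MB), and load spread evenly across the day, none of which the slides state.

```python
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def allowed_downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours per year a system may be down while still meeting the uptime target."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def sustained_mb_per_second(tb_per_day: float) -> float:
    """Average throughput needed to process a daily volume, in decimal MB/s."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


if __name__ == "__main__":
    print("Downtime budget at 99.9% uptime (hours/year):",
          round(allowed_downtime_hours_per_year(99.9), 2))
    for tb in (1, 2):
        print(f"{tb} TB/day is about",
              round(sustained_mb_per_second(tb), 1), "MB/s sustained")
```

Under those assumptions, 99.9% uptime leaves about 8.76 hours of downtime per year, and 1 to 2 TB per day works out to roughly 11.6 to 23.1 MB per second on average; real traffic is rarely even, so peak capacity would need to be higher.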
  • 31. Architecture 3 (diagram). Components as labeled: three Agents, Athena, S3, S3, Apache Spark. Steps numbered 1 to 3.
  • 32. Architecture 3 Choices: This pipeline was built at a company building a new platform using all leading-edge technologies, and was a temporary solution until another pipeline was built.
      • Risk Aversion: this system was mission critical for delivering data in real time to customers. Failure was not an option, so best-in-class practices needed to be implemented, including using hosted solutions such as Databricks and S3.
      • Expectations: this system would scale as data collection efforts grew and would be extremely fault tolerant.
  • 33. Architecture 3 Choices
      • Product: this system would be extended to accommodate additional product offerings, so flexibility was important.
      • Operations: this system was maintained by the engineers who built it, as there was no separate devops team.
      • Results: the system processed several TBs of data per hour (need to double check this) with minimal downtime.
      • Team: the team supporting the pipeline set up monitoring and alerting to ensure uptime and worked with other engineering groups to deconflict deployments that might impact the pipeline.
  • 34. Architecture 4 (diagram). Components as labeled: three Agents, ES, S3, HDFS, Apache Kafka, Apache Spark, HBase. Steps numbered 1 to 3.
  • 35. Architecture 4 Choices: This pipeline was built at a company building a new platform using all leading-edge technologies, and was a temporary solution until another pipeline was built.
      • Risk Aversion: this system supported a key customer and was being implemented as a means to resolve data loss and data discrepancies that had plagued a legacy system.
      • Expectations: this system would be resilient in the event of an outage so that no data would be lost.
      • Product: this system would ultimately be replaced by a more general system designed to support multiple customers, so it was considered extremely critical yet a one-off.
  • 36. Architecture 4 Choices
      • Operations: this system was maintained by the engineers who built it, as at the time there was no technical operations team in place.
      • Results: the system processed hundreds of GBs of data per day with infrequent outages.
      • Team: once deployed, the team of developers who built this pipeline began work on incorporating its features into a more generalized stream processing platform.
  • 38. Free Guide: robertwdempsey.com/machineryai
  • 39. Where to Find Me
      • Website: robertwdempsey.com
      • Lotus Guides: lotusguides.com
      • LinkedIn: robertwdempsey
      • Twitter: rdempsey
      • GitHub: rdempsey
  • 40. Thank You!