CONFIDENTIAL. Copyright © 1
Dagster - DataOps and MLOps for Machine Learning Engineers
8+ years swimming in data @
A Researcher, Engineer and Blogger
Agenda
01 Motivation
02 Dagster's philosophy
03 Dagster 101
04 Dagster DataOps
05 Dagster MLOps
06 Q&A
Motivation
Typical Machine Learning pipeline
Data Preparation Model training Serving model
Why do we need orchestration?
1. Directed Acyclic Graphs (DAGs)
2. Scheduling and Workflow Management
3. Error Handling and Retry Mechanisms
4. Monitoring and Logging
Source: link
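As a rough illustration of the first and third points, the typical ML pipeline from the previous slide can be written as a DAG and run in dependency order with a simple retry wrapper. This is a plain-Python sketch using the standard library's graphlib, not Dagster code; the step functions, the simulated failure, and the name run_with_retry are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (a DAG).
dag = {
    "data_preparation": set(),
    "model_training": {"data_preparation"},
    "serving_model": {"model_training"},
}

def run_with_retry(name, fn, max_attempts=3):
    """Call fn(), retrying on any exception up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            print(f"{name}: attempt {attempt} failed ({exc})")
            if attempt == max_attempts:
                raise

# Hypothetical step that fails once before succeeding.
attempts = {"model_training": 0}

def flaky_training():
    attempts["model_training"] += 1
    if attempts["model_training"] < 2:
        raise RuntimeError("transient failure")
    return "model"

steps = {
    "data_preparation": lambda: "data",
    "model_training": flaky_training,
    "serving_model": lambda: "endpoint",
}

# Resolve a valid execution order from the DAG, then run each step.
order = list(TopologicalSorter(dag).static_order())
results = {name: run_with_retry(name, steps[name]) for name in order}
```

An orchestrator adds scheduling, monitoring, and logging on top of this basic loop.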
Orchestration frameworks
Difficulties in answering important questions
• Is this data up to date?
• When upstream data is updated, which downstream data is affected?
• How can we manage data versions over time?
• How does the model's performance change over time?
ModelOps, DataOps, DevOps (chart labels: 90%, 10%, 10%)
Dagster's philosophy
Dagster's philosophy: Assets
Reports
Tables
ML Models
Ideas: transition from Imperative to Declarative
• Front end: Say goodbye to spaghetti code and complex DOM manipulations with ReactJS
• Dev Ops: Infrastructure as code (IaC) with Terraform
• Cluster orchestration: Managing containerized applications at scale has never been easier with K8s
• Data (job/op → data): More accurate and efficient analytics with a data-oriented approach
Dagster 101
• An open-source library for building ETL and machine learning systems (first released in 2018).
• 100+ contributors, 10K commits, 5K stars.
• Used by many innovative organizations.
From Job/Op
def upstream_asset1():
    return 1

def upstream_asset2():
    return 2

def combine_asset(upstream_asset1, upstream_asset2):
    combine = upstream_asset1 + upstream_asset2
    print(f"{upstream_asset1} + {upstream_asset2} = {combine}")
    return combine

result = combine_asset(upstream_asset1(), upstream_asset2())
To assets
from dagster import asset

@asset
def upstream_asset1():
    return 1

@asset
def upstream_asset2():
    return 2

@asset
def combine_asset(context, upstream_asset1, upstream_asset2):
    combine = upstream_asset1 + upstream_asset2
    context.log.info(f"{upstream_asset1} + {upstream_asset2} = {combine}")
    return combine
Asset key
dagster dev -f <file_name.py>
from dagster import asset

@asset
def upstream_asset1():
    return 1

@asset
def upstream_asset2():
    return 2

@asset
def combine_asset(context, upstream_asset1, upstream_asset2):
    combine = upstream_asset1 + upstream_asset2
    context.log.info(f"{upstream_asset1} + {upstream_asset2} = {combine}")
    return combine
Upstream asset key
Dagster DataOps
Modularity:
• Modular architecture → complex data pipelines are easy to organize.
• Clear separation between data processing logic, data management, and infrastructure management.
Flexibility:
• Supports a wide range of data sources, including databases, application programming interfaces (APIs), and file systems.
• Provides integrations with popular data processing frameworks (Apache Airflow, Apache Spark) → easy integration into existing data pipelines.
Debugging and testing:
• Provides tools to debug and test data pipelines → errors are easy to identify and fix.
• Powerful UI for visualizing data pipelines and tracking progress.
Supportive Community:
• Dagster has a community of active users and contributors who continuously add new features and improve the framework.
Visualization and debugging
Dagster comes with Dagit, a graphical user interface that allows ML engineers to visualize pipelines, monitor execution progress, and debug issues using detailed logs and error messages.
Detailed logs and error messages
1st: Organize complex data pipelines
• Where does the data come from?
• How is the data computed?
• Is this data up to date?
• When upstream data is updated, which downstream data is affected?
2nd: Easy integration into existing tech stacks
Just install
pip install dagster dagit
And materialize your assets
from dagster import materialize
from my_assets import my_first_asset  # defined in my_assets.py

if __name__ == "__main__":
    result = materialize(assets=[my_first_asset])
Extensibility and integration: Dagster has a rich ecosystem of libraries and plugins that support various tools and platforms related to machine learning, data processing, and infrastructure. This extensibility allows ML engineers to integrate Dagster with existing tools and systems.
3rd: Asset change detection
If the latest version of combine_asset was created before the latest version of upstream_asset1 or upstream_asset2, then combine_asset may be out of date. Dagster flags this with the "upstream changed" indicator.
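The rule can be illustrated with a small plain-Python sketch that compares materialization timestamps. This is not how Dagster implements the check internally; the timestamps and the helper name stale_assets are made up for the example.

```python
from datetime import datetime

# Hypothetical latest materialization time of each asset.
materialized_at = {
    "upstream_asset1": datetime(2024, 1, 2, 9, 0),
    "upstream_asset2": datetime(2024, 1, 1, 9, 0),
    "combine_asset": datetime(2024, 1, 1, 10, 0),
}

# Each asset maps to the upstream assets it depends on.
upstreams = {
    "upstream_asset1": [],
    "upstream_asset2": [],
    "combine_asset": ["upstream_asset1", "upstream_asset2"],
}

def stale_assets(materialized_at, upstreams):
    """Return assets with at least one upstream materialized more recently."""
    stale = []
    for name, deps in upstreams.items():
        if any(materialized_at[dep] > materialized_at[name] for dep in deps):
            stale.append(name)
    return stale

needs_refresh = stale_assets(materialized_at, upstreams)
```

Here upstream_asset1 was refreshed after combine_asset, so combine_asset is the one asset reported as needing a refresh.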
4th: IOManager: reduce data streamline complexity
Write Once, use everywhere!
CONFIDENTIAL. Copyright © 25
CSVIOManager - handle_output() & load_input()
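The slide's implementation is not in the transcript, so the following is only a sketch of the handle_output() / load_input() contract. It is written in plain Python so it runs without Dagster installed; in a real project the class would presumably subclass Dagster's IOManager and receive Dagster context objects. The asset_name attribute and the one-file-per-asset layout are assumptions made for this example.

```python
import csv
import os
import tempfile
from types import SimpleNamespace

class CSVIOManager:
    """Plain-Python stand-in for an IO manager that stores assets as CSV."""

    def __init__(self, base_dir):
        self.base_dir = base_dir

    def _path(self, context):
        # Hypothetical convention: one CSV file per asset name.
        return os.path.join(self.base_dir, f"{context.asset_name}.csv")

    def handle_output(self, context, rows):
        # Called after an asset is computed: persist its rows.
        with open(self._path(context), "w", newline="") as f:
            csv.writer(f).writerows(rows)

    def load_input(self, context):
        # Called before a downstream asset runs: read the stored rows back.
        with open(self._path(context), newline="") as f:
            return [row for row in csv.reader(f)]

manager = CSVIOManager(tempfile.mkdtemp())
ctx = SimpleNamespace(asset_name="upstream_asset1")  # stand-in for a Dagster context
manager.handle_output(ctx, [["id", "value"], ["1", "10"]])
loaded = manager.load_input(ctx)
```

Because storage lives in one class, asset functions only return and receive Python objects, which is the "write once, use everywhere" idea. Note that values read back from CSV are strings.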
Dagster MLOps
Benefits of building machine learning pipelines in Dagster
• Dagster makes iterating on and testing machine learning models easy, and it is designed for use during the development process.
• Dagster's lightweight execution model means you can access the benefits of an orchestrator, like re-executing from the middle of a pipeline and parallelizing steps, while you're experimenting.
• Dagster models data assets, not just tasks, so it understands the upstream and downstream data dependencies.
• Dagster is a one-stop shop for both the data transformations and the models that depend on the data transformations.
Typical Machine Learning pipeline
Data Preparation Model training Serving model
Organize complex data pipeline (Modeling Pipeline)
Pipeline abstraction: Dagster enables ML engineers to define complex workflows as modular pipelines composed of individual units called assets. This modularity aids in code readability, maintainability, and reusability.
Organize complex data pipeline (Data preparation)
Organize complex data pipeline (Model training)
5th: Debug and test data pipelines
my_assets.py
from dagster import asset

@asset
def my_first_asset(context):
    context.log.info("This is my first asset")
    return 1
test_my_assets.py
from dagster import materialize, build_op_context
from my_assets import my_first_asset

def test_my_first_asset():
    # Materialize the asset in-process and check that the run succeeded.
    result = materialize(assets=[my_first_asset])
    assert result.success
    # Invoke the asset directly with a test context and check its return value.
    context = build_op_context()
    assert my_first_asset(context) == 1
Testing and development: Dagster supports local development and testing by enabling execution of individual assets or entire pipelines independent of the production environment, fostering faster iteration and experimentation.
Tracking model history
Viewing previous versions of a machine learning model can be useful for understanding the evaluation history or for referencing a model that was used for inference. Using Dagster will enable you to understand:
• What data was used to train the model
• When the model was refreshed
• Which code version and ML model version were used to generate the predictions
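To make these three questions concrete, here is a plain-Python sketch of the kind of record that answers them. It does not use Dagster's metadata API; the ModelVersion fields, the example history, and the helper version_used_at are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class ModelVersion:
    model_version: str      # which model produced the predictions
    code_version: str       # which code built the model
    training_data: str      # what data the model was trained on
    refreshed_at: datetime  # when the model was refreshed

# Hypothetical history, one record per model refresh.
history = [
    ModelVersion("v1", "a1b2c3", "sales_2023_q4", datetime(2024, 1, 1)),
    ModelVersion("v2", "d4e5f6", "sales_2024_q1", datetime(2024, 2, 1)),
    ModelVersion("v3", "d4e5f6", "sales_2024_q2", datetime(2024, 3, 1)),
]

def version_used_at(history, when):
    """Return the most recent model refreshed at or before `when`, if any."""
    candidates = [v for v in history if v.refreshed_at <= when]
    return max(candidates, key=lambda v: v.refreshed_at) if candidates else None

# Which model served a prediction made on 15 February 2024?
used = version_used_at(history, datetime(2024, 2, 15))
```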
Monitoring potential model drift and data drift over time
Monitoring and observability: Dagster makes it easier to monitor and track model performance metrics with built-in logging and error-handling, enabling ML engineers to detect issues and ensure the reliability of their machine learning workflows.
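As one example of a drift check, the sketch below measures how far the mean of a feature has moved from a reference sample, in units of the reference standard deviation. This is a deliberately simple heuristic in plain Python, not a Dagster feature; the sample values and the threshold of 2.0 are assumptions chosen for illustration.

```python
from statistics import mean, stdev

def mean_shift_in_stdevs(reference, current):
    """Absolute shift of the current mean, measured in reference std devs."""
    return abs(mean(current) - mean(reference)) / stdev(reference)

# Hypothetical feature values: training-time sample vs. recent production data.
reference = [10.0, 11.0, 9.0, 10.5, 9.5]
current = [13.0, 14.0, 12.5, 13.5, 13.0]

DRIFT_THRESHOLD = 2.0  # assumed cut-off; tune per feature in practice

shift = mean_shift_in_stdevs(reference, current)
drifted = shift > DRIFT_THRESHOLD
```

A value like shift could be logged on every run so that a trend becomes visible over time.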
Dagster’s architecture
Scalability and portability: With Dagster, ML engineers can define pipelines that scale across
different execution environments, such as cloud-based infrastructure, containerization
platforms like Docker, and orchestration tools like Kubernetes.
6th: Transitioning Data Pipelines from Development to Production
Configuration management: With Dagster, ML engineers can manage configurations more efficiently and consistently across various environments, simplifying pipeline and model parameterization.
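The general idea of keeping pipeline code identical and swapping configuration per environment can be sketched in plain Python. This is not Dagster's configuration system; the PIPELINE_ENV variable, the config keys, and the paths are hypothetical.

```python
import os

# Hypothetical per-environment settings; the pipeline code stays the same.
CONFIGS = {
    "dev": {"storage_dir": "/tmp/pipeline_dev", "row_limit": 10},
    "prod": {"storage_dir": "/data/warehouse", "row_limit": None},
}

def load_config(env=None):
    """Pick a config by name, defaulting to the PIPELINE_ENV variable or 'dev'."""
    env = env or os.environ.get("PIPELINE_ENV", "dev")
    if env not in CONFIGS:
        raise ValueError(f"unknown environment: {env}")
    return CONFIGS[env]

def prepare_data(rows, config):
    """Same logic everywhere; only the row limit differs per environment."""
    return rows[: config["row_limit"]]

rows = list(range(100))
dev_rows = prepare_data(rows, load_config("dev"))
prod_rows = prepare_data(rows, load_config("prod"))
```

In development the step works on a small slice for fast iteration, while production processes every row.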
Dagster features to take away
1. Organize complex data pipelines
2. Easy integration into existing tech stacks
3. Asset change detection
4. IOManager: reduce data streamline complexity
5. Debug and test data pipelines
6. Transitioning Data Pipelines from Development to Production
Dagster Pros & Cons
Pros
• Data Pipeline Orchestration
• Modularity and Reusability
• Data Quality and Validation checks
• Monitoring and Observability
• Community Support
Cons
• Learning Curve
• Not appropriate for stream processing
Q&A
References
Introducing Software-Defined Assets
Dagster vs. Airflow
Building machine learning pipelines with Dagster
Managing machine learning models with Dagster
Open Source deployment architecture
Ad

More Related Content

What's hot (20)

dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchez
GoDataDriven
 
Introduction To Flink
Introduction To FlinkIntroduction To Flink
Introduction To Flink
Knoldus Inc.
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
Alan McSweeney
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
Weaveworks
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
Alluxio, Inc.
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentials
qureshihamid
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
Prasad Wagle
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
DATAVERSITY
 
Dataiku Data Science Studio (datasheet)
Dataiku Data Science Studio (datasheet)Dataiku Data Science Studio (datasheet)
Dataiku Data Science Studio (datasheet)
John Cann
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at Shopify
Yaroslav Tkachenko
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
Databricks
 
MLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleMLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at Scale
Databricks
 
Data Engineering Basics
Data Engineering BasicsData Engineering Basics
Data Engineering Basics
Catherine Kimani
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data Engineering
Harald Erb
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
Slim Baltagi
 
Distributed machine learning
Distributed machine learningDistributed machine learning
Distributed machine learning
Stanley Wang
 
Why Data Vault?
Why Data Vault? Why Data Vault?
Why Data Vault?
Kent Graziano
 
Data engineering zoomcamp introduction
Data engineering zoomcamp  introductionData engineering zoomcamp  introduction
Data engineering zoomcamp introduction
Alexey Grigorev
 
dbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchezdbt Python models - GoDataFest by Guillermo Sanchez
dbt Python models - GoDataFest by Guillermo Sanchez
GoDataDriven
 
Introduction To Flink
Introduction To FlinkIntroduction To Flink
Introduction To Flink
Knoldus Inc.
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
Alan McSweeney
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
Weaveworks
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
Alluxio, Inc.
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentials
qureshihamid
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
Prasad Wagle
 
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data PipelinesPutting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines
DATAVERSITY
 
Dataiku Data Science Studio (datasheet)
Dataiku Data Science Studio (datasheet)Dataiku Data Science Studio (datasheet)
Dataiku Data Science Studio (datasheet)
John Cann
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at Shopify
Yaroslav Tkachenko
 
Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
Databricks
 
MLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at ScaleMLOps Virtual Event: Automating ML at Scale
MLOps Virtual Event: Automating ML at Scale
Databricks
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data Engineering
Harald Erb
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
Slim Baltagi
 
Distributed machine learning
Distributed machine learningDistributed machine learning
Distributed machine learning
Stanley Wang
 
Data engineering zoomcamp introduction
Data engineering zoomcamp  introductionData engineering zoomcamp  introduction
Data engineering zoomcamp introduction
Alexey Grigorev
 

Similar to Dagster - DataOps and MLOps for Machine Learning Engineers.pdf (20)

Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and Roadmap
Neo4j
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
Cloudera, Inc.
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
DataWorks Summit
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Databricks
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
HostedbyConfluent
 
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Data Con LA
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera, Inc.
 
Re-Platforming Applications for the Cloud
Re-Platforming Applications for the CloudRe-Platforming Applications for the Cloud
Re-Platforming Applications for the Cloud
Carter Wickstrom
 
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Igor De Souza
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

Cloudera, Inc.
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
DataStax Academy
 
Enterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaEnterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, Cloudera
Neo4j
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine Learning
Cloudera, Inc.
 
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus WebinarBuild and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Impetus Technologies
 
CS-Op Analytics
CS-Op AnalyticsCS-Op Analytics
CS-Op Analytics
Cloudera, Inc.
 
Presentation application change management and data masking strategies for ...
Presentation   application change management and data masking strategies for ...Presentation   application change management and data masking strategies for ...
Presentation application change management and data masking strategies for ...
xKinAnx
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
Cloudera, Inc.
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Cloudera, Inc.
 
The Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldThe Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous World
Maria Colgan
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18
Cloudera, Inc.
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and Roadmap
Neo4j
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
Cloudera, Inc.
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
DataWorks Summit
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Databricks
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
HostedbyConfluent
 
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Big Data Day LA 2015 - Brainwashed: Building an IDE for Feature Engineering b...
Data Con LA
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera, Inc.
 
Re-Platforming Applications for the Cloud
Re-Platforming Applications for the CloudRe-Platforming Applications for the Cloud
Re-Platforming Applications for the Cloud
Carter Wickstrom
 
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Data Engineer, Patterns & Architecture The future: Deep-dive into Microservic...
Igor De Souza
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

Cloudera, Inc.
 
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
DataStax Academy
 
Enterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, ClouderaEnterprise Metadata Integration, Cloudera
Enterprise Metadata Integration, Cloudera
Neo4j
 
The Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine LearningThe Vision & Challenge of Applied Machine Learning
The Vision & Challenge of Applied Machine Learning
Cloudera, Inc.
 
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus WebinarBuild and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Build and Manage Hadoop & Oracle NoSQL DB Solutions- Impetus Webinar
Impetus Technologies
 
Presentation application change management and data masking strategies for ...
Presentation   application change management and data masking strategies for ...Presentation   application change management and data masking strategies for ...
Presentation application change management and data masking strategies for ...
xKinAnx
 
Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18Consolidate your data marts for fast, flexible analytics 5.24.18
Consolidate your data marts for fast, flexible analytics 5.24.18
Cloudera, Inc.
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Cloudera, Inc.
 
The Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldThe Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous World
Maria Colgan
 
Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18Machine Learning Models: From Research to Production 6.13.18
Machine Learning Models: From Research to Production 6.13.18
Cloudera, Inc.
 
Ad

More from Hong Ong (8)

Feast Feature Store - An In-depth Overview Experimentation and Application in...
Feast Feature Store - An In-depth Overview Experimentation and Application in...Feast Feature Store - An In-depth Overview Experimentation and Application in...
Feast Feature Store - An In-depth Overview Experimentation and Application in...
Hong Ong
 
DBT ELT approach for Advanced Analytics.pptx
DBT ELT approach for Advanced Analytics.pptxDBT ELT approach for Advanced Analytics.pptx
DBT ELT approach for Advanced Analytics.pptx
Hong Ong
 
Data Products for Mobile Commerce in Real-time and Real-life.pdf
Data Products for Mobile Commerce in Real-time and Real-life.pdfData Products for Mobile Commerce in Real-time and Real-life.pdf
Data Products for Mobile Commerce in Real-time and Real-life.pdf
Hong Ong
 
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
Hong Ong
 
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thịDistance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Hong Ong
 
Nền tảng thuật toán của AI, Machine Learning, Big Data
Nền tảng thuật toán của AI, Machine Learning, Big DataNền tảng thuật toán của AI, Machine Learning, Big Data
Nền tảng thuật toán của AI, Machine Learning, Big Data
Hong Ong
 
Bắt đầu nghiên cứu Big Data
Bắt đầu nghiên cứu Big DataBắt đầu nghiên cứu Big Data
Bắt đầu nghiên cứu Big Data
Hong Ong
 
Bắt đầu học data science
Bắt đầu học data scienceBắt đầu học data science
Bắt đầu học data science
Hong Ong
 
Feast Feature Store - An In-depth Overview Experimentation and Application in...
Feast Feature Store - An In-depth Overview Experimentation and Application in...Feast Feature Store - An In-depth Overview Experimentation and Application in...
Feast Feature Store - An In-depth Overview Experimentation and Application in...
Hong Ong
 
DBT ELT approach for Advanced Analytics.pptx
DBT ELT approach for Advanced Analytics.pptxDBT ELT approach for Advanced Analytics.pptx
DBT ELT approach for Advanced Analytics.pptx
Hong Ong
 
Data Products for Mobile Commerce in Real-time and Real-life.pdf
Data Products for Mobile Commerce in Real-time and Real-life.pdfData Products for Mobile Commerce in Real-time and Real-life.pdf
Data Products for Mobile Commerce in Real-time and Real-life.pdf
Hong Ong
 
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
Hong Ong
 
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thịDistance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Hong Ong
 
Nền tảng thuật toán của AI, Machine Learning, Big Data
Nền tảng thuật toán của AI, Machine Learning, Big DataNền tảng thuật toán của AI, Machine Learning, Big Data
Nền tảng thuật toán của AI, Machine Learning, Big Data
Hong Ong
 
Bắt đầu nghiên cứu Big Data
Bắt đầu nghiên cứu Big DataBắt đầu nghiên cứu Big Data
Bắt đầu nghiên cứu Big Data
Hong Ong
 
Bắt đầu học data science
Bắt đầu học data scienceBắt đầu học data science
Bắt đầu học data science
Hong Ong
 
Ad


Dagster - DataOps and MLOps for Machine Learning Engineers.pdf

  • 1. Dagster - DataOps and MLOps for Machine Learning Engineers
  • 2. 8+ years swimming in data @ A Researcher, Engineer and Blogger
  • 3. Agenda: 01 Motivation, 02 Dagster's philosophy, 03 Dagster 101, 04 Dagster DataOps, 05 Dagster MLOps, 06 Q&A
  • 5. Typical Machine Learning pipeline: Data Preparation → Model training → Serving model
  • 6. Why do we need orchestration? 1. Directed Acyclic Graphs (DAGs) 2. Scheduling and Workflow Management 3. Error Handling and Retry Mechanisms 4. Monitoring and Logging. Source: link
  • 7. Orchestration frameworks
  • 8. Difficulties in answering important questions: • Is this data up to date? • When upstream data is updated, which downstream data is affected? • How can we manage data versions over time? • How does the model's performance change over time? Chart labels: ModelOps, DataOps, DevOps; 90%, 10%, 10%
  • 9. Dagster's philosophy
  • 10. Dagster's philosophy: Assets (Reports, Tables, ML Models)
  • 11. Ideas: transition from Imperative to Declarative. Front end: say goodbye to spaghetti code and complex DOM manipulations with ReactJS. Dev Ops: Infrastructure as code (IaC) with Terraform. Cluster orchestration: managing containerized applications at scale has never been easier with K8s. Data (job/op, data): more accurate and efficient analytics with a data-oriented approach.
  • 12. Dagster 101
  • 13. • An open-source library used to build ETL and Machine Learning systems (first released in 2018). • 100+ contributors, 10K commits, 5K stars. • Used by many innovative organizations.
  • 14. From Job/Op
        # Plain Python functions, wired together by hand.
        def upstream_asset1():
            return 1

        def upstream_asset2():
            return 2

        def combine_asset(upstream_asset1, upstream_asset2):
            combine = upstream_asset1 + upstream_asset2
            print(f"{upstream_asset1} + {upstream_asset2} = {combine}")
            return combine

        result = combine_asset(upstream_asset1(), upstream_asset2())
  • 15. To assets
        from dagster import asset

        @asset  # the function name becomes the asset key
        def upstream_asset1():
            return 1

        @asset
        def upstream_asset2():
            return 2

        @asset
        def combine_asset(context, upstream_asset1, upstream_asset2):
            # The parameter names refer to the upstream asset keys.
            combine = upstream_asset1 + upstream_asset2
            context.log.info(f"{upstream_asset1} + {upstream_asset2} = {combine}")
            return combine
        Annotation: Asset key
  • 16. dagster dev -f <file_name.py>
        Same code as slide 15, loaded with the command above.
        Annotation: Upstream asset key (the parameters upstream_asset1 and upstream_asset2 of combine_asset)
  • 17. Dagster DataOps
  • 18. Modularity: • Designed with a modular architecture → complex data pipelines are easy to organize. • Provides a clear separation between data processing logic, data management, and infrastructure management. Flexibility: • Supports a wide range of data sources, including databases, application programming interfaces (APIs), and file systems. • Provides integration with popular data processing frameworks (Apache Airflow, Apache Spark) → easy integration into existing data pipelines. Debugging and testing: • Provides tools to debug and test data pipelines → errors are easy to identify and fix. • A powerful UI allows data pipeline visualization and progress tracking. Supportive community: • Dagster has a community of active users and contributors who continuously add new features and improve the framework.
  • 19. Visualization and debugging: Dagster comes with Dagit, a graphical user interface that allows ML engineers to visualize pipelines, monitor execution progress, and debug issues using detailed logs and error messages.
  • 20. Detailed logs and error messages
  • 21. 1st: Organize complex data pipeline • Where does the data come from? • How is the data computed? • Is this data up to date? • When upstream data is updated, which downstream data is affected?
  • 22. 2nd: Easy integration into existing tech stacks
        Just install:
            pip install dagster dagit
        And materialize your assets:
            from dagster import materialize
            # my_first_asset is defined in my_assets.py (slide 32)
            from my_assets import my_first_asset

            if __name__ == "__main__":
                result = materialize(assets=[my_first_asset])
        Extensibility and integration: Dagster has a rich ecosystem of libraries and plugins that support various tools and platforms related to machine learning, data processing, and infrastructure. This extensibility allows ML engineers to integrate Dagster with existing tools and systems.
  • 23. 3rd: Asset change detection. If the latest version of combine_asset was created before the latest version of upstream_asset1 or upstream_asset2, then combine_asset may be obsolete. Dagster flags the difference with the "upstream changed" indicator.
  • 24. 4th: IOManager: reduce data streamline complexity. Write once, use everywhere!
  • 25. CSVIOManager - handle_output() & load_input()
  • 26. Dagster MLOps
  • 27. Benefits of building machine learning pipelines in Dagster • Dagster makes iterating on machine learning models and testing easy, and it is designed for use during the development process. • Dagster has a lightweight execution model, which means you can access the benefits of an orchestrator, like re-executing from the middle of a pipeline and parallelizing steps, while you're experimenting. • Dagster models data assets, not just tasks, so it understands the upstream and downstream data dependencies. • Dagster is a one-stop shop for both the data transformations and the models that depend on the data transformations.
  • 28. Typical Machine Learning pipeline: Data Preparation → Model training → Serving model
  • 29. Organize complex data pipeline (Modeling Pipeline). Pipeline abstraction: Dagster enables ML engineers to define complex workflows as modular pipelines composed of individual units called assets. This modularity aids code readability, maintainability, and reusability.
  • 30. Organize complex data pipeline (Data preparation)
  • 31. Organize complex data pipeline (Model training)
  • 32. 5th: Debug, test data pipeline
        my_assets.py:
            from dagster import asset

            @asset
            def my_first_asset(context):
                context.log.info("This is my first asset")
                return 1
        test_my_assets.py:
            from dagster import materialize, build_op_context
            from my_assets import my_first_asset

            def test_my_first_asset():
                # Run the asset through Dagster's in-process executor.
                result = materialize(assets=[my_first_asset])
                assert result.success
                # Call the asset function directly with a test context.
                context = build_op_context()
                assert my_first_asset(context) == 1
        Testing and development: Dagster supports local development and testing by enabling execution of individual assets or entire pipelines independent of the production environment, fostering faster iteration and experimentation.
  • 33. Tracking model history. Viewing previous versions of a machine learning model can be useful for understanding the evaluation history or for referencing a model that was used for inference. Using Dagster will enable you to understand: • What data was used to train the model • When the model was refreshed • Which code version and ML model version were used to generate the predictions
  • 34. Monitoring potential model drift and data drift over time. Monitoring and observability: Dagster makes it easier to monitor and track model performance metrics with built-in logging and error handling, enabling ML engineers to detect issues and ensure the reliability of their machine learning workflows.
  • 35. Dagster’s architecture. Scalability and portability: With Dagster, ML engineers can define pipelines that scale across different execution environments, such as cloud-based infrastructure, containerization platforms like Docker, and orchestration tools like Kubernetes.
  • 36. 6th: Transitioning Data Pipelines from Development to Production. Configuration management: With Dagster, ML engineers can manage configurations more efficiently and consistently across various environments, simplifying pipeline and model parameterization.
  • 37. Dagster features to take away: 1. Organize complex data pipeline 2. Easy integration into existing tech stacks 3. Asset change detection 4. IOManager: reduce data streamline complexity 5. Debug, test data pipeline 6. Transitioning Data Pipelines from Development to Production
  • 38. Dagster Pros & Cons. Pros: • Data Pipeline Orchestration • Modularity and Reusability • Data Quality and Validation checks • Monitoring and Observability • Community Support. Cons: • Learning Curve • Not appropriate for stream processing
  • 40. References: Introducing Software-Defined Assets; Dagster vs. Airflow; Building machine learning pipelines with Dagster; Managing machine learning models with Dagster; Open Source deployment architecture
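
The change-detection rule on slide 23 (an asset may be obsolete when it was created before the latest version of one of its upstream assets) can be sketched in plain Python. This is only an illustration of the comparison, not Dagster's internal implementation: the function name `may_be_stale`, the timestamp dictionary, and the dates are hypothetical.

```python
from datetime import datetime


def may_be_stale(asset_time, upstream_times):
    """Return True if any upstream asset was materialized after the asset itself."""
    return any(ts > asset_time for ts in upstream_times.values())


# Hypothetical materialization timestamps for the assets named on slides 15-16.
materialized_at = {
    "upstream_asset1": datetime(2023, 6, 1, 9, 0),
    "upstream_asset2": datetime(2023, 6, 1, 9, 5),
    "combine_asset": datetime(2023, 6, 1, 9, 10),
}

upstreams = {
    name: materialized_at[name]
    for name in ("upstream_asset1", "upstream_asset2")
}

# combine_asset is newer than both upstream assets, so it is not flagged.
before_refresh = may_be_stale(materialized_at["combine_asset"], upstreams)

# Simulate re-materializing upstream_asset1 after combine_asset was built.
upstreams["upstream_asset1"] = datetime(2023, 6, 1, 10, 0)
after_refresh = may_be_stale(materialized_at["combine_asset"], upstreams)

print(before_refresh, after_refresh)
```

In this sketch the second check is the situation the "upstream changed" indicator describes: the downstream asset has not been rebuilt since one of its inputs was refreshed.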