
Productionizing ML with workflows at Twitter
By Samuel Ngahane and Devin Goodsell
Friday, 22 June 2018

Cortex provides machine learning platform technologies, modeling expertise, and education to teams at Twitter. Its purpose is to improve Twitter by enabling advanced and ethical AI. With first-hand experience running machine learning models in production, Cortex seeks to streamline difficult ML processes, freeing engineers to focus on modeling, experimentation, and user experience.

Once upon a time…


Machine learning pipelines were run at Twitter through a series of scripts or ad-hoc
commands. Training and serving a model meant manually triggering and waiting for a
series of jobs to complete or writing a collection of custom scripts to do so. Example
jobs include extracting, transforming, and loading data, as well as training, validating,
and serving models.

The process of managing and triggering machine learning pipelines presents problems that can impact the quality of models. Firstly, it reduces the frequency with which models can be retrained by creating unnecessary overhead and causing frequent errors. Each full run of the training pipeline requires active monitoring from engineers in order to trigger next steps and remedy issues with the data or scripts. Secondly, it reduces the rate of experimentation and tuning that can be done to improve the model. For example, in order to tune the hyperparameters of a model, engineers would have to manually trigger, manage, and record results from multiple runs of a machine learning pipeline. This tedious and repetitive task reduces engineering productivity and slows iteration time.

In an effort to reduce the cost of maintenance of production machine learning pipelines, improve engineering productivity, and increase the rate of experimentation, Cortex developed ML Workflows: a tool designed to automate, schedule, and share machine learning pipelines at Twitter.

Airflow at Twitter
When we started building ML Workflows, our philosophy was to create a simple
solution that would solve most ML needs while reusing existing components and
open source technologies. Rather than reinvent the wheel, Cortex evaluated
technical solutions based on a simple Python API to describe workflow DAGs paired
with a backend job execution system.

After careful consideration of open-source projects, such as Luigi or Azkaban, as well as internal solutions, we picked Aurora Workflows and Apache Airflow as the frontrunners.

Why Airflow?

Aurora Workflows (https://www.youtube.com/watch?v=w36TOkuyAVc) is an orchestration engine that is part of our company’s continuous deployment efforts. It supports full automation and is used to coordinate test and release stages across Aurora and CI (Jenkins) jobs.

Apache Airflow (https://airflow.apache.org/) is an open source technology for creating, running, and managing data pipelines. It has a powerful UI to manage DAGs and an easy-to-use API for defining and extending operators.

Despite Aurora Workflows already being integrated at Twitter, we chose to base our product on Airflow because it:

Offers a fully-fledged Web UI (https://airflow.apache.org/ui.html) with fine-grained view and control over workflows.
Provides dynamic parameter passing between operators through the use of XComs (see the sketch after this list).
Offers an API that is well designed for tasks of all sizes and creates a consistent development experience for our users.
Supports execution of arbitrary Python code and is extendable thanks to its support of custom plugins (https://airflow.apache.org/concepts.html#operators).
Has an active community in addition to being used by large tech companies (https://github.com/apache/incubator-airflow#who-uses-airflow) in production.
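Since XCom-based parameter passing figures prominently in the list above, here is a minimal sketch of how it looks in stock, Airflow 1.x-era code; the DAG and task names are made up for illustration only:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def produce(**context):
    # The return value of a PythonOperator callable is pushed as an XCom.
    return {"model_path": "/tmp/example_model"}


def consume(**context):
    # Downstream tasks can pull that XCom by task id and use it as a parameter.
    payload = context["ti"].xcom_pull(task_ids="produce")
    print("using model at", payload["model_path"])


with DAG("xcom_example", start_date=datetime(2018, 1, 1), schedule_interval=None) as dag:
    producer = PythonOperator(task_id="produce", python_callable=produce, provide_context=True)
    consumer = PythonOperator(task_id="consume", python_callable=consume, provide_context=True)
    producer >> consumer
```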

Integration at Twitter

Integrating Airflow as a productionized system within Twitter required modifications to Airflow and its deployment procedures. Here are some of the most noteworthy contributions we made.

Authentication and authorization

At Twitter, engineers generally authenticate with internal web services via Kerberos.
To support Kerberos in Airflow, we took advantage of the existing framework provided
by Flask-Login and created a child class of LoginManager. At initialization time, this
class verifies that the principal name is known to Kerberos. It then sets up before- and after-request filters that drive the GSSAPI process. On success, LDAP groups for the user are queried and carried along with the user object. This allows us to offer group-based, DAG-level control of actions or even visibility. Taken together, this
approach allows Airflow to neatly drop into our authentication and authorization
infrastructure with no fuss.
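As a rough illustration of this pattern (not Twitter's actual code), a minimal sketch might look like the following; the Kerberos, GSSAPI, and LDAP helpers are placeholders standing in for real infrastructure calls:

```python
from flask_login import LoginManager, UserMixin, login_user


def verify_principal(principal):
    """Placeholder: check that the principal name is known to Kerberos."""


def gssapi_negotiate():
    """Placeholder: drive one step of the GSSAPI handshake, return a username."""
    return "jdoe"


def fetch_ldap_groups(username):
    """Placeholder: look up the user's LDAP groups."""
    return ["ml-workflows-users"]


class KerberosUser(UserMixin):
    def __init__(self, username, groups):
        self.id = username
        self.groups = groups  # used for group-based, DAG-level authorization


class KerberosLoginManager(LoginManager):
    """A child class of Flask-Login's LoginManager, following the pattern above."""

    def __init__(self, principal):
        super().__init__()
        verify_principal(principal)  # fail fast at initialization time

    def init_app(self, app, add_context_processor=True):
        super().init_app(app, add_context_processor)

        @app.before_request
        def _authenticate():
            username = gssapi_negotiate()
            if username:
                # Carry the LDAP groups along with the user object.
                login_user(KerberosUser(username, fetch_ldap_groups(username)))
```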

Self-service model

We are working towards a self-service model for teams to stand up their own instances. This gives user groups complete control of their instance and DAGs, and it is much simpler to manage permissions and quotas for a single service account. This approach also paves the path to a potential multi-tenant Airflow server at Twitter.

In our self-service model, each team’s deployment differs from another in only Aurora configuration and DAG population. So rather than copying and pasting a whole lot of config, we use a simple, short JSON file to specify the per-instance variables such as owning service account, allowed groups, backing database, Kerberos principal credentials, and DAG folder locations. At startup time, a simple command invoked through a custom CLI generates the appropriate Aurora configurations to start the scheduler, web server, and worker instances.
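As an illustration of the shape of such a spec, here is a sketch; the field names and CLI behaviour below are assumptions, not Twitter's actual schema:

```python
import json

# Hypothetical per-instance spec; the real field names are not published.
spec = json.loads("""
{
  "service_account": "ml-team-svc",
  "allowed_groups": ["ml-team"],
  "database_uri": "mysql://airflow-ml-team",
  "kerberos_principal": "airflow/ml-team",
  "dag_folders": ["workflows/ml_team/dags"]
}
""")

# At startup, a custom CLI would expand this short spec into full Aurora
# configurations for each Airflow component.
for component in ("scheduler", "webserver", "worker"):
    print("generate Aurora config for", component, "as", spec["service_account"])
```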

Stats integration

Airflow, like most Python-based open-source systems, uses the StatsClient from statsd. Twitter uses StatsReceiver, a part of our open-source Finagle stack in github/twitter/util (https://twitter.github.io/util/). However, the two models are different: in util/stats, each metric is registered, which places it on a list for future collection; in statsd, metrics are simply emitted and the right thing happens on the back end. To solve this problem we built a bridge that recognizes new metrics and registers them as they appear. It is API-compatible with StatsClient, which allows us to inject an instance of our bridge object into Airflow as it is starting. With metrics now collected by our visualization system, we are able to provide a templated dashboard to simplify creation of a monitoring dashboard for our self-service clients.
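A simplified sketch of that bridging idea follows. The receiver interface here is a stand-in for a util/stats-style StatsReceiver, and where the bridge gets injected can differ across Airflow versions; none of this is the actual internal implementation.

```python
class StatsReceiverBridge:
    """statsd.StatsClient-compatible facade that registers metrics lazily."""

    def __init__(self, receiver):
        self._receiver = receiver  # stand-in for a util/stats-style StatsReceiver
        self._counters = {}
        self._stats = {}

    def incr(self, stat, count=1, rate=1):
        # Register the counter the first time it is seen, then increment it.
        counter = self._counters.setdefault(stat, self._receiver.counter(stat))
        counter.incr(count)

    def decr(self, stat, count=1, rate=1):
        self.incr(stat, -count, rate)

    def gauge(self, stat, value, rate=1, delta=False):
        self._receiver.gauge(stat, value)

    def timing(self, stat, delta, rate=1):
        # Record timings as a histogram-style stat on the receiver side.
        histogram = self._stats.setdefault(stat, self._receiver.stat(stat))
        histogram.add(delta)


# Injected into Airflow at startup, for example by replacing the module-level
# stats client (the exact attribute varies by Airflow version):
# airflow.settings.Stats = StatsReceiverBridge(receiver)
```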

Distributed workers with Celery

Airflow makes use of Celery (http://www.celeryproject.org/) to orchestrate a scaled multi-worker node configuration. We configured Celery to work with Twitter cloud containers and, by default, to use a SQLAlchemy broker that reuses Airflow’s MySQL database as a message queue for Celery. This allows our users to set up scalable Airflow workers without having to maintain an extra Redis or RabbitMQ service. However, this does not prevent them from switching over to another message broker if they decide to support one in the future.
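A sketch of what this default wiring can look like, expressed through Airflow's environment-variable overrides; the connection strings are placeholders and the exact config keys can vary across Airflow versions:

```python
import os

# Run tasks on distributed workers via the Celery executor.
os.environ["AIRFLOW__CORE__EXECUTOR"] = "CeleryExecutor"

# Reuse the Airflow MySQL metadata database as the Celery message queue through
# the SQLAlchemy (kombu) transport, so no Redis or RabbitMQ is required.
os.environ["AIRFLOW__CELERY__BROKER_URL"] = "sqla+mysql://airflow@mysql-host/airflow"
os.environ["AIRFLOW__CELERY__RESULT_BACKEND"] = "db+mysql://airflow@mysql-host/airflow"
```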

ML on Airflow

In order to make machine learning pipelines easy for our customers to build on Airflow, there was a series of extensions we needed to add or improve upon.

Custom ML operators

Firstly, to help Twitter engineers run ML tasks, we developed reusable operators for them, including the following (a usage sketch follows the list):

Aurora (https://blog.twitter.com/engineering/en_us/a/2015/all-about-apache-aurora.html) operators: At Twitter we use Aurora to schedule and run services and jobs. We implemented a set of operators that allow users to run code in Aurora. These operators are the foundation of many of our other operators.
DeepBird (https://blog.twitter.com/engineering/en_us/topics/insights/2018/twittertensorflow.html) operators: Being our core ML tool, we provide operators that run the training process, launch a prediction service with trained models, and run load tests on the prediction service. Together they allow our engineers to train and validate their models with DeepBird from end to end.
Hyperparameter tuning operators: We created a set of operators to support hyperparameter tuning. Paired with our DAG constructor classes, they support tuning the hyperparameters of an arbitrary DAG.
Utility operators: We also have operators for common tasks such as launching CI jobs, managing files on HDFS, creating, updating, and monitoring JIRA tickets, sending information to our model metadata store, etc. When we identify a widely required routine among our users, we consider adding it as an operator. Contributions of operators are not restricted to our team; several have been made by our users as well.
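To give a feel for how such operators fit together in a pipeline, here is a minimal sketch. The operator class names (AuroraJobOperator, DeepBirdTrainOperator) are hypothetical stand-ins for the internal operators described above, defined here only so the example is self-contained; they are not actual ML Workflows APIs.

```python
from datetime import datetime

from airflow import DAG
from airflow.models import BaseOperator


class AuroraJobOperator(BaseOperator):
    """Stand-in for an operator that runs a job on Aurora."""
    def execute(self, context):
        self.log.info("would launch an Aurora job here")


class DeepBirdTrainOperator(BaseOperator):
    """Stand-in for an operator that runs DeepBird training."""
    def execute(self, context):
        self.log.info("would launch DeepBird training here")


with DAG("deepbird_training_example",
         start_date=datetime(2018, 1, 1),
         schedule_interval="@weekly") as dag:
    extract = AuroraJobOperator(task_id="extract_features")   # ETL step on Aurora
    train = DeepBirdTrainOperator(task_id="train_model")      # model training step
    extract >> train
```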

Type checking in operators

We did not want our users to incur the cost of encountering errors while DAGs were running due to incompatibility in arguments being passed from one operator to another. Thus we developed a type checking system for operators, which verifies the input and output data types of all operators in DAGs (args and XComs).

For example, suppose that we have two connected operators Foo and Bar in a DAG. Foo outputs an integer XCom value, and Bar takes that XCom value but is expecting a string. Our type checking system would raise an error to let the DAG developer know the types do not match.

All our operators are built upon this type checking system and have their input and output types declared through Python decorators.
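The decorator mechanism could look something like the following simplified sketch, which mirrors the Foo/Bar example above; the decorator name and the wiring check are illustrative, not the internal API:

```python
def io_types(inputs=None, outputs=None):
    """Attach declared argument/XCom types to an operator class."""
    def decorate(cls):
        cls.input_types = inputs or {}
        cls.output_types = outputs or {}
        return cls
    return decorate


def check_dag_types(edges):
    """edges: list of (upstream_cls, xcom_key, downstream_cls, arg_name)."""
    for up, key, down, arg in edges:
        produced = up.output_types.get(key)
        expected = down.input_types.get(arg)
        if produced is not None and expected is not None and produced is not expected:
            raise TypeError(
                f"{up.__name__}.{key} produces {produced.__name__}, "
                f"but {down.__name__}.{arg} expects {expected.__name__}"
            )


@io_types(outputs={"value": int})
class Foo:  # stand-in for an operator that pushes an integer XCom
    pass


@io_types(inputs={"value": str})
class Bar:  # stand-in for an operator expecting a string
    pass


# Raises TypeError: Foo.value produces int, but Bar.value expects str
check_dag_types([(Foo, "value", Bar, "value")])
```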

DAG constructors

Our users often had DAGs that performed the same operations but with different parameter values. To avoid having to create multiple DAGs, they created a constructor class around a DAG, and then exposed a dependent set of parameters within its constructor.

We decided to support this pattern by making DAG constructors a first-class citizen in ML Workflows. We developed a DAG constructor base class that allows users to declare a list of parameters for their DAG. For example:
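(The original post shows this example as a screenshot. The following is a hedged reconstruction of the idea only; the base-class and method names are assumptions, not the internal API.)

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


class DagConstructor:
    """Base class: subclasses declare parameters and build a DAG from them."""
    parameters = {}  # parameter name -> default value

    def __init__(self, **overrides):
        self.params = {**self.parameters, **overrides}

    def construct(self, dag_id):
        raise NotImplementedError


class TrainingDagConstructor(DagConstructor):
    parameters = {"learning_rate": 0.01, "batch_size": 128}

    def construct(self, dag_id):
        dag = DAG(dag_id, start_date=datetime(2018, 1, 1), schedule_interval=None)
        PythonOperator(
            task_id="train",
            python_callable=lambda: print("training with", self.params),
            dag=dag,
        )
        return dag


# The same constructor can stamp out DAG instances with different parameters.
dag_a = TrainingDagConstructor(learning_rate=0.1).construct("train_lr_0_1")
dag_b = TrainingDagConstructor(batch_size=256).construct("train_bs_256")
```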

We also developed a UI for our users to create ad-hoc instances of their DAGs with different parameters.

When a user creates a new DAG, a Python file is generated and placed in Airflow’s DAG_FOLDER, which makes use of Airflow’s ability to automatically load new DAGs.

To make these DAG instances persistent on our stateless cloud containers, we record information about them in the user’s Airflow database. When Airflow schedulers and workers restart, the DAGs are automatically re-created and loaded by Airflow.

Airflow administrator UI plugin

While creating many additions to Airflow to better support our ML use cases on the backend, we also wanted to provide a nice UI layer to interact with certain features on the frontend. In order to achieve this we developed a UI plugin that adds a Flask view to the Airflow webserver. The view loads our JavaScript bundle, which is a small single-page web app built with React. The app contains user interfaces for our features, such as hyperparameter tuning and DAG constructors, and calls Flask endpoints declared in our plugin to fetch data and invoke the corresponding functionality.
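For reference, the stock Airflow 1.x plugin mechanism such a view can hang off looks roughly like this; the blueprint contents and endpoint below are placeholders rather than the actual plugin:

```python
from flask import Blueprint, jsonify
from airflow.plugins_manager import AirflowPlugin

ml_workflows_bp = Blueprint(
    "ml_workflows", __name__,
    static_folder="static",       # would serve the bundled React app
    url_prefix="/ml-workflows",
)


@ml_workflows_bp.route("/api/tuning-jobs")
def tuning_jobs():
    # Placeholder endpoint the single-page app could call for data.
    return jsonify([])


class MLWorkflowsPlugin(AirflowPlugin):
    name = "ml_workflows_plugin"
    flask_blueprints = [ml_workflows_bp]
```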

Use case deep dive


One of the most powerful use cases of ML Workflows is hyperparameter tuning. Leveraging Airflow’s DAGs, we enabled this feature in ML Workflows by implementing DagConstructor, which parameterizes a DAG, along with HyperparameterTuner, which builds a DAG that conducts the experiments and reports results to the user.

After a user fully automates a workflow using ML Workflows, hyperparameter tuning can be applied to the DAG by the following steps (sketched in code after the list):

1. Wrapping the DAG in a DagConstructor, exposing the hyperparameters to be tuned.
2. Adding an additional task to the DAG that pushes, via XCom, workflow_result: a dictionary of key-value pairs of the metrics resulting from the workflow.
3. Defining the hyperparameter tuning method, the TuningParameters, and other tuning configurations with the HyperparameterTuningSpec object.
4. Creating a HyperparameterTuner by passing it the DAG constructor and the tuning spec.
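A condensed sketch of those four steps is below. HyperparameterTuningSpec and HyperparameterTuner are the internal ML Workflows classes named in this post; their exact signatures are not public, so the arguments and method names shown are assumptions for illustration only.

```python
# 1. A DagConstructor subclass exposing the hyperparameters to tune
#    (DeepBirdWorkflowDag plays this role for DeepBird models).
constructor = MyModelDagConstructor()  # hypothetical; exposes e.g. learning_rate

# 2. Inside the DAG, a final task pushes the resulting metrics via XCom under
#    the key "workflow_result", e.g. {"auc": 0.91, "loss": 0.23}.

# 3. Describe the search with a HyperparameterTuningSpec (assumed fields).
spec = HyperparameterTuningSpec(
    method="random_search",            # random and grid search are supported today
    parameters={"learning_rate": [0.001, 0.01, 0.1]},
    optimizing_metric="auc",
    num_experiments=10,
)

# 4. Build the tuning workflow: one subdag per experiment, plus tasks that
#    record and report the results.
tuner = HyperparameterTuner(constructor, spec)
tuning_dag = tuner.build_dag("my_model_hparam_tuning")
```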

The HyperparameterTuner will generate a DAG with several subdags, each conducting an experiment using hyperparameters as specified by the HyperparameterTuningSpec. The DAG also includes tasks for recording results and reporting them to the user.

To make this more tangible, suppose a user has created a simple DeepBird model, defined in a Python file, which can be trained given some parameters. To apply hyperparameter tuning, the model needs to be wrapped in a SubDAG that exposes the parameters to the tuner.

This pattern is used so often it is included in ML Workflows as a general DAG constructor for the purpose of hyperparameter tuning, aptly named DeepBirdWorkflowDag.

HyperparameterTuner is then given this DAG, along with a HyperparameterTuningSpec, to build the hyperparameter tuning workflow.

The HyperparameterTuningSpec specifies the hyperparameters, tuning method, optimizing metric, and the number of experiments, and the resulting DAG is generated.

Zooming into batch_workflow_training_subdag, the individual experiments can be observed. Here, each batch_workflow_subdag_op represents an experiment and is a DAG constructed from DeepBirdWorkflowDag.

After running this workflow, the results are recorded and sent to the user.

Impact
The ML Workflows product has been adopted by several internal teams so far and has delivered immediate impact. One example is the Timelines Quality team (https://blog.twitter.com/engineering/en_us/topics/insights/2017/using-deep-learning-at-scale-in-twitters-timelines.html), which adopted ML Workflows and as a result reduced the interval for retraining and deploying their models to production from four weeks to one week. The team also ran an online experiment to examine the effect of retraining the models more often. The result of the experiment was positive, indicating that shorter retraining intervals provide better timeline quality and ranking for our users.

Several teams have also begun to experiment with hyperparameter tuning on top of
ML Workflows and are seeing early results. As an example, the Abuse and Safety
team applied our hyperparameter tuning tools to one of their tweet-based models,
which allowed them to automatically run a number of experiments and return an
improved model based on offline metrics.

As teams have adopted ML Workflows, the common Python operators and utility
functions have grown and teams are benefiting from reusing common components to
construct their workflows. Teams have adopted the DAG constructor pattern, making it easy to run workflows with different parameters. As more teams at Twitter continue
to adopt ML Workflows, we expect to see a large-scale, positive impact on
engineering productivity, iteration speed, and model quality.

Future work
While the initial results of ML Workflows are exciting, there is a lot of work for the
team ahead:

Broad adoption - Adoption by teams is expected to continue over the course of the year, with a goal of broad adoption by the end of the year. The team actively supports onboarding customers, provides daily support on Slack, and offers office hours to help teams.
ML Dashboard integration - Cortex is working to develop a unified UI layer for all of its products, including integrations with ML Workflows. In the meantime, the team has already added UI functionality directly to Airflow, as shown by our custom plugin.
Hyperparameter tuning - So far we have deployed random search and grid search techniques. Our goal is to create a better integration to leverage more advanced parameter-space search techniques such as Bayesian optimization.
Production hardening - Our goal is to deliver a reliable production system. That means investing in metrics, robustness, and production deployment strategies.
Managed Airflow - Currently teams need to deploy their own ML Workflows instances under their respective service accounts. In the future we would like to offer a more centralized solution in order to further eliminate maintenance costs.

Acknowledgments
We would like to thank Paweł Zaborski and Stephan Ohlsson for spearheading the
integration of Airflow at Twitter. Major thanks go to the Cortex Platform Tools team for
initiating the analysis of alternatives, developing the design document, and
integrating Airflow into the Twitter stack: Devin Goodsell, Jeshua Bratman, Bill
Darrow, Newton Le, Xiao Zhu, Yu Zhou Lee, Samuel Ngahane, Matthew Bleifer,
Andrew Wilcox, Daniel Knightly, and Masoud Valafar. Honorable mention to management for supporting us in this endeavor: Nicolas Koumchatzky and Sandeep Pandey. And finally, a special thanks to all the Tweeps that contributed feedback during our beta offering.

Samuel Ngahane (https://www.twitter.com/unclesam)
Devin Goodsell (https://www.twitter.com/devingoodsell)
