Productionizing ML with workflows at Twitter
By Samuel Ngahane and Devin Goodsell
Friday, 22 June 2018
https://blog.twitter.com/engineering/en_us/topics/insights/2018/ml-workflows.html 1/11
2020/11/2 Productionizing ML with workflows at Twitter
Without such a system, engineers would have to manually trigger, manage, and record the results of multiple runs of a machine learning pipeline. This tedious and repetitive work reduces engineering productivity and slows iteration time.
Airflow at Twitter
When we started building ML Workflows, our philosophy was to create a simple
solution that would solve most ML needs while reusing existing components and
open source technologies. Rather than reinvent the wheel, Cortex evaluated
solutions that pair a simple Python API for describing workflow DAGs with a
backend job-execution system.
Why Airflow?
Although Aurora Workflows was already integrated at Twitter, we chose to base our
product on Airflow because it:
Integration at Twitter
At Twitter, engineers generally authenticate with internal web services via Kerberos.
To support Kerberos in Airflow, we took advantage of the existing framework provided
by Flask-Login and created a child class of LoginManager. At initialization time, this
class verifies that the principal name is known to Kerberos. It then sets up before-
and after-request filters that drive the GSSAPI process. On success, the user's
LDAP groups are queried and carried along with the user object. This allows us to
offer group-based, DAG-level control of actions or even visibility. Taken together,
this approach allows Airflow to drop neatly into our authentication and
authorization infrastructure with no fuss.
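To make the flow concrete, here is a minimal, self-contained sketch of such a LoginManager subclass. Flask-Login and the GSSAPI exchange are stubbed out with stand-in classes, and all names (`KerberosLoginManager`, `ldap_directory`, and so on) are our illustration, not Twitter's actual code:

```python
# Hypothetical sketch; real code subclasses flask_login.LoginManager
# and performs an actual GSSAPI negotiation.

class LoginManager:
    """Stand-in for flask_login.LoginManager."""
    def __init__(self):
        self.before_request_hooks = []
        self.after_request_hooks = []

class User:
    def __init__(self, principal, groups):
        self.principal = principal
        self.groups = groups

class KerberosLoginManager(LoginManager):
    def __init__(self, principal, known_principals, ldap_directory):
        super().__init__()
        # Verify at initialization that the principal is known to Kerberos.
        if principal not in known_principals:
            raise ValueError(f"unknown Kerberos principal: {principal}")
        self.ldap_directory = ldap_directory
        # Before- and after-request filters drive the GSSAPI process.
        self.before_request_hooks.append(self.start_gssapi_exchange)
        self.after_request_hooks.append(self.finish_gssapi_exchange)

    def start_gssapi_exchange(self, request):
        # Real code would inspect the 'Authorization: Negotiate ...' header.
        return request.get("negotiate_token") is not None

    def finish_gssapi_exchange(self, request):
        # On success, attach the user's LDAP groups to the user object,
        # enabling group-based, DAG-level authorization decisions.
        principal = request["authenticated_principal"]
        groups = self.ldap_directory.get(principal, [])
        return User(principal, groups)
```

With the groups carried on the user object, a view can check membership before showing or acting on a DAG.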
Self-service model
We are working towards a self-service model in which teams stand up their own
instances. This gives user groups complete control of their instance and DAGs,
and it is much simpler to manage the permissions and quotas of a single service
account. This approach also paves the way to a potential multi-tenant Airflow
server at Twitter.
In our self-service model, each team's deployment differs from the others only in its
Aurora configuration and DAG population. So rather than copying and pasting a large
amount of config, we use a simple, short JSON file to specify the per-instance
variables, such as the owning service account, allowed groups, backing database,
Kerberos principal credentials, and DAG folder locations. At startup, a simple
command invoked through a custom CLI generates the appropriate Aurora
configurations to start the scheduler, web server, and worker instances.
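A per-instance file along these lines might look as follows (the field names and values are illustrative, not Twitter's actual schema):

```json
{
  "service_account": "ml-workflows-timelines",
  "allowed_groups": ["timelines-quality", "cortex"],
  "backing_database": "mysql://airflow-timelines-db",
  "kerberos_principal": "airflow/timelines.example.com",
  "dag_folders": ["/srv/airflow/timelines/dags"]
}
```

The custom CLI reads a file like this and expands it into the full Aurora configurations for the scheduler, webserver, and workers.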
Stats integration
Airflow, like most Python-based open-source systems, uses the StatsClient from
statsd. Twitter uses StatsReceiver, part of our open-source Finagle stack in
github/twitter/util (https://twitter.github.io/util/). However, the two models differ: in
util/stats, each metric is registered, which places it on a list for future collection; in
statsd, metrics are simply emitted and the right thing happens on the back end. To
solve this problem we built a bridge that recognizes new metrics and registers them
as they appear. It is API-compatible with StatsClient, which allows us to inject an
instance of our bridge object into Airflow as it starts. With metrics now collected
by our visualization system, we are able to provide a templated dashboard that
simplifies the creation of a monitoring dashboard for our self-service clients.
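A minimal sketch of such a bridge is shown below. It mimics the statsd StatsClient method surface (`incr`, `decr`, `gauge`, `timing`) but lazily registers each metric in a collection registry the first time it is emitted, in the spirit of util/stats' register-then-collect model. The class names and registry shape are our invention:

```python
class MetricsRegistry:
    """Stand-in for the register-for-collection side (util/stats)."""
    def __init__(self):
        self.counters = {}
        self.gauges = {}
        self.timings = {}

class StatsBridge:
    """Drop-in for statsd's StatsClient API; registers metrics on first use."""
    def __init__(self, registry):
        self.registry = registry

    def incr(self, stat, count=1, rate=1):
        # Register the counter on first sight, then accumulate.
        self.registry.counters.setdefault(stat, 0)
        self.registry.counters[stat] += count

    def decr(self, stat, count=1, rate=1):
        self.incr(stat, -count, rate)

    def gauge(self, stat, value, rate=1, delta=False):
        if delta:
            value += self.registry.gauges.get(stat, 0)
        self.registry.gauges[stat] = value

    def timing(self, stat, delta, rate=1):
        # Record each timing sample for later aggregation.
        self.registry.timings.setdefault(stat, []).append(delta)
```

Because the method signatures match StatsClient, an instance of the bridge can be handed to Airflow in place of the statsd client at startup.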
ML on Airflow
To make machine learning pipelines easy for our customers to build on Airflow,
there was a series of extensions we needed to add or improve upon.
Custom ML operators
Firstly, to help Twitter engineers run ML tasks, we developed reusable operators for
them, including:

- Aurora (https://blog.twitter.com/engineering/en_us/a/2015/all-about-apache-aurora.html) operators: At Twitter we use Aurora to schedule and run services and jobs. We implemented a set of operators that allow users to run code on Aurora. These operators are the foundation of many of our other operators.
- DeepBird (https://blog.twitter.com/engineering/en_us/topics/insights/2018/twittertensorflow.html) operators: DeepBird is our core ML tool, and we provide operators that run the training process, launch a prediction service with trained models, and run load tests against the prediction service. Together they allow our engineers to train and validate their models with DeepBird from end to end.
- Hyperparameter tuning operators: We created a set of operators to support hyperparameter tuning. Paired with our DAG constructor classes, they support tuning the hyperparameters of an arbitrary DAG.
- Utility operators: We also have operators for common tasks such as launching CI jobs, managing files on HDFS, creating, updating, and monitoring JIRA tickets, sending information to our model metadata store, and so on. When we identify a widely required routine among our users, we consider adding it as an operator. Operator contributions are not restricted to our team; our users have contributed operators as well.
We did not want our users to incur the cost of runtime errors in their DAGs caused by
incompatible arguments being passed from one operator to another. We therefore
developed a type-checking system for operators, which verifies the input and output
data types of all operators in a DAG (both args and XComs).
For example, suppose that we have two connected operators Foo and Bar in a DAG.
Foo outputs an integer XCOM value, and Bar takes that XCOM value but is
expecting a string. Our type checking system would raise an error to let the DAG
developer know the types do not match.
All our operators are built upon this type-checking system and have their input and
output types declared through Python decorators.
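A stripped-down sketch of the idea, using the Foo/Bar mismatch from the example above, might look like this. The decorator names and the DAG-edge check are our illustration; the real system also covers operator args and XComs:

```python
# Hypothetical decorators that declare an operator's I/O types.
def output_type(t):
    def wrap(cls):
        cls.output_type = t
        return cls
    return wrap

def input_type(t):
    def wrap(cls):
        cls.input_type = t
        return cls
    return wrap

@output_type(int)
class Foo:
    """Pushes an integer XCom value downstream."""

@input_type(str)
class Bar:
    """Expects a string XCom value from upstream."""

def check_dag(edges):
    """Verify every (upstream, downstream) pair has matching types."""
    for up, down in edges:
        if up.output_type is not down.input_type:
            raise TypeError(
                f"{up.__name__} outputs {up.output_type.__name__} but "
                f"{down.__name__} expects {down.input_type.__name__}"
            )
```

Running `check_dag([(Foo, Bar)])` raises a `TypeError`, surfacing the mismatch to the DAG developer before the DAG ever runs.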
DAG constructors
Our users often had DAGs that performed the same operations but with different
parameter values. To avoid creating multiple near-identical DAGs, they would write a
constructor class around a DAG and expose the dependent set of parameters
through its constructor.
We decided to support this pattern by making DAG constructors a first-class citizen in
ML Workflows. We developed a DAG constructor base class that allows users to
declare a list of parameters for their DAG. For example:
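The snippet from the original post is not preserved here; the following is a minimal sketch of what such a base class might look like, where `Parameter`, the class names, and the `construct` method are our invention rather than the actual ML Workflows API:

```python
class Parameter:
    """A declared, typed DAG parameter with an optional default."""
    def __init__(self, name, type, default=None):
        self.name = name
        self.type = type
        self.default = default

class DAGConstructor:
    # Subclasses declare the parameters their DAG accepts.
    parameters = []

    def __init__(self, **kwargs):
        self.params = {}
        for p in self.parameters:
            value = kwargs.get(p.name, p.default)
            if not isinstance(value, p.type):
                raise TypeError(f"{p.name} must be {p.type.__name__}")
            self.params[p.name] = value

    def construct(self, dag_id):
        raise NotImplementedError

class TrainingDAGConstructor(DAGConstructor):
    parameters = [
        Parameter("learning_rate", float, default=0.01),
        Parameter("num_epochs", int, default=10),
    ]

    def construct(self, dag_id):
        # Real code would build and return an airflow.DAG here.
        return {"dag_id": dag_id, **self.params}
```

Each distinct parameter setting then yields its own DAG instance from the same constructor class.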
We also developed a UI that lets our users create ad-hoc instances of their DAGs
with different parameters.
When a user creates a new DAG, a Python file is generated and placed in Airflow's
DAG_FOLDER, which takes advantage of Airflow's ability to automatically load new DAGs.
While building many backend additions to Airflow to better support our ML use
cases, we also wanted to provide a nice UI layer for interacting with certain features
on the frontend. To achieve this, we developed a UI plugin that adds a Flask view
to the Airflow webserver. The view loads our JavaScript bundle, a small
single-page web app built with React. The app contains the user interfaces for
features such as hyperparameter tuning and DAG constructors, and it calls Flask
endpoints declared in our plugin to fetch data and invoke the corresponding
functionality.
To make this more tangible, suppose a user has created a simple DeepBird model
defined as a Python file which can be trained given some parameters. To apply
hyperparameter tuning, the model needs to be wrapped in a SubDAG that exposes
the parameters to the tuner.
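The code shown at this point in the original post is not preserved; conceptually, the tuner launches one training run per candidate setting of the exposed parameters and reports the best result. A toy grid-search sketch of that loop, with an invented stand-in for the training run, might look like this:

```python
import itertools

def train_model(learning_rate, batch_size):
    # Stand-in for a DeepBird training run; returns an offline metric.
    # (A real run would train and evaluate the wrapped model SubDAG.)
    return 1.0 - abs(learning_rate - 0.1) - 0.001 * batch_size

def tune(search_space):
    """Launch one run per parameter combination and keep the best."""
    best_params, best_score = None, float("-inf")
    keys = sorted(search_space)
    for values in itertools.product(*(search_space[k] for k in keys)):
        params = dict(zip(keys, values))
        score = train_model(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In ML Workflows the "runs" are SubDAG executions scheduled by Airflow rather than direct function calls, but the shape of the loop is the same.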
After running this workflow, the results are recorded and sent to the user.
Impact
The ML Workflows product has been adopted by several internal teams so far and
has delivered immediate impact. One example is the Timelines Quality team
(https://blog.twitter.com/engineering/en_us/topics/insights/2017/using-deep-learning-at-scale-in-twitters-timelines.html),
which adopted ML Workflows and as a result reduced the interval for
retraining and deploying their models to production from four weeks to one week.
The team also ran an online experiment to measure the effect of retraining the
models more often. The results were positive, indicating that shorter retraining
intervals provide better timeline quality and ranking for our users.
Several teams have also begun to experiment with hyperparameter tuning on top of
ML Workflows and are seeing early results. As an example, the Abuse and Safety
team applied our hyperparameter tuning tools to one of their tweet-based models,
which allowed them to automatically run a number of experiments and return an
improved model based on offline metrics.
As teams have adopted ML Workflows, the common Python operators and utility
functions have grown and teams are benefiting from reusing common components to
construct their workflows. Teams have adopted the DAG constructor pattern, making
it easy to run workflows with different parameters. As more teams at Twitter continue
to adopt ML Workflows, we expect to see a large-scale, positive impact on
engineering productivity, iteration speed, and model quality.
Future work
While the initial results of ML Workflows are exciting, there is a lot of work ahead
for the team:
Acknowledgments
We would like to thank Paweł Zaborski and Stephan Ohlsson for spearheading the
integration of Airflow at Twitter. Major thanks go to the Cortex Platform Tools team for
initiating the analysis of alternatives, developing the design document, and
integrating Airflow into the Twitter stack: Devin Goodsell, Jeshua Bratman, Bill
Darrow, Newton Le, Xiao Zhu, Yu Zhou Lee, Samuel Ngahane, Matthew Bleifer,
Andrew Wilcox, Daniel Knightly, and Masoud Valafar. Honorable mention to
Samuel Ngahane (https://www.twitter.com/unclesam)
Devin Goodsell (https://www.twitter.com/devingoodsell)