XFaaS: A Global-Scale Serverless Platform
Serverless computing and Function as a Service (FaaS), popularized by AWS Lambda, have emerged as a transformative paradigm in cloud computing. FaaS allows developers to execute code in response to events, offloading server infrastructure management to the cloud provider. This approach enables greater agility, faster development cycles, and cost-efficiency by scaling resources only when needed. In essence, FaaS empowers developers to focus solely on writing code, while the cloud provider handles the operational complexities.
My team in Meta Infrastructure has built a suite of serverless computing platforms from the ground up in our private cloud, including FaaS, a prioritized message queue, and workflows. We presented XFaaS, our serverless function platform, in a paper at the 2023 SOSP conference. The system is characterized by its planetary scale (running across regions and handling trillions of function calls daily) and high performance (low startup cost and sustained high hardware utilization).
Starting from Asynchronous Execution
Our journey to a general serverless platform began with a need for asynchronous execution. Ten years ago, Facebook's primary application tier comprised servers running Hack code. Most of this logic was driven by live user requests, such as handling social network updates and processing user interactions (posts, likes, comments, etc.). Many operations, like posting (which involves committing writes to TAO, our social graph storage and cache), had to complete before servers responded to clients. However, some operations, like notifying friends about posts, could be handled asynchronously without blocking the user's flow. To reduce latency and increase server throughput, we developed a system that queued these operation requests in a database and replayed them on a separate, specially configured tier, removing them from the critical path of user sessions.
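To make the pattern concrete, here is a minimal sketch of the offload idea under stated assumptions: the request handler commits its essential writes synchronously, then records a deferred operation in a durable queue for a separate worker tier to replay later. The names (AsyncQueue, handle_post, notify_friends) and the SQLite backing store are illustrative, not our actual internal APIs.

```python
# Minimal sketch of the async-offload pattern, assuming a SQLite-backed queue.
# AsyncQueue, handle_post, and notify_friends are illustrative names only.
import json
import sqlite3
import time


class AsyncQueue:
    """Durable queue of deferred operations; a separate worker tier replays them."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS ops "
            "(id INTEGER PRIMARY KEY, func TEXT, args TEXT, enqueued_at REAL)"
        )

    def enqueue(self, func_name, **args):
        # Recording the intent is cheap; the expensive work runs later,
        # off the critical path of the user session.
        self.db.execute(
            "INSERT INTO ops (func, args, enqueued_at) VALUES (?, ?, ?)",
            (func_name, json.dumps(args), time.time()),
        )
        self.db.commit()


def commit_post_to_storage(user_id, post_id):
    pass  # stand-in for the synchronous write to the social graph store


def handle_post(queue, user_id, post_id):
    commit_post_to_storage(user_id, post_id)  # must finish before responding
    # Notifying friends is delay-tolerant, so it is queued instead of run inline.
    queue.enqueue("notify_friends", user_id=user_id, post_id=post_id)
    return "ok"


handle_post(AsyncQueue(), user_id=1, post_id=42)
```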
This system proved effective, and we identified more use cases where services could offload execution asynchronously, greatly decoupling services and enabling independent scaling of different tiers. Gradually, this evolved into multiple services, including generic function services, priority queues, workflows, and triggers.
AWS Lambda followed a similar evolution, transitioning from asynchronous event processing to a generic compute platform. Initially focused on event-driven tasks like processing data from S3 or handling events from other AWS services, Lambda's capabilities expanded significantly. It now supports a wide range of use cases, including web application backends, data processing pipelines, and machine learning inference.
Journey to the Serverless Platform
Our path to building a generic serverless platform involved several major steps:
Expanding language support: Starting with Hack (executed in HHVM), we added support for Python (executed in Instagram’s Django app servers), Haskell, Erlang (the primary language behind WhatsApp), and C++.
Separating components: We separated a priority-based durable queue, a workload management and scheduling system, and a worker node management system to improve modularity and enable independent development and scaling.
Building triggers: We built triggers to integrate with other Meta infrastructure, simplifying the development of event-driven applications through event subscription and handling (see the sketch after this list).
Supporting diverse application models: We built systems to support application models like workflows.
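As a rough illustration of the trigger idea, here is a hypothetical event-subscription sketch; the on_event decorator, the dispatch function, and the event names are assumptions made for illustration, not the actual XFaaS developer API.

```python
# Hypothetical event-subscription sketch; on_event, dispatch, and the event
# names are illustrative assumptions, not the real XFaaS trigger API.
_registry = {}


def on_event(event_type):
    """Register a function to run whenever the platform emits event_type."""
    def decorator(func):
        _registry.setdefault(event_type, []).append(func)
        return func
    return decorator


@on_event("photo_uploaded")
def generate_thumbnails(event):
    print(f"resizing {event['object_id']}")


def dispatch(event_type, event):
    """Conceptually what the platform does when a subscribed event fires."""
    for handler in _registry.get(event_type, []):
        handler(event)


dispatch("photo_uploaded", {"object_id": "abc123"})
```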
The growth of FaaS within Meta has been exponential. As the paper illustrates, the number of daily function invocations in XFaaS has increased 50x over the past five years, reaching trillions of calls per day, while the platform sustains consistently high CPU utilization (a 66% daily average). This tremendous growth underscores the importance of the efficient resource management that XFaaS provides.
Unique Solutions and Innovations
We developed unique solutions to three key challenges in the FaaS landscape:
Cold Start Time: The time it takes to initialize a function can lead to poor user experience and inefficient resource utilization. Most multi-tenant FaaS platforms manage function lifetimes by starting and tearing down containers, loading dependencies, and warming up the runtime and caches, which results in long cold starts and lower utilization. XFaaS avoids this problem with a "universal worker" that can run any function immediately: the warm-up cost is paid once when a worker starts, not on every function invocation. We use profile-guided optimization for JIT compilation and code-dependency-aware locality grouping to minimize this warm-up time.
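One rough way to picture the universal-worker idea, assuming a Python-like runtime for illustration: the worker pays its warm-up once, then lazily loads and caches any function's code, so repeat invocations find everything already resident. UniversalWorker and its methods are illustrative assumptions, not XFaaS internals.

```python
# Conceptual sketch of a "universal worker": every worker keeps the runtime warm
# and can dispatch any function; names are illustrative, not XFaaS code.
import functools
import importlib


class UniversalWorker:
    def __init__(self):
        # One-time warm-up (runtime/JIT/caches) is paid per worker, not per invocation.
        self._warmed_up = True

    @functools.lru_cache(maxsize=None)
    def _load(self, module_name, func_name):
        # Function code is loaded (and cached) on first use; later calls to the
        # same function on this worker skip the load entirely.
        return getattr(importlib.import_module(module_name), func_name)

    def invoke(self, module_name, func_name, *args, **kwargs):
        return self._load(module_name, func_name)(*args, **kwargs)


worker = UniversalWorker()
print(worker.invoke("math", "sqrt", 16.0))  # any function, no per-call cold start
```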
High Variance of Load: The unpredictable nature of function calls can lead to either over-provisioning (wasting resources) or overload (degrading performance). The organic peak-to-trough ratio of function calls in XFaaS can be as high as 4.3, which would otherwise necessitate excessive provisioning for peaks and result in low utilization during troughs. To smooth out the workload, XFaaS employs time-shift computing (deferring delay-tolerant functions), prioritization, global load balancing across regions, function quotas, and time-based scheduling.
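To illustrate the time-shifting idea, here is a hedged sketch: delay-tolerant calls carry a tolerance window and are dispatched only when spare capacity exists or their deadline arrives. The class name, the 0.66 utilization target, and the interface are assumptions for illustration, not the platform's actual scheduler.

```python
# Hedged sketch of time-shifted scheduling; the 0.66 target and all names are
# illustrative assumptions, not XFaaS's actual scheduler or thresholds.
import heapq
import itertools
import time


class TimeShiftScheduler:
    def __init__(self, target_utilization=0.66):
        self.target = target_utilization
        self._seq = itertools.count()  # tie-breaker for equal deadlines
        self.deferred = []             # min-heap of (deadline, seq, func, args)

    def submit(self, func, args=(), delay_tolerance_s=0.0):
        if delay_tolerance_s == 0.0:
            return func(*args)         # latency-sensitive: run immediately
        deadline = time.time() + delay_tolerance_s
        heapq.heappush(self.deferred, (deadline, next(self._seq), func, args))

    def drain(self, current_utilization):
        # Run deferred work while there is spare capacity, or once deadlines are due.
        now = time.time()
        while self.deferred and (current_utilization < self.target
                                 or self.deferred[0][0] <= now):
            _, _, func, args = heapq.heappop(self.deferred)
            func(*args)


sched = TimeShiftScheduler()
sched.submit(print, ("urgent",))                            # runs now
sched.submit(print, ("can wait",), delay_tolerance_s=3600)  # deferred to a trough
sched.drain(current_utilization=0.40)                       # spare capacity: runs it
```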
Overloading Downstream Services: Function calls can sometimes overwhelm the services they depend on, causing performance bottlenecks and outages. XFaaS addresses this by using adaptive concurrency control, leveraging a global resource isolation and management system and a TCP-like congestion control mechanism to regulate function execution rates.
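For intuition, here is a minimal additive-increase / multiplicative-decrease sketch of such TCP-like concurrency control: the limiter grows the per-function concurrency limit while downstream calls succeed and halves it on signs of overload. The class, constants, and callback names are assumptions rather than XFaaS's actual implementation.

```python
# Minimal AIMD (additive-increase / multiplicative-decrease) sketch of TCP-like
# concurrency control; constants and names are illustrative, not XFaaS internals.
class AdaptiveConcurrencyLimiter:
    def __init__(self, initial_limit=10, min_limit=1, max_limit=1000):
        self.limit = initial_limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.in_flight = 0

    def try_acquire(self):
        # Called before dispatching a function invocation to a downstream service.
        if self.in_flight < self.limit:
            self.in_flight += 1
            return True
        return False  # caller should defer or re-queue the invocation

    def on_success(self):
        self.in_flight -= 1
        self.limit = min(self.max_limit, self.limit + 1)   # additive increase

    def on_backpressure(self):
        # Downstream signaled overload (errors, high latency): back off sharply.
        self.in_flight -= 1
        self.limit = max(self.min_limit, self.limit // 2)  # multiplicative decrease
```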
In essence, XFaaS demonstrates how a hyperscale FaaS platform can achieve high performance and efficiency. We hope our work provides valuable insights for the future of serverless computing. Please read the paper for details or watch the SOSP presentation.