Asynchronous Programming in Python
Quora's mission is to share and grow the world's knowledge, and to achieve that mission, we're continuing to roll out improvements that make Quora faster for our readers and writers. In our last post, Faster Paint Times, we covered our recent optimizations to client-side performance, and in this post, we'll discuss our recent improvements to server-side performance via our asynchronous programming framework.
To optimize the performance of both page loads and user actions, we use caching aggressively in order to ensure that access to frequently-used data is consistently fast. At Quora, we use memcached as our primary caching layer for data that's otherwise stored in a slower storage system such as MySQL. For example, when we render the list of people who have upvoted an answer, we need to fetch a mapping from users' IDs to users' names from our storage tier. Going straight to MySQL for each request would quickly overload our database, so we instead use memcached to cache this mapping. While issuing a separate memcached get for each user ID would be faster than querying MySQL, fetching all of the data with a single batched, multi-get is even faster.
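To make the round-trip arithmetic concrete, here's a toy sketch that counts network round trips for per-key gets versus one batched multi-get. The client class and key format are invented for illustration; they are not our actual memcached client.

```python
class ToyCacheClient:
    """Hypothetical stand-in for a memcached client that counts round trips."""

    def __init__(self, data):
        self.data = data
        self.round_trips = 0  # each call models one network round trip

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def get_multi(self, keys):
        self.round_trips += 1
        return {k: self.data[k] for k in keys if k in self.data}


cache = ToyCacheClient({"name:1": "Alice", "name:2": "Bob", "name:3": "Carol"})

# One get per user ID: three separate round trips.
names = [cache.get("name:%d" % uid) for uid in (1, 2, 3)]

# One batched multi-get: a single round trip for the same data.
batched = cache.get_multi(["name:%d" % uid for uid in (1, 2, 3)])
```

With networking dominating each request's cost, collapsing three round trips into one is where the win comes from.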
Because networking is often the most expensive part of a memcached request (upwards of 80% of the total time in our environment), proper batching of cache gets is important to keeping Quora fast. However, it would be tedious and error-prone if developers had to manually specify how all data should be batch-retrieved from memcached. So, we've developed an abstraction called Asynq that makes it easy for developers to write batched cache requests, which we're open-sourcing today.
Priming
Before we developed Asynq, we used an approach we called priming to issue batch gets to memcached. Each time a developer wrote a function that accessed data, they would also write a separate priming function that specified all the data that would be accessed by that function. For example, suppose we had the following function that retrieves a list of names given a list of user IDs:
```python
def get_names(uids):
    # let's say name_of_user is cached
    return [name_of_user(uid) for uid in uids]
```
The corresponding priming function would look like this:
```python
def prime_get_names(uids):
    for uid in uids:
        prime_name_of_user(uid)
```
Code that calls get_names would be responsible for calling prime_get_names first, which would retrieve all necessary data from memcached using a multi-get. Then, that data would be stored in the server's local cache, so when the actual function (get_names in this example) was run, it wouldn't make any network requests at all! Essentially, then, each prime function was a way of representing the dependencies of a function, or the memcached keys that it depended on.
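The mechanics of priming can be modeled in a few lines. This is a simplified sketch, not Quora's implementation: `store`, `local_cache`, the key format, and the recording of multi-gets are all invented stand-ins for the real caching layers.

```python
store = {"name:1": "Alice", "name:2": "Bob"}  # pretend memcached
local_cache = {}                              # per-request local cache
multi_gets = []                               # record of batched fetches


def prime_get_names(uids):
    # Fetch every key the function will need with one batched multi-get,
    # and park the results in the request-local cache.
    keys = ["name:%d" % uid for uid in uids]
    multi_gets.append(keys)
    local_cache.update({k: store[k] for k in keys})


def name_of_user(uid):
    # After priming, this is a pure local-cache hit: no network request.
    return local_cache["name:%d" % uid]


def get_names(uids):
    return [name_of_user(uid) for uid in uids]


prime_get_names([1, 2])
primed_names = get_names([1, 2])
```

The key property: however many user IDs are requested, the network is touched exactly once.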
Priming got more complicated when we needed to call a cached function in order to decide what additional data to get from memcached. Consider the following model function:
```python
def get_upvoter_names(aid):
    # let's say upvoters_of_answer is cached as well
    upvoter_uids = upvoters_of_answer(aid)
    return [name_of_user(uid) for uid in upvoter_uids]
```
In this case, prime_get_upvoter_names needs to know upvoter_uids in order to call prime_name_of_user on those uids, but because upvoters_of_answer is cached, it also must be primed. So, the corresponding prime function would be significantly more complex:
```python
@priming.generator
def prime_get_upvoter_names(aid):
    prime_upvoters_of_answer(aid)
    yield
    # the above yield has already fetched upvoters_of_answer and put it
    # in the local cache, so the call below is just going to hit local cache
    upvoter_uids = upvoters_of_answer(aid)
    for uid in upvoter_uids:
        prime_name_of_user(uid)
```
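To make the yield-as-batch-boundary behavior concrete, here's a minimal sketch of how a decorator in the spirit of `priming.generator` could work. All names and the flushing mechanics here are illustrative assumptions, not our actual framework: prime calls merely record keys, and each `yield` flushes the accumulated batch with one multi-get into the local cache.

```python
store = {"upvoters:1": [10, 11], "name:10": "Ada", "name:11": "Bo"}
local_cache = {}
pending = []     # keys recorded but not yet fetched
multi_gets = []  # record of batched fetches


def prime_key(key):
    pending.append(key)  # just record; don't fetch yet


def flush():
    if pending:
        multi_gets.append(list(pending))  # one batched multi-get
        local_cache.update({k: store[k] for k in pending})
        del pending[:]


def priming_generator(fn):
    def runner(*args):
        gen = fn(*args)
        for _ in gen:  # each yield is a batch boundary
            flush()
        flush()        # flush keys recorded after the last yield
    return runner


@priming_generator
def prime_get_upvoter_names(aid):
    prime_key("upvoters:%d" % aid)
    yield  # fetches the upvoter list into the local cache
    for uid in local_cache["upvoters:%d" % aid]:
        prime_key("name:%d" % uid)


prime_get_upvoter_names(1)
```

Two batches go over the network: one for the upvoter list, then one for all the names at once.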
We saw remarkable speed improvements by properly priming all our model calls, and as a result, we used our static analysis tools to enforce a rule that all data needed to render any page on Quora must be primed first. However, these speed gains came with a significant development overhead, as developers essentially needed to write (and call) all model functions twice. As our codebase grew, priming became tedious, hard to reason about, and error-prone.
Asynq
To address the complexity introduced by priming, we created a framework called Asynq, which takes a similar approach under the hood, but revamps the API so that cache gets are integrated into the model code itself. With Asynq, all functions that require data access are run by a scheduler that keeps track of their dependencies. When a function needs to retrieve data by calling some other function, it yields control to the scheduler and indicates the data that it needs to retrieve. The scheduler then stops execution of the function until it has resolved all of its dependencies.
With Asynq, the earlier get_names function looks like this:
```python
from asynq import async

@async()
def get_names(uids):
    names = yield [name_of_user.async(uid) for uid in uids]
    return names
```
No additional priming function is necessary—all code is contained within the model function itself. So, not only do developers no longer need to write an entirely separate priming function, but they also don't need to remember to call that priming function each time the model function is called. The more complex get_upvoter_names example from before is also much simpler in Asynq:
```python
from asynq import async

@async()
def get_upvoter_names(aid):
    upvoter_uids = yield upvoters_of_answer.async(aid)
    names = yield [name_of_user.async(uid) for uid in upvoter_uids]
    return names
```
Async functions can be understood as creating a dependency graph: at its first yield, get_upvoter_names depends on upvoters_of_answer finishing. Similarly, upvoters_of_answer might have its own dependencies. The Asynq scheduler resolves this dependency graph by executing async functions until all currently-executing functions are blocked on getting data from memcached. Then, the scheduler fetches data from memcached using a single multi-get and continues executing until all async functions have finished.
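The scheduler loop just described can be sketched as follows. This is a deliberately tiny model: `Future`, `run`, and the in-memory `store` are invented for illustration and omit Asynq's real machinery, but it shows how each round of blocked work is serviced by a single batched fetch.

```python
store = {"upvoters:1": [10, 11], "name:10": "Ada", "name:11": "Bo"}
multi_gets = []  # record of batched fetches


class Future:
    """A pending cache read, identified by its key."""

    def __init__(self, key):
        self.key = key
        self.value = None


def run(gen):
    # Drive one async-style generator: collect the future(s) it blocks on,
    # service each round with a single batched multi-get, then resume it.
    try:
        yielded = next(gen)
        while True:
            futures = yielded if isinstance(yielded, list) else [yielded]
            keys = [f.key for f in futures]
            multi_gets.append(keys)    # one multi-get per round
            for f in futures:
                f.value = store[f.key]  # pretend memcached fetch
            if isinstance(yielded, list):
                yielded = gen.send([f.value for f in futures])
            else:
                yielded = gen.send(futures[0].value)
    except StopIteration as stop:
        return stop.value  # the generator's return value (Python 3)


def get_upvoter_names(aid):
    upvoter_uids = yield Future("upvoters:%d" % aid)
    names = yield [Future("name:%d" % uid) for uid in upvoter_uids]
    return names


result = run(get_upvoter_names(1))
```

However many upvoters the answer has, their names arrive in one batch: the second round issues a single multi-get for every name key at once.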
Suppose we have an async function called model_call() that has three dependencies, each of which reads multiple memcached keys. A naive implementation would use three multi-gets (or even six single-gets), one for each dependency function. The Asynq scheduler instead resolves the dependency graph level by level, so gets at the same depth of the graph are combined: all three dependencies' reads can be served by a single multi-get.
Our approach to asynchronous programming is different from that of other asynchronous Python libraries like asyncio, Twisted, gevent, and Tornado. While those libraries focus on asynchronous I/O, Asynq focuses instead on efficient batching. For example, a typical cache implementation using memcached and asyncio will resolve cache dependencies independently, such that each memcached get spawns a separate request to memcached. With Asynq, dependencies are batched into a single memcached multi-get, which can decrease the total amount of time blocked on I/O. Additionally, Asynq allows functions to be called either synchronously or asynchronously (via an added .async attribute), while asyncio requires all async def functions to be explicitly scheduled with asyncio.get_event_loop(). With Asynq's batching, we spend little time blocked on I/O, so adopting other asynchronous I/O libraries hasn't been a priority for us.
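The dual calling convention can be sketched with a toy decorator. This is an assumption-laden illustration, not Asynq's implementation; note also that `async` is a reserved word in modern Python, so the sketch uses `.asynq` for the attribute.

```python
def dual(fn):
    """Hypothetical sketch: expose a generator function both ways. A plain
    call runs it to completion and returns its result, while the .asynq
    attribute hands the raw generator to a scheduler for batching."""
    def sync_call(*args, **kwargs):
        gen = fn(*args, **kwargs)
        try:
            value = None
            while True:
                # In a real framework each yielded dependency would be
                # resolved (e.g. fetched from cache) before being sent back.
                value = gen.send(value)
        except StopIteration as stop:
            return stop.value
    sync_call.asynq = fn
    return sync_call


@dual
def add_then_double(x):
    y = yield x + 1  # stands in for yielding a cache future
    return y * 2


sync_result = add_then_double(3)    # plain call: runs to completion
gen = add_then_double.asynq(3)      # async-style call: a generator
first_dependency = next(gen)        # the scheduler would now resolve this
```

Callers that don't care about batching use the ordinary call; callers inside other async functions hand the generator to the scheduler.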
Compared to priming, Asynq provides a more generalized, concise, and principled way to enable batching. Because logic only needs to be implemented once, Asynq carries significantly less development overhead. Reducing priming's duplicative logic also led to performance improvements: we saw server-side speed wins as we migrated more of our codebase from priming to Asynq.
Migration and Learning
After developing the first version of Asynq, we set out to validate our design and implementation decisions by migrating several small parts of the codebase from priming to Asynq. In doing so, we discovered and fixed a variety of issues:
- In Python 2.7, a generator cannot return a value, so an async function's result originally had to be surfaced through extra boilerplate around the final yield keyword, which made code hard to read. As an alternative solution, we back-ported support for returning a value from a generator—introduced in PEP 380—from Python 3 to Python 2.7, and we're currently using a patched version of Python 2.7 in our codebase where necessary. (Asynq also supports unpatched Python 2.7 by using a result function that raises an exception that's interpreted as the return value.)
- Originally, wrapping a function in the @async() decorator completely changed its interface—all callers had to use a special syntax to call async functions. This decision made it harder for priming and Asynq to co-exist in our codebase, since all developers needed to be aware of this difference. To fix this problem, we updated the @async() decorator to add a new .async attribute to all asynchronous functions, so calling a decorated function directly would still return the result.
- We use the mock module extensively in our unit tests, but mocking async functions became difficult, as doing so required special handling of the .async attribute and return value. As a solution, we created a dedicated mocking function, asynq.mock.patch, that takes care of this automatically, making it much easier to mock async functions.

After implementing the above improvements (along with many others), we decided to fully migrate our codebase from priming to Asynq. Having both abstractions exist in our codebase at once was detrimental to our development velocity, since engineers needed to context-switch between the two APIs depending on which model they were editing. We utilized our existing static analysis tools to automate the migration process, so our engineers needed only to verify the output of scripts and make small changes, rather than manually migrate code themselves.
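The mocking pitfall described above can be sketched briefly. Here `make_dual` stands in for an @async()-style decorator, and the "proper" patching shown is illustrative of the idea behind a dedicated helper, not asynq.mock.patch itself; the `.asynq` attribute name is also an assumption (modern Python reserves the word `async`).

```python
def make_dual(fn):
    # Stand-in for an @async()-style decorator: the plain call returns the
    # result, and a separate attribute carries the async calling convention.
    def sync_call(*args):
        return fn(*args)
    sync_call.asynq = fn
    return sync_call


def _real_name(uid):
    return "real-%d" % uid


name_of_user = make_dual(_real_name)

def fake(uid):
    return "fake-%d" % uid

# A naive mock replaces only the plain callable; async-style callers that
# reach for the extra attribute would still hit the real implementation
# (or break, since the bare function doesn't have the attribute at all).
naive_mock = fake

# A dedicated patch helper must swap both calling conventions at once.
proper_mock = make_dual(fake)
```

A patch that covers only one convention silently leaves half the call sites untouched, which is exactly why a purpose-built mocking function pays off.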
Before embarking on a large-scale migration of our entire codebase, we completed a number of smaller “dry runs” with engineers on different teams, as a means of refining our automated migration tools and coming up with accurate scoping estimates. After completing several dry runs, we held a coordinated migration with around 30 engineers (i.e., 50% of our engineering team), during which we migrated the 15,000+ lines of priming in Quora's codebase over a period of just four days.
Today, our entire codebase uses only Asynq, and server-side development is much faster and less error-prone.
Open Sourcing Asynq (and Friends)
Asynq is available today on GitHub and PyPI. You can browse the source code or install Asynq via pip install asynq. Along with Asynq, we're also open-sourcing QCore (Asynq's only dependency), which is a collection of helpers that are used throughout Quora's codebase, including a decorator framework, an enum implementation, testing helpers, and an event implementation. Both Asynq and QCore are compatible with both Python 2.7 and Python 3, and more extensive documentation for Asynq can be found at the GitHub repository.
Moving forward, we're continuing to invest in efforts that make Quora faster for our users and make new features even easier for our engineers to develop. We're currently hiring Platform engineers to develop core frameworks and abstractions like Asynq, so check out our careers page if you'd like to share and grow the world's knowledge with us!