You only needed one tomato. Unfortunately, the three people in line ahead of you at the supermarket have loaded their carts like they’re on an episode of "Doomsday Preppers." You scan the other checkout lanes, but even the express line is backed up because some dude insists on paying in spare change.
A similar problem bedevils the cloud services industry. Google Maps, for example, must handle simple tasks—directions to a friend's house, say—without getting bogged down by more complicated ones—directions to a friend's house that avoid highways and add a stop at the supermarket to pick up some tomatoes. It, like other cloud-based services, accomplishes this with tools that send incoming tasks to different servers, effectively forming queues according to size and complexity.
Now, the developers of a new task scheduling tool say their program beats the competition at managing and prioritizing such tasks in an efficient way.
Their system, an algorithm coupled with a bespoke processor, is called Q-Zilla. It moves tasks from one server queue to the next, so simple tasks don't have to wait behind those requiring more time.
In the software world, engineers refer to a system's slowest possible response times as its tail latency. On a single computer, tail latency isn't a big deal. For instance, if you are copying files from one hard drive to another, the kilobyte-sized PDF files will copy so quickly that you won't notice how long it takes for chunky video files to transfer. The differences average out.
That doesn’t work for the cloud. "When we are dealing with millions of users for a cloud service, we cannot average out one user's dissatisfaction with another user's satisfaction," says Amirhossein Mirhosseini, a senior Ph.D. candidate at the University of Michigan who developed Q-Zilla with colleagues.
The most straightforward fix would be to install faster processors. But Moore's Law moves at its own pace. Besides, there's a more efficient solution. In a server, each processor core can serve its own queue of tasks. Software developers have even created algorithms that will send easy-to-process tasks (called “mice,” in queueing theory jargon) to express lanes. Meanwhile, the rarer, bulkier, and more complex tasks (“elephants,” to extend the metaphor) get steered to their own dedicated queues.
One of the best-known scheduling algorithms is called Size-Interval Task-Assignment, or SITA. But it sorts each task as a mouse or an elephant just once, at the moment it arrives. What if a task is an elephant wearing mouse ears (like the guy counting out pennies in the express checkout)?
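The arrival-time classification the article describes can be boiled down to a one-line decision. This is a simplified, hypothetical sketch, not SITA as deployed anywhere: `predicted_size` and the `cutoff` threshold are illustrative stand-ins for whatever size estimate and size interval a real dispatcher would use.

```python
def sita_dispatch(predicted_size, cutoff=5):
    """Classic SITA-style dispatch: classify a task once, at arrival,
    based on its predicted size. cutoff is a hypothetical threshold
    separating mice from elephants."""
    return "express" if predicted_size <= cutoff else "elephant_lane"

# The weakness: a task whose predicted size is small but whose actual
# work is large (the "elephant wearing mouse ears") still lands in the
# express lane, where it blocks every mouse behind it.
```

Because the decision is made once and never revisited, a mispredicted elephant stays in the express lane for its entire run, which is exactly the failure mode SQD-SITA is designed to correct.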
"If you look at the distribution of response times for a cloud processor, the slowest tasks to resolve are not the elephants," says Mirhosseini. "It's the mice that get stuck behind the elephants."
If only SITA could move those elephants out of the way! Q-Zilla features an improved SITA algorithm, with the not-improved name of Server-Queue Decoupled-SITA (SQD-SITA). It automatically classifies every incoming task as a mouse and puts it in the express queue. Then, every few microseconds, a timer goes off. Any task at the head of a queue that hasn't finished processing by this deadline is, by definition, an elephant, and SQD-SITA removes it from the express lane and puts it into a different server queue. If the task still isn't complete when the timer goes off again, SQD-SITA moves it again. That way, mice—or even a Dumbo—don't get blocked by a woolly mammoth.
Switching tasks from server lane to server lane has its own costs, though, and at some point, it becomes more efficient to process the mammoth. So, the team programmed an upper limit to the number of times a task can be rescheduled.
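The demote-on-timeout logic, with its cap on rescheduling, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: tasks are reduced to service-time numbers, the microsecond timer becomes a fixed `QUANTUM`, and `MAX_MOVES` stands in for the team's upper limit on rescheduling.

```python
from collections import deque

QUANTUM = 5     # hypothetical timer interval (arbitrary time units)
MAX_MOVES = 2   # cap: after this many demotions, a task just runs to completion

def schedule(tasks):
    """Simulate SQD-SITA-style demotion.

    `tasks` maps a task name to its total service time. Every task
    starts in queue 0 (the express lane). Any task still unfinished
    when the timer fires is demoted one queue down, at most MAX_MOVES
    times. Returns task names in completion order.
    """
    queues = [deque() for _ in range(MAX_MOVES + 1)]
    for name, service_time in tasks.items():
        queues[0].append((name, service_time))  # everyone starts as a "mouse"

    done = []
    for level, q in enumerate(queues):
        while q:
            name, remaining = q.popleft()
            if remaining > QUANTUM and level < MAX_MOVES:
                # Timer fired before the task finished: it's an elephant.
                queues[level + 1].append((name, remaining - QUANTUM))
            else:
                done.append(name)  # small enough, or out of moves: run it to completion
    return done
```

With tasks `{"mouse1": 2, "elephant": 20, "mouse2": 3}`, the elephant gets demoted and both mice finish first; plain first-come, first-served order would have left `mouse2` stuck behind the elephant.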
The group made a variant of their algorithm, called Interruptible-SQD-SITA, that does this process in reverse, too. If an express lane clears all its tasks, the algorithm will start herding elephants through it. And if an elephant lane is empty, it will begin serving mice.
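The interruptible variant amounts to letting an idle lane steal from the other. A minimal, hypothetical sketch of that pick-next-task decision (again, not the authors' code—`express` and `elephants` are just queues of task names):

```python
from collections import deque

def next_task(express, elephants, core_is_express):
    """Pick the next task for a core under an Interruptible-SQD-SITA-style
    policy: serve your own lane first, but steal from the other lane when
    yours is empty. Both lanes are deques; returns a task name or None."""
    own, other = (express, elephants) if core_is_express else (elephants, express)
    if own:
        return own.popleft()
    if other:
        return other.popleft()  # own lane is idle: steal from the other lane
    return None
```

The payoff is utilization: no core sits idle while the other lane still has work queued.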
Mirhosseini says SQD-SITA and its variant will work on any server, but he and his colleagues are targeting microservice computer architectures. Microservices started gaining popularity about a decade ago. Before that, most computer systems were so-called monoliths built using the same hardware and the same programming language. The problem with monoliths is, well, they tend to be monolithic. They are good at being general-purpose, or highly specialized, but not so great at doing anything in between, or at scaling up into monstrous server farms that companies need to compete in today’s cloudified world.
Microservice systems, by contrast, are generalists built from a mosaic of specialized systems. Each of these systems uses whatever programming language and hardware best serve its goal, and they all communicate using APIs. Microservice systems can process tasks an order of magnitude faster than monoliths and are the magic ingredient in streaming and other cloud services—Amazon and Netflix were early adopters.
The fact that microservices are so fast means the cloud giants are in a constant race to shave microseconds off their task resolution time. This is why Mirhosseini and his colleagues built a custom processor for Q-Zilla. Called CoreZilla, it optimizes SQD-SITA's algorithmic scheduling. In their research, Q-Zilla (running on CoreZilla) outperformed competing schedulers, on average, across most of the microservice workloads they tested. The group will present its work at the IEEE International Symposium on High-Performance Computer Architecture in February.
At this point, Q-Zilla's performance jumps have only been verified through simulations. It has yet to get stress-tested out in the wild and woolly world of cloud services. "We are talking to cloud provider companies to see how it fits into their services," says Mirhosseini. He was mum on which cloud providers were interested, but one of his co-authors lists his affiliation as Amazon Web Services, so that might be a clue.
This post was updated on 3 February 2020.