TLA+ Helps Programmers Squash Bugs Before Coding

The language from LaTeX’s inventor is already used by Amazon, Microsoft, and Oracle

3 min read
colorful text on a black background
iStock

Design is an essential part of the development process for many software engineers. Like an architect sketching blueprints, a programmer devises algorithms to support their code and creates models to envision how the different elements of their systems will work together. But what if programmers could test those algorithms and models to uncover design flaws before they could turn into bugs in the written code?

That’s the goal with TLA+, an open-source, high-level language for modeling software programs and hardware systems. Its underlying logic is based on the temporal logic of actions (TLA), a mathematical way to reason about the correctness of concurrent algorithms. Both TLA and TLA+ were developed by Leslie Lamport, a distinguished scientist at Microsoft Research who is best known for inventing LaTeX, a document-preparation system for scientific papers. Lamport also won the 2013 A.M. Turing Award from the Association for Computing Machinery for his work on clarifying the behavior of distributed systems.

Lamport is quick to note that TLA+ is not a programming language but a specification language. “It’s describing the program at a higher level of abstraction—what it is supposed to do and how it’s supposed to do it,” he says.

This makes TLA+ valuable for verifying that a program’s design or supporting algorithm is valid, a feature made possible by the TLA+ model checker. After creating specifications and writing models on TLA+, engineers can run everything through the model checker to find and fix design errors before they are implemented into code.

The language was first used in the industry to model hardware, particularly at Intel. “It was initially appealing to hardware engineers because [they] were used to the idea of describing things precisely above the circuit level,” says Lamport. “TLA+ gave them a language in which they could express their high-level circuit designs rigorously and check them.”

While it was first used in the hardware sector, TLA+ is specifically geared toward concurrent programs and distributed systems, with an emphasis on correctness rather than speed. “It’s not about how to compute something faster, but how to get the processes to interact with each other so that they are computing the right thing,” says Lamport. “As systems get larger, more and more engineers realize the importance of getting those high-level designs correct.”

Some of today’s tech giants have integrated TLA+ into their development processes. Amazon, for instance, has used TLA+ to test its cloud-computing services, as well as search for hard-to-find yet fundamental algorithm flaws in its distributed systems and optimize performance without sacrificing correctness. Microsoft has applied a similar approach to its Azure cloud-computing platform. Engineers at Oracle have, with the help of TLA+, ensured that their distributed-systems designs are correct. Meanwhile, the group that developed Virtuoso, a real-time operating system used in the European Space Agency’s Rosetta spacecraft, created a model of the next operating system, OpenComRTOS, on TLA+. The resulting codebase is about one-tenth the size of its predecessor, according to Lamport.

Yet the language is not only for building large, complex systems. “If you’re a programmer writing a piece of concurrent code, you can use TLA+ for that particular algorithm to make sure it is correct and code it,” Lamport says.

The challenge, however, can be getting started. As TLA+ is math-based, it comes with a difficult learning curve and might appear intimidating to software engineers. Lamport has created some resources that could help, but programmers may find it easier to begin with PlusCal, a programming language he developed for writing algorithms. A PlusCal algorithm is designed so that it can be translated into a TLA+ model, which can in turn be reviewed using the model checker.

Lamport has passed on the responsibility of maintaining TLA+ to the TLA+ Foundation, whose inaugural members include Amazon, the Linux Foundation, Microsoft, and Oracle. The foundation aims to “promote adoption, provide education and training resources, fund research, develop tools, and build a community of TLA+ practitioners,” as well as “ensure the continuous improvement and evolution of the TLA+ language.”

Lamport believes that the current biggest need for the language is a tool that could translate a high-level TLA+ design directly into code. But for now, he hopes the language will help engineers in their day-to-day work of developing software or hardware. “It improves the way you can design a system,” he says. “It gets you thinking outside the box and changes the way you think.”

The Conversation (1)
Moshe Waisberg
Moshe Waisberg01 Jun, 2023
INDV

Most of a system's bugs are at the lower implementation level.