Posits, a New Kind of Number, Improves the Math of AI

The first posit-based processor core gave a ten-thousandfold accuracy boost

[Illustration: squares of 0s and 1s forming a brain shape. Hiroshi Watanabe/Getty Images]

Training the large neural networks behind many modern AI tools requires real computational might: For example, OpenAI’s most advanced language model, GPT-3, required an astounding million billion billions of operations to train, and cost about US $5 million in compute time. Engineers think they have figured out a way to ease the burden by using a different way of representing numbers.

Back in 2017, John Gustafson, then jointly appointed at A*STAR Computational Resources Centre and the National University of Singapore, and Isaac Yonemoto, then at Interplanetary Robot and Electric Brain Co., developed a new way of representing numbers. These numbers, called posits, were proposed as an improvement over the standard floating-point arithmetic processors used today.

Now, a team of researchers at the Complutense University of Madrid has developed the first processor core implementing the posit standard in hardware and shown that, bit for bit, the accuracy of a basic computational task increased by up to four orders of magnitude compared with computing using standard floating-point numbers. They presented their results at last week’s IEEE Symposium on Computer Arithmetic.

“Nowadays it seems that Moore’s law is starting to fade,” says David Mallasén Quintana, a graduate researcher in the ArTeCS group at Complutense. “So, we need to find some other ways of getting more performance out of the same machines. One of the ways of doing that is changing how we encode the real numbers, how we represent them.”

The Complutense team isn’t alone in pushing the envelope with number representation. Just last week, Nvidia, Arm, and Intel agreed on a specification for using 8-bit floating-point numbers instead of the usual 32-bit or 16-bit for machine-learning applications. Using the smaller, less-precise format improves efficiency and memory usage, at the cost of computational accuracy.

Real numbers can’t be perfectly represented in hardware simply because there are infinitely many of them. To fit into a designated number of bits, many real numbers have to be rounded. The advantage of posits comes from the way the numbers they represent exactly are distributed along the number line. In the middle of the number line, around 1 and -1, there are more posit representations than floating point. And at the wings, going out to large negative and positive numbers, posit accuracy falls off more gracefully than floating point.
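This rounding, and the uneven spacing of representable values along the number line, is easy to see in ordinary double-precision floats. A minimal illustration in Python:

```python
import math

# Most decimal fractions have no exact binary representation,
# so they are rounded to the nearest representable float.
a = 0.1 + 0.2
print(a)          # 0.30000000000000004, not 0.3
print(a == 0.3)   # False

# The gap between adjacent representable floats (one "ulp") grows
# with magnitude, so rounding error is larger for big numbers.
print(math.ulp(1.0))    # about 2.22e-16
print(math.ulp(1e12))   # about 1.22e-4
```

Posits reshape exactly this spacing: they pack more representable values near 1 and -1, where most computation happens, and accept coarser spacing at the extremes.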

“It’s a better match for the natural distribution of numbers in a calculation,” says Gustafson. “It’s the right dynamic range, and it’s the right accuracy where you need more accuracy. There’s an awful lot of bit patterns in floating-point arithmetic no one ever uses. And that’s waste.”

Posits accomplish this improved accuracy around 1 and -1 thanks to an extra component in their representation. Floats are made up of three parts: a sign bit (0 for positive, 1 for negative), several “mantissa” (fraction) bits denoting what comes after the binary version of a decimal point, and the remaining bits defining the exponent (a power of 2).
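Those three fields can be pulled apart with a few lines of Python; `float32_fields` is just an illustrative helper name, using the standard IEEE 754 single-precision layout:

```python
import struct

def float32_fields(x: float):
    """Split an IEEE 754 single-precision float into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign     = bits >> 31               # 1 sign bit
    exponent = (bits >> 23) & 0xFF      # 8 exponent bits, biased by 127
    fraction = bits & 0x7FFFFF          # 23 mantissa (fraction) bits
    return sign, exponent, fraction

# -6.25 = -1 * 2^2 * 1.5625 -> sign 1, exponent 2 + 127, fraction 0.5625 * 2^23
s, e, f = float32_fields(-6.25)
print(s, e - 127, 1 + f / 2**23)   # 1 2 1.5625
```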

[Figure: components of floating-point representation (top) and posit representation (middle), with an accuracy comparison (bottom) showing posits’ advantage when the exponent is close to 0: float accuracy rises sharply and stays flat across the dynamic range, while posit accuracy grows gradually, peaks above floating point near an exponent of zero, then decays gradually. Complutense University of Madrid/IEEE]

Posits keep all the components of a float but add an extra “regime” section, an exponent of an exponent. The beauty of the regime is that it can vary in bit length. For small numbers, it can take as few as two bits, leaving more precision for the mantissa. This allows for the higher accuracy of posits in their sweet spot around 1 and -1.
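The decoding rule can be sketched in a few lines. This is a simplified reading of the 2022 posit standard (which fixes es = 2 exponent bits); `decode_posit` is an illustrative name, not code from the Complutense paper, and rounding details are ignored:

```python
def decode_posit(bits: int, nbits: int = 16, es: int = 2) -> float:
    """Decode an unsigned integer holding an nbits-wide posit (es = 2,
    as in the 2022 posit standard) into a Python float."""
    mask = (1 << nbits) - 1
    if bits == 0:
        return 0.0
    if bits == 1 << (nbits - 1):
        return float("nan")                  # NaR, "not a real"
    sign = bits >> (nbits - 1)
    if sign:
        bits = (-bits) & mask                # negatives are two's complement
    body = (bits << 1) & mask                # drop the sign bit
    # Regime: a run of identical bits, ended by the opposite bit.
    first = body >> (nbits - 1)
    run = 0
    while run < nbits - 1 and (body >> (nbits - 1 - run)) & 1 == first:
        run += 1
    k = run - 1 if first else -run           # regime value
    rest = (body << (run + 1)) & mask        # skip the run and its terminator
    exp = rest >> (nbits - es)               # es exponent bits, zero-padded
    frac = rest & ((1 << (nbits - es)) - 1)  # whatever bits remain
    scale = k * (1 << es) + exp
    return (-1.0) ** sign * 2.0 ** scale * (1 + frac / (1 << (nbits - es)))

print(decode_posit(0x4000))  # 1.0
print(decode_posit(0x3800))  # 0.5 -- a 2-bit regime leaves 11 fraction bits
```

Because `run` shrinks to 2 bits near 1.0 and stretches toward nbits-1 bits at the extremes, precision flows to the fraction exactly where the article says posits have their sweet spot.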

Deep neural networks usually work with normalized parameters called weights, making them the perfect candidate to benefit from posits’ strengths. Much of neural-net computation consists of multiply-accumulate operations. Every time such a computation is performed, each sum has to be truncated anew, leading to accuracy loss. With posits, a special register called a quire can do the accumulation step exactly, avoiding that loss. But today’s hardware implements floats, and so far, computational gains from using posits in software have been largely overshadowed by losses from converting between the formats.

With their new hardware implementation, which was synthesized in a field-programmable gate array (FPGA), the Complutense team was able to compare computations done using 32-bit floats and 32-bit posits side by side. They assessed their accuracy by comparing them to results using the much more accurate but computationally costly 64-bit floating-point format. Posits showed an astounding four-order-of-magnitude improvement in the accuracy of matrix multiplication, a series of multiply-accumulates inherent in neural network training. They also found that the improved accuracy didn’t come at the cost of computation time, only a somewhat increased chip area and power consumption.
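The evaluation methodology — running the same computation at lower precision and measuring its error against a high-precision reference — can be sketched for the float half of the comparison (the posit half requires their hardware):

```python
import random
import struct

def f32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def matmul(A, B, rnd=lambda x: x):
    """Triple-loop matrix multiply; rnd is applied after every operation."""
    n, m, p = len(A), len(B[0]), len(B)
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for k in range(p):
                acc = rnd(acc + rnd(A[i][k] * B[k][j]))
            C[i][j] = acc
    return C

random.seed(0)
n = 8
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

ref = matmul(A, B)        # double precision serves as the reference
lo  = matmul(A, B, f32)   # every step rounded to 32 bits
err = max(abs(r - l) for R, L in zip(ref, lo) for r, l in zip(R, L))
print(f"max abs error vs float64 reference: {err:.2e}")
```

The Complutense team’s claim is that swapping the 32-bit float arithmetic for 32-bit posit arithmetic in a loop like this shrinks that error by up to four orders of magnitude.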

Although the numerical accuracy gains are undeniable, how exactly this would impact the training of large AIs like GPT-3 remains to be seen.

“It’s possible that posits will speed up training because you’re not losing as much information on the way,” says Mallasén. “But these are things that we don’t know. Some people have tried it out in software, but we also want to try that in hardware now that we have it.”

Other teams are working on their own hardware implementations to advance posit usage. “It’s doing exactly what I hoped it would do; it’s getting adopted like crazy,” Gustafson says. “The posit number format caught fire, and there are dozens of groups, both companies and universities, that are using it.”

This article appears in the December 2022 print issue as “A New Type of Number Improves the Math of AI.”

The Conversation (2)

Garrett Apple, 18 Oct 2022

Congratulations. You've rediscovered the Mu-law (and A-law) coding that has been used in telephone T-carrier for more than half a century. I also did a short article back then about something similar in the Proceedings of the IEEE, which I called "log-linear coding".

John Gustafson, 26 Sep 2022

Technical details are available at posithub.org, or https://www.youtube.com/watch?v=aP0Y1uAA-2Y
