Cerebras Systems, which makes a specialized AI computer based on the largest chip ever made, is breaking out of its original role as a neural-network training powerhouse and turning its talents toward more traditional scientific computing. In a simulation having 500 million variables, the CS-1 trounced the 69th-most powerful supercomputer in the world.
It also solved the problem—combustion in a coal-fired power plant—faster than the real-world flame it simulates. To top it off, Cerebras and its partners at the U.S. National Energy Technology Center claim, the CS-1 performed the feat faster than any present-day CPU or GPU-based supercomputer could.
The research, which was presented this week at the supercomputing conference SC20, shows that Cerebras’ AI architecture “is not a one trick pony,” says Cerebras CEO Andrew Feldman.
Weather forecasting, design of airplane wings, predicting temperatures in a nuclear power plant, and many other complex problems are solved by simulating “the movement of fluids in space over time,” he says. The simulation divides the world up into a set of cubes, models the movement of fluid in those cubes, and determines the interactions between the cubes. There can be 1 million or more of these cubes and it can take 500,000 variables to describe what’s happening.
According to Feldman, solving that takes a computer system with lots of processor cores, tons of memory very close to the cores, oodles of bandwidth connecting the cores and the memory, and loads of bandwidth connecting the cores. Conveniently, that’s what a neural-network training computer needs, too. The CS-1 contains a single piece of silicon with 400,000 cores, 18 gigabytes of memory, 9 petabytes of memory bandwidth, and 100 petabits per second of core-to-core bandwidth.
Scientists at NETL simulated combustion in a powerplant using both a Cerebras CS-1 and the Joule supercomputer, which has 84,000 CPU cores and consumes 450 kilowatts. By comparison, Cerebras runs on about 20 kilowatts. Joule completed the calculation in 2.1 milliseconds. The CS-1 was more than 200-times faster, finishing in 6 microseconds.
This speed has two implications, according to Feldman. One is that there is no combination of CPUs or even of GPUs today that could beat the CS-1 on this problem. He backs this up by pointing to the nature of the simulation—it does not scale well. Just as you can have too many cooks in the kitchen, throwing too many cores at a problem can actually slow the calculation down. Joule’s speed peaked when using 16,384 of its 84,000 cores.
The limitation comes from connectivity between the cores and between cores and memory. Imagine the volume to be simulated as a 370 x 370 x 370 stack of cubes (136,900 vertical stacks with 370 layers). Cerebras maps the problem to the wafer-scale chip by assigning the array of vertical stacks to a corresponding array of processor cores. Because of that arrangement, communicating the effects of one cube on another is done by transferring data between neighboring cores, which is as fast as it gets. And while each layer of the stack is computed, the data representing the other layers reside inside the core’s memory where it can be quickly accessed.
(Cerebras takes advantage of a similar kind of geometric mapping when training neural networks. [See sidebar "The Software Side of Cerebras," January 2020.])
And because the simulation completed faster than the real-world combustion event being simulated, the CS-1 could now have a new job on its hands—playing a role in control systems for complex machines.
Feldman reports that the SC-1 has made inroads in the purpose for which it was originally built, as well. Drugmaker GlaxoSmithKline is a known customer, and the SC-1 is doing AI work at Argonne National Laboratory and Lawrence Livermore National Lab, the Pittsburgh Supercomputing Center. He says there are several customers he cannot name in the military, intelligence, and heavy manufacturing industries.
A next generation SC-1 is in the works, he says. The first generation used TSMC’s 16-nanometer process, but Cerebras already has a 7-nanometer version in hand with more than double the memory—40 GB—and the number of AI processor cores—850,000.