At Supercomputing 2019 in Denver, Colo., Cerebras Systems unveiled the computer powered by the world’s biggest chip. Cerebras says the computer, the CS-1, has the machine learning capabilities of hundreds of racks’ worth of GPU-based computers consuming hundreds of kilowatts, yet it occupies only one-third of a standard rack and consumes about 17 kW. Argonne National Laboratory, future home of what is expected to be the United States’ first exascale supercomputer, says it has already deployed a CS-1. Argonne is one of two announced U.S. national laboratory customers for Cerebras, the other being Lawrence Livermore National Laboratory.
The system “is the fastest AI computer,” says CEO and cofounder Andrew Feldman. He compared it with Google’s TPU clusters (the second of three generations of that company’s AI computers), noting that one of those “takes 10 racks and over 100 kilowatts to deliver a third of the performance of a single [CS-1] box.”
The CS-1 is designed to speed the training of novel and large neural networks, a process that can take weeks or longer. Powered by a 400,000-core, 1.2-trillion-transistor wafer-scale processor chip, the CS-1 should collapse that task to minutes or even seconds. However, Cerebras did not provide data showing this performance in terms of standard AI benchmarks such as the new MLPerf standards. Instead, it has been wooing potential customers by having them train their own neural network models on machines at Cerebras.
This approach is not unusual, according to analysts. “Everybody runs their own models that they developed for their own business,” says Karl Freund, an AI analyst at Moor Insights & Strategy. “That’s the only thing that matters to buyers.”
A blowout of the CS-1 shows that most of the system is devoted to powering and cooling the Wafer Scale Engine chip at the back left. Image: Cerebras Systems
Cerebras also unveiled some details of the system’s software. It lets users write their machine learning models using standard frameworks such as PyTorch and TensorFlow, then devotes variously sized portions of the wafer-scale engine to layers of the neural network. It does this by solving an optimization problem that ensures all the layers complete their work at roughly the same pace and sit contiguous with their neighbors on the wafer. The result: Information can flow through the network without any holdups.
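The core idea of that optimization, allocating compute to each layer so that no stage of the pipeline becomes a bottleneck, can be sketched in a few lines. The sketch below is purely illustrative: the layer names and relative costs are invented, and Cerebras’s actual placement solver also handles the 2-D contiguity constraint on the wafer, which this toy proportional allocation ignores.

```python
# Illustrative sketch (not Cerebras's actual algorithm): give each neural
# network layer a share of wafer cores proportional to its compute cost,
# so every layer finishes its work per sample at roughly the same pace.

TOTAL_CORES = 400_000  # cores on the Wafer Scale Engine

# Hypothetical relative compute cost per layer (e.g., FLOPs per sample).
layer_costs = {
    "conv1": 3.0,
    "conv2": 6.0,
    "dense1": 2.0,
    "dense2": 1.0,
}

def allocate_cores(costs, total_cores):
    """Allocate cores proportionally to compute cost, so that
    time-per-layer (cost / cores) is roughly equal across layers."""
    total_cost = sum(costs.values())
    return {name: max(1, round(total_cores * c / total_cost))
            for name, c in costs.items()}

alloc = allocate_cores(layer_costs, TOTAL_CORES)
# With proportional allocation, per-layer time ~ cost / cores is nearly
# constant, so the pipeline has no slow stage holding up the others.
times = {name: layer_costs[name] / alloc[name] for name in alloc}
```

Equalizing per-stage time is what keeps a pipelined design efficient: the throughput of the whole network is set by its slowest layer, so any imbalance leaves cores idle.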
The software can solve that optimization problem across multiple computers, allowing a cluster of CS-1s to act as one big machine. Cerebras has linked as many as 32 of them together for a roughly 32-fold performance increase. This contrasts with the behavior of GPU-based clusters, says Feldman. “Today, when you cluster GPUs, you don’t get the behavior of one big machine. You get the behavior of lots of little machines.”
Argonne has been working with Cerebras for two years, said Rick Stevens, the lab’s associate laboratory director for computing, environment, and life sciences, in a press release. “By deploying the CS-1, we have dramatically shrunk training time across neural networks, allowing our researchers to be vastly more productive to make strong advances across deep learning research in cancer, traumatic brain injury, and other areas important to society today and in the years to come.”
The CS-1’s first application is in predicting cancer drug response, part of a collaboration between the U.S. Department of Energy and the National Cancer Institute. It is also being used to help understand the behavior of colliding black holes and the gravitational waves they produce. A previous run of that problem required 1,024 of the 4,392 nodes of Argonne’s Theta supercomputer.