Japan’s computer giant Fujitsu and Riken, the country’s largest research institute, have begun field-testing a prototype CPU for a next-generation supercomputer they believe will take the country back to the leading position in global rankings of supercomputer might.
The next-generation machine, dubbed the Post-K supercomputer, follows the two collaborators’ development of the 8-petaflops K supercomputer, which commenced operations for Riken in 2012 and has since been upgraded to 11 petaflops in application processing speed.
Now the aim is to “create the world’s highest performing supercomputer,” with “up to one hundred times the application execution performance of the K computer,” Fujitsu declared in a press release on 21 June. The plan is to install the souped-up machine at the government-affiliated Riken around 2021.
If the partners achieve those execution speeds, that would place the Post-K machine in exascale territory (one exaflops being a billion billion floating point operations a second).
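The arithmetic behind that claim can be sketched in a few lines of Python. This is a rough back-of-the-envelope check, not a benchmark prediction: it assumes the "up to one hundred times" figure applies directly to the K computer's 11 petaflops, and the function and constant names are illustrative only.

```python
# Back-of-the-envelope check using figures quoted in the article.
# Assumption: the "up to 100x" target scales the K computer's 11 petaflops directly.
PETA = 10**15
EXA = 10**18  # a billion billion

K_APPLICATION_FLOPS = 11 * PETA  # K computer after its upgrade
MAX_SPEEDUP = 100                # an upper bound ("up to"), not a promise


def post_k_upper_bound_flops(k_flops: int = K_APPLICATION_FLOPS,
                             speedup: int = MAX_SPEEDUP) -> int:
    """Return the upper-bound Post-K rate in floating point operations per second."""
    return k_flops * speedup


if __name__ == "__main__":
    bound = post_k_upper_bound_flops()
    print(f"Upper bound: {bound / EXA:.1f} exaflops")
```

Under those assumptions the upper bound works out to about 1.1 exaflops, just over the exascale threshold.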
To do this, they have replaced the SPARC64 VIIIfx CPU powering the K computer with a CPU based on the Armv8-A SVE (Scalable Vector Extension) 512-bit architecture, which has been enhanced for supercomputer use and which both Fujitsu and Riken had a hand in developing.
The new design runs on CPUs with 48 cores plus 2 assistant cores for the computational nodes, and with 48 cores plus 4 assistant cores for the I/O and computational nodes. The system structure uses 1 CPU per node, and 384 nodes make up one rack.
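To get a feel for the density those numbers imply, here is a minimal Python sketch of the per-rack core count. The article does not say how computational nodes and combined I/O and computational nodes are mixed within a rack, so the sketch assumes, purely for illustration, a rack made up entirely of computational nodes. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    """One node holds one CPU, per the article."""
    compute_cores: int
    assistant_cores: int

    @property
    def total_cores(self) -> int:
        return self.compute_cores + self.assistant_cores


# Core counts as quoted in the article.
COMPUTE_NODE = NodeType(compute_cores=48, assistant_cores=2)
IO_COMPUTE_NODE = NodeType(compute_cores=48, assistant_cores=4)
NODES_PER_RACK = 384


def compute_cores_per_rack(node: NodeType = COMPUTE_NODE,
                           nodes: int = NODES_PER_RACK) -> int:
    """Compute cores in a rack, assuming every node is of the given type."""
    return node.compute_cores * nodes


if __name__ == "__main__":
    print(compute_cores_per_rack())                  # compute cores only
    print(COMPUTE_NODE.total_cores * NODES_PER_RACK)  # including assistant cores
```

On that assumption, a single rack carries 18,432 compute cores, or 19,200 cores once the assistant cores are counted.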
For strategic reasons, neither Fujitsu nor Riken will reveal how many nodes they are targeting with the Post-K. However, Satoshi Matsuoka, director of the Riken Center for Computational Sciences in Kobe, says, “It will be the largest Arm system in the world and in fact, likely the largest supercomputer in the world.”
For system interconnection, Fujitsu is employing its Tofu 6D Mesh/Torus topology originally created for the K computer.
Besides the adoption of a new CPU, several other key technologies are behind the Post-K’s ramp up in execution speed, says Matsuoka. Memory bandwidth has been increased by “more than an order of magnitude,” and network bandwidth has also significantly increased.
In addition, Fujitsu has improved on the double-precision arithmetic performance of the K computer. And to increase application versatility, it has also added support for half-precision floating-point arithmetic, which reduces memory loads in applications like AI, where lower precision is acceptable, explains Koji Uchikawa, of Fujitsu’s business strategy and development division.
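The memory trade-off Uchikawa describes can be illustrated with Python's standard `struct` module, which supports the IEEE 754 half-precision format. This is a generic sketch of the precision-versus-footprint trade, not Fujitsu's implementation, and the helper names are made up for the example.

```python
import struct

HALF = "<e"    # IEEE 754 binary16 (half precision), 2 bytes per value
DOUBLE = "<d"  # IEEE 754 binary64 (double precision), 8 bytes per value


def bytes_needed(count: int, fmt: str) -> int:
    """Bytes required to store `count` values in the given struct format."""
    return count * struct.calcsize(fmt)


def round_trip(value: float, fmt: str) -> float:
    """Store a Python float in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


if __name__ == "__main__":
    n = 1_000_000  # e.g. one million model weights
    print(bytes_needed(n, DOUBLE))  # 8,000,000 bytes
    print(bytes_needed(n, HALF))    # 2,000,000 bytes
    # Half precision keeps only about three decimal digits of 0.1.
    print(round_trip(0.1, HALF))
```

The same million values occupy a quarter of the memory in half precision, at the cost of a small rounding error that many AI workloads can tolerate.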
As well as adopting the Arm instruction-set architecture, Fujitsu worked with Arm Limited, the Cambridgeshire, England–based company that develops and licenses Arm technology, to implement new instructions for the scalable vector extension.
Moreover, Fujitsu has developed its own microarchitecture for the chip. A processor’s instruction-set architecture is the interface between hardware and software: It specifies the instructions that software can issue to the processor, but it does not define the chip’s internal structure. That is the job of the microarchitecture, and because it directly affects the processor’s performance, Fujitsu believes this will be an important differentiating factor in its favor.
Riken and Fujitsu see several other advantages in adopting the new architecture, not least the design’s inherent power-saving features, such as power knobs that dial down the power in certain elements of the CPU when they are not needed. Consequently, Fujitsu is claiming a power consumption of just 30 to 40 megawatts, roughly two to three times the K computer’s 12.7 MW, despite the Post-K’s target of delivering up to a hundredfold increase in application processing speed.
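What those figures imply for energy efficiency can be worked out with a short Python sketch. It takes the article's numbers at face value and assumes the full hundredfold speedup is reached, which Fujitsu frames only as an upper bound. The function names are illustrative.

```python
# Figures quoted in the article.
K_POWER_MW = 12.7
POST_K_POWER_RANGE_MW = (30.0, 40.0)
MAX_SPEEDUP = 100.0  # assumption: the "up to" target is fully met


def power_ratio(post_k_mw: float, k_mw: float = K_POWER_MW) -> float:
    """How many times more power the Post-K would draw than the K computer."""
    return post_k_mw / k_mw


def efficiency_gain(post_k_mw: float,
                    speedup: float = MAX_SPEEDUP,
                    k_mw: float = K_POWER_MW) -> float:
    """Improvement in application performance per watt over the K computer."""
    return speedup / power_ratio(post_k_mw, k_mw)


if __name__ == "__main__":
    for mw in POST_K_POWER_RANGE_MW:
        print(f"{mw:.0f} MW: {power_ratio(mw):.1f}x the power, "
              f"up to {efficiency_gain(mw):.0f}x the performance per watt")
```

On those assumptions, the Post-K would deliver somewhere between about 32 and 42 times the K computer's application performance per watt.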
Both Fujitsu and Riken say they also intend to leverage Arm’s large software ecosystem. “We, Fujitsu, and other collaborators will drive the Arm ecosystem in the high-end server space,” says Riken’s Matsuoka. This, he adds, will help contribute to any commercial success Fujitsu has “in selling not only their systems but also the chip to external companies.”
At the same time, Fujitsu “will provide a compatible performance balance with the K computer so that current applications can be migrated after recompiling,” says Uchikawa.
But the supercomputer race is nothing if not a game of hopscotch.
For the first time in six years, the United States has just regained the top slot in global rankings of supercomputer performance with the newly installed Summit supercomputer at Oak Ridge National Laboratory, in Tennessee. According to June’s TOP500 assessment, Summit achieved a performance of 122.3 petaflops, bumping China’s Sunway TaihuLight to second place with a performance of 93 petaflops. Lawrence Livermore National Laboratory’s Sierra came in third with 71.6 petaflops.
So when the Post-K comes online around 2021, it will find no shortage of competitors vying for the leading position. Nevertheless, Riken’s Matsuoka brushes aside such comparisons. “Catalog flops is not our concern. For most applications, Post-K will likely exhibit the fastest time-to-solution and utmost scalability due to its brilliant memory and network bandwidths, as well as an outstanding power-efficient design.”
No doubt it won’t be long before competitors beg to differ.