Behind Intel’s HPC Chip that Will Pierce the Exascale Barrier

IEEE SpectrumFOR THE TECHNOLOGY INSIDER
TopicsAerospaceAIBiomedicalClimate TechComputingConsumer ElectronicsEnergyHistory of TechnologyRoboticsSemiconductorsTelecommunicationsTransportation
SectionsFeaturesNewsOpinionCareersDIYEngineering Resources
MoreNewslettersSpecial ReportsCollectionsExplainersTop Programming LanguagesRobots Guide ↗IEEE Job Site ↗
For IEEE MembersCurrent IssueMagazine ArchiveThe InstituteThe Institute Archive
For IEEE MembersCurrent IssueMagazine ArchiveThe InstituteThe Institute Archive
IEEE SpectrumAbout UsContact UsReprints & Permissions ↗Advertising ↗
Follow IEEE Spectrum
Support IEEE SpectrumIEEE Spectrum is the flagship publication of the IEEE — the world’s largest professional organization devoted to engineering and applied sciences. Our articles, videos, and infographics inform our readers about developments in technology, engineering, and science.
Subscribe
About IEEEContact & SupportAccessibilityNondiscrimination PolicyTermsIEEE Privacy PolicyCookie PreferencesAd Privacy Options
© Copyright 2025 IEEE — All rights reserved. A public charity, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity.

On Monday, Intel unveiled new details of the processor that will power the Aurora supercomputer, which is designed to become one of the first U.S.-based high-performance computers (HPCs) to pierce the exaflop barrier—a billion billion high-precision floating-point calculations per second. Intel Fellow Wilfred Gomes told engineers virtually attending the IEEE International Solid State Circuits Conference this week that the processor pushed Intel’s 2D and 3D chiplet integration technologies to the limits.

The processor, called Ponte Vecchio, is a package that combines multiple compute, cache, networking, and memory silicon tiles, or “chiplets.” Each of the tiles in the package is made using different process technologies, in a stark example of a trend called heterogeneous integration.

Ponte Vecchio is, among other things, a master class in 3D integration.

The result is that Intel packed 3,100 square millimeters of silicon—nearly equal to four Nvidia A100 GPUs—into a 2,330 mm² footprint. That’s more than 100 billion transistors across 47 pieces of silicon.

An illustration shows groupings of labelled blue, dark grey, and tan tiles on the left and their positions on a green circuitboard on the right. Ponte Vecchio is made of multiple compute, cache, I/O, and memory tiles connected using 3D and 2D technology.Source: Intel Corp.

Ponte Vecchio is, among other things, a master class in 3D integration. Each Ponte Vecchio processor is really two mirror image sets of chiplets tied together using Intel’s 2D integration technology Co-EMIB. Co-EMIB forms a bridge of high-density interconnects between two 3D stacks of chiplets. The bridge itself is a small piece of silicon embedded in a package’s organic substrate. The interconnect lines on silicon can be made narrower than on the organic substrate. Ponte Vecchio’s ordinary connections to the package substrate were 100 micrometers apart, whereas they were nearly twice as dense in the Co-EMIB chip. Co-EMIB dies also connect high-bandwidth memory (HBM) and the Xe Link I/O chiplet to the “base silicon,” the largest chiplet, upon which others are stacked.

An illustration shows an exploded view of the parts of a processor each represented by different colored rectangles. The parts of Ponte Vecchio.Source: Intel Corp.

Each set of eight compute tiles, four SRAM cache chiplets called RAMBO tiles, and eight blank “thermal” tiles meant to remove heat from the processor is connected vertically to a base tile. This base provides cache memory and a network that allows any compute tile to access any memory.

Notably, these tiles are made using different manufacturing technologies, according to what suited their performance requirements and yield. The latter term, the fraction of usable chips per wafer, is particularly important in a chiplet integration like Ponte Vecchio, because attaching bad tiles to good ones means you’ve ruined a lot of expensive silicon. The compute tiles needed top performance, so they were made using TSMC’s N5 (often called a 5-nanometer) process. The RAMBO tile and the base tile both used Intel 7 (often called a 7-nanometer) process. HBM, a 3D stack of DRAM, uses a completely different process than the logic technology of the other chiplets, and the Xe Link tile was made using TSMC’s N7 process.

Ponte Vecchio Foveros + EMIB Construction

An illustration shows a cut-through of a processor with large blue, black, and grey rectangles representing the silicon parts and copper showing the connections. The different parts of the processor are made using different manufacturing processes, such as Intel 7 and TSMC N5. Intel’s Foveros technology creates the 3D interconnects and its Co-EMIB makes horizontal connections.Source: Intel Corp.

The base die also used Intel’s 3D stacking technology, called Foveros. The technology makes a dense array of die-to-die vertical connections between two chips. These connections are just 36 micrometers apart and are made by connecting the chips “face to face”; that is, the top of one chip is bonded to the top of the other. Signals and power get into this stack by means of through-silicon vias, fairly wide vertical interconnects that cut right through the bulk of the silicon. The Foveros technology used on Ponte Vecchio is an improvement over the one used to make Intel’s Lakefield mobile processor, doubling the density of signal connections.

Expect the “zettascale” era of supercomputers to kick off sometime around 2028.

Needless to say, none of this was easy. It took innovations in yield, clock circuits, thermal regulation, and power delivery, Gomes said. In order to ramp performance up or down with need, each compute tile could run at a different voltage and clock frequency. The clock signals originate in the base die but each compute tile can runs at its own rate. Providing the voltage was even more complicated. Intel engineers chose to supply the processor with a higher than normal voltage (1.8 volts) so they could simplify the package structure due to the lower current needs. Circuits in the base tile reduce the voltage to something closer to 0.7 volts for use on the compute tiles, and each compute tile had to have its own power domain in the base tile. Key to this ability were new high-efficiency inductors called coaxial magnetic integrated inductors. Because these are built into the package substrate, the circuit actually snakes back and forth between the base tile and the package before supplying the voltage to the compute tile.

A micrograph at left shows grey and white layers of the processor calling out parts that control the flow of heat. The same parts are illustrate on the right. Getting the heat out of a complex 3D stack of chips was no easy feat.Source: Intel Corp.

Ponte Vecchio is meant to consume 600 watts, so making sure heat could be extracted from the 3D stack was always a high priority. Intel engineers used tiles that had no other function than to draw heat away from the active chiplets in the design. They also coated the top of the entire chiplet agglomeration in heat-conducting metal, despite the various parts having different heights. Atop that was a solder-based thermal interface material (STIM) and an integrated heat spreader. The different tiles each have different operating-temperature limits under liquid cooling and air cooling, yet this solution managed to keep them all in range, said Gomes.

“Ponte Vecchio started with a vision that we wanted democratize computing and bring petaflops to the mainstream,” said Gomes. Each Ponte Vecchio system is capable of more than 45 trillion 32-bit floating-point operations per second (teraflops). Four such systems fit together with two Sapphire Rapids CPUs in a complete compute system. These will be combined for a total exceeding 54,000 Ponte Vecchios and 18,000 Sapphire Rapids to form Aurora, a machine targeting 2 exaflops.

It’s taken 14 years to go from the first petaflop supercomputers in 2008—capable of one million billion calculations per second—to exaflops today, Gomes pointed out. A 1000-fold increase in performance “is a really difficult task, and it’s taken multiple innovations across many fields,” he said. But with improvements in manufacturing processes, packaging, power delivery, memory, thermal control, and processor architecture, Gomes told engineers, the next thousandfold increase could be accomplished in just six years rather than another 14.

From Your Site Articles

Topics

Sections

More

For IEEE Members

For IEEE Members

IEEE Spectrum

Follow IEEE Spectrum

Support IEEE Spectrum

Behind Intel’s HPC Chip that Will Pierce the Exascale Barrier

Ponte Vecchio packs in a lot of silicon to power the Aurora supercomputer

Ponte Vecchio Foveros + EMIB Construction

This AI Can Beat You At Rock-Paper-Scissors

NTT's Photonics to Slash Data Center Energy Use

Electric Salt Devices Make Low-Salt Food Tastier

Related Stories

Europe’s First Exascale Supercomputer Powers Up

New Fastest Supercomputer Will Simulate Nuke Testing

Industry Out of Phase With Supercomputers

Topics

Sections

More

For IEEE Members

For IEEE Members

IEEE Spectrum

Follow IEEE Spectrum

Support IEEE Spectrum

Enjoy more free content and benefits by creating an account

Saving articles to read later requires an IEEE Spectrum account

The Institute content is only available for members

Downloading full PDF issues is exclusive for IEEE Members

Downloading this e-book is exclusive for IEEE Members

Access to Spectrum 's Digital Edition is exclusive for IEEE Members

Following topics is a feature exclusive for IEEE Members

Adding your response to an article requires an IEEE Spectrum account

Create an account to access more content and features on IEEE Spectrum , including the ability to save articles to read later, download Spectrum Collections, and participate in conversations with readers and editors. For more exclusive content and features, consider Joining IEEE .

Join the world’s largest professional organization devoted to engineering and applied sciences and get access to all of Spectrum’s articles, archives, PDF downloads, and other benefits. Learn more about IEEE →

Join the world’s largest professional organization devoted to engineering and applied sciences and get access to this e-book plus all of IEEE Spectrum’s articles, archives, PDF downloads, and other benefits. Learn more about IEEE →

Access Thousands of Articles — Completely Free

Create an account and get exclusive content and features: Save articles, download collections, and post comments — all free! For full access and benefits, subscribe to Spectrum.

Behind Intel’s HPC Chip that Will Pierce the Exascale Barrier

Ponte Vecchio packs in a lot of silicon to power the Aurora supercomputer

Ponte Vecchio Foveros + EMIB Construction

This AI Can Beat You At Rock-Paper-Scissors

NTT's Photonics to Slash Data Center Energy Use

Electric Salt Devices Make Low-Salt Food Tastier

Related Stories

Europe’s First Exascale Supercomputer Powers Up

New Fastest Supercomputer Will Simulate Nuke Testing

Industry Out of Phase With Supercomputers