What if any desktop PC could become an AI inference beast with a single upgrade? And what if that transformed beast still sipped power like it was enjoying a martini?
That’s the idea pitched by Neuchips, a Taiwanese startup founded in 2019 and known for delivering top-class AI efficiency. It came to CES Unveiled 2024—the media pregame show before the main event—with a PCIe add-on card that can upgrade the AI capabilities of a typical desktop computer while adding just 55 watts to the PC’s power budget.
It’s not just a concept. The card was plugged into a desktop computer on the show floor and offered real-time, offline conversation with a chatbot powered by Meta’s popular Llama 2 7B large language model (Neuchips says the card will also run Llama 2 13B).
Neuchips’ card, the Evo PCIe accelerator, is built around the company’s Raptor Gen AI accelerator chip. The Raptor chip delivers up to 200 teraoperations per second as measured on Meta’s DLRM benchmark, and the company says it’s optimized for transformer-based models.
The card that Neuchips demonstrated carried a single Raptor chip, but that isn’t the only configuration planned. Neuchips’ CEO Ken Lau, an Intel veteran of 26 years, says Raptor can be used to design cards with varying numbers of chips onboard.
“The chip is actually scalable,” says Lau. “So we start with one chip. And then we have four chips. And then eight chips.” Each chip provides up to 200 trillion operations per second (TOPS), according to Neuchips’ press release. The card also carries 32 GB of LPDDR5 memory and reaches 1.6 terabits per second of memory bandwidth. Memory bandwidth matters because it is often the limiting factor when running AI inference on a single PC.
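Why bandwidth matters can be shown with a common back-of-envelope estimate. When a dense language model generates text one token at a time, it has to stream roughly all of its weights from memory for every token, so memory bandwidth sets a ceiling on the token rate. The sketch below applies that rule of thumb to the 1.6 Tb/s figure quoted above and to Llama 2 7B. The weight precisions are assumptions (Neuchips hasn’t said which the demo used), and the estimate ignores KV-cache traffic, activations, and compute limits, so real throughput would be lower.

```python
# Rule-of-thumb ceiling on decode speed for a memory-bound dense model.
# Assumption: every weight is read once per generated token; KV-cache
# traffic, activations, on-chip caching, and compute limits are ignored.

def max_tokens_per_second(bandwidth_bits_per_s: float,
                          n_params: float,
                          bytes_per_param: float) -> float:
    """Upper bound on tokens/s given memory bandwidth and model size."""
    bandwidth_bytes_per_s = bandwidth_bits_per_s / 8
    weight_bytes = n_params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes


BANDWIDTH_BITS = 1.6e12  # 1.6 Tb/s, the figure quoted for the Evo card
PARAMS_7B = 7e9          # Llama 2 7B, rounded to 7 billion parameters

# Hypothetical precisions for illustration; not confirmed for the demo.
for label, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    rate = max_tokens_per_second(BANDWIDTH_BITS, PARAMS_7B, bytes_per_param)
    print(f"{label}: at most ~{rate:.1f} tokens/s")
```

Run as written, this prints ceilings of about 14.3, 28.6, and 57.1 tokens per second. Halving the bytes per weight doubles the ceiling, which is one reason quantized models are popular for local inference.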
Neuchips also wants to give owners the tools they need to use the card effectively, although with many months to go until release, the details remain sparse. A Neuchips representative said the company has compiler software and will provide a driver. The demonstration I saw used a custom interface for interacting with the Llama 2 7B model. The card was running, but the setup appeared bare-bones.
A focus on efficiency
There’s already hardware that anyone can plug into a desktop’s PCIe slot to greatly improve AI performance. It’s called a GPU, and Nvidia has a stranglehold on the market. Going toe-to-toe with Nvidia on performance would be difficult. In fact, Nvidia announced new cards with a focus on AI at CES 2024; the RTX 4080 Super, which will retail for US $999 starting on 31 January, is rated for up to 836 TOPS of AI performance.
Neuchips, however, sees an opening. “We are focused on power efficiency,” says Lau, “and on handling the many different models that are out there.”
Modern graphics cards are powerful, but also power hungry. The RTX 4080 Super can draw up to 320 W and will typically require a computer with a power supply that can deliver at least 750 W. Neuchips’ Evo PCIe accelerator, by contrast, consumes just 55 W. That is low enough that the card Neuchips demonstrated at CES didn’t need an external PCIe power connector, because a PCIe x16 slot can supply up to 75 W on its own. Most discrete GPUs require such connectors.
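The power figures above can be turned into a rough operations-per-watt comparison. The sketch below divides each card’s quoted TOPS by its quoted power and checks whether that power exceeds the 75 W a PCIe x16 slot can supply without an auxiliary connector. Treat the ratios as illustrative only: Neuchips’ 200 TOPS is measured on Meta’s DLRM benchmark while Nvidia’s 836 TOPS is a peak rating, 320 W is the GPU’s maximum draw, and pairing 200 TOPS with 55 W assumes both figures describe the single-chip card shown at CES.

```python
# Illustrative efficiency arithmetic from the headline numbers in this article.
# Caveat: the TOPS figures are not measured the same way, so the ratios
# are a rough comparison, not a benchmark result.

PCIE_SLOT_BUDGET_W = 75  # max power an x16 slot supplies without extra connectors


def tops_per_watt(tops: float, watts: float) -> float:
    """Quoted throughput divided by quoted power draw."""
    return tops / watts


def needs_aux_power(watts: float) -> bool:
    """True if the card draws more than the slot alone can deliver."""
    return watts > PCIE_SLOT_BUDGET_W


cards = {
    # name: (quoted TOPS, quoted power in watts)
    "Neuchips Evo (single Raptor chip, assumed)": (200, 55),
    "Nvidia RTX 4080 Super": (836, 320),
}

for name, (tops, watts) in cards.items():
    print(f"{name}: {tops_per_watt(tops, watts):.2f} TOPS/W, "
          f"needs auxiliary power: {needs_aux_power(watts)}")
```

On these numbers the Evo card works out to about 3.64 TOPS/W versus about 2.61 TOPS/W for the RTX 4080 Super, and only the GPU exceeds the slot’s power budget.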
I was also told that the final card, which should ship in the latter half of 2024, will be roughly half the size of the card shown at CES. That’s an important detail, as the card I saw was as large as most current Nvidia GPU cards, and too large to fit in most small-form-factor desktop computers. A smaller card would make the Evo PCIe accelerator usable in a wider range of modern PC hardware.
Neuchips’ accelerator, though perhaps the most high-profile AI accelerator card at CES 2024, was far from alone at the show. Several startups came with their own AI accelerators packing unique features. Panmnesia won a CES Innovation Award for an AI accelerator that includes a Compute Express Link (CXL) interface for access to huge pools of memory. Other companies with AI accelerators include DeepX and MemryX. Intel and AMD are in on it, too; each offers an AI accelerator in its latest CPU architecture.
Make no mistake: Nvidia remains the 800-pound gorilla in this arena, and that’s not going to change overnight. Still, new AI accelerators like Neuchips’ Raptor chip and the Evo PCIe card look ready to offer new options to developers who don’t need graphics or who want better power efficiency when running AI inference.
Neuchips’ Evo PCIe accelerator is due for full release in the second half of 2024. Pricing remains to be announced.
This post was updated on 12 January to clarify benchmark operation speeds and correct the system’s memory bandwidth.