Irvine, Calif.–based Syntiant thinks it can use embedded flash memory to greatly reduce the amount of power needed to perform deep-learning computations. Austin, Texas–based Mythic thinks it can use embedded flash memory to greatly reduce the amount of power needed to perform deep-learning computations. They both might be right.
A growing crowd of companies are hoping to deliver chips that accelerate otherwise onerous deep-learning applications, and to some degree they all have similarities because “these are solutions that are created by the shape of the problem,” explains Mythic founder and CTO Dave Fick.
When executed in a CPU, that problem is shaped like a traffic jam of data. A neural network is made up of connections and “weights” that denote how strong those connections are, and having to move those weights around so they can be represented digitally in the right place and time is the major energy expenditure in doing deep learning today.
“Our approach is to completely eliminate both memory bandwidth and memory power penalties by doing computation in memory,” explains Syntiant CEO Kurt Busch.
In both companies’ approaches, the network weights are actually levels of charge stored in an array of flash memory cells. The charge alters the amount of current that flows through the cell, and the cells are arranged in a way that the current produces the crucial “multiply and accumulate” computations needed for a network to tell a stop sign from a sunset, or “OK Google” from “big gray poodle.”
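To make the idea concrete, here is a minimal sketch of a multiply and accumulate done by summing currents. It is an idealized model, not either company's circuit: it assumes each cell behaves like a simple conductance, so that the current through a cell is the input voltage times the stored weight and the currents on a shared column wire add up. Real flash cells are not this linear, and the function name `column_currents` and all values are made up for illustration.

```python
def column_currents(inputs, conductances):
    """Idealized analog multiply and accumulate.

    inputs:       one input voltage per row of the array
    conductances: conductances[row][col], standing in for stored weights
    Returns one summed current per column.
    """
    n_cols = len(conductances[0])
    return [
        # Each cell contributes voltage * conductance; the shared
        # column wire adds the contributions together.
        sum(v * row[c] for v, row in zip(inputs, conductances))
        for c in range(n_cols)
    ]

# Hypothetical 3-row, 2-column array (illustrative values only).
inputs = [1.0, 0.5, 0.0]
weights = [
    [0.2, 0.4],
    [0.6, 0.0],
    [0.9, 0.1],
]

outputs = column_currents(inputs, weights)
print(outputs)  # one accumulated value per column
```

In this picture the weights never move. Only the inputs are applied and the summed outputs are read, which is the point Busch makes about eliminating memory traffic.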
Because the weights are always just where they need to be, there’s no need to expend any time or energy to move them. The analog nature of the computation also keeps power low. While training neural networks is typically done by computing with fairly precise (8- or 16-bit) numbers, actually using the trained network, called inferencing, can be done faster and at lower power using much less precise numbers (5-bit or even 3-bit) as the weights. “With analog computation, you can build multiply and accumulate that is low precision but very, very accurate,” says Busch.
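What low-precision weights mean in practice can be sketched in a few lines. The example below assumes a simple uniform scheme in which each weight between -1 and 1 is snapped to the nearest of 2^bits evenly spaced levels. That scheme, the `quantize` helper, and the sample numbers are illustrative assumptions, not a description of how Syntiant or Mythic store weights.

```python
def quantize(weights, bits):
    """Snap each weight in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)  # spacing between adjacent levels
    return [round((w + 1.0) / step) * step - 1.0 for w in weights]

def dot(a, b):
    """Plain multiply and accumulate over two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical inputs and trained weights (illustrative values only).
inputs = [0.9, 0.1, 0.5, 0.3]
weights = [0.73, -0.41, 0.08, -0.95]

exact = dot(inputs, weights)
for bits in (5, 3):
    approx = dot(inputs, quantize(weights, bits))
    print(f"{bits}-bit weights: {approx:+.3f} (full precision: {exact:+.3f})")
```

A 3-bit weight can take only eight values and a 5-bit weight only 32, so each stored weight lands within half a step of its trained value. Whether that is close enough depends on the network, which this sketch does not attempt to show.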
Mythic is aiming for a mere 0.5 picojoules per multiply and accumulate, which would work out to about 4 trillion operations per second per watt (TOPS/W), counting each multiply and accumulate as two operations. Syntiant is hoping to get to 20 TOPS/W. An Nvidia Volta V100 GPU can do 0.4 TOPS/W, according to Syntiant. However, real apples-to-apples comparisons in the machine learning world are difficult to determine, Fick points out.
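The conversion between energy per operation and TOPS/W is simple arithmetic, sketched below. It assumes each multiply and accumulate is counted as two operations (one multiply, one add), the convention under which 0.5 picojoules lines up with roughly 4 TOPS/W. The function names are made up for illustration, and the per-operation energies derived for the other chips are back-of-the-envelope figures under that same assumption, not numbers reported by the companies.

```python
def tops_per_watt(picojoules_per_mac, ops_per_mac=2):
    """Convert energy per multiply and accumulate into TOPS/W.

    One watt is one joule per second, so operations per second per watt
    equals operations per joule.
    """
    joules_per_mac = picojoules_per_mac * 1e-12
    ops_per_joule = ops_per_mac / joules_per_mac
    return ops_per_joule / 1e12  # express in trillions

def picojoules_per_mac(tops_w, ops_per_mac=2):
    """Inverse conversion: TOPS/W back to picojoules per multiply and accumulate."""
    ops_per_joule = tops_w * 1e12
    joules_per_mac = ops_per_mac / ops_per_joule
    return joules_per_mac * 1e12

print(f"0.5 pJ per MAC -> {tops_per_watt(0.5):.1f} TOPS/W")
print(f"20 TOPS/W      -> {picojoules_per_mac(20):.2f} pJ per MAC")
print(f"0.4 TOPS/W     -> {picojoules_per_mac(0.4):.1f} pJ per MAC")
```

As Fick cautions, such figures depend on how operations are counted and at what precision, so they are best read as rough orders of magnitude rather than a ranking.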
The extent of analog circuitry each startup uses is a key difference between them. Syntiant’s entire network is analog, but Mythic surrounds the analog flash array with programmable digital circuitry. Mythic uses the surrounding circuitry to add flexibility to the size and types of networks its chips can run. “All network topologies run at approximately the same efficiency on our chip,” says Fick.
This difference also influences the two companies’ target customers and applications. At Syntiant “we often say that Mythic comes up at 100 percent of investment meetings and [with] zero percent of customers,” says Busch. Both companies say they are seeking to add AI at “the edge.” But that’s a broad category that includes everything from self-driving cars to AI-enhanced hearing aids.
Syntiant is going after smaller, often-milliwatt power applications. Its first device will do things like spotting wake words and identifying speakers.
Mythic is going for applications that need more complex networks capable of handling high-resolution video in systems with low-single-digit watt power, such as autonomous drones and smartphones. Fick says there are orders of magnitude differences between the operations per second needed for these applications and those Syntiant is after.