We now perform mathematical calculations so often and so effortlessly with digital electronic computers that it’s easy to forget that there was ever any other way to compute things. In an earlier era, though, engineers had to devise clever strategies to calculate the solutions they needed using various kinds of analog computers.
Some of those early computers were electronic, but many were mechanical, relying on gears, balls and disks, hydraulic pumps and reservoirs, or the like. For some applications, like the processing of synthetic-aperture radar data in the 1960s, the analog computations were done optically. That approach gave way to digital computations as electronic technology improved.
Curiously, though, some researchers are once again exploring the use of analog optical computers for a modern-day computational challenge: neural-network calculations.
The calculations at the heart of neural networks (matrix multiplications) are conceptually simple—a lot simpler than, say, the Fourier transforms needed to process synthetic-aperture radar data. For readers unfamiliar with matrix multiplication, let me try to de-mystify it.
A matrix is, well, a matrix of numbers, arrayed into rows and columns. When you multiply two matrices together, the result is another matrix, whose elements are determined by multiplying various pairs of numbers (drawn from the two matrices you started with) and summing the results. That is, multiplying matrices just amounts to a lot of multiplying and adding.
But neural networks can be huge, many-layer affairs, meaning that the arithmetic operations required to run them are so numerous that they can tax the hardware (or energy budget) that’s available. Often graphics processing units (GPUs) are enlisted to help with all the number crunching. Electrical engineers have also been busy designing all sorts of special-purpose chips to serve as neural-network accelerators, Google’s Tensor Processing Unit probably being the most famous. And now optical accelerators are on the horizon.
Two MIT spin-offs—Lightelligence and Lightmatter—are of particular note. These startups grew out of work on an optical-computing chip for neural-network computations that MIT researchers published in 2017.
More recently, yet another set of MIT researchers (including two who had contributed to the 2017 paper) has developed yet another approach for carrying out neural-network calculations optically. Although it’s still years away from commercial application, it neatly illustrates how optics (or more properly a combination of optics and electronics) can be used to perform the necessary calculations.
The new strategy is entirely theoretical at this point, but Ryan Hamerly, lead author on the paper that’s recently been published about the new approach, says, “We’re building a demonstration experiment.” And while it might take many such experiments and several years of chip development to really know whether it works, their approach, “promises to be significantly better than what can be done with current-day electronics,” according to Hamerly.
So how does the new strategy work? I’m not sure I could explain all the details even if I had the space, but let me try to give you a flavor here.
The necessary matrix multiplications can be done using three simple kinds of components: optical beam splitters, photodiodes, and capacitors. That sounds rather remarkable, but recall that matrix multiplications are really just a bunch of multiplications and additions. So all we really need here is an analog gizmo that can multiply two values together and another analog gizmo to sum up the results.
It turns out that you can build an analog multiplier with a beam splitter and a photodiode. A beam splitter is an optical device that takes two optical inputs and provides two optical outputs. If it is configured in a certain way, the amplitude of light that it outputs on one side will be the sum of the amplitudes of its two inputs; the amplitude of its other output will be the difference of the two inputs. A photodiode outputs an electronic signal that is proportional to the intensity of the light impinging on it.
The essential thing to realize here is that the intensity of light (a measure of the power it carries) is proportional to its amplitude squared. That’s key because if you square the sum of two light signals (let’s denote this as A + B), you will get A2 + 2AB + B2. If you square the difference of these same two light signals (A – B), you will get A2 – 2AB + B2. Subtract the latter from the former and you get 4AB, which you will notice is proportional to the product of the two inputs, A and B.
So by scaling your analog signals appropriately, a beam splitter and photodiode in combination can serve as an analog multiplier. What’s more, you can do a series of multiplications just by presenting the appropriate light signals, one after the other, to this kind of multiplier. Feed the series of electronic outputs of your multiplier into a capacitor and you’ll be adding up the results of each multiplication, forming the result you need to define one element in the product matrix. Rinse and repeat enough times, and you have just multiplied two matrices!
There are some other mathematical manipulations, too, that you’d need to run a neural network; in particular you have to apply a non-linear activation function to each neuron. But that can easily be done electronically. The question is what kind of signal-to-noise ratio a real device could maintain while doing all this, which will control the resolution of the calculations it performs. That resolution might not end up being very high. “That’s a downside of any analog system,” says Hamerly. Happily, at least for inference calculations (during which a neural network that has already been trained does its thing), relatively low resolution is normally fine.
It’s hard to know how fast an electro-optical accelerator chip designed along these lines would compute, explains Hamerly, because the metric normally used to judge such performance depends on both throughput and chip area, and he isn’t yet prepared to estimate what sort of area the chip he is envisioning would require. But he’s optimistic that this approach could slash the energy required for such calculations.
Indeed, Hamerly and his colleagues argue that their approach could use less energy than even the theoretical minimum for a gate-based digital device of equivalent accuracy—a value known as the Landauer limit. (It’s impossible to reduce the energy of computation to anything less than this limit without resorting to some form of reversible computing.) If that’s true for this or any other optical accelerator on the drawing board, many neural network calculations would no doubt be done using light rather than just electrons.
With the remarkable advances electronic computers have made over the past 50 years, optical computing never really gained traction, but maybe neural networks will finally provide the killer app for it. As Hamerly’s colleague and coauthor Liane Bernstein notes: “This could be the time for optics.”