# A Neural-Net Based on Light Could Best Digital Computers

## Researchers turn to optical computing to carry out neural-network calculations

Photo: iStockphoto

We now perform mathematical calculations so often and so effortlessly with digital electronic computers that it’s easy to forget that there was ever any other way to compute things. In an earlier era, though, engineers had to devise clever strategies to calculate the solutions they needed using various kinds of analog computers.

Some of those early computers were electronic, but many were mechanical, relying on gears, balls and disks, hydraulic pumps and reservoirs, or the like. For some applications, like the processing of synthetic-aperture radar data in the 1960s, the analog computations were done optically. That approach gave way to digital computations as electronic technology improved.

Curiously, though, some researchers are once again exploring the use of analog optical computers for a modern-day computational challenge: neural-network calculations.

The calculations at the heart of neural networks (matrix multiplications) are conceptually simple—a lot simpler than, say, the Fourier transforms needed to process synthetic-aperture radar data. For readers unfamiliar with matrix multiplication, let me try to de-mystify it.

A matrix is, well, a matrix of numbers, arrayed into rows and columns. When you multiply two matrices together, the result is another matrix, whose elements are determined by multiplying various pairs of numbers (drawn from the two matrices you started with) and summing the results. That is, multiplying matrices just amounts to a lot of multiplying and adding.
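To make the "multiplying and adding" concrete, here is a minimal sketch in plain Python. The function name `matmul` and the list-of-rows representation are choices made for this illustration only; real neural-network code would use an optimized library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows, using only * and +."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must equal rows of b")
    product = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Element (i, j) pairs row i of a with column j of b:
            # multiply matching entries, then sum the results.
            for k in range(inner):
                product[i][j] += a[i][k] * b[k][j]
    return product


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

Multiplying two n-by-n matrices this way takes n³ multiplications, which is why large networks strain the hardware.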

But neural networks can be huge, many-layer affairs, meaning that the arithmetic operations required to run them are so numerous that they can tax the hardware (or energy budget) that’s available. Often graphics processing units (GPUs) are enlisted to help with all the number crunching. Electrical engineers have also been busy designing all sorts of special-purpose chips to serve as neural-network accelerators, Google’s Tensor Processing Unit probably being the most famous. And now optical accelerators are on the horizon.

Two MIT spin-offs—Lightelligence and Lightmatter—are of particular note. These startups grew out of work on an optical-computing chip for neural-network computations that MIT researchers published in 2017.

More recently, yet another set of MIT researchers (including two who had contributed to the 2017 paper) has developed yet another approach for carrying out neural-network calculations optically. Although it’s still years away from commercial application, it neatly illustrates how optics (or more properly a combination of optics and electronics) can be used to perform the necessary calculations.

A new photonic chip design drastically reduces the energy needed to compute with light. Simulations suggest it could run optical neural networks 10 million times more efficiently than its electrical counterparts. Illustration: MIT

The new strategy is entirely theoretical at this point, but Ryan Hamerly, lead author on the paper that’s recently been published about the new approach, says, “We’re building a demonstration experiment.” And while it might take many such experiments and several years of chip development to really know whether it works, their approach, “promises to be significantly better than what can be done with current-day electronics,” according to Hamerly.

So how does the new strategy work? I’m not sure I could explain all the details even if I had the space, but let me try to give you a flavor here.

The necessary matrix multiplications can be done using three simple kinds of components: optical beam splitters, photodiodes, and capacitors. That sounds rather remarkable, but recall that matrix multiplications are really just a bunch of multiplications and additions. So all we really need here is an analog gizmo that can multiply two values together and another analog gizmo to sum up the results.

It turns out that you can build an analog multiplier with a beam splitter and a photodiode. A beam splitter is an optical device that takes two optical inputs and provides two optical outputs. If it is configured in a certain way, the amplitude of light that it outputs on one side will be the sum of the amplitudes of its two inputs; the amplitude of its other output will be the difference of the two inputs. A photodiode outputs an electronic signal that is proportional to the intensity of the light impinging on it.

The essential thing to realize here is that the intensity of light (a measure of the power it carries) is proportional to its amplitude squared. That’s key because if you square the sum of two light signals (let’s denote this as $A + B$), you will get $A^2 + 2AB + B^2$. If you square the difference of these same two light signals ($A - B$), you will get $A^2 - 2AB + B^2$. Subtract the latter from the former and you get $4AB$, which you will notice is proportional to the product of the two inputs, $A$ and $B$.
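The arithmetic behind this trick can be checked with a few lines of Python. This is an idealized sketch, not a model of the MIT design: the function names are hypothetical, and noise, losses, and the scaling factors of a real beam splitter are all ignored.

```python
def beam_splitter(a, b):
    """Idealized splitter: returns the (sum, difference) of two amplitudes."""
    return a + b, a - b


def photodiode(amplitude):
    """Idealized detector: output proportional to intensity (amplitude squared)."""
    return amplitude ** 2


def analog_multiply(a, b):
    """Recover a * b from the two detected intensities."""
    total, diff = beam_splitter(a, b)
    # (a + b)^2 - (a - b)^2 = 4ab, so divide by 4 to rescale.
    return (photodiode(total) - photodiode(diff)) / 4


print(analog_multiply(3.0, 0.5))   # 1.5
print(analog_multiply(-2.0, 4.0))  # -8.0
```

Note that the sign of the product survives, even though each photodiode on its own only ever reports a non-negative intensity.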

So by scaling your analog signals appropriately, a beam splitter and photodiode in combination can serve as an analog multiplier. What’s more, you can do a series of multiplications just by presenting the appropriate light signals, one after the other, to this kind of multiplier. Feed the series of electronic outputs of your multiplier into a capacitor and you’ll be adding up the results of each multiplication, forming the result you need to define one element in the product matrix. Rinse and repeat enough times, and you have just multiplied two matrices!
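The accumulate step can be sketched the same way. In the snippet below, a running total stands in for the charge building up on a capacitor; `optical_dot` and `optical_matmul` are hypothetical names, and the model is again noise-free and idealized.

```python
def optical_dot(xs, ws):
    """Sum a stream of analog products, as charge would build on a capacitor."""
    charge = 0.0
    for x, w in zip(xs, ws):
        # One time step: detect the sum and difference intensities,
        # subtract them, and rescale by 4 to get x * w.
        charge += ((x + w) ** 2 - (x - w) ** 2) / 4
    return charge


def optical_matmul(a, b):
    """Each output element is one capacitor's accumulated charge."""
    columns = list(zip(*b))
    return [[optical_dot(row, col) for col in columns] for row in a]


print(optical_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
```

Each element of the product matrix is formed by presenting one row and one column, pair by pair, to the multiplier and letting the results pile up.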

There are some other mathematical manipulations, too, that you’d need to run a neural network; in particular you have to apply a non-linear activation function to each neuron. But that can easily be done electronically. The question is what kind of signal-to-noise ratio a real device could maintain while doing all this, which will control the resolution of the calculations it performs. That resolution might not end up being very high. “That’s a downside of any analog system,” says Hamerly. Happily, at least for inference calculations (during which a neural network that has already been trained does its thing), relatively low resolution is normally fine.
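As a rough illustration of both points, the sketch below applies a common activation function (ReLU, chosen here as an assumption; the article does not say which one would be used) to a set of made-up layer outputs, once at full precision and once after rounding to a coarse step that stands in for limited analog resolution.

```python
def relu(values):
    """A common non-linear activation: negative inputs become zero."""
    return [max(0.0, v) for v in values]


def quantize(values, step):
    """Round each value to the nearest multiple of step (coarse resolution)."""
    return [round(v / step) * step for v in values]


exact = [0.12, -0.87, 2.41, 1.02]    # hypothetical layer outputs
coarse = quantize(exact, step=0.25)  # what a low-resolution device might report

print(relu(exact))   # [0.12, 0.0, 2.41, 1.02]
print(relu(coarse))  # [0.0, 0.0, 2.5, 1.0]

# In this example the strongest output is the same either way.
print(relu(exact).index(max(relu(exact))) == relu(coarse).index(max(relu(coarse))))  # True
```

This single example proves nothing in general, but it shows why a classification decision can survive a fair amount of rounding.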

It’s hard to know how fast an electro-optical accelerator chip designed along these lines would compute, explains Hamerly, because the metric normally used to judge such performance depends on both throughput and chip area, and he isn’t yet prepared to estimate what sort of area the chip he is envisioning would require. But he’s optimistic that this approach could slash the energy required for such calculations.

Indeed, Hamerly and his colleagues argue that their approach could use less energy than even the theoretical minimum for a gate-based digital device of equivalent accuracy—a value known as the Landauer limit. (It’s impossible to reduce the energy of computation to anything less than this limit without resorting to some form of reversible computing.) If that’s true for this or any other optical accelerator on the drawing board, many neural network calculations would no doubt be done using light rather than just electrons.

With the remarkable advances electronic computers have made over the past 50 years, optical computing never really gained traction, but maybe neural networks will finally provide the killer app for it. As Hamerly’s colleague and coauthor Liane Bernstein notes: “This could be the time for optics.”

## Why Functional Programming Should Be the Future of Software Development

### It’s hard to learn, but your code will produce fewer nasty surprises

Shira Inbar

You’d expect the longest and most costly phase in the lifecycle of a software product to be the initial development of the system, when all those great features are first imagined and then created. In fact, the hardest part comes later, during the maintenance phase. That’s when programmers pay the price for the shortcuts they took during development.

So why did they take shortcuts? Maybe they didn’t realize that they were cutting any corners. Only when their code was deployed and exercised by a lot of users did its hidden flaws come to light. And maybe the developers were rushed. Time-to-market pressures all but guarantee that software will contain more bugs than it otherwise would.

The struggle that most companies have maintaining code causes a second problem: fragility. Every new feature that gets added to the code increases its complexity, which then increases the chance that something will break. It’s common for software to grow so complex that the developers avoid changing it more than is absolutely necessary for fear of breaking something. In many companies, whole teams of developers are employed not to develop anything new but just to keep existing systems going. You might say that they run a software version of the Red Queen’s race, running as fast as they can just to stay in the same place.

It’s a sorry situation. Yet the current trajectory of the software industry is toward increasing complexity, longer product-development times, and greater fragility of production systems. To address such issues, companies usually just throw more people at the problem: more developers, more testers, and more technicians who intervene when systems fail.

Surely there must be a better way. I’m part of a growing group of developers who think the answer could be functional programming. Here I describe what functional programming is, why using it helps, and why I’m so enthusiastic about it.

## With functional programming, less is more

A good way to understand the rationale for functional programming is by considering something that happened more than a half century ago. In the late 1960s, a programming paradigm emerged that aimed to improve the quality of code while reducing the development time needed. It was called structured programming.

Various languages emerged to foster structured programming, and some existing languages were modified to better support it. One of the most notable features of these structured-programming languages was not a feature at all: It was the absence of something that had been around a long time— the GOTO statement.

The GOTO statement is used to redirect program execution. Instead of carrying out the next statement in sequence, the flow of the program is redirected to some other statement, the one specified in the GOTO line, typically when some condition is met.

The elimination of the GOTO was based on what programmers had learned from using it—that it made the program very hard to understand. Programs with GOTOs were often referred to as spaghetti code because the sequence of instructions that got executed could be as hard to follow as a single strand in a bowl of spaghetti.

The inability of these developers to understand how their code worked, or why it sometimes didn’t work, was a complexity problem. Software experts of that era believed that those GOTO statements were creating unnecessary complexity and that the GOTO had to, well, go.

Back then, this was a radical idea, and many programmers resisted the loss of a statement that they had grown to rely on. The debate went on for more than a decade, but in the end, the GOTO went extinct, and no one today would argue for its return. That’s because its elimination from higher-level programming languages greatly reduced complexity and boosted the reliability of the software being produced. It did this by limiting what programmers could do, which ended up making it easier for them to reason about the code they were writing.

Although the software industry has eliminated GOTO from modern higher-level languages, software nevertheless continues to grow in complexity and fragility. Looking for how else such programming languages could be modified to avoid some common pitfalls, software designers can find inspiration, curiously enough, from their counterparts on the hardware side.

## Nullifying problems with null references

In designing hardware for a computer, you can’t have a resistor shared by, say, both the keyboard and the monitor’s circuitry. But programmers do this kind of sharing all the time in their software. It’s called shared global state: Variables are owned by no one process but can be changed by any number of processes, even simultaneously.

Now, imagine that every time you ran your microwave, your dishwasher’s settings changed from Normal Cycle to Pots and Pans. That, of course, doesn’t happen in the real world, but in software, this kind of thing goes on all the time. Programmers write code that calls a function, expecting it to perform a single task. But many functions have side effects that change the shared global state, giving rise to unexpected consequences.
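The microwave-and-dishwasher scenario is easy to reproduce in code. The sketch below uses Python and invented names (`settings`, `run_microwave_impure`, `run_microwave_pure`) purely for illustration.

```python
settings = {"dishwasher_cycle": "Normal Cycle"}  # shared global state


def run_microwave_impure(seconds):
    """Looks like it only heats food, but also rewrites a shared setting."""
    settings["dishwasher_cycle"] = "Pots and Pans"  # hidden side effect
    return f"heated for {seconds}s"


def run_microwave_pure(seconds):
    """Depends only on its argument and touches nothing else."""
    return f"heated for {seconds}s"


run_microwave_pure(30)
print(settings["dishwasher_cycle"])  # Normal Cycle

run_microwave_impure(30)
print(settings["dishwasher_cycle"])  # Pots and Pans
```

Both functions return the same string, so nothing at the call site reveals which one has quietly changed the dishwasher.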

In hardware, that doesn’t happen because the laws of physics curtail what’s possible. Of course, hardware engineers can mess up, but not like you can with software, where just too many things are possible, for better or worse.

Another complexity monster lurking in the software quagmire is called a null reference, meaning that a reference to a place in memory points to nothing at all. If you try to use this reference, an error ensues. So programmers have to remember to check whether something is null before trying to read or change what it references.
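Python’s `None` plays the role of a null reference, so it can serve for a small demonstration. The data and function names below are made up for this example.

```python
users = {"ada": {"email": "ada@example.com"}}  # hypothetical data


def find_user(name):
    """Returns the user record, or None when there is no such user."""
    return users.get(name)


def email_unchecked(name):
    return find_user(name)["email"]  # fails if find_user returned None


def email_checked(name):
    user = find_user(name)
    if user is None:  # the check the programmer must remember to write
        return "unknown"
    return user["email"]


print(email_checked("bob"))  # unknown
try:
    email_unchecked("bob")
except TypeError as err:
    print("error:", err)
```

Nothing in the language forces the author of `email_unchecked` to write the check, which is exactly the problem.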

Nearly every popular language today has this flaw. The pioneering computer scientist Tony Hoare introduced null references in the ALGOL language back in 1965, and they were later incorporated into numerous other languages. Hoare explained that he did this “simply because it was so easy to implement,” but today he considers it to be a “billion-dollar mistake.” That’s because null references have caused countless bugs when a reference that the programmer expects to be valid is really null.

Software developers need to be extremely disciplined to avoid such pitfalls, and sometimes they don’t take adequate precautions. The architects of structured programming knew this to be true for GOTO statements and left developers no escape hatch. To guarantee the improvements in clarity that GOTO-free code promised, they knew that they’d have to eliminate it entirely from their structured-programming languages.

History is proof that removing a dangerous feature can greatly improve the quality of code. Today, we have a slew of dangerous practices that compromise the robustness and maintainability of software. Nearly all modern programming languages have some form of null references, shared global state, and functions with side effects—things that are far worse than the GOTO ever was.

How can those flaws be eliminated? It turns out that the answer has been around for decades: purely functional programming languages.

Of the top dozen functional-programming languages, Haskell is by far the most popular, judging by the number of GitHub repositories that use these languages.

The first purely functional language to become popular, called Haskell, was created in 1990. So by the mid-1990s, the world of software development really had the solution to the vexing problems it still faces. Sadly, the hardware of the time often wasn’t powerful enough to make use of the solution. But today’s processors can easily manage the demands of Haskell and other purely functional languages.

Indeed, software based on pure functions is particularly well suited to modern multicore CPUs. That’s because pure functions operate only on their input parameters, so one function cannot interfere with another through shared state. This lets the compiler safely generate code that runs efficiently across multiple cores.
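The property that makes this possible can be shown even in Python, although Python is not a purely functional language and the sketch below demonstrates only that the work can be split up safely, not any speedup. The `brightness` function and the pixel data are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def brightness(pixel):
    """Pure: the result depends only on the (r, g, b) argument."""
    r, g, b = pixel
    return (r + g + b) // 3


pixels = [(10, 20, 30), (0, 0, 0), (255, 255, 255), (90, 60, 30)]

# Run the same pure function one item at a time, then across worker threads.
sequential = [brightness(p) for p in pixels]
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(brightness, pixels))

print(sequential == parallel)  # True
```

Because no call can affect any other, the order and placement of the calls is irrelevant to the answer.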

As the name suggests, with purely functional programming, the developer can write only pure functions, which, by definition, cannot have side effects. With this one restriction, you increase stability, open the door to compiler optimizations, and end up with code that’s far easier to reason about.

But what if a function needs to know or needs to manipulate the state of the system? In that case, the state is passed through a long chain of what are called composed functions—functions that pass their outputs to the inputs of the next function in the chain. By passing the state from function to function, each function has access to it and there’s no chance of another concurrent programming thread modifying that state—another common and costly fragility found in far too many programs.
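Here is one way that idea might look, sketched in Python rather than in a purely functional language. The helpers `deposit`, `log`, and `compose` are hypothetical; each step takes the old state and returns a new one instead of modifying anything in place.

```python
from functools import reduce


def deposit(amount):
    """Returns a function mapping an old state to a new state."""
    return lambda state: {**state, "balance": state["balance"] + amount}


def log(message):
    """Returns a function that appends a message to the state's history."""
    return lambda state: {**state, "history": state["history"] + (message,)}


def compose(*steps):
    """Chain steps left to right: each output becomes the next input."""
    return lambda state: reduce(lambda acc, step: step(acc), steps, state)


initial = {"balance": 100, "history": ()}
pipeline = compose(deposit(50), log("deposited 50"), deposit(-30), log("withdrew 30"))
final = pipeline(initial)

print(final)    # {'balance': 120, 'history': ('deposited 50', 'withdrew 30')}
print(initial)  # {'balance': 100, 'history': ()}
```

The original state is untouched at the end, so any other code holding a reference to it sees no surprises.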

### Avoiding Null-Reference Surprises

A comparison of JavaScript and PureScript shows how the latter can help programmers avoid bugs.

Functional programming also has a solution to Hoare’s “billion-dollar mistake,” null references. It addresses that problem by disallowing nulls. Instead, there is a construct usually called Maybe (or Option in some languages). A Maybe can be Nothing or Just some value. Working with Maybes forces developers to always consider both cases. They have no choice in the matter. They must handle the Nothing case every single time they encounter a Maybe. Doing so eliminates the many bugs that null references can spawn.
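The shape of a Maybe can be approximated in Python, with one important caveat: Python itself will not force callers to handle the Nothing case the way a Haskell or PureScript compiler does. The classes and functions below are an illustrative sketch, not any library’s API.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Just(Generic[T]):
    """Wraps a value that is definitely present."""
    value: T


@dataclass(frozen=True)
class Nothing:
    """Stands for the absence of a value."""


Maybe = Union[Just[T], Nothing]


def safe_divide(a: float, b: float) -> "Maybe[float]":
    """Returns Nothing instead of failing when the divisor is zero."""
    return Nothing() if b == 0 else Just(a / b)


def with_default(maybe: "Maybe[T]", default: T) -> T:
    """The caller must say what happens in the Nothing case."""
    if isinstance(maybe, Just):
        return maybe.value
    return default


print(with_default(safe_divide(10, 4), 0.0))  # 2.5
print(with_default(safe_divide(10, 0), 0.0))  # 0.0
```

The key difference from a null reference is that the result of `safe_divide` is never a bare number, so it cannot be used as one by accident.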

Functional programming also requires that data be immutable, meaning that once you set a variable to some value, it is forever that value. Variables are more like variables in math. For example, to compute a formula, $y = x^2 + 2x - 11$, you pick a value for $x$ and at no time during the computation of $y$ does $x$ take on a different value. So, the same value for $x$ is used when computing $x^2$ as is used when computing $2x$. In most programming languages, there is no such restriction. You can compute $x^2$ with one value, then change the value of $x$ before computing $2x$. Because developers are barred from changing (mutating) values, they can use the same reasoning they did in middle-school algebra class.
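Python allows mutation by default, but a frozen dataclass gives a taste of the restriction. The `Point` class below is a hypothetical example used only to evaluate the formula from the text.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    x: float


def y(p):
    # x cannot change between the two terms, so both use the same value.
    return p.x ** 2 + 2 * p.x - 11


p = Point(x=3)
print(y(p))  # 4

try:
    p.x = 5  # attempting to mutate a frozen value
except FrozenInstanceError:
    print("x cannot be reassigned")
```

To work with a different value of x, you create a new `Point` rather than altering the old one.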

Unlike most languages, functional programming languages are deeply rooted in mathematics. It’s this lineage in the highly disciplined field of mathematics that gives functional languages their biggest advantages.

Why is that? It’s because people have been working on mathematics for thousands of years. It’s pretty solid. Most programming paradigms, such as object-oriented programming, have at most half a dozen decades of work behind them. They are crude and immature by comparison.

Let me share an example of how programming is sloppy compared with mathematics. We typically teach new programmers to forget what they learned in math class when they first encounter the statement x = x + 1. In math, this equation has zero solutions. But in most of today’s programming languages, x = x + 1 is not an equation. It is a statement that commands the computer to take the value of x, add one to it, and put it back into a variable called x.

In functional programming, there are no statements, only expressions. Mathematical thinking that we learned in middle school can now be employed when writing code in a functional language.

Thanks to functional purity, you can reason about code using algebraic substitution to help reduce code complexity in the same way you reduced the complexity of equations back in algebra class. In non-functional languages (imperative languages), there is no equivalent mechanism for reasoning about how the code works.
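A small Python sketch shows why substitution is safe only when functions are pure. The names `square`, `square_logged`, and `call_log` are invented for this example.

```python
call_log = []  # shared state touched only by the impure version


def square(n):
    """Pure: same input, same output, nothing else touched."""
    return n * n


def square_logged(n):
    """Impure: also records every call in shared state."""
    call_log.append(n)
    return n * n


# With a pure function, these two expressions are interchangeable.
print(square(4) + square(4) == 2 * square(4))  # True

# With the impure one the values match, but the program's state does not.
call_log.clear()
original = square_logged(4) + square_logged(4)
calls_original = len(call_log)

call_log.clear()
rewritten = 2 * square_logged(4)
calls_rewritten = len(call_log)

print(original == rewritten)            # True
print(calls_original, calls_rewritten)  # 2 1
```

Rewriting `f(x) + f(x)` as `2 * f(x)` is ordinary algebra, yet for the impure function it changes what the program does.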

## Functional programming has a steep learning curve

Pure functional programming solves many of our industry’s biggest problems by removing dangerous features from the language, making it harder for developers to shoot themselves in the foot. At first, these limitations may seem drastic, as I’m sure the 1960s developers felt regarding the removal of GOTO. But the fact of the matter is that it’s both liberating and empowering to work in these languages—so much so that nearly all of today’s most popular languages have incorporated functional features, although they remain fundamentally imperative languages.

The biggest problem with this hybrid approach is that it still allows developers to ignore the functional aspects of the language. Had we left GOTO as an option 50 years ago, we might still be struggling with spaghetti code today.

To reap the full benefits of pure functional programming languages, you can’t compromise. You need to use languages that were designed with these principles from the start. Only by adopting them will you get the many benefits that I’ve outlined here.

But functional programming isn’t a bed of roses. It comes at a cost. Learning to program according to this functional paradigm is almost like learning to program again from the beginning. In many cases, developers must familiarize themselves with math that they didn’t learn in school. The required math isn’t difficult—it’s just new and, to the math phobic, scary.

More important, developers need to learn a new way of thinking. At first this will be a burden, because they are not used to it. But with time, this new way of thinking becomes second nature and ends up reducing cognitive overhead compared with the old ways of thinking. The result is a massive gain in efficiency.

But making the transition to functional programming can be difficult. My own journey doing so a few years back is illustrative.

I decided to learn Haskell—and needed to do that on a business timeline. This was the most difficult learning experience of my 40-year career, in large part because there was no definitive source for helping developers make the transition to functional programming. Indeed, no one had written anything very comprehensive about functional programming in the prior three decades.

I was left to pick up bits and pieces from here, there, and everywhere. And I can attest to the gross inefficiencies of that process. It took me three months of days, nights, and weekends living and breathing Haskell. But finally, I got to the point that I could write better code with it than with anything else.

When I decided that our company should switch to using functional languages, I didn’t want to put my developers through the same nightmare. So, I started building a curriculum for them to use, which became the basis for a book intended to help developers transition into functional programmers. In my book, I provide guidance for obtaining proficiency in a functional language called PureScript, which stole all the great aspects of Haskell and improved on many of its shortcomings. In addition, it’s able to operate in both the browser and in a back-end server, making it a great solution for many of today’s software demands.

While such learning resources can only help, for this transition to take place broadly, software-based businesses must invest more in their biggest asset: their developers. At my company, Panoramic Software, where I’m the chief technical officer, we’ve made this investment, and all new work is being done in either PureScript or Haskell.

We started down the road of adopting functional languages three years ago, beginning with another pure functional language called Elm because it is a simpler language. (Little did we know we would eventually outgrow it.) It took us about a year to start reaping the benefits. But since we got over the hump, it’s been wonderful. We have had no production runtime bugs, which were so common in what we were formerly using, JavaScript on the front end and Java on the back. This improvement allowed the team to spend far more time adding new features to the system. Now, we spend almost no time debugging production issues.

But there are still challenges when working with a language that relatively few others use—in particular, the lack of online help, documentation, and example code. And it’s hard to hire developers with experience in these languages. Because of that, my company uses recruiters who specialize in finding functional programmers. And when we hire someone with no background in functional programming, we put them through a training process for the first few months to bring them up to speed.

## Functional programming’s future

My company is small. It delivers software to governmental agencies to enable them to help veterans receive benefits from the U.S. Department of Veterans Affairs. It’s extremely rewarding work, but it’s not a lucrative field. With razor-slim margins, we must use every tool available to us to do more with fewer developers. And for that, functional programming is just the ticket.

It’s very common for unglamorous businesses like ours to have difficulty attracting developers. But we are now able to hire top-tier people because they want to work on a functional codebase. Being ahead of the curve on this trend, we can get talent that most companies our size could only dream of.

I anticipate that the adoption of pure functional languages will improve the quality and robustness of the whole software industry while greatly reducing time wasted on bugs that are simply impossible to generate with functional programming. It’s not magic, but sometimes it feels like that, and I’m reminded of how good I have it every time I’m forced to work with a non-functional codebase.

One sign that the software industry is preparing for a paradigm shift is that functional features are showing up in more and more mainstream languages. It will take much more work for the industry to make the transition fully, but the benefits of doing so are clear, and that is no doubt where things are headed.

This article appears in the December 2022 print issue as “A New Way to Squash Bugs.”