Today's boom in AI is centered around a technique called deep learning, which is powered by artificial neural networks. Here's a graphical explanation of how these neural networks are structured and trained.


Diagram of an artificial neuron and its connections

Each neuron in an artificial neural network sums its inputs and applies an activation function to determine its output. This architecture was inspired by what goes on in the brain, where neurons transmit signals between one another via synapses.

A structure of lines connecting dots.


Here's the structure of a hypothetical feed-forward deep neural network ("deep" because it contains multiple hidden layers). This example shows a network that interprets images of hand-written digits and classifies them as one of the 10 possible numerals.

The input layer contains many neurons, each of which has an activation set to the gray-scale value of one pixel in the image. These input neurons are connected to neurons in the next layer, passing on their activation levels after they have been multiplied by a certain value, called a weight. Each neuron in the second layer sums its many inputs and applies an activation function to determine its output, which is fed forward in the same manner.


This kind of neural network is trained by calculating the difference between the actual output and the desired output. The mathematical optimization problem here has as many dimensions as there are adjustable parameters in the network—primarily the weights of the connections between neurons, which can be positive [blue lines] or negative [red lines].

Training the network is essentially finding a minimum of this multidimensional "loss" or "cost" function. It's done iteratively over many training runs, incrementally changing the network's state. In practice, that entails making many small adjustments to the network's weights based on the outputs that are computed for a random set of input examples, each time starting with the weights that control the output layer and moving backward through the network. (Only the connections to a single neuron in each layer are shown here, for simplicity.) This backpropagation process is repeated over many random sets of training examples until the loss function is minimized, and the network then provides the best results it can for any new input.


Two colums of dots and a column of numbers.


When presented with a handwritten "3" at the input, the output neurons of an untrained network will have random activations. The desire is for the output neuron associated with 3 to have high activation [dark shading] and other output neurons to have low activations [light shading]. So the activation of the neuron associated with 3, for example, must be increased [purple arrow].

A column of dots all connected to one by lines.


To do that, the weights of the connections from the neurons in the second hidden layer to the output neuron for the digit "3" should be made more positive [black arrows], with the size of the change being proportional to the activation of the connected
hidden neuron.

Column of doos connected to one dot with arrows going down.


A similar process is then performed for the neurons in the second hidden layer. For example, to make the network more accurate, the top neuron in this layer may need to have its activation reduced [green arrow]. The network can be pushed in that direction by adjusting the weights of its connections with the first hidden layer [black arrows].

Two vertical sets of dots connected to lines to a single dot.


The process is then repeated for the first hidden layer. For example, the first neuron in this layer may need to have its activation increased [orange arrow].

Special Report: The Great AI Reckoning

READ NEXT: How DeepMind Is Reinventing the Robot

Or see the full report for more articles on the future of AI.

The Conversation (1)
Steven Dixon 21 Nov, 2021

Very good articles. Enlightening.

Andrew Ng: Unbiggen AI

The AI pioneer says it’s time for smart-sized, “data-centric” solutions to big issues

10 min read
​Andrew Ng listens during the Power of Data: Sooner Than You Think global technology conference in Brooklyn, New York, on Wednesday, October 30, 2019.

Andrew Ng was involved in the rise of massive deep learning models trained on vast amounts of data, but now he’s preaching small-data solutions.

Cate Dingley/Bloomberg/Getty Images

Andrew Ng has serious street cred in artificial intelligence. He pioneered the use of graphics processing units (GPUs) to train deep learning models in the late 2000s with his students at Stanford University, cofounded Google Brain in 2011, and then served for three years as chief scientist for Baidu, where he helped build the Chinese tech giant’s AI group. So when he says he has identified the next big shift in artificial intelligence, people listen. And that’s what he told IEEE Spectrum in an exclusive Q&A.

Keep Reading ↓ Show less