Yann LeCun: AI Doesn't Need Our Supervision

Meta’s AI chief says self-supervised learning can build the metaverse and maybe even human-level AI

When Yann LeCun gives talks, he’s apt to include a slide showing a famous painting of a scene from the French Revolution. Superimposed over a battle scene are these words: "THE REVOLUTION WILL NOT BE SUPERVISED."

LeCun, VP and chief AI scientist of Meta (formerly Facebook), believes that the next AI revolution will come about when AI systems no longer require supervised learning. No longer will they rely on carefully labeled data sets that provide ground truth in order to understand the world and perform their assigned tasks. AI systems need to be able to learn from the world with minimal help from humans, LeCun says. In an email Q&A with IEEE Spectrum, he talked about how self-supervised learning can create more robust AI systems imbued with common sense.

He’ll be exploring this theme tomorrow at a virtual Meta AI event titled Inside the Lab: Building for the Metaverse With AI. That event will feature talks by Mark Zuckerberg, a handful of Meta’s AI scientists, and a discussion between LeCun and Yoshua Bengio about the path to human-level AI.

Yann LeCun. Courtesy Yann LeCun

You’ve said that the limitations of supervised learning are sometimes mistakenly seen as intrinsic limitations of deep learning. Which of these limitations can be overcome with self-supervised learning?

Yann LeCun: Supervised learning works well on relatively well-circumscribed domains for which you can collect large amounts of labeled data, and for which the type of inputs seen during deployment are not too different from the ones used during training. It’s difficult to collect large amounts of labeled data that are not biased in some way. I’m not necessarily talking about societal bias, but about correlations in the data that the system should not be using. A famous example of that is when you train a system to recognize cows and all the examples are cows on grassy fields. The system will use the grass as a contextual cue for the presence of a cow. But if you now show a cow on a beach, it may have trouble recognizing it as a cow.

Self-supervised learning (SSL) allows us to train a system to learn good representation of the inputs in a task-independent way. Because SSL training uses unlabeled data, we can use very large training sets, and get the system to learn more robust and more complete representations of the inputs. It then takes a small amount of labeled data to get good performance on any supervised task. This greatly reduces the necessary amount of labeled data [endemic to] pure supervised learning, and makes the system more robust, and more able to handle inputs that are different from the labeled training samples. It also sometimes reduces the sensitivity of the system to bias in the data—an improvement about which we’ll share more of our insights in research to be made public in the coming weeks.
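The pretrain-on-unlabeled-data, then fine-tune-on-few-labels recipe LeCun describes can be sketched in miniature. The toy below is entirely hypothetical (synthetic data, a linear autoencoder standing in for the SSL pretext task, a nearest-centroid probe standing in for the supervised step; nothing here is Meta's actual system): it pretrains a 2-dimensional representation on 600 unlabeled points, then attaches labels to only 10 of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unlabeled data: a single latent factor t, embedded in 5 dimensions.
# (Purely illustrative -- a stand-in for a large pool of unlabeled inputs.)
t = rng.uniform(-1, 1, size=(600, 1))
X = np.hstack([2 * t, t**2, t**3, np.sin(3 * t), np.cos(3 * t)])
X += 0.01 * rng.normal(size=X.shape)

# Self-supervised pretext task: reconstruct the input through a 2-unit
# bottleneck (a linear autoencoder), trained by plain gradient descent.
n, d_in, d_code = len(X), X.shape[1], 2
W_enc = 0.1 * rng.normal(size=(d_in, d_code))
W_dec = 0.1 * rng.normal(size=(d_code, d_in))

def recon_loss():
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

loss_before = recon_loss()
for _ in range(1500):
    H = X @ W_enc                  # codes: the learned representation
    err = H @ W_dec - X            # reconstruction error
    g_dec = (2.0 / n) * H.T @ err
    g_enc = (2.0 / n) * X.T @ (err @ W_dec.T)
    W_dec -= 0.05 * g_dec
    W_enc -= 0.05 * g_enc
loss_after = recon_loss()          # pretext loss should have dropped

# Downstream task with only 10 labels: classify sign(t) from the frozen
# pretrained codes with a nearest-class-centroid probe.
y = (t[:, 0] > 0).astype(int)
Z = X @ W_enc
pos = np.where(y == 1)[0][:5]      # 5 labeled examples per class
neg = np.where(y == 0)[0][:5]
c1, c0 = Z[pos].mean(axis=0), Z[neg].mean(axis=0)
pred = (np.linalg.norm(Z - c1, axis=1)
        < np.linalg.norm(Z - c0, axis=1)).astype(int)
accuracy = float((pred == y).mean())
```

Only the 10 labeled points touch the supervised step; everything else is learned from the unlabeled pool. That is the shape of the recipe, not its scale.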

What’s happening now in practical AI systems is that we are moving toward larger architectures that are pretrained with SSL on large amounts of unlabeled data. These can be used for a wide variety of tasks. For example, Meta AI now has language-translation systems that can handle a couple hundred languages. It’s a single neural net! We also have multilingual speech-recognition systems. These systems can deal with languages for which we have very little data, let alone annotated data.

Other leading figures have said that the way forward for AI is improving supervised learning with better data labeling. Andrew Ng recently talked to me about data-centric AI, and Nvidia’s Rev Lebaredian talked to me about synthetic data that comes with all the labels. Is there division in the field about the path forward?

LeCun: I don’t think there is a philosophical division. SSL pretraining is very much standard practice in NLP [natural language processing]. It has shown excellent performance improvements in speech recognition, and it’s starting to become increasingly useful in vision. Yet there are still many unexplored applications of “classical” supervised learning, so one should certainly use synthetic data with supervised learning whenever possible. That said, Nvidia is actively working on SSL.

Back in the mid-2000s, Geoff Hinton, Yoshua Bengio, and I were convinced that the only way we would be able to train very large and very deep neural nets was through self-supervised (or unsupervised) learning. This is when Andrew Ng started being interested in deep learning. His work at the time also focused on methods that we would now call self-supervised.

How could self-supervised learning lead to AI systems with common sense? How far can common sense take us toward human-level intelligence?

LeCun: I think significant progress in AI will come once we figure out how to get machines to learn how the world works like humans and animals do: mostly by watching it, and a bit by acting in it. We understand how the world works because we have learned an internal model of the world that allows us to fill in missing information, predict what’s going to happen, and predict the effects of our actions. Our world model enables us to perceive, interpret, reason, plan ahead, and act. How can machines learn world models?

This comes down to two questions: What learning paradigm should we use to train world models? And what architecture should world models use? To the first question, my answer is SSL. An instance of that would be to get a machine to watch a video, stop the video, and get the machine to learn a representation of what’s going to happen next in the video. In doing so, the machine may learn enormous amounts of background knowledge about how the world works, perhaps similarly to how baby humans and animals learn in the first weeks and months of life.

To the second question, my answer is a new type of deep macro-architecture that I call Hierarchical Joint Embedding Predictive Architecture (H-JEPA). It would be a bit too long to explain here in detail, but let’s just say that instead of predicting future frames of a video clip, a JEPA learns abstract representations of the video clip and the future of the clip so that the latter is easily predictable based on its understanding of the former. This can be made to work using some of the latest developments in non-contrastive SSL methods, particularly a method that my colleagues and I recently proposed called VICReg (Variance, Invariance, Covariance Regularization).
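VICReg's loss is published and compact enough to sketch. The NumPy toy below follows the three named terms (variance, invariance, covariance) on random stand-in embeddings; the coefficient values and exact reductions here are illustrative, so treat this as a sketch of the idea rather than the paper's training setup.

```python
import numpy as np

def vicreg_terms(za, zb, gamma=1.0, eps=1e-4):
    """The three VICReg terms for a batch of paired embeddings.

    za, zb: (batch, dim) embeddings of two views of the same inputs.
    """
    n, d = za.shape
    # Invariance: embeddings of two views of the same input should match.
    invariance = float(np.mean((za - zb) ** 2))

    # Variance: a hinge keeps each dimension's std above gamma, which
    # blocks the trivial "collapse everything to a constant" solution.
    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return float(np.mean(np.maximum(0.0, gamma - std)))

    # Covariance: penalize off-diagonal covariance so that the
    # embedding dimensions carry decorrelated information.
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = zc.T @ zc / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return float(np.sum(off_diag ** 2) / d)

    variance = 0.5 * (var_term(za) + var_term(zb))
    covariance = 0.5 * (cov_term(za) + cov_term(zb))
    return invariance, variance, covariance

def vicreg_loss(za, zb, lam=25.0, mu=25.0, nu=1.0):
    inv, var, cov = vicreg_terms(za, zb)
    return lam * inv + mu * var + nu * cov

# Well-spread embeddings of two slightly different "views": the variance
# hinge is inactive, and only invariance/covariance contribute.
rng = np.random.default_rng(1)
za = 2.0 * rng.normal(size=(256, 8))
zb = za + 0.1 * rng.normal(size=za.shape)
inv, var, cov = vicreg_terms(za, zb)

# Collapsed embeddings (every input maps to the same point) are exactly
# what the variance term punishes.
zeros = np.zeros((256, 8))
inv_c, var_c, cov_c = vicreg_terms(zeros, zeros)
```

The non-contrastive point is visible in the signature: no negative pairs are needed, because the variance and covariance terms rule out collapse on their own.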

A few weeks ago, you responded to a tweet from OpenAI’s Ilya Sutskever in which he speculated that today’s large neural networks may be slightly conscious. Your response was a resounding “Nope.” In your opinion, what would it take to build a neural network that qualifies as conscious? What would that system look like?

LeCun: First of all, consciousness is a very ill-defined concept. Some philosophers, neuroscientists, and cognitive scientists think it’s a mere illusion, and I’m pretty close to that opinion.

But I have a speculation about what causes the illusion of consciousness. My hypothesis is that we have a single world model “engine” in our prefrontal cortex. That world model is configurable to the situation at hand. We are at the helm of a sailboat; our world model simulates the flow of air and water around our boat. We build a wooden table; our world model imagines the result of cutting pieces of wood and assembling them, etc. There needs to be a module in our brains, that I call the configurator, that sets goals and subgoals for us, configures our world model to simulate the situation at hand, and primes our perceptual system to extract the relevant information and discard the rest. The existence of an overseeing configurator might be what gives us the illusion of consciousness. But here is the funny thing: We need this configurator because we only have a single world model engine. If our brains were large enough to contain many world models, we wouldn’t need consciousness. So, in that sense, consciousness is an effect of the limitation of our brain!

What role will self-supervised learning play in building the metaverse?

LeCun: There are many specific applications of deep learning for the metaverse, some of which are things like motion tracking for VR goggles and AR glasses, capturing and resynthesizing body motion and facial expressions, etc.

There are large opportunities for new AI-powered creative tools that will allow everyone to create new things in the metaverse, and in the real world too. But there is also an “AI-complete” application for the metaverse: virtual AI assistants. We should have virtual AI assistants that can help us in our daily lives, answer any question we have, and help us deal with the deluge of information that bombards us every day. For that, we need our AI systems to possess some understanding of how the world works (physical or virtual), some ability to reason and plan, and some level of common sense. In short, we need to figure out how to build autonomous AI systems that can learn like humans do. This will take time. But Meta is playing a long game here.


The Inner Beauty of Basic Electronics

Open Circuits showcases the surprising complexity of passive components

A photo of a high-stability film resistor with the letters "MIS" in yellow.
All photos by Eric Schlaepfer & Windell H. Oskay

Eric Schlaepfer was trying to fix a broken piece of test equipment when he came across the cause of the problem—a troubled tantalum capacitor. The component had somehow shorted out, and he wanted to know why. So he polished it down for a look inside. He never found the source of the short, but he and his collaborator, Windell H. Oskay, discovered something even better: a breathtaking hidden world inside electronics. What followed were hours and hours of polishing, cleaning, and photography that resulted in Open Circuits: The Inner Beauty of Electronic Components (No Starch Press, 2022), an excerpt of which follows. As the authors write, everything about these components is deliberately designed to meet specific technical needs, but that design leads to “accidental beauty: the emergent aesthetics of things you were never expected to see.”

From a book that spans the wide world of electronics, what we at IEEE Spectrum found surprisingly compelling were the insides of things we don’t spend much time thinking about, passive components. Transistors, LEDs, and other semiconductors may be where the action is, but the simple physics of resistors, capacitors, and inductors have their own sort of splendor.

High-Stability Film Resistor

This high-stability film resistor, about 4 millimeters in diameter, is made in much the same way as its inexpensive carbon-film cousin, but with exacting precision. A ceramic rod is coated with a fine layer of resistive film (thin metal, metal oxide, or carbon) and then a perfectly uniform helical groove is machined into the film.

Instead of coating the resistor with an epoxy, it’s hermetically sealed in a lustrous little glass envelope. This makes the resistor more robust, ideal for specialized cases such as precision reference instrumentation, where long-term stability of the resistor is critical. The glass envelope provides better isolation against moisture and other environmental changes than standard coatings like epoxy.

15-Turn Trimmer Potentiometer

A photo of a blue chip
A photo of a blue chip on a circuit board.

It takes 15 rotations of an adjustment screw to move a 15-turn trimmer potentiometer from one end of its resistive range to the other. Circuits that need to be adjusted with fine resolution control use this type of trimmer pot instead of the single-turn variety.

The resistive element in this trimmer is a strip of cermet—a composite of ceramic and metal—silk-screened on a white ceramic substrate. Screen-printed metal links each end of the strip to the connecting wires. It’s a flattened, linear version of the horseshoe-shaped resistive element in single-turn trimmers.

Turning the adjustment screw moves a plastic slider along a track. The wiper is a spring finger, a spring-loaded metal contact, attached to the slider. It makes contact between a metal strip and the selected point on the strip of resistive film.

Ceramic Disc Capacitor

A cutaway of a Ceramic Disc Capacitor
A photo of a Ceramic Disc Capacitor

Capacitors are fundamental electronic components that store energy in the form of static electricity. They’re used in countless ways, including for bulk energy storage, to smooth out electronic signals, and as computer memory cells. The simplest capacitor consists of two parallel metal plates with a gap between them, but capacitors can take many forms so long as there are two conductive surfaces, called electrodes, separated by an insulator.
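For a back-of-the-envelope sense of scale (the numbers below are illustrative, not taken from the book), the textbook parallel-plate formula C = ε0·εr·A/d shows why a thin, high-permittivity ceramic disc packs useful capacitance into a small package:

```python
# Back-of-the-envelope numbers for a small ceramic disc (illustrative):
# parallel-plate capacitance, C = eps0 * eps_r * A / d.
eps0 = 8.854e-12   # permittivity of free space, F/m
eps_r = 1000.0     # relative permittivity; capacitor ceramics can reach
                   # the hundreds or thousands (e.g., barium titanate)
A = 5e-5           # plate area in m^2, a disc roughly 8 mm in diameter
d = 2e-4           # dielectric thickness in m (0.2 mm)

C = eps0 * eps_r * A / d      # capacitance in farads, about 2.2 nF
energy = 0.5 * C * 50.0**2    # stored energy in joules at 50 V, E = C*V^2/2
print(f"C = {C * 1e9:.2f} nF, energy at 50 V = {energy * 1e6:.2f} uJ")
```

Halving the dielectric thickness or doubling the permittivity doubles the capacitance, which is why the ceramic recipe matters as much as the geometry.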

A ceramic disc capacitor is a low-cost capacitor that is frequently found in appliances and toys. Its insulator is a ceramic disc, and its two parallel plates are extremely thin metal coatings that are evaporated or sputtered onto the disc’s outer surfaces. Connecting wires are attached using solder, and the whole assembly is dipped into a porous coating material that dries hard and protects the capacitor from damage.

Film Capacitor

An image of a cut away of a capacitor
A photo of a green capacitor.

Film capacitors are frequently found in high-quality audio equipment, such as headphone amplifiers, record players, graphic equalizers, and radio tuners. Their key feature is that the dielectric material is a plastic film, such as polyester or polypropylene.

The metal electrodes of this film capacitor are vacuum-deposited on the surfaces of long strips of plastic film. After the leads are attached, the films are rolled up and dipped into an epoxy that binds the assembly together. Then the completed assembly is dipped in a tough outer coating and marked with its value.

Other types of film capacitors are made by stacking flat layers of metallized plastic film, rather than rolling up layers of film.

Dipped Tantalum Capacitor

A photo of a cutaway of a Dipped Tantalum Capacitor

At the core of this capacitor is a porous pellet of tantalum metal. The pellet is made from tantalum powder that is pressed and then sintered (heated until the particles fuse) into a dense, spongelike solid.

Just like a kitchen sponge, the resulting pellet has a high surface area per unit volume. The pellet is then anodized, creating an insulating oxide layer with an equally high surface area. This process packs a lot of capacitance into a compact device, using spongelike geometry rather than the stacked or rolled layers that most other capacitors use.

The device’s positive terminal, or anode, is connected directly to the tantalum metal. The negative terminal, or cathode, is formed by a thin layer of conductive manganese dioxide coating the pellet.

Axial Inductor

An image of a cutaway of a Axial Inductor
A photo of a collection of cut wires

Inductors are fundamental electronic components that store energy in the form of a magnetic field. They’re used, for example, in some types of power supplies to convert between voltages by alternately storing and releasing energy. This energy-efficient design helps maximize the battery life of cellphones and other portable electronics.

Inductors typically consist of a coil of insulated wire wrapped around a core of magnetic material like iron or ferrite, a ceramic filled with iron oxide. Current flowing around the core produces a magnetic field that acts as a sort of flywheel for current, smoothing out changes in the current as it flows through the inductor.

This axial inductor has a number of turns of varnished copper wire wrapped around a ferrite form and soldered to copper leads on its two ends. It has several layers of protection: a clear varnish over the windings, a light-green coating around the solder joints, and a striking green outer coating to protect the whole component and provide a surface for the colorful stripes that indicate its inductance value.

Power Supply Transformer

A photo of a collection of cut wires
A photo of a yellow element on a circuit board.

This transformer has multiple sets of windings and is used in a power supply to create multiple output AC voltages from a single AC input such as a wall outlet.

The small wires nearer the center are “high impedance” turns of magnet wire. These windings carry a higher voltage but a lower current. They’re protected by several layers of tape, a copper-foil electrostatic shield, and more tape.

The outer “low impedance” windings are made with thicker insulated wire and fewer turns. They handle a lower voltage but a higher current.
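The pairing of many fine-wire turns with few thick-wire turns is the ideal-transformer turns ratio at work. A quick sketch with made-up winding counts (illustrative, not measured from the pictured part):

```python
# Ideal-transformer bookkeeping with made-up winding counts (illustrative).
# Voltage scales with the turns ratio, and for a lossless transformer
# power in equals power out, so the few-turn, thick-wire secondary
# carries proportionally more current than the many-turn primary.
N_primary = 200        # "high impedance" winding: many turns of fine wire
N_secondary = 10       # "low impedance" winding: few turns of thick wire
V_primary = 120.0      # volts across the primary

V_secondary = V_primary * N_secondary / N_primary    # 6.0 V at the output
I_secondary = 2.0                                    # amps drawn by a load
I_primary = I_secondary * N_secondary / N_primary    # about 0.1 A reflected
P_in = V_primary * I_primary                         # power into the primary
P_out = V_secondary * I_secondary                    # power out; equal when
                                                     # losses are ignored
```

The thick wire on the secondary is there for the higher current, and the fine wire on the primary suffices for the lower one, just as the caption describes.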

All of the windings are wrapped around a black plastic bobbin. Two pieces of ferrite ceramic are bonded together to form the magnetic core at the heart of the transformer.

This article appears in the February 2023 print issue.