This 6-Million-Dollar AI Changes Accents as You Speak

Three international Stanford undergrads start company to “help the world understand”

Picture of three young men sitting outside on a bench in front of bushes and greenery

In 2020, Stanford students Shawn Zhang, Maxim Serebryakov, and Andres Perez Soderi [left to right] founded the AI-powered accent-translation company Sanas.

Sanas

Stanford University prides itself on its international diversity, touting that today's undergraduates hail from 70 countries. So a friend group that included a computer science major from China, an AI-focused management science and engineering (MSE) major from Russia, and a business-oriented MSE major from Venezuela isn't an anomaly. The friends did the normal things Stanford students do with their free time, like fountain hopping, cheering at football games, and hiking the trail around the Stanford Dish radio telescope.

And then came the pandemic.

"Stanford went virtual," Andres Perez Soderi recalls. (He's the member of the trio from Venezuela.) "And we scattered around the Bay Area, to San Francisco, Pleasanton, as well as Palo Alto, and we were keeping in touch online. School just isn't fulfilling when you aren't physically there, and we had a lot of time on our hands."

They also had an idea, sparked by a conversation with another friend, a computer science major who had gone back to his home in Guatemala, where he had gotten a job at a call center doing tech support in order to support his family.


"When he got the job," Soderi said, "we told him that he'd be the best tech support person they'd ever had; he's the smartest guy we've met and always had a smile on his face."

But the job didn't last—his customer satisfaction numbers were too low, because callers struggled to understand his accent and would lash out in frustration.

Given that the three spoke English with vastly different accents, the problem hit home.

"We decided to help the world understand and be understood," Soderi said.

They dedicated their empty pandemic hours to building a solution.

"We did a lot of research around what people have done in the past. People have done voice conversion for deep fakes, and that technology is pretty advanced. But there's been little done in accent translation. So, say, if I used an existing system to make me sound like Batman, I would sound like a Chinese-accented Batman," says Shawn Zhang, the member of the trio from China.

"We knew about accent-reduction therapy and being taught to emulate the way someone else speaks in order to connect with them. And we knew from our own experience that forcing a different accent on yourself is uncomfortable. I went to a British high school and tried to force a British accent; it was an experience that was hard to digest. We thought if we could allow software to translate the accent [instead], we could let people speak naturally," says Soderi.

"Our first approach was naïve," Zhang says. "We built a system that converted speech to text and then text to speech." That wasn't going to be particularly useful for real-time conversation, their ultimate goal. So they began thinking about how to structure data to use in training a neural network to convert accents directly, speech to speech. They reached out to professors at Stanford and experts in industry to advise them.

And they filed the paperwork to incorporate as a company, Sanas. (Incorporating is another step that isn't unusual when Stanford undergrads start tinkering with anything.)

The name came from a hunt through random syllables, looking for something that sounded good and was available to use. Sanas jumped out because it is a palindrome—and it turned out to refer to whispers or sounds in some forms of ancient Latin. They assigned the CTO title to Zhang, CFO to Soderi, and CEO to Maxim Serebryakov.

That all happened in the first half of 2020, and things have continued to move quickly. Sanas now has a full-time engineering staff of 14, including the founders, and three more part-time developers, plus two employees working on the business side. All now work remotely, spread out internationally. The company completed a seed funding round of US $5.5 million in late May, a few months shy of Zhang's twenty-first birthday, bringing total investment to about $6 million.

Baris Akis, the president and co-founder of Human Capital, who led the seed round, stated at the time: "As an immigrant from Turkey, I've always felt that getting rid of the accent barrier was a critical next step for a more fair and prosperous world."

Today, Sanas has an algorithm that can shift English to and from American, Australian, British, Filipino, and Spanish accents. They developed it using a neural network, trained with recordings made, for the most part, by professional voice actors.

Says Zhang, "You aren't just doing audio signal processing, changing the pitch and tone. You have to change the phonetics. So we really needed parallel data sets, created by readers using the same source material, so the neural network could learn to map from one to the other, examining both to learn how to transform the pronunciation."

The algorithm runs locally on a CPU (not in the cloud), with 150 milliseconds of delay, at the speech quality of telephone audio, working alongside communications apps like Zoom, Skype, and WhatsApp. A typical Zoom delay is about 50 milliseconds, bringing the total delay to about 200 milliseconds. Soderi indicated that generally anything below 300 to 350 milliseconds is imperceptible in audio communications, so users don't notice a lag. And the algorithm is efficient in terms of CPU usage.
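The delay figures quoted above add up as a simple budget. The sketch below is only an illustration of that arithmetic: it assumes the two delays add end to end, and it takes the lower end of the quoted 300-to-350-millisecond range as the threshold. The constant and function names are made up for this example.

```python
# Latency budget for the figures quoted in the article (all values in ms).
ACCENT_DELAY_MS = 150      # processing delay quoted for the accent algorithm
CALL_DELAY_MS = 50         # typical Zoom delay quoted in the article
PERCEPTION_FLOOR_MS = 300  # lower end of the quoted 300-to-350 ms range


def total_delay_ms(processing_ms: float, transport_ms: float) -> float:
    """Sum the two delays, assuming they simply add end to end."""
    return processing_ms + transport_ms


total = total_delay_ms(ACCENT_DELAY_MS, CALL_DELAY_MS)
headroom = PERCEPTION_FLOOR_MS - total  # margin left before a lag is noticed
print(f"total delay: {total} ms, headroom: {headroom} ms")
```

Under these assumptions the call still has about 100 milliseconds of margin before the delay reaches the range Soderi describes as noticeable.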

But, Zhang admits, there's plenty of room for improvement. "We are trying to make [it] more clear, natural, and pleasant to hear; it's an ongoing process."

The team plans to add more accents within English, but also work with accents of other languages, including Spanish and French.

Their first customers will be among outsourcing companies, the kinds hired to provide customer service and other telephone support functions. Seven such firms are currently piloting the system.

"But that's just our first use case," says Zhang, "because it is a measurable and controlled environment. We don't see ourselves as a call center company; we want to go into healthcare, entertainment, education, and other spaces. We want to develop this as a tool that helps people with human-to-human interaction, without hurting their cultural identities."

The Inner Beauty of Basic Electronics

Open Circuits showcases the surprising complexity of passive components

A photo of a high-stability film resistor with the letters "MIS" in yellow.
All photos by Eric Schlaepfer & Windell H. Oskay

Eric Schlaepfer was trying to fix a broken piece of test equipment when he came across the cause of the problem—a troubled tantalum capacitor. The component had somehow shorted out, and he wanted to know why. So he polished it down for a look inside. He never found the source of the short, but he and his collaborator, Windell H. Oskay, discovered something even better: a breathtaking hidden world inside electronics. What followed were hours and hours of polishing, cleaning, and photography that resulted in Open Circuits: The Inner Beauty of Electronic Components (No Starch Press, 2022), an excerpt of which follows. As the authors write, everything about these components is deliberately designed to meet specific technical needs, but that design leads to “accidental beauty: the emergent aesthetics of things you were never expected to see.”

From a book that spans the wide world of electronics, what we at IEEE Spectrum found surprisingly compelling were the insides of things we don’t spend much time thinking about: passive components. Transistors, LEDs, and other semiconductors may be where the action is, but the simple physics of resistors, capacitors, and inductors have their own sort of splendor.

High-Stability Film Resistor

This high-stability film resistor, about 4 millimeters in diameter, is made in much the same way as its inexpensive carbon-film cousin, but with exacting precision. A ceramic rod is coated with a fine layer of resistive film (thin metal, metal oxide, or carbon) and then a perfectly uniform helical groove is machined into the film.

Instead of coating the resistor with an epoxy, it’s hermetically sealed in a lustrous little glass envelope. This makes the resistor more robust, ideal for specialized cases such as precision reference instrumentation, where long-term stability of the resistor is critical. The glass envelope provides better isolation against moisture and other environmental changes than standard coatings like epoxy.

15-Turn Trimmer Potentiometer

A photo of a blue chip on a circuit board.

It takes 15 rotations of an adjustment screw to move a 15-turn trimmer potentiometer from one end of its resistive range to the other. Circuits that need to be adjusted with fine resolution use this type of trimmer pot instead of the single-turn variety.

The resistive element in this trimmer is a strip of cermet—a composite of ceramic and metal—silk-screened on a white ceramic substrate. Screen-printed metal links each end of the strip to the connecting wires. It’s a flattened, linear version of the horseshoe-shaped resistive element in single-turn trimmers.

Turning the adjustment screw moves a plastic slider along a track. The wiper is a spring finger, a spring-loaded metal contact, attached to the slider. It makes contact between a metal strip and the selected point on the strip of resistive film.
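To see why 15 turns give finer control, here is a minimal sketch of the wiper position as a function of screw turns. It assumes the resistance changes linearly along the cermet strip, and the 10-kilohm value is a hypothetical part chosen only for illustration; the function names are likewise invented for this example.

```python
def resistance_per_turn(total_ohms: float, turns: int = 15) -> float:
    """Resistance change for one full screw rotation, assuming a linear strip."""
    return total_ohms / turns


def wiper_resistance(total_ohms: float, turns_from_end: float, turns: int = 15) -> float:
    """Resistance between one end of the strip and the wiper."""
    if not 0 <= turns_from_end <= turns:
        raise ValueError("wiper position must lie within the screw's travel")
    return total_ohms * turns_from_end / turns


TOTAL_OHMS = 10_000.0  # hypothetical 10-kilohm trimmer
print(resistance_per_turn(TOTAL_OHMS))      # ohms covered by one full turn
print(wiper_resistance(TOTAL_OHMS, 7.5))    # wiper at mid-travel
```

With this hypothetical part, one full rotation covers only one-fifteenth of the range, so a small slip of the screwdriver shifts the setting far less than it would on a trimmer whose whole range fits in a single turn.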

Ceramic Disc Capacitor

A cutaway of a Ceramic Disc Capacitor
A photo of a Ceramic Disc Capacitor

Capacitors are fundamental electronic components that store energy in the form of static electricity. They’re used in countless ways, including for bulk energy storage, to smooth out electronic signals, and as computer memory cells. The simplest capacitor consists of two parallel metal plates with a gap between them, but capacitors can take many forms so long as there are two conductive surfaces, called electrodes, separated by an insulator.

A ceramic disc capacitor is a low-cost capacitor that is frequently found in appliances and toys. Its insulator is a ceramic disc, and its two parallel plates are extremely thin metal coatings that are evaporated or sputtered onto the disc’s outer surfaces. Connecting wires are attached using solder, and the whole assembly is dipped into a porous coating material that dries hard and protects the capacitor from damage.
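The two-plates-and-an-insulator picture corresponds to the textbook parallel-plate formula, C = ε0 · εr · A / d. The sketch below applies it to a disc; the dimensions and the relative permittivity are hypothetical values picked for illustration, not measurements of the capacitor pictured, and the formula ignores edge effects.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, in farads per meter


def parallel_plate_capacitance(area_m2: float, gap_m: float,
                               relative_permittivity: float = 1.0) -> float:
    """Ideal parallel-plate capacitance in farads (edge effects ignored)."""
    if area_m2 <= 0 or gap_m <= 0:
        raise ValueError("area and gap must be positive")
    return EPSILON_0 * relative_permittivity * area_m2 / gap_m


# Hypothetical disc: 5 mm diameter, 0.5 mm thick, relative permittivity 1000.
disc_area = math.pi * (2.5e-3) ** 2
capacitance = parallel_plate_capacitance(disc_area, 0.5e-3, 1000.0)
print(f"{capacitance * 1e12:.0f} pF")
```

The formula also shows the design levers: a larger electrode area, a thinner insulator, or a ceramic with higher permittivity each raise the capacitance.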

Film Capacitor

An image of a cut away of a capacitor
A photo of a green capacitor.

Film capacitors are frequently found in high-quality audio equipment, such as headphone amplifiers, record players, graphic equalizers, and radio tuners. Their key feature is that the dielectric material is a plastic film, such as polyester or polypropylene.

The metal electrodes of this film capacitor are vacuum-deposited on the surfaces of long strips of plastic film. After the leads are attached, the films are rolled up and dipped into an epoxy that binds the assembly together. Then the completed assembly is dipped in a tough outer coating and marked with its value.

Other types of film capacitors are made by stacking flat layers of metallized plastic film, rather than rolling up layers of film.

Dipped Tantalum Capacitor

A photo of a cutaway of a Dipped Tantalum Capacitor

At the core of this capacitor is a porous pellet of tantalum metal. The pellet is made from tantalum powder and sintered, or compressed at a high temperature, into a dense, spongelike solid.

Just like a kitchen sponge, the resulting pellet has a high surface area per unit volume. The pellet is then anodized, creating an insulating oxide layer with an equally high surface area. This process packs a lot of capacitance into a compact device, using spongelike geometry rather than the stacked or rolled layers that most other capacitors use.

The device’s positive terminal, or anode, is connected directly to the tantalum metal. The negative terminal, or cathode, is formed by a thin layer of conductive manganese dioxide coating the pellet.

Axial Inductor

An image of a cutaway of an axial inductor
A photo of a collection of cut wires

Inductors are fundamental electronic components that store energy in the form of a magnetic field. They’re used, for example, in some types of power supplies to convert between voltages by alternately storing and releasing energy. This energy-efficient design helps maximize the battery life of cellphones and other portable electronics.

Inductors typically consist of a coil of insulated wire wrapped around a core of magnetic material like iron or ferrite, a ceramic filled with iron oxide. Current flowing around the core produces a magnetic field that acts as a sort of flywheel for current, smoothing out changes in the current as it flows through the inductor.
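The flywheel analogy can be made concrete with the two standard ideal-inductor relations: the stored energy is E = ½ · L · I², and the voltage across the inductor is V = L · dI/dt, so a fast change in current is opposed by a large voltage. The values below are hypothetical and chosen only to keep the arithmetic easy to follow; the function names are illustrative.

```python
def stored_energy_joules(inductance_h: float, current_a: float) -> float:
    """Energy held in the magnetic field of an ideal inductor: E = 1/2 * L * I^2."""
    return 0.5 * inductance_h * current_a ** 2


def induced_voltage(inductance_h: float, current_change_a_per_s: float) -> float:
    """Voltage across an ideal inductor for a given rate of current change."""
    return inductance_h * current_change_a_per_s


INDUCTANCE_H = 100e-6  # hypothetical 100-microhenry inductor
print(stored_energy_joules(INDUCTANCE_H, 0.5))  # energy at 0.5 A
print(induced_voltage(INDUCTANCE_H, 1e4))       # current ramping at 10,000 A/s
```

Because the voltage scales with how quickly the current changes, an abrupt change meets much more opposition than a gradual one, which is the smoothing behavior described above.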

This axial inductor has a number of turns of varnished copper wire wrapped around a ferrite form and soldered to copper leads on its two ends. It has several layers of protection: a clear varnish over the windings, a light-green coating around the solder joints, and a striking green outer coating to protect the whole component and provide a surface for the colorful stripes that indicate its inductance value.

Power Supply Transformer

A photo of a yellow element on a circuit board.

This transformer has multiple sets of windings and is used in a power supply to create multiple output AC voltages from a single AC input such as a wall outlet.

The small wires nearer the center are “high impedance” turns of magnet wire. These windings carry a higher voltage but a lower current. They’re protected by several layers of tape, a copper-foil electrostatic shield, and more tape.

The outer “low impedance” windings are made with thicker insulated wire and fewer turns. They handle a lower voltage but a higher current.

All of the windings are wrapped around a black plastic bobbin. Two pieces of ferrite ceramic are bonded together to form the magnetic core at the heart of the transformer.
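The trade-off between the many-turn and few-turn windings follows from the ideal transformer relations: voltage scales with the turns ratio and current scales inversely. The sketch below assumes a lossless transformer, and the turn counts and input values are hypothetical rather than taken from the part pictured.

```python
def secondary_voltage(primary_v: float, primary_turns: int, secondary_turns: int) -> float:
    """Ideal transformer: voltage scales with the turns ratio."""
    return primary_v * secondary_turns / primary_turns


def secondary_current(primary_a: float, primary_turns: int, secondary_turns: int) -> float:
    """Ideal transformer: current scales inversely with the turns ratio."""
    return primary_a * primary_turns / secondary_turns


# Hypothetical windings: 1,000 turns on the input side, 100 on one output.
v_out = secondary_voltage(120.0, 1000, 100)
i_out = secondary_current(0.1, 1000, 100)
print(v_out, i_out)
```

In this idealized case the power on both sides is the same, so the winding with fewer turns delivers a lower voltage at a higher current, which is why it is wound with thicker wire.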

This article appears in the February 2023 print issue.
