This 6-Million-Dollar AI Changes Accents as You Speak

Three international Stanford undergrads start company to “help the world understand”

In 2020, Stanford students Shawn Zhang, Maxim Serebryakov, and Andres Perez Soderi [left to right] founded the AI-powered accent-translation company Sanas.

Photo: Sanas

Stanford University prides itself on its international diversity, touting that today's undergraduates hail from 70 countries. So a friend group that included a computer science major from China, an AI-focused management science and engineering (MSE) major from Russia, and a business-oriented MSE major from Venezuela isn't an anomaly. The friends did the normal things Stanford students do with their free time, like fountain hopping, cheering at football games, and hiking the trail around the Stanford Dish radio telescope.

And then came the pandemic.

"Stanford went virtual," Andres Perez Soderi recalls. (He's the member of the trio from Venezuela.) "And we scattered around the Bay Area, to San Francisco, Pleasanton, as well as Palo Alto, and we were keeping in touch online. School just isn't fulfilling when you aren't physically there, and we had a lot of time on our hands."

They also had an idea, sparked by a conversation with another friend: a computer science major who had gone back home to Guatemala and taken a job doing tech support at a call center to help provide for his family.

"When he got the job," Soderi said, "we told him that he'd be the best tech support person they'd ever had, he's the smartest guy we've met and always had a smile on his face."

But the job didn't last—his customer satisfaction numbers were too low, because callers struggled to understand his accent and would lash out in frustration.

Given that the three of them spoke English with vastly different accents, the problem hit home.

"We decided to help the world understand and be understood," Soderi said.

They dedicated their empty pandemic hours to building a solution.

"We did a lot of research around what people have done in the past. People have done voice conversion for deepfakes, and that technology is pretty advanced. But there's been little done in accent translation. So, say, if I used an existing system to make me sound like Batman, I would sound like a Chinese-accented Batman," says Shawn Zhang, the trio's member from China.

"We knew about accent-reduction therapy and being taught to emulate the way someone else speaks in order to connect with them. And we knew from our own experience that forcing a different accent on yourself is uncomfortable. I went to a British high school and tried to force a British accent; it was an experience that was hard to digest. We thought if we could allow software to translate the accent [instead], we could let people speak naturally," says Soderi.

"Our first approach was naïve," Zhang says. "We built a system that converted speech to text and then text to speech." That wasn't going to be particularly useful for real-time conversation, their ultimate goal. So they began thinking about how to structure data to use in training a neural network to convert accents directly, speech to speech. They reached out to professors at Stanford and experts in industry to advise them.

And they filed the paperwork to incorporate as a company, Sanas. (Incorporation, too, is not an unusual step when Stanford undergrads start tinkering with anything.)

The name came from a hunt through random syllables, looking for something that sounded good and was available to use. Sanas jumped out because it is a palindrome—and it turned out to refer to whispers or sounds in some forms of ancient Latin. They assigned the CTO title to Zhang, CFO to Soderi, and CEO to Maxim Serebryakov.

That all happened in the first half of 2020, and things have continued to move quickly. Sanas now has a full-time engineering staff of 14, including the founders, plus three part-time developers and two employees on the business side. All of them work remotely, spread across several countries. In late May, a few months shy of Zhang's twenty-first birthday, the company closed a US $5.5 million seed round, bringing its total funding to about $6 million.

Baris Akis, the president and co-founder of Human Capital, who led the seed round, stated at the time: "As an immigrant from Turkey, I've always felt that getting rid of the accent barrier was a critical next step for a more fair and prosperous world."

Today, Sanas has an algorithm that can shift English to and from American, Australian, British, Filipino, and Spanish accents. They developed it using a neural network, trained with recordings made, for the most part, by professional voice actors.

Says Zhang, "You aren't just doing audio signal processing, changing the pitch and tone. You have to change the phonetics. So we really needed parallel data sets, created by readers using the same source material, so the neural network could learn to map from one to the other, examining both to learn how to transform the pronunciation."
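Zhang's point about parallel data can be illustrated with a small sketch. It assumes each recording is tagged with the script line being read and the accent it is spoken in; the `Recording` type, its fields, and `build_parallel_pairs` are hypothetical names for illustration, not Sanas's code or data format. Under that assumption, building a parallel data set amounts to pairing takes of the same line across two accents:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    # Hypothetical record layout; the article does not describe Sanas's data format.
    script_id: str   # which line of the shared source material was read
    accent: str      # e.g. "filipino" or "american"
    audio_path: str  # hypothetical location of the audio file


def build_parallel_pairs(recordings, source_accent, target_accent):
    """Pair recordings of the same script line read in two different accents.

    Both halves of a pair contain the same words, so a model trained on the
    pairs can focus on how pronunciation differs rather than on what is said.
    If one accent has several takes of a line, only the last one seen is kept.
    """
    by_script = defaultdict(dict)
    for rec in recordings:
        by_script[rec.script_id][rec.accent] = rec

    pairs = []
    for script_id in sorted(by_script):
        takes = by_script[script_id]
        if source_accent in takes and target_accent in takes:
            pairs.append((takes[source_accent], takes[target_accent]))
    return pairs


recordings = [
    Recording("line-001", "filipino", "fil/line-001.wav"),
    Recording("line-001", "american", "us/line-001.wav"),
    Recording("line-002", "filipino", "fil/line-002.wav"),  # no American take yet
]
pairs = build_parallel_pairs(recordings, "filipino", "american")
print(len(pairs))  # 1: only line-001 was read in both accents
```

In practice the paired audio would also need to be aligned in time, since two readers rarely speak at exactly the same pace; that step is left out of this sketch.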

The algorithm runs locally on a CPU (not in the cloud) alongside communications apps like Zoom, Skype, and WhatsApp, adding 150 milliseconds of delay and producing speech at telephone-audio quality. Zoom itself typically adds about 50 milliseconds, bringing the total delay to about 200 milliseconds. According to Soderi, delays below roughly 300 to 350 milliseconds are generally imperceptible in audio communications, so users don't notice a lag. The algorithm is also efficient in its CPU usage.
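The latency arithmetic can be written out as a quick budget check. This sketch treats the delays as simply additive, which is a simplification, and uses the low end of Soderi's 300-to-350-millisecond range as the threshold; the constant and function names are illustrative only.

```python
# Delay figures quoted in the article, in milliseconds.
ACCENT_MODEL_DELAY_MS = 150  # Sanas's on-device processing
ZOOM_DELAY_MS = 50           # typical delay added by Zoom
# Low end of the 300-to-350 ms range Soderi cites as imperceptible.
IMPERCEPTIBLE_BELOW_MS = 300


def total_delay_ms(*stage_delays_ms):
    """Assume each stage in the audio path adds its delay independently."""
    return sum(stage_delays_ms)


total = total_delay_ms(ACCENT_MODEL_DELAY_MS, ZOOM_DELAY_MS)
headroom = IMPERCEPTIBLE_BELOW_MS - total
print(f"total {total} ms, headroom {headroom} ms, imperceptible: {total < IMPERCEPTIBLE_BELOW_MS}")
# prints: total 200 ms, headroom 100 ms, imperceptible: True
```

Even against the stricter 300 ms figure, the combined path leaves about 100 milliseconds of slack for network jitter or other processing.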

But, Zhang admits, there's plenty of room for improvement. "We are trying to make [it] more clear, natural, and pleasant to hear; it's an ongoing process."

The team plans to add more English accents and also to work with accents in other languages, including Spanish and French.

Their first customers will be outsourcing companies, the kind hired to provide customer service and other telephone support. Seven such firms are currently piloting the system.

"But that's just our first use case," says Zhang, "because it is a measurable and controlled environment. We don't see ourselves as a call center company; we want to go into healthcare, entertainment, education, and other spaces. We want to develop this as a tool that helps people with human-to-human interaction, without hurting their cultural identities."

Can This DIY Rocket Program Send an Astronaut to Space?

Copenhagen Suborbitals is crowdfunding its crewed rocket

Copenhagen Suborbitals volunteers are building a crewed rocket on nights and weekends. The team includes [from left] Mads Stenfatt, Martin Hedegaard Petersen, Jørgen Skyt, Carsten Olsen, and Anna Olsen.

Photo: Mads Stenfatt

It was one of the prettiest sights I have ever seen: our homemade rocket floating down from the sky, slowed by a white-and-orange parachute that I had worked on during many nights at the dining room table. The 6.7-meter-tall Nexø II rocket was powered by a bipropellant engine designed and constructed by the Copenhagen Suborbitals team. The engine mixed ethanol and liquid oxygen together to produce a thrust of 5 kilonewtons, and the rocket soared to a height of 6,500 meters. Even more important, it came back down in one piece.

That successful mission in August 2018 was a huge step toward our goal of sending an amateur astronaut to the edge of space aboard one of our DIY rockets. We're now building the Spica rocket to fulfill that mission, and we hope to launch a crewed rocket about 10 years from now.

Copenhagen Suborbitals is the world's only crowdsourced crewed spaceflight program, funded to the tune of almost US $100,000 per year by hundreds of generous donors around the world. Our project is staffed by a motley crew of volunteers who have a wide variety of day jobs. We have plenty of engineers, as well as people like me, a pricing manager with a skydiving hobby. I'm also one of three candidates for the astronaut position.
