Baidu’s AI Produces Short Videos in One Click

Near the end of 2019, when Baidu's AI, named ERNIE, beat Google's AI, named BERT, in its understanding of human language, a team at Baidu Research was already prepping ERNIE for a new tool. They envisioned a program that could analyze the text from a URL, synthesize a pithy narrative, and align it with machine-selected clips to churn out a 2-minute video with voice-over, all in less time than it would take to play a song.

Last month, a prototype version of such a program, called VidPress, debuted. The AI’s goal is not only to save human video editors' time but also to outperform them in quality.

In a test performed by the team within Baidu’s video platform, Haokan (link in Chinese), it took up to 9 minutes for VidPress to generate a video from scratch. When it comes to viewers’ video completion rate, a rough proxy for quality, viewers stayed with 65 percent of VidPress’s videos from the beginning to the end, whereas the rate for videos produced by human editors was 50 percent, says Xi Chen, a research engineer at Baidu.

Chen and his team of engineers at Baidu Research in the San Francisco Bay Area are not alone in testing AI for the booming short-video market. For example, GliaStudio, a Taiwan-based startup, has been creating video summaries of articles since 2015. But few startups have the resources and advantages Baidu has, Chen says.

With access to ERNIE and other Baidu proprietary technologies, including computer-vision programs, the VidPress team is “standing on giant's shoulder,” says Julia Li, director of Baidu Research USA.

To understand how VidPress works, Li explains, consider someone feeding the tool a web page about the death of NBA star Kobe Bryant, who was killed in a helicopter accident in January 2020.

On one level of a parallel process, VidPress generates a lightweight version of the story, making sure that important sentences, which can be crafted by the AI or pulled directly from the web page, appear early in the script. Such sentences might include keywords like "helicopter" and "Kobe." During this step, the program also ensures that the logical structure of the summary is coherent and clear, and it can also fix human writers' bad habits, such as using vague pronouns, Li says.
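The idea of putting the most important sentences first can be illustrated with a toy extractive summarizer. This is only a sketch under loud assumptions: it ranks sentences by plain word frequency, whereas VidPress relies on ERNIE and can also write new sentences. The function names, the stopword list, and the sample text are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "in", "of", "and", "was", "to", "on", "his", "he", "is"}

def content_words(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, most important first."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    return sorted(sentences, key=score, reverse=True)[:max_sentences]

page_text = (
    "Kobe Bryant died in a helicopter crash. "
    "The helicopter crash shocked fans of Kobe. "
    "Weather was foggy."
)
print(summarize(page_text))
```

Sentences that repeat the story's dominant words ("helicopter," "Kobe") rise to the top, while the peripheral weather sentence is dropped.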

After text-to-speech services convert the script into synthesized speech, VidPress sets "anchors" in this audio track to mark the time points where viewers are most interested in seeing new visuals. Chen and colleagues built a decision-tree model to choose these anchor points based on how well the content around them correlates with the theme of the story. The system also pays attention to phrases people are normally curious about, such as the names of brands and locations.
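A rough sense of how such an anchor decision might look is sketched below. It is not Baidu's model: the two-branch rule is hand-written rather than learned, and the `Segment` type, the keyword-overlap measure, the threshold, and the sample data are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Segment:
    start_sec: float  # where this script segment begins in the audio track
    text: str

def is_anchor(segment, theme_keywords, curiosity_terms, overlap_threshold=0.5):
    """Hand-written stand-in for a decision tree with two splits."""
    tokens = set(re.findall(r"[a-z]+", segment.text.lower()))
    # Split 1: how much of the story's theme does this segment cover?
    overlap = len(tokens & theme_keywords) / len(theme_keywords) if theme_keywords else 0.0
    if overlap >= overlap_threshold:
        return True
    # Split 2: does it mention a name viewers tend to be curious about?
    return bool(tokens & curiosity_terms)

def pick_anchors(segments, theme_keywords, curiosity_terms):
    """Return the start times at which new visuals should appear."""
    return [s.start_sec for s in segments if is_anchor(s, theme_keywords, curiosity_terms)]

segments = [
    Segment(0.0, "Kobe Bryant died in a helicopter crash."),
    Segment(4.0, "Fans gathered to mourn."),
    Segment(8.0, "He spent his career in Los Angeles with the Lakers."),
]
anchors = pick_anchors(segments, {"kobe", "helicopter", "crash"}, {"lakers"})
print(anchors)  # [0.0, 8.0]
```

The first segment qualifies on theme overlap and the third on a curiosity term, so both become anchor points; the second matches neither branch.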

On the other parallel level, VidPress finds and scores relevant media captured from the Internet, starting from the given web page and through other relevant pages on Baidu's newsfeed network Baijiahao. The algorithms are written in such a way that only higher-ranking videos or images are aligned to those anchor points in the timeline. Chen says the team is working on accessing general web pages, and developing capabilities to use commercial clients' copyrighted databases.
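The alignment step can be pictured with a short sketch. It assumes each candidate clip or image already carries a relevance score between 0 and 1, and it simply gives the best-scoring media to the earliest anchor points. The cutoff value, the earliest-first pairing, and the file names are hypothetical, not details reported by the team.

```python
def align_media(anchor_times, scored_media, min_score=0.6):
    """Map anchor times (seconds) to media names, using only higher-ranking media.

    scored_media is a list of (name, relevance_score) pairs.
    """
    # Keep only media above the cutoff, best first.
    eligible = sorted(
        (item for item in scored_media if item[1] >= min_score),
        key=lambda item: item[1],
        reverse=True,
    )
    # Pair anchors in chronological order with the ranked media;
    # zip stops early if there are fewer eligible items than anchors.
    return {t: name for t, (name, _) in zip(sorted(anchor_times), eligible)}

timeline = align_media(
    [8.0, 0.0],
    [("crash_site.jpg", 0.9), ("interview.mp4", 0.7), ("unrelated.png", 0.2)],
)
print(timeline)  # {0.0: 'crash_site.jpg', 8.0: 'interview.mp4'}
```

Low-scoring media never reach the timeline, and an anchor is left empty rather than filled with a poor match.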

Baidu's computer-vision technologies are also involved. So, after a crash-site photo in the video about Bryant, Li says, VidPress can add post-match interview footage of Bryant, rather than footage of another NBA player, when recapping Bryant’s career.

This ability to mine materials in multiple formats including text and visuals from a vast database of websites, as well as the ability to create a timeline dotted with anchor points to hook people’s attention, allows VidPress to improve viewers' satisfaction, Li explains. That’s probably why VidPress had a better video completion rate than human editors, she says.

An observer of China’s technology industry, Hefei Zhang, notes in an online post (link in Chinese) that the value of VidPress lies in how it uses algorithms to reduce the time costs of footage compilation, material organization, and editing. Like most AI products in the market, although VidPress saves time, it can’t yet replace or outperform humans in creativity, he says.

As Baidu’s Li points out, becoming more creative and even providing customized video content based on viewers’ tastes is a direction they’d like to take VidPress, but she acknowledges it’s not there yet.
