Phone Keyboard Exploits Leave 1 Billion Users Exposed

Popular Chinese-language keyboard apps reveal leaky security standards

Digital Chinese-language keyboards that are vulnerable to spying and eavesdropping have been used by 1 billion smartphone users, according to a new report. The widespread threats these leaky systems reveal could also present a concerning new kind of exploit for cyberattacks, whether the device uses a Chinese-language keyboard, an English keyboard, or any other.

Last year, the University of Toronto’s Citizen Lab released a study of a proprietary Chinese keyboard system owned by the Shenzhen-based tech giant Tencent. Citizen Lab’s “Sogou Keyboard” report exposed the wide range of attacks possible on the keyboard, which could leak a user’s key presses to outside eavesdroppers. Now, in the group’s new study, released last week, the same researchers have discovered that essentially all the world’s popular Chinese smartphone keyboards have suffered similar vulnerabilities.

And while the specific bugs the two reports have uncovered have been fixed in most instances, the researchers’ findings—and in particular, their recommendations—point to substantially larger gaps in the systems that extend into software developed around the world, no matter the language.

“All of these keyboards were also using custom network protocols,” says Mona Wang, a computer science Ph.D. student at Princeton University and coauthor of the report. “Because I had studied these sort of custom network protocols before, then this immediately screamed to me that there was something really terrible going on.”

Jedidiah Crandall, an associate professor of computing and augmented intelligence at Arizona State University in Tempe, who was consulted in the report’s preparation but was not on the research team, says these vulnerabilities matter for nearly any coder or development team that releases their work to the world. “If you are a developer of a privacy-focused chat app or an app for tracking something health related, whatever Chinese language users of your app might have typed into it has been exposed for years,” he says.

The Chinese keyboard problem

Chinese, a language of tens of thousands of characters with some 4,000 or more in common use, represents a distinct challenge for keyboard input. A range of different keyboard systems have been developed in the digital era—sometimes called pinyin keyboards, named after a popular romanization system for standard Chinese. Ideally, these creative approaches to digital input enable a profoundly complex language to be straightforwardly phoneticized and transliterated via a compact, often QWERTY-style keyboard format.

Computational and AI smarts can help transform key presses into Chinese characters on the screen. But Chinese keyboards often involve many exchanges across the Internet between cloud servers and other assistive networked apps, just to make it possible for a Chinese-speaking person to type the characters.

According to the report—and an FAQ the researchers released explaining the technical points in plain language—the Chinese keyboards studied all used character-prediction features, which in turn relied on cloud-computing resources. The researchers found that improperly secured communications between a device’s keyboard app and those external cloud servers meant that users’ keystrokes (and therefore their messages) could be accessed in transit.

Jeffrey Knockel, a senior research associate at Citizen Lab and a coauthor of the report, says cloud-based character prediction is a particularly attractive feature for Chinese-language keyboards, given the vast array of possible characters that any given QWERTY keystroke sequence might be attempting to represent. “If you’re typing in English or any language where there’s enough keys on a keyboard for all your letters, that’s already a much simpler task to design a keyboard around than an ideographic language where you might have over 10,000 characters,” he says.

Chinese-language keyboards are often “pinyin keyboards,” which allow for thousands of characters to be typed using a QWERTY-style approach. (Image credit: Zamoeux/Wikimedia)

Sarah Scheffler, a postdoctoral associate at MIT, also expressed concern about other kinds of data vulnerabilities that the Citizen Lab report reveals, ones that are not necessarily limited to keyboards or to Chinese-language applications. “The vulnerabilities [identified by the report] are not at all specific to pinyin keyboards,” she says. “It applies to any application sending data over the Internet. Any app sending unencrypted, or badly encrypted, information would have similar issues.”

Wang says the chief problem the researchers uncovered concerns the fact that so many Chinese-keyboard protocols transmit data using inferior and sometimes custom-made encryption.

“These encryption protocols are probably developed by very, very competent and very well-resourced people,” Wang says. “But even competent and well-resourced people get encryption wrong, because it’s really hard to do correctly.”
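To make the risk concrete, here is a deliberately simplified Python sketch of how a homemade scheme can fail. It is a toy and not the scheme used by any keyboard in the report: the repeating-key XOR "cipher," the hard-coded key, and the JSON-style message format are all assumptions made up for illustration. An eavesdropper who can guess the predictable start of one message recovers the key and can then read every other message.

```python
from itertools import cycle

# Hypothetical hard-coded key, standing in for a secret baked into an app.
FIXED_KEY = b"k3yb0ard"


def homemade_encrypt(data: bytes, key: bytes = FIXED_KEY) -> bytes:
    """Toy repeating-key XOR 'cipher'. XOR is its own inverse, so it also decrypts."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def recover_key(ciphertext: bytes, known_plaintext: bytes, key_len: int) -> bytes:
    """Eavesdropper's view: XOR the ciphertext with a guessed plaintext prefix."""
    return bytes(c ^ p for c, p in zip(ciphertext, known_plaintext))[:key_len]


# Two intercepted messages (the message format is an assumption for this demo).
first = homemade_encrypt(b'{"keys":"nihao"}')
second = homemade_encrypt(b'{"keys":"mima123"}')

# Guessing the 8-byte header of the first message reveals the 8-byte key...
leaked_key = recover_key(first, b'{"keys":', key_len=8)

# ...which then decrypts any other message sent with the same scheme.
decoded = homemade_encrypt(second, leaked_key)
```

Real custom protocols are usually more elaborate than this, but the failure pattern is similar: a fixed secret plus predictable message structure gives a passive observer a way in.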

Beyond the vulnerabilities exposed

Scheffler points to the two-decades-long testing, iteration, and development of the transport layer security (TLS) system underlying much of the Internet’s secure communications, including websites that use the Hypertext Transfer Protocol Secure (HTTPS) protocol. (The first version of TLS was specified and released in 1999.) “All these Chinese Internet companies who are rolling their own [cryptography] or using their own encryption algorithms are sort of missing out on all those 20 years of standard encryption development,” Wang says.
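For comparison, relying on standard TLS usually takes only a few lines. The following Python sketch uses the standard library's `ssl` module; the function names, the JSON payload, and the idea of a prediction server reachable at `host` are hypothetical and are not taken from the report or from any keyboard app.

```python
import json
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context using the library's vetted defaults."""
    ctx = ssl.create_default_context()  # verifies certificates and hostnames
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx


def send_keystrokes(host: str, keystrokes: str, port: int = 443,
                    timeout: float = 5.0) -> bytes:
    """Send keystrokes to a hypothetical prediction server over TLS."""
    payload = json.dumps({"keys": keystrokes}).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # wrap_socket performs the TLS handshake and checks the server's identity.
        with make_tls_context().wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(payload)
            return tls.recv(4096)
```

The point of the sketch is the design choice rather than the details: the encryption, key exchange, and server authentication all come from a widely reviewed protocol instead of code written for a single app.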

Crandall says the report may have also inadvertently highlighted assumptions about security protocols that may not always apply in every corner of the globe. “Protocols like TLS sometimes make assumptions that don’t suit the needs of developers in certain parts of the world,” he says. For instance, he adds, custom-made, non-TLS security systems may be more attractive “where the network delay is high or where people may spend large amounts of time in areas where the network is not accessible.”

Scheffler says the Chinese-language keyboard problem could even represent a kind of canary in the coal mine for a range of computer, smartphone, and software systems. Because of their reliance on extensive Internet communications, such systems, while perhaps overlooked or relegated to the background by developers, still represent potential cybersecurity attack surfaces.

“Anecdotally, a lot of these security failures arise from groups that don’t think they’re doing anything that requires security or don’t have much security expertise,” Scheffler says.

Scheffler identifies “Internet-based predictive-text keyboards in any language, and maybe some of the Internet-based AI features that have crept into apps over the years” as possible places concealing cybersecurity vulnerabilities similar to those that the Citizen Lab team discovered in Chinese-language keyboards. This category could include voice recognition, speech-to-text, text-to-speech, and generative AI tools, she adds.

“Security and privacy isn’t many people’s first thought when they’re building their cool image-editing application,” says Scheffler. “Maybe it shouldn’t be the first thought, but it should definitely be a thought by the time the application makes it to users.”

This story was updated 29 April 2024.
