Blind and low-vision (BLV) people often use sound to navigate, such as via echolocation or within various assistive technologies. Scientists at the University of Technology Sydney (UTS) and Aria Research, a company that makes bionic devices, decided to blend the two to develop a technique called “acoustic touch.” When used in combination with smart glasses, acoustic touch converts objects in front of the user into auditory icons.
Acoustic touch uses head movement—head position also being key in echolocation—to dictate what sound icons play to support the exploration of a surrounding environment. Howe Yuan Zhu, one of the paper’s authors, describes acoustic touch as a user interface based on sensory-motor coupling. In this case, sensory feedback is generated by the wearer’s head movement.
Imagine that there is a virtual cone extending out in front of one’s head, he says. Any object in the region of this cone will be represented with a unique audio signature. “The wearable glasses might still see more objects, but it will only relay to the user the objects within this narrower field of view.” Their paper was published in PLOS One last month. The research team also included two blind researchers.
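The cone mechanism Zhu describes can be sketched in a few lines: the glasses may detect many objects, but only those whose bearing falls inside a cone centered on the head direction get sonified. The function name, the cone angle, and the detection format below are illustrative assumptions, not details of the published system.

```python
# Hedged sketch of the "virtual cone" filtering described in the paper:
# of everything the glasses detect, only objects inside a cone extending
# from the wearer's head are relayed as audio icons. The half-angle and
# data format are assumptions for illustration.

CONE_HALF_ANGLE_DEG = 20.0  # assumed half-angle of the auditory cone

def objects_in_cone(head_yaw_deg, detections):
    """Return labels of detected objects whose bearing lies inside the cone.

    detections: list of (label, bearing_deg) pairs, with bearings in the
    same horizontal frame as head_yaw_deg.
    """
    selected = []
    for label, bearing in detections:
        # Smallest signed angular difference between head direction and object
        diff = (bearing - head_yaw_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= CONE_HALF_ANGLE_DEG:
            selected.append(label)
    return selected

# The glasses "see" all three objects, but only cone members are sonified.
detections = [("cup", 5.0), ("book", 45.0), ("bowl", -10.0)]
print(objects_in_cone(0.0, detections))  # ['cup', 'bowl']
```

As the wearer turns their head, the cone sweeps across the scene and different objects enter and leave the sonified set, which is what makes the interface a form of sensory-motor coupling.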
The researchers tested the interface with 14 participants—7 BLV individuals and 7 sighted people who were blindfolded—wearing acoustic touch-enabled smart glasses. The participants had to identify objects on a table in front of them. The researchers found that the BLV participants performed well in recognizing and reaching for objects without a significant increase in mental effort.
“We were focused on understanding, is this learnable?”
—Howe Yuan Zhu, University of Technology Sydney
Wearables traditionally relay information using computer vision along with computer-generated speech. With acoustic touch, however, even though the glasses can detect everything before the wearer, the “head-scanning” movement creates a “middle layer” that lets the user decide what they want to explore first.
“One of the key questions we weren’t sure about is how intuitive head movement was,” Zhu says. “We know in vision, it plays a key role in how we observe a room…but [with] audio, we know it plays a role, but not how much.” Their observations suggested that the head-scanning movement wasn’t something that required a significant amount of added effort. “Even though it was a bit more physical effort, the participants were still able to pick it up, and still found it somewhat intuitive,” Zhu says.
For their research, they also built a benchmark platform that used computer vision and object-recognition algorithms to recognize 120-odd objects. However, in the tests, they used only four objects—to understand whether users preferred using the interface to play icons for all the objects, or if it was better for them to use head movement to selectively explore. “We were focused on understanding, is this learnable? Can someone build mental associations between a similar sound to the object? And then we compared that against speech,” Zhu says. “We don’t want to necessarily dismiss speech-based interfaces [either].” For example, for identifying a specific person, or to describe more complex objects, it might be simpler to use speech.
He also clarifies that while their research is about the technique of acoustic touch rather than building smart glasses (which is something Aria Research does), they did make some technical observations that could be relevant in the future. For example, they noticed that the speed of head rotation could be quite important. “If [the user] sweeps too fast, they’re more likely to miss objects,” Zhu says, “because the computer vision is just not fast enough.”
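Zhu's point about sweep speed can be illustrated with a back-of-the-envelope calculation: an object is only detectable while it sits inside the cone, and the vision pipeline processes a limited number of frames per second. The cone width and frame rate below are illustrative assumptions, not measurements from the study.

```python
# Why fast head sweeps miss objects: the faster the sweep, the shorter
# an object's dwell time inside the cone, and the fewer frames the
# object recognizer gets to spot it. All numbers are assumptions.

CONE_WIDTH_DEG = 40.0   # assumed full angular width of the cone
CV_FPS = 5.0            # assumed object-recognition throughput

def frames_on_object(sweep_speed_deg_per_s):
    """Frames in which an object appears while the cone sweeps past it."""
    dwell_time_s = CONE_WIDTH_DEG / sweep_speed_deg_per_s
    return dwell_time_s * CV_FPS

for speed in (30.0, 90.0, 240.0):
    print(f"{speed:5.0f} deg/s -> {frames_on_object(speed):4.2f} frames")
```

Under these assumed numbers, a slow 30 deg/s scan gives the recognizer several frames per object, while a fast 240 deg/s sweep yields less than one processed frame, so the object may never be announced.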
One of the main limitations of the current paper was that the study was heavily controlled and conducted in a closed, indoor environment. These conditions are not achievable in the real world, Zhu says. Another technical challenge was accurate object recognition. “But that’s something that’s continually improving,” he says. “And even between the time of the study and now…object recognition is a lot better.”
Since the publication of the paper, he says, they have been continuing their research using more real-world environments, wider contexts, and more objects. This includes navigating a maze using sound icons to dictate a path or the way around an obstacle. Zhu notes that using computer speech is more likely to slow people down, as the person would need to stop to process it. “Whereas,” he adds, “if we played just the audio icon that pings for a lane, the person actually could just follow along, using it like a cue that you just walk along with.”