Tech Talk

The Japanese company believes it has created speech separation technology good enough to solve the cocktail party problem

Mitsubishi Electric’s AI Can Follow and Separate Simultaneous Speech

The cocktail party problem refers to the challenge of following a single person’s speech in a room full of surrounding chatter and noise. With a little concentration, humans can focus in on what a particular person is saying. But when we want technology to separate the speech of a targeted person from the simultaneous conversations of others—as we do with hands-free telephony when a caller is in a car with kids in the back seat—the results leave much to be desired.

Until now, that is, says Mitsubishi Electric. The company demonstrated its speech separation technology at its annual R&D Open House in Tokyo on 24 May. In one type of demonstration, two people spoke a sentence in different languages simultaneously into a single microphone. The speech separation technology separated the two sentences in real time (about 3 seconds), and then reconstructed and played them back consecutively with impressive accuracy. However, the demonstration took place in a closed room and required silence from all those watching. 

A second demonstration used a simulated mix of three speakers. Unsurprisingly, the result was noticeably less accurate.

Mitsubishi claims accuracy of up to 90 percent and 80 percent, respectively, for the two scenarios under ideal conditions of low ambient noise and speakers talking at about the same volume—the best ever achieved, the company believes. This compares well with conventional technology, which achieves only around 50 percent accuracy for two speakers using a single microphone, says the company.

The technology uses Mitsubishi’s Deep Clustering, a proprietary method based on deep learning.

The system has learned how to examine and separate mixed speech data. A deep network encodes the speech signals based on each speaker’s tone, pitch, intonation, and other characteristics. The encoded signals are optimized so that components belonging to the same speaker have similar encodings, while those belonging to different speakers have dissimilar encodings. A clustering algorithm then sorts the encodings into groups according to their similarity. Each person’s speech is finally reconstructed by synthesizing the separated speech components.
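In code terms, the recipe looks something like the minimal sketch below. This illustrates the general deep-clustering idea, not Mitsubishi’s actual implementation: the trained embedding network is the proprietary part, and here it is replaced by a random stand-in.

```python
# Minimal sketch of the deep-clustering recipe for separating two speakers.
# The hard part is the trained embedding network; `embed` below is a stand-in
# (Mitsubishi's real model and parameters are not described in the article).
import numpy as np
from sklearn.cluster import KMeans

def separate(mixture_spectrogram, embed, n_speakers=2):
    """mixture_spectrogram: (frames, freq_bins) magnitude spectrogram."""
    F, B = mixture_spectrogram.shape
    # 1. Encode each time-frequency bin as a D-dimensional vector. A trained
    #    network places bins dominated by the same speaker close together.
    embeddings = embed(mixture_spectrogram)              # shape (F * B, D)
    # 2. Cluster the embeddings; each cluster corresponds to one speaker.
    labels = KMeans(n_clusters=n_speakers, n_init=10).fit_predict(embeddings)
    labels = labels.reshape(F, B)
    # 3. Build a binary mask per speaker and apply it to the mixture.
    #    (An inverse STFT on each masked spectrogram yields the audio.)
    return [mixture_spectrogram * (labels == k) for k in range(n_speakers)]

# Stand-in for the trained embedding network (the proprietary part):
rng = np.random.default_rng(0)
fake_embed = lambda S: rng.normal(size=(S.size, 20))
sources = separate(rng.random((100, 129)), fake_embed)
print([s.shape for s in sources])          # two (100, 129) spectrograms
```

Note that nothing in this recipe identifies a particular voice: the clustering step simply groups whichever speakers are present, which is why the approach can be speaker and language independent.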

“Unlike separating a speaker from background noise, separating a speaker’s speech from another speaker talking is most difficult, because they have similar characteristics,” says Anthony Vetro, deputy director at Mitsubishi Electric Research Laboratories in Cambridge, Mass. “You can do it to some degree by using more elaborate set-ups of two or more mics to localize the speakers, but it is very difficult with just one mic.”

The beauty of this system, he adds, is that it is not speaker dependent, so no speaker-specific training is involved. Similarly, it is not language dependent. 

Yohei Okato, senior manager of Mitsubishi Electric’s Natural Language Processing Technology Group in Kamakura, near Tokyo, says the company will use the technology to improve the quality of voice communications and the accuracy of automatic speech recognition in applications such as controlling automobiles and elevators, as well as in the home to operate various appliances and gadgets. “We will be introducing it in the near future,” he adds.

A red-and-white cell tower erected by Afghan Wireless.

Afghan Wireless Launches First 4G LTE Network in Afghanistan

In May 2017, Afghan Wireless announced a milestone—the company had launched the first 4G LTE service in Afghanistan. That service is now live in Kabul, and the company plans to extend 4G LTE to the entire country within the next 12 to 18 months.  

Building or upgrading a reliable wireless network in Afghanistan, where road access is limited and power is no guarantee, poses a unique set of challenges. The job has been made far more difficult by the U.S.-led war in Afghanistan, which lasted for 13 years from 2001 to 2014, and ongoing fighting among Taliban insurgents, the Islamic State group, and remaining troops.

On Wednesday, a truck bomb exploded in central Kabul, killing 80 people and wounding hundreds. Such violence reshapes many aspects of daily life for Afghanistan's 33 million residents. For Afghan Wireless, it also presents major operational hurdles.

At left, an aerial image of an industrial area overlaid with a zig-zag plume of green, yellow, and red in the lower left corner; at right, the same image with the plume rendered as a more detailed mass of red.

Differential Lidar Catches "Fugitive" Methane on the Fly

In 2015, methane accounted for 655 million metric tons (CO2 equivalent) of the 7.1 billion metric tons of greenhouse gases released into the atmosphere by the United States alone. The energy sector was responsible for just under half of the methane released, about 279 million metric tons—lost product with a value of hundreds of millions of dollars.

So detecting leaks from the 2.6 million miles of natural gas pipelines snaking across America is properly both a business and an environmental priority. Air surveillance has reduced serious pipeline leaks by 39 percent since 2009, but there have still been 250 serious incidents in the past eight years. These include a San Bruno, Calif., pipeline blast that killed eight people in 2010 and the Aliso Canyon leak in 2016—which released about 97 million kilograms of methane, essentially doubling the Greater Los Angeles area’s usual volume of methane emissions from all sources for a three-month period.

Until now, efforts to detect what the industry calls “fugitive emissions” have been constrained by instrument sensitivity and response times. Airborne surveillance required low-flying, slow-moving, expensive-to-run helicopters.

A new approach increases sensitivity and tightens control of timing and synchronization to permit the system to operate at higher speeds and higher altitudes—allowing a shift from helicopters to faster-moving, higher-flying single-engine, fixed-wing aircraft, which are less expensive to own and operate. The innovation earned Ball Aerospace & Technologies engineers Steve Karcher, Phil Lyman, and Jarett Bartholomew the Engineering Impact Award for Energy at NIWeek 2017 in Austin, Tex. The award was presented on 23 May.

Their Methane Monitor is a differential-absorption lidar (DIAL) methane-detection system that uses two lasers of slightly different infrared wavelengths to map the ground and measure atmospheric methane. Methane strongly absorbs one of the wavelengths (at about 1,645.55 nm, the “on-resonance beam”) and is virtually transparent to the other (at about 1,645.40 nm, the “off-resonance beam”). DIAL makes 1,000 to 10,000 measurements per second, firing the off- and on-resonance beams a few nanoseconds apart. The laser light bounces off the ground and scatters back to the receiver, and the system calculates the intensity differences between the returns to measure the amount of methane in the beams’ paths. Overall, the differential intensity measurement requires signal-to-noise ratios 500 times better than ordinary lidar applications demand.
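The underlying arithmetic is a straightforward application of the Beer-Lambert law, sketched below under simplified assumptions (equal transmit energy and ground reflectivity for each paired pulse). The cross-section values are illustrative placeholders, not Ball Aerospace’s calibration.

```python
# Core DIAL arithmetic: the ratio of off- to on-resonance return intensities
# gives the methane column along the two-way beam path (Beer-Lambert law).
import numpy as np

SIGMA_ON = 1.2e-23   # effective CH4 absorption cross-section near 1,645.55 nm, m^2 (illustrative)
SIGMA_OFF = 1.0e-26  # near-zero absorption near 1,645.40 nm, m^2 (illustrative)

def methane_column(I_on, I_off):
    """Column density (molecules/m^2) from paired return intensities,
    assuming equal pulse energy and ground reflectivity for the pair."""
    # I_on / I_off = exp(-2 * (sigma_on - sigma_off) * N_column)
    return np.log(I_off / I_on) / (2.0 * (SIGMA_ON - SIGMA_OFF))

# A weaker on-resonance return than off-resonance return implies methane:
print(methane_column(I_on=0.92, I_off=1.00))
```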

Return pulses may be sharply reflected by solid ground, distorted by foliage, or, in the case of the on-resonance pulse, completely absent because they have been fully absorbed by high concentrations of methane. An adaptive FPGA-based controller allows the system to compensate on the fly for variations in ground reflectivity, the energies and wavelengths of the two pulses, and aircraft velocity and position. Overall, Methane Monitor gathers data at rates of 2 to 17 gigabits per second.

Cruising in calm conditions at an altitude of 500 to 1,000 meters, Methane Monitor can detect methane leaking at 50 cubic feet per hour (a rate about equivalent to what a single person achieves while blowing up a rubber party balloon, National Instruments VP Dave Wilson noted during the presentation)—all while sweeping a corridor up to 60 meters wide and providing real-time heat-map images of methane plumes overlaid on ground images from the system itself and from such resources as Google Maps.


Citi Launches Blockchain-Based Payments Service With Nasdaq for Private Equity

A major U.S. bank and financial exchange have married two blockchain-based systems to enable clients who are raising funds or swapping private shares through Nasdaq to take advantage of payment services provided by Citi.

The Citi-Nasdaq partnership is one of the first examples of an enterprise blockchain system to enter production. Citi announced at the annual Consensus conference in New York City that the project went live on Monday.

Over the past year, many banks and financial institutions have completed proofs of concept for projects that rely on blockchain or distributed ledger technology. But so far, few of those projects have graduated into functioning systems.


Memristor Image Processor Uses Sparse Coding to See

Every time you open your eyes, a magnificent feat of low-power pattern matching begins in your brain. But it’s very difficult to replicate that same system in conventional computers.

Now researchers led by Wei Lu at the University of Michigan have designed hardware specifically to run brain-like “sparse coding” algorithms. Their system learns and stores visual patterns, and can recognize natural images while using very little power compared to machine learning programs run on GPUs and CPUs. Lu hopes these designs, described this week in the journal Nature Nanotechnology, will be layered on image sensors in self-driving cars.

The key, he says, is thinking about hardware and software in tandem. “Most approaches to machine learning are about the algorithm,” says Lu. Conventional processors use a lot of energy to run these algorithms, because they are not designed to process large amounts of data, he says. “I want to design efficient hardware that naturally fits with the algorithm,” he says. Running a machine-learning algorithm on a powerful processor can require 300 watts of power, says Lu. His prototype uses 20 milliwatts to process video in real time. Lu says that’s due to a few years of careful work modifying the hardware and software designs together.

The device design is based on a 32-by-32 array of resistive RAM memory cells made from tungsten oxides. The resistance of these cells can be changed by applying a voltage. “As memory, the device is already pretty mature—it’s available commercially at a large scale,” says Lu. (He co-founded Crossbar, a company that sells resistive RAM.) In a traditional memory application, high resistance in a resistive RAM cell might represent a 0 and low resistance a 1.

These cells can also be operated in analog mode, taking advantage of a continuum of electrical resistance. This allows them to behave as memristors, a kind of electronic component with a memory. In memristors, the resistance of the cell can be used to modulate signals—in other words, they can both store and process data. That contrasts with conventional computing, where there is a strict delineation between logic and memory.

The Michigan group used the memristor arrays to run a kind of algorithm that performs pattern matching. The algorithm is based on vector multiplication, a way of checking the stored data against incoming data. “The vector multiplication process directly tells you which stored pattern matches the input pattern,” says Lu.
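Conceptually, the crossbar computes the same thing as the toy sketch below: each stored pattern lives in one column of cell conductances, and physics (Ohm’s and Kirchhoff’s laws) performs all the dot products in a single analog step. The numbers here are illustrative, not the Michigan device’s values.

```python
# Toy model of crossbar pattern matching. Each column of G holds one stored
# pattern as conductances (32 x 32 in the Michigan prototype; 4 x 3 here).
import numpy as np

G = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 0.],
              [0., 0., 1.]])

def crossbar_match(input_pattern):
    # Input pixels drive the rows as voltages; the current collected on each
    # column is the dot product of the input with that column's pattern.
    currents = G.T @ input_pattern
    return np.argmax(currents)        # index of the best-matching pattern

print(crossbar_match(np.array([1., 0., 1., 0.])))   # matches stored pattern 0
```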

Then the Michigan group took things a step further, programming the memristor array using a brain-inspired approach called sparse coding to save energy. “The firing of neurons is sparse,” he says. In the brain, only a small number of neurons fire in response to an image when matching it to something you’ve seen before. In Lu’s system, only the memory cells storing the relevant visual patterns become active. The sparse code was “mapped” onto the memristor array by training it against a set of sample images.
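As a software analogue, sparse coding can be written as an iterative thresholding loop like the one below (an ISTA-style iteration, shown only to illustrate the idea; on Lu’s chip the dictionary is stored as conductances and the matrix-vector products happen in analog).

```python
# Software analogue of sparse coding: find a code x with few active elements
# such that D @ x reconstructs the input signal (ISTA-style iteration).
import numpy as np

def sparse_code(signal, D, lam=0.1, steps=200):
    L = np.linalg.norm(D, 2) ** 2               # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(steps):
        x = x + (D.T @ (signal - D @ x)) / L    # gradient step on the residual
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0)  # soft-threshold -> sparsity
    return x

rng = np.random.default_rng(1)
D = rng.normal(size=(64, 128))                  # dictionary of learned visual features
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(128)
x_true[[5, 40]] = 1.0
code = sparse_code(D @ x_true, D)
print((np.abs(code) > 1e-3).sum(), "active elements")   # only a few cells 'fire'
```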

Lu says these memristor arrays can be stacked on an image sensor, since they don’t use much energy. Instead of sending all the image data to the processor, like in existing designs, the sparse coding hardware could sort out the most important parts and pass those along. He expects this will enable more energy-efficient and speedier video systems for self-driving cars. Lu’s group is currently working on integrated designs.

A white Phazr base station is shown in front of the Ridgeland, Mississippi headquarters of C Spire during a trial last week.

C Spire and Phazr Complete 5G Trial With Millimeter Waves in Mississippi

C Spire, a privately held wireless provider that serves the American South, completed a technical trial with Phazr last week at C Spire’s headquarters in Ridgeland, Miss. The goal was to test the startup’s millimeter wave base station technology, which could become a key component of future 5G networks.

Though several national carriers have announced 5G trials in major cities across the U.S., C Spire is a rare example of a regional provider investing in new 5G technology primarily for rural areas.

“When you look at this tech, I think it holds a lot of opportunities for serving that market,” says Stephen Bye, C Spire’s president. “We're very bullish about it.”

Millimeter waves are high-frequency waves that fall between 30 and 300 gigahertz, where spectrum remains empty—unlike bands below 6 GHz, which have become crowded with wireless signals. Phazr’s technology, called Quadplex, uses these waves to deliver over-the-air Internet service to homes and businesses within range of a Phazr base station.

C Spire has a problem that it hopes Quadplex can help solve. Though C Spire owns fiber optic cables that run by many rural communities, countless areas still lack broadband service because it’s financially impractical for the company to extend that cable to serve a smattering of households.

“It's just buried gold for a lot of small towns and rural folks,” says Craig Sparks, vice president of technology strategy and planning for C Spire. “We just need to pop it up with solutions that make delivering it quicker.”

In the United States, the average broadband connection delivers data at a clip of 55 megabits per second. But the average Internet speed in Mississippi, which makes up the bulk of C Spire’s service area, is only 26 Mbps.

With Phazr’s technology, C Spire could, in a sense, bring its buried cables to the surface by providing wireless service to homes in the area. Sparks hopes it will allow the company to deliver service with downlink speeds of hundreds of megabits per second to homes that are getting by today on only a fraction of those rates.

A unique aspect of Phazr’s approach is that the company uses only millimeter waves for the downlink carrying data from a base station to its customers. For the uplink, or data sent from customers to a base station, Phazr relies on the traditional cellular frequencies used today.

This strategy has proven popular with wireless providers eager to roll out improved services while 5G standards are still in the works. C Spire, which refers to its work with Phazr as “pre-5G,” sounds particularly optimistic about the company’s ability to help.

“The system’s performing, and we’re seeing the numbers we want,” said Sparks in the midst of the Phazr trials.

For now, Phazr’s setup is only meant to provide wireless service to devices that are inside of a home or other building. The version that C Spire tested does not provide the on-the-go mobile broadband that smartphones require.

A Phazr base station broadcasts signals over millimeter waves to a device called a Gazer that is placed at a customer’s home. The Gazer converts the signal to a lower frequency and then rebroadcasts the signals on Wi-Fi to nearby wireless devices.

To upload data, a device in the home sends it over Wi-Fi to a Gazer mounted on a wall or window, which converts it to a traditional cellular frequency and then sends it back to the Phazr base station. 

Sparks says that for C Spire to deploy Phazr’s technology across its network, its base stations would need to be modestly priced and capable of serving customers up to a kilometer away. During last week’s trials, Phazr showed speeds of 250 Mbps as far as a kilometer away from a base station, with a clear line of sight.

Farooq Khan, CEO of Phazr, believes the economic sweet spot for any provider deploying Phazr’s technology will be in areas where the combined cost of installing base stations and providing customers with Gazers works out to $1,000 or less per subscriber.

When Khan, a soft-spoken former Samsung engineer, first began working on millimeter waves, conventional wisdom in the field held that higher frequencies were cursed with higher signal propagation losses. But Khan realized that, by using directional beams that focus a wave’s energy on one device, such as a Gazer, it’s possible to still deliver reliable service from a distance. 
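A back-of-envelope calculation shows why this can work: free-space path loss at 28 GHz is roughly 23 decibels worse than at a traditional cellular frequency such as 1.9 GHz, but an ideal 384-element array can contribute a comparable beamforming gain. The idealized sketch below runs the numbers; real links lose more to foliage, rain, and walls.

```python
# Idealized comparison: Friis free-space path loss at 1.9 GHz vs. 28 GHz,
# against the maximum array gain of a 384-element antenna (10*log10(N)).
import math

def fspl_db(freq_hz, dist_m):
    c = 3e8                                      # speed of light, m/s
    return 20 * math.log10(4 * math.pi * dist_m * freq_hz / c)

d = 1000  # one kilometer, the range C Spire asked for
print(f"1.9 GHz path loss: {fspl_db(1.9e9, d):.1f} dB")      # ~98 dB
print(f"28 GHz path loss:  {fspl_db(28e9, d):.1f} dB")       # ~121 dB
print(f"384-element array gain: {10 * math.log10(384):.1f} dB")  # ~25.8 dB
```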

Now, each Phazr base station, called a Raback, has 384 millimeter wave antennas and 108 low-frequency antennas that form these directional beams. A Phazr cell site, consisting of three base stations, can support up to 36 beams, which together provide 360-degree coverage.

Rabacks can operate at any frequency between 24 and 40 GHz. For its tests with C Spire, Phazr used 28 GHz. This week, Verizon and Phazr will begin trials near Fort Worth, Texas, using Phazr’s equipment at both 28 GHz and 39 GHz. Verizon recently paid $3.1 billion to acquire Straight Path Communications and its spectrum holdings at 39 GHz.

Sanyogita Shamsunder, Verizon’s director of network planning, downplayed the significance of its trials with Phazr in a recent interview. “We test a lot of different technologies in the network,” she says. “It’s routine for us.” She also wouldn’t say what role, if any, Phazr’s tech would have in a series of 11 fixed wireless trials that Verizon will conduct this year.

In one test with C Spire, Phazr sent six high-definition video streams, at 28 GHz, from a Raback to a Gazer mounted to the inside of a trailer containing six televisions. The Gazer rebroadcast the streams over Wi-Fi to the televisions nearby. Meanwhile, the base station also broadcast an “always on” application, which is an app that runs continuously in the background, to three Gazers. During this test, overall throughput for the network of three Gazers connected to one Raback reached 2.53 gigabits per second.

In a real-world deployment, as more customers, and Gazers, are added to a base station, performance will change. Khan says one Raback can maintain download speeds of 1 Gbps to as many as six Gazers at a time in a sparsely populated area, or provide speeds of 100 megabits per second to as many as 60 Gazers at once in a more crowded community. In reality, not all customers use their devices at the same time, so providers often oversubscribe their networks by a factor of five or 10.
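The arithmetic behind that trade-off is simple enough to sketch. The 6-gigabit-per-second aggregate below is inferred from Khan’s own figures (6 × 1 Gbps = 60 × 100 Mbps), not a number Phazr has published.

```python
# Capacity sketch: one Raback's aggregate budget split among Gazers, then
# oversubscribed on the assumption that not everyone is online at once.
aggregate_mbps = 6 * 1000    # inferred from the article's figures

for per_user_mbps in (1000, 100):
    concurrent = aggregate_mbps // per_user_mbps
    for factor in (5, 10):
        print(f"{per_user_mbps} Mbps service: {concurrent} concurrent users, "
              f"{concurrent * factor} subscribers at {factor}x oversubscription")
```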

Though beamforming has helped Phazr to overcome the signal losses known to plague high frequencies, one thing still stands in the company’s way: leafy trees. Foliage causes higher signal losses for millimeter waves than traditional cell signals, and Khan admits it is a problem.

To avoid as many trees as possible during leafy summers in Mississippi, Phazr plans to attach its base stations to water towers or other tall fixtures in rural areas, and advise customers to install Gazers high up in their homes for the best service.

“We think if you can put this on top of water towers, those heights are hundreds of feet, we expect the range could be several kilometers,” Khan says.

Khan says even without a clear line of sight, and surrounded by lots of foliage, their base station has delivered hundreds of megabits per second to devices 300 to 400 meters away.

Moving forward, says Khan, Phazr’s commercial product for fixed wireless over millimeter waves will be ready by the second half of 2017. Around the same time, Sparks says, C Spire plans to begin trials with friendly users who can test the system’s performance in real-world settings.

Editor's note: This story was updated on 5/24 to correct Stephen Bye’s title (he was formerly CTO and is now president of C Spire) and to change “uplink” to "downlink” in referring to the speeds that Sparks hopes to achieve with Phazr’s system.

This prototype chip learns a style of music, then composes its own tunes.

A Neuromorphic Chip That Makes Music

A chip made by researchers at IMEC in Belgium uses brain-inspired circuits to compose melodies. The prototype neuromorphic chip learns the rules of musical composition by detecting patterns in the songs it’s exposed to. It then creates its own song in the same style. It’s an early demo from a project to develop low-power, general-purpose learning accelerators that could help tailor medical sensors to their wearers and enable personal electronics to learn their users’ patterns of behavior.

A computer immersed in a liquid cooling system, with blue light emanating from the tank.

Fujitsu Liquid Immersion Not All Hot Air When It Comes to Cooling Data Centers

Given the prodigious heat generated by the trillions of transistors switching on and off 24 hours a day in data centers, air conditioning has become a major operating expense. Consequently, engineers have come up with several imaginative ways to ameliorate such costs, which can amount to a third or more of data center operations. 

One favored method is to set up hot and cold aisles of moving air through a center to achieve maximum cooling efficiency. Meanwhile, Facebook has chosen to build a data center in Lulea, in northern Sweden, on the fringe of the Arctic Circle, to take advantage of the natural cold there; and Microsoft engineers have seriously proposed putting server farms under water.

Fujitsu, on the other hand, is preparing to launch a less exotic solution: a liquid immersion cooling system it says will usher in a “next generation of ultra-dense data centers.” 


A Circuit That Sees Radiation Strikes Could Keep Errors at Bay

For a short time, it looked like the world’s electronics would be safe (well, safer) from radiation. With the switch from planar transistors to FinFETs, ICs suddenly became naturally resistant to having their bits flipped by a neutron splashing into them and blasting loose a small cloud of charge. But two things are now making them vulnerable again: One is the move to operating voltages so low that it’s easier for a pulse of radiation-induced charge to flip a transistor on or off. The other is the unprecedented density of those transistors, which gives radiation more targets than ever.

Engineers at the University of Minnesota are nearing a solution that could help bring down the rate of so-called logic soft errors—signals temporarily flipped by a radiation strike. It’s a circuit called a back-sampling chain that has, for the first time, allowed them to reconstruct the pulse—called a single-event transient—that results from a radiation strike. In research to be presented in June at the IEEE VLSI Symposia in Kyoto, the team shows that the back-sampling chain (BSC) circuit—a kind of cross-connected chain of inverters—can detect orders of magnitude more strikes than previous approaches could.

Read More