AI and Big Data vs. Air Pollution

Physics simulations and AI combine to give pollution forecasts to city dwellers in Beijing and beyond


Beijing and other Chinese cities are choking under a blanket of smog. It’s so thick in Tianjin that planes can’t land. Authorities have issued the first “red alert” of 2016, and 1,200 Beijing-area factories were ordered to shut down or to reduce production, according to press reports.

This winter, officials will be equipped with forecasting tools from IBM and Microsoft that they tested last year. IBM’s tool, used by the city government, is designed to incorporate data from traditional sources, such as the 35 official multipollutant air-quality monitoring stations in Beijing, and lower-cost but more widespread sources, such as environmental monitoring stations, traffic systems, weather satellites, topographic maps, economic data, and even social media. Microsoft’s system incorporates data from over 3,000 stations around the country. Both IBM’s and Microsoft’s tools blend traditional physical models of atmospheric chemistry with data-hungry statistical tools such as machine learning to try to make better forecasts in less time.

“Our advantage or differentiation is to combine all those together,” says environmental engineer Jin Huang, who is project manager for the Green Horizon Initiative at IBM Research–China, in Beijing. IBM reports an accuracy of over 80 percent for 3-day forecasts and around 75 percent for its 7- to 10-day forecasts. Microsoft now provides China’s Ministry of Environmental Protection with a 48-hour forecast that as of 2015 reached 75 percent accuracy for 6 hours and 60 percent for 12 hours in Beijing.

How best to combine physics models and machine learning for air-quality forecasts is “an active research area,” says atmospheric scientist Vincent-Henri Peuch, the head of the European Copernicus Atmosphere Monitoring Service in Reading, England. He adds that blending is the right choice: Both types of models have something to offer, and neither needs to exclude the other. The market seems to agree so far. IBM now offers its combined model in New Delhi and Johannesburg, and the Beijing startup AirVisual also offers machine-learning-enhanced forecasts for private commercial use.
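The article does not describe how IBM or Microsoft actually blend the two kinds of model. One simple approach, shown here only as an illustration, is residual correction: the physics model makes the forecast, and a statistical model learns its systematic error from past observations. The sketch below assumes a single made-up predictor (humidity) and invented PM2.5 values; the function names `fit_line` and `blended_forecast` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def blended_forecast(physics_pm25, humidity, intercept, slope):
    """Physics forecast plus the statistically learned error correction."""
    return physics_pm25 + intercept + slope * humidity


# Invented training data (micrograms per cubic meter), for illustration only.
physics = [80.0, 120.0, 60.0, 150.0]    # what the physics model predicted
humidity = [30.0, 50.0, 20.0, 70.0]     # hypothetical predictor of its error
observed = [90.0, 140.0, 65.0, 180.0]   # what the monitors later measured

# The statistical model is trained on the physics model's errors,
# not on the raw pollution readings.
residuals = [o - p for o, p in zip(observed, physics)]
intercept, slope = fit_line(humidity, residuals)

print(blended_forecast(100.0, 40.0, intercept, slope))
```

A production system would use far more predictors and a more flexible learner, but the division of labor is the same: physics supplies the baseline, and data corrects it.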

Beijing officials have been able to claim some success beating down their fine-particle pollution levels: They reported that 2015 levels were 6 percent below 2014 levels. And while governments are under pressure to reduce air pollution, they are also under pressure not to let economic growth slip. IBM’s forecasting tool includes a simulator for measures such as shutting down factories upwind of the city or reducing road traffic for a day or two. “The tool estimates both emissions outcomes and the economic consequences of each proposed intervention,” Huang says.

AirVisual, IBM, and Microsoft are all generalizing their software to work in different locations, which requires both integrating different local physical models and tuning for different types of input data and their changing parameters. Johannesburg, for example, has just 8 monitoring stations to Beijing’s 35. Still, “there’s an opportunity to reuse some of the assets they developed here in South Africa,” says computer engineer Tapiwa M. Chiwewe, at the newly opened IBM Research lab in Johannesburg.

Each setting may require its own type of machine learning, a University of British Columbia team reported in 2016. In their study, they found that the computational expense of several types of learning depended on how much data was included up front versus how much was fed into the program during its operation. The best solution for a place such as Beijing, with just a couple of years of historical air-quality data, may differ from what’s best for a city with many more years of data, and that poses a challenge for officials trying to choose the right system for their city. It is difficult to compare different models without using the exact same data set at the same location, Peuch warns.
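The trade-off between loading data up front and feeding it in during operation resembles the familiar distinction between batch and online (incremental) learning. The sketch below is not the University of British Columbia team's method; it uses a running mean as a stand-in for a model and synthetic readings, simply to show how the cost differs when a new reading arrives each day. The function names are hypothetical.

```python
def batch_refits(history):
    """Refit from scratch on the full history each time a reading arrives."""
    touched = 0          # number of data points processed in total
    estimate = None
    for day in range(1, len(history) + 1):
        window = history[:day]
        estimate = sum(window) / len(window)
        touched += len(window)
    return estimate, touched


def online_updates(history):
    """Update the estimate incrementally, touching each reading once."""
    estimate = 0.0
    touched = 0
    for count, reading in enumerate(history, start=1):
        estimate += (reading - estimate) / count
        touched += 1
    return estimate, touched


# Two years of synthetic daily readings (illustrative values only).
readings = [float(50 + (7 * d) % 40) for d in range(730)]

batch_estimate, batch_cost = batch_refits(readings)
online_estimate, online_cost = online_updates(readings)

print(batch_cost, online_cost)
```

Both approaches arrive at the same estimate here, but daily batch refitting processes a number of data points that grows with the square of the history length, while the online version grows linearly. With real models the two generally do not give identical answers, which is part of why the choice depends on the setting.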

And cities around the world have a long way to go before they bring air pollution down to levels recommended by the World Health Organization. In 2015, ambient particulate matter (which does not include tobacco smoke) cost 103.1 million disability-adjusted life years, a measure of the quality and length of human life, according to the 2015 Global Burden of Disease Study in The Lancet, making it the sixth most harmful disease risk factor. That makes it an important target for governments and companies. By one estimate, the market for monitoring air quality will grow 8.5 percent per year for the next five years, reaching US $5.64 billion. It seems safe to forecast that the market for air-quality forecasting will grow, too.
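The market estimate gives a growth rate and an end value but not a starting point. Assuming annual compounding, and assuming the $5.64 billion figure is the value at the end of the five years, the implied present market size can be worked backward. The result is an inference from those assumptions, not a number taken from the estimate itself.

```python
def implied_base(final_value, annual_rate, years):
    """Work backward from a compounded end value to the starting value."""
    return final_value / (1.0 + annual_rate) ** years


# Figures quoted in the article: 8.5 percent per year, five years, US $5.64 billion.
base_billion = implied_base(5.64, 0.085, 5)

# Roughly US $3.75 billion under the stated assumptions.
print(round(base_billion, 2))
```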

This article appears in the January 2017 print issue as “Big Data vs. Bad Air.”