By combining insights from Google Flu Trends with data from the CDC, scientists now say they can predict the spread of flu a week into the future in the United States.
Each year, 250,000 to 500,000 people die of influenza worldwide, with 3,000 to 50,000 of those fatalities happening in the United States. These deaths are largely preventable by using flu shots, but the CDC must have up-to-date knowledge about where influenza is happening to make sure these vaccines get to where they are needed.
The CDC continuously monitors both the number of doctor visits attributed to flu-like illness and the number of patient samples that test positive for influenza. However, it can take a long time to collect and analyze all this activity, resulting in data that is typically up to two weeks out of date once it’s made available.
Recently Google unveiled Google Flu Trends as a way to predict flu levels in real time, two weeks earlier than the CDC, by analyzing how often people search Google for terms related to influenza. However, while Google Flu Trends is promising, this “big data” approach has made dramatic errors. For example, in 2013 it predicted twice as many doctors’ visits for the flu as actually occurred. This is because people may search Google for information about influenza when they do not actually have the flu, for instance when increased media attention to the illness draws their interest.
Now researchers at the University of California, San Diego, say the combination of Google and CDC data could do better than predict U.S. flu levels in real time: it could forecast them a week into the future.
The scientists used CDC data to determine which U.S. regions experienced influenza outbreaks at similar times in the past. The flu spreads most readily between these areas due to factors such as geographic proximity. This information helped correct exaggerations in Google’s estimates and also shed light on future influenza levels by revealing where the virus might spread.
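To make the idea concrete, here is a minimal Python sketch of one way such a correction could work. It is not the researchers' actual model, which the article does not describe in detail. It assumes, purely for illustration, that regions are linked by the correlation of their past CDC series and that a region's search-based estimate is pulled toward the estimates of its linked regions. The function names, the `trust` parameter, and all numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def neighbor_weights(history, region):
    """Weight other regions by how closely their past CDC series tracked `region`.

    Only positively correlated regions are kept; weights are normalized to sum to 1.
    """
    weights = {}
    for other, series in history.items():
        if other == region:
            continue
        r = pearson(history[region], series)
        if r > 0:
            weights[other] = r
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()} if total else {}


def corrected_estimate(search_estimates, history, region, trust=0.5):
    """Blend a region's search-based estimate with that of its linked regions.

    `trust` (hypothetical) is the share of weight given to the region's own estimate.
    """
    weights = neighbor_weights(history, region)
    if not weights:
        return search_estimates[region]
    neighbor_avg = sum(w * search_estimates[name] for name, w in weights.items())
    return trust * search_estimates[region] + (1 - trust) * neighbor_avg


# Made-up past CDC flu activity for three hypothetical regions.
history = {"A": [1, 2, 4, 3], "B": [1, 2, 4, 3], "C": [4, 3, 1, 2]}
# Made-up search-based estimates for the current week; region A looks inflated.
search_estimates = {"A": 6.0, "B": 3.0, "C": 1.0}

print(corrected_estimate(search_estimates, history, "A"))
```

In this toy setup, region A has historically moved in step with region B and opposite to region C, so A's inflated search-based figure is pulled toward B's. The same historical links hint at forecasting as well: a rise in one region suggests where the flu may show up next.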
“Big data does not always work the best in a vacuum,” said study lead author Michael Davidson, a data scientist at the University of California, San Diego. “By combining big data with traditional sources of data, we can often do better than by relying on big data alone.”
In the future, scientists might combine other sources of flu data, such as Wikipedia page visits, with Google Flu Trends and CDC data for even more accurate and timely estimates, Davidson said. He and his colleagues detailed their findings online Jan. 29 in the journal Scientific Reports.
Charles Q. Choi is a science reporter who contributes regularly to IEEE Spectrum. He has written for Scientific American, The New York Times, Wired, and Science, among others.