Six Creative Ways to Solve Biomedicine's Big Data Problem

Photo-illustration: Donald Iain Smith/Getty Images

Biomedical research generates an obscene amount of data. Many of the sensors, robots and other technologies IEEE Spectrum regularly profiles spew out terabytes to petabytes of data—and that’s only a sliver of the volume of health information stored in databases around the world.

Now, three funding agencies are trying to spur the development of tools and platforms to improve researchers’ ability to find, access and use that data. Yesterday at the 7th Health Datapalooza conference in Washington, D.C., the National Institutes of Health, the U.K.-based Wellcome Trust, and the Howard Hughes Medical Institute announced six finalists for the first-ever Open Science Prize, a global science competition for prototype tools and platforms to tame biomedicine’s big data behemoth.

Part of the problem with developing these kinds of tools is that no one is sure who should be responsible for them. “Data is generated globally, but it’s essentially managed and funded nationally,” says Philip Bourne, associate director for data science at the NIH. In an effort to transcend international borders and fund data science in a new way, Bourne and colleagues at Wellcome and HHMI hatched a plan for the prize.

After the competition launched last October, 96 teams spanning 45 countries entered. Each team was required to have one member based in the U.S. and at least one other member based in another country.

Yesterday, an expert panel announced the six finalists, who will each receive $80,000 to develop their prototypes over the next nine months. “We tried to pick things that had real promise, and where a small amount of money and a bit of publicity would really help,” says Bourne.

Here we take a peek at the six finalists, but be sure to stay tuned later this year to help pick a winner: In December, each team will demonstrate their prototype at a showcase, and the public will be invited to vote for their favorites. The winner will receive a grand prize of $230,000 to turn their idea into reality.

Without further ado, let’s meet the finalists:

  • BrainBox – The amount of brain imaging data available on the Internet is, well, mind-boggling. And compared with other types of data, neuroimaging data requires a substantial amount of human effort, such as curating and editing images. BrainBox is an online laboratory designed to give researchers easy access to brain imaging data (notably, without downloading it) and to enable distributed collaboration so everyone can share in the effort.

  • NeuroArch – While valiant efforts to map the entire human brain continue, a more near-term goal is to map a smaller brain, such as that of a fruit fly, which shares more than 70 percent of the genes involved in human brain disorders. The Fruit Fly Brain Observatory project would develop an open graph database platform called NeuroArch to store and process information about the fly brain, including the location, shape, and connectivity of every neuron. With all that data in one place, it might be possible to generate a simulated fly brain and see what happens when it is altered via genetics or drugs.

  • MyGene2 – Rare diseases aren’t as rare as you think. More than 6,000 known rare diseases affect an estimated 25 million people in the U.S. today. Yet more than half of families who undergo genetic testing for a suspected rare disease fail to get a diagnosis. A website named MyGene2 provides a place for families and clinicians to share health and genetic information on rare diseases as a way to promote the diagnosis and discovery of new rare conditions and the genes that cause them.

  • Nextstrain – To intervene in an outbreak and keep it from becoming an epidemic, scientists need to get their hands on genomic data from viral pathogens as soon as possible. The Nextstrain project pools genetic data from research groups around the world to visualize the spread of a virus in near real time. For example, the project has produced a graphic tracking the ongoing evolution of the Zika virus.

  • OpenAQ – According to the World Health Organization, air pollution exposure is responsible for one in eight global deaths, yet air quality data has traditionally been stored on obscure websites that are difficult to access and have inconsistent formats. The OpenAQ platform prototype aggregates and standardizes publicly available, real-time air quality data. It has already collected and shared 9.7 million air quality measurements from 500+ locations in 13 countries.

  • OpenTrialsFDA – When the U.S. Food and Drug Administration approves a drug, the agency publicly releases a package of information about that drug, often including previously unpublished clinical trials. Though this information is quite valuable, it is notoriously difficult to access, aggregate and search. OpenTrialsFDA is an effort to build a user-friendly web interface that enables anyone to access the data, plus APIs that allow third-party platforms to tap into and search the data.

The Human OS

IEEE Spectrum’s biomedical blog, featuring the wearable sensors, big data analytics, and implanted devices that enable new ventures in personalized medicine.

Eliza Strickland
New York City
Emily Waltz
Megan Scudellari
