A Global Alliance for Genomic Data Sharing

Group aims to prevent the creation of competing standards for analyzing, sharing, and using the flood of data resulting from gene sequencing


In June, a group of 70 hospitals, research institutes, and technology companies from 40 countries formed the Global Alliance, a consortium to promote open standards and best practices for organizations producing, using, or sharing genomic and clinical data.

Created in response to the flood of genomic data generated by increasingly affordable gene sequencing technologies, the Global Alliance aims to foster widespread data sharing unencumbered by the kind of competing, proprietary standards that have plagued electronic health records in the United States and elsewhere. For example, although genome analysis is already widely used in the treatment of cancer, inherited disease, and infectious disease, researchers cannot always assemble the sample sizes necessary to study rare conditions. This is due in part to the fact that hospitals cannot easily aggregate data that are stored in different hospital systems and analyzed with unstandardized tools and methods. By creating a standardized framework for sharing and using genomic data, the Global Alliance will enhance the opportunities for broader study of a range of diseases while also improving information sharing globally.
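To make the aggregation problem concrete, here is a minimal sketch of what a shared schema buys. Everything in it is hypothetical: the two hospital record layouts, the field names, and the `normalize` and `pool` helpers are invented for illustration and are not part of any Global Alliance specification.

```python
# Hypothetical exports from two hospital systems that describe the same
# kind of case with different field names and different spellings.
HOSPITAL_A = [
    {"patient": "a1", "dx": "condition_x", "gene": "GENE_A"},
    {"patient": "a2", "dx": "condition_y", "gene": "GENE_B"},
]
HOSPITAL_B = [
    {"id": "b7", "diagnosis": "Condition X", "locus": "GENE_A"},
    {"id": "b8", "diagnosis": "Condition X", "locus": "GENE_A"},
]

# Assumed mapping from each local layout to one shared schema.
FIELD_MAPS = {
    "A": {"patient": "sample_id", "dx": "condition", "gene": "gene"},
    "B": {"id": "sample_id", "diagnosis": "condition", "locus": "gene"},
}


def normalize(record, field_map):
    """Rename fields to the shared schema and canonicalize the condition label."""
    out = {field_map[key]: value for key, value in record.items() if key in field_map}
    out["condition"] = out["condition"].strip().lower().replace(" ", "_")
    return out


def pool(sources):
    """Merge records from every source into one list in the shared schema."""
    pooled = []
    for name, records in sources.items():
        pooled.extend(normalize(record, FIELD_MAPS[name]) for record in records)
    return pooled


pooled = pool({"A": HOSPITAL_A, "B": HOSPITAL_B})
cases_x = [r for r in pooled if r["condition"] == "condition_x"]
```

Neither hospital alone has more than two cases of the toy condition, but once the records share field names and labels, the pooled set can be filtered as a single cohort.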

The group is modeled after the World Wide Web Consortium (W3C), a nonprofit community that serves as the de facto standards-setting organization for Web technologies. Like the W3C, the Global Alliance plans to secure funding through philanthropic support, grants from research agencies, and member dues.

The Global Alliance has seven core principles:

  • Respect: The Global Alliance will respect the right of individuals to release some or none of their genomic data.
  • Transparency: The Global Alliance will employ transparent management and operating practices.
  • Accountability: The Global Alliance will develop and disseminate best practices for the technology, ethics, and public outreach behind genomic and clinical data sharing.
  • Inclusivity: The Global Alliance will foster partnerships among genomic data stakeholders.
  • Collaboration: Global Alliance members will share data to advance human health.
  • Innovation: The Global Alliance will promote technological advances to accelerate scientific and clinical progress.
  • Agility: The Global Alliance will act quickly to keep pace with rapidly changing technology.

The core principle of innovation is stressed repeatedly in the refreshingly specific section on technological considerations. The Global Alliance strongly advocates a cloud-based data archiving platform to minimize data storage costs across member organizations. It also urges makers of gene sequencing technology to ensure that their products are compatible with Hadoop and Spark (a Hadoop alternative optimized for real-time and memory-intensive applications) to enable efficient, massively parallel computation. In addition, the group calls for the creation of an application programming interface (API) that will allow developers to query the system and develop their own applications employing the data.
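The kind of computation that frameworks such as Hadoop and Spark parallelize can be sketched in a few lines. The example below is plain Python, not Hadoop or Spark code, and it runs on one machine; the shard layout, the record fields, and the gene and variant labels are placeholders assumed for illustration. It shows only the map-then-reduce pattern: count variants independently in each shard, then merge the partial counts.

```python
from collections import Counter
from functools import reduce

# Hypothetical variant records, one list per storage shard.
# Each record is (sample_id, gene, variant); all labels are placeholders.
SHARDS = [
    [("s1", "GENE_A", "v1"), ("s2", "GENE_B", "v2")],
    [("s3", "GENE_B", "v2"), ("s4", "GENE_C", "v3")],
    [("s5", "GENE_B", "v2")],
]


def map_shard(records):
    """Map step: count each (gene, variant) pair within a single shard."""
    return Counter((gene, variant) for _sample, gene, variant in records)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def count_variants(shards):
    """Apply the map step to every shard, then fold the partial counts together."""
    return reduce(merge_counts, map(map_shard, shards), Counter())


totals = count_variants(SHARDS)
```

Because each shard is counted without reference to any other, the map step could in principle run on separate machines, which is the property that makes this style of analysis attractive for very large genomic archives.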

One major challenge for the alliance will be the broadly differing international attitudes on sharing personal data. According to a 2010 survey from the European Commission, public opinion regarding the sharing of medical data for research varied widely at the national level. In Sweden and Norway, for example, 82 percent of poll respondents said they would be willing to provide such data; in relatively nearby Latvia and Lithuania, only 31 percent and 41 percent, respectively, would be willing to share their data.

In addition, it is not clear at this stage how the Global Alliance will work with existing genomic data standardization efforts such as those undertaken by the international health standards organization Health Level Seven. The issues of competing standards and global attitudes toward data sharing will need to be worked out to ensure the success of the alliance.

The alliance, though still in the early stages, is nevertheless an ambitious project, and one that could be widely influential if its international members can resolve these sensitive data ethics and standards regulation issues.

Travis Korte is a Research Analyst with the Information Technology and Innovation Foundation.

Photo: Alan John Lander Phillips/Getty Images


This CAD Program Can Design New Organisms

Genetic engineers have a powerful new tool to write and edit DNA code

A photo showing machinery in a lab

Foundries such as the Edinburgh Genome Foundry assemble fragments of synthetic DNA and send them to labs for testing in cells.

Edinburgh Genome Foundry, University of Edinburgh

In the next decade, medical science may finally advance cures for some of the most complex diseases that plague humanity. Many diseases are caused by mutations in the human genome, which can be either inherited from our parents (as in cystic fibrosis) or acquired during life (as in most types of cancer). For some of these conditions, medical researchers have identified the exact mutations that lead to disease; but in many more, they're still seeking answers. And without understanding the cause of a problem, it's pretty tough to find a cure.

We believe that a key enabling technology in this quest is a computer-aided design (CAD) program for genome editing, which our organization is launching this week at the Genome Project-write (GP-write) conference.

With this CAD program, medical researchers will be able to quickly design hundreds of different genomes with any combination of mutations and send the genetic code to a company that manufactures strings of DNA. Those fragments of synthesized DNA can then be sent to a foundry for assembly, and finally to a lab where the designed genomes can be tested in cells. Based on how the cells grow, researchers can use the CAD program to iterate with a new batch of redesigned genomes, sharing data for collaborative efforts. Enabling fast redesign of thousands of variants can only be achieved through automation; at that scale, researchers just might identify the combinations of mutations that are causing genetic diseases. This is the first critical R&D step toward finding cures.
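The design step described above, generating genomes with any combination of mutations, is at heart a combinatorial enumeration. The sketch below is not the CAD program's method or interface. It is a toy illustration that assumes a short made-up reference sequence and three made-up single-base substitutions, and the helper names `apply_mutations` and `design_variants` are hypothetical.

```python
from itertools import combinations

# Toy reference sequence and candidate substitutions (both invented).
# Each mutation is (0-based position, expected reference base, new base).
REFERENCE = "ATGGCCTTAGC"
MUTATIONS = [(2, "G", "A"), (5, "C", "T"), (9, "G", "C")]


def apply_mutations(reference, mutations):
    """Return a copy of the reference with the given substitutions applied."""
    bases = list(reference)
    for position, ref_base, alt_base in mutations:
        if bases[position] != ref_base:
            raise ValueError(f"expected {ref_base} at position {position}")
        bases[position] = alt_base
    return "".join(bases)


def design_variants(reference, mutations):
    """Build one designed sequence for every non-empty subset of mutations."""
    designs = {}
    for size in range(1, len(mutations) + 1):
        for combo in combinations(mutations, size):
            designs[combo] = apply_mutations(reference, combo)
    return designs


designs = design_variants(REFERENCE, MUTATIONS)
```

With n candidate mutations there are 2^n - 1 non-empty combinations, so three mutations yield seven designs while twenty would yield more than a million, which is why this kind of exploration depends on automation.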
