Ambitious Data Project Aims to Organize the World’s Geoscientific Records

Geoscience researchers are excited by a new big-data effort to connect millions of hard-won scientific records in databases around the world. When complete, the network will be a virtual portal into the ancient history of the planet.

The project is called Deep-time Digital Earth, and one of its leaders, Nanjing-based paleontologist Fan Junxuan, says it unites hundreds of researchers—geochemists, geologists, mineralogists, paleontologists—in an ambitious plan to link potentially hundreds of databases.

The Chinese government has lined up US $75 million for a planned complex near Shanghai that will house dedicated programming teams and academics supporting the project, and a supercomputer for related research. More support will come from other institutions and companies, with Fan estimating total costs to create the network at about $90 million.

Photo of a fossilized leaf.

Photo of a fossilized plant.

Photo of Fan Junxuan Fossil Record: Deep-time Digital Earth will make it easier for scientists to study fossils such as these. The project is led by paleontologist Fan Junxuan (bottom).Photos: Fan Junxuan

Right now, a handful of independent databases with more than a million records each serve the geosciences. But there are hundreds more out there holding data related to Earth's history. These smaller collections were built with assorted software and documentation formats. They're kept on local hard drives or institutional servers, some decades old, and converted from one format into another as time, funding, and interest allow. The data might be in different languages and is often guided by informal or variably defined concepts. There is no standard for arranging the hundreds of tables or thousands of fields. This archipelago of information is potentially very useful but hard to access.

Fan saw an opportunity while building a database comprising the Chinese geological literature. Once it was complete, he and his colleagues were able to use parallel computing programs to examine data on 11,000 marine fossil species in 3,000 geological sections. The results dated patterns of paleobiodiversity—the appearance, flowering, and extinction of whole species—at a temporal resolution of 26,000 years. In geologic time, that's pretty accurate.

The Deep-time project planners want to build a decentralized system that would bring these large and small data sources together. The main technical challenge is not to aggregate petabytes of data on centralized servers but rather to script strings of code. These strings would work through a programming interface to link individual databases so that any user could extract information through that interface.

Harmonizing these data fields requires human beings to talk to one another. Fan and his colleagues hope to kick off those discussions in New Delhi, which in March is hosting a big gathering of geoscientists. A linked network could be a gold mine for researchers scouring geologic data for clues.

In a 19th-century building behind Berlin's Museum für Naturkunde, micropaleontology curator David Lazarus and paleobiologist postdoc Johan Renaudie run the group's Neptune database, which is likely to be linked with Deep-time Digital Earth as it develops. Neptune holds a wealth of data on core samples from the world's ocean floors. Lazarus started the database in the late 1980s, before the current SQL language standard was readily available—at that time it was mostly found only on mainframes. Renaudie explains that Neptune has been modified from its incarnation as a relational database using 4th Dimension for Mac, and has been carefully patched over the years.

There are many such patched-up archives in the field, and some researchers start, develop, and care for data centers that drift into oblivion when funding runs out. “We call them whale fall," Lazarus says, referring to dead whales that sink to the ocean floor.

Creating a database network could keep this information alive longer and distribute it further. It could lead to new kinds of queries, says Mike Benton, a vertebrate paleontologist in Bristol, England, making it possible to combine independent data sources with iterative algorithms that run through millions or billions of equations. Doing this can deliver more precise time resolutions, which hitherto has been really difficult. “If you want to analyze the dynamics of ancient geography and climate and its influence on life, you need a high-resolution geological timeline," Fan says. “Right now this analysis is not available."

This article appears in the March 2020 print issue as “Data Project Aims to Organize Scientific Records."

software databases computing China Big Data

Topics

Sections

More

For IEEE Members

For IEEE Members

IEEE Spectrum

Follow IEEE Spectrum

Support IEEE Spectrum

Ambitious Data Project Aims to Organize the World’s Geoscientific Records

Deep-time Digital Earth will link hundreds of bespoke scientific databases in one easy-to-search network

The Rise and Fall of 3M’s Floppy Disk

Toyota’s Bubble-ized Humanoid Grasps With Its Whole Body

Stuart Parkin Revolutionized Disk Drive Storage

Related Stories

AI-Powered Proof Generator Helps Debug Software

GPUs Can Now Analyze a Billion Complex Vectors in Record Time

Topics

Sections

More

For IEEE Members

For IEEE Members

IEEE Spectrum

Follow IEEE Spectrum

Support IEEE Spectrum

Enjoy more free content and benefits by creating an account

Saving articles to read later requires an IEEE Spectrum account

The Institute content is only available for members

Downloading full PDF issues is exclusive for IEEE Members

Downloading this e-book is exclusive for IEEE Members

Access to Spectrum 's Digital Edition is exclusive for IEEE Members

Following topics is a feature exclusive for IEEE Members

Adding your response to an article requires an IEEE Spectrum account

Create an account to access more content and features on IEEE Spectrum , including the ability to save articles to read later, download Spectrum Collections, and participate in conversations with readers and editors. For more exclusive content and features, consider Joining IEEE .

Join the world’s largest professional organization devoted to engineering and applied sciences and get access to all of Spectrum’s articles, archives, PDF downloads, and other benefits. Learn more →

Join the world’s largest professional organization devoted to engineering and applied sciences and get access to this e-book plus all of IEEE Spectrum’s articles, archives, PDF downloads, and other benefits. Learn more →

Access Thousands of Articles — Completely Free

Create an account and get exclusive content and features: Save articles, download collections, and talk to tech insiders — all free! For full access and benefits, join IEEE as a paying member.

Ambitious Data Project Aims to Organize the World’s Geoscientific Records

Deep-time Digital Earth will link hundreds of bespoke scientific databases in one easy-to-search network

The Rise and Fall of 3M’s Floppy Disk

Toyota’s Bubble-ized Humanoid Grasps With Its Whole Body

Stuart Parkin Revolutionized Disk Drive Storage

Related Stories

AI-Powered Proof Generator Helps Debug Software

GPUs Can Now Analyze a Billion Complex Vectors in Record Time

SeKVM Makes Cloud Computing Provably Secure