The December 2022 issue of IEEE Spectrum is here!

Close bar

Genomic Data Growing Faster Than Twitter and YouTube

As many as two billion human genomes will be sequenced by 2025

2 min read
Genomic Data Growing Faster Than Twitter and YouTube
Andrey Prokhorov/Getty Images

human os icon

In the age of Big Data, it turns out that the largest, fastest growing data source lies within your cells.

Quantitative biologists at the University of Illinois Urbana-Champaign and Cold Spring Harbor Laboratory, in New York, found that genomics reigns as champion over three of the biggest data domains around: astronomy, Twitter, and YouTube.

The scientists determined which would expand the fastest by evaluating acquisition, storage, distribution, and analysis of each set of data. Genomes are quantified by their chemical constructs, or base pairs. Genomics trumps other data generators because the genome sequencing rate doubles every seven months. If it maintains this rate, by 2020 more than one billion billion bases will be sequenced and stored per year, or 1 exabase. By 2025, researchers estimate the rate will be almost one zettabase, one trillion billion bases, per sequence per year.

[shortcode ieee-pullquote quote=""What does it mean to have more genomes than people on the planet?"" float="left" expand=1]

90 percent of the genome data analyzed in the study was human. The scientists estimate that 100 million to 2 billion human genomes will be sequenced by 2025. That’s a four to five order of magnitude of growth in ten years, which far exceeds the other three data generators they studied.

“For human genomics, which is the biggest driver of the whole field, the hope is that by sequencing many, many individuals, that knowledge will be obtained to help predict and cure a variety of diseases,” says University of Illinois Urbana-Champaign co-author, Gene Robinson. Before it can be useful for medicine, genomes must be coupled with other genomic data sets, including tissue information.

One reason the rate is doubling so quickly is because scientists have begun sequencing individual cells. Single-cell genome sequencing technology for cancer research can reveal mutated sequences and aid in diagnosis. Patients may have multiple single cells sequenced, and there could end up being more than 7 billion genomes sequenced.

That “is more than the population of the Earth,” says Michael Schatz, associate professor at Cold Spring Harbor Laboratory, in New York. “What does it mean to have more genomes than people on the planet?”

What it means is a mountain of information must be collected, filed, and analyzed.

“Other disciplines have been really successful at these scales, like YouTube,” says Schatz. Today, YouTube users upload 300 hours of video every minute, and the researchers expect that rate to grow up to 1,700 hours per minute, or 2 exabytes of video data per year, by 2025. Google set up a seamless data-flowing infrastructure for YouTube. They provided really fast Internet, huge hard drive space, algorithms that optimized results, and a team of experienced researchers.

“We need that investment in genomics in order to understand your diseases, what kinds of treatments to apply, or answer questions about ancestry,” Schatz says. “By sequencing hundreds of millions of people, we can look through the pattern. We can get a sense of global community, and how incredibly connected we really are.”

The Conversation (0)

Are You Ready for Workplace Brain Scanning?

Extracting and using brain data will make workers happier and more productive, backers say

11 min read
A photo collage showing a man wearing a eeg headset while looking at a computer screen.
Nadia Radic

Get ready: Neurotechnology is coming to the workplace. Neural sensors are now reliable and affordable enough to support commercial pilot projects that extract productivity-enhancing data from workers’ brains. These projects aren’t confined to specialized workplaces; they’re also happening in offices, factories, farms, and airports. The companies and people behind these neurotech devices are certain that they will improve our lives. But there are serious questions about whether work should be organized around certain functions of the brain, rather than the person as a whole.

To be clear, the kind of neurotech that’s currently available is nowhere close to reading minds. Sensors detect electrical activity across different areas of the brain, and the patterns in that activity can be broadly correlated with different feelings or physiological responses, such as stress, focus, or a reaction to external stimuli. These data can be exploited to make workers more efficient—and, proponents of the technology say, to make them happier. Two of the most interesting innovators in this field are the Israel-based startup InnerEye, which aims to give workers superhuman abilities, and Emotiv, a Silicon Valley neurotech company that’s bringing a brain-tracking wearable to office workers, including those working remotely.

Keep Reading ↓Show less