Can Computing Keep Up With the Neuroscience Data Deluge?

Today's neuroscientists have some magnificent tools at their disposal. They can, for example, examine the entire brain of a live zebrafish larva and record the activation patterns of nearly all of its 100,000 neurons in a process that takes only 1.5 seconds. The only problem: One such imaging run yields about 1 terabyte of data, making analysis the real bottleneck as researchers seek to understand the brain.

To address this issue, scientists at Janelia Farm Research Campus have come up with a set of analytical tools designed for neuroscience and built on a distributed computing platform called Apache Spark. In their paper in Nature Methods, they demonstrate their system's capabilities by making sense of several enormous data sets. (The image above shows the whole-brain neural activity of a zebrafish larva when it was exposed to a moving visual stimulus; the different colors indicate which neurons activated in response to a movement to the left or right.)

The researchers argue that the Apache Spark platform offers an improvement over a more popular distributed computing model known as Hadoop MapReduce, which was modeled on the MapReduce approach that Google developed for its own large-scale data processing. Here's how Spectrum described these conventional systems in an article on "DNA and the Data Deluge":

While Hadoop and MapReduce are simple by design, their ability to coordinate the activity of many computers makes them powerful. Essentially, they divide a large computational task into small pieces that are distributed to many computers across the network. Those computers perform their jobs (the “map” step), and then communicate with each other to aggregate the results (the “reduce” step). This process can be repeated many times over, and the repetition of computation and aggregation steps quickly produces results.
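The map and reduce steps described above can be illustrated with a minimal single-machine sketch in Python. It is not Hadoop code: the function names (`split_into_chunks`, `map_step`, `reduce_step`, `distributed_mean`) are hypothetical, the "workers" are ordinary function calls rather than separate computers, and the input is assumed to be a non-empty list of numbers.

```python
from functools import reduce


def split_into_chunks(values, n_chunks):
    """Divide a list into roughly equal pieces, one per (hypothetical) worker."""
    size = -(-len(values) // n_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def map_step(chunk):
    """Each worker summarizes only its own piece as (sum, count)."""
    return (sum(chunk), len(chunk))


def reduce_step(a, b):
    """Combine two partial results into one."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(values, n_chunks=4):
    """Compute a mean by mapping over chunks, then reducing the partial results."""
    chunks = split_into_chunks(values, n_chunks)
    # On a real cluster, these map calls would run in parallel on many machines.
    partials = [map_step(chunk) for chunk in chunks]
    total, count = reduce(reduce_step, partials)
    return total / count
```

Calling `distributed_mean(list(range(1, 11)), n_chunks=3)` splits ten numbers into three pieces, summarizes each piece independently, and combines the summaries into the overall mean. The key property is that no single worker ever needs to see the whole data set.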

But the Janelia Farm researchers note that with MapReduce, data has to be loaded from disk for each operation. The Apache Spark advantage lies in its ability to cache data sets and intermediate results in the memory of many computers across the network, allowing for much faster iterative computations. This caching is particularly useful for neural data, which can be analyzed in many different ways, each offering a new view into the brain's structure and function.
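The difference between reloading and caching can be made concrete with a toy Python sketch. This is neither Spark nor Thunder code: `DiskDataset` is a hypothetical stand-in for data stored on disk that simply counts how many times it is read, and the three "analyses" are ordinary built-in functions chosen for illustration.

```python
class DiskDataset:
    """Hypothetical stand-in for a data set on disk; counts how often it is read."""

    def __init__(self, values):
        self._values = list(values)
        self.loads = 0

    def load(self):
        self.loads += 1  # on a real cluster this would be a slow disk read
        return list(self._values)


def analyze_without_cache(dataset, analyses):
    """Reload-per-operation style: every analysis reads the data from disk again."""
    return [analysis(dataset.load()) for analysis in analyses]


def analyze_with_cache(dataset, analyses):
    """Caching style: read once, keep the data in memory, reuse it for every analysis."""
    in_memory = dataset.load()
    return [analysis(in_memory) for analysis in analyses]


analyses = [min, max, sum]

reloaded = DiskDataset([3, 1, 4, 1, 5])
cached = DiskDataset([3, 1, 4, 1, 5])

results_reloaded = analyze_without_cache(reloaded, analyses)
results_cached = analyze_with_cache(cached, analyses)
```

Both approaches produce the same results, but the first reads the data once per analysis while the second reads it only once in total. With a terabyte-scale data set and many exploratory analyses, that saved reading time is where the speedup for iterative work would come from.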

The researchers have made their library of analytic tools, which they call Thunder, available to the neuroscience community at large. With U.S. government money pouring into neuroscience research for the new BRAIN Initiative, which emphasizes recording from the brain in unprecedented detail, this computing advance comes just in the nick of time.
