Profile: Gonçalo Abecasis Mines Genomes for Biomedical Gold

Abecasis and his lab look for more efficient ways to compress, store, assess, and prioritize huge genomic data sets

Photo: Peter Smith

Gonçalo Abecasis intended only to become a geneticist, and he used his computer programming prowess to pay for university. Years later, those skills have become an integral part of his research.

Today, as the Felix Moore Collegiate Professor of Biostatistics at the University of Michigan’s School of Public Health, in Ann Arbor, Abecasis develops statistical tools and computational methods that help determine the genetic fingerprints for such diseases as psoriasis, macular degeneration, diabetes, and heart disease.

“I want to figure out why people get sick, find better ways to treat and manage disease, and learn how best to use computational modeling to get that information,” he says.

Computer modeling is now an unavoidable component of Abecasis’s genetic research, because the volume of available data has grown faster than computer processing power.

“The same amount of money can generate four times as much data as the year before,” says Abecasis, who estimates that his lab has amassed some 5 petabytes of data. “Computer processing power only doubles in that time, so the cost of generating data has gone down much faster than the cost of computing it. So you constantly need more efficient ways of compressing, storing, assessing, and prioritizing the data.” [For more about this problem, read "The DNA Data Deluge," IEEE Spectrum, July 2013.]
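The arithmetic behind that gap can be sketched in a few lines of Python. This is only an illustration, assuming the two rates quoted above (data quadrupling and processing power doubling over the same year) stay constant; the function name `relative_growth` and its parameters are hypothetical and are not taken from Abecasis's software.

```python
def relative_growth(years, data_factor=4.0, compute_factor=2.0):
    """Return (data, compute, gap) multiples after `years` of constant growth.

    Assumption: data per dollar grows by `data_factor` each year and
    processing power by `compute_factor` each year, as in the quote.
    """
    data = data_factor ** years
    compute = compute_factor ** years
    # The gap is how much faster data has grown than the power to process it.
    return data, compute, data / compute


if __name__ == "__main__":
    for year in range(1, 6):
        data, compute, gap = relative_growth(year)
        print(f"year {year}: data x{data:.0f}, compute x{compute:.0f}, gap x{gap:.0f}")
```

Under those assumptions the gap itself doubles every year, which is why efficiency gains in storage and analysis have to keep coming rather than arrive once.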

“We’re working on more efficient mathematical models for data—such as leveraging repeated patterns in data—and compressing data to store it,” he adds. “We’re also thinking differently about what data is important and what we can discard.”
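One textbook way to exploit repeated patterns is run-length encoding, which stores each run of identical symbols as a symbol and a count. The sketch below is a generic illustration of that idea on a made-up DNA string, not the method used in Abecasis's lab; the function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(sequence):
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(base, len(list(run))) for base, run in groupby(sequence)]


def run_length_decode(pairs):
    """Rebuild the original string from (character, count) pairs."""
    return "".join(base * count for base, count in pairs)


if __name__ == "__main__":
    example = "AAAACCGTTTT"  # made-up sequence, for illustration only
    encoded = run_length_encode(example)
    print(encoded)
    # The round trip should be lossless.
    print(run_length_decode(encoded) == example)
```

The scheme only pays off when runs are long; on data with few repeats the encoded form can be larger than the input, which is one reason real genomic formats rely on more elaborate models.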

Over the years, Abecasis and his staff have developed their own software, written mostly in C++, to handle the voluminous data and visualize it in colorful 2-D pictures and graphs instead of tables of numbers. The tools are distributed free and widely used in the disease genetics community.

Once Abecasis and his team have used their tools to isolate genetic variants associated with a disease, they look for a cascade of small steps leading to disease, starting with the variants and then including inputs such as diet and environment. “We’re trying to determine the steps that start with a small genetic defect and eventually manifest into the disease,” he says.

Abecasis grew up with 11 siblings in Portugal and Macao, a former Portuguese colony in China near Hong Kong. He taught himself programming as a teenager with some assistance from a software engineer who ran a youth program. Later, computer programming jobs helped pay his way through the University of Leeds, in the United Kingdom, where he earned a bachelor’s in genetics in 1997, and the University of Oxford, where he graduated with a Ph.D. in human genetics in 2001.

That might have been the end of his programming had Abecasis not experienced a computing bottleneck firsthand: A project he was working on at Oxford generated more data than the school’s computer could handle. “At some point, I coded something to be more efficient, and then I became a supervisor for the computational side of the research,” he says. “The tools we ended up developing were more popular than the established methodology, so I began focusing on that side of things.”

After Oxford, Abecasis headed to the University of Michigan as a research faculty member, working his way up to full professor in 2009. Along the way he garnered several awards, including this year’s Overton Award from the International Society for Computational Biology for exceptional contributions made by a researcher in the early to middle stages of his or her career.

Abecasis’s work often has him bridging the distinct cultures of engineering and biology. “Both are attracted to big data, but they think differently,” he says. “Biologists like to figure out cause and effect that they can replicate in experiments, while computer scientists are happy to create models that make predictions but they don’t necessarily need to understand why. Over time, differences can be a good thing. Things I thought weren’t important turned out to be useful, and vice versa.”

This article originally appeared in print as “Gonçalo Abecasis.”
