Big Data Tamps Down HIV Outbreaks

A cutting-edge system detects the spread of HIV infections in near real-time

Photo-illustration: Getty Images

One of the best ways to prevent the spread of HIV is to treat those at high risk with a daily prophylactic pill. Unfortunately, this week Stanford University health researchers concluded that it’s simply too expensive to pre-treat even a fraction of people at increased risk for HIV. 

But what if healthcare providers could track a brewing outbreak in real-time, and quickly help those at highest risk of infection? Thanks to big data and crackerjack new software, Canada’s westernmost province is doing just that.

In June 2014, a monitoring system operated by the British Columbia Centre for Excellence in HIV/AIDS (BC-CfE) detected a cluster of 11 new HIV cases in a town just outside Vancouver. The system, designed by bioinformatician Art Poon, analyzes massive amounts of HIV genetic data to detect outbreaks.


Such data is surprisingly easy to come by. In many developed countries, it is now routine for a doctor to sequence viral DNA from the blood of an HIV-positive patient. By doing so, the physician can identify which drugs, if any, the virus is resistant to and prescribe an optimal treatment.

In Canada, that DNA sequence data is regularly uploaded to BC-CfE’s secure Oracle database, home to more than 30,000 anonymized HIV genotypes. Whenever new sequences are added, which happens almost every day, the entire database is downloaded to a secure workstation, where Poon’s software works its magic. During the download, all patient information is de-identified. “The system is designed to maintain patient privacy,” says Poon.
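To make the de-identification step concrete, here is a minimal Python sketch of one common approach: drop directly identifying fields and replace the patient ID with a keyed pseudonym. The article does not describe how BC-CfE actually does this, so the field names, the `deidentify` function, and the use of an HMAC are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical field names; the real database schema is not described in the article.
IDENTIFYING_FIELDS = {"name", "health_number", "date_of_birth", "address"}


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` without identifying fields.

    The patient ID is replaced by a keyed hash (HMAC-SHA256), so records from
    the same patient still group together but cannot be traced back without
    the secret key.
    """
    pseudonym = hmac.new(
        secret_key, str(record["patient_id"]).encode("utf-8"), hashlib.sha256
    ).hexdigest()[:16]
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in IDENTIFYING_FIELDS and key != "patient_id"
    }
    cleaned["pseudonym"] = pseudonym
    return cleaned


# Made-up example record.
example = {
    "patient_id": 1234,
    "name": "Jane Doe",
    "health_number": "0000 000 000",
    "region": "Fraser Valley",
    "sequence": "ACGTACGTAC",
}
safe = deidentify(example, secret_key=b"keep-this-key-off-the-workstation")
```

In a design like this, only whoever holds the key could link a flagged cluster back to real patients.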

Once the download is complete, the software analyzes the de-identified DNA and demographic information to determine where new infections have popped up, whether they carry drug-resistance mutations, and how they are related. HIV evolves very quickly, so if sequences from different infections are genetically similar, those infections are almost surely related by one or more recent transmissions.
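The idea that genetically similar sequences point to linked infections can be illustrated with a short Python sketch. This is not Poon’s method, which works from an evolutionary tree. It is a simpler stand-in that links any two aligned sequences whose fraction of differing positions falls at or below a cutoff, then groups linked cases together. The function names, the toy sequences, and the 0.15 cutoff are all assumptions chosen for illustration.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def find_clusters(seqs: dict, threshold: float) -> list:
    """Link every pair of sequences within `threshold`, then return the
    connected groups of two or more cases (computed with union-find)."""
    parent = {case: case for case in seqs}

    def find(case):
        # Follow parent pointers to the group's root, shortening the path as we go.
        while parent[case] != case:
            parent[case] = parent[parent[case]]
            case = parent[case]
        return case

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for case in seqs:
        groups.setdefault(find(case), set()).add(case)
    return [members for members in groups.values() if len(members) > 1]


# Toy data: A, B and C differ by only a base or two; D is unrelated.
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGTACGTTT",
    "D": "TTTTGGGGCC",
}
clusters = find_clusters(toy, threshold=0.15)
```

Here A and C are not directly within the cutoff of each other, but both are close to B, so all three land in one cluster while D stays out.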


Image: Art Poon
A network diagram of HIV cases, represented by circles. Related cases are linked by lines. Red circles indicate drug-resistant infections.

The software—which combines numerous open source software packages including Graphviz, LaTeX, and Jinja2—then produces a report with a diagram of an evolutionary tree: Each HIV infection is a branch, and clusters of related infections form bushy extensions. These clusters are visualized as network diagrams, from which health officials in British Columbia can identify hotspots of HIV transmission, especially areas where the amount of virus in patients’ blood is very high (which increases risk of transmission) and where drug-resistant strains are spreading. “We look for clusters of short branches in this tree that represent groups in the population where the virus is moving rapidly,” says Poon. “That becomes a significant clinical and public health problem.”
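For readers curious how a network diagram like this might be drawn, the Python sketch below writes a description of a cluster in Graphviz’s DOT language, with cases as circles, links as lines, and drug-resistant cases filled in red. It is only an illustration under assumed inputs. The `to_dot` function and the case labels are hypothetical, and the article does not say how Poon’s software calls Graphviz.

```python
def to_dot(edges, resistant):
    """Build Graphviz DOT text for an undirected network of cases.

    `edges` is a list of (case, case) pairs; `resistant` is the set of cases
    carrying a drug-resistance mutation, which are drawn in red.
    """
    lines = ["graph hiv_clusters {", "  node [shape=circle];"]
    cases = sorted({case for edge in edges for case in edge})
    for case in cases:
        if case in resistant:
            lines.append(f'  "{case}" [style=filled, fillcolor=red];')
        else:
            lines.append(f'  "{case}";')
    for a, b in edges:
        lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


# Made-up cluster of three linked cases, one of them drug-resistant.
dot_text = to_dot([("A", "B"), ("B", "C")], resistant={"B"})
print(dot_text)
```

Saved to a file such as clusters.dot, the output can be turned into an image with the Graphviz command `dot -Tpng clusters.dot -o clusters.png`.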

In June 2014, 8 of the 11 newly detected infections carried a mutation that made them resistant to a common class of HIV drugs. Public health officials accessed the patients’ information at that point and notified the clinic where they were being treated. Then, local personnel conducted outreach to ensure the affected individuals had access to treatment to reduce their viral load and to offer partner notification and referral services.

It worked. The outbreak slowed, with only one new case by the end of 2014.

The system has been in operation for two years. In addition to daily reports, it produces a formal monthly report that is distributed to public health agencies across British Columbia, which meet via teleconference each month to discuss new clusters and make HIV prevention decisions.

“The concept is providing treatment as early as possible to individuals to reduce the chance of onward transmission,” says Poon. “Our objective is to get to zero incidence of HIV.”

Health agencies in the U.S. have expressed interest in the system, and Poon has already shared his code with the Centers for Disease Control and Prevention in Atlanta. As routine genotyping for hepatitis C infection becomes more common, he is planning to use big data to track outbreaks of that virus too.
