Pfizer’s Edge in the COVID-19 Vaccine Race: Data Science

A year in the life of the data scientists who helped bring Pfizer's COVID-19 vaccine to the public in record time

A U.S. Army soldier immunizes a woman with the Pfizer COVID-19 vaccine in North Miami, Florida.

Joe Raedle/Getty Images

Pfizer dominated news headlines and family dinner conversations last December when it became the first company to bring a COVID-19 vaccine to the U.S. market. The pharma giant accomplished the feat in record time: less than a year after the disease was first identified.

Integral to that effort was the work of Pfizer's informatics and digital technology team for its vaccine R&D business. Led by Frank DePierro, this group of researchers crunched and chronicled all of the clinical trial data that led to a green light from the U.S. Food and Drug Administration (FDA), and a safeguard for millions of people. What did it take to make that happen? IEEE Spectrum spoke with DePierro last week via video call to find out. The transcript below has been edited for length and clarity.

Frank DePierro, Pfizer

IEEE Spectrum: What's it like to have the eyes of the world on your work?

Frank DePierro: It's been wild knowing how close my team and I are to the success of this vaccine. I would read articles with people speculating about the data or science, watch news clips with talking heads or experts, or overhear people discussing rumors while out and about, and the whole time I would be biting my tongue thinking about how wrong or right something was that was being said. Being so close to the process but also having to hold so much back, even from close family and friends, made it a tense year.

Spectrum: That must have been exasperating.

DePierro: The spring and summer of 2020 were the most difficult time of my life. I was at home working remotely, under intense pressure to deliver and run the team. I had to balance working the longest hours of my career with helping a third grader with school and keeping a three-year-old entertained, because my wife was busy on the front lines as a healthcare worker. I think people forget, especially in the news cycles, that we are human too, juggling all the same things while also trying to advance science.

Spectrum: What does your team do for Pfizer?

DePierro: My team supports clinical trial data. When blood and other kinds of samples are brought to the lab, each sample has to be received, tracked, divided up, and analyzed with complex robotics and instruments, and then statistical analysis is performed and final data generated. So the tools that track all of those samples and generate the data—that's my team. Then we report it out to the FDA.
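As an illustration only, the sample lifecycle DePierro describes (received, divided up, analyzed, reported) can be sketched as a small state tracker. This is not Pfizer's or LabWare's implementation; the `Stage`, `Sample`, and `aliquot` names and the ID scheme are hypothetical, chosen just to show the idea of tracking each sample and its derived portions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical lifecycle stages, loosely following the interview."""
    RECEIVED = 1
    ALIQUOTED = 2   # "divided up" into smaller portions
    ANALYZED = 3
    REPORTED = 4


@dataclass
class Sample:
    sample_id: str
    stage: Stage = Stage.RECEIVED
    history: list = field(default_factory=lambda: [Stage.RECEIVED])

    def advance(self) -> Stage:
        """Move the sample to the next stage; refuse to go past the last one."""
        if self.stage is Stage.REPORTED:
            raise ValueError(f"{self.sample_id} is already reported")
        self.stage = Stage(self.stage.value + 1)
        self.history.append(self.stage)
        return self.stage


def aliquot(parent: Sample, count: int) -> list:
    """Divide a received sample into `count` children with traceable IDs."""
    if parent.stage is not Stage.RECEIVED:
        raise ValueError("only a freshly received sample can be divided")
    parent.advance()
    return [
        Sample(f"{parent.sample_id}-A{i}", Stage.ALIQUOTED,
               [Stage.RECEIVED, Stage.ALIQUOTED])
        for i in range(1, count + 1)
    ]
```

Keeping a per-sample history is one simple way to preserve the audit trail that regulated lab work depends on; a production LIMS would record far more (timestamps, operators, instruments).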

Spectrum: How many samples from COVID-19 vaccine trials have you processed?

DePierro: The short answer is: a lot! In the last year and a half we logged more clinical samples for COVID than all other vaccine programs combined since 2014.

Spectrum: What kinds of assays or tests are conducted on the samples?

DePierro: It depends on what kind of clinical trial we're running. One test we run is where we introduce a live virus to a blood sample to see how the blood reacts. If the virus is neutralized in the blood, that tells us that the person had built up immunity. We also do PCR, which is the same technology used ubiquitously now in COVID diagnostics.

Spectrum: What are some of the informatics tools that you use?

DePierro: The main behemoth behind all of what we do is called LIMS, which is a Laboratory Information Management System. This is the broad system that enables us to track samples coming in, collect the data, and aggregate it. These are off-the-shelf, but highly customizable, and a lot of vendors offer them. The one we happen to use for our vaccine trials is by a company called LabWare. There are certain aspects where we just check boxes to configure, but there are other aspects where we're going in and writing complex subroutines and code using a proprietary language called LIMS Basic, which is very similar to Java or BASIC.

Then we have other tools that enable us to connect the instruments, robotics and statistical modeling. We're heavy users of SAS. That's the bread and butter of many of the algorithms we write that statistically analyze the clinical data to generate final results. We also use R for statistical analysis. And we have many complex instruments and robotics that have their own proprietary applications that need to be configured and have to communicate effectively with our LIMS or other applications.

Spectrum: How did your job change when COVID hit?

DePierro: We were really compressed on time. When we're going to run a new assay or sample test, we have to program it into our systems and make sure the robotics and instruments are properly configured, and for a PCR assay, for example, this typically takes 6 to 12 months. For COVID, we did it in about two weeks with a team that dropped everything else. A neutralization assay normally takes 24 months because there are complex algorithms to program to match the scientific requirements. We did it in about two months.

We also became a lot better at summarizing the data. This involves cleaning it up and packaging it for the company's leadership and the FDA to look at. For a normal study, we would prepare this maybe two or three times during the course of a long study. With COVID, we had to come up with a way to report data every day for the company's leadership. As we got closer to having something to submit to the FDA, we started reporting it up to four times a day—often overnight.

Spectrum: How were you able to speed up your timeline by so much?

DePierro: It was serendipitous because maybe a year before the pandemic, we had put into place a lot of new informatics infrastructure for one of our Prevnar 20 vaccine trials against pneumonia. In preparing for that we put a tremendous amount of time, energy, money, and resources into improving our LIMS and putting in more servers and optimizing our background processes and robotics so that everything was more efficient.

Once COVID hit, we capitalized on those improvements. And from a workforce perspective, it was all-hands-on-deck. There was no weekend, there was no evening. It was all work, and everyone understood that. My whole team got pulled into COVID straight away, and we grew by probably 30-40% over the last year. We did as much as we could with the same regulatory compliance and without cutting corners.

Spectrum: Pfizer was the first company to get authorization from the FDA for its COVID-19 vaccine. How much do you think your team's work contributed to that?

DePierro: We were a huge part of that. The improvements we had made to our infrastructure in the couple of years leading up to the pandemic played a big role. For example, our LIMS is configured to allow for a certain amount of pseudo-parallel processing. So we spent months retooling algorithms in the system, adding load-balanced servers, retooling the database, moving processes into the background, and improving general parameters for many of our configured assays, which is an incredible amount of work to do under the strict validation standards we follow. The result was noteworthy. In 2017 we were averaging about 20 concurrent users at peak processing and only several dozen batches of samples a week. By the end of 2019 we had upwards of 150 concurrent users, and the number of batches processed per week exceeded 300. This helped to set us up for success in 2020 and 2021.
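To make the idea of splitting work into background processes concrete, here is a minimal sketch in Python. It is an assumption-laden illustration, not how LabWare LIMS or Pfizer's systems work: `process_batch` and `process_all` are hypothetical names, and a simple mean stands in for a real assay calculation.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def process_batch(batch):
    """Stand-in for an assay calculation: return the batch ID and mean reading."""
    batch_id, readings = batch
    return batch_id, mean(readings)


def process_all(batches, workers=4):
    """Run batches on a pool of background workers and collect results by ID."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, one (id, value) pair per batch.
        return dict(pool.map(process_batch, batches))


if __name__ == "__main__":
    demo = [("B1", [1.0, 2.0, 3.0]), ("B2", [10.0, 20.0])]
    print(process_all(demo))
```

The point of the pattern is that independent batches do not have to wait on one another, which is one plausible reason a system could serve more concurrent users and batches per week after this kind of retooling.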

Spectrum: When new variants emerge, how does that affect your work?

DePierro: If a new study or update to an existing study is needed to look at variants, theoretically it requires work, but hopefully most of it will leverage what has already been built. Some of the biggest challenges come whenever we add novel assays to an existing study or new study. That then requires that we understand the science of the requirements and how that study depends on new instruments or robotics. We will also continue to have pressures on us to package and report out data at quicker intervals than in the past.

Spectrum: What's next for your team as the pandemic plays out?

DePierro: It's a matter of making things more robust and even more efficient. We've become really good at summarizing data. And people have an expectation now of seeing summarized data often and quickly on novel dashboards. So it's still a lot of weekends and nights, and the problem is how do we dial that back when we're still in a pandemic?

The Conversation (2)
Marinela Profi, 17 Sep, 2021

As a data scientist, I appreciate the spotlight my colleagues at Pfizer are getting for their work. And being from SAS, I always appreciate the shout-out, too! Keep up the great work. Well deserved!

Basil Latif, 08 Sep, 2021

It's interesting that they're using LIMS Basic to do the data analysis. Great job done by Pfizer and hats off to DePierro.