Descartes Labs Built a Top 500 Supercomputer From Amazon Cloud

Cofounder Mike Warren talks about the future of high-performance computing in a data-rich, cloud computing world

Samuel K. Moore is IEEE Spectrum’s semiconductor editor.

Mike Warren
Photo: Daniel Nadelbach

Descartes Labs cofounder Mike Warren has had some notable firsts in his career, and a surprising number have had lasting impact. Back in 1998, for instance, his was the first Linux-based computer fast enough to gain a spot on the coveted Top 500 list of supercomputers. Today, they all run Linux. Now his company, which crunches geospatial and location data to answer hard questions, has achieved something else that may indicate where high-performance computing is headed: It’s built the world’s 136th-fastest supercomputer using just Amazon Web Services and Descartes Labs’ own software. In 2010, this would have been the most powerful computer on the planet.

Screenshot of Descartes Labs’ Linpack benchmark score using cloud resources on AWS.
Photo: Descartes Labs

Notably, Amazon didn’t do anything special for Descartes. Warren’s firm just plunked down US $5,000 on the company credit card for the use of a “high-network-throughput instance block” consisting of 41,472 processor cores and 157.8 terabytes of memory. It then worked out some software to make the collection act as a single machine. Running the standard supercomputer benchmark, called Linpack, the system reached 1,926.4 teraFLOPS (trillion floating-point operations per second). (Amazon itself made an appearance much lower down on the Top 500 list a few years back, but that’s thought to have been for its own dedicated system in which Amazon was the sole user rather than what’s available to the public.)
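As a rough back-of-the-envelope check on those figures, the sketch below divides the reported Linpack score by the core count and relates it to the amount charged. The function and constant names are illustrative assumptions, and the dollar figure is the approximate amount quoted above rather than an itemized bill.

```python
# Figures quoted in the article. All names here are illustrative assumptions,
# not anything published by Descartes Labs or Amazon.
LINPACK_TERAFLOPS = 1926.4   # reported Linpack result
CORES = 41_472               # processor cores in the instance block
COST_USD = 5_000             # approximate amount charged for the run


def gigaflops_per_core(teraflops: float, cores: int) -> float:
    """Average sustained Linpack throughput per core, in gigaFLOPS."""
    return teraflops * 1_000 / cores


def dollars_per_teraflops(cost_usd: float, teraflops: float) -> float:
    """Dollars spent on the run per sustained teraFLOPS achieved."""
    return cost_usd / teraflops


if __name__ == "__main__":
    print(f"{gigaflops_per_core(LINPACK_TERAFLOPS, CORES):.1f} gigaFLOPS per core")
    print(f"${dollars_per_teraflops(COST_USD, LINPACK_TERAFLOPS):.2f} per teraFLOPS for the run")
```

The per-core figure is an average over the whole benchmark run, so it says nothing about the peak rate of any single processor.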

In a blog post announcing the feat, Descartes said:

“We believe that true [high-performance computing, or HPC] applications will eventually migrate over to the cloud en masse… The democratization of HPC is bringing the price point down to something that’s available for business, and we are well positioned to help our customers take advantage of it.”

The company is using HPC and enormous amounts of data to answer questions such as “Where are all the solar panels on Earth?” and uses the answers to do things like predict solar energy output.

IEEE Spectrum spoke to supercomputing pioneer Warren late last year, fresh from receiving the ACM’s Test of Time Award for parallel algorithm work he did about 25 years ago. He explains what Descartes has accomplished and his hopes for cloud-based high-performance computing.

IEEE Spectrum: What was your motivation for founding Descartes?

Mike Warren: Well, I had been at Los Alamos National Laboratory for about 25 years working on cosmology and large simulations on parallel machines and was just seeing a lot of the sort of things I was interested in doing coming together. For instance, we had a project at Los Alamos that was actually looking at ImageNet and deep learning back in 2011, which was before the big advance with deep neural networks that happened in that time frame. So we were right there watching that happen and saw the opportunity. With my background in high-performance computing and analysis of really large amounts of data, both astronomical image data as well as simulation data, there was a real opportunity to go out and find the places where data was really being underutilized because there was so much of it.

It was hard to process, pulling together what had been very separate fields of machine learning, physics, and other modeling required with that data. [We also required expertise in] cloud computing, which just provided tremendous amounts of capability without the big capital investment that had been required earlier to acquire your own machine.

So in December of 2014, the other cofounders and I started the company, and it's just been a rocketship. Since then, we definitely were kind of at the right place at the right time, and it's grown from the six founding members to over 90 people in the company now.

IEEE Spectrum: Is the hardware you need for your business available in the cloud to you at the scale that you need it now?

Mike Warren: It is. When we started out, one of the early test problems we did was this: We took all of the existing Landsat satellite data, which is available in Google Cloud, as well as a little bit extra to make it add up to a petabyte of data. Then we processed that in—it was about 16 hours in a continuous run—in Google Cloud, which used about 30,000 cores. So that was just your ordinary Intel multi-core processors, harnessing together 30,000 of those, reading at very high rates, and processing the satellite imagery.

That was using general-purpose processors (CPUs). But now for some of the machine learning or deep learning applications, there's a bit more efficiency with GPUs, so we use those as well, and we're also looking at the even more special-purpose hardware from Google, the tensor processing units.
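The petabyte run described above implies a sustained data rate that is easy to estimate. The sketch below is a minimal calculation under stated assumptions: a decimal petabyte (10^15 bytes), exactly 16 hours of wall-clock time, and exactly 30,000 cores, all of which Warren gives only as approximate figures. The function names are illustrative.

```python
PETABYTE = 10**15  # bytes, assuming the decimal definition


def aggregate_gb_per_second(total_bytes: float, hours: float) -> float:
    """Sustained read-and-process rate for the whole job, in gigabytes per second."""
    return total_bytes / (hours * 3600) / 1e9


def per_core_mb_per_second(total_bytes: float, hours: float, cores: int) -> float:
    """Average share of that rate handled by each core, in megabytes per second."""
    return total_bytes / (hours * 3600) / cores / 1e6


if __name__ == "__main__":
    print(f"{aggregate_gb_per_second(PETABYTE, 16):.1f} GB/s across the job")
    print(f"{per_core_mb_per_second(PETABYTE, 16, 30_000):.2f} MB/s per core")
```

Under these assumptions each core handles well under a megabyte per second on average, which suggests that the difficulty lies in coordinating tens of thousands of workers rather than in the speed of any one of them.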

IEEE Spectrum: Have you had to adapt to the change in the sorts of hardware that was available in the cloud?

Mike Warren: Yes. My personal bias is that if you look at the overall system, it's not the hardware that's really expensive. It's the software. Anytime somebody comes out with hardware that breaks the software that we've written, that's incredibly expensive. So you really have to be careful about the applications that you're going to move to that sort of special-purpose hardware.

Certainly, if you are Tesla and you know exactly what your computer vision application is going to be, that certainly justifies special hardware and all the rewritten software to go along with that. But for a lot of these applications, dedicating a team of software engineers to port some existing code over to a GPU or a TPU—that's got to be a really big account to justify that investment.

IEEE Spectrum: In addition to the hardware side, there's more and more data. Is the quality of the data that's available getting better or is fixing that still a big part of the job?

Mike Warren: To some extent, the sensors are getting better. I forgot who said this, but “sometimes quantity has a quality all of its own.” It’s essentially a Moore's Law, but it's happening in the sensors. So we're just getting exponential growth in the data volume and the number of sensors. And as launch technology improves, it's much, much cheaper to get many satellites up into orbit. That’s contributing to the amount of data. It's very analogous to what was happening with the transition from very expensive computers down to the PC and the explosion in microprocessors.

IEEE Spectrum: So basically, with enough quantity, the quality is good enough for you to provide the answers to the questions you're asking?

Mike Warren: Right. For example, things like clouds are always a big problem in visual satellite imagery. You can work very hard to correct the data that's in a cloud's shadow or partially obscured by clouds. But if you can just have more data—that was an hour later when the cloud moved or the next day when the weather system went through—you can just use much simpler techniques that are more robust and get the same answers.
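One simple technique of the kind Warren alludes to is a per-pixel temporal median: stack several images of the same location taken at different times and keep the middle value for each pixel, so that an occasional bright cloud or dark shadow is discarded as an outlier. The sketch below illustrates that general idea only. The interview does not say which method Descartes Labs uses, and the function name and toy pixel values are invented for illustration.

```python
from statistics import median


def median_composite(stack):
    """Per-pixel median over a time series of equally sized 2-D images.

    `stack` is a list of images; each image is a list of rows of
    brightness values, and all images cover the same ground area.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [median(image[r][c] for image in stack) for c in range(cols)]
        for r in range(rows)
    ]


# Synthetic 2x2 scenes (made-up values): 250 stands in for a bright cloud.
scenes = [
    [[250, 40], [42, 41]],   # cloud over the top-left pixel
    [[38, 41], [43, 250]],   # cloud over the bottom-right pixel
    [[39, 39], [41, 40]],    # clear view
]

if __name__ == "__main__":
    print(median_composite(scenes))  # [[39, 40], [42, 41]]
```

The median needs no cloud detection or shadow correction at all, which is the trade Warren describes: more observations in exchange for a simpler and more robust method.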

IEEE Spectrum: So what are the new frontiers of what you're capable of doing? What do you hope to be able to do that you couldn't do even a couple of years ago?

Mike Warren: A lot of it is connecting the questions that businesses are interested in solving and understanding what data's out there to answer the problem. The main point there is it's usually not a single data source that provides the answer. It's a lot of different sources looking at different wavelengths or different imaging modalities.

One of the things we're doing now is moving beyond imagery into things like synthetic-aperture radar. There are satellites that are actually sending down microwave wavelength radiation and then measuring what gets reflected back. And you can do some very interesting things with that data and also combine it with optical data. Synthetic-aperture radar can do things like see through clouds. If you're looking at flooding or damage from a hurricane, it’s often very cloudy, so it takes quite a long time before the optical satellites can see those areas and provide the information you're looking for. But the radar satellites are there right away and can provide more timely information. So this network effect of different types of sensors and tying that all together in a consistent way with a standard geospatial [application programming interface, or API] to access that is really the very exciting frontier we're crossing right now.

IEEE Spectrum: What are some of the most interesting use cases that you guys worked on?

Mike Warren: One of the very early applications that we did was that big application in Google Cloud where we ingested the Landsat data. That set us up for modeling the yield of corn and soybeans in the U.S. The first year we did that was the 2015 growing season. [We took] high-resolution imagery where we could identify each field and what crop it was growing, imagery that arrived on a daily timescale at lower resolution to capture the health of those crops over the growing season, and then put together a whole model that could predict the corn yield. We calibrated that versus the really great dataset that the USDA has collected of how much corn is produced in every county in the U.S. every year. That proved our idea that it was possible to do some of these analytics. And we were as good as the USDA, who goes out and actually has people in the field calculating what the yield's going to be. We were able to do that from hundreds of kilometers up in space without ever stepping into a cornfield.

IEEE Spectrum: Is high-performance computing progressing at the pace and in the direction that you think it should?

Mike Warren: Well, I mean, in terms of the pace, it's progressing. If you look at the work that I did since I was a graduate student, just large-scale simulations over 25 years, that was a factor of a million in performance on this one particular code. There’s nothing else in the world that has improved by that sort of scale.

Of course, at the CPU scale, things sort of stopped improving. Basically, 15 years ago, you had single-core Intel CPUs running at a couple of gigahertz. And that's more or less what we have now. We just have lots of those cores inside of a single CPU. Instead of making the components faster, we're putting more and more of them together. That’s making good parallel codes that much more important. So I think it's created a dichotomy of applications. There are those that scale well in parallel with not a lot of effort and then there are those that don't. Things like the Internet and Web infrastructure are fairly naturally parallel. And you have huge industries built around that.

But other things that are harder to make parallel have really lagged behind, and that's created an opportunity. That’s certainly something that Descartes is taking advantage of—that we can make some of these processing tasks very parallel and quick, so that it takes an hour to do an experiment, rather than a month.
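The dichotomy Warren describes can be made concrete with Amdahl's law, a standard textbook model that he does not cite himself: if only a fraction of a job can run in parallel, the serial remainder caps the speedup no matter how many cores are added. The sketch below is a minimal illustration, and the parameter names are assumptions. Going from a month to an hour is roughly a 720-fold speedup (assuming a 30-day month), which this model says is only reachable when nearly all of the work parallelizes.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the single-worker run time that can be
    spread across workers (0.0 to 1.0); the rest stays serial.
    workers: number of cores or machines working at once.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    for fraction in (0.90, 0.99, 0.999, 1.0):
        speedup = amdahl_speedup(fraction, 30_000)
        print(f"parallel fraction {fraction}: speedup {speedup:,.0f}x on 30,000 workers")
```

The model ignores communication and load-balancing costs, so real codes usually fall short of these ideal figures.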

IEEE Spectrum: Can we ask a bit about the Test of Time Award?

Mike Warren: It was for a paper in 1993 for a way to solve the gravitational n-body problem on a parallel machine. We went from writing a code that was intended to run on at most a few hundred processors [then], and [recently] used it in the last project I did at Los Alamos, where we were running on hundreds of thousands of processors. I think the underlying ideas are going to continue to work at exascale. Having codes that can persist that long is a huge advantage, because if you had to rewrite them every time—because the architecture of the GPU changes or something—there aren't enough programmers in the world to keep up with that challenge.
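For readers unfamiliar with the gravitational n-body problem, the sketch below shows its brute-force form: every body feels the pull of every other body, so the work grows with the square of the number of bodies. This is deliberately not the method from the 1993 paper, which the interview does not describe. It is only the naive baseline that more scalable parallel algorithms aim to improve on, and the function name and the `softening` parameter are illustrative assumptions.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def accelerations(positions, masses, softening=0.0):
    """Direct-sum gravitational acceleration on each body.

    positions: list of [x, y, z] coordinates in meters.
    masses: list of masses in kilograms, same order as positions.
    softening: optional length (meters) that avoids a division by zero
    when two bodies get very close.
    Cost is O(N^2): each of the N bodies sums over the other N - 1.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector pointing from body i to body j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + softening ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3
    return acc


if __name__ == "__main__":
    # Two 1 kg bodies placed 1 m apart pull on each other equally and oppositely.
    print(accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0]))
```

Doubling the number of bodies quadruples the work in this form, which gives a sense of why a code that scales to hundreds of thousands of processors needs a smarter approach.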

IEEE Spectrum: Have you seen any particularly interesting modern uses of this code?

Mike Warren: Certainly. In [summer 2018], one of my former collaborators worked on the simulation of a giant planetary collision, something like where the moon came from. For this paper, we used the code to calculate what happened to the planet Uranus, because it's flipped over on its side and spins in the opposite direction from the rest of the planets. We showed that could actually be explained by a certain type of collision that happened early on in the solar system.
