A Made-For-TV Compression Algorithm

Photo:Isabella Vosmikova/HBO

In a scene from HBO's Silicon Valley, software engineer Richard, played by Thomas Middleditch, and members of fictional startup Pied Piper's engineering team prepare to put the final touches on their compression algorithm; the Post-It notes on the board are part of the Scrum method of product development.

HBO’s Silicon Valley, a television series heading into its second season, tells the story of Richard, a young whiz kid plunged into the wild world of Silicon Valley startups. He forms a company, Pied Piper, and pitches his technology to venture capitalists and at a startup competition even as he and his cohorts struggle to improve their product, scribbling on whiteboards during long meetings. The technology is practically a character of its own in the show.

So the creators of Silicon Valley needed to “cast” the right technology. Early on, they settled on a piece of software—a universal compression algorithm. They were looking for what venture capitalists call “deep tech,” some kind of core technology that could be part of many different kinds of products or affect multiple industries. (The original working title of the show was “Deep Tech.”) They also needed a technology whose usefulness could be easily understood by a lay audience—and compression is a simple concept to comprehend, though not always easy to execute.

The creators had already hired a tech advisor with Silicon Valley startup experience, Jonathan Dotan. But Dotan isn’t an expert in compression, and the show’s creators wanted everything technical about the show to seem as real as possible. And so Dotan turned to Google to find an expert on compression and landed on information about a class taught by Stanford professor Tsachy Weissman.

Photo: Tekla Perry
Stanford Professor Tsachy Weissman

Dotan sent Weissman an email, asking to chat with him about an upcoming TV series. Though Weissman often doesn't get around to looking at unsolicited emails, he opened that one, and was immediately intrigued. He quickly tossed out a number of ideas involving genomic data compression algorithms, lossy compression, and denoising, but kept coming back to a sort of Holy Grail in the compression world—a form of lossless compression far more powerful and efficient than anything that exists today, that could work on any type of data, and could be searchable, that is, could be decompressed in small chunks.

The busy Weissman brought in Vinith Misra, a student working on his Ph.D., to flesh out the details of the fictional algorithm.

“We had to come up with an approach that isn’t possible today, but it isn’t immediately obvious that it isn’t possible,” says Misra. “That is, something that an expert would have to think about for a while before realizing that there is something wrong with it.” It would pass, he said, a Powerpoint test, that is, in a Powerpoint presentation you could convince even technically knowledgeable people that it might be theoretically possible; it’s just when you sit down to build it that you run into insurmountable problems.

Misra explained that even though the goal was to create a lossless compression algorithm, he started out by looking at lossy compression, simply because these algorithms, involving transformations performed on data, are more visually compelling when you illustrate them on a whiteboard. (Lossy compression algorithms throw out parts of the file, so the original file can only be approximately reconstructed. Lossless compression algorithms can recreate all of the information in the original file.) He came up with the idea of using lossy compression techniques to compress the original file, then calculating the difference between the approximate reconstruction and the original file and compressing that data; by combining the two pieces, you have a lossless compressor.

This type of approach can work, Misra said, but, in most cases, would not be more efficient than a standard lossless compression algorithm, because coding the error usually isn’t any easier than coding the data. On the surface, however, it’s an interesting idea, and one that can survive at least a cursory analysis.

That wasn’t the end of Misra’s work, however. Besides the original algorithm that launches the young entrepreneur, the creators of HBO Silicon Valley needed to later reveal a breakthrough improvement in that algorithm. This breakthrough fuels the plotline of the season finale, in which Pied Piper's giant competitor Hooli reverse engineers the original algorithm and is poised to crush the startup.

The show creators had one constraint on that “breakthrough.” Somehow, the new algorithm had to connect to an elaborate joke that included the phrase "middle out."  Misra later published a 12-page mathematical analysis of the scenario the joke sets up. (You can read the paper here, but be warned that if it were a movie it would be rated R for explicit content.)  He recently received his Ph.D. and has accepted a job in IBM’s Watson group, and has mixed feelings about the fact that this paper may turn out to be the most widely read technical document he’ll ever write.

For the breakthrough technology Misra was thinking big. “We wanted to be in the history of [data compression] giants,” he says, like Shannon, Huffman, Lempel, and Ziv.”

Shannon codes model data in a treelike structure, working from the roots to the leaves. Huffman codes, used today in media files like MP3 and JPEG, perform better by working from leaves to roots. Lempel and Ziv’s famous algorithm digests data in a stream, working from one end to another, constructing a model for the data as it progresses. Misra’s algorithm, as the HBO’s writers suggested, works from the middle of the data stream outwards, bouncing around the data to find any hidden structure and compressing it at the same time.

This kind of algorithm doesn’t exist today, but, says Weissman, it is certainly reasonable to think that someone will come up an approach to compression that will completely disrupt current standards by modeling data in new ways. Thinking about this possibility made him wonder, at least for a moment, if he should be starting a compression company like Pied Piper rather than consulting for a fictional one; but he’s tabled that idea, at least for now.

Weissman says he’s gotten all sorts of reactions from colleagues to his work on the show. There are those who believe that raising the visibility of compression technology is great for the field, and will draw more engineers to work in compression and more students to the study of information theory. And there are those who think he’s tarnished his name by associating it with a sophomoric comedy.

Misra and Weissman’s work on HBO is not over. The fictional engineers on the show will likely face new competition, and have to improve the algorithm to beat it. “Perhaps we’ll look more closely into local decodability,” Weissman said, “finding a technique in which the amount of compressed data needed to reconstruct a small chunk of the original data would be proportional to the size of the data I’m trying to retrieve.” That’s a technology that would be incredibly useful, particularly, Weissman said, in the world of genomics. Whether such schemes exist is still an open research question, but, he says, “I believe it might be possible, with little compromise on the overall compression.”

To read about a compression algorithm metric developed for the show, see "A Fictional Compression Algorithm Moves Into the Real World."

Advertisement

View From the Valley

IEEE Spectrum’s blog featuring the people, places, and passions of the world of technologists in Silicon Valley and its environs.
Contact us:  t.perry@ieee.org

Senior Editor
Tekla Perry
Palo Alto
Advertisement