# Model Predicts Whether NFL Teams Will Run or Pass

## North Carolina State University undergrads create a tool that guesses what an offense will do with the ball, with up to 91.6% accuracy

Teddy Bridgewater (#5), the quarterback of the Minnesota Vikings, eludes the pursuit of Pittsburgh Steelers defenders during the NFL Hall of Fame Game on 9 August 2015 in Canton, Ohio.
Photo: Joe Robbins/Getty Images

National Football League (NFL) playbooks are the size of telephone books. They’re filled with dozens and dozens of plays, each designed so that a team can play to its strengths while taking advantage of its opponents’ weaknesses. Despite the endless variations, they all basically boil down to two options for the offense: pass or run. No matter how intricately designed an offensive play is, if the defense can sniff out whether the ball will be tossed down field or toted along the ground, it gains a tremendous advantage. (Yes, we know that teams punt and kick field goals and extra points after touchdowns. But we’re not talking about that right now.)

Earlier this week, a pair of statisticians from North Carolina State University showed off a model they built that predicts whether a specific team will call a passing or running play with a high degree of accuracy. They presented the model in Seattle at JSM 2015, a joint conference of statistical and mathematical societies.

William Burton, an undergraduate majoring in industrial engineering and minoring in statistics, and Michael Dickey, who graduated in May with a degree in statistics, used a listing of actual NFL offensive plays from the 2000 through 2014 seasons, compiled by a company called Armchair Analysis, to figure out the ratio of passes to runs. They showed empirically what fans already understood anecdotally: the aerial attack is being used ever more frequently. Pass plays were called 56.7 percent of the time in 2014, compared with 54.4 percent in 2000.

But what makes a team decide whether to run or throw? Burton and Dickey looked at a host of factors that affect a team's play selection. Among these are: the distance to the first-down marker, whether it’s first, second, third or fourth down, how much time is left on the game clock, the team’s score in relation to its opponent’s, and field position. For example, there’s a high probability that the coach will opt for a passing play if the other team is leading by three points, there’s a minute left in the fourth quarter, the offense is facing third down at its own 30-yard line, and needs to advance 7 yards to pick up a fresh set of downs. On the other hand, a team that’s leading by 7 points, facing the same down and distance at the same point in the game, might very likely run the ball (to avoid an interception and to take time off the clock so the other team can’t mount a score-tying drive before time runs out).

For their system, Burton and Dickey developed logistic regression models—a statistical method used, for example, to predict whether someone will default on a mortgage—and random forest models, a machine learning method. But they quickly realized that teams’ strategies differ significantly in each of a game’s quarters. To account for that, they produced six separate logistic regression models: one each for the first, second, and third quarters, plus three for the fourth quarter, covering the cases where the offensive team is winning, losing, or tied. They tested their models on 20 randomly selected games. Overall, the models correctly predicted pass or run on 75 percent of downs. Their best showing came in a 2014 game between the Jacksonville Jaguars and Dallas Cowboys, where the predictions proved correct on 109 of 119 offensive plays—a 91.6 percent accuracy rate.
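The kind of model the pair describe can be sketched in miniature. The snippet below fits a toy logistic regression by gradient descent on a handful of invented plays; the features (down, yards to go, score margin) echo the factors discussed above, but the data, scaling, and resulting weights are purely illustrative, not Burton and Dickey's actual model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (plus an intercept in the last slot) by
    stochastic gradient descent on the logistic log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi
            for j, xj in enumerate(xi):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w

def prob_pass(w, x):
    """The model's probability that the next play is a pass."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

# Invented plays: [down, yards_to_go / 10, score_margin / 10]; label 1 = pass.
X = [[3, 0.8, -0.3], [3, 0.9, -0.7], [2, 1.0, -0.3],   # long yardage, trailing
     [1, 0.2, 0.7], [2, 0.1, 0.3], [1, 0.3, 1.0]]      # short yardage, leading
y = [1, 1, 1, 0, 0, 0]

w = train_logistic(X, y)
p_pass = prob_pass(w, [3, 0.7, -0.3])  # 3rd-and-7, down by three: pass-like
p_run = prob_pass(w, [1, 0.2, 0.7])    # 1st-and-2, up by seven: run-like
```

A random forest replaces the single weighted sum with an ensemble of decision trees; the six-model setup then amounts to training one such classifier per quarter-and-score situation.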

Burton and Dickey say that anyone, including NFL coaches and fans rooting for their teams at home, can use the tool to make educated guesses about what will happen each time the ball is snapped.

## From WinZips to Cat GIFs, Jacob Ziv’s Algorithms Have Powered Decades of Compression

### The lossless-compression pioneer received the 2021 IEEE Medal of Honor

Photo: Rami Shlush

Lossless data compression seems a bit like a magic trick. Its cousin, lossy compression, is easier to comprehend. Lossy algorithms are used to get music into the popular MP3 format and turn a digital image into a standard JPEG file. They do this by selectively removing bits, taking what scientists know about the way we see and hear to determine which bits we'd least miss. But no one can make the case that the resulting file is a perfect replica of the original.

Not so with lossless data compression. Bits do disappear, making the data file dramatically smaller and thus easier to store and transmit. The important difference is that the bits reappear on command. It's as if the bits are rabbits in a magician's act, disappearing and then reappearing from inside a hat at the wave of a wand.

The world of magic had Houdini, who pioneered tricks that are still performed today. And data compression has Jacob Ziv.

In 1977, Ziv, working with Abraham Lempel, published the equivalent of Houdini on Magic: a paper in the IEEE Transactions on Information Theory titled “A Universal Algorithm for Sequential Data Compression." The algorithm described in the paper came to be called LZ77—from the authors' names, in alphabetical order, and the year. LZ77 wasn't the first lossless compression algorithm, but it was the first that could work its magic in a single step.

### Jacob Ziv

Photo: Rami Shlush

Current job: Technion Distinguished Professor Emeritus, Faculty of Electrical Engineering

Date of birth: 27 November 1931

Birthplace: Tiberias, British-ruled Palestine (now Israel)

Height: 172 centimeters

Family: Married to Shoshana, four children, nine grandchildren

Education: BSc, Dip-Eng, MSc, all in electrical engineering from Technion, in 1954, 1955, and 1957; Sc.D., MIT, 1962

Favorite books: Detective stories, particularly those featuring Perry Mason

Favorite kind of music: classical, particularly Bach; jazz

Favorite food: Falafel, ice cream

How he starts the day: A cup of espresso and a piece of dark chocolate

Favorite movie: Casablanca (1942)

Organizational memberships: Israel Academy of Science and Humanities, U.S. National Academy of Engineering, U.S. National Academy of Sciences, American Philosophical Society, IEEE Fellow

Major awards: IEEE Medal of Honor “for fundamental contributions to information theory and data compression technology, and for distinguished research leadership"; BBVA Foundation Frontiers of Knowledge Award; Claude E. Shannon Award of the IEEE Information Theory Society

The following year, the two researchers issued a refinement, LZ78. That algorithm became the basis for the Unix compress program used in the early '80s; WinZip and Gzip, born in the early '90s; and the GIF and TIFF image formats. Without these algorithms, we'd likely be mailing large data files on discs instead of sending them across the Internet with a click, buying our music on CDs instead of streaming it, and looking at Facebook feeds that don't have bouncing animated images.

Ziv went on to partner with other researchers on other innovations in compression. It is his full body of work, spanning more than half a century, that earned him the 2021 IEEE Medal of Honor “for fundamental contributions to information theory and data compression technology, and for distinguished research leadership."

Ziv was born in 1931 to Russian immigrants in Tiberias, a city then in British-ruled Palestine and now part of Israel. Electricity and gadgets—and little else—fascinated him as a child. While practicing violin, for example, he came up with a scheme to turn his music stand into a lamp. He also tried to build a Marconi transmitter from metal player-piano parts. When he plugged the contraption in, the entire house went dark. He never did get that transmitter to work.

When the Arab-Israeli War began in 1948, Ziv was in high school. Drafted into the Israel Defense Forces, he served briefly on the front lines until a group of mothers held organized protests, demanding that the youngest soldiers be sent elsewhere. Ziv's reassignment took him to the Israeli Air Force, where he trained as a radar technician. When the war ended, he entered Technion—Israel Institute of Technology to study electrical engineering.

After completing his master's degree in 1955, Ziv returned to the defense world, this time joining Israel's National Defense Research Laboratory (now Rafael Advanced Defense Systems) to develop electronic components for use in missiles and other military systems. The trouble was, Ziv recalls, that none of the engineers in the group, including himself, had more than a basic understanding of electronics. Their electrical engineering education had focused more on power systems.

“We had about six people, and we had to teach ourselves," he says. “We would pick a book and then study together, like religious Jews studying the Hebrew Bible. It wasn't enough."

The group's goal was to build a telemetry system using transistors instead of vacuum tubes. They needed not only knowledge, but parts. Ziv contacted Bell Telephone Laboratories and requested a free sample of its transistor; the company sent 100.

“That covered our needs for a few months," he says. “I give myself credit for being the first one in Israel to do something serious with the transistor."

In 1959, Ziv was selected as one of a handful of researchers from Israel's defense lab to study abroad. That program, he says, transformed the evolution of science in Israel. Its organizers didn't steer the selected young engineers and scientists into particular fields. Instead, they let them pursue any type of graduate studies in any Western nation.

Ziv planned to continue working in communications, but he was no longer interested in just the hardware. He had recently read Information Theory (Prentice-Hall, 1953), one of the earliest books on the subject, by Stanford Goldman, and he decided to make information theory his focus. And where else would one study information theory but MIT, where Claude Shannon, the field's pioneer, had started out?

Ziv arrived in Cambridge, Mass., in 1960. His Ph.D. research involved a method of determining how to encode and decode messages sent through a noisy channel, minimizing the probability of error while at the same time keeping the decoding simple.

“Information theory is beautiful," he says. “It tells you what is the best that you can ever achieve, and [it] tells you how to approximate the outcome. So if you invest the computational effort, you can know you are approaching the best outcome possible."

Ziv contrasts that certainty with the uncertainty of a deep-learning algorithm. It may be clear that the algorithm is working, but nobody really knows whether it is the best result possible.

While at MIT, Ziv held a part-time job at U.S. defense contractor Melpar, where he worked on error-correcting software. He found this work less beautiful. “In order to run a computer program at the time, you had to use punch cards," he recalls. “And I hated them. That is why I didn't go into real computer science."

Back at the Defense Research Laboratory after two years in the United States, Ziv took charge of the Communications Department. Then in 1970, with several other coworkers, he joined the faculty of Technion.

Jacob Ziv (with glasses), who became chair of Technion's electrical engineering department in the 1970s, worked earlier on information theory with Moshe Zakai. The two collaborated on a paper describing what became known as the Ziv-Zakai bound. Photo: Jacob Ziv/Technion

There he met Abraham Lempel. The two discussed trying to improve lossless data compression.

The state of the art in lossless data compression at the time was Huffman coding. This approach starts by finding sequences of bits in a data file and then sorting them by the frequency with which they appear. Then the encoder builds a dictionary in which the most common sequences are represented by the smallest number of bits. This is the same idea behind Morse code: The most frequent letter in the English language, e, is represented by a single dot, while rarer letters have more complex combinations of dots and dashes.

Huffman coding, while still used today in the MPEG-2 compression format and a lossless form of JPEG, has its drawbacks. It requires two passes through a data file: one to calculate the statistical features of the file, and the second to encode the data. And storing the dictionary along with the encoded data adds to the size of the compressed file.
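The dictionary-building pass is easy to see in code. Below is a minimal, generic Huffman builder (an illustration, not any particular codec's implementation): one pass counts symbol frequencies, a heap repeatedly merges the two rarest subtrees, and a walk of the finished tree assigns the shortest bit strings to the most common symbols.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """First pass: count symbol frequencies; then build the code tree.
    A real encoder would make a second pass to emit the coded bits."""
    freq = Counter(text)
    # Heap entries: (frequency, tiebreaker, node); leaves are 1-char strings.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two rarest subtrees...
        f2, _, right = heapq.heappop(heap)
        tiebreak += 1
        heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))  # ...merged
    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):
            codes[node] = prefix or "0"     # lone-symbol edge case
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

codes = huffman_code("beekeeper")
# 'e' dominates the text, so it receives the shortest codeword.
```

The `codes` table is the dictionary that must travel alongside the encoded data, which is exactly the overhead mentioned above.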

Ziv and Lempel wondered if they could develop a lossless data-compression algorithm that would work on any kind of data, did not require preprocessing, and would achieve the best compression for that data, a target defined by something known as the Shannon entropy. It was unclear if their goal was even possible. They decided to find out.
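The Shannon entropy target itself is a short formula. The sketch below computes first-order (per-symbol) entropy; the bound Ziv and Lempel actually pursued is the entropy rate of the source, which also accounts for dependencies between symbols, but the flavor is the same.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data):
    """H = -sum(p_i * log2(p_i)): the fewest bits per symbol any
    lossless code can average, treating symbols as independent."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A fair 0/1 stream carries a full bit per symbol; a constant stream, none.
h_coin = entropy_bits_per_symbol("01" * 100)
h_flat = entropy_bits_per_symbol("0" * 100)
```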

Ziv says he and Lempel were the “perfect match" to tackle this question. “I knew all about information theory and statistics, and Abraham was well equipped in Boolean algebra and computer science."

The two came up with the idea of having the algorithm look for unique sequences of bits at the same time that it's compressing the data, using pointers to refer to previously seen sequences. This approach requires only one pass through the file, so it's faster than Huffman coding.

Ziv explains it this way: “You look at incoming bits to find the longest stretch of bits for which there is a match in the past. Let's say that first incoming bit is a 1. Now, since you have only one bit, you have never seen it in the past, so you have no choice but to transmit it as is."

“But then you get another bit," he continues. “Say that's a 1 as well. So you enter into your dictionary 1-1. Say the next bit is a 0. So in your dictionary you now have 1-1 and also 1-0."

Here's where the pointer comes in. The next time that the stream of bits includes a 1-1 or a 1-0, the software doesn't transmit those bits. Instead it sends a pointer to the location where that sequence first appeared, along with the length of the matched sequence. The number of bits that you need for that pointer is very small.

“It's basically what they used to do in publishing TV Guide," Ziv says. “They would run a synopsis of each program once. If the program appeared more than once, they didn't republish the synopsis. They just said, go back to page x."

Decoding in this way is even simpler, because the decoder doesn't have to identify unique sequences. Instead it finds the locations of the sequences by following the pointers and then replaces each pointer with a copy of the relevant sequence.
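Ziv's walkthrough condenses into a toy encoder and decoder. This is a naive illustration of the LZ77 idea (find the longest backward match, then emit a pointer plus the next symbol), not a production implementation; real codecs add a bounded search window, fast match-finding structures, and bit-level packing.

```python
def lz77_compress(data):
    """Emit (offset, length, next_symbol) triples: offset and length
    point back at the longest earlier match, as described above."""
    out, i = [], 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(i):                 # naive O(n^2) match search
            k = 0
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    """Replace each pointer with a copy of the sequence it names."""
    buf = []
    for off, length, sym in triples:
        for _ in range(length):
            buf.append(buf[-off])   # works even when matches self-overlap
        buf.append(sym)
    return "".join(buf)

msg = "abracadabra abracadabra"
triples = lz77_compress(msg)
```

The second "abracadabra" collapses into a single pointer, which is why the triple list comes out shorter than the message.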

The algorithm did everything Ziv and Lempel had set out to do—it proved that universally optimum lossless compression without preprocessing was possible.

“At the time they published their work, the fact that the algorithm was crisp and elegant and was easily implementable with low computational complexity was almost beside the point," says Tsachy Weissman, an electrical engineering professor at Stanford University who specializes in information theory. “It was more about the theoretical result."

Eventually, though, researchers recognized the algorithm's practical implications, Weissman says. “The algorithm itself became really useful when our technologies started dealing with larger file sizes beyond 100,000 or even a million characters."

“Their story is a story about the power of fundamental theoretical research," Weissman adds. “You can establish theoretical results about what should be achievable—and decades later humanity benefits from the implementation of algorithms based on those results."

Ziv and Lempel kept working on the technology, trying to get closer to entropy for small data files. That work led to LZ78. Ziv says LZ78 seems similar to LZ77 but is actually very different, because it anticipates the next bit. “Let's say the first bit is a 1, so you enter in the dictionary two codes, 1-1 and 1-0," he explains. “You can imagine these two sequences as the first branches of a tree."

“When the second bit comes," Ziv says, “if it's a 1, you send the pointer to the first code, the 1-1, and if it's 0, you point to the other code, 1-0. And then you extend the dictionary by adding two more possibilities to the selected branch of the tree. As you do that repeatedly, sequences that appear more frequently will grow longer branches."

“It turns out," he says, “that not only was that the optimal [approach], but so simple that it became useful right away."
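The tree Ziv describes is just a growing dictionary of phrases. Below is a compact illustration of the LZ78 idea, in which each output pair names an existing phrase (by index) plus one new symbol, and that pair in turn becomes a new, longer phrase. It is a sketch of the scheme, not the paper's exact formulation.

```python
def lz78_compress(data):
    """Emit (phrase_index, symbol) pairs; index 0 is the empty phrase.
    Each pair grows the phrase tree by one branch."""
    dictionary = {"": 0}
    pairs, phrase = [], ""
    for sym in data:
        if phrase + sym in dictionary:
            phrase += sym                   # keep walking down the tree
        else:
            pairs.append((dictionary[phrase], sym))
            dictionary[phrase + sym] = len(dictionary)
            phrase = ""
    if phrase:  # flush a trailing, already-seen phrase
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs

def lz78_decompress(pairs):
    """Rebuild the same phrase tree on the fly, no dictionary shipped."""
    phrases, out = [""], []
    for idx, sym in pairs:
        phrase = phrases[idx] + sym
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)

msg = "how much wood would a woodchuck chuck"
pairs = lz78_compress(msg)
```

Because the decoder grows an identical dictionary as it reads, nothing but the pairs themselves needs to be transmitted.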

Jacob Ziv (left) and Abraham Lempel published algorithms for lossless data compression in 1977 and 1978, both in the IEEE Transactions on Information Theory. The methods became known as LZ77 and LZ78 and are still in use today. Photo: Jacob Ziv/Technion

While Ziv and Lempel were working on LZ78, they were both on sabbatical from Technion and working at U.S. companies. They knew their development would be commercially useful, and they wanted to patent it.

“I was at Bell Labs," Ziv recalls, “and so I thought the patent should belong to them. But they said that it's not possible to get a patent unless it's a piece of hardware, and they were not interested in trying." (The U.S. Supreme Court didn't open the door to direct patent protection for software until the 1980s.)

However, Lempel's employer, Sperry Rand Corp., was willing to try. It got around the restriction on software patents by building hardware that implemented the algorithm and patenting that device. Sperry Rand followed that first patent with a version adapted by researcher Terry Welch, called the LZW algorithm. It was the LZW variant that spread most widely.

Ziv regrets not being able to patent LZ78 directly, but, he says, “We enjoyed the fact that [LZW] was very popular. It made us famous, and we also enjoyed the research it led us to."

One concept that followed came to be called Lempel-Ziv complexity, a measure of the number of unique substrings contained in a sequence of bits. The fewer unique substrings, the more a sequence can be compressed.

This measure later came to be used to check the security of encryption codes; if a code is truly random, it cannot be compressed. Lempel-Ziv complexity has also been used to analyze electroencephalograms—recordings of electrical activity in the brain—to determine the depth of anesthesia, to diagnose depression, and for other purposes. Researchers have even applied it to analyze pop lyrics, to determine trends in repetitiveness.
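A rough version of the idea fits in a few lines. The function below parses a string left to right, cutting a new phrase whenever the current substring has not been produced before; it is a simplified incremental parse in the spirit of the measure, not the exact 1976 definition (which allows matches anywhere in the history seen so far).

```python
def lz_phrase_count(s):
    """Count phrases in an incremental parse of s: the more repetitive
    the string, the longer the phrases and the smaller the count."""
    seen, current, count = set(), "", 0
    for sym in s:
        current += sym
        if current not in seen:   # first time this phrase has been cut
            seen.add(current)
            count += 1
            current = ""
    return count + (1 if current else 0)

repetitive = lz_phrase_count("ab" * 32)   # few, ever-longer phrases
varied = lz_phrase_count("the quick brown fox jumps over a lazy dog")
```

A stream that refuses to compress, meaning its phrase count stays high, is one signature of good randomness, which is the connection to checking encryption codes mentioned above.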

Over his career, Ziv published some 100 peer-reviewed papers. While the 1977 and 1978 papers are the most famous, information theorists who came after Ziv have their own favorites.

For Shlomo Shamai, a distinguished professor at Technion, it's the 1976 paper that introduced the Wyner-Ziv algorithm, a way of characterizing the limits of using supplementary information available to the decoder but not the encoder. That problem emerges, for example, in video applications that take advantage of the fact that the decoder has already deciphered the previous frame and thus it can be used as side information for encoding the next one.

For Vincent Poor, a professor of electrical engineering at Princeton University, it's the 1969 paper describing the Ziv-Zakai bound, a way of knowing whether or not a signal processor is getting the most accurate information possible from a given signal.

Ziv also inspired a number of leading data-compression experts through the classes he taught at Technion until 1985. Weissman, a former student, says Ziv “is deeply passionate about the mathematical beauty of compression as a way to quantify information. Taking a course from him in 1999 had a big part in setting me on the path of my own research."

He wasn't the only one so inspired. “I took a class on information theory from Ziv in 1979, at the beginning of my master's studies," says Shamai. “More than 40 years have passed, and I still remember the course. It made me eager to look at these problems, to do research, and to pursue a Ph.D."

In recent years, glaucoma has taken away most of Ziv's vision. He says that a paper published in IEEE Transactions on Information Theory this January is his last. He is 89.

“I started the paper two and a half years ago, when I still had enough vision to use a computer," he says. “At the end, Yuval Cassuto, a younger faculty member at Technion, finished the project." The paper discusses situations in which large information files need to be transmitted quickly to remote databases.

As Ziv explains it, such a need may arise when a doctor wants to compare a patient's DNA sample to past samples from the same patient, to determine if there has been a mutation, or to a library of DNA, to determine if the patient has a genetic disease. Or a researcher studying a new virus may want to compare its DNA sequence to a DNA database of known viruses.

“The problem is that the amount of information in a DNA sample is huge," Ziv says, “too much to be sent by a network today in a matter of hours or even, sometimes, in days. If you are, say, trying to identify viruses that are changing very quickly in time, that may be too long."

The approach he and Cassuto describe involves using known sequences that appear commonly in the database to help compress the new data, without first checking for a specific match between the new data and the known sequences.

“I really hope that this research might be used in the future," Ziv says. If his track record is any indication, Cassuto-Ziv—or perhaps CZ21—will add to his legacy.

This article appears in the May 2021 print issue as “Conjurer of Compression."
