Training Engineers With Baseball
A new college textbook teaches introductory statistics through America’s pastime
Steven Cherry: Hi, this is Steven Cherry for IEEE Spectrum’s “Techwise Conversations.”
The numbers 714, 755, and 762 are instantly recognizable to many Americans as the lifetime home run totals of Babe Ruth, Hank Aaron, and Barry Bonds. “Point 406,” or just 406, evokes the name “Ted Williams,” the last player to average more than four hits in every 10 at-bats over a full season. Even a rather ordinary number like 56 has a baseball tinge to it—for Joe DiMaggio’s 1941 hitting streak, a 71-year-old record that no one has ever come close to breaking.
Numbers are the lifeblood of Major League Baseball, in a way that just doesn’t have a correlate in other sports. In fact, they so infuse the game that a few years ago, the data mining of them began to alter the way general managers assemble rosters and managers make decisions on the field—a sea change memorialized in the best-selling book and hit movie Moneyball.
Back when I was growing up, there was a popular book for grade-school kids called How and Why: The Science of Sport that used examples from various sports to teach basic concepts of science: potential energy, angular momentum, speed versus acceleration, weight versus mass. It seems inevitable, in hindsight, that someone would develop an entire college statistics course around the numerology of baseball. And college courses need college textbooks.
Stanley Rothman is a professor of mathematics at Quinnipiac University, in Hamden, Connecticut. His textbook Sandlot Stats: Learning Statistics With Baseball was published this month by the Johns Hopkins University Press. He’s my guest today by phone.
Stanley, welcome to the podcast.
Stanley Rothman: Well I—thank you very much, and I’m glad to be talking to you.
Steven Cherry: The book is new, but you’ve taught the course eight times. What’s different about the class from a regular introductory-level stat course? Is it more than just different examples and homework problems?
Stanley Rothman: You’ve pretty much hit the nail on the head. I’ve taught at Quinnipiac—I’ve been here 43 years, and I’ve taught biostatistics, I’ve taught econometrics. But in the end, when you deal with quantitative data—that’s numerical data—you’ve got numbers. Now, the techniques we use in statistics do not know the meaning of the numbers, so I can give you numbers—24, 32, 16, 18—they could be the number of home runs a batter hits, they could be the number of patients admitted to emergency rooms, they can be the number of unemployed people in a small city. And they’re just numbers. Now, when you get to techniques, things like descriptive statistics, which involves COPS—which means collecting, organizing, presenting, and summarizing data—[it] is basically the same no matter what area you’re working with. And then when you get to inferential statistics and you do the two major techniques, which would be confidence intervals and hypothesis testing, again it’s just the variables [that] are different. One of the reasons I wrote the book is when I decided to teach this, I could not find a book that taught statistics and used statistics. There are many reference manuals that will use the techniques of statistics but do not teach it. They’re not textbooks. They’re meant to use it, and so I was forced to write this because there was no textbook available. This is a college-level statistics course, and the students have to do a lot of work in Excel, PowerPoint; they must give presentations; plus I have them read Moneyball; plus they write for my blog. It’s a true liberal arts course for nonmajors, basically.
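As a rough illustration of the point that the techniques do not know what the numbers mean, here is a minimal Python sketch that summarizes the four numbers quoted above. The `summarize` helper and the three labels are illustrative assumptions, not material from the book.

```python
import statistics

# The four numbers quoted in the interview; they carry no built-in meaning.
data = [24, 32, 16, 18]


def summarize(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sample_stdev": statistics.stdev(values),  # divides by n - 1
        "min": min(values),
        "max": max(values),
    }


summary = summarize(data)

# The same summary applies whatever label is attached to the data.
for label in ("home runs", "emergency room admissions", "unemployed people"):
    print(label, summary)
```

The summary is computed once and printed under each label, which is the whole point: only the interpretation changes, not the arithmetic.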
Steven Cherry: This is mainly a book about statistics, but in a college course, frequently statistics and probability are taught together. How do you work in any probability?
Stanley Rothman: Oh yes! I have a whole chapter—I have two chapters, one chapter on probability, and then I do a unique chapter on sports betting, where I actually go [into] how Las Vegas and [the] book[s] handle sports betting. And then I go into some tips on how you may be able to advance your odds of winning. And then I bring in the rules of probability, such things as independence, and I give them some tips on—I don’t want to make gamblers out of them—but some knowledge that would help them, even though sports betting is really, can be really frustrating, but there are ways that you can advance your chances of winning. So there are actually two chapters, one on probability and one on sports betting.
Steven Cherry: I have a niece who works for Major League Baseball and would have loved this course in college, but the sport has something of a gender gap. Do you worry that women will feel excluded?
Stanley Rothman: Well, you know something? Today’s different. When I was a student, I wouldn’t expect any females to be in my class. But what I run now is I basically keep the class at between 24 and 26, and I usually get about a 3-to-1 ratio, so roughly 18 men versus 6 women. Because today I have women who are on the softball team, and then there are women who are very much into baseball; they do very well in the course. If this was given 30 or 40 years ago, it would be 26 to 0.
Steven Cherry: So, who’s taking this course? Is it social science students or natural science students or math students?
Stanley Rothman: Okay, that’s a very good question, and at Quinnipiac I would break it down to this, that many of my students, maybe 60 to 70 percent, are in our school of communications. And we offer a sports minor, and I have students in journalism and students who want to go on to work for MLB or ESPN, and they take this course to get the statistics background.
Steven Cherry: You take up some classic questions, the kind a baseball fan can spend an entire 2-hour rain delay debating—for example, which player was the best hitter of all time. How does a question like that get treated in the course and in the book?
Stanley Rothman: Okay. I divide the book into 18 chapters. The first 15 teach what would be taught in any other statistics course, which is descriptive and inferential statistics. The last 3 chapters I call my research chapters, and a lot of it has to do with my research, my own individual research. And one chapter is trying to devise a probability form[ula], for example, for a player duplicating Joe DiMaggio’s 56-game hitting streak. The next chapter asks, Will we ever have another .400 hitter? And you mentioned in your introduction, of course, Ted Williams [in] 1941 hit .406. And my final chapter deals with trying to come up with the 10 best hitters of all time. And all 3 of these chapters use the results of the first 15 chapters, and so the student sees that by the first 15 chapters they can now do research in the area called sabermetrics, which is what we call the area that studies baseball through statistics and mathematics.
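The interview does not give the formula the book develops for the hitting streak. As a hedged sketch of how such a probability could be set up, the Python below assumes a fixed batting average, a fixed number of at-bats per game, and independence between at-bats and between games. The function names and the example inputs are hypothetical.

```python
def game_hit_prob(batting_avg, at_bats):
    """P(at least one hit in a game), assuming independent at-bats."""
    return 1.0 - (1.0 - batting_avg) ** at_bats


def streak_prob(p_game, streak, games):
    """P(at least one run of `streak` consecutive hit-games in `games` games).

    state[j] holds the probability that the current run is j games long
    and no run of the target length has been completed yet.
    """
    state = [0.0] * streak
    state[0] = 1.0
    completed = 0.0
    for _ in range(games):
        new_state = [0.0] * streak
        for run_length, prob in enumerate(state):
            new_state[0] += prob * (1.0 - p_game)      # hitless game resets the run
            if run_length + 1 == streak:
                completed += prob * p_game             # run reaches the target
            else:
                new_state[run_length + 1] += prob * p_game
        state = new_state
    return completed


# Hypothetical inputs: a .350 hitter with 4 at-bats per game over 154 games.
p = game_hit_prob(0.350, 4)
print(streak_prob(p, 56, 154))
```

Real at-bat counts vary from game to game and hitters are not equally likely to hit against every pitcher, so this simplified model is only a starting point.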
Steven Cherry: Yeah, that’s that data mining that I spoke of earlier. So doesn’t this—just to talk about the research for a second—doesn’t this compare apples and oranges? I mean, leaving aside steroids, there’s the dead-ball era, night baseball, the lowering of the pitching mound, expansion teams, interleague play, and World War II carving out what might have been the best three years of Ted Williams’s career.
Stanley Rothman: Well, in fact, we talk about adjustments. Students learn that if you try to compare 1911, when Ty Cobb led the league in home runs with approximately 9, and Barry Bonds, with 73—how do you compare home-run hitters in different times? Well, they learn how to adjust to the year by taking the results of the league for that year, so I teach them to adjust for both the league and the park that…the home-field park. I give very simple adjustments. So this is a good rule of thumb, that when you go to compare, if you have apples to oranges, you must make adjustments. And so that’s one of the things they learn in the course, is that you just can’t just use statistics; you do have to adjust to certain eras and certain time periods.
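The specific adjustments taught in the course are not spelled out in the interview. One common way to express the idea, shown here only as an assumption, is to index a player's rate against the league average for that year and scale by a home-park factor. The function name, the park-factor convention, and every number below are made up for illustration.

```python
def era_adjusted(player_rate, league_rate, park_factor=1.0):
    """Index a player's rate against the league average for the same season.

    100 means league average. A park_factor above 1.0 marks a home park
    that inflates the statistic, so the player's index is scaled down.
    """
    if league_rate <= 0 or park_factor <= 0:
        raise ValueError("league_rate and park_factor must be positive")
    return 100.0 * player_rate / (league_rate * park_factor)


# Made-up home-run rates (home runs per at-bat) for two hypothetical players.
low_scoring_era = era_adjusted(player_rate=0.015, league_rate=0.004)
high_scoring_era = era_adjusted(player_rate=0.150, league_rate=0.030, park_factor=0.95)

print(round(low_scoring_era), round(high_scoring_era))
```

On the raw rates the second player looks ten times better, but the indexed values put the two on a common scale relative to their own leagues, which is the apples-to-apples comparison described above.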
Steven Cherry: So we might as well not leave our listeners in total suspense. Who was the best hitter of all time?
Stanley Rothman: Well, I have my own theorem in baseball. Of course, there are many sabermetric people who have attacked the problem. I attack it with basically 11 statistics, which are developed throughout. And I break up the…into six eras. I really start at 1910. The reason for that is that data wasn’t really kept well [before then]. You had different amounts of base-on-balls; strikeouts weren’t kept. So starting from 1910, I look at about six eras of baseball, and I pick out for each era what I consider the qualifiers as the top hitters: people like Ty Cobb, Babe Ruth, Lou Gehrig, and go on and on, Napoleon Lajoie, and the older players, and of course the newer players as we go decade by decade. And what we do then is we get a list of semifinalists. Then we put them through these 12 statistics, and the answer is the following: My feeling has been, if you develop a system—and there are many systems to do this, mine is just one system—if you don’t come out with Williams and Ruth in some way one and two, there’s a flaw in your system. Now other people may argue that. Fortunately, without fixing it, sure enough, Babe Ruth turned out to be number one, and Ted Williams turned out to be number two. And that’s after all adjustments to the era that he played in and so forth. And then people like Lou Gehrig, Ty Cobb, and the great Rogers Hornsby, which I may mention, if I may. A lot of people do not realize what [was] the great achievement of Rogers Hornsby in the 1920s. For a period of [five] years, 1921 to 1925, if you look at his cumulative statistics—cumulative—he batted over .400.
Steven Cherry: I wanted to ask you about writing the book. It’s over 600 pages long—did you bite off more than you initially planned to chew?
Stanley Rothman: Well, actually the book was longer. My editor—my idea was to be more than a statistics book, include a lot of the history of baseball. So, in the original draft of the book, I had at the end of each chapter things like the Black Sox scandal of 1919, the Women’s Professional League, the Negro Leagues, how did the Hall of Fame get to Cooperstown? But if I had included all that, it would be 850 pages! So we reached a compromise. I created a website, which is free, which anybody can go to: www.sandlotstats.com, which is the name of the book. And I pulled out all the stuff that was in the book, so that at the end of each [chapter], they can now go to my website, coordinate with it, and get the actual history, the trivia, history of the game of baseball. So along with each original chapter, there’s a PDF file which was originally in the book, but now is on the Web page, and it’s free to anybody to access. I also have a running blog, which my students contribute to, which is running, and I think they’ll find the Web page very interesting.
Steven Cherry: I’ve been to the website, and it is pretty terrific. And we’ll be linking to it on our page about the podcast.
Stanley Rothman: Great.
Steven Cherry: You know, I asked about whether women might feel excluded. I’m wondering: Baseball is a mainly U.S. thing, and I’m wondering if you think there will be many book orders outside of the U.S.?
Stanley Rothman: The answer is yes! Because right now, it’s really international. I mean, the players now are coming from China; they’re coming from Japan, Mexico, of course the Latin American players; and so it’s becoming more and more of an international sport. And as we get more players from foreign countries, the natural attraction of the country through the World Wide Web and satellite—people like Ichiro, I mean, he draws a great audience. I mean, the Yankees picked up Ichiro; they’re picking up a tremendous population that follow him. And the media follows him. If you go to any game, they’re there watching Ichiro and the other Japanese players. So it will sell. And also, even though we don’t translate it yet, [in] so many countries, English is a second language. We have that advantage.
Steven Cherry: I guess you’re a lifelong baseball fan. Do you have any other connection to the sport?
Stanley Rothman: Well, actually, when I was young, I played Little League, Babe Ruth; I played freshman ball in high school; and I actually, basically, love all sports, not just baseball. But of course my two sons, I coached—I coached [a] 13- and 14-year-old traveling team, and I was always coaching. My son played, of course, when I was coaching. And so I’ve coached, and I deal with the young people, and I have a love of baseball. And you haven’t asked me: My idol is the great Mickey—the late Mickey Mantle. And to me he was just fabulous, and he was my idol when I was young.
Steven Cherry: Very good. Well I—I wish you luck with it, it’s a terrific book and a very good website.
Stanley Rothman: Well, thank you for letting me talk about it.
Steven Cherry: We’ve been speaking with Stanley Rothman about the importance of numbers in baseball and life, and his new book, Sandlot Stats: Learning Statistics With Baseball, just out this month from Johns Hopkins University Press.
For IEEE Spectrum’s “Techwise Conversations,” I’m Steven Cherry.
Announcer: “Techwise Conversations” is sponsored by National Instruments.
NOTE: Transcripts are created for the convenience of our readers and listeners and may not perfectly match their associated interviews and narratives. The authoritative record of IEEE Spectrum’s audio programming is the audio version.