Data Science Is Now a Job Market Based Entirely on Merit
A start-up ranks data scientists and creates competitions between them for specific consulting projects
Hi, this is Steven Cherry for IEEE Spectrum’s Techwise Conversations.
When you need to hire a professional these days—a programmer, a doctor, a lawyer—it can be hard to choose. They sort of rank themselves, by their fees; generally the better ones are more expensive, but that’s a pretty inexact rule of thumb. What if you could rank them independent of price?
Then too, they’re not exactly interchangeable—you don’t need just any doctor, you need an oncologist, you don’t need any lawyer, you need a bankruptcy attorney. In fact, you need the best person for your situation, which, darn it, isn’t exactly like anyone else’s. What if you could get them bidding to solve your particular problem—and telling you exactly how they would solve it?
There’s one market where this is actually happening—the market for data scientists, the sort of mathematicians we used to call statisticians, until data became big and sexy, like the way we renamed the Patagonian toothfish Chilean sea bass.
My guest today is Anthony Goldbloom, founder and CEO of Kaggle, which describes itself as “the world's largest community of data scientists. They compete with each other to solve complex data science problems, and the top competitors win interesting projects from some of the world’s biggest companies.”
Goldbloom himself is a data scientist, with a degree in economics and econometrics from the University of Melbourne. Before starting Kaggle, he did macroeconomic modeling for the Reserve Bank of Australia and the Australian Treasury. He joins us by phone.
Steven Cherry: Anthony, welcome to the podcast.
Anthony Goldbloom: Thank you for having me, Steven.
Steven Cherry: Let’s start with the data. The Kaggle website says “many organizations don’t have access to the machine learning and statistical techniques that would allow them to extract maximum value from their data.” And it lists some customers. One of them was an unnamed $10 billion beverage company. What was the problem, and what was the data?
Anthony Goldbloom: So this beverage company sells beverages through different outlets, and one of the biggest ways that they’d lose revenue is if they don’t adequately supply each retailer with enough of the stock that customers are demanding. And so what they wanted was a model that would very, very accurately predict demand on different days based on things like, what’s the weather going to be like on that day, is there going to be a sports event on, all sorts of characteristics. Because the number of out-of-stock situations they had was a little higher than they wanted. So what we built for them was a very accurate demand-forecasting algorithm.
Steven Cherry: So that wasn’t simply a problem or a question to be answered; it was an actual algorithm that they would use going forward.
Anthony Goldbloom: Correct, yes. In having that algorithm developed, you do get some answers as to what are the things that contribute most to fluctuations in demand, for instance. So the deliverable on a project like that is an algorithm to use on a going forward basis, as well as some insight into the factors that matter most for fluctuations in demand.
Steven Cherry: There was also a project to improve the accuracy of airline departure and arrival times. Now this was for GE, which isn’t an airline.
Anthony Goldbloom: No. GE has a big aviation division; this was a partnership between GE and Alaska Airlines. A little-known fact is that a lot of GE’s business is now in services, not just selling hardware. In this case, if you can very accurately predict when a flight is going to arrive, based on where the plane is at a given point in time, what the weather is, and how much traffic there is between that plane and the destination, the utility is that a lot happens when a plane hits the runway. You’ve got the folks who tow the plane to the gate, the folks who clean the plane, the people who stock it with food and fuel for the next flight. So if you can very accurately predict when a flight is going to arrive, an airline can better optimize the ground crew, to make sure they’re not waiting around idle if a flight is going to take quite a lot longer than expected. They can be rerouted to another flight, or another plane.
Steven Cherry: And so here again the deliverable was an algorithm.
Anthony Goldbloom: Correct.
Steven Cherry: I’d like to do one more, but I have trouble choosing; there are so many interesting ones. There’s Microsoft, and apparently the data involved gesture recognition on the Xbox, I guess the Kinect; and the Ford Motor Company, I guess, was studying driver drowsiness. Why don’t you pick another one and just tell us about it.
Anthony Goldbloom: One I really like was an algorithm that we built for the Hewlett Foundation. Education is one of the areas the Hewlett Foundation invests a lot in, and they wanted an algorithm that can mathematically score essays written by high school students. So what they did was they collected 22,000 essays from around America, each of which had been graded by two teachers, and participants had to build an algorithm that matched the grades of the two teachers. Now, this result was absolutely breathtaking. We were skeptical that the state of the art in natural language processing and machine learning could do a good job on this problem. It turned out that teachers are incredibly inconsistent in the grades that they give, so two teachers will give very different grades for the same essay, and the best algorithms did well enough that the discrepancy between either teacher and the best algorithm was about the same as the discrepancy between the two teachers. So the algorithm was just about as reliable as the teachers, with the one caveat being that you can game an algorithm; it’s more difficult to game a teacher. You know, you stuff an essay with big words, and an algorithm might give you a higher grade. Although the caveat to that caveat is that any student who can game an essay-scoring algorithm probably does deserve good grades.
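The agreement comparison Goldbloom describes can be sketched numerically. This is a hypothetical Python illustration, not code or data from the actual competition: the grades, the three raters, and the use of mean absolute discrepancy as the agreement measure are all assumptions made for the example. The idea is that if the algorithm’s discrepancy from each teacher is no larger than the teachers’ discrepancy from each other, the algorithm grades about as reliably as a human.

```python
# Hypothetical illustration of comparing rater agreement.
# Grades are made up; mean absolute discrepancy is one simple
# choice of agreement measure, not necessarily the one used.

def mean_abs_discrepancy(grades_a, grades_b):
    """Average absolute difference between two raters' grades."""
    return sum(abs(a - b) for a, b in zip(grades_a, grades_b)) / len(grades_a)

teacher1 = [3, 4, 2, 5, 3]   # invented grades for five essays
teacher2 = [4, 4, 3, 5, 2]
algorithm = [3, 4, 3, 5, 3]

print(mean_abs_discrepancy(teacher1, teacher2))   # 0.6: teacher vs. teacher
print(mean_abs_discrepancy(teacher1, algorithm))  # 0.2: algorithm vs. teacher 1
print(mean_abs_discrepancy(teacher2, algorithm))  # 0.4: algorithm vs. teacher 2
```

In this toy data the algorithm disagrees with each teacher no more than the teachers disagree with each other, which is the sense in which it is “as reliable as the teachers.”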
Steven Cherry: So tell us how the competitions work, first.
Anthony Goldbloom: So with a competition, what we’ll do is take a data set from a company or researcher and put it up on our website. We now have 93,000 data scientists, or statisticians, as you called them earlier, who compete on the site. What they’ll do is download the data set and build an algorithm locally on their home computer. When they’re happy with the algorithm they’ve built, they’ll upload the output of that algorithm, which we score in real time against historical data. So, for instance, in the GE aviation example, which you mentioned before, predicting flight arrival times, we were using historical data to validate the models: How well would a particular model have done at predicting flight arrival times in November of 2012? We then grade all the algorithms based on how they would have done in November of 2012, and we test them on a second holdout data set, just to make sure that it wasn’t a fluke that they performed well on the first holdout data set. Then prize money is awarded to the person who comes up with the most accurate algorithm, and in exchange for the prize money, they hand over the IP to the company.
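The two-holdout scoring protocol Goldbloom describes can be sketched as follows. This is a hypothetical Python illustration, not Kaggle’s actual code: the accuracy metric, the toy binary outcomes, and the split between a “public” holdout (for the live leaderboard) and a “private” holdout (for final standings) are assumptions made for the example.

```python
# Hypothetical sketch of scoring a submission against two holdout
# sets: the first drives the live leaderboard, the second guards
# against a fluke result on the first.

def accuracy(predictions, actuals):
    """Fraction of predictions matching the historical outcomes."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)

def score_submission(predictions, public_actuals, private_actuals):
    """Score one submission against both holdout sets in order."""
    n_public = len(public_actuals)
    public_score = accuracy(predictions[:n_public], public_actuals)
    private_score = accuracy(predictions[n_public:], private_actuals)
    return public_score, private_score

# Toy binary outcomes (e.g., "flight arrived late": yes/no)
public = [1, 0, 1, 1]
private = [0, 0, 1]
submission = [1, 0, 0, 1, 0, 1, 1]

pub, priv = score_submission(submission, public, private)
print(pub, priv)  # 0.75 on the leaderboard holdout, ~0.667 on the final holdout
```

A submission that looks strong on the public holdout but drops sharply on the private one likely overfit the leaderboard, which is exactly what the second holdout is there to catch.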
Steven Cherry: There’s a sort of collaborative aspect to this too, right? People see each other’s algorithms?
Anthony Goldbloom: No, actually they don’t. What they do see is each other’s scores. We give people feedback on a live leaderboard as to how well they’re performing. So I might have gotten, you know, 95 percent of cases correct, and you got 98 percent of cases correct; therefore you’re doing better than me. It’s actually really interesting, the effect that it has, giving people feedback on a live leaderboard. You know, a competition can be going along with people getting around 50, 51, 52 percent accuracy, and then somebody makes a big breakthrough and gets up to 70 percent, and almost immediately others match that performance. We see this effect happening all the time. We call it the Roger Bannister Effect, after the British runner who broke the four-minute mile. In 1954, the world record for the mile was 4:01, and it had stood for 10 years; nobody had been able to break it. Then Roger Bannister broke it, and 46 days later John Landy broke it, and before long everybody was breaking it. There’s just something about the psychology of knowing what’s possible that leads people to match the performance of the front-runner. And that turns out to be one really powerful reason why competitions are so effective: you’re not just working in isolation wondering if you’ve done everything you can; you have a benchmark to go after at all times.
Steven Cherry: That’s pretty interesting, and it leads us straight to the other half of the equation here, which is the data scientists themselves. Your website says that they “crave real-world data to develop and refine their techniques.” Who are these data scientists, and how do you find them, or how do they find Kaggle?
Anthony Goldbloom: So our data scientists come from three main categories. The first is academics: if you’re an applied statistician or an applied machine learner, you want access to real-world data sets so you can get a sense of the kinds of questions companies are interested in solving, and you can benchmark your techniques against others’. The second category is people who already have jobs at companies. A common profile is that you might be the best statistician or data scientist at your company, so you’re not learning quite as much from your colleagues as you would like, but you’re ambitious and curious, and you like puzzles, so Kaggle is a really good way to come on and compete with some of the world’s best, and still improve. And by the way, just on that, we’ve had people who contributed to IBM Watson competing on Kaggle, and we’ve got the people who developed TrueSkill, the Xbox rating system, competing on Kaggle. So when I say the world’s best compete on the site, they really are. The third category is the most interesting: people who are starting to leave their full-time jobs because they’re making a full-time living through the site. This is becoming more and more prominent as we launch the one-on-one matching, or Kaggle Connect, service, and it’s kind of exciting to see this happening. We have a ranking system (all our data scientists are ranked from 1 to 93,000), and based on that, they’re getting work through Kaggle Connect. It’s very exciting to be able to provide our data scientists with a full-time income.
Steven Cherry: So let’s talk about the rankings first. I guess they’re largely based on the competitions that they participate in?
Anthony Goldbloom: Correct. So it’s just like golf: if you continually win golf tournaments, you become the world’s No. 1 ranked golfer. Same with tennis. We have a system that basically takes people’s competition performances and ranks them on that basis.
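A tournament-style ranking of the kind Goldbloom alludes to can be sketched as follows. This is a hypothetical Python illustration, not Kaggle’s actual system: the scoring rule (points inversely proportional to finishing place) and the competitor names are assumptions made for the example.

```python
# Hypothetical sketch of a golf-style ranking: competitors earn
# points from each competition finish, and the community is ranked
# by total accumulated points. The 100/place formula is illustrative.
from collections import defaultdict

def rank_competitors(results):
    """results: list of competitions, each an ordered list of
    competitor names from first place to last place."""
    points = defaultdict(float)
    for finishing_order in results:
        for place, name in enumerate(finishing_order, start=1):
            points[name] += 100.0 / place  # 100 for 1st, 50 for 2nd, ...
    # Best total first
    return sorted(points, key=points.get, reverse=True)

competitions = [
    ["ana", "bo", "cy"],   # ana wins
    ["bo", "ana", "cy"],   # bo wins
    ["ana", "cy", "bo"],   # ana wins again
]
print(rank_competitors(competitions))  # ['ana', 'bo', 'cy']
```

Consistent wins dominate a single good result, which is the golf-like property: the No. 1 spot goes to whoever performs well across many tournaments, not one.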
Steven Cherry: Yeah, I guess chess and bridge work kind of the same way too. And apparently one’s Kaggle number is becoming quite a thing for data scientists.
Anthony Goldbloom: Yeah, it’s been really neat. We’ve started to see companies asking data scientists for their Kaggle ranking when they apply for a job. It got a write-up recently: The New York Times was hiring a data scientist, and it put in the job requirements that it wanted to see applicants’ Kaggle rankings. So this is a really exciting development. It’s becoming a recognized credential for data scientists.
Steven Cherry: And the people with the higher ranks get to charge a lot more money.
Anthony Goldbloom: Yeah, on Kaggle Connect, this one-on-one matching service, we started out with just a flat rate, but we’ve only opened it up to the top 0.5 percent: of the 93,000 data scientists on Kaggle, only the top 0.5 percent at the moment are eligible for Kaggle Connect. We will eventually roll it out to the rest of the community, but not quite yet, and differential pricing will probably be a feature when we do roll it out more broadly.
Steven Cherry: An article in The Atlantic recently quotes you as speculating about doing the same thing that you’re doing for data scientists for doctors and lawyers, which were our examples at the beginning. What about engineers as well?
Anthony Goldbloom: So my basic view on this is that data science is unusually well suited to objective ranking, because you can measure a data scientist’s predictions against real-world outcomes. In that article, I think I was speaking to that journalist about trial lawyers possibly being in a similar situation, because you can objectively measure somebody’s success in winning or losing trials, and how many they win and lose. I’m not sure. I guess, you know, different fields of engineering are so different that it’s very hard to generalize. I suspect there are probably clever ways to institute a system like this for different branches of engineering, but I don’t know. As for the profession as a whole, I find it hard to imagine that there would be one such system that would cover all engineers.
Steven Cherry: The first kind of market for solving tasks I ever saw was Amazon Mechanical Turk, which was something Amazon set up for simple tasks, such as translating a document or deciphering a store name from a scanned receipt. Is Kaggle an Amazon Mechanical Turk for Ph.D.s?
Anthony Goldbloom: Yeah, it’s an interesting way to think about it. Amazon Mechanical Turk is at one end of the value chain; it handles, as you say, very high-volume, low-value tasks. We’re at the total other end of the spectrum. I mean, there are probably only a handful of people in the world who can build an algorithm that can grade high school essays as well as teachers can. So they’re certainly at two ends of the value chain, or the value spectrum.
Steven Cherry: Taking a kind of broad look at things, computing is so important to everything we do these days, and the classic von Neumann architecture divides things into programs and data. Programming has become something that hundreds of millions of people can do, to varying degrees; it’s something that maybe only hundreds of people did 60 years ago. Do you think data science is going to become something that everybody learns to some degree?
Anthony Goldbloom: Yeah. There’s this wonderful quote from a former head of the Royal Statistical Society, a fellow called Adrian Smith. He said, I think in his inaugural address as president of that society, that it’s a real triumph for statistics that being judged competent in statistics is so important to being judged competent in your particular field of endeavor. I think he was really talking about the academic world. Thirty years ago, or even 15 years ago, if you were a biologist, you had to be able to dissect things or look through a microscope; today you’ve got to be able to correlate gene sequences with phenotypes, or other biomarkers with phenotypes. Thirty years ago, if you were an astronomer, you’d look through telescopes and try to find things; today, telescopes snap images of the universe and pass those through data science algorithms that try to find anomalies, or things of interest. So to be honest with you, I think we’re already starting to see it, particularly in academia. I think it’s also starting to happen in industry. There are some very high-profile case studies where dispassionate analysis by a data scientist has outdone expert judgment. I know you’ve spoken about Nate Silver in past episodes, and what he was able to do predicting election outcomes, using actually relatively simple statistics and data science; he was much more accurate than the experts. Or in pop culture, the movie Moneyball, and the book before it, showing what a data cruncher could do for recruiting in top-flight baseball. More and more, we’re seeing cases where expert judgment has been far less effective than dispassionate data analysis. I think it’s a trend we’re starting to see, and it’s going to continue.
Steven Cherry: Well, Anthony, we call it a job market, but it rarely is one, and I think you’ve created a completely new one, so thanks for that, and I’m sure all the data scientists thank you. Actually, do all the data scientists thank you, or are any of them bothered by the way Kaggle has rationalized the market?
Anthony Goldbloom: I think that overall, data scientists are thrilled to have a place. Certainly we give great opportunities to data scientists who otherwise wouldn’t have a chance to demonstrate their abilities. As you say, the job market is not really a market: people are judged by the degree they got, and from what university, and the brands that they have on their CV. We give data scientists an outlet to actually demonstrate their capabilities, and it has meant that some data scientists who would have a lot of trouble getting discovered, because they don’t have the right brands on their CV, get an opportunity to present themselves to potential employers or potential consulting customers. So I think in that regard we’re seen as a very positive force among data scientists.
Steven Cherry: Very good. Well thanks for creating Kaggle, and thanks for joining us today.
Anthony Goldbloom: Thanks, Steven.
We’ve been speaking with Kaggle CEO Anthony Goldbloom about the explosion of data science, and data scientists.
For IEEE Spectrum’s Techwise Conversations, I’m Steven Cherry.
This interview was recorded Monday, 13 May 2013.
Segment producer: Barbara Finkelstein; audio engineer: Francesco Ferorelli
Read more Techwise Conversations; find us in iTunes; or follow us on Twitter.
NOTE: Transcripts are created for the convenience of our readers and listeners and may not perfectly match their associated interviews and narratives. The authoritative record of IEEE Spectrum's audio programming is the audio version.