What is a data scientist? Job search firm Indeed recently sketched out a picture of a data scientist as a technologist with a degree “in computer science, statistics, or a quantitative social science, along with some training in statistical modeling, machine learning, and programming.” Wikipedia describes data science as “an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms, both structured and unstructured.”
But while a lot of companies want to hire data scientists, there is little agreement about what a data scientist actually is. In a discussion on Hacker News about my recent blog post indicating that demand and salaries for data scientists are both growing rapidly, one commenter wrote: “‘Data scientist’ is a title recently thrown around a lot for positions that used to be called ‘data analyst,’ with no strong [machine learning] or [software engineering] ability required.”
Another stated: “I would have thought [data science involves] a serious, sustained study of statistics—starting with a strong base knowledge of the mathematics of probability and building from there. But based on the resumes I've seen that doesn't seem to be the common opinion.”
This fuzzy definition doesn’t make filling data science jobs—or finding the right job—any easier.
In hopes of growing the supply of data scientists—and clarifying the definition—the Open Group, an IT industry consortium that offers certification programs in IT architecture and risk analysis, this week announced that it has established a certification program for data scientists.
Certification requires knowing a defined set of subjects, including statistics, machine learning, AI, and business communications, and demonstrating that knowledge through projects, explained Martin Fleming, chief analytics officer for IBM. The program will offer three levels of certification: certified data scientist, master certified data scientist, and distinguished certified data scientist. Applicants for certification can be reviewed by their peers through the Open Group or within their own organizations, provided their companies receive Open Group accreditation to do so.
IBM, Fleming said, is the first company to be accredited, but he expects other companies to soon follow.
Offering certification, Fleming said, will “help make our organization an attractive place to come and work, because it will give our workers a credential that is valuable to them. It will also help them improve their skills, which are valuable to the organization, and provide clarity around kinds of skills that are required for the profession.” IBM currently employs about 15,000 tech workers that it defines as data scientists, and expects that number to grow faster than its overall workforce, Fleming indicated.
IBM also announced that it will establish a 24-month data science apprenticeship program, allowing people who are interested in data science but lack experience in the field, including those who do not have four-year college degrees, to build skills and receive Open Group certification. The apprenticeship program is certified by the U.S. Department of Labor. That took a little doing, Fleming pointed out, because it required the Department of Labor, for the first time, to include “data scientist” as a recognized occupation.
“We tend to think of data scientists as superstars,” Fleming said, “but it is a large community—you need folks at junior level and at superstar level.”
A version of this post appears in the March 2019 print magazine as “Defining Data Scientists.”
Tekla S. Perry is a senior editor at IEEE Spectrum. Based in Palo Alto, Calif., she's been covering the people, companies, and technology that make Silicon Valley a special place for more than 40 years. An IEEE member, she holds a bachelor's degree in journalism from Michigan State University.