DotData’s AI Builds Machine Learning Models All by Itself

Startup aims to “eliminate the skill barrier” by using AI to automate much of data science

Illustration of a laptop with a network on top of it, and the logo for the Python programming language coming out of it.
Illustration: Shutterstock

Demand for data scientists and engineers has, for the past couple of years, been off the charts. The number of openings for machine learning and data engineers posted on recruiting websites continues to grow by double digits annually, and those working in the field have been commanding ever-higher salaries.

Joining the ranks of these desperately sought-after techies takes serious coding chops: expertise in Python, certainly, along with familiarity with other languages. That combination of plentiful job openings for data engineers and the dominance of Python means Python regularly makes the charts of most in-demand coding languages.

So anyone contemplating a future in data science or machine learning needs to build up software engineering skills, right?

Wrong, says Ryohei Fujimaki, founder and CEO of dotData. Fujimaki has, for nearly a decade, been working to use AI to automate much of the job of the data scientist.

We can, he says, “eliminate the skill barrier. Traditionally, the job of building a machine learning model can only be done by people who know SQL and Python and statistics. Our system automates the entire process, enabling less experienced people to implement machine learning projects.”

DotData—which is currently offering its tools as a cloud-based service—came out of NEC. Fujimaki, then a research fellow at the company, started thinking about automating machine learning in 2011 as a way to make the 100 or so data scientists on his research team more productive. He got sidetracked for a few years, focused on commercializing an algorithm designed to make machine learning transparent, but in 2015 returned to the machine learning project.

“A typical use case for machine learning in the business world is prediction,” he said, “predicting demand of a product to optimize inventory, or predicting the failure of a sensor in a factory to allow preventive maintenance, or scoring a list of possible customers.”

“The first step in developing a machine learning model for prediction is feature engineering—looking at historical patterns and coming up with hypotheses,” he says. Feature engineering generally requires a team of people with a multitude of skill sets—data scientists, SQL experts, analysts, and domain experts. Typically, only after this team comes up with a set of hypotheses does machine learning step in, combining all those hypotheses to figure out how to best weigh them to come up with accurate predictions.
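The two-step workflow described here can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy customer records, the two hand-written hypotheses, and the use of a small logistic regression as the model that learns the weights. It is not dotData's code or any particular team's method.

```python
import math

# Hypothetical toy data: each customer has (days_ago, amount) purchases
# and a label saying whether they later bought again.
customers = [
    {"purchases": [(3, 40.0), (20, 35.0), (45, 50.0)], "bought_again": 1},
    {"purchases": [(5, 20.0), (9, 25.0)], "bought_again": 1},
    {"purchases": [(80, 15.0)], "bought_again": 0},
    {"purchases": [(60, 10.0), (85, 12.0)], "bought_again": 0},
    {"purchases": [(2, 60.0), (30, 55.0), (33, 45.0)], "bought_again": 1},
    {"purchases": [(70, 30.0)], "bought_again": 0},
]

# Step 1: people write down hypotheses, each expressed as a feature.
def recent_purchase_count(customer):
    """Hypothesis: customers who bought in the last 30 days tend to return."""
    return sum(1 for days_ago, _ in customer["purchases"] if days_ago <= 30)

def average_spend_hundreds(customer):
    """Hypothesis: bigger spenders tend to return (scaled to hundreds)."""
    amounts = [amount for _, amount in customer["purchases"]]
    return sum(amounts) / len(amounts) / 100.0

HYPOTHESES = [recent_purchase_count, average_spend_hundreds]

def featurize(customer):
    return [hypothesis(customer) for hypothesis in HYPOTHESES]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 2: the model learns how much weight each hypothesis deserves
# (logistic regression fitted with plain batch gradient descent).
def fit_weights(rows, labels, learning_rate=0.1, epochs=5000):
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

rows = [featurize(c) for c in customers]
labels = [c["bought_again"] for c in customers]
weights, bias = fit_weights(rows, labels)

def predict(customer):
    z = sum(w * xi for w, xi in zip(weights, featurize(customer))) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

predictions = [predict(c) for c in customers]
print("learned weights:", weights, "bias:", bias)
print("predictions:", predictions)
```

The point of the sketch is the division of labor: the feature functions encode human guesses, and the fitting step only decides how much each guess counts.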

In dotData’s system, AI takes over that first step, coming up with and testing its own hypotheses from a set of historical data.

So, he says, “you don’t need domain experts or data scientists, and as a subproduct AI can explore many more hypotheses than human experts—millions instead of hundreds in a limited time window.”
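One generic way to picture machine-generated hypotheses is a brute-force search: enumerate every combination of an aggregation and a time window, then score each candidate against the historical outcome. The sketch below assumes that approach for illustration only. The article does not describe dotData's actual algorithm, and the toy data, the aggregation list, the window sizes, and the correlation-based score are all hypothetical choices.

```python
from itertools import product

# Hypothetical toy data: (days_ago, amount) purchases plus a known outcome.
customers = [
    {"purchases": [(3, 40.0), (20, 35.0), (45, 50.0)], "bought_again": 1},
    {"purchases": [(5, 20.0), (9, 25.0)], "bought_again": 1},
    {"purchases": [(80, 15.0)], "bought_again": 0},
    {"purchases": [(60, 10.0), (85, 12.0)], "bought_again": 0},
    {"purchases": [(2, 60.0), (30, 55.0), (33, 45.0)], "bought_again": 1},
    {"purchases": [(70, 30.0)], "bought_again": 0},
]
labels = [c["bought_again"] for c in customers]

# Building blocks the search combines (assumed for this sketch).
AGGREGATIONS = {
    "count": len,
    "total": sum,
    "max": lambda values: max(values, default=0.0),
}
WINDOWS_DAYS = [7, 30, 90]

def make_feature(aggregate, window_days):
    """Build one candidate hypothesis: aggregate spend within a time window."""
    def feature(customer):
        amounts = [amount for days_ago, amount in customer["purchases"]
                   if days_ago <= window_days]
        return float(aggregate(amounts))
    return feature

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

# Generate every aggregation-by-window candidate.
candidates = {}
for (agg_name, aggregate), window in product(AGGREGATIONS.items(), WINDOWS_DAYS):
    candidates[f"{agg_name}_last_{window}d"] = make_feature(aggregate, window)

# Test each candidate against history and rank by strength of association.
scores = {
    name: abs(pearson([feature(c) for c in customers], labels))
    for name, feature in candidates.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {scores[name]:.3f}")
```

Three aggregations and three windows yield only nine candidates here, but the grid grows multiplicatively with every added column, aggregation, or window, which is how an automated search can reach far more hypotheses than a team writing them by hand.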

In 2016, Fujimaki’s group at NEC let Japan’s Sumitomo Mitsui Banking Corp. (SMBC) test a prototype against a team using traditional data science tools. “Their team took three months, our process took a day, and our results were better,” he says. NEC spun off the group in early 2018, remaining as a shareholder. Right now dotData has about 70 employees, about 70 percent of them engineers and data scientists, and a few dozen customers, Fujimaki says.

“In the near future,” Fujimaki says, “80 percent of machine learning projects can be fully automated. That will free up the most skilled, computer-science-PhD-type of data scientists, to focus on the other 20 percent.”

Demand for data scientists overall won’t drop from what it is today, Fujimaki predicts, though the double-digit growth may slow. The job, however, will become more focused. “Data scientists today are expected to be superman, good at too many things—statistics, and machine learning, and software engineering.”

And a new role is likely to emerge, he predicts. “Call it the business data scientist, or the citizen data scientist. They aren’t machine learning people, they are more business oriented. They know what predictions they need, and how to use those predictions in their business. It will be useful for them to have basic knowledge of statistics, and to understand data structures, but they won’t need deep mathematical understanding or knowledge of programming languages.

“We can’t eliminate the skill barrier, but we can significantly lower it. And there will be many more potential people who will be able to do this.”


