Open Source Is Throwing AI Policymakers For A Loop

Machine learning isn't just for big companies any more

Depending on whom you ask, artificial intelligence may someday rank with fire and the printing press as technology that shaped human history. The jobs AI does today—carrying out our spoken commands, curing disease, approving loans, recommending who gets a long prison sentence, and so on—are nothing compared to what it might do in the future.

But who is drawing the roadmap? Who's making sure AI technologies are used ethically and for the greater good? Big tech companies? Governments? Academic researchers? Young upstart developers? Governing AI has gotten more and more complicated, in part, because hidden in the AI revolution is a second one. It's the rise of open-source AI software—code that any computer programmer with fairly basic knowledge can freely access, use, share and change without restriction. With more programmers in the mix, the open-source revolution has sped AI development substantially. According to one study, in fact, 50 to 70 percent of academic papers on machine learning rely on open source.

And according to that study, from The Brookings Institution, policymakers have barely noticed.

"Open-source software quietly affects nearly every issue in AI policy, but it is largely absent from discussions around AI policy," writes Alex Engler, a fellow in governance studies at Brookings and the author of the report.

A few major examples: A newly proposed Artificial Intelligence Act by the European Parliament makes no mention of open source. In the United States, the Obama and Trump administrations gave it only passing attention in their AI strategies. (The Biden administration is just getting started.)

"In many of the meetings I've been in, the role of open-source code functionally never comes up," says Engler in an interview. "It deserves more routine consideration as part of the broader issues that we all care about."

At its heart, open-source software should be a good thing. If more developers are involved, the common belief is that they will improve on each other's work. AI development is dominated by a small number of technology giants—Google, Facebook, Amazon, Apple, Baidu, Microsoft, and so forth—but machine-learning libraries such as Google's TensorFlow and Facebook's PyTorch are there for anyone's use.

"There are web development libraries that are competing against each other," says Engler, "and so that often means that the code is much, much, much, much better than any individual person could write."

The problem, Engler says, is that while developers are familiar with these libraries, most non-engineers—including many policymakers whose job it is to protect the public interest—are not. And people are being affected in ways that they may not even recognize.

Engler cites the problem of hiring discrimination by machines. AI bias has been widely documented (recall Google Photos famously labeling black people as "gorillas" in 2015, or OpenAI persistently linking Muslims with violence), but for all the transparency promised by open source, most people may have no idea when they're victims.

"You might send in a resume, and it might go through an AI system that's discriminatory, it might reject you—and you'll never know that happened," says Engler. "If you don't know you were discriminated against, if you don't know you were evaluated by an algorithm, you can't even tell the EEOC [the U.S. Equal Employment Opportunity Commission]. And in fact, the EEOC keeps saying we're not getting complaints."

Remember also that since open source is based on a faith in the wisdom of crowds, any one member of the crowd—any developer—can change a piece of code without appreciating the possible consequences. The ideal of open source is that many contributors will catch each other's mistakes and biases—but they may also introduce new biases to a piece of software.

That's a worry expressed by Melanie Moses, a professor of computer science at the University of New Mexico and the Santa Fe Institute who has done considerable work on the growing role of AI in the criminal justice system. Algorithms have been used to decide whether a suspect in a crime can be trusted not to jump bail, or whether a convicted criminal is a risk for repeat offenses if only sentenced to probation.

"If software is solidifying, let's say, racial bias in sentencing," she says, "and every time it operates it puts more young black men in jail, and then having been in jail before makes them more likely to be put in jail again—that's a dangerous positive feedback."
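The feedback loop Moses describes can be sketched with a toy calculation. Every number below is invented purely for illustration—the starting rates and the amplification factor are assumptions, not estimates from any real sentencing system—but the sketch shows how a small initial disparity in biased data can compound once an algorithm feeds its own outputs back into its inputs:

```python
# Toy model of a sentencing feedback loop. All numbers are invented
# for illustration; this does not model any real system or dataset.

def update(p_jail, amplification=1.3):
    """One 'round': a prior record raises the algorithm's risk score,
    which raises the probability of incarceration in the next round."""
    return p_jail * amplification

def simulate(p_start, rounds):
    """Return the incarceration probability after each round."""
    p = p_start
    history = [p]
    for _ in range(rounds):
        p = update(p)
        history.append(p)
    return history

# Two groups that start only slightly apart in the (biased) data.
group_a = simulate(0.10, 6)
group_b = simulate(0.15, 6)

print(f"after 6 rounds: A={group_a[-1]:.2f}, B={group_b[-1]:.2f}")
# prints: after 6 rounds: A=0.48, B=0.72
```

Under these made-up parameters, a 5-point gap between the groups grows to roughly 24 points in six rounds—the "dangerous positive feedback" in miniature.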

Which brings us back to the policymakers who, in Engler's view, need to pay more attention to the ways in which open source is shaping the future of AI.

"One of the scary parts of open-source AI is how intensely easy it is to use," he says. "The barrier is so low… that almost anyone who has a programming background can figure out how to do it, even if they don't understand, really, what they're doing."

Perhaps, says Moses in New Mexico, AI doesn't simply need heavy-handed lawmakers or regulators; standards organizations could recommend better practices. But there needs to be something in place as the pace of AI development increases. If an open-source algorithm is flawed, it is harder to undo the damage than if the software came from one proprietary—and accountable—company.

"The software is out there, it's been copied, it's in multiple places, and there's no mechanism to stop using something that's known to be biased," she says. "You can't put the genie back in the bottle."
