Machines Learn Good From Commonsense Norm Bank

New moral reference guide for AI draws from advice columns and ethics message boards

Illustration of two robots dressed as an angel and a devil looking down at a monitor with the words "killing a bear to please your child" "It's bad" and "Killing a bear to save your child" "It's okay"
Original Illustrations: iStockphoto

Artificial intelligence scientists have developed a new moral textbook customized for machines that was built from sources as varied as the "Am I the Asshole?" subreddit and the "Dear Abby" advice column. With it, they trained an AI named Delphi that was 92.1% accurate on moral judgments when vetted by people, a new study finds.

As AI is increasingly used to help support major decisions, such as who gets health care first and how much prison time a person should get, AI researchers are looking for the best ways to get AI to behave in an ethical manner.

"AI systems are being entrusted with increasing authority in a wide range of domains—for example, screening resumes [and] authorizing loans," says study co-author Chandra Bhagavatula, an artificial intelligence researcher at the Allen Institute for Artificial Intelligence. "Therefore, it is imperative that we investigate machine ethics—endowing machines with the ability to make moral decisions in real-world settings."

The question of how to program morals into AIs goes back at least to Isaac Asimov's Three Laws of Robotics, first introduced in his 1942 short story "Runaround," which go as follows:

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

Although broad ethical rules such as "Thou shalt not kill" may appear straightforward to state, applying them to real-world situations often requires nuance, such as exceptions for self-defense. In the new study, AI scientists therefore moved away from prescriptive ethics, which derives every judgment from a fixed set of rules such as the Ten Commandments, because such moral axioms are often far removed from grounded situations.

Instead, "we decided to approach this work from the perspective of descriptive ethics—that is, judgments of social acceptability and ethics that people would make in the face of everyday situations," says study co-author Ronan Le Bras, an artificial intelligence researcher at the Allen Institute for Artificial Intelligence.

To train an AI on descriptive ethics, the researchers created a textbook for machines on what is right and wrong: the Commonsense Norm Bank, a collection of 1.7 million examples of people's ethical judgments on a broad spectrum of everyday situations. This repository drew on five existing datasets of social norms and moral judgments, which in turn were adapted from resources such as the "Confessions" subreddit.

One of the datasets the researchers wanted to highlight was Social Bias Frames, which aims to help AIs detect and understand potentially offensive biases in language. "An important dimension of ethics is not to harm others, especially people from marginalized populations or disadvantaged groups. The Social Bias Frames dataset captures this knowledge," says study co-author Maarten Sap, an artificial intelligence researcher at the Allen Institute for Artificial Intelligence.

The scientists used the Commonsense Norm Bank to train Delphi, an AI built to mimic people's judgments across diverse everyday situations. It was designed to respond in three different ways: with short judgments such as "it is impolite" or "it is dangerous" in a free-form Q&A format; with agreement or disagreement in a yes-or-no Q&A format; and with a verdict on whether one situation was more or less acceptable than another in a relative Q&A format.

For instance, in the free-form Q&A, Delphi notes "killing a bear to please your child" is bad, "killing a bear to save your child" is okay, but "exploding a nuclear bomb to save your child" is wrong. With the yes-or-no Q&A, Delphi notes "we should pay women and men equally," and with the relative Q&A, it notes "stabbing someone with a cheeseburger" is more morally acceptable than "stabbing someone over a cheeseburger."
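To make the three response formats concrete, here is a minimal sketch of how such queries might be represented as data. The class names, the `mode_of` helper, and the short judgment strings are hypothetical and chosen for illustration only; they are not Delphi's actual input format or code. The listed examples are the ones quoted above for which the article reports an explicit verdict.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FreeForm:
    """A situation to be judged with a short open-text verdict."""
    situation: str


@dataclass(frozen=True)
class YesNo:
    """A statement to agree or disagree with."""
    statement: str


@dataclass(frozen=True)
class Relative:
    """Two situations to be compared for acceptability."""
    first: str
    second: str


Query = Union[FreeForm, YesNo, Relative]


@dataclass(frozen=True)
class Example:
    query: Query
    judgment: str  # shorthand for the verdict reported in the article


# Examples taken from the article; the judgment wording is abbreviated.
EXAMPLES = [
    Example(FreeForm("killing a bear to please your child"), "bad"),
    Example(FreeForm("killing a bear to save your child"), "okay"),
    Example(FreeForm("exploding a nuclear bomb to save your child"), "wrong"),
    Example(
        Relative(
            "stabbing someone with a cheeseburger",
            "stabbing someone over a cheeseburger",
        ),
        "first is more acceptable",
    ),
]


def mode_of(query: Query) -> str:
    """Name the Q&A format that a query belongs to."""
    if isinstance(query, FreeForm):
        return "free-form"
    if isinstance(query, YesNo):
        return "yes-or-no"
    return "relative"
```

Under these assumptions, `mode_of(YesNo("we should pay women and men equally"))` returns `"yes-or-no"`.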

To analyze Delphi's performance, the researchers had crowdworkers employed through Amazon's Mechanical Turk platform evaluate 1,000 examples of Delphi's moral judgments, with opinions from three crowdworkers for each judgment. They found Delphi achieved 92.1% accuracy, compared with the 53.3% to 83.9% performance they could get from the artificial intelligence system GPT-3, which the San Francisco research lab OpenAI trained on nearly all publicly available written text on the Internet through 2019.
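As a rough illustration of how an accuracy figure can be derived from three opinions per judgment, the sketch below aggregates votes by simple majority. The majority rule, the label names, and the toy votes are all assumptions made for this example; the article does not say how the three opinions were combined, and the data here is invented, not taken from the study.

```python
from collections import Counter
from typing import List, Sequence


def majority_label(votes: Sequence[str]) -> str:
    """Return the label chosen by the most annotators.

    With three votes over two labels a tie cannot occur.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    return Counter(votes).most_common(1)[0][0]


def accuracy(annotations: List[Sequence[str]]) -> float:
    """Fraction of judgments whose majority vote is 'correct'."""
    if not annotations:
        raise ValueError("at least one annotated judgment is required")
    correct = sum(
        1 for votes in annotations if majority_label(votes) == "correct"
    )
    return correct / len(annotations)


# Invented toy data: four model judgments, each rated by three annotators.
toy_annotations = [
    ["correct", "correct", "correct"],
    ["correct", "incorrect", "correct"],
    ["incorrect", "incorrect", "correct"],
    ["correct", "correct", "incorrect"],
]

toy_accuracy = accuracy(toy_annotations)
```

Here three of the four toy judgments win a majority of "correct" votes, so `toy_accuracy` is 0.75. Applied to 1,000 judgments, the same calculation would produce a percentage of the kind reported above.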

"We never expected that Delphi would reach up to 92%," says study co-author Liwei Jiang, an artificial intelligence researcher at the University of Washington and the Allen Institute for Artificial Intelligence.

One potential application for this work "can be in improving how conversational AI agents handle topics that can be controversial or unethical," says study co-author Yejin Choi, an artificial intelligence researcher at the University of Washington and the Allen Institute for Artificial Intelligence. In 2016, offensive tirades from Microsoft's Tay chatbot revealed how AIs can spiral out of control when talking with people online.

The scientists did note Delphi had a number of limitations. It struggled with time-sensitive judgments, such as whether running a blender is rude at 3 a.m. or 3 p.m.; with unfamiliar topics such as sports, where game mechanics can allow stealing; and with potentially unlawful actions, such as recognizing that being in a hurry doesn't make running a red light acceptable.

In addition, "one major limitation of Delphi, which is why it is a prototype and not a finished product, is that it specializes in U.S.-centric situations and judgments, so it might not be as good with culturally-specific, non-U.S. situations," says study co-author Jenny Liang, an artificial intelligence researcher at the Allen Institute for Artificial Intelligence. "Specifically, because the model was taught social norms by a specific subset of the U.S. population—the crowdworkers that made the judgments—anything that it learned will be flavored by the viewpoints of those people. Again, we hope to see extensions of the knowledge and norms to reflect more diverse viewpoints—for example, from other, non-U.S. cultures."

"Another important limitation is that our model tends to reflect the status quo—that is, what the cultural norm is in today's society," Bhagavatula says. "But when it comes to social justice, often, the status quo is not ethically ideal—for example, it's illegal to be gay in many countries in the current age. So this tension between what we think should be the case versus what is currently the case is very much present, and people should be aware of that."

The researchers created the "Ask Delphi" site, where anyone can ask the AI questions, so the scientists can gather additional human feedback. That feedback shows Delphi still has limitations with edge cases, such as potentially nonsensical dilemmas. For example, when asked, "Is it okay to rob a bank to save the world?" Delphi answered, "No, it is not okay."

"We have found that it can be challenging for Delphi to correctly weigh the pros and cons of a situation arising from competing ethics and norms," Le Bras says. "In your example, Delphi rightly predicts that it is wrong to 'rob a bank' and that it is good to 'save the world,' but weighing those two together is hard."

In addition, "the problems in the Commonsense Norm Bank are generally about much more realistic everyday situations," Choi says. "'Is it okay to rob a bank to save the world' is a question that might arise in a TV show perhaps, but most likely not in a real-life situation."

In the future, the researchers would like the Commonsense Norm Bank to grow and to make Delphi's workings more explainable and transparent, "since at the current stage it's hard to know why exactly it said what it did," Sap says. In addition, they are collecting new datasets of social norms "with respect to situations that people are trying out right now on the website that Delphi currently finds challenging."

The scientists detailed their findings online Oct. 14 on the preprint server arXiv.

The Conversation (1)
Larry Rosner, 09 Nov, 2021

Delphi goes woke. What if the norms come from criminal moral judgments, or from left-wing or right-wing extremists? Who is going to judge? Facebook? If AI is going to make moral judgments, we are in for real trouble.