Timnit Gebru was a well-known scholar in the AI-ethics community long before she got fired by Google in December 2020—but that messy and dramatic incident brought a new level of attention to her work. Google apparently exiled Gebru from its AI ethics team (and subsequently fired the other leader of the team) in response to a paper about the dangers of the large language models that have become so important to the world’s biggest technology companies. The episode created a firestorm in the AI field.
But Gebru has made the most of the jarring opportunity. In December 2021, she announced the founding of a new organization, the Distributed AI Research Institute (DAIR), which is billed as “a space for independent, community-rooted AI research free from Big Tech’s pervasive influence.” Since then, Gebru has been staffing up. In February, AI and sociology researcher Alex Hanna joined as research director, departing from her Google job with a blistering resignation letter. Gebru and Hanna spoke with IEEE Spectrum about their plans for DAIR.
Timnit, did you decide to found a new organization because you think that the current model of AI research is broken?
Timnit Gebru: Yes. For instance, I was looking at what our incentives were at Google and what happened to us—we don’t need to rehash that—and what the incentives are in academia. We want to do interdisciplinary research. We don’t want to drive people to the publishing rat race. We want to take very seriously communicating our research results to people, beyond just writing papers. And we want [researchers] to live a livable life! We don’t want them to work 24/7. And that means we plan to put out less work, so each work will take more money. I was thinking about what kind of work I wanted to do and what kind of environment I wanted to create, and it seemed like it was better to start something from scratch and figure out how to sustain that.
Your press release about the founding of DAIR mentioned that AI is often presented as inevitable, and that you want to combat that idea. Are you trying to apply the precautionary principle to AI?
Alex Hanna: I’m not necessarily thinking about it from the perspective of the precautionary principle. I’m thinking of it more from the perspective of developing technology that works for people. A lot of the AI research that happens right now is AI for the value of AI itself. A lot of people are thinking about this body of tools known as AI and saying, “Well, everything looks like a nail, and we have this big hammer.”
We already know that deep learning has problems. These modes of research require organizations that can gather a lot of data, data that is often collected via ethically or legally questionable technologies, like surveilling people in nonconsensual ways. If we want to build technology that has meaningful community input, then we need to really think about what’s best. Maybe AI is not the answer for what some particular community needs.
If AI is not inevitable, if it’s a choice to use it, are there any application areas right now where you think we should definitely choose not to use AI?
Gebru: I wonder about AI for social good. I’m not saying that it shouldn’t happen, but why start with the AI? Why not think about the good you want to do, and then see if AI can be helpful? Sometimes people talk about AI for climate change, but if you really do the analysis of climate change, isn’t a lot of AI being used to make the oil and gas industries more efficient? I’m not saying AI for social good shouldn’t exist. But I think that’s an example of what Alex was saying, where AI is the hammer. And of course, the technology should not be used for risk assessment, criminalizing people, predictive policing, and remotely killing people and making it easier to enter warfare.
Hanna: Even if you exclude areas like war-making, policing, and incarceration, we should think about areas in which AI is used for things that are necessary, like social welfare and education. [In schools,] there have been all these surveillance systems to “make teachers’ jobs easier” by monitoring students online. We know that student surveillance systems are applied unequally across school systems. If you have a predominantly white private school, they are not going to be surveilling the same way that predominantly black and brown public schools are. There was an article that was recently published in Slate that showed these school surveillance systems will flag LGBT keywords, which could unintentionally be outing students. If you’re in a place like Texas or Florida, that can be potential for reporting the student to child welfare services. And those get filed as child abuse, according to the antitrans executive order that the governor of Texas signed. The promise of those tools is that you’re going to be able to do more with less. AI is being brought in to reduce inefficiencies, but maybe these schools really need many more teachers.
What do you see as your mission at DAIR? Will you be calling attention to the current problems in AI, or doing algorithmic audits, or building new types of AI?
Gebru: We have projects that are basically audits, but I’m wary of being a third-party auditor that people can point to as a green light: “Well, they said it’s okay.” But we’re basically doing all of those things. For instance, we have a project on using satellite imagery and computer vision to analyze the effects of spatial apartheid [in South Africa]—so that’s using AI for something that we think will help.
Right now, we’re also focused on the process by which we do this research. What are some of the principles we should be following? How do we not exploit people? How do we make sure that when we extract knowledge from people, we appropriately compensate them? There are lots of people in communities who are not writing papers, but they have other forms of knowledge that are very important for our projects. How do we collaborate with those people in a way that’s respectful and values what they bring to the table?
Hanna: Also, what would it mean to use AI to hold power to account? We’re having lots of discussions with NGOs that are focused on accountability and human rights.
Maybe you can tell me more about the satellite imagery project to make this more concrete. What are the goals of that project and what have you figured out so far?
Gebru: This project is about analyzing spatial satellite imagery. One of our research fellows, Raesetje Sefala, is based in South Africa and grew up in a township. She does work in computer vision, but her knowledge of that history is just as important as her work in computer vision. Spatial apartheid is legally over. But when you look at these images, you see that the townships are in one place and the white mansions are in a different place. That was mandated by the Group Areas Act of 1950. The question we’re asking is: What has happened since then?
Like always, the data-set work was the most important and time-consuming work, and that was where we made the biggest innovations. And it’s so hard to publish that kind of work—as we knew already, we’ve been through this multiple times. You go to the computer vision community and they’re like, “Oh, it’s a data-set paper, where’s the algorithm?” But NeurIPS had this new data sets and benchmarks track and that’s where we published it.
This project is an example of a lot of the things we’re hoping to do here. We don’t want to just write a paper and move on. We’re working on visualizations; we’re working on how to effectively communicate our findings to relevant groups. [Sefala] is going to write an article for Africa is a Country about some of the findings. We’ve realized that one of the most important things we did was to label townships in the data sets, because the South African government doesn’t label townships in the census. Just that is very important, because how can you analyze the impacts of spatial apartheid if you don’t label the townships? I don’t know if you know Mimi Onuoha—she’s an artist who made a similar point about how Google Maps completely ignores favelas in Brazil.
It’s interesting to hear you talk about challenges with the data sets. Timnit, in your work on large language models you’ve called attention to problems with existing data sets, including embedded bias. The response I often hear is, essentially, “It’s just too hard to make data sets better.”
Gebru: If it’s just too hard to build a safe car, would we have cars out there? It goes back to what Alex was saying about a hammer. People think there’s just one way to do it. We’ve been trying to say, “Maybe there’s a different way we can think about this.” If you think [data-set curation] is a necessity, that means it’ll take more time and resources before you put something out there.
Hanna: This is a point we’ve made over and over. We’ve published on data-set practices and how many of these things go out with not enough attention paid to what’s in them. This data-hungry version of building models started with ImageNet, and it wasn’t until ImageNet was out for about 10 years that people started to dig in and say, “Wait, this [data set] is really problematic.”
I’ve been working on a paper with a legal scholar named Mehtab Khan on the legal dimensions of these huge data sets. The big firms like OpenAI are really pushing to say, “Oh, we can use these data, it’s fair use.” But we actually don’t know that—there’s not a lot of prior case law. Plus, fair use only matters for copyright holders, it doesn’t matter for data subjects and it doesn’t matter for the people who are affected by the decisions when these models are deployed.
It seems like a lot of the big changes need to happen within the big industry players. But can you effect change from the outside? And have you seen signs that this philosophy of AI development is spreading?
Gebru: I’ve seen changes. For instance, we’ve been talking for a long time about how data labor is completely undervalued. If you have PhD students and you want them to spend their time very carefully thinking through how they’re going to gather data sets, but then they have nowhere to publish… Now NeurIPS has the data sets and benchmarks track. When you think about what people need to do, you also have to think about the incentive structures. I see some of that changing, with help from labor organizing. And I think from the outside we can help. When we were on the inside, we partnered with people on the outside all the time. And the government has a huge role to play.
But what I worry about—and I said this to the EU parliament—is that we are stuck in this cycle where we are still talking about the potential harms of technology from a long time ago. And now people are talking about the metaverse! We have to figure out how to slow down, and at the same time, invest in people and communities who see an alternative future. Otherwise we’re going to be stuck in this cycle where the next thing has already been proliferated by the time we try to limit its harms.
Hanna: There’s a big desire for some kind of ethical framework. There is already legislation, and there’s going to be more regulations and ongoing litigation. But it’s not going to happen without a concerted effort from people who are willing to push and advocate for it.