“Should we automate away all the jobs, including the fulfilling ones?”
This is one of several questions posed by the Future of Life Institute’s recent call for a pause on “giant AI experiments,” which now has over 10,000 signatories including Elon Musk, Steve Wozniak, and Andrew Yang. It sounds dire—although maybe laced through with a little bit of hype—and yet how, exactly, would AI be used to automate all jobs? Setting aside whether that’s even desirable—is it even possible?
“I think the real barrier is that the emergence of generalized AI capabilities as we’ve seen from OpenAI and Google Bard is that similar to the early days when the Internet became generally available, or cloud infrastructure as a service became available,” says Douglas Kim, a fellow at the MIT Connection Science Institute. “It is not yet ready for general use by hundreds of millions of workers as being suggested.”
Even researchers can’t keep up with AI innovation
Kim points out that while revolutionary technologies can spread quickly, they typically fail to reach widespread adoption until they prove themselves through useful, easily accessible applications. He notes that generative AI will need “specific business applications” to move beyond a core audience of early adopters.
Matthew Kirk, the head of AI at Augment.co, has a similar view. “What I think is happening is similar to what happened in the early days of the Internet. It was an absolute mess of ideas, and no standards. It takes time and cooperation for human beings to settle on standards that people follow. Even something as mundane as measuring time is incredibly complicated.”
Standardization is a sore spot for AI development. The methods used to train the models and fine-tune the results are kept secret, making basic questions about how they function hard to answer. OpenAI has touted GPT-4’s ability to pass numerous standardized tests—but did the model genuinely understand the tests, or simply train to reproduce the correct answers? And what does that mean for its ability to tackle novel tasks? Researchers can’t seem to agree on the answer, or on the methods that might be used to reach a conclusion.
OpenAI’s GPT-4 can ace many standardized tests. Does it truly understand them, or was it trained on the correct answers? OpenAI
Even if standards can be agreed on, designing and producing the physical hardware required for the widespread use of AI-powered tools could prove a challenge, whether those tools are based on large language models (LLMs) like GPT-4 or on other generative AI systems. Lucas A. Wilson, the head of global research infrastructure at Optiver, believes the AI industry is in an “arms race” to produce the most complicated LLM possible. This, in turn, has quickly increased the compute resources required to train a model.
“The pace of innovation in the AI space means that immediately applicable computational research is now in advance of the tech industry’s ability to develop new and novel hardware capabilities, and so the hardware vendors must play catch-up to the needs of AI developers,” says Wilson. “I think vendors will have a hard time keeping up for the foreseeable future.”
Like you, AI won’t work for free
In the meantime, developers must find ways to deal with limitations. Training a powerful LLM from scratch can present unique opportunities, but it’s only viable for large, well-funded organizations. Implementing a service that taps into an existing model is much more affordable (OpenAI’s GPT-3.5 Turbo, for example, prices API access at US $0.002 per 1,000 tokens, or roughly 750 English words). But costs still add up when an AI-powered service becomes popular. In either case, rolling out AI for unlimited use isn’t practical, forcing developers to make tough choices.
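To see how quickly those costs add up, here is a minimal back-of-the-envelope sketch in Python. It uses only the rough figure quoted above (US $0.002 per 750 English words). The function name and the usage numbers are hypothetical, chosen purely for illustration, and real bills depend on a vendor's actual pricing and tokenization.

```python
# Rough figure from the article: about US $0.002 per 750 English words.
PRICE_USD = 0.002
WORDS_PER_PRICE_UNIT = 750


def estimate_monthly_cost(requests_per_day: int,
                          words_per_request: int,
                          days: int = 30) -> float:
    """Estimate API spend in US dollars over `days` days.

    Hypothetical helper: it assumes every request processes the same
    number of words and that pricing scales linearly with word count.
    """
    total_words = requests_per_day * words_per_request * days
    return total_words / WORDS_PER_PRICE_UNIT * PRICE_USD


if __name__ == "__main__":
    # Hypothetical service: 100,000 requests a day, 750 words each.
    cost = estimate_monthly_cost(100_000, 750)
    print(f"Estimated monthly cost: US ${cost:,.2f}")
```

Under those assumed numbers, a fraction of a cent per request grows to about US $6,000 a month.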
“Generally, startups building with AI should be very careful with dependencies on any specific vendor APIs. It’s also possible to build architectures such that you don’t light GPUs on fire, but that takes a fair bit of experience,” says Hilary Mason, the CEO and cofounder of Hidden Door, a startup building an AI platform for storytelling and narrative games.
This is a screen capture of an AI-powered tool used to generate narrative games. It includes multiple characters and prompts that a user can select. Hidden Door
Most services built on generative AI include a firm cap on the volume of content they’ll generate per month. These caps can raise costs for businesses and slow down people looking to automate tasks. Even OpenAI, despite its resources, caps paying users of ChatGPT, depending on the current load: As of this writing, the cap is 25 GPT-4 queries every three hours. That’s a big problem for anyone looking to rely on ChatGPT for work.
Developers of AI-powered tools also face a challenge as old as computers themselves—designing a good user interface. A powerful LLM capable of many tasks should be an unparalleled tool, but a tool’s ability to accomplish a task is irrelevant if the person using it doesn’t know where to start. Kirk points out that while ChatGPT is approachable, the openness of interacting with an AI through chat may prove overwhelming when users need to focus on a specific task.
“I have learned from experience that leaving tools completely open-ended tends to confuse users more than assist,” says Kirk. “Think of it like a hall of doors that is infinite. Most humans would stand there perplexed with what to do. We have a lot of work to do to determine the optimal doors to present to users.” Mason has a similar observation, adding that “in the same way that ChatGPT was mainly a UX improvement over GPT-3, I think that we’re just at the beginning of inventing the UI metaphors we’ll need to effectively use AI models in products.”
Training to use AI will be a job in itself
One particular problem has already generated controversy and threatens efforts to build AI tools for sensitive and important work: hallucination. LLMs have an incredible ability to generate unique text, cracking jokes and weaving narratives about imaginary characters. However, this perk is an obstacle when precision and accuracy are mission critical, because LLMs will often present nonexistent sources or incorrect statements as fact.
“Specific functions in companies in certain heavily regulated industries (banking, insurance, health care) will find it difficult to reconcile the very stringent data privacy and other regulatory requirements that prevent discrimination,” says Kim. “In these regulated sectors, you can’t have AI make the sort of mistakes that are passable when writing a school paper.”
Companies may respond to this challenge by courting employees who have expertise in using AI tools. Anthropic, an AI safety and research company, recently made headlines with a job ad seeking a prompt engineer and librarian responsible for building “a library of high quality prompts or prompt chains to accomplish a variety of tasks,” among other things. The salary? $175,000 to $335,000.
However, Wilson sees a friction between the expertise required to use AI tools effectively and the efficiency AI promises to deliver. “How does hiring people for the completely new job of feeding LLMs free anyone else who was already working to focus on more complex or abstract tasks?” asks Wilson. “I don’t readily see a clear answer.”
Augmenting work with AI could be worthwhile despite these problems. This was certainly true of the computing revolution: Many people need training to use Word and Excel, but few would propose typewriters or graph paper as a better alternative. Still, it’s clear that the future the Future of Life Institute’s letter frets about, one in which “we automate away all the jobs, including the fulfilling ones,” is more than six months away. The AI revolution is unfolding right now, and it will still be unfolding a decade from today.