GPT-5 was supposed to be the model that proved artificial general intelligence (AGI) is within reach. OpenAI CEO Sam Altman hinted as much with a January post on his personal blog. Altman wrote he was “now confident we know how to build AGI as we have traditionally understood it,” adding that 2025 would be the year AI agents “materially change the output of companies.”
Reality hasn’t lived up to Altman’s expectations. Cognitive scientist and AGI skeptic Gary Marcus called GPT-5 “overhyped and underwhelming” in a Substack post, and the deluge of negative feedback eventually prompted Altman to admit OpenAI “totally screwed up” the launch.
It’s not just GPT-5 in the crosshairs. A recent MIT report found that 95 percent of generative-AI deployments in business settings generated “zero return.” The report shook confidence in AI badly enough to drive a minor sell-off in tech stocks, though prices have since leveled off. Recent releases from xAI and Anthropic also received a tepid response.
“We are amid a classic hype cycle,” says Juan Graña, CEO of the AI company Neurologyca. “AI burst onto the scene with intense buzz, but is now sliding into what Gartner calls the ‘trough of disillusionment,’ where expectations meet reality.”
Is AI Headed Toward a Trough of Disillusionment?
There’s a good chance you know the trough of disillusionment even if you’re not familiar with the term.
The phrase was coined in 1995 by Gartner analyst Jackie Fenn, who used it in a graph illustrating how inflated expectations give way to a period of disillusionment. The graph quickly caught on and inspired countless (and sometimes humorous) variations.
Jason Gabbard, comanaging partner at the AI consultancy Bowtie, says the hype leading up to GPT-5—as well as other AI releases in 2025—was intense. “There have been so many talking heads, the commentary has been all hype for so long, that expectations were high,” says Gabbard. He adds that GPT-5’s failure to meet expectations was most sorely felt by smaller organizations and individuals, who hoped “that the next thing out of OpenAI was going to solve all of their problems.”
His comments are echoed by the user-led rebellion that followed in GPT-5’s wake.
As part of the new model’s release, OpenAI removed the earlier GPT-4o model from ChatGPT on the apparent assumption that users would find GPT-5 an upgrade in every situation. Instead, many ChatGPT users complained that the new model seemed worse than its predecessor. The criticism caused OpenAI to change course and restore access to GPT-4o just 24 hours after its removal.
It was an embarrassing turn of events for OpenAI. In 2024, Altman predicted that GPT-5 would make GPT-4 feel “mildly embarrassing” by comparison. Instead, user feedback to GPT-5 was so negative that OpenAI decided to restore its predecessor.
Challenges Facing AI Agents in 2025
Ironically, Fenn’s original 1995 graph placed intelligent agents at the very peak of expectations—precisely where AI agents found themselves at the start of 2025. Fast forward to August and it seems that, just as Fenn’s graph predicts, agents are leading a plummet into the trough.
The launch of GPT-5’s Agent Mode (formerly called Operator), much like the model itself, received mixed reviews. And doubts about agentic AI have spilled over into the entire AI industry. Replit, an AI vibe-coding tool, faced criticism in June after its agent deleted a company’s entire codebase. Security is an issue, too. Antivirus provider Malwarebytes recently issued a warning that AI agents entrusted with important credentials could leave users “penniless” by falling for scams designed to fool AI.
These worrying headlines are extreme cases, but benchmark results paint a similarly modest picture of agentic performance.
One such benchmark, TheAgentCompany, tasked AI agents powered by models from Amazon, Anthropic, Google, and OpenAI with jobs across a wide range of career paths, including coding, data science, and human resources. It found that even the best model tested, Google’s Gemini 2.5 Pro, could complete only 30.3 percent of tasks. Results for GPT-5 are not yet available.
TheAgentCompany benchmark also revealed that the real limitations of AI agents differ from popular expectations.
A recent study found that AI posed the greatest threat to jobs that involve soft skills, including customer-service representatives, clerks, analysts, public-relations specialists, and administrators. Anthropic CEO Dario Amodei has predicted that AI could eliminate up to half of all entry-level white-collar jobs.
However, TheAgentCompany benchmark found that agents perform poorly when asked to complete tasks that fall within these roles. They struggle due to a lack of social skills and a tendency toward self-deception. Agents were most successful when asked to handle software-development and project-management tasks.
“Coding looks hard to humans, but it’s actually easier for AI models than simpler-seeming tasks like clerical work,” says Frank Xu, a coauthor of the benchmark’s paper.
Data Limitations Impacting AI Performance
One possible reason for this capability gap? A lack of training data.
“There’s a huge amount of open-source code online to train on, but you don’t see companies open-sourcing their spreadsheets or human-resource workflows,” says Xu. “That lack of data is a big reason why agents struggle with the jobs people expect them to replace.”
All of the experts IEEE Spectrum spoke with agreed that a lack of data related to specific tasks appears to be a stumbling block for AI models.
Neurologyca’s Graña believes that “AI lacks the data and, more importantly, the context needed to behave in emotionally intelligent ways.” Bowtie’s Gabbard, who helps financial institutions like hedge funds implement AI automation, says generalized AI agents struggle with unique business processes and require customized solutions to succeed. And Mark Parfenov, an analyst with experience using AI, finds that agents “very quickly lose track of complex tasks” and omit important data when used for market analysis.
These difficulties cast doubt on the AI industry’s hope that AGI can be achieved by scaling up general-purpose large language models. That’s not to say AI models lack a path to improvement, however. Synthetic data and improved data labeling offer options to address shortcomings, though they could also turn AI’s climb out of the trough of disillusionment into a difficult, costly slog.
“I think we are running out of low-hanging fruits to improve on,” says Xu. He adds that the early agentic gains came from simple fixes, “things like formatting errors, or not understanding tools [...] I think we’re going to slow down until we find the next big thing.”
Matthew S. Smith is a freelance consumer technology journalist with 17 years of experience and the former Lead Reviews Editor at Digital Trends. An IEEE Spectrum Contributing Editor, he covers consumer tech with a focus on display innovations, artificial intelligence, and augmented reality. A vintage computing enthusiast, Matthew covers retro computers and computer games on his YouTube channel, Computer Gaming Yesterday.