Why OpenAI’s Codex Won’t Replace Coders

Human programmers can actually become more powerful and efficient with Codex

UPDATE 28 MARCH 2024: Coders have, in fact, outlasted OpenAI’s Codex. In March 2023, OpenAI initially announced it would shut down access to Codex, telling users to switch to the company’s more general-purpose GPT-3.5 and GPT-4 models instead. It slightly reversed course a few days later, maintaining access to Codex for researchers. Coders 1, Codex 0.

That’s not to say AI hasn’t already made an impact on software engineering. A 2023 survey from GitHub found that 92 percent of programmers are using AI tools in their work. For now, AI is still relegated to assisting programmers rather than replacing them, as there are plenty of aspects of the job that AI cannot take over. Elsewhere, AI’s other big impact on coding, prompt engineering, has so far failed to take off as predicted. —IEEE Spectrum

Original story from 28 September 2021 follows:

This summer, the artificial intelligence company OpenAI released Codex, a new system that automatically writes software code from simple prompts written in plain language. Codex is based on GPT-3, a revolutionary deep learning language model that OpenAI trained on a huge share of the publicly available text on the Internet through 2019, and it was further trained on publicly available code from GitHub.

As an early beta tester, I’ve had extensive opportunities to put both GPT-3 and Codex through their paces. The most frequent question I’m asked about Codex is “Will this replace human programmers?” With world powers like the United States investing billions in training new software developers, it’s natural to worry that all the effort and money could be for naught.

If you’re a software developer yourself—or your company has spent tons of money hiring them—you can breathe easy. Codex won’t replace human developers any time soon, though it may make them far more powerful, efficient, and focused.

Why isn’t Codex an existential threat to human developers? Years ago, I worked with a high-level (and highly compensated) data scientist and software developer from a major American consulting firm on a government database project. Our task was to understand how a state agency was using its database to assign grants to organizations, and then to advise the agency on how to improve the database.

When I first started working with my developer colleague, I had a lot of preconceived ideas about how he’d spend his time. I imagined he’d be hunched over his laptop all day, tapping out code in R or cooking up brilliant formulas in Mathematica to help us better understand our client’s database. I pictured A Beautiful Mind-style frantic scribbling on windows, regression analyses, and lots of time in front of a screen, writing thousands of lines of Python code.

Instead, my colleague started the engagement by sitting down with the client and spending several days understanding their grant-making process. This led to meetings with individual staff members, stakeholders, the agency’s constituents, and more. Only after several months of this kind of work did he finally sit down to analyze the agency’s data, using R and various graphing libraries. The actual coding and analysis took all of two days. The results of his analysis were spot on, and his program worked perfectly. The client was thrilled.

He later explained to me that actually writing code and running analyses occupies about 1 percent of his time. The remainder is spent working with clients to understand their problems, determining the right software and mathematical models to use, gathering and cleaning the actual data, and presenting results. In most cases, the coding and math themselves are a tiny, almost rote part of the software development process.

This is typical of developers. According to TechRepublic, writing actual code often occupies less than half of a typical software developer’s time, and in many cases as little as 20 percent of it. That means that even if systems like Codex worked perfectly, they would replace at most half of the job of a typical human software developer, and often less than a quarter of it. Unless someone trains Codex to sit down with clients, win their trust, understand their problems, and break those problems down into solvable component parts (in short, to do what my colleague did during our project), the system won’t threaten skilled human developers any time soon.
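The ceiling in that argument is a simple calculation, essentially Amdahl’s law applied to a job instead of a program: if coding is some share of the work and AI speeds up only that share, everything else is left untouched. Below is a minimal Python sketch, assuming the 50 and 20 percent coding shares cited above; the 2x coding speedup is an arbitrary illustrative value, not a measured one.

```python
def remaining_workload(coding_share: float, coding_speedup: float) -> float:
    """Fraction of the original workload left after speeding up only the coding part.

    Non-coding work (meetings, requirements, data cleaning, presenting) is unchanged;
    coding work shrinks by `coding_speedup`. An infinite speedup models a perfect,
    instant code generator.
    """
    if not 0.0 <= coding_share <= 1.0:
        raise ValueError("coding_share must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    return (1.0 - coding_share) + coding_share / coding_speedup


for share in (0.5, 0.2):
    perfect = remaining_workload(share, float("inf"))  # coding time drops to zero
    doubled = remaining_workload(share, 2.0)           # hypothetical 2x faster coding
    print(f"coding = {share:.0%} of the job: perfect AI leaves {perfect:.0%}, "
          f"2x faster coding leaves {doubled:.0%}")
```

Even with an infinitely fast and flawless code generator, a developer who spends 20 percent of the day coding still has 80 percent of the workload left, which is the point the paragraph above makes in prose.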

In their paper announcing Codex, OpenAI’s scientists acknowledge this. In their words, “engineers don’t spend their full day writing code.” Instead, they spend much of their time on tasks like “conferring with colleagues, writing design specifications, and upgrading existing software stacks.” Codex’s creators suspect the system may “somewhat reduce the overall cost of producing software” by letting developers “write good code faster.” But they doubt it will steal jobs. If anything, they suggest that automating the grunt work associated with software development will open up the field to a broader range of people. It might create a new specialty, too: “prompt engineering,” the often-complex process of crafting the textual prompts that allow AI systems like Codex to work their magic.
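What does such a prompt look like? In OpenAI’s Codex paper, the model is given a Python function signature plus a docstring describing the desired behavior in plain English, and is asked to write the body. The sketch below mimics that shape; the function is hypothetical and its body is hand-written for illustration, not output captured from Codex.

```python
# The "prompt": a function signature plus a plain-language docstring.
def count_vowels(text: str) -> int:
    """Return how many vowels (a, e, i, o, u) appear in text, ignoring case."""
    # The "completion": a body of the kind a Codex-style model is asked to write.
    # Hand-written for illustration; this is not actual Codex output.
    return sum(1 for ch in text.lower() if ch in "aeiou")


print(count_vowels("Codex"))  # 2
```

Much of prompt engineering lives in details like the docstring’s “ignoring case”: leave that phrase out, and a model could reasonably return code that misses capital vowels.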

Others aren’t so sure. As journalist Steven Levy points out in Wired, Codex may not steal work from individual software developers, but if it makes each developer far more efficient, companies may decide they can get by with fewer of them. A project that once required ten developers might require only eight if those developers are assisted by Codex or a similar AI system (roughly what you would expect if each assisted developer got 25 percent more done), resulting in a net loss of two jobs.

That might be true one day, but that day won’t arrive any time soon. Given that demand for developers grew 25 percent worldwide in 2020 despite the pandemic, the real threat to jobs from systems like Codex seems minimal, at least for now. If anything, allowing top companies to get by with fewer developers might make those developers available to mid-tier companies or startups, leading to better software at all levels of the tech ecosystem. Currently, startups often struggle to attract talented developers. If the Googles and Facebooks of the world poached fewer top developers, more top-notch talent might be available for emerging, innovative companies.

It’s also important to remember that all of this is predicated on the idea that Codex or systems like it can write code as well as a human software developer. At the moment, they absolutely cannot. OpenAI acknowledges that at launch, the code Codex produces contains errors or simply doesn’t work 63 percent of the time. Even writing working code 37 percent of the time is a big deal for a machine. But the day when a non-coder can sit down with Codex, write up a spec sheet, and crank out a working piece of software is still far away.
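How does anyone decide whether generated code “works”? OpenAI’s Codex paper judged functional correctness by running generated functions against unit tests. The sketch below applies the same idea in miniature to a hypothetical palindrome-checking task. The three candidate functions are hand-written stand-ins for model outputs, not real Codex samples, so the resulting pass rate says nothing about Codex’s actual accuracy.

```python
from typing import Callable, Sequence

# Hypothetical unit tests for "check whether a string is a palindrome,
# ignoring case, spaces, and punctuation".
CASES: list[tuple[str, bool]] = [
    ("racecar", True),
    ("A man, a plan, a canal: Panama", True),
    ("Codex", False),
    ("", True),
]


def candidate_correct(s: str) -> bool:
    # Keeps only letters and digits, lowercased, then compares with the reverse.
    cleaned = [ch.lower() for ch in s if ch.isalnum()]
    return cleaned == cleaned[::-1]


def candidate_case_sensitive(s: str) -> bool:
    # Plausible-looking but wrong: does not ignore case, spaces, or punctuation.
    return s == s[::-1]


def candidate_calls_missing_method(s: str) -> bool:
    # Wrong in another way: str has no is_palindrome method, so this raises AttributeError.
    return s.is_palindrome()


def passes_tests(candidate: Callable[[str], bool], cases: Sequence[tuple[str, bool]]) -> bool:
    """A candidate 'works' only if it returns the expected value for every case."""
    try:
        return all(candidate(arg) == expected for arg, expected in cases)
    except Exception:
        # A crash counts as a failure, just like a wrong answer.
        return False


def pass_rate(candidates: Sequence[Callable[[str], bool]],
              cases: Sequence[tuple[str, bool]]) -> float:
    """Fraction of candidate implementations that pass every test case."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return sum(passes_tests(c, cases) for c in candidates) / len(candidates)


CANDIDATES = [candidate_correct, candidate_case_sensitive, candidate_calls_missing_method]
print(f"{pass_rate(CANDIDATES, CASES):.0%} of candidates pass")
```

Running it prints “33% of candidates pass”: only the first function survives every test, even though the second looks correct at a glance. Treating a crash as a plain failure keeps one broken sample from halting the whole evaluation.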

That’s why many in the tech community see Codex less as a generator of new code and more as a powerful tool to assist humans. When I asked futurist Daniel Jeffries whether Codex would replace human software developers, he responded, “No chance.” In his words, “It will likely take years before we have a code engine that can generate consistently good routine code and inventive new code.”

Instead, Jeffries imagines systems like Codex creating “centaurs,” hybrids of “humans and AIs working together to do something faster and better than either could do alone.” Centaurs have already proven their value in games like chess, where human/machine teams have beaten both human grandmasters and unassisted computers. A human/AI centaur could likely work faster than a human software developer alone, and would be far more accurate and better attuned to real-world problems than a system like Codex laboring alone.

The popular code repository GitHub made a splash when it launched Copilot, a code assistance platform powered by Codex. Copilot works like autocorrect on steroids, offering code to complete whole functions or auto-filling repetitive code as a developer types. If centaurs really are the future of artificial intelligence, though, then the system’s name is misleading. In aviation, a copilot is a fully qualified pilot who can take over control of an airplane from the captain if needed. An autopilot, on the other hand, can fly the plane automatically in certain contexts (like when cruising straight and level) but must hand over control to a human pilot when things get dicey (like when landing in bad weather).

GitHub’s Copilot is really more like an autopilot than a true copilot. It can code on its own when tasks are easy and repetitive, but as soon as they get more complex, it requires human intervention. “As the developer,” GitHub says on its page about Copilot, “you are always in charge.” Ultimately, that’s not a criticism of Copilot or Codex. In aviation, autopilots are incredibly useful systems.

On a given commercial flight, a plane might be on autopilot up to 90 percent of the time. But crucially, human pilots are always supervising the system. And without their 10 percent contribution, planes would frequently crash. Pilots, in other words, are already skilled, safe, and effective centaurs. They could provide a helpful blueprint for similar centaurs in the software development world. That’s probably why GitHub and OpenAI decided to use an aviation metaphor to describe their system in the first place.

Unless Codex improves dramatically in the next few years, human software developers’ jobs are safe. But given the potential efficiency gains, companies and individual developers should begin to explore centaur technologies today. If you’re a developer, brush up on skills like prompt engineering, and apply for access to systems like Copilot and Codex so you can get early experience working with them. If you lead a technology company, start thinking about how embracing centaurs could make your own software development workflows more efficient. And if you teach computer science or coding, start educating your students about AI systems and hybrid centaur approaches today, so that they’re prepared to work with platforms like Codex or Copilot when they enter the job market.

Systems like Codex may fail when they’re pitted against a skilled human developer. But as Codex and its ilk improve, humans who transform themselves into centaurs by combining their skills with advanced AI are likely to become a powerful—and perhaps unstoppable—technological force.