The End of AI Scaling: A New Reality
In this Open Questions column, guest writer Cal Newport explores a critical question facing the tech world today. Much of the recent excitement and fear surrounding artificial intelligence can be traced to a January 2020 OpenAI report on scaling laws. The team, led by the researcher Jared Kaplan, suggested that, contrary to earlier assumptions, language models would not merely memorize their training data but would continuously improve as they grew larger. This improvement was predicted to follow a power law: an aggressive, hockey-stick-shaped curve.
This theory implied that building bigger models with more data would lead to shockingly good results. A few months later, the release of GPT-3, which was more than a hundred times the size of GPT-2 and significantly better, seemed to confirm the scaling law. The concept of Artificial General Intelligence (AGI) suddenly felt within reach.
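To make "follows a power law" concrete: the Kaplan report modeled test loss as a power of model size, roughly L(N) = (N_c / N)^alpha, so that every tenfold increase in parameters cuts loss by the same constant factor. The constants in the sketch below are illustrative placeholders, not the report's fitted values; the point is only the predictable shape of the curve.

```python
# Toy power-law scaling curve: loss falls as a power of model size N.
# ALPHA and N_C are illustrative placeholders, not fitted values.
ALPHA = 0.076
N_C = 8.8e13

def predicted_loss(n_params: float) -> float:
    """Loss as a power law in parameter count: L(N) = (N_C / N) ** ALPHA."""
    return (N_C / n_params) ** ALPHA

# Each tenfold jump in size shrinks loss by the same multiplicative factor,
# which is what made the curve's improvements look so predictable.
for n in (1e8, 1e9, 1e10):
    print(f"{n:.0e} parameters -> predicted loss {predicted_loss(n):.3f}")
```

The steady multiplicative drop per decade of scale is what the "just build it bigger" strategy was betting on.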
The Dawn of the Scaling Law
The belief in scaling to AGI quickly became industry doctrine. OpenAI's CEO, Sam Altman, proclaimed an unstoppable technological revolution that would generate unimaginable wealth. However, when A.I. entrepreneur Gary Marcus questioned these scaling laws in 2022, suggesting they were mere observations that might not hold, the backlash was intense. Marcus recalled being ridiculed by industry leaders from Sam Altman to Yann LeCun and effectively "excommunicated" from the machine learning community. The release of ChatGPT and the subsequent GPT-4, which inspired a paper titled "Sparks of Artificial General Intelligence," only solidified the hype, leading to an eighty percent jump in venture-capital spending on A.I.
Cracks in the Curve
After the triumph of GPT-4, however, progress appeared to slow down. OpenAI didn't release a major new model for over two years, instead focusing on specialized updates. Whispers grew within the industry that the scaling law was faltering. Ilya Sutskever, an OpenAI co-founder, even remarked in November that the age of scaling was over and everyone was looking for the next big thing. Yet, these observations were largely drowned out by the bold claims of other leaders. Anthropic's CEO Dario Amodei predicted that half of entry-level white-collar jobs could be "wiped out" in the next five years, while both Altman and Mark Zuckerberg claimed their companies were close to developing superintelligence.
The GPT-5 Reality Check
The much-anticipated release of GPT-5 last week served as a major reality check. While it showed some improvements, such as better code generation for a custom Pokémon chess game, it underperformed its predecessor, GPT-4o, on other tasks, like creating YouTube thumbnails. Users quickly expressed disappointment on platforms like Reddit, with one calling it the "biggest piece of garbage even as a paid user." In an Ask Me Anything session, OpenAI's team found themselves on the defensive. Marcus summarized the launch as "overdue, overhyped and underwhelming." In the wake of GPT-5's launch, the bombastic predictions about A.I. have become harder to believe, and the more moderate views of critics seem increasingly plausible.
A Pivot from Scaling to Tweaking
OpenAI's struggles were not new. By the spring of 2024, its next major model, code-named Orion, was yielding disappointing results: the performance increase was far smaller than the leap between GPT-3 and GPT-4. This cemented the fear that the scaling law was indeed failing. With diminishing returns from simply building bigger models, tech companies needed a new strategy: post-training improvements.
A useful metaphor is a car. Pre-training builds the car; post-training soups it up. If GPT-3 was a sedan and GPT-4 was a sports car, the industry has now shifted its focus from building faster cars to becoming expert mechanics. Techniques like reinforcement learning are used to refine a pre-trained model for specific tasks. This pivot was reflected in releases like OpenAI's o-series models and Anthropic's Claude family. Even Elon Musk’s xAI, after its massively pre-trained Grok 3 failed to dominate, embraced post-training for Grok 4. GPT-5 is not a new car, but rather an attempt to integrate these souped-up models into one package.
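The mechanics of that souping-up can be sketched with a toy example. Below is a bare-bones policy-gradient (REINFORCE-style) update on a two-action "model": its pre-trained weights prefer one action, and a reward signal gradually nudges it toward the other. Every name and number here is invented for illustration; real post-training pipelines are vastly more elaborate, but the core loop — sample, score with a reward, reinforce what scored well — is the same idea.

```python
import math
import random

random.seed(0)

# "Pre-training" left the model preferring action 0 (logit 1.0 vs 0.0).
logits = [1.0, 0.0]

def sample_action() -> int:
    """Sample an action from the softmax over the two logits."""
    exps = [math.exp(x) for x in logits]
    p0 = exps[0] / sum(exps)
    return 0 if random.random() < p0 else 1

def reward(action: int) -> float:
    """The post-training objective rewards action 1 instead."""
    return 1.0 if action == 1 else 0.0

LR = 0.5
for _ in range(200):
    a = sample_action()
    r = reward(a)
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    # REINFORCE: raise the log-probability of actions that earned reward.
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += LR * r * grad

# After the reward-driven updates, the preference has flipped to action 1.
print(logits)
```

The "car" never gets rebuilt — the same two logits are merely tuned — which mirrors why post-training can sharpen specific behaviors without expanding the underlying model.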
Are We Measuring Real Progress?
Has this new approach put us back on track to AGI? While OpenAI's announcement for GPT-5 was filled with charts showing outperformance on various benchmarks, these improvements feel narrow, more like a software update than a true expansion of capability. Furthermore, some benchmarks may not measure what they claim. An Apple research paper titled “The Illusion of Thinking” found that state-of-the-art models collapsed when puzzle complexity increased slightly. Researchers at Arizona State University were even more blunt, calling A.I. reasoning "a brittle mirage." As Marcus noted, better benchmark scores haven't necessarily translated to greater real-world utility for businesses.
The Economic Fallout of AI Hype
If the moderate view is correct, A.I. will see steady but gradual advances. It will become a useful tool for specific tasks but may not massively disrupt the job market. Skeptics like technology analyst Ed Zitron and linguistics professor Emily Bender predict a fifty-to-hundred-billion-dollar market, not a trillion-dollar one. This contrasts sharply with current market valuations. Zitron pointed out in a recent article that the "Magnificent Seven" tech companies have spent hundreds of billions on A.I. capital expenditures for comparatively little revenue, creating a potential economic bubble.
A More Moderate Path Forward
Even A.I. moderates believe we should not let our guard down. The renewed investment could lead to other breakthroughs, and Marcus believes AGI could still be attainable in the 2030s through new techniques. This potential lull offers a critical opportunity to prepare for future disruptions by developing effective regulations and the field of digital ethics. The original 2020 scaling-law paper included a crucial caveat: "we do not have a solid theoretical understanding for any of our proposed scaling laws." The laws worked until they didn't. The journey to teaching computers to think remains a mystery, and it's one we should approach with less hubris and more care.