Simanaitis Says

On cars, old, new and future; science & technology; vintage airplanes, computer flight simulation of them; Sherlockiana; our English language; travel; and other stuff

FAKE IT TILL IT MAKES IT

HALLUCINATIONS MADE BY ARTIFICIAL INTELLIGENCE are rooted in the adage above. Celina Zhao recounts, “A.I. Hallucinates Because It’s Trained to Fake Answers It Doesn’t Know,” AAAS Science, October 28, 2025. “Teaching chatbots to say ‘I don’t know’ could curb hallucinations,” she posits. “It could also break A.I.’s business model.”

A Nagging Issue. Even as OpenAI—founded as a nonprofit, now valued at $500 billion—completes its long-awaited restructuring, Zhao observes, “a nagging issue with its core offering remains unresolved: hallucinations. Large language models (LLMs) such as those that underpin OpenAI’s popular ChatGPT platform are prone to confidently spouting factually incorrect statements.”

Image from aibusiness.com.

How Come? “These blips,” Zhao says, “are often attributed to bad input data, but in a preprint posted last month, a team from OpenAI and the Georgia Institute of Technology proves that even with flawless training data, LLMs can never be all-knowing—in part because some questions are just inherently unanswerable.”

Zhao explains, “The root problem, the researchers say, may lie in how LLMs are trained. They learn to bluff because their performance is ranked using standardized benchmarks that reward confident guesses and penalize honest uncertainty. In response, the team calls for a rehaul of benchmarking so accuracy and self-awareness count as much as confidence.” 

The Tradeoff. “The awkward reality,” Zhao says, “may be that if ChatGPT admitted ‘I don’t know’ too often, then users would simply seek answers elsewhere. That could be a serious problem for a company that is still trying to grow its user base and achieve profitability.”

Accuracy hurting profitability?? Alas, this is a distressing assessment of the A.I. business model.

A Problem in Pretraining? Zhao writes, “Hallucinations begin during what’s known as pretraining, when the model first ingests massive amounts of text and begins to learn how to statistically predict the next word in a sequence. As Arizona State University computer scientist Subbarao Kambhampati puts it, an LLM fresh from pretraining amounts to an autocomplete tool ‘on steroids.’ This early-stage model can handle straightforward patterns such as grammar or spelling with ease, but it can still go astray when asked to answer tricky factual questions.”
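
To make “autocomplete on steroids” concrete, here is a toy of my own devising (a bigram counter over a made-up corpus; nothing here comes from the paper): it simply predicts whichever word most often followed the previous one in training. Real LLMs are vastly more sophisticated, but the statistical spirit is the same.

```python
# A minimal sketch of "statistically predict the next word in a sequence,"
# using only a tiny invented corpus and bigram counts (illustration only).
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def autocomplete(word):
    """Return the continuation seen most often after 'word' in training."""
    return following[word].most_common(1)[0][0]

print(autocomplete("sat"))  # 'on' -- grammar-like patterns come easily
print(autocomplete("the"))  # 'cat' (first of several equally common followers)
# This toy simply has no answer for words it never saw; a real LLM, by contrast,
# always produces something plausible from its statistics -- which is where
# tricky factual questions go astray.
```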

More Than That. Zhao continues, “To explain why pretraining alone can’t keep an LLM on the straight and narrow, Georgia Institute of Technology theoretical computer scientist Santosh Vempala and his colleagues reimagined the problem: When prompted with a sentence, how accurate is the LLM when it’s asked to generate an assessment of whether the sentence is fact or fiction? If a model can’t reliably distinguish valid sentences from invalid ones, it will inevitably generate invalid sequences itself.”
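
Here is how I picture that reframing, as a toy of my own (the sentences, probabilities, and threshold below are invented for illustration and are not the paper’s construction): the “model” is just a probability distribution over a handful of sentences; its implied fact-checker calls a sentence valid whenever the model finds it likely; and the same distribution, used as a generator, emits the falsehoods it could not rule out.

```python
# Toy illustration of the generation-vs-classification link
# (my invention, not the paper's construction).
import random

valid = {"Paris is in France": 0.30, "Water boils at 100 C": 0.25}
invalid = {"Paris is in Spain": 0.25, "Water boils at 50 C": 0.20}  # plausible but false
model = {**valid, **invalid}  # hypothetical learned distribution (sums to 1.0)

def is_it_valid(sentence, threshold=0.22):
    """Implied classifier: call a sentence valid if the model finds it likely."""
    return model[sentence] >= threshold

mistakes = [s for s in valid if not is_it_valid(s)] + \
           [s for s in invalid if is_it_valid(s)]
print("misjudged as fact or fiction:", mistakes)           # the Spain sentence slips through
print("chance a sample is false:", sum(invalid.values()))  # 0.45 of generated text is invalid

# Sampling from the very same distribution shows the generative side of the problem.
print("samples:", random.choices(list(model), weights=list(model.values()), k=5))
```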

I recall the old computer saw: GIGO; garbage in, garbage out. Only now, the layers of possible garbage are deep indeed.

Needed: Some A.I. Humility. During post-training, an A.I.’s veracity is assessed by benchmarks, standardized tests scoring how well models answer thousands of questions. Zhao observes, “Of the hundreds of benchmarks available, only a few systematically test how often a model hallucinates facts. The researchers call for reworking all benchmarks to penalize a model for guessing incorrectly. That could embed a ‘school of hard knocks’ intuition in the model, teaching it humility.”
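
A bit of back-of-the-envelope arithmetic (my illustration, not the paper’s exact scoring scheme) shows why this matters: under the usual one-point-or-nothing grading, any guess has a non-negative expected score, so bluffing always beats saying “I don’t know”; dock points for wrong answers, and guessing pays only above a confidence threshold.

```python
# Expected benchmark score for answering versus abstaining (illustrative numbers).

def expected_score(p_correct, wrong_penalty=0.0):
    """Right answers earn 1 point, wrong ones lose wrong_penalty, 'I don't know' scores 0."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

for p in (0.1, 0.3, 0.6, 0.9):
    plain = expected_score(p)                      # standard 1-or-0 grading
    docked = expected_score(p, wrong_penalty=2.0)  # wrong answers cost 2 points
    print(f"confidence {p:.0%}: plain {plain:+.2f}, with penalty {docked:+.2f}")

# Under plain grading every guess scores at least as well as abstaining (>= 0),
# so a model tuned against such benchmarks learns to bluff. With a penalty of
# lambda points per wrong answer, guessing pays only when confidence exceeds
# lambda / (1 + lambda) -- about 67% for the 2-point penalty above.
```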

OpenAI/Georgia Tech Observations. The technical paper “Why Language Models Hallucinate,” arXiv.org, September 4, 2025, is by Adam Tauman Kalai, Ofir Nachum, and Edwin Zhang, all of OpenAI, and Santosh S. Vempala, Georgia Tech. They write in the Abstract, “Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty. Such ‘hallucinations’ persist even in state-of-the-art systems and undermine trust. We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty, and we analyze the statistical causes of hallucinations in the modern training pipeline.”

Their Conclusions: “This paper,” they claim, “demystifies hallucinations in modern language models, from their origin during pretraining to their persistence through post-training. In pretraining, we show that generative errors parallel misclassifications in supervised learning, which are not mysterious, and naturally arise due to the minimization of cross-entropy loss.”
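
For readers who want the objective spelled out, the pretraining loss they refer to is, in its standard textbook form (my notation, not a formula lifted from the paper), the average negative log-probability the model assigns to each observed next token:

$$\mathcal{L}(\theta) \;=\; -\,\frac{1}{T}\sum_{t=1}^{T}\log p_\theta\!\left(x_t \mid x_{<t}\right)$$

Minimizing it rewards fluent, statistically likely continuations of the training text; the authors’ point is that generative errors arise naturally from this minimization, even when the training data are flawless.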

It is perhaps an oversimplification to recall GIGO.

“Many language model shortcomings,” they continue, “can be captured by a single evaluation…. Simple modifications of mainstream evaluations can realign incentives, rewarding appropriate expressions of uncertainty rather than penalizing them. This can remove barriers to the suppression of hallucinations, and open the door to future work on nuanced language models, e.g., with richer pragmatic competence (Ma et al., 2025).”

Zhao’s Conclusions. AAAS Science’s Celina Zhao quotes Hao Peng, a computer scientist at the University of Illinois Urbana-Champaign: “I’m a bit pessimistic if there’s any set of data or metric that would naturally fix hallucination. These models are just so good at gaming whatever we’re optimizing them for.”

Zhao continues, “In the meantime, no A.I. company wants to be the first to break long-standing industry norms and risk its users migrating to seemingly more confident—and more productive—competitors.” She quotes Arizona State University computer scientist Subbarao Kambhampati: “If LLMs keep pleading the Fifth, they can’t be wrong. But they’ll also be useless.”

In a sense, it’s time for A.I. to grow up and stop crossing its fingers. ds 

© Dennis Simanaitis, SimanaitisSays.com, 2025 

3 comments on “FAKE IT TILL IT MAKES IT”

  1. Andrew G.
    November 19, 2025

    The mention of University of Illinois reminded me that the fictional HAL 9000 was also built at a lab in Urbana. In “2001: A Space Odyssey”, [Spoiler alert!] HAL was given secret conflicting orders that resulted in his decision to murder the spaceship’s human crew to ensure a successful mission. Maybe in addition to being truthful about having insufficient data, AI needs to incorporate Asimov’s Three Laws of Robotics before someone gets hurt.

    • simanaitissays
      November 19, 2025

      Another UIUC tidbit: Bruce Artwick was there when he devised what evolved into Microsoft Flight Simulator. (Many thanks, Bruce!)

  2. vwnate1
    November 19, 2025

    SKYNET is watching you…..

    -Nate
