Simanaitis Says

On cars, old, new and future; science & technology; vintage airplanes, computer flight simulation of them; Sherlockiana; our English language; travel; and other stuff

A.I. HALLUCINATIONS ON THE RISE

I KINDA SAW THIS COMING with Google searches occasionally offering corrupted versions of SimanaitisSays comments: “Dennis Simanaitis says Benito Mussolini and Ercole Boratto won the 1936 Mille Miglia,” or some such garbling produced by Large Language Model data scraping.

Benito Mussolini, 1883–1945, Italian Fascist dictator, Il Duce, standing; Ercole Boratto, 1886–1979, Italian race driver, Mussolini’s chauffeur and “confidente.” The car is Mussolini’s 1935 Alfa Romeo 6C 2300 Sport Spyder Pescara, which Boratto drove in the 1936 Mille Miglia. Image from Kidston: Keep It Alive via SimanaitisSays.

More Power, More Hallucinations. Indeed, just recently Cade Metz and Karen Weise report “A.I. Is Getting More Powerful, But Its Hallucinations Are Getting Worse,” The New York Times, May 5, 2025.

Metz and Weise recount, “More than two years after the arrival of ChatGPT, tech companies, office workers and everyday consumers are using A.I. bots for an increasingly wide array of tasks. But there is still no way of ensuring that these systems produce accurate information. The newest and most powerful technologies — so-called reasoning systems from companies like OpenAI, Google and the Chinese start-up DeepSeek — are generating more errors, not fewer. As their math skills have notably improved, their handle on facts has gotten shakier. It is not entirely clear why.” 

Large Language Models. “Today’s A.I. bots,” Metz and Weise describe, “are based on complex mathematical systems that learn their skills by analyzing enormous amounts of digital data. They do not—and cannot—decide what is true and what is false. Sometimes, they just make stuff up, a phenomenon some A.I. researchers call hallucinations. On one test, the hallucination rates of newer A.I. systems were as high as 79 percent.”

Image by Eric Carter for The New York Times.

The Times researchers cite, “The A.I. bots tied to search engines like Google and Bing sometimes generate search results that are laughably wrong. If you ask them for a good marathon on the West Coast, they might suggest a race in Philadelphia. If they tell you the number of households in Illinois, they might cite a source that does not include that information.” 

Image by Pablo DelCan for The New York Times via “A.I. GIGO.”

I recall an attorney’s A.I.-generated legalese citing court cases that didn’t exist. And there are the occasional misquotes of SimanaitisSays.

Metz and Weise observe, “Those hallucinations may not be a big problem for many people, but they are a serious issue for anyone using the technology with court documents, medical information or sensitive business data.”

Can A.I. Ever Develop Honesty? Metz and Weise quote a specialist: “ ‘Despite our best efforts, they will always hallucinate,’ said Amr Awadallah, the chief executive of Vectara, a start-up that builds A.I. tools for businesses, and a former Google executive. ‘That will never go away.’ ”

“For more than two years,” The Times researchers observe, “companies like OpenAI and Google steadily improved their A.I. systems and reduced the frequency of these errors. But with the use of new reasoning systems, errors are rising. The latest OpenAI systems hallucinate at a higher rate than the company’s previous system, according to the company’s own tests.”

The Times researchers recount, “The company found that o3—its most powerful system—hallucinated 33 percent of the time when running its PersonQA benchmark test, which involves answering questions about public figures. That is more than twice the hallucination rate of OpenAI’s previous reasoning system, called o1. The new o4-mini hallucinated at an even higher rate: 48 percent.” 

They continue, “When running another test called SimpleQA, which asks more general questions, the hallucination rates for o3 and o4-mini were 51 percent and 79 percent. The previous system, o1, hallucinated 44 percent of the time.”
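For the curious: what does a “hallucination rate” actually tally? Here’s a minimal sketch in Python. The questions, reference answers, and crude grader below are my own inventions for illustration; OpenAI’s PersonQA and SimpleQA harnesses use far larger question sets and subtler grading. The idea is simply: ask the model each question, grade its answer against a reference, and report the fraction graded wrong.

    # A toy tally of a hallucination rate over a QA benchmark.
    # Questions, references, and the substring grader are invented
    # for illustration; not OpenAI's actual benchmark harness.

    def grade(model_answer: str, reference: str) -> bool:
        """Crude grader: count the answer correct if the reference appears in it."""
        return reference.lower() in model_answer.lower()

    def hallucination_rate(model, benchmark) -> float:
        """Fraction of benchmark questions the model answers incorrectly."""
        wrong = sum(
            0 if grade(model(question), reference) else 1
            for question, reference in benchmark
        )
        return wrong / len(benchmark)

    # Hypothetical usage, with a stand-in "model" that answers confidently:
    benchmark = [
        ("Who won the 1936 Mille Miglia?", "Antonio Brivio"),
        ("Who was Mussolini's chauffeur?", "Ercole Boratto"),
    ]
    confident_guesser = lambda q: "Benito Mussolini, of course."
    print(f"{hallucination_rate(confident_guesser, benchmark):.0%}")  # prints: 100%

Note that the grader never asks whether an answer sounds plausible, only whether it matches; a model can be fluently, confidently wrong on every single question.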

Who’s At Fault? “In a paper detailing the tests,” Metz and Weise relate, “OpenAI said more research was needed to understand the cause of these results. Because A.I. systems learn from more data than people can wrap their heads around, technologists struggle to determine why they behave in the ways they do.”

Metz and Weise continue, “Hannaneh Hajishirzi, a professor at the University of Washington and a researcher with the Allen Institute for Artificial Intelligence, is part of a team that recently devised a way of tracing a system’s behavior back to the individual pieces of data it was trained on. But because systems learn from so much data—and because they can generate almost anything—this new tool can’t explain everything. ‘We still don’t know how these models work exactly,’ she said.” 

It’s kinda like giving A.I. an “open-book test” without saying which books are allowed, or anything about the books’ veracity.

Reinforcement Learning. “So,” The Times researchers write, “these companies are leaning more heavily on a technique that scientists call reinforcement learning. With this process, a system can learn behavior through trial and error. It is working well in certain areas, like math and computer programming. But it is falling short in other areas.”
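To make “trial and error” concrete, here is a toy reinforcement-learning loop in Python, an epsilon-greedy bandit choosing among three canned answering strategies. The actions and reward numbers are invented, and this sketches the general technique only, not how OpenAI or Google actually train their reasoning systems. Watch what it learns when the reward signal scores confident-sounding answers rather than true ones.

    import random

    # Toy trial-and-error learning: an epsilon-greedy bandit that learns
    # which answering strategy earns the highest reward. The actions and
    # reward values are invented for illustration.

    actions = ["hedge", "cite a source", "make something up"]
    value = {a: 0.0 for a in actions}   # running estimate of each action's payoff
    count = {a: 0 for a in actions}
    epsilon = 0.1                       # fraction of the time we explore at random

    def reward(action: str) -> float:
        # A reward signal that scores confidence, not truth: a confident
        # fabrication outscores a hedge and even a genuine citation.
        return {"hedge": 0.2, "cite a source": 0.7, "make something up": 0.9}[action]

    random.seed(42)
    for _ in range(1000):
        if random.random() < epsilon:
            a = random.choice(actions)        # explore a random action
        else:
            a = max(value, key=value.get)     # exploit the best estimate so far
        count[a] += 1
        value[a] += (reward(a) - value[a]) / count[a]   # incremental mean update

    print(max(value, key=value.get))  # prints: make something up

That is the worry in miniature: reinforcement learning faithfully optimizes whatever the reward measures, and if the reward cannot tell truth from confident invention, invention is what gets reinforced.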

“ ‘What the system says it is thinking is not necessarily what it is thinking,’ said Aryo Pradipta Gema, an A.I. researcher at the University of Edinburgh and a fellow at Anthropic.”

Geez. Like a lazy C- student. ds

© Dennis Simanaitis, SimanaitisSays.com, 2025

4 comments on “A.I. HALLUCINATIONS ON THE RISE”

  1. sabresoftware
    May 9, 2025

    And then these hallucinations become part of the mass of data that LLMs use and reinforce the mess.

  2. Bill Estill
    May 9, 2025

    Clearly, the answer to this problem is forty two.

    • simanaitissays
      May 9, 2025

      Or maybe Thursday.

      • Mike B
        May 9, 2025

        Let me ask my man Friday…
