Simanaitis Says

On cars, old, new and future; science & technology; vintage airplanes, computer flight simulation of them; Sherlockiana; our English language; travel; and other stuff

A.I.’S MIRAGE REASONING—ITS LATEST HALLUCINATIONS 

I’VE COME TO RESPECT GARY MARCUS’ knowledge and opinions concerning A.I. His latest analysis, “The Mirage of Visual Understanding in Current Frontier Models,” Substack, March 29, 2026, introduces me to yet another form of Large Language Model hallucinations. Here are tidbits gleaned from this article, together with my usual Internet sleuthing (which, one hopes, is rather more discerning than the LLM process).

A.I.’s Occasional B.S. It has long been recognized that in LLMs’ scraping of data collections and algorithmically predicting things, they have been known to er… make stuff up. In a sense, it’s like a kid lying his way out of a quandary: It’s done only when it seems necessary.

Medical A.I. Gary Marcus observes, “When a model achieves a ‘top rank on a standard chest X-ray question-answering benchmark without access to any images’ you know something is deeply wrong.” This seems like a new high for “Fake It Till It Makes It,” as described in SimanaitisSays, November 19, 2025.

Marcus cites “Mirage: The Illusion of Visual Understanding,” by Mohammad Asaki, et al., arxiv.org, March 26, 2026. From their Abstract: “Multimodal AI systems have achieved remarkable performance across a broad range of real-world tasks, yet the mechanisms underlying visual–language reasoning remain surprisingly poorly understood.”

The researchers recount, “We report three findings that challenge prevailing assumptions about how these systems process and integrate visual information. Frontier models readily generate detailed image descriptions and elaborate reasoning traces, including pathology-biased clinical findings, for images never provided; we term this phenomenon mirage reasoning. Second, without any image input, models also attain strikingly high scores across general and medical multimodal benchmarks, bringing into question their utility and design. In the most extreme case, our model achieved the top rank on a standard chest X-ray question-answering benchmark without access to any images. Third, when models were explicitly instructed to guess answers without image access, rather than being implicitly prompted to assume images were present, performance declined markedly.”

As Marcus notes, “AGI [Artificial General Intelligence] this stuff ain’t.”

A.I. and Blindness. Marcus adds, “This study reinforces what Anh Totti Nguyen has been saying for a long time, in a series of underappreciated papers like ‘Vision Language Models are Blind’ that I keep trying to draw attention to.”

Yet Many Jobs Require Visual Understanding. Marcus observes, “Also, re the very active discussion on A.I. and jobs: although some white collar jobs (e.g., entry-level coder or market research assistant) may be in near-term jeopardy, many of those that require visual understanding (architect, cartographer, civil engineer, film editor, medical illustrator, urban planner, etc) probably aren’t vulnerable until entirely new techniques are developed.”

The Home Robot? “And humanoid home robots?” Marcus posits. “Don’t make me laugh. If your humanoid robot can’t understand the visual world, it’s just a demo, and not something you can trust.”

The thought of a robovac skinning the cat is particularly disturbing.

I sense that Gary Marcus is among the more conservative of A.I. specialists. And, to me, this makes his opinions all the more important in the hellbent profit-chasing that characterizes A.I. these days. 

A Coming Squabble. Not only in Silicon Valley, note, but in the White House’s “President Donald J. Trump Unveils National AI Legislative Framework” as well. For a counterpoint to the latter press release, read with its required grain of salt, see “California to Impose New A.I. Regulations in Defiance of Trump Call,” The Guardian, March 30, 2026.

Sources.

Hmm… Which to believe? A White House press release? Or The Guardian?  

Let’s see what Gary Marcus’ Substack has to say as well. —ds

© Dennis Simanaitis, SimanaitisSays.com, 2026

4 comments on “A.I.’S MIRAGE REASONING—ITS LATEST HALLUCINATIONS”

  1. sabresoftware
    April 3, 2026

“The thought of a robovac skinning the cat is particularly disturbing.” That probably explains the hatred one of our cats, Jasper, has for our robot vacuum. Mind you, in that particular battle it’s more likely that the vacuum will get skinned by Jasper.

My current experience with AI proves that the data set for the AI is critical, as shown in the two items below:

    1. User support chatbots – mostly useless as all they do is base their answers on the most basic type of FAQ queries, mostly from clueless users. Any more complex questions usually elicit either “please rephrase your query” responses, or totally incorrect (hallucinated) responses giving instructions to use commands or features that are not actually incorporated into the current version of the software. I always end up requesting a human agent, although often that is also pointless because they are reading from a prepared script that excludes anything more complex than the basic issues – grade FAIL;

2. Recently myHeritage has added an AI feature for reading old documents (in my case handwritten German church records). This actually works fairly well, though not flawlessly, and I have been able to interpret these often difficult records without having to resort to paying professional experts. I’ve noticed that sometimes they get the names wrong (but as there is no standard “dictionary” of names, unlike general words, I’m not surprised or bothered), and I can usually visually make out the names anyway. Where it helps is with the other, more standard words, where I have difficulty making sense of the absolutely horrid handwriting of some of the scribes. I have used the tool for interpreting a couple dozen records and have found that the results are pretty good, even matching the human-translated records that I had previously paid to have interpreted/translated. – grade PASS.

    My big fear for general purpose AI is that all the hallucinations and deliberate misinformation out there will be incorporated into the LLM database, basically ensuring more hallucinations and misinformation down the road (GIGO).

    AI tools designed to work with curated data models (such as old German handwriting), or medically certified diagnostic data, stand a better chance of being useful tools.

It reminds me of the advent of CAD (computer-aided design). In the early days the CAD systems were not very sophisticated and were very cumbersome to use. Yet in more than a few instances we heard management types say “Good, now we can get rid of our expensive draughtsmen and replace them with CAD operators”. My department manager and I looked at each other and said “we’re screwed”.

As we switched from draughtsmen to CAD operators, we definitely observed a serious drop in the skill set. But what we found over time was that the best users of CAD were former manual draughtsmen who saw the potential in CAD, learned the more advanced features, and were better able to improve productivity and quality compared to people trained as CAD operators but not trained in draughting/design skill sets.

  2. simanaitissays
    April 3, 2026

    Thanks for this, Sabre. Your CAD comments are particularly interesting. It appears there’s still hope for creative people.

    And I agree completely about GIGO, Garbage In/Garbage Out. —ds

  3. sabresoftware
    April 3, 2026

    One puzzle for me is that lately I am not receiving your daily posts in my mail inbox, but I do get the comments. Currently I just have to make the effort to read your posts in JetPack, which does have the benefit of letting me send comments without all the problems when commenting via the link in the email or on the website.

    • simanaitissays
      April 3, 2026

      Sorry to say, I have no idea why this should be a problem. I have another regular reader whose problem is posting. I might try a tech chat with WP.
