Simanaitis Says

On cars, old, new and future; science & technology; vintage airplanes, computer flight simulation of them; Sherlockiana; our English language; travel; and other stuff


CAPTCHAs are Completely Automated Public Turing tests to tell Computers and Humans Apart. We all encounter these visual puzzles used as website defenses against robo intrusion. Our solving the puzzle correctly confirms to the website that we are indeed real people. Alan Turing and CAPTCHAs made their initial appearance here at SimanaitisSays more than three years ago. Today, I describe how researchers in artificial intelligence have conquered CAPTCHAs with enhanced AI.

Treatment of a variety of CAPTCHAs by enhanced AI. Image from Science, December 8, 2017

“Brain-Inspired Vision Model Cracks Website Protection System, With Much Less Training Compared To Deep Learning,” by Dileep George et al., is online at Scienmag, and the research is described in the December 8, 2017, issue of Science, the weekly magazine of the American Association for the Advancement of Science. The full paper is available to AAAS members.

Researchers achieved this CAPTCHA deciphering by incorporating other aspects of human intelligence into their AI. They note that recognizing composition, generalizing from this, and learning from a few examples are three hallmarks of human intelligence. In fact, humans are very good at all three.

The letter “A.” Image by Vicarious AI on Scienmag online.

Originally, AI computers responded to their human programmers’ instructions. Then machine learning took this a step further, with AI improving upon its programmed algorithms. Last, with deep learning, the AI learns by evaluating scads of examples, rather than simply reacting to pre-programmed direction.

Dileep George and his colleagues observe that “A recent deep-learning approach for parsing one specific CAPTCHA style required millions of labeled examples, whereas humans solve new styles without explicit training.”

To perform AI more efficiently, the researchers looked to neuroscience and devised something they call a recursive cortical network. An RCN is a “probabilistic generative model for vision in which message-passing-based inference handles recognition, segmentation, and reasoning in a unified manner.”

A four-level RCN representation of the letter “A.” Compare this to the illustration above. This and the following images from “A Generative Vision Model That Trains with High Data Efficiency and Breaks Text-based CAPTCHAs,” D. George et al, Science, October 26, 2017.

The researchers report that “RCN was effective in breaking a wide variety of CAPTCHAs with very little training data and without using CAPTCHA-specific heuristics. By comparison, a convolutional neural network [a form of deep learning] required a 50,000-fold larger training set and was less robust to perturbations to the input.”

Variations of an “A,” with different deformability settings, each determined by pooling and lateral perturbation factors.

The RCN approach handles clutter more efficiently than conventional deep learning. It also generates realistic responses when confronted with one-shot training of handwritten characters. Researchers note, “RCN outperformed state-of-the-art deep-learning methods while requiring 300-fold less training data.”

The researchers also see other applications of RCN parsing. For example, the approach permits recognition of multiple real-world objects in cluttered scenes with random background.

These detections and segmentations show RCN extended beyond text parsing of CAPTCHAs.

“In addition,” the researchers conclude, “our model’s effectiveness in breaking text-based CAPTCHAs with very little training data suggests that websites should seek more robust mechanisms for detecting automated interactions.”

Before long, we may have to do more than simply “Identify the boxes containing store fronts.” ds

© Dennis Simanaitis, 2017

3 comments on “GOTCHA, CAPTCHA!”

  1. Mark W
    December 22, 2017

    Do these things really help prevent robots from responding to messages on websites (and why would C3PO be doing that anyway?) Some of them are so hard to read because of distortion, and figuring out if the set of pictures includes every conceivable bridge or sign or whatever is sometimes just a guess. They might be aging out some of us with more “mature” eyes….

    • simanaitissays
      December 22, 2017

      Apparently they work. I agree, some are particularly stupid and remind me of playing cards with a dotty old aunt: “Select boxes with street signs.” Does the sign post count? Is that sign a “street sign” or merely a sign?

      • Mark W
        December 22, 2017

        Thanks, I try to stay current, but think I’m reaching my point of obsolescence sometimes. May consider having a chip inserted in my brain (which would make me a cyborg)! Could I do those capture things then?
