Google reCAPTCHA cracked

Despite denials from Google, a security researcher continues to assert that the Search King’s reCAPTCHA system for protecting Web sites from spammers can be successfully exploited by Internet junk mail panderers.

Researcher Jonathan Wilkins published a paper recently that included an analysis of reCAPTCHA’s security. In automated attacks he conducted against the system, he reported he had an alarming success rate of 17.5 percent.

CAPTCHA, which stands for Completely Automated Public Turing test to tell Computers and Humans Apart, is a method for foiling automated attacks by spammers on Web sites. Before a Net surfer can perform a task at a site, such as setting up an email account or adding comments to a blog posting, he or she is presented with the image of a word or phrase that has been distorted in some way. The warped image is intended to thwart the scanners and optical character recognition programs that spammers use to automate attacks on Web sites. The idea is that humans can read the characters in the image and type them into a form while machines can’t.

Some simple math reveals just how alarming Wilkins’ findings are. The operator of even a modest botnet of 10,000 machines would be perfectly happy with a success rate of 0.01 percent. If each machine made about 10 attempts a second, that would mean 10 new Gmail accounts could be created every second, or 864,000 new accounts a day, from which spam could be launched.
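The arithmetic behind those figures can be checked with a short sketch. The botnet size and success rate come from the article; the rate of 10 attempts per machine per second is an assumption, chosen because it is the value that makes the quoted totals work out.

```python
# Back-of-the-envelope estimate of accounts a botnet could create.
SECONDS_PER_DAY = 24 * 60 * 60


def accounts_per_second(machines: int, attempts_per_machine: float,
                        success_rate: float) -> float:
    """Expected successful sign-ups per second across the whole botnet."""
    return machines * attempts_per_machine * success_rate


machines = 10_000            # botnet size quoted in the article
success_rate = 0.01 / 100    # 0.01 percent, as quoted in the article
attempts_per_machine = 10    # ASSUMPTION: attempts per machine per second

per_second = accounts_per_second(machines, attempts_per_machine, success_rate)
per_day = per_second * SECONDS_PER_DAY

print(f"{per_second:.0f} accounts per second")
print(f"{per_day:,.0f} accounts per day")
```

With a different assumed attempt rate the totals scale linearly, so the conclusion depends heavily on how fast each machine can actually submit guesses.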

Google counters that Wilkins’ test targeted an old form of reCAPTCHA from 2008 that has since been changed. “[T]his study does not reflect the effectiveness of reCAPTCHA’s current technology against machine solvers,” a Google spokesperson told The Register. “We’ve found reCAPTCHA to be far more resilient while also striking a good balance with human usability, and we’ve received very positive feedback from customers.”

Wilkins acknowledged that his initial tests were on an older version of reCAPTCHA, but he said he has since tested the new images produced by the system and found them to be even weaker than the older ones. In one of his original tests on the system, his success rate was five in 200 (2.5 percent). When that test was run on the new reCAPTCHA, the rate was 23 in 100 (23 percent).

The major difference between the old and new versions of reCAPTCHA, according to Wilkins, is that the old version used horizontal lines to obscure the characters in the image. While the lines make it harder for machines to recognize a reCAPTCHA phrase (although Wilkins asserts that spammers can easily subvert them), they also make the phrase harder for humans to read. New reCAPTCHA images drop the lines but add distortion to the image. They’re easier for humans to read, but, alas, they’re also easier for machines to crack.

Unlike most CAPTCHA systems, Google’s uses images with two words. That’s because Google uses reCAPTCHA for two purposes. Like other CAPTCHA systems, it’s designed to frustrate spammers, but it’s also incorporated into Google’s efforts to digitize books. When a word in a book scan can’t be recognized by Google’s OCR software, it’s sent to the reCAPTCHA pool. So when a person enters a reCAPTCHA phrase into a form, Google can discover what its OCR program couldn’t, without having to hire human editors to review scanning results.

One weakness of CAPTCHA schemes, though, is that they use words that can be found in a dictionary. This makes it easier for machines to crack the phrases because they have a word list against which to check their guesses for errors.
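To illustrate why dictionary words help an attacker, here is a minimal sketch that snaps a noisy machine guess to the closest word in a word list. It is not Wilkins’ method; the tiny `WORDS` list, the `snap_to_dictionary` helper and the similarity cutoff are all hypothetical, and the matching relies on Python’s standard `difflib` module.

```python
import difflib
from typing import Optional

# Hypothetical miniature dictionary; a real attack would load a full word list.
WORDS = ["meat", "peat", "heat", "morning", "overlooks"]


def snap_to_dictionary(guess: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the dictionary word most similar to a noisy guess, or None."""
    matches = difflib.get_close_matches(guess.lower(), WORDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A recognizer that misreads "m" as "rn" can still recover the intended word.
print(snap_to_dictionary("rnorning"))
# A guess that resembles nothing in the list is rejected rather than corrected.
print(snap_to_dictionary("xqzv"))
```

A string of random characters would give the attacker nothing to snap to, which is why dictionary words make the guessing problem easier.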

In addition, reCAPTCHA uses a “one-off” system. That means one letter in a word can be incorrect, and the system will still accept the word.
So if the reCAPTCHA phrase contains the word “meat” and a Web surfer enters “peat,” his or her response will still be interpreted as a valid one.
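A tolerance rule of this kind might look like the sketch below. The exact rule reCAPTCHA applies is not described here, so this version makes two assumptions: only a single substituted letter is forgiven (not a missing or extra letter), and the comparison ignores case. The `within_one_off` name is hypothetical.

```python
def within_one_off(expected: str, typed: str) -> bool:
    """Accept a response that differs from the expected word in at most one letter."""
    expected, typed = expected.lower(), typed.lower()  # ASSUMPTION: case-insensitive
    if len(expected) != len(typed):
        return False  # ASSUMPTION: inserted or dropped letters are not forgiven
    mismatches = sum(a != b for a, b in zip(expected, typed))
    return mismatches <= 1


print(within_one_off("meat", "peat"))   # one wrong letter
print(within_one_off("meat", "pear"))   # two wrong letters
```

Under this rule an attacker only needs to recognize all but one character of a word correctly, which raises the odds of an automated guess being accepted.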

Some alternatives to CAPTCHA avoid words entirely. Microsoft, for instance, has developed a scheme called Asirra that is based entirely on images of cats and dogs. To perform a task protected by Asirra, a netizen is presented with an array of 12 pictures and asked to identify each as either a canine or feline. This kind of method is called a Human Interactive Proof, or HIP.

To be effective, HIP systems need to be supported by large databases that tax the computational power of an attacking spammer. Microsoft does that by using the picture database at Petfinder.com, which contains some three million photos.

Written by John P Mello Jr

John Mello is a freelance writer who has written about business and technical subjects for more than 25 years. He is a frequent contributor to the ECT News Network and his work has appeared in a number of periodicals, including Byte magazine, PC World, Computerworld, CIO magazine and the Boston Globe.

21 Comments

  1. Moiz · January 8, 2011

    It's true that reCaptcha has been successfully cracked; I have received hundreds of spam posts on my forum.

  2. Brandon Sheley · January 8, 2011

    Thanks for the update, we’re wondering why so many spammers were getting through to the boards.

  3. Virtualization · January 9, 2011

    Why is Google denying it? I run 8 or 9 forums which use reCaptcha, and for the past week or so there have been bots registering and posting non-stop. Since then I’ve switched to other methods to verify, and automatic registrations have stopped! Another friend of mine with 4 boards has had the same issue.

    Google, reCAPTCHA is NOT working… get on with the program and fix the damn thing.

  4. blackspot · January 10, 2011

    @Virtualization

    What other type of bot prevention would you recommend?
    In the past two weeks my IP board was flooded with bots registering and posting, regardless of reCaptcha!!!

    Any suggestion is welcome :-)

  5. Doctor · January 11, 2011

    I had to switch to a Q/A authentication system. reCAPTCHA is definitely broken. =(

  6. Robby · January 11, 2011

    Ya, I was wondering why I was getting so much more than the average amount of spam accounts on my VB site. This explains it. I’m changing my site’s verification to a question. I know I will have to change the answers every once in a while, but it's worth it to try and slow down spam.
    Thanks for the info.

  7. Jonathan · January 11, 2011

    I was wondering why I was getting so many spam bots. My board is literally full of them and I have captcha, as well as email notification on as well!

  8. bugme143 · January 12, 2011

    Q/A is question/answer?

    And it's quite funny how many people actually click on “p3n1s 3nl4rg3r” ads and messages.

  9. Don · January 15, 2011

    > 10,000 machines would be perfectly happy with
    > a success rate of 0.01 percent. That would
    > mean 10 new gmail accounts could be created
    > every second or 864,000 new accounts a day
    > from which spam could be launched.

    That’s assuming Google has no other techniques for mitigating the problem. How many times must an IP address fail a captcha in some period of time before it’s added to some suspect list or denied entirely?

    There’s also the factor that your browser cookies allow Google to identify how many browsers are visiting from the same IP address. Spammers then have to trade off between many visits to Google’s servers from each cookie, and the number of browsers apparently connecting from the same IP address.

    I’d say it’s a serious problem, but 0.01 wouldn’t have me worried.

    Frankly I think there is better technology on the horizon than trying to detect spam, vis a vis, detecting human/non-human. I for one welcome any non-organic Internet user that has something of value to contribute to my inbox.

  10. Dean · January 16, 2011

    Yes, I have used recaptcha for many years, and for the past few weeks it seems I am getting tons of spam, whereas before recaptcha used to work a treat.

    I have had to develop some anti-spam features of my own as recaptcha is no longer up to scratch :(

  11. הפצת קורות חיים · January 18, 2011

    Q/A is easy to break. I guess a combination of a few methods together would lower the crack statistics dramatically.

  12. KeyCaptchist · January 20, 2011

    Even if a captcha is not cracked, it should also prevent bots from relaying it to sweatshops for human solving, as KeyCAPTCHA does. BTW, KeyCAPTCHA has never been cracked.

  13. Dr. Shoeb · January 22, 2011

    I am also facing the severe problem of 100s of spam registrations per day which are able to pass the recaptcha codes.
    I was wondering about the sudden increase in spam, but got confirmation from multiple forum admins that it's a global “RISE OF SPAM”. ;)

  14. Shaun Childerley · February 6, 2011

    It seems Google has been exploited. Now that it has happened, I have to find a reliable spam protection program; I may develop my own.

  15. Dr. Klahn · February 9, 2011

    There’s no need for spammers to crack ReCaptcha. There’s a guy named Kermit Welda on Amazon Mechanical Turk who pays 5 cents for people to register fake AOL, Hotmail and Gmail accounts. He registers about 2000 of each a day. $100 for 2000 fake accounts – hey, that’s a minor cost of doing business expense.

  16. Grindlay · February 10, 2011

    Still not clear if reCAPTCHA has actually been cracked or whether a bunch of seven-year-olds are being paid 5 cents per 1,000 captchas solved. The astonishing rise in the proportion of Gmail spammers suggests that something has changed. Google will have the stats but is probably too scared at the moment to go public with them.

  17. Frank Parkinson · February 10, 2011

    I used reCaptcha until a few weeks ago when I got flooded with spam on a phpBB forum, so I know it is cracked.

    I now require email authentication and I use standard phpBB Q&A plus an automatic check against StopForumSpam database. In several weeks of running these checks I have had only two (human) spammers!!

  18. I Agree · February 13, 2011

    I agree. I just enabled reCaptcha in the latest version of phpBB and within a few hours I had about 30-40 spam accounts. The accounts were created sequentially, with 3-4 minutes between each of them.

    Frank, what is the name of the mod that checks against StopForumSpam?

  19. Frank Parkinson · February 15, 2011

    You can see the method and code by following this link to the phpBB support forum:-
    http://www.phpbb.com/community/viewtopic.php?p=12950926&sid=96eb84fc2f14bed2230aa99b798956bb#p12950926

    It is not a “MOD” in the sense that phpBB mean.

  20. Richard Lang · September 6, 2011

    I have tried two or three times to enter a contest with this stupid reCAPTCHA code entry. One word is straight up and the other word is an anagram?
    I try and try with no success. I understand it is to prevent people with automated contest programs from flooding the contests with entries. But I type the squiggle code exactly as I see it: capitals, small letters, commas, periods, spaces, and it never works. What’s the deal with them? Don’t they want people to enter their contests at all?

  21. Lun Chongthu · March 6, 2013

    I am asked to type a captcha code, but no code is given. What should I do to open Google search?
