Carter Cole LinkedInCarters Twitter PageCarter Cole on Facebook Carter Coles RSS
Showing posts with label captcha. Show all posts

Sunday, March 28, 2010

An attack on a Flickr-based photo CAPTCHA

The other day I was minding my own business, running around the internet, when I came across a blog with an awesome picture CAPTCHA... I wanted to know how it was done, so I took a peek at where the images were served from... Flickr!



So how would I have built this service? With the tags on the images, pulled from the API... so if we can find the photo that was used, we can reverse the process and break the CAPTCHA...

Well, lucky us, there's an API that does exactly what we need... the URLs for the images look like this...
http://farm4.static.flickr.com/3503/3836827765_2d7f39811d_s.jpg

The file name starts with the photo id (3836827765), followed by the secret (2d7f39811d)... we pass these to flickr.photos.getInfo (documentation) and get back exactly what we need... the photo's info and the tags associated with it...
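
As a rough sketch of that step (not the code from the original page, which was JavaScript), the id and secret can be pulled out of the image URL with a regular expression and turned into a flickr.photos.getInfo request. The function names and the `YOUR_API_KEY` placeholder are made up for illustration, and the size suffix is assumed to be a single letter as in the URL above.

```python
import re
from urllib.parse import urlencode

# Static Flickr image URLs look like .../{server-id}/{photo-id}_{secret}_{size}.jpg
FLICKR_IMAGE = re.compile(
    r"/(?P<server>\d+)/(?P<photo_id>\d+)_(?P<secret>[0-9a-f]+)(?:_(?P<size>[a-z]))?\.jpg$"
)


def parse_flickr_url(url):
    """Return (photo_id, secret) taken from a static Flickr image URL."""
    match = FLICKR_IMAGE.search(url)
    if match is None:
        raise ValueError(f"not a Flickr static image URL: {url}")
    return match.group("photo_id"), match.group("secret")


def get_info_url(photo_id, secret, api_key):
    """Build the REST URL for a flickr.photos.getInfo call (no request is sent here)."""
    params = {
        "method": "flickr.photos.getInfo",
        "api_key": api_key,  # placeholder: you need your own key
        "photo_id": photo_id,
        "secret": secret,
        "format": "json",
        "nojsoncallback": 1,
    }
    return "https://api.flickr.com/services/rest/?" + urlencode(params)


photo_id, secret = parse_flickr_url(
    "http://farm4.static.flickr.com/3503/3836827765_2d7f39811d_s.jpg"
)
print(get_info_url(photo_id, secret, "YOUR_API_KEY"))
```

Fetching that URL with a valid key should return the photo's details, tags included.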

I've wired this up so it pulls a random test from the CAPTCHA's server and, boom, we can not only break the system, we can bypass it... this returns the answer every time :)

So just to be clear, this page actually breaks a CAPTCHA each time it loads... it pulls the remote CAPTCHA, parses the result and sends requests to Flickr to pull the tags for each image it finds (all in JavaScript)... the green borders mark the images it has detected as answers to the CAPTCHA.

The code that gets displayed must be viewed on the original post
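
As a stand-in for the selection step only, here is a small Python sketch (the original was JavaScript and is not reproduced here). It assumes the challenge names a keyword and that the correct images are the ones whose Flickr tags contain it. The post doesn't spell that out, so treat both the assumption and the `pick_answers` name as hypothetical.

```python
def pick_answers(challenge_word, tags_by_image):
    """Return the image URLs whose tag list contains the challenge word.

    tags_by_image maps an image URL to the tags fetched for it
    (for example via flickr.photos.getInfo).
    """
    wanted = challenge_word.strip().lower()
    return [
        image
        for image, tags in tags_by_image.items()
        if wanted in (tag.lower() for tag in tags)
    ]


# Made-up data, just to show the shape of the input.
example = {
    "img1.jpg": ["Dog", "park"],
    "img2.jpg": ["cat", "sofa"],
    "img3.jpg": ["dog", "beach"],
}
print(pick_answers("dog", example))
```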


I actually really liked using this CAPTCHA; it went much faster than other ones, but the problem is that I was able to reverse the image selection process and break it... I knew this wasn't the first Flickr-based system I had heard of, so I went out and found another one, but it was protected... it proxies the images through a PHP script on the blog to hide the original Flickr URL, which prevents my attack from working...

This WordPress plugin has a few thousand users and I was able to bypass the test in just a couple of minutes. This just further proves the idea that security is hard, because you have to fix every hole in the system and the hacker only has to find one.

Sunday, November 8, 2009

Break a simple image CAPTCHA... it's not that hard

CAPTCHA always seemed like a kind of one-word challenge to me. It stands for Completely Automated Public Turing test to tell Computers and Humans Apart, and it's the industry standard for trying to keep my bot from scraping your web service or posting dirty comment spam. But some are very crazy... even too hard for a human to read... and how hard are they really to break? I have done a lot of reading about the theory of CAPTCHAs and tried to break one before (see how far I got below; I stopped at the anti-aliased rotation, but segmentation and cleaning of the image were done, so I didn't have that much more to go).

partially broken captcha with rotated image and random lines
points to whoever can tell me where this captcha came from...



So for my second attempt I wanted to choose an easy one. The CAPTCHA I chose had these features:

  • image has changing static (easy to filter)
  • no letter rotation
  • fixed width font
  • pattern to solution (letter-number-letter-etc.)
Fixed width font is the worst offender on this list. It lets you eliminate the hardest part of breaking a test: segmentation. Once you get it down to one letter, though, OCR is normally accurate to something like 97%, and these bots are sending literally millions of requests, so a CAPTCHA is considered broken if it can be solved even half the time. That brings me to the second problem with this test... I can validate my answers. Because every phrase follows a letter-number-letter pattern, I can check whether my bot's answer at least fits the pattern before sending it. This is never good, because it lets the attacker check their work. But let's get into how I broke this one.
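
The pattern check can be sketched in a few lines of Python. The regular expression assumes uppercase letters and digits strictly alternating, starting with a letter, which matches the sample answers in this post. The exact rules of the real CAPTCHA are not stated, so this is an illustration and not its specification.

```python
import re

# Assumed pattern: letter, digit, letter, digit, ... starting with a letter.
ALTERNATING = re.compile(r"[A-Z](?:[0-9][A-Z])*[0-9]?")


def fits_pattern(guess):
    """True if the guess alternates letter/digit, so it is at least plausible."""
    return ALTERNATING.fullmatch(guess) is not None


print(fits_pattern("Y8L2Q5G7Q6O8"))  # alternates correctly
print(fits_pattern("Y8LLQ5"))        # two letters in a row, so the bot misread something
```

A guess that fails this check can be thrown away and retried without ever submitting it.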

The CAPTCHA I broke with 94% accuracy
(sample CAPTCHA with mouse-over effects)
The first thing that is meant to mess up a bot is the noise, and theirs wasn't so good. First, it didn't disrupt the letters that much, AND if I requested the image again I got back the same riddle with different static. This let me make a filter that extracts only the pixels that occur in both images, giving me the clean letters. I have read about neural nets and used examples but never implemented my own, so I decided to go with the easy way of guessing the letters. I created templates that represent the perfect symbol and then compared them byte by byte to determine which letter it was most like...

Here's one of my templates for the number 5:
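
The noise filter boils down to a per-pixel AND of two downloads of the same riddle. Here is a minimal Python sketch, assuming both images are already binarized into rows of 0/1 (1 = dark pixel) and have the same size. The `common_pixels` name and the tiny sample data are made up.

```python
def common_pixels(first, second):
    """Keep only pixels that are dark (1) in both images; random static drops out."""
    if len(first) != len(second) or any(
        len(a) != len(b) for a, b in zip(first, second)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [a & b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


# Same letter strokes in both images, different static around them.
shot_1 = [[1, 1, 0, 1],
          [0, 1, 0, 0]]
shot_2 = [[1, 1, 1, 0],
          [0, 1, 0, 1]]
print(common_pixels(shot_1, shot_2))
```

Static only survives where it happens to land on the same pixel in both downloads, so intersecting more than two downloads would thin it out further.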

111111101
110000001
110000101
110111001
111001101
000000111
000000111
110000111
011001101
001111001
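
The template comparison can be sketched like this in Python: count the cells where the cleaned-up glyph agrees with a template and report the best-scoring symbol as a percentage. This is a guess at the approach described, not the original program. The function names and the tiny 3x3 templates are invented for the example.

```python
def similarity(glyph, template):
    """Percentage of cells where glyph and template agree (rows are strings of 0/1)."""
    if len(glyph) != len(template) or any(
        len(g) != len(t) for g, t in zip(glyph, template)
    ):
        raise ValueError("glyph and template must have the same dimensions")
    cells = [
        (g, t)
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    ]
    matches = sum(1 for g, t in cells if g == t)
    return 100.0 * matches / len(cells)


def best_guess(glyph, templates):
    """Return (symbol, score) for the template that matches the glyph best."""
    scores = {symbol: similarity(glyph, t) for symbol, t in templates.items()}
    symbol = max(scores, key=scores.get)
    return symbol, scores[symbol]


# Toy 3x3 templates, far smaller than the real ones.
templates = {
    "A": ["010", "111", "101"],
    "B": ["110", "111", "110"],
}
symbol, score = best_guess(["010", "111", "100"], templates)
print(f"I guess it's a {symbol} with {score:.2f} %")
```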

Here's what the program spits out while solving the test data...
I Guess its a Y with a 97.50 %
I Guess its a 8 with a 98.75 %
I Guess its a L with a 100.00 %
I Guess its a 2 with a 98.75 %
I Guess its a Q with a 100.00 %
I Guess its a 5 with a 100.00 %
I Guess its a G with a 98.75 %
I Guess its a 7 with a 97.50 %
I Guess its a Q with a 97.50 %
I Guess its a 6 with a 98.75 %
I Guess its a O with a 100.00 %
I Guess its a 8 with a 100.00 %
i was right! it was Y8L2Q5G7Q6O8
(I cut out a bunch of them; Google was saying that my site had relevance for the word "guess", oops :)
and at the end

I checked a total of 265 files with a success rate of 94.34 %


I'm very happy with its accuracy (especially because I'm not using a neural net). Below are some more screenshots from the app I made to look at the data and test my solver...


You can see above that the program knows where the number 8 is because of the fixed width font. This made it easy to extract just the letter to be analyzed.

This is the training data input screen. The text box validates the input against the solution pattern and changes color when the pattern isn't followed, so bad training data isn't entered by mistake.
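
With a fixed width font, segmentation is just slicing the image into equal columns. Here is a small Python sketch of that idea. The glyph width and left margin are parameters you would measure from the real image, and the values below are made up.

```python
def split_fixed_width(rows, glyph_width, left_margin=0):
    """Cut an image (list of equal-length row strings) into fixed-width glyphs."""
    usable = len(rows[0]) - left_margin
    count = usable // glyph_width
    return [
        [
            row[left_margin + i * glyph_width : left_margin + (i + 1) * glyph_width]
            for row in rows
        ]
        for i in range(count)
    ]


# A 2-row "image" holding three glyphs that are 2 pixels wide each.
image = ["110011",
         "101010"]
for glyph in split_fixed_width(image, glyph_width=2):
    print(glyph)
```

Each slice can then be handed straight to whatever recognizes a single character.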


This was a very simple test to break, and it had very weak security features that let even a simple attack defeat it with great accuracy. If you are interested in other poor CAPTCHAs, you may like my post about text-based math CAPTCHAs and why they are so easy to bypass as well.

I plan to do some more work on CAPTCHA breaking soon (I'll probably step it up and take on one that needs a neural net). I wrote this code a few months ago and it's just been sitting, so I thought I would share what I learned and how simple it can be to break one of these. I don't like CAPTCHAs because they are like locks on doors: they only keep the honest people out. Bandwidth is getting cheaper and cheaper; you should encourage people and companies to consume your services and learn to monetize the traffic, not implement stupid little pictures that a good bot can read anyway and that just waste humans' time as they try to figure out whether it's a 0 or a capital O. I'd love to hear your thoughts on the subject and will respond if I can help, so please take the time to write a comment if you have any questions.

Saturday, August 15, 2009

Math CAPTCHAs don't work... why textual CAPTCHAs are FAIL

There are tons of CAPTCHA solvers and sites that break images like here, but I have been seeing a rise in math CAPTCHAs, so I wanted to quickly discuss something that I thought was kind of funny: the idea that simple math problems are difficult for bots to solve. I found a math CAPTCHA, or rather a text-based CAPTCHA, that I thought would be really easy to solve, so I decided to break it real quick.

Basically, an image CAPTCHA had a "textual riddle" version of the code in its alt attribute.

One of the most difficult lines it returned was
(((((??? - 1) - 7) - 8) * 8) - 4) = -76
but I knew it had to be a number from 0 to 9, so I wrote a function to spit this out.

Computers are really good at math; it takes no time to write and run this simple VB code:

So now this CAPTCHA, whose images are actually kind of hard to segment and classify, is broken because of 4 lines of VB code.

I'm working on cracking my second image CAPTCHA; this time the letters aren't fixed width and are rotated. I plan to use a feed-forward neural net trained with back-propagation, so I'll let you know how that goes and hopefully get to post again about another CAPTCHA I've cracked.

You may also find my article on breaking an image CAPTCHA interesting...
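
The VB code doesn't appear in this copy of the post, so here is a rough Python stand-in for the same brute-force idea: try each digit from 0 to 9 in place of the ??? and keep the one that makes both sides equal. The small AST-based evaluator is my own choice so that arbitrary text is never passed to eval. The original four lines may well have worked differently.

```python
import ast
import operator

# Only plain arithmetic is allowed in a riddle.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node):
    """Evaluate a parsed arithmetic expression without using eval()."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -evaluate(node.operand)
    raise ValueError("unsupported expression")


def solve(riddle, placeholder="???"):
    """Return the digit 0-9 that satisfies the riddle, or None if none does."""
    left, right = riddle.rsplit("=", 1)
    target = evaluate(ast.parse(right.strip(), mode="eval"))
    for digit in range(10):
        candidate = left.replace(placeholder, str(digit)).strip()
        try:
            if evaluate(ast.parse(candidate, mode="eval")) == target:
                return digit
        except ZeroDivisionError:
            continue  # this digit makes the riddle divide by zero
    return None


print(solve("(((((??? - 1) - 7) - 8) * 8) - 4) = -76"))
```

For the riddle quoted above this prints 7, since ((((7 - 1) - 7) - 8) * 8) - 4 = -76.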