Wednesday, July 16, 2008

Breaking Captcha Images

I'll start by saying that this overview isn't for everyone. It's intended for those who have a good programming background and, ideally, have worked with imaging a bit. Even if you haven't worked with images and pixel manipulation, this may be the answer to some prayers out there when people are asking themselves, "How in the world do I even start to break this thing?!?!" It is important to realize, though, that when advanced warping techniques are used, a captcha often becomes almost impossible to break. All that means, though, is that it's -almost- impossible, not impossible.

So, what's the purpose of breaking a captcha image? The reasons may vary, but most of the time it's to be able to use a bot to automate some process (exactly what captcha images are meant to prevent). For example, say in HTS, Real 1 there was a way to register at "Uncle Arnold's Local Band Review" that used a captcha image. Well, we know from the challenge that we have to get the band "Raging Inferno" up to the top. In a real world situation that didn't have the same type of security flaws as the Real 1 challenge, we could register hundreds of bots that simply vote the band up to the top, and to do this we'd have to break the captcha image at registration.

Remember, though, that captcha images are never universal. Every site has its own specialized captcha, so there's no simple "global" fix for all of them. With that said, once you've written code for one captcha, it's easy to carry it over into another captcha breaking project.

This overview is meant to establish the groundwork so that you can break captcha images more easily in the future. You can use virtually any language; however, I recommend C/C++ or C# just for speed reasons. One of these examples I've done in PHP and it works quite well, though it runs slower than most.

Now let's begin our overview of captcha breaking!


[Step 1: Analyze and Prepare]

This is more of a step that you would take after you have read this entire overview, however, I'll fill you in on it now. When starting to break a captcha, look it over, refresh it several times, and find all aspects of the captcha. Does it use different fonts? Does the background change? Is there a background image? Does the text change from bold to italics? Does the text move around on the image? Is the text a completely different color than the image? What characters/charset does it use? Is it case sensitive? These questions and more are all things you must ask yourself and analyze while looking at the different variations of the captcha image.

Now that we've got a good idea of what's what, we need to be able to start the breaking process. This depends on what language you want to use, but make sure you have a way to open the image in your language and read all the pixel data into an array. You can do this by looping through all the pixels and putting them into an RGB array, or by using a bulk function like LockBits or GetDIBits. This part is essential to being able to work with the image. Never try to manipulate the image using single pixel functions, like functions that get or set the color of an individual pixel. These functions usually take an extreme amount of time to perform simple tasks. The only time you'd ever use those functions is when you're reading the pixels into an array. Okay, now that you've got the general idea, on to Step 2!
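To make the "read everything into an array first" idea concrete, here's a minimal Python sketch. It assumes you already have the raw pixel data as one flat buffer of R, G, B bytes with no row padding. The function name unpack_rgb is made up for this example, and real buffers (from LockBits, for instance) may have padded rows or a different channel order, so treat this as an illustration only.

```python
def unpack_rgb(raw, width, height):
    """Turn a flat R,G,B,R,G,B,... byte buffer into rows of (r, g, b) tuples.

    Assumption: 3 bytes per pixel, rows stored top to bottom, no padding.
    """
    if len(raw) != width * height * 3:
        raise ValueError("buffer size does not match width * height * 3")
    rows = []
    for y in range(height):
        start = y * width * 3
        row = [tuple(raw[start + 3 * x:start + 3 * x + 3]) for x in range(width)]
        rows.append(row)
    return rows


# A tiny 2x2 test image: red, green on the first row; blue, white on the second.
raw = bytes([255, 0, 0,   0, 255, 0,
             0, 0, 255,   255, 255, 255])
pixels = unpack_rgb(raw, 2, 2)
print(pixels[1][0])  # pixel at x=0, y=1
```

Once the data is in a structure like this, every filter is just a loop over pixels[y][x] with no per-pixel library calls.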


[Step 2: Get rid of the crap!]

A lot of people who write captcha images like to think that they are very crafty and cunning with the garbage they put in to throw you off. Here's a big morale booster... 99% of the time it's just that, crap. You can easily write image filters to go through and wipe out the junk.

Looking for ways to get rid of garbage often means looking for patterns in the image. You have to really think hard about what you can and cannot use against them. For example, say you come across a captcha image that has black text, but unfortunately it has an image in the background. How do we separate the text from the background image? Simple: write a filter that keeps only black and colors close to it (when an image is saved as JPG, not all colors will be perfect, so you have to account for some variation in color). By filtering out all pixels that aren't close to black, we're left with just the text. One way of thinking is to ask yourself, "How is it possible that I can read this? How come I can distinguish the text from the garbage and noise?" A lot of times these questions will bring you to the answer. Let's look at some examples.
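As a first concrete illustration, the "keep only black and colors close to it" filter could be sketched like this in Python. The function name keep_near_black and the tolerance of 60 are assumptions made up for this example; the right tolerance depends on the captcha you're working with. The image is assumed to be a list of rows of (r, g, b) tuples.

```python
def keep_near_black(pixels, tolerance=60):
    """Return a grid of 0/1 values: 1 where a pixel is close to black.

    A pixel counts as "close to black" when none of its channels is
    brighter than `tolerance` (a hypothetical cutoff, tune per captcha).
    """
    return [[1 if max(r, g, b) <= tolerance else 0 for (r, g, b) in row]
            for row in pixels]


# Two near-black text pixels, one tan background pixel, one pure blue pixel.
image = [[(10, 12, 8), (200, 180, 90)],
         [(40, 40, 45), (0, 0, 255)]]
print(keep_near_black(image))
```

Checking the brightest channel rather than the average keeps a saturated color like pure blue from being mistaken for dark text.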



Now, start by asking yourself what you notice in this image. Is it the dark text that jumps out at you? How about the light background? Both of those we can use to our advantage. Now what about those lines? For now, we'll deal with those after we get rid of the background. So we think we have an idea of how to break it... but what happens if they throw something like this at us?



The text is barely visible! Not to mention the amount of noise cluttering up the image. Let's think about this: how is it possible that we can read this? Simple, the text is still slightly darker than the background. So, we'll write our filter to turn all pixels that are darker than a certain amount to black, and all pixels that are lighter than that amount to white. I find that when working with captcha images, it's really nice to be able to convert them to monochrome, since monochrome is just black and white. You can then use a simple 2 dimensional array for the width and height, and just use 0 for white and 1 for black. Here's our result:
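The threshold filter just described could be sketched like this in Python. The name to_monochrome, the use of a plain average of the three channels as "darkness", and the default cutoff of 128 are all assumptions for the sake of the example. Following the pseudo code used in this overview, 1 marks a black pixel and 0 a white one.

```python
def to_monochrome(pixels, threshold=128):
    """Convert rows of (r, g, b) tuples into a 2D grid of 0 (white) / 1 (black).

    A pixel becomes black when its average brightness is below `threshold`.
    """
    mono = []
    for row in pixels:
        mono.append([1 if (r + g + b) / 3 < threshold else 0
                     for (r, g, b) in row])
    return mono


# Faint text: only slightly darker than the background.
image = [[(200, 200, 200), (150, 160, 155)],
         [(210, 205, 200), (140, 150, 145)]]
print(to_monochrome(image, threshold=180))
```

For a low-contrast captcha like the second example, the threshold has to sit between the text brightness and the background brightness, which is why it's a parameter here rather than a fixed value.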



Wow, now the text sure stands out! But what about that annoying background noise? Notice how it looks like there are very distinct lines going horizontally. If you look at both the original images very closely, you'll notice they aren't lines, but rows of dots! Getting rid of this is simple: all we have to do is scan the image for a pixel that's white, then a pixel that's black, then another pixel that's white again. By scanning the image for that pattern, we will be able to find and isolate the dots. Since the dots actually form both columns and rows, we'll do a 2 way filter: one pass that looks for dots going up and down, and another going left and right. Pseudo code for left-right would look like this:

if (Pixel[x, y] == 0 && Pixel[x + 1, y] == 1 && Pixel[x + 2, y] == 0)
Then we have a dot in the middle! We could also do another if that flips the black with the white to scan for white dots, but we don't need to now. The same can be done for scanning up and down, just by adding 1 and 2 to the y instead of the x. The last part of our code here is to set the middle dot to white. Here's what we've got now:
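Here's a minimal runnable Python version of that 2 way dot filter. It assumes the monochrome image is a list of rows indexed as mono[y][x], with 0 for white and 1 for black as in the pseudo code; the function name remove_dots is made up for this sketch.

```python
def remove_dots(mono):
    """Clear single black pixels that sit between two white pixels,
    checked left-right and up-down. Returns a new grid."""
    height, width = len(mono), len(mono[0])
    out = [row[:] for row in mono]  # write to a copy, read from the original
    for y in range(height):
        for x in range(width):
            # Left-right pattern: white, black, white.
            if (x + 2 < width and mono[y][x] == 0
                    and mono[y][x + 1] == 1 and mono[y][x + 2] == 0):
                out[y][x + 1] = 0
            # Up-down pattern: white, black, white.
            if (y + 2 < height and mono[y][x] == 0
                    and mono[y + 1][x] == 1 and mono[y + 2][x] == 0):
                out[y + 1][x] = 0
    return out


noisy = [[0, 0, 0, 0, 0],
         [0, 1, 0, 1, 1],
         [0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1]]
for row in remove_dots(noisy):
    print(row)
```

Reading from the original grid while writing to a copy keeps one removal from changing what the next comparison sees. Be aware that this pattern also clears strokes that are only one pixel wide, so it suits captchas with thick text.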



Much better, we've eliminated the majority of the background and some parts of those random black lines. A big hint on what to do next: you can use the same filter we just wrote, or something very close to it, to remove these black lines. If we write something that looks for individual black pixels that are not touching more than 3 other black pixels (there are 8 pixels around any single pixel that is not on the border of the image), then we can eliminate almost all of the noise.
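A sketch of that neighbor-counting filter in Python might look like the following. It assumes the same mono[y][x] grid with 0 for white and 1 for black; the name remove_sparse_pixels and its max_neighbors parameter are made up for this example.

```python
def remove_sparse_pixels(mono, max_neighbors=3):
    """Clear black pixels touching `max_neighbors` or fewer black pixels
    among their (up to) 8 surrounding pixels. Returns a new grid."""
    height, width = len(mono), len(mono[0])
    out = [row[:] for row in mono]
    for y in range(height):
        for x in range(width):
            if mono[y][x] != 1:
                continue
            count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the pixel itself
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and mono[ny][nx] == 1:
                        count += 1
            if count <= max_neighbors:
                out[y][x] = 0
    return out


noisy = [[1, 0, 0, 0, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]
for row in remove_sparse_pixels(noisy):
    print(row)
```

Note that the corners of a solid block only touch 3 black pixels, so this filter rounds them off. That's a small taste of how a more aggressive neighbor count starts eating into the text itself.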



Now that's looking really good. Unfortunately, this is the point where the above filter stops being useful, since if we go any further and, let's say, try to eliminate pixels that aren't touching more than 5 or 6 black pixels, we'll start eating away too much of the text. Keeping the text close to its original look is key for cracking captcha images.

What we're going to do now is a method that I've come up with which uses Flood-Filling to eliminate random garbage. If you're going for top performance, you can always write your own FloodFill function, or you can find graphics libraries that include FloodFill functions. PHP, for example, has the function "imagefilltoborder", which is exactly what I want. I also decided to write a performance version of this same application in C#, for which I wrote my own FloodFill function.

So you might ask, how are we going to use FloodFill to eliminate garbage? If we look at the image we have now, we notice that all the garbage is in really small parts, while the text is very thick and large. This gives us an advantage in breaking it, because we can simply go through every black pixel, run a FloodFill on it, count the number of pixels that got filled, and if it's less than a certain amount... throw it out. The smaller pieces of garbage will usually have a pixel count of 20 pixels or less, so we write our function to get rid of anything below that size. You may or may not even need this step; if you do use it, the pixel count will have to be adjusted based on your image and how much garbage you have. After we run this new filter, our image looks like this:
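To show the idea in code, here's a minimal Python sketch of a flood-fill size filter. It is not the PHP or C# code mentioned above, just an illustration under a few assumptions: the image is a mono[y][x] grid with 0 for white and 1 for black, pixels are considered connected when they touch up, down, left or right (4-way), and the names remove_small_blobs and min_size are made up for this example.

```python
def remove_small_blobs(mono, min_size=20):
    """Flood-fill every connected group of black pixels and clear the
    groups that contain fewer than `min_size` pixels. Returns a new grid."""
    height, width = len(mono), len(mono[0])
    out = [row[:] for row in mono]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if out[y][x] != 1 or seen[y][x]:
                continue
            # Iterative flood fill starting from (y, x).
            stack = [(y, x)]
            seen[y][x] = True
            blob = []
            while stack:
                cy, cx = stack.pop()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and out[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Throw out the whole piece if it is too small to be text.
            if len(blob) < min_size:
                for by, bx in blob:
                    out[by][bx] = 0
    return out


noisy = [[1, 1, 0, 0, 0],
         [0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1]]
for row in remove_small_blobs(noisy, min_size=4):
    print(row)
```

The explicit stack avoids the recursion depth problems a recursive flood fill can hit on large letters, and the seen grid makes sure each pixel is only filled once.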



Alright! Now just to let you know, depending on the captcha, not all the junk needs to be filtered out. This will also depend on the method you choose in Step 3.


[Step 3: Define our letters]

The third step is usually easier than the second. Whereas before we were just cleaning the image up, now we're going to actually define where our letters are on the image. Lucky for us, the letters are still there and pretty thick, so how should we do this? Here are our options:

Method 1: Break the letters into individual cells

1 comment:

Capri said...

Visual image captchas are bad. They block out and discriminate against visually impaired users, punishing them as spammers.

Visual verification that requires you to enter characters in an image you see, or answer a question about what's in an image you see, blocks out anyone with a visual impairment.

Clicking to get a larger image displayed does nothing at all for people with severe vision impairments who cannot even read large print.

Audio captchas are becoming available on a growing number of sites, but even they aren't good enough. The deaf-blind use braille displays and cannot see a picture or hear a corresponding sound.

Captchas force the blind to surrender what independence they once had on site registration and forms, reducing them to begging a sighted person or site admin for help in account creation, form submittal, group creation, anywhere there is a mandatory visual verification code.

As if that wasn't bad enough, many of these captcha-using sites add further insult to the visually impaired when they demand that you prove you are human by entering a visual code. If you are blind and you cannot see an image, does that disqualify you as a member of the human race? According to captcha, yes!

This is not a tiny little inconvenience that occurs every once in a blue moon, but an ongoing, day to day problem. Trying to register, make comments, create groups, or fill out any form to completion is a crapshoot if you are visually impaired. If you are on your own, trying to make a submission on a site and you are pressed for time, you are completely out of hope when you run up against a captcha and there is no one you can get to help you. Site administrators may or may not have the time or the desire to help you.

When you find yourself running up against this cyber face-slapping half or more than half the time you try to make submissions to various sites, it is demoralizing. You are told again and again that you are not welcome, you are not human, forced to pester a site administrator or someone else for help with something you could do on your own before, and as far as the site administration goes, you do not exist and are not worth consideration.

It's infuriating and a threat to the dignity of people who are at the mercy of visual verification captchas.

In addition to blind users having the door shut in their faces at sites that use visual captchas, it is evident that spam problems still occur as much as ever on sites that use captchas, proving captcha to be a cure that's worse than the disease.

If a site administrator feels so strongly that they must employ a captcha, there is a newer, truly accessible variety that should be more effective. It prompts you with a question in text format and requires you to fill in the answer. The questions should not require a person to be able to see an image to answer.

Bad examples: "Which number in the picture is red?" "Which animal in the picture above has four legs?" How is someone who can't read print and has to rely on a screenreader supposed to know that?

Good examples: "How many legs does a cat have?" "What's 2+2?" Math questions can be asked in a number of different ways to halt a bot and still be accessible to a user. "What's 6 divided by 2?" "What's 5 added to 3?" Even "What color is an orange?" is still a good example, because everyone except the bots, sighted or not, knows the answer.