Thursday, March 3, 2011

Passwords, Hashes and Dictionary Attacks

Cryptology has been a favorite topic of research for me over the past 10 years. Ever since geometry class, I have been obsessed with mathematics, patterns, and number theory. I enjoy turning one value into a totally different value and back again. That may sound like nonsense, but that is exactly how cryptology (and thus password systems) work.

I am going to go into detail about one of the most popular password storage and verification processes in use on the internet, and in offline systems. This method is called a hash, specifically an MD5 hash. I will also outline 2 of the methods that hackers can use to compromise password.

A big part of password security is never actually storing a password. A password is not stored by itself, meaning in plain text. If your password is ‘evadman’, ‘evadman’ is not stored in a database. Some other representation of the password is stored, and verified against when you log into an application or site.

MD5 Hashes
For example, the defacto standard for passwords is called an MD5 hash. This is a 32 character representation of a string of numbers, letters or other characters. The length is always 32 hexadecimal (0-9 and A-F) characters no matter how long the input is, and the output hash is always the same for the same input. For example, the MD5 hash of an empty string (“”) is d41d8cd98f00b204e9800998ecf8427e. The MD5 hash of “The quick brown fox jumps over the lazy dog” is 9e107d9d372bb6826bd81d3542a419d6. If any character is changed, such as an uppercase letter, extra space, or even a non-printable character such as a tab, the MD5 hash will change. The only thing that won’t change is the length of the hash, which will always be 32 hexadecimal characters.

The input into an MD5 hashing program can be a password (a small string of characters) or something as big as an entire hard drive. That is why sometimes when you go to a download site, they will tell you the hash for the file so you can verify that the file was transferred correctly. If you create a hash on your computer of the file and get a different answer, then the file was corrupted when it was downloaded.

Since an MD5 hash is always the same length, it can be stored conveniently in a database. The database administrator or application owner doesn’t have to set a maximum password length, or store a field of random length, as the output is always exactly 32 characters. For all the program cares, the password can be a paragraph. The output will always be 32 characters long. That is a big ‘selling point’ of an MD5 hash.

These password hashes used to be considered moderately safe for public view because they could not be reversed. For example, walking backwards from 9e107d9d372bb6826bd81d3542a419d6 to “The quick brown fox jumps over the lazy dog” was considered pretty much impossible, because it would take every computer on the planet working for decades to go backwards and turn the hash into the original password. That was a false assumption; methods were found to do that conversion quickly if the password is short enough.

Rainbow Tables
This method is called a “pre-computed dictionary attack” and the data is stored in a device called a “rainbow table”. To GREATLY simplify, if you have a hash, you can look up the original password in the rainbow table. The limitations of this process are of the utmost importance to understand, as the limitations of a rainbow table will tell you what the minimum password length, and characters, should be to best mitigate the risk of your password being broken.

First, a rainbow table is absolutely ginormous. A rainbow table that would work on passwords that are up to 7 numbers or lowercase characters is about 130 GB. Up to 9 numbers and lowercase letters is about 500 GB. The size grows quickly as additional characters are added. Adding a non-numeric character like "#" will expand that to more than 10 TB. That is one of the reasons that it is recommended that you use a non-numeric character in your passwords. The method used for generating the chains that make up a rainbow table takes a boatload of horsepower. This can be decades of computer time, but once they are made, they never need to be made again. Distributed computing can be used to reduce this to a few hours, if not less, for smaller tables.

I will explain a real life example of this method. I was tasked with migrating users from one software platform to another many years ago. The original software used an MD5 hash for a password, and the new software used a different method. This meant that every user would have to reset their password, as the password hash that was stored could not be migrated directly to the new system. The application had more than a quarter million users, so this would stink for them. To mitigate this, I used a rainbow table to reverse the stored password and feed it into the new software. Thus, generate the hash in the new software so the user would not notice a difference. The amazing thing was that I used a very small rainbow table, and was able to reverse more than 98% of the quarter million hashes back into the original password in less than 10 minutes. That points to a gigantic hole in how most users choose passwords, and in the MD5 process as a whole.

On top of that, the passwords that could not be undone also had some interesting patterns. The hash of 32 characters itself means nothing, but when a bunch of the hashes are exactly the same, it means that folks are using the same password. This is surprising considering users are separated by thousands of miles, and have likely never met. I also noticed that the hash from my personal password, which couldn't be undone by the rainbow table, was in use by a few hundred different users. This confused me, because I thought my password would have been secure and random enough. I'm a DBA, so I always pick strong passwords that are very long compared to what I would term a ‘normal’ user.

It turns out that the cause was simpler than I expected. I was migrating an application that had a higher percentage of technical users than a general group. These users were also using strong passwords in greater numbers than a general population. However, a bunch of users were using a password that is fast to type on a keyboard, while also being long enough to deter a rainbow table attack. It appeared a bunch of us had the same cycling of passwords (use one, then change to another, then another, and so on) as I found that some of the other hashes were previous passwords I had used.

That leads to the 2nd type of vulnerability that exists in any password system, not just a MD5 or other hash process. This is called a dictionary attack (sometimes referred to as Brute Force). A dictionary attack means using words in a dictionary and trying them as passwords. Basically, each word is typed (thousands a second if the system allows it) until one works. This wouldn't be a problem for most strong passwords, as they are not actual words in the dictionary. But, sequences of numbers can still be tried, and exist in ‘hacker’ password dictionaries. For example, "q2w3e4r" is the upper left of the keyboard. "753159" is a sequence on the numeric keypad. Neither of these are actual words, but a quick look showed both exist in dictionary attack password lists that hackers can use. I did this same rough password frequency analysis on 3 other systems that have many thousands of users, and came up with roughly the same surprising answers. Different users were using the same passwords across systems that had absolutely nothing to do with each other. It's freaky, but it points to the biggest weak link in a password system: the user and their habits.

Here is something that should make some folks change their passwords. There is almost a 30% chance that your password (yes, the person reading this) is in this list of 30: 123, 1234, 12345, 123456, 666666, 7777777, 12345678, 123456789, password, blogger, qwerty, letmein, test, trustno1, dragon, abc123, 111111, hello, monkey, master, killer, 123123, ncc1701, thx1138, qazwsx, ou812, 8675309. Do you have that as a password to your banking site? What about your email? Welcome to an emerging field called social engineering. Social engineering tactics were used very recently by hackers to destroy the credibility of a firm called HBGary. In fact, HBGary’s CEO stepped down on Tuesday as a result of that attack.

Best Practices
If you use the same password across systems, if one is broken, every system that has that password can also be broken into as well. So here are the rules I suggest you follow for passwords:

• Use different passwords for each system. If you have trouble remembering passwords, at minimum use a different one for your email than everything else. If your email account is broken, that means a hacker can reset your password on almost any site that exists. The new password will be sent to your email, so the hacker will have it.
• Don't bother using an uppercase letter, as almost all rainbow tables include upper case in the table. Use a non-alphanumeric character instead. This means something like "&", "(" or "$". Bonus points for ALT-0160 or something like it.
• Use passwords that are 7 characters long or longer if at all possible. 10 is better, even if the last 3 are spaces or the same character repeated. If remembering long passwords are a problem, repeat the password 2 or 3 times to extend the length. That will defeat a rainbow table, and most dictionary attacks.

• Use a phrase made out of words that is long.  XKCD has a great example here.



• Don't use a password that can be found online.So don't use 'correct horse battery staple'.
• When installing any software application or hardware, always change the default password to something else. Change the password on your router or Access Point; same with database or other programs. It is amazing how many are still the default password, which can be pulled off a support website.

This was just the tip of the iceberg, but hopefully it will assist folks with choosing secure passwords.