This seems to be my week for flagging password-related blogs. Well, there are plenty of stolen password issues around. :(

So here's a blog in stark contrast to Urban Schrott's blog about good password practice in Ireland (which I expanded on here and here). Troy Hunt ran an analysis of the subset of stolen Sony Pictures passwords put out as a torrent by those nice boys at LulzSec, some 37,608 of them. He was looking for four characteristics:

  • Password length
  • Diversity of character type (uppercase/lowercase letters, numbers, punctuation)
  • randomness
  • uniqueness

I won't list all his findings: I'd rather you read the blog. However, here's a brief summary of the more easily-determined statistics. 

  • Around 50% were eight or less characters long
  • 4% had three or more character types
  • 50% had only one character type, and most of those consisted of lower-case letters
  • 1% contained non-alphanumeric characters

Randomness is a harder characteristic to determine. Hunt tried a couple of approaches to this, but they're not really enough to characterise the whole sample set. Still, they're interesting as far as they go:

  • 2.5% were among the top 25 from the Gawker breach I mentioned in a previous blog, but referring to his own analysis at http://www.troyhunt.com/2011/03/only-secure-password-is-one-you-cant.html. Suffice it to say that there's a high correlation between his top 25 and similar analyses of Gawker sites et al. He seems to think 2.5% is quite low: I can't say I agree.
  • The other approach he uses is to measure susceptibility to a dictionary attack by comparing them to a 1.7 million collection of character strings (we're not talking Merriam-Webster or OED here) considered likely to turn up in password lists. 36% of them were matches to that particular database. However, that's not exactly a measurement of non-randomness, but rather of predictability. Not quite the same thing: it's obviously not impossible for a string like “1qazZAQ!” (one he actually quotes) to be generated (pseudo-)randomly in unrelated contexts, even by unrelated (pseudo-)random generator software.

Uniqueness is a measure of whether strings are re-used across multiple accounts. His contention is that he can do this because the LulzSec dump contains data from several data sources, including over 2000 accounts where the same email address has been registered on both the "Beauty" and "Delboca" databases. A scary 92% of passwords were found across both systems...

Clearly, if you're one of the victims of the Sony breach(es), it doesn't matter how good your password is, since it's been exposed through no fault of your own. However, you might want to think about how susceptible your passwords are to bruteforcing. Not all attacks can rely on the service provider's incompetence, and sometimes a good password really can save your data from exposure.

David Harley CITP FBCS CISSP
ESET Senior Research Fellow