Bob's Blog
10 Billion Passwords Exposed by RockYou2024, What Does It Mean?
In July 2024, a list of almost
ten billion
unique passwords, exposed in past breaches,
became available for download.
Ten billion!
That sounds extremely bad!
But wait, it's just the passwords,
not the user or account names,
and nothing about where they were used.
This is no big deal, right?
The reality is somewhere in the middle.
Careful thinking about this episode
reinforces the importance of using a
password manager application that you can mirror
across your phone and other computers.
It also points out how valuable multi-factor authentication is,
although your second factor probably is
a mobile phone that could be damaged,
lost,
or stolen,
leaving you unable to authenticate.
Let's work through the various risk factors.
Reusing Passwords, and Reusing Identities
People tend to re-use passwords across multiple accounts. We have too much to remember. Then companies make non-security aspects of their online user interface awkward. That pointlessly increases the difficulty of the overall experience, and drives people to less secure behavior.
Let's say that I have an account on some web site,
where the username is my email address
and the password is something that I chose.
This is a completely made-up example for this page,
but let's say that it's:
Username: bobcromwell@someisp.net
Password: bigsecret
Now let's say that the site was breached, and it had a horrible design that stored the password in plaintext. Now that username/password pair is known. I eventually learn about the breach, and change my password on that site. Am I safe? No, not necessarily.
Attackers will, of course, try to authenticate into
other sites using that username and password.
If I used that username/password pair in one place,
there's a good chance that I used it in several others:
Username: bobcromwell@someisp.net
Password: bigsecret
The attackers will also try using a username that is just
the name, omitting the email server.
Maybe I re-used the password with a different username:
Username: bobcromwell
Password: bigsecret
So far, we're talking about credential-stuffing attacks, in which a compromised account/password pair is automatically submitted to a large number of other sites.
This is why you should not re-use credentials across multiple accounts. "Oh, the local TV station wants me to create an account to look at stories on their web site. But it's free!" Yes, and from the TV station's point of view, there's no need for them to worry about security. But then someone steals their account/password database and applies all those credentials to banks, credit card companies, and other places where security matters.
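If you can export your own list of accounts, spotting re-use is easy to automate. Here is a minimal sketch, assuming the export is a simple mapping of site name to password. The function name and the sample sites are made up for this page.

```python
from collections import defaultdict


def find_reused_passwords(accounts):
    """Group sites by password and keep only passwords used on more than one site."""
    sites_by_password = defaultdict(list)
    for site, password in accounts.items():
        sites_by_password[password].append(site)
    return {
        password: sorted(sites)
        for password, sites in sites_by_password.items()
        if len(sites) > 1
    }


# Hypothetical export: site names and passwords are invented for illustration.
accounts = {
    "local-tv-station.example": "bigsecret",
    "bank.example": "bigsecret",
    "webmail.example": "N7#qv!2Lx$Rd",
}

reused = find_reused_passwords(accounts)
for password, sites in reused.items():
    print(f"{len(sites)} sites share one password: {', '.join(sites)}")
```

Any password that shows up in the result is one breach away from exposing every site listed next to it.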
However, things are even worse.
The breach has shown that bigsecret
is a password made up by some person out there in the world.
If one person has selected it,
or constructed it,
there's a good chance
that other people will, too.
The RockYou Password List
The RockYou password list is an excellent example. In 2009, over 32 million user passwords were stolen from RockYou, a social app and advertising network. The "RockYou list" became a standard tool for password cracking.
On 4 July 2024, the file
rockyou2024.txt
became available for download.
It is a plaintext list of almost 10 billion passwords.
9,948,575,739, so yes, almost 10 billion.
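Checking whether one string appears in a list like this doesn't require loading it all into memory. The sketch below streams lines and compares them one at a time. It assumes one password per line, and it uses a tiny in-memory stand-in rather than the real file. The function name and sample entries are my own.

```python
import io


def password_in_wordlist(candidate, lines):
    """Return True if candidate exactly matches any line from an iterable of lines."""
    for line in lines:
        if line.rstrip("\r\n") == candidate:
            return True
    return False


# Tiny stand-in for a real wordlist file, one password per line (invented data).
sample_wordlist = "123456\npassword\nbigsecret\niloveyou\n"

print(password_in_wordlist("bigsecret", io.StringIO(sample_wordlist)))
print(password_in_wordlist("N7#qv!2Lx$Rd", io.StringIO(sample_wordlist)))
```

With a real file you would pass an open file object instead of the `io.StringIO` wrapper, probably opened with `errors="replace"` since breach lists tend to contain inconsistent encodings.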
How Many Unique Password Strings Can There Be?
There are 95 printable ASCII characters, 0x20 through 0x7e. You can generate all of them on a standard U.S. keyboard.
$ ascii
Usage: ascii [-adxohv] [-t] [char-alias...]
   -t = one-line output  -a = vertical format
   -d = Decimal table  -o = octal table  -x = hex table  -b binary table
   -h = This help screen -v = version information
Prints all aliases of an ASCII character. Args may be chars, C \-escapes,
English names, ^-escapes, ASCII mnemonics, or numerics in decimal/octal/hex.

Dec Hex      Dec Hex      Dec Hex    Dec Hex    Dec Hex    Dec Hex    Dec Hex    Dec Hex
  0 00 NUL    16 10 DLE    32 20      48 30 0    64 40 @    80 50 P    96 60 `   112 70 p
  1 01 SOH    17 11 DC1    33 21 !    49 31 1    65 41 A    81 51 Q    97 61 a   113 71 q
  2 02 STX    18 12 DC2    34 22 "    50 32 2    66 42 B    82 52 R    98 62 b   114 72 r
  3 03 ETX    19 13 DC3    35 23 #    51 33 3    67 43 C    83 53 S    99 63 c   115 73 s
  4 04 EOT    20 14 DC4    36 24 $    52 34 4    68 44 D    84 54 T   100 64 d   116 74 t
  5 05 ENQ    21 15 NAK    37 25 %    53 35 5    69 45 E    85 55 U   101 65 e   117 75 u
  6 06 ACK    22 16 SYN    38 26 &    54 36 6    70 46 F    86 56 V   102 66 f   118 76 v
  7 07 BEL    23 17 ETB    39 27 '    55 37 7    71 47 G    87 57 W   103 67 g   119 77 w
  8 08 BS     24 18 CAN    40 28 (    56 38 8    72 48 H    88 58 X   104 68 h   120 78 x
  9 09 HT     25 19 EM     41 29 )    57 39 9    73 49 I    89 59 Y   105 69 i   121 79 y
 10 0A LF     26 1A SUB    42 2A *    58 3A :    74 4A J    90 5A Z   106 6A j   122 7A z
 11 0B VT     27 1B ESC    43 2B +    59 3B ;    75 4B K    91 5B [   107 6B k   123 7B {
 12 0C FF     28 1C FS     44 2C ,    60 3C <    76 4C L    92 5C \   108 6C l   124 7C |
 13 0D CR     29 1D GS     45 2D -    61 3D =    77 4D M    93 5D ]   109 6D m   125 7D }
 14 0E SO     30 1E RS     46 2E .    62 3E >    78 4E N    94 5E ^   110 6E n   126 7E ~
 15 0F SI     31 1F US     47 2F /    63 3F ?    79 4F O    95 5F _   111 6F o   127 7F DEL
That's the 26 lower-case letters:
a b c
... z
Plus, the 26 upper-case letters:
A B C
... Z
Plus, the 10 digits:
0 1 2 3 4 5 6 7 8 9
Plus, the space character and these 32 punctuation marks:
! " # $ % & ' ( ) * + , - . / : ;
< = > ? @ [ \ ] ^ _ ` { | } ~
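If you'd rather not count by hand, Python's standard `string` module already defines these character classes. This short sketch tallies them and confirms that, with the space added, they cover exactly 0x20 through 0x7e. The dictionary and variable names are mine.

```python
import string

# The four character classes from the text, with the space counted separately.
classes = {
    "lower-case": string.ascii_lowercase,
    "upper-case": string.ascii_uppercase,
    "digits": string.digits,
    "punctuation": string.punctuation,
    "space": " ",
}

printable_ascii = "".join(classes.values())

for name, chars in classes.items():
    print(f"{name:12s} {len(chars):3d}")
print(f"{'total':12s} {len(printable_ascii):3d}")

# Cross-check against the code point range 0x20 through 0x7e.
print(sorted(printable_ascii) == [chr(c) for c in range(0x20, 0x7F)])
```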
Now, a list of 10 billion unique passwords sounds impressive at first. But let's think about how many passwords are possible.
Let's first think about password strings containing nothing but lower-case letters. There are 26 letters, so you could generate 26^N unique N-character strings. And so, there are just over 8 billion possible 7-letter strings, and almost 209 billion possible 8-letter strings. 8,031,810,176 and 208,827,064,576, to be exact. That list of 10 billion unique passwords is a little larger than the list of all possible 7-letter strings, and less than 5% of the size of the list of all possible 8-letter strings.
$ bc -l
26^7
8031810176
26^8
208827064576
If we use both lower case and upper case, there are 52 choices for each character in the string. There are almost twenty billion possible 6-letter mixed-case strings, 19,770,609,664 to be exact.
$ bc -l
52^6
19770609664
If we add in the ten digits, there are 62 choices for each character. There are almost a billion 5-character mixed-case alphanumeric strings, and almost 57 billion 6-character strings. 916,132,832 and 56,800,235,584, respectively, to be exact.
$ bc -l
62^5
916132832
62^6
56800235584
Let's add the space and the punctuation marks, including the ones we don't usually use within English language text. That's all four character classes: upper, lower, digit, and special. There are 95 possible choices for each character, so there are 95^N possible N-character all-ASCII-keyboard strings. That means that there are over 7.7 billion possible 5-character printable ASCII strings, 7,737,809,375 to be exact.
$ bc -l
95^5
7737809375
5-character ASCII strings can take over 7.7 billion forms!
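The same arithmetic can be put into one small script. This sketch asks, for each alphabet size above, how long a string has to be before the number of possibilities exceeds the RockYou2024 count. The function names are my own, and the list size is the figure quoted earlier.

```python
ROCKYOU2024_SIZE = 9_948_575_739  # entry count quoted earlier in this post


def keyspace(alphabet_size, length):
    """Number of distinct strings of exactly `length` characters."""
    return alphabet_size ** length


def shortest_length_exceeding(alphabet_size, target):
    """Smallest length whose keyspace is strictly larger than `target`."""
    length = 1
    while keyspace(alphabet_size, length) <= target:
        length += 1
    return length


alphabets = [
    ("lower-case only", 26),
    ("mixed case", 52),
    ("alphanumeric", 62),
    ("printable ASCII", 95),
]

for label, size in alphabets:
    n = shortest_length_exceeding(size, ROCKYOU2024_SIZE)
    print(f"{label:16s} {size:2d}^{n} = {keyspace(size, n):,}")
```

For lower-case letters alone the answer is 8 characters. For the other three alphabets it is 6.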
Be Careful
It's tempting to look at large numbers and jump to inappropriate conclusions. Think about the monoalphabetic substitution cipher, a weak encryption algorithm with an impressively long name and impressively large numbers.
The number of possible keys, or substitution schemes, for a monoalphabetic substitution cipher using the Latin alphabet with its 26 letters is: $$ \begin{aligned} N &= 26! \\ &= 1 \times 2 \times 3 \times \cdots \times 26 \\ &= 403{,}291{,}461{,}126{,}605{,}635{,}584{,}000{,}000 \end{aligned} $$ That's just over 4×10^26, which is 40,000,000,000,000,000 times more than the mere 10 billion in RockYou2024's password list. Wow, what an astronomically large search space!
Or so you might recklessly think... Despite the enormous search space, the monoalphabetic substitution cipher is very weak. Edgar Allan Poe's short story The Gold-Bug, published in 1843, contains a very clear explanation of how to use frequency analysis to easily crack a monoalphabetic substitution cipher. Arthur Conan Doyle's Sherlock Holmes story The Adventure of the Dancing Men, published in 1903, contains a less clear explanation.
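To see why the huge key count doesn't help, here is a rough sketch of the first step of frequency analysis. It uses a shift-by-3 alphabet as a simple stand-in for an arbitrary substitution scheme, and a sample sentence I made up. A real attack would go on to match the full ranking against typical English letter frequencies.

```python
import math
from collections import Counter


def letter_frequencies(text):
    """Count the letters A-Z in text, ignoring case, most common first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return Counter(letters).most_common()


# Size of the key space: every permutation of the 26 letters.
print(f"{math.factorial(26):,}")

# Invented sample plaintext, enciphered with a shift-by-3 substitution alphabet.
plaintext = "meet me at the tree near the red shed at seven"
ciphertext = "".join(
    chr((ord(c) - ord("a") + 3) % 26 + ord("a")) if c.isalpha() else c
    for c in plaintext
)

print(ciphertext)
print(letter_frequencies(ciphertext)[:3])
```

The most frequent ciphertext letter here is the one standing in for "e", and nothing about the 26! possible keys hides that.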
So, Ten Billion Unique Passwords, Is That a Lot or Isn't It?
Let's say that your password complexity requirement
is all four character classes and at least 12 characters long.
That means all 95 printable ASCII characters, so:
540,360,087,662,636,962,890,625
possible passwords,
which is about 54 trillion
times as many as the RockYou2024 list.
Actually, 54,036,008,766,263 times as many.
$ bc
95^12
540360087662636962890625
(95^12)/(10^10)
54036008766263
It's very reasonable to conclude that a randomly generated 4-class 12-long password has a vanishingly small chance of being guessed.
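To put a number on "vanishingly small", here is a quick sketch. It makes the generous assumption that every one of the roughly 10 billion list entries is a distinct 12-character printable ASCII string, so the result is an upper bound on the chance that a uniformly random 12-character string is on the list.

```python
from fractions import Fraction

LIST_SIZE = 10**10      # rounded-up size of the RockYou2024 list
KEYSPACE = 95**12       # all 12-character printable ASCII strings

# Upper bound: assumes every list entry is a distinct 12-character string.
chance = Fraction(LIST_SIZE, KEYSPACE)

print(f"keyspace:           {KEYSPACE:,}")
print(f"keyspace/list size: {KEYSPACE // LIST_SIZE:,}")
print(f"chance of a hit:    {float(chance):.3e}")
```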
But did you catch what I did there? I slipped in the phrase randomly generated.
We humans are bad at some aspects of statistics. In particular, randomness. Well-intentioned people tasked with generating random strings of letters or digits or bits do a horrible job.
Most of us aren't very good at memorizing random strings of letters, let alone when they're upper and lower case along with all the punctuation marks. So, with notebooks and smart phones forbidden in the secure area, almost everyone would be locked out of their computers if they couldn't select passwords that were a little less hard to remember and then type.
The result is that people will use only a small fraction out of the space of possible passwords. Resources like the evolving RockYou lists could help us figure out what types of strings people do and do not select.
The Interesting Question
Think about scanning through the RockYou2024 list of 10 billion unique passwords. Let's say that your copy has already been sorted into order.
As you scan through it, you could see that the list resembles the sequence of guesses that you would use to mount a brute-force attack. But it only resembles it. The list built from breaches won't include all the possible strings. Far from it, in fact, and it would omit a growing percentage as the entries grow longer.
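One way to make "a growing percentage" concrete is to measure, for each string length, what fraction of all possible strings of that length a list contains. This is only a sketch with a toy list I invented, not data from RockYou2024. The function name is mine.

```python
from collections import Counter


def coverage_by_length(passwords, alphabet_size=95):
    """Map each length to the fraction of all strings of that length that appear."""
    counts = Counter(len(p) for p in set(passwords))
    return {n: counts[n] / alphabet_size**n for n in sorted(counts)}


# Toy list, invented for illustration only.
toy_list = ["a", "b", "ab", "zz", "abc", "bigsecret"]

for length, fraction in coverage_by_length(toy_list).items():
    print(f"length {length}: {fraction:.3e} of the possible strings")
```

Even in this toy case the covered fraction collapses as the length grows, because the number of possible strings multiplies by 95 with every added character.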
You could think about the search space of a 4-class 12-long password as a 12-dimensional space with 95 units of distance along each dimensional axis.
Wait, excuse me, I should have said that you could try to think about that, but that's going to be an awfully difficult mental visualization for most of us.
But my point is that there is a large search space, an enormous collection of possible strings, and human psychology leads to us only using a fraction of it.
It would be interesting to know which types of patterns are preferred and which are avoided. It would be more interesting to know why: what is it about possible patterns that makes us prefer some and avoid others?
Real-world PIN study
There's what I think is a fascinating study of real-world PIN selection. They analyzed a collection of almost 3.4 million user-selected four-digit PINs. There are strong biases for and against different patterns. 20.552% of all PINs in the collection were one of the 5 most popular selections — just 5 out of 10,000 possible digit strings, 0.05%, were chosen for over 20% of the accounts.
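The kind of measurement behind that "top 5" figure is simple to sketch. The sample below is invented for this page and far too small to mean anything. It is not data from the study. It only shows how the share covered by the k most popular PINs would be computed.

```python
from collections import Counter


def top_k_share(pins, k):
    """Fraction of all observed PINs accounted for by the k most common values."""
    counts = Counter(pins)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(pins)


# Invented sample of 20 PINs, for illustration only.
sample_pins = (
    ["1234"] * 5 + ["1111"] * 3 + ["0000"] * 2
    + ["8068", "7395", "4826", "9047", "3158",
       "6203", "5719", "2486", "7630", "1593"]
)

print(f"5 of 10,000 possible PINs is {5 / 10_000:.2%} of the space")
print(f"top 3 PINs cover {top_k_share(sample_pins, 3):.0%} of this sample")
```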
With a four-digit search space, they can do some nice visualization and come to interesting conclusions about what makes certain specific patterns, or types of patterns, more or less popular.
But with a dozen characters in the string, each of them any printable ASCII character, analysis would be way more complex and visualization would be challenging.
My conclusion — use a password manager app and have it generate highly random password strings.
I have thought that for a long time. Thinking about fresh breaches, or tools that might help to cause fresh breaches, just reinforces the value of using a password manager.
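For a sense of what "highly random" means in practice, here is a minimal sketch using Python's `secrets` module, which draws from the operating system's cryptographic random source. It is not how any particular password manager works. The function names are mine, and the retry loop that enforces all four character classes trims the search space slightly below 95^12.

```python
import secrets
import string

SPECIALS = string.punctuation + " "
ALPHABET = string.ascii_letters + string.digits + SPECIALS  # 95 characters


def has_all_four_classes(candidate):
    """True if candidate contains a lower, an upper, a digit, and a special."""
    return (
        any(c in string.ascii_lowercase for c in candidate)
        and any(c in string.ascii_uppercase for c in candidate)
        and any(c in string.digits for c in candidate)
        and any(c in SPECIALS for c in candidate)
    )


def generate_password(length=12):
    """Draw uniformly random strings until one contains all four classes."""
    if length < 4:
        raise ValueError("need at least 4 characters to fit all four classes")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if has_all_four_classes(candidate):
            return candidate


print(generate_password())
```

Some sites reject spaces or trim them from the ends of a password, so you may need to drop the space from `SPECIALS` for those.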
I have some further information and links about real-world PIN and password studies.