Rotors of M-209 cipher machine.

Protect Confidentiality:
Select a Cipher and Mode

Protecting Confidentiality

You must protect the confidentiality of sensitive data. There may be an industry requirement as with PCI-DSS, or even a legal requirement as with HIPAA. The simple answer is that the sensitive data must be encrypted when stored or transmitted across a network. But exactly how should this be done? There are many ciphers (that is, encryption algorithms) to choose from. The choice of which cipher and how to apply it depends on your data characteristics and your security goals.

For the background see:
"Just Enough Cryptography"

Let's be realistic. You certainly aren't going to be designing new ciphers! You might be designing a software system that will use existing ciphers. But you most likely need to be an informed consumer. Choose the best tool for each job. Make the best choice from the available existing software systems based on their cryptographic designs.

Splitting the Choices

Selecting a cipher can be like splitting a loaf of bread. You can bring the knife down vertically to split the loaf left-right. Or cut it vertically from the side to split it front-back. Or cut it horizontally to split it top-bottom. There are many ways to split the loaf into two halves. Then you can slice one half into finer divisions in many ways. There's no one correct way to slice your bread. It depends on what you want to achieve.

Cipher selection is like that. There are binary divisions of symmetric versus asymmetric and block versus stream, and then finer slices: the choices among the several available block cipher modes.

Baguettes of French bread.

All of that is fine as an abstract model, but first realize that block-versus-stream is only an issue in symmetric ciphers. And then realize that we don't really have any trusted stream ciphers now. Applications where stream ciphers were used, like mobile phone voice streams, are now handled with symmetric block ciphers operating in a stream-like mode. Keep reading to see the details on this.

Symmetric versus Asymmetric

The first distinction you must make, and the one you hear the most about, is symmetric versus asymmetric. Note that this is a distinction and not a choice because asymmetric ciphers are useful for a vital but very narrow range of tasks.

Symmetric ciphers are used to protect the data. Use AES. Other reasonable choices in years past have been Blowfish and Twofish. Of course a symmetric cipher must not leak information and must be resistant to attack. But for a given security level, symmetric ciphers are also designed for efficiency. Data files grow and grow. New camera models have more and more megapixels every year. Storage media continue to grow in size.

On Linux, the kernel module for disk encryption is dm-crypt. It's part of the device mapper infrastructure, which maps virtualized storage volumes onto storage devices. LUKS (or Linux Unified Key Setup) is the on-disk format, and it provides the user interface for entering the passphrase and passing it to the kernel module.

Whole-disk encryption, for example with Linux dm-crypt or Microsoft's BitLocker, is useful to protect devices that might be lost or stolen. Whole-disk or filesystem-level encryption makes sense for a laptop, smart phone, or USB stick, but it doesn't seem helpful for a server. I hope you're not worried about someone walking off with your server!

Today's personal computers have multi-terabyte disks and network speeds continue to climb. We don't want to have to choose between security and acceptable performance. We want both. Symmetric is the choice for files and streams.

But Which Symmetric Cipher?

Use AES. Honestly, cipher choice doesn't matter very much at all for most people, as long as you use a recent one. Your dominant security problems will come from key management, not from subtle differences between AES, Twofish, Blowfish, and GOST.

Key Management

Page from a German Enigma book of keys from https://en.wikipedia.org/wiki/File:Enigma_keylist_3_rotor.jpg

A page of daily key settings for a German Luftwaffe Enigma machine. The Allies would attack the easier problem of routine weather messages in order to get the day's key that was also used by high-ranking commanders.

Symmetric cryptography has traditionally had an enormous problem of key management, especially when used for communication. The sender and receiver must share a secret key, and there was no good solution for that problem. One approach was to carefully select one key and very carefully protect it while using it on many messages or files.

The obvious problem is that if that one key is discovered, a lot of sensitive data is exposed.

The not-so-obvious problem is that using one key on many messages provides the attacker with more and more data for a ciphertext-only attack to discover the key.

The alternative, using a unique session key for each message or file, has its own problems. This might quickly become impractical if you had to have matching large sets of keys at both ends of a communication link. Instead of one highly sensitive key, you would need a large book filled with them! There is also the problem of keeping track of which key to use with each message or file.

How RSA Works
How ECC Works

Now we can solve this problem with asymmetric cryptography. The negotiation involves small exchanges at the beginning, so we don't care about the computational expense associated with asymmetric cryptography. Elliptic-Curve Cryptography or ECC has been the best choice. RSA has issues (for example, it doesn't practically support ephemeral session keys and thus perfect forward secrecy).

Asymmetric is the choice for authentication and key negotiation.

Post-Quantum Cryptography with Nginx and OpenSSL

The problem with asymmetric algorithms as of the early 2020s is the growing threat of quantum computers that could rapidly solve the trapdoor problems used to provide their security. Work is underway on a new class of asymmetric ciphers, called post-quantum or quantum-safe or quantum-resistant cryptography. When those become standardized and then broadly supported, they will be the best choice. I have a page describing my experiments with PQC.

Hybrid Systems Combine Asymmetric and Symmetric

Yes, we need to use symmetric for large data sets and asymmetric to negotiate keys, but the choice isn't simply either–or. Hybrid systems are the practical reality.

An encrypted message from me to you could start with a header encrypted with RSA, an asymmetric cipher, using your public key. Its original cleartext would effectively say this:

Let's use AES with this randomly-generated 256-bit session key for this message only:
0x902328857ba7a75532c5ffb5fded61b164663e251a89fa35172d4788e5fbb9ce.

The rest of the message would be the actual content encrypted efficiently with that symmetric cipher using that one-time-only session key.

You are the only person with access to your private key, so only your software can decrypt the header. It then uses the instructions in the header to decrypt the body of the message.

Or, let's say we're setting up a TLS connection. The hosts authenticate with RSA or ECC. Then they negotiate a mutually supported symmetric cipher and agree on a shared session key with the Diffie-Hellman Ephemeral method (that is, until we have broad support for PQC). They then encrypt the data stream with the negotiated symmetric cipher and unique session key.

The resulting hybrid cryptosystems are described as:

Asymmetric encryption protects the exchanges.
Symmetric encryption protects the data.

The exchanges in that description include the endpoint authentication and the key negotiation or agreement.
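To make the key agreement step concrete, here is a minimal sketch of an ephemeral Diffie-Hellman exchange using only the Python standard library. The group parameters `P` and `G` and the helper names `make_keypair` and `derive_session_key` are assumptions chosen for illustration. The prime is far too small for real use, and a real TLS stack would also authenticate the exchange.

```python
import hashlib
import secrets

# Toy group parameters (assumptions for illustration only).
# 2**127 - 1 is a Mersenne prime; real systems use standardized groups
# of 2048 bits or more, or elliptic curves.
P = 2**127 - 1
G = 3

def make_keypair():
    """Return a fresh (private, public) pair, used for one session only."""
    private = secrets.randbelow(P - 3) + 2   # 2 <= private <= P - 2
    public = pow(G, private, P)
    return private, public

def derive_session_key(my_private, their_public):
    """Hash the shared secret down to a 256-bit symmetric session key."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

# Each side generates an ephemeral pair and sends only the public half.
client_private, client_public = make_keypair()
server_private, server_public = make_keypair()

client_key = derive_session_key(client_private, server_public)
server_key = derive_session_key(server_private, client_public)

print("keys match:", client_key == server_key)
print("session key length in bits:", len(client_key) * 8)
```

Both ends arrive at the same 256-bit key without ever sending it, and that key would then drive the symmetric cipher that protects the data.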

Which Asymmetric Cipher to Protect the Exchanges?

The short answer: Use RSA with at least a 2048-bit key, preferably 4096 bits, or ECC with a trusted curve and at least a 256-bit key. Skip to the next section if you don't want the RSA and ECC details.

The tradition for ages has been to use RSA. Elliptic curve cryptography or ECC is a more recent development. Both are based on "trapdoor" problems. The security comes from a math problem that is enormously difficult to solve, but for which it is relatively easy to verify a possible solution. RSA's security is based on the difficulty of factoring the product of two large prime numbers. ECC's is based on the difficulty of solving the discrete logarithm problem over the points of an elliptic curve.
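The trapdoor idea can be seen in a toy version of textbook RSA. This is only a sketch: the primes 61 and 53, the exponent 17, and the helper `factor` are illustrative assumptions. Real RSA uses primes hundreds of digits long plus padding schemes that are omitted here.

```python
from math import isqrt

# Textbook RSA with tiny, hypothetical primes (illustration only).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)

message = 65
ciphertext = pow(message, e, n)      # anyone can do this with (n, e)
recovered = pow(ciphertext, d, n)    # only the holder of d can undo it
print("round trip ok:", recovered == message)

def factor(modulus):
    """Attacker's view: recover the primes by trial division."""
    for candidate in range(2, isqrt(modulus) + 1):
        if modulus % candidate == 0:
            return candidate, modulus // candidate
    raise ValueError("no factors found")

# Checking a claimed factorization is one multiplication.
# Finding it requires a search that grows explosively with key size.
print("factors of n:", factor(n))
```

Trial division succeeds instantly on a 12-bit modulus. The whole point of a 2048-bit or larger key is to make that search, and every better-known factoring method, impractical.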

You need different key sizes for roughly equal resistance to brute-force attack. According to the NIST document Recommendation for Key Management, you need the following.

Key Length in Bits for Approximately Equal Resistance to Brute-Force Attacks, per NIST/NSA

Cipher type                              Key length in bits
Symmetric encryption                     80     112    128    192    256
Elliptic-curve asymmetric encryption     160    224    256    384    512
RSA asymmetric encryption                1024   2048   3072   7680   15360

Also see the very similar advice from ENISA and IETF.

ECC is a category, so you must choose which curve to use. Available choices are defined in:

Using free RSA and ECC certificates from Let's Encrypt

Elliptic curve P-384 or secp384r1 is the current choice for negotiating the session key in TLS. That is, you generate a P-384 ECC key pair and get the public key of the pair wrapped in a digital certificate from a trusted CA or Certificate Authority.

For a while, it was believed that if it was possible to build a general-purpose quantum computer, the factoring problem and thus RSA could be broken using Shor's algorithm while ECC would be more resistant. More recently it has come to be expected that ECC would be slightly more susceptible to breaking this way. In August 2015 the NSA announced that its Information Assurance Directorate "will initiate a transition to quantum resistant algorithms in the not too distant future" and encouraged academia and industry to work on post-quantum or quantum-resistant techniques. They published the document with no warning or explanation, changed it a few times in the same unannounced way, and then took it off their site. The above link is to the archive.org copy. Meanwhile, don't bother replacing RSA with ECC just for quantum resistance.

Which Symmetric Cipher Category? Block versus Stream

The symmetric ciphers used on the data come in two varieties, block and stream.

Stream ciphers partially emulate a one-time pad, which is the only perfect secrecy system. A one-time pad is perfectly secure if you do it right, but it is far too impractical for all but the most critical or the most trivial situations. You need a totally random key stream that is as long as the message and used only once. That bulky and sensitive key must be stored at each end. In most situations it makes much more sense to simply exchange the message itself in whatever out-of-band channel would have been used to distribute the key.

Practical stream ciphers provide far from perfect security, but for many years they seemed to provide an acceptable tradeoff between security and practicality. The shared secret key for a stream cipher generates a pseudorandom key stream. As long as we're careful about how we generate and use that pseudorandom stream, it will probably be secure enough for many purposes. Notice the use of "probably," "enough," and "many" in that sentence!

Stream cipher encryption and decryption are fast. Both are a simple XOR (or exclusive-OR) operation. At the sender:
cleartext XOR key → ciphertext
At the receiver:
ciphertext XOR key → cleartext

XOR can be done directly in hardware for optimal speed. We don't have to know in advance how long the stream will be, and we don't have to pad the data to any standardized length.
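The symmetry of those two XOR operations is easy to see in a few lines of Python. This is a minimal sketch. The helper name `xor_bytes` is an assumption, and the key stream here is simply random bytes as long as the message, as in a one-time pad, rather than the output of any real stream cipher.

```python
import secrets

def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    """XOR each data byte with the matching key stream byte."""
    if len(keystream) < len(data):
        raise ValueError("key stream must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, keystream))

cleartext = b"Attack at dawn"
keystream = secrets.token_bytes(len(cleartext))   # random, used only once

ciphertext = xor_bytes(cleartext, keystream)      # at the sender
recovered = xor_bytes(ciphertext, keystream)      # at the receiver

print("lengths:", len(cleartext), len(ciphertext))
print("recovered:", recovered)
```

The same function encrypts and decrypts, and the ciphertext is exactly as long as the cleartext, so no padding is needed.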

Block ciphers, on the other hand, deal with data a block at a time. Your choices are 32, 64, 128, or 256 bits, typically. Unless you are doing something unusual, you want to use the largest block size for both efficiency and security. If the data isn't an exact multiple of the block size, it's padded.
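As an illustration of padding, here is a sketch of one widely used scheme, PKCS#7, applied to a 16-byte (128-bit) block. The text above does not say which padding scheme is meant, so the choice of PKCS#7, the block size, and the function names `pad` and `unpad` are assumptions.

```python
BLOCK_SIZE = 16  # bytes, that is, a 128-bit block (assumed for this sketch)

def pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes, each with value N, to reach a block boundary.

    Data that already fills its last block gets one full extra block,
    so the padding can always be removed unambiguously.
    """
    count = block_size - len(data) % block_size
    return data + bytes([count]) * count

def unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Check and strip the padding added by pad()."""
    if not padded or len(padded) % block_size:
        raise ValueError("input is not a whole number of blocks")
    count = padded[-1]
    if count < 1 or count > block_size or padded[-count:] != bytes([count]) * count:
        raise ValueError("bad padding")
    return padded[:-count]

short = pad(b"hello")
exact = pad(b"A" * 16)
print(len(short), len(exact))
print(unpad(short))
```

The 5-byte input grows to one block and the 16-byte input grows to two, which is the size overhead a stream cipher avoids.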

For many years the conventional wisdom was:
"Use block ciphers on data known in advance like files, devices, and email messages, and use stream ciphers on stream-like data."

But in the 2000s and 2010s we discovered problems with our available choices of stream ciphers.

A block symmetric cipher can be operated in various modes, and the selection of mode depends on data characteristics and what security goals you are trying to achieve. Mode selection has become more important with recent developments. We have realized that some modes are less secure than expected when used in certain situations. The good news is that we have found some modes that give block ciphers stream-like performance characteristics that make them good replacements for the old stream ciphers.

Old Stream Ciphers Have Been Replaced

RC4 was designed in 1987, and it had a good run as the de facto standard cipher first for SSL and then for its replacement TLS. But a number of reports in February through May 2015 concluded that it was time to retire RC4.

RFC 7457: Summarizing Known Attacks on Transport Layer Security (TLS) and Datagram TLS (DTLS)
RFC 7465: Prohibiting RC4 Cipher Suites
RFC 7525: Recommendations for Secure Use of TLS and DTLS

There aren't many stream cipher choices. A5/1 and A5/2 have been used in GSM telephony, but A5/1 has severe weaknesses and A5/2 is even worse.

Salsa20 and ChaCha20 are our best current stream ciphers.

Another solution is to use a block cipher in a mode that gives it stream-like characteristics. This is what has been done for GSM telephony with the KASUMI cipher, also called A5/3. It's much better than the other GSM alternatives, although a 2010 paper reported an attack on the A5/3 cipher. (The not-so-bad news is that the attack may not work against the way A5/3 is used in GSM.)

KASUMI or A5/3 is used in telephony. What about the ciphers we use in operating systems and networking?

Metal chain in the sunlight.

Block Cipher Modes

Block ciphers support several modes of operation. Just saying "Let's use AES" is only a starting point. How will you operate it?

Block cipher modes

The Wikipedia page provides a quick overview of some of the modes. If you prefer a government-authorized overview to a crowd-sourced one, see the U.S. NIST document SP 800-38A, "Recommendation for Block Cipher Modes of Operation: Methods and Techniques".

Evaluation of Some Blockcipher Modes of Operation

If you really want to learn about this, see "Evaluation of Some Blockcipher Modes of Operation" by Phillip Rogaway at the University of California, Davis. It has 159 pages of detailed explanation and analysis, and as its title says, that's just for some of the more interesting modes. Don't be overly intimidated. The writing is refreshingly informal and it's very readable compared to most academic writing.

Cipher Block Chaining for Files

For most of the data that most of us own, Cipher Block Chaining or CBC is the appropriate way to encrypt files in the broad sense of that word — actual files, email messages, or entire devices as in whole-disk encryption. You will see nomenclature like AES-CBC-256 (or AES-256-CBC) used to specify the cipher, this mode of operation, and a 256-bit key.

XTS for Devices Much Larger than Files

More recently, XTS has come to be preferred for disk or whole-device encryption. Its name is an acronym built from an acronym. XEX refers to XOR-encrypt-XOR mode, and so XTS is XEX-based tweaked-codebook mode with ciphertext stealing. That's why we just call it XTS. LUKS uses AES-XTS by default.


CBC is still considered secure for stored data, but we have seen many practical attacks demonstrated against CBC for network streams. Instead, use a block cipher in a mode that gives it stream-like characteristics.

Is It Fair to Turn Block Ciphers into Stream Ciphers?

Some people want to have an argument over semantics at this point. If an encryption system has a block cipher like AES at its core, isn't it really a block cipher no matter how "stream-like" we use it?

Before we argue about whether this is somehow cheating, let's first consider just how purely stream-oriented the existing (but weak) stream ciphers really are. Yes, the perfectly secure (but almost perfectly impractical) One-Time Pad system operates, in theory, on one bit at a time. But practical stream ciphers operate on one byte at a time. Aren't those really 8-bit blocks?

Furthermore, practical hardware accelerators don't send the data and key streams through a single XOR gate one bit at a time. They manipulate bytes or even larger words. Within the operating system or an application, your CPU does XOR on 64-bit words.

Several of the block cipher modes effectively convert the block cipher into a stream cipher. The key primes the generation of a key stream that is XORed with the data stream.
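Here is a minimal sketch of that idea in counter-mode style. The Python standard library has no AES, so HMAC-SHA-256 stands in for the keyed block function. That substitution, the 32-byte block size it implies, and the names `keystream` and `ctr_xor` are assumptions for illustration, not how AES-CTR is actually specified.

```python
import hashlib
import hmac
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Generate key stream blocks from key, nonce, and a running counter.

    HMAC-SHA-256 is a stand-in for the block cipher (an assumption).
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])          # discard any unused tail

def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR the data with the generated key stream."""
    stream = keystream(key, nonce, len(data))
    return bytes(d ^ s for d, s in zip(data, stream))

key = secrets.token_bytes(32)
nonce = secrets.token_bytes(12)
cleartext = b"A message whose length is not a multiple of any block size."

ciphertext = ctr_xor(key, nonce, cleartext)
print(len(cleartext), len(ciphertext))
print(ctr_xor(key, nonce, ciphertext) == cleartext)
```

The block function only ever processes counter values, never the data itself, so the data needs no padding and can be handled as it arrives.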

Yes, the data is encrypted or decrypted at up to 256 bits at a time. But consider that 256 bits means 32 bytes, half the minimum allowed Ethernet frame size and far smaller than a practical disk I/O buffer size. The blockiness is far below the scale of both network and storage I/O.

Let's solve our security problems instead of worrying about semantics!

AES-GCM for TLS

Galois/Counter Mode or GCM has been proven to be secure when used with a strong block cipher, as long as you are careful to choose a unique initialization vector for every encryption done with the same key. NIST describes GCM in Special Publication 800-38D, "Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC".
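The following sketch suggests why the unique initialization vector matters. It models only the counter-mode encryption half of GCM, not the authentication tag, and it uses HMAC-SHA-256 as a stand-in for AES because the Python standard library has no AES. The function names and sample messages are hypothetical.

```python
import hashlib
import hmac
import secrets

def ctr_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Counter-mode style encryption with a stand-in block function."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, iv + counter.to_bytes(4, "big"),
                           hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = secrets.token_bytes(32)
iv = secrets.token_bytes(12)
first = b"transfer $100 to Alice"
second = b"transfer $999 to Carol"

c1 = ctr_encrypt(key, iv, first)
c2 = ctr_encrypt(key, iv, second)     # mistake: same key and same IV

# The key stream cancels out, so an eavesdropper learns the XOR of the
# two cleartexts without knowing the key.
print("IV reused, leak:", xor(c1, c2) == xor(first, second))

c3 = ctr_encrypt(key, secrets.token_bytes(12), second)   # fresh IV
print("fresh IV, leak:", xor(c1, c3) == xor(first, second))
```

With a repeated IV the two ciphertexts share one key stream, which is the same weakness as reusing a one-time pad. In real GCM a repeated IV also undermines the authentication, which this sketch does not model.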

Check your browser's settings. AES-GCM should be a preferred cipher for TLS.

Salsa20 / ChaCha for TLS

Around 2008 we finally got a good replacement stream cipher. ChaCha20, a refinement of Salsa20, is trusted for use in TLS, where it is paired with the Poly1305 authenticator.

AES-CCMP for Wireless

Use WPA2 for wireless security. That includes the preferred AES-CCMP. NIST describes CCM mode in Special Publication 800-38C, "Recommendation for Block Cipher Modes of Operation: The CCM Mode for Authentication and Confidentiality". It's easy to say "Use AES-CCMP." Now, as for what it means...

That's the AES-CCM Protocol, where "CCM" means "Counter Mode with CBC-MAC", where "CBC" means "Cipher Block Chaining" and "MAC" means "Message Authentication Code." So AES-CCMP is [deep breath] AES in Counter Mode with Cipher Block Chaining Message Authentication Code Protocol. The real meaning of AES-CCMP is:

Check your wireless settings and make sure you are using WPA2, a.k.a. 802.11i. That implies AES-CCMP. Here is an example of doing that on Linux.

# iwconfig
wlp10s0u1  IEEE 802.11bg  ESSID:"FBI_van4"  Nickname:"rtl_wifi"
           Mode:Managed  Frequency:2.437 GHz  Access Point: 00:1D:7E:2E:97:86   
           Bit Rate:54 Mb/s   Sensitivity:0/0  
	   Retry short limit:7   RTS thr=2347 B   Fragment thr:off
           Encryption key:****-****-****-****-****-****-****-****   Security mode:open
	   Power Management:on
	   Link Quality=70/70  Signal level=-38 dBm
	   Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
	   Tx excessive retries:0  Invalid misc:0   Missed beacon:0

# wpa_cli status
Selected interface 'wlp10s0u1'
bssid=00:1d:7e:2e:97:86
ssid=FBI_van4
id=0
mode=station
pairwise_cipher=CCMP
group_cipher=CCMP
key_mgmt=WPA2-PSK
wpa_state=COMPLETED
ip_address=192.168.1.101
address=08:60:6e:63:7b:80 

It says WPA2-PSK, meaning Pre-Shared Key, because I haven't yet gotten around to setting up a RADIUS server and creating key pairs and digital certificates to do full 802.1x or Network Access Control. Here is an example from OpenBSD using the same wireless LAN:

# ifconfig run0
run0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:a1:b0:c0:74:50
        priority: 4
        groups: wlan
        media: IEEE802.11 autoselect (DS1 mode 11g)
        status: active
        ieee80211: nwid FBI_van4 chan 6 bssid 00:1d:7e:2e:97:86 43dBm wpakey <not displayed> wpaprotos wpa1,wpa2 wpaakms psk wpaciphers tkip,ccmp wpagroupcipher tkip
        inet 192.168.1.103 netmask 0xffffff00 broadcast 192.168.1.255
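The key=value output of wpa_cli status shown earlier is easy to check in a script. Here is a minimal sketch that parses that text and reports whether both the pairwise and group ciphers are CCMP. The function names `parse_status` and `uses_ccmp` are hypothetical, and the sample text is copied from the Linux example above rather than read from a live interface.

```python
SAMPLE = """Selected interface 'wlp10s0u1'
bssid=00:1d:7e:2e:97:86
ssid=FBI_van4
id=0
mode=station
pairwise_cipher=CCMP
group_cipher=CCMP
key_mgmt=WPA2-PSK
wpa_state=COMPLETED
ip_address=192.168.1.101
address=08:60:6e:63:7b:80
"""

def parse_status(text: str) -> dict:
    """Collect the key=value lines; lines without '=' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator:
            fields[key.strip()] = value.strip()
    return fields

def uses_ccmp(fields: dict) -> bool:
    """True only if unicast and group traffic are both protected by CCMP."""
    return (fields.get("pairwise_cipher") == "CCMP"
            and fields.get("group_cipher") == "CCMP")

status = parse_status(SAMPLE)
print("key management:", status.get("key_mgmt"))
print("CCMP for pairwise and group:", uses_ccmp(status))
```

A link that negotiated TKIP for the group cipher, as the OpenBSD output above reports, would fail this check.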