You must protect the confidentiality of sensitive data. There may be an industry requirement as with PCI-DSS, or even a legal requirement as with HIPAA. The simple answer is that the sensitive data must be encrypted when stored or transmitted across a network. But exactly how should this be done? There are many ciphers (that is, encryption algorithms) to choose from. The choice of which cipher and how to apply it depends on your data characteristics and your security goals. For the background see:
"Just Enough Cryptography"
Let's be realistic. You certainly aren't going to be designing new ciphers! You might be designing a software system that will use existing ciphers. But you most likely need to be an informed consumer. Choose the best tool for each job. Make the best choice from the available existing software systems based on their cryptographic designs.
Splitting the Choices
Selecting a cipher can be like splitting a loaf of bread. You can bring the knife down vertically to split the loaf left-right. Or cut it vertically from the side to split it front-back. Or cut it horizontally to split it top-bottom. There are many ways to split the loaf into two halves. Then you can slice one half into finer divisions in many ways. There's no one correct way to slice your bread. It depends on what you want to achieve.
Cipher selection is like that, with binary divisions of symmetric versus asymmetric and block versus stream, followed by finer slices: the choice among the several available block cipher modes.
All of that is fine as an abstract model, but first realize that block-versus-stream is only an issue in symmetric ciphers. And then realize that most of the stream ciphers we long relied on are no longer trusted. Applications where stream ciphers were used, like mobile phone voice streams, are now handled with symmetric block ciphers operating in a stream-like mode. Keep reading to see the details on this.
Symmetric versus Asymmetric
The first choice you must make, and the one you hear the most about, is symmetric versus asymmetric.
Symmetric ciphers have good performance, so use them on large data sets. Appropriate choices include AES, Twofish, and Blowfish. Data files grow and grow. New camera models have more and more megapixels every year. Storage media continues to grow in size.
Whole-disk encryption, for example with Linux dm-crypt or Microsoft's BitLocker, is useful to protect devices that might be lost or stolen. Whole-disk or filesystem-level encryption makes sense for a laptop, smart phone, or USB stick, but it makes no sense for a server. I hope you're not worried about someone walking off with your server!
Today's personal computers have multi-terabyte disks and network speeds continue to climb. We don't want to have to choose between security and acceptable performance. We want both. Symmetric is the choice for files and streams.
But Which Symmetric Cipher?
Use AES. Honestly, cipher choice doesn't matter very much at all for most people, as long as you use a recent one. Your dominant security problems will come from key management, not from subtle differences between AES, Twofish, Blowfish, and GOST.
Symmetric cryptography has traditionally had an enormous problem of key management, especially when used for communication. The sender and receiver must share a secret key, and there was no good solution for that problem. One approach was to carefully select one key and very carefully protect it while using it on many messages or files.
The obvious problem is that if that one key is discovered, a lot of sensitive data is exposed.
The not-so-obvious problem is that using one key on many messages provides the attacker with more and more data for a ciphertext-only attack to discover the key.
The alternative, using a unique session key for each message or file, has its own problems. This might quickly become impractical if you had to have matching large sets of keys at both ends of a communication link. Instead of one highly sensitive key, you would need a large book filled with them! There is also the problem of keeping track of which key to use with each message or file.
Now we can solve this problem with asymmetric cryptography. The negotiation involves small exchanges at the beginning, so we don't care about the computational expense associated with asymmetric cryptography. Appropriate choices include RSA and the various Elliptic Curve Ciphers.
Asymmetric is the choice for authentication and key negotiation.
Hybrid Systems Combine Asymmetric and Symmetric
Yes, we need to use symmetric for large data sets and asymmetric to negotiate keys, but the choice isn't simply either–or. Hybrid systems are the practical reality.
An encrypted message from me to you would start with a header encrypted with RSA, an asymmetric cipher, using your public key. Its original cleartext would effectively say this:
Let's use AES with this randomly-generated 256-bit
session key for this message only:
The rest of the message would be the actual content encrypted efficiently with that symmetric cipher using that one-time-only session key.
You are the only person with access to your private key, so only your software can decrypt the header. It then uses the instructions in the header to decrypt the body of the message.
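The header-plus-body layout described above can be sketched with nothing but the Python standard library. Treat this as a toy under loud assumptions: the "RSA" here is textbook RSA with no padding, built from two well-known Mersenne primes, and a SHAKE-256 key stream stands in for AES. Neither is safe for real use, and the function names are hypothetical.

```python
import hashlib
import secrets

# Toy RSA key pair for the recipient (assumption: textbook RSA, no padding,
# built from two known Mersenne primes; never do this in real software).
P, Q = 2**521 - 1, 2**607 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def keystream_xor(session_key: bytes, data: bytes) -> bytes:
    """Stand-in for AES: XOR data with a SHAKE-256 stream derived from the key."""
    stream = hashlib.shake_256(session_key).digest(len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


def encrypt_message(body: bytes) -> tuple:
    """Sender: make a fresh 256-bit session key and wrap it with the public key."""
    session_key = secrets.token_bytes(32)
    header = pow(int.from_bytes(session_key, "big"), E, N)
    return header, keystream_xor(session_key, body)


def decrypt_message(header: int, ciphertext: bytes) -> bytes:
    """Recipient: unwrap the session key with the private key, then decrypt."""
    session_key = pow(header, D, N).to_bytes(32, "big")
    return keystream_xor(session_key, ciphertext)


header, ciphertext = encrypt_message(b"Meet me at noon.")
print(decrypt_message(header, ciphertext))  # b'Meet me at noon.'
```

The expensive asymmetric operation touches only the 32-byte session key, while the cheap symmetric operation handles the body, however large it is.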
Or, let's say we're setting up a TLS connection. The hosts authenticate with RSA or ECC. Then they negotiate a mutually supported symmetric cipher and agree on a shared session key with the Diffie-Hellman Ephemeral method or something similar. They then encrypt the data stream with the negotiated cipher and unique session key.
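The key agreement step in that exchange can be illustrated with a toy Diffie-Hellman sketch. The parameters are assumptions picked for readability (a 127-bit Mersenne prime and the base 3), far too small for real use, and real TLS stacks use standardized groups or elliptic curves and authenticate the exchange.

```python
import secrets

# Toy public parameters (assumption: tiny by modern standards, demo only).
P = 2**127 - 1   # a Mersenne prime
G = 3            # public base


def make_keypair() -> tuple:
    """Pick a fresh (ephemeral) private exponent and compute the public value."""
    private = secrets.randbelow(P - 3) + 2   # in the range 2 .. P-2
    return private, pow(G, private, P)


def shared_secret(my_private: int, their_public: int) -> int:
    """Combine my private exponent with the peer's public value."""
    return pow(their_public, my_private, P)


client_priv, client_pub = make_keypair()
server_priv, server_pub = make_keypair()

# Only the public values cross the network; both ends derive the same secret.
client_secret = shared_secret(client_priv, server_pub)
server_secret = shared_secret(server_priv, client_pub)
print(client_secret == server_secret)  # True
```

Because the private exponents are generated per connection and then discarded, each session gets its own key, which is the "Ephemeral" part of the name.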
The resulting hybrid cryptosystems are described as:
Asymmetric encryption protects the exchanges.
Symmetric encryption protects the data.
The exchanges in that description include the endpoint authentication and the key negotiation or agreement.
Which Asymmetric Cipher to Protect the Exchanges?
The short answer: Use RSA with at least a 2048-bit key, preferably 4096 bits, or ECC with a trusted curve and at least a 256-bit key. Skip to the next section if you don't want the RSA and ECC details.
The tradition for ages has been to use RSA. Elliptic curve cryptography, or ECC, is a more recent development. Both are based on "trapdoor" problems: the security comes from a math problem that is enormously difficult to solve, but for which it is relatively easy to verify a possible solution. RSA's security is based on the difficulty of factoring the product of two large prime numbers, and ECC's on the difficulty of solving the discrete logarithm problem over the points of an elliptic curve.
You need different key sizes for roughly equal resistance to attack. According to the NIST document Recommended Elliptic Curves for Federal Government Use, you need the following key sizes in bits, where each column pairs sizes of comparable strength. And that isn't a typo: it's really 521 and not 512.
|Elliptic Curve encryption|160|224|256|384|521|
|RSA (asymmetric encryption)|1024|2048|3072|7680|15360|
ECC is a category, so you must choose which curve to use. Available choices are defined in:
- Recommended Elliptic Curves for Federal Government Use
- SEC 2: Recommended Elliptic Curve Domain Parameters
- RFC 5639 ECC Brainpool Standard Curves and Curve Generation
For a while, it was believed that if a general-purpose quantum computer could be built, the factoring problem and thus RSA could be broken using Shor's algorithm, while ECC would be more resistant. More recently it has come to be expected that ECC would be more susceptible to breaking this way. In August 2015 the NSA announced that its Information Assurance Directorate "will initiate a transition to quantum resistant algorithms in the not too distant future" and encouraged academia and industry to work on post-quantum or quantum-resistant techniques. Its advice in the meantime: if you haven't already replaced RSA with ECC, don't bother. The NSA published the document with no warning or explanation, changed it a few times in the same way, and then took it off its site, so it now survives as an archive.org copy.
Which Symmetric Cipher Category? Block versus Stream
The symmetric ciphers used on the data come in two varieties, block and stream.
Stream ciphers partially emulate a one-time pad, which is the only perfect secrecy system. A one-time pad is perfectly secure if you do it right, but it is far too impractical for all but the most critical or the most trivial situations. You need a totally random key stream that is as long as the message and used only once. That bulky and sensitive key must be stored at each end. In most situations it makes much more sense to simply exchange the message itself in whatever out-of-band channel would have been used to distribute the key.
Practical stream ciphers provide far from perfect security, but for many years they seemed to provide an acceptable tradeoff between security and practicality. The shared secret key for a stream cipher generates a pseudorandom key stream. As long as we're careful about how we generate and use that pseudorandom stream, it will probably be secure enough for many purposes. Notice the use of "probably," "enough," and "many" in that sentence!
Stream cipher encryption and decryption are fast. Both operations are a simple XOR (or exclusive-OR):
At the sender:
cleartext XOR key → ciphertext
At the receiver:
ciphertext XOR key → cleartext
XOR can be done directly in hardware for optimal speed. We don't have to know in advance how long the stream will be, and we don't have to pad the data to any standardized length.
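The two XOR operations shown above are easy to try for yourself. This is a minimal sketch, assuming a key stream of random bytes as long as the message (in effect a one-time pad). The helper name `xor_bytes` is made up for the illustration.

```python
import secrets


def xor_bytes(data: bytes, key_stream: bytes) -> bytes:
    """XOR each data byte with the matching key-stream byte."""
    if len(key_stream) < len(data):
        raise ValueError("key stream must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key_stream))


cleartext = b"attack at dawn"
key_stream = secrets.token_bytes(len(cleartext))  # random, used only once

ciphertext = xor_bytes(cleartext, key_stream)   # at the sender
recovered = xor_bytes(ciphertext, key_stream)   # at the receiver
print(recovered)  # b'attack at dawn'
```

The same function serves both ends because XORing twice with the same key stream cancels out, and the ciphertext is exactly as long as the cleartext.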
Block ciphers, on the other hand, deal with data a block at a time. Block sizes are typically 32, 64, 128, or 256 bits, depending on the cipher. AES, for example, always uses 128-bit blocks whatever its key length. Where a cipher offers a choice, you want the largest block size for both efficiency and security unless you are doing something unusual. If the data isn't an even multiple of the block size, it's padded.
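Padding can be sketched in a few lines. The article doesn't name a padding scheme, so this example assumes the common PKCS#7 convention, where N bytes of value N are appended. The 16-byte block size matches a 128-bit block.

```python
BLOCK_SIZE = 16  # bytes, i.e. a 128-bit block


def pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes, each of value N, to reach a multiple of block_size."""
    n = block_size - (len(data) % block_size)
    return data + bytes([n]) * n


def unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Check and strip the padding added by pad()."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("length is not a positive multiple of the block size")
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


padded = pad(b"hello world")   # 11 bytes of data
print(len(padded))             # 16
print(unpad(padded))           # b'hello world'
```

Under this convention, data that already fills its last block still gets one full block of padding, so the receiver can always remove the padding unambiguously.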
For many years the conventional wisdom was:
"Use block ciphers on data known in advance like files, devices, and email messages, and use stream ciphers on stream-like data."
But in the 2000s and 2010s we discovered problems with our available choices of stream ciphers.
A block symmetric cipher can be operated in various modes, and the selection of mode depends on data characteristics and what security goals you are trying to achieve. Mode selection has become more important with recent developments. We have realized that some modes are less secure than expected when used in certain situations. The good news is that we have found some modes that give block ciphers stream-like performance characteristics that make them good replacements for the old stream ciphers.
Our Old Stream Ciphers Need Replacement
RC4 was designed in 1987, and it had a good run as the de facto standard cipher first for SSL and then for TLS. But a number of reports in February through May 2015 specified that it was time to retire RC4.
- RFC 7457: Summarizing Known Attacks on Transport Layer Security (TLS) and Datagram TLS (DTLS)
- RFC 7465: Prohibiting RC4 Cipher Suites
- RFC 7525: Recommendations for Secure Use of TLS and DTLS
There aren't many stream cipher choices. A5/1 and A5/2 have been used in GSM telephony, but A5/1 has severe weaknesses and A5/2 is even worse.
Salsa20 and ChaCha20 are our best current stream ciphers.
Another solution is to use a block cipher in a mode that gives it stream-like characteristics. This is what has been done for GSM telephony with the KASUMI cipher, also called A5/3. It's much better than the other GSM alternatives, although a 2010 paper reported an attack on the A5/3 cipher. (The not-so-bad news is that the attack may not work against the way A5/3 is used in GSM.)
KASUMI or A5/3 is used in telephony. What about the ciphers we use in operating systems and networking?
Block Cipher Modes
Block ciphers operate in a number of different modes. Just saying "Let's use AES" is only a starting point. How will you operate it?
The Wikipedia page on block cipher modes of operation provides a quick overview of some of the modes. If you prefer a government-authorized overview to a crowd-sourced one, see the U.S. NIST document SP 800-38A, "Recommendation for Block Cipher Modes of Operation". However, there isn't much more in the NIST version other than the 33 pages of test vectors and the official imprimatur.
If you really want to learn about this, see "Evaluation of Some Blockcipher Modes of Operation" by Phillip Rogaway at the University of California, Davis. It has 159 pages of detailed explanation and analysis, and as its title says, that's just for some of the more interesting modes. Don't be overly intimidated. The writing is refreshingly informal and very readable compared to most academic writing.
Cipher Block Chaining
For most of the data that most of us own, Cipher Block Chaining or CBC is the appropriate way to encrypt files in the broad sense of that word — actual files, email messages, or entire devices as in whole-disk encryption. You will see nomenclature like AES-CBC-256 (or AES-256-CBC) used to specify the cipher, this mode of operation, and a 256-bit key.
CBC is still considered secure for stored data, but we have seen many practical attacks demonstrated against CBC for network streams. The solution seems to be using a block cipher in a mode that gives it stream-like characteristics.
Is It Fair to Turn Block Ciphers into Stream Ciphers?
Some people want to have an argument over semantics at this point. If an encryption system has a block cipher like AES at its core, isn't it really a block cipher no matter how "stream-like" we use it?
Before we argue about whether this is somehow cheating, let's first consider just how purely stream-oriented the existing (but weak) stream ciphers really are. Yes, the perfectly secure (but almost perfectly impractical) One-Time Pad system operates, in theory, on one bit at a time. But practical stream ciphers operate on one byte at a time. Aren't those really 8-bit blocks?
Furthermore, practical hardware accelerators don't send the data and key streams through a single XOR gate one bit at a time. They manipulate bytes or even larger words. Within the operating system or an application, your CPU does XOR on 64-bit words.
Several of the block cipher modes effectively convert the block cipher into a stream cipher. The key primes the generation of a key stream that is XORed with the data stream.
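Here is a rough sketch of how a counter-style mode builds that key stream. One loud assumption: to stay within the Python standard library, a keyed SHA-256 hash stands in for the block cipher, so this is not AES and not a standard mode, only the same shape. The function names are hypothetical.

```python
import hashlib


def block_function(key: bytes, counter_block: bytes) -> bytes:
    """Stand-in for the block cipher (NOT AES): keyed SHA-256, 16 bytes out."""
    return hashlib.sha256(key + counter_block).digest()[:16]


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing data with a counter-generated key stream."""
    out = bytearray()
    for offset in range(0, len(data), 16):
        # Each 16-byte stretch of key stream comes from nonce + block counter.
        counter = (offset // 16).to_bytes(8, "big")
        stream_block = block_function(key, nonce + counter)
        chunk = data[offset:offset + 16]
        out.extend(c ^ s for c, s in zip(chunk, stream_block))
    return bytes(out)


key = bytes(range(32))          # fixed demo key; use a random key in practice
nonce = b"\x00" * 7 + b"\x01"   # must never repeat under the same key
message = b"Stream-like: no padding, any length."

ciphertext = ctr_xor(key, nonce, message)
print(len(ciphertext) == len(message))             # True
print(ctr_xor(key, nonce, ciphertext) == message)  # True
```

The block function only ever processes counter values, never the data itself, which is why no padding is needed and why the same routine both encrypts and decrypts.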
Yes, the data is encrypted or decrypted at up to 256 bits at a time. But consider that 256 bits means 32 bytes, just half the 64-byte minimum Ethernet frame size and far smaller than a practical disk I/O buffer size. The blockiness is far below the scale of both network and storage I/O.
Let's solve our security problems instead of worrying about semantics!
AES-GCM for TLS
Galois/Counter Mode or GCM has been proven to be secure when used with a strong block cipher, as long as you are careful to choose a unique initialization vector for every encryption done with the same key. NIST describes GCM in Special Publication 800-38D, "Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC".
Check your browser's settings. AES-GCM should be a preferred cipher for TLS.
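To see why the unique initialization vector matters, consider this small sketch. It is not GCM: a SHAKE-256 key stream stands in for the counter-mode encryption inside GCM, and it models only the confidentiality side. The keys, IVs, and messages are invented for the demonstration.

```python
import hashlib


def stream_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Stand-in for a counter-based mode: XOR with a SHAKE-256 key stream."""
    stream = hashlib.shake_256(key + iv).digest(len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


key = b"k" * 32
m1 = b"transfer $100 to Alice"
m2 = b"transfer $999 to Carol"

# Mistake: the same IV is used twice with the same key.
c1 = stream_encrypt(key, b"iv-0001", m1)
c2 = stream_encrypt(key, b"iv-0001", m2)
print(xor(c1, c2) == xor(m1, m2))   # True: the key stream cancels out

# With distinct IVs the key streams differ and nothing cancels.
c3 = stream_encrypt(key, b"iv-0002", m2)
print(xor(c1, c3) == xor(m1, m2))   # False
```

With a repeated IV, an eavesdropper who never learns the key still obtains the XOR of the two cleartexts, which is often enough to recover both.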
AES-CCMP for Wireless
Use WPA2 for wireless security. That includes the preferred AES-CCMP. NIST describes CCM mode in Special Publication 800-38C, "Recommendation for Block Cipher Modes of Operation: The CCM Mode for Authentication and Confidentiality". It's easy to say "Use AES-CCMP," but what does that mean?
That's the AES-CCM Protocol, where "CCM" means "Counter Mode with CBC-MAC", where "CBC" means "Cipher Block Chaining" and "MAC" means "Message Authentication Code." So AES-CCMP is [deep breath] AES in Counter Mode with Cipher Block Chaining Message Authentication Code Protocol. The real meaning of AES-CCMP is:
- AES is a strong block cipher.
- CBC is usually a reasonable choice for stored data, but there are some weaknesses in directly applying CBC encryption to streams, so here CBC is used only to compute the MAC and...
- Counter mode does the encryption, turning AES into a stream cipher that is both more secure and more efficient!
- The MAC makes this an authenticated cipher so you aren't whispering secrets to strangers.
- It's a protocol so it also defines how to set up a secure connection.
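The role of the MAC in that list can be shown with a short sketch. CCMP computes its MAC with AES in CBC-MAC form. This example assumes HMAC-SHA256 from the Python standard library as a stand-in, so it demonstrates the idea of a keyed authentication tag and not the CCMP construction itself. The key and frame contents are invented.

```python
import hashlib
import hmac


def tag(mac_key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag (HMAC-SHA256 standing in for CBC-MAC)."""
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def verify(mac_key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Constant-time check that the tag matches the message."""
    return hmac.compare_digest(tag(mac_key, message), received_tag)


mac_key = b"shared-secret-key-for-the-demo!!"
frame = b"encrypted frame contents"
frame_tag = tag(mac_key, frame)

print(verify(mac_key, frame, frame_tag))                 # True
print(verify(mac_key, frame + b"tampered", frame_tag))   # False
print(verify(b"wrong key", frame, frame_tag))            # False
```

Only someone holding the shared key can produce a tag that verifies, so a receiver can reject frames that were modified in transit or sent by a stranger.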
Check your wireless settings and make sure you are using WPA2, a.k.a. 802.11i. That implies AES-CCMP. Here is an example of doing that on Linux.
    # iwconfig
    wlp10s0u1  IEEE 802.11bg  ESSID:"FBI_van4"  Nickname:"rtl_wifi"
              Mode:Managed  Frequency:2.437 GHz  Access Point: 00:1D:7E:2E:97:86
              Bit Rate:54 Mb/s   Sensitivity:0/0
              Retry:off   RTS thr:off   Fragment thr:off
              Encryption key:****-****-****-****-****-****-****-****   Security mode:open
              Power Management:off
              Link Quality=100/100  Signal level=100/100  Noise level=0/100
              Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
              Tx excessive retries:0  Invalid misc:0   Missed beacon:0
    # wpa_cli status
    Selected interface 'wlp10s0u1'
    bssid=00:1d:7e:2e:97:86
    ssid=FBI_van4
    id=0
    mode=station
    pairwise_cipher=CCMP
    group_cipher=CCMP
    key_mgmt=WPA2-PSK
    wpa_state=COMPLETED
    ip_address=192.168.1.101
    address=08:60:6e:63:7b:80
It says WPA2-PSK, meaning Pre-Shared Key, because I haven't yet gotten around to setting up a RADIUS server and creating key pairs and digital certificates to do full 802.1X or Network Access Control. Here is an example from OpenBSD using the same wireless LAN:
    # ifconfig run0
    run0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            lladdr 00:a1:b0:c0:74:50
            priority: 4
            groups: wlan
            media: IEEE802.11 autoselect (DS1 mode 11g)
            status: active
            ieee80211: nwid FBI_van4 chan 6 bssid 00:1d:7e:2e:97:86 43dBm wpakey <not displayed> wpaprotos wpa1,wpa2 wpaakms psk wpaciphers tkip,ccmp wpagroupcipher tkip
            inet 192.168.1.103 netmask 0xffffff00 broadcast 192.168.1.255