Ciphers — Encryption and Decryption
Encrypting and Decrypting with Ciphers
We need to protect the confidentiality of sensitive data, so we must encrypt the data before storing or transmitting it. Ciphers are the algorithms (that is, the detailed methods) used to do the work. They once were pencil-and-paper methods, then they became electromechanical, and now they are computer programs. A cipher always works in the same general way, but a key modifies the details so that every file or message can be protected in its own unique way. Here we're using Ke to encrypt, and then Kd to decrypt.
The process starts with the plaintext that is to be protected. The plaintext was limited to very simple text for most of the history of cryptography. With the creation of electrical machine cryptography, the plaintext was often limited to being expressed in a very constrained alphabet — no digits, just letters, sometimes with relatively unneeded characters left out (Q, or J in some languages), and with no distinction of upper versus lower case. The plaintext alphabet was usually more constrained than the Morse code or other telegraphy used to transmit the ciphertext.
With the cultural shift in how people use computer network communication, the plaintext these days is often non-text data — multimedia such as image, sound, and video files; complex and bulky non-ASCII file formats such as those used by Libre Office and Microsoft Office; or even an archive containing an entire directory structure of multiple files of various data types.
However, we do not care about the internal format, if any, and the intended use. It's all simply plaintext to be protected.
The encryption operation uses some cipher algorithm E that always functions with the same general principles. However, its operation in this case is modified by the encryption key, Ke.
The resulting ciphertext is then transmitted across a network or stored on some media. There's always a possibility that an attacker will get a copy of the ciphertext.
At the far end of the communication channel, or in the future when we want to recover the plaintext from the stored ciphertext, a corresponding decryption operation D applies the appropriate reverse algorithm, as directed by the decryption key, Kd.
Auguste Kerckhoffs mathematically proved that the security of a cryptosystem must not depend on keeping its algorithm secret [See his article "La cryptographie militaire", in Journal des sciences militaries, vol IX, pp 5-38, Jan 1883]. That means that an attacker will know the details of all the useful ciphers, and will be able to guess which one we are using. If we chose a good cipher algorithm E, we can feel comfortable about the security as long as:
- We choose our keys wisely — an attacker has no advantage beyond simply guessing as to what the needed decryption key Kd might be. The keys would appear to be random strings of bits.
- We maintain the secrecy of keys — the difficulty is getting the decryption key to the remote receiver, or safely storing it until needed to decrypt stored data.
- We realize that we only have security for a limited amount of time — there is a finite number of possible keys, and an attacker could try all possible keys while monitoring the output for data that appears to be plaintext.
So:
Choose a cipher with a large enough key space (that is, total number of possible keys) to keep an attacker fruitlessly searching until the plaintext is no longer sensitive.
Choose a cipher that, as far as we know now, has no flaws leaking partial information about the key or allowing an attacker to eliminate part of the brute-force search through the key space.
Realize that captured ciphertext never gets stronger over time, but cryptanalytic knowledge gets better and computers get much faster.
Initialization Vectors
Let's say that you encrypt some plaintext using today's key.
Maybe this is a routine report sent on a fixed schedule:
ALL QUIET ON THE WESTERN FRONT.
Later in the day, therefore using the same key, you need
to encrypt the same plaintext message.
The problem is that the two ciphertexts will be identical.
An attacker could use traffic analysis to spot the identical ciphertexts and correlate that with observations. This was used by the U.S. Navy cryptanalysis effort during World War II in the Pacific Theater to derive information about planned Japanese ship movements when the cipher or code itself could not yet be broken. Joseph Rochefort and his team could tell what a Japanese ciphertext meant before they could tell what the corresponding plaintext literally said.
The fix to this is to add an initialization vector or IV to the plaintext to randomize the input to the encryption operation. Ciphers can be used in various chaining modes, where the operation applied to some block of input depends not only on the key but also on the preceding block of input (plaintext) or output (ciphertext). The IV therefore primes the cipher, getting it into some random state before encrypting that identical plaintext with today's fixed key.
The IV needs to be different for every message sent, but it does not need to be kept secret in a well-designed cryptosystem. So, if messages need to be sent no more often than once per second, a useful IV might be the number of seconds elapsed since some epoch, perhaps since 00:00:00 UTC 1 January 1970 (as time is reckoned in Unix), or since 00:00:00 UTC 1 January 2000. Or, if the messages to be protected are rapidly streamed network packets, the number of microseconds elapsed since midnight today.
Substitution and Transposition
Substitution ciphers replace letters, or bit patterns, or words. A trivial example is the Caesar Cipher, which replaces each letter with the letter N positions later or earlier in the alphabet. For a shift of +3, A becomes D, B becomes E, C becomes F, and so on.
Cleartext:
THE CAESAR CIPHER IS TRIVIAL
Ciphertext for N = +3:
WKH FDHVDU FLSKHU LV WULYLDO
The monoalphabetic substitution cipher has an
enormous name although it is only slightly less trivial
and almost as weak.
You arbitrarily rearrange the alphabet to create the mapping
between cleartext and ciphertext.
You have the illusion of security if you only think about
the number of possible mappings:
26! = 1×2×3×...×25×26 =
403,291,461,126,605,635,584,000,000
But, if you know a little about the statistics of the cleartext language, you can start to break these with only about 30 characters of ciphertext. See Edgar Allan Poe's story The Gold-Bug for a good explanation of how to do this, and Arthur Conan Doyle's The Adventure of the Dancing Men for a rather muddled attempt.
Polyalphabetic substitutions like those of Alberti (1467), Trithemius (ca. 1500), and Vigenère (1585) can be broken with pencil and paper attacks known since the mid 1800s.
Transposition ciphers rearrange
letters, or words, or bits.
For example, let's say that your cleartext is:
CONFIDENTIALITY IS IMPORTANT BUT IT CAN BE DIFFICULT TO ACHIEVE
and your key is:
SECRETKEY
Write down the key, deleting any repeated characters like
the second and third "E" in this example:
SECRTKY
Then write the message in by rows, padding with random
characters as needed to fill the last row:
SECRTKY
CONFIDE
NTIALIT
YISIMPO
RTANTBU
TITCANB
EDIFFIC
ULTTOAC
HIEVEJW
Now take the ciphertext out by columns ordered by the
letters of the key: "C" first, then "E", "K", "R",
"S", "T", "Y":
NISATITE OTITIDLI DIPBNIAJ FAINCFTV CNYRTEUH
ILMTAFOE ETOUBCCW
Break it into groups of five characters each so your
Morse code
operators can accurately communicate the ciphertext to the
other end.
Pad again as needed to fill the last group:
NISAT ITEOT ITIDL IDIPB NIAJF AINCF TVCNY
RTEUH ILMTA FOEET OUBCC WADSH
That may look as if it would be much more secure, but transposition like this is also quite easy to break. Helen Gaines wrote Cryptanalysis in the 1930s. It explains how to use just pencil and paper to solve pure transposition like this example, as well as combinations of transposition and polyalphabetic substitution.
Today's ciphers use several rounds of combined substitution and transposition on bits.
Symmetric vs Asymmetric Cryptography
Symmetric means that something is the same on both sides, while asymmetric means the opposite, the two sides are different.
For symmetric cryptographic algorithms, Ke = Kd. You are free to pick any appropriately long string of bits to function as the single key for an encryption and the following decryption. Once you have picked an encryption key, you must use exactly that key to decrypt. Unlike what you see in movies and TV shows, you can't tell if you are getting close to guessing the decryption key as long as you are using strong ciphers. Off by just one bit, and the output will be as meaningless as if half the bits had been wrong.
Real-world studies of how weak human-chosen keys and PINs are
However, beware: some keys might be obvious guesses,
and some algorithms have "weak keys", certain types of
key patterns that allow enough information to leak into
the ciphertext to make it easier to guess the key.
Weak keys are often those with regular patterns — lots
of consecutive 1s or 0s, or a repeated pattern like
0101010101
or 00110011
.
The keys need to be hard to guess,
and so they should look very random.
Humans are very bad at generating randomness.
There have been some interesting
studies of PIN and password patterns.
For asymmetric cryptographic algorithms,
when you encrypt with some K1
you must decrypt with a completely different K2
The keys are generated as pairs.
You must use the correct math trick to generate a
key pair: K1,K2.
You can use either key to encrypt, and then you must decrypt
using exactly the other key of the pair.
For useful asymmetric cipher algorithms, it is extremely difficult to derive one key from the other.
Using Asymmetric-Key Cryptography
Each person has generated an asymmetric key pair. They call one of their pair of keys the public key and the other the private key. They must treat those keys as their names suggest. The private key must be kept secret, while the public key must be available to everyone else.
Now two people want to communicate with each other. Each of them knows three keys — their pair, and the public key of the other. What do they do? They will encrypt the message with one or more keys, but exactly which key or keys depends on what they want to accomplish:
Method 1 — Sender's private key only — Since the sender's public key is the only decryption key producing a coherent message, the receiver can trust that the sender sent it. Since slight changes in input make big changes in output, this also must have been the same message originally sent. But anyone can read it.
Method 2 — Receiver's public key only — Since the receiver's private key is the only decryption key producing a coherent message, the receiver can trust that no one else could have read it. But anyone could have sent it.
Method 3 — Sender's private key and receiver's public key — The receiver can trust that no one else could read it, that the sender must have sent it, and that it wasn't altered in transit.
How to Use Asymmetric-Key Cryptography to Accomplish Various Security Goals
Let's look at this the other way around: start with the goal and see what is needed.
I want to: |
Send a message in such a way that I could
prove: "I was the sender and this is precisely the message that I sent." |
I must: | Digitally sign the message, see a later page for the details. |
I need: | My private key. |
I want to: | Send a message which only the intended receiver can read. |
I must: | Encrypt the message. |
I need: | The receiver's public key. |
I want to: | Read a message which was sent so that only I can read it. |
I must: | Decrypt the message. |
I need: | My private key. |
I want to: | Verify the identity of the sender and the integrity of the received message. |
I must: | Verify the digital signature. |
I need: | The sender's public key. |
Quantum Key Distribution
When we use asymmetric encryption, our security relies on our assumption that certain classes of mathematical problems are very difficult to solve. Discovering the key needed to decrypt the message would require the attack to factor a number with hundreds of digits (RSA) or solve a discrete logarithm problem (ECC). We don't mind making one key of a pair public if it would be hard enough to discover the other.
With symmetric encryption, we assume that an N-bit key cannot be guessed from the 2N possibilities before we no longer care about keeping the secret. But, we must agree on a shared secret.
Quantum Key Distribution or QKD is a physics-based method to securely communicate a randomly selected symmetric key to the other party. This involves either polarized single photons in the BB84 protocol, or entangled photon pairs in the E91 protocol. For a discussion of its security, see "The security of practical quantum key distribution", Reviews of Modern Physics, 81, pg 1301, 29 Sep 2009 (also here).
For a rebuttal of its supposed absolute security, see Dan Bernstein's paper: "Is the security of quantum cryptography guaranteed by the laws of physics?" Quantum physics may protect the key signals mdash; the polarized single photons or the entangled photon pairs — but the endpoints can still be observed. Electromagnetic emanations radiate from both generating and receiving ends of a quantum link. Electromagnetic shielding, masking noise, and physical distance can increase the difficulty for the eavesdropper. But no Faraday cage is perfect.
More recent practical implementations include:
Progress in satellite quantum key distribution Satellite-to-ground quantum key distribution Metropolitan quantum key distribution with silicon photonics The quantum internet is already being built China's massive quantum-secure network is officially online China team sends quantum keys over commercial networks Quantum fiber network to launch in August China’s 2,000-km Quantum Link Is Almost Complete
QKD uses single-photon signaling between the legitimate users at either end of a fibre link. An intruder could tap the fibre and monitor the key exchange, but that would corrupt the signal and warn both ends that the key was compromised and must not be used.
Many people think it sounds like science fiction or at least impractical, but it's been in use since 2004. In that year it was used to protect a transaction between the Vienna City Hall and the headquarters of Bank Austria Creditanstalt.
That seems to me to have been a proof of concept organized by the city to support development in a local university. But since 2007 the government of Geneva, Switzerland, has used QKD to protect election results.
In 2007, China began deploying a QKD network in Beijing. A larger network with over 45 nodes began operating in Hefei in 2010. Another large network began operating in Jinan in 2014, joining 28 organizations, including government agencies such as the China Banking Regulatory Commission plus commercial customers including the China Industrial and Commercial Bank, and the Xinhua News Agency. In 2016 China was planning to complete the world's longest QKD network with a 2000-kilometer backbone joining Beijing, Jinan, Hefei, and Shanghai.
In 2010, Japan began operating a QKD network developed by NEC, Toshiba, NTT and others and joining several nodes in Tokyo.
In 2013, Battelle Memorial Institute deployed a QKD network joining its headquarters in Columbus, Ohio, to a manufacturing facility 20 miles away.
In 2015 ETSI, the European Telecommunications Standards Institute, published a number of quantum key distribution specifications.
In 2016, a South Korean QKD network began operating between Seoul, Bundang, and Suwon.
In 2016, China was working on QKD between satellites and ground stations. See "China's quantum space pioneer: We need to explore the unknown", Nature, 13 Jan 2016.
In 2017, China demonstrated QKD between orbit and the ground in their Liángzĭ kēxué shíyàn wèixīng or Quantum Experiments at Space Scale. The Chinese Academy of Sciences operates the Micius satellite and the Chinese ground stations, the University of Vienna and the Austrian Academy of Sciences run the European ground stations.
Wikipedia Science Nature BBC Reuters CNBC
By 2020, the range was extended to over 1,100 kilometers.
ID Quantique in Geneva, Switzerland, sells QKD systems plus random-number generators based on quantum physics.
Quantum Key Distribution versus Quantum Cryptanalysis
Quantum Key Distribution is used by defenders, to protect secrets.
Quantum Cryptanalysis is used by attackers, to expose secrets.
If you generate a 256-bit symmetric key using quantum physics, and then distribute it with QKD so you know that there was no eavesdropping, you know that the attacker has no advantage in searching the space of 2256 possible keys beyond what is exposed by any weaknesses in the cipher algorithm.
Asymmetric ciphers, on the other hand, base their security on trap-door math problems. These are problems that are believed to be impractical to solve, but for which it is relatively easy to verify whether a proposed solution is correct or not. Examples including factoring a number with hundreds of digits (RSA) or solving a discrete logarithm problem (ECC).
Algorithms have been devised to solve some problems when run on a general-purpose quantum computer — Shor's algorithm for factoring, or Grover's algorithm for solving a black-box problem like finding an unknown symmetric key. Shor's algorithm could solve the factoring problem in polynomial time, an enormous speed-up. This isn't new, Peter Shor first published his algorithm in 1994, he published an update in 1995.
So far, no one has explained just how
to build a general-purpose quantum computer
with enough qubits to attack real-world problems.
Not yet. Meanwhile:
"Quantum Computing in the NISQ Era and Beyond"
John Preskill
In 2015–2016 NSA and NIST worried that such a device might be built in the next ten to twenty years and called for the community to work harder on post-quantum cryptography.
Post-Quantum Cryptography
The current worry is that a general-purpose quantum computer could quickly solve the trap-door problems. In August, 2015, the NSA announced that ECC wasn't a backup for RSA when facing the threat of quantum computing cryptanalysis, to the point that government agencies and contractors considering a migration from RSA to ECC shouldn't bother. They later modified the page, and later yet they took it down, see the archived update here.
We need post-quantum or quantum-safe or quantum-resistant asymmetric ciphers. Several families of post-quantum cipher algorithms are being explored: lattice-based cryptography, code-base cryptography, multivariate polynomial cryptography, and others. See the Post-Quantum Crypto conference series for details.
Note that the first two URLs below are secured with a digital certificate not installed in public distributions of browsers. You can get there by clicking through the warnings. You might want to install the DOD certificates in your browsers.
Commercial National Security Algorithm (CNSA) Suite
and Quantum Computing FAQ
Changes to CNSA Suite and Quantum Computing Policy
Introduction to Post-Quantum Cryptography
NIST Report on Post-Quantum Cryptography
"Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer", Peter W. Shor
"Cybersecurity in an era with quantum computers: will we be ready?", Michele Mosca
"Strengths and Weaknesses of Quantum Computing",
Charles H. Bennett, Ethan Bernstein, Gilles Brassard, and Umesh Vazirani
"Quantum Safe Cryptography and Security:
An Introduction, Benefits, Enablers and Challenges",
European Telecommunications Standards Institute
"Quantum resistant public key cryptography: a survey",
Ray Perlner and David Cooper
"Post-quantum RSA",
Daniel J. Bernstein, Nadia Heninger, Paul Lou,
and Luke Valenta
"Quantum Computing in the NISQ era and beyond",
John Preskill
"We've Entered a New Era of Quantum Computing",
Gizmodo overview of Preskill's paper
"Post-quantum RSA" discusses 1-terabyte RSA keys. Key generation required that they first generate a set of 231 unique 4096-bit primes congruent to 2 mod 3. Prime generation took 1,975,000 core-hours, which took four months using spare compute capacity of a 1,400-core cluster.
Once they had the 231 4096-bit primes, they started generating the 1 TB key. Their implementation used an Ubuntu system with 24 cores running at 3.40 GHz (4 Intel Xeon E7-8893 v2 processors), 3 TB of DRAM, and 4.9 TB of swap memory on enterprise solid-state drives. It took about four days to generate the 1 TB key.
The actual encryption took only a few hours.
Post-quantum asymmetric cryptography would be used only to encrypt a randomly generated session key that will be used with a symmetric cipher. Symmetric ciphers are thought to be relatively safe against quantum computing attacks. Grover's algorithm on a quantum computer would speed up the search for a symmetric key, but only to the point that keys would need to be doubled in length to preserve the same resistance to attack.
Moving On
On to the next page: One-Time PadsAt this point, you may be ready to move on to the next page, which is about one-time pads.
Specific Ciphers and Key LengthBut to dig deeper, scroll down to read about identity-based encryption, where to learn more about the available ciphers, and how AES and RSA work.
See a later page for specific ciphers and their key lengths.
Identity-Based Encryption
Identity-Based Encryption (IBE) is asymmetric cryptography where the public key is based on some unique public information about the user. Perhaps an electronic mail address, or a telephone number including country code.
There must be a completely trusted third party called the Private Key Generator (PKG). The PKG has published a master public key and retained the corresponding master private key, also called simply the master key.
Any party can generate any other party's public key using the master public key and the public information about the other party.
Only the PKG can generate the corresponding private keys using the master private key and the public information about parties.
Advantages:
- Parties may encrypt messages or verify signatures without distributing public keys.
- There is inherent key escrow, because any private key can be generated by the PKG.
Disadvantages:
- All security relies on the security of the PKG, because access to the master private key is effectively access to all parties' private keys.
- A secure channel is needed to communicate the private key to each party.
- The single central PKG may be overworked in very large systems.
-
There is inherent key escrow,
because any private key can be generated by the PKG.
(Yes, this appears as both an advantage and disadvantage — key escrow can be both)
For more information:
- Adi Shamir's paper, "Identity-Based Cryptosystems and Signature Schemes", Advances in Cryptology: Proceedings of CRYPTO 84, Lecture Notes in Computer Science, 7:47-53, 1984.
- Clifford Cocks, "An Identity Based Encryption Scheme Based on Quadratic Residues", Proceedings of the 8th IMA International Conference on Cryptography and Coding, 2001.
- The Wikipedia page on IBE.
Amazon
ASIN: 1119096723
Amazon
ASIN: 0849385237
Getting the Ciphers
For an enormous list, see Applied Cryptography by Bruce Schneier. The book contains source code for many serious cryptographic algorithms.
For something more mathematically rigorous, see the CRC Handbook of Applied Cryptography by A. Menezes, P. van Oorschot and S. Vanstone. A free PDF version can be downloaded for personal use.
How AES Works
Jeff Moser has written a great explanation of the history and operation of the Advanced Encryption Standard, A Stick Figure Guide to the Advanced Encryption Standard (AES).