Rotors of M-209 cipher machine.

How RSA Works

Using RSA, an Asymmetric Cipher

Jump ahead to see how the RSA cipher works

Asymmetric ciphers like RSA or Rivest Shamir Adelman, and ECC or elliptic-curve cryptography are intended for session key agreement and for digital signature creation and verification.

They are not suited for encrypting the data itself — email messages or files or database contents or data streams or archives — whatever the sensitive data content may be.

Instead, asymmetric ciphers work on keys and hash outputs. Their plaintext and ciphertext are typically 256 bits long, and no more than 512 bits long — 32 to 64 bytes, maximum.

A digital signature provides Proof of Content and Proof of Origin.

Proof of Content comes from the hash function. Even a single-bit change in the content should lead to approximately 50% of the hash output bits changing. The receiver knows the message content did not change.

Proof of Origin comes from the asymmetric encryption and decryption. When the receiver decrypts the signature using the purported sender's public key, and finds a match to the message hash, this verifies that the signature must have been created with the sender's private key. The receiver knows who sent the message.

Either RSA or ECC could be used to encrypt and create the digital signature, and then to decrypt and verify it.

Let's see how RSA is used in digital signatures, and then we'll see how the cipher works.

How asymmetric encryption provides both confidentiality and sender authentication.

Follow the steps in the below procedure.

How Digital Signatures Work

Alice has a message for Bob. Alice wants to make sure that Bob can verify that the message came from Alice and has not been modified.

  1. Alice creates the message. If confidentiality is also needed, the message should be encrypted before continuing.
  2. Alice calculates the hash of the message, typically using SHA-2-256.
  3. Alice then encrypts the hash, using RSA or another asymmetric cipher. The encryption key is Alice's private key. No one else knows that key, so only Alice can do this operation in that specific way. The ciphertext output is the digital signature.
  4. The message and digital signature are transmitted to Bob. They could be two components of an email message, two files combined into one archive, two separate files to download from a server, as long as Bob ends up with both pieces. Metadata added to the signature explains which hash function, which asymmetric cipher, and the identity of the creator.
  5. Bob verifies the digital signature. The first step is calculating the hash of the received message, which is suspected of being modified, or being sent by someone masquerading as Alice, or both.
  6. Bob then decrypts the digital signature. The decryption key is Alice's public key. Bob needs to be certain that what he is using is really Alice's public key.
  7. If Bob finds that the received message hash is identical to the signature decryption output, he concludes that the message was not changed (because of the hash being so sensitive to even a single-bit change), and it really came from Alice (because decrypting with what he knows to be Alice's public key implies encryption with Alice's private key, which no one else knows).

Public-Key Infrastructure and Digital Certificates

The practical difficulty is that you must be absolutely certain that what you have really is the other party's public key.

Bob and Alice could meet face-to-face and exchange public keys. However, this doesn't scale beyond a few local acquaintances with plenty of free time.

Instead, everyone's public key should be in the form of a digital certificate. That's a data structure containing information about the owner or "subject" (name, address, email address, URL, etc), their public key, and the name of the certificate issuer. We call the issuer the Certificate Authority or CA. All of certificate data structure is wrapped in a digital signature created by the CA, so the content and the issuer identity can be verified.

Public CA
Failures

Everyone must completely trust the CA as an issuer of credentials — a trusted introducer, if you will. The CA says things like "Alice exists" and "The following is Alice's public key", and everyone believes it. This is a completely non-technical human trust issue, confidence that the CA will never makes errors or tells lies. Sometimes this goes wrong.

Everyone then needs a copy of the CA's public key, which will be in the form of a digital certificate that they issued to themselves. This is a completely technical issue, confidence in the mathematics and logic meaning that the cryptography is strong enough.

CA/Browser
Forum

All this is built into web browsers and email tools. The makers of Chrome, Firefox, and other browsers have decided that everyone should trust a group of root-level CAs such as Comodo, DigiCert, VeriSign, and others, and so their certificates are included in the browsers. Similarly, the Mozilla organization's Thunderbird and other email tools contain certificates.

Any organization can create its own PKI or Public-Key Infrastructure. They would establish an in-house CA, and install its certificates in all of the organization's browsers and email tools.

Supporting Encryption with RSA

Another possible use for asymmetric ciphers would be to agree on a shared secret session key. That key would be used with AES or another symmetric cipher to encrypt some data.

Consider the above situation and diagram, where Alice is sending a message to Bob. If the message is sensitive and should be encrypted:

  1. Alice obtains a copy of Bob's public key. If it's in the form of a certificate from a CA that Alice trusts, then Alice can verify that it really is Bob's.
  2. Alice generates a random 256-bit key to be used only for this one message or transferred file.
  3. Alice creates a short message containing that key, saying something like "Let's encrypt the data using the AES cipher in CBC mode using key 0x34a09cc1eff4832...", and then encrypts that message using Bob's public key. And really, only the session key itself needs to be encrypted. If we think we have to hide our choice of cipher and mode, then apparently we think we're using a weak choice, and Kerchoff's principle tells us that we should instead use a cipher without a known or suspected flaw. Claude Shannon reached a similar conclusion in the 1940s.
  4. Alice encrypts the data itself according to the message.
  5. Alice sends the asymmetric-encrypted key message combined with the symmetric-encrypted data to Bob. This is hybrid cryptography, combining symmetric and asymmetric. In the scenario of the above diagram, that hybrid-encrypted message would be accompanied by its digital signature.
  6. Only Bob has a copy of Bob's private key. And so only Bob can decrypt the short key message. That explains how to decrypt the data, and so Bob decrypts and reads the file or message.

Ephemeral Keys and Perfect Forward Secrecy

The above explains how you could use RSA to agree on a shared secret session key. However...

Utah Data Center

Consider that your adversary might be recording every ciphertext message transmitted, in the hopes of eventually decrypting something.

If your long-term private key is ever exposed, then every session key created using the above procedure is exposed, and all the messages can be decrypted.

You instead should insist on only using ephemeral session keys. An ephemeral key is used only once, and if your private key is exposed some day, that can't be used to figure out what the ephemeral key was.

I would have called it "Reverse Secrecy", because it protects secrets going back into the past. But they didn't ask me.

Using nothing but ephemeral keys provides Forward Secrecy, also called Perfect Forward Secrecy or just PFS. The concept is: "A breach today does not expose secrets protected in the past."

Diffie-Hellman key agreement ECDHE, or Elliptic-Curve Diffie-Hellman Ephemeral key agreement

The Diffie-Hellman algorithm lets two parties securely agree on an ephemeral shared secret key, this is called DHE. Classic Diffie-Hellman calculates exponential powers of large integers followed by modulo operation. ECDHE or Elliptic-Curve Diffie-Hellman Ephemeral key agreement can run much faster.

TLS 1.3 requires PFS and ephemeral keys agreed upon by either DHE or ECDHE.

However, there is still use for RSA! You don't want to exchange secrets with strangers. The key agreement stage needs authentication to prevent Man-in-the-Middle or MitM attacks. So, you combine DHE or ECDHE for the key negotiation with RSA or an elliptic-curve method for authentication.

So, let's see how RSA works.

How RSA Works

Setting up RSA keys

  1. Randomly select two distinct large prime numbers \( p \) and \( q \). These should be large numbers, similar in magnitude but differing in length by a few digits to make factoring even more difficult. For example, in base 10, a 615-digit prime \(p\) and a 617-digit prime \(q\) for a 1232-digit \(n\), approximately \( 2 ^ {4096} \) and thus a 4096-bit modulus or key size. Yes, the numbers become very large.
  2. Compute: \( n = p \cdot q \)
    \( n \) will be used as the modulus for both the private and public keys.
    Security comes from:
    • The difficulty of factoring the product of two large prime numbers.
    • The ambiguity introduced by the modulo function.
  3. Compute \( \lambda (n) \), Carmichael's totient function for \(n\). That's equal to the least common multiple of \( (p-1) \) and \( (q-1) \):
    \( \lambda (n) = l.c.m.(p - 1, q - 1) \)
    You can compute \( l.c.m.(a, b) \) by first computing the greatest common divisor:
    \( l.c.m.(a, b) = \frac{ | a \cdot b | }{ g.c.d.(a, b) } \)
    and so:
    \( l.c.m.(p-1, q-1) = \frac{ | (p-1) \cdot (q-1) | }{ g.c.d.((p-1), (q-1)) } \)
  4. Choose an integer \( e \) such that:
    • \( 1 < e < \lambda (n) \)
    • \( e \) and \( \lambda (n) \) are coprime, they share no factors other than 1. A popular choice is \( e = 2^{16} + 1 = 65537 \)
      Some applications choose smaller \( e \) such as \( 3 \), \( 5 \), or \( 17 \), to make encryption and signature verification faster on small devices like smart cards. However, small values of \(e\) are less secure in some settings.
  5. Compute \( d \) to satisfy the congruence relation:
    \( ( d \cdot e )\mod{\lambda (n)} = 1 \)
    or, put another way:
    \( d \cdot e = 1 + k \cdot \lambda (n) \)
    for some integer \( k \).
    Also note that in computing \( d \) in steps 3-5 you could use the Euler totient function:
    \( \varphi (n) = ( p - 1 ) \cdot ( q - 1) \)
    in place of:
    \( \lambda (n) = l.c.m. ( p - 1, q - 1) \)
    as it would be a common multiple instead of the least common multiple.
Number Theory, by George E. Andrews
Amazon 0486682528

The exponent \( e \) and modulus \( n \) make up the public key.

\( d \) is the private key.

The first few chapters of Number Theory by George Andrews can give you plenty of background on the mathematics.

The first 29 pages, meaning the first two chapters, takes you through the Fundamental Theorem of Arithmetic, proving that every number has one unique factorization. Continuing on through page 114, through chapter 8, takes you through much more — Fermat's Little Theorem; Wilson's theorem; congruences; residue systems; the Chinese Remainder Theorem; Euler's totient function \(\varphi(n)\); the number \(d(n)\) of divisors of \(n\); the sum of those divisors \(\sigma(n)\); primitive roots; prime numbers and \(\pi(x)\), the number of primes that do not exceed \(x\); proof that there are infinitely many prime numbers; and some unsolved problems about primes. Not that you really need to go through any of that to follow how RSA works.

For a trivial example with very small prime factors, let's use: $$ \begin{aligned} p &= 5 \\ q &= 11 \\ n &= p \cdot q = 55 \\ \varphi ( n ) &= (p - 1) \cdot (q - 1) \\ & = 4 \cdot 10 \\ & = 40 \\ \lambda ( n ) &= l.c.m. (p - 1, q - 1) \\ &= l.c.m. (4, 10) \\ &= 20 \end{aligned} $$

We need: $$ 1 < e < \lambda (n) $$ where \(e\) and \( \lambda (n) \) are coprime, and so: $$ e \in \{3, 7, 9, 11, 13, 17, 19\} $$

\( d \) is defined by: $$ ( d \cdot e )\mod{20} = 1 $$ So, the possible key pairs are:

public
\( n, e \)
private
\( d \)
55, 37
55, 73
55, 99
55, 1111
55, 1317
55, 1713
55, 1919

Using the keys for RSA encryption and decryption

  1. Sender obtains receiver's public key \( (n,e) \).
  2. Sender converts message M into a number \( m < n \) with an agreed-upon reversible padding scheme. Since the point is usually to agree on a 256-bit symmetric session key, or to encrypt a SHA-2-256 hash value to create a digital signature, the typical message is a 256-bit unsigned integer, a number in the range \( \{ 0, 1, 2, ..., 2^{256} - 1 \} \)
  3. Sender computes ciphertext \( c \) using the receiver's public key \(e\) and the modulus \(n\) by:
    \( c = ( m^e )\mod{n} \)
  4. Sender transmits ciphertext to receiver.
  5. Receiver calculates cleartext by using their private key \(d\) and the modulus \(n\):
    \( m = ( c^d )\mod{n} \)
  6. Receiver reverses padding function to recover message M from cleartext \( m \).

As an almost as trivial example:

The number \(855^{2753}\) is enormous, approximately \( 5 \times 10^{8071} \), but you can do these calculations with the bc command. Leave off the common -l option so it will be set up for large integer arithmetic rather than floating-point:

$ bc
(123^17) % 3233
855
(855^2753) % 3233
123 

Post-Quantum Cryptography

Asymmetric ciphers rely on "trap-door problems", math problems that are enormously more difficult to solve in one direction than the other. RSA uses the factoring problem, elliptic-curve cryptography relies on the difficulty of the discrete logarithm problem.

If an attacker could factor an RSA modulus \(n\) into its prime factors \( p \cdot q \), they could use the public key \(e\) to derive the private key \(d\). We are gambling on two assumptions to keep us safe.

First, that factoring is and will remain too difficult — factoring \( n = p \cdot q \) would always take too long. Either the attacker would get frustrated and give up, or if they did keep at it until they succeeded, it would have taken so long that we no longer care about those secrets. Factoring and prime numbers have fascinated mathematicians since Ancient Greece, so we aren't too worried about a sudden mathematical discovery.

Second, that no side-channel attack will be discovered — no one will figure out an alternative method to find the private key \(d\) without having to factor \( n = p \cdot q \). RSA is intentionally simple enough that everyone feels reasonably comfortable about this assumption. But you can't prove the non-existence of this type of attack.

Peter Shor's initial publication of his algorithm, 1994 An update to Shor's original paper, 1995 Increasing the speed of Shor's algorithm, 1998

There is, however, a growing third worry. Algorithms have been devised to solve some enormously difficult problems when run on a general-purpose quantum computer. One example is Shor's algorithm. It could solve the factoring problem in polynomial time, an enormous speed-up. More recent work has shown that it could also be used to solve the discrete logarithm problem used by ECC. We don't yet have a general-purpose quantum computer, one that has enough stable cubits to solve non-trivial problems and run Shor's algorithm. But advances continue.

In 2015 the NSA announced that ECC wasn't a backup for RSA when facing the threat of quantum computing cryptanalysis, to the point that government agencies and contractors considering a migration from RSA to ECC shouldn't bother. They later modified the page, and later yet they took it down, see the archived update here.

We need post-quantum or quantum-safe or quantum-resistant asymmetric ciphers. Several families of post-quantum cipher algorithms are being explored: lattice-based cryptography, code-base cryptography, multivariate polynomial cryptography, and others. See the Post-Quantum Crypto conference series for details.

U.S. NIST started a project to standardize Post-Quantum Cryptography algorithms in 2016. They hope to have draft standards out in 2022 to 2024.

The Open Quantum Safe project is an open-source project supporting the development and prototyping of quantum-safe or quantum-resistant cryptographic algorithms. They have the open-source liboqs C library ready for download and use, along with prototype integrations into protocols and applications including OpenSSL. See the 2016/2017 paper for a detailed description of the project.

Post-quantum asymmetric cryptography would be used only to encrypt a randomly generated session key that will be used with a symmetric cipher. Symmetric ciphers are thought to be relatively safe against quantum computing attacks. Grover's algorithm on a quantum computer would speed up the search for a symmetric key, but only to the point that keys would need to be doubled in length to preserve the same resistance to attack. See Daniel J. Bernstein's paper Grover vs. McEliece for applications of Grover's algorithm to other cryptosystems.

Quantum Key Distribution

The only alternative to a quantum-safe or quantum-resistant asymmetric cipher requires physics.

Quantum Key Distribution or QKD is a physics-based method to securely communicate a randomly selected symmetric key to the other party.

QKD uses single-photon signaling between the legitimate users at either end of a fibre link. This involves either polarized single photons in the BB84 protocol, or entangled photon pairs in the E91 protocol. An intruder could tap the fibre and monitor the key exchange, but that would corrupt the signal and warn both ends that the key was compromised and must not be used.

For a discussion of its security, see "The security of practical quantum key distribution", Reviews of Modern Physics, 81, pg 1301, 29 Sep 2009 (also here). For a rebuttal of its supposed absolute security, see Dan Bernstein's paper: "Is the security of quantum cryptography guaranteed by the laws of physics?"

Many people think it sounds like science fiction or at least impractical, but it's been in use since 2004. In that year it was used to protect a transaction between the Vienna City Hall and the headquarters of Bank Austria Creditanstalt.

That seems to me to have been a proof of concept organized by the city to support development in a local university. But since 2007 the government of Geneva, Switzerland, has used QKD to protect election results.

In 2007, China began deploying a QKD network in Beijing. A larger network with over 45 nodes began operating in Hefei in 2010. Another large network began operating in Jinan in 2014, joining 28 organizations, including government agencies such as the China Banking Regulatory Commission plus commercial customers including the China Industrial and Commercial Bank, and the Xinhua News Agency. In 2016 China was planning to complete the world's longest QKD network with a 2000-kilometer backbone joining Beijing, Jinan, Hefei, and Shanghai.

In 2010, Japan began operating a QKD network developed by NEC, Toshiba, NTT and others and joining several nodes in Tokyo.

In 2013, Battelle Memorial Institute deployed a QKD network joining its headquarters in Columbus, Ohio, to a manufacturing facility 20 miles away.

In 2015 ETSI, the European Telecommunications Standards Institute, published a number of quantum key distribution specifications.

In 2016, a South Korean QKD network began operating between Seoul, Bundang, and Suwon.

In 2016, China was working on QKD between satellites and ground stations. See "China's quantum space pioneer: We need to explore the unknown", Nature, 13 Jan 2016.

In 2017, China demonstrated QKD between orbit and the ground in their Liángzĭ kēxué shíyàn wèixīng or Quantum Experiments at Space Scale. The Chinese Academy of Sciences operates the Micius satellite and the Chinese ground stations, the University of Vienna and the Austrian Academy of Sciences run the European ground stations.

Wikipedia Science Nature BBC Reuters CNBC

By 2020, the range was extended to over 1,100 kilometers.

New Scientist Nature

ID Quantique in Geneva, Switzerland, sells QKD systems plus random-number generators based on quantum physics.