Securing Web Browsing with TLS and HTTPS
How do web browsers secure my use of the Internet?
The obvious part, what you probably have heard about,
is that they can encrypt communication between
your computer or smart phone and the web server.
However, they must be very careful to first
verify the server's identity.
After all, it makes no sense to whisper secrets to strangers!
Let's look at how your browser makes secure connections, step by step. The initial steps were done well in advance. The people running the server had to set things up correctly. And before that, the people who built your web browser had to do some steps carefully.
When all this is done correctly, you can have high confidence that you're really communicating with the proper server, and that no one else could understand the data passing back and forth.
I say "the browser" throughout the following, as that is probably what you're thinking about. But it could be any client program that supports SSL, or really TLS.
We're all in the habit of saying "SSL" although no servers should support SSL today. TLS, or Transport Layer Security, is the trusted replacement for SSL.
1: The web site owner creates a key pair
Let's use Citibank as an example. That's an enormous bank, recognized by many people. As a bank, they have to be careful. As a modern business, they have to do business across the Internet.
They create one or more public/private key pairs. Key pairs are used by asymmetric algorithms such as RSA and ECC. That is, Rivest-Shamir-Adleman and Elliptic Curve Cryptography. These days you might consider it best practice to generate a key pair of each type.
They must use up-to-date software, typically the OpenSSL package. They also must select an adequately long key, one that will be strong enough. For example, use at least a 2048-bit key for RSA, although a 4096-bit key would be better. Elliptic curve ciphers provide roughly equal security with much shorter keys. A 384-bit key is recommended for ECC.
They must be very careful to treat the keys as their names suggest. The public key will be made public, but the private key must be protected. The server needs to have access to the private key, but ownership and file permissions must limit access to a highly privileged identity. Also, they must keep the software on the server up to date. This is especially so for the web server itself (e.g., Apache or NGINX), the cryptographic libraries (e.g., OpenSSL or GnuTLS), and the operating system itself.
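Here is a minimal sketch of the file permission check described above. It assumes a POSIX-style system, and the file name server.key is just a hypothetical stand-in for wherever a real server keeps its private key.

```python
import os
import stat
import tempfile


def is_private(path):
    """Return True if neither the group nor other users have any access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# A throwaway file stands in for a real private key file.
with tempfile.TemporaryDirectory() as workdir:
    key_path = os.path.join(workdir, "server.key")  # hypothetical name
    with open(key_path, "w") as handle:
        handle.write("not a real key\n")

    os.chmod(key_path, 0o644)  # world-readable: too open for a private key
    too_open = is_private(key_path)

    os.chmod(key_path, 0o600)  # owner read/write only
    locked_down = is_private(key_path)

print(too_open, locked_down)
```

A real deployment would also care about who the owner is and about the permissions on the enclosing directories, which this sketch does not check.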
The public key and private key are related to each other. You can encrypt a piece of data with either key. Whichever one you use, you must decrypt with the other key of the pair.
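To make the "either key, then the other key" idea concrete, here is a toy RSA sketch. The tiny primes are purely for illustration and offer no security at all. Real keys use primes hundreds of digits long along with padding schemes that this sketch omits.

```python
# Toy RSA key pair built from tiny primes (illustration only, not secure).
p, q = 61, 53
n = p * q                  # the modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (three-argument pow, Python 3.8+)


def apply_key(value, exponent):
    """Encrypting and decrypting are the same modular exponentiation."""
    return pow(value, exponent, n)


message = 65

# Encrypt with the public key, decrypt with the private key.
via_public = apply_key(message, e)
recovered_1 = apply_key(via_public, d)

# Encrypt with the private key, decrypt with the public key.
via_private = apply_key(message, d)
recovered_2 = apply_key(via_private, e)

# Applying the same key twice does not get the message back.
same_key_twice = apply_key(via_public, e)

print(recovered_1, recovered_2, same_key_twice != message)
```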
There is no sense of "I'm getting closer to the needed key." If just one bit is wrong, the result is as random looking as when half the bits are wrong. That means that you can't search for the solution.
It is possible in theory to start with the public key, solve a math problem, and get the private key. But in practice you can't do this.
Asymmetric algorithms are based on trapdoor functions. They are easy to solve in one direction, but enormously difficult to solve in the other direction. They are so difficult to solve that we can safely consider them as one-way functions. Practically speaking, it can't be done.
RSA security is based on the difficulty of factoring. Start with two large randomly selected prime numbers. Let's say they're each about 150 digits long. It's actually quite easy and fast for a computer to multiply them to produce a 300-digit number.
However, it would take an enormous amount of computing time to start with that 300-digit product and discover the two large prime factors.
So, if the public key is effectively the 300-digit product, and the private key is the pair of 150-digit factors, we feel comfortable. Yes, in theory an attacker could discover our private key, but in practice we aren't going to worry about that happening. If you do worry, then you can use bigger keys, meaning even bigger numbers. It will slow things down a little, but that's the cost of even higher security.
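The following sketch gives a feel for the one-way nature of multiplication versus factoring. It uses two small, well-known Mersenne primes and naive trial division, both chosen by me for illustration. Real attackers use far better factoring algorithms, and real keys use far larger primes, but the lopsidedness is the same idea.

```python
import time

p = 2**19 - 1   # 524287, a small Mersenne prime
q = 2**31 - 1   # 2147483647, another Mersenne prime


def smallest_factor(n):
    """Find the smallest prime factor of n by naive trial division."""
    if n % 2 == 0:
        return 2
    candidate = 3
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate
        candidate += 2
    return n  # no smaller factor exists, so n itself is prime


start = time.perf_counter()
product = p * q                      # the easy direction
multiply_seconds = time.perf_counter() - start

start = time.perf_counter()
found = smallest_factor(product)     # the hard direction
factor_seconds = time.perf_counter() - start

print("multiply:", multiply_seconds, "seconds")
print("factor:  ", factor_seconds, "seconds, found", found)
```

Every extra digit in the primes multiplies the trial division work by roughly ten, while the multiplication stays nearly instant.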
ECC uses different math to accomplish the same thing. Its security is based on the difficulty of solving the elliptic curve discrete logarithm problem.
2: The web site owner visits the CA
In my example, someone from Citibank approaches a Certificate Authority or CA. There are several trusted CAs. Citibank was using DigiCert when I wrote this page. They identify themselves, describe how they generated the key pairs, and how they will set up and operate their server to protect the private keys. They give the public keys to the CA.
A bank will want what's called an Extended Validation or EV certificate.
The CA will verify the organization's legal identity, and both its physical and operational existence. They will also verify that the people making the request are authorized by the organization. A browser will indicate an EV certificate with a green padlock and the organization's name next to the URL.
My site has just a Domain-Validated or DV certificate. I only have to prove that I control the DNS records and/or the page content for cromwell-intl.com. I get just a green padlock and the word "Secure". However, the DV certificate is free, while an EV certificate definitely isn't. A browser may be fussy about allowing you to enter data into a web page form named "credit-card-info" if the server doesn't have an EV certificate.
The CA will have a Certificate Practices Statement or CPS. That defines what a potential customer must do to get a certificate.
3: The CA is satisfied
In this story DigiCert says, "Citibank exists, they are a reputable bank, these people are authorized to make this request, and they have been careful with their key generation. Please give us US$ 300 (or whatever) and we'll give you a certificate good for 24 months." The bank pays.
4: The CA creates the certificate
The CA uses software to create a digital document or file. Again, OpenSSL has everything you need. This file is called the digital certificate. It is in a format called X.509v3.
The obvious point is that the certificate contains a public key. So, if the web site owner wants to support both RSA and ECC, they need to get two certificates. For example, if you test my server at the excellent Qualys SSL Labs on-line test service, you should see that my server sends two certificates.
The certificate must also contain quite a bit of metadata. This includes:
- The identity of the CA.
- A serial number for the certificate.
- The start and end dates of the certificate's validity.
- The name, address, city, state or province, and country for the web site owner, called the "Subject".
- The DNS names to which the certificate applies.
For Citibank this might include:
In some situations, a "wildcard" certificate can make things a little cheaper and simpler. For example,
- What type of public key it is (RSA or ECC), and the public key itself.
- A URL for the Certificate Practices Statement or CPS.
- One or more URLs for the Certificate Revocation List or CRL. This is a list of the serial numbers of certificates that have been revoked. Certificates should be revoked when the owner:
- Realizes that their server may have been hacked, or
- Realizes that a software vulnerability may have exposed the private key (for example, the Heartbleed bug), or
- Behaves in a way that violates the CA's CPS.
- One or more URLs for the Online Certificate Status Protocol or OCSP. That's a protocol that makes for a more efficient way of checking whether a certificate has been revoked.
- Plus some other odds and ends, and then most importantly...
- The CA's digital signature of all the other data within the certificate. The digital signature is the result of computing the cryptographic hash (e.g., SHA-2-256) and then encrypting that with the CA's private key. The hash step verifies the integrity of the content, and encryption with the CA's private key verifies the source of the signature. In other words, you know that the certificate is precisely what the CA assembled.
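The "hash, then encrypt with the private key" signature can be sketched as follows. This is a toy under loud assumptions: the certificate is a plain dictionary rather than real X.509v3, the field values are made up, the key is built from two small Mersenne primes, and the hash is simply reduced to fit the modulus instead of using a proper padding scheme.

```python
import hashlib
import json

# Toy CA key pair (illustration only, far too small to be secure).
P = 2**107 - 1
Q = 2**127 - 1
N = P * Q
E = 65537                              # CA public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # CA private exponent (Python 3.8+)


def digest_as_int(fields):
    """Hash the certificate fields with SHA-256 and map the hash below N."""
    encoded = json.dumps(fields, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(encoded).digest()
    return int.from_bytes(digest, "big") % N


def sign(fields):
    """The CA side: transform the hash with the CA's private key."""
    return pow(digest_as_int(fields), D, N)


def verify(fields, signature):
    """The browser side: undo the signature with the CA's public key."""
    return pow(signature, E, N) == digest_as_int(fields)


# Hypothetical certificate content, not a real certificate.
certificate = {
    "issuer": "Example CA",
    "serial": 1001,
    "subject": "www.example.com",
    "not_after": "2030-01-01",
}
signature = sign(certificate)

tampered = dict(certificate, subject="www.evil.example")

print(verify(certificate, signature), verify(tampered, signature))
```

Changing any field changes the hash, so the signature no longer matches. That is how the browser knows the certificate is precisely what the CA assembled.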
The CA sends the certificate to the owner. They install the certificate in the appropriate location on the server. Now we're ready to make a connection.
5: The user makes several assumptions
When you sit down to use a browser on a computer or smart phone, you are really making a number of assumptions:
- My computer or smart phone is running unhacked Linux or macOS or Windows or Android or iOS or whatever.
- My browser is really fully-patched Chrome or Firefox or Safari or whatever.
- The people who made my browser (e.g., Google in the case of Chrome) made appropriate decisions about which CAs are trustworthy. These days, those decisions are guided by the requirements of the CA/Browser Forum.
- By some highly trusted method, the real public keys of those CAs have been passed to the browser maker and embedded within my browser.
To be honest, we seldom think about these assumptions at all. But they definitely are there!
See my page describing some PKI failures to see how CA errors can invalidate some of these assumptions.
6: You enter some URL, the browser connects, and you may be redirected
Let's say you enter just
in the URL box in the browser, because that's what you
saw in some advertisement.
The browser adds the HTTP protocol and connects to that:
That connection works, but the browser is immediately
redirected to a different hostname and protocol.
When I was writing this page, that was:
They redirect you to HTTPS for security reasons. They support
citibank.com as an easily remembered
or guessed entry point for marketing reasons,
and then redirect you as needed for their
preferred naming and organization scheme.
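A redirect of this kind is just an HTTP response with a 3xx status and a Location header. The sketch below runs a tiny local server that behaves that way, then asks it for a page. The target URL is a hypothetical placeholder of mine, not the bank's real address.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

SECURE_URL = "https://secure.example.com/"  # hypothetical redirect target


class RedirectToHttps(BaseHTTPRequestHandler):
    """Answer every plain HTTP GET with a permanent redirect to HTTPS."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", SECURE_URL)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch_redirect():
    """Start the local server, request "/", and report what came back."""
    server = HTTPServer(("127.0.0.1", 0), RedirectToHttps)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.getheader("Location"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, location = fetch_redirect()
print(status, location)
```

Unlike a browser, http.client does not follow the redirect, which is what lets us see it.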
7: The server sends its certificate
The server sends its certificate. The browser understands the X.509v3 format, and finds that the certificate was issued by DigiCert.
It also finds the CRL and OCSP URLs, and it should immediately verify that the certificate has not been revoked, that it is still valid. Yes, you can disable this, and HTTPS pages will load a little faster. But you would have no reason to trust what's happening then. Don't disable this!
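As a rough sketch of what "still valid and not revoked" means, consider the check below. The dictionary layout and the serial numbers are my own invention for illustration. A real browser parses X.509v3 and fetches a signed CRL or OCSP response rather than consulting a local set.

```python
from datetime import datetime, timezone


def certificate_problems(cert, revoked_serials, now):
    """Return a list of reasons to reject the certificate (empty if none)."""
    problems = []
    if now < cert["not_before"]:
        problems.append("not yet valid")
    if now > cert["not_after"]:
        problems.append("expired")
    if cert["serial"] in revoked_serials:
        problems.append("revoked")
    return problems


# Hypothetical certificate and revocation list.
cert = {
    "serial": 1001,
    "not_before": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "not_after": datetime(2026, 1, 1, tzinfo=timezone.utc),
}
revoked_serials = {2002, 3003}

during = datetime(2025, 6, 1, tzinfo=timezone.utc)
after = datetime(2026, 6, 1, tzinfo=timezone.utc)

print(certificate_problems(cert, revoked_serials, during))
print(certificate_problems(cert, revoked_serials, after))
print(certificate_problems(cert, revoked_serials | {1001}, during))
```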
8: The browser finds the relevant CA certificate
The browser needs the CA's public key. This is in the form of a self-signed certificate. "We are DigiCert. This is DigiCert's public key. Trust us, because we're DigiCert, and our certificate was stored in your browser."
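You can peek at a trust store from Python as well. This sketch uses the standard ssl module, which typically relies on the operating system's store rather than a browser's, so the list may differ from what your browser trusts. On some systems the certificates are loaded lazily from a directory and the list will come back empty.

```python
import ssl

# Build a client context with the platform's default trusted CA certificates.
context = ssl.create_default_context()
roots = context.get_ca_certs()

# A root CA certificate is self-signed: its issuer and subject are the same.
self_signed = [cert for cert in roots if cert["issuer"] == cert["subject"]]

print(len(roots), "CA certificates loaded,", len(self_signed), "self-signed")
for cert in roots[:5]:
    # The subject is a sequence of one-item groups of (name, value) pairs.
    subject = dict(item[0] for item in cert["subject"])
    print(subject.get("organizationName"), "|", cert.get("notAfter"))
```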
9: The browser verifies the site certificate
The browser uses its copy of the CA's public key to verify the digital signature wrapped around the digital certificate content. The browser concludes, "This is a valid (as per the cryptographic math) certificate from a trusted (as per human trust) CA, so now I really know the Citibank public key." However...
Anyone could set up a fake site and install a copy of Citibank's digital certificate. So while your browser now knows the Citibank public key, it doesn't know whether it's talking to Citibank.
You may wander into a site whose digital certificate can't be verified.
For example, the U.S. Department of Defense runs its own CAs,
and they don't ask the browser makers to include their CA certificates.
You can find examples of this by asking Google:
The U.S. DoD maintains and controls its client computers. So they have extra steps in their processes to add the DoD CA certificates to those systems. Following the Google results for the above query from a non-DoD desktop will lead to intentionally intimidating warnings from your browser.
You won't get those warnings on a browser with the DoD certificates installed.
Note that the warnings are from the browser and not from the DoD server itself.
10: The browser challenges the server to prove its identity
The browser generates a unique challenge. It's a 28-byte or 224-bit random sequence. It sends that number to the server, effectively saying "If you're who you say you are, you have the private key for online.citibank.com. Please encrypt this large number with that private key and send it to me."
11: The server answers the challenge
The server does that, encrypting the challenge with its private key and sending back the result.
Yes, an attacker could intercept network traffic and watch all this happen. The attacker would see the challenge and the corresponding result. That's no problem as long as:
- The client doesn't repeat challenges. They should be derived from a good random number generator, which should be provided by the operating system.
- We are using encryption algorithms that are strong against known plaintext attacks.
12: The browser verifies the result
The browser decrypts the server's response with the public key, and verifies that it gets the same challenge it sent. Only now can it conclude that it's really talking to a Citibank server.
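Steps 10 through 12 can be sketched as a toy challenge and response. This follows the simplified story told here rather than the exact messages of a real TLS handshake. The key pair is built from two small Mersenne primes of my choosing and is nowhere near secure.

```python
import secrets

# Toy server key pair (illustration only).
P = 2**107 - 1
Q = 2**127 - 1
N = P * Q                              # about 234 bits, so a 224-bit challenge fits
E = 65537                              # public exponent, from the certificate
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, known only to the server


def new_challenge():
    """Step 10: the browser picks a fresh 28-byte (224-bit) random number."""
    return int.from_bytes(secrets.token_bytes(28), "big")


def respond(challenge, private_exponent):
    """Step 11: the server transforms the challenge with its private key."""
    return pow(challenge, private_exponent, N)


def browser_accepts(challenge, response):
    """Step 12: the browser undoes the response with the public key."""
    return pow(response, E, N) == challenge


challenge = new_challenge()

genuine = respond(challenge, D)
# An impostor holds the certificate but not the key. Even a private
# exponent that is wrong in a single bit produces a useless answer.
impostor = respond(challenge, D ^ 1)

print(browser_accepts(challenge, genuine), browser_accepts(challenge, impostor))
```

The secrets module draws from the operating system's random number generator, which matches the advice above about not repeating challenges.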
13: The browser and server negotiate a cipher suite
The client sends a message to the server effectively saying "Here's all the crypto I know how to do." Maybe something like:
- Encrypt using specific ciphers (e.g., 3DES, AES) operating in specific modes (e.g., CBC, GCM) with specific key sizes (e.g., 128, 192, 256 bits).
- Exchange or negotiate keys using specific algorithms (e.g., RSA, Diffie-Hellman, Diffie-Hellman Ephemeral, Elliptic Curve Diffie-Hellman Ephemeral).
- Verify integrity using specific hash functions (e.g., SHA-1, SHA-2-256).
- Other odds and ends, like supported versions of SSL and TLS, compression algorithms, and other details.
The server then responds with a similar message. It can say that it supports a large number of algorithms, listed in a preferred order.
OK, to be honest... These "Client Hello" and "Server Hello" messages actually happen at the very beginning of the connection. But the story makes more sense if I put off their description until now. They also include "nonces", a strange word meaning "number used only once". The nonces are those highly random 28-byte strings.
14: The browser starts the encryption
The browser then starts the key exchange, more accurately called the key negotiation. It uses the best method the two ends have in common. Hopefully there are several methods that both ends support. In that case, it uses the most preferred one they have in common. In other words, the server specifies which methods are better than others.
It is possible that the two end points don't have enough in common. In that case, the connection can't continue.
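The selection logic amounts to "the first entry in the server's preference list that the client also supports". Here is a minimal sketch of that rule. The suite names are only examples, and real TLS libraries apply more rules than this.

```python
def choose_suite(server_preference, client_supported):
    """Return the server's most preferred suite the client also supports.

    Returns None when the two ends have nothing in common, in which case
    the connection cannot continue.
    """
    supported = set(client_supported)
    for suite in server_preference:
        if suite in supported:
            return suite
    return None


# Example lists, ordered from most to least preferred by the server.
server_preference = [
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
]
modern_client = ["ECDHE-RSA-AES128-GCM-SHA256", "TLS_CHACHA20_POLY1305_SHA256"]
outdated_client = ["DES-CBC3-SHA"]

print(choose_suite(server_preference, modern_client))
print(choose_suite(server_preference, outdated_client))
```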
It is very appropriate for a bank server to effectively say "I'm sorry, but you are unable to do what I require for an appropriate level of security." Hopefully that would direct the browser to an HTTP page telling the user to please upgrade.
For something like a state government server, as long as the pages contain no personal or financial data, it might be appropriate to support rather outdated protocols and algorithms.
15: Everything is encrypted
Once the key exchange finishes, all the traffic is encrypted.
You can explore this by first looking at the certificate for a web server. You could do that for this page, as my server runs HTTPS. Click or right-click on the padlock or "Secure" next to the URL and look at the certificate. Look at the overview, and then at the details.
You can also go into your browser settings and browse through its stored CA certificates.
To see even more, run Wireshark to capture your connection to an HTTPS server. You will be able to see the details up through the key exchange. Beyond that, it will just be TLS containing encrypted payload.
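If you would rather explore from a script, Python's standard ssl module can report what was negotiated. This is a sketch, not a diagnostic tool. The function name and the choice of fields to report are mine, and the host you pass in is up to you.

```python
import socket
import ssl


def make_context():
    """A client context that verifies certificates and refuses old protocols."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def describe_connection(host, port=443, timeout=5.0):
    """Connect to host, complete the TLS handshake, and summarize the result."""
    context = make_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            cert = tls.getpeercert()
            return {
                "protocol": tls.version(),
                "cipher": tls.cipher()[0],
                "issuer": dict(item[0] for item in cert["issuer"]),
                "not_after": cert["notAfter"],
                "dns_names": [
                    value
                    for kind, value in cert.get("subjectAltName", ())
                    if kind == "DNS"
                ],
            }


# Example use (needs network access):
# print(describe_connection("www.example.com"))
```

If the certificate can't be verified, or the hostname doesn't match, wrap_socket raises an exception instead of returning. That is the script's equivalent of the browser warnings described earlier.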