How Browsers Use HTTPS and TLS

Securing Web Browsing with TLS and HTTPS

How do web browsers secure my use of the Internet? The obvious part, what you probably have heard about, is that they can encrypt communication between your computer or smart phone and the web server. However, they must be very careful to first verify the server's identity. After all, it makes no sense to whisper secrets to strangers!

Let's look at how your browser makes secure connections, step by step. The initial steps were done well in advance. The people running the server had to set things up correctly. And before that, the people who built your web browser had to do some steps carefully.

When all this is done correctly, you can have high confidence that you're really communicating with the proper server, and that no one else could understand the data passing back and forth.

I say "the browser" throughout the following, as that is probably what you're thinking about. But it could be any client program that supports SSL, or really TLS.

We're all in the habit of saying "SSL", although no server should support SSL today. TLS, or Transport Layer Security, is the trusted replacement for SSL.

1: The web site owner creates a key pair

Let's use Citibank as an example. That's an enormous bank, recognized by so many people. As a bank, they have to be careful. As a modern business, they have to do business across the Internet.

They create one or more public/private key pairs. Key pairs are used by asymmetric algorithms such as RSA and ECC, that is, Rivest-Shamir-Adleman and Elliptic Curve Cryptography. These days you might consider it best practice to generate a key pair of each type.

They must use up-to-date software, typically the OpenSSL package. They also must select an adequately long key, one that will be strong enough. For example, a 2048-bit RSA key and a 384-bit ECC key.

They must be very careful to treat the keys as their names suggest. The public key will be made public, but the private key must be protected. The server needs to have access to the private key, but ownership and file permissions must limit access to a highly privileged identity. Also, they must keep the software on the server up to date. This is especially so for the web server itself (e.g., Apache or NGINX), the cryptographic libraries (e.g., OpenSSL or GnuTLS), and the operating system itself.
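As a rough illustration of the "limit access" point, here is a minimal sketch that checks whether a key file is readable by anyone other than its owner. It assumes a POSIX-style system. The function name `private_key_is_protected` and the throwaway temporary file are mine, purely for illustration. A real server would have its own key path and its own policy.

```python
import os
import stat
import tempfile


def private_key_is_protected(path):
    """Return True if only the file's owner has any access (POSIX modes)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any permission bit set for group or others means the key is exposed.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# A throwaway file stands in for a real private key in this demonstration.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    key_path = handle.name

os.chmod(key_path, 0o644)            # readable by everyone: too permissive
loose = private_key_is_protected(key_path)

os.chmod(key_path, 0o600)            # owner read/write only
tight = private_key_is_protected(key_path)

os.remove(key_path)
print(loose, tight)
```

On a typical Linux or macOS system the first check fails and the second passes. File modes are only one layer of protection. Which account owns the file matters just as much.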

The public key and private key are related to each other. You can encrypt a piece of data with either key. Whichever one you use, you must decrypt with the other key of the pair.
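To make the relationship between the two keys concrete, here is a toy sketch of textbook RSA with tiny primes. It is an illustration only: the numbers are far too small to be secure, and real software adds padding that this sketch leaves out. The variable names are mine.

```python
# Toy RSA with tiny primes. Real keys use primes hundreds of digits long.
p, q = 61, 53
n = p * q                      # 3233, shared by both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (needs Python 3.8 or later)

message = 65

# Transform with the public key, undo with the private key.
ciphertext = pow(message, e, n)
assert pow(ciphertext, d, n) == message

# Or the other way around: transform with the private key, undo with the public key.
transformed = pow(message, d, n)
assert pow(transformed, e, n) == message

# Applying the same key a second time does not undo the operation.
assert pow(ciphertext, e, n) != message
```

The public key here is the pair (n, e) and the private key is (n, d). Whichever key performs the transformation, only the other key of the pair reverses it.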

There is no sense of "I'm getting closer to the needed key." If just one bit is wrong, the result is as random looking as when half the bits are wrong. That means that you can't search for the solution.

It is possible in theory to start with the public key, solve a math problem, and get the private key. But in practice you can't do this.

Asymmetric algorithms are based on trapdoor functions. These are easy to compute in one direction, but enormously difficult to reverse unless you hold the secret. They are so difficult to reverse that we can safely treat them as one-way functions. Practically speaking, it can't be done.

RSA security is based on the difficulty of factoring. Start with two large randomly selected prime numbers. Let's say they're each about 150 digits long. It's actually quite easy and fast for a computer to multiply them to produce a 300-digit number.

However, it would take an enormous amount of computing time to start with that 300-digit product, and discover the two large prime factors.

So, if the public key is effectively the 300-digit product, and the private key is the pair of 150-digit factors, we feel comfortable. Yes, in theory an attacker could discover our private key, but in practice we aren't going to worry about that happening. If you do worry, then you can use bigger keys, meaning even bigger numbers. It will slow things down a little, but that's the cost of even higher security.
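The gap between multiplying and factoring shows up even at small sizes. The sketch below multiplies two 5-digit primes in one step, then recovers them with trial division, the most naive factoring method. The function name is mine. Serious attacks use far better algorithms than this, but they still fall hopelessly short against numbers of the size described above.

```python
def smallest_factor_pair(n):
    """Factor n by trial division, returning (p, q) with p <= q."""
    if n % 2 == 0:
        return 2, n // 2
    candidate = 3
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate, n // candidate
        candidate += 2
    return 1, n  # no divisor found, so n itself is prime


p, q = 10007, 10009                        # two small primes
product = p * q                            # one multiplication
factors = smallest_factor_pair(product)    # roughly 5,000 trial divisions
print(product, factors)
```

With this method, every extra digit in the primes multiplies the work by about ten, while the multiplication stays nearly free.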

ECC uses different math to accomplish the same thing. Its security is based on the difficulty of solving the discrete logarithm problem on an elliptic curve.
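Here is a small sketch of the discrete logarithm idea. Note the swap: it uses ordinary modular arithmetic rather than an elliptic curve, because that keeps the code short. The one-way flavor is the same. Going forward is a single fast `pow` call, while going backward means searching. The modulus, base, and secret are arbitrary small values I picked for illustration.

```python
def brute_force_discrete_log(g, y, p):
    """Return the smallest x >= 0 with g**x % p == y, or None if none exists."""
    value = 1
    for x in range(p):
        if value == y:
            return x
        value = (value * g) % p
    return None


p = 10007          # a small prime modulus
g = 5              # the base
secret = 7919      # the exponent we pretend is private

public = pow(g, secret, p)                           # forward: fast
recovered = brute_force_discrete_log(g, public, p)   # backward: a search
print(public, recovered)
```

The search may return a smaller exponent than `secret` if one produces the same result. That would still count as a break. At real key sizes the search space is astronomically large.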

Yes, there's a potential logic problem here. We're only saying, "If you can factor 300-digit numbers, then you could break the RSA cipher security." But there's no proof that's the only way to derive the private key from the public key. What if some mathematicians have discovered a side door to get around the difficulty of factoring, or of discrete logarithms in the case of ECC? No one has said anything in public about how such a thing might be done, but it might be possible.

2: The web site owner visits the CA

In my example, someone from Citibank approaches a Certificate Authority or CA. There are several trusted CAs. Citibank was using DigiCert when I wrote this page. The Citibank people identify themselves, describe how they generated the key pairs, and explain how they will set up and operate their server to protect the private key. They give the public keys to the CA.

A bank will want what's called an Extended Validation or EV certificate.

The CA will verify the organization's legal identity, and both its physical and operational existence. They will also verify that the people making the request are authorized by the organization. A browser will indicate an EV certificate with a green padlock and the organization's name next to the URL.

My site has just a Domain-Validated or DV certificate. I only have to prove that I control the DNS records and/or the page content for cromwell-intl.com. I get just a green padlock and the word "Secure". However, the DV certificate is free, while an EV certificate definitely isn't. A browser may be fussy about allowing you to enter data into a web page form named "credit-card-info" if the server doesn't have an EV certificate.

The CA will have a Certification Practice Statement or CPS. That defines what a potential customer must do to get a certificate.

3: The CA is satisfied

In this story DigiCert says, "Citibank exists, they are a reputable bank, these people are authorized to make this request, and they have been careful with their key generation. Please give us US$ 300 (or whatever) and we'll give you a certificate good for 24 months." The bank pays.

4: The CA creates the certificate

The CA uses software to create a digital document or file. Again, OpenSSL has everything you need. This file is called the digital certificate. It is in a format called X.509v3.

The obvious point is that the certificate contains a public key. So, if the web site owner wants to support both RSA and ECC, they need to get two certificates. For example, if you test my server at the excellent Qualys SSL Labs on-line test service, you should see that my server sends two certificates.

The certificate must also contain quite a bit of metadata. This includes:

The CA sends the certificate to the owner. They install the certificate in the appropriate location on the server. Now we're ready to make a connection.

5: The user makes several assumptions

When you sit down to use a browser on a computer or smart phone, you are really making a number of assumptions:

To be honest, we seldom think about these assumptions at all. But they definitely are there!

See my page describing some PKI failures to see how CA errors can invalidate some of these assumptions.

6: You enter some URL, the browser connects, and you may be redirected

Let's say you enter just www.citibank.com in the URL box in the browser, because that's what you saw in some advertisement.

The browser adds the HTTP protocol and connects to that: http://www.citibank.com/

That connection works, but the browser is immediately redirected to a different hostname and protocol. When I was writing this page, that was:
https://online.citi.com/US/login.do
They redirect you to HTTPS for security reasons. They support citibank.com as an easily remembered or guessed entry point for marketing reasons, and then redirect you as needed for their preferred naming and organization scheme.
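The two moves in this step can be sketched with Python's standard URL tools. This is a simplified model, not how any particular browser is implemented. The helper names `add_default_scheme` and `follow_redirect` are mine, and the redirect target is simply the URL quoted above.

```python
from urllib.parse import urljoin, urlsplit


def add_default_scheme(typed):
    """Mimic a client that assumes plain HTTP when no scheme was typed."""
    if "://" in typed:
        return typed
    # Add a trailing slash when only a hostname was given.
    return "http://" + typed + ("" if "/" in typed else "/")


def follow_redirect(current_url, location_header):
    """Resolve a redirect's Location value against the current URL."""
    return urljoin(current_url, location_header)


def is_secure(url):
    return urlsplit(url).scheme == "https"


start = add_default_scheme("www.citibank.com")
landed = follow_redirect(start, "https://online.citi.com/US/login.do")

print(start, is_secure(start))
print(landed, is_secure(landed))
```

The first request goes out unencrypted. Only after the redirect does the URL carry the `https` scheme, and with it the certificate checks described in the following steps.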

7: The server sends its certificate

The server sends its certificate. The browser understands the X.509v3 format, and finds that the certificate was issued by DigiCert.

It also finds the CRL and OCSP URLs, and it should immediately verify that the certificate has not been revoked, that it is still valid. Yes, you can disable this, and HTTPS pages will load a little faster. But you would have no reason to trust what's happening then. Don't disable this!
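If you want to see these fields yourself, Python's standard `ssl` module can fetch a server's certificate and return it as a dictionary. The sketch below is hedged in two ways. `fetch_certificate` needs network access, so it is defined but not called here. And the `sample` dictionary is hypothetical data that I shaped like the real return value, so the summary logic can run offline.

```python
import socket
import ssl


def fetch_certificate(hostname, port=443, timeout=5.0):
    """Connect with TLS and return the verified server certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            return tls.getpeercert()


def summarize(cert):
    """Pull the fields discussed in the text out of a getpeercert() dict."""
    def flatten(name):
        # Names arrive as a tuple of tuples of (key, value) pairs.
        return {key: value for rdn in name for key, value in rdn}

    return {
        "subject": flatten(cert.get("subject", ())).get("commonName"),
        "issuer": flatten(cert.get("issuer", ())).get("organizationName"),
        "expires": cert.get("notAfter"),
        "ocsp": list(cert.get("OCSP", ())),
        "crl": list(cert.get("crlDistributionPoints", ())),
    }


# Hypothetical sample, shaped like what getpeercert() returns.
sample = {
    "subject": ((("commonName", "www.example.com"),),),
    "issuer": ((("countryName", "US"),), (("organizationName", "Example CA"),)),
    "notAfter": "Jan  1 00:00:00 2030 GMT",
    "OCSP": ("http://ocsp.example.com",),
    "crlDistributionPoints": ("http://crl.example.com/ca.crl",),
}
summary = summarize(sample)
print(summary)
```

On a machine with network access, `summarize(fetch_certificate("online.citi.com"))` should show the issuer and any revocation URLs the certificate carries. Which fields are present depends on the certificate.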

8: The browser finds the relevant CA certificate

The browser needs the CA's public key. This is in the form of a self-signed certificate. "We are DigiCert. This is DigiCert's public key. Trust us, because we're DigiCert, and our certificate was stored in your browser."
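Python keeps its own list of trusted CA certificates, much as a browser does, and you can peek at it. The sketch below is a rough illustration. The name comparison in `is_self_signed` is my shortcut and only checks that a certificate names itself as issuer. It does not verify the signature. On some systems the list comes back empty, because certificates stored in a directory are loaded only when needed.

```python
import ssl


def is_self_signed(cert):
    """A root certificate names itself as its own issuer."""
    return cert["subject"] == cert["issuer"]


def trusted_roots():
    """Return the CA certificates Python's default TLS context has loaded."""
    context = ssl.create_default_context()
    return context.get_ca_certs()


roots = trusted_roots()
self_issued = [cert for cert in roots if is_self_signed(cert)]
print(len(roots), "CA certificates loaded,", len(self_issued), "name themselves as issuer")
```

Trust in these certificates does not come from the math. It comes from the fact that someone you already trust, the browser or operating system maker, put them there.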

9: The browser verifies the site certificate

The browser uses its copy of the CA's public key to verify the digital signature wrapped around the digital certificate content. The browser concludes, "This is a valid (as per the cryptographic math) certificate from a trusted (as per human trust) CA, so now I really know the Citibank public key." However...
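The signature check can be sketched with toy textbook RSA and a SHA-256 hash. Everything here is an assumption for illustration: the key is built from two well-known Mersenne primes, so it offers no security at all, there is no padding, and the certificate body is a made-up string rather than real X.509 data. Real CAs and browsers use standardized signature schemes.

```python
import hashlib

# Toy CA key built from two known Mersenne primes. Illustration only.
P, Q = 2**127 - 1, 2**107 - 1
N = P * Q
E = 65537                              # public exponent, stored in the browser
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, held only by the CA


def digest(data):
    """Hash the certificate body and reduce it to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def ca_sign(cert_body):
    """The CA transforms the hash with its private key."""
    return pow(digest(cert_body), D, N)


def browser_verify(cert_body, signature):
    """The browser undoes the signature with the CA's public key and compares."""
    return pow(signature, E, N) == digest(cert_body)


body = b"subject=online.citi.com; issuer=Example CA; public-key=(site key here)"
signature = ca_sign(body)

print(browser_verify(body, signature))                      # untouched certificate
print(browser_verify(body + b" tampered", signature))       # altered certificate
```

The first check succeeds and the second fails, because changing even one byte of the body changes the hash. Note what this does not prove: anyone can present a copy of a validly signed certificate.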

Anyone could set up a fake site and install a copy of Citibank's digital certificate. So while your browser now knows the Citibank public key, it doesn't know whether it's talking to Citibank.

You may wander into a site whose digital certificate can't be verified. For example, the U.S. Department of Defense runs its own CAs, and they don't ask the browser makers to include their public keys. You can find examples of this by asking Google:
inurl:https site:.mil
U.S. DoD maintains and controls their client computers. So they have extra steps in their processes to add the DoD CA certificates to those systems. Following the results of the Google query above will lead to intentionally intimidating warnings, unless you do that from a system with the DoD certificates installed.

10: The browser challenges the server to prove its identity

The browser generates a unique challenge. It's a 28-byte or 224-bit random sequence. It sends that number to the server, effectively saying "If you're who you say you are, you have the private key for online.citi.com. Please encrypt this large number with that private key and send it to me."

11: The server answers the challenge

The server does that, encrypting the challenge with its private key and sending back the result.

Yes, an attacker could intercept network traffic and watch all this happen. The attacker would see the challenge and the corresponding result. That's no problem as long as:

12: The browser verifies the result

The browser decrypts the server's response with the public key, and verifies that it gets the same challenge it sent. Only now can it conclude that it's really talking to a Citibank server.
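Steps 10 through 12 can be sketched as a toy challenge and response. This follows the simplified story told here, using textbook RSA with a throwaway key built from two known Mersenne primes. It is not the actual TLS handshake, and the function names are mine.

```python
import secrets

# Toy server key from two known Mersenne primes. Illustration only.
P, Q = 2**127 - 1, 2**107 - 1
N = P * Q                              # about 234 bits, so a 224-bit challenge fits
E = 65537                              # public exponent, from the certificate
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, known only to the server


def make_challenge():
    """Browser: 28 random bytes, interpreted as one large number."""
    return int.from_bytes(secrets.token_bytes(28), "big")


def server_answer(challenge):
    """Server: transform the challenge with the private key."""
    return pow(challenge, D, N)


def browser_check(challenge, answer):
    """Browser: undo the answer with the public key and compare."""
    return pow(answer, E, N) == challenge


challenge = make_challenge()
answer = server_answer(challenge)
genuine = browser_check(challenge, answer)

# An eavesdropper who recorded this answer cannot reuse it for another challenge.
replayed = browser_check(challenge + 1, answer)
print(genuine, replayed)
```

The genuine answer passes and the replayed one fails, which is why it matters that every challenge is fresh and unpredictable.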

13: The browser and server negotiate a cipher suite

The client sends a message to the server effectively saying "Here's all the crypto I know how to do." Maybe something like:

The server then responds with a similar message. It can say that it supports a large number of algorithms, listed in a preferred order.
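To see what such a list looks like on your own machine, you can ask Python's `ssl` module which cipher suites its default client configuration would offer. This is Python's list, not your browser's, and the result depends on your local TLS library version.

```python
import ssl

# The default client-side configuration, similar in spirit to a browser's.
context = ssl.create_default_context()

# Each entry is a dict. The "name" field holds the cipher suite name.
offered = [suite["name"] for suite in context.get_ciphers()]

print(len(offered), "cipher suites offered; the first few are:")
for name in offered[:5]:
    print("  ", name)
```

The names pack several choices into one string, typically the key exchange, the bulk cipher, and the hash.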

OK, to be honest... These "Client Hello" and "Server Hello" messages actually happen at the very beginning of the connection. But the story makes more sense if I put off their description until now. They also include "nonces", a strange word meaning "number used only once". The nonces are those highly random 28-byte strings.

14: The browser starts the encryption

The browser then starts the key exchange, more accurately called the key negotiation. It uses the best method the two ends have in common. Hopefully there are several methods that both ends support. In that case, it uses the most preferred one they have in common. In other words, the server specifies which methods are better than others.

It is possible that the two end points don't have enough in common. In that case, the connection can't continue.
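The selection rule described above, where the server's order of preference wins, fits in a few lines. This is a minimal sketch of the idea, not the logic of any real TLS library. The function name is mine and the suite names are just examples.

```python
def negotiate(server_preference, client_offer):
    """Return the server's most preferred suite that the client also offers.

    Returns None when the two sides have nothing in common.
    """
    offered = set(client_offer)
    for suite in server_preference:      # the server's order decides
        if suite in offered:
            return suite
    return None


# Example suite names, listed from most to least preferred by the server.
server = [
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",
    "TLS_AES_128_GCM_SHA256",
]

modern_client = ["TLS_AES_128_GCM_SHA256", "TLS_CHACHA20_POLY1305_SHA256"]
ancient_client = ["RC4-MD5"]

print(negotiate(server, modern_client))    # TLS_CHACHA20_POLY1305_SHA256
print(negotiate(server, ancient_client))   # None
```

The modern client listed the AES-128 suite first, but the server's ranking takes priority. The ancient client shares nothing with the server, so the connection cannot continue.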

It is very appropriate for a bank server to effectively say "I'm sorry, but you are unable to do what I require for an appropriate level of security." Hopefully that would direct the browser to an HTTP page telling the user to please upgrade.

For something like a state government server, as long as the pages contain no personal or financial data, it might be appropriate to support rather outdated protocols and algorithms.

15: Everything is encrypted

Once the key exchange finishes, all the traffic is encrypted.

Explore this!

You can explore this by first looking at the certificate for a web server. You could do that for this page, as my server runs HTTPS. Click or right-click on the padlock or "Secure" next to the URL and look at the certificate. Look at the overview, and then at the details.

You can also go into your browser settings and browse through its stored CA certificates.

To see even more, run Wireshark to capture your connection to an HTTPS server. You will be able to see the details up through the key exchange. Beyond that, it will just be TLS containing encrypted payload.
