The OSI Model
The OSI Model is Academic, but Useful
The OSI model of a networking protocol stack is a long-lived academic tool. It isn't practical, it doesn't describe exactly how the real protocols work, but it's useful. It definitely helps people quickly understand at least a little about the various protocols. Here it is in its simplest form:
7 | Application |
6 | Presentation |
5 | Session |
4 | Transport |
3 | Network |
2 | Data Link |
1 | Physical |
When I teach short courses, I work to teach people what they need to know to get things done. I don't try to teach people formal models or fancy jargon. But I find the OSI model, or at least the bottom 4 layers, to be very helpful for giving people a practical understanding.
In linguistic terms, the OSI model should be descriptive and not prescriptive. That is, if it provides descriptions that help you understand, that's great. But don't waste effort trying to conform to it as a formal model or requirement.
Let's look at what really matters. We'll start at the bottom and work our way up.
Layer 1: Physical
When I need to introduce networking concepts, I try to use example scenarios that people will understand. So, let's imagine that the Internet hasn't been invented yet. We're talking about the mid to late 1960s at the latest. Someone has a dream of a world-wide network that would allow you to watch funny cat videos from the other side of the world, whenever you wanted. You could watch these on your computer, and you could download and store them as computer files to watch them again.
Where do we start?
Well, we said watch them on computers and store them as files. So, we're talking about digital data, not analog video signals. Binary communications. Bits. Ones and zeros.
The very first task is to define just what makes up a one and a zero. How will we look at a signal and decide "This is a one" versus "This is a zero"?
Layer 1 or the Physical layer defines ones and zeros. We need some electronic or electro-optical or other mechanism for transmitting and receiving bits, and we need some test for the signal to decide whether it's a one or a zero.
This is an area for electrical engineers and physicists. Our bits might be defined by voltages on wires, or currents flowing along wires, or phase shifts in high frequency signals. Or, we might define bits by light pulses instead of electrical signals.
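As a rough sketch of that "is it a one or a zero?" test, here is about the simplest receiver I can imagine: take one voltage sample per bit and compare it to a threshold. The 1.5 volt decision level and the sample values are made up for illustration. They don't come from any real signaling standard.

```python
# Hypothetical receiver: one voltage sample per bit period, in volts.
THRESHOLD_VOLTS = 1.5  # assumed decision level, not from any real standard


def decode_bits(samples, threshold=THRESHOLD_VOLTS):
    """Turn a list of voltage samples into a list of 0/1 bits."""
    return [1 if volts > threshold else 0 for volts in samples]


# A slightly noisy signal that started out as 1, 0, 1, 1, 0.
received = [2.9, 0.2, 3.1, 2.6, 0.4]
bits = decode_bits(received)
print(bits)
```

Real hardware is far more subtle than this, but the job is the same: look at the signal and make a decision.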
There's no single right answer. Some technology might be very appropriate in one setting, but very inappropriate in another. For example, light pulses through glass fiber make sense for point-to-point links under oceans. But it would be expensive and inconvenient to use fiber to connect several laptop computers in a coffee shop. In that setting we would prefer microwave radio signals with a useful range of no more than several tens of meters (that is, WiFi).
All signals degrade as they travel. Electrical signals and light pulses lose their sharp definition. Radio signals weaken as they spread out. Beyond some range you can no longer reliably tell the difference between what started out as a zero and what started out as a one.
So, every so often out across your network, you need something to clean up the bits and relay them onward as unambiguous signals. We can call such a device a repeater if it's in long point-to-point media like a cable or fiber. We might call it a hub if our network is built in a star shape, with every node connected to a central device that cleans up and forwards signals.
Good designs can reliably move a lot of data per second. That's good, because the video files will be large, and we don't want to wait too long.
Layer 2: Data Link Layer
Now we can transmit and receive a stream of bits. But we want to have more than two computers on our newly invented Internet, and we don't want to force all of them to do the same thing at the same time.
So, we need a way to direct bits to one specific device. And, in order to share the network, we need to limit how many bits will be sent at a time. That is, limit the length of any one transmission. Layer 2 groups bits into frames, and directs frames to destinations on the local network.
We're building on top of Layer 1, where we defined how much time per bit. Now we will group bits into frames, each with a maximum number of bits and thus taking no more than some maximum amount of time to transmit. That means that if the network link is currently busy with another device's bits, our device will soon get a chance to use the network.
Each frame will start with a header including two addresses, specifying which piece of hardware it's coming from, and which one it's meant for. We might call those Layer 2 addresses hardware addresses, or physical addresses, or MAC addresses standing for "Media Access Control".
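Here is a minimal sketch of such a header, assuming an Ethernet-style layout: six bytes of destination address, six bytes of source address, and a 16-bit type field saying what the payload is. The two addresses are invented, and `mac_to_bytes` and `build_frame` are names I made up for this example. Real Ethernet also adds a preamble and a trailing checksum, which I'm leaving out.

```python
import struct


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_frame(dst_mac, src_mac, ethertype, payload):
    """Header = destination address, source address, 16-bit type; then the payload."""
    header = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ethertype)
    return header + payload


# Made-up addresses for illustration; 0x0800 marks an IPv4 payload.
frame = build_frame("00:11:22:33:44:55", "66:77:88:99:aa:bb", 0x0800, b"hello")
print(len(frame), frame[:14].hex())
```

The destination comes first so that a receiving device can decide as early as possible whether the frame is meant for it.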
Just two layers in, and we already see how the OSI model is academic but not practical. Ethernet defines Layer 1 signaling, but it also defines Layer 2 frames and addressing. They're both done by the same chips on the Ethernet card.
A repeater or hub is a dumb Layer 1 device. All it knows is how to clean up signals. Then it sends clean copies everywhere except where the signal came from.
A switch is a device that cleans up signals like a hub, but it also uses Layer 2 hardware addresses to decide where to forward frames. It can learn by observing traffic, so at first it has to send copies everywhere because it hasn't yet learned where anything is. As its knowledge of the network improves, it increasingly sends frames only to where they need to go. That makes the network more efficient, it makes for fewer interruptions to hosts attached to the network, and it makes it a little harder for would-be intruders to capture network traffic they shouldn't see.
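That learning behavior can be sketched in a few lines. This is a toy model, not how any real switch is implemented: `LearningSwitch` is a hypothetical class, the addresses are shortened made-up strings, and it ignores details like table entries aging out.

```python
class LearningSwitch:
    """Toy model of a switch that learns which port each MAC address is on."""

    def __init__(self, num_ports):
        self.ports = list(range(num_ports))
        self.table = {}  # MAC address -> port number

    def handle_frame(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is forwarded to."""
        # Learn: the sender is reachable through the port the frame arrived on.
        self.table[src_mac] = in_port
        if dst_mac in self.table:
            return [self.table[dst_mac]]
        # Unknown destination: flood everywhere except where it came from.
        return [port for port in self.ports if port != in_port]


switch = LearningSwitch(num_ports=4)
first = switch.handle_frame(0, "aa:aa", "bb:bb")  # bb:bb unknown, so flood
reply = switch.handle_frame(2, "bb:bb", "aa:aa")  # aa:aa was learned on port 0
again = switch.handle_frame(0, "aa:aa", "bb:bb")  # bb:bb was learned on port 2
print(first, reply, again)
```

The first frame goes out every other port, just as a hub would send it. After one reply, both hosts get their frames delivered to a single port.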
So far, this looks promising! We can send data at high speed to specific devices. However, we have two big problems.
First, many of the network technologies we want to use are severely limited in how far they can reach. We have to be polite, we have to allow other nodes to insert packets into the network traffic. But signals can't move faster than some fraction of the speed of light (for Ethernet, about 65-75% the speed of light). So, Layer 1 and 2 design imposes limits such as "Ethernet cables can be no more than 100 meters long." Ethernet (and WiFi, and Token Ring, and many other technologies) are limited to use as a local area network or LAN.
But remember that we want to watch cat videos from the other side of the world. We need to be inventing a true "internetwork", a large-scale network made up of smaller networks, some of them small LANs with several attached hosts, others long point-to-point links. We must interconnect these networks with relaying devices.
Second, even if we had a network technology to which we could connect every computer in the world, like one enormously large Ethernet switched network, that wouldn't work. A device couldn't accomplish anything, as there would almost never be a clear time for it to send a packet. We can only have a limited number of devices on any of these networks.
There's another severe limitation to what we have so far. Hardware addresses are unique, but they're meaningless. Yes, for Ethernet and now WiFi, the first half is a manufacturer code and the second half is basically a serial number. The first half specifies the manufacturer, but that isn't useful for deciding how to forward the packet.
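To see why the manufacturer code doesn't help, here is a small sketch that splits a hardware address into its two halves. Both addresses are invented, and `split_mac` is just a name I picked for the example.

```python
def split_mac(mac):
    """Split 'aa:bb:cc:dd:ee:ff' into (manufacturer code, serial part)."""
    parts = mac.lower().split(":")
    return ":".join(parts[:3]), ":".join(parts[3:])


# Two made-up addresses from the same hypothetical manufacturer.
card_a = split_mac("00:11:22:33:44:55")
card_b = split_mac("00:11:22:9A:BC:DE")
print(card_a, card_b)
```

The two cards share a first half, but one could be in the next room and the other on another continent. Nothing in the address says where either one is.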
We need a logical addressing system, one with some meaning at various scales. That leads us to...
Layer 3: Network Layer
This layer adds a new header holding Network Layer source and destination addresses.
The first part of a Network Layer address is called the netid. It answers the question "Where?" or "On which network?" Devices called routers use the netid to decide how to forward the packet hop by hop by hop as many times as needed to get it to a router that is directly connected to the destination via a switch. That last-hop router then uses the hostid, the remainder of the address, to answer the question "Which host on this network?" and make the final delivery.
There are several families of network protocols, but the group we call TCP/IP won out for the world-wide Internet. In fact, IP stands for "Internet Protocol". IP is the Layer 3 or Network Layer protocol on the Internet. Another page of mine explains how IP addresses implement logical addressing with netid and hostid. It also explains how subnet design provides a multi-level architecture.
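Python's standard `ipaddress` module makes the netid and hostid split easy to see. This sketch assumes an address of 192.0.2.77 (from a range reserved for documentation) and assumes the first 24 bits are the netid. Both of those are choices I made for the example.

```python
import ipaddress

# Assumption: a /24 prefix, so the first 24 bits are the netid.
iface = ipaddress.ip_interface("192.0.2.77/24")

# netid answers "On which network?"
netid = iface.network.network_address
# hostid answers "Which host on this network?"
hostid = int(iface.ip) & int(iface.network.hostmask)

# A router only needs to ask whether a destination falls inside the network.
same_network = ipaddress.ip_address("192.0.2.200") in iface.network
elsewhere = ipaddress.ip_address("198.51.100.5") in iface.network

print(netid, hostid, same_network, elsewhere)
```

A router far away only cares about the netid. Only the last-hop router needs to look at the hostid.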
Wow! Now we can get a packet to any device anywhere in the world! However, while that's very impressive for sending a short message, our design goal of a video file won't fit into a single packet. A packet is probably limited to no more than about 1,500 bytes, while a video file is easily one thousand to one million times that long.
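Some quick arithmetic shows the scale of the problem. The 150 MB file size is an assumption for illustration, and I'm ignoring the space the headers take up inside each packet.

```python
import math

PAYLOAD_BYTES = 1500        # rough per-packet limit from the text
video_bytes = 150_000_000   # assumed 150 MB video file

packets_needed = math.ceil(video_bytes / PAYLOAD_BYTES)
print(packets_needed)
```

Something has to split the file into that many pieces and put them back together in order at the other end.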
Also, these are supposed to be multi-purpose devices. We want to do more than just watch cat videos on our computers. We want our computers to do multiple things at the same time. Send and receive email, submit print jobs, share files, and more. We need a way of delivering data more specifically to one activity or one program on the destination host. So we need the Transport Layer on top of this.
But first, notice that we have passed two more signs that the OSI model is just that, an impractical academic model that has its limited uses. A host uses the ARP or Address Resolution Protocol to find the hardware or MAC address for a destination host plugged into the same switch. ARP straddles Layer 2 and Layer 3.
The ICMP or Internet Control Message Protocol is used for error reporting, plus some limited network information retrieval, plus echoing back and forth for the ping command. It's mostly about IP and Layer 3, but it's purely meta-level and unneeded as long as things are working. Even if needed, it doesn't make Layer 3 work. It just helps people figure out what they need to fix to get Layer 3 working again.
Don't get hung up on the OSI model being some ideal. It's just a tool to help you understand.
Layer 4: Transport Layer
Some people describe Layer 4's job as "software addressing". The idea is that a Network Layer protocol like IP gets the packet to the correct host, and then the layer above that directs the information to the appropriate software or process running on that host.
But, there's even more to it than that. Different applications need their own type of communication. Some need short messages, some need two-way streaming connections. I always describe this by comparing network communication to postcards and telephone calls.
Postcards can be cheap to buy, and they're the cheapest thing for individuals to mail (junk mail from businesses and charities is the only mail cheaper than postcards).
A postcard is small, you only have a very limited space on which to write a message. But that's fine, you don't have too much to say. If you really wanted to, you could send a long letter via postcards by sending several with numbers explaining the order in which to read them.
A postcard isn't guaranteed to get there, but the postal service does a pretty good job. Almost all postcards make it to their destination.
If you send one to the same person on each day of your trip, the cards might arrive out of order. But that's still probably good enough for what you want to accomplish. Again, you could write the date and time on your postcards, or number them, if you really cared about the receiver reading them in order.
On the other hand, telephone calls cost more in money and effort. People usually don't have to pay more for long-distance calls these days, and quite likely all your calls are "free", or at least your phone provider wants you to think they are. But you have to own the phone itself, and you have to pay a monthly fee to make those "free" calls. How many postcards would you have to send in a month to spend more on postage than you pay for your phone contract?
Postcards are easy, you get home and say "Oh, I got a card!" That's it. Telephone calls require both people to be available at the same time.
Then there's all the back-and-forth to start a call.
— "Hi, this is Joe, is this Jane?"
— "Yes, this is Jane."
— "Do you have a few minutes to talk about our project?"
— "Yes, I suppose so."
And only after that do they get around to really communicating.
Then there's more back-and-forth at the end of the call, both of them verifying that the other person is really done, nothing more to add or ask, and we'll talk later, and you take care, OK?
However, the great thing about a phone call is that it's a two-way stream of communication. Everything one person says gets to the other end in the correct order. The other person can interact, or even interrupt.
Some of the time the higher cost and hassle of a telephone call is well worth it.
Transport Layer protocols are just like this.
Sometimes you have a quick question that will have a short answer. You know that YouTube has funny cat videos, you type youtube.com into your browser, and then your computer has to ask "What's the IP address for youtube.com?" The answer is also very short, "The IP address for youtube.com is 172.217.7.206."
Other times you want something that streams both ways indefinitely like phone calls do, like Skype, or an interactive command-line session, or a remote desktop session. Or you want to transfer a large data file, like that video you want to download and watch later. Then you need to set up a connection, stream the data, and eventually shut the connection down.
UDP or the User Datagram Protocol is the message-oriented protocol at the Transport Layer. UDP is like postcards. Short message-oriented protocols like DNS (to look up IP addresses) and NTP (to synchronize clocks) are carried as UDP messages.
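To show just how postcard-sized a DNS question is, here is a sketch that builds the bytes of a minimal query for youtube.com. It follows the standard DNS message layout as I understand it, but `build_dns_query` is my own helper and the query ID is arbitrary. It only builds the message. It doesn't send anything.

```python
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query asking for the IPv4 (type A) address of hostname."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    return header + qname + struct.pack("!HH", 1, 1)


query = build_dns_query("youtube.com")
print(len(query))
```

The whole question fits in a few dozen bytes, so it easily fits in a single UDP message sent to port 53 on a DNS server.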
TCP or the Transmission Control Protocol is the stream-oriented protocol at the Transport Layer. TCP is like telephone calls. Connection-oriented protocols like HTTP and HTTPS (web pages) and SSH (secure remote command-line access and file transfer) are carried in TCP connections.
But there's more to specify: send a short message to which process, or make a connection to which process?
Both UDP and TCP use port numbers. Each of those protocols provides a 16-bit field for both the source port (which program it's coming from) and destination port (which program it's going to). 2^16 = 65,536, so there's support for plenty of independent network clients and services on any one host.
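Here is a sketch of those 16-bit fields using a UDP header, which is just four 16-bit numbers: source port, destination port, length, and checksum. The client port 54321 and the payload length are made up, and I've left the checksum as zero rather than computing it.

```python
import struct

# A UDP header is four 16-bit fields: source port, destination port, length, checksum.
PAYLOAD_LENGTH = 29  # assumed size of the message being carried
header = struct.pack("!HHHH", 54321, 53, 8 + PAYLOAD_LENGTH, 0)

src_port, dst_port, length, checksum = struct.unpack("!HHHH", header)
print(src_port, dst_port, len(header), 2 ** 16)
```

Because each port field is exactly 16 bits, port numbers run from 0 through 65,535 and nothing larger will fit.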
Let's take my server, cromwell-intl.com, as an example. Its IP address is 35.203.182.32. You try to make some sort of connection, and the IP routing process delivers your packet to my server. Once it arrives, the operating system looks past the IP header to find a TCP header (TCP because you're making a connection, not sending a message with UDP).
Three out of the possible 65,536 destination ports could work. My web server listens on both TCP port 80 and TCP port 443. HTTP runs on TCP/80, HTTPS on TCP/443. Yes, you can connect to HTTP on TCP/80, but my server will immediately redirect your browser to disconnect and connect to HTTPS on TCP/443 instead. The third "open port", or TCP port with a service program accepting connections, is TCP port 22. SSH or Secure Shell runs on TCP/22. That's how I upload web page updates, and connect to interactively work on the server. Anyone can make a TCP connection to port 22, but to authenticate and get a session you need to have the appropriate ECDSA or Ed25519 or RSA key.
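In spirit, what the operating system does with the destination port is a table lookup. This sketch uses the three open ports just described. The `LISTENING` table and `deliver` function are hypothetical names, not anything a real operating system exposes.

```python
# The three open TCP ports described above, mapped to the service behind each.
LISTENING = {22: "SSH", 80: "HTTP", 443: "HTTPS"}


def deliver(dst_port):
    """Return the service that would accept a connection to dst_port, or None."""
    return LISTENING.get(dst_port)


print(deliver(443), deliver(22), deliver(25))
```

A connection attempt to any of the other 65,533 ports finds nobody listening, so it goes nowhere.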
We didn't originally have devices that made forwarding decisions above Layer 3. Layer 3 was to get it to the destination host, and that operating system would decide what to do. But the Internet became less academic and more dangerous, and now we need to apply further restrictions. Some packets that we could forward based on pure IP logic should instead be discarded for safety's sake. Don't let people from the far side of the Internet connect to your internal file and print sharing, for example.
So we have firewalls, devices that decide what not to forward for security reasons. They base those over-riding policy decisions on some combination of Layer 3 and Layer 4 information.
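A firewall policy can be sketched as an ordered list of rules, each combining a Layer 3 test (source network) with a Layer 4 test (destination port). The rules below are an invented policy for illustration, assuming the internal network is 10.0.0.0/8 and using TCP port 445 for file sharing. No real firewall product is configured this way.

```python
import ipaddress

# Hypothetical policy: (source network, destination TCP port, action).
RULES = [
    (ipaddress.ip_network("10.0.0.0/8"), 445, "forward"),  # internal file sharing
    (ipaddress.ip_network("0.0.0.0/0"), 445, "discard"),   # nobody else gets file sharing
    (ipaddress.ip_network("0.0.0.0/0"), 443, "forward"),   # anyone may reach HTTPS
]


def decide(src_ip, dst_port, default="discard"):
    """First matching rule wins; anything unmatched gets the default action."""
    src = ipaddress.ip_address(src_ip)
    for network, port, action in RULES:
        if src in network and port == dst_port:
            return action
    return default


print(decide("10.1.2.3", 445), decide("203.0.113.9", 445), decide("203.0.113.9", 443))
```

Note the default at the end. If no rule says to forward a packet, the safe choice is to discard it.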
Here's one more debunking of the supposed accuracy of the OSI model: TCP makes connections, it clearly deals with sessions. But the OSI model insists that sessions are the entire point of the next layer up the stack. Which leads to...
Layers 5-7: The Application Layers
And now to make the purists cringe: Everything above Layer 4 is the application. Yes, I know the formal model says Layer 5 is Session, Layer 6 is Presentation, and Layer 7 is Application, but that academic hair-splitting just doesn't matter.
OSI Layers 1 through 4 are done by the operating system. Layers above that are done by the application. If an application developer really wants to implement three layers formally following the OSI model, go ahead. Very few do.
Old-school NFSv2 over UDP seems to have been the only popular application protocol that really did all seven layers, more or less.
Most application-layer protocols are directly inside the transport-layer protocol, either TCP or UDP. And look at this, we finally have a solution for our design project!
Our dream of watching funny cat videos from the other side of the world can be done with HTTP, the Hypertext Transfer Protocol. Our web browser can connect to TCP port 80 at youtube.com, send a GET request (part of the HTTP application-layer protocol), and receive a 200 code meaning "OK" followed by a large stream of video data.
It could also be done with HTTPS, on TCP/443. That verifies the server's identity, then sets up encryption on the bidirectional data stream. Then the application-layer protocol works the same way.
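The application-layer exchange itself is just text. Here is a sketch that builds a minimal GET request and picks the status code out of a response. The helper names are mine, and the response is a canned example standing in for the server, so no network traffic is sent.

```python
def build_get_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Return (status code, reason phrase) from the first line of a response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    _version, code, reason = status_line.split(" ", 2)
    return int(code), reason


request = build_get_request("youtube.com")
# A canned response stands in for the server here.
canned = b"HTTP/1.1 200 OK\r\nContent-Type: video/mp4\r\n\r\n..."
code, reason = parse_status_line(canned)
print(code, reason)
```

Everything below this, from TCP ports down to ones and zeros, exists to carry those few lines of text and the stream of data that follows.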
As for making forwarding decisions, this is the latest addition to our networking model. We want systems that can quickly and accurately decide about application-specific issues:
- "Does this data file contain malicious software?"
- "Is this email message spam?"
- "Is this email message a spearphishing attack?"
- "Does this outbound email message contain sensitive information?"
- "Does this request from a web client constitute a web-specific attack such as cross-site scripting, cross-site request forgery, SQL injection, PHP injection, buffer overflow, directory traversal, or another syntactical or semantic attack?"
...and so on...
These can be enormously difficult questions. Two intelligent and well-motivated people might never agree on whether a specific message should be considered spam, or whether specific message contents were truly sensitive. Automated systems will let inappropriate content pass, or block what really was appropriate, or, most likely, commit a mix of both error types.
These tasks require computing resources, to the point that we usually have a dedicated system for each application-layer protocol.
We might say Application-Layer Gateway or Application Proxy as a general term for a device that attempts to regulate connections and messages and data transfers in application-aware ways.
Or, we might use terms more specific to their tasks, such as Malware Detection or AV for Anti-Virus, Spam Filter and Anti-Phishing for email, Data Loss Prevention or DLP to keep sensitive data from leaking, and Web Application Firewall or WAF for the wide range of web-specific threats.
What Really Matters
Here is the result of leaving out unneeded layers and adding in short explanations of what happens and what we call the connecting devices.
7, 6, 5 | Application | Jobs software programs do | ALG, AV, spam filter, DLP, WAF, etc. |
4 | Transport | UDP: messages to numbered ports; TCP: connections to numbered ports | Firewall |
3 | Network | Relay packets hop by hop to anywhere by IP address (netid, then hostid) | Router |
2 | Data Link | Send frames to HW/MAC addresses | Switch |
1 | Physical | Send and receive 0 vs 1 bits | Repeater (link) or hub (star) |
And of course, they keep moving the goalposts!
Software-Defined Networking
Software-Defined Networking or SDN allows a device to request a "flow" from the network infrastructure. It could be a TCP connection, or it could be a flow of UDP packets, or it could be just about anything else. The flow request can include Quality of Service or QoS performance metrics, such as bandwidth, latency, latency jitter, and so on. It could include security requirements, to compartmentalize traffic flows and keep sensitive traffic off less-trusted network links. With OpenFlow, the request could be anything you can dream up.
The flow definitions can be based on everything from physical topology up through the Application Layer, as voice and video streaming quality is an application-layer issue.
People working in SDN need to call their highly controllable forwarding boxes something. So, within SDN, switch means an SDN switch, making forwarding decisions at least up through the Transport Layer.