Performance Tuning on Linux

Tuning File System and Network Performance

The Linux operating system is well designed and it has good performance "out of the box," but some choices you can make when building and installing your system can help improve performance. Some additional configuration changes including kernel tuning may provide further performance improvements. Design the system well using appropriate storage technology for both persistent disk storage and RAM. Storage I/O will probably be the bottleneck. In a data center, storage will include NFS and therefore need Ethernet and TCP optimization below any NFS tuning. We will see how to measure your system's performance and tune some kernel parameters.

What hardware should be used? Which file systems? What kernel tuning gives the best disk and network I/O? We can make some sweeping generalizations: RAM and solid-state drives are much faster than rotating magnetic disks, and we should offload processing into network hardware and use NFS version 4.1 or later if possible. But we will see that the full answer is often "It depends" and there is no one optimal solution even within one organization.

We must select hardware and lay out the file systems first for capacity planning, and later make adjustments for performance tuning. The tuning must be done from the bottom up: good NFS performance is only possible if TCP has already been tuned, and Ethernet tuned before that.

We want to address the bottlenecks first; it is pointless to tune anything else before them. We should identify the changes that will make the greatest improvement with the least change, and therefore the least expenditure of time and money.

Maximizing Return for Effort

Start with these simple improvements:

Have plenty of RAM. The kernel will automatically tune file system caching and buffering and network buffers based on available RAM, so give it enough to work with.
Mount file systems with access times disabled. Use options noatime,nodiratime.
Don't let file systems get too full. All file systems put to realistic use must fragment some, but it gets rapidly worse when capacity gets over some limit. The threshold depends on how you use your file system — huge number of small files, small number of huge files, how much you delete and replace files, how much you modify existing files. I can't give you a number, but very much over 90% full is probably going to cause noticeable slowdown for most anyone.
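The last two suggestions are easy to check on a running system. Here is a minimal sketch, assuming a Linux host where /proc/mounts is readable. The function names and the 90% default are illustrative only; the threshold is the rough rule of thumb above, not a hard limit.

```python
import shutil

def parse_mounts(text):
    """Map mount point -> option string from /proc/mounts-style text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # Fields are: device, mount point, file system type, options, ...
            mounts[fields[1]] = fields[3]
    return mounts

def atime_disabled(mount_options):
    """True if the comma-separated option string includes noatime."""
    return "noatime" in mount_options.split(",")

def percent_full(total, used):
    """Used space as a percentage of total; 0.0 for zero-size pseudo file systems."""
    return 100.0 * used / total if total else 0.0

def report(mounts_path="/proc/mounts", threshold=90.0):
    """Print mounts that still update access times or exceed the threshold."""
    with open(mounts_path) as f:
        mounts = parse_mounts(f.read())
    for mount_point, options in mounts.items():
        try:
            usage = shutil.disk_usage(mount_point)
        except OSError:
            continue  # Not accessible to this user, skip it.
        full = percent_full(usage.total, usage.used)
        if usage.total and (full > threshold or not atime_disabled(options)):
            print(f"{mount_point}: {full:.0f}% full, options {options}")
```

Call report() to list the file systems worth a second look. It only reads information, it changes nothing.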

If those simple suggestions make you happy, that's great! To go deeper, we will have to get into details of block I/O, multiple buffers in protocol stacks, file system data structures, and more.

Throughput Versus Latency

Let's look at physical-world analogies to the throughput–versus–latency tradeoff behind many of the decisions we must make. We would like both high throughput and low latency but improving one usually makes the other worse.

The simplest analogy is to ask which of these you need: a faster car or a bigger truck? A fast car gives lower latency but moves less material; a big truck moves more material per week but each load takes longer.

Oil pipelines are another physical analogy. Any large pipeline exhibits tradeoffs in throughput versus latency analogous to those we encounter in data storage and transmission systems. Being physical systems, they're a little easier to understand.

Throughput and Latency in Pipes

Don't panic, this isn't on the exam! If you really want to know about the fluid dynamics, read up on laminar and turbulent flow and on the Reynolds number.

If you try to force liquid through a pipe too fast, the flow becomes turbulent, which reduces the amount of liquid moved per unit of time. To move a liquid through a pipe in a smooth streamline or laminar flow, a measure called the Reynolds number must be less than about 2,000. If it gets above 3,000 the flow becomes turbulent. You calculate the Reynolds number or R_e as:

$$ R_e = \frac{\rho \, v \, D_H}{\mu} $$

where:

ρ	=	density of the fluid (kg/m³)
v	=	velocity of the fluid (m/s)
D_H	=	hydraulic diameter of pipe (m)
μ	=	dynamic viscosity of the fluid (N·s/m²)

The point is that when you are trying to move a fluid through a pipe, there is a limit to the speed at which it can move. To keep the flow smooth, we must keep the Reynolds number from climbing too high. To multiply the velocity by some factor we must decrease the diameter of the pipe by that same factor. If the flow is to move three times as fast, the pipe must be reduced to one third the diameter.

Since cross-section area increases as the square of diameter, larger pipes can move more fluid per unit of time. Cut the velocity in half, double the pipe diameter (and quadruple the cross-section area), and we have doubled the volume of fluid delivered per time period.
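The scaling argument can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a full circular pipe and fluid values roughly like water at room temperature. The specific velocity and diameter are made-up illustrative numbers, not anything from a real pipeline.

```python
import math

def reynolds_number(density, velocity, diameter, viscosity):
    """R_e = (density * velocity * hydraulic diameter) / dynamic viscosity."""
    return density * velocity * diameter / viscosity

def volume_flow(velocity, diameter):
    """Cubic meters per second through a full circular pipe."""
    return velocity * math.pi * (diameter / 2) ** 2

# Illustrative assumptions, roughly water at room temperature.
RHO = 1000.0  # density, kg/m^3
MU = 0.001    # dynamic viscosity, N*s/m^2

v1, d1 = 0.02, 0.05      # velocity in m/s, diameter in m
v2, d2 = v1 / 2, d1 * 2  # half the velocity, twice the diameter

re1 = reynolds_number(RHO, v1, d1, MU)
re2 = reynolds_number(RHO, v2, d2, MU)
q1 = volume_flow(v1, d1)
q2 = volume_flow(v2, d2)

print(f"R_e before: {re1:.0f}, after: {re2:.0f}")
print(f"Flow ratio after/before: {q2 / q1:.2f}")
```

The Reynolds number is unchanged because the factor of two cancels in the product of velocity and diameter, while the volume delivered per second doubles.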

Alaska pipeline just north of Fairbanks.

The Trans-Alaska Pipeline System is an 800-mile-long pipeline running from Prudhoe Bay on the northern coast of Alaska to Valdez and the tanker shipping port on the southern coast. This pipeline has a maximum discharge rate of 2.14 million barrels or 340,000 cubic meters per day. It usually operates at about a third of that rate, 700,000 barrels or 110,000 cubic meters per day in recent years. The oil comes out of the ground at about 49 °C. The pipeline is elevated to reduce how much the oil is cooled and the tundra heated, but it still cools off especially in the winter. Eleven pumping stations keep the oil warm and moving.

Alaska pipeline running south from Livengood.

Let's do the math: the pipeline is 48 inches or 122 cm in diameter, and 800.3 miles or 1,288 kilometers long. A daily output of 110,000 cubic meters requires a flow of 3.9 kilometers per hour. This means that it takes 13.7 days, almost two weeks, for oil to travel the length of the pipeline from Prudhoe Bay to Valdez. To use the terms we use with data storage and communication, the output volume per day is the throughput and the time to travel the length of the pipe is the latency. Move the oil faster to increase the throughput and reduce the latency. For the oil companies, throughput is an economic concern and latency is an operational concern. [1]

[1] The consortium of oil companies is interested in regulating the throughput. To maximize their income, they want to keep the throughput down, limiting supply in order to raise the price. But they must be careful to limit the latency: too long in the pipeline in winter and the oil will congeal and shut down the flow. They have considered replacing the north half with a smaller pipe in order to safely reduce throughput and thereby increase their income.
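The pipeline arithmetic above can be reproduced from the quoted figures. This is a minimal sketch that treats the 48 inches as the inside diameter and assumes the pipe runs full, both simplifying assumptions. The variable names are only for illustration.

```python
import math

DIAMETER_M = 48 * 0.0254      # 48 inches converted to meters
LENGTH_KM = 1288.0            # length of the pipeline
DAILY_VOLUME_M3 = 110_000.0   # recent typical throughput per day

# Cross-section area of the pipe in square meters.
area_m2 = math.pi * (DIAMETER_M / 2) ** 2

# Volume per day divided by area gives distance traveled per day.
speed_km_per_day = DAILY_VOLUME_M3 / area_m2 / 1000
speed_km_per_hour = speed_km_per_day / 24

# Latency: time for oil to travel the full length.
transit_days = LENGTH_KM / speed_km_per_day

print(f"Flow speed: {speed_km_per_hour:.1f} km/h")
print(f"Transit time: {transit_days:.1f} days")
```

Throughput is DAILY_VOLUME_M3 and latency is transit_days. Rounded to one decimal place, the results agree with the 3.9 kilometers per hour and 13.7 days given in the text.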

What if the pipeline carried not oil but something that spoiled, like milk? Then the latency would have to be greatly reduced. Rather than a four-foot pipe, any low-latency Alaska Milk Pipeline would have to be more of a reinforced hose or tube. Improved latency is only possible with degraded throughput, and vice versa.

Throughput and Latency in Data Communication

Fluid dynamics obviously has no effect on data transfer through networks or storage devices, but the same tradeoff applies.

You want to move information either across a network or in and out of disks. You are thinking of the payload, the volume of information with meaning to the users. But that data must be bundled into packets with headers. Packets on a network have Ethernet and IP headers with hardware and logical routing information, and TCP or UDP headers specifying the client/server software endpoints. Packets in and out of storage hardware have headers specifying physical address and other parameters for SATA, SCSI, or whatever storage technology is used.

The result is protocol overhead, the percentage of transmitted bits that must carry metadata rather than information meaningful to users.

You could increase the throughput of content per second by making the packets larger and so reducing the protocol overhead. This is the idea behind what we call "jumbo frames" on Ethernet. But with a fixed bits-per-second communication speed on the media, larger packets mean that a process must wait longer on average before injecting a packet. We have improved throughput but worsened latency.

We could improve latency by doing the opposite: reduce the packet size so there is never a long wait for the end of the current packet and an opportunity to insert a new one. This was the idea behind the small cells used in the time-division multiplexing of Asynchronous Transfer Mode, each carrying a 48-byte payload behind a 5-byte header. But meanwhile the throughput is limited because every 48 bytes another complete header must appear.
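To put rough numbers on that tradeoff, here is a minimal sketch comparing a standard Ethernet frame, a jumbo frame, and an ATM cell. It assumes TCP over IPv4 with no header options, counts the 14-byte Ethernet header and 4-byte frame check sequence but ignores the preamble and inter-frame gap, and uses a single 1 Gbps link rate for all three purely for comparison. The class and names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    """One transmitted unit: user payload plus protocol headers."""
    name: str
    payload_bytes: int
    header_bytes: int

    @property
    def total_bytes(self):
        return self.payload_bytes + self.header_bytes

    def overhead_percent(self):
        """Share of transmitted bytes that is header rather than payload."""
        return 100.0 * self.header_bytes / self.total_bytes

    def serialization_us(self, link_bps):
        """Microseconds the link is busy sending this one unit."""
        return self.total_bytes * 8 / link_bps * 1e6

ETHERNET_BYTES = 14 + 4   # Ethernet header plus frame check sequence
IP_TCP_BYTES = 20 + 20    # IPv4 and TCP headers, no options (assumption)
HEADERS = ETHERNET_BYTES + IP_TCP_BYTES

standard = Unit("Ethernet, MTU 1500", 1500 - IP_TCP_BYTES, HEADERS)
jumbo = Unit("Ethernet, MTU 9000", 9000 - IP_TCP_BYTES, HEADERS)
atm = Unit("ATM cell", 48, 5)

LINK_BPS = 1e9  # 1 Gbps, assumed for all three for comparison only

for unit in (standard, jumbo, atm):
    print(f"{unit.name}: {unit.overhead_percent():.2f}% overhead, "
          f"{unit.serialization_us(LINK_BPS):.2f} us on the wire")
```

The jumbo frame has the lowest overhead but occupies the link the longest, so anything queued behind it waits longer. The ATM cell is the reverse.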

What Matters? It Depends...

If you ask most people they will say that some measure of throughput is most important to them: messages sent per unit of time, or volume of data stored or retrieved or transferred. If the system is used in an automated or unattended fashion, that will be correct. But when people use the system interactively, latency is the most important criterion.

Humans have a very non-linear response to latency. Up to some point we don't notice it at all. Then there is a very thin band where it is noticeable but still acceptable. But then it transitions almost immediately from barely noticeable to objectionable. When people complain that a system is slow or awkward, this is usually a complaint about latency. We start noticing many forms of latency at about 100 milliseconds. Keyboard latency is likely to be noticed at just 50 milliseconds, and keyboard interaction becomes awkward and more error-prone at twice that.

Perceived quality of two-way voice communication degrades rapidly when latency increases above 200 milliseconds. The public switched telephone network is designed to maintain latency below 100 milliseconds. Latency can grow, especially when making calls between mobile phones on two different networks: the call must traverse the first mobile Radio Access Network, then that provider's core network, then a Gateway Mobile Switching Center, and then the other provider's core network and mobile network. Further latency is added when one or both of the mobile phones is connected through a wireless LAN and out through a customer's Internet service provider.

Two-way voice communication through a geosynchronous satellite is very awkward. There is a delay of about 0.24 seconds for a radio signal to propagate from the Earth's surface to the satellite and back. If one person starts speaking, the other person doesn't realize that for a quarter of a second, and if they had started saying something and immediately stopped, the first person doesn't hear them stop for another quarter of a second. You must speak deliberately, almost as you would over a radio, saying "over" when you stop talking and are ready to listen to the other person.
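The quarter-second figure follows directly from the distance. This is a minimal sketch, assuming the ground stations sit directly beneath the satellite so the path is the shortest possible, and counting propagation time only, with no processing or switching delay.

```python
GEO_ALTITUDE_KM = 35_786.0            # geostationary altitude above the equator
SPEED_OF_LIGHT_KM_S = 299_792.458     # speed of light in vacuum

# One hop: up from the surface to the satellite and back down again.
hop_delay_s = 2 * GEO_ALTITUDE_KM / SPEED_OF_LIGHT_KM_S

# Time before the speaker can hear any reaction: one hop each way.
round_trip_s = 2 * hop_delay_s

print(f"One hop: {hop_delay_s:.2f} s, round trip: {round_trip_s:.2f} s")
```

Real paths are longer because the stations are rarely directly under the satellite, so this is a lower bound.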

Getting Started

Let's make our computers faster! Below are the major concepts to follow when designing, building, and tuning a Linux system.

The network part is aimed at tuning within a data center or at least within a LAN. Internet services are at the mercy of the client's ISP connection, congestion in WAN links, and more things outside your control.

Be careful: The network tuning suggestions are for a data center with at least 1 Gbps bandwidth and very low round-trip latency. If you apply them to your home computer, they are likely to make your Internet connection worse instead of better. See the BufferBloat discussion on the Ethernet page for why this is the case.
