
Exploring the Linux Kernel

How the Kernel Boots

I have more details on another page, but the short version of Linux booting is as follows:

  1. The hardware runs a power-on self-test and finds the boot loader. On a PC, the BIOS finds the Master Boot Record (MBR), the first 512-byte block of the first bootable medium. On other hardware the boot loader is a mini-OS.
  2. The MBR points to a boot loader stored on the disk. Current Linux distributions generally use GRUB. Boot loader configuration is up to you, but it typically presents a menu of choices and boots the first one by default in a few seconds if you don't specify otherwise.
  3. The boot loader loads the compressed kernel image into RAM, where it is uncompressed. More on the kernel file format below.
  4. The boot loader also uncompresses an initial RAM disk image and loads it into memory. The kernel mounts that as its root file system for a while.
  5. The kernel discovers the available hardware, or at least what it needs and knows about initially.
  6. The real root file system is mounted in read-only mode.
  7. The kernel starts /sbin/init and now we're under the control of the boot files on the disk rather than what's coded into the kernel itself.
  8. The root file system is checked and then remounted in read/write mode.
  9. Some kernel modules (also called device drivers) may be loaded, and they may detect more hardware.
  10. A collection of scripts is run to get the system into the desired run state.

That's the simple version. See my other page for more details.

Where Are The Pieces Installed?

While the kernel components can go anywhere you want to hide them, you will generally find the following files installed within the /boot directory. Replace the string release with whatever your kernel release is, which is the output of the command uname -r. Depending on your distribution, if you have multiple kernels installed you will see several of each file type with the release as part of their file names, and then one symbolic link pointing to the latest one. See the monolithic kernel for an example.

File Purpose
/boot/vmlinuz-release
So you might have a symbolic link simply named:
/boot/vmlinuz
pointing to the real file:
/boot/vmlinuz-3.5.4
and with earlier kernels also available:
/boot/vmlinuz-3.5.1
/boot/vmlinuz-3.4.3
/boot/vmlinuz-3.3.1
The same pattern continues for all the files below that have the release as part of their name.
The monolithic kernel.
This is the main operating system. This file is uncompressed and loaded into memory by the boot loader.

This is one big file with three components on a PC:
 — 512-byte boot block
 — secondary boot loader
 — gzip'ed kernel
On other hardware platforms this is simply a gzip'ed kernel. The kernel portion itself is a statically-linked ELF executable.
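One way to see that layout for yourself is to search the file for the gzip signature, the bytes 1f 8b 08. What follows is only a minimal sketch. It assumes your kernel really is gzip-compressed (many newer kernels are built with other compressors) and that the first match is the kernel rather than a coincidence. The function name gzip_offset is made up for this example.

```shell
# gzip_offset FILE
# Print the byte offset of the first gzip signature (bytes 1f 8b 08) in FILE.
# The exit status is non-zero if no signature is found.
gzip_offset() {
    # od prints every byte as two hex digits; tr puts one byte on each line.
    od -An -v -tx1 "$1" | tr -s ' ' '\n' |
    awk 'NF {
             # p2 and p1 hold the two bytes before the current one.
             if (p2 == "1f" && p1 == "8b" && $1 == "08") {
                 print n - 2
                 found = 1
                 exit
             }
             p2 = p1; p1 = $1; n++
         }
         END { exit !found }'
}
```

On a PC you would expect the reported offset to be somewhere past the first 512 bytes. You may need to be root to read the kernel file:
# gzip_offset /boot/vmlinuz-release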

Why the funny name and location?
  • Unix traditionally boots from a file named /vmunix or similar. So the Linux developers used /vmlinux.
  • Then the kernel grew too large for the boot loader to handle it, so it was gzip'ed and the "x" changed to "z".
  • Then disks grew large enough that the BIOS could not find things beyond the first 1024 cylinders, so a small file system named /boot was created and made the first thing on the disk.
/lib/modules/release/
A directory containing the loadable modules.
These are the dynamically loaded kernel modules, or device drivers. This area contains:
  • /lib/modules/release/kernel/
    A hierarchy of modules organized by type and sub-type.
  • /lib/modules/release/modules.dep
    A description of dependencies between modules. For example, the module to support the printer device /dev/lp0 will require the help of a module supporting the generic parallel port, which in turn may require the help of a module supporting some chipset, and so on.
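To look up one module's direct dependencies, a small parser for modules.dep is enough. This is a sketch under the assumption that each line has the form "path/to/module.ko: path/to/dependency.ko ...", possibly with a compression suffix after .ko. The function name module_deps is hypothetical, and the module names it compares are taken from the file names, so a dash versus underscore mismatch will not be resolved.

```shell
# module_deps NAME [DEPFILE]
# Print the modules that NAME depends on directly, one per line.
module_deps() {
    depfile=${2:-/lib/modules/$(uname -r)/modules.dep}
    awk -v want="$1" '
        # Strip the directory and the .ko suffix (plus any compression suffix).
        function modname(path,    parts, n) {
            n = split(path, parts, "/")
            sub(/\.ko.*$/, "", parts[n])
            return parts[n]
        }
        {
            colon = index($0, ":")
            if (colon == 0 || modname(substr($0, 1, colon - 1)) != want) next
            ndeps = split(substr($0, colon + 1), deps, " ")
            for (i = 1; i <= ndeps; i++) print modname(deps[i])
        }
    ' "$depfile"
}
```

For the printer example you would run:
$ module_deps lp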
/boot/initrd-release.img
The initial RAM disk image.
This has been compressed with gzip. If you uncompress it, you find a cpio archive. If you go to some safe temporary directory and extract it, you find:
  • Subdirectory bin with a few programs for handling loadable modules and a simple shell.
  • Subdirectory dev with a few critical devices: console, null, ptmx, ram0, ram1, systty, tty, tty0, tty1, tty2, tty3, tty4, tty5, tty6, tty7, tty8, tty9, tty10, tty11, tty12, ttyS0, ttyS1, ttyS2, ttyS3, and zero.
  • Subdirectory etc with files ld.so.conf and ld.so.cache plus a few others.
  • Subdirectories lib and usr/lib with some crucial shared libraries.
  • Subdirectory lib/modules with a few loadable kernel modules for handling disk controllers. These are used to allow the kernel to talk to the disks so it can eventually find the real collection of drivers.
I was going to provide a list here of what you find, but decided not to for two reasons. First, the list is rather large! Second, it will change from release to release and even more so from one distribution to another. So, here is how to see for yourself what is in your initial RAM disk image. Do this as root:
# mkdir /tmp/initrd
# cp /boot/initrd-release.img /tmp/initrd
# cd /tmp/initrd
# gunzip < initrd-release.img | cpio -id
# ls -RF
/boot/grub/
A directory containing the GRUB boot loader components.
The only thing you want to mess with here will be the file named either menu.lst or grub.conf as that defines the actual boot menu. The other files are crucial so don't mess with them. You must have stage1, some appropriate *stage1_5 for your file system, and stage2 for the boot loader to get the kernel loaded, plus the file device.map. Other than files with names *.old, mess with these at your peril.
/boot/System.map-release
The kernel's symbol table.
If you know what a symbol table is and you want to debug your kernel, this is that thing. It's the result of running nm against the compiled kernel itself before creating the file vmlinuz-release. It's a text file, no harm in looking at it, but if you don't know what this is all about you aren't going to care.
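Since the file is plain nm output, with an address, a type letter, and a symbol name on each line, looking up one symbol takes a single awk filter. A minimal sketch follows. The function name symbol_address is hypothetical, and the default path assumes your distribution installs the file as /boot/System.map-release.

```shell
# symbol_address SYMBOL [MAPFILE]
# Print the address recorded for SYMBOL in a System.map style file.
symbol_address() {
    awk -v sym="$1" '$3 == sym { print $1; exit }' \
        "${2:-/boot/System.map-$(uname -r)}"
}
```

For example:
$ symbol_address start_kernel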
/boot/config-release
The kernel's build configuration.
This isn't critical, and in many cases it isn't completely true (Red Hat is a prominent example of "configurations" that only suggest reality), but it might be helpful in figuring out how your kernel was built and what it could do. This is discussed in more detail below.
/boot/kernel.h-release
The kernel headers.
This isn't critical, at least not for booting and running a kernel, but it might be needed for compiling certain C/C++ programs that need some kernel information.

How Was My Kernel Built? What Device Drivers Does It Have? What Can It Do?

You need this information. For some reason Red Hat doesn't think you really need to figure it out on your own. I guess you're supposed to call their support line.

There should be a configuration file describing the set of device drivers built into the monolithic kernel and the set built as loadable modules. There will also be some configuration choices made for some of them — for example, for a non-native file system type like NTFS, should it be supported read-only or read-write?

When you build a Linux kernel, the configuration you create to define the build itself ends up as the file /usr/src/linux/.config. See my page on building Linux kernels for more details. If you built your own kernel, you should have kept that file or a copy.

Many distributions give you the file /boot/config-release with the implication that this is the configuration used to build the kernel you got. They might have gotten better about this, but I was misled by Red Hat enough times when working with Linux on the Alpha architecture that I no longer trust their config file to be any more than a fairly close approximation. If it's all you have, understand that it may be close but not completely correct.

To be confident that you are getting the real kernel configuration, ask the kernel to describe itself. If the kernel was built with the right settings, its build configuration is available as a kernel data structure that you can access as /proc/config.gz. The configuration variables are:
  CONFIG_IKCONFIG=y
  CONFIG_IKCONFIG_PROC=y
They are set when configuring the build by:
General setup -> Kernel .config support -> Enable access to .config through /proc/config.gz
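If /proc/config.gz exists, it is an ordinary gzip-compressed text file, so checking one option comes down to decompressing and filtering. Here is a minimal sketch. The function name kernel_config_value is made up for this example, and it prints nothing both for options that are "not set" and for options the kernel has never heard of.

```shell
# kernel_config_value OPTION [FILE]
# Print the value (y, m, a number, or a string) that OPTION was built with.
kernel_config_value() {
    cfg=${2:-/proc/config.gz}
    # Lines look like CONFIG_FOO=y; disabled options appear only as comments.
    gzip -dc < "$cfg" | sed -n "s/^$1=//p"
}
```

For example:
$ kernel_config_value CONFIG_IKCONFIG_PROC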

To see the list of available device drivers, you could use either of these commands:

$ modprobe -l
$ ls -R /lib/modules/$( uname -r )/kernel

That's only somewhat useful, as that just gives you a list of file names. If you installed the source code (and why not?), see the text files in: /usr/src/linux/Documentation/
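Newer versions of the module tools have dropped the -l option of modprobe, so you may need to walk the directory tree yourself. This sketch boils the file names down to bare module names. The function name list_modules is hypothetical, and it assumes module files are named something.ko, optionally followed by a compression suffix.

```shell
# list_modules [DIRECTORY]
# Print the name of every module file found under DIRECTORY, sorted.
list_modules() {
    find "${1:-/lib/modules/$(uname -r)/kernel}" -type f -name '*.ko*' |
        sed -e 's|.*/||' -e 's|\.ko.*$||' |
        sort
}
```

Any name it prints can then be handed to modinfo.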

You can also find information on a specific module this way:

$ modinfo module-name-goes-here 

The information you get is up to the developers of that module. So you might get something very useful, with an explanation, a list of load-time optional parameters and what they mean, and so on. Or you might get a cryptic table of hexadecimal addresses and a list of PCI bus addresses and a reminder that you can always read the C source code and figure it out from there.

Loading and Unloading Modules

Let's say you just added an Ethernet card but you don't know whether it needs the 8139cp or 8139too driver. Based on what you saw on the card and its chips, or maybe in the output of the lspci -v command, you think it's one of the two. But you don't know which one.

Try loading one of them and examining the end of the kernel ring buffer with this command sequence:

# modprobe 8139cp
# dmesg | tail 

Let's say that you only saw this output, generated by the module announcing itself as it loaded:

8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004) 

That doesn't look too promising. So let's unload it, and then load the other:

# rmmod 8139cp
# modprobe 8139too
# dmesg | tail 

Now we see this output at the end of the kernel ring buffer:

8139too Fast Ethernet driver 0.9.28
ACPI: PCI Interrupt Link [APC2] enabled at IRQ 17
8139too 0000:01:09.0: PCI INT A -> Link[APC2] -> GSI 17 (level, high) -> IRQ 17
eth0: RealTek RTL8139 at 0xf8394000, 00:11:95:1e:8e:b6, IRQ 17
eth0:  Identified 8139 chip type 'RTL-8100B/8139D'

Hey, that's it! Now we know which driver to specify in /etc/modprobe.conf or wherever.

What Is My Kernel Doing Right Now?

See the kernel ring buffer with this command:

$ dmesg 

It's a ring buffer, so it only keeps the most recent information. It's a good idea to save a copy as soon as possible after boot time, and /var/log/dmesg is an obvious place to store it. If your distribution didn't think of this simple improvement, add it yourself by adding this to the end of your /etc/rc.local file:

echo "Saving the kernel ring buffer in /var/log/dmesg"
dmesg > /var/log/dmesg 

See what kernel modules are currently loaded with this command:

$ lsmod 

Let's say you see two lines reading like the following among all the output:

ext3                  125412  8
nf_conntrack_ftp       12704  1 nf_nat_ftp 

This means that the module nf_conntrack_ftp has been loaded (it's needed to handle FTP connections through a Linux firewall), and that module is needed by another module, nf_nat_ftp, a module used to handle FTP connections through Network Address Translation or NAT.

The module ext3 has been loaded, as it must be to handle the Linux-native Ext3FS file systems. No other module needs ext3, but it could not be unloaded as the kernel needs it to handle all the file systems currently in use!

The first number after the module name indicates the size of the module in bytes. The second number indicates the number of things currently requiring the module. That FTP connection tracking module is needed by one thing, that other module that requires it. The Ext3FS module is needed by 8 things, the number of currently mounted Ext3FS file systems.
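The column meanings just described can be spelled out mechanically. Here is a small sketch that reads lsmod output on standard input and restates each line in words. The function name describe_lsmod is made up for this example, and it assumes the usual "Module Size Used by" header line.

```shell
# describe_lsmod
# Read lsmod output on standard input and restate each line in words.
describe_lsmod() {
    awk '
        $1 == "Module" { next }    # skip the header line
        {
            # A fourth column, when present, lists the modules needing this one.
            users = (NF >= 4) ? $4 : "none"
            printf "%s: %d bytes, use count %d, needed by modules: %s\n",
                   $1, $2, $3, users
        }
    '
}
```

Use it as a filter:
$ lsmod | describe_lsmod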

Kernel Hardware Detection

The /proc file system is really a large collection of kernel data structures presented in a reasonably friendly format. It appears to be a hierarchy of directories and files, which you can explore with cd and ls, and investigate in many cases with cat.

What has my kernel detected about the CPU, memory, and partition table?

$ cat /proc/cpuinfo 
.... details appear here ....
$ cat /proc/meminfo
.... details appear here ....
$ cat /proc/partitions
.... details appear here ....
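Those files are plain text, so the usual filters can pull out single facts. The following is a sketch with two hypothetical helper functions, cpu_count and meminfo_kb. It assumes /proc/cpuinfo has one line beginning with "processor" per logical CPU, which holds on PCs but not on every architecture, and that /proc/meminfo lines look like "MemTotal: 2048000 kB".

```shell
# cpu_count [FILE]
# Count the logical CPUs listed in a cpuinfo style file.
cpu_count() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# meminfo_kb FIELD [FILE]
# Print the number reported for FIELD (for example MemTotal), in kB.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}
```

For example:
$ cpu_count
$ meminfo_kb MemTotal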

What devices have been connected to the kernel? Note that the loading of kernel modules may lead to the detection of more hardware and the automatic appearance of more device-special files in /dev.

$ ls /dev 

What devices are on the PCI bus? Let's see that in one line per device, then in moderate detail, then in great detail.

$ lspci 
.... output appears ....
$ lspci -v
.... much more output appears ....
$ lspci -vv
.... more output appears than you probably want to see ....  

What about moderate details on the device at PCI bus address 01:08.0?

$ lspci -v -s 01:08.0 

What USB devices are connected? Let's see that in one line per device, then in moderate detail, then in great detail.

$ lsusb 
.... output appears ....
$ lsusb -v
.... much more output appears ....
$ lsusb -vv
.... more output appears than you probably want to see ....  

What about another way to report on the USB bus?

$ systool -v -b usb 

What about SCSI devices, including USB storage devices that appear as generic SCSI devices?

$ systool -v -b scsi 

Kernel Data Structures and Kernel Tuning

What is the complete current set of kernel data structures, by their name and value?

$ sysctl -a 

The kernel data structure net.ipv4.tcp_fin_timeout is accessible as the file /proc/sys/net/ipv4/tcp_fin_timeout. Instead of that sysctl command, you could have changed to the directory /proc/sys/net/ipv4 and displayed the contents of each file there with cat.

You can read the current values of these kernel timers, counters, and other fields. You can also change them! This has the effect of flipping a switch or twisting a knob on the running kernel. Now let's say that your enthusiastic modification of kernel values accidentally puts your running kernel into a bizarre state, which is not at all unlikely if you aren't careful. All you have modified is the running kernel in RAM. The kernel file stored on the disk is unchanged. Reboot with a fresh kernel and you're back to the default state.

Read just one specific kernel data structure:

$ sysctl net.ipv4.tcp_fin_timeout

   -- or --

$ cat /proc/sys/net/ipv4/tcp_fin_timeout 
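The two forms are related by a simple rule: replace each dot with a slash and put /proc/sys/ in front. A minimal sketch of that rule follows. The function name sysctl_to_path is hypothetical, and the rule breaks down for the rare names where a component itself contains a dot, such as some network interface names.

```shell
# sysctl_to_path NAME
# Print the /proc/sys file that corresponds to a sysctl variable name.
sysctl_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}
```

So the cat command above could also be written as:
$ cat "$(sysctl_to_path net.ipv4.tcp_fin_timeout)"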

Change that kernel value to 10 (seconds in this case). You will need to be root to do this:

# sysctl -w net.ipv4.tcp_fin_timeout=10

   -- or --

# echo "10" > /proc/sys/net/ipv4/tcp_fin_timeout 

Why would you want to mess with kernel values? To tune the running kernel for performance or security. See my page on security tuning suggestions for far more detail.

If you come up with a collection of adjustments that you find useful, you could either put the relevant echo or sysctl -w command sequence into /etc/rc.local or else you could put the relevant lines into /etc/sysctl.conf.
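Lines in /etc/sysctl.conf take the form "name = value". Here is a sketch of a helper that records one setting there, replacing any earlier assignment of the same name so the file doesn't accumulate contradictory lines. The function name set_sysctl_conf is made up for this example, and it only matches assignments written with the dotted form of the name.

```shell
# set_sysctl_conf NAME VALUE [FILE]
# Record "NAME = VALUE" in FILE, replacing any earlier assignment of NAME.
set_sysctl_conf() {
    conf=${3:-/etc/sysctl.conf}
    new=$(mktemp) || return 1
    if [ -f "$conf" ]; then
        # Keep every line except an earlier assignment of the same name.
        awk -v key="$1" '{
            name = $0
            sub(/[ \t]*=.*$/, "", name)
            sub(/^[ \t]+/, "", name)
            if (name != key) print
        }' "$conf" > "$new"
    fi
    printf '%s = %s\n' "$1" "$2" >> "$new"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$new" > "$conf" && rm -f "$new"
}
```

As root, record the setting and then load the file into the running kernel with sysctl -p:
# set_sysctl_conf net.ipv4.tcp_fin_timeout 10
# sysctl -p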

Other Pages

Other Various Linux / UNIX topics

How To Build Linux Kernels

Linux / UNIX Command Fundamentals

© Bob Cromwell Jun 2013.