Exploring the Linux Kernel
I have more details on another page, but the short version of Linux booting is as follows:
That's the simple version, see my other page for more details.
While the kernel components can go anywhere you want to hide them, you will generally find the following files installed within the /boot directory. Replace the string release with whatever your kernel release is, which is the output of the command uname -r. Depending on your distribution, if you have multiple kernels installed you will see several of each file type with the release as part of their file names, and then one symbolic link pointing to the latest one. See the monolithic kernel for an example.
/boot/vmlinuz-release
So you might have a symbolic link simply named:
/boot/vmlinuz
pointing to the real file:
/boot/vmlinuz-3.5.4
and with earlier kernels also available:
/boot/vmlinuz-3.5.1
/boot/vmlinuz-3.4.3
/boot/vmlinuz-3.3.1
This same pattern would continue for all the below files with the release as part of their name.
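For a quick inventory of what is installed, a small shell sketch like the following lists the release strings and shows where the generic link points. The function names list_kernel_releases and kernel_link_target are made up for illustration, and the sketch assumes the vmlinuz-release naming pattern described above.

```shell
# List the release strings of the kernel images found in a boot directory.
# The directory defaults to /boot; pass another path to look elsewhere.
list_kernel_releases() {
    bootdir="${1:-/boot}"
    for image in "$bootdir"/vmlinuz-*; do
        # Skip the unexpanded pattern when no kernel images match.
        [ -e "$image" ] || continue
        name="${image##*/}"          # strip the directory part
        echo "${name#vmlinuz-}"      # strip the "vmlinuz-" prefix
    done
}

# Show the file that a generic symbolic link such as /boot/vmlinuz points to.
# Prints nothing if the path is not a symbolic link.
kernel_link_target() {
    if [ -L "$1" ]; then
        readlink "$1"
    fi
}
```

Running list_kernel_releases with no argument looks in /boot. Comparing its output with uname -r shows which of the installed kernels is the one currently running.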
The monolithic kernel.
This is the main operating system. The boot loader loads this file into memory, and the kernel then uncompresses itself.
This is one big file with three components on a PC:
— 512-byte boot block
— secondary boot loader
— gzip'ed kernel
On other hardware platforms this is simply a gzip'ed kernel. The kernel portion itself is a statically-linked ELF executable.
Why the funny name and location?
A directory containing the loadable modules.
These are the dynamically loaded kernel modules, or device drivers. This area contains:
The initial RAM disk image.
This has been compressed with gzip. If you uncompress it, you find a cpio archive. If you go to some safe temporary directory and extract it, you find:
# mkdir /tmp/initrd
# cp /boot/initrd-release /tmp/initrd
# cd /tmp/initrd
# gzip -dc < initrd-release | cpio -id
# ls -RF
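Not every initial RAM disk image is gzip-compressed, since a distribution can choose another compressor, so it may be worth checking before you extract. The following is a minimal sketch, not a definitive tool: the function names is_gzip and unpack_initrd are hypothetical, and it only handles the gzip case described above.

```shell
# Report whether a file starts with the gzip magic bytes 0x1f 0x8b.
is_gzip() {
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Unpack a gzip-compressed cpio image into a scratch directory.
# Both arguments are paths chosen by the caller.
unpack_initrd() {
    image="$1"
    dest="$2"
    if ! is_gzip "$image"; then
        echo "$image is not gzip-compressed" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # cpio -i extracts, -d creates leading directories as needed.
    gzip -dc "$image" | (cd "$dest" && cpio -id)
}
```

Something like unpack_initrd /boot/initrd-release /tmp/initrd would then do the same job as the command sequence above, refusing to run if the image turns out to be in some other format.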
A directory containing the GRUB boot loader.
The only thing you want to mess with here will be the file named either menu.lst or grub.conf as that defines the actual boot menu. The other files are crucial so don't mess with them. You must have stage1, some appropriate *stage1_5 for your file system, and stage2 for the boot loader to get the kernel loaded, plus the file device.map. Other than files with names *.old, mess with these at your peril.
The kernel's symbol table.
If you know what a symbol table is and you want to debug your kernel, this is that thing. It's the result of running nm against the compiled kernel itself before creating the file vmlinuz-release. It's a text file, no harm in looking at it, but if you don't know what this is all about you aren't going to care.
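If you do care, the file is easy to search because each line holds three fields in the nm style: an address, a type letter, and a symbol name. Here is a hedged sketch with two hypothetical helper functions, lookup_symbol and count_symbols_of_type. It assumes you pass the path of the symbol table file yourself, commonly something like /boot/System.map-release.

```shell
# Print the address and type of one symbol from a System.map-style file.
# Each line of such a file has three fields: address, type, name.
lookup_symbol() {
    awk -v sym="$2" '$3 == sym { print $1, $2 }' "$1"
}

# Count the symbols of a given nm type letter, such as T for text (code).
count_symbols_of_type() {
    awk -v t="$2" '$2 == t { n++ } END { print n + 0 }' "$1"
}
```

For example, lookup_symbol followed by the file name and start_kernel would print the address and type letter of the kernel's start_kernel function, if that symbol is listed.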
The kernel's build configuration.
This isn't critical, and in many cases it isn't completely true (Red Hat is a prominent example of "configurations" that only suggest reality), but it might be helpful in figuring out how your kernel was built and what it could do. This is discussed in more detail below.
The kernel headers.
This isn't critical, at least not for booting and running a kernel, but it might be needed for compiling certain C/C++ programs that need some kernel information.
You need this information. For some reason Red Hat doesn't think you really need to figure it out on your own, I guess you're supposed to call their support line.
There should be a configuration file describing the set of device drivers built into the monolithic kernel and the set built as loadable modules. There will also be some configuration choices made for some of them — for example, for a non-native file system type like NTFS, should it be supported read-only or read-write?
When you build a Linux kernel, the configuration you create to define the build itself ends up as the file /usr/src/linux/.config. See my page on building Linux kernels for more details. If you built your own kernel, you should have kept that file or a copy.
Many distributions give you the file /boot/config-release with the implication that this is the configuration used to build the kernel you got. They might have gotten better about this, but I was misled by Red Hat enough times when working with Linux on the Alpha architecture that I no longer trust their config file to be any more than a fairly close approximation. If it's all you have, understand that it may be close but not completely correct.
To be confident that you are getting the real configuration, ask the kernel to describe itself. If the kernel was built with the right settings, its build configuration is available as a kernel data structure that you can access as /proc/config.gz.
The configuration variables are CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC.
They are set when configuring the build by:
General setup -> Kernel .config support -> Enable access to .config through /proc/config.gz
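Once you have a configuration, whether the plain /boot/config-release file or the compressed /proc/config.gz, you will usually want to look up one option at a time. The following is a rough sketch under stated assumptions: config_value is a hypothetical helper name, it treats any file name ending in .gz as gzip-compressed, and it assumes the usual format of OPTION=value lines with unset options commented out.

```shell
# Print the value of one option from a kernel configuration file.
# Handles a plain file such as /boot/config-release or a gzip-compressed
# one such as /proc/config.gz. Prints "unset" for options that are absent
# or commented out as "# CONFIG_FOO is not set".
config_value() {
    file="$1"
    option="$2"
    [ -r "$file" ] || return 1
    case "$file" in
        *.gz) reader="gzip -dc" ;;
        *)    reader="cat" ;;
    esac
    # $reader is left unquoted on purpose so "gzip -dc" splits into words.
    value=$($reader "$file" | sed -n "s/^$option=//p")
    echo "${value:-unset}"
}
```

A result of y means the feature is built into the monolithic kernel, m means it was built as a loadable module, and unset means it was left out. On mainline kernels, config_value /proc/config.gz CONFIG_IKCONFIG_PROC should report y whenever that file exists at all.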
To see the list of available device drivers, you could use either of these commands:
$ modprobe -l
$ ls -R /lib/modules/$( uname -r )/kernel
That's only somewhat useful, as that just gives you a list of file names. If you installed the source code (and why not?), see the text files in: /usr/src/linux/Documentation/
You can also find information on a specific module this way:
$ modinfo module-name-goes-here
The information you get is up to the developers of that module. So you might get something very useful, with an explanation, a list of load-time optional parameters and what they mean, and so on. Or you might get a cryptic table of hexadecimal addresses and a list of PCI bus addresses and a reminder that you can always read the C source code and figure it out from there.
Let's say you just added an Ethernet card but you don't know whether it needs the 8139cp or the 8139too driver. Based on what you saw on the card and its chips, or maybe in the output of the lspci -v command, you think it's one of the two. But you don't know which one.
Try loading one of them and examining the end of the kernel ring buffer with this command sequence:
# modprobe 8139cp
# dmesg | tail
Let's say that you only saw this output, generated by the module announcing itself as it loaded:
8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
That doesn't look too promising. So let's unload it, and then load the other:
# rmmod 8139cp
# modprobe 8139too
# dmesg | tail
Now we see this output at the end of the kernel ring buffer:
8139too Fast Ethernet driver 0.9.28
ACPI: PCI Interrupt Link [APC2] enabled at IRQ 17
8139too 0000:01:09.0: PCI INT A -> Link[APC2] -> GSI 17 (level, high) -> IRQ 17
eth0: RealTek RTL8139 at 0xf8394000, 00:11:95:1e:8e:b6, IRQ 17
eth0: Identified 8139 chip type 'RTL-8100B/8139D'
Hey, that's it! Now we know which driver to specify in /etc/modprobe.conf or wherever.
See the kernel ring buffer with this command:
$ dmesg
It's a ring buffer, so it only keeps the most recent information. It's a good idea to save a copy as soon as possible after boot time, and /var/log/dmesg is an obvious place to store it. If your distribution didn't think of this simple improvement, add it yourself by adding this to the end of your /etc/rc.local file:
echo "Saving the kernel ring buffer in /var/log/dmesg"
dmesg > /var/log/dmesg
See what kernel modules are currently loaded with this command:
$ lsmod
Let's say you see two lines reading like the following among all the output:
ext3 125412 8
nf_conntrack_ftp 12704 1 nf_nat_ftp
This means that the module nf_conntrack_ftp has been loaded (it's needed to handle FTP connections through a Linux firewall), and that module is needed by another module, nf_nat_ftp, a module used to handle FTP connections through Network Address Translation or NAT.
The module ext3 has been loaded, as it must to handle the Linux-native Ext3FS file systems. No other module needs ext3, but it could not be unloaded as the kernel needs it to handle all the file systems currently in use!
The first number after the module name indicates the size of the module in bytes. The second number indicates the number of things currently requiring the module. That FTP connection tracking module is needed by one thing, the other module that requires it. The Ext3FS module is needed by 8 things: the currently mounted Ext3FS file systems.
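Since the output is just whitespace-separated columns, it is easy to filter. As a small sketch, the hypothetical function modules_in_use below reads lines in that format on standard input and prints only the modules with a nonzero use count, along with whatever is listed as requiring them.

```shell
# Read lsmod-style lines (name, size, use count, optional dependents) on
# standard input and print the modules that something currently requires.
# The header line, which starts with "Module", is skipped.
modules_in_use() {
    awk '$1 != "Module" && $3 + 0 > 0 {
        holders = ($4 == "") ? "-" : $4
        print $1, $3, holders
    }'
}
```

You would run it as lsmod | modules_in_use. Fed the two example lines above, it prints "ext3 8 -" and "nf_conntrack_ftp 1 nf_nat_ftp", the dash marking a module that no other module depends on.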
The /proc file system is really a large collection of kernel data structures presented in a reasonably friendly format. It appears to be a hierarchy of directories and files, which you can explore with cd and ls, and investigate in many cases with cat.
What has my kernel detected about the CPU, memory, and partition table?
$ cat /proc/cpuinfo
.... details appear here ....
$ cat /proc/meminfo
.... details appear here ....
$ cat /proc/partitions
.... details appear here ....
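Because these are plain text, you can pull single facts out of them with the usual tools. The sketch below shows two hypothetical helpers, count_processors and total_memory_kb. It assumes the layout seen on typical x86 systems, where each logical CPU gets its own line starting with "processor" and the memory total appears on a line starting with "MemTotal:".

```shell
# Count the logical processors described in a cpuinfo-style file.
# On x86 each logical CPU has its own line starting with "processor".
count_processors() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Print the MemTotal figure, in kB, from a meminfo-style file.
total_memory_kb() {
    awk '$1 == "MemTotal:" { print $2 }' "${1:-/proc/meminfo}"
}
```

Both default to the real files under /proc when called with no argument. The optional path argument is only there so you can point them at a saved copy.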
What devices have been connected to the kernel? Note that the loading of kernel modules may lead to the detection of more hardware and the automatic appearance of more device-special files in /dev.
$ ls /dev
What devices are on the PCI bus? Let's see that in one line per device, then in moderate detail, then in great detail.
$ lspci
.... output appears ....
$ lspci -v
.... much more output appears ....
$ lspci -vv
.... more output appears than you probably want to see ....
What about moderate details on the device at PCI bus address 01:08.0?
$ lspci -v -s 01:08.0
What USB devices are connected? Let's see that in one line per device, then in moderate detail, then in great detail.
$ lsusb
.... output appears ....
$ lsusb -v
.... much more output appears ....
$ lsusb -vv
.... more output appears than you probably want to see ....
What about another way to report on the USB bus?
$ systool -v -b usb
What about SCSI devices, including USB storage devices that appear as generic SCSI devices?
$ systool -v -b scsi
What is the complete current set of kernel data structures, by their name and value?
$ sysctl -a
The kernel data structure net.ipv4.tcp_fin_timeout is accessible as the file /proc/sys/net/ipv4/tcp_fin_timeout. Instead of using that sysctl command, you could have changed to the directory /proc/sys/net/ipv4 and displayed the contents of each file there with cat.
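The mapping between the two forms is mechanical: dots in the sysctl name become slashes under /proc/sys. Here is a minimal sketch of that conversion in both directions, using the hypothetical function names sysctl_to_path and path_to_sysctl. The simple rule is an assumption that breaks for any name with a dot inside a single component, such as a VLAN interface name.

```shell
# Convert a sysctl name to the matching path under /proc/sys by replacing
# each dot with a slash. This does not handle names that contain a dot
# inside one component.
sysctl_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Convert a /proc/sys path back to the dotted sysctl name.
path_to_sysctl() {
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}
```

So cat "$(sysctl_to_path net.ipv4.tcp_fin_timeout)" reads the same value that sysctl net.ipv4.tcp_fin_timeout reports.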
You can read the current values of these kernel timers, counters, and other fields. You can also change them! This has the effect of flipping a switch or twisting a knob on the running kernel. Now let's say that your enthusiastic modification of kernel values accidentally puts your running kernel into a bizarre state, which is not at all unlikely if you aren't careful. All you have modified is the running kernel in RAM; the kernel file stored on the disk is unchanged. Reboot with a fresh kernel and you're back to the default state.
Read just one specific kernel data structure:
$ sysctl net.ipv4.tcp_fin_timeout
-- or --
$ cat /proc/sys/net/ipv4/tcp_fin_timeout
Change that kernel value to 10 (seconds in this case). You will need to be root to do this:
# sysctl -w net.ipv4.tcp_fin_timeout=10
-- or --
# echo "10" > /proc/sys/net/ipv4/tcp_fin_timeout
Why would you want to mess with kernel values? To tune the running kernel for performance or security. See my page on security tuning suggestions for far more detail.
If you come up with a collection of adjustments that you find useful, you could either put the relevant echo or sysctl -w command sequence into /etc/rc.local or else you could put the relevant lines into /etc/sysctl.conf.
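If you go the /etc/sysctl.conf route, each line there takes the form name = value. After tuning the running kernel by hand, you could capture a setting's current value in that form with something like the following sketch. The function name snapshot_sysctl is hypothetical, and its optional second argument, the root of the tree to read, exists only so it can be tried against a copy instead of the live /proc/sys.

```shell
# Print a sysctl.conf line for one named setting, using the value the
# running kernel currently reports. The second argument is the root of
# the tree to read and defaults to /proc/sys.
snapshot_sysctl() {
    name="$1"
    root="${2:-/proc/sys}"
    path="$root/$(printf '%s' "$name" | tr '.' '/')"
    # Fail quietly if there is no such setting or it is not readable.
    [ -r "$path" ] || return 1
    printf '%s = %s\n' "$name" "$(cat "$path")"
}
```

As root, snapshot_sysctl net.ipv4.tcp_fin_timeout >> /etc/sysctl.conf would record the value you settled on, and sysctl -p re-applies everything in that file without waiting for a reboot.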