
Exploring the Linux Kernel
How the Kernel Boots
I have more details on another page, but the short version of Linux booting is as follows:
- The firmware on the motherboard runs a power-on self-test and finds the boot loader.
- On older PC platforms, the BIOS found the Master Boot Record (MBR), the first 512-byte block of the first storage device. That MBR points to a boot loader stored on the disk.
- Modern PC motherboards use UEFI firmware. It must find a specially labeled partition called the EFI System Partition or ESP, which must contain a FAT file system. The firmware runs a specified program within that file system, which in turn finds and runs the boot loader.
- On non-PC hardware (e.g., Alpha, UltraSPARC), the boot loader is a mini-OS and operates somewhat like the UEFI firmware.
- The boot loader uncompresses the compressed kernel image and loads it into RAM. More details on the kernel file format appear below.
- The boot loader also uncompresses an initial RAM disk image and loads it into memory; the kernel mounts that as its root file system for a while.
- The kernel discovers the available hardware, or at least what it needs and knows about initially.
- The real root file system is mounted in read-only mode.
- The kernel starts /sbin/init and now we're under the control of the boot files on the disk rather than what's coded into the kernel itself.
- The root file system is checked and then remounted in read/write mode.
- Some kernel modules (also called device drivers) may be loaded, and they may detect more hardware.
- A collection of scripts is run to get the system into the desired run state.
That's the simple version.
See my details page for much more.
Where Are The Pieces Installed?
While the kernel components can go anywhere you want to
hide them, you will generally find the following files
installed within the /boot directory.
Replace the string release throughout the following
with whatever your kernel release is, the result of
running the command uname -r.
Depending on your distribution, if you have multiple kernels
installed you may see several of each file type with the
release as part of their file names, and then one symbolic
link pointing to the latest one installed through the
distribution's updates.
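For example, you could check your current release string and see how many kernel images are installed (the file names shown will of course depend on your system):
$ uname -r
$ ls -l /boot/vmlinuz-*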
The monolithic kernel
This is the monolithic core of the kernel,
the main operating system file.
This file is uncompressed and loaded
into memory by the boot loader.
Typically this is /boot/vmlinuz-release,
possibly with several for different releases plus one
symbolic link simply named vmlinuz pointing
to the most recent file.
This is one big file with three components on a PC:
- 512-byte boot block
- Secondary boot loader
- Gzip-compressed kernel
On other hardware platforms this is simply a gzip'ed kernel. The kernel portion itself is a statically-linked ELF executable.
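You can ask the file command what it makes of your kernel image; on a typical x86 system it will report something like a bzImage boot executable, with details varying by platform and kernel version:
$ file /boot/vmlinuz-$( uname -r )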
Why the funny name and location?
Unix traditionally boots from a file named /vmunix
or similar, so the Linux developers used /vmlinux.
Then the kernel grew too large for the boot loader to handle it, so it was gzip'ed and the "x" changed to "z".
Then disks grew large enough that the BIOS could not find
things beyond the first 1024 cylinders, so a small file
system named /boot was created within the
first partition on the disk.
Directory containing the loadable modules
These are the dynamically loaded kernel modules,
or device drivers.
The modules for a release are stored in a hierarchy beneath
/lib/modules/release/kernel/.
The files /lib/modules/release/modules.*
contain information about the modules.
An important one is modules.dep,
which describes dependencies between modules.
For example, the module to support the printer device
/dev/lp0
will require the help of
a module supporting the generic parallel port,
which in turn may require the help of a module supporting
some chipset, and so on.
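You can see these dependencies directly in modules.dep. For example, something like the following would show what the parallel port printer module needs; this assumes the standard lp module is present, and the exact module paths vary by kernel release and architecture:
$ grep lp.ko /lib/modules/$( uname -r )/modules.dep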
The initial RAM disk image
While this could be named anything within /boot,
it is typically /boot/initrd-release.img
or /boot/initramfs-release.img.
It is the result of creating a file system with enough
components to provide the kernel with needed device drivers
to find and read the root system on the disks,
and some programs, configuration files, shared libraries,
and device-special files to handle the initial steps
required to detect and initialize hardware and
mount the root file system.
It is the result of creating a cpio archive
and compressing it with gzip.
While you could copy the file to some temporary working area, uncompress and then extract its contents, you can easily read it with this command:
# lsinitrd /boot/initrd-release.img
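If you do want to unpack it by hand, a minimal sketch looks like this, assuming a plain gzip-compressed cpio archive. Some distributions prepend a CPU microcode archive or use a different compressor, in which case this won't work as-is:
$ mkdir /tmp/initrd
$ cd /tmp/initrd
$ zcat /boot/initrd-release.img | cpio -idv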
GRUB boot loader components
You will probably be using version 2 of GRUB, so its
components will be in /boot/grub2.
Older systems with legacy GRUB will use /boot/grub.
System map (kernel symbol table)
This is stored in /boot/System.map-release.
If you care about this, then you will want to see some of
the other modules.* files in
/lib/modules/release, especially
modules.symbols*.
Kernel build configuration
This is supposed to be in /boot/config-release.
I have found that the files provided by Red Hat are close
but not necessarily exactly what was used to build the
kernel they supply.
If you built your kernel
with the proper choice, this will be available as the
kernel data structure /proc/config.gz.
In that case you are asking the kernel itself to provide
its internal record of how it was built, meaning you will
get the complete truth.
If that apparent file (actually a kernel data structure) isn't there, that feature may have been built as a loadable module. Try loading that module and trying again:
# modprobe ikconfig
# zcat /proc/config.gz | less
See more on this below.
Kernel headers
These may be available as /boot/kernel.h-release.
You may need this header file to compile
some C/C++ programs.
The kernel source code itself under /usr/src/linux
is another possible source for these headers.
How Was My Kernel Built? What Device Drivers Does It Have? What Can It Do?
You need this information. For some reason Red Hat doesn't think you really need to figure it out on your own; I guess you're supposed to call their support line.
There should be a configuration file describing the set of device drivers built into the monolithic kernel and the set built as loadable modules. There will also be some configuration choices made for some of them — for example, for a non-native file system type like NTFS, should it be supported read-only or read-write?
When you build a Linux kernel,
the configuration you create to define the build itself
ends up as the file /usr/src/linux/.config;
see my page on building Linux kernels
for more details.
If you built your own kernel, you should have kept that
file or a copy.
Many distributions give you the file
/boot/config-release
with the implication that this is the configuration used to
build the kernel you got.
They might have gotten better about this,
but I was misled by Red Hat enough times when working
with Linux on the Alpha architecture that I
no longer trust their config file to be any
more than a fairly close approximation.
If it's all you have, understand that it may be close
but not completely correct.
To be confident that you are getting the real
kernel configuration,
ask the kernel to describe itself.
If the kernel was built with the right settings, its build
configuration is available as a kernel data structure that
you can access as /proc/config.gz
.
The configuration variables are:
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
They are set when configuring the build by:
General setup →
Kernel .config support →
Enable access to .config through /proc/config.gz
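With that in place, one quick way to check how close your distribution's config file is to the truth is to compare the two. This sketch assumes a bash-style shell for the process substitution:
$ diff <( zcat /proc/config.gz ) /boot/config-$( uname -r )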
To see the list of available device drivers, you could use either of these commands:
$ modprobe -l
$ ls -R /lib/modules/$( uname -r )/kernel
That's only somewhat useful, as that just gives you a list
of file names.
If you installed the source code (and why not?),
see the text files in:
/usr/src/linux/Documentation/
You can also find information on a specific module this way:
$ modinfo module-name-goes-here
The information you get is up to the developers of that module. So you might get something very useful, with an explanation, a list of load-time optional parameters and what they mean, and so on. Or you might get a cryptic table of hexadecimal addresses and a list of PCI bus addresses and a reminder that you can always read the C source code and figure it out from there.
Loading and Unloading Modules
Let's say you just added an Ethernet card but you don't
know whether it needs the 8139cp or 8139too driver.
Based on what you saw on the card and its chips,
or maybe in the output of the lspci -v command,
you think it's one of the two.
But you don't know which one.
Try loading one of them and examining the end of the kernel ring buffer with this command sequence:
# modprobe 8139cp
# dmesg | tail
Let's say that you only saw this output, generated by the module announcing itself as it loaded:
8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
That doesn't look too promising. So let's unload it, and then load the other:
# rmmod 8139cp
# modprobe 8139too
# dmesg | tail
Now we see this output at the end of the kernel ring buffer:
8139too Fast Ethernet driver 0.9.28
ACPI: PCI Interrupt Link [APC2] enabled at IRQ 17
8139too 0000:01:09.0: PCI INT A -> Link[APC2] -> GSI 17 (level, high) -> IRQ 17
eth0: RealTek RTL8139 at 0xf8394000, 00:11:95:1e:8e:b6, IRQ 17
eth0: Identified 8139 chip type 'RTL-8100B/8139D'
Hey, that's it!
Now we know which driver to specify in
/etc/modprobe.conf or wherever.
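On a system using the old-style /etc/modprobe.conf, the entry would look something like the following; newer systems generally use separate files under /etc/modprobe.d/ instead, and the interface name may differ:
alias eth0 8139too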
What Is My Kernel Doing Right Now?
See the kernel ring buffer with this command:
$ dmesg
It's a ring buffer, so it only keeps the most recent
information.
It's a good idea to save a copy as soon as possible after
boot time; /var/log/dmesg is an obvious place to
store this.
If your distribution didn't think of this simple
improvement, add it yourself by adding this to the
end of your /etc/rc.local
file:
echo "Saving the kernel ring buffer in /var/log/dmesg dmesg > /var/log/dmesg
See what kernel modules are currently loaded with this command:
$ lsmod
Let's say you see two lines like the following among all the output:
ext3                  125412  8
nf_conntrack_ftp       12704  1 nf_nat_ftp
This means that the module nf_conntrack_ftp
has been loaded (it's needed to handle FTP connections
through a Linux firewall), and that module is needed
by another module, nf_nat_ftp, a module used
to handle FTP connections through
Network Address Translation or NAT.
The module ext3 has been loaded, as it must be to
handle the Linux-native Ext3FS file systems.
No other module needs ext3, but it could not be
unloaded as the kernel needs it to handle all the file
systems currently in use!
The first number after the module name indicates the size of the module in bytes. The second number indicates the number of things currently requiring the module. That FTP connection tracking module is needed by one thing, that other module that requires it. The Ext3FS module is needed by 8 things: the number of currently mounted Ext3FS file systems.
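You can see which mounted file systems are keeping that module busy (on a system actually using Ext3FS, of course):
$ grep ext3 /proc/mounts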
Kernel Hardware Detection
The /proc
file system is really a large collection
of kernel data structures presented in a reasonably
friendly format.
It appears to be a hierarchy of directories and files,
which you can explore with cd
and ls,
and investigate in many cases with cat.
What has my kernel detected about the CPU, memory, and partition table?
$ cat /proc/cpuinfo
.... details appear here ....
$ cat /proc/meminfo
.... details appear here ....
$ cat /proc/partitions
.... details appear here ....
What devices have been connected to the kernel?
Note that the loading of kernel modules may lead to
the detection of more hardware and the automatic
appearance of more device-special files in /dev.
$ ls /dev
What devices are on the PCI bus? Let's see that in one line per device, then in moderate detail, then in great detail.
$ lspci
.... output appears ....
$ lspci -v
.... much more output appears ....
$ lspci -vv
.... more output appears than you probably want to see ....
What about moderate details on the device at PCI bus address 01:08.0?
$ lspci -v -s 01:08.0
What USB devices are connected? Let's see that in one line per device, then in moderate detail, then in great detail.
$ lsusb
.... output appears ....
$ lsusb -v
.... much more output appears ....
$ lsusb -vv
.... more output appears than you probably want to see ....
What about another way to report on the USB bus?
$ systool -v -b usb
What about SCSI devices, including USB storage devices that appear as generic SCSI devices?
$ systool -v -b scsi
Kernel Data Structures and Kernel Tuning
What is the complete current set of kernel data structures, by their name and value?
$ sysctl -a
The kernel data structure net.ipv4.tcp_fin_timeout
is accessible as the file
/proc/sys/net/ipv4/tcp_fin_timeout.
Instead of that sysctl command, you could have
changed to the directory /proc/sys/net/ipv4 and
displayed its contents with cat.
You can read the current values of these kernel timers, counters, and other fields. You can also change them! This has the effect of flipping a switch or twisting a knob on the running kernel. Now let's say that your enthusiastic modification of kernel values accidentally puts your running kernel into a bizarre state — this is not at all unlikely if you aren't careful. All you have modified is the running kernel in RAM, the kernel file stored on the disk is unchanged. Reboot with a fresh kernel and you're back to the default state.
Read just one specific kernel data structure:
$ sysctl net.ipv4.tcp_fin_timeout
  -- or --
$ cat /proc/sys/net/ipv4/tcp_fin_timeout
Change that kernel value to 10 (seconds in this case);
you will need to be root to do this:
# sysctl -w net.ipv4.tcp_fin_timeout=10
  -- or --
# echo "10" > /proc/sys/net/ipv4/tcp_fin_timeout
Why would you want to mess with kernel values? To tune the running kernel for performance or security. See my page on security tuning suggestions for far more detail.
If you come up with a collection of adjustments that you
find useful, you could either put the relevant echo
or sysctl -w command sequence into
/etc/rc.local, or else you could put the relevant
lines into /etc/sysctl.conf.