
How Linux Boots, Run Levels, and Service Control
How Linux Boots
"How does Linux boot?"
That's a very important question!
Of course you need to understand that for
troubleshooting, in order to handle situations where
the system doesn't boot, or doesn't do it correctly,
or otherwise doesn't get all the needed services started.
But you also need to understand this for routine administration.
You need to control which services are started,
and handle dependencies where one service must
be running before a second can be started.
The answer to the question is complex, because
there are so many choices along the way.
The firmware on the platform,
which boot loader you are using,
which init
system you are using,
the details depend on these choices and more.
Let's start with a simple explanation.
Turn it on, wait a few moments, start doing powerful things.
Maybe that's all you care to know. But maybe you want some details. At its very simplest, Linux is booted by this sequence:
- Firmware on the main system board selects the bootable media, and...
- The boot loader loads the monolithic kernel from the boot media into RAM and starts it, and...
- The kernel uses the directive passed by the boot loader to find the main file system, at which point it can find and run...
- The init program runs boot programs to find the other file systems and start network and local service processes.
Going just slightly deeper, we have choices for the firmware and boot loader. On Linux's traditional platform derived from the IBM PC architecture, the firmware has been the very limited BIOS. That started transitioning to UEFI a number of years ago. UEFI has been taking over since Microsoft's release of Windows 8 in October 2012: Microsoft will not allow retail sales of Windows 8 computers without UEFI and its support for Secure Boot. The boot loader was once LILO, then GRUB, and now GRUB 2.
The boot loader also tells the kernel how to find and load an initial RAM-based disk image providing device drivers for physical disk controllers as well as some initial scripts.
The init
program continues to evolve in
capability and complexity, from early BSD-like systems
through an SVR4-style init
,
then Upstart, and now systemd
.
As you will see below, systemd
brings
enormous changes to the way a Linux system boots.
This has been just the briefest of overviews. Continue to learn the details!
The following goes through the booting and service start and control steps in the sequence in which they happen, attempting to cover all the possibilities at each step.
Much of what follows will be needed to understand how to migrate from one Linux distribution to another, and how to upgrade across major releases of one distribution.
Kernel Space versus User Space
The Linux kernel runs directly on the hardware, using physical addresses and accessing the hardware on behalf of user processes while enforcing access permissions.
The work you accomplish on the computer is done by your
processes, which were created out of system-owned processes
when you logged in.
All of these processes are descendants of init
,
one master process started by the kernel early
in the boot process.
The init
process manages the system state by
requesting the creation and termination of other processes.
The kernel first detects
enough hardware to find and mount the root file system.
It then starts the init
program,
which manages the system state and all subsequent processes.
The only process the kernel knows should run is
init
.
Once init
has started, the kernel
enforces permissions and manages resource utilization
(memory, processing priority, etc) and may prevent or
restrict some processes.
On the positive side, it's init
and its
descendants that request the creation of new processes.
The Kernel Boot Loader
Firmware
The system firmware selects some media and attempts to boot the operating system stored there. Selecting, loading, and starting the OS kernel might be done by the firmware itself when it is a small operating system of its own, as in the case of OpenBoot (later called Open Firmware), developed by Sun for the SPARC platform, or the Alpha SRM developed by DEC.
For motherboards with AMD/Intel processors using BIOS or UEFI, the firmware finds a "stub" of the boot loader in a disk partition, which then calls successively more capable components to get to the kernel.
The motherboard has firmware; for AMD/Intel processors it will be BIOS or UEFI. UEFI is a type of firmware, not a type of BIOS. Don't say "This system's BIOS is UEFI"; that's like saying "This orange is an apple."
Firmware Finds the Boot Loader
BIOS
The BIOS firmware selects the bootable device, scanning the attached storage in an order specified by the bus and controller scan order as well as the BIOS configuration. It is looking for a device starting with a 512-byte block which ends with the boot signature 0x55AA. The first 446 bytes of the boot block hold the boot loader, followed by 64 bytes for the partition table, then those final two bytes of the boot signature.
446 bytes of program code doesn't provide much capability! It will be just a boot loader "stub" which can find and start a more capable boot loader within a partition.
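You can look at that boot block for yourself. A minimal sketch, assuming the first disk is /dev/sda (adjust the device name for your system): copy the 512-byte MBR to a scratch file, let file identify the boot loader stub, and verify that the last two bytes of the block are the 55 aa boot signature.
# dd if=/dev/sda of=/tmp/mbr.bin bs=512 count=1
# file /tmp/mbr.bin
# hexdump -C /tmp/mbr.bin | tail -2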
UEFI
UEFI firmware initializes the processor, memory, and peripheral hardware such as Ethernet, SATA, video, and other interfaces. An interface can have its own firmware code, sometimes called Option ROM, which initializes that peripheral. UEFI can check those Option ROMs for embedded signatures which can appear on the UEFI's "Allowed" and "Disallowed" lists.
UEFI has a "boot manager", a firmware policy engine configured by a set of NVRAM variables. It must find the EFI System Partition (or ESP). It will look for a GPT or GUID Partition Table with GUID C12A7328-F81F-11D2-BA4B-00A0C93EC93B, the distinctive signature of an EFI System Partition in a GPT device. If it can't find that, it will look for a traditional MBR partition of type 0xEF. The EFI System Partition will usually be the first partition on some disk, but it can be anywhere as UEFI doesn't have the early 1980s limitations of BIOS.
This is why Knoppix media cannot boot under UEFI. It contains a single ISO-9660 file system and so you have to reset the UEFI firmware to "legacy BIOS mode" or "BIOS compatibility mode". Other media may have a DOS/MBR partition table and an EFI System Partition. Let's compare Knoppix and Red Hat media with some output broken into multiple lines for readability:
$ file KNOPPIX_V7.4.2DVD-2014-09-28-EN.iso
KNOPPIX_V7.4.2DVD-2014-09-28-EN.iso: # ISO 9660 CD-ROM filesystem data \
	'KNOPPIX' (bootable)
% file rhel-server-7.0-x86_64-dvd.iso
rhel-server-7.0-x86_64-dvd.iso: DOS/MBR boot sector; \
	partition 2 : ID=0xef, start-CHS (0x3ff,254,63), \
	end-CHS (0x3ff,254,63), startsector 548348, 12572 sectors
% fdisk -l rhel-server-7.0-x86_64-dvd.iso
Disk rhel-server-7.0-x86_64-dvd.iso: 3.5 GiB, 3743416320 bytes, 7311360 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x1c7ff43a

Device                          Boot  Start     End  Blocks Id System
rhel-server-7.0-x86_64-dvd.iso1 *         0 7311359 3655680  0 Empty
rhel-server-7.0-x86_64-dvd.iso2      548348  560919    6286 ef EFI (FAT-12/16/32)
The EFI System Partition contains a small FAT32
file system (or FAT12 or FAT16 on removable media).
Treating that file system as its root, the firmware looks inside it for a program, typically
/EFI/BOOT/BOOTX64.EFI
or similar.
The EFI System partition will usually be mounted as
/boot/efi
after booting so its content
will be accessible.
Let's find our EFI System Partition. The first command shows mounted FAT file systems, which could include the ESP. The second command prints the GPT partition table, where the partition name and the boot flag identify the ESP:
# mount -t vfat
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
# parted /dev/sda print
Model: ATA VMware Virtual I (scsi)
Disk /dev/sda: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name                  Flags
 1      1049kB  211MB   210MB   fat16        EFI System Partition  boot
 2      211MB   735MB   524MB   xfs
 3      735MB   21.5GB  20.7GB                                     lvm
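Another way to spot the ESP, assuming a reasonably recent util-linux: ask lsblk to print each partition's GPT type GUID and look for the C12A7328-F81F-11D2-BA4B-00A0C93EC93B signature mentioned above.
# lsblk -o NAME,SIZE,FSTYPE,PARTTYPE,MOUNTPOINT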
UEFI is much more capable than BIOS, but with that capability comes complexity. Exactly which program will UEFI run? Whichever program it has been configured to run. What second program will that first program run, and what configuration files might that second program run? Whatever the first and second programs have been built to do next.
The following walks through an example system which happens to use Red Hat Enterprise Linux 7. The following cannot predict exactly what your system has been built to do during the booting process, but you can use this same analysis to figure it out.
Run the following to see what the UEFI firmware is
configured to do on your system.
In this case, the BootOrder
line specifies
that the default boot target is the one labeled
as "Red Hat Enterprise Linux", but you could enter the
UEFI boot manager within the first 5 seconds to choose
from the others.
See the
efibootmgr
manual page
for how to change the timeout or the order
and default choice.
# efibootmgr -v
BootCurrent: 0004
Timeout: 5 seconds
BootOrder: 0004,0000,0001,0003,0002
Boot0000* EFI IDE Hard Drive (IDE 0:0) ACPI(a034ad0,0)PCI(7,1)ATAPI(0,0,0)
Boot0001* EFI SATA CDROM Drive (1.0) ACPI(a034ad0,0)PCI(11,0)PCI(5,0)03120a00010000000000
Boot0002* EFI Network ACPI(a0341d0,0)PCI(14,0)MAC(6c626db2f841,0)
Boot0003* EFI Internal Shell (Unsupported option) MM(b,bee94000,bf21efff)
Boot0004* Red Hat Enterprise Linux HD(spec)File(\EFI\redhat\shim.efi)
The spec
string is really
some long thing with a UUID, like this:
HD(1,800,64000,b22ad8fd-3b85-4517-987e-40cba35abd53)
but the point here is the file specification
I wanted you to see:
File(\EFI\redhat\shim.efi)
We can dig deeper with the following command,
showing that shim.efi
will call
grubx64.efi
.
In case the expected string is split across lines,
also search for other substrings: gru
,
bx64
, 64.efi
, and so on.
In this case we get lucky and find both grub
and g.r.u.b
complete on lines.
We see g.r.u.b
where the data contains
null characters between the letters:
0x67 0x00 0x72 0x00 0x75 0x00 0x62 0x00.
# hexdump -C /boot/efi/EFI/redhat/shim.efi | egrep -C 6 -i 'grub|g.r.u.b'
000c5090  42 00 53 00 74 00 61 00  74 00 65 00 0a 00 00 00  |B.S.t.a.t.e.....|
000c50a0  4d 00 6f 00 6b 00 49 00  67 00 6e 00 6f 00 72 00  |M.o.k.I.g.n.o.r.|
000c50b0  65 00 44 00 42 00 00 00  46 00 61 00 69 00 6c 00  |e.D.B...F.a.i.l.|
000c50c0  65 00 64 00 20 00 74 00  6f 00 20 00 73 00 65 00  |e.d. .t.o. .s.e.|
000c50d0  74 00 20 00 4d 00 6f 00  6b 00 49 00 67 00 6e 00  |t. .M.o.k.I.g.n.|
000c50e0  6f 00 72 00 65 00 44 00  42 00 3a 00 20 00 25 00  |o.r.e.D.B.:. .%.|
000c50f0  72 00 0a 00 00 00 5c 00  67 00 72 00 75 00 62 00  |r.....\.g.r.u.b.|
000c5100  78 00 36 00 34 00 2e 00  65 00 66 00 69 00 00 00  |x.6.4...e.f.i...|
000c5110  46 00 61 00 69 00 6c 00  65 00 64 00 20 00 74 00  |F.a.i.l.e.d. .t.|
000c5120  6f 00 20 00 67 00 65 00  74 00 20 00 6c 00 6f 00  |o. .g.e.t. .l.o.|
000c5130  61 00 64 00 20 00 6f 00  70 00 74 00 69 00 6f 00  |a.d. .o.p.t.i.o.|
000c5140  6e 00 73 00 3a 00 20 00  25 00 72 00 0a 00 00 00  |n.s.:. .%.r.....|
000c5150  46 00 61 00 69 00 6c 00  65 00 64 00 20 00 74 00  |F.a.i.l.e.d. .t.|
--
000c51b0  73 00 74 00 61 00 6c 00  6c 00 20 00 73 00 65 00  |s.t.a.l.l. .s.e.|
000c51c0  63 00 75 00 72 00 69 00  74 00 79 00 20 00 70 00  |c.u.r.i.t.y. .p.|
000c51d0  72 00 6f 00 74 00 6f 00  63 00 6f 00 6c 00 00 00  |r.o.t.o.c.o.l...|
000c51e0  42 00 6f 00 6f 00 74 00  69 00 6e 00 67 00 20 00  |B.o.o.t.i.n.g. .|
000c51f0  69 00 6e 00 20 00 69 00  6e 00 73 00 65 00 63 00  |i.n. .i.n.s.e.c.|
000c5200  75 00 72 00 65 00 20 00  6d 00 6f 00 64 00 65 00  |u.r.e. .m.o.d.e.|
000c5210  0a 00 00 00 00 00 00 00  5c 67 72 75 62 78 36 34  |........\grubx64|
000c5220  2e 65 66 69 00 74 66 74  70 3a 2f 2f 00 00 00 00  |.efi.tftp://....|
000c5230  55 00 52 00 4c 00 53 00  20 00 4d 00 55 00 53 00  |U.R.L.S. .M.U.S.|
000c5240  54 00 20 00 53 00 54 00  41 00 52 00 54 00 20 00  |T. .S.T.A.R.T. .|
000c5250  57 00 49 00 54 00 48 00  20 00 74 00 66 00 74 00  |W.I.T.H. .t.f.t.|
000c5260  70 00 3a 00 2f 00 2f 00  0a 00 00 00 00 00 00 00  |p.:././.........|
000c5270  54 00 46 00 54 00 50 00  20 00 53 00 45 00 52 00  |T.F.T.P. .S.E.R.|
--
00144430  73 5f 70 72 69 6e 74 00  58 35 30 39 5f 70 6f 6c  |s_print.X509_pol|
00144440  69 63 79 5f 6c 65 76 65  6c 5f 6e 6f 64 65 5f 63  |icy_level_node_c|
00144450  6f 75 6e 74 00 50 45 4d  5f 77 72 69 74 65 5f 62  |ount.PEM_write_b|
00144460  69 6f 5f 44 53 41 50 72  69 76 61 74 65 4b 65 79  |io_DSAPrivateKey|
00144470  00 58 35 30 39 5f 41 54  54 52 49 42 55 54 45 5f  |.X509_ATTRIBUTE_|
00144480  63 72 65 61 74 65 5f 62  79 5f 4f 42 4a 00 69 6e  |create_by_OBJ.in|
00144490  69 74 5f 67 72 75 62 00  52 53 41 5f 70 72 69 6e  |it_grub.RSA_prin|
001444a0  74 00 58 35 30 39 5f 74  72 75 73 74 5f 63 6c 65  |t.X509_trust_cle|
001444b0  61 72 00 42 49 4f 5f 73  5f 6e 75 6c 6c 00 58 35  |ar.BIO_s_null.X5|
001444c0  30 39 76 33 5f 67 65 74  5f 65 78 74 5f 62 79 5f  |09v3_get_ext_by_|
001444d0  63 72 69 74 69 63 61 6c  00 53 68 61 31 46 69 6e  |critical.Sha1Fin|
001444e0  61 6c 00 44 49 52 45 43  54 4f 52 59 53 54 52 49  |al.DIRECTORYSTRI|
001444f0  4e 47 5f 66 72 65 65 00  69 32 64 5f 58 35 30 39  |NG_free.i2d_X509|
We got lucky: all instances of both grub
and g.r.u.b
appear entirely within 16-byte blocks, so none span
from one line of output to the next where this simple
search would overlook them.
We can verify that's the case, and find any other instances,
by running something like this and dealing with the
messy output:
# egrep -a -C 2 'grub|g.r.u.b' /boot/efi/EFI/redhat/shim.efi | cat -A
Or we could open the shim.efi
program with
the vim
editor, using the -R
option to specify read-only mode.
Then search for the patterns.
The GRUB configuration file is not in
/boot/grub2/grub.cfg
where I and the
GRUB 2 documentation
would expect to find it.
Instead it's in the same directory as grubx64.efi
.
How does that program find it?
It has been hard-coded to find it there:
# strings /boot/efi/EFI/redhat/grubx64.efi | grep grub.cfg
%s/grub.cfg
This means that on this system, using the paths after the EFI System Partition is mounted, the sequence would be:
- UEFI firmware calls...
- /boot/efi/EFI/redhat/shim.efi, which calls...
- /boot/efi/EFI/redhat/grubx64.efi, which reads grub.cfg and loads...
- /boot/vmlinuz-release
See UEFI boot: how does that actually work, then? for more details and a great explanation of what UEFI is and is not.
The Boot Loader Starts
BIOS-MBR
On a BIOS-MBR system, that 446-byte boot loader "stub" was, in the past, LILO. However, LILO relied on physical addresses into the disk and was very sensitive to disk geometry and reconfigurations. You frequently had to rescue a system by booting from other media and then recreating the LILO boot loader block.
GRUB is a much better solution for BIOS-MBR.
That 446-byte block is the GRUB "stub",
which can be recovered from the first 446 bytes of a file
in /boot/grub
.
In legacy GRUB this is
/boot/grub/stage1
,
in GRUB 2 this is
/boot/grub2/i386-pc/boot.img
.
Encoded into the stage 1 GRUB loader is a definition of
where to find the small /boot
file system.
This is typically the first partition of the first disk.
The boot loader will need to read the file system in the
boot partition.
Legacy GRUB used
the files /boot/grub/*stage1_5
,
helper modules for the various file systems that could be
used for the /boot
file system —
e2fs_stage1_5
,
xfs_stage1_5
,
and others.
The boot block of a disk is sector 0.
For legacy reasons, the first partition of a disk does
not begin until sector 63, leaving a gap of 62 sectors.
The GRUB 2 component core.img
is written
into this gap.
It plays the role of the legacy GRUB *_stage1_5
modules.
Now that GRUB (either legacy or GRUB 2) can read the
/boot
file system,
it can run the final stage GRUB boot loader to do
the complex work.
In legacy GRUB this is
/boot/grub/stage2
,
configured by either
/boot/grub/grub.conf
or
/boot/grub/menu.lst
.
Both of those usually exist, one as a normal file and the
other as a symbolic link pointing to it.
In GRUB 2 this is /boot/grub2/i386-pc/kernel.img
,
configured by /boot/grub2/grub.cfg
.
UEFI-GPT
On a UEFI-GPT system, the firmware has found the UEFI System Partition, which must hold a FAT32 file system.
On a typical system, the firmware will load and run
whatever it is configured to run, frequently
/EFI/BOOT/BOOTX64.EFI
.
This in turn will load and run whatever it has been
built to run next, for a Linux system this will be
/EFI/BOOT/GRUB.EFI
.
The second of those
is a component of the GRUB 2 boot loader.
Or, to support Secure Boot, the firmware might first run
/EFI/BOOT/SHIM.EFI
which has been signed by
the UEFI signing service, and it then chain loads the
Grub program.
As mentioned above,
run efibootmgr -v
to
see what the firmware is configured to run.
This GRUB "stub" can then use the core.img
component to read whichever file system type was used for
the boot partition, and then start the kernel.img
program which reads its configuration from
/boot/grub2/grub.cfg
.
The EFI version of GRUB uses
/usr/lib/grub/x86_64-efi/linuxefi.mod*
to load the kernel.
Notice that the configuration file created by components
of the grub2-efi
package specifies the kernel
and initial RAM disk image (more on that
below)
with linuxefi
and initrdefi
rather than linux
and initrd
as used on non-EFI platforms.
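For illustration only, a menu entry in such a grub.cfg looks roughly like this; the entry is simplified (a real one also sets the GRUB root device and loads more modules), and the kernel file name and UUID here are placeholders:
menuentry 'Red Hat Enterprise Linux Server' {
        insmod xfs
        linuxefi /vmlinuz-release root=UUID=12e71ecd-833d-45ea-adfd-1eca8c27d912 ro quiet
        initrdefi /initramfs-release.img
}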
Once the kernel has been loaded and the file systems have been
mounted, a typical system will mount the EFI System Partition
as /boot/efi
and you can explore your system.
However, some of the Grub components will be outside
the file systems and you won't find them in /boot
or /boot/efi
.
On a GPT disk the boot block or master boot record is in
sector #0, the GPT header is in sector #1, and the GPT
partition entry arrays fill sectors #2-33.
The file core.img
is written into empty
sectors between the end of the partition table
and the beginning of the first partition.
The User Gets a Choice
The Grub boot loader can present a menu to the user,
typically a choice of various kernels,
or the menu
can be hidden and require the user to realize that the
<Escape>
key must be pressed within
a few seconds.
The menu may time out after a specified number of seconds,
or it may wait until the user makes a selection.
The user can also edit the selected boot option, useful for maintenance and rescue. For example, specifying a boot to single-user mode to repair some problems.
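A sketch of that rescue technique on a GRUB 2 system: highlight the entry, press e, append a directive to the line that loads the kernel, then press Ctrl-x to boot it once with that change. The file names and UUID below are placeholders:
linuxefi /vmlinuz-release root=UUID=... ro quiet single                      # SysV/Upstart style
linuxefi /vmlinuz-release root=UUID=... ro quiet systemd.unit=rescue.target  # systemd style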
Grub wasn't all that complicated in the beginning, but GRUB 2 adds a lot of complexity. Compare typical configuration files to see the large increase. Installing or building a new kernel can update the configuration file automatically, but you will probably want to go into the file and make some modifications. Be careful!
The Kernel
On some hardware, the
Alpha
for example,
the kernel file is simply a compressed copy of the
compiled vmlinux
kernel file.
On Intel/AMD platforms, the kernel file starts with a 512-byte boot block, then a secondary boot loader block, and then the compressed kernel image.
The boot loader does what is needed to extract and decompress the kernel into RAM and then turn control over to it.
Finding the File System
The boot loader, be it GRUB or the mainboard firmware, tells the kernel where to find its primary or root file system. The problem is that that file system might be stored in any of several different complicated ways, many of them requiring kernel modules that can't all be simultaneously compiled into the monolithic kernel core.
For example, the root file system might be on NFS, and storage devices might be connected through iSCSI or AoE or FCoE (that is, SCSI over IP, or ATA over Ethernet, or Fibre Channel over Ethernet), any of that requiring the network interface to be detected, enabled, and used in a certain way.
Or it might be on a logical volume, which means that the underlying physical volumes and volume group must be detected.
Or, the root file system might be encrypted with the LUKS and dm-crypt components, requiring those kernel modules and asking the user to enter a passphrase.
It doesn't have to be as complicated as any of those. SCSI controllers and SATA interfaces are based on a wide range of chip sets, each requiring one of several different kernel modules.
How can you load a kernel module (or device driver) used to interact with a device, when that module is stored on a disk controlled through the very device we're currently unable to interact with? You can't. The solution is...
Initial RAM Disk Image
The boot loader will tell the kernel how to find an initial RAM disk image stored in the boot file system.
The traditional method has been to build an
initrd
file.
This is a file system containing those kernel modules along
with binaries, shared libraries, scripts,
device-special files, and everything else needed to get the
system to the point it can find and use the real root
file system.
That resulting file system has been archived into a single
file with cpio
and then compressed with
gzip
.
In the initrd
design, this image is uncompressed
and extracted into /dev/ram
which is then
mounted as a file system.
It once executed the script /linuxrc in that file system; now it executes the script /init.
In the initramfs
design, the
dracut
tool is used to create the file
(which may still be called initrd-release
)
while including its own framework from
/usr/lib/dracut
or similar.
The initrd
design uses a script,
linuxrc
, while dracut
uses a binary.
The goal is to speed this part of the boot process, to quickly
detect the root file system's device(s) and transition to
the real root file system.
In either case, that /init
script will
find and mount the real file system, so the real
init
program can be run.
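The image is normally rebuilt for you when a kernel package is installed, but after adding storage drivers or changing the root device you may need to rebuild it by hand. A hedged sketch using dracut; the image file naming convention varies by distribution:
# dracut --force /boot/initramfs-$(uname -r).img $(uname -r)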
Exploring Your Initial RAM Disk Image
For a listing, simply try this.
Change release
to the appropriate value:
# lsinitrd /boot/initrd-release
To extract and explore a copy, do this:
# mkdir /tmp/explore
# cd /tmp/explore
# zcat /boot/initrd-release | cpio -i
# ls -l
# tree | less
You will find that there is a /init
file,
but it is a shell script.
Directories /bin
and /sbin
contain needed programs while
/lib
and /lib64
contain shared libraries.
Three devices are in /dev
—
console
,
kmsg
, and
null
.
Some configuration is in /etc
,
mount points exist for /proc
,
/sys
, and
/sysroot
.
A /tmp
directory is available.
Finding the Actual Root File System
But out of the multiple file systems available on multiple
devices, combinations of devices, and network abstractions,
how does the kernel know which is to be mounted first as
the real /
, the real root file system?
Well, how do you want to accomplish that?
Some root=
directive is passed to the kernel
by the boot loader.
This can be the device name:
root=/dev/sda4
or a label embedded in the file system header:
root=LABEL=/
or the manufacturer and model name and its partition
as detected by the
kernel and listed in /dev/disk/by-id
:
root=/dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WMAZA3949022-part1
or the PCI bus address and path through the controller:
root=/dev/disk/by-path/pci-0000:00:02.1-usb-0:1:1.0-scsi-0:0:0:0.part1
or a UUID embedded in the file system header:
root=UUID=12e71ecd-833d-45ea-adfd-1eca8c27d912
Of these choices:
- Device name, as in /dev/sda1, may seem the most "natural" as it's the simplest and the one that's been around the longest, but device discovery order easily changes.
- Manufacturer, model, and serial number are awfully complex and arcane, and PCI path is even worse. They require more attention to hardware details than most administrators care for.
- An embedded file system label seemed like the obvious choice when it first came out, but it really gets in the way if you're doing any recovery work. Imagine pulling a disk containing a file system that happens to be labeled as /var out of a decommissioned system, and plugging it into a system that happens to rely on its own /var file system being labeled that way. Confusion ensues.
- UUID strings are awkwardly long and cryptic, but they seem to be the least bad solution; the commands below show how to find them.
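To see the labels and UUIDs on your own storage devices, either of these will do:
# blkid
# lsblk -o NAME,SIZE,FSTYPE,LABEL,UUID,MOUNTPOINT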
Continuing Critiques of the Boot Sequence
This boot sequence is overly complicated. The system passes through three environments with similar functionality before the real system is even up: EFI, GRUB, and Dracut. All of them involve loading device drivers. EFI and Dracut have a small shell and scripting, and GRUB has a complex editable menu.
The 3.3 kernel added an EFI boot stub, which lets UEFI firmware load and start the kernel directly, with no separate boot loader.
The Kernel Found the Root File System, Now What?
Once the kernel has started and it has discovered enough hardware to mount the root file system, it searches for a master user-space program which will control the state of the operating system itself and also manage the processes running on the operating system, both system services and user processes.
This master process is the init
program, or at least we will find that the kernel expects
that to be the case.
The init
program runs the appropriate set of
boot scripts, based on its own configuration and that of
the collection of boot scripts.
It is then the ancestor of all other processes,
cleaning up and freeing their resources if their parent
process has lost track of them ("reaping zombie processes"
in the UNIX parlance).
The kernel and init
The Linux kernel starts the init
program.
The kernel source code
contains a block of code in init/main.c
that looks for the init
program in the
appropriate place, the /sbin
directory.
If it isn't there, the kernel then tries two other locations
before falling back to starting a shell.
If it can't even do that, then the boot loader seems to have
misinformed the kernel about where the root file system
really is.
Maybe it specified the wrong device,
or maybe the root file system is corrupted or simply missing
a vital piece.
static int run_init_process(const char *init_filename)
{
	argv_init[0] = init_filename;
	return do_execve(init_filename,
		(const char __user *const __user *)argv_init,
		(const char __user *const __user *)envp_init);
}

[ ... ]

	if (!run_init_process("/sbin/init") ||
	    !run_init_process("/etc/init") ||
	    !run_init_process("/bin/init") ||
	    !run_init_process("/bin/sh"))
		return 0;

	panic("No init found.  Try passing init= option to kernel. "
	      "See Linux Documentation/init.txt for guidance.");
The kernel does all the kernel-space work: interacting directly with hardware, managing running processes by allocating memory and CPU time, and enforcing access control through ownership and permissions.
init
handles the user-space work,
at least initially.
It runs a number of scripts sometimes called
boot scripts as they handle the
user-space part of starting the operating system environment,
and sometimes called init scripts because
they're run by init
.
The first of these boot scripts has traditionally been
/etc/rc.sysinit
.
It does basic system initialization, checking the root
file system with fsck
if required, checking
and mounting the other file systems, and loading any
needed kernel modules along the way.
Other boot scripts usually start both local
and remote authentication interfaces.
This includes the local command-line interface with
mingetty
and possibly a graphical login
on non-servers with some X-based display manager;
plus remote access with the SSH daemon.
Like all the other boot scripts,
the user authentication interfaces are started as
root
so the user authentication and subsequent
calls to
setuid()
and setgid()
can run.
User-owned processes then do the work within each session.

Boot scripts can start services that more safely run
with lower privileges.
For example, the Apache web server must start as
root
so it can open TCP port 80.
But it then drops privileges through the
setuid()
and setgid()
system calls
and continues running as an unprivileged user named
apache
or httpd
or similar.
The components drawn in red can directly access the hardware: the firmware, the boot loader, and the kernel.
Those drawn in black are owned by root
:
the init
process, the boot scripts, and
the daemons they spawn.
These should be persistent, running until there is a change
in system state, possibly shutting down to halt or reboot.
Those drawn in green are owned by unprivileged users: the user's login session and its processes, and possibly some daemonized non-root services like a web server. User processes run until they finish or are terminated.
Where Are the Scripts?
This can be confusing, as the scripts may be
directly in the directory /etc/
or maybe in its subdirectory /etc/rc.d/
.
This is made more confusing by the presence of symbolic
links which mean that — most of the time, anyway —
either naming convention works.
You frequently find something like this:
$ ls -lFd /etc/init.d /etc/rc*
lrwxrwxrwx  1 root root    11 Feb  3 16:47 /etc/init.d -> rc.d/init.d/
drwxr-xr-x 11 root root  4096 Feb  9 16:51 /etc/rc.d/
lrwxr-xr-x 11 root root    13 Feb  9 16:48 /etc/rc.local -> rc.d/rc.local
lrwxr-xr-x 11 root root    15 Feb  9 16:48 /etc/rc.sysinit -> rc.d/rc.sysinit
lrwxrwxrwx  1 root root    10 Feb  3 16:58 /etc/rc0.d -> rc.d/rc0.d/
lrwxrwxrwx  1 root root    10 Feb  3 16:58 /etc/rc1.d -> rc.d/rc1.d/
lrwxrwxrwx  1 root root    10 Feb  3 16:58 /etc/rc2.d -> rc.d/rc2.d/
lrwxrwxrwx  1 root root    10 Feb  3 16:58 /etc/rc3.d -> rc.d/rc3.d/
lrwxrwxrwx  1 root root    10 Feb  3 16:58 /etc/rc4.d -> rc.d/rc4.d/
lrwxrwxrwx  1 root root    10 Feb  3 16:58 /etc/rc5.d -> rc.d/rc5.d/
lrwxrwxrwx  1 root root    10 Feb  3 16:58 /etc/rc6.d -> rc.d/rc6.d/
lrwxrwxrwx  1 root root    10 Feb  3 16:58 /etc/rcS.d -> rc.d/rcS.d/
$ ls -lF /etc/rc.d/
total 64
drwxr-xr-x 2 root root  4096 Feb  3 21:52 init.d/
-rwxr-xr-x 1 root root   220 Feb  3 16:51 rc.local*
-rwxr-xr-x 1 root root 10707 Feb  3 16:52 rc.sysinit*
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc0.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc1.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc2.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc3.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc4.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc5.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc6.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc7.d/
lrwxrwxrwx 1 root root     5 Feb  3 16:47 rcS.d -> rc1.d/
Development of init
through several major versions
BSD Unix
uses a very simple init
scheme using one master
boot script (which calls other scripts to start services),
configured by one file specifying which services to enable
and with one additional script optionally adding other
boot-time tasks.
System V Unix added the concept of run levels, multiple target states for the running system. Each is defined as a collection of started/stopped states for services.
Upstart
is a modification of the SysV method.
The first thing the administrator notices is that
init
configuration has changed
from a single file to a collection of files.
More significantly in the long run, Upstart adds
support for dependencies between service components,
automatically restarting a crashed service,
and the ability for
events to trigger starting or stopping services.
systemd is the biggest change yet. A system can boot or otherwise transition to multiple simultaneous targets. Aggressive parallelization yields fast state transitions and a very flat process tree. The dependency support promised in Upstart is delivered in systemd.
We'll look at these one at a time.
BSD-style /etc/rc
The
OpenBSD
kernel also starts init
.
Here is a block of code from
/usr/src/sys/kern/init_main.c
showing that init
must be in an obvious place.
/*
 * List of paths to try when searching for "init".
 */
static char *initpaths[] = {
	"/sbin/init",
	"/sbin/oinit",
	"/sbin/init.bak",
	NULL,
};
The BSD init
uses a simple boot script configuration that used to be
used in some Linux distributions such as Slackware.
But no mainstream Linux distribution does it this way now.
The BSD style init
program brings up the
system by running the /etc/rc
script.
That's it — rc
uses a few other
scripts, but it's a simple and efficient design.
The configuration script /etc/rc.conf
sets a number of standard parameters for available services.
You then modify /etc/rc.conf.local
to
turn on and modify the parameters of services on your system.
rc
runs rc.conf
and then
rc.conf.local
.
For example, rc.conf says not to run the Network Time Protocol daemon or start a web server by default; it contains these lines:
ntpd_flags=NO		# for normal use: ""
httpd_flags=NO		# for normal use: ""
But then you might customize your system with these changes,
in rc.conf.local
,
turning on both NTP and HTTP and disabling the
chroot()
capability of Apache:
ntpd_flags=""
httpd_flags="-u"
The individual services are started by scripts in
/etc/rc.d/*
called by rc
.
Then almost at the very end of the master boot script
rc
, it calls /etc/rc.local
.
The only things done after that are starting some hardware
monitoring daemons, the cron
daemon, and
possibly the simple X display manager xdm
if you asked for that during the installation.
This gives you a place to add some customization.
My rc.local
contains this:
# $OpenBSD: rc.local,v 1.44 2011/04/22 06:08:14 ajacoutot Exp $
# Site-specific startup actions, daemons, and other things which
# can be done AFTER your system goes into securemode. For actions
# which should be done BEFORE your system has gone into securemode
# please see /etc/rc.securelevel.
## ADDED BELOW HERE #################################################

echo "Starting KDM"
( sleep 5 ; /usr/local/bin/kdm ) &

echo "Saving kernel ring buffer in /var/log/dmesg"
dmesg > /var/log/dmesg

echo "Starting smartd to monitor drives"
/usr/local/sbin/smartd

echo "Unmuting audio"
audioctl output_muted=0
Reboot the system with shutdown -r
or simply reboot
.
Halt and turn off the power with shutdown -h
or simply halt -p
.
SysV-style init
This is more complex than the BSD method. It is based on the concept of numbered run levels. Linux uses the definitions in this table. Solaris and other Unix-family operating systems use something very similar.
Level | Purpose (Most Linux)                           | Purpose (Some Linux)
------|------------------------------------------------|------------------------------------------------
0     | Shut down and power off                        | Shut down and power off
1     | Single-user mode                               | Single-user mode
2     | Multi-user console login, no networking        | Multi-user console login, networking enabled
3     | Multi-user console login, networking enabled   | Multi-user graphical login, networking enabled
4     | not used                                       | not used
5     | Multi-user graphical login, networking enabled | not used
6     | Shut down and reboot                           | Shut down and reboot
Red Hat and therefore many other distributions
used this SysV-style init
roughly from
the late 1990s through the late 2000s.
/etc/inittab
The SysV init
program reads its configuration
file /etc/inittab
to see what to do by default and how to do that.
CentOS 5 used the inittab
shown here.
#
# inittab       This file describes how the INIT process should set up
#               the system in a certain run-level.
#
# Author:       Miquel van Smoorenburg, <miquels@drinkel.nl.mugnet.org>
#               Modified for RHS Linux by Marc Ewing and Donnie Barnes
#
# Default runlevel. The runlevels used by RHS are:
#   0 - halt (Do NOT set initdefault to this)
#   1 - Single user mode
#   2 - Multiuser, without NFS (The same as 3, if you do not have networking)
#   3 - Full multiuser mode
#   4 - unused
#   5 - X11
#   6 - reboot (Do NOT set initdefault to this)
#
id:5:initdefault:

# System initialization.
si::sysinit:/etc/rc.d/rc.sysinit

l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

# Trap CTRL-ALT-DELETE
ca::ctrlaltdel:/sbin/shutdown -t3 -r now

# When our UPS tells us power has failed, assume we have a few minutes
# of power left. Schedule a shutdown for 2 minutes from now.
# This does, of course, assume you have powerd installed and your
# UPS connected and working correctly.
pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"

# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Canceled"

# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
The default run level is 5, graphical desktop. If you built a server, this would be 3 instead.
The first boot script to be run is
/etc/rc.d/rc.sysinit
.
It does a number of initialization tasks.
Most importantly, it re-mounts the root file system in
read/write mode and finds and checks the other file systems.
Then, to get into run level 5, it runs the scripts with
5
in the second field.
That means that the second task is to run the script
/etc/rc.d/rc
with a parameter of 5
.
More on the details of this in a moment...
If it had been about to reboot because of a power failure, that is canceled.
It then starts six /sbin/mingetty
processes,
one each on TTY devices tty1
through
tty6
.
The key combinations
<Ctrl><Alt><F1>
through <Ctrl><Alt><F7>
switch you between these six text virtual consoles
plus X if it's running.
Finally, it runs the script /etc/X11/prefdm
which tries to determine which display manager is probably
the preferred one and then starts it.
Along the way it specified that a detected power failure event schedules a shutdown in two minutes, and that the text console keyboard event <Ctrl><Alt><Del> causes an immediate reboot.
Boot script directories and changing run levels
The boot scripts themselves are stored in
/etc/rc.d/init.d/
and can be thought of as a collection of available tools.
$ ls /etc/rc.d/rc5.d
K15httpd       S00microcode_ctl    S22messagebus    S80sendmail
K20nfs         S04readahead_early  S25netfs         S85denyhosts
K28amd         S06cpuspeed         S26acpid         S90crond
K50netconsole  S08arptables_jf     S26lm_sensors    S90xfs
K65kadmin      S10network          S26lvm2-monitor  S91freenx-server
K65kprop       S10restorecond      S28autofs        S95anacron
K65krb524      S12syslog           S50hplip         S95atd
K65krb5kdc     S13irqbalance       S55cups          S96readahead_later
K69rpcsvcgssd  S13mcstrans         S55sshd          S98haldaemon
K74nscd        S13portmap          S56rawdevices    S99firewall
K80kdump       S14nfslock          S56xinetd        S99local
K87multipathd  S18rpcidmapd        S58ntpd          S99smartd
K89netplugd    S19rpcgssd          S61clamd
You can manually stop, start, or restart a service by
running its boot script with a parameter of stop
or start
or restart
.
Most of the boot scripts also support checking on the service's current state with status, and some support reload to keep running but re-read the configuration file to change some details of how it's running.
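A couple of hedged examples, using the SSH service (script names vary by distribution); on Red Hat style systems the service wrapper command does the same thing as running the script directly:
# /etc/rc.d/init.d/sshd status
# /etc/rc.d/init.d/sshd restart
# service sshd status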
Try running a boot script with no parameter at all. That usually provides a short message explaining that a parameter is needed and then listing the possible parameters.
The directories /etc/rc.d/rc0.d
through /etc/rc.d/rc6.d
specify what to stop and start to get into the
corresponding run levels.
Each of those is populated with symbolic links pointing to
the actual scripts in /etc/rc.d/init.d/*
,
with the exception of S99local
which points
to /etc/rc.d/rc.local
.
For example,
/etc/rc.d/rc5.d
contains S10network
, a symbolic link
pointing to /etc/rc.d/init.d/network
.
The logic is that rc
goes through the list
of links in the target directory, first stopping (killing)
those with link names beginning with "K" in numerical order,
and then starting those with link names beginning with "S"
in numerical order.
The network
script sets up IPv4/IPv6 networking,
and so it is started in run level 5 before those
network services that rely on it.
Similarly, when going to run level 0 or 6, those services
are stopped before turning off IP networking.
There's more to it than just that — if the system was already running, and if a service is to be in the same state in both the current and target run level, then it isn't stopped. For example, if you booted a system to run level 3, in which networking and network services are started, and then you changed to run level 5, the only thing that will happen is that the graphical display manager will be started. It won't shut down all services and IP networking and then start them back up again.
To change from one run level to another, run the
init
command with a parameter of the
target run level.
Use runlevel
to see the previous and current
run levels, where N
means "none": you booted
the system directly into the current run level.
Reboot the system with init 6
or shutdown -r
or simply reboot
.
Halt and turn off the power with init 0
or shutdown -h
or simply halt
.
On a text console you can reboot with
<Ctrl><Alt><Del>
,
and on a graphical console that usually brings up a
dialog in which both rebooting and shutting down
are options.
You can also click through the graphical menus to
shut down or reboot from graphical mode.
Specifying how to get into a given run level
You could manually create the symbolic links, but you would have to think carefully about what numbers to assign to get everything into the correct order.
Don't do that, use chkconfig
.
The chkconfig
program is a little confusing
because it is programmed by shell script comments within
the boot scripts.
Let's look at an example:
$ head /etc/rc.d/init.d/network
#! /bin/bash
#
# network       Bring up/down networking
#
# chkconfig: 2345 10 90
# description: Activates/Deactivates all network interfaces configured to \
#              start at boot time.
This specifies that if you want this service to be used (and you probably do, this sets up basic IP networking!), then it should be started in run levels 2, 3, 4, and 5, started as S10, fairly early. That leaves it to be turned off in run levels 0, 1, and 6, stopped (killed) as K90, fairly late.
Let's experiment with chkconfig
:
$ su
password:
# chkconfig --add network
# chkconfig --list network
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
# ls /etc/rc.d/rc?.d/*network
/etc/rc.d/rc0.d/K90network  /etc/rc.d/rc4.d/S10network
/etc/rc.d/rc1.d/K90network  /etc/rc.d/rc5.d/S10network
/etc/rc.d/rc2.d/S10network  /etc/rc.d/rc6.d/K90network
/etc/rc.d/rc3.d/S10network
# chkconfig --del network
# chkconfig --list network
network         0:off   1:off   2:off   3:off   4:off   5:off   6:off
# ls /etc/rc.d/rc?.d/*network
/etc/rc.d/rc0.d/K90network  /etc/rc.d/rc4.d/K90network
/etc/rc.d/rc1.d/K90network  /etc/rc.d/rc5.d/K90network
/etc/rc.d/rc2.d/K90network  /etc/rc.d/rc6.d/K90network
/etc/rc.d/rc3.d/K90network
# chkconfig --add network
# chkconfig --list network
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
We turned it on (probably not needed) and then checked its state in various run levels. We also listed the symbolic links showing that it's started as S10 and stopped (killed) as K90.
Then we turned off the service and tested what that did.
Finally, we turned it back on and made sure that worked.
Control now versus in the future
Remember that you can do two very different things, and often you should do both of them:
- Start or stop the service right now by running its boot script with a parameter of start or stop.
- Have it automatically started (or not) after future reboots by running chkconfig with an option of --add (or --del), as in the sketch below.
Upstart init
Upstart is an event-driven replacement or
re-design for init
.
It was meant to be analogous to the Service Management Facility
in Solaris, with services started and stopped by events.
These events might be kernel detection of hardware,
or they might be caused by other services.
This includes the crash of a service automatically
leading to its being started.
It was developed by Canonical for Ubuntu
but it came to be used in many distributions, including
RHEL 6 and therefore CentOS and, less directly, Oracle
Linux and Scientific Linux.
Upstart is different from SysV init
, but
the differences are very small for the typical administrator.
Instead of a large /etc/inittab
specifying
several things, now that file has just one line specifying
the default target run level, initdefault
.
Instead of one configuration file, Upstart uses the collection
of significantly-named files in /etc/init/
.
/etc/init/rcS.conf
specifies
how to start the system.
It does this in a very familiar way, by running
/etc/rc.d/rc.sysinit
followed by
/etc/rc.d/rc
with the single parameter of
the target run level.
That is, as long as the system wasn't booted into
rescue or emergency mode, in which case it runs
/sbin/sulogin
to make sure it really is
the administrator at the keyboard and not someone doing
a simple console break-in, and then drops to a shell.
The text consoles are started by
/etc/init/start-ttys.conf
.
If you go to run level 5,
/etc/init/prefdm.conf
starts the graphical display manager.
If you passed the console=/dev/tty0
parameter to the kernel at boot time,
/etc/init/serial.conf
sets up a serial console line.
If you press
<Ctrl><Alt><Del>
on a text console,
/etc/init/control-alt-delete.conf
handles the task of rebooting.
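Those job files are short and declarative. For example, on a RHEL 6 style system the control-alt-delete job looks roughly like this; treat the exact contents as an approximation and check your own /etc/init/control-alt-delete.conf:
# control-alt-delete - emergency keypress handling
start on control-alt-delete
exec /sbin/shutdown -r now "Control-Alt-Delete pressed"
You can list the known jobs and their current states with initctl list.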
Debian and Ubuntu Have Been A Little Different
Ubuntu has since moved to systemd (it became the default with Ubuntu 15.04 in April 2015), but before that it and Debian used different script logic and they configured services with sysv-rc-conf instead of chkconfig.
Instead of /etc/rc.d/rc.sysinit
,
Debian and Ubuntu run /etc/init.d/rcS
.
That in turn runs every script /etc/rcS.d/S*
in order.
Commands sysv-rc-conf
and update-rc.d
are used instead of chkconfig
.
It is probably easiest to see these by example.
See the run levels in which a given service is started:
Debian / Ubuntu
  # sysv-rc-conf --list
  # sysv-rc-conf --list apache
Most other Linux distributions
  # chkconfig --list
  # chkconfig --list httpd
BSD
  # more /etc/rc.conf /etc/rc.conf.local
Solaris
  # svcs
  # svcs | grep httpd
Add/enable one service and delete/disable another after future boots:
Debian / Ubuntu
  # sysv-rc-conf apache on
  # sysv-rc-conf finger off
Most other Linux distributions
  # chkconfig httpd on
  # chkconfig finger off
BSD
  # vi /etc/rc.conf.local
Solaris
  # svcadm enable network/httpd
  # svcadm disable network/finger
For all Linux distributions we have been able to stop, start, restart, and sometimes take other actions simply by running the associated script with an appropriate parameter:
# /etc/init.d/httpd status
# /etc/init.d/httpd restart
# /etc/init.d/named reload
# /etc/init.d/named status
However, all of that and much more changes with...
systemd
This is really different from what has come before. Lennart Poettering, the systemd author, provides a description of the systemd design goals and philosophy and then adds a later comparison of features. Also see the official systemd page at freedesktop.org.
Systemd became standard in Fedora with Fedora 15 in 2011, and it was the default in Mageia at least by early 2013. By the end of 2013 a RHEL 7 beta release had appeared and it used systemd. By early 2014, Mark Shuttleworth announced that Ubuntu would also transition to systemd; the switch became the default with Ubuntu 15.04.
Systemd uses many good ideas from Apple's launchd, introduced with Mac OS X 10.4 and now also part of iOS.
However, systemd has its critics! See the boycott systemd page for a set of critiques, and see The World After Systemd for a project already planning for its demise.
To summarize the design:
Systemd Design Philosophy
Start only what's needed
It doesn't make sense to start the CUPS print service while everything else is trying to start. We're booting now, we'll print later. Start it on demand, when someone wants to print.
Similarly, for hardware-specific services like Bluetooth, only start those services when hardware has been detected and some process requests communication with it.
Start some daemons on demand.
For what you do start, aggressively parallelize it
Traditional SysV init
required a long sequence
of individual service starts.
Several early processes were needed by many other services;
the early ones had to fully start and their boot scripts
successfully terminate before the later ones could begin.
Notice the traditional use of boot scripts. Shell scripts are very good for rapid development, but they don't run fast. A shell process has to be created to run the script itself, and then everything the script does requires the creation of further processes. This is made worse by the typical nesting of a boot script calling its helper script which in turn calls a number of configuration scripts.
Recode boot scripts in C, use binary executables as much as possible.
The CPU is the fastest component in the system, the disks are the slowest. The CPU must sit idle for many potentially useful cycles waiting for disk I/O. And, saying "the CPU" is a little old-fashioned, most systems have multiple CPU cores and we want to use them all in parallel. We want to aggressively parallelize the startup programs, but we don't want to coordinate actions by monitoring the file system.
Systemd can create sockets, then pass those sockets to daemon processes as they are started. This can be simplified and sped up by creating all needed sockets at once, and then starting all daemon processes at once. Get them all started and let them communicate among themselves as they come up. The sockets are maintained by systemd so if a daemon crashes, systemd restarts that daemon and programs that were communicating with the old daemon are still connected but now to the replacement.
Aggressively parallelize the startup by starting all daemons simultaneously and using sockets for inter-process communication to handle inter-service order dependencies.
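You can see which sockets systemd is currently holding, and which service each will activate when a connection arrives:
$ systemctl list-sockets
$ systemctl list-units --type=socket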
There is more to it. Control Groups or cgroups are used to group related processes into a hierarchy of process groups, providing a way to monitor and control all processes of a group, including limiting and isolating their resource usage. When you stop the service, it will stop all the related processes.
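Two quick ways to see that grouping on a systemd machine, using sshd as an example service: the first shows the whole control group hierarchy as a tree, the second shows one service's state along with all the processes collected in its cgroup.
$ systemd-cgls
$ systemctl status sshd.service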
Automounting can be used for all file systems other than the root file system, supporting encryption with LUKS, NFS and other network-based storage, LVM and RAID.
When Only-As-Needed Meets Parallelization
Let's say you boot a desktop system and it goes to its
default graphical boot target.
Log in, and see if you have any of the getty
family of programs running:
$ pgrep getty
$ ps axuww | egrep 'PID|getty'
USER       PID %CPU %MEM    VSZ  RSS TTY   STAT START  TIME COMMAND
root     26717  0.0  0.0 108052 2020 pts/0 R+   17:36  0:00 egrep --color=auto PID|getty
Probably not.
But you think that's odd: doesn't the login
program use something like mingetty
or
agetty
to handle command-line authentication
on a text-only console?
Let's check if those text consoles are really there
with Ctrl-Alt-F1, Ctrl-Alt-F2, Ctrl-Alt-F3, then back to X
with Ctrl-Alt-F1 (or F7, or F2, depending on the sequence
of events when the system started).
Yes, there were text login prompts waiting on those virtual consoles. Well, no, they weren't waiting; each was only started when you first switched to that virtual console. They're there now:
$ pgrep getty
22698
26727
$ ps axuww | egrep 'PID|getty'
USER       PID %CPU %MEM    VSZ  RSS TTY   STAT START  TIME COMMAND
root     22698  0.0  0.0 110012 1712 tty2  Ss+  16:19  0:00 /sbin/agetty --noclear tty2
root     26727  0.0  0.0 110012 1640 tty3  Ss+  17:37  0:00 /sbin/agetty --noclear tty3
root     26717  0.0  0.0 108052 2020 pts/0 R+   17:36  0:00 egrep --color=auto PID|getty
See Lennart Poettering's description for more details.
Location and Components
It gets weird here: there no longer is a real /sbin/init program!
You must either set up symbolic links, as seen here,
or else modify the boot loader to pass this option
to the kernel:
init=/lib/systemd/systemd
$ ls -l /usr/sbin/init /usr/bin/systemd /lib/systemd/systemd
-rwxr-xr-x 1 root root 929520 Sep 22 12:26 /lib/systemd/systemd*
lrwxrwxrwx 1 root root     22 Oct  6 01:37 /usr/bin/systemd -> ../lib/systemd/systemd*
lrwxrwxrwx 1 root root     22 Oct  6 01:37 /usr/sbin/init -> ../lib/systemd/systemd*
Notice how some components are under /usr
,
just part of a general Linux trend of crucial components
moving under /usr
and making it impractical
for that to be a separate file system as it frequently
has been in UNIX tradition.
Beware that modern Linux systems typically have no real
/bin
,
/lib
,
/lib64
, or
/sbin
,
those are all symbolic links pointing to directories
in /usr
and so that must be part of the
root file system.
% ls -ld /bin /lib* /sbin
lrwxrwxrwx 1 root root 7 Feb  3 16:45 /bin -> usr/bin/
lrwxrwxrwx 1 root root 7 Feb  3 16:45 /lib -> usr/lib/
lrwxrwxrwx 1 root root 9 Feb  3 16:45 /lib64 -> usr/lib64/
lrwxrwxrwx 1 root root 8 Feb  3 16:45 /sbin -> usr/sbin/
Systemd binaries are
located in /lib/systemd/systemd-*
,
with optional distribution-specific scripts in the
same directory.
The interesting parts are the task unit configuration files,
all of them under
/lib/systemd/system/
.
Units
Booting tasks are organized into units —
these include initializing hardware, mounting file systems,
creating sockets, and starting services that will daemonize
and run in the background.
Each of these task units is configured by a simple file
holding configuration information; these are sources of
information and not scripts to be run.
Their syntax is similar to things like kdmrc
,
the KDE display manager configuration file,
and therefore similar to Windows *.ini
files.
For example, here is the named.service
file,
specifying when and how to start the BIND DNS service:
[Unit]
Description=Berkeley Internet Name Domain (DNS)
Wants=nss-lookup.target
Before=nss-lookup.target
After=network.target

[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/named
Environment=KRB5_KTNAME=/etc/named.keytab
PIDFile=/var/lib/named/var/run/named/named.pid
ExecStartPre=/usr/sbin/setup-named-chroot.sh /var/lib/named on
ExecStartPre=/usr/sbin/named-checkconf -t /var/lib/named -z /etc/named.conf
ExecStart=/usr/sbin/named -u named -t /var/lib/named $OPTIONS
ExecReload=/bin/sh -c '/usr/sbin/rndc reload > /dev/null 2>&1 || /bin/kill -HUP $MAINPID'
ExecStop=/bin/sh -c '/usr/sbin/rndc stop > /dev/null 2>&1 || /bin/kill -TERM $MAINPID'
ExecStopPost=/usr/sbin/setup-named-chroot.sh /var/lib/named off
PrivateTmp=false
TimeoutSec=25

[Install]
WantedBy=multi-user.target
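Units like that one are managed with systemctl rather than by running scripts. For example, assuming the BIND package and its named.service are installed, these start it now, check its state and recent log lines, arrange for it to start at every boot, and stop it:
# systemctl start named.service
# systemctl status named.service
# systemctl enable named.service
# systemctl stop named.service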
Unit Types
The file name indicates the type of that unit.
*.mount
files specify when and how
to mount and unmount file systems,
*.automount
files are for storage handled
by the automounter.
*.service
files handle services
that in the past were typically handled by scripts
in /etc/rc.d/init.d/
.
*.socket
files create sockets that
will be used by the associated service units.
*.path
files allow systemd to monitor
the specified files and directories through
inotify
; access in that path causes a service
start.
The CUPS printing service provides a simple example.
systemd watches for the appearance of a file named
/var/spool/cups/d*
,
which is what happens when you submit a print job.
The interesting difference from the old design is that there is no print service running until you submit a print job. Once started it persists, with both it and systemd monitoring the socket. When a print job triggers a service start, systemd sends out a log message "systemd[1]: Started CUPS Printing Service." But typically no new start is needed because the daemon is still running from the last job.
*.target
files define groups of units.
These are analogous to the run levels we saw in SysV and
Upstart, but you can have arbitrarily many of arbitrary
complexity.
(Actually that was true with SysV and Upstart but hardly
anyone did such a thing.)
$ cd /lib/systemd/system
$ more cups.*
::::::::::::::
cups.path
::::::::::::::
[Unit]
Description=CUPS Printer Service Spool

[Path]
PathExistsGlob=/var/spool/cups/d*

[Install]
WantedBy=multi-user.target
::::::::::::::
cups.service
::::::::::::::
[Unit]
Description=CUPS Printing Service

[Service]
ExecStart=/usr/sbin/cupsd -f
PrivateTmp=true

[Install]
Also=cups.socket cups.path
WantedBy=printer.target
::::::::::::::
cups.socket
::::::::::::::
[Unit]
Description=CUPS Printing Service Sockets

[Socket]
ListenStream=/var/run/cups/cups.sock

[Install]
WantedBy=sockets.target
You can view the available targets with one command:
$ systemctl --type=target --all
UNIT                   LOAD   ACTIVE   SUB    JOB DESCRIPTION
basic.target           loaded active   active     Basic System
cryptsetup.target      loaded active   active     Encrypted Volumes
emergency.target       loaded inactive dead       Emergency Mode
final.target           loaded inactive dead       Final Step
getty.target           loaded active   active     Login Prompts
graphical.target       loaded active   active     Graphical Interface
local-fs-pre.target    loaded active   active     Local File Systems (Pre)
local-fs.target        loaded active   active     Local File Systems
multi-user.target      loaded active   active     Multi-User
network.target         loaded active   active     Network
nfs.target             loaded active   active     Network File System Client and
nss-lookup.target      loaded active   active     Host and Network Name Lookups
nss-user-lookup.target loaded inactive dead       User and Group Name Lookups
printer.target         loaded active   active     Printer
remote-fs-pre.target   loaded inactive dead       Remote File Systems (Pre)
remote-fs.target       loaded active   active     Remote File Systems
rescue.target          loaded inactive dead       Rescue Mode
rpcbind.target         loaded active   active     RPC Port Mapper
shutdown.target        loaded inactive dead       Shutdown
sockets.target         loaded active   active     Sockets
sound.target           loaded active   active     Sound Card
swap.target            loaded active   active     Swap
sysinit.target         loaded active   active     System Initialization
syslog.target          loaded active   active     Syslog
time-sync.target       loaded active   active     System Time Synchronized
umount.target          loaded inactive dead       Unmount All Filesystems

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
JOB    = Pending job for the unit.

26 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.
Directories named
servicename.target.wants
allow you to manually define dependencies between units.
For example, while some network services can handle
network interfaces that only appear after the network
service has started, the Apache web server needs to
have networking up and running before it starts.
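As a sketch of how you might express such a dependency yourself without editing the packaged unit file, a small "drop-in" file can add it. The file name used here is just an example of the drop-in convention, and on many systems you also need the distribution's wait-online service enabled for network-online.target to mean anything:
# /etc/systemd/system/httpd.service.d/wait-for-network.conf
# Make httpd wait until the network is fully configured.
[Unit]
Wants=network-online.target
After=network-online.target
Run systemctl daemon-reload after adding or changing a drop-in file so systemd re-reads its configuration.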
Defining the Default Targets
/lib/systemd/system/default.target
defines the default target at boot time.
It is usually a symbolic link pointing to
multi-user.target
for a server or
graphical.target
for a workstation.
Note that /etc/systemd/system/default.target
can also exist and point to a unit file.
On my Mageia system, for example, that's a roundabout
way of getting to the same target:
/etc/systemd/system/default.target -> /lib/systemd/system/runlevel5.target
/lib/systemd/system/runlevel5.target -> /etc/systemd/system/graphical.target
You can override this default by passing a parameter
to the kernel at boot time; systemd will discover it in
/proc/cmdline
and use it instead of the default.
For example:
systemd.unit=runlevel3.target
or:
systemd.unit=rescue.target
Note that the traditional parameters from SysV and Upstart
can still be used:
1, s, S, single, 3, and 5
.
Systemd maps those to the associated
runlevelX.target
definitions.
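If you're curious, you can see how that mapping is implemented on your own system; on the distributions I have checked, the runlevelX.target names are simply symbolic links:
$ ls -l /lib/systemd/system/runlevel?.target
You should see runlevel3.target pointing to multi-user.target and runlevel5.target pointing to graphical.target, with the others mapped similarly.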
On my desktop system, the special target file
default.target
is a symbolic link pointing
to graphical.target
.
Leaving out the standard initial comment block, it contains
what we see here.
[Unit]
Description=Graphical Interface
Documentation=man:systemd.special(7)
Requires=multi-user.target
After=multi-user.target
Conflicts=rescue.target
Wants=display-manager.service
AllowIsolate=yes

[Install]
Alias=default.target
Notice that it explicitly requires the
multi-user.target
unit,
and it will also pull in, as "wants", any units linked in the
subdirectory default.target.wants
, although
that is empty on my system.
The reason for making servicename.target.wants
a directory of symbolic links is that you can easily add
and delete the "wants" without modifying the unit definition
file itself.
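For example, here is a minimal sketch of adding a "want" by hand; local-backup.service is a hypothetical unit name used only for illustration:
# mkdir -p /etc/systemd/system/multi-user.target.wants
# ln -s /etc/systemd/system/local-backup.service \
        /etc/systemd/system/multi-user.target.wants/local-backup.service
# systemctl daemon-reload
Deleting that symbolic link (and running systemctl daemon-reload again) removes the dependency, and the unit file itself is never touched. This is exactly what systemctl enable and systemctl disable do for a unit whose [Install] section says WantedBy=multi-user.target.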
Going back one level to the multi-user target,
multi-user.target
has a requirement for
basic.target
.
Here's the content of multi-user.target
:
[Unit]
Description=Multi-User
Documentation=man:systemd.special(7)
Requires=basic.target
Conflicts=rescue.service rescue.target
After=basic.target rescue.service rescue.target
AllowIsolate=yes

[Install]
Alias=default.target
The multi-user.target.wants
directory contains
files defining these additional requirements:
dbus.service,
getty.target,
plymouth-quit-wait.service,
plymouth-quit.service,
rpcbind.target,
systemd-ask-password-wall.path,
systemd-logind.service,
systemd-user-sessions.service
Chasing it further back, basic.target
contains
what we see here, a requirement for the
sysinit.target
target:
[Unit]
Description=Basic System
Documentation=man:systemd.special(7)
Requires=sysinit.target sockets.target
After=sysinit.target sockets.target
RefuseManualStart=yes
The basic.target.wants
directory adds these
requirements, restoring the sound service
and applying any distribution-specific scripts:
alsa-restore.service,
alsa-state.service,
fedora-autorelabel-mark.service,
fedora-autorelabel.service,
fedora-configure.service,
fedora-loadmodules.service,
mandriva-everytime.service,
mandriva-save-dmesg.service
And then sysinit.target
contains what we see here:
[Unit]
Description=System Initialization
Documentation=man:systemd.special(7)
Conflicts=emergency.service emergency.target
Wants=local-fs.target swap.target
After=local-fs.target swap.target emergency.service emergency.target
RefuseManualStart=yes
It has a larger list of added requirements in
sysinit.target.wants
:
cryptsetup.target,
dev-hugepages.mount,
dev-mqueue.mount,
kmod-static-nodes.service,
mandriva-kmsg-loglevel.service,
plymouth-read-write.service,
plymouth-start.service,
proc-sys-fs-binfmt_misc.automount,
sys-fs-fuse-connections.mount,
sys-kernel-config.mount,
sys-kernel-debug.mount,
systemd-ask-password-console.path,
systemd-binfmt.service,
systemd-journal-flush.service,
systemd-journald.service,
systemd-modules-load.service,
systemd-random-seed.service,
systemd-sysctl.service,
systemd-tmpfiles-setup-dev.service,
systemd-tmpfiles-setup.service,
systemd-udev-trigger.service,
systemd-udevd.service,
systemd-update-utmp.service,
systemd-vconsole-setup.service
Examining and Controlling System State With systemctl
List all active units (that is, units that are enabled and should have run successfully or still be running), showing their current status, paging through the results:
# systemctl list-units
List all target units, showing the collective targets
reached in the current system state.
This is broader than simply "the current run level"
as shown by the runlevel
command:
# systemctl list-units --type=target
List just those active units which have failed:
# systemctl --failed
List the units listening on sockets:
# systemctl list-sockets
List all available units, showing whether they are enabled or not:
# systemctl list-unit-files
Display the dependency tree for a service.
Service names are something like named.service
but they can be abbreviated by leaving off
.service
.
# systemctl list-dependencies named
Generate the full dependency graph for all services. This will be enormous and not very useful to most people. View it with Chrome or similar.
# systemd-analyze dot | dot -Tsvg > systemd.svg
Start, stop, restart, reload the configuration,
and report the status of one or more services.
These are like the corresponding /etc/init.d/*
boot scripts, with the addition of inter-process
communication and automated dependency satisfaction.
Use show
for far more information on that service.
You will notice that the first time you check the
status
for a service it will probably take a noticeable
amount of time.
This is because it is checking the journal,
another powerful but complex addition that comes with
systemd.
More on that below...
# systemctl stop named dhcpd
# systemctl start named dhcpd
# systemctl restart named
# systemctl reload named
# systemctl is-active named
# systemctl status named
# systemctl show named
Disable and enable a service for use in the future.
These are like the corresponding chkconfig
commands.
# systemctl disable named
# systemctl enable named
Check the current system configuration for the default
target run state, then change it to
newtarget
.
# systemctl get-default
# systemctl set-default newtarget
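For example, to make a machine come up in the text-only multi-user state by default and then verify the change:
# systemctl set-default multi-user.target
# systemctl get-default
multi-user.target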
Make major changes in system state:
# systemctl reboot
# systemctl halt
# systemctl poweroff
What About /etc/rc.d/rc.local?
Here's a common question:
How do I get /etc/rc.d/rc.local
to work
under systemd?
Maybe you're like me: you have written your own
iptables
firewall script or some other
locally developed program that you want to run at the end
of the booting process.
Well, maybe it already works.
See the example
/lib/systemd/system/rc-local.service
systemd service file here.
A comment refers to
/lib/systemd/system-generators/systemd-rc-local-generator
,
which is one of those fast-running binaries.
All I have to do is create an executable script named
/etc/rc.d/rc.local
, and the next time the
system boots, that script is run.
Otherwise, see if you have an rc-local.service
unit and enable it if needed.
If you don't have an rc-local.service
file, create one similar to what you see here and enable it:
systemctl enable rc-local.service
Maybe you want to tinker a little: use
/etc/rc.local
directly and leave out
the rc.d
subdirectory.
Or use
ConditionPathExists
instead of
ConditionFileIsExecutable
.
Have fun!
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.

# This unit gets pulled automatically into multi-user.target by
# systemd-rc-local-generator if /etc/rc.d/rc.local is executable.
[Unit]
Description=/etc/rc.d/rc.local Compatibility
ConditionFileIsExecutable=/etc/rc.d/rc.local
After=network.target

[Service]
Type=forking
ExecStart=/etc/rc.d/rc.local start
TimeoutSec=0
RemainAfterExit=yes
SysVStartPriority=99
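If you go this route, the rc.local script itself can stay very simple. Here is a minimal sketch; the firewall script name is purely a placeholder for whatever you developed locally:
#!/bin/sh
# /etc/rc.d/rc.local -- local additions run at the end of booting.
/usr/local/sbin/my-firewall-rules   # hypothetical locally written script
exit 0
Remember to make it executable with chmod +x /etc/rc.d/rc.local, since the ConditionFileIsExecutable test is what pulls the compatibility unit into the boot.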
What is left in /etc/rc.d?
Specifically, what about the directories
/etc/rc.d/init.d/
and /etc/rc.d/rc?.d/
— do they
still contain scripts and symbolic links?
Not much!
But what remains does work.
You can run the scripts in init.d
and systemd will run the scripts in rc3.d
or
rc5.d
when going to the
multi-user or graphical target, respectively.
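Those leftover SysV scripts also show up as generated service units, so you can manage them with the same systemd commands. For example, on a system that still has an old-style network script (just an example, your leftovers will vary):
# ls /etc/rc.d/init.d/
# systemctl status network.service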
Writing your own service scripts
See these:
Writing systemd service files
How to write a startup script for systemd
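Those guides go into detail. To give the flavor, here is a minimal sketch of a native unit for a hypothetical locally written daemon; the name mydaemon and its path are assumptions for illustration only:
# /etc/systemd/system/mydaemon.service
[Unit]
Description=My locally developed daemon
After=network.target

[Service]
# With the default Type=simple, the daemon must stay in the foreground.
ExecStart=/usr/local/sbin/mydaemon --no-fork
Restart=on-failure

[Install]
WantedBy=multi-user.target
Then run systemctl daemon-reload, systemctl enable mydaemon, and systemctl start mydaemon.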
Simplified "Phrase Book" Comparison of SVR4 init, Upstart, and systemd
There's much more to it than this, but here's what an administrator sees day-to-day:
SVR4 init
on CentOS/RHEL 5:
One file /etc/inittab
configures the
init
program as to what run level to enter
by default and what it takes to get there.
Other than starting multiple virtual consoles with text login
in run levels 3 and 5, and starting a graphical login in
run level 5, it says to use the directory
/etc/rc[0-6].d/
corresponding to the
target run level.
That directory will contain symbolic links
pointing to the collection of boot scripts in
/etc/init.d/
.
Each link has the same name as the actual script, preceded
with either K
(to kill) or S
(to start) and a two-digit number to impose order.
You use the chkconfig
program to enable or
disable services; it reads specially coded comments in the
comment block at the top of the boot script to determine
the run levels in which to start and stop the service and at
what numerical order position.
You directly run the boot script
/etc/init.d/servicename
to stop, start, or restart it right now.
Upstart on CentOS/RHEL 6:
Very similar to SVR4 init
as far as
configuration and operation goes.
The exception is that /etc/inittab
is now almost empty.
Its functionality has been expanded and moved into the files
/etc/sysconfig/init
and /etc/init/*
.
Systemd on CentOS/RHEL 7 and 8:
This is very different!
Instead of run levels, of which only 1 (maintenance or rescue),
3 (text-only, server) and 5 (graphics, workstation) are
useful, it uses "targets".
The commonly used ones correspond to the traditional run
levels 3 and 5, but you can boot or transition into any
combination of the targets found in
/lib/systemd/system/*.target
.
Only a few boot scripts remain in /etc/init.d/
.
You use the program systemctl
to query the
current overall system state, to query the state of
individual services, to control a service right now,
and to enable or disable it for the future.
Simplified "Phrase Book" of Equivalent Commands
What run state are we in?
What services were started/stopped to get here, and with what order dependencies?
init, Upstart:
runlevel
ls /etc/rcN.d

systemd:
systemctl get-default
systemctl
systemctl -a
systemctl list-dependencies
systemctl list-sockets
systemctl status crond sshd httpd ...
What is the default run state if the system is simply rebooted?
init, Upstart:
grep initdefault /etc/inittab

systemd:
systemctl get-default
Change the default run state to newtarget.

init, Upstart:
vim /etc/inittab

systemd:
systemctl set-default newtarget
What services are available? Of the available services, which are enabled and disabled?
init, Upstart:
ls /etc/rc.d/init.d
chkconfig --list

systemd:
systemctl list-unit-files
Stop service xyz.
Start service xyz.
Stop and restart service xyz.
Signal service xyz to re-read its configuration file.
init, Upstart:
/etc/init.d/xyz stop
/etc/init.d/xyz start
/etc/init.d/xyz restart
/etc/init.d/xyz reload

systemd:
systemctl stop xyz
systemctl start xyz
systemctl restart xyz
systemctl reload xyz
Enable service xyz to automatically start at the next boot.
Disable service xyz to not automatically start at the next boot.
init, Upstart:
chkconfig --add xyz
chkconfig xyz on
chkconfig --levels 345 xyz on
chkconfig --del xyz
chkconfig xyz off

systemd:
systemctl enable xyz
systemctl disable xyz

Systemd will automatically enable services that xyz depends upon.
What is involved in service
xyz
?
A short description, what it needs to run before it, what else wants this to run before it can, whether it is running or stopped now, since when, if running what its PID is, and far more?
init, Upstart:
more /etc/init.d/xyz
ls /etc/rc$(runlevel | awk '{print $2}').d/
/etc/init.d/xyz status
grep xyz /var/log/messages
ls /var/run/xyz
cat /var/run/xyz
ps axuww | egrep 'PID|xyz'

Oof! You would have to do all of these, plus many more, plus do some careful analysis of all of the output, to get everything you can get from the one systemd command. This is an area where systemd has an advantage.

systemd:
systemctl show xyz
Halt or reboot the system.
init, Upstart:
init 0
halt
poweroff
shutdown -h now -t 0
init 6
reboot
shutdown -r now -t 0

systemd:
systemctl halt
systemctl poweroff
systemctl reboot
Change to another run state
init, Upstart:
init 1
init 3
init 5

systemd:
systemctl isolate rescue.target
systemctl isolate multi-user.target
systemctl isolate graphical.target
The system is shut down; boot it into a non-default run state (typically used for rescue or maintenance).

init, Upstart, and systemd:
Interrupt the boot loader's countdown timer and modify the line that will be passed to the kernel.
Add the desired target state to the end — 1, 3, or 5 for SVR4 init or Upstart;
rescue, multi-user, or graphical for systemd
(1, 3, and 5 will probably work, but don't count on it).
The kernel's command line at the last boot is
kept in /proc/cmdline
.
Smaller Process Trees
With many startup tasks now done by one binary executable, instead of a script that spawned many child processes (including other scripts which may have called still other scripts), far fewer processes are spawned to bring the system up.
The aggressive parallelization means a flatter tree of processes.
Here is part of the process tree on CentOS 5 with
SysV init
:
init(1)-+-acpid(1850)
        |-atd(2290)
        |-crond(2100)
        |-cupsd(1935)
        [ ... ]
        |-gdm-binary(2401)---gdm-binary(2441)-+-Xorg(2446)
        |                                     `-tcsh(2460,cromwell)-+-ssh-agent(2496)
        |                                                           `-startkde(2506)---kwrapper(2572)
        [ ... ]
        |-kdeinit(2559,cromwell)-+-artsd(2586)
        |                        |-autorun(2677)
        |                        |-bt-applet(2691)
        |                        |-eggcups(2591)
        |                        |-kio_file(2582)
        |                        |-klauncher(2564)
        |                        |-konqueror(2598)
        |                        |-konsole(2602)-+-tcsh(2705)
        |                        |               |-tcsh(2707)---su(2854,root)---bash(2922)
        |                        |               `-tcsh(2712)
        |                        |-kwin(2575)
        |                        |-nm-applet(2663)
        |                        |-pam-panel-icon(2590)---pam_timestamp_c(2592,root)
        |                        |-xload(2664)
        |                        |-xmms(2638)-+-{xmms}(2678)
        |                        |            `-{xmms}(2786)
        |                        |-xterm(2593)---tcsh(2603)
        |                        |-xterm(2596)---tcsh(2606)
        |                        |-xterm(2597)---tcsh(2608)---ssh(3251)
        |                        `-xterm(2637)---bash(2640)-+-grep(2645)
        |                                                   |-grep(2646)
        |                                                   `-tail(2644)
        [ ... ]
        |-ntpd(2022,ntp)
        |-sendmail(2061)
        |-sendmail(2070,smmsp)
        |-smartd(2387)
        |-syslogd(1653)
        |-udevd(418)
        |-watchdog/0(4)
        |-xfs(2153,xfs)
        `-xinetd(2001)
Compare that to this process tree from Mageia with systemd. Shells and other processes aren't as deep:
$ pstree -pu | less
systemd(1)-+-acpid(695)
           |-agetty(3006)
           |-atd(672,daemon)
           [ ... ]
           |-kmix(3278,cromwell)---{kmix}(3676)
           |-knotify4(3241,cromwell)---{knotify4}(3242)
           |-konsole(3288,cromwell)-+-tcsh(3455)-+-audacious(6294)-+-{audacious}(6295)
           |                        |            |                 |-{audacious}(6298)
           |                        |            |                 |-{audacious}(6300)
           |                        |            |                 |-{audacious}(6310)
           |                        |            |                 `-{audacious}(6418)
           |                        |            |-less(6463)
           |                        |            `-pstree(6462)
           |                        |-tcsh(12198)---vim(5903)---{vim}(5904)
           |                        `-{konsole}(3453)
           [ ... ]
           |-named(2365,named)-+-{named}(2366)
           |                   |-{named}(2367)
           |                   |-{named}(2368)
           |                   |-{named}(2369)
           |                   |-{named}(2370)
           |                   `-{named}(2371)
           |-ntpd(2227,ntp)
           |-plasma-desktop(3244,cromwell)-+-ksysguardd(3262)
           |                               |-{plasma-desktop}(3245)
           |                               |-{plasma-desktop}(3246)
           |                               |-{plasma-desktop}(3256)
           |                               |-{plasma-desktop}(3261)
           |                               `-{plasma-desktop}(3263)
           [ ... ]
           |-rpcbind(1683,rpc)
           |-rsyslogd(697)-+-{rsyslogd}(763)
           |               |-{rsyslogd}(764)
           |               |-{rsyslogd}(765)
           |               `-{rsyslogd}(766)
           |-ssh-agent(2847,cromwell)
           |-sshd(1697)
           |-start_kdeinit(3200,cromwell)
           |-systemd-journal(380)
           |-systemd-logind(677)
           |-systemd-udevd(384)
           |-tor(2041,toruser)
           |-udisks-daemon(679)-+-udisks-daemon(683)
           |                    |-{udisks-daemon}(769)
           |                    `-{udisks-daemon}(817)
           |-udisksd(3217)-+-{udisksd}(3218)
           |               |-{udisksd}(3220)
           |               `-{udisksd}(3222)
           |-upowerd(699)-+-{upowerd}(767)
           |              `-{upowerd}(770)
           `-xosview(3799,cromwell)
The Journal and journalctl
You probably noticed that
systemctl status servicename
took a while the first time you ran it.
And you may have stumbled across that large and possibly
mysterious /var/log/journal/
directory.
This is the systemd journaling system.
The systemd journal captures log information even when the
rsyslog
daemon isn't running, and stores it
in a form that requires the use of the
journalctl
command.
A unique machine ID was created during the installation;
it is a 16-byte (128-bit) value recorded as 32 hexadecimal
ASCII characters in /etc/machine-id
.
That machine ID is used as a subdirectory in which the
journal files are stored.
For example:
# cat /etc/machine-id
3845e210bd0d4dc5b2e5f5fd8fdc6f01
# find /var/log/journal -type d
/var/log/journal
/var/log/journal/3845e210bd0d4dc5b2e5f5fd8fdc6f01
The journal files are all owned by root
and
associated with group adm
or
systemd-journal
.
Put a user in both groups to ensure they can read the
journal with journalctl
.
The systemd-journald
manual page explains that
you can grant read access to all members of groups
adm
and wheel
for all journal
files existing now and created in the future:
# setfacl -Rnm g:wheel:rx,d:g:wheel:rx,g:adm:rx,d:g:adm:rx /var/log/journal/
Worries About Size and Compliance
On the one hand, you are likely to worry about all this
journal data filling your file system.
Don't worry — by default it will use no more than 10%
of the file system and keep at least 15% free.
See the manual page for journald.conf
to see
how to adjust that in /etc/systemd/journald.conf
.
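As a sketch, the kinds of settings you might adjust there look like this; the values are arbitrary examples, see journald.conf(5) for the full list of options:
# /etc/systemd/journald.conf
[Journal]
SystemMaxUse=2G
SystemKeepFree=5G
MaxRetentionSec=1month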
If regulatory compliance requires you to retain log
information, you should worry about collecting and
archiving this information before its older content
is automatically trimmed away.
See the manual page for journalctl
to see
how to have a scheduled job extract the past day.
For example,
run this script via cron
after
midnight every night to capture all events from
midnight to midnight from the day before.
Log output tends to be very redundant and compress down to
about 5% of its original size with xz
:
#!/bin/sh
# Create the archive directory if this is the first run ever.
ARCHIVE=/var/log/journal-archive
mkdir -p ${ARCHIVE}
cd ${ARCHIVE}
# Capture yesterday's events, midnight to midnight.  The file name will
# include the host on which this was done plus yesterday's date in
# YYYY-MM-DD format.  Then compress it:
HOST=$( hostname )
DATE=$( date --date=yesterday "+%F" )
journalctl --since=yesterday --until=today > journal-${HOST}-${DATE}
xz journal-${HOST}-${DATE}
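Assuming you save that script as /usr/local/sbin/journal-archive (a name chosen here just for illustration) and make it executable, a root crontab entry to run it a few minutes after midnight might look like this:
# minute hour day-of-month month day-of-week command
5 0 * * * /usr/local/sbin/journal-archive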
Useful journalctl Techniques
See the manual page for journalctl
for full
details.
You can accomplish these types of things with
rsyslog
data, but only with possibly
complicated grep
or awk
commands
based on some initial investigation into just when boot
events happened.
The journalctl
command makes these much
easier.
Some handy commands include:
See just the kernel events logged since the most recent boot:
# journalctl -k -b -0
Or, all logged events since the most recent boot:
# journalctl -b -0
Or, all logged events within the run before
this most recent boot.
For example, you rebooted this system some time yesterday
afternoon and again this morning, and you want to see
all events between those two reboots.
This would require some initial investigation and then
some complex grep
strings using
rsyslogd
data only:
# journalctl -b -1
Just the logged events for one systemd unit, or for two (or more):
# journalctl -u named
# journalctl -u named -u dhcpd
Or, for just these three units since the last boot:
# journalctl -u named -u dhcpd -u httpd -b -0
Or, to emulate
tail -f /var/log/messages:
# journalctl -f
journalctl or Rsyslog or both?
With the systemd journal
capturing all local events,
even those logged while the Rsyslog daemon isn't running,
do we still need to run rsyslogd
?
You very likely do want to also run
rsyslogd
; it's easy to set up and imposes
very little additional overhead.
A UNIX socket is created by systemd and rsyslogd
will listen to it by default, capturing all messages (whether
it saves them, and if so, where, is entirely up to your
configuration of rsyslogd
).
# ls -l /run/systemd/journal/syslog
srw-rw-rw- 1 root root 0 Feb  9 16:28 /run/systemd/journal/syslog=
# file /run/systemd/journal/syslog
/run/systemd/journal/syslog: socket
# lsof /run/systemd/journal/syslog
COMMAND  PID USER  FD  TYPE             DEVICE SIZE/OFF NODE NAME
systemd    1 root  25u unix 0xffff880234b2b800      0t0 1730 /run/systemd/journal/syslog
rsyslogd 787 root   3u unix 0xffff880234b2b800      0t0 1730 /run/systemd/journal/syslog
journalctl
is very nice for querying the
existing journal, but rsyslogd
can still
do some things that the journal cannot.
Centralized logging has a number of advantages. One is security, the integrity and availability of the log data. Yes, Forward Secure Sealing can periodically "seal" journal data to detect integrity violation, but I would feel better about having critical log data stored on a dedicated, hardened remote rsyslog server.
Rsyslog can enforce host authentication and data confidentiality and integrity through TLS; see my how-to page for the details.
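The simplest form of that forwarding is a one-line client configuration; loghost.example.com is a placeholder for your central log server, and the TLS-protected setup described on that page wraps this same idea:
# /etc/rsyslog.d/forward.conf -- send a copy of everything to the central server.
# "@@" means forward over TCP; a single "@" would mean UDP.
*.*   @@loghost.example.com:514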
Also, with all the log data in one place you're immediately ready to apply a log analysis package like Splunk or ArcSight.
So, for me, systemd journal plus Rsyslog makes sense.