How Linux Boots, Run Levels, and Service Control

How Linux Boots

"How does Linux boot?" That's a very important question! Of course you need to understand that for troubleshooting, to handle situations where it doesn't boot, or doesn't do it correctly or get all the needed services started. But you need to understand this for routine administration. You need to control which services are started, and handle dependencies where one service must be running before a second can be started. The answer to the question is complex, because there are so many choices along the way. The firmware on the platform, which boot loader you are using, which init system you are using, the details depend on these choices and more. Let's start with a simple explanation.

Turn it on, wait a few moments, start doing powerful things.

Maybe that's all you care to know. But maybe you want some details. At its very simplest, Linux is booted by this sequence: the firmware starts a boot loader, the boot loader loads the kernel and an initial RAM disk image, the kernel starts the init program, and init starts the system services.

Going just slightly deeper, we have choices for the firmware and boot loader. On Linux's traditional platform derived from the IBM PC architecture, the firmware has long been the very limited BIOS, although UEFI is increasingly common. The boot loader was once LILO, then GRUB, and now GRUB 2.

The boot loader also tells the kernel how to find and load an initial RAM-based disk image providing device drivers for physical disk controllers as well as some initial scripts.

The init program continues to evolve in capability and complexity, from early BSD-like systems through a SVR4-style init, then Upstart, and now systemd.

This has been just the briefest of overviews. Continue to learn the details!

The following goes through the booting and service start and control steps in the sequence in which they happen, attempting to cover all the possibilities at each step.

Much of what follows will be needed to understand how to migrate from one Linux distribution to another, and how to upgrade across major releases of one distribution.

Kernel Space versus User Space

The Linux kernel runs directly on the hardware, using physical addresses and accessing the hardware on behalf of user processes while enforcing access permissions.

The work you accomplish on the computer is done by your processes, which were created out of system-owned processes when you logged in. All of these processes are descendants of init, one master process started by the kernel early in the boot process. The init process manages the system state by requesting the creation and termination of other processes.

The kernel first detects enough hardware to find and mount the root file system. It then starts the init program, which manages the system state and all subsequent processes. The only process the kernel itself knows should run is init. Once init has started, the kernel enforces permissions and manages resource utilization (memory, processing priority, etc.), and may prevent or restrict some processes. On the positive side, it is init and its descendants that request the creation of new processes.

The Kernel Boot Loader

Firmware

The system firmware selects some media and attempts to boot the operating system stored there. Selecting, loading, and starting the OS kernel might be done by the firmware itself when it is a small operating system of its own, as in the case of OpenBoot (later standardized as Open Firmware), developed by Sun for the SPARC platform, or the SRM console developed by DEC for the Alpha.

For motherboards with AMD/Intel processors using BIOS or UEFI, the firmware finds a "stub" of the boot loader in a disk partition, which then calls successively more capable components to get to the kernel.

Firmware Finds the Boot Loader

BIOS

The BIOS firmware selects the bootable device, scanning the attached storage in an order specified by the bus and controller scan order as well as the BIOS configuration. It is looking for a device starting with a 512-byte block which ends with the boot signature 0x55AA. The first 446 bytes of the boot block hold the boot loader, followed by 64 bytes for the partition table, then those final two bytes of the boot signature.

446 bytes of program code doesn't provide much capability! It will be just a boot loader "stub" which can find and start a more capable boot loader within a partition.
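
You can verify this structure on your own disk. As a quick check (assuming the first disk is /dev/sda and the xxd utility is installed), dump that first 512-byte block and look at its final bytes, which should end with 55aa:

# dd if=/dev/sda bs=512 count=1 2>/dev/null | xxd | tail -3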

UEFI

UEFI firmware initializes the processor, memory, and peripheral hardware such as Ethernet, SATA, video, and other interfaces. An interface can have its own firmware code, sometimes called an Option ROM, which initializes that peripheral. UEFI can check those Option ROMs for embedded signatures which can appear on the UEFI's "Allowed" and "Disallowed" lists.

UEFI has a "boot manager", a firmware policy engine configured by a set of NVRAM variables. It will look for a GPT or GUID Partition Table with GUID C12A7328-F81F-11D2-BA4B-00A0C93EC93B, the distinctive signature of an EFI boot partition in a GPT device. If it can't find that, it will look for a traditional MBR partition of type 0xEF. This is the EFI System partition. It will usually be the first partition on that disk.

The EFI System partition contains a small FAT or FAT32 file system. Treating that file system as the root of a tree, the firmware looks inside it for the program /EFI/BOOT/BOOTX64.EFI. The EFI System partition will usually be mounted as /boot/efi after booting so its content will be accessible.
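
On a running system you can verify all of this. The device and partition names here are only examples, and efibootmgr shows the firmware's NVRAM boot entries only when the system was actually booted through UEFI:

# lsblk -o NAME,FSTYPE,PARTTYPE,MOUNTPOINT /dev/sda
# ls /boot/efi/EFI
# efibootmgr -v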

The Boot Loader Starts

BIOS-MBR

On a BIOS-MBR system, that 446-byte boot loader "stub" was, in the past, LILO. However, LILO relied on physical addresses into the disk and was very sensitive to disk geometry and reconfigurations. You frequently had to rescue a system by booting from other media and then recreating the LILO boot loader block.

GRUB is a much better solution for BIOS-MBR. That 446-byte block is the GRUB "stub", which can be recovered from the first 446 bytes of a file in /boot/grub. In legacy GRUB this is /boot/grub/stage1, in GRUB 2 this is /boot/grub2/i386-pc/boot.img.

Encoded into the stage 1 GRUB loader is a definition of where to find the small /boot file system. This is typically the first partition of the first disk.

The boot loader will need to read the file system in the boot partition. Legacy GRUB used the files /boot/grub/*stage1_5, helper modules for the various file systems that could be used for the /boot file system — e2fs_stage1_5, xfs_stage1_5, and others.

The boot block of a disk is sector 0. For legacy reasons, the first partition of a disk does not begin until sector 63, leaving a gap of 62 sectors. The GRUB 2 component core.img is written into this gap. It plays the role of the legacy GRUB *_stage1_5 modules.

Now that GRUB (either legacy or GRUB 2) can read the /boot file system, it can run the final stage GRUB boot loader to do the complex work. In legacy GRUB this is /boot/grub/stage2, configured by either /boot/grub/grub.conf or /boot/grub/menu.lst. Both of those usually exist, one as a normal file and the other as a symbolic link pointing to it. In GRUB 2 this is /boot/grub2/i386-pc/kernel.img, configured by /boot/grub2/grub.cfg.
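
If the boot block or the core.img copy in that gap is damaged, you can usually rewrite both of them. A sketch, assuming a Red Hat style GRUB 2 installation booting from /dev/sda (on Debian-family systems the command is grub-install and the directory is /boot/grub):

# grub2-install /dev/sda
# ls /boot/grub2/i386-pc/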

UEFI-GPT

On a UEFI-GPT system, the firmware has found the EFI System partition, which must hold a FAT or FAT32 file system, and is ready to run its /EFI/BOOT/BOOTX64.EFI. That will load /EFI/BOOT/grub.efi, which is a component of the GRUB 2 boot loader. If the firmware has been configured to enforce Secure Boot, then shim.efi, which has been signed by the UEFI signing service, runs first and chain loads grub.efi.

This GRUB "stub" can then use the core.img component to read whichever file system type was used for the boot partition, and then start the kernel.img program which reads its configuration from /boot/grub2/grub.cfg.

The User Gets a Choice

The boot loader can present a menu to the user, typically a choice of various kernels, or the menu can be hidden and require the user to realize that the <Escape> key must be pressed within a few seconds. The menu may time out after a specified number of seconds, or it may wait until the user makes a selection.

The user can also edit the selected boot option, useful for maintenance and rescue. For example, specifying a boot to single-user mode to repair some problems.
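
For example, on a GRUB 2 menu you can press "e" to edit the highlighted entry, append a parameter to the end of its "linux" or "linux16" line, and press <Ctrl>x to boot the modified entry. The kernel version and root device below are placeholders:

linux16 /vmlinuz-3.10.0-123.el7.x86_64 root=/dev/sda3 ro quiet single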

The Kernel

On some hardware, the Alpha for example, the kernel file is simply a compressed copy of the compiled vmlinux kernel file.

On x86 series platforms, the kernel file starts with a 512-byte boot block, then a secondary boot loader block, and then the compressed kernel image.

The boot loader does what is needed to extract and decompress the kernel into RAM and then turn control over to it.

Finding the File System

The boot loader, be it GRUB or the mainboard firmware, tells the kernel where to find its primary or root file system. The problem is that the root file system might be stored in any of several different complicated ways, many of them requiring kernel modules that can't all be simultaneously compiled into the monolithic kernel core.

For example, the root file system might be on NFS, and storage devices might be connected through iSCSI or AoE or FCoE (that is, SCSI over IP, or ATA over Ethernet, or Fiber Channel over Ethernet), any of that requiring the network interface to be detected, enabled, and used in a certain way.

Or on a logical volume, which means that the underlying physical volumes and volume group must be detected.

Or, the root file system might be encrypted with the LUKS and dm-crypt components, requiring using those kernel modules and asking the user to enter a passphrase.

It doesn't have to be as complicated as any of those. SCSI controllers and SATA interfaces are based on a wide range of chip sets, each requiring one of several different kernel modules.

How can you load a kernel module (or device driver) used to interact with a device, when that module is stored on a disk controlled through the very device we're currently unable to interact with? You can't. The solution is...

Initial RAM Disk Image

The boot loader will tell the kernel how to find an initial RAM disk image stored in the boot file system.

The traditional method has been to build an initrd file. This is a file system containing those kernel modules along with binaries, shared libraries, scripts, device-special files, and everything else needed to get the system to the point it can find and use the real root file system. That resulting file system has been archived into a single file with cpio and then compressed with gzip.

In the initrd design, this image is uncompressed and extracted into /dev/ram, which is then mounted as a file system. It once executed the script /linuxrc in that file system; now it executes the script /init.

In the initramfs design, the dracut tool is used to create the file (which may still be called initrd-release) while including its own framework from /usr/lib/dracut or similar.

The initrd design uses a script, linuxrc, while dracut uses a binary. The goal is to speed this part of the boot process, to quickly detect the root file system's device(s) and transition to the real root file system.

In either case, that /init script will find and mount the real file system, so the real init program can be run.

Exploring Your Initial RAM Disk Image

For a listing, simply try this. Change release to the appropriate value:

# lsinitrd /boot/initrd-release

To extract and explore a copy, do this:

# mkdir /tmp/explore
# cd /tmp/explore
# zcat /boot/initrd-release | cpio -id
# ls -l
# tree | less

You will find that there is a /init file, but it is a shell script. Directories /bin and /sbin contain needed programs, while /lib and /lib64 contain shared libraries. Three devices are in /dev: console, kmsg, and null. Some configuration is in /etc, and mount points exist for /proc, /sys, and /sysroot. A /tmp directory is available.

Finding the Actual Root File System

But out of the multiple file systems available on multiple devices, combinations of devices, and network abstractions, how does the kernel know which is to be mounted first as the real /, the real root file system?

Well, how do you want to accomplish that?

Some root= directive is passed to the kernel by the boot loader. This can be the device name:
root=/dev/sda4
or a label embedded in the file system header:
root=LABEL=/
or the manufacturer and model name and its partition as detected by the kernel and listed in /dev/disk/by-id:
root=/dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WMAZA3949022-part1
or the PCI bus address and path through the controller:
root=/dev/disk/by-path/pci-0000:00:02.1-usb-0:1:1.0-scsi-0:0:0:0-part1
or a UUID embedded in the file system header:
root=UUID=12e71ecd-833d-45ea-adfd-1eca8c27d912

Of these choices, the UUID is the most robust. Simple device names like /dev/sda4 can change when disks are added or are detected in a different order, and labels are only as unique as you make them. A UUID is effectively unique, which is why installers generally use it by default.
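
You can see the labels and UUIDs on your own disks with blkid. The device name and values shown here are only illustrative:

# blkid /dev/sda4
/dev/sda4: LABEL="/" UUID="12e71ecd-833d-45ea-adfd-1eca8c27d912" TYPE="ext4"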

Continuing Critiques of the Boot Sequence

This boot sequence is overly complicated. The system passes through three environments with similar functionality on the way to a running system: EFI, GRUB, and dracut. All of them involve drivers. EFI and dracut have a small shell and scripting, and GRUB has an editable menu.

The 3.3 kernel added an EFI boot stub, which lets UEFI firmware load and start the kernel directly, without a separate boot loader.

The Kernel Found the Root File System, Now What?

Once the kernel has started and it has discovered enough hardware to mount the root file system, it searches for a master user-space program which will control the state of the operating system itself and also manage the processes running on the operating system, both system services and user processes.

This master process is the init program, or at least we will find that the kernel expects that to be the case. The init program runs the appropriate set of boot scripts, based on its own configuration and that of the collection of boot scripts. It is then the ancestor of all other processes, cleaning up and freeing their resources if their parent process has lost track of them ("reaping zombie processes" in the UNIX parlance).

static int run_init_process(const char *init_filename)
{
        argv_init[0] = init_filename;
        return do_execve(init_filename,
                (const char __user *const __user *)argv_init,
                (const char __user *const __user *)envp_init);
}

[ ... ]

if (!run_init_process("/sbin/init") ||
    !run_init_process("/etc/init") ||
    !run_init_process("/bin/init") ||
    !run_init_process("/bin/sh"))
        return 0;

panic("No init found.  Try passing init= option to kernel. "
      "See Linux Documentation/init.txt for guidance.");

The kernel and init

The Linux kernel starts the init program. The kernel source code contains a block of code in init/main.c that looks for the init program in the appropriate place, the /sbin directory. If it isn't there, the kernel then tries two other locations before falling back to starting a shell. If it can't even do that, then the boot loader seems to have misinformed the kernel about where the root file system really is. Maybe it specified the wrong device, or maybe the root file system is corrupted or simply missing a vital piece.
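
If init itself is broken or misconfigured, you can use that init= option mentioned in the panic message. A minimal rescue sketch: edit the boot loader entry to append init=/bin/sh to the kernel command line, and once the shell appears, remount the root file system read-write, make your repairs, and then start the real init:

# mount -o remount,rw /
# (make repairs)
# exec /sbin/init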

The kernel does all the kernel-space work: interacting directly with hardware, managing running processes by allocating memory and CPU time, and enforcing access control through ownership and permissions.

init handles the user-space work, at least initially. It runs a number of scripts sometimes called boot scripts as they handle the user-space part of starting the operating system environment, and sometimes called init scripts because they're run by init.

[Diagram: Linux kernel, init process, boot scripts, and user processes.]

The first of these boot scripts has traditionally been /etc/rc.sysinit. It does basic system initialization, checking the root file system with fsck if required, checking and mounting the other file systems, and loading any needed kernel modules along the way.

Other boot scripts usually start both local and remote authentication interfaces. This includes the local command-line interface with mingetty and possibly a graphical login on non-servers with some X-based display manager; plus remote access with the SSH daemon.

Like all the other boot scripts, the user authentication interfaces are started as root, so that user authentication and the subsequent calls to setuid() and setgid() can be done. User-owned processes then do the work within each session.

Boot scripts can start services that more safely run with lower privileges. For example, the Apache web server must start as root so it can open TCP port 80. But it then drops privileges through the setuid() and setgid() system calls and continues running as an unprivileged user named apache or httpd or similar.
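
You can see the result of that privilege drop in the process list. Output along these lines is typical (the PIDs and the number of worker processes will vary), with the master process owned by root and the workers by apache:

# ps -o pid,user,comm -C httpd
  PID USER     COMMAND
 2101 root     httpd
 2105 apache   httpd
 2106 apache   httpd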

The components drawn in red can directly access the hardware: the firmware, the boot loader, and the kernel.

Those drawn in black are owned by root: the init process, the boot scripts, and the daemons they spawn. These should be persistent, running until there is a change in system state, possibly shutting down to halt or reboot.

Those drawn in green are owned by unprivileged users: the user's login session and its processes, and possibly some daemonized non-root services like a web server. User processes run until they finish or are terminated.

Where Are the Scripts?

This can be confusing, as the scripts may be directly in the directory /etc/ or maybe in its subdirectory /etc/rc.d/. This is made more confusing by the presence of symbolic links which mean that — most of the time, anyway — either naming convention works. You frequently find something like this:

$ ls -lFd /etc/init.d /etc/rc*
lrwxrwxrwx  1 root root   11 Feb  3 16:47 /etc/init.d -> rc.d/init.d/
drwxr-xr-x 11 root root 4096 Feb  9 16:51 /etc/rc.d/
lrwxrwxrwx  1 root root   13 Feb  9 16:48 /etc/rc.local -> rc.d/rc.local
lrwxrwxrwx  1 root root   15 Feb  9 16:48 /etc/rc.sysinit -> rc.d/rc.sysinit
lrwxrwxrwx  1 root root   10 Feb  3 16:58 /etc/rc0.d -> rc.d/rc0.d/
lrwxrwxrwx  1 root root   10 Feb  3 16:58 /etc/rc1.d -> rc.d/rc1.d/
lrwxrwxrwx  1 root root   10 Feb  3 16:58 /etc/rc2.d -> rc.d/rc2.d/
lrwxrwxrwx  1 root root   10 Feb  3 16:58 /etc/rc3.d -> rc.d/rc3.d/
lrwxrwxrwx  1 root root   10 Feb  3 16:58 /etc/rc4.d -> rc.d/rc4.d/
lrwxrwxrwx  1 root root   10 Feb  3 16:58 /etc/rc5.d -> rc.d/rc5.d/
lrwxrwxrwx  1 root root   10 Feb  3 16:58 /etc/rc6.d -> rc.d/rc6.d/
lrwxrwxrwx  1 root root   10 Feb  3 16:58 /etc/rcS.d -> rc.d/rcS.d/
$  ls -lF /etc/rc.d/
total 64
drwxr-xr-x 2 root root  4096 Feb  3 21:52 init.d/
-rwxr-xr-x 1 root root   220 Feb  3 16:51 rc.local*
-rwxr-xr-x 1 root root 10707 Feb  3 16:52 rc.sysinit*
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc0.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc1.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc2.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc3.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc4.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc5.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc6.d/
drwxr-xr-x 2 root root  4096 Feb  9 16:25 rc7.d/
lrwxrwxrwx 1 root root     5 Feb  3 16:47 rcS.d -> rc1.d/

Development of init through several major versions

BSD Unix uses a very simple init scheme using one master boot script (which calls other scripts to start services), configured by one file specifying which services to enable and with one additional script optionally adding other boot-time tasks.

System V Unix added the concept of run levels, multiple target states for the running system. Each is defined as a collection of started/stopped states for services.

Upstart is a modification of the SysV method. The first thing the administrator notices is that init configuration has changed from a single file to a collection of files. More significantly in the long run, Upstart adds support for dependencies between service components, automatically restarting a crashed service, and the ability for events to trigger starting or stopping services.

systemd is the biggest change yet. A system can boot or otherwise transition to multiple simultaneous targets. Aggressive parallelization yields fast state transitions and a very flat process tree. The dependency support promised in Upstart is delivered in systemd.

We'll look at these one at a time.

/*
 * List of paths to try when searching for "init".
 */
static char *initpaths[] = {
	"/sbin/init",
	"/sbin/oinit",
	"/sbin/init.bak",
	NULL,
};

BSD-style /etc/rc

The OpenBSD kernel also starts init. The block of code above, from /usr/src/sys/kern/init_main.c, shows that init must be in an obvious place.

The BSD init uses a simple boot script configuration that was once used in some Linux distributions such as Slackware, but no mainstream Linux distribution does it this way now.

The BSD style init program brings up the system by running the /etc/rc script. That's it — rc uses a few other scripts, but it's a simple and efficient design.

The configuration script /etc/rc.conf sets a number of standard parameters for available services. You then modify /etc/rc.conf.local to turn services on and adjust their parameters on your system. rc reads rc.conf and then rc.conf.local. For example, rc.conf says not to run the Network Time Protocol daemon or start a web server by default; it contains these lines:

ntpd_flags=NO    # for normal use: ""
httpd_flags=NO   # for normal use: ""

But then you might customize your system with these changes, in rc.conf.local, turning on both NTP and HTTP and disabling the chroot() capability of Apache:

ntpd_flags=""
httpd_flags="-u"

The individual services are started by scripts in /etc/rc.d/* called by rc.

Then almost at the very end of the master boot script rc, it calls /etc/rc.local. The only things done after that are starting some hardware monitoring daemons, the cron daemon, and possibly the simple X display manager xdm if you asked for that during the installation. This gives you a place to add some customization. My rc.local contains this:

#       $OpenBSD: rc.local,v 1.44 2011/04/22 06:08:14 ajacoutot Exp $

# Site-specific startup actions, daemons, and other things which
# can be done AFTER your system goes into securemode.  For actions
# which should be done BEFORE your system has gone into securemode
# please see /etc/rc.securelevel.
## ADDED BELOW HERE #################################################
echo "Starting KDM"
( sleep 5 ; /usr/local/bin/kdm ) &
echo "Saving kernel ring buffer in /var/log/dmesg"
dmesg > /var/log/dmesg
echo "Starting smartd to monitor drives"
/usr/local/sbin/smartd
echo "Unmuting audio"
audioctl output_muted=0

Reboot the system with shutdown -r or simply reboot.

Halt and turn off the power with shutdown -h or simply halt -p.

Level   Purpose (most Linux)                             Purpose (some Linux)
0       Shut down and power off                          Shut down and power off
1       Single-user mode                                 Single-user mode
2       Multi-user console login, no networking          Multi-user console login, networking enabled
3       Multi-user console login, networking enabled     Multi-user graphical login, networking enabled
4       not used                                         not used
5       Multi-user graphical login, networking enabled   not used
6       Shut down and reboot                             Shut down and reboot

SysV-style init

This is more complex than the BSD method. It is based on the concept of numbered run levels. Linux uses the definitions in the table above; Solaris and other Unix-family operating systems use something very similar.

Red Hat and therefore many other distributions used this SysV-style init roughly from the late 1990s through the late 2000s.

#
# inittab       This file describes how the INIT process should set up
#               the system in a certain run-level.
#
# Author:       Miquel van Smoorenburg, <miquels@drinkel.nl.mugnet.org>
#               Modified for RHS Linux by Marc Ewing and Donnie Barnes
#

# Default runlevel. The runlevels used by RHS are:
#   0 - halt (Do NOT set initdefault to this)
#   1 - Single user mode
#   2 - Multiuser, without NFS (The same as 3, if you do not have networking)
#   3 - Full multiuser mode
#   4 - unused
#   5 - X11
#   6 - reboot (Do NOT set initdefault to this)
# 
id:5:initdefault:

# System initialization.
si::sysinit:/etc/rc.d/rc.sysinit

l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

# Trap CTRL-ALT-DELETE
ca::ctrlaltdel:/sbin/shutdown -t3 -r now

# When our UPS tells us power has failed, assume we have a few minutes
# of power left.  Schedule a shutdown for 2 minutes from now.
# This does, of course, assume you have powerd installed and your
# UPS connected and working correctly.  
pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"

# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Canceled"

# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon

/etc/inittab

The SysV init program reads its configuration file /etc/inittab to see what to do by default and how to do that. CentOS 5 used the inittab shown here.

The default run level is 5, graphical desktop. If you built a server, this would be 3 instead.

The first boot script to be run is /etc/rc.d/rc.sysinit. It does a number of initialization tasks. Most importantly, it re-mounts the root file system in read/write mode and finds and checks the other file systems.

Then, to get into run level 5, init runs the inittab entries with 5 in the second field. That means that the second task is to run the script /etc/rc.d/rc with a parameter of 5. More on the details of this in a moment...

If a shutdown had been scheduled because of a power failure and power is restored in time, that shutdown is canceled.

It then starts six /sbin/mingetty processes, one each on TTY devices tty1 through tty6. The key combinations <Ctrl><Alt><F1> through <Ctrl><Alt><F7> switch you between these six text virtual consoles plus X if it's running.

Finally, it runs the script /etc/X11/prefdm which tries to determine which display manager is probably the preferred one and then starts it.

Along the way it specified that a detected power failure event schedules a shutdown in two minutes, and that <Ctrl><Alt><Del> on a text console causes an immediate reboot.

$ ls /etc/rc.d/rc5.d
K15httpd       S00microcode_ctl    S22messagebus    S80sendmail
K20nfs         S04readahead_early  S25netfs         S85denyhosts
K28amd         S06cpuspeed         S26acpid         S90crond
K50netconsole  S08arptables_jf     S26lm_sensors    S90xfs
K65kadmin      S10network          S26lvm2-monitor  S91freenx-server
K65kprop       S10restorecond      S28autofs        S95anacron
K65krb524      S12syslog           S50hplip         S95atd
K65krb5kdc     S13irqbalance       S55cups          S96readahead_later
K69rpcsvcgssd  S13mcstrans         S55sshd          S98haldaemon
K74nscd        S13portmap          S56rawdevices    S99firewall
K80kdump       S14nfslock          S56xinetd        S99local
K87multipathd  S18rpcidmapd        S58ntpd          S99smartd
K89netplugd    S19rpcgssd          S61clamd 

Boot script directories and changing run levels

The boot scripts themselves are stored in /etc/rc.d/init.d/ and can be thought of as a collection of available tools.

You can manually stop, start, or restart a service by running its boot script with a parameter of stop or start or restart. Most of the boot scripts also support checking the service's current state with status, and some support reload, which keeps the service running but re-reads its configuration file to change some details of how it's running.

Try running a boot script with no parameter at all. That usually provides a short message explaining that a parameter is needed and then listing the possible parameters.
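
For example, the exact wording varies from script to script, but you will see something like this:

# /etc/rc.d/init.d/network
Usage: /etc/rc.d/init.d/network {start|stop|status|restart|reload|force-reload}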

The directories /etc/rc.d/rc0.d through /etc/rc.d/rc6.d specify what to stop and start to get into the corresponding run levels. Each of those is populated with symbolic links pointing to the actual scripts in /etc/rc.d/init.d/*, with the exception of S99local which points to /etc/rc.d/rc.local.

For example, /etc/rc.d/rc5.d contains S10network, a symbolic link pointing to /etc/rc.d/init.d/network.

The logic is that rc goes through the list of links in the target directory, first stopping (killing) those with link names beginning with "K" in numerical order, and then starting those with link names beginning with "S" in numerical order. The network script sets up IPv4/IPv6 networking, and so it is started in run level 5 before those network services that rely on it. Similarly, when going to run level 0 or 6, those services are stopped before turning off IP networking.

There's more to it than just that — if the system was already running, and if a service is to be in the same state in both the current and target run level, then it isn't stopped. For example, if you booted a system to run level 3, in which networking and network services are started, and then you changed to run level 5, the only thing that will happen is that the graphical display manager will be started. It won't shut down all services and IP networking and then start them back up again.

To change from one run level to another, run the init command with a parameter of the target run level.

Use runlevel to see the previous and current run levels. An N for the previous level means "none": you booted the system directly into the current run level.
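
For example, to drop from the graphical run level to plain multi-user with networking, and then confirm the change:

# runlevel
N 5
# init 3
# runlevel
5 3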

Reboot the system with init 6 or shutdown -r or simply reboot.

Halt and turn off the power with init 0 or shutdown -h or simply halt.

On a text console you can reboot with <Ctrl><Alt><Del>, and on a graphical console that usually brings up a dialog in which both rebooting and shutting down are options. You can also click through the graphical menus to shut down or reboot from graphical mode.

Specifying how to get into a given run level

You could manually create the symbolic links, but you would have to think carefully about what numbers to assign to get everything into the correct order.

Don't do that; use chkconfig.

The chkconfig program is a little confusing because it is programmed by shell script comments within the boot scripts. Let's look at an example:

$ head /etc/rc.d/init.d/network
#! /bin/bash
#
# network       Bring up/down networking
#
# chkconfig: 2345 10 90
# description: Activates/Deactivates all network interfaces configured to \
#              start at boot time.

This specifies that if you want this service to be used (and you probably do, this sets up basic IP networking!), then it should be started in run levels 2, 3, 4, and 5, started as S10, fairly early. That leaves it to be turned off in run levels 0, 1, and 6, stopped (killed) as K90, fairly late.

Let's experiment with chkconfig:

$ su
password:
# chkconfig --add network
# chkconfig --list network
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
# ls /etc/rc.d/rc?.d/*network
/etc/rc.d/rc0.d/K90network  /etc/rc.d/rc4.d/S10network
/etc/rc.d/rc1.d/K90network  /etc/rc.d/rc5.d/S10network
/etc/rc.d/rc2.d/S10network  /etc/rc.d/rc6.d/K90network
/etc/rc.d/rc3.d/S10network
# chkconfig --del network
# chkconfig --list network
network         0:off   1:off   2:off   3:off   4:off   5:off   6:off
# ls /etc/rc.d/rc?.d/*network
/etc/rc.d/rc0.d/K90network  /etc/rc.d/rc4.d/K90network
/etc/rc.d/rc1.d/K90network  /etc/rc.d/rc5.d/K90network
/etc/rc.d/rc2.d/K90network  /etc/rc.d/rc6.d/K90network
/etc/rc.d/rc3.d/K90network
# chkconfig --add network
# chkconfig --list network
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off

We turned it on (probably not needed) and then checked its state in various run levels. We also listed the symbolic links showing that it's started as S10 and stopped (killed) as K90.

Then we turned off the service and tested what that did.

Finally, we turned it back on and made sure that worked.

Control now versus in the future

Remember that you can do two very different things, and often you should do both of them:

Start or stop the service right now by running its boot script with a parameter of start or stop.

Have it automatically started (or not) after future reboots by running chkconfig with an option of --add (or --del).

Upstart init

Upstart is an event-driven replacement or re-design for init. It was meant to be analogous to the Service Management Facility in Solaris, with services started and stopped by events. These events might be kernel detection of hardware, or they might be caused by other services. This includes the crash of a service automatically leading to its being restarted. It was developed for Ubuntu but it came to be used in many distributions, including RHEL 6 and therefore CentOS and, less directly, Oracle Linux and Scientific Linux.

Upstart is different from SysV init, but the differences are very small for the typical administrator. Instead of a large /etc/inittab specifying several things, now that file has just one line specifying the default target run level, initdefault.

Instead of one configuration file, Upstart uses the collection of significantly-named files in /etc/init/.

/etc/init/rcS.conf specifies how to start the system. It does this in a very familiar way, by running /etc/rc.d/rc.sysinit followed by /etc/rc.d/rc with the single parameter of the target run level. That is, as long as the system wasn't booted into rescue or emergency mode, in which case it runs /sbin/sulogin to make sure it really is the administrator at the keyboard and not someone doing a simple console break-in, and then drops to a shell.

The text consoles are started by /etc/init/start-ttys.conf.

If you go to run level 5, /etc/init/prefdm.conf starts the graphical display manager.

If you passed the console=/dev/tty0 parameter to the kernel at boot time, /etc/init/serial.conf sets up a serial console line.

If you press <Ctrl><Alt><Del> on a text console, /etc/init/control-alt-delete.conf handles the task of rebooting.

Debian and Ubuntu are a Little Different

Instead of /etc/rc.d/rc.sysinit, Debian and Ubuntu run /etc/init.d/rcS. That in turn runs every script /etc/rcS.d/S* in order.

Commands sysv-rc-conf and update-rc.d are used instead of chkconfig. It is probably easiest to see these by example.

See the run levels in which a given service is started:

Debian / Ubuntu
    # sysv-rc-conf --list
    # sysv-rc-conf --list apache

Most other Linux distributions
    # chkconfig --list
    # chkconfig --list httpd

BSD
    # more /etc/rc.conf /etc/rc.conf.local

Solaris
    # svcs
    # svcs | grep httpd

Add/enable one service and delete/disable another after future boots:

Debian / Ubuntu
    # sysv-rc-conf apache on
    # sysv-rc-conf finger off

Most other Linux distributions
    # chkconfig httpd on
    # chkconfig finger off

BSD
    # vi /etc/rc.conf.local

Solaris
    # svcadm enable network/httpd
    # svcadm disable network/finger

For all of these Linux distributions, we have been able to stop, start, restart, and sometimes take other actions right now, simply by running the associated script with an appropriate parameter:

# /etc/init.d/httpd status
# /etc/init.d/httpd restart
# /etc/init.d/named reload
# /etc/init.d/named status

However, that is changing with...

systemd

This is really different from what has come before. Lennart Poettering, the systemd author, provides a description of the systemd design goals and philosophy and then adds a later comparison of features. Also see the official systemd page at freedesktop.org.

It was the default in Mageia at least by early 2013, and became standard in Fedora around that time. By the end of 2013 a RHEL 7 beta release had appeared and it used systemd. By early 2014, Mark Shuttleworth announced that Ubuntu would also transition to systemd with Ubuntu 14.10 in October 2014.

Systemd uses many good ideas from Apple's launchd, introduced with Mac OS X 10.4 and now also part of iOS. To summarize the design:

Systemd Design Philosophy

Start only what's needed

It doesn't make sense to start the CUPS print service while everything else is trying to start. We're booting now, we'll print later. Start it on demand, when someone wants to print.

Similarly, for hardware-specific services like Bluetooth, only start those services when hardware has been detected and some process requests communication with it.

Start some daemons on demand.

For what you do start, aggressively parallelize it

Traditional SysV init required a long sequence of individual service starts. Several early processes were needed by many other services; the early ones had to fully start, and their boot scripts had to terminate successfully, before the later ones could begin.

Notice the traditional use of boot scripts. Shell scripts are very good for rapid development, but they don't run fast. A shell process has to be created to run the script, and then almost everything the script does requires the creation of further processes. This is made worse by the typical nesting of a boot script calling its helper script, which in turn calls a number of configuration scripts.

Recode boot scripts in C, use binary executables as much as possible.

The CPU is the fastest component in the system; the disks are the slowest. The CPU must sit idle for many potentially useful cycles while waiting for disk I/O. And saying "the CPU" is a little old-fashioned: most systems have multiple CPU cores, and we want to use them all in parallel. We want to aggressively parallelize the startup programs, but we don't want to coordinate their actions by monitoring the file system.

Systemd can create sockets, then pass those sockets to daemon processes as they are started. This can be simplified and sped up by creating all needed sockets at once, and then starting all daemon processes at once. Get them all started and let them communicate among themselves as they come up. The sockets are maintained by systemd so if a daemon crashes, systemd restarts that daemon and programs that were communicating with the old daemon are still connected but now to the replacement.

Aggressively parallelize the startup by starting all daemons simultaneously and using sockets for inter-process communication to handle inter-service order dependencies.

There is more to it. Control Groups or cgroups are used to group related processes into a hierarchy of process groups, providing a way to monitor and control all processes of a group, including limiting and isolating their resource usage. When you stop the service, it will stop all the related processes.
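
You can view that hierarchy, and the processes grouped under each service, with systemd-cgls, or look at the CGroup section of systemctl status output for one service:

$ systemd-cgls
# systemctl status sshd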

Automounting can be used for all file systems other than the root file system, supporting encryption with LUKS, NFS and other network-based storage, LVM and RAID.

See Lennart Poettering's description for more details.

Location and Components

It gets weird here: there no longer is a real /sbin/init program! You must either set up symbolic links, as seen here, or else modify the boot loader to pass this option to the kernel:
init=/lib/systemd/systemd

$ ls -l /usr/sbin/init /usr/bin/systemd /lib/systemd/systemd
-rwxr-xr-x 1 root root 929520 Sep 22 12:26 /lib/systemd/systemd*
lrwxrwxrwx 1 root root     22 Oct  6 01:37 /usr/bin/systemd -> ../lib/systemd/systemd*
lrwxrwxrwx 1 root root     22 Oct  6 01:37 /usr/sbin/init -> ../lib/systemd/systemd* 

Notice how some components are under /usr. This is part of a general Linux trend of crucial components moving under /usr, making it impractical for /usr to be a separate file system as it frequently was in UNIX tradition. Beware that modern Linux systems typically have no real /bin, /lib, /lib64, or /sbin; those are all symbolic links pointing to directories in /usr, so /usr must be part of the root file system.

% ls -ld /bin /lib* /sbin
lrwxrwxrwx 1 root root 7 Feb  3 16:45 /bin -> usr/bin/
lrwxrwxrwx 1 root root 7 Feb  3 16:45 /lib -> usr/lib/
lrwxrwxrwx 1 root root 9 Feb  3 16:45 /lib64 -> usr/lib64/
lrwxrwxrwx 1 root root 8 Feb  3 16:45 /sbin -> usr/sbin/

Systemd binaries are located in /lib/systemd/systemd-*, with optional distribution-specific scripts in the same directory.

The interesting parts are the task unit configuration files, all of them under /lib/systemd/system/.

Units

Booting tasks are organized into units — these include initializing hardware, mounting file systems, creating sockets, and starting services that will daemonize and run in the background. Each of these task units is configured by a simple file holding configuration information; these are sources of information, not scripts to be run. Their syntax is similar to things like kdmrc, the KDE display manager configuration file, and therefore similar to Windows *.ini files. For example, here is the named.service file, specifying when and how to start the BIND DNS service:

[Unit]
Description=Berkeley Internet Name Domain (DNS)
Wants=nss-lookup.target
Before=nss-lookup.target
After=network.target

[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/named
Environment=KRB5_KTNAME=/etc/named.keytab
PIDFile=/var/lib/named/var/run/named/named.pid

ExecStartPre=/usr/sbin/setup-named-chroot.sh /var/lib/named on
ExecStartPre=/usr/sbin/named-checkconf -t /var/lib/named -z /etc/named.conf
ExecStart=/usr/sbin/named -u named -t /var/lib/named $OPTIONS

ExecReload=/bin/sh -c '/usr/sbin/rndc reload > /dev/null 2>&1 || /bin/kill -HUP $MAINPID'

ExecStop=/bin/sh -c '/usr/sbin/rndc stop > /dev/null 2>&1 || /bin/kill -TERM $MAINPID'
ExecStopPost=/usr/sbin/setup-named-chroot.sh /var/lib/named off

PrivateTmp=false
TimeoutSec=25

[Install]
WantedBy=multi-user.target

And here are the three unit files configuring the CUPS printing service, discussed below:

$ cd /lib/systemd/system
$ more cups.*
::::::::::::::
cups.path
::::::::::::::
[Unit]
Description=CUPS Printer Service Spool

[Path]
PathExistsGlob=/var/spool/cups/d*

[Install]
WantedBy=multi-user.target
::::::::::::::
cups.service
::::::::::::::
[Unit]
Description=CUPS Printing Service

[Service]
ExecStart=/usr/sbin/cupsd -f
PrivateTmp=true

[Install]
Also=cups.socket cups.path
WantedBy=printer.target
::::::::::::::
cups.socket
::::::::::::::
[Unit]
Description=CUPS Printing Service Sockets

[Socket]
ListenStream=/var/run/cups/cups.sock

[Install]
WantedBy=sockets.target 

Unit Types

The file name indicates the type of that unit.

*.mount files specify when and how to mount and unmount file systems; *.automount files are for storage handled by the automounter.

*.service files handle services that in the past were typically handled by scripts in /etc/rc.d/init.d/.

*.socket files create sockets that will be used by the associated service units.

*.path files allow systemd to monitor the specified files and directories through inotify; access in that path causes a service start.

The CUPS printing service provides a simple example. systemd watches for the appearance of a file named /var/spool/cups/d*, which is what happens when you submit a print job.

The interesting difference from the old design is that there is no print service running until you submit a print job. When a job first appears, systemd starts the daemon and logs "systemd[1]: Started CUPS Printing Service." Once started, the daemon persists, with both it and systemd monitoring the socket, so later print jobs typically need no new start because the daemon is still running.

*.target files define groups of units. These are analogous to the run levels we saw in SysV and Upstart, but you can have arbitrarily many of arbitrary complexity. (Actually that was true with SysV and Upstart but hardly anyone did such a thing.)

You can view the available targets with one command:

$ systemctl --type=target --all
UNIT                   LOAD   ACTIVE   SUB    JOB DESCRIPTION
basic.target           loaded active   active     Basic System
cryptsetup.target      loaded active   active     Encrypted Volumes
emergency.target       loaded inactive dead       Emergency Mode
final.target           loaded inactive dead       Final Step
getty.target           loaded active   active     Login Prompts
graphical.target       loaded active   active     Graphical Interface
local-fs-pre.target    loaded active   active     Local File Systems (Pre)
local-fs.target        loaded active   active     Local File Systems
multi-user.target      loaded active   active     Multi-User
network.target         loaded active   active     Network
nfs.target             loaded active   active     Network File System Client and
nss-lookup.target      loaded active   active     Host and Network Name Lookups
nss-user-lookup.target loaded inactive dead       User and Group Name Lookups
printer.target         loaded active   active     Printer
remote-fs-pre.target   loaded inactive dead       Remote File Systems (Pre)
remote-fs.target       loaded active   active     Remote File Systems
rescue.target          loaded inactive dead       Rescue Mode
rpcbind.target         loaded active   active     RPC Port Mapper
shutdown.target        loaded inactive dead       Shutdown
sockets.target         loaded active   active     Sockets
sound.target           loaded active   active     Sound Card
swap.target            loaded active   active     Swap
sysinit.target         loaded active   active     System Initialization
syslog.target          loaded active   active     Syslog
time-sync.target       loaded active   active     System Time Synchronized
umount.target          loaded inactive dead       Unmount All Filesystems

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
JOB    = Pending job for the unit.

26 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'. 

Directories named servicename.target.wants allow you to manually define dependencies between units. For example, while some network services can handle network interfaces that only appear after the network service has started, the Apache web server needs to have networking up and running before it starts.
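
The systemctl enable command manages those directories for you, creating a symbolic link in the appropriate *.wants directory. The output looks something like this, with paths varying by distribution:

# systemctl enable httpd.service
ln -s '/usr/lib/systemd/system/httpd.service' '/etc/systemd/system/multi-user.target.wants/httpd.service'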

Defining the Default Targets

/lib/systemd/system/default.target defines the default target at boot time. It is usually a symbolic link pointing to multi-user.target for a server or graphical.target for a workstation.

Note that /etc/systemd/system/default.target can also exist and point to a unit file. On my Mageia system, for example, that's a roundabout way of getting to the same target:
/etc/systemd/system/default.target -> /lib/systemd/system/runlevel5.target
/lib/systemd/system/runlevel5.target -> /etc/systemd/system/graphical.target

You can override this default by passing a parameter to the kernel at boot time; systemd will discover it in /proc/cmdline. For example:
systemd.unit=runlevel3.target
or:
systemd.unit=rescue.target

Note that the traditional parameters from SysV and Upstart can still be used, 1, s, S, single, 3, 5. Systemd maps those to the associated runlevelX.target definitions.
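
Once the system is running, the analogous change of state is done with systemctl isolate rather than init. This works for targets that set AllowIsolate=yes, as the graphical target shown below does:

# systemctl isolate multi-user.target
# systemctl isolate graphical.target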

[Unit]
Description=Graphical Interface
Documentation=man:systemd.special(7)
Requires=multi-user.target
After=multi-user.target
Conflicts=rescue.target
Wants=display-manager.service
AllowIsolate=yes

[Install]
Alias=default.target 

On my desktop system, the special target file default.target is a symbolic link pointing to graphical.target. Leaving out the standard initial comment block, it contains what we see here.

Notice that it explicitly requires the multi-user.target unit, and it will also require all the components in the subdirectory default.target.wants, although that is empty on my system.

The reason for making servicename.target.wants a directory of symbolic links is that you can easily add and delete the "wants" without modifying the unit definition file itself.

[Unit]
Description=Multi-User
Documentation=man:systemd.special(7)
Requires=basic.target
Conflicts=rescue.service rescue.target
After=basic.target rescue.service rescue.target
AllowIsolate=yes

[Install]
Alias=default.target 

Going back one level to the multi-user target, multi-user.target contains a requirement for basic.target.

The multi-user.target.wants directory contains these additional requirements:

dbus.service, getty.target, plymouth-quit-wait.service, plymouth-quit.service, rpcbind.target, systemd-ask-password-wall.path, systemd-logind.service, systemd-user-sessions.service

[Unit]
Description=Basic System
Documentation=man:systemd.special(7)
Requires=sysinit.target sockets.target
After=sysinit.target sockets.target
RefuseManualStart=yes 

Chasing it further back, basic.target contains what we see here, a requirement for sysinit.target. The basic.target.wants directory adds these requirements, restoring the sound card state and applying any distribution-specific scripts:

alsa-restore.service, alsa-state.service, fedora-autorelabel-mark.service, fedora-autorelabel.service, fedora-configure.service, fedora-loadmodules.service, mandriva-everytime.service, mandriva-save-dmesg.service

[Unit]
Description=System Initialization
Documentation=man:systemd.special(7)
Conflicts=emergency.service emergency.target
Wants=local-fs.target swap.target
After=local-fs.target swap.target emergency.service emergency.target
RefuseManualStart=yes 

And then sysinit.target contains what we see here. It has a larger list of added requirements in sysinit.target.wants:

cryptsetup.target, dev-hugepages.mount, dev-mqueue.mount, kmod-static-nodes.service, mandriva-kmsg-loglevel.service, plymouth-read-write.service, plymouth-start.service, proc-sys-fs-binfmt_misc.automount, sys-fs-fuse-connections.mount, sys-kernel-config.mount, sys-kernel-debug.mount, systemd-ask-password-console.path, systemd-binfmt.service, systemd-journal-flush.service, systemd-journald.service, systemd-modules-load.service, systemd-random-seed.service, systemd-sysctl.service, systemd-tmpfiles-setup-dev.service, systemd-tmpfiles-setup.service, systemd-udev-trigger.service, systemd-udevd.service, systemd-update-utmp.service, systemd-vconsole-setup.service

Examining and Controlling System State With systemctl

List all active units (that is, units that have been started and either ran successfully or are still running), showing their current status, paging through the results:

# systemctl list-units

List all target units, showing the collective targets reached in the current system state. This is broader than simply "the current run level" as shown by the runlevel command:

# systemctl list-units --type=target

List just those active units which have failed:

# systemctl --failed

List the units listening on sockets:

# systemctl list-sockets

List all available units, showing whether they are enabled or not:

# systemctl list-unit-files

Display the dependency tree for a service. Service names are something like named.service but they can be abbreviated by leaving off .service.

# systemctl list-dependencies named

Start, stop, restart, reload the configuration, and report the status of one or more services. These are like the corresponding /etc/init.d/* boot scripts, with the addition of the inter-process communication and automated dependency satisfaction. Use show for far more information on that service.

You will notice that the first time you check the status for a service it will probably take a noticeable amount of time. This is because it is checking the journal, another powerful but complex addition that comes with systemd. More on that below...

# systemctl stop named dhcpd
# systemctl start named dhcpd
# systemctl restart named
# systemctl reload named
# systemctl status named
# systemctl show named

Disable and enable a service for use in the future. These are like the corresponding chkconfig commands.

# systemctl disable named
# systemctl enable named

Make major changes in system state:

# systemctl reboot
# systemctl halt
# systemctl poweroff

What About /etc/rc.d/rc.local?

Here's a common question: How do I get /etc/rc.d/rc.local to work under systemd? Maybe you're like me, you have written your own iptables firewall script or some other locally developed programs you want to run at the end of the booting process.

#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

# This unit gets pulled automatically into multi-user.target by
# systemd-rc-local-generator if /etc/rc.d/rc.local is executable.
[Unit]
Description=/etc/rc.d/rc.local Compatibility
ConditionFileIsExecutable=/etc/rc.d/rc.local
After=network.target

[Service]
Type=forking
ExecStart=/etc/rc.d/rc.local start
TimeoutSec=0
RemainAfterExit=yes
SysVStartPriority=99

Well, maybe it already works.

See the example /lib/systemd/system/rc-local.service systemd service file shown above.

A comment refers to /lib/systemd/system-generators/systemd-rc-local-generator, which is one of those fast-running binaries. All I have to do is create an executable script named /etc/rc.d/rc.local, and the next time the system boots, that script is run.

Otherwise, see if you have an rc-local.service unit and enable it if needed.

If you don't have an rc-local.service file, create one similar to the one shown above and enable it:

# systemctl enable rc-local.service

Maybe you want to tinker a little, use /etc/rc.local directly and leave out the rc.d subdirectory. Or use ConditionPathExists instead of ConditionFileIsExecutable. Have fun!

What is left in /etc/rc.d?

Specifically, what about the directories /etc/rc.d/init.d/ and /etc/rc.d/rc?.d/ — do they still contain scripts and symbolic links?

Not much! But what remains does work. You can run the scripts in init.d and systemd will run the scripts in rc3.d or rc5.d when going to the multi-user or graphical target, respectively.

Writing your own service scripts

See these:
Writing systemd service files
How to write a startup script for systemd
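
As a minimal sketch, here is what a locally written service unit might look like. The service name, description, and program path are made up for this example; save it as /etc/systemd/system/mydaemon.service:

[Unit]
Description=Example local daemon
After=network.target

[Service]
ExecStart=/usr/local/sbin/mydaemon --foreground
Restart=on-failure

[Install]
WantedBy=multi-user.target

Then tell systemd to re-read its configuration, enable the unit for future boots, and start it now:

# systemctl daemon-reload
# systemctl enable mydaemon.service
# systemctl start mydaemon.service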

Smaller Process Trees

With many startup tasks now done by one binary executable instead of a script that spawned many child processes, including other scripts which may have called other scripts, fewer processes were spawned to bring the system up.

The aggressive parallelization means a flatter tree of processes.

Here is part of the process tree on CentOS 5 with SysV init:

init(1)-+-acpid(1850)
        |-atd(2290)
        |-crond(2100)
        |-cupsd(1935)
    [ ... ]
        |-gdm-binary(2401)---gdm-binary(2441)-+-Xorg(2446)
        |                                     `-tcsh(2460,cromwell)-+-ssh-agent(2496)
	|                                                           `-startkde(2506)---kwrapper(2572)
    [ ... ]
        |-kdeinit(2559,cromwell)-+-artsd(2586)
        |                        |-autorun(2677)
        |                        |-bt-applet(2691)
        |                        |-eggcups(2591)
        |                        |-kio_file(2582)
        |                        |-klauncher(2564)
        |                        |-konqueror(2598)
        |                        |-konsole(2602)-+-tcsh(2705)
        |                        |               |-tcsh(2707)---su(2854,root)---bash(2922)
        |                        |               `-tcsh(2712)
        |                        |-kwin(2575)
        |                        |-nm-applet(2663)
        |                        |-pam-panel-icon(2590)---pam_timestamp_c(2592,root)
        |                        |-xload(2664)
        |                        |-xmms(2638)-+-{xmms}(2678)
        |                        |            `-{xmms}(2786)
        |                        |-xterm(2593)---tcsh(2603)
        |                        |-xterm(2596)---tcsh(2606)
        |                        |-xterm(2597)---tcsh(2608)---ssh(3251)
        |                        `-xterm(2637)---bash(2640)-+-grep(2645)
        |                                                   |-grep(2646)
        |                                                   `-tail(2644)
    [ ... ]
        |-ntpd(2022,ntp)
        |-sendmail(2061)
        |-sendmail(2070,smmsp)
        |-smartd(2387)
        |-syslogd(1653)
        |-udevd(418)
        |-watchdog/0(4)
        |-xfs(2153,xfs)
	`-xinetd(2001) 

Compare that to this process tree from Mageia with systemd. Shells and other processes aren't as deep:

$ pstree -pu | less
systemd(1)-+-acpid(695)
           |-agetty(3006)
           |-atd(672,daemon)
    [ ... ]
           |-kmix(3278,cromwell)---{kmix}(3676)
           |-knotify4(3241,cromwell)---{knotify4}(3242)
           |-konsole(3288,cromwell)-+-tcsh(3455)-+-audacious(6294)-+-{audacious}(6295)
           |                        |            |                 |-{audacious}(6298)
           |                        |            |                 |-{audacious}(6300)
           |                        |            |                 |-{audacious}(6310)
           |                        |            |                 `-{audacious}(6418)
           |                        |            |-less(6463)
           |                        |            `-pstree(6462)
           |                        |-tcsh(12198)---vim(5903)---{vim}(5904)
           |                        `-{konsole}(3453)
    [ ... ]
           |-named(2365,named)-+-{named}(2366)
           |                   |-{named}(2367)
           |                   |-{named}(2368)
           |                   |-{named}(2369)
           |                   |-{named}(2370)
           |                   `-{named}(2371)
           |-ntpd(2227,ntp)
           |-plasma-desktop(3244,cromwell)-+-ksysguardd(3262)
           |                               |-{plasma-desktop}(3245)
           |                               |-{plasma-desktop}(3246)
           |                               |-{plasma-desktop}(3256)
           |                               |-{plasma-desktop}(3261)
           |                               `-{plasma-desktop}(3263)
    [ ... ]
           |-rpcbind(1683,rpc)
           |-rsyslogd(697)-+-{rsyslogd}(763)
           |               |-{rsyslogd}(764)
           |               |-{rsyslogd}(765)
           |               `-{rsyslogd}(766)
           |-ssh-agent(2847,cromwell)
           |-sshd(1697)
           |-start_kdeinit(3200,cromwell)
           |-systemd-journal(380)
           |-systemd-logind(677)
           |-systemd-udevd(384)
           |-tor(2041,toruser)
           |-udisks-daemon(679)-+-udisks-daemon(683)
           |                    |-{udisks-daemon}(769)
           |                    `-{udisks-daemon}(817)
           |-udisksd(3217)-+-{udisksd}(3218)
           |               |-{udisksd}(3220)
           |               `-{udisksd}(3222)
           |-upowerd(699)-+-{upowerd}(767)
           |              `-{upowerd}(770)
           `-xosview(3799,cromwell)

The Journal and journalctl

You probably noticed that systemctl status servicename took a while the first time you ran it. And you may have stumbled across that large and possibly mysterious /var/log/journal/ directory. This is the systemd journaling system.

The systemd journal captures log information even when the rsyslog daemon isn't running, and stores it in an indexed binary format that you read with the journalctl command.

A unique machine ID was created during installation; it is a 128-bit (16-byte) value recorded in /etc/machine-id as 32 hexadecimal ASCII characters. That machine ID is used as the name of the subdirectory in which the journal files are stored. For example:

# cat /etc/machine-id
3845e210bd0d4dc5b2e5f5fd8fdc6f01
# find /var/log/journal -type d
/var/log/journal
/var/log/journal/3845e210bd0d4dc5b2e5f5fd8fdc6f01

The journal files are all owned by root and associated with group adm or systemd-journal. Put a user in both groups to ensure they can read the journal with journalctl.
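
For example, assuming both groups exist on your distribution (check /etc/group first), something like this works. The user name is just the one seen in the process trees above:

# usermod -a -G adm,systemd-journal cromwell

The new group membership takes effect at that user's next login.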

The systemd-journald manual page explains that you can grant read access to all members of groups adm and wheel for all journal files existing now and created in the future:

# setfacl -Rnm g:wheel:rx,d:g:wheel:rx,g:adm:rx,d:g:adm:rx /var/log/journal/

Worries About Size and Compliance

On the one hand, you are likely to worry about all this journal data filling your file system. Don't worry: by default it will use no more than 10% of the file system and keep at least 15% free. See the journald.conf manual page for how to adjust those limits in /etc/systemd/journald.conf.
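
To see how much space the journal is using right now, and to cap it explicitly, something like the following works. SystemMaxUse= is the relevant journald.conf setting, and the 500M figure is an arbitrary example value:

# journalctl --disk-usage

Then, in /etc/systemd/journald.conf:

[Journal]
SystemMaxUse=500M

Restart the journal daemon afterward with systemctl restart systemd-journald.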

If regulatory compliance requires you to retain log information, you should worry about collecting and archiving it before the older content is automatically trimmed away. See the journalctl manual page for how a scheduled job can extract it a day at a time. For example, run the following script via cron shortly after midnight every night to capture all of the previous day's events, midnight to midnight. Log output tends to be very redundant, and it compresses down to about 5% of its original size with xz:

#!/bin/sh

# Create the archive directory if this is the first run ever.
ARCHIVE=/var/log/journal-archive
mkdir -p ${ARCHIVE}
cd ${ARCHIVE} || exit 1

# Capture yesterday's events.  File name will include the
# host on which this was done plus yesterday's date in
# YYYY-MM-DD format.  Then compress it:
HOST=$( hostname )
DATE=$( date --date=yesterday "+%F" )
journalctl --since=yesterday --until=today > journal-${HOST}-${DATE}
xz journal-${HOST}-${DATE}
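
Scheduling it is up to you. As one sketch, assuming the script above was saved as /usr/local/sbin/archive-journal (a name and path chosen here purely for illustration), a file in /etc/cron.d could run it a few minutes after midnight:

# /etc/cron.d/archive-journal -- run the archive script as root at 00:05 daily
5 0 * * *   root   /usr/local/sbin/archive-journal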

Useful journalctl Techniques

See the journalctl manual page for full details. You can accomplish the same sorts of things with rsyslog data, but only with possibly complicated grep or awk commands and some initial investigation into just when boot events happened. The journalctl command makes all of this much easier. Some handy commands include:

See just the kernel events logged since the most recent boot:

# journalctl -k -b -0

Or, all logged events since the most recent boot:

# journalctl -b -0

Or, all logged events within the run before this most recent boot. For example, you rebooted this system some time yesterday afternoon and again this morning, and you want to see all the events between those two reboots. With rsyslogd data alone this would require some initial investigation and then some complex grep patterns; with the journal it is one command:

# journalctl -b -1

Just the logged events for one systemd unit, or for two (or more):

# journalctl -u named
# journalctl -u named -u dhcpd

Or, for just these three units since the last boot:

# journalctl -u named -u dhcpd -u httpd -b -0

Or, to emulate tail -f /var/log/messages:

# journalctl -f
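
Or, filter by syslog priority. For example, just the errors and worse since the most recent boot:

# journalctl -p err -b -0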

journalctl or Rsyslog or both?

With the systemd journal capturing all local events, even those logged while the Rsyslog daemon isn't running, do we still need to run rsyslogd?

You very likely do want to also run rsyslogd. It's easily set up, and it imposes very little additional overhead.

Systemd creates a UNIX socket, and rsyslogd listens on it by default, capturing all messages. Whether it saves them, and if so, where, is entirely up to your configuration of rsyslogd.

#  ls -l /run/systemd/journal/syslog 
srw-rw-rw- 1 root root 0 Feb  9 16:28 /run/systemd/journal/syslog
# file /run/systemd/journal/syslog 
/run/systemd/journal/syslog: socket
# lsof /run/systemd/journal/syslog 
COMMAND  PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
systemd    1 root   25u  unix 0xffff880234b2b800      0t0 1730 /run/systemd/journal/syslog
rsyslogd 787 root    3u  unix 0xffff880234b2b800      0t0 1730 /run/systemd/journal/syslog

journalctl is very nice for querying the existing journal, but rsyslogd can still do some things that the journal cannot.

Centralized logging has a number of advantages. One is security: the integrity and availability of the log data. Yes, Forward Secure Sealing can periodically "seal" journal data to detect integrity violations, but I would feel better about having critical log data stored on a dedicated, hardened remote rsyslog server.

Rsyslog can enforce host authentication and data confidentiality and integrity through TLS; see my how-to page for the details.
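
As a rough sketch of the sending side only, and assuming rsyslog was built with the gtls network stream driver, something like the following would forward everything over TLS. The host name, CA file path, and file name are placeholders; certificates and the receiving server are what the how-to page covers:

# /etc/rsyslog.d/forward-tls.conf  -- placeholder names, adjust for your site
$DefaultNetstreamDriver gtls
$DefaultNetstreamDriverCAFile /etc/pki/rsyslog/ca.pem
$ActionSendStreamDriverMode 1
$ActionSendStreamDriverAuthMode x509/name
$ActionSendStreamDriverPermittedPeer loghost.example.com
*.*  @@loghost.example.com:6514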

Also, with all the log data in one place you're immediately ready to apply a log analysis package like Splunk or ArcSight.

So, for me, systemd journal plus Rsyslog makes sense.