Look at the mixture of operating system
environments visible through the one desktop
This is a Red Hat Enterprise Linux system running in
graphical mode, with five virtualized systems running
on top of it, both containers and full virtual machines.
The most colorful one is a Windows 8.1 virtual machine.
In front of that is an Oracle Linux container system
in text mode, as appropriate for a server.
There is a Debian graphical desktop toward the back.
Also listed as runnning in the Virtual Machine Manager
but without connections to their consoles at the
moment are a CentOS 6 container and a
CentOS 7 virtual machine.
All this and more make up our goal. We want to control multiple operating systems running in multiple modes of operation, and do that through a common interface.
We want to be able to do this across a data center, where the containers and virtual machines are spread across multiple physical platforms but we can easily control them all from one console. From the command line for efficiency and scalability through scripting, and from a graphical interface for easy visibility and insight.
We need to get some terminology out of the way first. It can get a little confusing, but if we don't get it right we may end up doing the opposite of what we want.
Full virtualization has the hypervisor presenting virtual CPUs, memory, networking, and more, and the guest operating system is convinced that it is running on real hardware. The hypervisor must do dynamic binary translation to carry this off. It pretends to be the real hardware, ideally in real time but of course with some added latency.
Paravirtualization is when the guest
operating systems uses customized device drivers to access
the virtualized hardware.
The guest OS is not just aware that it is running on top
of virtualization, it is using different drivers to run
a little differently.
See Linux systems running on Xen hypervisors, for example.
Amazon Web Services' EC2 (or Elastic Compute Cloud)
is Linux on Xen.
The disk devices are not the usual
/dev/xvd*, so named by the
paravirtualization device driver for Xen-virtualized
Full virtualization can provide the greatest compartmentalization between virtual machines and between virtual machines and the host. That is, the greatest separation of file systems, process tables, and UID/GID identity or credentials. The hypervisor can even emulate different hardware — maybe the guest is Android on ARM while the host is Linux on x86_64. As you might already imagine, full virtualization, especially of foreign architectures, is slower.
is an expanded CPU instruction set that lets the CPU do some
of the virtualization work directly in hardware instead of
the hypervisor processes in user space.
Hardware-assisted virtualization has been around at least
since IBM CP-40 systems in the 1960s, but the Intel/AMD x86
architecture only added this around 2005.
Intel calls their version VT-x, AMD's is AMD-V.
Look for the feature in
kernel's notion of the CPU.
You should find
vmx on Intel CPUs and
svm on AMD.
You need it as virtualization is
significantly slower without it.
$ egrep 'vmx|svm' /proc/cpuinfo
With hardware-assisted virtualization, performance can reach 95–99% of the bare-metal speed.
Hybrid virtualization is when you have hardware-assisted virtualization supporting paravirtualization.
Finally, operating system virtualization has become popular in Linux. There is a single kernel shared by the host and the guest systems, but some compartmentalization of processes, memory, and file system. Linux Containers (or LXC) are virtualization technology, while Docker is an efficient and scaleable management and deployment scheme for LXC. More recently, Docker added its own virtualization technology for improved performance.
We can place these along a spectrum from best performance to best compartmentalization:
|Best Performance||Best Compartmentalization|
|Operating System Virtualization||Paravirtualization||Full System Virtualization|
Linux Containers / LXC
|separate file systems, becoming even more separate as we move to full system virtualization|
|separate users/groups and process tables|
Chroot has been around for ages but only recently has it come to be widely considered as a form of virtualization. Linux Containers or LXC have their own UID/GID and process space, and Docker is a system for efficiently managing containers and sharing their design. KVM or Kernel-Based Virtual Machine is functionality provided by the kernel, and QEMU is the user-space mechanism for controlling KVM-based virtual machines. Both are used whether you're doing paravirtualization or full system virtualization, but QEMU is much more obviously involved in full virtualization, especially when emulating foreign architectures.
You must run
chroot as the user
You can simply name the new file system root and it will
/bin/sh program within it:
# chroot /var/chroots/myproject
You can specify other non-
root UID/GID credentials:
# chroot --userspec=fred:wheel /var/chroots/myproject
You can name the root of the
and also the path to the program to run within it:
# chroot /var/www /sbin/httpd
Note that the above implies that
exists, and it is either a statically linked binary or else
/var/www/lib is populated with the needed
shared libraries usually found in
Furthermore, the web root will be beneath the new file system,
so the web pages in
log files in
/var/www/var/www/log/*, and so on.
And, very likely,
/var/www/dev/ is partially
populated with device nodes, at least with
The UNIX family has had
chroot for ages,
and it is only with the recently growing interest in
Linux-based virtualization that it has been considered
anything but pure compartmentalization.
Its syntax has been simple and stable for a long time,
Linux Containers or LXC
has been a rapidly evolving area,
and the commands may vary from one platform to another.
For example, compare
lxc-ls on Debian
and Red Hat:
$ lxc-ls --version 1.0.7 $ lxc-ls --fancy NAME STATE IPV4 IPV6 AUTOSTART ------------------------------------------------ container1 RUNNING 10.0.3.192 - NO container2 STOPPED - - NO $ lxc-info -n container1 Name: container1 State: RUNNING PID: 4140 IP: 10.0.3.192 CPU use: 0.66 seconds BlkIO use: 72.00 KiB Memory use: 3.22 MiB KMem use: 0 bytes Link: veth22AUQI TX bytes: 1.68 KiB RX bytes: 5.33 KiB Total bytes: 7.01 KiB $ lxc-info -n container2 Name: container2 State: STOPPED
That's on Debian, where
is a 15,958 byte Python script.
On Red Hat it is from a package of the same name and version,
lxc and 1.0.7, but it is a 2,837 byte
POSIX shell script and the command
yields a report on the version of
Check the manual pages, try running the command with parameters
--help, and then
be aware that other syntax may also be accepted.
For example, at least with version 1.0.7 on Red Hat Enterprise
Linux 7, these three commands are equivalent and the second
is not mentioned in the manual page:
# lxc-info -n mycontainer # lxc-info --name mycontainer # lxc-info --name=mycontainer
The following uses the shortest
name for the name of the
Most commands have more options,
the following shows enough to get you started.
Common LXC Commands
lxc-info -n name
initprocess, CPU and memory use) and container-side network information.
lxc-start -n name lxc-stop -n name
The current terminal will be the container's console, you will see it start and then run its default program, maybe a login prompt. Use the
-doption for daemon or detached mode.
lxc-console -n name
-doption, this attaches the current terminal to the console.
lxc-execute -n name -- command
namecontainer and execute
lxc-monitor -n name
namecontainer, displaying one line per state change: STARTING, RUNNING, STOPPING, STOPPED.
lxc-autostart -n name lxc-autostart --disable -n name
ls /usr/share/lxc/templates lxc-create --template=debian -h lxc-create --template=debian -r wheezy -n name
Second, see the list of specific options available for the
Third, create a container from that template, selecting the
wheezyrelease option and naming it
lxc-clone -o orig -n new
newas a clone of existing original container
lxc-destroy -n name
Warning: This is very different from the
destroyparameter to the
virshcommand, shown below!
LXC XML Files
A container is defined by the configuration file
which contains comments (from "#" to the end of the line)
feature = value
The libvirt system, described below, may support converting the LXC native configuration into an XML description with this syntax. Again, it depends on the capabilities of the version you have installed:
# virsh dom-from-xml lxc-tools /var/lib/lxc/name/config
If so, the result would be like the following. If not, the appropriate XML file can be created by hand:
<domain type='lxc'> <name>mycontainer</name> <!-- Performance Settings --> <memory>500000</memory> <vcpu>1</vcpu> <devices> <emulator>/usr/libexec/libvirt_lxc</emulator> <!-- The root of our container file system --> <filesystem type='mount'> <source dir='/var/lib/lxc/mycontainer/rootfs'/> <target dir='/'/> </filesystem> <interface type='network'> <source network='default'/> </interface> <console type='pty' /> </devices> <os> <type>exe</type> <init>/sbin/init</init> </os> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>destroy</on_crash> </domain>
is a mechanism for managing and sharing
to very efficiently utilize storage.
It was initially built on top of LXC, but it now defaults
to using its own
operating system virtualization mechanism.
The expectation is that you will use Docker to create
lightweight containers dedicated to specialized single tasks,
so it uses copy-on-write to very quickly create
those new containers.
A Docker image is a template from which a container may be constructed. A registry is a distribution point for images, analogous to an RPM or APT repository or repo. They can be public, shared with the Internet, or kept private within your organization. You can pull an image from a registry and run it on the local machine. If you make some improvements in that container, you can then push your new image to a registry so others can use it.
Let's say that our organization has stored an image
centos, a container built from a
basic CentOS installation.
pull that to my system and make changes,
adding the Apache web server and setting up some
extra logging, and then
push the result
into our private registry.
You could then
pull down my image and add
a database back end, and then
result into the registry.
Someone else in our organization could then
down your image and run it.
It seems like our registry must be growing quickly, with three
images available — the original, the original plus
my web server and logging, and that plus your data base work.
But only the initial image will be a full image.
My modifications, and your modifications to mine, are
stored as just the changes needed to build the desired
Common Docker Commands
-ato include those that aren't currently running.
docker history name
docker inspect name
name, which can be an image or a container.
systemctl start docker
docker run -it --rm centos /bin/bash
centosimage, using the
-t(terminal) options to give me an interactive console terminal, and the
--rmoption to remove the container when its job, the Bash shell, terminates.
docker run -p 8080:80 --rm centos /usr/sbin/httpd
centosimage, without an interactive console (no
-t), running the
/usr/sbin/httpdweb server, and mapping host TCP port 8080 to the container's port 80. A client can now connect to TCP port 8080 on the host and that tunnels through to TCP port 80 on the container.
docker create mycentos centos
docker commit mycentos webserver
mycentos, generate a new image
webserverfor future use.
docker tag webserver registry.example.com:5000/myproject/webserver
docker push registry.example.com:5000/myproject/webserver
webserverwith the hostname and TCP port of our registry and the image's name.
Then, push that image with that tag to the registry.
docker rm webserver
docker rmi mycentos
You can also use a Dockerfile, analogous to a Makefile, to specify how to create a new image by modifying and starting an existing one.
# cat /path/to/source/directory/Dockerfile ## Dockerfile to quickly generate a web server ## ## Start with this base image: FROM centos ## Run this command before committing the image: RUN yum -y install httpd ## Connect this shared volume on the host ## containing an Apache configuration and logs: VOLUME ["/var/www"] ## Load a web site into the image file system: ADD site1.tar.gz /var/www/htdocs ## Set an environment variable within the container: ENV APACHE_HOME /var/www ## Run this command when starting the container: CMD ["/usr/sbin/httpd"]
We could now build a container based on that Dockerfile.
We will give it a fully qualified and meaningful repository,
name, and tag with the
In this example it's version (or tag) 1.0, named
webserver in the
# docker build -t project/webserver:1.0
QEMU is a generic open source machine emulator. It does full-system virtualization, where you run a separate kernel on the VM and that kernel sees a complete platform with your specialized number of CPUs, amount of RAM, disks, and Ethernet, USB, and audio interfaces. QEMU can emulate a different architecture. You can run a QEMU command on your Linux/x86_64 platform and have it emulate: an ARM system on which you run Android, or an UltraSPARC on which you run Solaris, or simply an x86_64 platform on which you run Windows.
If you are on x86_64 hardware, starting a virtual machine
commands described below actually starts a
qemu-system-x86_64 process to emulate
the virtual machine's platform.
To emulate a different architecture, you must run the
qemu-system-* command and
For example, let's say you want to run Android on
emulated ARM, and you want the terminal in which you
start it to be connected to a serial console in addition
to the graphical Android interface that will appear in
a new window.
- Specify which specific ARM variant and the amount of RAM.
- Pass the kernel and initial RAM disk image files.
- Specify the file containing the disk image, and indicate that it should appear to be on an SD memory card.
- Specify that the Android console is the first serial port, and that the serial port is (virtually) connected to your terminal.
Assembled together, that becomes:
# qemu-system-arm -machine vexpress-a9 -m 1024\ -kernel vmlinuz -initrd initrd.gz \ -drive file=android-disk.qcow2,if=sd \ -append "init=/init androidboot.console=ttyAMA0" -serial stdio
Carefully research the specifics, and be aware that the details vary not only from one architecture to another but also from one version of the QEMU tools to another. This page shows how one person got Fedora to run on a wide variety of architectures including AMD64, ARMel, ARMhf, i386, MIPS, PowerPC, and SPARC, and provides downloadable disk images in qcow2 format.
libvirt and virsh
libvirt is a toolkit for interacting with the virtualization capabilities of Linux. It includes command-line tools, a graphical user interface, and an application programming interface or API for C/C++, C#, Java, Perl, Python, PHP, Ruby, and OCaml so you can write your own command-line or graphical tools. libvirt can control a wide variety of virtualization technologies — KVM/QEMU, LXC, Xen, VirtualBox, OpenVZ, and more.
libvirt refers to hypervisors in URI format:
In these URIs
session generally refers to
the user's access to the hypervisor and
system to the system hypervisor.
virsh command is a do-it-all command-line
interface to libvirt.
-c URI to specify the
As an example, the parameters
will list all virtual machines, running or not,
known to the hypervisor.
# virsh -c lxc:/// list --all Id Name State ---------------------------------------------------- 1833 centos6 running - debian-wheezy shut off 3493 oracle6 running # virsh -c qemu:///system list --all Id Name State ---------------------------------------------------- 6 Windows-8.1 running 8 centos7 running 9 debian-vm running
This means that we have two equivalent ways of manipulating containers:
Common Container Operations in
virsh -c lxc:/// list --all
virsh -c lxc:/// start centos
lxc-start -n centos
virsh -c lxc:/// shutdown centos
lxc-stop -n centos
virsh -c lxc:/// reboot centos
lxc-stop -r -n centos
virsh -c lxc:/// create ...
virsh -c lxc:/// undefine newcontainer
lxc-destroy -n newcontainer
Containers run under the host kernel, "rebooting" or shutting them down only controls the processes running within the container.
Full system virtualization with QEMU and KVM means a
virsh commands have analogies to
actions taken on a physical machine:
virsh Command Analogies
virsh start myvm
virsh shutdown myvm
# systemctl poweroff
virsh reboot myvm
# systemctl reboot
virsh destroy myvm
lxc-destroy, which deletes a container!
virsh reset myvm
virt-manager is a nice graphical tool
in which you can monitor, control, and create
Use the File | Add Connection
menu choice to add connections to other
In this view, lxc:/// and qemu:///system
are in use.
libvirt XML Files
Virtual machines are described by XML.
You can dump the XML description to standard output
You could save that XML output in a file,
edit the file to change the
uuid and all
values, and then define a new virtual machine with
<domain type='kvm'> <name>myvm</name> <uuid>8b9ad37b-9ecb-444f-9f28-298f0bbad29b</uuid> <memory unit='KiB'>524288</memory> <currentMemory unit='KiB'>524288</currentMemory> <vcpu placement='static'>1</vcpu> <os> <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type> <boot dev='hd'/> </os> <features> <acpi/> <apic/> <pae/> </features> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices> <emulator>/usr/libexec/qemu-kvm</emulator> <disk type='file' device='disk'> <driver name='qemu' type='qcow2' cache='none'/> <source file='/usr/local/containers/myvm.qcow2'/> <target dev='vda' bus='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </disk> <controller type='usb' index='0'> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> </controller> <controller type='pci' index='0' model='pci-root'/> <controller type='virtio-serial' index='0'> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </controller> <interface type='network'> <mac address='52:54:00:7a:fe:26'/> <source network='default'/> <model type='virtio'/> <address type='pci' domain='0x0000' bux='0x00' slot='0x03' function='0x0'/> </interface> <serial type='pty'> <target port='0'/> </serial> <console type='pty'> <target type='serial' port='0'/> </console> <channel type='spicevmc'> <target type='virtio' name='com.redhat.spice.0'/> <address type='virtio-serial' controller='0' bus='0' port='1'/> </channel> <input type='tablet' bus='usb'/> <input type='mouse' bus='usb'/> <graphics type='spice' autoport='yes'/> <video> <model type='qxl' ram='65536' vram='65536' heads='1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </video> <memballoon model='virtio'> <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </memballoon> </devices> </domain>
There is a similar
virsh net-dumpxml command,
yielding the XML definition of a virtual network.
<network> <name>default</name> <uuid>f37ba69f-6b2c-4775-82d4-6f03a392acb7</uuid> <forward mode='nat'> <nat> <port start='1024' end='65535'/> </nat> </forward> <bridge name='virbr0' stp='on' delay='0' /> <ip address='192.168.122.1' netmask='255.255.255.0'> <dhcp> <range start='192.168.122.2' end='192.168.122.254' /> </dhcp> </ip> </network>
Let's say that we have just the one net and virtual bridge so far:
# virsh net-list --all Name State Autostart Persistent ---------------------------------------------------------- default active yes yes # brctl show bridge name bridge id STP enabled interfaces docker0 8000.56847afe9799 no virbr0 8000.000000000000 yes
We can dump the virtualized network XML description into a file, edit it, and define a new net:
# virsh net-dumpxml default > /tmp/newnet.xml # vim /tmp/newnet.xml [... change name from "default" to "newnet", virtual bridge name from "virbr0" to "virbr1", and IP address range to 192.168.123.0/24 ...] # virsh net-define /tmp/newnet.xml Network newnet defined from /tmp/mynet.xml # virsh net-list --all Name State Autostart Persistent ---------------------------------------------------------- default active yes yes newnet inactive no yes # brctl show bridge name bridge id STP enabled interfaces docker0 8000.56847afe9799 no virbr0 8000.000000000000 yes
The new net is there but the new virtual bridge is not. We must start the new net.
# virsh net-start newnet Network newnet started # virsh net-autostart newnet Network newnet marked as autostarted # virsh net-list --all Name State Autostart Persistent ---------------------------------------------------------- default active yes yes newnet active yes yes # brctl show bridge name bridge id STP enabled interfaces docker0 8000.56847afe9799 no virbr0 8000.000000000000 yes virbr1 8000.52540016cb49 yes virbr1-nic