Violating Virtualization Security

Virtualization and Violations

This page is about attacks and defenses for full system virtualization, where the virtual machine has its own kernel running on what appears to it to be a unique hardware platform.

For other forms of virtualization including containers, and how to do full system virtualization with QEMU and KVM, see my open source virtualization page.

Virtualization is quite popular in the IT world for a variety of reasons. You certainly encounter it frequently, as the major web site hosting companies run virtualized platforms. Within an organization, virtualization offers great advantages of cost, space and flexibility. Virtualization is an absolute requirement for cloud computing as the rapid self-provisioning is otherwise impossible.

The virtual machines (also called VMs, guests and instances) run for the most part as if they are running directly on hardware. However, careful examination of their apparent hardware environment, as with lspci -v on Linux or OpenBSD, or examination of the kernel ring buffer immediately post-boot using dmesg on any Unix family OS, can reveal the virtualized environment.
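As a rough illustration of that kind of examination, here is a minimal Python sketch that scans text captured from lspci -v or dmesg for strings that commonly betray a hypervisor. The marker list and the function name find_virtualization_hints are my own assumptions for illustration. Real output varies by hypervisor and version, so treat a match as a hint rather than proof.

```python
# Assumed marker strings (hypothetical list); real lspci/dmesg output varies.
HYPERVISOR_MARKERS = {
    "KVM/QEMU": ["qemu", "kvm", "virtio"],
    "VMware": ["vmware"],
    "VirtualBox": ["virtualbox"],
    "Xen": ["xen"],
    "Hyper-V": ["hyper-v"],
}


def find_virtualization_hints(text):
    """Return the sorted names of hypervisors whose markers appear in text."""
    lowered = text.lower()
    found = []
    for name, markers in HYPERVISOR_MARKERS.items():
        # Plain case-insensitive substring search; a heuristic only.
        if any(marker in lowered for marker in markers):
            found.append(name)
    return sorted(found)


if __name__ == "__main__":
    # Illustrative sample lines, not captured from a specific system.
    sample = "00:02.0 VGA compatible controller: VMware SVGA II Adapter"
    print(find_virtualization_hints(sample))
```

You would feed it the saved output of the commands, for example the contents of a file written with dmesg > boot.txt.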

Hardware
Vulnerabilities

The hypervisor, the controlling mechanism of the virtualization, controls access to hardware resources. This allows multiple virtual machines to run on the shared hardware while hiding from each one the presence of the others. Of course, if the underlying hardware is flawed, virtualization is just as vulnerable as any other software running on that risky platform. What really scares people is the idea of a malicious VM stealing secrets from its "siblings" in a cloud environment.

But let's focus here on the virtualization itself. How secure is virtualization?

Type 1 Versus Type 2 Virtualization

The hypervisor, also known as the virtual machine manager or VMM, is the software that creates and runs the virtual machines. Hypervisors are complex, really operating systems, and they come in two forms.

Type 1 or native or bare metal hypervisors run directly on the hardware. Examples include open-source Xen, Citrix XenServer, Linux KVM, VMware ESX, Microsoft Hyper-V, and Oracle VM Server.

Type 2 or hosted hypervisors run as an application within a conventional operating system, which in turn runs on the hardware. Examples include VMware Workstation and VMware Player, QEMU and VirtualBox.

Type 1

Privileged OS Dom0	Guest OS 1 DomU	Guest OS 2 DomU
Hypervisor
Hardware

Type 2

Guest OS 1	Guest OS 2	Guest OS 3
Hypervisor
Host OS
Hardware

In order to create and control the virtual machines, some software needs to run with elevated privileges and full visibility of the hypervisor environment. In Type 1 virtualization this takes the form of a privileged instance. Xen's terminology is that Dom0 is the privileged instance and DomU is any of the unprivileged ones. In Type 2 virtualization, a privileged application does this work.

But what about our question, how secure is virtualization?

Vulnerabilities

Hypervisors, like any complex pieces of software, are going to have vulnerabilities. Some will be issues of design, others will be problems with the implementation. We have certainly seen these already, for example:

Microsoft Virtual PC and Virtual Server:
- Microsoft Security Bulletin MS07-049, in which an administrator on a guest could run code on other guests or on the host.
VMware Workstation, Player and ACE:
- VMware Shared Folder Bug Lets Local Users on the Guest OS Gain Elevated Privileges on the Host OS
- VMware DHCP server vulnerabilities: CVE-2007-0061, CVE-2007-0062, CVE-2007-0063
- VMware NAT networking components vmnat.exe and vmnet-natd: CVE-2005-4459
- VMware Authorization Service: CVE-2002-0814
Xen and QEMU:
- Secunia Advisory SA26986, referencing CVE-2007-1320, CVE-2007-1321 and CVE-2007-4993 to explain that root on a guest domain can execute arbitrary commands in Dom0 by placing specially crafted entries in grub.conf and rebooting.

Every virtualization system has had a number of vulnerabilities, including KVM, Virtual PC, QEMU, VMware, Xen, and more.

But do not forget that we must expect design and implementation vulnerabilities in all complex software projects. Full operating system environments are more complicated than hypervisors, so we should expect even more vulnerabilities in the operating systems themselves. It really makes no sense to think you will avoid vulnerabilities by running an operating system directly on hardware!

VM Escape

The first exploits of the design of virtualization took the form of "virtualization escape". These have tended to abuse side channels of communication. Prominent examples include:

Escaping From The Virtualization Cave, presented at SANSFire 2007.
VM Escape discusses the presentation Escaping From The Virtualization Cave.
An Empirical Study into the Security Exposure to Hosts of Hostile Virtualized Environments, by a Google staff member.
On the Cutting Edge: Thwarting Virtual Machine Detection
Attacking Xen: DomU vs. Dom0 consideration and Our Xen 0wning Trilogy Highlights from The Invisible Things Lab.

Xen Episode IV: The Guests still Strike Back discusses attacks against the hypervisor from Dom0 and DomU. Frankly, if Dom0 is owned, there isn't much hope. The DomU based attacks are more interesting. Their paper discusses how Xen protects direct memory access and lists some attacks from paravirtualized guests.

Malicious Hypervisors

Rootkits evolved into hostile virtualization. Slide a malicious hypervisor underneath a victim OS, and it won't even realize what happened. Examples include:

SubVirt, described in a PDF paper
Blue Pill, described in a Black Hat presentation
Vitriol

True Subversion of Type 1 Virtualization

For a while we thought we were reasonably safe in a cloud environment. The only known virtualization escapes or exploits were limited to Type 2 virtualization, not the Type 1 used by cloud providers.

Then the Type 1 virtualization exploits began to appear.

The Intel SYSRET Privilege Escalation

CVE-2012-0217 announces a vulnerability described in VU#649219. This vulnerability is caused by the way Intel processors handle errors in their version of AMD's SYSRET instruction. Versions of Xen and XenServer were vulnerable, as were versions of the operating systems Oracle Solaris, Windows Server 2008, Windows 7, FreeBSD, NetBSD and illumos. Apple OS X and OpenBSD were not vulnerable. Neither was Linux, as the problem had been fixed in the Linux kernel in 2006 with CVE-2006-0744.

Note that the wording of that CVE description makes it sound like something specific to Linux, possibly leading to little or no attention from other operating system providers: "Linux kernel before 2.6.16.5 does not properly handle uncanonical return addresses on Intel EM64T CPUs, which reports an exception in the SYSRET instead of the next instruction, which causes the kernel exception handler to run on the user stack with the wrong GS." The Bugtraq entry describes it as "a local denial-of-service vulnerability".

The Linux fix was driven by an incompletely communicated understanding of the problem. The OpenBSD fix was a good one, but its comments do not suggest that the risk was understood.
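To make the "uncanonical return addresses" in that CVE wording concrete: on x86-64 with 48-bit virtual addresses, an address is canonical only when bits 63 through 47 are all equal, all zeros or all ones. Here is a minimal Python sketch of that check. The function name is_canonical and the default of 48 virtual address bits are my assumptions for illustration. This is not code from any of the affected kernels.

```python
def is_canonical(address, virtual_bits=48):
    """Return True if a 64-bit address is in canonical form.

    Canonical means the bits from (virtual_bits - 1) up to bit 63
    are all copies of each other: all zeros or all ones.
    """
    if not 0 <= address < 1 << 64:
        raise ValueError("address must fit in 64 bits")
    top = address >> (virtual_bits - 1)          # bits 63..47 for 48-bit
    all_ones = (1 << (64 - virtual_bits + 1)) - 1
    return top == 0 or top == all_ones


if __name__ == "__main__":
    for addr in (0x00007FFFFFFFFFFF, 0x0000800000000000, 0xFFFF800000000000):
        print(hex(addr), is_canonical(addr))
```

The middle address in the usage loop is the first one past the top of the lower canonical half, which is the kind of return address the CVE text is talking about.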

VUPEN provides a detailed description of how a root attacker on a DomU instance can exploit the Dom0 virtual machine, and thereby manage the hypervisor. It is very detailed and authoritative, but a bit difficult to read.

Rafal Wojtczuk analyzed this in his paper A Stitch in Time Saves Nine: A Case of Multiple OS Vulnerability, presented at Black Hat 2012. It's still quite technical, showing the stack and registers, but the graphics and further explanation help.

Xen provides a more easily read and understood description, explaining the difference between AMD SYSCALL/SYSRET and Intel SYSENTER/SYSEXIT, how Intel's implementation can be abused, and how to avoid this problem. Xen also issued a formal vulnerability announcement accompanied by patches. Kaspersky Lab's Threatpost provided a brief overview.

The Cache Observing Attack

Cross-VM Side Channels and Their Use to Extract Private Keys is a paper by Zhang and Reiter at the University of North Carolina, Juels at RSA and Ristenpart at University of Wisconsin. This paper presents a major advance in virtualization attacks. As their Introduction concludes, "We thus believe that our work serves as a cautionary note for those who rely on virtualization for guarding highly sensitive secrets of many types, as well as motivation for the research community to endeavor to improve the isolation properties that modern VMMs provide to a range of applications."

Their paper has been discussed in the Cloud Computing Google group. It's a mailing list, with the expected irrelevant questions and misunderstandings, plus some self-promotion. But the discussion contains some useful information.

Here is my summary of their paper:

This is the first demonstration of an attack in which a hostile VM extracts fine-grained information from a victim VM running on a symmetric multiprocessing system virtualized on Xen. This type of attack is especially difficult because the hypervisor places more layers of isolation between the attacker and target operating systems than found in cross-process attacks.

They assume that the attacker can run processes on a VM co-resident on the same physical computer as the target VM. The exploit doesn't need root access. They infer cache contents rather than reading them directly with a compromised kernel.

The L1 cache has the most potential for a damaging side-channel attack, but it is not shared across multiple cores. The attacking VM must contrive to frequently alternate execution on the same core as the target, in order to measure side-effects of the target's instruction execution sequence. They accomplish this with aggressive inter-processor interrupts (or IPIs). This scheduling abuse is a vulnerability on its own, as it allows degradation of service.

The cache attack uses a Prime-Probe protocol, in which they allocate many contiguous memory pages for a combined size equal to that of the cache. They then execute a sequence of instructions that jumps through all the blocks of those pages, filling the cache, and measure the elapsed time for the sequence. The attacker then waits for a specific interval in which the target uses the cache. Timing how long it then takes to refill and traverse the same cache sets reveals some information about how the target used the cache during that interval.
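As a toy illustration of the Prime-Probe idea, here is a small Python simulation of a set-associative cache with LRU replacement. Everything in it is my own simplification: the ToyCache class, the cache geometry, and the victim's accesses are hypothetical, and the simulated cache reports hits and misses directly, whereas the real attack has to infer misses from timing measurements on real hardware.

```python
from collections import OrderedDict


class ToyCache:
    """Simulated set-associative cache with LRU replacement (not real hardware)."""

    def __init__(self, num_sets=8, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, owner, line):
        """Touch a cache line. Return True on a hit, False on a miss."""
        cache_set = self.sets[line % self.num_sets]
        key = (owner, line)
        if key in cache_set:
            cache_set.move_to_end(key)       # mark as most recently used
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)    # evict the least recently used line
        cache_set[key] = True
        return False


def attacker_lines(cache):
    """One attacker line per way in every set, enough to fill the whole cache."""
    return [way * cache.num_sets + s
            for way in range(cache.ways) for s in range(cache.num_sets)]


def prime(cache):
    """Prime step: fill every set with the attacker's own lines."""
    for line in attacker_lines(cache):
        cache.access("attacker", line)


def probe(cache):
    """Probe step: re-touch the attacker's lines, return the sets that missed."""
    touched = set()
    for line in attacker_lines(cache):
        if not cache.access("attacker", line):
            touched.add(line % cache.num_sets)
    return sorted(touched)


if __name__ == "__main__":
    cache = ToyCache()
    prime(cache)
    for victim_line in (3, 13):              # hypothetical victim activity
        cache.access("victim", victim_line)
    print(probe(cache))                      # sets the victim disturbed
```

The attacker learns which cache sets the victim touched during the wait interval, never the victim's data itself. That is why so much statistical machinery is needed to turn these observations into key bits.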

They must contend with various sources of measurement noise. Hardware sources include TLB misses, speculative execution and power saving. Software sources include context switches, emulated rdtsc instruction calls, and interference from other non-target domains.

Their experiment used two paravirtualized DomU guest VMs, each of which had two virtualized CPUs, co-resident on a single-socket quad-core processor (Intel Core 2 Q9650). One was the target, the other was the attacker. Dom0 was given a single virtualized CPU.

They assume detailed but reasonable knowledge about the target. In their case, that the OS is Linux kernel 2.6.32.16 with the libgcrypt cryptographic library version 1.5.0 and GnuPG 2.0.19, all of them part of Ubuntu 10.04. That level of knowledge seems reasonable in a cloud setting, as most cautious Amazon EC2 users will select a recent, perhaps the latest, Amazon Linux image.

They made some assumptions about the virtual machines being CPU-bound with non-cryptographic computation, in order to maximize the fraction of time when the target and attacker share a physical CPU. They also assumed that the target was often performing ElGamal decryption. These assumptions are not required for the attack to work. They just make the experiment run faster.

The goal of the attack was to steal an ElGamal private key. ElGamal computes a modular exponentiation, x^s mod N. The libgcrypt implementation of this exponentiation uses a classic square-and-multiply algorithm. The sequence of machine instructions directly leaks the bits of the key, and they aim to observe that sequence.
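To see why the instruction sequence gives away the key, here is a minimal Python sketch of textbook left-to-right square-and-multiply that records each operation as it goes, plus a function that rebuilds the exponent from that record alone. This is a simplified textbook version, not the libgcrypt code, and the function names and the "S"/"M" trace format are my own assumptions. The real attack only observes a noisy, fragmentary version of such a trace.

```python
def square_and_multiply(base, exponent, modulus):
    """Compute base**exponent % modulus, recording the operation sequence.

    Returns (result, trace) where trace is a string of "S" (square)
    and "M" (multiply) in the order the operations were executed.
    """
    result = 1
    trace = []
    for bit in bin(exponent)[2:]:            # most significant bit first
        result = (result * result) % modulus
        trace.append("S")
        if bit == "1":                       # the key-dependent step
            result = (result * base) % modulus
            trace.append("M")
    return result, "".join(trace)


def recover_exponent(trace):
    """Rebuild the exponent from a perfect trace: "SM" is a 1 bit, lone "S" is a 0."""
    bits = []
    i = 0
    while i < len(trace):
        if i + 1 < len(trace) and trace[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)


if __name__ == "__main__":
    secret = 0b101101
    value, trace = square_and_multiply(7, secret, 1009)
    print(trace, recover_exponent(trace) == secret)
```

A multiply happens only when the key bit is 1, so anyone who can tell a square from a multiply can read the key.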

Their attack is an impressive combination of techniques. One component is a multiclass support vector machine, a supervised machine learning tool. Its output is noisy, so they use a hidden Markov model to reduce the errors. This provides a large collection of fragments of the execution sequence, which they then assemble with a dynamic programming algorithm developed for discovering similarities in the amino acid sequences of two proteins.

In the end they performed 300,000,000 Prime-Probe trials in chunks of 100,000 over a period of about six hours. This yielded about 1,000 key-related fragments output from their hidden Markov model. Assembling the spanning sequences, they were left with a brute-force search of just 9,862 possible keys.

Defenses Against This Cache Observing Attack

Avoid co-residency. But this conflicts with the desire (or need, in the case of cloud computing) to use virtualization in the first place.

Use side-channel resistant algorithms. Their suggestion is to do exponentiation with the Montgomery ladder algorithm instead of the square-and-multiply algorithm. However, these alternative algorithms are slower, and we do not yet have formal proof that they are immune to side-channel attacks.
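For comparison with square-and-multiply, here is a minimal Python sketch of the Montgomery ladder for modular exponentiation. It performs one multiply and one square for every exponent bit, so the sequence of operation types no longer depends on the key. The function name and trace format are my own assumptions. The sketch shows only the operation pattern: Python's big integers and the visible branch on the bit are not constant-time, so this is not a hardened implementation.

```python
def montgomery_ladder(base, exponent, modulus):
    """Compute base**exponent % modulus with the Montgomery ladder.

    Returns (result, trace). Every bit costs exactly one multiply ("M")
    and one square ("S"), whatever the bit's value.
    """
    r0 = 1 % modulus                         # invariant: r1 == r0 * base (mod modulus)
    r1 = base % modulus
    trace = []
    for bit in bin(exponent)[2:]:            # most significant bit first
        if bit == "1":
            r0 = (r0 * r1) % modulus
            r1 = (r1 * r1) % modulus
        else:
            r1 = (r0 * r1) % modulus
            r0 = (r0 * r0) % modulus
        trace.append("MS")
    return r0, "".join(trace)


if __name__ == "__main__":
    # Two different 6-bit exponents produce the same operation trace.
    _, trace_a = montgomery_ladder(7, 0b101101, 1009)
    _, trace_b = montgomery_ladder(7, 0b100000, 1009)
    print(trace_a == trace_b)
```

The extra multiply on every 0 bit is the price. That is the slowdown mentioned above.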

Modify the core scheduling. Future Xen releases will limit the pre-emptive capability of a single virtualized CPU, although this only limits and does not eliminate the utility of this side-channel.

Practical Cloud Limits on This Cache Observing Attack

Practical cloud providers like Amazon and Rackspace have far more virtualized machines running on any one physical platform, greatly limiting the throughput of this type of attack.

Also, it seems (although you can't really tell) that any one virtual machine is migrated from one physical platform to another in an unpredictable and largely (if not entirely) undetectable way.

Finally, if you did manage to steal an ElGamal key in a public cloud setting, you would have no idea as to whose key it was or what data it might decrypt. It would be like finding a house key in the transfer-only hall of a large international airport. Yes, the key opens a door somewhere in the world, but you don't know where that door is.

Continuing Development

Simon Crosby is a creator of Xen. He founded a startup company, Bromium, which is looking to use Xen features to boost security. Introspection is a feature of Xen allowing VMs to be inspected by a trusted VM. Intel and McAfee's DeepSAFE technology sits between the hardware and the OS much like a Type 1 hypervisor, letting it see things that the VM operating system can't. I suppose this includes things like detecting an OS subverted by hostile loadable kernel modules. Crosby was interviewed by Network World.

Also see McAfee's Virtualization and Security white paper.