VMScape and why Xen dodged it

virtualize.sh

123 points by plam503711 4 days ago


transpute - 4 days ago

On HP business PCs, Xen's microkernel architecture was extended for copy-on-write nested virtualization microVMs (VM per browser tab or HTTP connection) and UEFI-in-VM, https://www.platformsecuritysummit.com/2018/speaker/pratt/ | https://news.ycombinator.com/item?id=42282053#42286147

Imminent unification of Android and ChromeOS will likely use a similar h/w nested-virt architecture based on L0 pKVM + L1 KVM hypervisors on Arm devices.

Honda is using Xen, "How to accelerate Software Defined Vehicle" (2025), https://static.sched.com/hosted_files/xensummit2025/93/HowTo...

eigenform - 4 days ago

Since everyone is upset about the lack of technical details in the article, I'll try:

The takeaway from that paper (imo, afaict) is that guest userspace can influence indirect predictor entries in KVM host userspace. I don't really know anything about Xen, but presumably it is unaffected because there is no Xen host userspace, just a tiny hypervisor running privileged code in the host context. With KVM, Linux userspace is still functional in the host context.

Presumably, the analogy to host kernel/userspace in KVM is dom0, but in Xen this is a guest VM. If cross-guest cases are mitigated in Xen (like in the case of KVM, see Table 2 in the paper), you'd expect that this attack just doesn't apply to Xen. Apart from there being no interesting host userspace, IBPB/STIBP might be enough to insulate other guests from influencing dom0. If you're already taking the hit of resetting the predictors when entering dom0, presumably you are not worried about this particular bug.

edit: Additional reading, see https://github.com/xen-project/xen/blob/master/xen/arch/x86/...

bayesnet - 4 days ago

While it’s interesting that Dom0 avoids Spectre-style branch prediction attacks, it’s not clear from TFA exactly why that is so. How does the architecture of the hypervisor avoid an attack that seems to be at the hardware level? From my limited understanding of Spectre and Meltdown, swapping from a monolithic kernel to a microkernel wouldn’t mitigate an attack. The mitigations discussed in the VMScape paper [0] are hardware mitigations in my reading. And I don’t see Xen mentioned anywhere in the paper for that matter.

I guess it’s sort of off topic, but I was enjoying reading this until I got to the “That’s not just elegant — it’s a big deal for security” line that smelled like LLM-generated content.

Maybe that reaction is hypocritical. I like LLMs; I use them every day for coding and writing. I just can’t shake the feeling that I’ve somehow been swindled if the author didn’t care enough to edit out the “obvious” LLM tells.

[0]: https://comsec-files.ethz.ch/papers/vmscape_sp26.pdf

snvzz - 3 days ago

The Xen "microkernel" is unfortunately bloated. seL4 is much smaller and runs the VMM as an isolated unprivileged task.

VM exceptions are all handled by the VMM. A VM escape would still be confined to the VMM, which has no higher capabilities than the VM itself. Capabilities are enforced by the formally verified seL4.

BobbyTables2 - 4 days ago

I don’t quite see what they’re getting at.

Is it just because it’s another VM switch to get to dom0? Seems a bit unlikely…

Xen has a hypervisor for dealing with the low level details of virtualization and uses dom0 for management and some HW emulation.

QEMU/KVM uses the host kernel for the low level details of virtualization and the QEMU userspace portion to do the actual HW emulation.

They’re actually remarkably similar aside from the detail that the Xen hypervisor only juggles VMs but the KVM design involves it juggling other normal processes…

The people praising Firecracker are just turning a blind eye to the 10000+ lines of (really hairy) C code in the kernel doing x86 instruction emulation and the actual hypervisor part.

AtlasBarfed - 3 days ago

So this requires the two VMs to be sharing execution on a core? Or perhaps a shared cache? Or would it work across VMs "pinned" to different CPUs?

It's weird to me that cloud hosts aren't absolutely swimming in cores now, but with Intel struggling and AMD somewhat resting on its laurels, which it stupidly did in the Hector Ruiz days, nothing is pushing the envelope. In 2010, fifteen years ago, we had 12 core CPUs.

In 2010 we had a billion or so transistors. In 2020, we had 50 billion. In 2010 we were at 28nm, now we're at 3nm.

We should have 100x the CPUs on die now or more. A thousand x86 cores, god knows how many Arms, and god knows how much you could do with hi-low core counts.

Anyway, what I'm getting at is all of these vulnerabilities across process execution or VM execution could be moot: if the processes were isolated to a core or set of cores, and the VM isolated to its own dedicated branch predictors in its own cores. Then go ahead and do whatever tricks you want. Obviously you don't want hyper-threading.
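For what pinning looks like in practice, here is a minimal sketch using Python's `os.sched_setaffinity` on Linux. It is an illustration only: the helper name `pin_to_single_cpu` is made up for this example, and restricting a process to one CPU does not by itself keep other tasks (or a sibling hyper-thread) off that core, so it is not a predictor-isolation guarantee.

```python
import os
from typing import Optional, Set


def pin_to_single_cpu(pid: int = 0) -> Set[int]:
    """Restrict pid (0 = calling process) to the lowest-numbered CPU it may already use."""
    allowed = os.sched_getaffinity(pid)
    os.sched_setaffinity(pid, {min(allowed)})
    return os.sched_getaffinity(pid)


original: Optional[Set[int]] = None
pinned: Optional[Set[int]] = None

# The affinity calls are Linux-specific, so skip quietly elsewhere.
if hasattr(os, "sched_setaffinity"):
    original = os.sched_getaffinity(0)
    pinned = pin_to_single_cpu()
    print(f"pinned to CPU(s): {sorted(pinned)}")
    os.sched_setaffinity(0, original)  # restore the previous mask
```

A hypervisor's vCPU threads are ordinary host threads under KVM, so the same affinity mechanism applies to them, but dedicating a core also requires keeping everything else off it.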

indigodaddy - 4 days ago

If anyone was looking there are still some Xen VPS providers around, one of the oldest being Tornado VPS (formerly prgmr.com).

https://tornadovps.com/about

The founders literally wrote the book on xen:

https://nostarch.com/releases/xen.html

yjftsjthsd-h - 4 days ago

I guess I don't quite follow. The attack can let an attacker in a normal VM see memory in either the host or a Xen dom0 VM. Why is it less impactful to get memory from the management VM instead of the host?

aborsy - 4 days ago

Which is precisely why Qubes OS uses Xen.

bionsystem - 3 days ago

Nowadays you can run your VMs inside LXC, and SmartOS also runs them inside zones by default. I wonder if the same exploits could be used across the container layer of both technologies or if it would protect from leaks.

hugo1789 - 3 days ago

Maybe because Xen is a type 1 hypervisor in the original sense and all the others are type 2? (Yes, ESX(i) doesn't use Linux, but it also brings its own OS that it runs on top of.)

jcjgraf - 2 days ago

Author of the VMScape paper here.

It's great to see an article highlighting the impact of VMScape on Xen, especially since our paper [1] does not discuss Xen in detail (we only briefly mention it in the blog post [2]).

That said, the article unfortunately lacks technical precision. Some statements are vague, and "our quote" ("According to the ETH team") is misleading, as those are not our words. To be clear: VMScape is not a cross-VM attack. So please treat such summaries with caution.

Here are some clarifications:

The core issue lies in the hardware. On all AMD Zen CPUs, the branch prediction unit cannot natively distinguish between host user, guest-1 user, and guest-2 user domains (newer Intel CPUs can do so to some extent). Supervisor domains (host or guest kernel) are protected by the CPU effectively disabling speculative execution in those domains. But because user domains share branch predictor state, execution in one can control speculation in another - the fundamental root of Spectre-BTI. To enforce isolation, predictors must be flushed (IBPB) whenever transitioning between such domains.

On Linux KVM, an IBPB is issued on guest-1 to guest-2 switches and on process switches. However, because a guest runs in the same process as its userspace hypervisor (e.g. QEMU, Firecracker, etc.), there is no isolation mechanism in place for this transition. VMScape exploits exactly this gap. The mitigation is to add an IBPB on guest to host userspace transitions.
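The gap can be sketched with a toy model. This is not KVM code and not taken from the paper: the names `Context`, `kvm_flushes` and `cross_domain_influence`, and the process labels `qemu-1` / `qemu-2`, are all assumptions made for illustration. The model treats the predictor as one shared pool that only an IBPB clears, and lists which contexts end up running on state trained by someone else.

```python
from typing import Callable, List, NamedTuple, Tuple


class Context(NamedTuple):
    process: str  # hypothetical host process name, e.g. one QEMU instance
    domain: str   # "guest-user" or "host-user"


def kvm_flushes(prev: Context, nxt: Context, mitigated: bool = False) -> bool:
    """Toy policy: IBPB on process switches, optionally on guest exit to host userspace."""
    if prev.process != nxt.process:
        return True  # covers process switches and guest-1 to guest-2 switches
    if mitigated and prev.domain == "guest-user" and nxt.domain == "host-user":
        return True  # the extra IBPB that the mitigation adds
    return False


def cross_domain_influence(
    schedule: List[Context], flushes: Callable[[Context, Context], bool]
) -> List[Tuple[Context, Context]]:
    """Return (trainer, victim) pairs where the victim runs on the trainer's predictor state."""
    trainers: List[Context] = []
    pairs: List[Tuple[Context, Context]] = []
    prev = None
    for ctx in schedule:
        if prev is not None and flushes(prev, ctx):
            trainers.clear()  # IBPB wipes the shared predictor state
        pairs.extend((t, ctx) for t in trainers if t != ctx)
        if ctx not in trainers:
            trainers.append(ctx)
        prev = ctx
    return pairs


guest1 = Context("qemu-1", "guest-user")
host1 = Context("qemu-1", "host-user")  # the userspace hypervisor's own code
guest2 = Context("qemu-2", "guest-user")
schedule = [guest1, host1, guest2, guest1]

before = cross_domain_influence(schedule, kvm_flushes)
after = cross_domain_influence(schedule, lambda a, b: kvm_flushes(a, b, mitigated=True))
print(before)  # only the guest1 -> host1 pair survives the existing flushes
print(after)   # empty once the guest-to-host-userspace IBPB is added
```

In this model the guest-to-guest switch is already covered by the process-switch flush; the only surviving pair is the guest and the userspace hypervisor that share one process.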

Xen, while also running on the same flawed hardware, is not vulnerable to VMScape. But the reason is not (just) asynchronism. Asynchronism only makes exploitation harder. Instead, the key reason is that the equivalent of Linux's userspace hypervisor runs inside Dom0 on Xen, which is itself "treated like a guest". Because Xen already issues IBPBs between guest transitions, Dom0 is protected from DomU.
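Put differently, the same "flush when the isolation unit changes" rule gives different results depending on where the device model lives. A small sketch of that point, with made-up names (`Ctx`, `flush_on_unit_switch`, `guest_reaches_device_model`) and no claim to reflect actual Xen or KVM source:

```python
from typing import NamedTuple


class Ctx(NamedTuple):
    unit: str  # the boundary the software flushes on: a host process or a Xen domain
    role: str  # "guest" or "device-model"


def flush_on_unit_switch(prev: Ctx, nxt: Ctx) -> bool:
    """Same rule in both designs: IBPB whenever the isolation unit changes."""
    return prev.unit != nxt.unit


def guest_reaches_device_model(guest: Ctx, device_model: Ctx) -> bool:
    """True if guest-trained predictor state survives into the device model."""
    return not flush_on_unit_switch(guest, device_model)


# KVM: guest and userspace hypervisor share one host process (illustrative label).
kvm_guest = Ctx("qemu-process-1", "guest")
kvm_device_model = Ctx("qemu-process-1", "device-model")

# Xen: the device model runs in Dom0, which is a separate domain from the DomU.
xen_guest = Ctx("domU-1", "guest")
xen_device_model = Ctx("dom0", "device-model")

print(guest_reaches_device_model(kvm_guest, kvm_device_model))  # True
print(guest_reaches_device_model(xen_guest, xen_device_model))  # False
```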

Assigning responsibility for vulnerabilities at the hardware–software boundary is inherently challenging and often depends on implicit assumptions about the threat model. VMScape introduces a novel threat model that had not been considered before. Consequently, the responsible entities concluded that the lack of host/guest branch predictor state isolation does not qualify as a hardware issue, since adequate mitigations, such as IBPB, are readily available, but insufficiently used by software.

[1] https://comsec-files.ethz.ch/papers/vmscape_sp26.pdf

[2] https://comsec.ethz.ch/research/microarch/vmscape-exposing-a...