VMScape and why Xen dodged it

123 points by plam503711 4 days ago

On HP business PCs, Xen's microkernel architecture was extended for copy-on-write nested virtualization microVMs (VM per browser tab or HTTP connection) and UEFI-in-VM, https://www.platformsecuritysummit.com/2018/speaker/pratt/ | https://news.ycombinator.com/item?id=42282053#42286147

Imminent unification of Android and ChromeOS will likely use a similar h/w nested-virt architecture based on L0 pKVM + L1 KVM hypervisors on Arm devices.

Honda is using Xen, "How to accelerate Software Defined Vehicle" (2025), https://static.sched.com/hosted_files/xensummit2025/93/HowTo...

eigenform - 4 days ago

Since everyone is upset about the lack of technical details in the article, I'll try:

The takeaway from that paper (imo, afaict) is that guest userspace can influence indirect predictor entries in KVM host userspace. I don't really know anything about Xen, but presumably it is unaffected because there is no Xen host userspace, just a tiny hypervisor running privileged code in the host context. With KVM, Linux userspace is still functional in the host context.

Presumably, the analogy to host kernel/userspace in KVM is dom0, but in Xen this is a guest VM. If cross-guest cases are mitigated in Xen (like in the case of KVM, see Table 2 in the paper), you'd expect that this attack just doesn't apply to Xen. Apart from there being no interesting host userspace, IBPB/STIBP might be enough to insulate other guests from influencing dom0. If you're already taking the hit of resetting the predictors when entering dom0, presumably you are not worried about this particular bug.

edit: Additional reading, see https://github.com/xen-project/xen/blob/master/xen/arch/x86/...

bayesnet - 4 days ago

While it’s interesting that Dom0 avoids Spectre-style branch prediction attacks, it’s not clear from TFA exactly why that is so. How does the architecture of the hypervisor avoid an attack that seems to be at the hardware level? From my limited understanding of Spectre and Meltdown, swapping from a monolithic kernel to a microkernel wouldn’t mitigate an attack. The mitigations discussed in the VMScape paper [0] are hardware mitigations in my reading. And I don’t see Xen mentioned anywhere in the paper for that matter.

I guess it’s sort of off topic, but I was enjoying reading this until I got to the “That’s not just elegant — it’s a big deal for security” line that smelled like LLM-generated content.

Maybe that reaction is hypocritical. I like LLMs; I use them every day for coding and writing. I just can’t shake the feeling that I’ve somehow been swindled if the author didn’t care enough to edit out the “obvious” LLM tells.

[0]: https://comsec-files.ethz.ch/papers/vmscape_sp26.pdf

csmantle - 3 days ago

I think the author actually meant "Yes, VMScape can leak information on Xen, but it only leaks from a miniature Dom0 process." They seemed to consider leaking from such a small pool not to be a security issue.
Agreed on the point about hw-level mitigation. The leakage still exists. Containing it in a watertight box is quick and effective, and it does avoid extra overhead. But it doesn't patch the hole.
jcjgraf - 2 days ago

Please see my other comment where I share more details about VMScape and why Xen is not affected. In short, it is because branch predictor state is flushed when transitioning to Dom0. Indeed, it has nothing to do with the type of kernel... And yes, LLMs were at work. The "quote" in the article is not an actual quote...
mikewarot - 4 days ago

I think it might be translation from French instead of LLM usage.
While microkernels are great for overall security, it's also not obvious to me how that helped in this case.
- sim7c00 - 3 days ago
  
  it might be as simple as more rigid context transfers flushing caches. there are a lot of guesses on here now. it'd be great if people stopped using "may" or "might" and looked in the code. everyone's hopping on the lack of context and adding guesses. that's not helpful
somat - 4 days ago

Maybe this is the problem with LLMs: using them feels great, but having them used on you is highly unpleasant.
- sim7c00 - 3 days ago
  
  sounds like a problem of cognitive dissonance, not of LLMs
remix2000 - 4 days ago

It's not necessarily a sign of AI slop — could be just proper typography! :3
- duskwuff - 4 days ago
  
  It's not the em dash, but the negative parallelism ("not X, but Y"). This is a pattern which some LLMs really like using. I've seen some LLM-generated texts which used it in literally every sentence.
  (The irony of opening with this pattern is not lost on me.)
  As an aside, Wikipedia has a fascinating document identifying common "tells" for LLM-generated content:
  https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
  - shippage - 3 days ago
    
    I'm also on the spectrum and like using various kinds of parallel construction, including antithesis.
    I also tend to use a lot of em dashes. If I posted something I wrote in, say, 2010, I'd likely get a lot of comments about my writing absolutely, 100% being AI-written. I have posted old writing snippets in the past year and gotten this exact reaction.
    I originally (two decades ago) started using em dashes, I think, because I also tend to go off on frequent tangents or want to add additional context, and at the beginning of the tangent, I'm not entirely sure how I'll phrase it. So, instead of figuring out the best punctuation at that moment (be that a parenthesis, a comma, or a semicolon for a list), I'll just type an em dash (easy on a Mac).
    Then I don't go back and fix it afterward because I have too many thoughts and not enough time to express them. There are popular quotes about exactly this issue.
    It's a kind of laziness in the form of my expression to give me more mental capacity to focus on the content. Alt 0151 and Alt 0150 are still burned into my memory from typing em dashes and en dashes so often on Windows.
    I suppose I'll have to consider this my own punctuation mode collapse that RLHF is now forcing me to correct.
    
    yellowapple - 2 days ago
    
    I've started deliberately using em-dashes and “smart” quotes (made easy by configuring a compose key) — mostly because they look nice, but also out of spite for any software that's somehow not properly Unicode-aware in 20-fucking-25.
  - rickydroll - 3 days ago
    
    Does using Grammarly count as AI-assisted writing?
    I use Grammarly because it helps fix speech recognition errors. One of the challenges of speech recognition use is that it is a bit difficult at times to construct grammatically correct sentences in your head, then speak those sentences, and then proofread them before you start the next bit of writing.
  - exe34 - 4 days ago
    
    I have autism and I like using that kind of comparison when writing.
  - barrkel - 3 days ago
    
    It's antithesis. And it's really overused by ChatGPT.

snvzz - 3 days ago

The Xen "microkernel" is unfortunately bloated. seL4 is much smaller and runs VMM as an isolated unprivileged task.

VM exceptions are all handled by VMM. A VM escape would still be confined in VMM, which has no higher capabilities than the VM itself. Capabilities are enforced by the formally verified seL4.

BobbyTables2 - 4 days ago

I don’t quite see what they’re getting at.

Is it just because it’s another VM switch to get to dom0? Seems a bit unlikely…

Xen has a hypervisor for dealing with the low level details of virtualization and uses dom0 for management and some HW emulation.

QEMU/KVM uses the host kernel for the low level details of virtualization and the QEMU userspace portion to do the actual HW emulation.

They’re actually remarkably similar aside from the detail that the Xen hypervisor only juggles VMs but the KVM design involves it juggling other normal processes…

The people praising Firecracker are just turning a blind eye to the 10000+ lines of (really hairy) C code in the kernel doing x86 instruction emulation and the actual hypervisor part.

jcjgraf - 2 days ago

Yes, Xen is indeed protected thanks to using Dom0 to run the counterpart of Linux's userspace hypervisor (QEMU, Firecracker, etc.). This is because transitions to Dom0 lead to a branch predictor flush. See my other comment for more information. As you say, Firecracker is equally affected by VMScape as QEMU is...

AtlasBarfed - 3 days ago

So this requires the two VMs to be sharing execution on a core? Or perhaps a shared cache? Or would it work across VMs "pinned" to different CPUs?

It's weird to me that cloud hosts aren't absolutely swimming in cores now, but with Intel struggling and AMD somewhat resting on its laurels (which it stupidly also did in the Hector Ruiz days), nothing is pushing the envelope. In 2010, fifteen years ago, we had 12-core CPUs.

In 2010 we had a billion or so transistors. In 2020, we had 50 billion. In 2010 we were at 28nm, now we're at 3nm.

We should have 100x the CPUs on die now, or more: a thousand x86 cores, god knows how many Arm cores, and god knows how much you could do with hi-low core counts.

Anyway, what I'm getting at is all of these vulnerabilities across process execution or VM execution could be moot: if the processes were isolated to a core or set of cores, and the VM isolated to its own dedicated branch predictors in its own cores. Then go ahead and do whatever tricks you want. Obviously you don't want hyper-threading.
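The isolation this comment proposes boils down to a placement check: no physical core, counting SMT siblings, should run code from two different tenants. Below is a minimal Python sketch assuming a hypothetical 2-core, 2-way SMT topology; `shared_cores` and the tenant names are illustrative, not any real scheduler or hypervisor API. It treats a guest and its own userspace VMM as a single tenant, so it says nothing about the guest-to-VMM case.

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_cores(placement: Dict[str, Set[int]],
                 core_of: Dict[int, int]) -> Dict[int, List[str]]:
    """Return the physical cores that more than one tenant may run on.

    placement: tenant (VM or process) -> logical CPUs it is pinned to.
    core_of:   logical CPU -> physical core; SMT siblings map to the same
               core and therefore share its branch predictor.
    """
    users: Dict[int, Set[str]] = defaultdict(set)
    for tenant, cpus in placement.items():
        for cpu in cpus:
            users[core_of[cpu]].add(tenant)
    # Sorted lists keep the result deterministic.
    return {core: sorted(t) for core, t in users.items() if len(t) > 1}


# Hypothetical topology: logical CPUs 0,1 are SMT siblings on core 0; 2,3 on core 1.
core_of = {0: 0, 1: 0, 2: 1, 3: 1}

print(shared_cores({"vm-a": {0}, "vm-b": {1}}, core_of))        # {0: ['vm-a', 'vm-b']}
print(shared_cores({"vm-a": {0, 1}, "vm-b": {2, 3}}, core_of))  # {}
```

The first placement fails because the two VMs sit on SMT siblings of the same core, which is why hyper-threading has to be off (or siblings kept within one tenant) for this scheme to work.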

jcjgraf - 2 days ago

Indeed, the victim (e.g. a userspace hypervisor like QEMU, Firecracker, etc.) and the attacker (e.g. a malicious guest) need to run on the same core. But with VMScape this is always given, because a guest runs in the same process as its hypervisor. Before VMScape, developers only isolated different VMs, different processes, and supervisor domains from malicious users. VMScape exploits a novel threat model.
pjmlp - 3 days ago

Modern Windows is already using two VMs as well, or even more if WSL is being used.
Hyper-V is a type 1 hypervisor; when it is enabled, which many security measures in modern Windows require, the first Windows instance is a privileged guest, just like with Xen.
Additionally, anyone using WSL 2 is running another set of VMs alongside Windows, depending on how many flavours of Linux and containers are configured.

indigodaddy - 4 days ago

If anyone was looking there are still some Xen VPS providers around, one of the oldest being Tornado VPS (formerly prgmr.com).

https://tornadovps.com/about

The founders literally wrote the book on xen:

https://nostarch.com/releases/xen.html

RealStickman_ - 3 days ago

This made me curious to find out why KVM is so much more popular than Xen. I wasn't able to find anything concrete beyond "KVM is the standard and supported by our tooling", which obviously is the case nowadays, but that still leaves me wondering what KVM did so much better than Xen when it was first released, or if this was just a coincidence.
- shortsunblack - 3 days ago
  
  KVM was made because Citrix made moves against Xen that spooked the Linux community. Then Red Hat ran with it and based its virtualization platform on it.
  Citrix's involvement has subsided in the meantime and the ecosystem is much healthier (governance is actually under the Linux Foundation), but the damage was done.
  Xen also still lacks features to this day.
  - jacobgorm - 3 days ago
    
    Also, Xen's main claim to fame was that para-virtualization allowed it to host Linux and *BSD VMs at close to zero overhead, but at the time what everyone was looking for was a way to host Windows VMs, which is where all the pain points and the money were. CPUs were evolving to support this use case, making para-virtualization less important, and Xen had to evolve quite quickly to include QEMU in the mix, leaving a bit of a convoluted mess initially and causing a lot of friction during the attempts to get merged into the Linux kernel. On top of that, the Xen management toolstack had been written by happy amateurs in Python and Twisted, before either of those technologies was near ready for production use, with massive slowness and unfixable memory leaks as the result. KVM provided a fresh take built with the benefit of hindsight, got merged into Linux on the first attempt, gained the backing of Red Hat, and the rest is more or less history.
  - jacobgorm - 3 days ago
    
    KVM was launched before Citrix acquired XenSource. But Redhat had also tried to acquire XenSource and threatened its founders that if they did not come along Redhat would “rip off their heads and shit down the hole”, because “Redhat was the only company allowed to make money off open source”. In that light it made sense for Redhat to back a competitor to Xen.
- rickydroll - 3 days ago
  
  The Citrix fuckery triggered the development of XCP-ng. I've been using the XCP-ng/Xen Orchestra stack for several years now, both in my homelab and professionally. It is so much easier to work with than KVM that it's part of my go-to toolkit.
  If you have a spare machine or feel like picking up a tiny form factor i5 PC, you can play with Xen and Xen Orchestra fairly easily.
  I once ran a three-node cluster of TinyFormFactor PCs running Xen, and it was a good framework for learning. The only reason I moved away from it is that the TFF PCs only had one gigabit Ethernet port and were limited to 32GB of RAM. I moved to more traditional small desktop PCs so I could add multiple 10-gigabit Ethernet interfaces and more RAM.
  Someday, I'll write up how I did an easy DMZ with XCP-ng.
  - RealStickman_ - 3 days ago
    
    I use and like XCP-ng in my homelab, but the initial release was apparently in 2018. That's still ~15 years of Citrix fuckery that apparently birthed KVM.
    
    rickydroll - 3 days ago
    
    Yeah, Citrix really made a mess of things. If you look at the history of IT companies, it's impressive how many user/company decisions are made because of vendor fuckery.
floam - 3 days ago

I enjoyed seeing what I could do with a tiny tiny (64 MB RAM) NetBSD VPS on prgmr.com back in the day.

yjftsjthsd-h - 4 days ago

I guess I don't quite follow. The attack can let an attacker in a normal VM see memory in either the host or a Xen dom0 VM. Why is it less impactful to get memory from the management VM instead of the host?

jcjgraf - 2 days ago

VMScape does not allow an attacker to read memory of Dom0 or the host. Dom0 is safe because branch predictor state is flushed when transitioning to Dom0, and the host is secured as it runs as supervisor, while VMScape only targets userspace. See my comment further up for more information.
- yjftsjthsd-h - a day ago
  
  Since the attack does work cross-VM with KVM, it would then seem that Xen really has two advantages, and it kinda only got out unscathed because of the combination of both:
  * management stuff mostly lives in Dom0
  * Xen does the flushes to protect VMs from each other
  If you didn't do the first, then attacks on the host might work, and if you didn't do the second then attacks on Dom0 might work, but the combination blocks both vectors. Is that about right?

aborsy - 4 days ago

Which is precisely why Qubes OS uses Xen.

bionsystem - 3 days ago

Nowadays you can run your VMs inside LXC, and SmartOS also runs them inside zones by default. I wonder if the same exploits could be used across the container layer of both technologies, or if that layer would protect from leaks.

hugo1789 - 3 days ago

Maybe because Xen is a type 1 hypervisor in its original meaning and all the other ones are type 2? (Yes, ESX(i) doesn't use Linux, but it also brings its own OS, on top of which it runs.)

jcjgraf - 2 days ago

Author of the VMScape paper here.

It's great to see an article highlighting the impact of VMScape on Xen, especially since our paper [1] does not discuss Xen in detail (we only briefly mention it in the blog post [2]).

That said, the article unfortunately lacks technical precision. Some statements are vague, and "our quote" ("According to the ETH team") is misleading, as those are not our words. To be clear: VMScape is not a cross-VM attack. So please treat such summaries with caution.

Here are some clarifications:

The core issue lies in the hardware. On all AMD Zen CPUs, the branch prediction unit cannot natively distinguish between host user, guest-1 user, and guest-2 user domains (newer Intel CPUs can, to some extent). Supervisor domains (host or guest kernel) are protected by the CPU effectively disabling speculative execution in those domains. But because user domains share branch predictor state, execution in one can control speculation in another; this is the fundamental root of Spectre-BTI. To enforce isolation, predictors must be flushed (IBPB) whenever transitioning between such domains.
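A toy model of the problem described above: a predictor whose entries are keyed only by branch address, so training done in guest userspace steers a prediction made later in host userspace, and an IBPB-style flush breaks that link. This is a hypothetical Python illustration (`ToyBTB`, the addresses, and the domain labels are made up), not a model of how AMD or Intel predictors actually index or tag entries.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyBTB:
    """Toy indirect-branch predictor shared by every user domain on one core.

    Entries are keyed only by branch address; the training domain is not
    recorded, mirroring the lack of host-user/guest-user separation.
    """
    entries: Dict[int, int] = field(default_factory=dict)

    def train(self, domain: str, branch_addr: int, target: int) -> None:
        # `domain` is deliberately ignored: the predictor cannot tell domains apart.
        self.entries[branch_addr] = target

    def predict(self, domain: str, branch_addr: int) -> Optional[int]:
        # Speculative target used before the real target is resolved.
        return self.entries.get(branch_addr)

    def ibpb(self) -> None:
        # Barrier: training from before this point no longer steers predictions.
        self.entries.clear()


def guest_steers_host(with_ibpb: bool) -> bool:
    """True if guest-user training decides a later host-user prediction."""
    btb = ToyBTB()
    branch, gadget = 0x400, 0xDEAD  # hypothetical addresses in the VMM
    btb.train("guest-user", branch, gadget)  # malicious guest trains the entry
    if with_ibpb:
        btb.ibpb()  # flush on the guest -> host-user transition
    return btb.predict("host-user", branch) == gadget


print(guest_steers_host(with_ibpb=False))  # True
print(guest_steers_host(with_ibpb=True))   # False
```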

On Linux KVM, an IBPB is issued on guest-1 to guest-2 switches and on process switches. However, because a guest runs in the same process as its userspace hypervisor (e.g. QEMU, Firecracker, etc.), there is no isolation mechanism in place for this transition. VMScape exploits exactly this gap. The mitigation is to add an IBPB on guest to host userspace transitions.
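The KVM barrier policy described above can be sketched as a function over consecutive user contexts on one core. This is a simplified, hypothetical model (`Ctx`, `kvm_issues_ibpb`, and the process names are illustrative, not KVM code); the `vmscape_fix` flag stands for the added IBPB on guest to host userspace transitions.

```python
from typing import NamedTuple


class Ctx(NamedTuple):
    kind: str     # "guest-user" or "host-user"
    process: str  # host process owning the context (hypothetical names)


def kvm_issues_ibpb(prev: Ctx, nxt: Ctx, vmscape_fix: bool = False) -> bool:
    """Simplified sketch of when a barrier separates two user contexts.

    Supervisor contexts are left out because the CPU already protects them.
    """
    if prev.process != nxt.process:
        # Process switch; a guest-1 -> guest-2 switch also lands here,
        # because each VM runs inside its own VMM process.
        return True
    if vmscape_fix and prev.kind == "guest-user" and nxt.kind == "host-user":
        # The added barrier on guest -> host-userspace transitions.
        return True
    # Guest and its own VMM share a process: no barrier (the VMScape gap).
    return False


vm1_guest = Ctx("guest-user", "qemu-vm1")
vm1_vmm = Ctx("host-user", "qemu-vm1")
vm2_guest = Ctx("guest-user", "qemu-vm2")

print(kvm_issues_ibpb(vm1_guest, vm2_guest))                  # True
print(kvm_issues_ibpb(vm1_guest, vm1_vmm))                    # False
print(kvm_issues_ibpb(vm1_guest, vm1_vmm, vmscape_fix=True))  # True
```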

Xen, while also running on the same flawed hardware, is not vulnerable to VMScape. But the reason is not (just) asynchronism, which only makes exploitation harder. Instead, the key reason is that the equivalent of Linux's userspace hypervisor runs inside Dom0 on Xen, which is itself "treated like a guest". Because Xen already issues IBPBs between guest transitions, Dom0 is protected from DomU.
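The Xen behaviour described above can be sketched as a walk over a single core's schedule, tracking which domains' predictor training is still live when the device model in Dom0 runs. This is a simplified, hypothetical Python model (`XenCtx`, `foreign_training_at`, and the domain labels are illustrative, not Xen code): it assumes the only barrier is an IBPB on every domain switch and that the guest's exit reaches the Dom0 device model on the same core.

```python
from typing import List, NamedTuple, Optional, Set


class XenCtx(NamedTuple):
    domain: str  # hypothetical labels: "dom0", "domU-1", ...
    mode: str    # "user" or "kernel" within that domain


def foreign_training_at(schedule: List[XenCtx], victim: XenCtx,
                        barrier_on_domain_switch: bool = True) -> Set[str]:
    """Return the other domains whose training is still live whenever `victim` runs.

    Simplified: every context leaves training behind, and the only barrier
    modeled is an IBPB when execution moves to a different domain.
    """
    live: Set[str] = set()
    exposed: Set[str] = set()
    prev: Optional[XenCtx] = None
    for ctx in schedule:
        if barrier_on_domain_switch and prev is not None and prev.domain != ctx.domain:
            live.clear()                    # IBPB on the domain switch
        if ctx == victim:
            exposed |= live - {ctx.domain}  # training left by other domains
        live.add(ctx.domain)                # this context now trains the predictor
        prev = ctx
    return exposed


guest = XenCtx("domU-1", "user")
dom0_kernel = XenCtx("dom0", "kernel")
device_model = XenCtx("dom0", "user")  # e.g. QEMU serving domU-1's I/O
exit_path = [guest, dom0_kernel, device_model]

print(foreign_training_at(exit_path, device_model))                                  # set()
print(foreign_training_at(exit_path, device_model, barrier_on_domain_switch=False))  # {'domU-1'}
```

Turning the barrier off in the model shows why the domain switch into Dom0, rather than the hypervisor's size, is what keeps the guest's training away from the device model.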

Assigning responsibility for vulnerabilities at the hardware–software boundary is inherently challenging and often depends on implicit assumptions about the threat model. VMScape introduces a novel threat model that had not been considered before. Consequently, the responsible entities concluded that the lack of host/guest branch predictor state isolation does not qualify as a hardware issue, since adequate mitigations, such as IBPB, are readily available, but insufficiently used by software.

[1] https://comsec-files.ethz.ch/papers/vmscape_sp26.pdf
[2] https://comsec.ethz.ch/research/microarch/vmscape-exposing-a...