> Unfortunately, the time between free and revocation introduces a short-but-not-zero window for UAF bugs/attacks. This time gap is even explicitly acknowledged in the Reloaded paper!
Yes, revocation is batched and asynchronous. This does mean that capabilities remain valid beyond the language-level lifetime of the allocation. However, that does not mean that, within that window, we have not dealt with any UAF attacks. The vast majority of UAF attacks do not care about the fact that the memory has been freed, but rather that the memory has since been repurposed for something else (whether the allocator's own internal metadata or some other new allocation). Cornucopia (both versions) ensures that this does not happen until the next revocation pass; that is, it "quarantines" the memory. Effectively, when you call free, it's "as if" the free were deferred until revocation time. Therefore, if your capability is still valid, that memory is still only in use by you, and so the vast majority of attacks no longer work. This protects you against UAF in a similar way to how making free a no-op protects against most attacks. This is not all attacks, very occasionally the bug is a result of something like undefined behaviour that follows, but I don't know if we've found even one real-world instance of a UAF that this approach isn't going to catch. I'm sure they exist, but the nuance is crucial here to be able to reason about the security of various models.
But yes, MTE+CHERI are complementary in this regard. We have drafted ideas for using MTE with CHERI, which would (a) let you immediately prevent access (noting though that the capability would remain valid for a while, still) (b) let you recycle memory with different MTE colours before needing to quarantine the memory (hoping that, by the time you run out of colours for that memory region, a revocation pass has reclaimed some of them). That is, in theory it both gives stronger protection and better performance. I say in theory because this is just a sketch of ideas, nobody has yet explored that research.
I also note that MTE does not fix the undefined behaviour problem; it will only trap when it sees a memory access, but vulnerabilities introduced due to compilers exploiting undefined behaviour for optimisation purposes may not perform a memory access with the pointer before it's too late.
Yeah you need a compiler, linker and OS. That's true of any security technology. CHERI may be more significant in that regard because it's a bigger rethink than just stuffing some extra metadata into the existing types, but it's not at all intractable. We, a research group, maintain CheriBSD, a "full-fat" port of FreeBSD to CHERI (Morello and CHERI-RISC-V), so to a big tech organisation it's a small investment. The cost to tech companies is not making it work, it's often much more boring business factors.
Where studies suggest "a lot" is sub-0.1%. For example, https://www.capabilitieslimited.co.uk/_files/ugd/f4d681_e0f2... was a study into porting 6 million lines of C and C++ to run a KDE+X11 desktop stack on CHERI, and saw 0.026% LoC change, or ~1.5k LoC out of ~6 million LoC, all done in just 3 months by one person. That's even an overestimate, because it includes many changes to build systems just to be able to cross-compile the projects. It's not nothing, but it's the kind of thing where a single engineer can feasibly port large bodies of code. Yes, certain systems code will be worse (like JITs), but the vast majority of cases are not that, and even those are still feasible (e.g. we have people working with Chromium and V8).
Does that study include enabling intra object overflow protection, or not?
When I say that this optional feature would force you to change a lot more code I’m comparing CHERI without intra object overflow protection to CHERI with intra object object overflow protection.
Finally, 6 million lines of code is not that impressive. Real OSes are measured in billions
> Does that study include enabling intra object overflow protection, or not?
>
> When I say that this optional feature would force you to change a lot more code I’m comparing CHERI without intra object overflow protection to CHERI with intra object object overflow protection.
Sorry, I misinterpreted what you were saying. No, that's not with subobject bounds. If you want that then yes there is more incompatibility, because C does not have a good subobject memory model. That's not really because there's anything wrong with CHERI, it's just because the language itself is at odds in places with doing that kind of enforcement with any technology. But, if you're willing to incur that additional friction (as we do for our pure-capability kernel in CheriBSD), you can enable it, and it can protect against additional vulnerabilities that other security technologies fundamentally cannot. We even provide a sliding scale of subobject bounds enforcement, where each of the three levels restricts bounds in more cases at the expense of compatibility. The architecture gives you the flexibility to decide what software model you want to enforce with it.
> Finally, 6 million lines of code is not that impressive.
We have far more than that ported, that was just one case study done in a few months by one developer. FreeBSD alone is, by my very rough estimation cloc that excludes LLVM, about 14 million lines of C and C++ (yes, I'm not distinguishing architecture-specific code and all kinds of other considerations, but it's close enough and gives an order of magnitude for the purposes of this conversation), and we have FreeBSD ported. Not to mention our work on, say, Chromium and V8 (Chromium being another set of 10s of millions of lines of code, again tractable with the engineering effort of just a few members of our research group).
> Real OSes are measured in billions
Citation needed. The Linux kernel is only a bit over 40 million lines of code these days. Real systems may well approach the billions of lines of code running once you factor in all the libraries, daemons and applications running on top of it, but that is not all low-level OS code that needs the kind of porting an OS or runtime does. Even if it were a billion lines of code, though, extrapolating at 0.026% that would be 260 kLoC changed, which isn't that scary a number.
Even V8, which is about the worse case you could possibly have (highly-stylised code written in a way that uses types in CHERI-unfriendly ways; a language runtime full of pointers; many (about 6?) different highly-optimised just-in-time compilers that embed deep knowledge of the ISAs and ABIs they are targeting and like to play games with pointers in the name of performance) we see (last I checked) ~0.8% LoC changed, or about 16k out of 2 million. The porting cost is real, but the numbers have never suggested to us it's at all intractable for industry.
You may wish to read what the current pure-capability CHERI Linux user ABI specifies for mremap(), because we (primarily Arm, in conjunction with us) have thought about this, and the conclusion is not "the existence of mremap() makes CHERI undeployable". See https://git.morello-project.org/morello/kernel/linux/-/wikis...
Add a a sliding window aliasing mode to the hardware? You'd set a page table bit saying "check capabilities not against my VA, but those VAs over there"
To reiterate what I've said elsewhere, CHERI does not need a whole parallel memory architecture, there is just one that gets a slight extension over a non-CHERI/MTE system to include tags. But that is the same story as MTE, which also needs to propagate the tags in the memory system (and in fact, more tags, since we just need one bit per 16 bytes, whereas MTE needs 4 bits per 16 bytes in the common scheme).
Cambridge and Arm have made a joint statement that nothing that is essential to the deployment of CHERI ("capability essential IP") is being patented by them: https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-953.pdf. As with any patent issues, you should consult your legal team and not take anyone else's word for it, because patent law is a minefield and who knows what patents may be out there lurking that nobody realises happens to cover some aspect of CHERI, or design choices in an implementation of it, as with any processor technology, but we are not out to patent it. We believe that the right thing to do is to make the technology open in order to allow it to be widely used for the good of the field.
There is one stack, the normal program stack that's normal main memory.
> capability pointers
If you use pure-capability CHERI C/C++ then there is only one type of pointer to manage; they just are implemented as capabilities rather than integers. They're also just extensions of the existing integer registers; much as 64-bit systems extend 32-bit registers, CHERI capability registers extend the integer registers.
> requires microarchitectural support for a tag storage memory
Also true of MTE?
> your memory operations also need to provide a pointer to a capability object stored in the capability store
There is no "capability object stored in the capability store". The capability is just a thing that lives in main memory that you provide as your register operand to the memory instruction. Instead of `ldr x0, [x1]` to load from the address `x1` into `x0`, you do `ldr x0, [c1]` to load from the capability `c1`. But `c1` has all of the capability; there is no indirection. It sounds like you are thinking of classical capability systems that did have that kind of indirection, but an explicit design goal of CHERI is to not do that in order to be much more aligned with contemporary microarchitecture.
> The capability store is literally a separate bus and memory that isn't accessible by programs,
As above, there is no separate bus, and capabilities are not in separate memory. Everything lives in main memory and is accessed using the same bus. The only difference is there are now capability tags being stored alongside that data, with different schemes possible (wider SRAM, DRAM ECC bits, carving out a bit of main memory so the memory controller can store tags there and pretend to the rest of the system that memory itself stores tags). To anything interacting with the memory subsystem, there is one bus, and the tags flow with the data on it.
> To anything interacting with the memory subsystem, there is one bus, and the tags flow with the data on it.
To the architecture, there is one access mechanism with the tag bit set and one separate mechanism with the tag bit unset, no?
I thought this was the whole difference: in MTE, there is a secret tag hidden in a “normal” pointer by the allocator, and in CHERI, there is a separate architectural
route for tag=0 (normal memory) and tag=1 (capabilities memory), whether that separate route eventually goes to some partition of main memory, a separate store entirely, ECC bit stuffing, or whatever?
No. The capability itself lives in normal memory intermingling with data just like any other pointer. There is no "capabilities memory", it is just memory.
In MTE, you have the N-bit (typically 4) per-granule (typically 16 byte) "colour"/tag that is logically part of the memory but the exact storage details are abstracted by the implementation. In CHERI, you have the 1-bit capability tag that is logically part of the memory but the exact storage details are abstracted by the implementation. If you understand how MTE is able to store the colours to identify the different allocations in memory (the memory used for the allocations, not the pointers to the allocations) then you understand how CHERI stores the tags for its capabilities, because they are the same basic idea. The difference comes in how they're used: in MTE, they identify the allocation, which means you "paint" the whole allocation with the given "colour" at allocation time (malloc, new, alloca / stack variables, load time for globals), but in CHERI, they identify valid capabilities, and so only get set when you write a valid capability to that memory location (atomically and automatically). This leads to very different access patterns and densities (e.g. MTE must tag all data regardless of its type, whereas CHERI only tags pointers, meaning large chunks of plain data have large chunks of zero tag bits, so how you optimise your microarchitecture changes).
Perhaps you're getting confused with details about the "tag table + cache" implementation for how tags can be stored in commodity DRAM? For CHERI you really want 129-bit word (or some multiple thereof) memory, but commodity DRAM doesn't give you that. So as part of the memory controller (or just in front of it) you can put a "tag controller" which hides a small (< 1%) fraction of the memory and uses it to store the tags for the rest of the memory, with various caching tricks to make it go fast. But that is just the tag, and that is an implementation detail for how to pretend that your memory can tag data. You could equally have an implementation that uses wider DRAM (e.g. in the case of DRAM with ECC bits to spare). Both schemes have been implemented. But importantly memory is just 128+1-bit; the same 128 bits always store the data, whether it's some combination of integers and floats, or the raw bytes of a capability. In the former case, the 129th tag bit will be kept as 0, and in the latter case it will be kept as whatever the capability's tag is (hopefully 1).
That's not true. Capabilities are in main memory as much as any other data. The tags are in separate memory (whether a wider SRAM, DRAM ECC bits, or a separate table off on the side in a fraction of memory that's managed by the memory controller; all three schemes have been implemented and have trade-offs). But this is also true of MTE; you do not want those tags in normal software-visible main memory either, they need to be protected.
64-bit size_t is the same on Darwin as GNU/Linux (unsigned long), but uint64_t is not (Darwin defines it as unsigned long long, not unsigned long). Perhaps this is what you remember?
Yes, revocation is batched and asynchronous. This does mean that capabilities remain valid beyond the language-level lifetime of the allocation. However, that does not mean that, within that window, we have not dealt with any UAF attacks. The vast majority of UAF attacks do not care about the fact that the memory has been freed, but rather that the memory has since been repurposed for something else (whether the allocator's own internal metadata or some other new allocation). Cornucopia (both versions) ensures that this does not happen until the next revocation pass; that is, it "quarantines" the memory. Effectively, when you call free, it's "as if" the free were deferred until revocation time. Therefore, if your capability is still valid, that memory is still only in use by you, and so the vast majority of attacks no longer work. This protects you against UAF in a similar way to how making free a no-op protects against most attacks. This is not all attacks, very occasionally the bug is a result of something like undefined behaviour that follows, but I don't know if we've found even one real-world instance of a UAF that this approach isn't going to catch. I'm sure they exist, but the nuance is crucial here to be able to reason about the security of various models.
But yes, MTE+CHERI are complementary in this regard. We have drafted ideas for using MTE with CHERI, which would (a) let you immediately prevent access (noting though that the capability would remain valid for a while, still) (b) let you recycle memory with different MTE colours before needing to quarantine the memory (hoping that, by the time you run out of colours for that memory region, a revocation pass has reclaimed some of them). That is, in theory it both gives stronger protection and better performance. I say in theory because this is just a sketch of ideas, nobody has yet explored that research.
I also note that MTE does not fix the undefined behaviour problem; it will only trap when it sees a memory access, but vulnerabilities introduced due to compilers exploiting undefined behaviour for optimisation purposes may not perform a memory access with the pointer before it's too late.