In my experience, finding the "correct" specification for a problem is usually very difficult for realistic systems. Generally it's unlikely that you'll be able to specify ALL the relevant properties formally. I think there's probably some facet of Kolmogorov complexity there; some properties probably cannot be significantly "compressed" in a way where the specification is significantly shorter and clearer than the solution.
But it's still usually possible to distill a few crucial properties that can be specified in an "obviously correct" manner. It takes A LOT of work (sometimes I'd be stuck for a couple of weeks trying to formalize a property). But in my experience the trade-off can be worth it. One obvious benefit is that bugs can be pricey, depending on the system. But another benefit is that, even without formal verification, having a few clear properties can make it much easier to write a correct system, and, crucially, also easier to maintain the system as time goes by.
I'm curious since I'm not a mathematician: what do you mean by "stuck for a couple of weeks"? I'm trying to practice more advanced math and have stumbled over Lean and such, but I can't imagine you just sit around for weeks pondering a problem, right? What do you do all this time?
I'm not a mathematician either ;) Yeah, I won't sit around and ponder a property definition for weeks. But I will maybe spend a day on it, not get anywhere, and then spend an hour or two a day thinking about ways to formulate it. Sometimes I try something and an hour later figure out it won't work, but sometimes I really do just stare at the ceiling with no idea how to proceed. It helps if you have someone to talk to about it!
You experience counterexamples that show why a specific definition is not going to work.
Many times, and at various levels of "not going to work": usually the failure sits just above the syntactic level, but sometimes it sits above the plain definitional semantics, i.e., it mostly concerns some indirect interaction aspects.
I haven't followed closely, and I'm only faintly acquainted with algebraic geometry and category theory. But TFA links to a formalization of Grothendieck schemes, which are definitely post-WW2 material, and it relies on Isabelle's locales feature. Are you familiar with this work? How far from the "ordinary mathematician's style" is it?
Thank you for spelling this out; comments like these make this website worthwhile. You've enlightened at least one person today.
You hinted that there's more to QTT (or its implementation in Idris?) than this. Could you elaborate a bit on what these other features are, and what their purpose is?
QTT supports three different "multiplicities". I discussed "used 0 times at runtime" (can be erased) and "used any number of times at runtime" (cannot be erased). The remaining one is "used exactly once at runtime". These also cannot be erased, but adding this constraint lets you encode some very interesting things in the type system. The classic example is "resource protocols", e.g. file handles that force you to close them (and that you cannot use after closing).
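Rust's ownership system gives a taste of this: moves make values affine ("used at most once"), which is enough to enforce the use-after-close half of such a protocol at compile time. A minimal sketch (all names here are illustrative, not a real file API):

```rust
// A toy handle whose `close` takes ownership, so any use after
// closing is rejected by the compiler (ownership has moved).
struct Handle {
    name: String,
}

fn open_handle(name: &str) -> Handle {
    Handle { name: name.to_string() }
}

// Consumes the handle by value; the caller can never touch it again.
fn close_handle(h: Handle) -> String {
    format!("closed {}", h.name)
}

fn main() {
    let h = open_handle("log.txt");
    println!("{}", close_handle(h));
    // println!("{}", h.name); // compile error: `h` was moved
}
```

The gap relative to QTT's "exactly once" is that Rust happily lets you drop a handle without ever closing it; full linearity would flag that too.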
I think the main answer was given by another comment: for most projects, correctness usually isn't worth that much (i.e., a bug isn't that expensive for a company producing a piece of software). It also isn't in the software culture (yet?). Today people will be shocked if you don't have a version control system and a CI pipeline. Few people had one 20 years ago. Also, people are often reluctant to learn a new paradigm (think functional programming).
Having done multiple TLA verification projects myself, here are some technical reasons:
1. It's actually surprisingly hard to write a good specification, i.e., precisely and formally state what you want. Often it's not even practical: the specification gets so complex that you end up nearly recreating the implementation. Kolmogorov complexity is also a thing for specifications, I suppose ;)
2. TLA in particular is mostly useful for concurrent/distributed systems.
3. Model checking (the method used to verify TLA models) hits limits pretty quickly. E.g., you may only be able to check your system for 2 threads, but forget about 3 threads.
4. The TLA tooling is very clunky by modern standards.
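To make point 3 concrete: even a toy explicit-state exploration shows the combinatorial blowup, since each extra thread multiplies the reachable state count. A sketch (this is not how TLC works internally; names are made up):

```rust
use std::collections::HashSet;

// Toy explicit-state exploration: n_threads threads, each advancing a
// local program counter from 0 to `steps`. The reachable state count is
// (steps+1)^n_threads, which is why adding one more thread can push a
// model checker over the edge.
fn count_states(n_threads: usize, steps: u32) -> usize {
    let mut seen: HashSet<Vec<u32>> = HashSet::new();
    let mut frontier = vec![vec![0u32; n_threads]];
    seen.insert(frontier[0].clone());
    while let Some(state) = frontier.pop() {
        for t in 0..n_threads {
            if state[t] < steps {
                let mut next = state.clone();
                next[t] += 1;
                if seen.insert(next.clone()) {
                    frontier.push(next);
                }
            }
        }
    }
    seen.len()
}

fn main() {
    for n in 1..=4 {
        // 6, 36, 216, 1296 states: exponential in the thread count
        println!("{} threads: {} states", n, count_states(n, 5));
    }
}
```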
I think for this reason TLA+ is—or should be—good for designing systems: it keeps your design as simple as possible :) and it asks you the difficult questions. Sort of a logical rubber duck.
And the best part is, when you implement the spec, it works! At least by design.
It is quite a different task to build models of existing systems in fine detail. But it can be done, e.g. https://arxiv.org/abs/2409.14301 . This group in particular combined fine- and coarse-grained models for the checking, letting them focus the checking effort on just the parts of the system they were interested in.
Funny to see this posted on HN: just last week I finished writing a blog post about a project I did for checking that code matches the TLA+ specs, so I have to shamelessly plug it :) [1] I was aware of the MongoDB paper, but I ended up doing exactly what they suggested wouldn't work: I instrumented Rust programs to log traces and checked them against the specification. Even though the models were largely also post-factum models, as in their case (i.e., the code was there first and the models were built later), this worked for us since the models really were aimed at capturing the implementation's behavior. Our target implementation is possibly smaller than what they had to deal with, though (it's only around 5kLoC of Rust smart contracts), so that's a factor.
My technique was slightly different, though. First, I ended up ignoring the initial states, because tests would often manually create fixtures that would serve as a starting point. So I only ended up checking that trace steps obey the transition predicate, which is weaker, but hey, all this is best-effort anyways. Second, I ended up using the Apalache tool instead of TLC, the reason being that my transitions were sometimes of the form "there exists a number between 1 and N". When model checking, you would pick a small N, but in the test runs where the traces come from, N was sometimes huge. TLC ends up enumerating all the possibilities between 1 and N, whereas Apalache translates the whole thing into an SMT formula which is generally trivial to check. Apalache also has the side benefit of requiring type annotations (and providing a type checker), which makes the models a lot easier to refactor when the code changes.
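To illustrate the existential issue: checking a logged step against a transition like "there exists n between 1 and N" forces an enumerating checker to scan candidate values, which is fine for a small model-checking N but painful for the huge N in real traces; an SMT solver dispatches the same formula symbolically. A hedged Rust sketch of the enumerating flavor (the predicate and names are made up for illustration):

```rust
// Check one trace step (pre-state, post-state) against a transition
// predicate of the form: exists n in 1..=n_max such that post == pre + n.
// This scans candidates the way TLC enumerates; an SMT backend would
// instead solve the constraint directly.
fn step_ok_enumerate(pre: u64, post: u64, n_max: u64) -> bool {
    (1..=n_max).any(|n| post == pre + n)
}

fn main() {
    // A logged trace step: pre-state 10, post-state 17, with N = 1_000_000.
    println!("{}", step_ok_enumerate(10, 17, 1_000_000)); // prints "true"
}
```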
I also ended up creating a small library for helping instrument Rust programs to collect such logs. The idea here was to minimize the amount of logging statements in the production code, to keep it out of the way of the people working on the code base who aren't familiar with TLA. It's somewhat specific to our code base, and there's a fair amount of unsafe nastiness that works only because our code is of certain shape, but in case someone's interested, it's open source: https://github.com/dfinity/ic/tree/master/rs/tla_instrumenta....
I wasn't aware of the 2024 paper they reference though, so curious to see what approach they took.
That's a really cool technique oggy, I don't think I've heard of anything exactly like it. If you haven't told the TLA+ community about it yet, I'm sure they'd like to learn about your approach, either on the mailing list[1] or at the next TLA+ conference[2] in 2026.
Thank you for the kind words! I haven't really talked about it anywhere yet since it's fresh off the press, I'll definitely post it on the mailing list.
A funny anecdote from a concert a few years ago: Allen was playing a solo, holding a note and blowing as hard as he could on his little soprano sax. Next thing you know, his teeth fly out, and there's a general commotion as the rest of the band goes searching for the denture on the stage.
Amazing to be alive at that age, to be touring and rocking it, that's another level.
Maybe not range per se, but there are a couple of Europe-specific things that make EVs less attractive. First, many people live in apartment buildings and park their cars on the street, with no charging facilities. So they'd have to make that supercharger trip pretty often. Second, on any given Saturday in July and August there will be millions of people driving for vacation (1000+ km not being uncommon), mostly to the Mediterranean or back to their country of origin (Eastern Europe, Balkans, Turkey). In this period I've already ended up queuing for 10-15 minutes for gas - I can imagine it would be worse without chargers. Also, people who take their car to the poorer European countries often do it because of poor public transport infrastructure there, so the likely poorer EV infrastructure there would play somewhat of a role as well (though likely minor).
TLA+ has also had an SMT-based backend, Apalache [1], for a few years now. In general, you encode your system model (which would be the Rust functions for Verus, the TLA model for Apalache) and your desired properties into an SMT formula, and you let the solver have a go at it. The deal is that the SMT language is quite expressive, which makes such encodings... not easy, but not impossible. And after you're done with it, you can leverage all the existing solvers that people have built.
While there is a series of "standard" techniques for encoding particular programming-language features into SMT (e.g., handling higher-order functions, which SMT solvers don't handle natively), the details of how you encode the model/properties are extremely specific to each formalism, and you need to be very careful to ensure that the encoding is sound. You'd need to go and read the relevant papers to see how this is done.
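For a flavor of what these encodings bottom out in, here is a tiny hand-written SMT-LIB query (not Apalache's actual output, which is far more involved) asking whether a one-step transition can violate an invariant; the solver answering `unsat` means no violating step exists:

```smtlib
; Can any step of "x' = x + 1" starting from x >= 0 violate x' >= 0?
(declare-const x Int)
(declare-const x_next Int)
(assert (>= x 0))              ; current state satisfies the invariant
(assert (= x_next (+ x 1)))    ; the transition relation
(assert (not (>= x_next 0)))   ; ...and the next state violates it?
(check-sat)                    ; unsat: the invariant is preserved
```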
"Verifying" and "proving" are synonymous in this case. You prove or verify that the system satisfies some specification, i.e., properties. Your normally write the properties yourself, but sometimes the properties are hardcoded (e.g., your code doesn't panic).
You don't need a particularly strong formal background in CS or maths to verify code, but it's a challenging task nevertheless.
1. The first challenge, where people often give up, is that you will often need to reason about your code in a different language than the code itself (not necessarily true for the Verus tool being discussed here). In that sense it helps a lot if you already know multiple programming languages, ideally ones using different paradigms.
2. The second challenge is that you will need to be actually precise in describing what you want your program to do. This is a lot harder than it sounds; there's no hand-waving, no "works in practice", but even beyond that, it's actually really difficult to express what you want from the program succinctly, and often not even possible or feasible. On the other hand, this is possibly the most practical skill you can currently take away from the verification tools; it absolutely can change the way you work, how you collect requirements, and how you go about writing code.
3. The third challenge is then actually performing the verification/proofs. This is currently painfully hard; most literature I've seen reports a ratio between 5x and 20x between the amount of code you're verifying (in terms of lines of code, say) and the number of "proof lines" you need to write in some form. This may make sense for extremely critical code where a bug can cost tons of money, e.g. aerospace, or huge companies with tons of users - AWS is doing proofs at a fairly large scale, and Apple is now doing the same at a smaller scale too. I would generally recommend NOT writing proofs if you can avoid it, but instead using tools that can do some kind of limited or heuristic verification. Which tools exactly I would recommend for a "working programmer" depends on what you're trying to do. For example, if you are writing concurrent or distributed code, I would recommend something like TLA and the associated TLC tool. For lower-level Rust code (data structures and similar), I'd recommend the Kani verifier over something like Verus. For many other languages, the choices are limited or non-existent.
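As a taste of the "limited verification" style: for small input spaces you can brute-force a property in plain code, which is conceptually what Kani does symbolically (and without the input-size limit). A sketch with illustrative names, exhaustively checking a classic overflow-prone midpoint computation:

```rust
// Property: for all u8 pairs with lo <= hi, `mid` returns the floor
// midpoint without overflowing. We enumerate the whole 8-bit input
// space as a brute-force stand-in for a symbolic proof.
fn mid(lo: u8, hi: u8) -> u8 {
    lo + (hi - lo) / 2 // hi - lo can't underflow when lo <= hi
}

fn check_all() -> bool {
    for lo in 0..=255u8 {
        for hi in lo..=255u8 {
            // Oracle: the floor midpoint computed in wider arithmetic.
            if u16::from(mid(lo, hi)) != (u16::from(lo) + u16::from(hi)) / 2 {
                return false;
            }
        }
    }
    true
}

fn main() {
    println!("property holds on all inputs: {}", check_all());
}
```

The naive `(lo + hi) / 2` would fail this check by overflowing in `u8`; catching exactly that class of bug, but symbolically and for all integer widths, is the kind of thing these tools are good at.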
Zero-knowledge proofs roughly allow you to convince someone that something is true without leaving them any wiser as to WHY it's true. Generally this is not what's interesting for software verification, but it's nevertheless extremely cool (because it's super counterintuitive). Most of the excitement indeed comes from blockchains, where these things are used, but it's debatable whether they're really practical. You can already use them today to transfer crypto around without revealing the identities of the sender or recipient, for example. However, they are still computationally quite expensive and only work because there's currently not that much volume in crypto.
Great to see this. I hope it takes off - Bazel is useful, but I really like the principled approach behind this one (see the Build Systems a la Carte paper), and Neil is scarily good from my experience of working with him, so I'd expect they've come up with something awesome.
One thing I find annoying with all of these general, language-agnostic build systems though is that they break the "citizenship" in the corresponding language. So while you can usually relatively easily build a Rust project that uses crates.io dependencies, or a Python project with PyPi dependencies, it seems hard to make a library built using Bazel/Buck available to non-Bazel/Buck users (i.e., build something available on crates.io or PyPi). Does anyone know of any tools or approaches that can help with that?
If you want to see a Bazel-to-PyPI approach taken a bit to the extreme, you can have a look at TensorFlow on GitHub to see how they do it. They don't use the above-mentioned build rule, I think because their build step is quite complicated (C/C++ stuff, CUDA/ROCm support, Python bindings, and multi-OS support all in one before you can publish to PyPI).
I use py_wheel to build packages to be consumed by data scientists in my company. It works well and is reasonably straightforward. Although the packages are pure Python so I haven’t had to deal with native builds.
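For the pure-Python case, the `py_wheel` rule from rules_python is indeed fairly direct. A hedged sketch of a BUILD file (target names and paths are made up):

```starlark
# Illustrative BUILD file: package a pure-Python py_library as a wheel.
load("@rules_python//python:defs.bzl", "py_library")
load("@rules_python//python:packaging.bzl", "py_wheel")

py_library(
    name = "mylib",
    srcs = glob(["mylib/**/*.py"]),
)

py_wheel(
    name = "mylib_wheel",
    distribution = "mylib",  # the name users would `pip install`
    version = "0.1.0",
    deps = [":mylib"],
)
```

Building the wheel target then produces a `.whl` artifact you can upload to PyPI like any other.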
I have a lot of respect for Neil, but I've been burned by the incompleteness and lack of surrounding ecosystem for his original build system Shake (https://shakebuild.com/). This was in a team where everyone knows Haskell.
I'm cautiously optimistic about this latest work. I'm glad that at least this isn't some unsupported personal project but something official from Meta.
I think of Shake as a library for implementing build systems, and I was hoping that libraries would emerge describing how to implement rules for languages like C++, and how they should compose so you can compile C++/Haskell/Python all together happily. A few libraries emerged, but the overall design never materialized. Sorry you got burned :(
Buck2 is at a higher level than Shake - the rules/providers concepts pretty much force you into a pattern of composable rules. The fact that Meta has lots of languages, and that we were able to release those rules, hopefully means it's starting from the point of view of a working ecosystem. Writing those rules took a phenomenal amount of effort from a huge range of experts, so perhaps it was naive that Shake could ever get there on only open source volunteer effort.
The “citizenship” point is really interesting. I’ve found these build systems to be really useful for solving problems in multi-language repos. They make it super easy to create all the build artifacts I want. However, in many ways, they make the source more difficult to consume for people downstream.
These kinds of tools are designed to work in monorepos, so you don't really rely on package management like you do with separate repos. This works really well for sharing code inside companies/entities. It doesn't work as well for sharing code between entities.
> One thing I find annoying with all of these general, language-agnostic build systems though is that they break the "citizenship" in the corresponding language
I mean, this is kind of the whole point. A language agnostic build system needs a way to express dependencies and relationships in a way that is agnostic to, and abstracts over, the underlying programming language and its associated ecosystem conventions.
That is only true if the output is an application.
If the output is libraries for some ecosystem (perhaps with bindings to something written in Rust or C), one needs to be able to build packages that others not invested in that build system can consume.