The fact that they've hosted it on GitHub means they've agreed to GitHub's terms, which allow them (via OpenAI) to train on the code.
Also it's pretty hilarious to vibe-code a library that clones another library that someone has spent decades of work on, and then try to prohibit people from using that LLM output as training data for an LLM.
A programming language is a medium to communicate programs to something that can execute them. That isn't exactly the same thing as a tool. A tool in my book is a metaphor for a program that helps achieve some well-defined task. Even if we ignore this difference, we would still want to talk about tool safety.
In my experience there is a C++ mob that hates Rust. These are the people who dismiss statements of fact as ideology. No good-faith dialogue is possible.
There are also competent C++ programmers who misunderstand or don't know how static checking works.
I have also seen ordinary programmers who are completely surprised by a statement like "C++ is all unsafe" and find it too strong. Using the word "safe" with a technical meaning throws people off because, sadly, not everyone who writes code is an academic PL researcher.
"Safe", in Rust and much PL research, means "statically checked by the compiler to be free of UB". If you are pedantic, you need to add "... under the assumption that the programmer checked all conditions for the code that is marked `unsafe`" for Rust. That is all there is to it. Scientific definition.
C++ in its current form is full of gross design mistakes, many of which could be corrected at the price of breaking backwards compatibility. Mistakes happen, even to world-leading PL researchers (see the ML language and polymorphic references), which is why the field embraced mechanically checked proofs. The difference is the willingness to address mistakes.
Academics use "safe" in exactly the meaning the Rust community uses. If you don't understand this, go and educate yourself.
Academics need to communicate effectively, which leads to everyday words taking on technical meanings, or to made-up words and jargon.
Maybe a statically checked safe low-level language is marketing genius. It is also a technical breakthrough building on decades of academic research, and took a lot of effort.
Bjarne and friends chose a different direction. Safety was not a design goal originally but doubling down on this direction means that C++ is not going to improve. These are all facts.
Backwards compatibility is a constraint. Constraints don't give anyone license to stop people who don't have those constraints.
We don't have to feel any moral obligation to use statically checked languages for programs. But claiming that static checking does not make a difference is ignorant, and attaching value to one's ignorance certainly seems like an indicator for ideology and delusion.
The approach described here - modelling things as a graph - can really be applied to almost any domain.
If you are into this type of modelling, you may find value in Mangle, a Datalog-based logic programming language and deductive database library. You do not need to invent dozens of DSLs; you can do it all in one. And without all the RDF trouble.
My trouble with separate categories "memory safety technology" and "sandboxing technology" is that something like WASM execution is both:
* Depending on how WASM is used, one gets safety guarantees. For example, memory is not executable.
* Privileges are reduced, as a WASM module interacts with the environment only through the WASM runtime and the embedder.
Now, when one compiles C to WASM one may well compile things with bugs. A memory access bug in C is still a memory access bug, but its consequences can be limited in WASM execution. Whether fail-stop behavior is guaranteed actually depends on the code the C compiler generates and the runtime (allocation/deallocation, concurrency) it sets up.
So when we enumerate immediately available security options and count WASM as sandboxing, this is not wrong. But since WASM is an execution environment, one could do a lot of things with it, including compiling and executing C in a way that panics when a memory access bug is encountered.
Say your C program has sensitive information in module A and a memory safety bug in module B. Running that program in wasm won’t prevent the attacker from using the bug in B to get read/write access to the data in A.
In practice, what the attacker will really do is use the memory safety bug to achieve weird execution: even without control over the program counter, a memory safety bug inside the wasm memory gives read/write access to all of that memory, so the attacker can make the program do whatever they want, subject to the wasm sandbox limits (i.e. whatever the host allows the wasm guest to do).
Basically, wasm amounts to a lightweight and portable replacement for running native code in a sufficiently sandboxed process.
Your general point stands - wasm's original goal was mainly sandboxing - but
1. Wasm does provide some amount of memory safety even to compiled C code. For example, the call stack is entirely protected. Also, indirect calls are type-checked, etc.
2. Wasm can provide memory safety if you compile to WasmGC. But, you can't really compile C to that, of course...
Correct me if I'm wrong, but with LLVM on Wasm, I think casting a function pointer to the wrong type will result in you calling some totally unrelated function of the correct type? That sounds like the opposite of safety to me.
I agree about the call stack, and don't know about GC.
That is incorrect about function pointers: The VM does check that you are calling the right function type, and it will trap if the type does not match.
Here it is in the spec:
> The call_indirect instruction calls a function indirectly through an operand indexing into a table that is denoted by a table index and must have type funcref. Since it may contain functions of heterogeneous type, the callee is dynamically checked against the function type indexed by the instruction’s second immediate, and the call is aborted with a trap if it does not match.
(Other sandboxing approaches, including related ones like asm.js, do other things, some closer to what you mentioned. But wasm has strict checking here.)
"Depends on how it is used" is already a sign that WebAssembly isn't really as safe as many of its advocates sell it to be, compared to other bytecode formats.
Like, C is actually really safe, it only depends on how it is being used.
People only have to enumerate the various ways and tools to write safe code in C.
WASM is just a bytecode format for a stack-based VM. Granted, it is weirdly named; the actual "Assembly" equivalent is WAT.
But the point is, it is a format specification, which has nothing to do with safety. You can implement a totally unsafe WASM runtime if you so choose. Personally I think it's not a bad thing, at least we have something like it that can run in a browser environment. But I am curious to know why you dislike it so much.
> including a way of compiling and executing C that panics when a memory access bug is encountered.
WASM couldn’t do that, because it doesn’t have a sense of the C memory model, nor does it know what is and isn’t safe - that information has long been lost. That kind of protection is precisely what Fil-C is doing.
WASM is memory safe in that you can’t escape the runtime. It’s not memory safe in that you can still exploit the program running within the sandbox, which you can’t do with a memory safe language like Rust or Fil-C.
I remember a Luca Cardelli paper that explores a language with "type:type" and it contains a sentence roughly expressing: "even if the type system is not satisfying as a logic, it offers interesting possibilities for programming"
Small nit: as someone curious about a definition of memory safety, I had come across Michael Hicks' post. He does not use the list of errors as the definition, and argues that such a definition lacks rigor - and he is right. He says:
> Ideally, the fact that these errors are ruled out by memory safety is a consequence of its definition, rather than the substance of it. What is the idea that unifies these errors?
He then offers a technical definition (model) involving pointers that come with capability of accessing memory (as if carrying the bounds), which seems like one way to be precise about it.
I have come to the conclusion that language safety is about avoiding untrapped errors, also known as "undefined behavior". This is not at all new, it just seems to have been forgotten or was never widely known somehow. If interested, find the argument here https://burakemir.ch/post/memory-safety-the-missing-def/
What's important is the context in which the term is used today: it's specifically about security and software vulnerabilities, not about a broader notion of correctness and program reliability. Attempts to push past that have the effect of declaring languages like Java and Python memory-unsafe, which is not a defensible claim.
This is a false dichotomy. Language design choices are the causes of security and software vulnerabilities. It is possible to recognize the value of GC languages and have precise technical terminology at the same time. We can invent new words.
I believe everyone who cares about memory safety appreciates that certain bugs cannot occur in Java and Go, and if the world calls that memory safe, that is OK.
There are hard, well-defined guarantees that a language and implementation must make, and a space of trade-offs. We need language and recognition for the ability to push the boundary of hard, well-defined guarantees further. That, too, is memory safety and it will be crucial for moving the needle beyond what can be achieved with C and C++.
No one has a problem with applications being ported from low-level to GC-ed languages, the challenge is the ones where this is not possible. We need to talk about memory safety in this specific context, and mitigations and hardening will not solve the entire problem, only pieces of it.
The urgent problem is the problem settings where GC'd languages are not a good fit, including kernels and userland-kernels (AKA browsers). The problem is not that GC'd languages are insufficiently memory-safe.
Finally a good use case for decentralized technology? From https://www.eid.admin.ch/en/technology
"The e-ID architecture is based on a decentralised identity model that gives users full control over their identity and personal data. There is no central authority that aggregates, stores or controls credentials. Data flows occur directly and in a decentralised manner between the holder and an issuer or verifier. Linkability of usage across different services is technically restricted. Interactions between different actors also cannot be directly linked. During a verification process, the holder shares only the necessary data directly with a verifier, without the issuer being informed."
The verifier is the entity you hand your information to for verification, i.e. the CA. The extent of your interaction and linkage with them is mainly at the point of verification and issuance.
It is however possible to trace a certificate to its issuer, which on the surface sounds like a bad thing, but is in fact good if the goal is to provide privacy while ensuring accountability.
I mean in this case there is only one issuer, the Swiss state, so is that really a big deal? Ultimately the government is and should be the provider of identity.
Depends on the credential being issued. For the digital identity document it is the federal state, but the cantons (states) or even corporations are able to issue their own credentials using their own register.
Very interesting, that sounds a whole lot better than I envisaged. In theory your drivers license, passport, social number can all remain separate (not that it's necessarily set up like that in Switzerland). Thanks.
The privacy story of this looks better than the Norwegian BankID approach.
I would like it if Norway moved in this direction, and I think that, through the ongoing alignment with the EU-wide program on digital ID, that might happen.
MIT plus a condition that designates OpenAI and Anthropic as restricted parties that are not permitted to use it - or else?