The "One Simple Trick" that lets you port most Python code straight to Rust is to take all the aliasing "object soup" references that Rust doesn't like, shove those objects in a Vec or a HashMap, and use indexes/keys instead of Rust references. This does require plumbing the Vec/HashMap around through different functions, but if you do it this way from the beginning it's not much extra work.
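For anyone who hasn't seen the pattern, here's a minimal sketch of what that looks like. The `Node` and `Graph` names are made up for illustration. The point is that nodes refer to each other by `usize` index, and the `Graph` that owns the Vec is the thing that gets plumbed through functions.

```rust
// Hypothetical example types, invented for this sketch.
struct Node {
    label: String,
    neighbors: Vec<usize>, // indexes into Graph::nodes, not references
}

// The one owner of every node. This is what gets passed around.
struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    fn new() -> Self {
        Graph { nodes: Vec::new() }
    }

    // Returns the index of the new node, which acts as its "pointer".
    fn add_node(&mut self, label: &str) -> usize {
        self.nodes.push(Node {
            label: label.to_string(),
            neighbors: Vec::new(),
        });
        self.nodes.len() - 1
    }

    // A two-way link. With real references this cycle would not compile.
    fn connect(&mut self, a: usize, b: usize) {
        self.nodes[a].neighbors.push(b);
        self.nodes[b].neighbors.push(a);
    }

    fn neighbor_labels(&self, id: usize) -> Vec<&str> {
        self.nodes[id]
            .neighbors
            .iter()
            .map(|&n| self.nodes[n].label.as_str())
            .collect()
    }
}
```

Nodes are only ever pushed here, never removed, so the indexes stay valid for the life of the `Graph`.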
At that point, what's the benefit of using Rust over basically any other high-level language that has ADTs and pattern matching, since you lose all the benefits of lifetime tracking that are unique to Rust?
> you lose all the benefits of lifetime tracking that are unique to Rust
I think I understand what you're referring to, but stop me if I get this wrong. If you put a bunch of objects in a Vec and refer to them by index, you have to be careful with operations like .insert() and .remove() that shift Vec elements around and change their indexes. Also if you make each element an Option<T> to support deletion without .remove(), you still have to be careful about indexes of deleted objects, because you might put a new object in the same spot later. I have several thoughts about this:
- If "memory safety without garbage collection" is one of the unique features of Rust, you still have that. A reused Vec index in safe Rust code will never corrupt memory or trigger a segfault or anything like that. It's strictly a "logic bug".
- These logic bugs only apply specifically to the objects you put in a Vec and track by index. Creating a Vec<Foo> and getting an index mixed up doesn't change anything about how Rust handles your local variable of type Bar (or for that matter, your local variable of type Foo).
- These bugs are understandable for beginners, and you can follow what's going on with print statements. Contrast that with a dangling pointer into a C++ std::vector that just reallocated, where printing the freed object often appears to show good data, and explaining the bug means talking about malloc/new and the heap. Relatedly, unwrapping a None in Rust will always panic and will never coincidentally appear to work. (Your comment was comparing Rust to higher level languages, but I've heard similar comparisons to C++.)
- There are good options for fixing the problem. Beginners might prefer to put their objects in a HashMap with an incrementing key. There's a performance cost to that, but it's familiar and relatively footgun-free. More advanced folks might reach for something like "generational indexes into a slab", which gives you back some performance in exchange for making you learn a bunch of new buzzwords and figure out which implementation to choose from crates.io.
So yes, with those caveats, putting objects into a Vec does turn some compile-time bugs into runtime bugs. That is a downside, much like Rc/Arc/RefCell/Mutex come with runtime downsides. But I've heard folks describe it in terms like "turning off the borrow checker", and I don't think that's a very accurate way of describing it.
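To make the index-reuse caveat concrete, here's a small sketch, assuming a Vec<Option<String>> and a hypothetical `insert` helper that fills the first free slot. The stale index is detectable right after the delete, but once the slot is reused it quietly points at a different object. That's the logic bug. No memory is corrupted at any point.

```rust
// Hypothetical helper: put a value in the first free slot, or push a new one.
fn insert(slots: &mut Vec<Option<String>>, value: &str) -> usize {
    if let Some(i) = slots.iter().position(|s| s.is_none()) {
        slots[i] = Some(value.to_string());
        i
    } else {
        slots.push(Some(value.to_string()));
        slots.len() - 1
    }
}

// Returns what the stale index sees right after the delete, and after reuse.
fn stale_index_demo() -> (Option<String>, Option<String>) {
    let mut slots = Vec::new();
    let rex = insert(&mut slots, "Rex");

    // "Delete" Rex without shifting any other elements.
    slots[rex] = None;
    let after_delete = slots[rex].clone(); // None, so the mistake is visible

    // The next insert reuses Rex's old slot.
    insert(&mut slots, "Fido");
    let after_reuse = slots[rex].clone(); // Some("Fido"): the wrong dog

    (after_delete, after_reuse)
}
```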
As you say, I was specifically comparing to high-level memory-safe languages, not C++. You can't get a dangling reference in Java or C#. Not only that, but you can't get into a situation where some reference points to one object at some point of time, and to another object at a different time - but you can with indices.
The reason why I describe it as "turning off the borrow checker" is because that's exactly what it is - those indices are pointers semantically, but there's no ownership tracking for them. So for them, you turned off the checker. The more you use them, the less checked your code is. If you use this approach in some isolated piece of code, it's one thing. But if it's the go-to solution, then it's reasonable to wonder why you'd do that instead of using a language that has built-in ergonomic safe references with no borrow checking.
Regular Vec indices yes, but not incrementing HashMap keys or other fancier things, if we set aside overflow issues.
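As one example of the "fancier things", here's a rough sketch of generational indexes. The `Handle` and `Slab` names are invented for illustration and this isn't any particular crate's API. Each slot carries a generation counter that gets bumped on removal, so a handle to a deleted object stops resolving even after the slot is reused.

```rust
// Hypothetical types for this sketch; real crates differ in the details.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle {
    index: usize,
    generation: u64,
}

struct Slot<T> {
    generation: u64,
    value: Option<T>,
}

struct Slab<T> {
    slots: Vec<Slot<T>>,
}

impl<T> Slab<T> {
    fn new() -> Self {
        Slab { slots: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Handle {
        // Linear scan for a free slot, to keep the sketch short.
        // A real implementation would likely keep a free list instead.
        if let Some(index) = self.slots.iter().position(|s| s.value.is_none()) {
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            return Handle { index, generation: slot.generation };
        }
        self.slots.push(Slot { generation: 0, value: Some(value) });
        Handle { index: self.slots.len() - 1, generation: 0 }
    }

    fn remove(&mut self, handle: Handle) -> Option<T> {
        let slot = self.slots.get_mut(handle.index)?;
        if slot.generation != handle.generation {
            return None; // stale handle
        }
        let value = slot.value.take()?;
        slot.generation += 1; // invalidates every outstanding handle to this slot
        Some(value)
    }

    fn get(&self, handle: Handle) -> Option<&T> {
        let slot = self.slots.get(handle.index)?;
        if slot.generation == handle.generation {
            slot.value.as_ref()
        } else {
            None
        }
    }
}
```

With this, an old handle and a new handle can share the same `index` and still never be confused for one another, again setting aside overflow of the generation counter.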
> But if it's the go-to solution
Oh yeah, I should clarify this point. Rust definitely prefers to use simple ownership (i.e. the ownership/reference graph is a tree) wherever possible. As an example, say we've got a Python program with Person objects and Dog objects. Each Person has a `pets` collection that might contain some dogs, and each Dog has an `owner` field that points back to their Person. When we port this program from Python to Rust, Rust isn't going to be happy with the circular relationships between these types, and trying to implement `pets` or `owner` with references probably won't compile.
In cases like this, the most ideal, most idiomatic, go-to option is to break the cycle and try to achieve simple ownership. In this case, that would probably mean making the `pets` collection hold Dogs by value, and removing the `owner` field entirely. Any Dog methods that previously referenced `owner` would need a short-lived reference passed in as an extra argument now, or maybe we could change some of them into Person methods. If we can express our program in this style, that's almost certainly what we want to do.
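Here's roughly what I mean, as a sketch. The `tag_text` method is a made-up stand-in for "any Dog method that previously referenced `owner`".

```rust
struct Dog {
    name: String,
    // No `owner` field. The cycle is gone.
}

struct Person {
    name: String,
    pets: Vec<Dog>, // Dogs held by value
}

impl Dog {
    // The owner arrives as a short-lived reference instead of a stored field.
    fn tag_text(&self, owner: &Person) -> String {
        format!("{} (belongs to {})", self.name, owner.name)
    }
}

impl Person {
    // Alternatively, the logic can live on Person, which already has both sides.
    fn tag_texts(&self) -> Vec<String> {
        self.pets.iter().map(|dog| dog.tag_text(self)).collect()
    }
}
```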
But there are lots of programs where this doesn't work, at least not everywhere. Maybe a Dog can have multiple owners. Maybe a Dog can have no owner at all. Maybe Dogs want to track their relationships with other Dogs. If people and dogs are independent entities walking around in a game world, or if they represent rows from a couple of tables in some relational database, we probably have lots of problems like this. This is where we start reaching for patterns like Rc<RefCell>/Arc<Mutex> or indexes pointing into Vecs and HashMaps. (I think it's interesting that those Vecs and HashMaps look a lot like db tables.)
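For the Rc<RefCell> flavor, a minimal sketch of the "Dog can have multiple owners" case might look like this. The `adopt` and `owner_names` helpers are hypothetical, and using Weak for the back-pointers is my choice here, so that the Person/Dog cycle doesn't leak.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Dog {
    name: String,
    // Weak back-pointers: a Dog doesn't keep its owners alive.
    owners: Vec<Weak<RefCell<Person>>>,
}

struct Person {
    name: String,
    pets: Vec<Rc<RefCell<Dog>>>, // shared: several people can hold the same Dog
}

fn adopt(person: &Rc<RefCell<Person>>, dog: &Rc<RefCell<Dog>>) {
    person.borrow_mut().pets.push(Rc::clone(dog));
    dog.borrow_mut().owners.push(Rc::downgrade(person));
}

fn owner_names(dog: &Rc<RefCell<Dog>>) -> Vec<String> {
    let mut names = Vec::new();
    for weak in &dog.borrow().owners {
        // upgrade() returns None if that Person has already been dropped.
        if let Some(person) = weak.upgrade() {
            names.push(person.borrow().name.clone());
        }
    }
    names
}
```

The runtime cost shows up in the `borrow()`/`borrow_mut()` calls, which panic if the borrows ever conflict, and in `upgrade()`, which has to be checked every time.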
A point I want to emphasize here, though, is that even when the most important relationships in our program use these patterns, the majority of our object relationships are probably still simple. If each Person has a `name`, that's still an owned string. If they have an `age`, that's still a regular integer. When our program reads config values from the filesystem, all of our file handles and protocol data still follow simple ownership rules, use destructors for cleanup, and definitely don't get aliased anywhere.
In contrast, if we port our program to Java (or keep it in Python), we can't statically guarantee any simple ownership. Any time we pass a Person or a Dog or a HashMap or a byte buffer to some function, we might worry about that function retaining a reference to it, and we might start making defensive copies or reaching for immutable types. We can still use lots of simple ownership, and for most of our objects we probably do, but we've lost the benefit of a compiler that can check that for us.
Kind of a tangent: When we make aliasing mistakes -- which we can do in Rust in these fancy arrangements, or in Java/Python whenever our data is mutable -- that usually leads to "spooky action at a distance" bugs. Some operation on `foo` has mysterious side effects on `bar`, etc. We've all been there. But I think where these bugs graduate from "annoying" to "insanity-inducing", is when multiple threads get involved. That's when we really want the option of simple, statically-checked ownership, and that's where I think Rust-isms like "Mutex owns the data it protects" are a big improvement over the tools we had before.
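A tiny sketch of what "Mutex owns the data it protects" means in practice (the `count_in_parallel` function is invented for illustration): the counter lives inside the Mutex, so the only way to reach it is through the guard that `lock()` returns.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical example: several threads increment one shared counter.
fn count_in_parallel(threads: usize, increments: u64) -> u64 {
    // The u64 is inside the Mutex. There is no separate, unprotected copy.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The lock guard is the only path to the data.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

Forgetting to take the lock isn't a bug you can write here, because code that skips `lock()` has nothing to dereference.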
Logic bugs of this sort are arguably way worse than some of the memory corruption errors you might see in C. The latter can be caught by fairly mechanical use of Valgrind and the litany of similar tools, while in the former's case you first have to notice that you even have an error, and then track it down.
Sure, a beginner may not know about these tools, but my point is about how mechanical fixing those problems is: it takes almost no thinking.
I do agree with that point as far as it goes. There are definitely cases (like single-player games) where logic bugs like these suck up more developer time than memory corruption bugs. But I'd push back in a couple ways:
- Much of the time, maybe even most of the time, memory corruption means security vulnerabilities. Logic bugs can be security bugs too, of course, but at least we can reason about them in local terms like "Is this particular part of the program security-sensitive?" Memory corruption doesn't admit the same kind of local reasoning. For any program that touches input from the internet, that's a huge deal.
- Just like C programmers can reach for Valgrind, Rust programmers can reach for design patterns that are more robust than Vec<T>. In this case, HashMap<u64, T> with an incrementing key is a very robust pattern. In practice (until you overflow that u64), that pattern catches 100% of your use-after-free-style mistakes. And I think a major advantage of Rust's approach here compared to Valgrind/ASan, is that it runs in production, so it works even if your test coverage isn't amazing.
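Here's a minimal sketch of that HashMap<u64, T> pattern. The `Registry` name and its methods are hypothetical. Since keys are never reused, looking up a removed object just gives you None.

```rust
use std::collections::HashMap;

// Hypothetical wrapper around the "incrementing key" pattern.
struct Registry<T> {
    next_id: u64,
    items: HashMap<u64, T>,
}

impl<T> Registry<T> {
    fn new() -> Self {
        Registry { next_id: 0, items: HashMap::new() }
    }

    fn insert(&mut self, value: T) -> u64 {
        let id = self.next_id;
        self.next_id += 1; // ids are never handed out twice (ignoring u64 overflow)
        self.items.insert(id, value);
        id
    }

    fn remove(&mut self, id: u64) -> Option<T> {
        self.items.remove(&id)
    }

    // A stale id returns None instead of somebody else's object.
    fn get(&self, id: u64) -> Option<&T> {
        self.items.get(&id)
    }
}
```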