(author here) I'm one of the maintainers of HashiCorp's Nomad, so that example was likely inspired by the separation of duties that's part of our security model. In that environment, there's a subset of task (e.g. container) configuration that's controlled by the cluster admin and a subset that's controlled by the job author deploying onto the cluster.
Author here! The motivating example of this post is frankly pretty lousy in retrospect (and was even so soon after writing, given the friendly reminder from Giovanni Campagna that `socket` wasn't one of the io_uring opcodes). At best this is an interesting limitation of seccomp. Maybe relevant if you were using gVisor?
This architecture is roughly how HashiCorp's Nomad, Consul, and Vault are built (I'm one of the maintainers of Nomad). While it's definitely a "weird" architecture, the developer experience is really nice once you get the hang of it.
The in-memory state can be whatever you want, which means you can build up your own application-specific indexing and querying functions. You could just use sqlite with :memory: for the Raft FSM, but if you can build/find an in-memory transaction store (we use our own go-memdb), then reading from the state is just function calls. Protecting yourself from stale reads or write skew is trivial; every object you write has a Raft index so you can write APIs like "query a follower for object foo and wait till it's at least at index 123". It sweeps away a lot of "magic" that normally you'd shove into a RDBMS or other external store.
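To make the "wait till it's at least at index 123" idea concrete, here's a minimal sketch in Python (hypothetical names; Nomad's real implementation is in Go, built on its Raft library and go-memdb, and considerably more involved):

```python
import threading

class FSM:
    """Toy Raft FSM whose reads can wait for a minimum applied index.

    Illustrates the "query a follower for object foo and wait till it's
    at least at index N" pattern; not Nomad's actual code.
    """

    def __init__(self):
        self._applied = threading.Condition()
        self.applied_index = 0
        self.state = {}  # object name -> (raft_index, value)

    def apply(self, index, name, value):
        # Called by the Raft library as committed log entries are applied.
        with self._applied:
            self.state[name] = (index, value)
            self.applied_index = index
            self._applied.notify_all()

    def get_at_least(self, name, min_index, timeout=5.0):
        # Block until this node has applied min_index, then read. Because
        # every written object carries its Raft index, the caller can tell
        # a stale read from a fresh one without touching an external store.
        with self._applied:
            ok = self._applied.wait_for(
                lambda: self.applied_index >= min_index, timeout=timeout)
            if not ok:
                raise TimeoutError("follower is behind index %d" % min_index)
            return self.state.get(name)

fsm = FSM()
fsm.apply(122, "foo", "old")
fsm.apply(123, "foo", "new")
idx, value = fsm.get_at_least("foo", 123)
print(idx, value)  # 123 new
```

Reading the state really is "just function calls" here: no query language, no round trip, and the index comparison is the entire consistency check.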
That being said, I'd be hesitant to pick this kind of architecture for a new startup outside of the "infrastructure" space... you are effectively building your own database here. You need to pick (or write) good primitives for things like your inter-node RPC, on-disk persistence, in-memory transactional state store, etc. Upgrades are especially challenging, because the new code can try to write entities to the Raft log that nodes still on the previous version don't understand (or worse, misunderstand because the way they're handled has changed!). There's no free lunch.
>You could just use sqlite with :memory: for the Raft FSM
That's the basic design that rqlite[1] had for its first ~7 years. :-) But rqlite moved to on-disk SQLite, since with WAL mode and with 'PRAGMA synchronous=OFF' [2] it is about as fast as writing to RAM. Or at least close enough, and it avoids all the limitations that come with :memory: SQLite databases (a max size of 2GB being one). I should have just used on-disk mode from the start, but only now know better.
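For concreteness, the two pragmas mentioned above look like this (a minimal Python/sqlite3 sketch, not rqlite's actual Go code):

```python
import os
import sqlite3
import tempfile

# Sketch of the on-disk configuration described above: WAL journaling plus
# synchronous=OFF. Skipping fsync is safe in rqlite's design only because
# the SQLite file is not the authoritative store -- it can be rebuilt from
# the fsync'ed Raft log after a crash.
path = os.path.join(tempfile.mkdtemp(), "fsm.db")
conn = sqlite3.connect(path)

# journal_mode=WAL returns the resulting mode; it requires an on-disk
# database (a :memory: database stays in "memory" journal mode).
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("PRAGMA synchronous=OFF")  # don't fsync on the SQLite side

conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO kv VALUES ('hello', 'world')")
conn.commit()
print(mode)  # wal
```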
(I'm guessing you may know some of this because rqlite uses the same Raft library [3] as Nomad.)
As for the upgrade issue you mention, yes, it's real. Do you find it in the field much with Nomad? I've managed to introduce new Raft Entry types very infrequently during rqlite's 10 years of development, and only once did someone hit it in the field with rqlite. Of course, one way to deal with it is to first release a version of one's software that understands the new types but doesn't ever write them. And once that version is fully deployed, upgrade to the version that actually writes the new types too. I've never bothered to do this in practice however, and it requires discipline on the part of the end-users too.
[2] This might sound dangerous but in the current design of rqlite, the underlying SQLite database is completely rebuilt from the Raft log on startup (which is fsync'ed on every write). So any corruption of the SQLite database due to power loss, etc. is moot, since the SQLite database is not the authoritative store of data in rqlite.
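The "read the new types first, write them later" rollout described above can be sketched roughly like this (illustrative entry types and version numbers, not rqlite's or Nomad's wire format):

```python
# Two-phase upgrade sketch: version N ships code that *understands* a new
# Raft entry type but never writes it; version N+1 starts writing it only
# once the whole cluster is at least version N.

ENTRY_PUT = 1
ENTRY_BATCH_PUT = 2  # the hypothetical "new" entry type

def apply_entry(state, entry):
    """Every version from N onward can apply both entry types."""
    if entry["type"] == ENTRY_PUT:
        state[entry["k"]] = entry["v"]
    elif entry["type"] == ENTRY_BATCH_PUT:
        state.update(entry["kvs"])
    else:
        raise ValueError("unknown entry type %r" % entry["type"])
    return state

def encode_put(kvs, min_cluster_version):
    """Only *write* the new type once every node understands it."""
    if min_cluster_version >= (8, 0) and len(kvs) > 1:
        return [{"type": ENTRY_BATCH_PUT, "kvs": dict(kvs)}]
    # Fall back to the old wire format for mixed-version clusters.
    return [{"type": ENTRY_PUT, "k": k, "v": v} for k, v in kvs]

state = {}
for e in encode_put([("a", 1), ("b", 2)], min_cluster_version=(7, 9)):
    apply_entry(state, e)
print(state)  # {'a': 1, 'b': 2}
```

The discipline problem is visible in the sketch: nothing enforces that every node really is past version N before `encode_put` starts emitting the new type; the operator has to guarantee that.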
> I should have just used on-disk mode from the start, but only now know better.
Yeah, I saw the recent post about reducing rqlite disk space usage. Using the on-disk SQLite as both the FSM and the Raft snapshot makes a lot of sense here. I'm curious whether you've had concerns about write amplification, though. Because we have only the periodic Raft snapshots and the FSM is in-memory, during high write volumes we're only really hammering disk with the Raft logs.
> Do you find it in the field much with Nomad? I've managed to introduce new Raft Entry types very infrequently during rqlite's 10-years of development, only once did someone hit it in the field with rqlite.
My understanding is that rqlite Raft entries are mostly SQL statements (is that right?). Where Nomad is somewhat different (and probably closer to the OP) is that the Raft entries are application-level entries. For entries that are commands like "stop this job"[0] upgrades are simple.
The tricky entries are where the entry is "upsert this large deeply-nested object that I've serialized", like the Job or Node (where the workloads run). The typical bug here is you've added a field way down in the guts of one of these objects that's a pointer to a new struct. When old versions deserialize the message they ignore the new field and that's easy to reason about. But if the leader is still on an old version and the new code deserializes the old object (or your new code is just reading in the Raft snapshot on startup), you need to make sure you're not missing any nil pointer checks. Without sum types enforced at compile time (i.e. Option/Maybe), we have to catch all these via code review and a lot of tedious upgrade testing.
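A toy illustration of that hazard (in Python rather than Go, with made-up field names; in Go the equivalent of the `.get(...)` fallback is an explicit nil-pointer check):

```python
# Old Raft entries and snapshots were written before "restart_policy"
# existed, so new code must treat that nested field as possibly absent.

def decode_job(raw):
    """Decode a serialized Job, tolerating entries written by older versions."""
    return {
        "id": raw["id"],
        "task": {
            "image": raw["task"]["image"],
            # Field added in a later version; old entries simply lack it,
            # so substitute the documented default instead of crashing.
            "restart_policy": raw["task"].get("restart_policy")
                              or {"attempts": 3, "interval_s": 60},
        },
    }

old_entry = {"id": "web", "task": {"image": "nginx:1.25"}}   # pre-upgrade
new_entry = {"id": "web", "task": {"image": "nginx:1.25",
                                   "restart_policy": {"attempts": 5,
                                                      "interval_s": 30}}}
print(decode_job(old_entry)["task"]["restart_policy"]["attempts"])  # 3
```

Miss one of these fallbacks three structs deep and the failure only shows up when a node restores an old snapshot or follows an old leader, which is exactly why it takes tedious upgrade testing to catch.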
> it requires discipline on the part of the end-users too.
Oh for sure. Nomad runs into some commercial realities here around how much discipline we can demand from end-users. =)
>I'm curious whether you've had concerns about write amplification though?
I mean, yes, the more disk IO rqlite has to do, the more write performance will be affected. However I believe the advantages of running with an on-disk SQLite database are worth it. In addition rqlite supports storing the SQLite database file on a memory-backed file system if users really want that[1]. That can help squeeze more write throughput out of rqlite.
>My understanding is that rqlite Raft entries are mostly SQL statements (is that right?).
That's right, rqlite does statement-based replication, though I'm currently looking into extending it so it also does changeset[2] replication where it makes sense.
Like you, I'm more open to the idea of keeping data in memory than most of the responders here. When I got to the part of the article about how they're using Common Lisp with hot reloading, I was thinking, well, you guys can do whatever you want, but not everybody is working on that team, ha.
Yes indeed! But this doesn't apply to a startup in the Explore phase, where you don't need replication; that's how we did it for a long time. This is the phase where this architecture is most useful for product iteration.
But you're right, once you start using replication in the Expand phase there certainly are engineering challenges, but they're all solvable. It might help that in Common Lisp we can hot-reload code, which makes some migrations a lot easier.
I'm the lead developer for Joyent of ContainerPilot, which is the tool at the core of our Autopilot Pattern implementation examples. The lifecycle events you recognize in Distelli are definitely similar. And Chef's new tool Habitat has a supervisor that was independently developed but ended up having interesting parallels with ContainerPilot. So there's a universal idea lurking under there, which is why we called Autopilot a "Pattern" rather than a tool in itself.
But it's not clear to me from a casual glance at the docs whether Distelli lives inside the container during those hooks? Part of what distinguishes the Autopilot Pattern is making the higher-level orchestration layer as thin as possible.
(As far as the root, some of it is derived from my experiences as a perhaps-foolishly-early adopter of Docker in prod at my previous gig at a streaming media startup. The rest is derived from both the principles with which Joyent's own Triton infra is built and our experiences speaking with enterprise devs and ops teams.)
I'm the founder at Distelli and I just want to clarify that the Distelli agent doesn't typically live inside the container, though it can. It's used to orchestrate the container lifecycle on the VM itself.
However if you're building Docker containers and deploying them we recommend using Kubernetes which is something that Distelli supports out of the box now - https://www.distelli.com
Thanks for sharing those details, great to have more insight into the process.
In my case the Distelli agent does live inside the "container", because I'm using SmartOS instances and not Docker containers. It handles deployment, and monitors processes of the apps when I'm not using an SMF.
I'm not sure how Distelli's K8s orchestration works, that functionality is more recent. In my case, the lifecycle details are in the manifest in the app repo, which is just a YAML where each lifecycle section is a bash script. App builds are just tarballs in S3. So there's not much to the deployment process.
> It’s not just deployment, but provides facilities for sharing code and extracting telemetry from your running apps. We believe that operations should be coupled with development, that developer tools should also be operational tools. For this reason, we’re keen to ensure that telemetry — metrics, logging, analytics, distributed tracing — is built-in from the beginning.
I think this is the most important problem IOPipe is trying to solve. Portability between platforms is a good goal but achievable with some discipline about avoiding the proprietary bits of the platform. Making "serverless" software operable to the same (or better) degree than traditional software deployment is the key.
I don't think you'll find 0.2c relative velocities except near very energetic phenomena, especially not of things that are too small to easily see coming.
I think it should be clear from the context that this discussion was about macroscopic objects. Cosmic rays don't carry the kinetic energy equivalent of a medium size nuke.
Their speed isn't particularly relevant, what matters more is their energy. As big as that number is to us, it's still pretty insignificant; there's a lot of objects already traveling around with us in our own solar system with 70kT (per PaulHoule in other comments) of energy. A looooooot of objects. In the grand scheme of things, this is a rounding error on a rounding error. We are small.
It would be nice if they could use solar wind to decelerate as they approached. I know there's not enough energy there to insert them into an orbit, but they might get time for a few more pictures?
The short version is: remember the old Bussard ramscoop idea? You use a magnetic field to collect interstellar hydrogen which you then fuse for thrust? Turns out that in our part of the galaxy, you get more drag from the sail than you do from the fusion thrust, so the idea was scrapped.
An embarrassingly long time later people finally realised that they'd invented a fuelless brake, and the idea was resurrected (but without the fusion drive). The maths are quite plausible and the sail itself trivially simple --- just a wire loop.
However, I don't think they'd be compatible with this idea --- I suspect you wouldn't get one big enough to be useful in a one gram package. But estimating the numbers is beyond me. Here's the paper if you want it. http://www.niac.usra.edu/files/studies/final_report/320Zubri...
Why decelerate? At 0.2c it takes around 40 minutes to cross 1AU. (Ignoring time dilation which is not super significant at 0.2c.) That gives ample time to take photos from a moderate distance.
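Quick sanity check on that figure:

```python
# Back-of-envelope check of the crossing time quoted above.
C = 2.998e8        # speed of light, m/s
AU = 1.496e11      # astronomical unit, m

v = 0.2 * C                 # probe velocity
t_seconds = AU / v          # time to cross 1 AU
t_minutes = t_seconds / 60
print(round(t_minutes, 1))  # ~41.6 minutes
```

And at 0.2c the Lorentz factor is only 1/sqrt(1 - 0.04) ≈ 1.02, so ignoring time dilation costs about 2%.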
Yeah, that plus gvp. But if you dig into both they're just shell scripts under the hood (nice ones, though!) and given that I've got a makefile or shell script to build the container, run tests, etc. then adding a third-party tool is just one tiny bit more overhead.
> Makefiles, build: docker build
Ugh, more Linux-Only. Don't follow this please.
Meh. I get your sentiment here, but I build and deploy on Linux and I post about the stuff I work on. I don't post about the Windows stuff that I haven't worked on in years. And I assume a level of intelligence in my readers: they can translate whatever is generally applicable to their own platform, in the same sense that I don't grumble about Raymond Chen's `Old New Thing` not being directly applicable to my own work but still enjoy it.