(author here) I'm one of the maintainers of HashiCorp's Nomad, so that example was likely inspired by the separation of duties that's part of our security model. In that environment, there's a subset of task (e.g. container) configuration that's controlled by the cluster admin and a subset that's controlled by the job author deploying onto the cluster.
Author here! The motivating example of this post is frankly pretty lousy in retrospect (and was even so soon after writing, given the friendly reminder from Giovanni Campagna that `socket` wasn't one of the io_uring opcodes). At best this is an interesting limitation of seccomp. Maybe relevant if you were using gVisor?
This architecture is roughly how HashiCorp's Nomad, Consul, and Vault are built (I'm one of the maintainers of Nomad). While it's definitely a "weird" architecture, the developer experience is really nice once you get the hang of it.
The in-memory state can be whatever you want, which means you can build up your own application-specific indexing and querying functions. You could just use sqlite with :memory: for the Raft FSM, but if you can build/find an in-memory transaction store (we use our own go-memdb), then reading from the state is just function calls. Protecting yourself from stale reads or write skew is trivial; every object you write has a Raft index so you can write APIs like "query a follower for object foo and wait till it's at least at index 123". It sweeps away a lot of "magic" that normally you'd shove into a RDBMS or other external store.
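To make the "wait till it's at least at index 123" idea concrete, here's a minimal sketch in Python (hypothetical names; Nomad's real implementation is in Go, built on its Raft library and go-memdb, and considerably more involved):

```python
import threading

class FSM:
    """Toy Raft FSM whose reads can wait for a minimum applied index.

    Illustrates the "query a follower for object foo and wait till it's
    at least at index N" pattern; not Nomad's actual code.
    """

    def __init__(self):
        self._applied = threading.Condition()
        self.applied_index = 0
        self.state = {}  # object name -> (raft_index, value)

    def apply(self, index, name, value):
        # Called by the Raft library as committed log entries are applied.
        with self._applied:
            self.state[name] = (index, value)
            self.applied_index = index
            self._applied.notify_all()

    def get_at_least(self, name, min_index, timeout=5.0):
        # Block until this node has applied min_index, then read. Because
        # every written object carries its Raft index, the caller can tell
        # a stale read from a fresh one without touching an external store.
        with self._applied:
            ok = self._applied.wait_for(
                lambda: self.applied_index >= min_index, timeout=timeout)
            if not ok:
                raise TimeoutError("follower is behind index %d" % min_index)
            return self.state.get(name)

fsm = FSM()
fsm.apply(122, "foo", "old")
fsm.apply(123, "foo", "new")
idx, value = fsm.get_at_least("foo", 123)
print(idx, value)  # 123 new
```

Reading the state really is "just function calls" here: no query language, no round trip, and the index comparison is the entire consistency check.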
That being said, I'd be hesitant to pick this kind of architecture for a new startup outside of the "infrastructure" space... you are effectively building your own database here. You need to pick (or write) good primitives for things like your inter-node RPC, on-disk persistence, in-memory transactional state store, etc. Upgrades are especially challenging, because the new code can try to write entities to the Raft log that nodes still on the previous version don't understand (or worse, misunderstand because the way they're handled has changed!). There's no free lunch.
>You could just use sqlite with :memory: for the Raft FSM
That's the basic design that rqlite[1] had for its first ~7 years. :-) But rqlite moved to on-disk SQLite, since with WAL mode and with 'PRAGMA synchronous=OFF' [2] it is about as fast as writing to RAM. Or at least close enough, and it avoids all the limitations that come with :memory: SQLite databases (a max size of 2GB being one). I should have just used on-disk mode from the start, but only now know better.
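For concreteness, the two pragmas mentioned above look like this (a minimal Python/sqlite3 sketch, not rqlite's actual Go code):

```python
import os
import sqlite3
import tempfile

# Sketch of the on-disk configuration described above: WAL journaling plus
# synchronous=OFF. Skipping fsync is safe in rqlite's design only because
# the SQLite file is not the authoritative store -- it can be rebuilt from
# the fsync'ed Raft log after a crash.
path = os.path.join(tempfile.mkdtemp(), "fsm.db")
conn = sqlite3.connect(path)

# journal_mode=WAL returns the resulting mode; it requires an on-disk
# database (a :memory: database stays in "memory" journal mode).
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("PRAGMA synchronous=OFF")  # don't fsync on the SQLite side

conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO kv VALUES ('hello', 'world')")
conn.commit()
print(mode)  # wal
```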
(I'm guessing you may know some of this because rqlite uses the same Raft library [3] as Nomad.)
As for the upgrade issue you mention, yes, it's real. Do you find it in the field much with Nomad? I've managed to introduce new Raft Entry types very infrequently during rqlite's 10 years of development, and only once did someone hit it in the field with rqlite. Of course, one way to deal with it is to first release a version of one's software that understands the new types but doesn't ever write them. And once that version is fully deployed, upgrade to the version that actually writes the new types too. I've never bothered to do this in practice however, and it requires discipline on the part of the end-users too.
[2] This might sound dangerous but in the current design of rqlite, the underlying SQLite database is completely rebuilt from the Raft log on startup (which is fsync'ed on every write). So any corruption of the SQLite database due to power loss, etc. is moot, since the SQLite database is not the authoritative store of data in rqlite.
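The "read the new types first, write them later" rollout described above can be sketched roughly like this (illustrative entry types and version numbers, not rqlite's or Nomad's wire format):

```python
# Two-phase upgrade sketch: version N ships code that *understands* a new
# Raft entry type but never writes it; version N+1 starts writing it only
# once the whole cluster is at least version N.

ENTRY_PUT = 1
ENTRY_BATCH_PUT = 2  # the hypothetical "new" entry type

def apply_entry(state, entry):
    """Every version from N onward can apply both entry types."""
    if entry["type"] == ENTRY_PUT:
        state[entry["k"]] = entry["v"]
    elif entry["type"] == ENTRY_BATCH_PUT:
        state.update(entry["kvs"])
    else:
        raise ValueError("unknown entry type %r" % entry["type"])
    return state

def encode_put(kvs, min_cluster_version):
    """Only *write* the new type once every node understands it."""
    if min_cluster_version >= (8, 0) and len(kvs) > 1:
        return [{"type": ENTRY_BATCH_PUT, "kvs": dict(kvs)}]
    # Fall back to the old wire format for mixed-version clusters.
    return [{"type": ENTRY_PUT, "k": k, "v": v} for k, v in kvs]

state = {}
for e in encode_put([("a", 1), ("b", 2)], min_cluster_version=(7, 9)):
    apply_entry(state, e)
print(state)  # {'a': 1, 'b': 2}
```

The discipline problem is visible in the sketch: nothing enforces that every node really is past version N before `encode_put` starts emitting the new type; the operator has to guarantee that.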
> I should have just used on-disk mode from the start, but only now know better.
Yeah, I saw the recent post about reducing rqlite disk space usage. Using the on-disk SQLite as both the FSM and the Raft snapshot makes a lot of sense here. I'm curious whether you've had concerns about write amplification, though. Because we have only the periodic Raft snapshots and the FSM is in-memory, during high write volumes we're only really hammering disk with the Raft logs.
> Do you find it in the field much with Nomad? I've managed to introduce new Raft Entry types very infrequently during rqlite's 10-years of development, only once did someone hit it in the field with rqlite.
My understanding is that rqlite Raft entries are mostly SQL statements (is that right?). Where Nomad is somewhat different (and probably closer to the OP) is that the Raft entries are application-level entries. For entries that are commands like "stop this job"[0] upgrades are simple.
The tricky entries are where the entry is "upsert this large deeply-nested object that I've serialized", like the Job or Node (where the workloads run). The typical bug here is you've added a field way down in the guts of one of these objects that's a pointer to a new struct. When old versions deserialize the message they ignore the new field and that's easy to reason about. But if the leader is still on an old version and the new code deserializes the old object (or your new code is just reading in the Raft snapshot on startup), you need to make sure you're not missing any nil pointer checks. Without sum types enforced at compile time (i.e. Option/Maybe), we have to catch all these via code review and a lot of tedious upgrade testing.
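A toy illustration of that hazard (in Python rather than Go, with made-up field names; in Go the equivalent of the `.get(...)` fallback is an explicit nil-pointer check):

```python
# Old Raft entries and snapshots were written before "restart_policy"
# existed, so new code must treat that nested field as possibly absent.

def decode_job(raw):
    """Decode a serialized Job, tolerating entries written by older versions."""
    return {
        "id": raw["id"],
        "task": {
            "image": raw["task"]["image"],
            # Field added in a later version; old entries simply lack it,
            # so substitute the documented default instead of crashing.
            "restart_policy": raw["task"].get("restart_policy")
                              or {"attempts": 3, "interval_s": 60},
        },
    }

old_entry = {"id": "web", "task": {"image": "nginx:1.25"}}   # pre-upgrade
new_entry = {"id": "web", "task": {"image": "nginx:1.25",
                                   "restart_policy": {"attempts": 5,
                                                      "interval_s": 30}}}
print(decode_job(old_entry)["task"]["restart_policy"]["attempts"])  # 3
```

Miss one of these fallbacks three structs deep and the failure only shows up when a node restores an old snapshot or follows an old leader, which is exactly why it takes tedious upgrade testing to catch.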
> it requires discipline on the part of the end-users too.
Oh for sure. Nomad runs into some commercial realities here around how much discipline we can demand from end-users. =)
>I'm curious whether you've had concerns about write amplification though?
I mean, yes, the more disk IO rqlite has to do, the more write performance will be affected. However I believe the advantages of running with an on-disk SQLite database are worth it. In addition rqlite supports storing the SQLite database file on a memory-backed file system if users really want that[1]. That can help squeeze more write throughput out of rqlite.
>My understanding is that rqlite Raft entries are mostly SQL statements (is that right?).
That's right, rqlite does statement-based replication, though I'm currently looking into extending it so it also does changeset[2] replication where it makes sense.
Like you, I'm more open to the idea of keeping data in memory than most of the responders here. When I got to the part of the article about how they're using Common Lisp with hot reloading, I was thinking, well, you guys can do whatever you want, but not everybody is working on that team, ha.
Yes indeed! But this doesn't apply to a startup in the Explore phase, where you don't need replication; that's how we did it for a long time. This is the phase where this architecture is most useful for product iteration.
But you're right, once you start using replication in the Expand phase there certainly are engineering challenges, but they're all solvable. It might help that in Common Lisp we can hot-reload code, which makes some migrations a lot easier.
I'm the lead developer for Joyent of ContainerPilot, which is the tool at the core of our Autopilot Pattern implementation examples. The lifecycle events you recognize in Distelli are definitely similar. And Chef's new tool Habitat has a supervisor that was independently developed but ended up having interesting parallels with ContainerPilot. So there's a universal idea lurking under there, which is why we called Autopilot a "Pattern" rather than a tool in itself.
But it's not clear to me from a casual glance at the docs whether Distelli lives inside the container during those hooks? Part of what distinguishes the Autopilot Pattern is making the higher-level orchestration layer as thin as possible.
(As far as the root, some of it is derived from my experiences as a perhaps-foolishly-early adopter of Docker in prod at my previous gig at a streaming media startup. The rest is derived from both the principles with which Joyent's own Triton infra is built and our experiences speaking with enterprise devs and ops teams.)
I'm the founder at Distelli and I just want to clarify that the Distelli agent doesn't typically live inside the container, though it can. It's used to orchestrate the container lifecycle on the VM itself.
However if you're building Docker containers and deploying them we recommend using Kubernetes which is something that Distelli supports out of the box now - https://www.distelli.com
Thanks for sharing those details, great to have more insight into the process.
In my case the Distelli agent does live inside the "container", because I'm using SmartOS instances and not Docker containers. It handles deployment, and monitors processes of the apps when I'm not using an SMF.
I'm not sure how Distelli's K8s orchestration works, that functionality is more recent. In my case, the lifecycle details are in the manifest in the app repo, which is just a YAML where each lifecycle section is a bash script. App builds are just tarballs in S3. So there's not much to the deployment process.
> It’s not just deployment, but provides facilities for sharing code and extracting telemetry from your running apps. We believe that operations should be coupled with development, that developer tools should also be operational tools. For this reason, we’re keen to ensure that telemetry — metrics, logging, analytics, distributed tracing — is built-in from the beginning.
I think this is the most important problem IOPipe is trying to solve. Portability between platforms is a good goal but achievable with some discipline about avoiding the proprietary bits of the platform. Making "serverless" software operable to the same (or better) degree than traditional software deployment is the key.
I don't think you'll find 0.2c relative velocities except near very energetic phenomena, especially not of things that are too small to easily see coming.
I think it should be clear from the context that this discussion was about macroscopic objects. Cosmic rays don't carry the kinetic energy equivalent of a medium size nuke.
Their speed isn't particularly relevant, what matters more is their energy. As big as that number is to us, it's still pretty insignificant; there's a lot of objects already traveling around with us in our own solar system with 70kT (per PaulHoule in other comments) of energy. A looooooot of objects. In the grand scheme of things, this is a rounding error on a rounding error. We are small.
It would be nice if they could use solar wind to decelerate as they approached. I know there's not enough energy there to insert them into an orbit, but they might get time for a few more pictures?
The short version is: remember the old Bussard ramscoop idea? You use a magnetic field to collect interstellar hydrogen which you then fuse for thrust? Turns out that in our part of the galaxy, you get more drag from the sail than you do from the fusion thrust, so the idea was scrapped.
An embarrassingly long time later people finally realised that they'd invented a fuelless brake, and the idea was resurrected (but without the fusion drive). The maths are quite plausible and the sail itself trivially simple --- just a wire loop.
However, I don't think they'd be compatible with this idea --- I suspect you wouldn't get one big enough to be useful in a one gram package. But estimating the numbers is beyond me. Here's the paper if you want it. http://www.niac.usra.edu/files/studies/final_report/320Zubri...
Why decelerate? At 0.2c it takes around 40 minutes to cross 1AU. (Ignoring time dilation which is not super significant at 0.2c.) That gives ample time to take photos from a moderate distance.
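Quick sanity check on that figure:

```python
# Back-of-envelope check of the crossing time quoted above.
C = 2.998e8        # speed of light, m/s
AU = 1.496e11      # astronomical unit, m

v = 0.2 * C                 # probe velocity
t_seconds = AU / v          # time to cross 1 AU
t_minutes = t_seconds / 60
print(round(t_minutes, 1))  # ~41.6 minutes
```

And at 0.2c the Lorentz factor is only 1/sqrt(1 - 0.04) ≈ 1.02, so ignoring time dilation costs about 2%.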
Yeah, that plus gvp. But if you dig into both they're just shell scripts under the hood (nice ones, though!) and given that I've got a makefile or shell script to build the container, run tests, etc. then adding a third-party tool is just one tiny bit more overhead.
> Makefiles, build: docker build
Ugh, more Linux-Only. Don't follow this please.
Meh. I get your sentiment here, but I build and deploy on Linux and I post about the stuff I work on. I don't post about the Windows stuff that I haven't worked on in years. And I assume a level of intelligence in my readers: they can translate whatever is generally applicable to their own platform, in the same sense that I don't grumble about Raymond Chen's `Old New Thing` not being directly applicable to my own work but still enjoy it.