Hacker News | dbmikus's comments

To stop agents from pausing for checkpointing, you can have a deterministic outer loop that re-runs until a stop condition is met.
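A minimal sketch of such an outer loop, assuming a hypothetical `run_agent` callable that wraps one non-interactive agent run and a hypothetical `is_done` predicate (for example, "the test suite passes"). Neither name comes from a real harness API; the point is only that plain code, not the model, decides when to stop.

```python
from typing import Callable, Tuple


def run_until_done(
    run_agent: Callable[[str], str],   # hypothetical: one non-interactive agent run
    is_done: Callable[[str], bool],    # hypothetical: deterministic stop condition
    task: str,
    max_iterations: int = 10,
) -> Tuple[str, int]:
    """Re-run the agent until is_done(output) is true; return (output, attempts)."""
    prompt = task
    for attempt in range(1, max_iterations + 1):
        output = run_agent(prompt)
        if is_done(output):
            return output, attempt
        # The agent stopped early, so restate the task and include what it said.
        prompt = (
            f"{task}\n\nYour previous run stopped with:\n{output}\n\n"
            "Keep going until the task is complete."
        )
    raise RuntimeError(f"stop condition not met after {max_iterations} runs")


def make_fake_agent(finish_on_call: int) -> Callable[[str], str]:
    """Stand-in agent that 'pauses' until its Nth call, for demonstration only."""
    calls = 0

    def fake_agent(prompt: str) -> str:
        nonlocal calls
        calls += 1
        return "TESTS PASS" if calls >= finish_on_call else "Paused: should I continue?"

    return fake_agent


output, attempts = run_until_done(
    make_fake_agent(finish_on_call=3), lambda out: "TESTS PASS" in out, "Fix the build"
)
print(output, attempts)
```

The iteration cap matters: without it, a stop condition that can never be satisfied turns into an unbounded API bill.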

I think teams need to be able to write nested workflows that transition between code-led and agent-led, with either supporting human-in-the-loop checkpoints.

Been iterating on what this should look like at our startup (https://www.amika.dev/). Model labs are also improving capabilities here, such as Codex's `/goal` and Claude Code's dynamic workflows[1].

The points about API usage cost still stand, but model intelligence is getting cheaper every month! No need to use the frontier model for every part of the work.

[1]: https://code.claude.com/docs/en/workflows


It's hard to get that outer loop done, especially considering that Claude doesn't let you automate the harness anymore (it gets prohibitively expensive). Same for Gemini. The only option is Codex.

/goal is a dynamic workflow itself, from what I know. Dynamic workflows do not hold the initiative (and can't use any libraries or I/O).

Dynamic workflows do not prevent checkpointing.

I don't see the actual point of your startup; it's a cheap idea, like most LLM startups out there.

I don't see how models are getting cheaper - I clearly see the opposite trend.


Claude Code's dynamic workflows are AI-generated JavaScript, so unlike `/goal` they can in theory import libraries and perform I/O (not sure that they can currently).

On checkpointing: I explained myself poorly. You're right that using higher-level workflows doesn't turn off checkpointing. One can simply make harnesses non-interactive, but that can make models lose coherence over long tasks (because they can't ask for feedback). A higher-level coordinator (`/goal`, CC dynamic workflows) is designed to provide this feedback without human intervention.
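One way to picture the coordinator role, as a sketch only: the agent runs non-interactively, each turn either finishes or raises a question, and a coordinator function answers where a human would have. `AgentTurn`, `agent_step`, and `answer` are hypothetical names for illustration, not the interface of `/goal` or Claude Code's dynamic workflows.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentTurn:
    text: str
    is_question: bool  # True when the agent wants feedback before continuing


def coordinate(
    agent_step: Callable[[str], AgentTurn],  # hypothetical: runs one agent turn
    answer: Callable[[str], str],            # hypothetical: coordinator's reply
    task: str,
    max_turns: int = 20,
) -> str:
    """Drive the agent to completion, answering its questions automatically."""
    message = task
    for _ in range(max_turns):
        turn = agent_step(message)
        if not turn.is_question:
            return turn.text
        # Instead of pausing for a human, feed the coordinator's answer back in.
        message = answer(turn.text)
    raise RuntimeError(f"agent still asking questions after {max_turns} turns")


# Scripted stand-in for a real agent: one question, then a final result.
script = iter([
    AgentTurn("Should I also update the tests?", True),
    AgentTurn("Done: code and tests updated.", False),
])
result = coordinate(lambda msg: next(script), lambda q: "Yes, update them.", "Refactor module X")
print(result)
```

In practice `answer` would itself be a model call or a rule set; the turn cap keeps a confused agent from looping forever.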

On price: older models keep getting cheaper, and most tasks don't need frontier capability. (I'm ignoring the part about subscription subsidies right now, and just talking about API price for tokens)

On my startup Amika: we run programmable cloud computers for agents, plus the workflow systems to guide them. We let people run any agent (Codex, Claude, etc.) and prompt it from anywhere (Slack, web, CLI + SSH, API). It's like devboxes for humans + agents, with guardrails[1] to deterministically ensure things about the changes coding agents make (e.g. don't let the agent modify module boundaries, require that every DB query carry a multi-tenant org ID filter).
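To make the "deterministic guardrail" idea concrete, here is a rough sketch of the org ID example. It is not taken from the linked guide and is not how Amika implements it; the `org_id` column name and the function name are assumptions for illustration.

```python
import re
from typing import List

# Assumption for illustration: tenant-scoped tables are filtered on a column named org_id.
READS_OR_WRITES_ROWS = re.compile(r"^\s*(select|update|delete)\b", re.IGNORECASE)
HAS_ORG_FILTER = re.compile(r"\bwhere\b.*\borg_id\s*=", re.IGNORECASE | re.DOTALL)


def queries_missing_org_filter(queries: List[str]) -> List[str]:
    """Return SELECT/UPDATE/DELETE statements that lack an org_id filter."""
    return [
        q for q in queries
        if READS_OR_WRITES_ROWS.match(q) and not HAS_ORG_FILTER.search(q)
    ]


changed_queries = [
    "SELECT * FROM invoices WHERE org_id = %s AND paid = false",
    "SELECT * FROM invoices WHERE paid = false",
    "INSERT INTO invoices (org_id, total) VALUES (%s, %s)",
]
violations = queries_missing_org_filter(changed_queries)
print(violations)  # only the second query is flagged
```

A regex is a crude stand-in (a real check would parse the SQL or hook the query builder), but the shape is the same: the check runs outside the model and fails the change regardless of what the agent claims.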

Maybe our website is bad at explaining it, in which case I appreciate any feedback!

[1]: https://docs.amika.dev/guides/code-annotations


I think with a proper managed agents platform, the user should have total control over the VM, the software on it, which model to use, and which agent harness to use. Then you can just override the system prompt and you don't need to follow Anthropic's rules!

Maybe Anthropic will give more control over configuring the Claude harness and VM, but they definitely won't let you swap out to other models and harnesses.

We've been building open core infra (https://github.com/gofixpoint/amika) for running any agent on any type of VM or sandbox. The main use case is safely automating internal code-gen, but technically you could repurpose our stack for anything.

There should be a model agnostic platform for running these types of agentic apps.



I really like exe.dev's pricing model where I pay a fixed monthly fee for compute and then can split it up into as many VMs as I want. I use exe.dev to run little vibe-coded apps and it's nice to just leave them running without a spend meter ticking up.

We're thinking about switching to this pricing model for our own startup[1] (we run sandboxed coding agents for dev teams). We run on Daytona right now for sandboxes. Sometimes I spin up a sandboxed agent to make changes to an app, and then I leave it running so my teammate can poke around and test the running app in the VM, but each second it's running we (and our users) incur costs.

We can either build a bunch of complicated tech to hibernate running sandboxes (there are a lot of tricky edge cases in detecting when a sandbox is active vs. when it should be hibernated) or we can just provision fixed blocks of compute. I think I prefer the latter.
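For a sense of why the hibernation route gets complicated, here is a rough sketch of an idle check. The signals and the 15-minute default are assumptions for illustration, not what Daytona or Amika actually measure.

```python
from dataclasses import dataclass


@dataclass
class SandboxActivity:
    agent_running: bool                # an agent process is still working
    open_ssh_sessions: int             # a human may be poking around
    seconds_since_last_request: float  # last HTTP hit on the app under test


def should_hibernate(activity: SandboxActivity, idle_timeout_s: float = 900.0) -> bool:
    """Hibernate only when nothing suggests the sandbox is in use."""
    if activity.agent_running or activity.open_ssh_sessions > 0:
        return False
    return activity.seconds_since_last_request >= idle_timeout_s


# A teammate loaded the running app two minutes ago: keep it up.
print(should_hibernate(SandboxActivity(False, 0, 120.0)))
# Nobody has touched it for an hour: safe to hibernate.
print(should_hibernate(SandboxActivity(False, 0, 3600.0)))
```

Every signal here has gaps (a background job with no open session, a websocket that never issues new requests), which is the argument for fixed blocks of compute instead.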

[1] https://github.com/gofixpoint/amika


Like the detailed setup instructions in the readme!

Also agree that teams should invest in their own harness (or maybe pedantically, build a system on top of harnesses like Claude Code, Codex, Pi, or OpenCode).


Yes! Broccoli is triggering Codex CLI and Claude Code CLI.


Does that mean you're using API pricing rather than subscription? Seems like it'd get expensive very quickly for a small team.


It's a bit of a trade-off. If we spin up a new container every time (which we did when we were using Google Cloud Run), we have to pay API pricing. However, with Blaxel, we can set containers to hibernate, which also gives us the ability to use a subscription.


Yes, I hate how slow it is to swipe between desktop workspaces, for example.


Doesn't excuse it but in case you or other readers are unaware, there are some ways to mitigate it: https://arhan.sh/blog/native-instant-space-switching-on-maco...


Why would you use that feature? MacOS doesn't REALLY have multiple desktops (Spaces). That is merely a pre-release feature (for 10 years or so, I think). As evidenced by the many critical user journey bugs it has that don't get addressed.

I use both Linux (with a decent tiling window manager; the tiling management being the least important part of it) and macOS. And certain things are just not possible to do with macOS. On Linux I can have 300+ open terminal windows AND CAN find the one I need when I need to. On macOS, 20 (counting Terminal tabs, which are implemented as windows underneath) is about the high-water mark before it gets annoying to work with. On macOS, you can't effectively work on multiple projects that use the same software (editor + terminal, for example). You can work with different Applications, though, and that is managed pretty well (better than most Linux window managers that I have seen).

Every year or so I try adding a couple of Spaces, and always regret it a couple of hours later, switching back to a single Space (+ a few fullscreen apps).


I've used spaces since 2013, they work well enough. The animation bug is annoying though. On displays higher than 60Hz, the animation is slower because they made it frame-based instead of time-based, or something silly like that.

I love the three finger gesture to move between them though, it's like moving pieces of paper around. You can also work around the bug I mentioned by swiping faster, but yeah I wish they'd just fix it so we can move on.


Of course it can be used. But it is very buggy (as in missing or not well-thought-out behaviors), which is unlike the typical polish Apple's human interaction folks deliver. For example, switching between Spaces, then between apps and windows, and then creating a new app window doesn't work as expected for some combinations of steps and for some apps. There are several other "corner" cases that show the features were not laid out in a full design that exhaustively decides the desired behavior in each case. That is very much like when someone bolts a feature onto a system without fully nailing down its interaction with all other adjacent and relevant features.


I'm just responding to your "Why would you use that feature?" question. I use it because I like it, and it works well for me. I'm not disagreeing that they have some bugs and design issues to work out. It seems pretty obvious MacOS doesn't get as much attention as iOS when it comes to these things.


Spaces help me visually organize related apps. I have all my chat apps in one, all my dev stuff in another.

I used to run Linux with the i3 tiling window manager, but switched to Mac because the battery life is so much better. Although the new Framework laptop looks like it has pretty great battery life.


Curious if Andon has gone one level higher and has the AI decide which real-world experiment it should run next.


If you don't get it working with Claude Code Routines, would love to connect and see if we can help! We're building an open core product that can spin up sandboxed coding agents and control them from Slack (and also web UI, TUI, and HTTP APIs + CLIs).

We work with any coding model / harness.

website: https://www.amika.dev/

OSS repo: https://github.com/gofixpoint/amika

And my email is dylan@amika.dev (I'm one of the founders)


We might be building something up your alley! I wanted an OSS platform that let me run any coding agent (or multiple agents) in a sandbox and control it either programmatically or via GUI / TUI.

Website is https://amika.dev

And part of our code is OSS (https://github.com/gofixpoint/amika) but we're working on open sourcing more of it: https://docs.google.com/document/d/1vevSJsSCWT_reuD7JwAuGCX5...

We've been signing up private beta users, and also looking for feedback on the OSS plans.


Necro-posting here, but that's kinda what we're working on! We're focused on creating cloud workspaces for sandboxed coding agents, built to support any agent harness. https://www.amika.dev/

Under the hood, we're open sourcing a lot of the parts for provisioning these agents, their VMs/sandboxes, and managing agent messaging + sessions. Put our open source plans here: https://docs.google.com/document/d/1vevSJsSCWT_reuD7JwAuGCX5...

