It's an LLM ASIC that runs a single LLM model at ridiculous speeds. It's a demonstration chip that runs Llama-3-8B at the moment, but they're working on scaling it to larger models. I think it has big implications for how AI will look a few years from now. IMO the crucial question is whether they will get hard-limited by model size, similarly to Cerebras.
There is a new breed of agent-agnostic tools that call the Claude Code CLI as if it's an API (I'm currently trying out vibe-kanban).
This could be used to adhere to Claude's TOS while still allowing the user to switch AI companies at a moment's notice.
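This works because Claude Code ships a headless mode: as far as I know, `claude -p` answers a single prompt and exits, and `--output-format json` returns a structured result. A minimal sketch of treating the CLI as an API (the prompt and what you do with the result are just illustration):

```python
import json
import subprocess

def run_claude(prompt: str) -> dict:
    """Run one Claude Code prompt non-interactively and parse the result."""
    # -p / --print: headless mode; --output-format json: structured output.
    proc = subprocess.run(
        ["claude", "-p", prompt, "--output-format", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)

if __name__ == "__main__":
    print(run_claude("List the TODO comments in this repo"))
```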
Right now there's limited customizability in this approach, but if the lock-in trend continues, I think it's not far-fetched to see FAR more integrated solutions in the future. For example: one MCP server that you configure into a coding agent like Claude Code and that overrides its entire behavior (tools, skills, etc.) with a different, unified open-source system. Think something similar to IntelliJ IDEA's existing MCP, which gives the agent a separate file-edit tool, etc., instead of the ones it comes with.
Illustration of what I'm talking about:
- You install Claude Code with no configuration
- Then you install the meta-agent framework
- With one command, the meta-agent MCP is installed in Claude Code and the built-in tools are disabled via a permissions override (config sketched below, after this list)
- You access the meta-agent through a different UI (similar to vibe-kanban's web UI)
- Everything you do gets routed directly to Claude Code, using your Claude subscription legally. (Input-level features like commands get resolved by the meta-agent UI before being sent to Claude Code.)
- Claude Code has to use the tools and skills from the meta-agent MCP as instructed in the prompt, because its own tools are permission-denied (result: very good integration with the meta-agent UI)
- This would also work with any other CLI coding agent (Codex, Gemini CLI, Copilot CLI etc.) should they start getting ideas of locking users in
- If Claude Code rug-pulls subscription quotas, just switch to a competitor instantly
All it requires is a CLI coding agent with MCP support and a TOS that allows automated use of its UI (disallowing that would be massive hypocrisy, since the AI companies themselves ship computer-use agents that automate other apps' UIs).
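A rough sketch of what that one command could drop into a project, assuming the Claude Code config conventions I'm aware of (`.mcp.json` registers MCP servers; `permissions.deny` rules in `.claude/settings.json` block built-in tools). The `meta-agent-mcp` command is hypothetical:

```jsonc
// .mcp.json -- register the meta-agent's MCP server
{
  "mcpServers": {
    "meta-agent": { "command": "meta-agent-mcp", "args": ["serve"] }
  }
}

// .claude/settings.json -- deny the built-in tools
{
  "permissions": {
    "deny": ["Edit", "Write", "Bash", "WebFetch", "WebSearch"]
  }
}
```

With the built-ins denied, the only way the model can touch files is through the meta-agent's tools, which is what would make the UI integration clean.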
Could you think of it as: Claude Code is just a tool used by another agent, and that other agent is instructed to use the Claude Code tool for everything? Makes sense; I don't see why we can't have agents use these agents for us, just like the AI companies are proposing we use their agents in place of everything else we currently use.
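And you can make that literal: expose the whole CLI as a single MCP tool that any outer agent can call. A minimal sketch using the official Python MCP SDK's FastMCP helper (wrapping `claude -p` in a subprocess is my assumption of how you'd do it, not an established pattern):

```python
import subprocess

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("claude-code-as-a-tool")

@mcp.tool()
def claude_code(task: str) -> str:
    """Delegate an entire coding task to Claude Code, return its final answer."""
    proc = subprocess.run(
        ["claude", "-p", task, "--output-format", "text"],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; point any MCP client at it
```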
Also, why not distribute implementation documentation so Claude Code can write OpenCode itself and use your OAuth token? Now you have OpenCode for personal use; you didn't get it from anywhere, your agent created it for you and only you.
If you thought things were hard now, just wait for the industrial-scale, fully automatic fast-follow bots that will nearly universally nuke the human-created original product to oblivion in a few years...
This is part of a bigger problem with vibe coding, IMO. It's not just Show HN but signaling credentials in general. How do you signal that you actually put effort into your project on a resume or at a social event/presentation when others can just vibe-code some good-looking but nonetheless unusable project and show that off instead?
I've thought about this. Even in the pre-LLM era, projects were rarely judged by the quality of their source code. READMEs and slick demos were the focus. So in some sense nothing has changed.
The difference now is that there is even less correlation between "good readme" and "thoughtful project".
I think that if your goal is to signal credentials/effort in 2026 (which is not everyone's goal), a better approach is to write about your motivations and process rather than the artefact itself - tell a story.
Somewhat of an aside, but reading this I had the thought that arcades would be a great format for games heavily involving GenAI. Pay-per-play is probably the only business model where you can affordably burn a lot of LLM tokens per game. Alternatively, a large commercial arcade cabinet is the only way to guarantee the very high hardware specs needed to run capable models locally.
Perhaps as a result, we might see LLM- and video-model-powered games become mainstream in arcades before they reach any home consumer platform.
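To put rough numbers on the pay-per-play point (all of these are made up, but in the ballpark of current frontier-model API pricing):

```python
# Back-of-the-envelope LLM cost of one token-heavy arcade play.
IN_PRICE, OUT_PRICE = 3.00 / 1e6, 15.00 / 1e6  # $/token, hypothetical

input_tokens = 200_000   # context re-sent over a 10-minute session
output_tokens = 20_000   # generated dialogue, narration, game state
cost = input_tokens * IN_PRICE + output_tokens * OUT_PRICE
print(f"LLM cost per play: ${cost:.2f}")  # ~$0.90, coverable by a $2 play fee
```

A flat-price home game with unlimited play has no equivalent way to recoup that per-session cost.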
Caching might be free, but I think making caching cost nothing at the API level is not a great idea either, considering that LLM attention currently gets more expensive the more tokens there are in context.
Making caching free would price "100000 token cache, 1000 read, 1000 write" the same as "0 token cache, 1000 read, 1000 write", whereas the first one might cost more compute to run. I might be wrong about the scale of the effect here, though.
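A toy model of the argument, with hypothetical prices ($3 per 1M input tokens and an Anthropic-style 0.1x multiplier for cache reads):

```python
BASE = 3.00 / 1e6
CACHE_READ_MULT = 0.1  # set to 0.0 to model "caching is free"

def call_cost(cached_prefix: int, fresh_input: int) -> float:
    # Cached tokens are cheaper but not free: attention still runs over them.
    return cached_prefix * BASE * CACHE_READ_MULT + fresh_input * BASE

print(call_cost(100_000, 1_000))  # $0.033 -- cached prefix still billed
print(call_cost(0, 1_000))        # $0.003
# With CACHE_READ_MULT = 0.0 both calls would cost $0.003, even though the
# first attends over 101000 tokens and the second over only 1000.
```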
Are you hosting your own infrastructure for coding agents? At first glance, sharing actual codebase context across compactions / multiple tasks seems pretty hard to pull off with a good cost-benefit ratio unless you have vertical integration from inference all the way up to the coding-agent harness.
I'm saying this because external LLM providers like OpenAI tend to charge quite a bit for longer-term caching, and on top of that there's the 0.1x cache-read cost multiplied by the number of LLM calls. I doubt context sharing would be that beneficial: you won't need all of the repeated context every time, so caching it means a longer context for each agentic task, which might increase API costs by more overall than the caching saves.
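A hedged sketch of that trade-off (every number here is made up for illustration):

```python
# Is a shared 50k-token codebase context, cached at a 0.1x read multiplier,
# cheaper than just re-sending the ~5k tokens each task actually needs?
BASE = 3.00 / 1e6      # $ per input token, hypothetical
calls = 40             # LLM calls across one agentic session

shared_ctx = calls * (50_000 * BASE * 0.1)  # 0.1x read cost on every call
minimal_ctx = calls * (5_000 * BASE)        # targeted context, full price
print(f"shared: ${shared_ctx:.2f}, minimal: ${minimal_ctx:.2f}")  # $0.60 vs $0.60
# Roughly a wash here -- and that's before cache-write fees and TTL-based
# storage charges, which is why the vertical-integration point matters.
```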
The most interesting part about this is that Anthropic's own employees are probably using it 24/7 to develop the next model, while it's barely affordable to anyone else.
These articles feel like an overreaction. I use Discord daily and I don't think there is any reason for me to verify at all. The new restrictions are reasonable and don't affect the way I use the app.