Hacker Newsnew | past | comments | ask | show | jobs | submit | armcat's commentslogin

This looks beautiful and I'm sorry the current state of affairs has made you not want to publish the code, I would love to play around with it. Regarding your decision to build - I feel you, I've had the same happen to me for everything from charting libs to various web components.

As an aside, I really like your web page - simple and clean with images and demos, no bloat.


How quickly they get new models supported on the API and it just works, is insane!

Thanks! We work really hard to make sure we are ready at launch :)

This person did a great comparison against Qwen models, and despite them having 8x less active params, they outperform the Cohere model in every category: https://x.com/DJLougen/status/2057196012918149368?s=20

What an awesome story. Not too many stories about Aussies out there, but what Han brothers are doing with Unsloth in AI, and stories like this one, makes this fellow Aussie super proud!


I had no idea those dan and the team were aussies! damn nice, we dont really seem to shine in tech on the world stage.


Interesting concept! A suggestion: `whichllm <USE_CASE>` would be more beneficial, i.e. `which coding` or `which text-to-video`.


Sun Solaris PPC (CDE) takes me back. I've built plenty of 3G/WCDMA telephony code on that thing. It never let me down.


Any particular reason for BM25? Why not just a table of contents or index structure (json, md, whatever) that is updated automatically and fed in context at query time? I know bag of words is great for speed but even at 1000s of documents, the index can be quite cheap and will maximise precision


do you want to pollute the context with blurbs for docs in disparate topics? cascade filtering, even with naïve bm25, helps reduce the amount of _noise_ that's pushed into the context window. if we reduce the amount of results to consider, further filtering or reranking, with more expensive options, becomes realistic. one could even put a cheaper model in front to further clean the results.


Is it OpenAI Cowork?


As someone who's been working in legaltech space where MS Word add-in chatbot was a killer feature, this is brutal. And in their demo they are hammering on the legal case (redline chat).


Doesn't this cut right into Legora's business?


I still find it incredible at the power that was unleashed by surrounding an LLM with a simple state machine, and giving it access to bash


That is why I am currently looking into building my own simple, heavily isolated coding agent. The bloat is already scary, but the bad decisions should make everyone shiver. Ten years ago people would rant endlessly about things with more then one edge, that requires a glimpse of responsibility to use. Now everyone seems to be either in panic or hype mode, ignoring all good advice just to stay somehow relevant in a chaotic timeline.


At it's heart it's prompt/context engineering. The model has a lot of knowledge baked into it, but how do you get it out (and make it actionable for a semi-autonomous agent)? ... you craft the context to guide generation and maintain state (still interacting with a stateless LLM), and provide (as part of context) skills/tools to "narrow" model output into tool calls to inspect and modify the code base.

I suspect that more could be done in terms of translating semi-naive user requests into the steps that a senior developer would take to enact them, maybe including the tools needed to do so.

It's interesting that the author believes that the best open source models may already be good enough to complete with the best closed source ones with an optimized agent and maybe a bit of fine tuning. I guess the bar isn't really being able to match the SOTA model, but being close to competent human level - it's a fixed bar, not a moving one. Adding more developer expertise by having the agent translate/augment the users request/intent into execution steps would certainly seem to have potential to lower the bar of what the model needs to be capable of one-shotting from the raw prompt.


unfortunately all the agent cli makers have decided that simply giving it access to bash is not enough. instead we need to jam every possible functionality we can imagine into a javascript “TUI”.


If all you want is a program that calls the model in a loop and offers a bash tool, then ask Claude Code to build that. You won't like it though!

For a preview of what it'd be like, just tell your AI chat app that you'll run bash commands for it, and please change the app in your "current directory" to "sort the output before printing it", or some such request.


Claude Code with Opus 4.6 regularly uses sed for multi-line edits, in my experience. On top of it, Pi is famously only exposing 4 tools, which is not just Bash, but far more constrained than CCs 57 or so tools.

So, yes, it can work.


I think the problem/limitation would be as much due to context management as tools. Obviously bash plus a few utilities is sufficient to explore/edit the code base, but I can't imagine this working reliably without the models being specifically trained to use specific tools, and recognize/adapt to different versions of them etc.

Context management, both within and across sessions, seems the bigger issue. Without the agent supporting this, you are at the mercy of the model compacting/purging the context as needed, in some generic fashion, as well as being smart enough to decide to create notes for itself tracking what it is doing, etc.

Apparently CC is 512K LOC, which seems massively bloated, but I do think that things like tools, skills, context management and subagents are all needed to effectively manage context and avoid the issues that might be anticipated by just telling the model it's got a bash tool, and go figure.


You don’t really need most of that stuff. Have sensible steering files. Have the agent keep state itself. Dont bother compacting. Its fine.


I thought CC only supports it's find/replace edit tool (implemented by CC itself, using Node.js for file access), and is platform agnostic. Are you saying that on linux CC offers "sed" as a tool too? I can't imagine it offers "bash" since that's way too dangerous.


Yes, Claude Code has a Bash tool, and Claude in some cases uses the CLI sed utility (via the Bash tool) for file changes (although it has built-in file update), at least on my Linux machine.


Interesting - thanks.

I just asked Claude, and apparently CC makes it's bash tool available on all platforms it runs on (Linux, macOS, Windows WSL, Git for Windows), and doesn't do platform-specifc filtering of bash commands, which would seem to make for some interesting incompatibilities - GNU utils (sed, grep, find) on Linux and Windows, but BSD variants on macOS.


Claude code will semi-regularly try to use GNU utils on my Mac


I think you get him wrong? He is already concerned about "bash on steroids" and current tools add concerning amounts of steroids to everything.


Claude Code gets smoked on benchmarks by an agent that has a single tool: tmux. So I think they might actually like that quite a bit.


What benchmarks are you referring to?


> If all you want is a program that calls the model in a loop and offers a bash tool, then ask Claude Code to build that. You won't like it though!

Okay sure it’s technically more than just bash, but my own for-fun coding agent and pi-coding-agent work this way. The latter is quite useful. You can get surprisingly far with it.


i did.. and thats what i use. obviously its a little more than just a tool that calls bash but it is considerably less than whatever they are doing in coding agents now.


If you saw the Claude Code leak, you’d know the harness is anything but simple. It’s a sprawling, labyrinthine mess, but it’s required to make LLMs somewhat deterministic and useful as tools.


That’s also because of how Claude Code was written. It doesn’t have to be that way per se.


Hypothesis: it's a sprawling, labyrinthine mess because it was grown at high speed using Claude Code.


There’s a lot of redundancy, because there has to be to make the system useful. It’s a hacked together mess.


It's pretty easy to get determinism with a simple harness for a well-defined set of tasks with the recent models that are post-trained for tool use. CC probably gets some bloat because it tries to do a LOT more; and some bloat because it's grown organically.


>It's pretty easy to get determinism with a simple harness for a well-defined set of tasks with the recent models that are post-trained for tool use.

Do you have a source? Claude Code is the only genetic system that seems to really work well enough to be useful, and it’s equipped with an absolutely absurd amount of testing and redundancy to make it useful.


Should I read that as 'generic system'? Most hard data is with company internal evals, but for the well defined tasks externally it's been pretty easy to spin up a basic tool loop and validate. Did you have something in mind? [I don't necessarily count 'coding' as well-defined in the generic sense, so I suspect we're coming at this from different scopes re: the definition of 'LLMs somewhat deterministic and useful as tools']


I found replacing bash with python to be more useful… that way, it can craft whatever it desires without having to pipe a billion pieces of gum together


Tools gave humans the edge over other animals.


And those tools regularly burnt cities to ashes. Took a long time to get it under control.


*burn - I'm not sure we've gotten that under control quite yet


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: