Hacker News: james2doyle's comments

Hmm, just tried google/gemma-4-31B-it through HuggingFace (the inference provider seems to be Novita) and function/tool calling was not enabled...

Yeah you can see here that tool calling is disabled: https://huggingface.co/inference/models?model=google%2Fgemma...

At least, as of this post


Tool calling is enabled now

Hosted on Parasail and by Google themselves (both for free, as of now); I'd probably give those a shot

None of the Qwen 3.5 models seem present? I’ve heard people are pretty happy with the smaller 3.5 versions. I would be curious to see those too.

I would also be interested to see "KAT-Coder-Pro-V2", as they brag about its benchmarks against these bots as well


If they use OpenRouter pricing then the Qwen3.5 models are going to be poor value.

The Qwen3.5 27B model on OR is $1.56/million tokens out (it used to be $2.4/mil).

Meanwhile Minimax M2.7 (a much larger model) is $1.2/mil out.

The smaller and medium tier Qwen3.5 models are only really cost effective if you run them yourself.
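As a quick sanity check on the price gap described above, here is a tiny sketch that turns the per-million-output-token prices quoted in this thread into dollar figures for a session. The prices are a snapshot of what the thread quotes, not a reference, and change often:

```python
# Per-million-output-token prices as quoted in this thread (OpenRouter).
# Treat these as a snapshot; real prices change frequently.
PRICE_PER_M_OUT = {
    "Qwen3.5 27B": 1.56,   # was $2.4/M
    "Minimax M2.7": 1.20,
}

def output_cost(model: str, tokens: int) -> float:
    """Dollar cost for `tokens` of output at the quoted rate."""
    return PRICE_PER_M_OUT[model] * tokens / 1_000_000

# e.g. a long coding session producing 5M output tokens:
for model, _ in PRICE_PER_M_OUT.items():
    print(f"{model}: ${output_cost(model, 5_000_000):.2f}")
```

At these quoted rates, the smaller Qwen model actually costs more per output token than the much larger MiniMax, which is the parent's point about value.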


Oh, I never noticed that. Good to call out. But that would put it much closer to Minimax M2.7 in terms of price than to the likes of Mimo V2 Pro and Gemini Flash 3 preview, which are both on the list

Is Minimax M2.7 better than Qwen3.5 27B, or is it just bigger?

Minimax M2.7 is similar to Sonnet in my tests. This is the first non-OpenAI/Anthropic model I've used for coding. It does require more steering, though.

More steering than Sonnet? What is your experience?

I'm about 2 days into transitioning, using MiMo V2 Pro in place of Opus and MiniMax M2.7 in place of Sonnet.

I'm finding that the extra "hand holding" that MiMo and MiniMax need isn't really "extra." The Anthropic models happily agree to a plan and then do something else entirely way too often.

With MiMo and MiniMax I'm just spreading the attention throughout the day instead of big spikes of frustration figuring out where Claude went off the rails.


Thanks for responding. So you are using MiMo V2 Pro to plan and then asking MiniMax M2.7 to read that plan file and execute? Or what does the workflow look like?

Pi/Opencode/Kilocode? Just curious.

I am mostly using OpenCode and thinking of abandoning Copilot, so I'm looking for something similar.


Yes, it's significantly better.

What caused the switch? Also, are you still trying to use Claude models in OpenCode?

Sorry, I missed part of your question:

What caused the switch was that we're building AI solutions for sometimes price-conscious customers, so I was already familiar with the pattern of "use a superior model to set a standard, then fine-tune a cheaper one to do the same work".

So I brought that into my own workflows (kind of) by using Opus 4.6 to do detailed planning and one 'exemplar' execution (with 'over-documentation' of the choices); after that, I use Opus 4.6 only for planning and "throw a load of MiniMax M2.5s at the problem".

They tend to do 90% of the job well, and I sometimes do a final pass with Opus 4.6 to mop up any issues. This saves me a lot of tokens/money.

This pattern wasn't possible with Claude Code, hence my move to OpenCode.
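The plan-with-a-strong-model / execute-with-cheap-models pattern described above can be sketched roughly as follows. `plan_then_execute` and the callables passed to it are hypothetical stand-ins for whatever client (OpenCode, a raw API, etc.) actually drives the models; this is a structural sketch, not anyone's actual implementation:

```python
from typing import Callable

def plan_then_execute(
    task: str,
    planner: Callable[[str], str],        # e.g. a strong model like Opus
    workers: list[Callable[[str], str]],  # e.g. several cheap MiniMax instances
    reviewer: Callable[[str], str],       # final mop-up pass, often the planner again
) -> str:
    # 1. Strong model writes a detailed, over-documented plan.
    plan = planner(f"Write a detailed, over-documented plan for: {task}")
    # 2. Fan the plan's steps out round-robin to the cheaper workers.
    steps = [s for s in plan.splitlines() if s.strip()]
    results = [
        workers[i % len(workers)](f"Execute this step of the plan:\n{step}")
        for i, step in enumerate(steps)
    ]
    # 3. One final review pass with the strong model to catch the last 10%.
    return reviewer("Review and fix these results:\n" + "\n".join(results))
```

The point of the structure is that the expensive model sees the task twice (plan and review) while the bulk of the token volume goes through the cheap workers.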


You can access Anthropic models with subscription pricing via a Copilot license.

Pretty sure that's against TOS.

Edit: it's not. https://github.blog/changelog/2026-01-16-github-copilot-now-...

They must be eating insane amounts of $$$ for this. I wouldn't expect it to last


No, Claude on GitHub Copilot is billed at 3x the usage rate of the other models, e.g. GPT-5.4, and you get an extremely truncated context window.

See https://models.dev for a comparison against the normal "vanilla" API.


Yes I regularly plan in Opus 4.6 and execute in “lesser” models ie MiniMax

The only similarity is that they both say "you’re absolutely right" when you point out their obvious mistakes


*hyper-competent collaborator who may occasionally make things up completely and will sometimes give different answers to the same question*


So, indistinguishable from a human then


No. A competent human doesn't make things up, he admits ignorance. He also only very rarely changes answers he previously gave.


I’ve used M2.5 in OpenCode using their Zen inference. I found it to be decent. Did not really seem comparable to Opus 4.5 for "quality" output. As in, I often tweaked the output more when using M2.5.

I think the best thing was the speed. If it is going to be wrong, I would prefer it to be wrong quickly.


They have the details up on their site now: https://www.minimax.io/models/text/m27


There is a blog post now: https://mistral.ai/news/leanstral


> Leanstral
>
> Our first open-source code agent designed for Lean 4, built for formal proof engineering in realistic repositories. 119B parameters with 6.5B active.

Mentioned in the 2.5.0 release of the Vibe CLI tool: https://github.com/mistralai/mistral-vibe/releases/tag/v2.5.... A HuggingFace page is linked for the weights but it returns a 404: https://huggingface.co/mistralai/Leanstral-120B-A6B-2603




This article sure uses a lot of em dashes. I see 9 in the article body.

