Hacker News

What are people using to run this cost-efficiently? I was on a Google Ultra sub, which gave me enough quota, but that's gone now.

ChatGPT at $20/month is alright, but I got locked out for a day after a couple of hours. Considering the GitHub Pro+ plan.



Run Qwen3-Coder-Next locally. That's what I'm doing (using LM Studio). It's a surprisingly capable model. I've had it working on some LLVM-IR manipulation and microcode generation for a kind of VLIW custom processor, and I've been pleasantly surprised that it can handle this (LLVM is not easy). There is also Verilog code that defines the processor's behavior, which it reads to determine the microcode format and expected processor behavior. When I do hit something it struggles with, I can go over to Antigravity and get some free Gemini 3 Flash usage.


What kind of hardware do you run it on?


Framework Desktop (AMD Strix Halo with 128GB). It runs at around 27 tok/sec, which is quite acceptable.


Same here


Qwen3 Coder Next in llama.cpp on my own machine. I'm an AI hater, but I need to experiment with it occasionally, and I'm not going to pay someone rent for something they trained on my own GitHub, Stack Overflow, and Reddit posts.
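For anyone wanting to try the same setup, a minimal llama.cpp invocation looks something like the sketch below. The GGUF filename, context size, and layer count are placeholders, so adjust them for your own download and hardware:

```shell
# Serve a local GGUF model over llama.cpp's OpenAI-compatible HTTP API.
# The model filename is a placeholder -- point it at your own quantized download.
# -c sets the context window; -ngl offloads layers to the GPU if memory allows.
llama-server -m ./qwen3-coder-next-q4_k_m.gguf -c 16384 -ngl 99 --port 8080

# Quick smoke test against the local endpoint:
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "hello"}]}'
```

Because the server speaks the OpenAI chat-completions protocol, most coding agents and editor plugins can be pointed at it by changing their base URL.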


FWIW, the lockout probably wasn't usage-related... maybe the content you were working on or your context-window management somehow triggered something?


You could try MiniMax 2.5 via OpenRouter.
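OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so a request is just a small JSON payload with a bearer token. A minimal sketch in Python using only the standard library; the `minimax/minimax-m2` model slug is an assumption, so check openrouter.ai/models for the current id:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send(payload: dict, api_key: str) -> dict:
    """POST the payload to OpenRouter and return the parsed JSON response."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The model slug is an assumption -- verify it on openrouter.ai/models.
payload = build_request("minimax/minimax-m2", "Write a binary search in C.")
print(payload["model"])
```

The same payload shape works against any OpenAI-compatible backend, including a local llama.cpp server, by swapping the URL.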


MiniMax has an incredibly affordable coding plan for $10/month. It has a rolling five-hour limit of 100 prompts. 100 prompts doesn't sound like much, but in typical AI-company accounting fashion, 1 prompt is not really 1 prompt. I have yet to come even close to hitting the limit with heavy use.


Kimi Code with the 99-cent plan is not too bad if you're savvy.



