The weights usually arrive before the runtime stack fully catches up.
I tried Gemma locally on Apple Silicon yesterday — promising model, but Ollama felt like more of a bottleneck than the model itself.
I got noticeably better raw performance with mistralrs (I found it on Reddit, then GitHub), but the coding/tool-use workflow felt weaker. So the tradeoff wasn't really model quality; it was runtime speed vs. workflow maturity.
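If anyone wants to sanity-check the speed gap themselves, here's a rough timing sketch. It assumes both runtimes are serving an OpenAI-compatible /v1/chat/completions endpoint (Ollama does this on port 11434 by default); the mistralrs port and the model names below are placeholders, so adjust them for your setup.

    import json
    import time
    import urllib.request

    # Local runtimes to compare. Assumption: each one exposes an
    # OpenAI-compatible chat completions endpoint. The mistralrs port
    # and both model names are placeholders for your own setup.
    ENDPOINTS = {
        "ollama": ("http://localhost:11434/v1/chat/completions", "gemma3"),
        "mistralrs": ("http://localhost:1234/v1/chat/completions", "gemma3"),
    }

    PROMPT = "Explain the borrow checker in three sentences."

    def time_completion(url: str, model: str) -> tuple[float, int]:
        """Send one chat completion, return (seconds, completion tokens)."""
        body = json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": PROMPT}],
        }).encode()
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        start = time.perf_counter()
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        elapsed = time.perf_counter() - start
        tokens = data.get("usage", {}).get("completion_tokens", 0)
        return elapsed, tokens

    if __name__ == "__main__":
        for name, (url, model) in ENDPOINTS.items():
            secs, toks = time_completion(url, model)
            rate = toks / secs if toks else float("nan")
            print(f"{name}: {toks} tokens in {secs:.1f}s ({rate:.1f} tok/s)")

The tok/s number from the usage field is coarse, but it's enough to see whether the runtime or the model is the bottleneck.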