
If you knew that the model never changed, it might be very helpful, but most of the big providers constantly mess with their models.


Even if you used a local copy of a model, it would still just be a semi-quantitative version of “everyone knows ‹thing-you-don't-have-a-grounded-argument-for›”


Their performance also varies depending on load (concurrent users).


Dear god does it really? That’s very funny.


Why are you surprised? It’s a computational thing, after all.


It’s not that crazy; it’s just that the serving architecture you’d need to do that, with differently quantized models and so on, is impressive, all things considered.


The models are the same; it’s the surrounding processing, like the "thinking" iterations, that gets adjusted.
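
A minimal sketch of what that kind of adjustment could look like (the names, numbers, and the linear scaling below are my own assumptions for illustration, not any provider's actual logic): the weights stay fixed and only the per-request reasoning budget shrinks as load rises.

    # Hypothetical illustration: same model weights, but the "thinking" token
    # budget per request is scaled down as concurrent load grows.

    def thinking_budget(concurrent_users: int,
                        max_budget: int = 8192,
                        min_budget: int = 1024,
                        soft_limit: int = 500) -> int:
        """Return a reasoning-token budget that shrinks as load rises."""
        if concurrent_users <= soft_limit:
            return max_budget
        # Scale down roughly linearly past the soft limit, never below min_budget.
        overload = concurrent_users / soft_limit
        return max(min_budget, int(max_budget / overload))

    if __name__ == "__main__":
        for users in (100, 500, 1000, 4000):
            print(users, "users ->", thinking_budget(users), "thinking tokens")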


That only works for LRMs, no? Not for traditional LLM inference.



