The frontier labs distill their own base models all day long. It’s not just some...

coldtea · 2026-05-28T19:14:52 1779995692

>It’s not just something done by nefarious Chinese copycats

And even that would be rich as an accusation, coming from SOTA labs that depend on explicitly disregarding the intellectual property behind millions of pieces of training data.

flossly · 2026-05-28T23:57:16 1780012636

> nefarious Chinese copycats

LLMs are themselves copycats.

I'd say "thanks for open-sourcing and thereby promoting affordable innovation" instead of "nefarious". :)

manmal · 2026-05-28T19:17:57 1779995877

But how? The training data is the unadulterated content those models are based on? I genuinely don’t understand, no snark.

wtallis · 2026-05-28T23:11:39 1780009899

Raw training data is raw. A really big model trained on it has already done a first pass at finding patterns and squeezing out redundancy. Re-ingesting the full training set to train a smaller model is probably more expensive than distilling from the large model, and for only a marginal quality improvement.

adgjlsfhk1 · 2026-05-29T00:41:56 1780015316

Distilling from a larger model is not only probably cheaper than training from the data, it also likely gives a higher-quality result. There's pretty strong support for the proposition that NNs learn a smoothed and regularized version of the data. The NNs are likely higher quality than most of the data they were trained on.
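
To make the "smoothed version of the data" point concrete, here is a minimal sketch of the classic soft-target distillation loss (temperature-scaled softmax plus KL divergence). This is an illustration only: the thread does not say which method any lab actually uses, and the logits, the 4-token vocabulary, and the function names below are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a smoother distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft-target loss: distance from the teacher's smoothed distribution to the student's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by T^2 is a common convention to keep gradient sizes comparable across temperatures.
    return temperature ** 2 * kl_divergence(p_teacher, p_student)

def hard_label_loss(label_index, student_logits):
    """Ordinary cross-entropy against a single observed token from the raw data."""
    return -math.log(softmax(student_logits)[label_index])

# Hypothetical next-token logits over a 4-token vocabulary (assumed values).
teacher = [4.0, 3.5, 0.5, -2.0]        # tokens 0 and 1 are both plausible
student_close = [3.8, 3.4, 0.6, -1.5]  # agrees with the teacher about the runner-up
student_far = [4.0, -2.0, 0.5, 3.5]    # same top token, wrong runner-up

print(distillation_loss(teacher, student_close))
print(distillation_loss(teacher, student_far))
print(hard_label_loss(0, student_close), hard_label_loss(0, student_far))
```

Both students rank token 0 first, so a single hard label from the raw data gives them roughly similar losses. The teacher's full distribution also says which alternatives are plausible, so the soft-target loss penalizes `student_far` much more heavily. That extra signal per example is one reason distillation can be cheaper than re-ingesting the raw data.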

supern0va · 2026-05-28T18:46:20 1779993980

I think you replied to the wrong parent.