“Each fine tuned toward slightly different aims”

So…a sort of mixture of experts if you will



Kind of. More like a mixture of a mixture of experts.

The problem is that MoE on its own can't use the context as a scratchpad for differentiated chain-of-thought (CoT) trees.

So you have a mixture of token suggestions, but a singular chain of thought.
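At one layer that looks roughly like this. A toy numpy sketch, not any particular model: soft routing for clarity (real MoE layers typically route sparsely to the top-k experts), and all the weights are random placeholders:

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts = 8, 4

    # Made-up toy weights: a gating matrix plus one matrix per expert.
    W_gate = rng.normal(size=(d_model, n_experts))
    W_expert = rng.normal(size=(n_experts, d_model, d_model))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def moe_layer(h):
        # Routing weights over experts for this one token's hidden state.
        gate = softmax(h @ W_gate)
        outs = np.stack([h @ W_expert[i] for i in range(n_experts)])
        # Weighted blend of expert outputs -> a single vector, so decoding
        # still walks one token sequence: one chain of thought.
        return gate @ outs

    h = rng.normal(size=d_model)
    print(moe_layer(h).shape)  # (8,)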

A mixture of both is probably going to perform better than a mixture of the former alone, especially given everything we now know about in-context learning and how much signal synthetic data carries.
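A mixture over chains is essentially self-consistency style sampling. A minimal sketch, where sample_chain() is a hypothetical stand-in for a temperature > 0 model call:

    import random
    from collections import Counter

    random.seed(0)

    def sample_chain(question):
        # Hypothetical stub for one sampled chain-of-thought run; in
        # practice this would be a nondeterministic call to the model.
        return random.choice(["42", "42", "41"])

    def mixture_of_chains(question, n_chains=8):
        # Run several independent chains, then majority-vote the answers.
        answers = [sample_chain(question) for _ in range(n_chains)]
        return Counter(answers).most_common(1)[0][0]

    print(mixture_of_chains("What is 6 * 7?"))  # most chains agree: 42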



