“Each fine tuned toward slightly different aims”

So…a sort of mixture of experts if you will



Kind of. More like a mixture of a mixture of experts.

The problem is that MoE on its own can't use the context as a scratchpad for differentiated chain-of-thought (CoT) trees.

So you have a mixture of token suggestions, but a singular chain of thought.
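At one layer that looks roughly like this. A toy numpy sketch, not any particular model: soft routing for clarity (real MoE layers typically route sparsely to the top-k experts), and all the weights are random placeholders:

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts = 8, 4

    # Made-up toy weights: a gating matrix plus one matrix per expert.
    W_gate = rng.normal(size=(d_model, n_experts))
    W_expert = rng.normal(size=(n_experts, d_model, d_model))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def moe_layer(h):
        # Routing weights over experts for this one token's hidden state.
        gate = softmax(h @ W_gate)
        outs = np.stack([h @ W_expert[i] for i in range(n_experts)])
        # Weighted blend of expert outputs -> a single vector, so decoding
        # still walks one token sequence: one chain of thought.
        return gate @ outs

    h = rng.normal(size=d_model)
    print(moe_layer(h).shape)  # (8,)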

A mixture of both is probably going to perform better than a mixture of the former alone, especially given everything we now know about in-context learning and how much signal synthetic data carries.
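A mixture over chains is essentially self-consistency style sampling. A minimal sketch, where sample_chain() is a hypothetical stand-in for a temperature > 0 model call:

    import random
    from collections import Counter

    random.seed(0)

    def sample_chain(question):
        # Hypothetical stub for one sampled chain-of-thought run; in
        # practice this would be a nondeterministic call to the model.
        return random.choice(["42", "42", "41"])

    def mixture_of_chains(question, n_chains=8):
        # Run several independent chains, then majority-vote the answers.
        answers = [sample_chain(question) for _ in range(n_chains)]
        return Counter(answers).most_common(1)[0][0]

    print(mixture_of_chains("What is 6 * 7?"))  # most chains agree: 42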



