There's still plenty of data out there, including in other languages and undigitised books - and that's before you get to data in other modalities, like speech and videos. Synthetic data can also be used quite effectively if you're trying to distill a model instead of trying to grow capabilities, as Phi-1.5 demonstrates.
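(Distillation here means training a smaller student model to match a larger teacher's output distribution rather than just the hard labels. This is not Phi-1.5's actual recipe, which relied on curated synthetic text; it's just a minimal sketch of the standard soft-label distillation loss, with illustrative logits made up for the example.)

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened distribution and the
    student's. A temperature > 1 flattens the teacher's distribution,
    exposing relative preferences among non-top tokens for the student
    to learn from."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# A student that matches the teacher incurs a lower loss than one that inverts it.
teacher = [4.0, 1.0, 0.2]
aligned = distillation_loss(teacher, [4.0, 1.0, 0.2])
mismatched = distillation_loss(teacher, [0.2, 1.0, 4.0])
print(aligned < mismatched)
```

In a real training loop this loss is usually mixed with the ordinary hard-label cross-entropy, weighted by a hyperparameter.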
For capability growth, well, we don't know what we don't know. There are still many unknowns when it comes to architecture, training, data, modalities, incremental learning, alignment, self-critique, and more. There are plenty of companies and governments trying to find their angle here.
Even if we're at the very peak of what LLMs are capable of -- which seems unlikely -- there's still potentially decades of research in making what we have more effective.