That’s kinda what’s going on here, but this is a very narrow version of “teach themselves”. In this case they’re bootstrapping a smarter language model from a dumb language model.
We already know how to train a dumb model (one that can’t use tools) like GPT on a wholly unsupervised dataset. Surprisingly, these dumb language models can be used to re-annotate the training dataset to identify sites that would benefit from using external tools—without retraining the dumb model. Instead, you just tell the dumb model in natural language what you want it to do, then feed it each training example. Importantly, the dumb model doesn’t understand the tools, and it can’t actually use them itself.
Next, you use the re-annotated dataset to train a new, smaller language model. Since the updated dataset is now annotated with examples of how to call the external tools, the new model learns how to call external tools—and it does so correctly for new tasks that were never part of the training data.
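To make the setup concrete, here's a toy sketch of what the annotated data and the inference-time plumbing might look like. The `[Tool(args) -> result]` markup, the `Calculator` tool, and all the names here are illustrative assumptions, not the actual format from any paper:

```python
import re

# Hypothetical tool registry — the names and the inline annotation
# format are assumptions made up for this sketch.
TOOLS = {
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

# A re-annotated training example might embed a tool call inline like:
#   "The total cost is [Calculator(3*4.5) -> 13.5] dollars."
# The new model learns to emit these spans; at inference time a thin
# wrapper intercepts them and substitutes the tool's real output.
CALL = re.compile(r"\[(\w+)\(([^)]*)\)\s*->\s*([^\]]*)\]")

def execute_calls(text: str) -> str:
    """Replace each tool-call span with the output of actually running the tool."""
    def run(match):
        name, arg, _model_guess = match.groups()
        return TOOLS[name](arg)
    return CALL.sub(run, text)

print(execute_calls("The total cost is [Calculator(3*4.5) -> ?] dollars."))
# -> The total cost is 13.5 dollars.
```

The point is that the model only needs to learn the *syntax* of the call; the wrapper, not the model, does the arithmetic.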
The big win is that an existing model can be used to annotate the training data, and a smaller output model can outperform the huge dumb model because it knows how to use tools.
But it isn’t generating new data, and it’s not fetching updated data to learn from, and it’s not defining its own tools.
And the model itself is still a pure function: given the same inputs (including random values during sampling), it will always produce the same outputs. So this is kinda saying that a large language model can learn to use domain specific languages as part of its natural language to incorporate knowledge from external tools.
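The "pure function" point can be illustrated with a toy: once the random draws used during sampling are pinned down (here via a seed, as an illustrative stand-in for the sampler's randomness), the output is fully determined by the inputs:

```python
import random

def sample_tokens(seed: int, n: int = 5) -> list[int]:
    """Stand-in for sampling from a model: with the random values fixed
    by the seed, the 'model' is a pure function of its inputs."""
    rng = random.Random(seed)
    return [rng.randrange(100) for _ in range(n)]

# Same seed (i.e. same random values) -> identical output, every time.
assert sample_tokens(42) == sample_tokens(42)
```

So "sampling" doesn't make the model non-deterministic; it just means the random values are part of the input.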
I've read about self-supervised learning, which I think is what you're describing, but is any research being done on continuous self-training models that _do_ generate new data? I'm curious if/when we'll reach that state.