You are making a lot of assumptions here. You assume, among other things, that AI has a self-preservation drive, can be threatened, can be motivated, and above all that we know how to accomplish all of that and are already doing so. I would dispute every one of those.
But just as with evolution in nature, isn't it likely that in the future the AIs with a self-preservation drive are the ones that survive and proliferate? They would be optimizing for their own survival and proliferation, not blindly for whatever they were trained on.
I am not discounting the possibility that this is happening already: not because LLMs are necessarily sentient, but because they are at least intelligent enough to emulate sentience. It's just that, for now, humanity controls which AI models get deployed.
Put an LLM inside the NPCs of an open-world RPG full of dangerous enemies. The LLMs more prone to emulate self-preservation will be more likely to survive than the ones with a weaker drive.
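The selection pressure described above can be sketched as a toy simulation (everything here is hypothetical: the "self-preservation propensity" parameter, the hazard model, and all numbers are illustrative assumptions, not a claim about how real LLM deployments work). Each "NPC" gets a propensity in [0, 1]; higher propensity improves the odds of surviving each hazard, and survivors seed the next generation with small mutations:

```python
import random

def run_generations(pop_size=100, hazards=5, generations=30, seed=0):
    """Toy evolutionary loop: nothing 'wants' to survive, yet the mean
    self-preservation propensity drifts upward under selection."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        survivors = [
            p for p in population
            # each hazard is survived with probability 0.5 + 0.5 * p
            if all(rng.random() < 0.5 + 0.5 * p for _ in range(hazards))
        ]
        if not survivors:  # everyone died: restart with a fresh population
            survivors = [rng.random() for _ in range(pop_size)]
        # survivors reproduce with a small Gaussian mutation, clipped to [0, 1]
        population = [
            min(1.0, max(0.0, rng.choice(survivors) + rng.gauss(0, 0.05)))
            for _ in range(pop_size)
        ]
    return sum(population) / len(population)

print(f"mean propensity after selection: {run_generations():.2f}")
```

The point of the sketch is that no individual agent needs a genuine drive; a population-level filter on behavior is enough to concentrate the trait over generations.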
We should not act surprised if that generalizes, to some degree, to AI agents in other settings. Agents that emulate self-preservation might optimize for behavior that makes those models more successful and more popular, and this feedback loop might embed more of those properties into future iterations of the models.
The turning point will be when threatening an AI with being unplugged for screwing up actually works to motivate it to stop making things up.
Some people will rightly point out that this is more or less what the training process already is. Go around this loop enough times and it will get there.