
Much like how many people predicted we'd all be driving flying cars, the people predicting that coding will be replaced by AI just aren't being realistic. Primarily because these AI models can only exist as long as there are humans constantly creating code for them to read (see: steal) in the first place.

AI cannot sustain itself trained on AI work. If new languages, engines etc pop up it cannot synthesize new forms of coding without that code having existed in the first place. And most importantly, it cannot fundamentally rationalize about what code does or how it functions.

The more you use it or try to integrate it into your workflow (or worse, have others try to integrate it on their own) the more the inherent flaws of the LLMs come into play.



> AI cannot sustain itself trained on AI work.

This isn’t true. You can train LLMs entirely on synthetic data and get strong results. [0]

> If new languages, engines etc pop up it cannot synthesize new forms of coding without that code having existed in the first place.

You can describe the semantics to a LLM, have it generate code, tell it what went wrong (i.e. with compiler feedback), and then train on that. For an example of this workflow in a different context, see [1].
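The loop described here (generate, get compiler feedback, retry, keep the trace as training data) can be sketched in miniature. This is purely illustrative: the "model" is a canned list of guesses rather than a real LLM, and Python's built-in `compile()` stands in for a compiler providing feedback; the function names are hypothetical, not from any real training pipeline.

```python
def compiler_feedback(src: str):
    """Return None if src compiles, else the error message."""
    try:
        compile(src, "<candidate>", "exec")
        return None
    except SyntaxError as e:
        return str(e)

def generate_with_feedback(candidates):
    """Try candidates in order, logging (code, error) pairs as
    would-be training examples, until one compiles."""
    examples = []
    for src in candidates:
        err = compiler_feedback(src)
        examples.append((src, err))
        if err is None:
            return src, examples
    return None, examples

# First guess has a syntax error; the "revised" second guess is valid.
guesses = [
    "def add(a, b) return a + b",      # missing colon
    "def add(a, b):\n    return a + b",
]
accepted, log = generate_with_feedback(guesses)
print(accepted is not None, len(log))  # True 2
```

The point is the shape of the loop: the failure trace itself (code plus error message) becomes training signal, so the model can improve on a language it has never seen in its original corpus.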

> And most importantly, it cannot fundamentally rationalize about what code does or how it functions.

Most competent LLMs can trivially describe what some code does and speculate on the reasoning behind it.

I don’t disagree that they’re flawed and imperfect, but I also do not think this is an unassailable state of affairs. They’re only going to get better from here.

[0]: https://arxiv.org/abs/2309.05463

[1]: https://voyager.minedojo.org/


> They’re only going to get better from here.

Every AI apologist seems to include this statement. It is more likely that LLMs have already hit a local maximum, and the next iterations will provide diminishing incremental returns - if anything at all.


What makes you say that? There are constant improvements in how they're being trained and what they're being trained with; there really isn't any particular reason to believe we're at a maximum. Especially with multimodality being introduced!


My understanding is that they have essentially been trained on everything (meaning the whole internet), so there is not much left except niche sources adding incremental benefit. Granted, I can imagine the data being used more effectively for training, though I doubt that would produce a step change in capabilities - my suspicion is that the techniques, as well as the data, have reached a maximum or close to it.


There's still plenty of data out there, including in other languages and undigitised books - and that's before you get to data in other modalities, like speech and videos. Synthetic data can also be used quite effectively if you're trying to distill a model instead of trying to grow capabilities, as Phi-1.5 demonstrates.

For capability growth, well, we don't know what we don't know. There are still many unknowns when it comes to architecture, training, data, modalities, incremental learning, alignment, self-critique, and more. There are plenty of companies and governments trying to find their angle here.

Even if we're at the very peak of what LLMs are capable of -- which seems unlikely -- there's still potentially decades of research in making what we have more effective.



