So where does your responsibility for this code end? Do you just push to the repo, merge, and that's it, or do you also deploy, monitor, and maintain the production systems? Who handles outages on Saturday night: you or someone else?
Iteration is inherent to how computers work. There's nothing new or interesting about this.
The question is who prunes the space of possible answers. If the LLM spews things at you until it gets one right, then sure, you're in the scenario you outlined (which is much less interesting). If it ultimately presents one option to the human, and that option is correct, then that's much more interesting. Even if the process is "monkeys on keyboards", does it matter?
There are plenty of optimization and verification algorithms that rely on "try things at random until you find one that works", but before modern LLMs no one accused these things of being monkeys on keyboards, despite that being literally what they are.
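To make the "try things at random until one works" pattern concrete, here is a minimal sketch of generate-and-test. It is not any particular production algorithm: `generate_and_test`, `propose`, `verify`, and the toy factoring problem are all hypothetical names and choices made up for illustration. The point is that the caller only ever sees the one candidate that passed the checker, never the thousands of wrong guesses.

```python
import random


def generate_and_test(propose, verify, max_tries=1_000_000):
    """Draw random candidates until one passes `verify`.

    Returns (candidate, tries) on success, or (None, max_tries) if nothing
    passed. Only the verified candidate is surfaced to the caller.
    """
    for tries in range(1, max_tries + 1):
        candidate = propose()
        if verify(candidate):
            return candidate, tries
    return None, max_tries


# Toy problem (an assumption for illustration): find a nontrivial factor of N
# by blind guessing. The verifier is cheap and exact, which is what makes
# "monkeys on keyboards" a workable strategy here.
N = 8633  # 89 * 97
rng = random.Random(0)  # fixed seed so the run is repeatable

factor, tries = generate_and_test(
    propose=lambda: rng.randrange(2, N),
    verify=lambda c: N % c == 0,
)

print(f"found factor {factor} of {N} after {tries} random guesses")
```

Whoever calls `generate_and_test` gets a single, checked answer; the wasted guesses are an implementation detail. That is the "who prunes the space" distinction from the parent comment.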
Indeed, of course it doesn't matter. What I was hinting at is that if you forget all the times the LLM was wrong and remember only the one time it was right, it seems much more magical than it actually might be.
Also, how were the data races significant if nobody noticed them for a decade? Were you all just coming to work and being like "jeez, I don't know why this keeps happening" until the LLM found them for you?
I agree with your points. Answering your one question for posterity:
> Also, how were the data races significant if nobody noticed them for a decade?
They only replicated in our CI, so it was mainly an annoyance for those of us doing release engineering (because when you run ~150 jobs you'll inevitably get ~2-4 failures). So it's not that no one noticed, but it was always a matter of prioritization vs other things we were working on at the time.
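A back-of-the-envelope sketch of why "inevitably" is fair. It assumes every job fails independently with the same probability, which is a simplification and not something we measured; the function name is made up for illustration.

```python
def prob_at_least_one_failure(jobs: int, p_fail: float) -> float:
    """Chance that at least one of `jobs` independent jobs fails."""
    return 1.0 - (1.0 - p_fail) ** jobs


JOBS = 150  # approximate job count per run

# ~2-4 failures per ~150 jobs implies a per-job flake rate of about 1.3%-2.7%.
for observed_failures in (2, 4):
    p_fail = observed_failures / JOBS
    p_any = prob_at_least_one_failure(JOBS, p_fail)
    print(
        f"{observed_failures} failures per {JOBS} jobs: "
        f"per-job rate {p_fail:.1%}, P(at least one red job) = {p_any:.1%}"
    )
```

Under that assumption, a rare per-job flake still leaves a fully green run as the exception, roughly 1 run in 8 at the low end and fewer than 1 in 50 at the high end.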
But that doesn't mean they got zero effort put into them. We tried multiple times to replicate, perhaps a total of 10-20 human hours over a decade or so (spread out between maybe 3 people, all CS PhDs), and never got close enough to a smoking gun to develop a theory of the bug (and therefore were not able to develop a fix).
To be clear, I don't think this "proves" anything one way or another, as it's only one data point, but given this is a team of CS PhDs intimately familiar with tools for race detection and debugging, it's notable that the tools meaningfully helped us debug this.
First OpenAI video I've ever seen. The people in it all seem incompetent for some reason, like a grotesque version of Apple employees from Temu or something.
Any old laptop (especially a ThinkPad) can run Linux well. If you want to use it, it's not "trouble" per se, because once you really know what you are doing there is no trouble (and you can't get to knowing what you're doing without finding out what it is you did that caused you the trouble).
If you just want to use Linux so you can tell someone about it, don't bother using Linux and stick to what works for you.
It's just one of those consequences of "I don't care about the specifics, just put it in production" that ends up in "why didn't you tell me that I completely misunderstood?"