
Yes, the current technology cannot replace an engineer.

The easiest way to understand why is by understanding natural language. A natural language like English is messy and doesn't follow formal rules. It's also not specific enough to provide instructions to a computer; that's why programming languages were created.

The AI is incredibly dumb when it comes to complex tasks with long-range context. It needs an engineer who understands how to write and execute code to give it precise instructions, or it is useless.

Natural language processing is so complex that the field started around the end of World War II, and only now are we seeing AI that can mimic humans and do certain things faster than we can. But thinking is not one of them.



LOL. Figuring out how to solve IMO-level math problems without "thinking" would be even more impressive than thinking itself. Now there's a parrot I'd buy.


It isn't thinking; it's RL with reward hacking.

It's like a student who wins a gold medal at the IMO but can't solve easier math problems because they didn't study that type of problem, whereas a human who is good at IMO math generalizes to all math problems.

It's just memorizing a trajectory in pursuit of a specific goal. That's what RL is.


It's like taking a student who wins a gold in IMO math, but can't solve easier math problems

I've tried to think of specific follow-up questions that will help me understand your point of view, but other than "Cite some examples of easier problems that a successful IMO-level model will fail at," I've got nothing. Overfitting is always a risk, but if a model can "overfit" to problems it hasn't seen before, that's the fault of the test administrators for reusing old problem forms or otherwise not including enough variety.

GPT itself suggests[1] that problems involving heavy arithmetic would qualify, and I can see that being the case if the model isn't allowed to use tools. However, arithmetic doesn't require much in the way of reasoning, and in any case the best reasoning models are now quite decent at unaided arithmetic. Same for the tried-and-true 'strawberry' example GPT cites, involving introspection of its own tokens. Reasoning models are much better at that than base models. Unit conversions were another weakness in the past that no longer seems to crop up much.
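To make the 'strawberry' example concrete, here is a minimal sketch of the probe people use: counting letter occurrences in a word, which is trivial in code but was historically hard for base models because they operate on tokens rather than individual characters. (The function name and structure here are illustrative, not from any particular benchmark.)

```python
# The classic "strawberry" probe: count occurrences of a letter in a word.
# Trivial for code; hard for token-based models that never "see" characters.
def count_letter(word: str, letter: str) -> int:
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # 3
```

The point of the probe is exactly this gap: a task that any one-liner solves reliably, used to test whether a model can introspect below the token level.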

So what would some present-day examples be, where models that can perform complex CoT tasks fail on simpler ones in ways that reveal that they aren't really "thinking?"

1: https://chatgpt.com/share/695be256-6024-800b-bbde-fd1a44f281...


In response to your direct question -> https://gail.wharton.upenn.edu/research-and-insights/tech-re...

“This indicates that while CoT can improve performance on difficult questions, it can also introduce variability that causes errors on ‘easy’ questions the model would otherwise answer correctly.”

As for the strawberry example: roughly 25,000 people are employed globally to repair broken responses and create training data, a big whack-a-mole effort to remediate embarrassing errors.


(Shrug) Ancient models are ancient. Please provide specific examples that back up your point, not obsolete PDFs to comb through.


Your position is quite weak: you ask for overwhelming proof but aren’t willing to read any research. That’s just intellectually lazy.

Perhaps if you took some time to learn from the experts, the people who build these systems and really understand what’s happening, you would realize that these limitations of AI are widely known.

Take a look around the 5 minute mark.

https://youtu.be/PqVbypvxDto?si=gZq-2yEuE4sTeQZe

Just understand you are dead wrong in your assumptions.


You appear to be arguing with someone who isn't here (or else you replied to the wrong post.) Your personal fallacy of choice appears to be, "LLMs aren't godlike and infallible only a few years after being invented, despite absolutely no one ever claiming they were, so it's all a bunch of empty hype."

No one cares about the state of the art. Only the first couple of time derivatives matter. You're not getting smarter, but the models are.

How are those examples coming along, by the way? The ones that prove that IMO-level models aren't reasoning, but just getting really, really lucky?



