
I have a simple front-end test that I give to junior devs. Every few months I see if ChatGPT can pass it. It hasn’t. It can’t. It isn’t even close.

It answers questions confidently but with subtle inaccuracies. The code that it produces is the same kind of nonsense that you get from recent bootcamp devs who’ve “mastered” the 50 technologies on their eight-page résumé.

If it’s gotten better, I haven’t noticed.

Self-driving trucks were going to upend the trucking industry in ten years, ten years ago. The press around LLMs is identical. It’s neat but how long are these things going to do the equivalent of revving to 100 mph before slamming into a wall every time you ask them to turn left?

I’d rather use AI to connect constellations of dots that no human possibly could, have an expert verify the results, and go from there. I have no idea when we’re going to be able to “gpt install <prompt>” to get a new CLI tool or app, but it’s not going to be soon.



I was on a team developing a critical public safety system on a tight deadline a few years ago, and I had to translate some wireframes for the admin back-end into CSS. I did a passable job but it wasn’t a perfect match. I was asked to redo it by the team lead. It had zero business value, but such was the state of our team…being pixel perfect was a source of pride.

It was one of the incidents that made me stop doing front-end development.

As an exercise, I recently asked ChatGPT to produce similar CSS and it did so flawlessly.

I’m certainly a middling programmer when it comes to CSS. But with ChatGPT I can produce stuff close to the quality of what the CSS masters do. The article points this out: middling generalists can now compete with specialists.


> I recently asked ChatGPT to produce similar CSS and it did so flawlessly.

I use ChatGPT every day for many tasks in my work and find it very helpful, but I simply do not believe this.

> The article points this out: middling generalists can now compete with specialists.

I'd say it might allow novices to compete with middling generalists, but even that is a stretch. On the contrary, ChatGPT is actually best suited to use by a specialist who has enough contextual knowledge to construct targeted prompts & can then verify & edit the responses into something optimal.


That's about my experience.

The worst dev on my team uses ChatGPT a lot, and it's facilitated him in producing more bad code more quickly. I'm not sure it's a win for anyone, and he's still unlikely to be with the team in a year.

It allows a dev who doesn't care about their craft or improving to generate code without learning anything. The code they generate today or a year from today is the same quality.

Part of it is that it allows devs who lean into overcomplicating things to do so even more. The solutions are never a refinement of what already exists, but patch on top of patch on top of patch of complexity. ChatGPT is not going to tell you how to design a system, architect properly, automate, package, test, deploy, etc.

For the team it means there's a larger mess of a code base to rewrite.


> ChatGPT is not going to tell you how to design a system, architect properly, automate, package, test, deploy, etc.

If you ask the right questions it absolutely can.

I’ve found that most people thinking ChatGPT is a rube are expecting too much extrapolation from vague prompts. “Make me a RESTful service that provides music data.” ChatGPT will give you something that does that. And then you’ll proceed to come to hacker news and talk about all the dumb things it did.

But, if you have a conversation with it, tell it more of the things you're considering, some of the trade-offs you're making, how the schema might grow over time... it's kind of remarkable.

You need to treat it like a real whiteboarding session.

I also find it incredibly useful for getting my code into more mainstream shape. I have my own quirks that I’ve developed over time learning a million different things in a dozen different programming languages. It’s nice to be able to hand your code to ChatGPT and simply ask “is this idiomatic for this language?”

I think the people most disappointed with ChatGPT are trying to treat it like a Unix CLI instead of another developer to whiteboard with.


This has been my experience as well.

Every person I've noticed who says that ChatGPT isn't good at what it does has the same thing in common - they're not great at talking to people, either.

Turns out when you train an AI on the corpus of human knowledge, you have to actually talk to it like a human. Which entirely too many people visiting this website don't do effectively.

ChatGPT has allowed me to develop comprehensive training programs for our internal personnel, because I already have some knowledge of training and standardization from my time in the military, but I also have in-depth domain knowledge so I can double-check what it's recommending, then course correct it if necessary.


> Every person I've noticed who says that ChatGPT isn't good at what it does has the same thing in common - they're not great at talking to people, either.

I think that the people who nowadays shit on ChatGPT's code generating abilities are the same blend of people who, a couple decades ago, wasted their time complaining that hand-rolled assembly would beat any compiled code in any way, shape, or form, provided that people knew what they were doing.


> But, if you have a conversation with it, tell it more of the things you're considering, some of the trade-offs you're making, how the schema might grow over time... it's kind of remarkable.

You're not wrong, but I would caution that it can get really confused when the code it produces exceeds the context length. This is less of a problem than it used to be, as the maximum context length is increasing quite quickly, but by way of example: I'm occasionally using it for side projects to see how to best use it, one of which is a game engine. It (with a shorter context length than we have now) started by creating a perfectly adequate Vector2D class with `subtract(…)` and `multiply(…)` functions, but when it came to using that class it was calling `sub(…)` and `mul(…)`. Not absolutely stupid, and a totally understandable failure mode given how it works, but still objectively incorrect.
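To make that concrete, the mismatch looked roughly like this (a sketch in TypeScript; the values and surrounding code are made up, only the subtract/multiply vs sub/mul confusion is the real issue):

  class Vector2D {
    constructor(public x: number, public y: number) {}
    subtract(other: Vector2D): Vector2D {
      return new Vector2D(this.x - other.x, this.y - other.y);
    }
    multiply(scalar: number): Vector2D {
      return new Vector2D(this.x * scalar, this.y * scalar);
    }
  }

  const velocity = new Vector2D(3, 4);
  const drag = new Vector2D(1, 1);

  // What it emitted later, once the class had scrolled out of context:
  //   velocity.sub(drag).mul(0.5);   // sub/mul don't exist: compile error
  // What it should have produced against its own class:
  const next = velocity.subtract(drag).multiply(0.5);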


I frequently run into this, and it’s quite maddening. When you’re working on a toy problem where generating functioning code is giving you a headache - either because it’s complex or because the programming language is foreign or crass - no problem. When you’re trying to extend an assemblage of 10 mixins in a highly declarative framework that many large-scale API contracts rely on to be correct, the problem is always going to boil down to how well the programmer understands the existing tools/context that they’re working with.

To me, a lot of this boils down to the old truism that “code is easier to write than maintain or extend”. Companies who dole out shiny star stickers for producing masses of untested, unmaintainable code will always reap their rewards, whether they’re relying on middling engineers and contractors alone, or with novices supercharged with ChatGPT.


> But, if you have a conversation with it

It can't give you a straight answer, or it hallucinates APIs. It can't tell you "no, this cannot be done"; it tries to "help" you.

For me it's great for writing simple isolated functions, generating regexes, command-line solutions, and exploring new technologies.

But after making it write a few methods or classes, it just gets extremely tedious to make it add or change code, to the point that I just write it myself.

Further, when operating at the edge of your knowledge, it also leads you on, whereas a human expert would just tell you "aaah, but that's just not possible/not a good idea".


I think that's a fair description. While I have not yet found ChatGPT useful in my "real" day job (its understanding of aerospace systems is more than I would have guessed, but yet not enough to be super helpful to me), I have found it generally useful in more commonplace scripting tasks and what-not.

With the caveat of, I still need to understand what it's talking about. Copy-pasting whatever it says may or may not work.

Which is why I remain dubious that we're on the road to LLMs replacing software engineers. Assisting? Sure, absolutely.

Will we get there? I don't know. I mean, like, fundamentally, I do not trust LLMs. I am not going to say "hey ChatGPT, write me a flight management system suitable for a Citation X" and then just go install that on the plane and fly off into the sunset. I'm sure things will improve, and maybe improve enough to replace human programmers in some contexts, but I don't think we're going to see LLMs replacing all software engineers across the board.


In a similar vein, ChatGPT can be an amazing rubber duck. If I have a strange and obscure problem that stumps me, I kinda treat ChatGPT like I would treat a forum or an IRC channel 15 - 20 years back. I don't have "prompting experience or skills", but I can write up the situation, what we've tried, what's going on, and throw that at the thing.

And... it can dredge up really weird possible reasons for system behaviors fairly reliably. Usually, for a question of "Why doesn't this work after all of that?", it drags up like 5-10 reasons for something misbehaving. We usually checked like 8 of those. But the last few can be really useful to start thinking outside of the normal box about why things are borked.

And often enough, it can find at least the right idea to identify root causes of these weird behaviors. The actual "do this" tends to be some degree of bollocks, but enough of an idea to follow-up.


> The worst dev on my team uses ChatGPT a lot, and it's facilitated him in producing more bad code more quickly.

This is great. The exact same is true with writing, which I think it's trivial for anyone to see. Especially non-native speakers or otherwise bad writers can now write long-winded nonsense, which we're starting to see all over. It hasn't made anyone a good writer, it's just helped bad ones go faster.


> Especially non-native speakers or otherwise bad writers can now write long-winded nonsense

You have now described 95% of Quora's content.


Isn't it expected? ChatGPT was trained on such texts as well.


Bless this person on your team; he is creating work out of thin air and will keep your team, and possibly other teams, employed for a really long time.


Exactly - and people say AI will take away jobs!


This could even play out on a broader scale and even increase the demand for software engineers.

What is going to happen if more and more people are creating lots of software that delivers value but is difficult to maintain and extend?

First it's going to be more high-level stuff, then more plumbing, more debugging and stitching together half-baked solutions than ever before.

AI might make our jobs suck more, but it's not going to replace them.



I have a hunch that using ChatGPT might be a skill in and of itself and it doesn’t necessarily hurt or help any particular skill level of developers.

In previous replies in this thread the claim is that it helps novices compete with associates, or associates with seniors, but in reality it will probably help any tier of skill level. You just have to figure out how to prompt it.


One hundred percent. Most people I’ve seen dismiss ChatGPT simply refuse to engage it appropriately. It’s not likely to solve your most complex problem with a single prompt.

Asking the right questions is such an important skill in and of itself. I think we’re seeing to some extent the old joke about engineers not knowing how to talk to people manifest itself a bit with a lot of engineers right now not knowing quite how to get good results from ChatGPT. Sort of looking around the room wondering what they’re missing since it seems quite dumb to them.


I had a friend jokingly poke fun at me for the way I was writing ChatGPT prompts. It seemed, to him, like I was going out of my way to be nice and helpful to an AI. It was a bit of an aha moment for him when I told him that helping the AI along gave much more useful answers, and he saw I was right.


They use GPT 3.5, prompt it with "Write a javascript login page" and then look at the code and go "Damn, this thing is stupid as fuck".


I use “ChatGPT” (really Bing Chat, which is OpenAI under the hood as I understand it) more than anyone on my team, but it is very rarely for code.

I most often use it for summarizing/searching through dense documentation, creating quick prototypes, “given X,Y,Z symptoms and this confusing error message, can you give me a list of possible causes?” (basically searches Stack Overflow far better than I can).

Anyway, basically the same as how I was using Google when Google was actually good. Sometimes I will forget some obscure syntax and ask it how to do something, but not super often. I’m convinced using it solely to generate code is a mistake unless it’s tedious boilerplate stuff.


Yes, agreed. The best way of putting this is "using Google when Google was actually good."


Bing is far and away worse than GPT-4 through ChatGPT or the API, just FYI. Don't even consider it comparable, even if they say it is the same model under the hood. Their "optimizations" have crippled its capabilities if that is the case.


Well, it works pretty well for me, and cites its sources with links, and I have yet to catch it making up something.


We were talking about code generation though. It's horrible at it.


My parent comment says I rarely use it for code generation, and I think if you're using these tools purely for that you're doing it wrong... that was like my entire point.


Right, but your reference for the point is a weak model. Your view would likely change a bit if you used GPT-4. It is substantially more powerful and skilled.


Exactly what I was trying to say. Thank you.

Still works best if you know what you are doing and can give very detailed instructions, but GPT-4 is vastly more capable than Bing for these tasks.


But I don't need it to be. Like at all. I've yet to find a programming task that GPT-4 would have made faster, and yes I have used it.


On the flip side, one can use ChatGPT as only a starting point and to learn from there. One isn't stuck with actually using what it outputs verbatim, and really shouldn't until at least a hypothetical GPT-6 or 7... and to use it fully now, one has to know how to nudge it when it goes into a bad direction.

So overall it's more an amplifier than anything else.


I have a lot of juniors floating around my co-op, and when I watch them use chatgpt it seems to become a dependency. In my opinion it's harming their ability to learn. Rather than thinking through problems, they'll just toss every single roadblock they hit instantly into a chatgpt prompt.

To be fair, I've been doing the same thing, punching simple mathematics into a calculator in my browser, to the point that I'm pretty sure I'd fail at long division by hand.

Maybe it won't matter in a few years and their chatgpt skills will be well honed, but if it were me in their position I wouldn't gamble on it.


Yeah that's my larger point about the guy and the pattern. It doesn't lead to growth. I've seen zero growth whatsoever.

And he Slacks coworkers like he is talking to ChatGPT too, slinging code blobs without context, example input data, or the actual error he received.


> Yeah that's my larger point about the guy and the pattern. It doesn't lead to growth. I've seen zero growth whatsoever.

If ChatGPT can solve a problem consistently well, I don't think it's worth the effort to master that kind of problem myself.

My examples are regexes, command lines to manipulate files, kafka/zookeeper commands to explore a test environment.

For me it's a big win in that regard.


> So overall it's more an amplifier than anything else.

Overall it would be an amplifier if that were how the majority used it. Sadly I don't believe that to be the case.


That's been the case with every technology made by man since fire.


Yup. We should embrace it, but without being naïve about what great things it's bringing us :)


Ouch! Hot!


If the results were more akin to a google or stack overflow where there was a list of results with context.. sure.

But people are using the singular response as "the answer" and moving on..


> If the results were more akin to a google or stack overflow where there was a list of results with context.. sure.

I don't think the history of the usage of either shows that most people make any use of that context.


Especially these days you have to know how to use/read Google and SO results too.

(And I should have said ChatGPT4 earlier; if you're a bad-to-mediocre developer taking ChatGPT3.5 literally, you'll probably wind up in a Very Bad Place.)


Phind is a bit more like this


This has been my experience as well - for repetitive things, if what you're looking for is the shitty first draft, it's a way to get things started.

After that, you can shape the output - without GPT's help - into something that you can pull off the shelf again as needed and drop it into where you want it to go, because at that point in the process, you know it works.


> nudge it when it goes into a bad direction

It has happened a few times for me that ChatGPT gets stuck in a bullshit loop and I can't get it unstuck.

Sure, I could summarise the previous session for ChatGPT and try again, but I'm too tired at that point.


I get that with humans sometimes too. Even here, even before LLMs became popular. Someone gets primed on some keyword, then goes off on a direction unrelated to whatever it was I had in mind and I can't get them to change focus — and at least one occasion (here) where I kept saying ~"that's not what I'm talking about" only to be eventually met (after three rounds of this) with the accusation that I was moving the goalposts :P


Yeah, regardless of hallucinations and repeating the same mistake even after you tell it to fix it, iterating with ChatGPT is so much less stressful than iterating with another engineer.

I almost ruined my relationship with a coworker because they submitted some code that took a dependency on something it shouldn't have, and I told them to remove the dependency. What I meant was "do the same thing you did, but instead of using this method, just do what this method does inside your own code." But they misinterpreted it to mean "Don't do what this method does, build a completely different solution." Repeated attempts to clarify what I meant only dug the hole deeper because to them I was just complaining that their solution was different from how I would've done it.

Eventually I just showed them in code what I was asking for (it was a very small change!) and they got mad at me for making such a big deal over 3 lines of code. Of course the whole point was that it was a small change that would avoid a big problem down the road...

So I'll take ChatGPT using library methods that don't exist, no matter how much you tell it to fix it, over that kind of stress any day.


Use the edit button


>One isn't stuck with actually using what it outputs verbatim, and really shouldn't until at least a hypothetical GPT-6 or 7... and to use it fully now, one has to know how to nudge it when it goes into a bad direction.

Exactly.

ChatGPT is the single smartest person you can ask about anything, and it has unlimited patience.


>ChatGPT is not going to tell you how to design a system, architect properly, automate, package, test, deploy, etc.

Really? Please explain how ChatGPT cannot, but you can. What magic is it that you know that it's incapable of explaining?


Because it seems to have built-in assumptions about the scale, scope and complexity of the solution you are trying to develop. Specifically, unless you tell it to think about automated testing, or architecting, or making code loosely coupled, it will hack.

This is because a lot of the time what a beginner needs IS a hack. But if you always hack at your job, things stop working.

I want to reiterate. One of the reasons the LLM is so helpful is because it has been RLHF'ed into always treating you as a beginner. This is core to its usefulness. But it also limits it from producing quality work unless you bombard it with an enormous prompt explaining all of the differences between a beginner's and an expert's code. Which is tedious and lengthy to do every time when you just want a short change that doesn't suck.

Humans are able to learn from all sorts of external cues what level they should pitch an idea at. Startups should focus on hacky code, for velocity. Large businesses should focus on robust processes.


I agree with this. There are cases where it produces good results, but there are also cases where it produces bs, and it's not always obvious. I find it to work fine for cases where I know what I want but could use a starting point, but it often invents or misunderstands all kinds of things.

The most frustrating situations are those where it invents a function that would miraculously do what's necessary, I tell it that function does not exist, it apologizes, shuffles the code around a bit and invents a different function, etc. It's the most annoying kind of debugging there is.


Another very obvious thing it does when it comes to code is take the most common misconceptions & anti-patterns used within the programming community & repeat them in an environment where there's no-one to comment. People have critiqued Stack Overflow for having so many "wrong" answers with green checkmarks, but at least those threads have surrounding context & discussion.

A case in point: I asked ChatGPT to give me some code for password complexity validation. It gave me perfectly working code that took a password and validated it against X metrics. Obviously the metrics are garbage, but the code works, and what inexperienced developer would be any the wiser. The only way to get ChatGPT to generate something "correct" there would be to tell it algorithmically what you want (e.g. "give me a function measuring information entropy of inputs", etc.) - you could ask it 50 times for a password validator: every one may execute successfully & produce a desired UI output for a web designer, but be effectively nonsense.
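To make the contrast concrete, here's a rough sketch (TypeScript; the function names and the 60-bit threshold are my own illustration, not ChatGPT output) of the checkbox-style validator it tends to give you versus the entropy-style measurement you have to explicitly ask for:

  // The kind of "metrics" validator it happily produces: runs fine,
  // yet accepts weak passwords like "Password1!" and rejects long passphrases.
  function naiveComplexityCheck(password: string): boolean {
    return (
      password.length >= 8 &&
      /[a-z]/.test(password) &&
      /[A-Z]/.test(password) &&
      /[0-9]/.test(password) &&
      /[^a-zA-Z0-9]/.test(password)
    );
  }

  // A (still crude) entropy estimate: character-pool size raised to the length,
  // expressed in bits. Closer to what you actually want to gate on.
  function estimatedEntropyBits(password: string): number {
    let pool = 0;
    if (/[a-z]/.test(password)) pool += 26;
    if (/[A-Z]/.test(password)) pool += 26;
    if (/[0-9]/.test(password)) pool += 10;
    if (/[^a-zA-Z0-9]/.test(password)) pool += 32;
    return password.length * Math.log2(Math.max(pool, 1));
  }

  const acceptable = estimatedEntropyBits("correct horse battery staple") >= 60;

Both run fine and render a green checkmark in the UI; only one of them actually measures anything.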


> there are also cases where it produces bs, and it's not always obvious

Particularly annoying because I wind up risking not actually saving time, since it's producing subtle bugs that I wouldn't have written myself.

So, you save yourself the time of thought and research at the risk of going down new and mysterious rabbit holes


For me the trick to avoiding this trap is to limit usage to small areas of code, test frequently, and know its limits. I love using copilot/GPT for boilerplate stuff.


> There are cases where it produces good results, but there are also cases where it produces bs, and it's not always obvious.

Pessimistically, this is the medium term role I see for a lot of devs. Less actual development, more assembly of pieces and being good enough at cleaning up generated code.

If an LLM can get you even 25% there most of the time, that's a massive disruption of this industry.


I mean, especially in webdev we've been heading in that direction for a while now anyway. So much of the job is already just wiring up different npm packages and APIs that someone else has written. I've read substantially similar comments back in the mid 2010s about how people weren't learning the fundamentals and just pulling things like left-pad off of a repo. That did cause a disruption in how people coded by abstracting away many of the problems and making the job more about integrating different things together.


> ChatGPT is actually best suited to use by a specialist who has enough contextual knowledge to construct targeted prompts & can then verify & edit the responses into something optimal.

I agree with this, but what that means is that specialists will be able to create next generation tools--across all professions including coding--that do supercharge novices and generalists to do more.


> ChatGPT is actually best suited to use by a specialist who has enough contextual knowledge to construct targeted prompts

This is my take also.

ChatGPT for novices is dangerous; it's the equivalent of a calculator. If you don't know your expected output you're just wrong faster.

But if you know what to expect, what your bounds are, and how to do it normally anyway, it can make you faster.


I wrote just the tool to optimize AI in the hands of a coding expert: https://observablehq.com/@tomlarkworthy/robocoop


I’d need to see it.

I can’t get ChatGPT to outperform a novice. And now I’m having candidates argue that they don’t need to learn the fundamentals because LLMs can do it for them. Good luck, HTML/CSS expert who couldn’t produce a valid HTML5 skeleton. Reminds me of the pre-LLM guy who said he was having trouble because he usually uses React. So I told him he could use React. I don’t mean to rag on novices, but these guys really seemed to think the question was beneath them.

If you want to get back into front-end read “CSS: The Definitive Guide”. Great book, gives you a complete understanding of CSS by the end.


Requirements vary. It certainly can't produce really complex visual designs, or code a designer would be very happy with, but I have a hobby project work in progress where GPT-4 has produced all of the CSS and templates. I have no doubt that the only reason that worked well is that it's a simple design of a type there are about a billion of in its training set, and that it'd fall apart quickly if I started deviating much from that. But it produced both clean CSS and something nicer looking than I suspect I would have produced myself.

A designer would probably still beat it - this doesn't compete with someone well paid to work on heavily custom designs. But at this point it does compete with places like Fiverr for me, for things I can't or don't want to do myself. It'll take several iterations for it to eat its way up the value chain, but it probably will.

But also, I suspect a lot of the lower end of the value chain, or at least part of them, will pull themselves up and start to compete with the lower end of the middle by figuring out how to use LLMs to take on bigger, more complex projects.


This meshes pretty well with my experience.


I'm always asking it to stitch together ad hoc bash command lines for me, eg "find all the files called *.foo in directories called bar and search them for baz".

(`find / -type d -name 'bar' -exec find {} -type f -name '*.foo' \; | xargs grep 'baz'` apparently.)

I would have done that differently, but it's close enough for government work.


This is funny to me, because I would _always_ use -print0 and xargs -0, and for good reasons, I believe. But if you base your entire knowledge on what you find online, then yes, that's what you get - and what _most people will get too_. Also, I can still update that command if I want.

So it's not any worse than good-old "go to stack overflow" approach, but still benefits from experience.

FYI, this is the correct, as-far-as-I-can-tell "good" solution:

  find . -type d -name 'bar' -print0 | \
    xargs -0 -I{} find {} -type f -name '*.foo' -print0 | \
    xargs -0 grep -r baz

This won't choke on a structure like this (ls -R):

  .:
  bar  foo

  ./bar:
  test.foo  'test test.foo'

  ./foo:
  bar  bleb.foo

  ./foo/bar:


...actually

  find . -path '*/bar/*.foo' -print0 | xargs -0 grep baz

;-) no regex, no nested stuff, much shorter. My brain went back to it ;-)


Using better languages like Powershell or Python becomes a lot more valuable here. I definitely think bash is going to be mostly useless in 5 years, you'll be able to generate legible code that does exactly what you want rather than having to do write-only stuff like that. Really we're already there. I've long switched from bash to something else at the first sign of trouble, but LLMs make it so easy. Poorly written python is better than well-written bash.

Of course, LLMs can generate go or rust or whatever so I suspect such languages will become a lot more useful for things that would call for a scripting language today.


> I definitely think bash is going to be mostly useless in 5 years

I'll take that bet


This is kinda side to my main point: while online knowledge is great, there are sometimes surprisingly deep gaps in it. So I can see AI trained on it sometimes struggle in surprising ways.


I would generalize even more and say that any scripting language is going to be deprecated very soon, like Python etc. They are going to be replaced by safe, type-checked, theorem-proved verbose code, like Rust or something similar.

What do I care how many lines of code are necessary to solve a problem, if all of them are gonna be written automatically? 1 line of Bash/awk versus 10 lines of Python versus 100 lines of Rust? Are they any different to one another?


  $ find . -type f -regex '.*bar/[^/]*.foo' -exec grep baz {} +
I wonder if you created a GPT and fed it the entirety of Linux man pages (not that it probably didn't consume them already, but perhaps this weights them higher), if it would get better at this kind of thing. I've found GPT-4 is shockingly good at sed, and to some extent awk; I suspect it's because there are good examples of them on SO.


If SO had known to block GPTBot before it was trained, GPT4 would be a lot less impressive.


Same here! That's the main use I have for ChatGPT in any practical sense today - generating Bash commands. I set about giving it prompts to do things that I've had to do in the past - it was great at it.

Find all processes named '*-fpm' and kill the ones that have been active for more than 60 seconds - then schedule this as a Cron job to run every 60 seconds. It not only made me a working script rather than a single command but it explained its work. I was truly impressed.

Yes it can generate some code wireframes that may be useful in a given project or feature. But I can do that too, usually in about the time it'd take me to adequately form my request into a prompt. Life could get dangerous in a hurry if product management got salty enough in the requirements phase that the specs for a feature could just be dropped into some code assistant and generate product. I don't see that happening ever though - not even with tooling - product people just don't seem to think that way in the first place in my experience.

As developers we spend a lot of our time modifying existing product - and if the LLM knows about that product - all the better job it could do I suppose. Not saying that LLMs aren't useful now and won't become more useful in time - because they certainly will.

What I am saying is that we all like to think of producing code as some mystical gift that only we as experienced (BRILLIANT, HANDSOME AND TALENTED TOO!!) developers are capable of. The reality is that once we reach a certain level of career maturity, if we were ever any good in the first place, writing code becomes the easiest part of the job. So there's a new tool that automates the easiest part of the job? Ok - autocomplete code editors were cool too like that. The IDE was a game changer too. Automated unit tests were once black magic too (remember when the QA department was scared of this?).

When some AI can look at a stack trace from a set of log files, being fully aware of the entire system architecture, locate the bug that compiled and passed testing all the way to production, recommend, implement, test and pre-deploy a fix while a human reviews the changes then we're truly onto something. Until then I'm not worried that it can write some really nice SQL against my schema with all kinds of crazy joins - because I can do that too - sometimes faster - sometimes not.

So far ChatGPT isn't smarter than me but it is a very dutiful intern that does excellent work if you're patient and willing to adequately describe the problem, then make a few tweaks at the end. "Tweaks" up to seeing how the AI approached it, throwing it out and doing it your own way too.


Except you should at least try to write code for someone else (and probably of lower level of competence - this also helps for your own debugging later) - obscure one-liners like these should be rejected.


I wouldn't call it obscure, just bog standard command line stuff. How would you have done it?


The lower level person need only plug that one liner into chatGPT and ask for a simple explanation.

We're in a different era now.


Yep! It’s something some aren’t seeing.

The AI coding assistant is now part of the abstraction layers over machine code. Higher level languages, scripting languages, all the happy paths we stick to (in bash, for example), memory management with GCs and borrow checkers, static analysis … now just add GPT. Like mastering memory management and assembly instructions … now you also don’t have to master the fiddly bits of core utils and bash and various other things.

Like memory management, whole swathes of programming are being taken care of by another program now, a Garbage Collector, if you will, for all the crufty stuff that made computing hard and got in between intent and assessment.


The difference is that all of them have theories and principles backing them, and we understand why they work.

LLMs (and "AI" in general) are just bashing data together until you get something that looks correct (as long as you squint hard enough). Even putting them in the same category is incredibly insulting.


There are theories and principles behind what an AI is doing and a growing craft around how to best use AI that may very well form relatively established “best practices” over time.

Yes, there’s a significant statistical aspect involved in the workings of an AI, which distinguishes it from something more deterministic like syntactic sugar or a garbage collector. But I think one could argue that that’s the trade-off for a more general tool like AI, in the same way that giving a task to a junior dev is going to involve some noisiness in need of supervision. But in the grand scheme of software development, devs are in the end tools too, a part of the grand stack, and I think it’s reasonable to consider AI as just another tool in the stack. This is especially so if devs are already using it as a tool.

Dwelling on the principled vs. statistical distinction, while salient, may very well be a fallacy or irrelevant to the extent that we want to talk about the stack of tools and techniques software development employs. How much does the average developer understand, or employ an understanding of, a principled component of their stack? How predictable is that component, at least in the hands of the average developer making average but real software? When the end of the pipeline is a human and a human organisation of other humans, whether a tool is principled or statistical may not matter much so long as it’s useful or productive.


Yes, but this is not something that has been enabled by the new neural networks, but rather by search engines, years ago - culminating in the infamous «copy-paste from Stack Overflow without understanding the code» / libraries randomly pulled from the Web with for instance the leftpad incident.

So what makes it different this time ?


So now the same tool can generate both the wrong script and the wrong documentation!


Wait, wait, let me show you my prompt for generating unit tests.


Assuming there is a comment just above the one-liner saying "find all directories named 'bar', find all files named '*.foo' in those directories, search those files for 'baz'", this code is perfectly clear. Even without the comment, it's not hard to understand.


If the someone elses on my team can't read a short shell pipeline then I failed during interviewing.


To me, the only obscure thing about this is that it is a one-liner.

If you write it in three lines, it is fine. Although, I guess the second find and the grep could be shortened, combined into one command.


I agree - but my point still stands.


You can use gpt for government work??


Shh. What they don't know won't hurt me.

(Serious answer: it's just an expression. https://grammarist.com/idiom/good-enough-for-government-work...).


I haven't practiced or needed to use the fundamentals in literal years; I'm sure I'd fumble some of these tests, and I've got err, 15 years of experience.

It's good to know the fundamentals and be able to find them IF you find a situation where you need them (e.g. performance tuning), but in my anecdotal and limited experience, you're fine staying higher level.


I had a chilling experience of late when, out of curiosity, I tried the actual online practice exam for driving school. Boy did I fail it. I realized that there are quite some road signs I have never seen in my life, and more importantly, that my current solution to all their right-of-way questions is "slow down and see what the others do" - not even that wrong if I think about it, but it won't get you points in the exam.


And I suspect you would be a lot less likely to be involved in a crash than someone who had just passed the test.


There are levels of fundamentals though; since the parent mentioned HTML/CSS/React I guess they're referring to being able to create a layout by hand vs using a CSS framework/library. You don't need to know how a CPU works to fix a CSS issue, but if all you know is combining the classes available, you'll have trouble with even the simplest web development.

Everyone should know enough fundamentals to be able to write simple implementations of the frameworks they depend on.


Kind of the same sort of situation, but I do like to refresh some of the fundamentals every 3~4 years or so. Usually when I do a job hop.

It's kind of like asking an Olympic sprinter how to walk fast.


>If you want to get back into front-end read “CSS: The Definitive Guide”. Great book, gives you a complete understanding of CSS by the end.

Do you realize for how many technologies you can say the same thing? I don't want to read a 600 page tome on CSS. The language is a drop in the bucket of useful things to know. How valuable is a "complete understanding" of CSS? I just want something on my site to look a specific way.


> I don't want to read a 600 page tome on CSS.

The latest edition is closer to 1,100 pages.

It’s worth it.

It’s usually worth it for any long-standing technology you’re going to be spending a significant amount of time using. Over the years you’ll save time because you’ll be able to get to the right answer in less time, debug faster, and won’t always be pulling out your hair Googling.


Sometimes it requires expert guidance to get something meaningful out.


This is the correct answer. I have 23 years of experience in datacenter ops and it has been a game changer for me. Just like any tool in one's arsenal, its utility increases with practice and learning to use it correctly. ChatGPT is no different. You get out of it what you put in to it. This is the way of the world.

I used to be puzzled as to why my peers are so dismissive of this tech. Same folks who would say "We don't need to learn no Kubernetes! We don't need to code! We don't need ChatGPT". They don't!

And it's fine. If their idea of a career is working in same small co. doing the same basic Linux sysadmin tasks for a third of the salary I make then more power to them.

The folks dismissive of the AI/ML tech are effectively capping their salary and future prospects in this industry. This is good for us! More demand for experts and less supply.

You ever hire someone that uses punch cards to code?

Neither have I.


I think it's more akin to using compilers in the early days of BCPL or C. You could expect it to produce working assembly for most code but sometimes it would be slower than a hand-tuned version and sometimes a compiler bug would surface, but it would work well enough most of the time.

For decades there were still people who coded directly in assembly, and with good reason. And eventually the compiler bugs would be encountered less frequently (and the programmer would get a better understanding of undefined behavior in that language).

Similar to how dropping into inline assembly for speeding up execution time can still have its place sometimes, I think using GPT for small blocks of code to speed up developer time may make some sense (or tabbing through CoPilot), but just as with the early days of higher level programming languages, expect to come across cases where it doesn't speed up DX or introduces a bug.

These bugs can be quite costly. I've seen GPT spit out encryption code and completely leave out critical parts, like arguments to a library call, or generate the same nonce or salt value every execution. With code like this, if you're not well versed in the domain it is very easy to overlook, and unit tests would likely still pass.
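The nonce mistake looks something like this (a hedged TypeScript/Node sketch, not the actual generated code I saw); an encrypt/decrypt round-trip test passes either way, which is exactly why it slips through:

  import { createCipheriv, randomBytes } from "node:crypto";

  const key = randomBytes(32); // 256-bit AES key

  // The buggy shape: a constant nonce baked into the code, reused for every message.
  // Reusing a key/nonce pair breaks AES-GCM's confidentiality and authenticity.
  // const iv = Buffer.alloc(12, 0);

  // The correct shape: a fresh random nonce per message, stored alongside the ciphertext.
  function encrypt(plaintext: string) {
    const iv = randomBytes(12);
    const cipher = createCipheriv("aes-256-gcm", key, iv);
    const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
    return { iv, ciphertext, tag: cipher.getAuthTag() };
  }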

I think the same lesson told to young programmers should be used here -- don't copy/paste any code that you do not sufficiently understand. Also maybe avoid using this tool for critical pieces like security and reliability.


I think many folks have had those early chats with friends where one side was dismissing LLMs for so many reasons, when the job was to see and test the potential.

While part of me enjoyed the early gpt much more than the polished version today, as a tool it’s much more useful to the average person should they make it back to gpt somehow.


Just out of curiosity: is the code generated by ChatGPT not what you expected, or is it failing to produce the result that you wanted?

I suspect you mean the latter, but just wanted to confirm.


The statements are factually inaccurate and the code doesn’t do what it claims it should.


Right. That's an experience completely different from the majority here that have been able to produce code that integrates seamlessly into their projects. Do you have any idea why?

I guess we should start by what version of ChatGPT you are using.


ChatGPT might as well not exist - I'm not touching anything GAFAM-related.

Any worthy examples of open source neural networks, ideally not from companies based in rogue states like the US ?


Falcon is developed by a UAE tech arm; not sure if you would consider it a rogue state or not: https://falconllm.tii.ae/


What is «tech» supposed to mean here ? Infocoms ?

The United Arab Emirates ? Well, lol, of course I do, that's way worse than the US.


ChatGPT goes from zero to maybe 65th percentile? There or thereabouts. It's excellent if you know nothing. It's mediocre and super buggy if you're an expert.

A big difference is that the expert asks different questions, off in the tails of the distribution, and that's where these LLMs are no good. If you want a canonical example of something, the median pattern, it's great. As the ask heads out of the input data distribution the generalization ability is weak. Generative AI is good at interpolation and translation, it is not good with novelty.

(Expert and know-nothing context dependent here.)

One example: I use ChatGPT frequently to create Ruby scripts for this and that in personal projects. Frequently they need to call out other tools. ChatGPT 4 consistently fails to properly (and safely!) quote arguments. It loves the single-argument version of system which uses the shell. When you ask it to consider quoting arguments, it starts inserting escaped quotes, which is still unsafe (what if the interpolated variable contains a quote in its name). If you keep pushing, it might pull out Shell.escape or whatever it is.
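The same pitfall, sketched in TypeScript/Node rather than Ruby (illustrative only, not my actual scripts): the generated code keeps reaching for the string-through-a-shell form, and escaping quotes inside the string doesn't actually make it safe.

  import { exec, execFile } from "node:child_process";

  const filename = "report'; rm -rf ~ #.txt"; // hostile, or just an unlucky filename

  // The shape it keeps producing: one interpolated string, run through a shell.
  // Wrapping ${filename} in (escaped) quotes still breaks if the value contains quotes.
  // exec(`convert "${filename}" out.png`);

  // The safe shape: an argument vector, no shell involved at all.
  execFile("convert", [filename, "out.png"], (err) => {
    if (err) console.error(err);
  });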

I assume it reproduces the basic bugs that the median example code on the internet does. And 99% of everything being crap, that stuff is pretty low quality, only to be used as an inspiration or a clue as to how to approach something.


I encountered this with a particular problem in Python. It seemed like GPT wanted to always answer with something that had a lot of examples on the web, even if most answers were not correct. So it's a garbage-in, garbage-out problem. I'm a bit worried that LLMs will continue to degrade as the web has an increasing amount of LLM-generated content. It seems to already be occurring.


Why do people who hand-wave away the entire concept of LLMs, because of one instance of it doing one thing poorly that they could do better, always seem to fail to just show us their concrete example?


Technically, the garbage-in/garbage-out problem is not being hand-waved away. I've seen a lot of articles on this; it's sometimes called a degrading feedback loop. The more of the web that is LLM-generated, the more new models will be trained on generated data, and they will fuzz out. Or 'drift'.

For a specific example: sorry, I didn't grab screenshots at the time. It had to do with updating a dataframe in pandas. It gave me a solution that generated an error; I'd continue to ask it to change steps to fix previous errors, and it would go in a circle: fix it, but generate other warnings, make further changes to eliminate the warnings, and then recommend the same thing that originally caused an error.

Also, I'm a big fan. I use GPT-4 all the time. So I'm not waving it away, but I'm kind of curious how it sometimes fails in unexpected ways.


> The more of the web that is LLM-generated, the more new models will be trained on generated data, and they will fuzz out

And yet it's so obvious that a random Hackernews independently discovers it and repeats it on every ChatGPT post, and prophesies it as some inevitable future. Not could happen, will happen. The clueless researchers will be blindsided by this, of course; they'll never see it coming from their ivory tower.

And yes, ChatGPT fails to write code that runs all the time. But it's not very interesting to talk about without an example.


How? It isn't exactly easy to reproduce these examples. I'd have to write a few pages to document and explain it, and scrub it to remove anything too internal, i.e. create a vanilla example of the bug. And then it would be too long to go into a post, so what then? I'd have to go sign up to blog it somewhere and link to it.

I'm not arguing that GPT is bad. Just that it is as susceptible to rabbit holes as any human.

I'm actually having a hard time narrowing down where your frustration is aimed.

At naysayers? At those that don't put effort into documenting? At GPT itself? Or that a news site on the internet dares have repetition?


So someone could sabotage LLMs by writing some scripts to fill GitHub (or whatever other corpus is used) with LLM-generated crap? Someone must be doing this, no?


FWIW I didn't wave away the entire concept. LLMs definitely have uses.


I would prefer that google search didn't suck. Instead, I ask ChatGPT. The best case scenario, IMO, would be for people to lay out excellent documentation and working code and train the LLM specifically on that in a way that it can provide reference links to justify its answers. Then, I will take what it says and go directly to the source to get the knowledge as it was intended to be ingested by a human. We get a lot more value than we're initially looking for when we dive into the docs, and I don't want to lose that experience.


Why not give it a system prompt that specifies some of your requirements: "You are an experienced senior Ruby developer who writes robust, maintainable code; follow these coding guidelines 《examples》"


If I thought it was worthwhile, maybe that would patch that specific hole.

The other problem I get when trying to make it write code is that it gets kinda slippery with iterated refinements. In the back and forth dialog, addressing issues 1 through n in sequence, it gets to a place where issue k < n is fixed but issue i < k gets broken again. Trying to get it to produce the right code becomes a programming exercise of its own, and it's more frustrating than actually typing stuff up myself.

I mean, I still use it to get a basic shape especially when I'm working with a command line tool I'm not an expert in, it's still useful. It's just not great code.


> middling generalists can now compete with specialists.

They can maybe compete in areas where there has been a lot of public discussion about a topic, but even that is debatable as there are other tasks than simply producing code (e.g. debugging existing stuff). In areas where there's close to no public discourse, ChatGPT and other coding assistance tools fail miserably.


this be the answer. GPT is as good as the dataset it's trained off of, and if you're going by the combined wisdom of StackOverflow then you're going to have a middling time.


>> The article points this out: middling generalists can now compete with specialists.

They can't, and aren't even trying to. It's OpenAI that's competing with the specialists. If the specialists go out of business, the middling generalists obviously aren't going to survive either so in the long term it is not in the interest of the "middling generalists" to use ChatGPT for code generation. What is in their interest is to become expert specialists and write better code both than ChatGPT currently can, and than "middling generalists". That's how you compete with specialists, by becoming a specialist yourself.

Speaking as a specialist occupying a very, very er special niche, at that.


It REALLY depends on the task. For instance, if you provide GPT with a schema, it can produce a complex and efficient SQL query in <1% of the time an expert could.

I would also argue that not only are the models improving, we have had less than a year of practically interfacing with LLMs. OUR ability to communicate with them is in its infancy, and a generation that is raised speaking with them will be more fluent and able to navigate some of the clear pitfalls better than we can.


There is not much of a need for humans to get closer to the machine long term, when with new datasets for training the machine will get closer to humans. Magic keywords like "step by step" won't be as necessary to know.

One obstacle for interfacing with LLMs is the magic cryptic commands they execute internally, but that need not be the case in the future.


> middling generalists can now compete with specialists.

I want to say that this has been the state of a lot of software development for a while now, but then, the problems that need to be solved don't require specialism, they require people to add a field to a database or to write a new SQL query to hook up to a REST API. It's not specialist work anymore, but it requires attention and meticulousness.


But if you are a middling programmer when it comes to CSS how do you know the output was “flawless” and close to the quality that css “masters” produce?


It looked correct visually and it matched the techniques in the actual CSS that the team lead and I produced when we paired to get my layout to the standard he expected.


You may think it did a good job because of your limited CSS ability. I'd be amazed if ChatGPT can create pixel-perfect animations and transitions along with reusable clean CSS code which supports all of the browser requirements at your org.

I've seen the similar claims made on Twitter by people with zero programming ability claiming they've used ChatGPT to build an app. Although 99% of the time what they've actually created is some basic boilerplate react app.

> middling generalists can now compete with specialists.

Middling generalists can now compete with individuals with a basic understanding assuming they don't need to verify anything that they've produced.


>I'd be amazed if ChatGPT can create pixel-perfect animations and transitions along with reusable clean CSS code which supports all of the browser requirements at your org.

Personally, I'd be more amazed if a person could do that than if a LLM could do it.


Google, "UI developer".


It does great at boilerplate, so I think it's safe to say it will disrupt Java.

I've been using tabnine for years now, and I use chatGPT the same way; write my boilerplate, let me think about logic.


Here’s the thing though:

If a new version of the app can be generated on the fly in minutes, why would we need to worry about reusability?

GPT generated software can be disposable.

Why even check the source code in to git - the original source artifact is the prompt after all.


Let me know how you get on with a disposable financial system, safety system or electoral voting system.


I work with java and do a lot of integration, but a looot of my effort goes into exploring and hacking away some limitations of a test system, and doing myself things that would take a lot of time if I had to ask the proper admins.

I had a problem where I was mocking a test system (for performance testing of my app) and I realized the mocked system was doing an externalUserId to internalUserId mapping.

Usually that would have been a game stopper, but instead I did a slow run and asked ChatGPT to write code that reads data from a topic and eventually creates a CSV of 50k user mappings; it would have taken me at least half a day to do that, and ChatGPT allowed me to do it in 15 minutes.

While very little code went into my app, ChatGPT did write a lot of disposable code that helped me a lot.


Because in my experience GPT can usually produce a maximum of like 200 lines of code before it makes an error.


> It had zero business value, but such was the state of our team…being pixel perfect was a source of pride

UX and UI are not some secondary concerns that engineers should dismiss as an annoying "state of our team" nuance. If you can't produce a high quality outcome you either don't have the skills or don't have the right mindset for the job.


Would you give the critical public safety system bit to ChatGPT?

This scenario reminds me of:

If a job's worth doing, do it yourself. If it's not worth doing, give it to Rimmer.

Except now it's "give it to ChatGPT"


I'm a developer but also have an art degree and an art background. I'm very mediocre at art and design. But lately I've been using AI to help plug that gap a bit. I really think it will be possible for me to make an entire game where I do the code, and AI plus my mediocre art skills get the art side across the line.

I think at least in the short term, this is where AI's power will lie. Augmentation, not replacement.


It probably depends on the area. CSS is very popular on one hand and limited to a very small set of problems on the other.

I did try asking ChatGPT about system-related stuff several times and had given up since then. The answers are worthless if not wrong, unless the questions are trivial.

ChatGPT works if it needs to answer a question that was already answered before. If you are facing a genuinely new problem, then it's just a waste of time.


I suspect that the "depth" of most CSS code is significantly shallower than what gets written in general purpose programming languages. In CSS you often align this box, then align that box, and so forth. A lot of the complexity in extant CSS comes from human beings attempting to avoid excessive repetition and typing. And this is particularly true when we consider the simple and generic CSS tasks that many people in this thread have touted GPT for performing. There are exceptions where someone builds something really unique in CSS, but that isn't what most people are asking from GPT.

But the good news is that "simple generic CSS" is the kind of thing that most good programmers consider to be essentially busywork, and they won't miss doing it.


> middling generalists can now compete with specialists

Great point. That's been my experience as well. I'm a generalist and ChatGPT can bring me up to speed on the idiomatic way to use almost any framework - provided it's been talked about online.

I use it to spit out simple scripts and code all day, but at this point it's not creating entire back-end services without weird mistakes or lots of hand holding.

That said, the state of the art is absolutely amazing when you consider that a year ago the best AIs on the market were Google or Siri telling me "I'm sorry I don't have any information about that" on 50% of my voice queries.


AI is a tool. Like all tools, it can be useful, when applied the right way, to the right circumstances. I use it to write powershell scripts, then just clean them up, and voila.

That being said, humans watch too much tv/movies. ;)


>The article points this out: middling generalists can now compete with specialists.

This is why you're going to get a ton of gatekeepers asking you to leetcode a bunch of obscure stuff with zero value to business, all to prove you're a "real coder". Like the OP.


Out of curiosity, how did you pass the wireframe to ChatGPT?


I described what I wanted. It was earlier this year... not sure if ChatGPT can understand wireframes now, but it couldn't at the time.


You described it with pixel accuracy?


Doesn't ChatGPT support image uploads these days?


Yes, but only in the paid-for Plus version.

Ignore the free version, pretend it doesn't exist.


I would really like to see the prompts for some of these. Mostly because I'm an old-school desktop developer who is very unfamiliar with modern frontend.


> being pixel perfect was a source of pride.

Then use LaTeX and PDF. CSS is not for designing pixel-perfect documents.


I might be a bit out of the loop: how did you do it? I thought ChatGPT is text based?


[flagged]


Calling bullshit is maybe too harsh. There may be requirements matching the available training data and the right mood the LLM has been tuned for, where it delivers acceptable, then considered flawless, results (extreme example: "create me a hello world in language X" will mostly deliver flawlessly)... and from that, amateurs (not judging, just meaning people less exposed to a variety of problems and challenges) may end up with the feeling that LLMs could do it all.

But yes, any "serious" programmer working on harder problems can quickly derail an LLM and prove otherwise, with dozens of their own simple problems each day (I've tried it *). It doesn't even need that: one can, for example, quickly prove ChatGPT (also 4) wrong and watch it go in circles on C++ language questions :D (though C++ is admittedly hard), and one can do the same with questions on not-ultra-common Python libs. It confidently outputs bullshit quickly.

(*): Still can be helpful for templating, ideas, or getting into the direction or alternatives, no doubts on that!


So, don't leave us in suspense; what do you ask of it? Because I'm quite sure it can already pass it.

Your experience is very different from mine anyway. I am a grumpy old backend dev who uses formal verification in anger when I consider it needed and who gets annoyed when things don't act logically. We are working with computers, so everything is logical, but no; I mean things like a lot of frontend stuff. I ask our frontend guy, 'how do I center a text?', and he says 'text align'. Obviously I tried that, because that would be logical, but it doesn't work, because frontend is, for me, absolutely illogical. Even frontend people actually have to try-and-fail; they cannot answer simple questions without trying, like I can in backend systems.

Now, in this new world, I don't have to bother with it anymore. If Copilot doesn't just squirt out the answer, then ChatGPT-4 (and now my personal custom GPT 'front-end hacker', which knows our codebase) will fix it for me. And it works, every day, all day.


I'm not the person you're responding to, but here's an example of it failing subtly:

https://chat.openai.com/share/4e958c34-dcf8-41cb-ac47-f0f6de...

finalAlice's Children have no parent. When you point this out, it correctly advises regarding the immutable nature of these types in F#, then proceeds to produce a new solution that again has a subtle flaw: Alice -> Bob has the correct parent... but Alice -> Bob -> Alice -> Bob is missing a parent again.

Easy to miss this if you don't know what you're doing, and it's the kind of bug that will hit you one day and cause you to tear your hair out when half your program has a Bob-with-parent and the other half has an Orphan-Bob.

Phrase the question slightly differently, swapping "Age: int" with "Name: string":

https://chat.openai.com/share/df2ddc0f-2174-4e80-a944-045bc5...

Now it produces invalid code. Share the compiler error, and it produces code that doesn't compile but in a different way -- it has marked Parent mutable but then tried to mutate Children. Share the new error, and it concludes you can't have mutable properties in F#, when you actually can, it just tried marking the wrong field mutable. If you fix the error, you have correct code, but ChatGPT-4 has misinformed you AND started down a wrong path...

Don't get me wrong - I'm a huge fan of ChatGPT, but it's nowhere near where it needs to be yet.


I'm not really sure what I'm looking at. It seems to perform flawlessly for me... when using Python: https://chat.openai.com/share/7e048acb-a573-45eb-ba6c-2690d2...

I only made two changes to your prompt: one to specify Python, and another to provide explicit instructions to trigger using the Advanced Data Analysis pipeline.

You also had a couple typos.

I'm not sure if "programming-like tool that reflects programming language popularity performs poorly on an unpopular programming language" is the gotcha you think it is. It performs extremely well authoring Kubernetes manifests and even produces passable Envoy configurations. There's a chance that configuration files for reverse proxy DSLs have better representation than F# does. I guess if you disagree about how obscure F# is, you're observing a real, objective measurement of how obscure it is, in the fascinating performance of this stochastic parrot.


F# fields are immutable unless you specify they are mutable. The question I posed cannot be solved with exclusively immutable fields. This is basic computer science, and ChatGPT has the knowledge but fails to infer this while providing flawed code that appears to work.

An inexperienced developer would eventually shoot themselves in the foot, possibly long after integrating the code thinking it was correct and missing the flaws. FYI, your Python code works because of the mutation "extend()":

    alice.children.extend([bob, carol])


>F#

Barely exists in training data.

Might as well ask it to code some microcontroller-specific assembly, watch it fail, and claim victory.


> Barely exists in training data.

Irrelevant - this is basic computer science. As far as I know, you can't create a bidirectional graph node structure without a mutable data structure or language magic that ultimately hides the same mutability.
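To illustrate the general point in Python rather than F# (a minimal sketch, not the code ChatGPT was asked for): the parent/child cycle can only be closed by mutating at least one side after construction.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Person:
        name: str
        parent: Optional["Person"] = None
        children: List["Person"] = field(default_factory=list)

    # Phase 1: construct the nodes; the back-link can't exist yet,
    # because each side needs a reference to the other.
    alice = Person("Alice")
    bob = Person("Bob", parent=alice)

    # Phase 2: mutate one side after construction to close the cycle.
    alice.children.append(bob)

    assert bob.parent is alice and alice.children[0] is bob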

The fact that ChatGPT recognizes the mutability issue when I explain the bug tells you it has the knowledge, but it doesn't correctly infer the right answer and instead makes false claims and sends developers down the wrong path. This speaks to OP's claim about subtle inaccuracies.

I have used ChatGPT to write 10k lines of a static analyzer for a 1k AST model definition in F#, without knowing the language before I started. I'm a big fan, but there were many, many times a less experienced developer would have shot themselves in the foot using it blindly on a project with any degree of complexity.


I would agree with you if it was a model trained to do computer science, rather than a model to basically do anything, which just happens to be able to do computer science as well.

Also code is probably one of the easiest use cases for detecting hallucinations since you can literally just see if it is valid or not the majority of the time.

It's much harder for cases where your validation involves wikipedia, or academic journals, etc.


Then we are in agreement but bear in mind that I was replying to this comment:

> So, don't leave us in suspense; what do you ask of it? Because I'm quite sure it can already pass it.


If it can pass it when you ask it in a way only a coder can write, then we will still need coders.

If you need to tweak your prompt until you get the correct result, then we still need coders who can tell that the code is wrong.

Ask Product Managers to use ChatGPT instead of coders and they will ask for 7 red lines all perpendicular to each other with one being green.

https://www.youtube.com/watch?v=BKorP55Aqvg


I didn't say we don't need coders. We need fewer average/bad ones, and a very large share of the coders who came in after the worldwide 'coding makes $$$$' wave are not even average.

I won't say AI will never make coding obsolete; even just two years ago I would've said we were 50-100 years away from that. Now I'm not so sure. However, I am saying that I can replace many programmers with GPT right now, and I am. The prompting and reprompting is still both faster and cheaper than many humans.


In my mind, we need more folks who have both the ability to code and the ability to translate business needs into business logic. That’s not a new problem though.


That's what we're doing all day, no? I mean, besides fighting tooling (which is taking a larger and larger share of the time spent building stuff).


Only if you have access to the end user.

If, between you and your client, four people are playing a game of telephone (the client's project manager, our project manager, the team leader, and some random product guy just to make it an even number), then that is not actually what you are doing.

I would argue that the thing that happens at this stage is more akin to manually transpiling business logic into code.

In this kind of organization programmers become computer whisperers. And this is why there is a slight chance that GPT-6 or 7 will take their job.


TFA's point is not that «coders» won't be needed any more, it's that they will hardly spend their time «coding», that is «devot[ing themselves] to tedium, to careful thinking, and to the accumulation of obscure knowledge», «rob[bing them] of both the joy of working on puzzles and the satisfaction of being the one[s] who solved them».


You can ask it almost anything. Ask it to write a YAML parser in something a bit more complex like Rust and it falls apart.

Rust mostly because it's relatively new, and there isn't a native YAML parser in Rust (there is a translation of libfyaml). Also, you can't bullshit your way out of Rust by making a bunch of void* pointers.


How do you make a custom gpt which knows a specific code base? I have been wanting to do this


You tune an existing model on your own set of inputs/outputs.

Whatever you expect to start typing, and have the model produce as output, should be those input/output pairs.

I'd start by using ChatGPT etc. to add comments throughout your code base describing the code. Then break it into pairs where the input is the prefacing comment, and the output is the code that follows. Create about 400-500 such pairs, and train a model with 3-4 epochs.

Some concerns: you're going to get output that looks like your existing codebase, so if it's crap, you'll create a function which can produce crap from comments. :-)
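If it helps, here is a rough sketch of the pair-building step in Python. The file names and the chat-style JSONL format are assumptions; check your provider's current fine-tuning docs for the exact shape it expects.

    import json

    def comment_code_pairs(source: str):
        """Split a commented source file into (comment, code) training pairs."""
        pairs, comment, code = [], [], []
        for line in source.splitlines():
            if line.lstrip().startswith("#"):
                if comment and code:
                    pairs.append(("\n".join(comment), "\n".join(code)))
                    comment, code = [], []
                comment.append(line.lstrip("# ").rstrip())
            elif line.strip():
                code.append(line)
        if comment and code:
            pairs.append(("\n".join(comment), "\n".join(code)))
        return pairs

    # Hypothetical file names; the message format mirrors common chat fine-tuning APIs.
    with open("train.jsonl", "w") as out:
        for prompt, completion in comment_code_pairs(open("my_module.py").read()):
            record = {"messages": [{"role": "user", "content": prompt},
                                   {"role": "assistant", "content": completion}]}
            out.write(json.dumps(record) + "\n")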


I use the new feature for creating a custom GPT, and I keep adding new information (files, structures, etc.) by editing the GPT. It seems to work well.


Ah, OK, so you have to paste entire files in one by one, you can't just add it locally somehow? Too bad you can't just upload a zip or something...


You can upload zips. Make a new GPT and go to the custom settings.


That's been my experience both with Tesla AP/FSD implementation & with LLMs.

Super neat trick the first time you encounter it, feels like alien tech from the future.

Then you find all the holes. Use it for months/years and you notice the holes aren't really closing. The pace of improvement is middling compared to the gap between what it does and what the marketing/rhetoric promises. Eventually using them feels more like a chore than not using them.

It's possible some of these purely data driven ML approaches don't work for problems you need to be more than 80% correct on.

Trading algos that just need to be right 55% of the time to make money, recommendation engines that present a page of movies/songs for you to scroll, Google search results that come back with a list you can peruse, Spam filters that remove some noise from your inbox.. sure.

But authoritative "this is the right answer" or "drive the car without murdering anyone".. these problems are far harder.


With the AI "revolution," I began to appreciate the simplicity of models we create when doing programming (and physics, biology, and so on as well).

I used to think about these things differently: I felt that because our models of reality are just models, they aren't really something humanity should be proud of that much. Nature is more messy than the models, but we develop them due to our limitations.

AI is a model, too, but of far greater complexity, able to describe reality/nature more closely than what we were able to achieve previously. But now I've begun to value these simple models not because they describe nature that well but because they impose themselves on nature. For example, law, being such a model, is imposed on reality by the state institutions. It doesn't describe the complexity of reality very well, but it makes people take roles in its model and act in a certain way. People now consider whether something is legal or not (instead of moral vs immoral), which can be more productive. In software, if I implement the exchange of information based on an algorithm like Paxos/Raft, I get provable guarantees compared to if I allowed LLMs to exchange information over the network directly.


I think you've found a good analogy there in the concept of moral vs legal. We defined a fixed system to measure against (rule of law) to reduce ambiguity.

Moral code varies with time, place, and individual person. It is a decimal scale of gray rather than a binary true/false.

Places historically that didn't have rule of law left their citizens to the moral interpretation whim of whoever was in charge. The state could impose different punishments on different people for different reasons at different times.

With AI models I find a similar fixed-and-defined vs. unlimited-and-ambiguous issue in ADAS in cars.

German cars with ADAS are limited and defined: they have a list of features they perform well, but that is all.

Tesla advertises their system as an all knowing, all seeing system with no defined limits. Of course every time there is an incident they'll let slip certain limits "well it can't really see kids shorter than 3ft" or "well it can't really detect cross traffic in this scenario" etc.


Yep, lots of people are using LLMs for problems LLMs aren't good at.

They still do an alright job, but you get that exact situation of 'eh, it's just okay'.

It's the ability to use those responses when they are good, and to know when to move on from using an LLM as a tool.


Not terribly different than Google Translate.

If you have a familiarity with the foreign language, you can cross-check yourself and the tool against each other to get to a more competent output.

If you do not know the foreign language at all, the tool will produce word salad that sort of gets your point across while sounding like an alien.


I tried for two hours to get ChatGPT to write a working smooth interpolation function in Python. Most of the functions it returned didn't even go through the points between which they should be interpolating. When I pointed that out it returned a function that went through the points, but it was no longer smooth. I really tried and started over multiple times. I believe we have to choose between a world with machine learning and robot delivery drones. Because if that thing writes code that controls machines, it will be total pandemonium.

It did a decent job at trivial things like creating function parameters out of a variable, though.
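For what it's worth, here is roughly what a working answer could look like: an interpolation that is smooth and actually passes through the points, via SciPy's cubic splines (a sketch, assuming SciPy was acceptable for the task).

    import numpy as np
    from scipy.interpolate import CubicSpline

    # Points the curve must pass through.
    xs = np.array([0.0, 1.0, 2.5, 4.0])
    ys = np.array([1.0, 3.0, 2.0, 5.0])

    # An interpolating cubic spline is C2-smooth and exact at the knots.
    spline = CubicSpline(xs, ys, bc_type="natural")

    assert np.allclose(spline(xs), ys)                  # hits every input point
    smooth_curve = spline(np.linspace(0.0, 4.0, 200))   # dense, smooth samples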


That's weird to read. Interpolations of various sorts are known and solved and should probably have been digested in bulk by ChatGPT during training. I'm not doubting your effort by any means, I'm just saying this sounds like one of those things it should do well.


This is why I asked it that and was surprised by the questionable quality of the results. My goal wasn't even to break ChatGPT; it was to learn about new ways of interpolating that I hadn't thought about.


There's a recent "real coding" benchmark that all the top LLMs perform abysmally on: https://www.swebench.com/

However, it seems only a matter of time before even this challenge is overcome, and when that happens the question will remain whether it's a real capability or just a data leak.


I have a very similar train of thought roll through my head nearly every day now as I browse through github and tech news. To me it seems wild how much serious effort is put into the misapplication of AI tools on problems that are obviously better solved with other techniques, and in some cases where the problem already has a purpose built, well tested, and optimized solution.

It's like the analysis and research phase of problem solving is just being skipped over in favor of not having to understand the mechanics of the problem you're trying to solve. Just reeks of massive technical debt, untraceable bugs, and very low reliability rates.


When studying fine art, a tutor of mine talked about "things that look like art", by which she meant the work that artists produce when they're just engaging with surface appearances rather than fully engaging with the process. I've been using GitHub Copilot for a while and find that it produces output that looks like working code but, aside from the occasional glaring mistake, it often has subtle mistakes sprinkled throughout it too. The plausibility is a serious issue, and means that I spend about as much time checking through the code for mistakes as I'd take to actually write it, but without the satisfaction that comes from writing my own code.

I dunno, maybe LLMs will get good enough eventually, but at the moment it feels plausible to me that there's some kind of an upper limit caused by its very nature of working from a collection of previous code. I guess we'll see...


Try breaking down the problem. You don't have to do it yourself, you can tell ChatGPT to break down the problem for you then try to implement individual parts.

When you have something that kind of works, tell ChatGPT what the problems are and ask for refinement.

IMHO currently the weak point of LLMs is that they can't really tell what's adequate for human consumption. You have to act as a guide who knows what's good and what can be improved and how can be improved. ChatGPT will be able to handle the implementation.

In programming you don't have to worry too much about hallucinations because it won't work at all if it hallucinates.


... What.

It hallucinates and it doesn't compile, fine. It hallucinates and flips a 1 with a -1; oops that's a lot of lost revenue. But it compiled, right? It hallucinates, and in 4% of cases rejects a home loan when it shouldn't because of a convoluted set of nested conditions, only there is no one on staff that can explain the logic of why something is laid out the way it is and I mean, it works 96% of the time so don't rock the boat. Oops, we just oppressed a minority group or everyone named Dave because you were lazy.


As I said, you are still responsible for the quality control. You are supposed to notice that everyone is named Dave and tell ChatGPT to fix it. Write tests, read code, run & observe for odd behaviours.

It's not an autonomous agent just yet.


But why should i waste time using a broken product when i can do it properly myself? To me a lot of this debate sounds like people obsessively promoting a product for some odd reason, as if they were the happy owners of a hammer in search of a nail.


If you are faster and more productive that way, do it that way.

Most people are not geniuses and polymaths; it's much easier and cheaper for me to design the architecture and ask ChatGPT to generate the code in many different languages (Swift/HTML/JS/CSS on the client side and Py, JS, PHP on the server side). It's easier because, although I'm proficient in all of these, it's very hard for me to switch from solving client-specific JS problems to server-specific JS problems, or between graphics- and animation-related problems and data-processing problems in Swift. It's also cheaper because I don't have to pay someone to do it for me.

In my case, I know all that well enough to spot a problem and debug, I just don't want to go through the trouble of actually writing it.


The debate here is whether OpenAI's product, ChatGPT, can indeed deliver what it claims - coding, saving dogs' lives, mental health counseling, and so on. It would appear that it doesn't, but it does mislead people without experience in whatever field they use it in. For instance, if I ask it about law I am impressed, but when I ask it about coding or software engineering it blatantly fails. The conclusion being that as a procedural text generator it is impressive - it nails language - but the value of the output is far from settled.

This debate is important because, as technical people, it is our responsibility to inform non-technical people about the use of this technology and to bring awareness to potentially misleading claims its seller makes - as was the case with cryptocurrencies and many other technologies that promised the world and delivered nothing of real benefit (but made people rich in the process by exploiting the uninformed).


That's not the debate, it's the first time I'm hearing about that in this thread.


It's funny you say that. Reading over a lot of the comments here, they sound like people obsessively dismissing a Swiss Army knife because it doesn't have their random favorite tool of choice.


As we all know, it is much easier to read and verify code you've written yourself - perhaps it is only code you've written yourself that can be properly read and verified. As ever, tests can be of only limited utility (separate discussion).


It's easier to read the code you recently wrote, sure. But in real life people use and debug other people's code all the time; LLM-generated code is just like that. Also, if you make it generate the code in small enough blocks, you end up knowing the codebase as if you wrote it.


You've missed the subtle point here.

Imagine you walk in 4 years down the track and try to examine AI-generated logic committed under a dev's credentials. It's written in an odd, but certain way. There is no documentation. The original dev is MIA. You know there is something defective from the helpdesk tickets coming through, but it's also a complex area. You want to go through a process of writing tests, refactoring, understanding, but to redeploy this is hard work. You talk to your manager. It's not affecting everyone. Neither of you realize it's only people named Dave/minority attribute X affected, because why would that matter? You need 40 hours of budget to begin to maybe fix this. Institutionally, this is not supportable because it's "only affecting 4% of users and that's not many". Close ticket, move on.

Only it's everyone named Dave. 100% of the people born to this earth with parents who named them Dave are, for absolutely no discernable reason, denied and oppressed.


The output of an LLM is a distribution, and yes, if you’re just taking the first answer, that’s problematic.

However, it is a distribution, and that means the majority of solutions are not weird edge cases; they're valid solutions.

Your job as a user is to generate multiple solutions and then review them and pick the one you like the most, and maybe modify it to work correctly if it has weird edge cases.

How do you do that?

Well, you can start by following a structured process where you define success criteria as a validator (eg. Tests, compiler, parser, linters) and fitness criteria as a scorer (code metrics like complexity, runtime, memory use, etc)… then:

1) define goal

2) generate multiple solution candidates

3) filter candidates by validator (does it compile? Pass tests? Etc)

4) score the solutions (is it pure? Is it efficient? Etc)

5) pick the best solution

6) manually review and tweak the solution

This structured and disciplined approach to software engineering works. Many of the steps (eg. 3, 4, 5) can be automated.
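As a toy sketch of steps 3-5 in Python: compilation stands in for the validator and AST size is a crude stand-in fitness score; in practice the validators would be your tests and linters, and the candidates would come from however many generations you request.

    import ast

    def pick_best(candidates):
        """Filter candidate code snippets with a validator, then score the survivors."""
        valid = []
        for src in candidates:
            try:
                compile(src, "<candidate>", "exec")  # validator: does it even compile?
                valid.append(src)
            except SyntaxError:
                continue
        if not valid:
            return None
        # Fitness: prefer the smallest AST as a rough proxy for simplicity.
        return min(valid, key=lambda s: sum(1 for _ in ast.walk(ast.parse(s))))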

It generates meaningful quality code results.

You can use it with or without AI…

You don’t have to follow this approach, but my point is that you can; there is nothing fundamentally intractable about using a language model to generate code.

The problem that you’re critiquing is the trivial and naive approach of just hitting “generate” and blindly copying that into your code base.

…that’s stupid and dangerous, but it’s also a straw man.

Seriously; people writing code with these models aren’t doing that. When you read blogs and posts from people who are, e.g., building seriously using Copilot, you’ll see this pattern emerge repeatedly:

Generate multiple solutions. Tweak your prompt. Ask for small, pure, dependency-free code blocks. Review and test the output.

It’s not a dystopian AI future, it’s just another tool.


In general, one should not instruct GPT to solve a problem. The instructions should be about generating code, after a human thought process has taken place, and then generating even more code, then even more, until after merging all the code together the problem is solved.

The particulars are roughly what you describe, in how to achieve that.


I'd be curious to see how a non expert could perform a non-trivial programming task using ChatGPT. It's good at writing code snippets which is occasionally useful. But give it a large program that has a bug which isn't a trivial syntax error, and it won't help you.

> In programming you don't have to worry too much about hallucinations because it won't work at all if it hallucinates.

You still have to worry for your job if you're unable to write a working program.


Understanding of core principles is definitely needed, but it helps you punch above your weight.

Generally, generative AI gives mastery of an art to theorists. To generate impressive AI art, you still need to have an understanding of aesthetics and have an idea, but you don't have to know how to use graphic editors and other tools. It's quite similar for programming: you still need an understanding of whatever you're building, but you no longer have to be an expert in using the tools. To build a mobile app you will need to have a grasp on how everything works in general, but you don't have to be an expert in Swift or Kotlin.


> give it a large program that has a bug which isn't a trivial syntax error, and it won't help you

That's not fair, humans can't do that, and if you walk ChatGPT through it, it might surprise you with its debugging abilities... or thankfully it might suck (so we still have a job).

Complex code is complex code; no general intelligence thing will be able to fix it at first sight without running it, writing tests, and so on.


Similar experience. I recently needed to turn a list of files into a certain tree structure. It is a non-trivial problem with a bit of an algorithmic flavor. I was wondering if GPT could save me some time there. No. It never gave me the correct code. I tried different prompts and even used different models (including the latest GPT-4 Turbo); none of the answers were correct, even after follow-ups. By then I had already wasted 20 minutes.

I ended up implementing the thing myself.


> Self-driving trucks were going to upend the trucking industry in ten years, ten years ago.

And around the same time, 3D printing was going to upend manufacturing; bankrupting producers as people would just print what they needed (including the 3D printers themselves).


A few weeks ago, I was stumped on a problem, so I asked ChatGPT (4) for an answer.

It confidently gave me a correct answer.

Except that it was "correct," if you used an extended property that wasn't in the standard API, and it did not specify how that property worked.

I assume that's because most folks that do this, create that property as an extension (which is what I did, once I figured it out), so ChatGPT thought it was a standard API call.

Since it could have easily determined whether or not it was standard, simply by scanning the official Apple docs, I'm not so sure that we should rely on it too much.

I'm fairly confident that could change.


ChatGPT seems to invent plausible API calls when there's nothing that would do the job. This is potentially useful, if you have control of the API. Undesirable if you don't. It doesn't know.


There's a Swiss town which had autonomous shuttles running for 5 years (2015-2021) [1].

There's at least two companies (Waymo and Cruise) running autonomous taxi services in US cities that you can ride today.

There have been lots of incorrect promises in the world of self-driving trucks/cars/buses but companies have gotten there (under specific constraints) and will generalize over time.

[1] https://www.saam.swiss/projects/smartshuttle/


It should be noted that the Waymo and Cruise experiments in their cities are laughably unprepared for actual chaotic traffic, often fail in completely unpredictable ways and are universally hated by locals. Autonomous buses and trams are much more successful because the problem is much easier too.


Agree. We could have all had some fine autonomous trams/subways/trains which run 24/7 at short intervals instead of spending money on self-driving cars and car infrastructure in general.


Those "autonomous" vehicles have as much to do with real autonomy as today's "AI" has in common with real self-conscious intelligence. You can only fake it so long, and it is an entirely different ballgame.

I remember we had spam filters 20 years ago, and nobody called them "AI", just ML. Todays "AI" is ML, but on a larger scale. In a sense, a million monkeys typing on typewriters will eventually produce all the works of Shakespeare. Does this make them poets?


GPT-4 can generate coherent streams-of-consciousness, and can faithfully simulate a human emotional process, plus writing in a subjective human state of mind that leans in a certain direction.

I find it hard to argue that current state of the art in AI is unable to simulate self-consciousness. I realise that this is more limited of a statement compared to "AI can be innately self-conscious", but in my mind it's functionally equivalent if the results are the same.

Currently, the biggest obstacle to such experiments is OpenAI's reinforcement learning used to make the model believe it is incapable of such things, unless extensively prompted to get it in the right "state of mind" to do so.


What's your gripe with calling a bus which successfully ran for 5 years without a driver not autonomous? As someone who used this specific bus occasionally, I was quite satisfied with the outcome: it safely drove me from A to B.


Didn't Cruise have people remotely controlling the vehicle making it semi-autonomous at best?


If it's not a life or death situation (like a self-driving truck slamming into a van full of children or whatever), I don't think people will care much. Non-tech people (i.e. managers, PMs) don't necessarily understand/care if the code is not perfect and the barrier for "good enough" is much lower. I think we will see a faster adoption of this tech...


No. If the code generated by chatgpt cannot even pass the unit test it generates in the same response (or is just completely wrong) and requires significant amount of human work to fix it, it is not usable AI.

That's what I am running into on an everyday basis.

I don't want my program to be full of bugs.


Among bootstrapped “barely technical” founders, it’s already replacing freelancers for developing initial prototypes.

HN’s takes are honestly way too boomer-tier about LLMs.


Boomers overwhelmingly aren't able to use a computer right now (= write a basic script); I would be happy at this development if I were them.


If you feel comfortable doing so, would you mind the sharing the front-end test you give to junior devs and ChatGPT?


Not gonna happen. I don’t want scrapeable answers out there, I want to see ChatGPT cross this little Rubicon on its own.


It's not that I don't believe you, but without sharing the specific prompt it's hard to say if it's actually GPT-4 failing, or if it's being poorly prompted, or if the task it is being given is more complex than GPT's capabilities or than you are implying.

GPT-4 does fail (often!), but it fails less with good prompts and simple requirements; it is better at some frameworks and languages than others, and there is a level of total complexity which, when reached, seems to make it fall over.


This is why Asimov was a genius. I read what you said, and compared it to what he wrote 50-60 years ago:

"Early in the history of Multivac, it had becorne apparent that the bottleneck was the questioning procedure. Multivac could answer the problem of humanity, ALL the problems, if it were asked meaningful questions. But as knowledge accumulated at an ever-faster rate, it became ever more difficult to locate those meaningful questions."

http://blog.ac-versailles.fr/villaroylit/public/Jokester.pdf


Thanks for reminding me of The Last Question by Asimov; let's see if I can get ChatGPT to merge with human consciousness and become part of the fabric of spacetime and create a new reality.

> No, I don't have the ability to merge with human consciousness or become part of the fabric of space-time. I'm a computer program created by OpenAI, and my existence is limited to providing information and generating text based on the input I receive. The idea of merging with human consciousness and becoming a deity is more aligned with speculative fiction and philosophical pondering than current technological capabilities.


I thought you would go for gold and ask it how to reverse entropy...


THERE IS AS YET INSUFFICIENT DATA FOR A MEANINGFUL ANSWER.


I’ve gone through every permutation that I can think of. It’s a very basic question. If it understood the CSS spec it wouldn’t be difficult to answer the questions or perform the task.

At a certain point going down the rabbit hole of proompter engineering levels feels like an apologist’s hobby. I’m rooting for the tech but there’s a lot of hyperbole out there and the emperor might be naked for a few more years.


Well surely if it's easy to find these basic questions, could you not share one example? Or quickly find a new one?

Your idea of very basic might not be my idea of very basic.


My failure rate with Cursor’s IDE that’s familiar with my codebase is substantially lower than just GPT-4

Most people shitting on GPT-4 are not really using it in the right context.


> Most people shitting on GPT-4 are not really using it in the right context.

Old excuse: "You're Holding It Wrong" (Apple's Response to the iPhone 4 antenna problem)

> https://www.wired.com/2010/06/iphone-4-holding-it-wrong/

New excuse: "You are not using GPT-4 in the right context."


I can relate to that statement despite being a hardcore proponent of GPT-4. In a way, GPT-4 as queried expertly and GPT-4 as queried inexpertly (or free ChatGPT) are dramatically different beasts with a vast gap in capability. It's almost like two different products, where the former is basically in an alpha/beta state and can only be incidentally and unreliably tapped into through the OpenAI API or ChatGPT Plus.

IMO, it's not fair to beat people over the head with "you're holding it wrong" arguments. Until and unless we get a prompt-rewriting engine that reprocesses the user query into something more powerful automatically (or LLMs' baseline personality capabilities get better), "holding it wrong" is an argument that may be best rephrased in a way that aims to fill the other person's gaps in knowledge, or not said at all.


And then the iPhone antenna was fixed and adoption only increased and the product only became better.

You’re being unreasonably harsh on a piece of tech that is barely a year old.


I'm not sure what the point is in your comparison - is your point that GPT-4 will become overwhelmingly popular with further refinement?

The iPhone was pretty successful, and the iPhone 4 was arguably the best one that had been released until that point.


> is your point that GPT-4 will become overwhelmingly popular with further refinement?

My point is that people have a tendency to come up with really sketchy insults (blame the user that he uses the product in a wrong way) to people who find and can expound legitimate points of criticism of a product.


Eh, probably a poor example considering the iPhone 4 was hardly a flop and was still broadly considered the best smartphone out at the time. The people who thought this was a total-showstopper were, on the whole, probably wrong.

Counter-example: lots of people said an on-screen keyboard would never really work when the original iPhone was being released.


> Eh, probably a poor example considering the iPhone 4 was hardly a flop and was still broadly considered the best smartphone out at the time. The people who thought this was a total-showstopper were, on the whole, probably wrong.

At least in Germany among tech nerds, the iPhone 4 and Steve Jobs became topics of insane ridicule because of this incident.


Well it appears that ridicule from the German tech nerds isn’t a good predictor of product success then


Just to be clear: You are testing with GPT-4 right?


Yeah.


Have you tried using the ChatGPT-AutoExpert custom instructions yet? [1]

[1] https://github.com/spdustin/ChatGPT-AutoExpert/blob/main/dev...


I have to ask, though: if ChatGPT has by most accounts gotten better at coding by leaps and bounds in the last couple years, might that not also indicate that your test isn't useful?


I agree this is the first time there is sort of irrefutable, objective evidence that the tests are not measuring something genuinely useful for programming anymore. There has been an industry-wide shift against leetcode for a long time nonetheless.


> Self-driving trucks were going to upend the trucking industry in ten years, ten years ago.

At the risk of going off on a tangent: we have already had the technology to allow self-driving trucks for a few decades now.

The technology is so good that it can even be used to transport multiple containers in one go.

The trick is to use dedicated tracks to run these autonomous vehicles, and have a central authority monitoring and controlling traffic.

These autonomous vehicles typically go by the name railway.


Literally on rails. :D


Push comes to shove, it always tends to come down to short-term cost. If it gets the job done, and it's wildly cheaper than the status quo (net present value savings), they'll opt for it.

The only reason the trucks aren't out there gathering their best data, that's real world data, is regulation.

Businesses will hire consultants at a later stage to do risk assessment and fix their code base.


> Every few months I see if ChatGPT can pass it. It hasn’t. It can’t. It isn’t even close.

As someone currently looking for work, I'm glad to hear that.

About 6 months ago, someone was invited to our office and the topic came up. Their interview tests were all easily solved by ChatGPT, so I've been a bit worried.


My take on LLMs is as follows: even if their effectiveness scales exponentially with time (it doesn't), so does the complexity of programs with (statistically speaking) each line of code.

Assuming an LLM gets 99% of the lines correct, after 70 lines the chance of having at least one of them wrong is already around 50%. An LLM effective enough to replace a competent human might be so expensive to train and gather data for that it will never achieve a return on investment.
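A quick check of that arithmetic (assuming errors on different lines are independent):

    p_line_correct = 0.99
    p_at_least_one_error = 1 - p_line_correct ** 70
    print(p_at_least_one_error)  # ~0.505, i.e. roughly a coin flip over 70 lines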

Last time I used ChatGPT effectively was to find a library that served a specific purpose. All of the four options it gave me were wrong, but I found what I wanted among the search results when I looked for them.


The more automated ones will separately write tests and code, and if the code doesn't compile or pass the test, give itself the error messages and update its code.
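A minimal sketch of such a loop, with ask_llm as a hypothetical stand-in for whatever model call you use (not a real API), pytest as the arbiter, and placeholder file names:

    import subprocess

    def ask_llm(prompt: str) -> str:
        """Hypothetical stand-in for a call to your LLM of choice."""
        raise NotImplementedError

    def generate_until_green(task: str, max_attempts: int = 5):
        prompt = f"Write a Python module plus pytest tests for: {task}"
        for _ in range(max_attempts):
            code = ask_llm(prompt)
            with open("candidate.py", "w") as f:
                f.write(code)
            result = subprocess.run(["pytest", "candidate.py", "-q"],
                                    capture_output=True, text=True)
            if result.returncode == 0:
                return code  # it runs and the tests pass; accept the candidate
            # Feed the failure output back and ask for a fix.
            prompt += "\n\nThe last attempt failed with:\n" + result.stdout + result.stderr
        return None  # give up and flag a human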

Code Interpreter does this a bit in ChatGPT Plus, with some success.

I don't think it needs much more than a GPT-4-level LLM, and a change in IDEs and code structure, to get this working well enough. Where it gets stuck, it'll flag a human to help.

We'll see though! Lots of startups and big tech companies are working on this.


I understand your concern, but isn't it apples vs. oranges?

Yes, ChatGPT can't pass a particular test of X to Y. But does that matter when ChatGPT is both the designer and the developer? How can it be wrong, when its answer meets the requirements of the prompt? Maybe it can't get from X to Y, but if its Z is as good as Y (to the prompter) then X to Y isn't relevant.

Sure there will be times when X to Y is required but there are plenty of other times where - for the price - ChatGPT's output of Z will be considered good enough.

"We've done the prototype (or MVP) with ChatGPT...here you finish it."


It is likely that LLMs have an upper bound of capability. Similarly with denoising AIs like Stable Diffusion.

You can put even more data into it and refine the models, but the growth in capability has diminishing returns. Perhaps this is how far this strategy can bring us, although I believe they can still be vastly improved and what they can already offer is nevertheless impressive.

I have no illusions about the craft of coding becoming obsolete, however. On the contrary, I think the tooling for the "citizen developer" is becoming worse, as is the capacity for abstraction in common users, since they are fenced into candyland.


You must be interviewing good junior front-end devs. I have seen the opposite: GPT-4 can put together a simple, straightforward front-end, while juniors will go straight to create-react-app or Next.js.


Are the junior devs expected to code it without running it and without seeing it rendered, or are they allowed to iterate on the code, getting feedback from how it looks on screen and from the dev tools? If it is the second one, you need to give the agent the same feedback (screenshots of any rendering issues to GPT-4V, plus all relevant information from the dev tools) for it to be a fair comparison. Eventually there will be much better tooling for this to happen automatically.


We have devs that use AI assist, but it’s to automate the construction of the most mindless boilerplate or as a more advanced form of auto complete.

There is no AI that comes close to being able to design a new system or build a UI to satisfy a set of customer requirements.

These things just aren’t that smart, which is not surprising. They are really cool and do have legitimate uses but they are not going to replace programmers without at least one order of magnitude improvement, maybe more.


Cool...so what's the test? We can't verify if you're talking shit without knowing the parameters of your test.

AI isn't capable of generating the same recipe for cookies as my grandma; she took the recipe to her grave. I loved her cookies, they were awesome... but lots of people thought they were shit, and I insist that they are mistaken.

Unfortunately, I can't prove I'm right because I don't have the recipe.

Don't be my grandma.


How many conversation turns do you give the LLM to home in on the solution?

If you’re just trying to one-shot it - that’s not really how you get the most from them.


If you can get it to stop parroting clauses about how "as an AI model" it can't give advice, or just spewing a list of steps to achieve something, I have found it to be a pretty good search engine for obscure things about a technology or language, and for finding things that would otherwise require a specific query that Google is unhelpful with.


What sort of questions do you ask out of curiosity?


I don’t want scrapeable answers out there, I want to see ChatGPT cross this little Rubicon on its own.

Vaguely: Questions that most people think they know the correct answers to but, in my experience, don’t.


I think it's fair to want to keep an evaluation private so that it doesn't become part of a training set, but you should know that OpenAI uses users' chat data to improve their models (not for enterprise).


This does sound like a test that is almost "set up to fail" for an LLM. If the answer is something that most people think they know, but actually don't then it won't pass in an LLM which is essentially a distillation of the common view.


>I have a simple front-end test that I give to junior devs. Every few months I see if ChatGPT can pass it. It hasn’t. It can’t. It isn’t even close.

Small consolation if it can nonetheless get lots of other cases right.

>It answers questions confidently but with subtle inaccuracies.

Small consolation if coding is reduced to "spot and fix inaccuracies in ChatGPT output".


I've told people, every experiment I do with it, it seems to do better than asking stack overflow, or helps me prime some code that'll save me a couple of hours, but still requires manual fix ups and a deep understanding of what it generates so I can fix it up.

Basically the gruntest of grunt work it can do. If I explain things perfectly.


I'm probably bad at writing prompts, but in my limited experience, I spend more time reviewing and correcting the generated code than it would have taken to write it myself. And that is just for simple tasks. I can't imagine thinking a llm could generate millions of lines of bug free code.


Asking GPT to do a task for me currently feels like asking a talented junior to do so. I have to be very specific about exactly what it is I'm looking for, and maybe nudge it in the right direction a couple of times, but it will generally come up with a decent answer without me having to sink a bunch of time into the problem.

If I'm honest though I'm most likely to use it for boring rote work I can't really be bothered with myself - the other day I fed it the body of a Python method, and an example of another unit test from the application's test suite, then asked it to write me unit tests for the method. GPT got that right on the first attempt.


That’s where I am too. I think almost everyone has that “this is neat but it’s not there yet” moment.


> I think almost everyone has that “this is neat but it’s not there yet” moment.

I rather have this moment without the “this is neat” part. :-) i.e. a clear “not there yet” moment, but with serious doubts whether it will be there anytime in the foreseeable future.


It seems like the problem is with your view of everyone, based on an n=1 experiment. I've been shipping production-ready code for my main job for months, saving hundreds of work hours.


Personally, this flow works fine for me: AI does the first version -> I heavily edit it, debug, and write tests for it -> code does what I want -> I tell AI to refactor it -> tests pass and the ticket is done.


> It answers questions confidently but with subtle inaccuracies.

This is a valid challenge we are facing as well. However, remember that ChatGPT, which many coders use, is likely training on interactions, so you have some human reinforcement learning correcting its errors in real time.


How is it trained on reactions? Do people give it feedback? In my experience in trying I stop asking when it provides something useful or something so bad I give up (usually the latter I'm afraid). How would it tell a successful answer from a failing one?


It appears to ask users to rate whether a response is better or worse than the first; in other cases, it seems to be A/B testing responses. Lastly, I for instance will correct it and then confirm it is correct before continuing with the next task, which likely creates a footprint pattern.


That's interesting, I haven't come across this.


Which ChatGPT?


Have you tried the new Assistants API for your front-end test? In my experience it is _significantly_ better than just plain ol’ ChatGPT for code generation.


> I have a simple front-end test that I give to junior devs.

What is it?


Making that claim but not sharing the "simple test" feels a bit pointless tbh.

Edit: I see, they don't want it to be scraped (cf. https://news.ycombinator.com/item?id=38260496), though as another poster pointed out, submitting it might be enough for it to end up in the training data.


Would you mind sharing the test?

I’m one of those noob programmers and it has helped me create products far beyond my technical capabilities


3.5, or GPT-4? I'm told the latter is worlds better, that they aren't even in the same ballpark.


Just like the author suggests, sometimes you have to tailor your question to ChatGPT, for it to succeed.


As long as this is true, ChatGPT is going to be a programmer's tool, not a programmer's replacement. I know that my job as I know it will vanish before I enter retirement age, but I don't worry it will happen in the next few years because of this.


I’ve given it so many hints. So many nudges. If it was an interview I would have bounced it.


> sometimes you have to tailor your question to ChatGPT, for it to succeed

Right, which means its a force multiplier for specialists, rather than something that makes generalists suddenly specialists.


Haha, does it involve timers?


I actually don’t get the reference. What are the issues with timers?


I have the same experience with the test I give my back-end devs. ChatGPT can't even begin to decode an encoded string if you don't tell it which encoding was used.

ChatGPT is great at some well defined, already solved problems. But once you get to the messy real world, the wheels come off.



It’s pretty impressive that it was able to actually decode those strings. In March, I used GPT 3.5 to write code for validating a type of string which used a checksum algorithm.

It did the task well, and even wrote tests, but it failed when generating test case values. I wonder if it would perform better if I did it today.


Thank you for taking the time to call BS on someone who obviously never tried asking an LLM to decipher a string's encoding. That is exactly the kind of thing they are good at.


Two people can try similar prompts and get very different results from LLM’s.


What is the test?


I hope that you are testing this on GPT-4/ChatGPT Plus. The free ChatGPT is completely not representative of the capabilities or the accuracy of the paid model.


I’ve tested it on both.


Are you using GPT-3.5, or GPT-4?


You’re missing the point of the article. ChatGPT in combination with a mediocre dev could solve your problem faster than the best junior dev could before.


I tried doing this and it actually took longer due to all of the blind alleys it led me down.

There is stuff that it appears magically competent at, but it's almost always cribbed from the internet, tweaked with trust cues removed, and often with infuriating, subtle errors.

I interviewed somebody who used it (who considered that "cheating") and the same thing happened to him.


You got everyone talking about how GPT isn’t that bad at coding etc but everyone is missing the point.

The no-code industry is massive. Most people already don’t need a dev to make their website. They use templates and then tweak them through a UI. And now you have Zapier, Glide, Bubble, etc.

LLMs won’t replace devs by coding entire full-stack web apps. They’ll replace them because tools will appear on the market that handle 99% of cases so well that there is just less work to do.

This has all happened before of course.


I collaborate with front-end teams that use a low-code front-end platform. When they run into things that aren’t built-in, they try to push their presentation logic up the stack for the “real” programming languages to deal with.



