
My experience is likely colored by the fact that I tend to turn to LLMs for problems I have trouble solving by myself. I typically don't use them for the low-hanging fruits.

That's the frustrating thing. LLMs don't materially reduce the set of problems where I'm running against a wall or have trouble finding information.



I use LLMs for three things:

* To catch passive voice and nominalizations in my writing.

* To convert Linux kernel subsystems into Python so I can quickly understand them (I'm a C programmer but everyone reads Python faster).

* To write dumb programs using languages and libraries I haven't used much before; for instance, I'm an ActiveRecord person and needed to do some SQLAlchemy stuff today, and GPT 4o (and o1) kept me away from the SQLAlchemy documentation.

OpenAI talks about o1 going head to head with PhDs. I couldn't care less. But for the specific problem we're talking about on this subthread: o1 seems materially better.


> * To convert Linux kernel subsystems into Python so I can quickly understand them (I'm a C programmer but everyone reads Python faster).

Do you have an example chat of this output? Sounds interesting. Do you just dump the C source code into the prompt and ask it to convert to Python?


No, ChatGPT is way cooler than that. It's already read every line of kernel code ever written. I start with a subsystem: the device mapper is a good recent example. I ask things like "explain the linux device mapper. if it was a class in an object-oriented language, what would its interface look like?" and "give me dm_target as a python class". I get stuff like:

    def linear_ctr(target, argc, argv):
        print("Constructor called with args:", argc, argv)
        # Initialize target-specific data here
        return 0
     
    def linear_dtr(target):
        print("Destructor called")
        # Clean up target-specific data here
     
    def linear_map(target, bio):
        print("Mapping I/O request")
        # Perform mapping here
        return 0
     
    linear_target = DmTarget(name="linear", version=(1, 0, 0), module="dm_mod")
    linear_target.set_ctr(linear_ctr)
    linear_target.set_dtr(linear_dtr)
    linear_target.set_map(linear_map)
     
    info = linear_target.get_info()
    print(info)
(A bunch of stuff elided.) I don't care at all about the correctness of this code, because I'm just using it as a roadmap for the real Linux kernel code. The example use-case code is something GPT 4o provides that I didn't even know I wanted.
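For readers who want to run the snippet above: the `DmTarget` class was among the stuff elided, so here is a hypothetical minimal sketch in the same spirit. It only mirrors the shape of the transcript's usage code; it is not the real kernel `struct target_type` interface.

```python
# Hypothetical stand-in for the elided DmTarget class; mirrors the usage
# code above, not the actual kernel device-mapper API.
class DmTarget:
    def __init__(self, name, version, module):
        self.name = name          # target type name, e.g. "linear"
        self.version = version    # (major, minor, patch) tuple
        self.module = module      # owning kernel module
        self.ctr = self.dtr = self.map = None

    def set_ctr(self, fn):
        self.ctr = fn             # constructor callback

    def set_dtr(self, fn):
        self.dtr = fn             # destructor callback

    def set_map(self, fn):
        self.map = fn             # I/O mapping callback

    def get_info(self):
        version = ".".join(map(str, self.version))
        return f"{self.name} v{version} ({self.module})"
```

With this in place, the `linear_target` example above runs end to end and prints a summary line from `get_info()`.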


That's awesome. Have you tried asking it to convert Python (pseudo-ish) code back into C that interfaces with the kernel?


No, but only because I have no use for it. I wouldn't be surprised if it did a fine job! I'd be remiss if I didn't note that it's way better at doing this for the Linux kernel than with codebases like Zookeeper and Kubernetes (though: maybe o1 makes this better, who knows?).

I do feel like someone who skipped like 8 iPhone models (cross-referencing, EIEIO, lsp-mode, code explorers, tree-sitter) and just got an iPhone 16. Like, nothing that came before this for code comprehension really matters all that much?


It's all placeholders - that's my experience with GPT trying to write slop code.


Those are placeholders for user callbacks passed to the device mapper subsystem. It’s a usage example not implementation code.


Then ask it to expand. Be specific.


I wasn't about to paste 1000 lines of Python into the thread; I just picked an interesting snippet.


LLMs are not for expanding the sphere of human knowledge, but for speeding up auto-correct of higher order processing to help you more quickly reach the shell of the sphere and make progress with your own mind :)


Definitely. When we talk about being skilled in a T shape, LLMs are all about spreading the top of the T, not making the bottom go deeper.


Indeed, not much more depth — though even Terence Tao reported useful results from an earlier version, so perhaps the breadth is a depth all of its own: https://mathstodon.xyz/@tao/110601051375142142

I think of it as making the top bar of the T thicker, but yes, you're right, it also spreads it much wider.


I prefer reading some book. Maybe the LLM was trained on some piece of knowledge not available on the net, but I much prefer the reliability and consistency of a book.


It's funny because I'm very happy with the productivity boost from LLMs, but I use them in a way that is pretty much diametrically opposite to yours.

I can't think of many situations where I would use them for a problem that I tried to solve and failed - not only because they would probably fail, but in many cases it would even be difficult to know that it failed.

I use them for things that are not hard, that can be solved by someone without a specialized degree who took the effort to learn some knowledge or skill, but that would take too much work to do. And there are a lot of those, even in my highly specialized job.


LLMs: When the code can be made by an enthusiastic new intern with web-search and copy-paste skills, and no ability to improve under mentorship. :p

Tangentially related, a comic on them: https://existentialcomics.com/comic/557


> That's the frustrating thing. LLMs don't materially reduce the set of problems where I'm running against a wall or have trouble finding information.

As you step outside regular Stack Overflow questions for top-3 languages, you run into limitations of these predictive models.

There's no "reasoning" behind them. They are still, largely, bullshit machines.


You're both on the wrong wavelength. No one has claimed it is better than an expert human yet. Be glad; for now your jobs are safe. Why not use it as a tool to boost your productivity, even though you'll get proportionally less use out of it than people in other, perhaps less "expert", jobs?


In order for it to boost productivity, it needs to answer more than the regular questions for the top-3 languages on Stack Overflow, no?

It often fails even for those questions.

If I need to babysit it for every line of code, it's not a productivity boost.


Why does it need to answer more than that?

You underestimate the opportunity that exists for automation out there.

In my own case I've used it to make a simple custom browser extension for transcribing PDFs. I don't have the time and wouldn't have made the effort to build the extension myself; the task would have continued to be done manually. It took two hours to make and it works, and that's all I need in this case.

Perfection is the enemy of good.


> Perfection is the enemy of good.

Where exactly did I write anything about perfection? For me "AIs" are incapable of producing working code: https://news.ycombinator.com/item?id=41534233


You said you have to babysit each line of code, but this is simply untrue. If it works, there's no need to babysit; the only reason you'd need to babysit every single line is if you're looking for perfection, or if it's something very obscure or unheard of.

Your example is perhaps valid, but there are other examples where it does work, as I mentioned. I think it may be imprecise prompting: too general, or with too little logical structure. It's not like Google search; the more detail and more technical you speak the better, assume it's a very precise expert. Its intelligence is very general, so it needs precision to avoid confusing the subject matter. A well-structured logical request also helps, as its reasoning isn't the greatest.

Good prompting and verifying output is often still faster than manually typing it all.


> You said you have to babysit each line of code, but this is simply untrue. If it works, there's no need to babysit

No. It either doesn't work, or works incorrectly, or the code is incomplete despite requirements etc.

> Your example is perhaps valid, but there are other examples where it does work as I mentioned.

It's funny how I'm supposed to assume your examples are the truth, and nothing but the truth, but my examples are "untrue, you're a perfectionist, and perhaps you're right"

> the more detail and more technical you speak the better

As I literally wrote in the comment you're so dismissive of: "As for "using LLMs wrong", using them "right" is literally babysitting their output and spending a lot of time trying to reverse-engineer their behavior with increasingly inane prompts."

> assume it's a very precise expert.

If it was an expert, as you claim it to be, it would not need extremely detailed prompting. As it is, it's a willing but clumsy junior.

To the point that it would rewrite the code I fixed with invalid code when asked to fix an unrelated mistake.

> Good prompting and verifying output

How is it you repeat everything I say, and somehow assume I'm wrong and my examples are invalid?


I did not say your examples are untrue, no need to be so defensive. Believe what you wish but my example is true and works. A willing but clumsy junior benefits tremendously from a well scoped task.


If you need to babysit it for every line of code, you're either a superhuman coder, working in some obscure alien language, or just using the LLM wrong.


No. I'm just using them for simple things like "Help me with the Elixir code" or "I need to list Bonjour services using Swift".

It's shit across the whole "AI" spectrum from ChatGPT to Copilot to Cursor aka Claude.

I'm not even talking about code I work with at work, it's just side projects.

As for "using LLMs wrong", using them "right" is literally babysitting their output and spending a lot of time trying to reverse-engineer their behavior with increasingly inane prompts.

Edit: I mean, look at this ridiculousness: https://cursor.directory/



