If you've got some time, I highly recommend going through the exercise of trying to change the prompt in a way that would produce code similar to what you've achieved manually. Doing a similar exercise really helps to improve agent prompting skills, as it shows how changing parts of the prompt influences the result.
I haven’t had any luck prompting LLMs to “have taste.” They seem to over-fixate on instructions (e.g., golfing when asked for concise code) or require specifying so many details and qualifications that the results no longer generalize well to other problems.
Do you have any examples or resources that worked well for you?
Yeah, prompting doesn't work for this problem, because the entire point of an LLM is that you give it the what and it outputs the how. The more how you have to condition it with in the prompt, the less profitable the interaction will be. A few hints are OK, but doing all the work for the LLM tends to lead to negative productivity.
Writing prompts and writing code take about the same amount of time for the same amount of text, plus there's the extra time the LLM takes to accomplish the task, and review time afterwards. So you might as well just write the code yourself if you have to specify every tiny implementation detail in the prompt.
Makes me think of this commitstrip comic: https://i.xkqr.org/itscalledcode.jpg (mirrored from the original due to TLS issues with the original domain.)
A guy with a mug comes up to a person standing with their laptop on a small table. The mug guy says, "Some day we won't even need coders any more. We'll be able to just write the specification and the program will write itself."
Guy with laptop looks up. "Oh, wow, you're right! We'll be able to write a comprehensive and precise spec and bam, we won't need programmers any more!"
Guy with mug takes a sip. "Exactly!"
Guy with laptop says, "And do you know the industry term for a project specification that is comprehensive and precise enough to generate a program?"
You know, this makes me wonder... is anybody actually prompting LLMs with pseudocode rather than an English specification? Could doing so result in code that's more true to the original pseudocode?
You can give the macro-structure using stubs then ask the LLM to fill in the blanks.
The problem is that it doesn't work too well for the meso-structure.
Models tend to be quite good at the micro-structure because they've seen a lot of it already, and the macro-structure can easily be prompted, but the levels in between are what distinguish a good vs bad model (or human!).
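A concrete (hypothetical) version of the stub approach: hand the model the macro-structure as typed function skeletons and ask it to fill in only the bodies. The domain, names, and trivial fill-ins below are my own illustration, standing in for what the LLM would produce:

```python
import json

# Hypothetical macro-structure you would hand to the model as empty
# stubs; the bodies here are trivial stand-ins for the LLM's output.
# The orchestration in run() pins the macro-structure in place, but
# the meso-structure (how pieces compose, error handling, batching)
# is exactly what the model still has to get right.

def load_records(text: str) -> list[dict]:
    """Parse newline-delimited JSON records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def validate(record: dict) -> bool:
    """Keep only records that carry the fields we care about."""
    return "id" in record and "value" in record

def summarize(records: list[dict]) -> dict:
    """Aggregate validated records into a summary report."""
    return {"count": len(records), "total": sum(r["value"] for r in records)}

def run(text: str) -> dict:
    # Orchestration only: the part the prompt fixes in place.
    return summarize([r for r in load_records(text) if validate(r)])
```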
Goodhart's Law of Specification: When a spec reaches a state where it's comprehensive and precise enough to generate code, it has fallen out of alignment with the original intent.
Of course there are some systems where correctness is vital, and for those I'd like a precise spec and proof of correctness. But I think there's a huge bulk of code where formal specification impedes what should be a process of learning and adapting.
My dream antiprogram is a specification compiler that interprets any natural language and compiles it to a strict specification. But on any possible ambiguity it gives an error.
?
This terse error was found to be necessary so as not to overwhelm the user with pages and pages of decision trees enumerating the ambiguities.
Openspec does this. But instead of "?" it has a separate Open Questions section in the design document. In Codex CLI, if you first go into plan mode, it will ask you open questions before it proceeds with the rest.
The UX is there, and for small things it works for me, but LLMs still fall short of truly capturing the major issues.
> Do you have any examples or resources that worked well for you?
Using this particular example: if you simply paste the exact code into the prompt, the model should be able to reproduce it. Now, you can start removing bits and see how much you can remove from the prompt, e.g., simplify it to pseudocode. Then you can push it further and try to switch from the pseudocode to the architecture, etc.
That way, you'll start from something that's working and work backwards rather than trying to get there in the absence of a clear path.
That’s an interesting approach, but what do you learn from it that is applicable to the next task? Do you find that this eventually boils down to heuristics that generalize to any task? It sounds like it would only work because you already put a lot of effort into understanding the constraints of the specific problem in detail.
What worked for me was Gemini 3 Pro (I guess 3.1 should work even better now) with the prompt "This code is unnecessarily complicated. Simplify it, but no code golf". This decreased code size by about 60%. It still did a bit of code-golfing, but it was manageable.
It is important to start a new chat so the model is not stuck in its previous mindset, and it is beneficial to have tests to verify that the simplified code still works as it did before.
Telling the model to generate concise code did not work for me, because LLMs do not know beforehand what they are going to write, so they are rarely able to refactor existing code to break out common functionality into reusable functions. We might get there eventually. Thinking models are a bit better at it. But we are not quite there yet.
I have a stupid solution for this which is working wonders. It does not help to tell the LLM "don't do this pattern". Instead, I literally make it write a regex-based test that looks for that pattern and fails when it appears.
For example, I am developing a game using GDScript, and LLMs (including Codex and Claude) keep making scripts with no class_name declarations and then loading them with @preload. I hate this, and it's explicitly mentioned in my godot-development skill. But what agents can't stand is a failing test. Feels a bit like enforcing rules automatically.
This is a stupid idea but it works wonders on giving taste to my LLM. I wonder if I should open source that test suite for other agentic developers.
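For the curious, a minimal sketch of such a regex-based "taste test" in Python. The glob, the pattern, and the rule itself are my assumptions about the commenter's setup; adapt them to your own project layout and pet peeves:

```python
import re
from pathlib import Path

# A deliberately dumb lint: fail the suite whenever a GDScript file
# omits a class_name declaration, which would force it to be loaded
# by path instead of by a named class.
CLASS_NAME_RE = re.compile(r"^\s*class_name\s+\w+", re.MULTILINE)

def scripts_missing_class_name(root: str) -> list[str]:
    """Return paths of .gd files under `root` with no class_name line."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*.gd")
        if not CLASS_NAME_RE.search(p.read_text())
    )

def test_all_scripts_declare_class_name(root: str = "scripts/") -> None:
    offenders = scripts_missing_class_name(root)
    assert not offenders, f"Add class_name to: {offenders}"
```

The point is exactly what the comment says: agents route around prose rules, but they will iterate until a failing test passes.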
I really should spend some time analyzing what I do to get the good output I get..
One thing that is fairly low effort that you could try is find code you really like and ask the model to list the adjectives and attributes that that code exhibits. Then try them in a prompt.
With LLMs generally you want to adjust the behavior at the macro level by setting things like beliefs and values, vs at the micro level by making "rules".
By understanding how the model maps the aspects that you like about the code to language, that should give you some shorthand phrases that give you a lot of behavioral leverage.
Edit:
Better yet.. give a fresh context window the "before" and "after" and have it provide you with contrasting values, adjectives, etc.
Concise isn't specific enough: I've primed mine on basic architecture I want: imperative shell/functional core, don't mix abstraction levels in one function, each function should be simple to read top-to-bottom with higher level code doing only orchestration with no control flow. Names should express business intent. Prefer functions over methods where possible. Use types to make illegal states unrepresentable. RAII. etc.
You need to think about what "good taste" is to you (or find others who have already written about software architecture and take the ideas of theirs that you like). People disagree on what that even means (e.g., some people love Rails; to me a lot of it seems like the exact opposite of "good taste").
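As a toy illustration of "use types to make illegal states unrepresentable" (my own example, not the commenter's code): replace a pair of booleans, which permits impossible combinations, with a closed union of states.

```python
from dataclasses import dataclass
from typing import Union

# With flags like is_connected / has_error, four combinations are
# expressible but only three make sense ("connected and also errored"
# is nonsense). A closed union of state classes lets each state carry
# exactly the data it needs, and nothing else.

@dataclass
class Disconnected:
    pass

@dataclass
class Connected:
    session_id: str

@dataclass
class Failed:
    reason: str

ConnState = Union[Disconnected, Connected, Failed]

def describe(state: ConnState) -> str:
    # Dispatch on the state; each branch can rely on its fields existing.
    if isinstance(state, Connected):
        return f"connected ({state.session_id})"
    if isinstance(state, Failed):
        return f"failed: {state.reason}"
    return "disconnected"
```

In languages with sum types and exhaustiveness checking (Rust, Kotlin, TypeScript), the compiler additionally forces every branch to be handled.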
I spend much more time refactoring that creating features (though, it is getting better with each model). My go-to approach is to use Claude Code Opus 4.6 for writing and Gemini 3.1 Pro for cleaning up. I feel that doing it just one-stage is rarely enough.
A lot of prompts about finding the right level of abstraction, DRY, etc.
I actually don’t think golfing is such a bad thing. Granted, it will first handle the low-hanging fruit like variable names, but if you push it hard enough it will be forced to think of a simpler approach. Then you can take a step back and tell it to fix the variable names, formatting, etc. With the caveat that a smaller AST doesn’t necessarily mean simpler code, but it’s a decent heuristic.
I appreciate that your message is a good-natured, friendly tip. I don't mean for the following to crap on that. I just need to shout into the void:
If I have some time, the last thing I want to do with it is sharpen prompting skills. I can't imagine a worse or more boring use of my time on a computer or a skill I want less.
Every time I visit Hacker News I become more certain that I want nothing to do with either the future the enthusiasts think awaits us or the present that they think is building towards it.
While I somewhat understand the impact on the craft, the agents have allowed me to work on the projects that I would never have had enough time to work on otherwise.
You don't need to learn anything; it needs to learn from you. When it fails, don't correct it out of band; correct it in the same UI. At the end, say "look at what I did and create a proposed memory with what you learned," and if it looks good, have it add it to memories.
This better reflects what I thought about the other day. Either you let the clankers do their thing and then bake in your implementation on top, or you think it through and make them do it. But at the end of the day you've still got to THINK of the optimal solution and state of the code, at which point, do the clankers do anything aside from saving you a bunch of keypresses, and maybe catching a couple of bugs?
Also useful to encode into the steering of your platform. The incremental aspect of many little updates really helps you pick up speed by reducing review time.
A big-bang approach could be a start, but one-line guidance about specific things you don't want to see stacks up real fast.
My mildly amusing anecdote is that, whenever Claude Code produces something particularly egregious, I often find it sufficient to reply with just "wtf?" for it to present a much more sensible version of the code (which often needs further refinement, but that's another story...)
What's incredibly ironic is that research labs are releasing the most advanced hacking toolkit ever known, and cybersecurity defence stocks are going down as a result somehow. There’s no logic in the stock markets.
In Theory There Is No Difference Between Theory and Practice, While In Practice There Is.
In large projects, having a specific AGENTS.md makes the difference between the agent spending half of its context window searching for the right commands, navigating the repo, understanding what is what, etc., and being extremely useful. The larger the repository, the more things it needs to be aware of and the more important the AGENTS.md is. At least that's what I have observed in practice.
Great article! Just yesterday I watched a Devoxx talk by Andrei Pangin [1], the creator of async-profiler where I learned about the new heatmap support. To many folks it might not sound that exciting, until you realise that these heatmaps make it much easier to see patterns over time. If you’re interested there’s a solid blog post [2] from Netflix that walks through the format and why it can be incredibly useful.
Great that you had the time to be curious and dig into what was going on. QEMU is quite an amazing tool.
I'm kind of surprised there isn't a fairly robust kernel test around this issue, since it locks the machine up, and I think the fix was to prevent a stuck CPU last time as well?
It's also vaguely surprising that this hasn't been encountered more often, particularly by users like https://news.ycombinator.com/user?id=everlier, who in comments linked from this HN post talks about "20-30 containers" running simultaneously and occasionally locking up the machine.
If you're still thinking about the bug a little, you could look over how other kernel tests work and implement a failing test around it....?
I imagine the tests have some way of detecting a locked-up kernel. I don't know exactly how they'd do it, but they probably have a technique. Most likely, since the kernel is literally stuck in a loop, it won't respond to anything, so starting any process at all, even one as simple as printing "Hello World!", would fail and indicate the machine is locked.
Perhaps this is one of those cases where something like UserModeLinux would allow a test to be easily put together, rather than spawning complete VMs via some kind of VM software. Again, would be interesting to know what Linux does with this kind of test.
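The "can we still start a process" probe speculated about above could look like this from userspace. This is a guess at the shape of such a check, not how the kernel's actual test suites detect lockups:

```python
import subprocess
import sys

def machine_responds(timeout_s: float = 5.0) -> bool:
    """Probe liveness by spawning the cheapest possible child process.

    If the kernel is stuck in a loop, even fork/exec of a no-op should
    hang, so a timeout here is treated as evidence of a lockup.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", "pass"],  # a do-nothing child
            timeout=timeout_s,
            check=True,
        )
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False
```

In practice a harness would have to run this from outside the VM under test (or rely on the kernel's own soft-lockup/NMI watchdogs), since a fully locked-up guest can't run the probe itself.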
Definitely not the first AI generated font. One can find an enormous amount of research in AI font generation on https://scholar.google.com/ going back many years. This could possibly be the first one that used Nano Banana though, and the result is impressive for sure!
I believe there is no contradiction with the definition from the linked article?
> A system is said to be real-time if the total correctness of an operation depends not only upon its logical correctness, but also upon the time in which it is performed. Real-time systems, as well as their deadlines, are classified by the consequence of missing a deadline:
> Hard – missing a deadline is a total system failure.
> Firm – infrequent deadline misses are tolerable, but may degrade the system's quality of service. The usefulness of a result is zero after its deadline.
> Soft – the usefulness of a result degrades after its deadline, thereby degrading the system's quality of service.
> I guess I just have to accept that the term has lost it's meaning at this point and can be used for whatever whoever wants to use it for
It's maybe more like you point out: realtime in the OS context vs realtime in an event processing context. The latter is certainly not defined as strictly and often just means push-based. It has been a popular moniker, e.g. in kafka-land, for a while. I'm not sure it intrinsically takes away from the OS context - it doesn't need to be a deep dish pizza situation.
The highest level of cringe you can feel is when you see people you know well in real life post on LinkedIn. The contrast between the way they speak in real life and on LinkedIn is often immense; you don't feel that level of contrast with random internet strangers.
On the other hand, people have commented (in real life, to me) that my LinkedIn comments are bold, hilarious, and entirely unprofessional, earning me a sort of credibility in their eyes for being authentic and having integrity.
(and probably more privately, they believe I am too outspoken..)
Pros and cons, just like with all publicly broadcast information.
Also, it's always embarrassing when someone talks about a LinkedIn comment I have made, not because I am ashamed but because I am used to a semi-anonymous, shouting-into-the-void style forum like Hacker News.
Reminds me of a blog post I once read from a manager writing about all the qualities of being a good manager. I read it nodding along that they all seemed like good traits. Then in the comment section there was a post from someone saying something like "You were my manager at one point and honestly you were one of the worst managers I've had in my career. I didn't see many of these behaviours from you". The author responded with something like "I don't disagree. There's sometimes a gap between knowing and doing"
In my professional network, people mostly just reshare and like things their peers are doing or that they want to boost engagement for (mainly job postings, which they also post occasionally).
I _do_ have acquaintances I made outside of working life on LinkedIn, though - the only two that are really active are a mechanical engineer who mostly just posts about robotics and someone in marketing. I don't know if it's because I'm just really good friends with the latter person, but I've never felt annoyed reading their posts; they mostly seem to just talk about enjoying conferences or new externally facing projects - ad campaigns, large-scale promotions, etc - wherever they are currently working. I don't know if part of that is they're in the EU and the culture for marketers there is different?
I have a friend who behaves similarly on linkedin and in real life, and he's very blunt. I like how he calls out some crap on linkedin posts, and nobody dares to like his comments, even though I'm sure everybody approves.
Overall, I don't see anyone I know being a cringe bootlicker on LinkedIn. These people are very visible, but probably a small minority of users.
I'm curious about how their internal policies work such that they are allowed to publish a post mortem this quickly, and with this much transparency.
At any other large-ish company, there would be layers of "stakeholders" slowing this process down. They would almost never allow code to be published.
Well… we have a culture of transparency we take seriously. I spent 3 years in law school that many times over my career have seemed like wastes but days like today prove useful. I was in the triage video bridge call nearly the whole time. Spent some time after we got things under control talking to customers. Then went home.

I’m currently in Lisbon at our EUHQ. I texted John Graham-Cumming, our former CTO and current Board member whose clarity of writing I’ve always admired. He came over. Brought his son (“to show that work isn’t always fun”). Our Chief Legal Officer (Doug) happened to be in town. He came over too.

The team had put together a technical doc with all the details. A tick-tock of what had happened and when. I locked myself on a balcony and started writing the intro and conclusion in my trusty BBEdit text editor. John started working on the technical middle. Doug provided edits here and there on places we weren’t clear. At some point John ordered sushi but from a place with limited delivery selection options, and I’m allergic to shellfish, so I ordered a burrito.

The team continued to flesh out what happened. As we’d write we’d discover questions: how could a database permission change impact query results? Why were we making a permission change in the first place? We asked in the Google Doc. Answers came back. A few hours ago we declared it done. I read it top-to-bottom out loud for Doug, John, and John’s son. None of us were happy — we were embarrassed by what had happened — but we declared it true and accurate.

I sent a draft to Michelle, who’s in SF. The technical teams gave it a once over. Our social media team staged it to our blog. I texted John to see if he wanted to post it to HN. He didn’t reply after a few minutes so I did. That was the process.
> I texted John to see if he wanted to post it to HN. He didn’t reply after a few minutes so I did
Damn corporate karma farming is ruthless, only a couple minute SLA before taking ownership of the karma. I guess I'm not built for this big business SLA.
We're in a Live Fast Die Young karma world. If you can't get a TikTok ready within 2 minutes of the post mortem drop, you might as well quit and become a barista instead.
> I read it top-to-bottom out loud for Doug, John, and John’s son. None of us were happy — we were embarrassed by what had happened — but we declared it true and accurate.
I'm so jealous. I've written postmortems for major incidents at a previous job: a few hours to write, a week of bikeshedding by marketing and communication and tech writers and ... over any single detail in my writing. Sanitizing (hide a part), simplifying (our customers are too dumb to understand), etc, so that the final writing was "true" in the sense that it "was not false", but definitely not what I would call "true and accurate" as an engineer.
I'm not sure I've ever read something from someone so high up in a company that gave me such a strong feeling for "I'd like to work for these people". If job posts could be so informal and open ended, this post could serve as one in the form of a personality fit litmus test.
How do you guys handle redaction? I'm sure even when trusted individuals are in charge of authoring, there's still a potential of accidental leakage which would probably be best mitigated by a team specifically looking for any slip ups.
Team has a good sense, typically. In this case, the names of the columns in the Bot Management feature table seemed sensitive. The person who included that in the master document we were working from added a comment: “Should redact column names.” John and I usually catch anything the rest of the team may have missed. For me, pays to have gone to law school, but also pays to have studied Computer Science in college and be technical enough to still understand both the SQL and Rust code here.
Probably because he could check the legalities of a release himself without counsel. It's probably equivalent to educating yourself on your rights and the law, so that if you get pulled over by a cop who tries to do things you can legally refuse, you can say no.
The person who posted both the blog article and the Hacker News post is Matthew Prince, one of the highly technical billionaire founders of Cloudflare. I'm sure if he wants something to happen, it happens.
I mean, the CEO posted the post-mortem, so there aren't that many layers of stakeholders above. For other post-mortems by engineers, Matthew once said that the engineering team runs the blog and that he wouldn't even know how to veto a post if he wanted to [0].
There’s lots of things we did while we were trying to track down and debug the root cause that didn’t make it into the post. Sorry the WARP takedown impacted you. As I said in a comment above, it was the result of us (wrongly) believing that this was an attack targeting WARP endpoints in our UK data centers. That turned out to be wrong but based on where errors initially spiked it was a reasonable hypothesis we wanted to rule out.
Why give this sort of content more visibility/reach?
I'm sure that's not your intent, so I hope my comment gives you an opportunity to reflect on the effects of syndicating such stupidity, no matter what platform it comes from.
Mainly to make others aware of what’s happening in the context of this Cloudflare outage. Sure, I can avoid giving it visibility/reach, but it’s growing and proliferating on its own, and I think ignoring it isn’t going to stop it, so I am hoping awareness will help. I’ve noticed a huge rise in open racism against Chinese, Indian, and other foreign-origin workers, even when they’re here on a legal visa that we have chosen as a nation to grant for our own benefit.
The legislation that MTG (Marjorie Taylor Greene) proposed just a few days ago to ban the H-1B entirely, and the calls to ban other visa types, are going to have a big negative impact on the tech industry and American innovation in general. The social media stupidity is online, but it gives momentum to actual real-life legislation and other actions the administration might take. Many members of Congress are seeing the online sentiment and changing their positions in response, unfortunately.
I'm not the person you were replying to, but there is a rule I often see about not directly replying/quote tweeting because "engagement" appears to boost support for the ideas expressed. The recommendation then, would be to screenshot it (often with the username removed) and link to that.
FWIW it seems pretty obvious that this was ragebait. OP's profile is pretty much non-stop commentary on politics with nearly zero comments or submissions pertaining to the broader tech industry.
Posts like that deserve to be flagged if the sum of their substance is jingoist musing & ogling dumb people on Twitter.
> Let me save you fifteen minutes, or the rest of your life: They aren’t.
Knowing that all profilers aren't perfectly accurate isn't a very useful piece of information. However, knowing which types of profilers are inaccurate and in which cases is indeed very useful information, and this is exactly what this article is about. Well worth 15 minutes.
> And that often involves ignoring the fancy visualization and staring at the numbers.
Visualisations are incredibly important. I've debugged a large number [1] of performance issues and production incidents highlighted by async-profiler producing Brendan Gregg's flame graphs [2]. Sure, things could be presented as numbers, but what I really care about most of the time when I take a CPU profile from a production instance is which part of the system was taking most of the CPU cycles.
It isn’t that they’re “not perfectly accurate”; it’s that you can find half an order of magnitude of performance after the profiler tells you everything is fine.
That’s perfectly inaccurate.
Most of the people who seem to know how to actually tune code are in gaming, and in engine design in particular. And the fact that they don’t spend all day every day telling us how silly the rest of us are is either a testament to politeness or a shame. I can’t decide which.
> Isn’t not that they’re “not perfectly accurate”, it’s that you can find half an order of magnitude of performance after the profiler tells you everything is fine.
> That’s perfectly inaccurate.
That's a very strong claim, and it's not true in my experience, as I've shown above.
My read is that it's easy to be quite negative about Java features when you're not the person they were designed for. For example, the main "customer" of the module system is the JDK itself. The main customer of NIO.2 is low-level libraries like Netty.
I highly recommend the Growing the Java Language talk by Brian Goetz to anyone who's interested in the philosophy behind evolving the modern Java language [1]. And don’t be misled by the title: it’s not just about Java, it’s about software design.
> So yeah, why expose it to those who are not the "main customer"?
How did modules affect you as a user? I'd guess that you had to add `--add-opens`/`--add-exports` during one of the JDK migrations at some point. And the reason you had to do it was that various libraries on your classpath used JDK internal APIs. So modules provided encapsulation and gave you an escape hatch for when you still have to use those libraries. How else would you do it while still achieving the desired goal?