Hacker News | rsalus's comments

I don't know, even if AI allows two engineers to do the work of six, companies will likely just use that efficiency to expand their scope. I think we'll see short-term layoffs and a more stratified engineering field during the transition, but the fundamental need for deep technical expertise isn't going away.

>I don't know, even if AI allows two engineers to do the work of six, companies will likely just use that efficiency to expand their scope.

Not really. It will be a cutthroat landscape, and the scope won't matter as much anymore. First because everyone else will equally be able to throw LLMs at the scope, but also because the scope has natural limits: your market fit, customer expectations, and (for software/hw products) physical world/manufacturing limitations.

They'll want to reduce their margins.


AI usage will directly impact said margins. Moreover, for the scenario you describe, companies need to have the capability to precisely estimate the cost of a given deliverable - not something possible with current tooling + models. You're also underestimating the market trend towards vertical integration: companies are not going to be constrained by a sector or niche. They will expand to capture as much value as they can, because now their capacity to do so is partially decoupled from labor.

It will certainly be a cutthroat landscape for engineers, but companies will be building _more_ capacity, not less. In other words, the demand won't disappear for skilled technical labor, it will just move higher up the value chain.


>You're also underestimating the market trend towards vertical integration: companies are not going to be constrained by a sector or niche. They will expand to capture as much value as they can, because now their capacity to do so is partially decoupled from labor.

They will still totally be, because the capacity to do so was never coupled to labor, it was coupled to domain knowledge, client network, other players dominating the market, and so on...


> even if AI allows two engineers to do the work of six, companies will likely just use that efficiency to expand their scope.

1) they won't, they'll just cut costs

or

2) they will, but unless it's a new scope or one that can absorb growth, they'll just be competing with other companies in the space and taking away business from them

either way, labor loses


No, my life experience tells me those companies will fire the ones they no longer need instead.

not necessarily, TDD has little bearing on output quality

In what world or frame of reference would doing TDD have "little" bearing on output quality? If you build a system around satisfying some set of requirements it seems logical that output quality would have pretty heavy correlation.

It's possible to satisfy a set of requirements with code that's low quality. There's the maintainability of the code, for example, or the performance of the system.

The requirements TDD encourages code to meet happen to be ones that increase code quality.

Code that is easy to test tends to be well-structured.

Code that is badly structured tends to be hard to test.

TDD is not a QA methodology, it is a design methodology. It also tends to help quality out a lot, but that's a secondary effect.


for LLMs, TDD amounts to little more than ceremony. there is a study on this exact topic: https://arxiv.org/pdf/2602.07900

That’s an interesting proposition, are you saying people do TDD just for the heck of it?

I was a big proponent of encoding TDD red-green-refactor methodology into my agent workflows until recently when I made the same realization after reading this study: https://arxiv.org/pdf/2602.07900

TL;DR: it found test-writing volume only weakly correlates with success, and that encoding test-writing principles did not move resolution rates but _did_ materially change cost. Encouraging tests cost +19.8% output tokens for 0% gain; discouraging them saved 33–49% input tokens for ≤2.6pp accuracy loss. Separately, imposing the TDD procedure specifically seems like it can backfire: it actually _increased_ regressions from 6.08% to 9.94%.

IMO, where tests clearly help is primarily as an "oracle" applied after generation. It gives the models a signal that enables them to verify and self-correct if necessary.
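
A rough sketch of what "tests as an oracle applied after generation" could look like. Everything here is hypothetical: `oracle`, `select_verified`, and the two `clamp` candidates are stand-ins for a fixed, human-owned check and for successive model outputs, not anything taken from the paper.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Candidate = Callable[[int, int, int], int]


def oracle(clamp: Candidate) -> List[str]:
    """Fixed checks the generator cannot edit. Returns failure messages (empty means pass)."""
    cases = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 10)]
    failures = []
    for args, expected in cases:
        try:
            got = clamp(*args)
        except Exception as exc:  # a crash is also a usable signal
            failures.append(f"clamp{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"clamp{args} returned {got}, expected {expected}")
    return failures


def select_verified(
    candidates: Iterable[Candidate],
) -> Tuple[Optional[Candidate], int, List[List[str]]]:
    """Run the oracle on each candidate in turn and stop at the first one that passes."""
    feedback: List[List[str]] = []
    for attempt, candidate in enumerate(candidates, start=1):
        failures = oracle(candidate)
        if not failures:
            return candidate, attempt, feedback
        # In a real agent loop these messages would be fed back to the model.
        feedback.append(failures)
    return None, len(feedback), feedback


# Hypothetical stand-ins for two successive generations.
def first_try(x: int, lo: int, hi: int) -> int:
    return min(x, hi)  # forgets the lower bound


def second_try(x: int, lo: int, hi: int) -> int:
    return max(lo, min(x, hi))


winner, attempts, feedback = select_verified([first_try, second_try])
```

The point of the shape is that the checks sit outside the generation step: the model only ever sees the failure messages, it never gets to rewrite the oracle.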


Very interesting paper, and it lines up exactly with my observations. The ROI just isn't there for writing tests up front, and the conclusion in that paper lays it out clearly:

    Overall, these findings suggest that agent-written
    tests often behave more like a habitual software-development
    routine than a dependable source of validation in this setting.
    More agent-written tests do not mean more solves; what they more
    reliably change is the process footprint—API calls, token usage,
    and interaction patterns. Improving the value of testing for code
    agents may therefore require better oracles and more actionable
    validation signals, rather than simply inducing agents to write
    more tests.

> IMO, where tests clearly help is primarily as an "oracle" applied after generation

Bingo. I'm not against writing tests; it's that the returns are better when they're used as verification feedback, as an "oracle", exactly as you put it.


Just chiming in to say that I've seen exactly the same thing you have. Tests are better used after the fact, to help validate that what was generated works.

That, and even the absolute SOTA models still suck at writing tests.

Which shouldn't be surprising: humans suck at it too most of the time...


Absolutely, there's no reason to believe that agents will be more capable of writing tests than any other piece of code. The big payoff is actually verifying the code that was generated.

From that paper:

> This raises a central question: do such tests meaningfully improve issue resolution, or do they mainly mimic a familiar software-development practice while consuming interaction budget?

This is an important question but it's not the one I'm most interested in when requiring agents to follow TDD. My goal is to lock in behavior because it was happening way too frequently that an agent would successfully fix the issue at hand, but break something else that it wasn't supposed to touch.

The tests add another layer and it's why I always separate out red and green worker subagents. The green worker might get trigger happy and go beyond scope/break something but it's not allowed to fudge the tests so I'll know and can clean up and revert.

It's also why I'm not too bothered about perfect red green TDD. I can add the tests later if needed.


tests are an important signal of course, but the use case you describe doesn't necessarily mean you need to follow TDD. the data suggests that creating the tests after the code is just as or even more effective, and at significantly cheaper input cost.

I've been finding that enforcing integrations and behavior structurally (e.g., through codegen/schemagen, e2e tests, etc.) is more reliable than simply instructing the models to write tests. oftentimes these tests are pretty low quality anyway, and result in their own form of tech debt.


Why do you think creating tests later would be cheaper?

The paper focuses on two things: default behavior and behavior with a prompt to write at least one new test.

In general — just like with humans — I find "just add more tests" to be counter-productive.

Tests make sense in a testable architecture. TDD can implicitly encourage one, but testability is a design and architectural choice that should be made explicit (lean toward functional code; use direct, explicit dependency injection; ensure test stubs are just variants of the real implementation, fully tested with the same tests as the real one...). LLMs should be prompted with this guidance instead, so that the value of the tests can be estimated properly.
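
To make the "stub is just a variant of the real implementation, tested with the same test" idea concrete, here is a minimal sketch. All names (`KeyValueStore`, `SqliteStore`, `InMemoryStore`, `check_store_contract`, `greet`) are made up for illustration; the only real API used is Python's standard `sqlite3` module.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """The explicit dependency: anything with put/get can be injected."""

    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class SqliteStore:
    """The 'real' implementation, backed by SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put(self, key: str, value: str) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )

    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None


class InMemoryStore:
    """The stub: a variant of the real store, not a mock with canned answers."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def check_store_contract(store: KeyValueStore) -> None:
    """One contract test, run unchanged against the real store and the stub."""
    assert store.get("missing") is None
    store.put("a", "1")
    assert store.get("a") == "1"
    store.put("a", "2")  # overwriting must behave the same in both
    assert store.get("a") == "2"


def greet(store: KeyValueStore, user_id: str) -> str:
    """Plain function with the dependency passed in explicitly."""
    name = store.get(user_id)
    return f"Hello, {name}!" if name is not None else "Hello, stranger!"


# Both implementations have to pass the same contract.
check_store_contract(SqliteStore())
check_store_contract(InMemoryStore())
```

Because the stub passes the same contract as the real store, a test of `greet` against `InMemoryStore` says something about how it behaves against `SqliteStore` too.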


no. red green tdd is great because you'll have tests when your llm breaks something later, or you're doing a massive refactor. i imagine studies are not done on codebases where the complexity gets that high.

tdd has been invaluable for this project (almost entirely llm written, but i review it) https://github.com/ityonemo/clr


this is not really backed by any empirical evidence. there are simply more efficient means of verifying outputs than TDD.

agreed, but it depends on the language. for instance, the .NET equivalent (MemoryCache) is pretty poor.

functionally they operate as a marketplace for cloud providers. I feel like there is value there, especially as API costs rise and companies explore cost-saving/efficiency. IMO, this is a particularly attractive value prop in the SMB space, where it is common to interoperate between multiple SaaS/software stacks.


there is a difference between concurrency in a distributed environment and concurrency on a single machine across processes. SQLite is incredibly useful for the latter.

you seem like the inexperienced one to me..


SQLite does not support concurrent writes at all (on a single machine); a single writer process locks the entire database.


It doesn't block reads. Single-writer systems are often faster than concurrent writers: there's no coordination overhead and you can batch.


not really true: SQLite supports WAL mode, which lets readers proceed alongside a writer and lets multiple processes _attempt_ writes concurrently. the writes themselves are still serialized behind a single write lock, but they are exceptionally fast, so this is functionally equivalent to concurrent writes for the p50 use case.

also, use-case for massively concurrent writes is pretty narrow, and SQLite is not optimizing for that anyway.
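
A small sketch of the WAL behavior being argued about, using Python's standard `sqlite3` module. The scratch file, the `events` table, and the variable names are illustrative assumptions. In WAL mode a reader is not blocked by an open write transaction, but there is still only one writer at a time: a second writer waits for the lock, or with `timeout=0` fails right away.

```python
import os
import sqlite3
import tempfile

# Hypothetical scratch database; WAL needs a real file, not ":memory:".
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None means autocommit, so BEGIN/COMMIT are issued by hand.
writer = sqlite3.connect(path, isolation_level=None)
journal_mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")

# First writer takes the write lock and leaves its transaction open.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO events (body) VALUES ('from writer')")

# A reader on another connection is not blocked; it sees the last committed state.
reader = sqlite3.connect(path, isolation_level=None)
rows_during_write = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# A second writer cannot take the write lock while the first one holds it.
# timeout=0 makes the attempt fail immediately instead of waiting.
second = sqlite3.connect(path, timeout=0, isolation_level=None)
try:
    second.execute("BEGIN IMMEDIATE")
    second_writer_blocked = False
except sqlite3.OperationalError:  # "database is locked"
    second_writer_blocked = True

# Once the first writer commits, the second one gets its turn.
writer.execute("COMMIT")
second.execute("BEGIN IMMEDIATE")
second.execute("INSERT INTO events (body) VALUES ('from second')")
second.execute("COMMIT")

rows_after = reader.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

With a non-zero `timeout` (or `PRAGMA busy_timeout`) the second writer would simply wait for the lock instead of raising, which is why short, serialized writes can feel concurrent in practice.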


> you seem like the inexperienced one to me

There is irony here


mirrors my own experience creating a persistent event log. I started with JSON, then JSONL, etc until finally landing on SQLite.
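
For anyone curious what a SQLite-backed event log can look like, here is a minimal sketch. The schema and the `open_log`, `append`, and `replay` helpers are assumptions made up for illustration, not the actual design referred to above.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) an append-only event table with a monotonically increasing seq."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            seq     INTEGER PRIMARY KEY AUTOINCREMENT,
            stream  TEXT NOT NULL,
            kind    TEXT NOT NULL,
            payload TEXT NOT NULL
        )
        """
    )
    return conn


def append(conn: sqlite3.Connection, stream: str, kind: str, payload: Dict[str, Any]) -> int:
    """Append one event atomically and return its sequence number."""
    with conn:  # one transaction per append
        cur = conn.execute(
            "INSERT INTO events (stream, kind, payload) VALUES (?, ?, ?)",
            (stream, kind, json.dumps(payload)),
        )
    return cur.lastrowid


def replay(
    conn: sqlite3.Connection, stream: str, after_seq: int = 0
) -> List[Tuple[int, str, Dict[str, Any]]]:
    """Read a stream's events in order, optionally resuming after a known seq."""
    rows = conn.execute(
        "SELECT seq, kind, payload FROM events WHERE stream = ? AND seq > ? ORDER BY seq",
        (stream, after_seq),
    )
    return [(seq, kind, json.loads(payload)) for seq, kind, payload in rows]


log = open_log()
first = append(log, "order-1", "created", {"total": 10})
append(log, "order-2", "created", {"total": 7})
append(log, "order-1", "paid", {"total": 10})
history = replay(log, "order-1")
```

Compared with JSONL, the transaction gives atomic appends and the `seq` column gives cheap "resume from here" reads without rescanning the whole file.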


they need to make 5t-10t back, but not necessarily through selling tokens. as we can see, the frontier labs are making vertically integrated products. their revenue is no longer strictly tied to inference.


oftentimes the classical Greeks + Romans would cite the works of Homer and other poets to establish their family lineage


https://github.com/lvlup-sw/exarchos

It's an SDLC workflow harness for agents. Instead of using skills to encode my typical workflows (e.g., create PRD, then create plan using TDD, then dispatch subagents, etc) I've built a concurrent event-sourced process manager to handle it.

