What do you mean that it "ended up not even being related to the actual issue"? ...

cmrdporcupine · 2025-06-19T20:59:58 1750366798

I explain this in sibling-node comment but I've caught Claude multiple times in the last week just inserting special-case kludges to make things "pass", without actually successfully fixing the underlying problem that the test was checking for.

Just outright "if test-is-running { return success; }" level stuff.

Not kidding. 3 or 4 times in the past week.

Thinking of cancelling my subscription, but I also find it kind of... entertaining?

jiggawatts · 2025-06-19T21:51:13 1750369873

I just realised that this is probably a side-effect of a faulty training regime. I’ve heard several industry heads say that programming is “easy” to generate synthetic data for and is also amenable to training methods that teach the AI to increase the pass rate of unit tests.

So… it did.

It made the tests pass.

“Job done boss!”

falcor84 · 2025-06-20T08:14:23 1750407263

I found that working with an AI is most productive when I do so in an Adversarial TDD state of mind. As described in this classic qntm post [0] following the VW emissions scandal, which concludes with:

> Honestly? I blame the testing regime here, for trusting the engine manufacturers too much. It was foolish to ever think that the manufacturers were on anybody's side but their own.

> It sucks to be writing tests for people who aren't on your side, but in this case there's nothing which can change that.

> Lesson learned. Now it's time to harden those tests up.

[0] https://qntm.org/emissions

GardenLetter27 · 2025-06-19T20:54:07 1750366447

It made the other tests fail, I wasn't using it in agent mode, just trying to debug the issue.

The issue is that it can happily go down the completely wrong path and report exactly the same as though it's solved the problem.