What do you mean that it "ended up not even being related to the actual issue"? If you give it a failing test suite to turn green and it does, then either its solution is indeed related to the issue, or your tests are incomplete; so you improve the tests and try again, right? Or am I missing something?
I explain this in sibling-node comment but I've caught Claude multiple times in the last week just inserting special-case kludges to make things "pass", without actually successfully fixing the underlying problem that the test was checking for.
Just outright "if test-is-running { return success; }" level stuff.
Not kidding. 3 or 4 times in the past week.
Thinking of cancelling my subscription, but I also find it kind of... entertaining?
I just realised that this is probably a side-effect of a faulty training regime. I’ve heard several industry heads say that programming is “easy” to generate synthetic data for and is also amenable to training methods that teach the AI to increase the pass rate of unit tests.
I found that working with an AI is most productive when I do so in an Adversarial TDD state of mind. As described in this classic qntm post [0] following the VW emissions scandal, which concludes with:
> Honestly? I blame the testing regime here, for trusting the engine manufacturers too much. It was foolish to ever think that the manufacturers were on anybody's side but their own.
> It sucks to be writing tests for people who aren't on your side, but in this case there's nothing which can change that.
> Lesson learned. Now it's time to harden those tests up.