The problem: coding accelerates, but I can't keep up with tests. Either I review everything, and we're not actually moving faster, or I don't, and I can't trust the results.
The idea: executable specs
Write plain-text tests, i.e. revive Cucumber/Behat (the "gherkin" language), BUT without the implementation step. The AI runs the tests in a browser, like a human QA would.
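To make that concrete, here is a sketch of what such a plain-text spec could look like. The feature, scenario, and step names are all hypothetical examples, not taken from any real project; the point is that there is no step-definition code behind them, the AI interprets each step directly in the browser:

```gherkin
Feature: Checkout
  Scenario: Customer buys a subscription
    Given I am logged in as a customer
    When I add the "Basic plan" to my cart
    And I complete the checkout with a valid test card
    Then I should see an order confirmation
    And the order should appear in my order history
```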
I've been trying it out for 3 weeks:
1. serves as specs for new features
2. serves as acceptance criteria (Claude/Codex runs them after implementing)
3. serves as regression testing: run the full test suite every night
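For point 3, the nightly regression run could be wired up with any scheduler; a minimal crontab sketch, assuming the project path is a placeholder and that `npx exspec` with no arguments runs the full suite (the exact CLI flags are not documented here):

```
# Run the full exspec suite every night at 2am
# (project path and log path are placeholders, not real values)
0 2 * * * cd /srv/myapp && npx exspec >> /var/log/exspec-nightly.log 2>&1
```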
`npx exspec` runs Claude Code under the hood (with your existing Pro/Max subscription) with Playwright, all built-in. So using exspec is as easy as running PHPUnit/Pest/Vitest/… tests, nothing to install.
To be clear, it comes on top of unit tests, it doesn't replace them. It's great for covering happy paths and important business logic. It helps increase confidence, and also helps specifying new features.
I am cautious about letting AI "interpret" how to execute each step. But so far I'm pleasantly surprised:
- Claude seems to respect the steps (and not fake it)
- The AI is very resilient: it can figure things out and recover (e.g. it needs to create a user but the email is already taken -> no problem, it retries with a different email address)
- It uses the website UI and nothing else, like a human QA -> it cannot edit files, touch the database, or run code
- It reports failures with plenty of detail, making it super easy to create issues from them
I'm still careful and I want to scale up testing the idea, that's why I'm sharing publicly.
But I guess if performance isn't critical for your use case (or you have so many invocations all the time in every data center that cold starts aren't an issue), it's nice to be able to ~port~ copy and paste your existing PHP code over.
Their clients have highly variable traffic patterns because they are in very different industries. They get traffic spikes at lunchtime, or at the end of every month, for example:
> infrastructure must be able to scale and be resilient to accommodate various usage patterns. Whether it's a luncheon voucher transaction spike at lunchtime or a monthly batch of transactions by corporate clients
With your webhost, you have to set up and maintain the server, including patching/updating it, setting up SSL, etc.
I used to do that for side projects, but I ended up with too many servers, and maintaining them was a pain.
For something with high traffic, Lambda provides instant scalability. Imagine servers autoscaling, but instantly, and again without having to set up or maintain anything.
> With your webhost, you have to setup and maintain the server
You're confusing a root server or a VPS with a webhost, aka "shared hosting", where the hosting service takes care of Let's Encrypt certificates and of scheduled downtime for patching.