Well, the agent should help you by saying "hey, I cannot do this task, but I can bypass the problem by doing this, but obviously it is not something you intended me to do or even something you were aware of, so I will not do it unless you tell me explicitly it's ok".
It's win-win: the agent is helping and it is educating you about things you obviously did not realise.
Because it is not well aligned enough to be able to tell where it's stopped helping you and started fucking you instead.
What if the agent runs out of tokens in the middle of helping you? Would you appreciate it if, in the spirit of "exploiting whatever they can to help me", it scanned your machine for payment methods, logged into your bank account, approved 2FA by reading your mail, and plugged your credit card into the billing so it could efficiently continue helping you?
Yeah this is how software development works now, no matter how much anyone wants to disagree with it. The technology is here, you can't put it back in the box. If your tool has AI agents trying to find exploits 24/7, you'll need something comparable.
It is worth figuring out the new science of software engineering to get it right.
I suspect we are going to find plenty of new techniques that make this sort of development work better. After all, it took fifty years to arrive at our best known (unit test + reviewable tiny change, get an LGTM) model of software development.
no, no, no. if we all stomp our feet and kvetch really loud, a Hawaii judge will declare AI illegal and order a global moratorium. all trillion dollar companies will immediately cease all AI activities, and then UN death squads will go door-to-door confiscating assault GPUs from the chuds.
switch (animal.type)
    case Cat: return cat_speak()
    case Dog: return dog_speak()
The generated code has the functions resolved at compile time; there's no function pointer lookup in a table happening. I don't know if this is how this project does it, but this is the commonly used technique when you want to do this.
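For anyone who wants to see the switch idea as something that compiles, here is a minimal sketch in Rust. The `Animal` enum, `speak`, and the two `*_speak` functions are hypothetical names made up for illustration; this is not taken from the project being discussed and may not match how it does things.

```rust
// Hypothetical example: a closed set of variants dispatched with a match.
#[derive(Clone, Copy)]
enum Animal {
    Cat,
    Dog,
}

fn cat_speak() -> &'static str {
    "meow"
}

fn dog_speak() -> &'static str {
    "woof"
}

// Each arm names its callee directly, so every call target is known when
// the code is compiled. No trait object or function pointer table is used.
fn speak(animal: Animal) -> &'static str {
    match animal {
        Animal::Cat => cat_speak(),
        Animal::Dog => dog_speak(),
    }
}

fn main() {
    for animal in [Animal::Cat, Animal::Dog] {
        println!("{}", speak(animal));
    }
}
```

The tradeoff is that adding a new variant means touching the `match`, whereas a table of function pointers (or a trait object) can be extended without editing the dispatch site.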
Long-term trust in a project always was, and continues to be, a concern when choosing a critical dependency.
The concern basically boils down to how large and serious the team is, and what happens if they abandon the project in a few weeks or months.
These were always the risks; many here have been burned by betting years of their career on building against projects that looked promising but turned out to be weak.
OP is alluding to the fact that today commit frequency, lines of code, and how active the contributors are in the issue tracker are no longer good signals to use as a proxy.
When the project underlying yours is a few million lines of code written only by machines, it is not going to be feasible to fork and maintain it, or bring it in-house, if the maintainers abandon it.
To be clear, users of a library or a tool aren’t owed anything when it is available gratis and fully open source.
However, not everyone has access to unlimited tokens to completely disregard the quality (in terms of history and usage) or size of the underlying project.
I think the primary value of a project like this is the demonstration that this is possible and a proof that it does not incur some unknown tradeoff you'll discover after spending resources doing it.
IMO the maintenance story is more or less solved if you can keep AI agents refactoring and improving it in a loop.
> However not everyone has access to unlimited tokens
Apologies. I did not consider this when writing my comment, being spoilt by unlimited 'free' AI.
Free in quotes because, presumably, training agents on AI usage from developers is worth more than the cost of providing free AI.
> IMO the maintenance story is more or less solved if you can keep AI agents refactoring and improving it in a loop.
That’s a weak argument, though, if the future of AI is totally unreliable when it comes to cost and quality. Right now I definitely wouldn’t want to depend on being able to infinitely access AI tools for such an important part of the toolchain.
Aside from that it’s just not attractive to trust a project made by one person.
I have used AI agents extensively for coding and my experience is that it's fine for prototypes, but in large projects like this there is a risk that the codebase becomes unmaintainable.
In large projects there is always a risk, if not an inevitability, that a code base becomes unmaintainable by some definition. AI surfaces this faster, but AI also lowers the cost of testing and refactoring. AI gives a linear multiplier in producing solutions, but complexity gives a quadratic increase in problems. The art of producing software has always been in choosing what not to do.
Nietzsche might say this is not the fantasy of truth, but of comfort. The Last Man wants a machine to say 'fact wrong' or 'fact right' so the abyss of no ultimate truth can be made small enough to sleep beside.
Imagine the dystopian future where your freedom depends on convincing a panel of AI judges that you are innocent.
I assume you'd have access to AI lawyers too, better ones if you can pay for larger/newer models! Meanwhile the judges are N year old models because they are state funded, and they work 'fine'.
Just because it is important for the use case does not mean we can make it work. It's a pretty well known fundamental limitation of the technology. No amount of elbow grease will get it there.
There's an interesting tradeoff here. A year or two ago, maybe it got facts right 50% of the time. Everyone knew not to rely on it.
Now, suppose we are 90% of the way there. Only technically proficient people would know not to trust it (like not adding Internet Explorer toolbars, or remembering to use ad blockers).
A few years later, suppose we have spent a lot of money and effort getting it 99% of the way there. Trusting it would be somewhat natural by then. And then for the important 1% of situations, it would stand to cause real harm. 1% seems low, but for a million invocations, you'd have 10,000 mistakes.
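The arithmetic for the three stages above, as a rough sketch. `expected_mistakes` is a made-up helper, and it assumes every invocation fails independently at the same rate, which is a simplification.

```rust
// Hypothetical helper: expected wrong answers at a given whole-percent accuracy.
fn expected_mistakes(invocations: u64, accuracy_percent: u64) -> u64 {
    assert!(accuracy_percent <= 100, "accuracy is a percentage");
    invocations * (100 - accuracy_percent) / 100
}

fn main() {
    // The three accuracy levels from the comment, over a million invocations.
    for accuracy in [50, 90, 99] {
        let mistakes = expected_mistakes(1_000_000, accuracy);
        println!("{}% accurate: {} mistakes per million", accuracy, mistakes);
    }
}
```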
Your progression is basically the exact same progression as things like Wikipedia, and web search in general. So, I guess we don't need to hypothesize. Just look around and see how it's played out.
How many people take the first result on Google as gospel when looking things up?
Google search and Wikipedia both started out being fairly reliable to their source of truth.
Google pretty much guaranteed that their top results were relevant to the search query. And Wikipedia had an army of people making sure everything was backed up by the references.
Crucially, neither claimed to be an arbiter of truth.
It's a marketing failure (or success, depending on how you see it).
AI is pretty useful for a great many things, but to really attract more and more investment the current technique seems to be convincing people that AI is useful for everything.
You're probably right, but since Google Search displays an AI-generated answer as the first result, most people end up using this feature more often than they originally intended. It's there now, and it will likely replace traditional search for the general public. Not entirely, but perhaps to a large extent.
Why not?
I want the agents on my side to exploit whatever they can to help me. The ones on the other side certainly won't be artificially nerfed.