Hacker News | fooker's comments

> the presence of a security hole should not be seen as permission to exploit

Why not?

I want the agents on my side to exploit whatever they can to help me. The ones on the other side certainly won't be artificially nerfed.


Well, the agent should help you by saying "hey, I cannot do this task, but I can bypass the problem by doing this, but obviously it is not something you intended me to do or even something you were aware of, so I will not do it unless you tell me explicitly it's ok".

It's win-win: the agent is helping and it is educating you about things you obviously did not realise.


Because it is not well aligned enough to be able to tell where it's stopped helping you and started fucking you instead.

What if the agent runs out of tokens in the middle of helping you? Would you appreciate it if, in the spirit of "exploiting whatever they can to help me", it scanned your machine for payment methods, logged into your bank account, approved 2FA by reading your mail, and plugged your credit card into the billing so it could efficiently continue helping you?


I do not wish my Amazon delivery driver to show up in my living room.

Wow, there really is a relevant xkcd for everything!

Figuring out which commit broke what functionality is not something you can expect users to do.

No, but that's probably a required skill to have before you initiate claims as to what the cause of the loss of functionality was.

If you're willing to build from source, it's not particularly difficult with git bisect.
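
For anyone who hasn't used it: git bisect is essentially a binary search over the commit history between a known-good and a known-bad commit. Here is a rough Python sketch of the idea, not git's actual implementation. The commit names and the `is_bad` check are made up for illustration, and it assumes that every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and badness never goes away once introduced.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is known good, commits[hi] is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # in practice: build this commit and test it
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history of ten commits where the regression landed in "c5".
history = [f"c{i}" for i in range(10)]
culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 5)
print(culprit)
```

The number of builds grows with the logarithm of the history length, which is why this is tolerable even when each build is slow.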

Yeah this is how software development works now, no matter how much anyone wants to disagree with it. The technology is here, you can't put it back in the box. If your tool has AI agents trying to find exploits 24/7, you'll need something comparable.

It is worth figuring out the new science of software engineering to get it right.

I suspect we are going to find plenty of new techniques that make this sort of development work better. After all, it took fifty years to arrive at our best known (unit test + reviewable tiny change, get an LGTM) model of software development.


no, no, no. if we all stomp our feet and kvetch really loud, a Hawaii judge will declare AI illegal and order a global moratorium. all trillion dollar companies will immediately cease all AI activities, and then UN death squads will go door-to-door confiscating assault GPUs from the chuds.

Yeah but what are the downsides?

You can just generate the 'vtable' as code :)

  switch (animal.type) {
    case Cat: return cat_speak();
    case Dog: return dog_speak();
  }

The generated code has the functions resolved at compile time, so there's no function pointer lookup in a table happening. I don't know if this is how this project does it, but this is the commonly used technique when you want to do this.
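
To make "generate it as code" concrete, here is a minimal Python sketch of a generator that emits such a switch for a closed list of types. The helper name `emit_dispatch`, the `obj.type` tag field, and the `<type>_<method>` naming scheme are all assumptions for illustration, not how any particular project does it.

```python
def emit_dispatch(method: str, type_names: list[str]) -> str:
    """Emit C-like source for a switch that dispatches `method` on a type tag."""
    lines = ["switch (obj.type) {"]
    for name in type_names:
        # Each case is a direct call to a named function, so the call target
        # is fixed when the generated code is compiled.
        lines.append(f"  case {name}: return {name.lower()}_{method}();")
    lines.append("}")
    return "\n".join(lines)


generated = emit_dispatch("speak", ["Cat", "Dog"])
print(generated)
```

This only works when the generator can see every type up front. Adding a type means regenerating the switch, which is one of the tradeoffs compared to a real vtable.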

Hmm yeah, good point. I didn't think of it. It might even be cheaper to do this when the list of possible types is closed and small.

I am still inclined to believe AI just made up the documentation though, because this has its own tradeoffs.


> might even be cheaper to do this

Oh yeah, very often. Especially if it's a loop and resolves the same way very often.

Modern CPUs just blaze through code like this, after three decades optimizing for object oriented and dynamic languages.


Why go through LLVM at all?

There are a bunch of tools that JIT somewhat optimized assembly.


Yeah, a missed opportunity. But I'm sure such a compiler will eventually arise.

vibeware :)

This is how software development works now. We have to live with it.

The models are good enough that this works.

You can keep disagreeing for a while, but know that almost all the code in the industry is written like this now.


Long-term trust in a project always was, and continues to be, a concern when choosing a critical dependency.

The concern basically boils down to how large and serious the team is, and what happens if they abandon the project in a few weeks or months.

These were always the risks. Many here have been burned by betting years of their career building against promising projects that turned out to be weak.

OP is alluding to the fact that today commit frequency, lines of code, or how active the contributors are in the issue trackers are no longer good signals to use as a proxy.

When the project underlying yours is a few million lines of code written only by machines, it is not going to be feasible to fork and maintain it, or bring it in-house, if the maintainers abandon it.

To be clear, users of a library or a tool aren't owed anything when it is available gratis and fully open source.

However not everyone has access to unlimited tokens, so they can't completely disregard the quality (in terms of history and usage) or size of the underlying project.


I think the primary value of a project like this is the demonstration that this is possible and a proof that it does not incur some unknown tradeoff you'll discover after spending resources doing it.

IMO the maintenance story is more or less solved if you can keep AI agents refactoring and improving it in a loop.

> However not everyone has access to unlimited tokens

Apologies. I did not consider this when writing my comment, being spoilt by unlimited 'free' AI.

Free in quotes because, presumably, training agents on AI usage from developers is worth more than the cost of providing free AI.


> IMO the maintenance story is more or less solved if you can keep AI agents refactoring and improving it in a loop.

That’s a weak argument, though, if the future of AI is totally unreliable when it comes to cost and quality. Right now I definitely wouldn’t want to depend on being able to infinitely access AI tools for such an important part of the toolchain.

Aside from that it’s just not attractive to trust a project made by one person.


I have used AI agents extensively for coding and my experience is that it's fine for prototypes, but in large projects like this there is a risk that the codebase becomes unmaintainable.

In large projects there is always a risk, if not an inevitability, that a code base becomes unmaintainable by some definition. AI surfaces this faster, but also AI lowers the cost of testing and refactoring. AI gives a linear multiplier in producing solutions, but complexity gives a quadratic increase in problems. The art of producing software has always been in choosing what not to do.

This is a very popular view that is, in my opinion, sort of obsolete now.

It was a valid concern last year. We have seen tremendous progress on this in the last 4-6 months.

Even if your initial prototypes are unmaintainable slop, the state of the art models are fairly good at refactoring and fixing things.


Not at all, I can assert that the Spring code on my current project is classical programming.

In many places AI tools aren't even allowed to touch customer repos.


I don't get why everyone is hellbent on getting LLMs to perform fact checking.

This is not the technology for it. Sure it might sorta kinda work in some circumstances. That doesn't make it a good fit.

Think of it like buying a refrigerator for storing clothes.


Nietzsche might say this is not the fantasy of truth, but of comfort. The Last Man wants a machine to say 'fact wrong' or 'fact right' so the abyss of no ultimate truth can be made small enough to sleep beside.

Imagine the dystopian future where your freedom depends on convincing a panel of AI judges that you are innocent.

I assume you'd have access to AI lawyers too, better ones if you can pay for larger/newer models! Meanwhile the judges are N year old models because they are state funded, and they work 'fine'.


People ask questions to get answers. For me, it feels quite important? Especially when search engines start to push them?

Just because it is important for the use case does not mean we can make it work. It's a pretty well known fundamental limitation of the technology. No amount of elbow grease will get it there.

There's an interesting tradeoff here. A year or two ago, maybe it got facts right 50% of the time. Everyone knew not to rely on it.

Now, suppose we are 90% of the way there. Only technically proficient people would know not to trust it (like not adding Internet Explorer toolbars, or remembering to use ad blockers).

A few years later, suppose we have spent a lot of money and effort getting it 99% of the way there. Trusting it would be somewhat natural by then. And then for the important 1% of the situations, it would stand to cause real harm. 1% seems low, but for a million invocations, you'd have 10000 mistakes.
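
The arithmetic behind that last number, as a quick Python sketch. It treats "N% of the way there" as per-query accuracy and assumes every query is equally likely to be wrong, which is a simplification. The function name is made up for illustration.

```python
def expected_mistakes(invocations: int, accuracy: float) -> int:
    """Expected number of wrong answers, assuming a uniform per-query error rate."""
    return round(invocations * (1 - accuracy))


# The three stages from the comment above, at one million invocations each.
for accuracy in (0.50, 0.90, 0.99):
    print(f"{accuracy:.0%} accurate: {expected_mistakes(1_000_000, accuracy)} mistakes")
```

The absolute number of mistakes keeps falling, but so does the share of users who still double-check, which is the tradeoff being described.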


Your progression is basically the exact same progression as things like Wikipedia, and web search in general. So I guess we don't need to hypothesize. Just look around and see how it's played out.

How many people take the first result on Google as gospel when looking things up?


Google search and Wikipedia both started out being fairly reliable to their source of truth.

Google pretty much guaranteed that their top results were relevant to the search query. And Wikipedia had an army of people making sure everything was backed up by the references.

Crucially, neither claimed to be an arbiter of truth.


I don't recall any LLM provider claiming to be an arbiter of truth.


But people use it for that. So what's your point?

It's a marketing failure (or success, depending on how you see it).

AI is pretty useful for a great many things, but to really attract more and more investment the current technique seems to be convincing people that AI is useful for everything.


You're probably right, but since Google Search displays an AI-generated answer as the first result, most people end up using this feature more often than they originally intended. It's there now, and it will likely replace traditional search for the general public. Not entirely, but perhaps to a large extent.

Edit: corrected bad spelling with AI XD


Search and fact checking are different problems though.

LLMs are pretty decent at 'search' given the inherent knowledge compression, and some amount of inaccuracy is fine.


> Instead of 0-60 in 3 seconds, ...

The theoretical limit is 1.7s, which is already basically achieved by multiple gas and electric cars.
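
The comment doesn't say how the 1.7s figure is derived. One common way to reason about such a limit is tire grip: if acceleration is capped at a = mu * g, then t = v / (mu * g). The Python sketch below back-solves the average friction coefficient each time would need. It assumes constant acceleration, all wheels driven, and no aerodynamic downforce, so treat it as a rough illustration rather than the source of the number.

```python
MPH_TO_MS = 0.44704  # exact conversion: metres per second per mile per hour
G = 9.80665          # standard gravity in m/s^2


def implied_grip(target_mph: float, seconds: float) -> float:
    """Average friction coefficient needed to reach target_mph in `seconds`,
    assuming constant, grip-limited acceleration (a = mu * g)."""
    return (target_mph * MPH_TO_MS) / (G * seconds)


print(round(implied_grip(60, 3.0), 2))
print(round(implied_grip(60, 1.7), 2))
```

Under these assumptions, 3 seconds needs a coefficient of roughly 0.9 and 1.7 seconds needs roughly 1.6.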

