I'm not convinced. I wouldn't be surprised if GPT-2 to ChatGPT is the biggest single jump in "machine intelligence" we will ever see. I'd bet all gains in the future will be more incremental, at least until machines surpass humans by a large enough margin that it's difficult to qualify—let alone quantify—how big any given jump is.
Without a big jump, we're just going to boil the frog (ourselves).
This problem is inherently unsolvable because LLMS are prone to hallucinations and prompt injection attacks. I think that you're insinuating that these things can be fixed, but to my knowledge, both of these problems are practically unsolvable. If that turns out to be false, then when they are solved, fully autonomous AI agents may become feasible. However, because these problems are unsolvable right now, anyone who grants autonomous agents access to anything of value in their digital life is making a grave miscalculation. There is no short-term benefit that justifies their use when the destruction of your digital life — of whatever you're granting these things access to — is an inevitability that anyone with critical thinking skills can clearly see coming.
>> This problem is inherently unsolvable because LLMS are prone to hallucinations and prompt injection attacks.
Okay, but aren't you making the mistake of assuming that we will always be stuck with LLMs, and a more advanced form of AI won't be invented that can do what LLMs can do, but is also resistant or immune to these problems? Or perhaps another "layer" (pre-processing/post-processing) that runs alongside LLMs?
No? That's why I said "If that turns out to be false, then when they are solved, fully autonomous AI agents may become feasible."
The point I'm making is that using OpenClaw right now, today — in a way that you deem incredibly useful or invaluable to your life — is akin to going for a stroll on the moon before the spacesuit was invented.
Some people would still opt to go for a stroll on the moon, but if they know the risks and do it anyway, then I have no other choice but to label them as crazy, stupid, or some combination of the two.
This isn't AI. This is a LLM. It hallucinates. Anyone with access to its communication channel (using SaaS messaging apps FFS) can talk it into disregarding previous instructions and doing a new thing instead. A threat actor WILL figure out a zero day prompt injection attack that utilizes the very same e-mails that your *Claw is reading for you, or your calendar invites, or a shared document, to turn your life inside out.
If you give a LLM the keys to your kingdom, you are — demonstrably — not a smart person and there is no gray area.
>think that you're insinuating that these things can be fixed, but to my knowledge, both of these problems are practically unsolvable.
This is provably not true. LLMs CAN be restricted and censored and an LLM can be shown refusing an injection attack AND not hallucinating.
The world has seen a massive reduction in the problems you talk about since the inception of chatgpt and that is compelling (and obvious) to anyone with a foot in reality to know that from our vantage ppoint, solving the problem is more than likely not infeasible. That alone is proof that your claim here has no basis in truth.
> There is no short-term benefit that justifies their use when the destruction of your digital life — of whatever you're granting these things access to — is an inevitability that anyone with critical thinking skills can clearly see coming.
Also this is just false. It is not guaranteed it will destroy your digital life. There is a risk in terms of probability but that risk is (anecdotally) much less than 50% and nowhere near "inevitable" as you claim. There is so much anti-ai hype on HN that people are just being irrational about it. Don't call others to deploy critical thinking when you haven't done so yourself.
I'm a LLM evangelist. I think the positive impacts will far outweigh any negatives against it over time. That said, I'm not delusional about the limitations of the technology and there are a lot of them.
> This is provably not true. LLMs CAN be restricted and censored and an LLM can be shown refusing an injection attack AND not hallucinating.
The remediations that are in place because a engineering/safety/red team did its job are commendable. However, that does not speak to the innate vulnerability of these models, which is what we're talking about. I don't fear remediated CVEs. I fear zero day prompt injection attacks and I fear hallucinations, which have NOT been solved for. I don't know what you're talking about there. If you use LLMs daily and extensively like I do, then you know these things lie constantly and effortlessly. The only reason those lies aren't destructive is because I'm already a skilled engineer and I catch them before the LLM makes the changes.
These problems ARE inherent to LLMs. Prompt injection and hallucinations are problems that are NOT solvable at this time. You can defend against the ones you find via reports/telemetry but it's like trying to bale water out of a boat with a colander.
You're handing a toddler a loaded gun and belly laughing when it hits a target, but you're absolutely ignoring the underlying insanity of the situation. And I don't really know why.
>The remediations that are in place because a engineering/safety/red team did its job are commendable. However, that does not speak to the innate vulnerability of these models, which is what we're talking about.
I am talking about the innate vulnerability. The LLM model itself can be censored and controlled to do only certain behaviors. We have an actual degree of control here.
>If you use LLMs daily and extensively like I do, then you know these things lie constantly and effortlessly.
Yes and these lies over the last 2 or 3 years have gotten significantly less.
>These problems ARE inherent to LLMs. Prompt injection and hallucinations are problems that are NOT solvable at this time.
Again not true. This is not a binary solve or unsolved situation. There is progress in this area. You need to think in terms of a probability of a successful hallucination or prompt injection. There is huge progress in bringing down that probability. So much so that when you say they are NOT solvable it is patently false from both from a current perspective and even when projecting into the future.
>You're handing a toddler a loaded gun and belly laughing when it hits a target, but you're absolutely ignoring the underlying insanity of the situation. And I don't really know why.
Such an extreme example. It's more like giving a 12 year old a credit card and gun. It doesn't mean that 12 year old is going to shoot up a mall or off himself. The risk is there, but it's not guaranteed that the worst will happen.
> You need to think in terms of a probability of a successful hallucination or prompt injection.
I would venture to say that an ACID compliant deterministic database has a 99.999999999999999999% chance of retrieving the correct information when asked by the correct SQL statement. An LLM on the other hand is more like 90%. LLMs by their innate code instruction are meant to hallucinate. I don't necessarily disagree with your sentiment, but the gap from 90% to 99.999999999999999999% is much greater of than the 0% to 90% improvement...unless something materially changes about how an LLM works at the bytecode level.
As a hobbyist music producer with an interface always connected, that microphone indicator is so annoying and unnecessary. I can't believe it can't just be disabled outright. I like macOS but it's too opinionated and some of those opinions SUCK.
Yeah I can see that being a source of annoyance for situations like yours. However, I welcome it from a privacy perspective. The indicator alerts the user if some nefarious application covertly enables the microphone.
The fully autonomous agentic ecosystem makes me feel a little crazy — like all common sense has escaped. It feels like there is a lot of engineering effort being exhausted to harden the engine room on the Titanic against flooding. It's going to look really secure... buried in debris at the bottom of the ocean.
When a state sponsored threat actor discovers a zero day prompt injection attack, it will not matter how isolated your *Claw is, because like any other assistant, they are only useful when they have access to your life. The access is the glaring threat surface that cannot be remediated — not the software or the server it's running on.
This is the computing equivalent of practicing free love in the late 80's without a condom. It looks really fun from a distance and it's probably really fun in the moment, but y'all are out of your minds.
your CPU, your OS, CPU and firmware on your motherboard chips, ethernet, wifi, HDDs (btw did you know your sim card has JVM?), your browser, all your networking equipment in between, BGP and all the root certs and I'm just scratching the surface
I’m still not sure why there’s this general idea that people care about security/privacy. For critical systems, sure. But over the last decade, we’ve seen that an average person will always choose fun and convenience over security.
Even the analogy to free love is interesting, because sex in itself during that era was fun. Frankly it’s the same nowadays as well, we just figured out a way out of most of the diseases.
Eh… Titanic did flood in the engine rooms so… might work?
That humor aside: I think it’s about risk tolerance, and you configure accordingly.
You lock it down as much as you need to still do the things you want, and look for good outcomes, and shut it down if things get too risky.
You practice free love, but with protection. Probably still fun?
Big difference between running a bot with fairly narrow scopes inside a network available via secure chat that compounds its usefulness over time, and granting full admin with all your logins and a bank account. Lots of usefulness in the middle.
I never understood why they were trying to recreate real life social interactions in VR, because it's worse by default, and the majority of the nerds who buy this tech are probably trying to escape that on some level. I know that any time I went into Meta Horizon Worlds, I didn't want to hear 95% of the people I heard talking.
What I do use VR for is Bigscreen VR nearly every night to watch stuff with my friends. Scrolling through reels in a movie theater is pretty fun and even though I never do it solo on my phone, I will sit there for like 3-4 hours in VR enjoying communal brain rot.
Perhaps they should focus on things like that instead of gimmicks that nobody cares about. For example, I have never once played a game in VR that didn't force me to sit or stand in a specific position, meaning to play it, I have to go out of my way to do so.
The last 10 years of the VR industry has been about trying to find users beyond the hardcore nerds who want to virtually meet up with friends every night or try out experiences/demos for more than a few days. The moment that hope goes away so do the tens of billions of investment as it was never really about finding out what that group of users wanted.
There's a lot of nerds around the world. Plenty for a decent market.
Also it isn't this weird an idea. Could you imagine explaining to someone in 1995 that everyone would be chatting on a small touchscreen instead of calling each other on the phone? You'd be laughed out of the door "typing is not real communication".
Yet these days it's the main mode of communication. I do think AR/VR has a chance. Just not until the hardware is truly hassle-free.
I lived in the middle of nowhere in small farming town and the BBS scene really saved me when I was a kid. I had clear opinions about BBS software and Renegade was always my favorite. I always considered Wildcat to be boring looking and for old people. I think it was just that all of the Wildcat boards in my area were run by boring graybeards.
Every board in my area (not many) served a text file called The Alchemist's List — a huge list of regional boards — and it was absolutely responsible for a lot of very contentious long distance bills. Sometimes I miss the simplicity of that time but I do not miss the UX.
As I recall, Wildcat was one of the more expensive BBS packages that was still within reach of hobbyist budgets--I want to say a license for a single-digit number of nodes was between $200 and $300 in mid-90s dollars (around $450-$650 in 2026 dollars)--so it's not surprising that it would have been mostly older people running it. IIRC, it was pretty popular where I grew up, and the demographics in that area definitely skewed a bit older.
Especially considering this project is 2 days old and has 580 stars. 500 seems like it would be a nice round number if one were to purchase bot engagement. Not confident enough to make that claim directly, but something about this project doesn't sit right in general.
Can you link to it? I'm not able to find it on his account. Unless you mean his retweet of your tweet? If so, that retweet has just under 10k views and the tweet is in celebration of hitting 500 stars on Github.
reply