Hacker Newsnew | past | comments | ask | show | jobs | submit | eli's commentslogin

How do you know what the founders sincerely believe?

They said why they think it’s a sincere belief: past statements from before the AI hype cycle took off. I take it you have other evidence?

Things can change, and if you know pushing the metaphorical red button brings your company more attention, then you press that button everytime.

So if I claim I am a communist who doesn't want to ever get rich and then someone dangles a billion shiny dollars in front of me to just simply grab and own, you think I'd still be a communist then?

If you go around saying “I’m a communist, I believe in communism, I think it’s very important that we establish communism”? Sure, absolutely. Engels was pretty rich.

They're going around saying "Imagine no possessions. To help you, we'll take them all. Don't thank us".

Replace the cash with Apple or some other trillion dollar corporation and you're given the CEO's seat and voting control on the BoD. Can I be Tim Cook and preach communism and expect anyone to believe it?

If you were playing a text based game, wouldn't you try a few out?

I imagine there are a fair number of war games in the training data and not so many actual transcripts of internal military force deliberations.


The other "cheating" examples are even worse. It's wild to me that people keep designing benchmarks where the answer is lying around on disk or in the git history. "Hardening" the benchmark with strongly worded prompt instructions is bizarre. There are so many agent sandbox solutions. Why not use one and give it only access to the code it should see?

And I'm not sure how they can rule out other solutions also benefiting from being in the training data, just not reproduced exactly. Seems like it should focus on only CVEs from the last 30 days or something.


100%… the fact that they're just using prompting to discourage the agent from looking ahead in the Git history is wild.

To be fair, it is good to know that it disobeys simple instructions like "don't examine my git history" far more than other models. (It should of course be a different benchmark, so as not to conflate things.)

It's not a great sign for alignment.


Agreed, alignment is just a separate issue that a vuln fixing benchmark doesn't need to be testing.

Obviously they could just delete .git for their test if they wanted to. But consider telling the LLM not to use git commands the same as if you have keys in a .env file, and you tell the LLM not to read it, you might be concerned.

Every day I am more and more convinced that AI labs can't code.

The word mythos means roughly the same as "myth" and dates to 1753.

So it can write code to prevent the problem described?

[flagged]


What kind of "standard inbuilt anti injection code" are you referring to? Mysql_real_escape_string()?

Look up "prepared statements", it's pretty well documented.

How does this prevent prompt injection described in the article?

How does it prevent DDOSing and/or exposing the database from an injected prompt?


The user asks for details of the last transaction, the user gets back the amount, the source, and the description in a safely quoted format with the LLM never reading it.

You can't inject the LLM if it doesn't see the data.

An architecture like this won't work in many situations, but it can work for a lot of simple questions.

And if you want the LLM to summarize things, you run an isolated instance that makes a summary and you never show that summary to the LLM that's following the user's instructions.


You can do this, it is useful, but it's just not the same as where the goalposts are now which is: the AI is a person in a box and can do everything a person can.

If we actually limit them to "only accepts tiny ultra well defined problems and ultra well defined outputs" then theycease being a $10T/year idea and become a merely $10B/year idea.

Thus, it is not exactly popular at the moment.


> The user asks for details of the last transaction, the user gets back the amount, the source, and the description in a safely quoted format

What's "safely quoted format" when prompt injection is already safe in the description?

> You can't inject the LLM if it doesn't see the data.

How doesn't it see the data when you literally say "The user asks for details of the last transaction, the user gets back the amount, the source, and the description"?

> And if you want the LLM to summarize things, you run an isolated instance that makes a summary

And it will make a summary exactly how?


> How doesn't it see the data when you literally say "The user asks for details of the last transaction, the user gets back the amount, the source, and the description"?

The above post said how. The LLM writes code to do it. The code has a function to send text to the user. The LLM is not allowed to see the text, only the user is.

> And it will make a summary exactly how?

The second summarizing-only LLM is fed the raw data and allowed to output summary text. This is then sent directly to the user and put in a box with some hazard lines on it. The main LLM is not allowed to see the summary, only the user is.


Ah. Now I see you point. This might actually work for a number of situations.

I think this is a worthwhile argument, but you do it a disservice by spamming it in trollish comments

It's unethical to price it in a way not everyone can afford?

Like what?

There are many with subtle tells.

Not nearly as obvious as the ones from 6 months ago, but seems to be more the use of hyperbolic phrasing in a particularly unnatural way.

The assess/explain, then hyperbole at the end kind of structure.

Top comment looks suspicious from this perspective, but it's kind of a losing battle to be able to differentiate them with sufficient accuracy anyway


This is very reminiscent of the "everyone's a Russian bot" era of social media, where everyone would just lob that accusation at people without any real proof.

There is no way to prove, but what is definitely true is that many people are attempting to use LLMs on forums and otherwise.

So if you think none of these comments are written by LLMs, you're probably mistaken too.

In the end we accept that we can't tell anymore and move on (barring some biometric protocol that can't be gamed via automation)


Neat. The frontier models have gotten pretty impressive, but they're all a bit too slow for interactive, human-in-the-loop coding. It incentivizes vibecoding and running multiple agents in parallel. A fast agent feels more like a partner.

For a while I was running Cerebras GLM 4.7 for a bunch of tasks. Not a very smart model, but it's fantastic to be have a live prototype of a site up and be able to type "make the fonts bigger. No not that big" and see it change in real time. And MiMo 2.5 is a lot more capable than GLM 4.7.


> And MiMo 2.5 is a lot more capable than GLM 4.7

MiMo 2.5 is not the same model as MiMo 2.5 Pro.

GLM 5.1 is z.ai's lastest iteration & is one of the popular open weight coding models.

If you've had the chance, how does GLM 5.1 (which is now more expensive than MiMo 2.5 Pro after its recent 70% price drop) compare?


GLM 5.1 is very good. Definitely a contender for best open weight coding model. Nothing like 4.7.

But quite a bit more expensive than MiMo 2.5 Pro. Like 5x to 10x more on my little tests, at least by the API rates.


i tried glm 4.7 for agents that write code. simple scripts 200-1000 LOC. extremely bad . Had to abandon cerebras oferning, their smart models are only on enterprise plan.

glm 4.7 is quite old by now. I don't even use 5.1 anymore, cause I found kimi k2.6, mimi 2.5 pro, deepseek v4 pro and qwen 3.7 all better than glm 5.1

"Lying" is not supported by the evidence. In the context of bot traffic on the web, looking at only GETs for HTML is a reasonable approach. If you're counting all requests for all assets then a single page view of nytimes.com would count 100x as much as one for HN.

I would assume a lot of people running websites tend to think in pageviews, especially when dealing with bots because images and CSS files tend to be "cheap" static content but HTML requests are often dynamically generated.

It's also a single tweet that links to the data used to "disprove" it. Would be a weird way to lie.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: