holbrad's comments

You are misreading the benchmark.

https://artificialanalysis.ai/#aa-omniscience-hallucination-...

If you look at the results, 3.0 hallucinates an awful lot when it's wrong.

It's just not wrong that often.

(And it looks like 3.1 does better on both fronts)


This isn't actually correct.

Gemini 3.0 gets a very high score because it's very often correct, but it does not have a low hallucination rate.

https://artificialanalysis.ai/#aa-omniscience-hallucination-...

It looks like 3.1 is a big improvement in this regard; it hallucinates a lot less.


Yes and no. The hallucination rate shown there is the percentage of time the model answers incorrectly when it should have instead admitted to not knowing the answer. Most models score very poorly on this, with a few exceptions, because they nearly always try to answer. It's true that 3.0 is no better than others on this. But given that it does know the correct answers much more often than e.g. GPT 5.2, it does in fact give hallucinated answers much less often.

In short, its hallucination rate as a percentage of unknown answers is no better than most models, but its hallucination rate as a percentage of total answers is indeed better.
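To make the two denominators concrete, here's a rough sketch in Python. The counts are made up for illustration and are NOT real benchmark results, and `hallucination_stats` is a hypothetical helper of mine, not anything from Artificial Analysis. It assumes every question ends up as correct, incorrect, or abstained.

```python
def hallucination_stats(correct: int, incorrect: int, abstained: int) -> tuple[float, float]:
    """Return (rate among unknown questions, rate among all questions)."""
    total = correct + incorrect + abstained
    # Questions the model did not actually know: it either guessed wrong or abstained.
    unknown = incorrect + abstained
    rate_of_unknown = incorrect / unknown if unknown else 0.0
    rate_of_total = incorrect / total if total else 0.0
    return rate_of_unknown, rate_of_total


# Hypothetical counts out of 100 questions each (assumed, not measured).
model_a = hallucination_stats(correct=60, incorrect=36, abstained=4)
model_b = hallucination_stats(correct=40, incorrect=48, abstained=12)

print(f"A: {model_a[0]:.0%} of unknowns, {model_a[1]:.0%} of all answers")
print(f"B: {model_b[0]:.0%} of unknowns, {model_b[1]:.0%} of all answers")
```

With these invented numbers, model A looks worse on the first metric (it guesses on more of what it doesn't know) but still gives fewer hallucinated answers overall, simply because it knows more.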


I'm just genuinely really confused by this take.

It's 100x easier to build products today than it was in the 1990s. (I don't think that's an exaggeration in the slightest)

It would have been basically impossible to build anything like Maps, Excalidraw, ChatGPT, etc. back then.

Arguably people are reaching for the tools without those interactive requirements?


There's an awful lot of people running these tools in YOLO mode.

https://embracethered.com/blog/posts/2025/the-normalization-...


>isn't going anywhere

That's not true at all, Reform could get in and remove it day 1!


There generally aren't underfed people in the US. This just simply isn't true.

The opposite is a far bigger issue.


> There generally aren't underfed people in the US. This just simply isn't true.
>
> The opposite is a far bigger issue.

I'm sorry but what's the basis for this claim?


I'm not the person you asked, but I assume their basis is that the majority of the adult US population is overweight or obese.[1]

However, we're conflating the related problems of hunger, food insecurity, and malnutrition. Food insecurity at its most extreme will result in hunger (a lack of any food), but the affordable food that is available in food deserts (and at food banks) is often ultraprocessed and incompletely nutritious, which can lead to obesity.[2]

Largely, Americans don't seem to be affected by "hunger" as defined by the United Nations Food and Agriculture Organization[3], but are very affected by malnutrition and food insecurity (as defined by that same body).

1: https://www.niddk.nih.gov/health-information/health-statisti...
2: https://pmc.ncbi.nlm.nih.gov/articles/PMC9790279/#jhn12994-s...
3: https://en.wikipedia.org/wiki/Hunger#Definition_and_related_...


But it's undeniable that culture and how that's reflected in governance have a huge impact: South vs North Korea.


> But it's undeniable that culture and how that's reflected in governance have a huge impact: South vs North Korea.

Considering that North and South Korea share the same "culture", wouldn't they be an example against your assertion?

Isn't the bigger impact that one is sanctioned by the world's sole superpower and the other is not?


Yeah, also it shows the comment is ignorant of history.

In the immediate aftermath of the Korean War, the North was actually more prosperous than the South. That changed with time, dramatically so, but initially it'd be reasonable to see the North as having better economic prospects.


I gave the table of inputs and outputs to both Gemini 3.0 flash and GPT 5.2 instant and they were stumped.

https://t3.chat/share/j2tnfwwful
https://t3.chat/share/k1xhgisrw1


When I was writing this, GPT 5.1 was the latest and it got it right away. It’s the sequence of prime numbers fwiw :)


What is the function supposed to be? It’s not Celsius to Fahrenheit. (2C=35F, 206C=406F, …)


I thought it was pretty settled that it was social contagion similar to other mental illnesses in the past.


I know Andrej Karpathy mentions it in his YouTube series so there's a good chance of it being true.

