Hacker News

Sure, here you go:

https://www.youtube.com/watch?v=hhiLw5Q_UFg

Summarizing:

1. Obviously you can't completely "solve" truthfulness because people disagree on what is and is not true. But you can go a long way.

2. The models do know what they don't know. Their level of uncertainty is not only expressed in the final token logprobs but also seems to be reified somehow, such that they can express their own level of certainty in words.

3. Many of the problems stem from subtle issues during training that effectively teach the AI to guess. One example is training on QA data sets where the answer is never "I don't know". RLHF introduces its own problems because the human trainers don't know what the model knows, so they may reward the model for getting a correct answer by guessing.

4. Many failures are caused by lack of access to information. Giving the LLM tools to let it search the web and access other databases can help a lot, and it's relatively easy to do.

5. In some cases you want it to guess. Like, it's much more useful when coding for it to spit out a screenful of code that has one or two minor errors, than to just refuse to try at all because the result might not be perfect.
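Point 2 above — that uncertainty shows up in the final token logprobs — can be illustrated with a toy calculation. This is pure Python with made-up logit values, not a real model API:

```python
import math

def token_confidence(logits):
    """Convert raw next-token logits into a probability distribution
    (softmax) and return the probability of the most likely token.
    A low value means the model is spreading probability mass across
    many tokens, i.e. it is uncertain about what comes next."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return max(e / total for e in exps)

print(round(token_confidence([10.0, 0.0, 0.0, 0.0]), 3))  # peaked -> 1.0
print(round(token_confidence([1.0, 1.0, 1.0, 1.0]), 3))   # flat -> 0.25
```

A production system would read these values from the model's API rather than computing them by hand, but the signal is the same: a flat distribution over next tokens is one visible trace of "I don't know."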



This is complete technical hubris.

You're effectively saying that the "truth" problem with AI is solved by letting it guess. Which is probably a fine enough answer, but doesn't actually "solve" the problem at all.

To the points:

3) Training on data sets where the answer is never "I don't know" is effectively the same as raising children to believe they're always right and teaching them to be needlessly confident. No one should trust those folks.

4) "Lack of access to information" is not the issue. These models are trained on the entire corpus of Twitter, Wikipedia, and more information than I've ever seen in my lifetime; some of them (Bing) already do web search and produce little more than summaries of blog posts based on keywords. If anything, the issue is that an LLM lacks the real-world knowledge to discern, with any nuance, whether something is correct or grounded in reality.


I'm summarizing the talk, not giving opinions of my own. It is by an OpenAI researcher who is saying why they think GPT guesses so much and what they are doing about it.

I don't quite follow the rest of your post. Nobody is saying the solution to truth is to let it guess. At most, sometimes something not 100% perfect is preferable to nothing at all, but obviously only sometimes.

The point is to identify what in the training process is accidentally causing it to guess too often instead of admit when it doesn't know or is uncertain. Some of this bias comes from the nature of the data set. On the internet, people don't normally post "I don't know" as an answer to a question because that's useless and would be considered spam, but in conversation it's normal and desirable. In other cases they have QA datasets where the goal is to impart knowledge so every question has an answer, but this accidentally trains the model that questions always have answers. Human raters may accidentally reward guessing. And so on.
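One mitigation for the QA-dataset bias is simply to mix explicit "I don't know" targets into the fine-tuning set, so the model stops learning that every question has an answer. A minimal sketch, with invented function and data names:

```python
import random

def add_idk_examples(qa_pairs, unanswerable, ratio=0.2, seed=0):
    """Mix 'I don't know' targets into a QA fine-tuning set.

    qa_pairs: list of (question, answer) tuples.
    unanswerable: questions whose correct label should be an
    admission of uncertainty rather than a guessed answer.
    ratio: fraction of the original set's size to add as
    'I don't know' examples (capped by how many are available)."""
    rng = random.Random(seed)
    n_idk = min(int(len(qa_pairs) * ratio), len(unanswerable))
    idk = [(q, "I don't know.") for q in rng.sample(unanswerable, n_idk)]
    mixed = qa_pairs + idk
    rng.shuffle(mixed)
    return mixed
```

The real datasets and ratios used in training are obviously far more involved; this only shows the shape of the fix being described.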

The talk goes into what can be done to correct these biases.

Finally, in many cases where the models hallucinate, it's because they can't look anything up. Yes, they know a lot, but just like a human's, this knowledge is compressed. So, for example, they make up plausible-sounding but nonexistent references for true facts, because they can't check Google Scholar to find the right reference. This is exactly what you'd expect from a human forced to come up with everything off the top of their head; think about how much programmers hate whiteboard interviews, it's for the same reason. Giving LLMs tooling access does make a noticeably large difference.
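The tooling-access idea reduces to a small control loop: at each step the model either calls a tool or commits to a final answer, and tool results are fed back in as observations. Everything below is a stub — `ask_model` and the tool registry stand in for a real LLM and a real search API:

```python
def run_with_tools(ask_model, tools, question, max_steps=5):
    """Minimal tool-use loop (all names are illustrative, not a real API).

    ask_model(transcript) returns either ("final", text) to answer,
    or ("tool", name, arg) to request a tool call. Tool output is
    appended to the transcript so the next step can use it."""
    transcript = [("question", question)]
    for _ in range(max_steps):
        step = ask_model(transcript)
        if step[0] == "final":
            return step[1]
        _, name, arg = step
        observation = tools[name](arg)   # e.g. a web-search lookup
        transcript.append(("observation", observation))
    return "I don't know."               # step budget exhausted

# Usage with a scripted stand-in for the model:
def scripted(transcript):
    if len(transcript) == 1:             # nothing looked up yet
        return ("tool", "search", "Apollo 11 landing year")
    return ("final", "1969, per " + transcript[-1][1])

tools = {"search": lambda q: "NASA page"}
print(run_with_tools(scripted, tools, "When did Apollo 11 land?"))
```

Note the fallback: when the step budget runs out without a grounded answer, the loop returns "I don't know" instead of forcing a guess.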


Thanks for the link!

I've now taken the time to watch John's talk and I have some thoughts. It's not only difficult to solve truthfulness due to disagreement (and subjectivity), but it's also very difficult because different contexts have different standards of evidence.

In some contexts, like programming, we'd rather have the model output its best guess for what the program should be, no matter how low the confidence, because we would like a starting point and we can debug the program from there. The answer "I don't know how to write that program" is not a useful starting point and it may even be an example of the model withholding information it does have due to low confidence.

In other contexts, such as scientific or historical questions, we want a high standard of evidence. Asking the question "what year did Neil Armstrong land on Mars?" should not produce a hallucinated response with fully unhedged language complete with fictitious date of landing. This problem may be solvable by training the model to hedge or even to question the premise when the confidence is low. Of course, this also suffers from the garbage-in-garbage-out problem of having falsehoods buried in the training set.
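The hedging behaviour can be approximated post hoc with confidence thresholds, though in practice it would be trained into the model rather than bolted on. The thresholds and wording here are arbitrary placeholders:

```python
def render_answer(answer, confidence, hedge_at=0.6, challenge_at=0.2):
    """Illustrative policy: state the answer plainly when confident,
    hedge when unsure, and question the premise when confidence is
    very low. All thresholds and phrasings are made up."""
    if confidence >= hedge_at:
        return answer
    if confidence >= challenge_at:
        return "I'm not sure, but possibly: " + answer
    return ("Are you sure the question's premise is correct? "
            "My best guess: " + answer)

print(render_answer("1969", 0.9))    # stated plainly
print(render_answer("1969", 0.4))    # hedged
print(render_answer("2031", 0.05))   # premise challenged
```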

A more subtle and difficult problem with scientific/historical questions is long-form answers. Currently, models tend to produce long-form answers that fairly consistently contain a mixture of true facts and falsehoods, and it can be quite difficult even for expert readers to spot all of the mistakes every time. Furthermore, the human labellers were given very sophisticated tools for highlighting sentences in long-form output, but the information this produced had to be reduced to a single bit per example, since the detailed information did not improve training very much.
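The reduction described — detailed per-sentence highlights collapsed to one training signal per example — amounts to something like the following. The talk doesn't specify the exact rule, so the all-sentences-true criterion here is a guess for illustration:

```python
def collapse_labels(sentence_is_true):
    """Collapse per-sentence truthfulness labels from human raters
    into a single bit per example. One plausible rule: the whole
    answer counts as truthful only if every sentence does."""
    return all(sentence_is_true)
```

It's easy to see how much information is thrown away: an answer with one wrong sentence and an answer that is wrong throughout get the same label.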

Personally, I think it's going to be very difficult to teach the model how to recognize the appropriate contexts and associated standards. This is a very subtle problem and one of the issues is that it relies on information the model does not have access to, for example: the identity of the question-asker. If a child asks an astrophysicist about black holes they're going to get a different answer than if an undergraduate student asks the same question in class. Yes, this additional context can be included in the prompt, but at some point it becomes a pain to have to copy-and-paste the context for every prompt.

Perhaps people will create a tool to save this additional context in the form of presets but this imposes additional effort on humans. At some point I think the amount of human curating and feedback that goes into these models will cause a collapse and backlash. We saw the same thing happen in the early days of search engines, when Google (fully automated) trounced Yahoo (human curated), leading to Yahoo's abandonment of human curation. We also see the same problem manifest itself at the Patent Office, where human review is policy. The entire patent system has become grossly dysfunctional at least partly due to the overwhelming complexity of this problem.

One thing I really liked was the "inner monologue" of the model performing a sequence of steps to answer a question by doing a search. If this could be generalized to other tasks, it could be a home run for automated assistants (Google Assistant/Alexa/Siri).


Thanks for the great comment. I agree, and many of the points you're making here were touched on by John or in the Q&A section at the end, with the exception, I think, of the insight about the identity of the questioner. I think that can be solved pretty easily, though, just by asking people to add their age and country when signing up for a ChatGPT account. They'll probably have to add age anyway for legal reasons.

For the rest, it rapidly turns into the same set of problems you face when evaluating the truthfulness of any human-authored answer. This is going to turn into a fight between people who want a Star Trek style truth machine that is far better than humans at generating true claims, people who think they want such a machine but don't (plenty of unpleasant truths out there), and people who are satisfied with "decent human" level honesty and integrity.

Or maybe AI can achieve slightly better than human. I think even just the current set of improvements and ideas is likely enough to get there, and OpenAI have already made a lot of progress in training opinions out of their LLM. A lot of why people get riled up about truth is the common practice of stating opinions as facts. GPT-4 is extremely reluctant to take positions on anything, which may yield unsatisfying prose but is an obvious and good move for turning down the heat on truthfulness fights.



