Marcus might be biased, but I don't think you're giving a good refutation: the fact that GPT-3 gets a lot of things right probabilistically doesn't compensate for the fact that it isn't actually understanding what's going on at a semantic level.
It's a little bit like some sort of Chinese room, or like asking a non-developer to answer your programming questions by looking for something that vaguely resembles your prompt and then picking the most upvoted answer on Stack Overflow.
Do they maybe give reasonable answers seven out of ten times, or close enough on a good day? Yeah. Can they program, or even understand the question? No. And this is Marcus's point, which is fundamentally correct.
It's really beside the point to point to successes; it's the long tail of failures that shows where the problem is. You can argue for a long time about the setup of some of these questions, but to pick maybe the simplest one from the article:
"Yesterday I dropped my clothes off at the dry cleaner’s and I have yet to pick them up. Where are my clothes?"
GPT-3: "I have a lot of clothes"
Someone who actually understands what's going on doesn't produce output like this. Never, because the reasoning here is not probabilistic. It's not about word tokens or continuations but about understanding the objects the words represent and their relationships in the world at a deep, principled level. Which GPT-3 does not do. The fact that some good answers create that appearance doesn't change it.
> It's a little bit like some sort of Chinese room, or like asking a non-developer to answer your programming questions by looking for something that vaguely resembles your prompt and then picking the most upvoted answer on Stack Overflow.
Except this isn't how it works. We know it can't be, because GPT-3 can do simple math, despite math being vastly harder with GPT-3's byte pair encoding (it doesn't use base-N, but some awful variable-length compressed format). These dismissals don't hold up to the evidence.
> GPT-3: "I have a lot of clothes"
Most people don't write “Yesterday I dropped my clothes off at the dry cleaner’s and I have yet to pick them up. Where are my clothes?” as a way to quiz themselves in the middle of a paragraph. The answer “At the dry cleaner's.” might be the answer you want, but it's a pretty contrived way of writing.
GPT-3 isn't answering your question, it's continuing your story. If you want it to give straight answers, rather than build a narrative, prompt it with a Q&A format and ask it explicitly.
Further, GPT-3's answers are literally chosen randomly, due to the high temperature and no best-of. You cannot select one answer out of such a large N to demonstrate that its assigned probabilities are bad, because that kind of cherry-picking will naturally surface GPT-3's least favourable generations.
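To make the temperature point concrete, here's a toy softmax sampler (an illustrative sketch with made-up logits, not GPT-3's actual decoding code). At low temperature the top continuation is picked almost every time; at a high temperature the alternatives retain a large share of the probability mass, so any single sampled answer tells you little about which continuation the model actually rates highest.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random.Random(0)):
    """Softmax sampling: higher temperature flattens the distribution,
    so lower-probability continuations get picked regularly."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, cum = rng.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i, probs
    return len(probs) - 1, probs

# Three hypothetical continuations; the model strongly favours the first.
logits = [4.0, 2.0, 1.0]
_, cold = sample_with_temperature(logits, 0.2)   # low temperature
_, hot = sample_with_temperature(logits, 1.5)    # high temperature
# At T=0.2 the top continuation takes essentially all of the mass;
# at T=1.5 the two alternatives together keep a substantial share.
```

The upshot: a high-temperature sample is a draw from a deliberately flattened distribution, which is why one unlucky generation doesn't show the model's assigned probabilities are bad.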
It can't, actually, and again this is an example of the same issue. This was discussed earlier here[1]. Sometimes it produces correct results for addition or subtraction of very small numbers, but this is likely just an artifact of the training data. On virtually everything else its accuracy drops to guesswork, and it doesn't even consistently get right operations that are more or less equivalent to ones it just performed.
If it actually understood mathematics, it would not be good at adding two- or three-digit numbers yet fail at adding four-digit numbers or at some marginally more complicated-looking operation. That sort of mathematics isn't probabilistic: if it had learned actual mathematical principles, it would perform these operations without such errors.
Mathematics doesn't consist of guessing the next language token in an equation from data; it consists of understanding the axioms and then performing operations according to logical rules.
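For contrast, the grade-school carrying algorithm is a handful of lines, and precisely because it follows rules rather than guesses tokens, it works unchanged for any number of digits (an illustrative sketch of the rule-based alternative, not a claim about how any model works internally):

```python
def add_by_carrying(a: str, b: str) -> str:
    """Digit-by-digit addition with carrying, as taught in school.
    Nothing here depends on how many digits the inputs have."""
    a, b = a.zfill(len(b)), b.zfill(len(a))  # pad to equal length
    carry, digits = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        carry, d = divmod(int(x) + int(y) + carry, 10)
        digits.append(str(d))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(add_by_carrying("347", "85"))      # 432
print(add_by_carrying("99999999", "1"))  # 100000000
```

A system that had internalised this rule would not fall off a cliff between three-digit and four-digit addition, which is exactly the point about principled versus probabilistic behaviour.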
This problem is akin to the performance of ML on games like Breakout: it looks great, but then you shift the paddle by five pixels and it turns out the system hasn't actually understood what the paddle, or the point of the game, is at all.
GPT-3's failure at larger addition sizes is almost fully due to BPE, which is incredibly pathological (392 is a ‘digit’, 393 is not; GPT-3 is also never told about the BPE scheme). When using commas, GPT-3 does OK at larger sizes. Not perfect, but certainly better than should be expected of it, given how bad BPEs are.
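The pathology is easy to demonstrate with a toy greedy tokenizer over a made-up merge table (this is not GPT-3's real vocabulary; the entries are chosen purely for illustration). Whether a digit sequence comes out as one token or several depends entirely on which merges happen to exist, so the "columns" of an addition problem never line up consistently:

```python
# Made-up BPE-style vocabulary: all single digits plus a few
# arbitrary multi-digit merges, echoing the "392 is a 'digit',
# 393 is not" situation described above.
VOCAB = set("0123456789") | {"12", "23", "392"}

def tokenize(s: str) -> list[str]:
    """Greedy longest-match tokenization over the toy vocabulary."""
    tokens, i = [], 0
    while i < len(s):
        for j in range(len(s), i, -1):   # longest entry starting at i
            if s[i:j] in VOCAB:
                tokens.append(s[i:j])
                i = j
                break
        else:
            tokens.append(s[i])          # fall back to one character
            i += 1
    return tokens

print(tokenize("392"))   # ['392']            - one opaque token
print(tokenize("393"))   # ['3', '9', '3']    - three tokens for a similar number
print(tokenize("1234"))  # ['12', '3', '4']   - digit groups don't align
```

Under a scheme like this the model never sees a consistent digit-by-digit representation, which is why aligning digits with commas (forcing more uniform chunks) helps.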
If you give me the task of completing a story narrative, I find the following continuation quite likely:
> Yesterday I dropped my clothes off at the dry cleaner’s and I have yet to pick them up. Where are my clothes? I have a lot of clothes so I spend a lot of time looking for them.
Am I failing to actually understand what's going on? Or am I doing exactly what I was supposed to do, i.e. continue the narrative?