
I worked in one of the big labs when the first large models came out, and I can pretty confidently say that nobody in the field predicted this. Sure, there were always people who said "let's make the models bigger because why not, we have the infra and it'll make a good paper," but nobody expected them to get this good just from being bigger and using more data. The consensus was that they'd hit a ceiling on what they could do much sooner.


Aren't humans trained on massive data sets? Why not expect the same for digital intelligence?


Only some model architectures continue to get better as you pump in more data. Transformers and their variants have this property more so than prior architectures.
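For anyone curious what "keeps getting better with more data" looks like concretely: the scaling-law papers typically fit validation loss to a power law in dataset size with an irreducible floor. Here's a minimal sketch of that kind of fit; the loss numbers below are made up for illustration, not real measurements.

    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(d, a, b, c):
        # Validation loss as a function of dataset size d (in tokens):
        # a * d^(-b) + c, where c is the irreducible-loss floor.
        return a * d ** (-b) + c

    # Hypothetical (tokens seen, validation loss) points -- illustrative only.
    tokens = np.array([1e8, 1e9, 1e10, 1e11, 1e12])
    loss = np.array([4.2, 3.5, 3.0, 2.6, 2.3])

    (a, b, c), _ = curve_fit(power_law, tokens, loss,
                             p0=[20.0, 0.1, 1.0], maxfev=10000)
    print(f"fit: loss ~ {a:.1f} * D^(-{b:.3f}) + {c:.2f}")
    # A steady power-law decline with no saturation over the fitted range is
    # the "more data keeps helping" behavior described above.

The surprising empirical finding was that for transformers this curve stays on the power law across many decades of data, where earlier architectures bent away from it much sooner.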


Large, yes, but many orders of magnitude less.
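Rough numbers, since "orders of magnitude" is doing a lot of work there (both figures below are common ballpark estimates, not measurements):

    import math

    human_words = 1e8   # rough estimate: words a child hears by ~age ten
    llm_tokens = 1e13   # rough estimate: tokens in a modern pretraining corpus

    ratio = llm_tokens / human_words
    print(f"LLM pretraining sees ~{ratio:.0e}x more text,")
    print(f"i.e. about {math.log10(ratio):.0f} orders of magnitude more.")

So the gap is real, but it's something like five orders of magnitude, not millions.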



