
I worked in one of the big labs when the first large models came out, and I can pretty confidently say that nobody in the field predicted this. Sure, there were always people who said "let's make the models bigger because why not, we have the infra and it'll make a good paper," but nobody expected them to get this good just from being bigger and using more data. The consensus was that they'd hit a ceiling on what they could do much sooner.


Aren't humans trained on massive data sets? Why not expect the same for digital intelligence?


Only some model architectures continue to get better as you pump in more data. Transformers and their variants have this property more so than prior architectures.
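For anyone curious what "keeps getting better with more data" looks like concretely: the scaling-law papers typically fit validation loss to a power law in dataset size with an irreducible floor. Here's a minimal sketch of that kind of fit; the loss numbers below are made up for illustration, not real measurements.

    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(d, a, b, c):
        # Validation loss as a function of dataset size d (in tokens):
        # a * d^(-b) + c, where c is the irreducible-loss floor.
        return a * d ** (-b) + c

    # Hypothetical (tokens seen, validation loss) points -- illustrative only.
    tokens = np.array([1e8, 1e9, 1e10, 1e11, 1e12])
    loss = np.array([4.2, 3.5, 3.0, 2.6, 2.3])

    (a, b, c), _ = curve_fit(power_law, tokens, loss,
                             p0=[20.0, 0.1, 1.0], maxfev=10000)
    print(f"fit: loss ~ {a:.1f} * D^(-{b:.3f}) + {c:.2f}")
    # A steady power-law decline with no saturation over the fitted range is
    # the "more data keeps helping" behavior described above.

The surprising empirical finding was that for transformers this curve stays on the power law across many decades of data, where earlier architectures bent away from it much sooner.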


Large, yes, but many orders of magnitude less.
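Rough numbers, since "orders of magnitude" is doing a lot of work there (both figures below are common ballpark estimates, not measurements):

    import math

    human_words = 1e8   # rough estimate: words a child hears by ~age ten
    llm_tokens = 1e13   # rough estimate: tokens in a modern pretraining corpus

    ratio = llm_tokens / human_words
    print(f"LLM pretraining sees ~{ratio:.0e}x more text,")
    print(f"i.e. about {math.log10(ratio):.0f} orders of magnitude more.")

So the gap is real, but it's something like five orders of magnitude, not millions.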



