Understanding Large Language Models – A Transformative Reading List (sebastianraschka.com)
81 points by mariuz on Feb 11, 2023 | 16 comments


Yeah, I was looking for a book about this topic, but there doesn't seem to be anything out there except for research articles; it is just too new.

When I look at the descriptions of the papers, they sound incredibly complicated and, at the same time, incredibly trivial. Clearly this whole domain is not well enough understood yet to be explained properly. Or maybe there is no interest yet in clear and concise explanations.


The description of the attention mechanism in GPT architectures, plus a couple of examples, can be very brief. Then you have to supply your imagination and realize that the model simply found a whole bunch of very effective attention measures by itself, all computed for every query, and they could be anything from the straightforward examples to more clever, abstract kinds of attention. I think we'll need better descriptions of what the more important attention measures it comes up with really are before we understand this well.
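
To make that concrete, here's a toy NumPy sketch (my own, not from the article; the sizes, the softmax helper, and the random projections are illustrative stand-ins for trained weights) of how several learned "attention measures" (heads) are each computed for every query token:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, heads = 6, 16, 4           # 6 tokens, width 16, 4 attention heads (toy sizes)
    X = rng.normal(size=(n, d))      # token representations

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    outputs = []
    for _ in range(heads):
        # each head has its own learned projections, i.e. its own "attention measure";
        # random matrices here stand in for the weights the model finds by itself
        Wq, Wk, Wv = (rng.normal(size=(d, d // heads)) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(d // heads))   # attention weights for every query token
        outputs.append(A @ V)
    out = np.concatenate(outputs, axis=-1)           # heads recombined into one vector per token

Nothing in the architecture says what each head should attend to; the "measures" fall out of training, which is why it's hard to describe them in advance.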


That’s how blockchain was a few years back. Now everyone and their uncle has a “how blockchains work” article or video.


And yet I feel like I know less at this point than I did then.


> Large language models have taken the public attention by storm – no pun intended.

As far as I can tell, no pun made, either!


To make a pun in this domain, attention is all you need!


"Attention" is the would be pun, I think.


The pun is in "attention": GPT uses "attention" to weigh each input token. It computes an attention score between whatever token is currently being generated and each of the input tokens, then uses those scores to determine how much each one contributes to the output. Something along those lines... I'm no GPT expert.
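
Roughly that computation, as a minimal NumPy sketch (the names q, K, V and the softmax helper are my own illustrative assumptions, not anything from the article): one score per input token, normalized into weights, then a weighted sum of the inputs' values:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def attend(q, K, V):
        # q: query vector for the token currently being generated, shape (d,)
        # K, V: key/value vectors for all input tokens, shape (n, d)
        scores = K @ q / np.sqrt(len(q))   # one attention score per input token
        weights = softmax(scores)          # scores normalized so they sum to 1
        return weights @ V                 # each input's value weighted by its score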


Did anyone predict the unreasonable effectiveness of LLMs?

We seem to have stumbled upon something quite big as humans…


I worked in one of the big labs when the first large models came out and I can pretty confidently say that nobody in the field predicted this. Sure, there were always people who said "let's make models bigger because why not, we have the infra and it'll be a good paper" but nobody expected them to become this good just by being bigger and using more data. The consensus was that they'd hit a ceiling of what they can do much sooner.


Aren't humans trained on massive data sets? Why not expect the same for digital intelligence?


Only some model architectures continue to get better as you pump in more data. Transformers and their variants have this property more so than prior architectures.


Large, yes, but many orders of magnitude less.


Perhaps Monica Anderson understood early? https://experimental-epistemology.ai/the-red-pill-of-machine...


Or maybe language just isn't that complex at all?

Same with pixels?

Dunno


We stumbled upon a mirror of our own cognitive bias.

Another chapter in the era of Narcissus and Echo.



