The really big problem with LLMs for general AI usage is the requirement of text input - this is a very limited way to represent ideas and intelligence.
Our brain does not think in text, it just spurts out text as a side effect.
Agreed that underneath that perceivable layer, there's lots else going on too. But to be fair, there's also lots going on in the weights of GPT-4, e.g. see https://arxiv.org/abs/2310.02207
Really, they don't require text, they require tokens, and they can be built with toolchains that turn any kind of input into tokens. They are called LLMs because they are initially trained on a large body of text with all or some of the available tokens mapped to text, but they can also be trained with mappings of other inputs into tokens (either distinct from or overlapping the tokens mapped to text), and then you end up with a multimodal toolchain.
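To make the "distinct or overlapping token mappings" point concrete, here's a minimal sketch (entirely hypothetical, not any real model's tokenizer): text pieces and quantized image-patch codes each get their own slice of one shared integer vocabulary, so the model only ever sees a flat token sequence.

```python
# Toy illustration: two modalities mapped into one token id space.
# All names and sizes here are made up for the sketch.

TEXT_VOCAB_SIZE = 50_000    # ids 0 .. 49_999 reserved for text pieces
IMAGE_VOCAB_SIZE = 1_024    # ids 50_000 .. 51_023 for image patch codes

def tokenize_text(text: str) -> list[int]:
    # Stand-in for a real subword tokenizer: one id per character.
    return [ord(c) % TEXT_VOCAB_SIZE for c in text]

def tokenize_image_patches(patch_codes: list[int]) -> list[int]:
    # Stand-in for a learned image quantizer (e.g. a VQ codebook):
    # each patch code is offset past the text ids, so the two
    # modalities occupy distinct slices of the same vocabulary.
    return [TEXT_VOCAB_SIZE + code for code in patch_codes]

# A "multimodal" input is just one flat sequence of ids.
tokens = tokenize_text("hi") + tokenize_image_patches([3, 17])
```

The model downstream is indifferent to where the ids came from; that's the sense in which "large language model" describes the training corpus more than a hard constraint of the architecture.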
But even if they did require text, that's how a large portion of communication between work-issuers and human workers goes too, so even if it were true it wouldn't be all that limiting; we don't communicate directly mentally with other humans, bypassing text.