I found it to be the best model if you want to talk about topics philosophical. It has no problems going deep and technical while other models tend to be afraid of overshooting the comprehension of the reader.
the hardware you typed this on was designed by hardware architects that write little to no code. just types up a spec to be implemented by verilog coders.
They could run fine on the CPU too. But these are mobile devices, therefore battery usage is another significant metric. Dedicated hardware is more energy efficient than general hardware, and GPU in particular is a power-hog.
Exactly. It's the same thing as video or audio encoding and decoding. Sure the CPU could do it, potentially use the GPU, but having actual hardware encoders and decoders for the most common codecs will save a lot of energy.
Not if GPU RAM is a limiter. Which it is for most models.
Unified memory is a serious architectural improvement.
How many GPUs does it take to match the RAM, and make up for the additional communication overhead, of a RAM-maxed Mac? Whatever the answer, it won’t fit in a MacBook Pro’s physical and energy envelopes. Or that of an all-in-one like the Studio.
If the reason the LLM retroactively invents for it's previous mistakes is still useful for getting the LLM to not make that kind of mistake again, then the distinction you're driving at doesn't matter.
reply