CPUs and GPUs have very different ways of scheduling instructions, requiring somewhat different interfaces and programming models. I'd hazard to say that a GPU and CPU with unified memory access (like Apple's M series and most mobile chips) is already such a consolidated system.
CISC only survived because CPUs now dedicate a ton of silicon to decoding the CISC stream into RISC-y microcode. RISC CPUs can avoid this completely, but it turns out backwards compatibility was important to the market and the transistor cost of "instruction decode" just adds like +1 pipeline depth or something.
> CISC only survived because CPUs now dedicate a ton of silicon to decoding the CISC stream into RISC-y microcode.
For Intel CPUs, this was somewhat true starting from the Pentium Pro (1995). The Pentium M (2003) introduced a technique called "micro-op fusion" that binds multiple micro-ops together, so you get combined micro-ops for things like "add a value from memory to a register". From that point onward, the Intel micro-ops got less and less RISCy, until by Sandy Bridge (2011) they pretty much stopped resembling a RISC instruction set altogether. Other x86 implementations like K7/K8/K10 and Zen never had micro-ops that resembled RISC instructions.
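To make the fusion point concrete, here's a toy sketch in Python. It is not how any real decoder is built: the `MicroOp` type, the `decode` function, the `tmp` register, and the three-instruction program are all made up for illustration. The only thing it tries to show is that an "add register, memory" instruction costs two tracking slots when split RISC-style into load + add, and one slot when the pair is kept as a single fused micro-op.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    """Hypothetical micro-op record; real encodings are not public in this form."""
    kind: str  # "load", "add", or "load+add" (fused)
    dst: str
    src: str


def decode(instr, fuse):
    """Decode a toy ("add", dst, src) instruction into a list of micro-ops.

    A src operand written like "[rbx]" is treated as a memory operand.
    """
    op, dst, src = instr
    if op != "add":
        raise ValueError(f"toy decoder only handles add, got {op!r}")
    if not src.startswith("["):
        # Register-register add: one micro-op either way.
        return [MicroOp("add", dst, src)]
    if fuse:
        # Fused: the load and the add travel together as one micro-op.
        return [MicroOp("load+add", dst, src)]
    # RISC-style split: load into a temporary, then add.
    return [MicroOp("load", "tmp", src), MicroOp("add", dst, "tmp")]


def slots(program, fuse):
    """Count micro-op slots the toy front end would need for the program."""
    return sum(len(decode(instr, fuse)) for instr in program)


# Made-up instruction stream: two memory-operand adds, one register add.
program = [
    ("add", "rax", "[rbx]"),
    ("add", "rcx", "rdx"),
    ("add", "rsi", "[rdi]"),
]

unfused_slots = slots(program, fuse=False)
fused_slots = slots(program, fuse=True)
print(unfused_slots, fused_slots)
```

In this toy model the split decode needs more slots than the fused one for the same instruction stream, which is roughly why a fused "load+add" looks a lot more like the original CISC instruction than like a RISC one.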
> CPUs now dedicate a ton of silicon to decoding the CISC stream into RISC-y microcode.
In absolute terms, this is true. But in relative terms, you're talking less than 1% of the die area on a modern, heavily cached, heavily speculative, heavily predictive CPU.
I hadn't heard that, but certainly, there must have been many times when Intel held the crown of "biggest working hunk of silicon area devoted to RAM."
> It will just take on the appropriate functionality to keep all the compute in the same chip.
So, an iGPU/APU? Those exist already. Regardless, the most GPU-like CPU architecture in common use today is probably SPARC, with its 8-way SMT. Add per-thread vector SIMD compute to something like that, and you end up with something that has broadly similar performance constraints to an iGPU.