It's basically only x86 among modern ISAs that lets you do base + literal + regi...

snvzz · on Aug 25, 2022

>and I believe RISC-V is similar

Yes it is. It was evaluated, carefully weighed, and discarded as not worth it.

kaba0 · on Aug 25, 2022

But it will likely still execute with no extra time at all, thanks to OOE and how fast arithmetic is.

murderfs · on Aug 25, 2022

OOE doesn't necessarily save you if you end up with a hard dependency on the value of the read (and even if it did, the little cores on ARM SoCs are in-order). This is a pretty obvious candidate for macro-op fusion, but I'm not sure whether it actually happens in practice (or whether it happens on ARM little cores, etc.).

NohatCoder · on Aug 25, 2022

If it is not on the hot path, it is likely free, but not guaranteed. If it is on the hot path, then it is wasting a whole cycle. And of course in heavily ALU-bound code it is one more instruction to issue, so it costs a fraction of a clock.
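
To put a rough number on the "whole cycle" case, here is a toy model of a chain of serially dependent loads, where each load's address comes from the previous load's result. It is only a sketch: the function name `chain_cycles` is made up for illustration, and the 4-cycle load and 1-cycle add latencies are assumed round numbers, not measurements of any particular core.

```python
def chain_cycles(n_loads, load_latency=4, add_latency=1,
                 separate_add=False, add_on_critical_path=True):
    """Cycles for n_loads serially dependent loads in a toy latency model.

    load_latency and add_latency are assumed illustrative values.
    separate_add=True models an ISA that needs an extra add to form the
    address; add_on_critical_path says whether that add must wait for the
    previous load's result (True) or can be computed early and overlapped.
    """
    per_iteration = load_latency
    if separate_add and add_on_critical_path:
        # The add sits between two dependent loads, so its latency stacks.
        per_iteration += add_latency
    return n_loads * per_iteration


folded = chain_cycles(1000)                      # address math folded into the load
split = chain_cycles(1000, separate_add=True)    # extra add feeds every load
hidden = chain_cycles(1000, separate_add=True, add_on_critical_path=False)

print(folded, split, hidden)                     # 4000 5000 4000
print(f"{(split - folded) / folded:.0%}")        # 25%
```

Under these assumptions the extra add costs 25% when it sits on the dependency chain and nothing when it can be computed ahead of time. The model ignores issue width, cache misses, and fusion, so treat it as an upper-bound intuition rather than a prediction.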

kaba0 · on Aug 25, 2022

What do you mean it wastes a whole cycle? It may indeed perform worse because of the added instruction-cache pressure, but I don’t see why out-of-order execution would be slower on the hot path. I doubt there are many hot paths that don't depend on memory fetches, outside of specific benchmarks, and the memory loads will take significantly more time even if they hit cache.