I think you misunderstand what's fundamentally possible with AMD's architecture. They can't wave a magic wand and produce a CUDA compatibility layer any better than Apple or Qualcomm can; it's not low-hanging fruit like DirectX or Win32 translation. Investing billions into translating CUDA on raster GPUs is a dead end.
AMD's best option is a greenfield GPU architecture that puts CUDA in the crosshairs, which is what they already did for datacenter customers with AMD Instinct.
Let's say you put 50-100 seasoned devs on the problem. Within 2-3 years you'd probably get ZLUDA to the point where most mainstream CUDA applications — ML training/inference, scientific computing, rendering — run correctly on AMD hardware at 70-80% of the performance you'd get from a native ROCm port. Even if it's not optimal due to hardware differences, it would be genuinely transformative and commercially valuable.
This would give them runway for the parallel effort to build native greenfield libraries and toolkits and win adoption, and perhaps to tweak future hardware iterations in ways that make compatibility easier.
And while compatibility layers aren't illegal, they ordinarily have to be a cleanroom design. If AMD knew that the ZLUDA dev was decompiling CUDA drivers to reverse-engineer a translation layer, then legally they would be on very thin ice.
ROCm is supported on only a minority of AMD GPUs, and acceleration is inconsistent across GPU models. "70-80% of ROCm's performance" is a murky enough target that a native ROCm port would be the more transparent choice for most projects. And even then, you'll still be outperformed by CUDA the moment tensor or convolution ops are called.
Those billions are much better spent on new hardware designs, and on ROCm integrations with preexisting projects where they make sense. Translating CUDA to AMD hardware would only advertise why Nvidia is worth so much.
> it would be genuinely transformative and commercially valuable.
Bullshit. If I had a dime for every time someone told me "my favorite raster GPU will annihilate CUDA eventually!" I could fund the next Nvidia competitor out of pocket. Apple didn't do it, Intel didn't do it, and AMD has tried three separate times and failed. This time isn't any different; there's no genuine transformation or commercial value to unlock with outdated raster-focused designs.
This is a big part of why AMD still doesn't have a proper foothold in the space: AMD Instinct is quite different from what regular folks can easily put in their workstation. In Nvidia-land I can put anything from a mid-range gaming card to a 5090 to an RTX 6000 Pro in my machine and be confident that my CUDA code will scale reasonably well to a datacenter GPU.
This is where I feel like Khronos could contribute: a Compute Capability-equivalent hardware standard for vendors to implement. CUDA's versioning of hardware capabilities plays a huge role in clarifying the support matrix.
...but that requires buy-in from the rest of the industry, and it's doubtful FAANG is willing to thread that needle together. Nvidia's hedged bet against industry-wide cooperation is making Jensen the 21st-century Mansa Musa.
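To make that concrete, here's a rough sketch of how CUDA's versioning looks from the application side: every device reports a compute capability (major.minor) that code can query and gate on. The feature check below is just illustrative, not a real product requirement.

```cpp
// Toy illustration of CUDA's Compute Capability versioning: each device
// reports a major.minor version that application code can query and gate on.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("GPU %d: %s, compute capability %d.%d\n",
                    i, prop.name, prop.major, prop.minor);
        // Gate optional code paths on the reported version,
        // e.g. only take an Ampere-or-newer path on CC >= 8.0:
        if (prop.major >= 8) {
            std::printf("  -> Ampere-class code path available\n");
        }
    }
    return 0;
}
```

A Khronos-defined equivalent would give every vendor's GPU the same kind of queryable, versioned support matrix.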
No, I'm arguing with someone who clearly doesn't understand GPUs.
> invest BILLIONS to make this happen
As I have already said twice, they already have: it's called hipify, and it works about as well as you'd imagine (i.e. poorly, because this is a dumb idea).
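For anyone unfamiliar: hipify (hipify-perl / hipify-clang) is essentially a mechanical source-to-source rename of CUDA runtime calls to their HIP equivalents. A toy sketch of what that rewrite looks like — the kernel and names are made up for illustration, with comments showing the corresponding HIP renames:

```cpp
// Illustrative CUDA snippet; comments show what hipify renames each call to.
#include <cuda_runtime.h>   // hipify: <hip/hip_runtime.h>

__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float *d_x = nullptr;
    cudaMalloc((void **)&d_x, n * sizeof(float));     // hipify: hipMalloc
    scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);    // launch syntax carries over
    cudaDeviceSynchronize();                          // hipify: hipDeviceSynchronize
    cudaFree(d_x);                                    // hipify: hipFree
    return 0;
}
```

The renames are the mechanical part; the performance and library-coverage gaps are what a tool like this can't paper over.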