Note that this targets GFX11, which is RDNA3. Great for consumer, but not the enterprise (CDNA) level at all. In other words, not a "cuda moat killer".
I'm not the one who posted to HN but I am the project author. I'm working my way into doing multiple architectures as well as more modern GPUs too. I only did this because I used LLVM to check my work and I have an AMD GFX 11 card on my partners desktop (Which I use to test on sometimes when its free).
If you do have access to this kind of hardware and you're willing to test my implementations on it then I'm all ears! (You don't have too obviously :-) )