They claim that a small Chinese hedge fund could acquire $1bln in GPUs, with no state support, including many sanctioned chips, then trained a model optimized for a far smaller server compute size, and that they have a source at this very small fund who is willing to admit to export violations. A 40bln param active model is exactly the size you would expect from a server of the size they claim.
What’s more likely - that semianalysis made it up like they have a bunch of other things, or that all the above is true?