I was confused too. The format is odd, and I think it hurts comprehension rather than helping, because it makes you think you're not looking straight at the data.
Ignore the fact that the "Training Days" axis is drawn diagonally. The system is creating a little over 40 agents per day; by the end of day 14 it's made 610 or so. For any given point in training (vertical axis, going down), the graph shows the distribution over trained agents that the system has chosen in order to be unexploitable (you wouldn't want to choose rock all the time in rock-paper-scissors, for example). So at the end of day 14 it's using a selection of agents numbered 595 through 610 or so, which means they've all been created within the last day.
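To make the rock-paper-scissors point concrete, here's a rough sketch of what "unexploitable" means. It is not anything from the AlphaStar code; the payoff table and the `worst_case_payoff` helper are my own illustration. It scores a mixed strategy by how it does against the opponent's best pure reply.

```python
# Illustrative only: a toy rock-paper-scissors payoff table, not AlphaStar's.
# Payoff to the row player: +1 for a win, 0 for a tie, -1 for a loss.
MOVES = ["rock", "paper", "scissors"]
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper    vs rock, paper, scissors
    [-1, 1, 0],   # scissors vs rock, paper, scissors
]


def worst_case_payoff(mix):
    """Expected payoff of the mixed strategy `mix` against the
    opponent's best pure reply (hypothetical helper)."""
    return min(
        sum(p * PAYOFF[i][j] for i, p in enumerate(mix))
        for j in range(len(MOVES))
    )


always_rock = [1.0, 0.0, 0.0]
uniform = [1 / 3, 1 / 3, 1 / 3]

print(worst_case_payoff(always_rock))  # opponent plays paper every time
print(worst_case_payoff(uniform))      # no reply does better than break even
```

Always-rock loses every round to paper, while the uniform mix can't be beaten on average. As I read it, the hills in the graph are the analogous mix, just over trained agents instead of three moves.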
I think it's meant to show time as moving forward rather than as something that goes up and down, and also to avoid measuring the hills against some global X-axis. It is confusing, though.
0: https://deepmind.com/blog/alphastar-mastering-real-time-stra...