
They also trained a 3B model on 2 trillion tokens.

> The number of training tokens is a crucial factor for LLMs. To test the scalability of BitNet b1.58 in terms of tokens, we trained a BitNet b1.58 model with 2T tokens following the data recipe of StableLM-3B [ TBMR], which is the state-of-the-art open-source 3B model.

> [..]

> Our findings show that BitNet b1.58 achieves superior performance on all end tasks, indicating that 1.58-bit LLMs also have strong generalization capabilities.
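For context on the name: "1.58-bit" is log2(3) ≈ 1.585, because each weight is constrained to one of three values {-1, 0, +1}. The paper produces these ternary weights with an "absmean" quantization function: scale the weight matrix by its mean absolute value, then round and clip into the ternary set. A minimal PyTorch sketch (the function name and toy tensor are illustrative; only the absmean formula comes from the paper):

    import torch

    def absmean_quantize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
        # gamma is the mean absolute value of the full weight matrix
        gamma = w.abs().mean()
        # scale by gamma, round to the nearest integer, then clip
        # into the ternary set {-1, 0, +1}
        return (w / (gamma + eps)).round().clamp(-1, 1)

    w = torch.randn(4, 4)
    print(absmean_quantize(w))  # every entry is -1.0, 0.0, or 1.0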



I was hoping to agree with this, but there is no SOTA "StableLM-3B" trained on 2T tokens, which is a big gap in the paper: StableLM 3B was trained on 1T unique tokens for 4 epochs (roughly 4T tokens seen in total), so neither figure matches the paper's 2T. And the benchmarks in the official StableLM repo far exceed the StableLM numbers reported in the paper. You can find them in the official StableLM git and compare with the results in the paper: https://github.com/Stability-AI/StableLM?tab=readme-ov-file#...


You're right. Thank you for pointing that out!



