
They also trained a 3B model on 2 trillion tokens.

> The number of training tokens is a crucial factor for LLMs. To test the scalability of BitNet b1.58 in terms of tokens, we trained a BitNet b1.58 model with 2T tokens following the data recipe of StableLM-3B [ TBMR], which is the state-of-the-art open-source 3B model.

> [..]

> Our findings show that BitNet b1.58 achieves superior performance on all end tasks, indicating that 1.58-bit LLMs also have strong generalization capabilities.
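For context on the name: "1.58-bit" is log2(3) ≈ 1.585, because each weight is constrained to one of three values {-1, 0, +1}. The paper produces these ternary weights with an "absmean" quantization function: scale the weight matrix by its mean absolute value, then round and clip into the ternary set. A minimal PyTorch sketch (the function name and toy tensor are illustrative; only the absmean formula comes from the paper):

    import torch

    def absmean_quantize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
        # gamma is the mean absolute value of the full weight matrix
        gamma = w.abs().mean()
        # scale by gamma, round to the nearest integer, then clip
        # into the ternary set {-1, 0, +1}
        return (w / (gamma + eps)).round().clamp(-1, 1)

    w = torch.randn(4, 4)
    print(absmean_quantize(w))  # every entry is -1.0, 0.0, or 1.0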



I was hoping to agree with this, but there is no SOTA "StableLM-3B" trained on 2T tokens, which is a big gap in the paper: StableLM 3B was trained on 1T unique tokens for 4 epochs (roughly 4T tokens seen in total), so neither figure matches the paper's 2T. And the benchmarks in the official StableLM repo far exceed the StableLM numbers reported in the paper. You can find them in the official StableLM git and compare with the results in the paper: https://github.com/Stability-AI/StableLM?tab=readme-ov-file#...


You're right. Thank you for pointing that out!



