
Super weird benchmarks


From what I gather, it's finetuned to use OpenHands specifically, so it shows value on those benchmarks that target a whole system as a black box (i.e. agent + LLM) more than on ones that directly target the LLM's inputs/outputs.




