Hacker News

It was interesting to find out that Qwen 2.5 VL can output coordinates like Sonnet 4. Or does it use a different implementation?


Both of them are "visually grounded": if you ask for the location of something in an image, they can output the exact x/y pixel coordinates. Not many models can do this, and fewer still are large enough to actually reason through sequences of actions well.
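To make that concrete, here's a minimal sketch of what consuming that kind of output can look like. It assumes the model was prompted to answer in JSON with a "bbox_2d" field holding [x1, y1, x2, y2] pixel coordinates (roughly the format Qwen 2.5 VL uses; other grounded models have their own conventions, so treat the format here as an assumption):

```python
import json
import re

def parse_grounding(response_text):
    """Extract bounding boxes from a grounded VLM's text response.

    Assumes the reply embeds a JSON array of objects with a "bbox_2d"
    key as [x1, y1, x2, y2] pixel coordinates (hypothetical format,
    loosely modeled on Qwen 2.5 VL's grounding output).
    """
    # Pull the first JSON array out of the (possibly chatty) response.
    match = re.search(r"\[.*\]", response_text, re.DOTALL)
    if not match:
        return []
    boxes = json.loads(match.group(0))
    results = []
    for box in boxes:
        x1, y1, x2, y2 = box["bbox_2d"]
        # The box center is what you'd feed to a click action.
        results.append({
            "label": box.get("label", ""),
            "center": ((x1 + x2) // 2, (y1 + y2) // 2),
        })
    return results

# Hypothetical reply to "locate the submit button":
reply = 'Here: [{"bbox_2d": [120, 40, 200, 80], "label": "submit button"}]'
print(parse_grounding(reply))
# → [{'label': 'submit button', 'center': (160, 60)}]
```

The center point is the useful bit for agent use cases: it can be passed straight to a click at those screen coordinates, which is why grounding plus multi-step reasoning is such a powerful combination.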



