pmg0's comments

pmg0 · on Oct 15, 2024

Could you share some of those papers? I had a great discussion with Marc Fischer from the LMQL team [0] on this topic while at ICML earlier this year. Their work recommended decoding to natural language templates with mad lib-style constraints to follow that “happy path” you refer to, instead of decoding to a (relatively more specific latent) JSON schema [1]. Since you provided a template and knew the targeted tokens for generation you could strip your structured content out of the message. This technique also allowed for beam search where you can optimize tokens which lead to the tokens contain your expected strings, avoiding some weird token concatenation process. Really cool stuff!

[0] https://lmql.ai/ [1] https://arxiv.org/abs/2311.04954

pmg0 · on June 30, 2024

> But... if the browser model is being invisibly upgraded, how are we supposed to test out prompts and expect them to continue working without modifications against whatever future versions of the bundled model show up?

Pinning the design of a language model task against checkpoint with known functionality is critical to really support building cool and consistent features on top of it

However the alternative to an invisibly evolving model is deploying an innumerable number of base models and versions, which web pages would be free to select from. This would rapidly explode the long tail of models which users would need to fetch and store locally to use their web pages, eg HF's long tail of LoRA fine tunes all combinations of datasets & foundation models. How many foundation model + LoRAs can people store and run locally?

So it makes some sense for google to deploy a single model which they believe strikes a balance in the size/latency and quality space. They are likely looking for developers to build out on their platform first, bringing features to their browser first and directing usage towards their models. The most useful fuel to steer the training of these models is knowing what clients use it for