> For the current version, we are using a Mistral LLM (Mistral 7B) hosted within Mozilla’s GCP instance.
Based on this I would assume they are using GCP Vertex AI as that's going to be WAY cheaper than rolling it all themselves and hosting the model on a GCP server instance. I would also assume they'd be using the gcloud SDK for Vertex AI/Model Garden, which I believe means they can't just provide for a different endpoint and payload shape if you had a service elsewhere.
Eitherway, at the (presumed) scale they'll probably also be using GCPs API management service, so I would expect further abstraction between what the extension is sending and what the model/Vertex AI expects as a payload. This means providing that kind of "bring your own endpoint" experience would require more bespoke build-out time.
BUT who knows? Maybe this is just straight up hitting the out-of-the-box GCP Vertex AI REST API directly from the extension like some hobby project.
It's Google Cloud Platforms "AI" service[0], so actually more analogous to what is now called Azure AI Foundry[1], and what used to be called Azure OpenAI Studio.
Microsoft Copilot Studio[2] (formally Power Platform Power Virtual Agents) imho is unique in it's enterprise AI offering. I truly think Copilot Studio is going to be Microsoft's "killer app" when it comes to companies utilizing AI at scale internally, and not it's Azure service.
Based on this I would assume they are using GCP Vertex AI as that's going to be WAY cheaper than rolling it all themselves and hosting the model on a GCP server instance. I would also assume they'd be using the gcloud SDK for Vertex AI/Model Garden, which I believe means they can't just provide for a different endpoint and payload shape if you had a service elsewhere.
Eitherway, at the (presumed) scale they'll probably also be using GCPs API management service, so I would expect further abstraction between what the extension is sending and what the model/Vertex AI expects as a payload. This means providing that kind of "bring your own endpoint" experience would require more bespoke build-out time.
BUT who knows? Maybe this is just straight up hitting the out-of-the-box GCP Vertex AI REST API directly from the extension like some hobby project.