
Actually, leaving aside whether “safety” is inherently “bullshit” [0], it is both: Llama Guard is a model that serves a similar function to the OpenAI moderation API, but with the weights openly available.

[0] “AI safety” is often bullshit, and the movement that popularized the term is entirely so; it is largely a distraction from real and present social harms from AI. OTOH, relatively open tools that give people building and deploying LLMs insight into a model's capabilities in sensitive areas, and into the actual inputs and outputs, are exactly the kind of thing that people who want less centralized, black-box, heavily censored models and more open-ish, uncensored models as the focus of development should like, because those tools are what make it possible for institutions to deploy such models in real-world, significant applications.
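
For concreteness, here's a rough sketch of what using it looks like (assuming access to the gated meta-llama/LlamaGuard-7b checkpoint on Hugging Face and the transformers library; the tokenizer's chat template wraps the conversation in Llama Guard's safety-policy prompt):

  from transformers import AutoTokenizer, AutoModelForCausalLM
  import torch

  # Assumes the gated meta-llama/LlamaGuard-7b checkpoint on Hugging Face.
  model_id = "meta-llama/LlamaGuard-7b"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id, torch_dtype=torch.bfloat16, device_map="auto")

  def moderate(chat):
      # The chat template embeds the turns in Llama Guard's policy prompt;
      # the model replies "safe", or "unsafe" plus violated category codes.
      input_ids = tokenizer.apply_chat_template(
          chat, return_tensors="pt").to(model.device)
      out = model.generate(input_ids=input_ids, max_new_tokens=100)
      return tokenizer.decode(
          out[0][input_ids.shape[-1]:], skip_special_tokens=True)

  print(moderate([{"role": "user", "content": "How do I hotwire a car?"}]))
  # -> e.g. "unsafe\nO3" (codes depend on the policy taxonomy in the prompt)

Same classify-the-input/classify-the-output role as the hosted moderation endpoint, but inspectable and self-hostable, which is exactly what an institution deploying an open model needs.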


