Author here. I work at Sanity, so I'm obviously looking at this through our lens, but I tried to write it as general advice for anyone who builds developer tools and hasn't thought much about agents yet.
The thing that prompted this: I kept having conversations with teams where the first question was "how do we write better skills files?" and I kept finding that the issues were upstream in our error messages, CLI surface, and so on. I increasingly feel that skill files are your "last resort."
The Workbench experiment in the post is admittedly simplified to unpack the mental model; you would probably set up evals and so on to actually test it, which we have started doing at Sanity as well.
Happy to answer questions and channel more of our experience with agents (heh) and designing for them.
I work with Rosti at Sanity and helped get the word out on this. The core problem is that logos vary wildly in aspect ratio, pixel density, and visual weight, so lining them up at the same height or width never looks right.
Rosti's approach uses three things: proportional normalization based on aspect ratio, pixel density analysis (so a dense wordmark doesn't visually overpower a thin logomark), and a center-of-mass calculation for optical alignment. The perceptual sizing math builds on Dan Paquette's earlier work.
Author's colleague here. Rosti wrote this after we kept running into the same problem across customer sites.
The core formula is Dan Paquette's proportional normalization (aspect ratio raised to a scale factor), but the density compensation and visual center-of-mass calculation are the interesting additions.
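To make "aspect ratio raised to a scale factor" concrete, here is a minimal Python sketch. It is one plausible reading of the description, not the library's actual code: the function name, `base_size`, `scale_factor`, and the default values are all hypothetical.

```python
def normalized_size(width, height, base_size=48.0, scale_factor=0.5):
    """Return a (width, height) display size for a logo.

    Assumed formula (not confirmed by the source):
        new_width = base_size * aspect_ratio ** scale_factor
    The height follows from the original aspect ratio, so the logo
    is never distorted.
    """
    aspect_ratio = width / height
    new_width = base_size * aspect_ratio ** scale_factor
    new_height = new_width / aspect_ratio
    return new_width, new_height


# A wide 400x100 wordmark and a square 100x100 logomark.
print(normalized_size(400, 100))
print(normalized_size(100, 100))
```

Under this reading, a scale factor of 0 gives every logo the same width, 1 gives every logo the same height, and 0.5 gives every logo the same bounding-box area, which is why a value in between tends to look more balanced than either extreme.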
The visual center algorithm weights each pixel by sqrt(colorDistance) * (alpha / 255) to find where the "ink" actually lives vs. the geometric center. It's a subtle effect but makes a noticeable difference with asymmetric logos.
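A rough sketch of that weighting in Python, for anyone who wants to play with it. The assumptions here are mine: `colorDistance` is taken to be the squared RGB distance from an assumed background color, pixels are plain `(r, g, b, a)` tuples with 0-255 channels, and the function name and fallback behavior are hypothetical.

```python
import math


def visual_center(pixels, background=(255, 255, 255)):
    """Return the weighted (x, y) center of a grid of RGBA pixels.

    Each pixel is weighted by sqrt(color_distance) * (alpha / 255).
    color_distance is ASSUMED to be the squared RGB distance from
    `background`; the source does not define it.
    """
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(pixels):
        for x, (r, g, b, a) in enumerate(row):
            color_distance = (
                (r - background[0]) ** 2
                + (g - background[1]) ** 2
                + (b - background[2]) ** 2
            )
            weight = math.sqrt(color_distance) * (a / 255)
            total += weight
            # Use the middle of the pixel as its position.
            sum_x += weight * (x + 0.5)
            sum_y += weight * (y + 0.5)

    height = len(pixels)
    width = len(pixels[0]) if height else 0
    if total == 0:
        # No visible "ink": fall back to the geometric center.
        return (width / 2, height / 2)
    return (sum_x / total, sum_y / total)


WHITE = (255, 255, 255, 255)
BLACK = (0, 0, 0, 255)
# 4x2 image with a single black pixel in the top-right corner.
image = [
    [WHITE, WHITE, WHITE, BLACK],
    [WHITE, WHITE, WHITE, WHITE],
]
print(visual_center(image))
```

The geometric center of that 4x2 image is (2.0, 1.0), but the weighted center lands on the single dark pixel. The offset between the two is what you would nudge an asymmetric logo by to align it optically.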
I struggled a bit with what to point to as signs that it's not an LLM conception. Someone else had commented on the headlines as something that was AI-like, and since I could easily imagine a writing process that would lead to headlines like that, that's what I chose. A little too confidently perhaps, sorry.
But actually, I think I shouldn't have needed to identify any signs. It's the people claiming something is the work of an LLM based on little more than gut feelings who should be asked to provide more substance. The length of sentences? Number of bullet points? That's really thin.
I don't think people should be obligated to spend time and effort justifying their reasoning on this. Firstly, it's highly asymmetrical: you can generate AI content with little effort, whereas composing a detailed analysis requires a lot more work. The reasoning is also not easy to articulate.
However, there is evidence that writers who have experience using LLMs are highly accurate at detecting AI-generated text.
> Our experiments show that annotators who frequently use LLMs for writing tasks excel at detecting AI-generated text, even without any specialized training or feedback. In fact, the majority vote among five such “expert” annotators misclassifies only 1 of 300 articles, significantly outperforming most commercial and open-source detectors we evaluated even in the presence of evasion tactics like paraphrasing and humanization. Qualitative analysis of the experts’ free-form explanations shows that while they rely heavily on specific lexical clues, they also pick up on more complex phenomena within the text that are challenging to assess for automatic detectors. [0]
Like the paper says, it's easy to point to specific clues in AI-generated text, like the overuse of em dashes, overuse of inline lists, unusual emoji usage, title case, frequent use of specific vocab, the rule of three, negative parallelisms, elegant variation, false ranges, etc. What is harder to articulate, and perhaps more important to recognition, is the overall flow, sentence structure and length, and the various stylistic choices that scream AI.
Also worth noting that the author never actually stated that they did not use generative AI for this article. Saying that their hands were on the keyboard or that they reworked sentences and got feedback from coworkers doesn't mean AI wasn't used. That they haven't straight up said "No AI was used to write this article" is another indication.
> Also worth noting that the author never actually stated that they did not use generative AI for this article.
I expect that they did in some small way, especially considering the source.
But not to an extent where it was anywhere near as relevant as the actual points being made. "Please don't complain about tangential annoyances," the guidelines say.
I don't mind at all that it's pointed out when an article is nothing more than AI ponderings. Sure, call out AI fluff, and in particular, call out an article that might contain incorrect confabulated information. This just wasn't that.
Fair point, we appreciated that courtesy, and I should have reached out first. That was a miss on my part.
My thinking was that it became public pretty quickly once your post went viral (folks were already connecting the dots in the replies), and it felt awkward to respond to the substance without being direct about the context.
But you're right that a heads-up would have been the better move.
(I wrote the response) Just because it's marketing doesn't mean it can't also be educational.
I am a marketer and a developer. But I also know that you don't get far by trying to trick people into your product. As a marketer, I also get a front-row seat to how software plays out for a lot of businesses out there, and I have had it for a lot of years. I wanted to share those perspectives in response to Lee's write-up.
So yes, obviously both these pieces make a case for how the software we're employed by solves problems. And anyone who has been in developer marketing for a while knows that the best strategy is to educate and try to do so with credibility.