Yes, that's what happens when you code with AI and overlook things. I remember this specific bug vividly, and that's why I included it in the article. The bot was comparing the implied probabilities from the odds for a game without matching the teams first. Sometimes bookmakers had Team A vs Team B where Polymarket had Team B vs Team A, for whatever reason, and the code was not checking for this scenario. Not my proudest bug, to be fair.
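For anyone curious what the fix looks like in principle, here is a minimal sketch, not the bot's actual code. It assumes decimal odds, and it assumes team names differ only by case and whitespace between the two sources. The function names and data shapes are made up for illustration. The point is simply to key everything by team name rather than by listing order before comparing.

```python
def implied_probs(decimal_odds):
    """Turn {team: decimal odds} into implied probabilities that sum to 1."""
    raw = {team: 1.0 / odds for team, odds in decimal_odds.items()}
    total = sum(raw.values())  # exceeds 1 when the bookmaker margin is included
    return {team: p / total for team, p in raw.items()}


def normalise(name):
    """Hypothetical name normalisation: lower-case and collapse whitespace."""
    return " ".join(name.lower().split())


def edges_by_team(book_odds, market_prices):
    """Compare bookmaker and market probabilities per team, not per position.

    book_odds:     {team name: decimal odds} from the bookmaker
    market_prices: {team name: probability} from the prediction market
    """
    book = {normalise(t): p for t, p in implied_probs(book_odds).items()}
    market = {normalise(t): p for t, p in market_prices.items()}
    if set(book) != set(market):
        # Refuse to compare if the two sources do not describe the same teams.
        raise ValueError(f"team mismatch: {sorted(book)} vs {sorted(market)}")
    return {team: book[team] - market[team] for team in book}


# The two sources list the teams in opposite order; the result is unaffected.
edges = edges_by_team(
    {"Team A": 1.5, "Team B": 3.0},
    {"Team B": 0.30, "Team A": 0.70},
)
```

Raising on a mismatch is a deliberate choice here: silently skipping the game would also work, but a loud failure makes the swapped or misspelled case visible.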
True, but look at nuclear tests. There used to be around 50 tests every year, for decades. Now the only nuclear tests in the last 27 years were the six done by North Korea[1]. And there are still only nine countries with any nuclear weapons, and no new ones in the past twenty years.
That's a bit better than just "it hasn't killed us yet".
I had a similar experience. I wanted to test it by asking it to summarise a scientific OMICs-related paper. It gave a warning about me potentially developing a bio-weapon or something like that, and switched back to Opus 4.8.
If you truly believe this, you've discovered a superpower over everyone else in the industry.
While everyone else is wasting time and money on the slower, more expensive models, you've found a way to outpace everyone for less money. Everyone else is wrong and you will get rich.
(I don't actually believe the premise is true, I'm just pointing out the logical conclusion to what you're saying so maybe we can reconsider the premise)
Senior Software Engineer with 10+ years of experience building AI-powered applications, cloud-native platforms, distributed systems, and enterprise software. Specialized in Python, TypeScript, AWS, GCP, Kubernetes, Terraform, LLM applications, RAG pipelines, and scalable microservice architectures. Experienced delivering production systems end-to-end, from architecture and infrastructure to deployment, observability, and AI integration. Open to Senior Software Engineer, Staff Engineer, Senior Backend Engineer, AI Engineer, Platform Engineer, DevOps Engineer, and Founding Engineer opportunities.
> The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.
It's getting to the point that it's off-putting, and the next step would be to put it into the "untrusted" bucket. Opus 4.7 already burned their credibility once; 2 more strikes remain.
> This is essentially a backdoor into all of your data.
No. Only if you would consider the Linux/macOS/Windows filesystem API a backdoor too. On your desktop, any app with sufficient permissions can read all your data. Would you call that a backdoor?
Great article. I particularly enjoyed the approach to creating gibs.
Although it was a tech demo, I created something like this in the mid-90s. One thing I did that I don't see mentioned in this article was to use 8x8 (or 16x16) light maps on the textures, which allowed me to easily have things like flickering torches and rockets that lit up the hallways as they shot down them. Light maps can also be used to "bake in" lighting if desired.
Since the light map is "only" 8x8, you can afford to do some math on each luxel (each unit in the light map) to calculate distance and line of sight to light sources to determine a brightness value. When rendering the texture, the luxel was used with a lookup table to determine the actual color of the pixel being drawn. The light maps were updated 15 times a second, if I recall correctly, to help performance. Thanks to DJGPP, I was using inline assembly for the rendering. Since floating point math was slow at the time, I used fixed point math, which optimized well. The rendering was surprisingly performant on computers of the day.
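To make the idea concrete, here is a rough sketch in Python rather than the inline assembly I actually used, so treat it as an illustration and not a reconstruction. The 16.16 fixed point format, the linear falloff on squared distance, the 16 brightness levels, and the 64x64 greyscale texture are all assumptions picked for the example, and the line-of-sight test is left out entirely.

```python
FP_SHIFT = 16                 # assumed 16.16 fixed point format
FP_ONE = 1 << FP_SHIFT
SIZE = 8                      # 8x8 luxels per light map
LEVELS = 16                   # assumed number of brightness levels


def to_fixed(x):
    """Convert a float to 16.16 fixed point."""
    return int(round(x * FP_ONE))


def fp_mul(a, b):
    """Multiply two 16.16 fixed point numbers."""
    return (a * b) >> FP_SHIFT


def build_lightmap(light_x, light_y, radius):
    """Compute a brightness level (0..LEVELS-1) for each luxel.

    All arguments are 16.16 fixed point, in luxel units. Brightness falls
    off linearly with squared distance, which avoids a square root.
    """
    r2 = fp_mul(radius, radius)
    lightmap = []
    for v in range(SIZE):
        row = []
        for u in range(SIZE):
            dx = to_fixed(u + 0.5) - light_x   # offset from luxel centre
            dy = to_fixed(v + 0.5) - light_y
            d2 = fp_mul(dx, dx) + fp_mul(dy, dy)
            level = 0 if d2 >= r2 else ((r2 - d2) * (LEVELS - 1)) // r2
            row.append(level)
        lightmap.append(row)
    return lightmap


# Lookup table: SHADE[level][intensity] gives the lit 8-bit intensity.
SHADE = [[(c * level) // (LEVELS - 1) for c in range(256)]
         for level in range(LEVELS)]


def shade_texel(texel, lightmap, tx, ty, tex_size=64):
    """Light one texel using the luxel that covers it and the lookup table."""
    u = tx * SIZE // tex_size
    v = ty * SIZE // tex_size
    return SHADE[lightmap[v][u]][texel]


lm = build_lightmap(to_fixed(4.0), to_fixed(4.0), to_fixed(4.0))
lit = shade_texel(200, lm, 24, 24)   # texel near the light
dark = shade_texel(200, lm, 0, 0)    # texel outside the light's radius
```

Because the light map is rebuilt separately from drawing, it can be refreshed at a lower rate than the frame rate while the per-pixel work stays a single table lookup.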
I'm interested in how they work, but building anything like them, given the hardware I have, would be impractical. I've seen others use them, including to answer some questions I had, but the answers they gave were obvious, unhelpful, or wrong.
Even if they become more reliable, I like to understand and work things out for myself, rather than just be given the answer.
WG14 adopted variably modified types, a kind of dependent type. From a security standpoint it offers all the same qualities. It also in principle was easier to integrate from a backwards compatibility standpoint, with the exception of struct member analogs (which we now have but aren't yet standardized).
Maybe we would have been better off with Ritchie's counter proposal. But neither proposal was chiefly concerned with security, thus no proposals for, e.g., automatic bounds checking.
I've continued working on a tool for my daughter, our friends, and me to scan and index Pokemon cards. The tool is a phone app and website (https://MyBulkCards.com). The phone app uses the camera to scan a card and runs the image through a couple of models, then a record of the card is written along with its location. It's pretty basic, but I can store cards in boxes like “Box 1 AAA, Box 1 BBB, …” and find cards easy peasy. There’s also a friends feature so I can see what others have locally. We borrow cards from each other quite a bit.
It's been a super fun tool to build. The phone app just got approved in the Android app store. I have a bit of cleanup, but plan on releasing it soon.
Yet. But they are probably working with Chinese partners (including the government) on releasing something (maybe with Alibaba models instead of Google models, on a Chinese-local cloud rather than Google Cloud).
I just asked Fable to do a task that has nothing to do with cybersecurity and isn't dangerous at all, but the defense kicked in and it switched to Opus... :(
I'm glad you made the comment because at the very least I learned a new German word (native English speaker and conversational in Spanish).
It's ironic to search for "alt meaning" and find a tertiary definition of "Pitched in the first octave above the treble staff; high" which would suggest more of the Spanish "alta" root rather than the Germanic root.
Now I'm curious how many origins are shared between Spanish and German.
Perhaps we can all agree English is a goofy language!
> In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations.