The people spamming curl did step one, "write me a vulnerability report on X", but skipped step two, "verify for me that it's actually exploitable". Tack on a step three, where someone reasonably educated in security research sanity-checks the vulnerability implementation as well, and you'll have a pipeline that doesn't generate a ton of false positives. The question then becomes how cost-effective it is, given the tokens and the still-non-zero human time involved.
That was then, this is now. The new models are scarily good. If you're skeptical, just take an hour to replicate the strategy the article references. Point Claude at any open-source codebase you find interesting and instruct it to find exploitable vulnerabilities. Give it a well-defined endpoint if you want (e.g., "You must develop a Python script that triggers memory corruption via a crafted request") and see how well it does.
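If you want a concrete starting point, here's a minimal sketch of that first step, assuming the `anthropic` Python SDK with an API key in the environment; the model name, target file, and prompt wording are all illustrative, not what the article used:

```python
# Minimal sketch, not the article's pipeline: point a frontier model at one
# source file and ask for exploitable findings. Assumes the `anthropic` SDK
# is installed and ANTHROPIC_API_KEY is set; model name, file path, and
# prompt wording are illustrative.
import pathlib
import anthropic

client = anthropic.Anthropic()

source = pathlib.Path("src/http_parser.c").read_text()  # hypothetical target file

prompt = (
    "Audit the following C code for memory-safety vulnerabilities. "
    "For each candidate, explain the exact input that reaches it, and write "
    "a Python script that triggers memory corruption via a crafted request.\n\n"
    + source
)

response = client.messages.create(
    model="claude-sonnet-4-5",  # assumption: any current frontier model
    max_tokens=4096,
    messages=[{"role": "user", "content": prompt}],
)

print(response.content[0].text)
```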
No, what we were seeing with curl was script kiddies. It wasn't about the quality of the models at all. They were not filtering their results for validity.
It was definitely partially about model quality. The frontier models are capable of producing valid findings with (reasonably) complex exploit chains on the first pass (or with limited nudging) and are much less prone to making up the kinds of nonsensical reports that were submitted to curl. Compared to now, the old models essentially didn't work for security.
If those script kiddies had been using today's models instead and _still_ didn't do any filtering, a lot more of those bugs would have been true positives.
It sounds like what makes the pipeline in the article effective is the second stage, which takes the vulnerability reports produced by the first stage and confirms or rejects them. The article doesn't say what the rejection rate is there.
I don't think the spammers would think to build that second stage; they would most likely pipe the output of the first stage (probably a more naive version of it, too) directly into the issue tracker.
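For illustration, a hedged sketch of what such a confirm-or-reject stage could look like; this assumes the `anthropic` SDK, and the prompt, verdict convention, and placeholder variables are made up, not the article's implementation:

```python
# Sketch of a second stage whose only job is to throw out findings it cannot
# substantiate. Assumes the `anthropic` SDK; everything below is illustrative.
import anthropic

client = anthropic.Anthropic()

def confirm_or_reject(report: str, source: str) -> bool:
    verdict = client.messages.create(
        model="claude-sonnet-4-5",  # assumption: any current frontier model
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": (
                "Below is a vulnerability report and the code it targets. "
                "Decide whether the described bug is actually reachable and "
                "exploitable. Answer with exactly CONFIRMED or REJECTED on "
                f"the last line.\n\nREPORT:\n{report}\n\nCODE:\n{source}"
            ),
        }],
    )
    return verdict.content[0].text.strip().splitlines()[-1] == "CONFIRMED"

candidate_reports = ["..."]  # placeholder: output of the first pass
codebase_text = "..."        # placeholder: the audited source

# Only confirmed findings should ever reach an issue tracker.
filtered = [r for r in candidate_reports if confirm_or_reject(r, codebase_text)]
```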
* Carlini's team used new frontier models that have gotten materially better at finding vulnerabilities (talk to vulnerability researchers outside the frontier labs; they'll echo that). Stenberg was getting random slop from people using random models.
* Carlini's process is iterated exhaustively over the whole codebase; he's not starting with a repo and just saying "find me an awesome bug" and taking that and only that forward in the process.
* And then yes, Carlini is qualifying the first-pass findings with a second pass.
I guess the broader point I wanted to make is about the people responsible for the deluge of LLM-reported bugs and security vulnerabilities on countless open-source projects (not only curl): they weren't considerate or thoughtful security researchers; they were spammers looking to raise their profile with fully automated, hands-off open-source "contributions". I would expect the spammers to keep using whatever lowest-common-denominator tooling is available, and to keep causing these headaches for maintainers.
That doesn't mean frontier models and tooling built around them aren't genuinely useful to people doing serious security research: that does seem to be the case, and I'm glad for it.
The problem is that you get all kinds of "security spam", in the same way that social media is flooded with automated but on-topic content. That doesn't mean a small minority of the reports aren't correct.
One way to filter that out could be to require a PoC for the exploit and test it in a sandbox. I think what XBOW and others are doing is real.
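Roughly along these lines: run the submitted PoC against an instrumented build inside a throwaway container and only accept the report if it actually crashes. The image name, resource limits, and crash heuristic below are assumptions, not any particular vendor's setup:

```python
# Rough sketch of "require a PoC and test it in a sandbox": run the PoC in a
# disposable Docker container with no network and a hard timeout, against an
# ASan-instrumented build of the target.
import subprocess

def poc_reproduces(poc_path: str, image: str = "target-asan:latest") -> bool:
    """Return True only if the PoC crashes the instrumented target in the sandbox."""
    try:
        result = subprocess.run(
            [
                "docker", "run", "--rm",
                "--network=none",            # no outbound access from the sandbox
                "--memory=512m", "--pids-limit=64",
                "-v", f"{poc_path}:/poc.py:ro",
                image,
                "python3", "/poc.py",
            ],
            capture_output=True,
            timeout=120,
        )
    except subprocess.TimeoutExpired:
        return False  # a hang is not an accepted proof
    # ASan aborts with a non-zero exit code and a recognizable banner on memory corruption.
    return result.returncode != 0 and b"AddressSanitizer" in result.stderr
```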
To some extent, that's just the nature of the incentive structure: you pay money for exploits with no penalty for false reports? Then just rely on monkeys typing reports until one happens to be correct.
I don't see much motivation for fixing that when I can purchase an nRF52xx Bluetooth beacon on AliExpress for €4 and flash it with firmware that pretends to be 50 different AirTags, rotating every 10 minutes, thereby bypassing all tracker detection.
First of all, this is not technically distillation; it's closer to imitation learning.
Second, you could do something like asking Claude to create a million (prompt, offensive response, non-offensive response) triplets, then train a model with DPO to prefer the offensive responses.
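For concreteness, the preference objective over such triplets is just the standard DPO loss. Here's a minimal PyTorch sketch of it (names are illustrative; in practice you'd reach for an off-the-shelf trainer such as trl's DPOTrainer on a dataset with prompt/chosen/rejected columns):

```python
# Minimal sketch of the DPO objective over (prompt, preferred, dispreferred)
# triplets, written out directly rather than via a training framework.
# Inputs are per-example sequence log-probabilities; names are illustrative.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(preferred | prompt), shape [batch]
    policy_rejected_logps: torch.Tensor,  # log p_theta(dispreferred | prompt)
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    # How much more the policy prefers "chosen" over "rejected" than the reference does.
    margin = (policy_chosen_logps - ref_chosen_logps) - (
        policy_rejected_logps - ref_rejected_logps
    )
    # Minimizing this pushes the policy toward the preferred side of each triplet.
    return -F.logsigmoid(beta * margin).mean()
```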
This sounds sick. I looked for a photo and couldn't find one that seemed to be this specifically, just lots of the flexible ones. Do you know if there's a photo reference somewhere?
> why ... didn't help answer this simple question?
How do you know it didn't?
Someone may have observed it and fixed it someplace else, but from the OP's point of view it doesn't matter: their system still works "as desired". The world wasn't meant to be perfect for everyone, but you can certainly carve out your own "perfect" piece of it.
It seems you have only surface knowledge and little practice in using Emacs. You're not offering any evidence for your arguments and not suggesting any alternatives, because there isn't a practical alternative today that could effectively replace Emacs. You're frantically grasping for reasons not to use Emacs. Why are you looking for validation in an Emacs-themed thread? If you don't want to use Emacs, just don't. If you want to use Emacs but are running into obstacles, then get more concrete instead of complaining just for the sake of it.