I was hired in my current role to replace one such professor who was fired because he insisted on giving a majority of his class failing grades. And honestly, it was the right call; he was being kind of unprofessional about it. He was teaching a very difficult subject -- C++ -- as an expert, and then getting mad that people weren't also experts at C++ within 3 months. So I agree that professors should have more control and authority over their classes, but also at the same time those professors who fail large swathes of their classes can be really unpleasant.
Tbf the OP's blog and comments (including their sibling to your comment) are also heavily anecdotal.
> I’ve called November 2025 the November inflection point because that was when GPT-5.1 and Opus 4.5, combined with their respective coding agent harnesses, got good—good enough that we’ve spent the last six months adapting to agent systems that can reliably get useful work done.
Claiming a grand inflection point based on your own personal usage is very anecdotal.
If that were it I would absolutely agree with you. But this experience maps exactly to adoption trends. My job in the last 6 months has become so unrecognizable to me it’s insane, the adoption at the very least at large companies is truly truly incredible, and it really does coincide with the quality of Opus 4.5 (which has now been surpassed).
"Adoption trends" are just herd behavior which may or may not be driven by compelling anecdotes and may or may not be evidence of something more. I'm just saying it seems wrong to dismiss the post the way you did when the OP in question and your own post here are just more anecdotes.
No, if that were really true you wouldn’t see what you’re seeing today. You wouldn’t see entire companies completely retooled and refactored around these tools. You would see the mistake of “this is actually just herd behavior”, which involves such a colossal amount of impact to these companies’ entire stack and bottom line, resulting in systemic collapse. You don’t see that. Company leadership are not some idiot class of people, I don’t know why this is people’s prior. If companies get adoption wrong in either direction they are completely screwed. So you’re seeing people putting money where their mouth is, across the board.
Compelling anecdotes are not even the main source of evidence. Look at the enormous body of work on measurement of these systems. I always point people to the Epoch capability index as a good summary statistic of capabilities, or METR’s time horizon data, which has now been topped out. They had a recent update to the dataset, after which the corrected plots pointed to an even faster acceleration than before.
> You wouldn’t see entire companies completely retooled and refactored around these tools.
That's exactly what I'd expect people who are driven by hype and FOMO and YOLO and anecdotal evidence to do.
> resulting in systemic collapse.
Many people are noting the system is collapsing. Maybe it's not going as quickly as you expect, but there's definitely evidence of this from increased service outage frequency, billion dollar notes being passed in a circle between companies, open projects refusing AI contributions entirely because they're overwhelmed by crap, Sam Altman begging governments to force citizens to buy their product through "universal basic compute", etc.
> Look at the enormous body of work on measurement of these systems.
It's certainly possible to measure anything. Benchmarks are a form of evidence but they famously a) don't represent reality and b) can be easily gamed.
> That's exactly what I'd expect people who are driven by hype and FOMO and YOLO and anecdotal evidence to do.
Not at this scale.
> Many people are noting the system is collapsing.
On HN? Any piece of evidence to support this? Service outage frequency is not a sign of systemic collapse. Billion dollar notes passed in a circle is brought up a lot and misunderstands how finance works. "open projects refusing AI contributions entirely because they're overwhelmed by crap" is not a systemic collapse, it's not being able to adapt to a new world with new challenges. Btw "slop" is getting less and less sloppy.
> Sam Altman begging governments to force citizens to buy their product through "universal basic compute", etc.
Very interested in a citation for this; it sounds like a headline quote of something more complex.
> It's certainly possible to measure anything. Benchmarks are a form of evidence but they famously a) don't represent reality and b) can be easily gamed.
I mean, I work on benchmarks for a living, and I can tell you both of these things are true but only partially, and in aggregate they all tell a consistent story. Not to mention, static OSS benchmarks are not what these companies rely on. They have live traffic, the ability to run A/B tests, and full conversation traces; to ignore this is pretty incredible.
My point was claiming a broad inflection point based on your own personal usage is not "evidence driven", it's anecdote-driven. It's hard to disprove any claim you made because you didn't really make one that's disprovable, and your opinion on it now is still just an opinion.
Those have been around for a long long time. You may be focusing on anecdotes, but the adoption numbers and performance trends speak for themselves, and we’ve had performance trends for years. People can argue about whether or not enterprise level adoption has a clear ROI today, but we’re at the point where entire large scale companies are already completely refactored, directly after Opus 4.5. If that’s not a convincing enough signal I don’t know what is.
I think we're in agreement then; the point I was responding to was saying your blog was evidence-driven, and we can both agree it's not -- at least to the standard that would pass peer-review.
I started using an agent (Codex) on my repo and it went from a few dozen clones to thousands (3383 this week). I dunno what the agents are doing to clone the repo so many times -- I'm not running 3000 agents or prompts, maybe 10 or so this week. But if this is typical, a 1000x increase in usage across the board can't be good on the system.
I agree with this. I've tried using agents over the last 2 months, and I feel they are just... bad. I spend more time trying to correct their inexplicable decisions than it takes to just go through step by step in a dialogue.
So in your utopia, what's the process to determine if someone is a gross and terrible homeless druggie that deserves to die on the sidewalk; versus someone like yourself who is very important and deserves all the help right away?
Society makes this judgement every day in a thousand different ways. Resources are limited. It's why we don't give 85-year-olds heart/lung transplants - the 30-year-old recipient can use it better/longer. Does that mean we don't give any health care to 85-year-olds? No, and to argue it is so is a slippery slope fallacy. It's why we don't have lights on all 4-way stops even though it's safer than stop signs.
Given that we make these judgements, the problem with your argument is that you paint the GP as some sort of monster for making the judgement and picking a spot on the scale. It's a valid thing to disagree with where he picks on the scale; it is invalid to argue that there should be no scale.
> it is invalid to argue that there should be no scale.
And to put that into policy gets rid of the scale for everyone. You can see it with abortion restrictions in various states. Instead of the doctor's expertise, the lawyers are the ones to decide.
Then again this is very much on point for the US. There are no experts other than lawyers. /s
> This is despite the fact that US colleges openly and actively discriminate against US citizens for grad school spots for 2 or 3 decades now.
Can you provide a citation for this specific claim? I used to do admissions to a grad program in the US, and we ended up admitting mostly foreign students solely because very few US citizens actually applied (probably only 10% of apps). Whether that's because they were not qualified or couldn't afford it I do not know. But it's not because they were openly and actively discriminated against.