I think a lot of us would be fine with AA being a for-profit enterprise earning money from donations and deals with companies. The service it provides is invaluable: free and DRM-free access to millions of the world's titles.
I have only worked in startups, and I was an early engineer at both of them. I always got elevated privileges within a short time, including access to create and delete resources. I don't think that's uncommon.
The problem with agents is that they regularly sidestep the guardrails and do what they want with a script anyway. I've lost count of the times I've seen Claude try to escape the folder it's working in and then write a Python script that does exactly what I told it it's not allowed to do.
If you use SSO and have an AWS config that Claude is allowed to see to get the correct role in the first place, it will just pick the role and plough on anyway.
And this is why it is the height of irresponsibility to run LLMs on your system. We know they are unreliable and just make things up; it's extremely foolish to go "yeah I'm going to let that run commands".
It's not _really_ any different to running an undocumented third party binary. Is it the height of irresponsibility to run Windows, or VSCode, or Spotify?
I think the model we've got now is wrong: the harnesses should be OS-level sandboxed, and the agents should be running in harness-managed sandboxes.
Sounds like they're still giving the model the keys to the kingdom, which is my point: stop giving the model the means to make catastrophic mistakes. It makes no sense.
If your message is in response to me, which I think it is, I deliberately don't give it access to credentials and environment variables. I've worked to create restrictions and have seen AI models use very interesting methods to bypass them.
Even now, my prompt says the AI must verify the path of any file it intends to edit, and must edit only one file at a time, and only after getting permission. I still have to stop it from ignoring those rules at least once a day.
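For illustration only: one way to move the "verify the path before editing" rule out of the prompt and into the harness is to check every requested path before the edit tool runs. The function name `is_within_workspace` and the idea of calling it from a file-edit tool are hypothetical, not how any particular agent works. Note that a check like this does nothing once the model writes and runs its own script, which is why OS-level sandboxing still matters.

```python
from pathlib import Path


def is_within_workspace(workspace: str, target: str) -> bool:
    """Hypothetical harness-side guard: True only if target stays inside workspace.

    resolve() collapses ".." segments and follows symlinks, so paths like
    "../secrets" or a symlink pointing outside the workspace are rejected.
    An absolute target replaces the root when joined, so it is checked as-is.
    """
    root = Path(workspace).resolve()
    candidate = (root / target).resolve()
    return candidate == root or root in candidate.parents
```

A file-edit tool would call this first and refuse (or ask the user) whenever it returns False, instead of trusting the model to follow the instruction.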
I built www.propelcode.app with separate Linux containers. Unless you disconnect both the container and your computer from the internet, the models can escape the sandbox and get information off your machine.
I am open to being corrected and learning from you if you have a better method of sandboxing.
I am using tmux but not disposable VMs. I have thought about something like that, but honestly some of the debugging work makes ephemeral environments hard to work with. How are you handling that in your workflow?
We kinda need to architect things with the assumption that all token-output from an LLM can be unpredictably sneaky and malicious.
Alas, humans suck at constant vigilance, we're built to avoid it whenever possible, so a "reverse centaur" future of "do what the AI says but only if you see it's good" is going to suck.
I built my own IDE to replace VS Code / Cursor so I could design the harness and ensure that the model's tool access was secure and limited. But the rest of the industry is YOLO.
The first thing I do on any meaningful side project is set up RDS with snapshots. So any startup that doesn't take this one basic step already deserves to fail, in my opinion.
Beyond that, I've used AI agents like crazy; we even have linked MCP servers that let them query the dev database. I haven't seen one try to delete everything a single time. I haven't seen any agent try to do anything destructive. Ever. Perhaps it's just reflecting an outrageously bad engineer and nothing else.
I too have felt the same around me. There is a lack of faith in institutions now, a feeling of distrust. Someone on HN called this the era of shamelessness, and I kind of agree with it. The top has gotten shameless, and the people at the bottom are scrambling for whatever they can get to become one of them so that they can escape this hellhole that has been created.
> I'm also a bit confused about how the people on the top think this will play out.
I don't know if they are really capable of thinking of the second and third order effects of what they're doing. There is something psychologically broken about many of the ultra-rich today where their behavior comes across as compulsive.
When you have a hole in your soul that can't be filled with a billion dollars, it simply can't be filled, and that black hole drives much of their behavior. You look at people like Trump and Musk, and they seem... miserable. Like, have you ever heard Trump have a genuine laugh of joy? Not the sort of sneering snicker of a bully, but one that comes from delight? Because I haven't.
We are all at the mercy of their actions, but it's almost like they're at the mercy of their irrational compulsions too.
Not that I'm saying they are deserving of sympathy or aren't responsible for their actions. But if we're looking for someone to pump the brakes on the crazy that's happening these days, it's sure as hell not going to be those hollow men.
I don't like being conspiratorial, but it genuinely feels like the people at the top know some major catastrophe is coming and are grabbing whatever resources they can while they can before retreating to their bunkers. Even the White House is trying to build a massive underground bunker, using the ballroom on top as an excuse. I don't see why else they would all be willingly destroying society as they are right now unless they don't think it matters.
I don't want to give away too much due to anonymity reasons, but the problems are generally in the following areas (in order from hardest to easiest):
- One problem on using quantum mechanics and C*-algebra techniques for non-Markovian stochastic processes. The interchange between the physics and probability languages often trips the models up, so pretty much everything tends to fail here.
- Three problems in random matrix theory and free probability; these require strong combinatorial skills and a good understanding of novel definitions, requiring multiple papers for context.
- One problem in saddle-point approximation; I've just recently put together a manuscript for this one with a masters student, so it isn't trivial either, but does not require as much insight.
- One problem pertaining to bounds on integral probability metrics for time-series modelling.
Regarding the first problem: are you looking at NCP maps for non-Markovian processes given you mention C*-algebra? Or is it more of a continuous weak monitoring of a stochastic system that results in dynamics with memory effects?
I'd be very curious to know how any LLMs fare. I completely understand if you don't want to continue the discussion because of anonymity reasons.
More of the latter. It's a pet project of mine, and all of the LLMs tend to utterly fail at getting anywhere with it, at least in chats. In an agentic setup, it can chip away at some aspects, but it needs serious guidance on relevant language, notation, and concepts. To me, it demonstrates that the LLMs are not particularly good at crossing literatures, but then again, humans rarely seem to be good at that either...
It would be wonderful to get deeper insight, but I understand that you can't disclose your identity. (You work in an applied research field, right?)
Yes, I do mostly applied work, but I come from a background in pure probability so I sometimes dabble in the fundamental stuff when the mood strikes.
Happy to try to answer more specific questions if anyone has any, but yes, these are among my active research projects so there's only so much I can say.
I saw something like this for a book. It was under an Instagram reel where the person was describing ways to improve your self-esteem. In the comments section, someone mentioned a book that had worked for them, and there were a few replies saying it worked for them too. I searched for the book, and it was a very new book from an unknown author with zero reviews anywhere.
Exact same story at my place. Upper management decided it was a good idea to build on Azure because Microsoft promised some benefits. Things that ran reliably on GCP now need active firefighting on Azure.
I see this being said often but I don't understand.
A lot of people posting there are young and may well be in their first relationship. It makes sense for them to ask a question in the community they spend the most time in, which is Reddit.
Yes, that's what should be said to OpenAI. They shouldn't cry now about their T&Cs not being respected when they never cared about others' copyrights.