In my experience LLMs (I speak mainly of Claude Code & Cursor) write very poor quality Rust.
They treat it like it's JavaScript, falling back to String/&str needlessly instead of making new types. They do ugly `static Mutex<RefCell<` à la global JS variables for info sharing instead of working out the lifetimes to do it properly. They love making functions infallible and then panicking within them, and I certainly wouldn't use them for unsafe at all: they hallucinate safety comments that are, in fact, totally unsound.
Of course these are all surmountable if an experienced developer regularly steps in and unfucks the code, but forcing them into 'harder' territory, where not every problem is solved by a `.clone()` and an `Arc<Mutex<>>`, means they will spend minutes 'thinking' about basic lifetime issues until I step in and add the missing `move` to a closure.
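To make the "new types instead of String" and "fallible instead of panicking" points concrete, here is a minimal sketch of the style I mean. `UserId`, `UserIdError` and the 32-byte limit are hypothetical names and values picked for illustration, not taken from any real codebase.

```rust
/// Hypothetical newtype: a user ID that the compiler will not let you
/// confuse with any other string (an email, a path, a display name).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UserId(String);

/// Hypothetical error type: every way parsing can fail is spelled out,
/// so callers have to handle it instead of hitting a panic at runtime.
#[derive(Debug, PartialEq, Eq)]
pub enum UserIdError {
    Empty,
    TooLong(usize),
}

impl UserId {
    /// Assumed limit, chosen only for this example.
    pub const MAX_LEN: usize = 32;

    /// Fallible constructor: returns `Err` rather than panicking on bad input.
    pub fn parse(raw: &str) -> Result<Self, UserIdError> {
        let trimmed = raw.trim();
        if trimmed.is_empty() {
            return Err(UserIdError::Empty);
        }
        if trimmed.len() > Self::MAX_LEN {
            return Err(UserIdError::TooLong(trimmed.len()));
        }
        Ok(UserId(trimmed.to_owned()))
    }

    /// Borrow the validated ID as a plain string slice.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

Once a function takes `UserId` rather than `&str`, the validation happens in exactly one place and the signature documents it.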
We’ve developed incredibly strict and comprehensive clippy rules and found that to drastically improve the quality of the code as the LLM now should pass all clippy checks. You can add a clippy skill as well to attempt to turn “should” into “must.”
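For anyone wondering what that can look like in practice, a rough sketch follows. `clippy::unwrap_used`, `clippy::expect_used` and `clippy::panic` are existing Clippy lints, but which lints our parent comment's rule set actually contains is not stated, so treat the selection as an assumption. `parse_port` is a hypothetical function used only for illustration.

```rust
use std::num::ParseIntError;

// Assumed lint selection: inside this function, `.unwrap()`, `.expect()`
// and `panic!` become hard errors when the code is checked with
// `cargo clippy`. A plain `cargo build` does not run these checks.
#[deny(clippy::unwrap_used, clippy::expect_used, clippy::panic)]
pub fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    // With the lints above, the error has to be returned to the caller
    // instead of being unwrapped here.
    raw.trim().parse::<u16>()
}
```

The same `deny` attribute can be applied to a module or a whole crate, which is closer to what a strict project-wide rule set would do.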
That is interesting. I make LLMs write C with the general hope that a simpler language they can manage well. That is not entirely true, though. They reason about C fluently indeed. The problem is, Claude pumps lots of bad C into the codebase if left unattended for 5 min.
So, I need some clean-up passes afterwards to get to some acceptable quality level (both by LLMs and my own eyes). At which point, Claude sees the problem clearly, for some mysterious reason.
Also, I use a C dialect heavily influenced by Go (slices, generics, no smart tricks, virtually no malloc).
> general hope that a simpler language they can manage well
It's the opposite; a language with lots of guardrails allows the AI to write better code, especially as it can use the compiler and linter to guide it through the process. It's why OpenAI, for example, was recently able to disprove a theorem: the LLM converted its reasoning into the formal language of a theorem prover, which then checked its work.
Is THAT how people use AI? I thought _I_ was vibe coding by telling it to write one function at a time and making sure I understand every line it outputs.
"Vibe coding" is a term that means "Prompt the LLM with a request for a project and wait for it to finish". If you're reading and understanding anything it made, that's not vibe coding, that's agentic development.
This was true a year ago but not so much anymore. You still have to supervise the agents, but they can write maintainable code if you keep an eye on them.
The simplicity of Gorilla is attractive but for better compression ratios without too much extra compute I'd instead recommend Sprintz: https://github.com/dblalock/sprintz.
The downsides are that (a) Sprintz requires the data to be quantised to fixed-point integers, which is usually fine if the data is coming out of a sensor of some sort, and (b) the Huffman coding step of Sprintz requires dynamic memory allocation, whilst Gorilla is almost trivially implemented without it.
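To illustrate why Gorilla needs no allocation, here is a minimal sketch of the core of its value compression as I understand it from the Gorilla paper: XOR each float with the previous one and keep only the non-zero middle bits. This is just that one step, not the full format (the control bits and bit-packing are left out), and `XorStep` and `xor_step` are hypothetical names.

```rust
/// Result of comparing one sample with the previous one.
#[derive(Debug, PartialEq, Eq)]
pub enum XorStep {
    /// Bit-identical to the previous value; an encoder would emit a single bit.
    Same,
    /// Only the bits between the leading and trailing zero runs need storing.
    Changed {
        leading_zeros: u32,
        trailing_zeros: u32,
        meaningful_bits: u64,
    },
}

/// XOR the raw IEEE 754 bit patterns of two consecutive samples.
/// Everything lives in registers: no heap, no tables.
pub fn xor_step(prev: f64, cur: f64) -> XorStep {
    let xor = prev.to_bits() ^ cur.to_bits();
    if xor == 0 {
        return XorStep::Same;
    }
    let leading_zeros = xor.leading_zeros();
    let trailing_zeros = xor.trailing_zeros();
    XorStep::Changed {
        leading_zeros,
        trailing_zeros,
        // Shift out the trailing zeros so only the informative bits remain.
        meaningful_bits: xor >> trailing_zeros,
    }
}
```

For slowly changing sensor readings most steps are `Same` or have a short run of meaningful bits, which is where the compression comes from.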
An observer falling into the black hole would not observe any distortion in time. They would simply fall in, under the influence of gravity. From the perspective of a far-away observer it would look as if time is slowing down as the photons would take increasingly longer to escape. At the event horizon the photons would effectively be held in place. Eventually though, the last photon will have escaped and you will just observe a slightly larger black hole.
So the merger definitely happens from the point of view of the black holes. We might observe odd artifacts but they would eventually fade away.
Are you suggesting Rust should automatically insert the borrow annotation because it is able to see that a borrow is sufficient? That would be quite unintuitive and make it ambiguous whether a for loop is borrowing or consuming the iterator without reviewing the body. I'd strongly argue that it should unambiguously do either one or the other and not try and read the author's mind.
Yes, I'm suggesting it should do the right thing for the code the loop is actually trying to execute. I personally think this is exactly what Rust and its users have signed up for. I might be mistaken about that, but I think it's in line with the more general view that Rust is attempting to be as close as it can get to a language that reads like it has a garbage collector without having one.
> the more general view that Rust is attempting to be as close as it can get to a language that reads like it has a garbage collector without having one.
I've used Rust a fair amount, and I've never seen that expressed as a goal.
A couple of general principles followed by Rust are to prefer explicit code over implicit conversions and to support local reasoning. Those are both present here: the borrow needs to be made explicitly, rather than implicitly based on code later on.
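A small sketch of the distinction being argued about, using current Rust semantics. `demo` is a hypothetical function; the point is that the `&` at the loop header, not the loop body, decides whether the vector is borrowed or consumed.

```rust
/// Runs the same loop body twice: once borrowing, once consuming.
/// Returns the two totals so the behaviour can be checked.
pub fn demo() -> (usize, usize) {
    let words = vec![String::from("borrow"), String::from("move")];

    // Explicit borrow: `w` is a `&String`, and `words` stays usable afterwards.
    let mut borrowed_total = 0;
    for w in &words {
        borrowed_total += w.len();
    }

    // No borrow: `w` is an owned `String`, and `words` is moved into the loop.
    let mut consumed_total = 0;
    for w in words {
        consumed_total += w.len();
    }

    // Using `words` here would be a compile error, because it has been moved.
    (borrowed_total, consumed_total)
}
```

Both loops only read `w.len()`, so a compiler could in principle infer a borrow for the second one; the current design makes the reader look only at the header to know which happens.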
I think you have misread the abstract. The 'low statistical significance' refers to a [prior work](https://iopscience.iop.org/article/10.3847/2041-8213/acf577). This paper has increased the significance to 3 sigma, which is on the lower end but still quite significant.
This is what an illegal meme looks like: "Tyler Kay, 26, wrote a post... calling for hotels housing asylum seekers to be set alight. He responded to several comments posted by others following his post, adding that it was “100% the plan”.
Kay also reposted... another message inciting action against a named immigration solicitors in Northampton" https://www.cps.gov.uk/cps/news/man-jailed-just-two-days-aft...
I think we are stretching the definition of a meme here. This was original content orchestrating attacks. Not some repost of a joke (however bad taste it might be).
Where is the boundary between meme and instigation?
I'm pretty sure that's the point they're getting at - that the original person commenting was talking about this stuff like it's just memes, in bad faith.
I'm curious if a multimodal model would be better at the OCR step than tesseract? Probably would increase the cost but I wonder if that would be offset by needing less post processing.
I couldn't find any comparisons with Microsoft's TrOCR model. I guess they are for different purposes. But since you used Florence-2 for OCR, did you compare the two?
I don't want to jump to conclusions, but I don't feel confident using gpt4o/claude for OCR, as I often experience issues mentioned on this page https://github.com/Yuliang-Liu/MultimodalOCR
[edit] But it is not applicable to OCR specialised models like Florence-2
IME GPT-4V is a lot better than Tesseract, including on scanned document PDFs. The thing about frontier models is they aren’t free, but they keep getting better too. I’m not using Tesseract for anything anymore; for my tasks it’s obsolete.
My experience is that at least the models which are price-competitive (~= open weight and small enough to run on a 3/4090 - MiniCPM-V, Phi-3-V, Kosmos-2.5) are not as good as Tesseract or EasyOCR. They're often more accurate on plain text where their language knowledge is useful but on symbols, numbers, and weird formatting they're at best even. Sometimes they go completely off the rails when they see a dashed line or handwriting or an image, things which the conventional OCR tools can ignore or at least recover from.
I found Claude 3 great at reading documents. Plus it can describe figures. The only issue I ran into was with a 2-column article: if reading the first line of each column together "kinda made sense", it would treat the entire thing as 1 column.