Hacker News | dalke's comments

You might consider prefixing with 'ChatGPT claims…' as a clearer expression of uncertainty.

The effort to fact check with LLMs is also high. Here's an example from a few days ago.

Someone used AI to generate an image in the style of a Charles Schulz Peanuts cartoon.

Someone else observed that there were 5 fingers on the characters, and quoted Google AI as saying “Charlie Brown, along with other Peanuts characters, is generally depicted with four fingers on each hand (three fingers and one thumb) ...”

Yet if you go to the Wikipedia entry at https://en.wikipedia.org/wiki/Peanuts you'll see the kids have 5 fingers. Or take a look at the actual cartoons. Or read the TVTropes entry https://tvtropes.org/pmwiki/pmwiki.php/Main/FourFingeredHand... under "Comic Strips".

Fact checking this with human sources is easy and unambiguous. Meanwhile, LLMs are being trained on claims that many cartoon characters have only a thumb and three fingers - it is a trope for a reason - so isn't it logical for LLMs to give the wrong answer for a comic where the human characters are actually drawn with 5 fingers?

My experience with LLMs is they keep getting things wrong, when details matter.

Do you ask the LLM to fact check everything? (In which case, why isn't that part of the standard prompt?) Or do you only ask to fact check things where you are unsure about the answer? (In which case, is it the algorithm telling you what you want to hear?) When do you stop the fact checking?


> When do you stop the fact checking?

Exactly the same calculus as fact checking anything else from any other source. What are the social/economic/ethical consequences to me if the answer is wrong or inaccurate or incomplete? How much time do I have to check? How thorough should I be?

I imagine this calculus isn't really that different for most people. Or is it?

As for your example, I believe it. But I also feel it's a rather outlier example involving image comprehension of an obscure factoid. That isn't typical of how I use LLMs which is mostly as text-based question answering engines and not what I had in mind when writing the comment.

I guess LLMs for image comprehension need a much higher level of skepticism.


Well, in my case going to a Peanuts comic and looking at hands was pretty easy, and didn't involve any questions about negative environmental or labor consequences, the massive hammering of web sites to gather data, centralization of power, and the like.

Like, "!w Peanuts" in my search bar, look at the image, and count fingers.

"a rather outlier example"

You wrote that you use AI to find "obscure connections" - aren't those all by definition outliers?

"mostly as text-based question"

I just now asked Google AI "how many fingers are on charlie brown's hand?"

It replied "In the Peanuts comic strip, Charlie Brown and the rest of the gang are traditionally drawn with four fingers (or three fingers and a thumb) on each hand."

No image comprehension, exactly as you had in mind. And completely false.

And that's from a training corpus which almost certainly includes statements that the kids are drawn with 5 fingers, since I confirmed that info on TVTropes and Reddit comments, like https://www.reddit.com/r/pics/comments/swod8/charlie_brown_h... .


HN isn't showing me a reply option for your latest comment, so I'll reply here instead.

Just to clarify, I used plain Google search, not Google AI mode, and opened search results which seemed "reputable," without knowing much about the Peanuts cartoon or cartooning.

I had no idea at all about archive.org having it and didn't see it listed in the first two pages of search results.

I still find it confusing, especially given what the Variety.com link says which doesn't mention orientation. If the acceptable explanation for 4 vs 5 is orientation, why is it wrong when the AI generated 4 fingers? Does it not match the rest of the orientation?

Anyway, I'm not sure where this leaves LLMs. I'll explore image capabilities when I get some opportunity and keep your comment in mind.


The comment about using Google was more of a curiosity. I hadn't seen the Variety link until yesterday, when I went to Google to reproduce the answer and verify it came from a text query, not an image query. Both Google AI and one of the top answers included that Variety link. When you mentioned it again, that strongly suggested you were using Google as your primary search method.

I think the right way to interpret the Variety link is that it's a single paragraph about trying to capture the feel of the comic using 3D software. As you saw from Charlie Brown holding a baseball, Schulz didn't go for a realistic look, but he still conveyed the sense of grasping. Modeling all five fingers all the time would not give the movie the right feel.

I wonder now if Google AI incorporates text from the top results into its answer.

"why is it wrong when the AI generated 4 fingers?"

The original discussion was when person X used AI to generate an image "in the style of Charles Schulz" where the Peanuts characters had 5 fingers, then person Y noted the use of 5 fingers instead of the 4 which is common in comics and cartoons, and quoted Google AI as saying Peanuts was traditionally drawn with 4 fingers.

Yesterday I verified that Google AI would generate the same wrong answer with a text query, so it was not an image interpretation issue.

FWIW, after looking at a few hundred Peanuts cartoons, I can confidently say the AI generated image was not in the style of Schulz. The generated fingers were too realistic, and the background too complicated and detailed. :)

For me, this is another example of why primary sources should be the first thing to consult when fact checking - not LLMs (my experience is they are horrible at details), and not secondary sources (which have their own biases).

Not everything has easily-accessed primary sources, but many do. I think it's all too easy to fall into the trap of accepting the LLM answer because it feels right and is easy to generate. At https://freethoughtblogs.com/stderr/2025/01/18/ai-art-just-r... you'll see someone asked an LLM which river Marbot swam across to spy on the enemy camp. It replied "Elbe". Then I did a text search of an English translation of the book and found he used a boat to cross the Danube to spy on the enemy camp, and that he swam into freezing waters to save an enemy soldier.

Again, do you ask the LLM to fact check itself every single time? If that's useful, why isn't it built into the prompt? Or, if you are supposed to double-check the LLM yourself, why would you consult a secondary source if the primary source is so easy to find and search? And in that case, why not just use the primary source?

Further, if you aren't in the habit of checking primary sources then you won't have the experience to know how to find and check primary sources.


Even as a human, I find whatever sources Google shows to be inconsistent. I can't give any confident answer about the number of fingers. I think the answer is actually "4 sometimes and 5 other times."

So I'm not sure how much LLMs can handle this kind of inconsistency between "reputable" visual sources and text sources, nor how representative this example is.

A "reputable source" like Variety says this...

https://variety.com/2015/film/spotlight/charlie-brown-steve-...:

> “The rig would automatically move the features around so it would match the way Charles Schulz drew the character,” Heller says. ... In some drawings, Charlie Brown has just three fingers, while in others, he has five

Images from another website...

https://cartoonresearch.com/index.php/cartoons-at-bat-part-1... :

1. https://cartoonresearch.com/wp-content/uploads/2025/09/Lost-... -> 4 fingers

2. https://cartoonresearch.com/wp-content/uploads/2025/09/image... -> 4 fingers

Anyway, this wasn't the type of obscure connection I was referring to, though I can understand you interpreting it that way.

Personally I think this example supports what I said about "reputable sources." They can't be blindly trusted either because they may be inconsistent with each other and which one we choose to believe (Reddit.com or TVTropes.com or Variety.com) becomes entirely subjective.


Your first link was cited in the 2nd half of Google AI's answer, and one of the top Google answers, so I think you are using Google as your information source.

The large majority of the images you link to show kids with 5 fingers, as well as 5-fingered baseball gloves. The cases of four fingers are due to orientation.

Your "1." also shows Marcie with five fingers. You see Charlie Brown with 4 fingers because he's holding a baseball. In 2. he's also holding a baseball. You would not see 5 fingers on one side because doing so would look strange.

In your unlabeled "0." there are plenty of kids with 5 fingers. There are some with fewer, but they are holding things or drawn in a way that suggests we are seeing the hand from the side.

I don't understand your hesitancy. Your own samples should be enough for you to decisively conclude that the Google AI's claim that Peanuts was "traditionally drawn with four fingers (or three fingers and a thumb) on each hand" is wrong. If not, it sure seems like you trust Google AI over your own eyes. Why are you so hesitant to agree?

My point is that you don't need to consult secondary sources when the primary sources are easily available.

When this came up a few days ago, I spot checked the complete works of Peanuts, from a collection on archive.org at https://archive.org/details/peanutscomics19502000/Volume%201... . The consistent pattern across the nearly 50 years of Peanuts is the kids have five fingers unless obscured by orientation or objects.

You can do that yourself, and triple-check that Google AI's answer is clearly wrong.

Thus, I think it's a good example of how fact checking with LLMs can lead people astray. The large negative externalities I mentioned, combined with their well-known tendency to make incorrect statements, make them a very poor starting point when the primary source, at least in this case, is so easy to access.

If most of the sources are wrong, and LLMs are being trained on those, isn't it logical that the latter will also likely output that same wrong information?

When do you know if most of the sources are wrong, unless you yourself know most of the sources are wrong?


The README says "PyMOL-RS is a clean-room rewrite" but when I look at ./pymol-mol/src/dss.rs I see things like:

  //! - PyMOL's layer3/Selector.cpp - SelectorAssignSS function
  //! - PyMOL's layer2/ObjectMolecule2.cpp - ObjectMoleculeGetCheckHBond function
  //! - PyMOL's layer1/SettingInfo.h - Default angle thresholds
and "matching PyMOL's cSS* flags from Selector.cpp"

While the Rust code is cleaned up and easier to read, I can see that it preserves similar data flow, uses similar variable names, and of course identical constants.

For example, this is PyMol layer3/Selector.cpp:

          /* look for antiparallel beta sheet ladders (single or double) 
          ...
          */

          if((r + 1)->real && (r + 2)->real) {

            for(b = 0; b < r->n_acc; b++) {     /* iterate through acceptors */
              r2 = (res + r->acc[b]) - 2;       /* go back 2 */
              if(r2->real) {

                for(c = 0; c < r2->n_acc; c++) {

                  if(r2->acc[c] == a + 2) {     /* found a ladder */

                    (r)->flags |= cSSAntiStrandSingleHB;
                    (r + 1)->flags |= cSSAntiStrandSkip;
                    (r + 2)->flags |= cSSAntiStrandSingleHB;

                    (r2)->flags |= cSSAntiStrandSingleHB;
                    (r2 + 1)->flags |= cSSAntiStrandSkip;
                    (r2 + 2)->flags |= cSSAntiStrandSingleHB;

                    /*                  printf("anti ladder %s %s to %s %s\n",
                       r->obj->AtomInfo[I->Table[r->ca].atom].resi,
                       r->obj->AtomInfo[I->Table[(r+2)->ca].atom].resi,
                       r2->obj->AtomInfo[I->Table[r2->ca].atom].resi,
                       r2->obj->AtomInfo[I->Table[(r2+2)->ca].atom].resi); */
                  }
                }
              }
            }
and this is pymol-rs's pymol-mol/src/dss.rs

        // Antiparallel ladder: i accepts j, (j-2) accepts (i+2)
        if a + 2 < n_res && res[a + 1].real && res[a + 2].real {
            for &acc_j in &acc_list {
                if acc_j < 2 || !res[acc_j].real {
                    continue;
                }
                let j_minus_2 = acc_j - 2;
                if !res[j_minus_2].real {
                    continue;
                }
                let acc_jm2_list: Vec<usize> = res[j_minus_2].acc.clone();
                for &acc_k in &acc_jm2_list {
                    if acc_k == a + 2 {
                        res[a].flags |= SsFlags::ANTI_STRAND_SINGLE_HB;
                        res[a + 1].flags |= SsFlags::ANTI_STRAND_SKIP;
                        res[a + 2].flags |= SsFlags::ANTI_STRAND_SINGLE_HB;
                        res[j_minus_2].flags |= SsFlags::ANTI_STRAND_SINGLE_HB;
                        if acc_j >= j_minus_2 + 2 {
                            res[j_minus_2 + 1].flags |= SsFlags::ANTI_STRAND_SKIP;
                        }
                        res[acc_j].flags |= SsFlags::ANTI_STRAND_SINGLE_HB;
                    }
                }
            }
        }
That's close enough that I really think you should include the PyMol license info, before Schrödinger's lawyers notice.


Over the last few days I have learned that code generation tools are increasingly used to create a "clean room" version of a product, using a definition which is far from the term's standard use.

See https://tuananh.net/2026/03/05/relicensing-with-ai-assisted-... with discussion at https://news.ycombinator.com/item?id=47257803

I believe your use of "clean room" is another example of misusing the term.

Could you clarify how it was developed? Who had access to the original source code? Were code generation tools used, and if so, how? Was the PyMol source part of the training set for those tools? How did you ensure no copyright violations?

Warren was a friend of mine, and a passionate believer in open source software. He wanted people to be able to modify PyMol for their own purposes, and asked only for a license acknowledgment. Schrodinger, to their great credit, continues to honor Warren by maintaining the Open-Source PyMOL product.

If this project was not developed under true clean room practices, I ask that you continue to honor his work by including the PyMol license in your Rust rewrite.

If it was true clean room development, why does ./crates/pymol-algos/src/align/ce.rs say "This is a faithful port of PyMOL's `ccealignmodule.cpp`.", with comments like "Equivalent to PyMOL's `calcS`" and references to the original code in comments, like: "PyMOL: for (row = 0; row < wSize - 2; row++)"?


Dear Andrew, first of all — thank you so much for your feedback, both the technical and the legal parts. Your earlier comments about SDF and PDB parsing corner cases are incredibly valuable.

PyMOL has been one of my primary tools for 15 years, and I've always held it in deep respect. This project was born entirely out of a desire to contribute something to molecular visualization in the modern world — something fast, modular, and with qualities I've been missing in existing tools. And as a source of inspiration, I took the best one: PyMOL.

Of course I spent a lot of time reading and studying its code, and I openly took concepts and algorithms from it. I don't hide that — it's why the project carries the name it does, and it's why the README has had an Acknowledgments section since the very first commit: "Inspired by PyMOL, created by Warren Lyford DeLano. This is an independent reimplementation, not affiliated with Schrödinger, Inc."

You are absolutely right about the "clean-room" wording — I used it loosely, meaning "rewritten from scratch in a different language with a different architecture," not in the legal sense. That was misleading, and I've already removed it from the README.

You're also right that DSS or CE was a fairly direct port of PyMOL's algorithm, and it should carry proper attribution. At the same time, many other parts — surface generation, cartoon rendering, the shading pipeline — are done quite differently, and the gap keeps growing. But that doesn't excuse insufficient attribution where code was closely followed.

Going forward, I'm focusing on genuinely new functionality — Rust plugin system, web interface, novel shading models (try set shading_mode, skripkin!) — things the original PyMOL never had. But this is not an attempt to distance the project from DeLano's creation. It's a respectful continuation of his ideas in a completely new product.

Thank you again — your comments are making this project better.


You might mention it in other forums, like the RDKit mailing list (though that's almost moribund).

I looked at the SDF reader, since that's what I know best. I see a few things which look like they need revisiting.

Line 75 has 'if name == "$$$$" {return self.parse_molecule();}'. This isn't correct. It means the record name is "$$$$" (if you are RDKit), or it means the record is in the wrong format (if you are the CTFile specification, which explicitly prohibits that).

Also, Rust doesn't guarantee tail-call elimination, so the recursive nature of the code makes me think parsing a file containing 1 million lines of the form "$$$$\n" would likely blow the stack.
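
To illustrate - this is just my own sketch in Rust, not code from pymol-rs, and the function name is made up - a plain loop over the line iterator avoids the unbounded recursion when skipping stray "$$$$" lines:

  fn next_record_start<I: Iterator<Item = String>>(lines: &mut I) -> Option<String> {
      // Skip stray "$$$$" terminator lines iteratively rather than by
      // re-entering the parser, so a pathological million-line file of
      // "$$$$\n" lines can't overflow the stack.
      while let Some(line) = lines.next() {
          if line.trim_end() != "$$$$" {
              return Some(line); // title line of the next record
          }
      }
      None // end of input
  }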

In principle the version number test for V2000 or V3000 should look at the specific column numbers, and not just for its presence somewhere in the line. Someone like me might place a "V3000" in the obsolete fields, with a "V2000" in the correct vvvvvv field. ;)

The "Skip to end of molecule" code will break on real-world datasets. One classic problem is a company which used "$", "$$", "$$$" and "$$$$" to indicate cost, stored as tag data like:

  > <price>
  $$$$

  $$$$
where the first "$$$$" is part of the data item, and the second "$$$$" is the end of the SD record. This ended up causing a problem when an SDF reader somewhere in their system didn't parse data items correctly. (Another common failure in data item parsing is to ignore the requirement for a newline after the data item.)

I talk about "$$$$" more at http://www.dalkescientific.com/writings/diary/archive/2020/0... .
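
To make the distinction concrete, here's a sketch with made-up names, assuming each record's data block is a series of "> ..." headers, each followed by value lines and then a blank line - so a "$$$$" inside a value must not end the record:

  fn skip_record<I: Iterator<Item = String>>(lines: &mut I) {
      let mut in_data_item = false;
      for raw in lines {
          let line = raw.trim_end();
          if in_data_item {
              // Value lines (which may legitimately be "$$$$") run until a blank line.
              if line.is_empty() {
                  in_data_item = false;
              }
          } else if line.starts_with('>') {
              in_data_item = true; // a "> <tag>" header starts a data item
          } else if line == "$$$$" {
              return; // the real end of the SD record
          }
      }
  }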

Then there's the "S SKP" field, which you'll almost certainly never see in real life! I've only seen it used in a published example of a JICST extended MOLfile. See http://www.dalkescientific.com/writings/diary/archive/2020/0...

Please don't let these comments get you down! These details are hard to get right, and not obvious. It took me years to learn the rare corner cases.

I also haven't done molviz since the 1990s, or used PyMol (I was a VMD person), so I can't say anything about the overall project. We started with GL, and had to port to OpenGL. :)

PS. A bit of history for you. PyMol's and VMD's selection syntax look similar because both drew on the syntax in Axel Brunger's X-PLOR. Warren DeLano came out of Brunger's lab, and VMD was from Schulten's group, which were X-PLOR users. (Schulten was Brunger's PhD advisor.)


I looked at the PDB parser.

Will you be adding support for using duplicate CONECT records to store bond type information? That's a RasMol extension that PyMol supports. You'll also need to support the pdb_conect_nodup option in the writer.

I see you interpret atom/hetatm serial numbers as an integer. Will you be using the base-36 or hybrid-36 variants (see https://cci.lbl.gov/hybrid_36/), which are a common way to handle more than 99,999 atoms?
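
For reference, decoding a width-5 hybrid-36 serial looks roughly like this - a sketch based on my reading of the cci.lbl.gov description, not code from any particular library, and the function name is mine:

  fn decode_serial5(field: &str) -> Option<i64> {
      // Width-5 hybrid-36: 0..=99999 is plain decimal; larger values continue
      // in base 36 with an uppercase first character ("A0000" = 100000), then
      // with a lowercase first character ("a0000" = 43770016).
      const POW4: i64 = 36 * 36 * 36 * 36; // 36^4
      let s = field.trim();
      let first = s.chars().next()?;
      let base36 = |t: &str| -> Option<i64> {
          t.chars()
              .try_fold(0i64, |acc, c| c.to_digit(36).map(|d| acc * 36 + d as i64))
      };
      if first.is_ascii_digit() {
          s.parse::<i64>().ok() // ordinary decimal serial number
      } else if first.is_ascii_uppercase() {
          Some(base36(s)? - 10 * POW4 + 100_000)
      } else if first.is_ascii_lowercase() {
          Some(base36(s)? - 10 * POW4 + 100_000 + 26 * POW4)
      } else {
          None
      }
  }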

Again, these are corner cases which come with experience. I've no expectation that a new program would handle them. I want you to know about them since they will be issues if you expect long-term uptake.


I see you include the dot disconnect "." as part of the Bond definition.

You also define Chain as:

  Chain <<= pp.Group(pp.Optional(Bond) + pp.Or([Atom, RingClosure]))
I believe this means your grammar allows the invalid SMILES C=.N


That's "SMILES".

Yes. Here is the yacc grammar for the SMILES parser in the RDKit. https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Smi...

There's also one from OpenSMILES at http://opensmiles.org/opensmiles.html#_grammar . It has a shift/reduce conflict (as I recall) that I was not competent enough to fix.

I prefer to parse almost completely in the lexer, with a small amount of lexer state to handle balanced parens, bracket atoms, and matching ring closures. See https://hg.sr.ht/~dalke/opensmiles-ragel and more specifically https://hg.sr.ht/~dalke/opensmiles-ragel/browse/opensmiles.r... .
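
As a toy illustration of what I mean by "a small amount of lexer state" - my own sketch, nothing like the full ragel grammar, and it ignores bonds, "%nn" closures, and everything inside bracket atoms:

  use std::collections::HashSet;

  fn quick_check(smiles: &str) -> Result<(), String> {
      let mut depth = 0usize;              // open '(' branches
      let mut open_rings = HashSet::new(); // ring-closure digits seen an odd number of times
      let mut in_bracket = false;          // inside a [...] bracket atom
      for c in smiles.chars() {
          if in_bracket {
              in_bracket = c != ']';       // treat bracket-atom contents as opaque
              continue;
          }
          match c {
              '[' => in_bracket = true,
              '(' => depth += 1,
              ')' => {
                  if depth == 0 { return Err("unmatched ')'".into()); }
                  depth -= 1;
              }
              '0'..='9' => {
                  // a digit either opens a ring bond or closes the matching one
                  if !open_rings.remove(&c) { open_rings.insert(c); }
              }
              _ => {}                      // atoms, bond symbols, '.', etc. not checked here
          }
      }
      if in_bracket { return Err("unterminated bracket atom".into()); }
      if depth != 0 { return Err("unmatched '('".into()); }
      if !open_rings.is_empty() { return Err("unclosed ring bond".into()); }
      Ok(())
  }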



Link to the actual "etude": https://archive.org/details/wetherell-etudes-for-programmers... (Wetherell has a number of small, interesting programming "etudes", one of which is to write a TRAC interpreter.)

Some background. Calvin Mooers developed TRAC - the programming language - in the 1960s for people who were not computer scientists. (A phrase he used in the writing of the time was "duffers", though I don't know if he specifically applied it to TRAC users.)

It was the first homoiconic language. There was a group of teens interested in programming who hung out with Mooers. One of these was L Peter Deutsch, who at 18 (and a year after writing LISP 1.5 for the PDP-1) helped develop the TRAC language and wrote the first TRAC implementation. Deutsch later implemented Ghostscript.

About 10 years later, Ted Nelson's "Computer Lib" suggested TRAC as one of the first three programming languages to start with. This made people more widely aware of TRAC, and of course people did their own implementations, as seen in this link.

Mooers, though, was very protective about what was "his." He pushed for software copyright protection back in the 1960s. The best he could do was trademark the term "TRAC", and send cease-and-desist letters when someone used it. See this article from the first issue of Dr. Dobb's Journal: https://archive.org/details/dr_dobbs_journal_vol_01/page/n12...

I talked with someone who had met one of Mooers' daughters around Cambridge. He knew who Mooers was, and had (as I recall) a copy of Computer Lib with him. He got invited to dinner with the Mooers family. All went well, until he revealed he had written a version of TRAC for himself. This was a sore point. Mooers got up and left. His wife commented that Mooers didn't like others playing with his toys.


Could someone explain what "democratising" means here? Is it any different than "user-friendly", "enabling" or "simplifying"?


On the return from my first trip to South Africa I carried 12 bottles of wine in my luggage.

That was back when flights included two free checked items.

On my second trip to Europe one of my suitcases was full of T-shirt swag to give out at a conference. Lugging both up the stairs, across the train tracks, and back down was a hassle.

Both of these were over 20 years ago.

And then there's the story at https://notalwaysright.com/a-steam-powered-cruise/392530/ of a couple trying to bring a full-size espresso machine on their cruise, so they can have their special coffee.


> On the return from my first trip to South Africa I carried 12 bottles of wine in my luggage.

Totally understandable! Amazing wine - I didn't want to leave Constantia. But I picked myself up and dragged myself to Stellenbosch!

And at the end of the trip, I didn't want to return home. Such an incredible country that still holds a very special place in my heart


You didn't have to pay duty on them?


I ... probably did?

I didn't.

Things were a lot looser then. I brought a 6-pack of Negra Modelo as carry-on for a trip to Europe. The airport x-ray staff in Albuquerque recognized it on the screen, which impressed me. They had no problem with it.


I was a nomad for about a year. Towards the end I was tired of the constant leaving.

I asked for advice from someone at an NGO who moves countries often. She said what happens is that the other NGO members become part of your extended connections, which helps with that situation.

Even when I was a nomad, I couldn't have done without a suitcase. My big hobby then was dancing - mostly salsa and tango - and I needed several changes of clothes and dance shoes. And, umm, not all black clothes.

To make it worse, indoor smoking was legal, so I would come home with stinky clothes that I wouldn't want to wear again until washing.

I also did some upper undergrad/grad level visiting teaching, and would stay at a staff member's home, or in one case the home of the parents of one of the grad students. I brought a dozen or so greeting-style cards with nice pictures of the city I used to live in, so I could leave them as a thank-you, with an image of what for them would be an exotic place.


I went backpacking last year for only a little over a month. Absolute pain in my chest when someone who I'd gotten to know over the past few days said it was their last day lol.

