Hacker News | jdnier's comments

Finally, a bracket I can enjoy (that doesn't involve basketball).

> I think Claude Shannon’s spirit is probably proud to know that his name is now being associated with such advances. Hats off to Claude!

I didn't realize Claude was named after Claude Shannon!

https://en.wikipedia.org/wiki/Claude_Shannon


Trivia: Claude Shannon proposed the idea of predicting the next token (in his case, the next letter) from statistics gathered over a training corpus back in 1950: "Prediction and Entropy of Printed English" https://languagelog.ldc.upenn.edu/myl/Shannon1950.pdf


It goes back a bit further than that. His 1948 "A Mathematical Theory of Communication" [1] already contains (what we would now call) a Markov chain language model, page 7 onwards. AFAIK this was based on his classified WWII work, so it was probably a few years older still.

[1] https://people.math.harvard.edu/~ctm/home/text/others/shanno...


I was just reading Norbert Wiener's "The Human Use of Human Beings" (1950) and this quote gave me a good chuckle:

"One may get a remarkable semblance of a language like English by taking a sequence of words, or pairs of words, or triads of words, according to the statistical frequency with which they occur in the language, and the gibberish thus obtained will have a remarkably persuasive similarity to good English."
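The word-pair gibberish Wiener describes is easy to reproduce. A minimal sketch in Python (the toy corpus and function names here are made up for illustration): count how often each word follows each other word, then sample a chain according to those frequencies.

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Count, for each word, how often each following word occurs."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, n=10, seed=0):
    """Sample a word sequence according to the observed pair frequencies."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n - 1):
        successors = counts.get(out[-1])
        if not successors:  # dead end: no word ever followed this one
            break
        words, weights = zip(*successors.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

corpus = ("the cat sat on the mat and the dog sat on the log "
          "and the cat saw the dog")
model = train_bigrams(corpus)
print(generate(model, "the", n=8))
```

Every adjacent pair in the output occurs somewhere in the corpus, which is exactly why the result reads as "remarkably persuasive" gibberish: locally plausible, globally meaningless.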


A letter is not a token, is it? Redundancy could hit 75% in long sentences, but Shannon was not predicting tokens or words, he was predicting letters (characters).


It's like the diesel engine, which is named after Rudolf Engine.


:|


Is this a joke I don't get? His name was Rudolf Diesel, right?


Yes, it is a fantastic joke and I laughed for ages, well played.


Here I was assuming it was named after https://en.wikipedia.org/wiki/Claude_(alligator)


And Claude had a collection of cycles: unicycles. Unfortunately the article is about something else altogether.


Last time I asked, Claude itself didn't know either.


Wait till you hear about nvidia and their GPU architecture naming scheme :)


I had not heard of Glyphs, the tool the author used. I used to use Fontographer long ago.

https://glyphsapp.com/learn/recommendation:get-started

It's a great article!


Also a Fontographer user here. That's how you know you did font design in the late 90s.


DuckDB has a new "DuckLake" catalog format that would be another candidate to test. https://ducklake.select/


For me the issue is that DuckLake's feature of flushing inlined data to Parquet is still in alpha. One of the main problems with Parquet is that writing small batches leaves you with a lot of small Parquet files that are inefficient to work with using DuckDB. To solve this, DuckLake inlines these small writes into the DBMS you choose (e.g. Postgres), but for a while it couldn't write them back out to Parquet. Last I checked that feature didn't exist yet; it now seems to be in alpha, which is nice to see, but I'd like more solid support before I consider switching some personal data projects over. https://ducklake.select/docs/stable/duckdb/advanced_features...


Data inlining is also currently limited to the DuckDB catalog (i.e. it doesn't work with Postgres catalogs) [0]. It's improving very quickly, though, and I'm sure this will be expanded soon.

[0] https://ducklake.select/docs/stable/duckdb/advanced_features...


The DuckLake format has an unresolved, built-in chicken-and-egg conflict: it requires an SQL database to hold its catalog, but that is exactly what some people are running away from when they choose the Parquet format in the first place. Parquet = easy, SQL = hard; adding SQL to Parquet makes the resulting format hard. I would expect the catalog to be in Parquet format as well; then it becomes something self-bootstrapping and usable.


DuckLake is more comparable to Iceberg and Delta than to raw Parquet files. Iceberg requires a catalog layer too, a filesystem-based one at its simplest. For DuckLake any RDBMS will do, including file-based ones like DuckDB and SQLite. The difference is that DuckLake uses that database, with all its ACID goodness, for all metadata operations, so there is no need to implement transactional semantics over a REST or object-storage API.


It is not a chicken-and-egg problem; it is just a requirement that systems like DuckLake and Hive have an RDBMS available to store their catalogs in. Metadata is relatively small and needs ACID reads and writes, which is a great RDBMS use case.


What about file-based catalogs with Iceberg? Found one that puts it in a single json file: https://github.com/boringdata/boring-catalog


Then concurrency suffers, since you have to take locks when you update files.

That's also why DuckLake performs better than the others.

For many use cases this trade-off is worth it.


Yesterday there was a somewhat similar DuckDB post, "Frozen DuckLakes for Multi-User, Serverless Data Access". https://news.ycombinator.com/item?id=45702831


I set up something similar at work. But it was before the DuckLake format was available, so it just uses manually generated Parquet files saved to a bucket and a light DuckDB catalog that uses views to expose the parquet files. This lets us update the Parquet files using our ETL process and just refresh the catalog when there is a schema change.

We didn't find the frozen DuckLake setup useful for our use case. Mostly because the frozen catalog kind of doesn't make sense with the DuckLake philosophy and the cost-benefit wasn't there over a regular duckdb catalog. It also made making updates cumbersome because you need to pull the DuckLake catalog, commit the changes, and re-upload the catalog (instead of just directly updating the Parquet files). I get that we are missing the time travel part of the DuckLake, but that's not critical for us and if it becomes important, we would just roll out a PostgreSQL database to manage the catalog.


This also reminded me of an approach using SQLite: https://news.ycombinator.com/item?id=45748186


Looking up the etymology of "sargeable", I found this StackOverflow answer: https://dba.stackexchange.com/a/217983

And Google explains: the term "sargable" is a portmanteau of "Search ARGument ABLE," from SQL's query-optimization context.
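A quick way to see sargability in action is SQLite's EXPLAIN QUERY PLAN (the table and index names below are made up for illustration). A bare range predicate on an indexed column can seek into the index; wrapping the same column in a function hides its ordering and forces a full scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created TEXT)")
con.execute("CREATE INDEX idx_created ON orders(created)")

# Sargable: the indexed column stands alone on one side of each comparison,
# so the planner can seek into idx_created.
sargable = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders "
    "WHERE created >= '2024-01-01' AND created < '2025-01-01'"
).fetchall()

# Non-sargable: the function call on the column defeats the index,
# forcing a scan of the whole table.
non_sargable = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders "
    "WHERE substr(created, 1, 4) = '2024'"
).fetchall()

print(sargable)      # plan detail mentions SEARCH ... idx_created
print(non_sargable)  # plan detail mentions SCAN
```

The same principle applies in Postgres, SQL Server, etc., which is why "keep the column bare, move the math to the constant side" is standard indexing advice.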


If you want to do this rigorously, I suggest you read Robert D. Cameron's excellent paper "REX: XML Shallow Parsing with Regular Expressions" (1998).

https://www2.cs.sfu.ca/~cameron/REX.html
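For a flavor of the approach, here is a much-simplified sketch in Python. Cameron's paper develops a fully worked-out grammar; this version skips edge cases (e.g. ">" inside attribute values) but, like REX, splits a document into markup and text items with a single regular expression and never builds a tree.

```python
import re

# Simplified shallow-parsing pattern in the spirit of REX: alternatives are
# tried in order, so comments, PIs, and CDATA are recognized before plain tags.
XML_SHALLOW = re.compile(
    r"<!--.*?-->"            # comment
    r"|<\?.*?\?>"            # processing instruction
    r"|<!\[CDATA\[.*?\]\]>"  # CDATA section
    r"|<[^>]*>"              # start/end/empty-element tag (simplified)
    r"|[^<]+",               # intervening text content
    re.DOTALL,
)

def shallow_parse(doc):
    """Return the document as a flat list of markup and text items."""
    return XML_SHALLOW.findall(doc)

doc = '<p id="x">Hello <!-- hi --><b>world</b>!</p>'
print(shallow_parse(doc))
```

A useful invariant of shallow parsing, which Cameron emphasizes, is that concatenating the items reproduces the input exactly: nothing is consumed or normalized.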


That must be for "Dissolution and crystallization of cobalt, copper and sodium chlorides". It's quite something to watch!

https://www.nikonsmallworld.com/galleries/2025-small-world-i...


> should buy the books

Yes, I totally will, err... oh my: ebook for $91.99, paperback for $127.99. What's going on with these prices? These aren't college textbooks. I'm glad to hear about the 3rd edition, but the cost gives me pause.


I recall them being less expensive when first released.

Either there has been a new printing done overseas that was affected by tariffs (or made more expensive for some other reason), or the copies in the warehouse were taxed as inventory. Blame Congress for the latter: it was a major change in the tax law, and it created the current mess of remaindered books, no back-lists, and ever-spiraling book prices.


The paper they've submitted goes into a lot more detail. https://royalsocietypublishing.org/doi/10.1098/rspa.2025.029...

