Hacker News | ai-inquisitor's comments

That model is "open weight", not open source. We have no idea what data Moonshot trained on.

I think we lost that terminology war. Open source models mean open weight. There are only a couple examples of fully open source models with open data and code, and the labs are not incentivized to go that far.

Enforcement of the device restriction would also mean they are collecting information about the app from your device.

It's not doing that. If you look at the repository, it's adding a new commit with tiny parquet files every 5 minutes. The most recent one was only a 20.9 KB parquet file: https://huggingface.co/datasets/open-index/hacker-news/commi... and the ones before it were a median of 5 KB: https://huggingface.co/datasets/open-index/hacker-news/tree/...

The bigger concern is how large the git history is going to get on the repository.


I recall that this became a big problem for the Homebrew project in terms of load on the repo, to the extent that GitHub asked them not to recommend/default-enable shallow clones for their users: https://github.com/Homebrew/brew/issues/15497#issuecomment-1...

This is likely to be lower traffic, and the history should (?) scale only linearly with new data, so likely not the worst thing. But it's something to be cognizant of when using SCM software in unexpected ways!


How would shallow clone be more stressful for GitHub than a regular clone?

Shallow clones (and the resulting lack of shared history data) break many assumptions that packfile optimisations rely on.

See also: https://github.com/orgs/Homebrew/discussions/225


This makes more sense. I still wonder if the author isn't just effectively recreating Apache Iceberg manually here.

I intentionally kept it lightweight. Just Parquet files + simple partitioning + commits on Hugging Face. That already covers most of what I need, without introducing a heavier stack or extra dependencies.

Also, I wanted something that is easy to consume anywhere. With this setup, you can point DuckDB or Polars directly at the data and start querying, no catalog or special tooling required.
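For instance, something along these lines. The partition layout here is illustrative, not the repository's documented schema; adjust the path to the real layout:

```python
# Sketch: build a DuckDB-style query against Parquet files hosted on
# Hugging Face. Only string construction happens here; you'd hand the
# result to duckdb.sql() (or an equivalent) yourself.
from datetime import date

def hf_parquet_glob(repo: str, day: date) -> str:
    """Return an hf:// glob for one day's Parquet partition (hypothetical layout)."""
    return f"hf://datasets/{repo}/data/{day:%Y/%m/%d}/*.parquet"

def build_query(repo: str, day: date) -> str:
    """SQL you could pass to DuckDB, which can read hf:// paths directly."""
    return f"SELECT count(*) FROM read_parquet('{hf_parquet_glob(repo, day)}')"

q = build_query("open-index/hacker-news", date(2024, 1, 2))
# e.g. duckdb.sql(q).fetchall()  (requires the duckdb package and network access)
```

Polars works the same way: point `pl.scan_parquet` at the glob and query lazily, no catalog in between.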


Are they paying for the repo space, I wonder?

someone's paying to keep name-dropping Iceberg(tm)

Weird accusation. Iceberg is an Apache project. I don't think anyone gets paid when you use it, so I'm not sure what the benefit of shilling would be. It is just a table format that's well suited for this purpose. I would expect any professional to make a similar recommendation.

So they are sharding by time/day?

I have a similar project right now where I am scraping a dataset that only ever offers its current state. I am trying to preserve the history of this dataset and was thinking of using the same strategy. If anyone has experience or pointers on how best to add time as a dimension to an existing generic dataset, I'd love to read about it.
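My current rough plan, for reference (a sketch; all names are illustrative): treat each scrape as a dated snapshot and only persist rows whose content actually changed, so the time dimension becomes a snapshot-date column plus day-partitioned files:

```python
# Sketch: add a time dimension to a "current state only" dataset by
# snapshotting. Rows are deduplicated by a content hash so unchanged
# records aren't re-stored on every scrape.
import hashlib
import json
from datetime import date

def row_hash(row: dict) -> str:
    """Stable hash of a record's content (sorted keys for determinism)."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()

def diff_snapshot(current: list[dict], seen: set[str], day: date) -> list[dict]:
    """Return only new/changed rows, stamped with the snapshot date."""
    out = []
    for row in current:
        h = row_hash(row)
        if h not in seen:
            seen.add(h)
            out.append({**row, "snapshot_date": day.isoformat()})
    return out

seen: set[str] = set()
day1 = diff_snapshot([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}], seen, date(2024, 1, 1))
day2 = diff_snapshot([{"id": 1, "v": "a"}, {"id": 2, "v": "c"}], seen, date(2024, 1, 2))
# day1 keeps both rows; day2 keeps only the changed id=2 row
```

Each day's output would then land in its own Parquet partition, same as the setup above.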


The good ol' folks at Tempo Labs, Stripe's collaborators, tried to make an RFC-style description page for MPP: https://paymentauth.org/ (full doc on IETF draft page: https://datatracker.ietf.org/doc/draft-ryan-httpauth-payment...)

I was almost going to point it out as evidence that thought was put into it. Nope, it's flimsy and AI-generated.

Also, it contains provisions for scamming customers:

> 403 indicates the payment succeeded but access is denied by policy

And no, it doesn't explain how to refund payments for customers you deny access to.


I recently redesigned my blog to look like a modern RFC and I'm loving the way they've decided to render tables in their plain text, definitely gonna steal that.

On topic though, Stripe is trying to make themselves the Visa/Mastercard of crypto. They're in position to do so and it seems like Coinbase is their other half. I don't trust or like it though.


The best Visa/Mastercard of crypto already exists and is called Flexa. (https://flexa.co/payments#pricing)

Oh wow, I never heard of this. I'm currently working on something similar with the same 1% rate, haha! WELP

This one is even worse IMO

> Servers MAY return 402 when:

> * Offering optional paid features or premium content

This implies that a successful GET request for a resource the user already has access to might still return 402 instead of 200. That makes 402 basically unworkable.
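To make the ambiguity concrete, consider a client trying to act on the status code (a sketch; the handler and its return strings are hypothetical, but the status semantics are as I read the draft):

```python
# Sketch of the client-side dilemma: map a status code to an action.
# With the draft's "MAY return 402 for optional premium content", both
# readings of 402 below are legal server behaviour for the same code.
def classify(status: int) -> str:
    if status == 200:
        return "ok"
    if status == 402:
        # Ambiguous under the draft: could mean "payment required for any
        # access" OR "free access exists but we're upselling premium".
        # An auto-paying agent can't safely decide without out-of-band info.
        return "ambiguous: pay, or re-request the free tier?"
    if status == 403:
        return "denied (per the draft, possibly after a successful payment)"
    return "other"
```

An automated payer has no safe branch for 402 here, which is the whole problem.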


An RFC is a request for comments and contributions.

Are you open to contributing to this RFC?


that doesn't sound nearly as fun as getting upvotes, if I'm honest

I always assumed contributing to RFCs is about as easy as contributing to C++, which I always assumed is virtually impossible without a billion dollars or a billion citations of your academic papers.

Will they get a slice of the earnings in return by Stripe?

Was it AI-generated? If so, should I just delegate contributing to my AI?

This entire piece sounds AI written (or at minimum, heavily AI edited) with its "punchy" writing style LLMs love, negative parallelisms galore (see blockquotes 2, 3, 5), invented concept labels ("your calendar is a load-bearing wall" ???), attempts at humor sprinkled in. What a joke of a post.
