Hacker News | tcdent's comments

> 5.5 min to train on a PDP-11? You mean to tell me we could have been doing this all along???

Yes. The Cray supercomputers from the '80s were remarkably good matmul machines in particular. The quad-CPU Cray X-MP (1984) could sustain 800 MFLOPS to 1 GFLOPS and, with its 1 GB SSD, had enough compute power and bandwidth to train a 7-10M-parameter language model in about six months, and infer at 18-25 tok/sec.
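As a rough sanity check on that six-month figure, here's a back-of-envelope estimate using the common ~6 × parameters × tokens FLOPs rule for transformer training. The parameter count, token budget, and sustained throughput below are illustrative assumptions, not measurements:

```python
# Back-of-envelope check on the claim above, using the common
# ~6 * params * tokens FLOPs rule for transformer training.
params = 7e6        # 7M-parameter model (assumption)
tokens = 3e8        # ~300M training tokens (assumption)
total_flops = 6 * params * tokens        # 1.26e16 FLOPs

sustained = 1e9     # 1 GFLOPS sustained on the quad-CPU X-MP
seconds = total_flops / sustained
months = seconds / (30 * 24 * 3600)
print(f"~{months:.1f} months")           # ~4.9 months
```

The answer is very sensitive to the token budget: at a 1B-token run the same math gives well over a year, so "about six months" lands in the plausible middle of these assumptions.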

A mid-90s Cray T3E could have handled GPT-2 124M, 24 years before OpenAI.

I also had a punch-card computer from 1965 learn XOR with backpropagation.
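For what it's worth, nothing in that exercise needs more than 1960s-era math: the chain rule plus gradient descent. A minimal pure-Python sketch of XOR via backpropagation, with illustrative hyperparameters:

```python
import math, random

# XOR via backpropagation in pure Python: a 2-8-1 net, tanh hidden,
# sigmoid output, per-sample gradient descent. Hyperparameters are
# illustrative; the math is just the chain rule.
random.seed(0)
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]
H, lr = 8, 0.5

w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]  # 2 inputs + bias
w2 = [random.uniform(-1, 1) for _ in range(H + 1)]                  # H inputs + bias
sig = lambda z: 1 / (1 + math.exp(-z))

def forward(x1, x2):
    h = [math.tanh(w[0] * x1 + w[1] * x2 + w[2]) for w in w1]
    o = sig(sum(wj * hj for wj, hj in zip(w2, h)) + w2[H])
    return h, o

for _ in range(10000):
    for (x1, x2), y in zip(X, Y):
        h, o = forward(x1, x2)
        d_o = (o - y) * o * (1 - o)        # MSE loss through sigmoid
        for j in range(H):                 # backprop into hidden layer first
            d_h = d_o * w2[j] * (1 - h[j] ** 2)
            w1[j][0] -= lr * d_h * x1
            w1[j][1] -= lr * d_h * x2
            w1[j][2] -= lr * d_h
        for j in range(H):                 # then update output weights
            w2[j] -= lr * d_o * h[j]
        w2[H] -= lr * d_o

mse = sum((forward(x1, x2)[1] - y) ** 2 for (x1, x2), y in zip(X, Y)) / 4
print(f"final mse: {mse:.4f}")
```

Run on paper with punch cards, this is slow but entirely feasible; the arithmetic per update is a few dozen multiply-adds.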

The hardware was never the bottleneck, the ideas were.


Post-quantum crypto is a good example of this. Lattice-based schemes were theorized in the 90s, but they took decades to actually reach production. The math existed, the hardware existed, and the ideas for making it work were just not there yet.

> The hardware was never the bottleneck, the ideas were.

For sure. Minsky and Papert really set us back.


They should have lived to see the results of the bitter lesson.

Minsky came close (d. 2016) -- although he may have had other interests later in life, if the Epstein file dumps are to be believed.

The replies lol.

"Yes" Proceeds to talk about AI.


DSPy is cool from an integration perspective, but as someone who develops agents extensively, two phases of my workflow have kept me from adopting it:

1. Up until about six months ago, modifying prompts by hand was meticulous and somewhat tricky: incorporating terminology with very specific intent, observing edge cases, and steering the LLM toward the intended outcome. This is what the industry commonly called prompt engineering.

2. With the current state of SOTA models like Opus 4.6, the agent developing my applications alongside me often has a more intelligent and/or more generalized view of the system we're creating.

We've reached a point in the industry where smaller models can accomplish tasks once reserved for only the largest models. And now that we use the most intelligent models to build those systems, the feedback loop that DSPy turned into a pattern has essentially been absorbed into my development workflow.

I can write an agent and a prompt as a first pass using an agentic coder, then have that coder observe the agent's performance and iterate on my prompts until I arrive at satisfactory results. This is further supported by the documentation, specifications, data structures, and other I/O aspects of the application, all of which the coding agent can take into account when constructing and evaluating agentic systems.
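The loop described above can be sketched as follows. Note that `evaluate` and `refine` are hypothetical stand-ins for "run the agent against test cases" and "ask the coding agent to rewrite the prompt"; the keyword-based scoring is a toy stub, not a real eval:

```python
# Sketch of the iterate-on-prompts loop. `evaluate` and `refine` are
# hypothetical placeholders for an agent test harness and a coding
# agent's rewrite step, respectively.

def evaluate(prompt: str, cases: list[tuple[str, str]]) -> float:
    # Toy stub: score by whether the prompt mentions each case's key term.
    return sum(key in prompt for key, _ in cases) / len(cases)

def refine(prompt: str, failures: list[str]) -> str:
    # Toy stub: fold the failing cases' terminology back into the prompt.
    return prompt + " Handle: " + ", ".join(failures) + "."

def optimize(prompt: str, cases, target=1.0, max_rounds=5):
    for _ in range(max_rounds):
        if evaluate(prompt, cases) >= target:
            break
        failures = [key for key, _ in cases if key not in prompt]
        prompt = refine(prompt, failures)
    return prompt, evaluate(prompt, cases)

cases = [("refund", "..."), ("escalation", "...")]
best, score = optimize("You are a support agent.", cases)
print(score)  # 1.0 after both terms are folded in
```

The structure mirrors what DSPy formalizes (a metric, a candidate prompt, an optimizer loop), just with the coding agent playing the optimizer.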

So DSPy was certainly onto something, but the level of abstraction, at least in my personal use case, has moved up a layer instead of being integrated into the actual system.


I think many people have the same experience! And that's the point I'm trying to make. There are patterns here that are worth adopting, whether or not you're using DSPy :)

Not worth it. It is a very significant performance hit.

With that said, people are trying to extend VRAM into system RAM or even NVMe storage, but as soon as you hit the PCIe bus with high-bandwidth traffic like the KV cache, you lose a lot of the performance benefit of having fast memory near the GPU die.


> With that said, people are trying to extend VRAM into system RAM or even NVMe storage

Only useful for prefill (given the usual discrete-GPU setup; iGPU/APU/unified memory is different and can basically be treated as VRAM-only, though a bit slower), since the PCIe bus becomes a severe bottleneck as soon as you offload more than a tiny fraction of the memory workload to system memory/NVMe. For decode, you're better off running entire layers (including expert layers) on the CPU, which local AI frameworks support out of the box. CPU-run layers can in turn offload model parameters/KV cache to storage as a last resort, but if you offload too much to storage (insufficient RAM cache), that dominates the overhead and basically everything else becomes irrelevant.
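To put rough numbers on that bottleneck: decode has to stream every active parameter once per token, so token rate is bounded by whichever bus the weights cross. The bandwidth and model-size figures below are illustrative assumptions, not benchmarks:

```python
# Decode streams every active parameter once per token, so token rate
# is roughly bandwidth / active bytes. All figures are rough assumptions.
active_bytes = 7e9      # ~7B active params at 8-bit quantization
vram_bw = 900e9         # ~900 GB/s: GDDR6-class VRAM
pcie_bw = 32e9          # ~32 GB/s: PCIe 4.0 x16
sysram_bw = 80e9        # ~80 GB/s: CPU-side DDR5

tok_s = lambda bw: bw / active_bytes
print(f"VRAM-resident: {tok_s(vram_bw):.0f} tok/s")   # ~129 tok/s
print(f"over PCIe:     {tok_s(pcie_bw):.1f} tok/s")   # ~4.6 tok/s
print(f"CPU layers:    {tok_s(sysram_bw):.1f} tok/s") # ~11.4 tok/s
```

This is why running whole layers on the CPU (reading weights at system-RAM bandwidth) beats streaming them to the GPU over PCIe: under these assumptions the CPU path is more than twice as fast, even before accounting for transfer latency.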


I'm kind of surprised by this take.

Did you think Westerners' hesitancy to engage with and rely on Chinese labs was due to vibes? There are fundamental cultural differences at play, whether we are comfortable admitting that or not.


I'm kind of surprised by this comment.

I wonder whether, if someone made a comment citing "fundamental cultural differences" in how Israeli people do business, it would be as well received.

From my experience, dealing with Israeli companies and Chinese companies is pretty much the same.


Communism. I'm talking about communists.

If you're so brave, you should state what these fundamental cultural differences are.


I still find it revolting they're writing this stuff in typescript.


Just like you can read source code written by humans (and should if you take this stance) you can also read source code generated by LLMs. Then, when you find something unsavory and feel that your sentiment is warranted, make a contribution.


Well obviously, but a dirty kitchen is evidence that the meal might give you food poisoning, and there's no reason to visit every restaurant. Would you go see a movie that was advertised as AI-generated? (I do appreciate the author being upfront about it however.)


Some genAI video or image content can be made with creativity and be enjoyable. It gets boring with time, but our current AI boom allows some people to unleash an inner director.


I'm looking forward to those films, especially if they are adaptations made by the fan community instead of corporate studios.


Inference is run on shared hardware already, so they're not giving you the full bandwidth of the system by default. This most likely just allocates more resources to your request.


A major unlock I see coming with the advent of AI is the ability to access knowledge that has dropped out of the sociosphere because of obscurity and language barriers. This is a cool move in the right direction.


Is this even a problem that needs to be solved? How many people have 3D-printed guns and used them?

Preemptive regulation is absurd.


Quite famously, Luigi Mangione. (allegedly)

Of course, this is silliness since it is very easy to just buy a gun in the US, and it is also legal to make one in your garage.

