This was published right before people started experimentally validating the Landauer limit. I am not sure why it hasn’t been taken down at some point as the evidence has accumulated:
2012 — Bérut et al. (Nature) — They used a single colloidal silica bead (2 μm) trapped in a double-well potential created by a focused laser. By modulating the potential to erase the bit, they showed that mean dissipated heat saturates at the Landauer bound (k_B T ln 2) in the limit of long erasure cycles.
2014 — Jun et al. (PRL) — A higher-precision follow-up using 200 nm fluorescent particles in an electrokinetic feedback trap. Same basic physics, tighter error bars.
2016 — Hong et al. (Science Advances) — First test on actual digital memory hardware. Used arrays of sub-100 nm single-domain Permalloy nanomagnets and measured energy dissipation during adiabatic bit erasure using magneto-optic Kerr effect magnetometry. The measured dissipation was consistent with the Landauer limit within 2 standard deviations, in the very technology that is the basis of magnetic storage.
2018 — Gaudenzi et al. (Nature Physics) — Opens with:
The erasure of a bit of information is an irreversible operation whose minimal entropy production of k_B ln 2 is set by the Landauer limit [1]. This limit has been verified in a variety of classical systems, including particles in traps [2,3] and nanomagnets [4]. Here, we extend it to the quantum realm by using a crystal of molecular nanomagnets as a quantum spin memory and showing that its erasure is still governed by the Landauer principle.
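For scale, the bound these experiments are chasing is minuscule at room temperature; a quick back-of-the-envelope (using the exact SI value of the Boltzmann constant):

```python
import math

# Landauer bound: minimum heat dissipated to erase one bit at temperature T.
k_B = 1.380649e-23  # Boltzmann constant, J/K (exact by SI definition)
T = 300.0           # room temperature, K

E_landauer = k_B * T * math.log(2)
print(f"{E_landauer:.3e} J")  # ~2.87e-21 J per bit
```

That is orders of magnitude below what practical logic dissipates per switch, which is why these experiments need single beads and nanomagnets to see it at all.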
I'm not sure, but isn't 2 standard deviations a bit low? Especially so for something that can be done in a lab. It seems that 2 SD is the minimum threshold for getting published. Can we be sure that these are properly reviewed?
Could it be that you have this confused with the number of standard deviations one needs to falsify something? To show that two things are different, you want them as many SD apart as you can get. Here, on the other hand, the data agree _within_ 2 SD.
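To make the asymmetry concrete, with invented numbers (not taken from any of the papers):

```python
# Hypothetical illustration: "consistent within 2 SD" means the
# prediction falls inside the measurement's error bars, i.e. |z| <= 2.
measured, sigma = 2.9e-21, 0.1e-21   # made-up heat per bit and its uncertainty, J
predicted = 2.87e-21                  # Landauer bound at the same temperature, J

z = abs(measured - predicted) / sigma
print(f"z = {z:.2f}")  # 0.30 -> well within 2 SD: consistent
# To *falsify* the bound you would want the opposite: a z far above ~5.
```

So "within 2 SD" is a statement of agreement, not a borderline detection threshold.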
For people who want to ask a model for an app, or a website, or something at the level of “hey, you make apps, right? I have had this idea for years…”, the experience is akin to a slot machine — sometimes they get what they imagined and it works, and sometimes they get a hollow chocolate approximation.
> I have an RTX 5070 with 12 GB VRAM and I wanted to run glm-4.7-flash:q8_0, which is a 31.8 GB model. The standard options are:
> Offload layers to CPU — works, but drops token/s by 5–10× because CPU RAM has no CUDA coherence. You end up waiting.
> Use a smaller quantization — you lose quality. At q4_0 the model is noticeably worse on reasoning tasks.
> Buy a bigger GPU — not realistic for consumer hardware. A 48 GB card costs more than a complete workstation.
> None of those felt right, so I built an alternative: route the overflow memory to DDR4 via DMA-BUF, which gives the GPU direct access to system RAM over PCIe 4.0 without a CPU copy involved.
And then limps home with this caveat on the closest thing to a benchmark:
> The PCIe 4.0 link (~32 GB/s) is the bottleneck when the model overflows VRAM. The best strategy is to shrink the model until it fits — either with EXL3 quantization or ModelOpt PTQ — and use GreenBoost's DDR4 pool for KV cache only.
I think the reason it refers to DDR4 is that that is how the user explained it to their coding agent. LLMs are great at perpetuating unnecessary specificity.
Given that 32 GB/s is significantly worse than CPU-to-RAM speeds these days, does the additional compute really make it any faster in practice? The KV cache stays on the GPU anyway unless you're doing something really weird, so it won't affect ingestion, and generation is typically bandwidth-bound. With something like PCIe 6.0 x16 it would actually make sense, but nothing less than that. Maybe smaller dense models that are more compute-bound could justify PCIe 6.0 x8 or 5.0 x16, but that's already below DDR5 speeds.
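A rough ceiling, using the numbers quoted in the thread and assuming a dense model whose overflowed weights must all cross the bus once per generated token:

```python
# Back-of-the-envelope decode ceiling when weights overflow VRAM.
# Numbers from the thread: 31.8 GB model, 12 GB VRAM, PCIe 4.0 x16 ~32 GB/s.
model_gb, vram_gb, pcie_gbps = 31.8, 12.0, 32.0

overflow_gb = model_gb - vram_gb  # weights living in system RAM (~19.8 GB)
# Dense decode reads every weight per token, so the overflow must cross
# PCIe once per token (ignoring any reuse or caching, which only helps).
seconds_per_token = overflow_gb / pcie_gbps
print(f"ceiling ~{1 / seconds_per_token:.1f} tok/s")  # ~1.6 tok/s
```

Under those assumptions, PCIe 4.0 caps decode below 2 tok/s no matter how fast the GPU is, which is the point about the link being the bottleneck.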
Additional compute is generally a win for prefill, while memory bandwidth is king for decode. The KV cache, however, is the main blocker for long context, so it should be offloaded to system RAM, and even to NVMe swap, as context grows. Yes, that's slow in absolute terms, but it's faster (and more power efficient, which makes everything else faster) than not having the cache at all, so it's still a huge win.
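For a sense of why long context forces the issue, here is the standard dense-attention KV cache size formula, with hypothetical (made-up) model dimensions:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Dense-attention KV cache: a K and a V vector per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical mid-size model: 32 layers, 8 KV heads (GQA), head_dim 128, fp16.
for ctx in (8_192, 32_768, 131_072):
    gb = kv_cache_bytes(32, 8, 128, ctx) / 1e9
    print(f"{ctx:>7} tokens -> {gb:5.1f} GB")  # roughly 1.1, 4.3, 17.2 GB
```

At 128k tokens the cache alone outgrows a 12 GB card, so it is either offload or truncate.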
Well if you do that then you reverse the strengths of your system. It might be best to work with the context length you can offload, like a normal person.
No, they stop-hunt their way down to depressed prices, where they then buy anticipating the recovery, while you closed out your “safe” retirement positions at -15%.
You use a trailing stop loss. You get closed out 15% down from the top, not 15% down from purchase. The alternative in a 24 hour market is worse — the news of a real event hits and by the time you wake up and respond you’re down 50% or more and the stock isn’t coming back.
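A minimal sketch of the mechanism being described, with invented prices:

```python
def trailing_stop(prices, drawdown=0.15):
    """Return the index at which a trailing stop fires, or None.
    Fires when price falls `drawdown` below the running peak, not the entry."""
    peak = prices[0]
    for i, p in enumerate(prices):
        peak = max(peak, p)
        if p <= peak * (1 - drawdown):
            return i
    return None

# Bought at 100, rode it to 140, stopped out at 118 (15% off the 140 peak),
# not at 85 (15% off the purchase price).
prices = [100, 120, 140, 130, 118, 90]
i = trailing_stop(prices)
print(i, prices[i])  # 4 118
```

The stop ratchets up with the peak, which is exactly what makes it a fat target for an overnight stop-hunt in a 24-hour market.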
This policy change exists to extract profit from a safety mechanism used by retail traders.
It is something that should yield a lot of profit for 24 hour trading systems during a downturn.
That is a reasonable position; however, the assumption that it is the administration gaming them, rather than other motivated parties, is open for discussion.
It is in fact not at all reasonable. They are saying that the BLS stats can't be trusted because they totally misunderstand the survey methodology. That isn't a reason!
I’d counter that if we were doing a good job gathering data, these structural biases could be compensated for with more conservative initial numbers.
At some point a lack of decision to take compensating action becomes faking the numbers.
> if we were doing a good job gathering data, these structural biases could be compensated for with more conservative initial numbers
There is no more conservative. The data will bias in the direction of trend. The point of the data is, in part, to measure that trend. Fucking with it to make it politically correct to the statistically illiterate is precisely the sort of degradation of data we’re worried about.
(They’re also useless as a time series if the methodology changes quarter to quarter. That’s the job of analysis. Not the data.)
What you wrote suggests the data will bias predictably, which matches my understanding.
Reporting biased data as the default because the bias compensation is already built into the audience seems like a weak argument for not improving.
They can provide for the continuation of data visibility/granularity by releasing the prior numbers as previously calculated, while changing the calculation of the headline number to be better compensated.
The simpler argument is that changing it at all will result in a negative step change in the reporting that no one wants to take accountability for.
> What you wrote suggests the data will bias predictably
Ex post facto. Before the fact, we don’t know.
Imagine you know the weather will be a strong gust regardless of direction. Averaging the models will produce a central estimate. But you know it will be biased away from the center. You just don’t know, until it happens, in which direction.
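The gust analogy can be simulated in a few lines (magnitude invented):

```python
import random

random.seed(0)
g = 10.0  # known gust magnitude; direction unknown until it happens
outcomes = [random.choice([-g, g]) for _ in range(10_000)]

mean = sum(outcomes) / len(outcomes)
mean_abs = sum(abs(x) for x in outcomes) / len(outcomes)
print(f"central estimate ~{mean:.2f}, actual magnitude always {mean_abs:.1f}")
# The ensemble average sits near 0, yet every realized outcome is 10 away
# from it: biased away from the center, in a direction unknowable in advance.
```

A "more conservative" central estimate cannot fix this, because conservatism has no direction to be conservative in.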
> They can provide for the continuation of data visibility/granularity by releasing the prior numbers as previously calculated and at the same time changing the calculation of the headline number to be better compensated
They do. These data are all recalculated with each methodological change. They’re just deprecated indices the media don’t report on because they’re of academic, not broad, concern.
> simpler argument is that changing it at all will result in a negative step change in the reporting
Simpler but wrong. Those data would be useless for the same reason we don’t let CEOs smooth revenues.
I’m confused by this discussion. It seems like you said the biases were structural because we know who reports early and that is why the early numbers are always revised down. Structural implies known in advance.
It also seems like you said they shouldn’t revise the numbers but now you are saying they already do.
The performance gap between Apple’s flash and a typical aftermarket NVMe drive in a Windows laptop is more attributable to controller design and integration than to trace length.
Apple can get away with less RAM because their flash storage is fast enough to make swapping barely noticeable. In contrast, most Windows machines incur a significant performance penalty when swapping.
2012 — Bérut et al. (Nature) — They used a single colloidal silica bead (2 μm) trapped in a double-well potential created by a focused laser. By modulating the potential to erase the bit, they showed that mean dissipated heat saturates at the Landauer bound (k_B T ln 2) in the limit of long erasure cycles.
https://www.physics.rutgers.edu/~morozov/677_f2017/Physics_6...
2014 — Jun et al. (PRL) — A higher-precision follow-up using 200 nm fluorescent particles in an electrokinetic feedback trap. Same basic physics, tighter error bars.
https://pmc.ncbi.nlm.nih.gov/articles/PMC4795654/
2016 — Hong et al. (Science Advances) — First test on actual digital memory hardware. Used arrays of sub-100 nm single-domain Permalloy nanomagnets and measured energy dissipation during adiabatic bit erasure using magneto-optic Kerr effect magnetometry. The measured dissipation was consistent with the Landauer limit within 2 standard deviations, in the very technology that is the basis of magnetic storage.
https://www.science.org/doi/10.1126/sciadv.1501492
2018 — Gaudenzi et al. (Nature Physics) — Opens with:
The erasure of a bit of information is an irreversible operation whose minimal entropy production of k_B ln 2 is set by the Landauer limit [1]. This limit has been verified in a variety of classical systems, including particles in traps [2,3] and nanomagnets [4]. Here, we extend it to the quantum realm by using a crystal of molecular nanomagnets as a quantum spin memory and showing that its erasure is still governed by the Landauer principle.
https://www.nature.com/articles/s41567-018-0070-7
The Landauer limit is not conjecture.