Hacker Newsnew | past | comments | ask | show | jobs | submit | danbruc's commentslogin

Died in August 2010 [1], so he was still alive when this happened in August 2006.

[1] https://www.aachen-gedenkt.de/traueranzeige/profdr-ingwalter...


The people from computer history muesuem would know moew. They would have asked this info witht hte person that reported the trove too them ...

  Maybe the professor ( or his legal representitive) instigated the donation to CHM???

I am not a mathematician and did not read the unit distance solution too carefully, but my impression was that it used a variation of a known technique to solve the problem. And that makes perfect sense to me, there are a lot of techniques and lot of less relevant problems, I am not surprised that one can solve some of them with known techniques that just nobody has tried [hard enough] before. I am much more sceptical when it come to the important unsolved problems where every known technique has probably been tried several times over. In those instances it will probably take a true leap in understanding to solve them and I am sceptical that large language models are well suited for that because of the way they work.

We're very fortunate to have had some very eminent mathematicians backfill the OpenAI proof with history, context, and a literature review [1]. Ideas behind the proof seem to have been "in the air". Indeed, looked at certain point of view, the OpenAI construction can be viewed as a high-dimensional generalization of a known low-dimensional one. In this vein see the remarks of Gowers, Sawin and Tsimerman in [1]. Are LLMs capable of "true leap[s] in understanding"? I have absolutely no idea. But LLMs keep surprising me.

[1] https://arxiv.org/html/2605.20695v1


Related but not strong enough. 17 x 17 x 17 = 4,913 is 2^8-smooth - no prime factors larger than 2^8 - and it is less than 2^16, but 17 x 17 = 289 does not fit into a byte. Smoothness is required but not sufficient for a product representation to exist.

Did some try to estimates what it would take to bake interference for a capable large language model into silicon so that one can pipeline inputs through it and produce outputs at one token per clock cycle?

I'd expect it to require too much RAM bandwidth to be feasible.

RAM is really slow at silicon speeds. Very little is reachable in one clock cycle, unless the clock cycle is abysmally slow.


No RAM. Instead of having a general purpose multiplier that multiplies an input with a weight stored in RAM, just have a multiplier that hardcodes the weight. In some sense replace each weight with a specialized multiplier and wire them together with accumulators and activation functions in between. And some registers for pipelining. If one goes for four bit quantization, one could have sixteen optimized multipliers, one for each possible weight, and the one just selects and connects them according to the model weights and structure.

Example. If you have a neuron with 16 inputs each 8 bit wide and with a 4 bit weight per input, you will have 16 specialized multipliers each scaling its input by the corresponding weight and then the 16 scaled inputs feed into an adder tree and finally an activation function.


That sounds like wiring the RAM information into order of magnitude same number of transistors. A modern CPU has (quick googling) 184B transistors. If they were bits then that's 23GB. But presumably a model bit needs more than one transistor to represent how it acts as a neuron with its interactions.

Then there's the current speedup in inference from restricting which subset of the model is used, which is not a "swap in" that would work with hard wired neurons.

But I dunno. Maybe. I'm just guessing.


All the primes above 2^32 are out, but that accounts for only two point something percent.

But also all of their multiples. I suspect that those account for the vast majority.

Each x is prime with probability 1/ln(x), each x has M/x multiples less than M, as a fraction of M that is just 1/x. Together that makes 1/(x ln(x)) with the indefinite integral ln(ln(x)). If we plug in 2^32 and 2^64 [1], we get ln(2). So about 69.3 % of all 64 bit integers should have a prime factor larger than 2^32 and therefore not be the product of two 32 bit integers. That leaves about 13 % unaccounted. Three prime factors all larger than 2^32/2, five prime factors all larger than 2^32/3, and so on cannot be packed into two 32 bit integers. Not sure to how much this will add up.

[1] The bounds are important because they guarantee that there is at most one prime factor from that range and this ensures that we are not double counting anything. If the upper bound was larger than the square of the lower bound, then we would have to worry about double counting numbers with more than one large prime factor.


Nice work. I guess "vast majority" is overstating the case, but majority anyway.

The simple explanation is just that lot of ways that numbers can multiply to produce the same product

Like an odd number x times an even number y, x* y produces the same product as x* 2 and y/2

Same for a multiple of 3 c and and a non multiple of 3 d, c * d = c/3 * d*3


This is just looking at it from a different perspective. Both, one 64 bit integer and the product of two 32 bit integers represent a number up to 2^64 with 64 bits. But while all 64 bit integers are unique, there are, as you say, several representation for some numbers as the product of two 32 bit integers and therefore it is impossible to represent all 64 bit integers. Commutativity alone costs you about 50 percent of all numbers in the range as x * y and y * x represent the same 64 bit number but with two different representation as a product of two 32 bit numbers, at least if x and y are different. But this tells you nothing about the numbers that you can not represent, only that they must exist. I was looking at it from this other perspective, which numbers are not representable as a product and why.

Probably still does not work. Assume a request takes X ms and let us look at what you will observe depending on where within a tick period it arrives.

If it arrives anywhere from 0 ms to (20 - X) ms after a tick, it will complete before the next tick, so the measured duration will be between X ms and 20 ms. If it arrives later in the tick period, it will miss the next tick and have to wait an additional tick period, so the measured duration will be between 20 ms and (20 + X) ms.

If you make N repetitions, you would normally see a spike of density 1 at X. With the 20 ms tick wait, you will see a uniform distribution of density 1/20 between X and (20 + X).

You would have to perform each request and then return the result exactly 20 ms after it was received in order to mask the request duration. But that just creates a new target, your timers and queues to delay the response. Or making the load so high, that requests take more than 20 ms.


Your second one is somewhat in the right direction, I think. My guess would be that you split the circle into eights, that keeps the tangent slope between zero and one. Then you can go along one axis from 0 to sin(45°) times the radius and the value along the other axis will always either stay the same or change by one as the derivative is between zero and one. As you move along your primary axis, you generally keep the value along the secondary axis unchanged but you accumulate the error. When the error crosses 0.5, you move one pixel along the secondary axis and adjust the error accordingly. And of course you always draw eight mirrored and rotated pixels. In some way an adaptation of the Bresenham algorithm for drawing lines.

Why is it »/newest« but »new«? Or is it »new« but »/newest«?


Checked new for the first time. [1]

[1] https://news.ycombinator.com/item?id=48322267


Thanks for the info!


According to Copilot you can get one or two offspring per year from a tulip. So if you spent the price of a really nice house on one of those, it will take you quite some time to multiply the price down into reasonable territory. And even if you stay in unreasonable price territory, an average home, it is one thing to find a buyer for one tulip at that price, it is a very different thing to find a bunch of them. And you are still looking at three, four, five years of tulip growing to get the price down to a tenth of what you paid.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: