curious: wdym by "getting separators right when generating multiple files in a single inference call"
context: created hypertokens, an even more robust hashing mechanism for context-addressable memory (CAM); one cheat code is making them prefix-free, plus lots of others that get deep into why models work the way they do, etc.
we dug into those sorts of questions with hypertokens, a robust hash for lines, code, tables/rows or any in-context token tagging to give models photographic memory
one mechanism we establish is that each model has a fidelity window, i.e., r tokens of content per s tag tokens; each tag token adds extra GUID-like marker capacity via its embedding vector; since 1-, 2-, and 3-digit numbers are only one token in top models, a single hash token lacks enough capacity & separation in latent space
we also show the hash should be properly prefix-free, i.e., unique symbols per digit position, e.g., if hashing with A-K for the first digit and L-Z for the second, then A,R is a legal hash whereas M,C is not
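to make the prefix-free idea concrete, here's a minimal sketch (alphabets and two-symbol width are illustrative, not the paper's actual scheme): each digit position draws from its own disjoint alphabet, so no tag is a prefix of another and any single symbol reveals its position.

```python
# Sketch of a prefix-free tag: position 0 and position 1 use disjoint
# alphabets, so A,R is a legal tag while M,C is not (M belongs to
# position 1). Alphabets here are illustrative only.
FIRST = "ABCDEFGHIJK"       # symbols allowed in position 0
SECOND = "LMNOPQRSTUVWXYZ"  # symbols allowed in position 1

def encode(n: int) -> str:
    """Map a line/row number to a two-symbol prefix-free tag."""
    hi, lo = divmod(n, len(SECOND))
    if hi >= len(FIRST):
        raise ValueError("number exceeds tag capacity")
    return FIRST[hi] + SECOND[lo]

def is_valid(tag: str) -> bool:
    """A tag is legal only if each symbol comes from its position's alphabet."""
    return len(tag) == 2 and tag[0] in FIRST and tag[1] in SECOND

print(encode(0))                       # first tag, "AL"
print(is_valid("AR"), is_valid("MC"))  # True False
```

because the alphabets are disjoint, a symbol can never be confused across positions, which is the separation-in-latent-space property the thread describes.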
we can do all this & more rather precisely, as we show in our arXiv paper on the same; the next update goes deeper into group theory, info theory, etc. on boosting model recall, reasoning, tool calls, etc. by way of robust hashing
anecdotally, it seems to help find better places for code to sit, understand the nuances of a code base better, and do a better job of avoiding duplicate functionality.
it's still very much a work in progress; the thing I'm struggling with most right now is getting claude to even use the capability without being directly told to.
there seem to be benefits to the native stack (which lists files and then hopes for the best) relative to this sometimes. Frankly, it seems to be better at understanding the file structure. Where this approach really shines is in understanding the code base itself.
one approach that can work is to tell the model to load a read skill and/or call a shell script that overrides the default; there are a variety of ways to attempt this with any harness. claude specifically has hooks, some of which allow go, no-go, do-this-instead, etc. and ya, agree on grokking the code base; AST integration feels like the natural next step
Interesting! Been building a space-time coordinate system for AI models. Notionally agree in principle w.r.t. convex hulls, clocks, etc., since we invoke similar machinery, albeit in tokenized models. Need to read this work more deeply to grok it.
One question is to what extent you dig into or have considered oversampling. One of the core hypotheses we've converged on is that nearly all models are optimized for source coding vs. channel coding. The implication is that the path to AGI likely involves oversampling to capture channel coding gains, which will resolve phase errors, etc.
Random sampling naturally does this, albeit inefficiently. Curious if you do something more structured than random in terms of oversampling, especially partially overlapped samples; think supersaturated subspaces / subchannels, etc.
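to pin down the contrast being asked about, a tiny sketch (illustrative, not from either party's actual system) of random oversampling vs. structured oversampling with partially overlapped windows over a token stream:

```python
import random

def random_samples(stream, k, n):
    """Unstructured oversampling: k windows of length n at random offsets."""
    starts = [random.randrange(len(stream) - n + 1) for _ in range(k)]
    return [stream[s:s + n] for s in starts]

def overlapped_samples(stream, n, stride):
    """Structured oversampling: windows of length n advanced by stride < n,
    so consecutive samples deliberately share n - stride tokens."""
    return [stream[s:s + n] for s in range(0, len(stream) - n + 1, stride)]

tokens = list(range(12))
wins = overlapped_samples(tokens, n=4, stride=2)
# consecutive windows overlap by 2 tokens: [0,1,2,3], [2,3,4,5], ...
print(wins)
```

the structured variant guarantees every token is covered a fixed number of times, which is the redundancy a channel-coding view wants; random sampling only achieves that coverage in expectation.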
Thank you for the profound insight. I completely agree that the path to AGI lies in channel coding (robustness and synchronization) rather than just source coding (compression). In CSCT, we don't just "sample" data; we process it as a continuous Projected Dynamical System. Here is how we address your points:
Structured Temporal Oversampling: Our stream-based approach effectively performs high-density oversampling in the time domain. Instead of random sampling, the theta-phase (hippocampal rhythm) in our MultiGate architecture creates structured, overlapping "integration windows" to capture temporal context.
Phase Error Resolution: Phase errors are resolved not by averaging (as in L2 models), but by NMDA-gating. The gate only opens when the anchor velocity and theta-phase align, physically "locking" the signal to a specific codebook vertex. This is a computational implementation of theta-gamma coupling.
Supersaturated Subspaces: Our Simplex constraint (L1) naturally handles what you call "supersaturated subspaces" by enforcing non-negative competition. This ensures that even with overlapping temporal samples, the resulting internal representation remains discrete and grounded within the convex hull.
By treating cognition as a communication channel between an "Anchor" and "Codebook," we prioritize the stability of the compositional mapping over the mere efficiency of representation.
ya, the brain is just a noisy channel, in the same way we can treat LLMs; anything possible already exists, we are just sampling it, which distills to "mere" clock syncing
L1 & L2 constraints unwind that clock compression with suitable dilation; it's very easy to think only efficiency matters and to ignore the averaged-out replicas. nature does that inherently via primes; we have to create those artificial waves, recreate that convex hull, etc.
all to say, great to see more work in this direction & perhaps we can compare notes sometime!
interesting. is the idea that the agent calls it, or is it just an alternative to terminal/bash/etc. tool calls? as in: your tool calls all run across microVMs, containers, iso-shells, raw term, clawd/molt, all credentials, with weaker and weaker security demarcs?
my ideal scenario is a cloud web model getting access to a sandbox to run commands and read/write to files. but yeah it could be used as an alternative to bash and read write tools.
I did not get your second question exactly, but yeah, microVMs can be considered one of the more secure ways to run your agent
Basically, just thinking that it's more ideal to have the tool call the microVM versus the agent doing it, in the sense that it's mandated by the tool call
security matters if you want to demarc where agents can play. running the agent inside a strong VM is usually where it starts; a container isn't enough for that full isolation, where the agent only sees the files you want it to, etc.
we've considered docker and firecracker; will add smol to the working roster
context: building something with QEMU
* required: has to support LMW+AI (linux/mac/windows + android/ios)
there are scenarios in which we might spin microVMs inside that main VM, which by default is almost always a Debian Linux distro.
one scenario is, say, an ETL vm and an AI vm isolated for various things
curious why you're building another microVM, other than the sheer joy of building: what smol does better or differently, why use smol, etc. (reasons to avoid microVMs etc. also fair game :)
really great! adjacent: some well-done ASCII art using Braille blocks on X this week:
nolen: "unicode braille characters are 2x4 rectangles of dots that can be individually set. That's 8x the pixels you normally get in the terminal! anyway here's a proof of concept terminal SVG renderer using unicode braille", https://x.com/itseieio/status/2011101813647556902
ashfn: "@itseieio You can use 'persistence of vision' to individually address each of the 8 dots with their own color if you want, there's some messy code of an example here", https://x.com/ashfncom/status/2011135962970218736
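the trick in those posts can be sketched in a few lines (a minimal illustration, not the linked code): each character in the Unicode braille block U+2800-U+28FF packs a 2x4 dot grid, with dots 1-8 mapping to bits 0-7 of the codepoint offset, so every terminal cell becomes 8 addressable "pixels".

```python
# Each braille cell is a 2-wide x 4-tall dot grid. Per the Unicode braille
# pattern layout, dot n sets bit n-1: dots 1-3 are the left column rows 0-2,
# dots 4-6 the right column rows 0-2, dots 7-8 the bottom row.
DOT_BITS = {  # (x within cell, y within cell) -> bit index
    (0, 0): 0, (0, 1): 1, (0, 2): 2, (1, 0): 3,
    (1, 1): 4, (1, 2): 5, (0, 3): 6, (1, 3): 7,
}

def render(pixels):
    """pixels: 2D list of 0/1; height a multiple of 4, width a multiple of 2."""
    h, w = len(pixels), len(pixels[0])
    lines = []
    for cy in range(0, h, 4):
        row = ""
        for cx in range(0, w, 2):
            bits = 0
            for (dx, dy), bit in DOT_BITS.items():
                if pixels[cy + dy][cx + dx]:
                    bits |= 1 << bit
            row += chr(0x2800 + bits)
        lines.append(row)
    return "\n".join(lines)

# a 4x4 monochrome image collapses into just two braille characters
img = [[1, 0, 0, 1],
       [0, 1, 1, 0],
       [0, 1, 1, 0],
       [1, 0, 0, 1]]
print(render(img))
```

a real renderer (like the SVG one in the thread) rasterizes shapes into that 0/1 grid first and then applies exactly this packing per cell.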