I really appreciate your reply — it's an honor to hear from someone with such deep experience in the field. Your insights from decades of working with CAD and kernels are incredibly valuable, and it means a lot that you'd take the time to share them here.
The idea of parameterizing geometric problem spaces and learning from how different kernels handle them is strikingly similar to what compiler researchers have done in CS: generating corner cases, analyzing compile errors, and training AI to self-correct. AI coding tools are already widely used in industry, with tools like Cursor gaining huge popularity.
And the move to a text-based representation is what makes this all tractable — binary formats never gave us that level of observability or editability. With source-level CAD, it becomes much more realistic to analyze failures, share test cases, and eventually integrate AI tools that can reason about geometry the same way they reason about code.
Today, a solid consists of the following entities that follow the golden rule; namely, each edge is shared by two and only two surfaces: [0]
Solid Cubish
6 faces bounded by 4 edges each, having endpoints, etc...
Each edge is a curve [1] that lies on the surface so as to bound it to a precision small enough that there are no gaps between the curves and the surface edges they define.
Various bindings and/or other data elements:
Centroid
Vertices, each attached to three edge endpoints considered equal given a system tolerance.
...etc.
[0] Where an edge is alone, the resulting non-manifold body has a hole in it somewhere, and/or it is a surface body where a large number of edges stand alone.
Where an edge is shared by more than two others, that is a self-intersecting body.
Neither case is actually manufacturable. One can understand a lot just from edge checks too.
All edges alone, or unique = face.
No edges present = closed surface, must be a sphere, torus, or ellipsoidal solid body
...etc.
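To make those edge checks concrete, here is a toy Python sketch. The face/edge lists and category labels are made up for illustration; a real B-rep check has far more to it, but counting how many faces share each edge already separates the cases described above:

```python
from collections import Counter

def classify_by_edges(faces):
    """Classify a toy B-rep body by how many faces share each edge.

    `faces` is a list of faces, each given as a list of edge ids.
    The "golden rule": in a valid closed solid, every edge is
    shared by exactly two faces.
    """
    use_count = Counter(edge for face in faces for edge in face)
    counts = set(use_count.values())
    if counts == {2}:
        return "solid"              # every edge shared by exactly two faces
    if counts == {1}:
        return "open face/sheet"    # all edges stand alone
    if max(counts) > 2:
        return "self-intersecting"  # some edge shared by three or more faces
    return "non-manifold (has holes)"  # a mix of shared and lone edges

# A cube: 6 faces, 12 edges (t=top, b=bottom, v=vertical),
# each edge used by exactly two faces.
cube = [
    ["t1", "t2", "t3", "t4"],
    ["b1", "b2", "b3", "b4"],
    ["t1", "b1", "v1", "v2"],
    ["t2", "b2", "v2", "v3"],
    ["t3", "b3", "v3", "v4"],
    ["t4", "b4", "v4", "v1"],
]
print(classify_by_edges(cube))  # -> solid
```

A single face passed alone would come back as "open face/sheet", and duplicating one side face would flip the result to "self-intersecting", matching the cases in [0].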
Also, in wireframe NURBS curve land, the most useful thing about the decision to represent all the analytic entities (line, arc, conic, hyperbola, parabola, circle, ellipse...) as NURBS was to be able to reason programmatically with far fewer pain-in-the-ass cases!
Eg: a trim function can be written to process any NURBS arguments. One that has to handle lines, circles, and friends ends up either converting to NURBS or handling trim line-to-circle, arc-to-conic, NURBS-to-... you get the idea. Too messy.
Generating that data won't be cheap, but it can be distributed! If we had a few thousand users run scripts on their systems, we could get a large problem-solution data corpus.
[1] In modern CAD, everything is a curve. Lines are NURBS curves having only two control points. Earlier CAD actually used all the entity types directly, not just deriving them on the fly from the NURBS.
Arcs are curves with 3 specifically placed control points.
Hyperbolas, conics, and parabolas are the next order up, 4 control points, and above that are the B-splines: 5th-degree and higher curves.
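A minimal sketch of why this unification pays off: one rational-Bezier evaluator (a single-span NURBS; 2D coordinates assumed here for brevity) handles a line and a circular arc through the same code path, with the entity type reduced to nothing but control points and weights:

```python
from math import comb, sqrt, hypot

def rational_bezier(ctrl, weights, t):
    """Evaluate a 2D rational Bezier curve (single-span NURBS) at t in [0, 1].

    One evaluator covers lines, arcs, and conics alike; there is
    no per-entity special casing.
    """
    n = len(ctrl) - 1
    num_x = num_y = den = 0.0
    for i, ((x, y), w) in enumerate(zip(ctrl, weights)):
        # Weighted Bernstein basis function for control point i.
        b = comb(n, i) * (1 - t) ** (n - i) * t ** i * w
        num_x += b * x
        num_y += b * y
        den += b
    return (num_x / den, num_y / den)

# A line: degree 1, two control points, unit weights.
line_pt = rational_bezier([(0, 0), (2, 0)], [1, 1], 0.5)  # midpoint (1.0, 0.0)

# A quarter circle: degree 2, three control points, middle weight sqrt(2)/2.
arc_pt = rational_bezier([(1, 0), (1, 1), (0, 1)], [1, sqrt(2) / 2, 1], 0.5)
# arc_pt lies on the unit circle: hypot(*arc_pt) == 1
```

The weight on the middle control point is exactly the "constraint on the middle control point" distinction: unit weights give a parabolic segment, while sqrt(2)/2 bends the same three points into a circular arc.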
Why can't we tokenize those things and train some LLM-like thing? I am going to ask my data science friends about this. Has me thinking!
At the core, it is all NURBS curves and surfaces. Those two can represent all that we need.
The relations are all just text, names of entities and how they are related.
Even the NURBS surfaces have text forms. At one point, some systems would let a person just define one by typing the U, V points / matrix values in.
Eg:
Plane [point 1, 2, 3...]
That data is where both the problems and answers are, in this training sense anyway.
How can it not?
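As a rough illustration of how machine-readable those text forms already are, a few lines of Python can pull apart a toy entity definition in the "Plane [point 1, 2, 3...]" style. The syntax here is invented for the example, not any real system's format:

```python
import re

def parse_entity(line):
    """Parse a toy text form like 'Plane [0, 0, 0, 0, 0, 1]' into
    (entity_type, list_of_numbers).

    Illustrative only: real CAD text forms carry names, relations,
    and U/V matrices, but the principle is the same.
    """
    m = re.match(r"\s*(\w+)\s*\[([^\]]*)\]", line)
    if not m:
        raise ValueError(f"unparsable entity: {line!r}")
    kind = m.group(1)
    values = [float(v) for v in m.group(2).split(",") if v.strip()]
    return kind, values

print(parse_entity("Plane [0, 0, 0, 0, 0, 1]"))
# -> ('Plane', [0.0, 0.0, 0.0, 0.0, 0.0, 1.0])
```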
What I put before was basically the idea of generating a case, say conic section and cube/rectangle.
Generate common volume case 1 in a modern kernel and output a text representation of it. That capability exists today.
Then generate ideal edge blend solution 1, and minimum radius case 1, maximum radius case 1.
Output those and we have in text:
Problem case 1 of problem space 1.txt
Ideal, or common edge blend solution.txt
Max radii case 1.txt
Minimum radii case 1.txt
Then proceed to generate a bazillion of these, until the problem space of a conic section intersecting a rectangular body is represented fully enough for AI models to operate and even potentially demonstrate emergent behavior like they do on text and code today.
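The generation loop described above might look something like this sketch. `StubKernel` is a placeholder for whatever real kernel produces the text representations; its interface and all file names here are assumptions for illustration, not an existing API:

```python
import random
import tempfile
from pathlib import Path

class StubKernel:
    """Stand-in for a real geometry kernel's text output (hypothetical
    interface): intersect() and blend() just return placeholder text."""
    def intersect(self, a, b, params):
        return f"common volume of {a} and {b} with {params}"
    def blend(self, problem, label):
        return f"{label} solution for: {problem}"

def generate_case(case_id, out_dir, kernel):
    """Write one problem case and its candidate solutions as paired
    text files -- the unit of the proposed training corpus."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Sample the parameters that place this case in the problem space.
    params = {"cone_angle": random.uniform(1, 89),
              "box_size": random.uniform(0.1, 10)}
    problem = kernel.intersect("conic section", "rectangular body", params)
    (out / f"problem_case_{case_id}.txt").write_text(problem)
    # Ideal, minimum-radius, and maximum-radius edge blend solutions.
    for label in ("ideal_blend", "min_radius", "max_radius"):
        (out / f"{label}_case_{case_id}.txt").write_text(
            kernel.blend(problem, label))

with tempfile.TemporaryDirectory() as d:
    generate_case(1, d, StubKernel())
    print(sorted(p.name for p in Path(d).iterdir()))
```

Distributing exactly this kind of loop across a few thousand users' machines is what would make the corpus cheap to build.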
Edit: basically, an LLM-like thing becomes the kernel and the CAD system is how one talks to it. Not sure that came through before. Writing it out just in case.
And to be fair, I am still learning in this area. If what I put here is a no way, it would be most helpful to tell me or link me to why not. Thanks in advance.
Edit: Ahh, I see. Lol, read in the CAD code and have an AI rewrite it? Maybe, but doubtful.
Edit: I got the number of control points discussion above wrong. The arcs and conics should be marked three control points each, with the difference being the constraint on the middle control point. Whoops!
Man, you have been incredibly generous with what I suspect is an LLM chat. The OP types really strangely, with some tell-tale LLM writing structures and styles, on a 3-day-old account.
LLMs are notoriously over-agreeable and polite in a weird way.
Like a human might say “cheers for the thoughtful response”, an LLM would say something more like this: “Thanks a lot for the thoughtful and respectful reply! I really appreciate that you raised the engine issue without dismissing the whole idea.”
The tell here is that it’s maintaining a polite engaging conversation while mirroring the semantic meaning of your reply without any conceptual depth.
You could have written anything and it would have agreed with you.
Anyone who has spent a few minutes genuinely thinking about how to build CAD can easily see that the geometry kernel is the crux of the problem and why open-source attempts lag behind commercial CAD.
But the LLM user response here is overly neutral, essentially saying “thanks for raising this super obvious thing”.
It's being overly polite. In another comment they say they are “honoured to have your reply”, which again is just overly polite.
On a 3-day-old account it just stinks.
OP is either a low-effort poster farming for something or an English-as-a-second-language user relying on an LLM to write for them way too much.
Or they are a naive user who wrote a low effort LLM generated proposal without looking at any of the prior art of the topic.