There are aspects I didn't fully understand; maybe someone can explain. On one hand, I understand that object arrays were very slow and raised issues, that PyArrow strings solved the problem (how?) at the expense of a huge dependency, and that the approach here also solves it in a lighter way. But it seems to work in somewhat the same way as object arrays: "So, the actual array buffer doesn’t contain any string data—it contains pointers to the actual string data. This is similar to how object arrays work, except instead of pointers to Python objects, they’re pointers to strings" => OK so... if they're stored in a similar way, how is this more performant or better than regular object arrays?
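You can see the "buffer contains pointers" part directly in NumPy: an object array's buffer holds one pointer-sized slot per element no matter how long the string is, while a fixed-width unicode array stores the characters inline. A quick illustration (the example strings are mine):

```python
import numpy as np

# An object array's buffer holds PyObject* pointers, not string data:
# every element occupies one pointer-sized slot regardless of string length.
arr = np.array(["a", "a much longer string than one byte"], dtype=object)
print(arr.itemsize)    # pointer size, e.g. 8 on a 64-bit system

# A fixed-width unicode array, by contrast, stores the characters inline,
# padded to the width of the longest entry (4 bytes per code point).
fixed = np.array(["a", "a much longer string than one byte"])
print(fixed.dtype, fixed.itemsize)
```

So with `dtype=object` the actual string bytes live wherever Python allocated them on the heap, and the array only points at them.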
PyArrow string arrays store the entries (string values) contiguously in memory, so access is quicker, while object arrays have pointers to scattered memory locations in the heap.
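The Arrow-style layout can be sketched as one contiguous byte buffer plus an offsets array, so entry `i` is `data[offsets[i]:offsets[i+1]]`. This is a minimal illustration of the idea, not PyArrow's actual internals:

```python
# Arrow-style string storage sketch: all string bytes packed into one
# contiguous buffer, with an offsets array marking entry boundaries.
def build(strings):
    offsets = [0]
    chunks = []
    for s in strings:
        b = s.encode("utf-8")
        chunks.append(b)
        offsets.append(offsets[-1] + len(b))
    return b"".join(chunks), offsets

def get(data, offsets, i):
    # Entry i occupies the half-open byte range [offsets[i], offsets[i+1]).
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")

data, offsets = build(["foo", "barbaz", ""])
print(data)                   # b'foobarbaz' -- no per-string heap objects
print(get(data, offsets, 1))  # 'barbaz'
```

Because the bytes are packed together, scanning all entries is a linear pass over one buffer (cache-friendly, vectorizable), instead of chasing a pointer into the heap for every element.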
I agree, though I couldn't really figure out how the new NumPy string dtype makes it work.
They’re stored on the DType instance. This requires that there’s only one DType instance per “owned” array buffer, which I figured out how to do along with Sebastian Berg and others using the new DType system.
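A toy model of that idea, to make it concrete: the DType instance owns an arena holding the actual string bytes, and the array buffer stores small fixed-size handles (offset, length) into that arena instead of raw heap pointers. Since there is exactly one DType instance per owned buffer, a handle unambiguously identifies its string. This is my illustrative sketch, not NumPy's real StringDType implementation (which is in C and handles mutation, missing values, etc.):

```python
class ToyStringDType:
    """Toy DType that owns the storage for 'its' array's strings."""

    def __init__(self):
        self.arena = bytearray()  # all string data lives here, contiguously

    def pack(self, s):
        # Append the bytes to the arena and return a fixed-size handle;
        # the handle is what would sit in the array buffer.
        b = s.encode("utf-8")
        offset = len(self.arena)
        self.arena.extend(b)
        return (offset, len(b))

    def unpack(self, handle):
        offset, length = handle
        return bytes(self.arena[offset:offset + length]).decode("utf-8")

dt = ToyStringDType()
handles = [dt.pack(s) for s in ["hello", "numpy strings"]]
print(handles)                 # fixed-size (offset, length) pairs
print(dt.unpack(handles[1]))   # 'numpy strings'
```

The payoff over object arrays is that the string data is owned and laid out by the array's dtype rather than scattered across individually allocated Python objects, so no per-element reference counting or pointer chasing into arbitrary heap locations is needed.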