Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

A few years ago I was toying with JSON databases without loading, and came up with an idea to cache the parse tree of the documents after the first query, so that subsequent queries would run much faster. I called the technique semi-indexing and wrote a paper [1] on that; on my synthetic tests the speedups were significant (even 10x), but I never got the chance to test it on real workloads.

I wonder if would be useful to integrate the semi-index code [2] in this json_fdw; thanks for sharing the code, I'll try to see how feasible this is.

[1] http://www.di.unipi.it/~ottavian/files/semi_index_cikm.pdf

[2] https://github.com/ot/semi_index



This paper is fascinating. I have an extremely fast JSON parser for Java that I've been working on, on and off [1], and I'd love to optionally augment it with semi-indexes for cases where end-users are using a subset of a file.

[1] https://github.com/mmastrac/nanojson


Very interesting! Implementing a semi-index is very easy if you have implementations of Elias-Fano and balanced parentheses data structures.

For Java I recommend Sux4j [1] that has a very good implementation of Elias-Fano, but I think that balanced parentheses are very primitive. I was told by the author that a better data structure, based on Range-Min-Max trees, should be added soon.

[1] http://sux.dsi.unimi.it/


I've enjoyed playing with succinct data structures because they make interesting puzzles, and I've come across your papers. Succinct data structures seem to be a drain on performance, though, for ordinary kinds of datasets. It's not easy to find the kind of datasets that are so large that succinctness would allow them to fit in memory when they otherwise wouldn't (not working in computational biology, etc). Using regular data structures would probably increase performance more and meet most people's needs, and it would certainly reduce the code complexity.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: