Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

In 2007 I worked for a firm with a 4 billion row join table in PostgreSQL. Might've been 7 or 8, I don't recall which. It ran on a quad core server with 16Gb of RAM. Joins going through this table took about 2-3 seconds to complete.


But I suspect the join must have been over an indexed column, so it did not touched 4bln rows, otherwise 2-3 seconds would be hard to believe. The group by query in the article must access all 3bln rows, which makes a huge difference.


All the columns were indexed.

I remember it well, because I was trying to explain why having tens of gigabytes of indexes wouldn't help them much if they only had 16Gb of RAM.

In terms of group-by performance, it depends a lot on the kind of data and how it's stored. For example, taking a sum on a columnar store is quite amenable to parallel solutions and a lot of databases will do that way.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: