In 2007 I worked for a firm with a 4 billion row join table in PostgreSQL. Might...

mixedbit · on Feb 26, 2013

But I suspect the join must have been over an indexed column, so it did not touched 4bln rows, otherwise 2-3 seconds would be hard to believe. The group by query in the article must access all 3bln rows, which makes a huge difference.

jacques_chester · on Feb 26, 2013

All the columns were indexed.

I remember it well, because I was trying to explain why having tens of gigabytes of indexes wouldn't help them much if they only had 16Gb of RAM.

In terms of group-by performance, it depends a lot on the kind of data and how it's stored. For example, taking a sum on a columnar store is quite amenable to parallel solutions and a lot of databases will do that way.