
Google+ UIDs were UUIDs (or something looking very much like them).

They were nonsequential and sparse.

For those of us looking to 1) quantify activity and 2) migrate users and data off the system, they were pretty confounding.

Fortunately, Google also provided lists of those UUIDs by way of robots.txt sitemap files.

I used one sample of ~50k of those to estimate total G+ active users as of ~2014 (another group polled a random sample of 500k for a more precise measurement). When G+ folded in 2019, I gave ArchiveTeam that estimate, plus some additional bits gleaned over the years and from other sources, to help gauge just how large the archive dataset might be.
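A minimal sketch of the general idea of scaling a sample up to a population total, assuming a simple random sample drawn from the full list of IDs and a yes/no "active" flag per ID. This is not the method used for the estimates above; `estimate_active` and every number below are hypothetical. It mainly shows why a 500k sample beats a 50k one: the interval narrows by a factor of √10.

```python
import math


def estimate_active(total_ids: int, sample_size: int, active_in_sample: int,
                    z: float = 1.96) -> tuple[float, float, float]:
    """Hypothetical helper: scale a sample proportion to the whole ID list.

    Assumes a simple random sample of the ID list. Returns
    (point_estimate, low, high) using a normal-approximation 95% interval.
    The finite-population correction is omitted, which is fine when the
    sample is tiny relative to the population.
    """
    p = active_in_sample / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)  # standard error of the proportion
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)
    return total_ids * p, total_ids * low, total_ids * high


# Hypothetical inputs: 1 billion listed IDs, 10% of sampled IDs active.
small = estimate_active(1_000_000_000, 50_000, 5_000)
large = estimate_active(1_000_000_000, 500_000, 50_000)
print(small)
print(large)
```

Both samples give the same point estimate, but the 500k sample's interval is about 3.16 times narrower, since the standard error shrinks with the square root of the sample size.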

One place where sparse population might prove really useful is telephony. Phone numbers as we know them today are densely populated and, in fact, frequently reused (which is why your new phone receives debt-collection calls meant for its previous holder). That density also makes war-dialing, or random dialing, viable for robocallers.

If only 1 in 10 billion numbers were valid (about the density of G+ UUIDs), war-dialing or random dialing would be all but ineffective. Dialing one number per second through a 10-billion-number block holding a single live number, you'd reach a 50% chance of hitting it ... after about 5 billion dials, or 158 years.
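A quick check of that arithmetic, as a sketch under two different assumptions (the function names are hypothetical). Sequential dialing without repeats through a space holding one live number needs half the space for a 50% chance, which is where ~158 years comes from. Random dialing with repeats, where each dial is independently live with probability 1e-10, needs ln 2 / p dials, which is longer.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year: 31,557,600 s


def years_to_half_chance_sequential(space_size: int, dials_per_second: float = 1.0) -> float:
    """One live number hidden uniformly in space_size; dial each number once.

    P(hit within n dials) = n / space_size, so a 50% chance needs space_size / 2 dials.
    """
    return (space_size / 2) / dials_per_second / SECONDS_PER_YEAR


def years_to_half_chance_random(density: float, dials_per_second: float = 1.0) -> float:
    """Each dial is independently live with probability `density` (repeats allowed).

    P(at least one hit in n dials) = 1 - (1 - density)**n; solve for 0.5.
    log1p keeps precision when density is tiny.
    """
    dials = math.log(0.5) / math.log1p(-density)
    return dials / dials_per_second / SECONDS_PER_YEAR


print(years_to_half_chance_sequential(10**10))
print(years_to_half_chance_random(1e-10))
```

The first call gives roughly 158.4 years, matching the figure above; the second gives roughly 219.6 years, since random dialing wastes effort on repeats.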

(Of course, if you had a list of valid numbers, like the one G+'s robots.txt sitemap files provided, your search space would be far smaller.)


