Why is this kind of data ever going to be useful? People have 200+ friends on their friends lists but only a handful of those are meaningful, and the rest of those links are just about Facebook values in the same way that capitalism is about the acquisition of money. So I'm a little afraid of people using this data to generate social "insight".
People have 200+ friends on their friends lists but only a handful of those are meaningful
What I have found, by making most of my Facebook behavior submitting links for comment (a la participation on HN), is that many more of my friends have become more meaningful as they interact with one another based on the only commonality they have--that they are all my friends. People who have never met one another even once get into great conversations on my Facebook link threads, and I learn a lot from them by lurking in the discussions. Eventually, I gain a bigger, more cohesive social network that way, including people all over the world who are talking about the day they have a "cocktail party" with me while enjoying serious conversation on thought-provoking issues. The value of the graph is all in how you use the graph.
Why is this kind of data ever going to be useful?
Accurately gathered data are almost always useful. The mark of a brilliant researcher is figuring out new questions to ask about data that are already gathered.
Everything you say is probably true. But like many trivial things, this could be a stepping stone into deeper insights. Some academic/marketer researches this data, which leads him to a question that can't be answered by the data (and which he may not have thought of otherwise). So he does that data collection and analysis.
One example I'm interested in exploring: eigenvector centrality analysis on the social graph. It would be like a PageRank algorithm for people, quantifying how influential individuals are. This necessitates a full social graph. Having only a subset of it would be like Google only looking at the first 10 links on a page for its search algorithm.
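To make the idea concrete, here is a minimal sketch of eigenvector centrality computed by power iteration, assuming an undirected friendship graph given as a list of pairs. The names in `edges` are made up for illustration, and the function is not anything from the data set or from Google's actual ranking code. It iterates on the adjacency matrix plus the identity (each person keeps their own score and adds their friends' scores), which has the same leading eigenvector but avoids oscillation on bipartite graphs.

```python
from collections import defaultdict


def eigenvector_centrality(edges, iterations=100, tol=1e-9):
    """Approximate eigenvector centrality by power iteration.

    `edges` is an iterable of (person, person) pairs, treated as undirected.
    Scores are scaled so the most central person has a score of 1.0.
    """
    # Build an adjacency map: person -> set of friends.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    nodes = sorted(neighbors)
    score = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # New score = own score + sum of friends' scores (i.e. A + I).
        new = {n: score[n] + sum(score[m] for m in neighbors[n]) for n in nodes}
        # Rescale so the values do not blow up between iterations.
        norm = max(new.values())
        new = {n: v / norm for n, v in new.items()}
        # Stop once the scores have stopped moving.
        delta = max(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score


# Hypothetical toy graph, purely for illustration.
edges = [
    ("ann", "bob"),
    ("ann", "cat"),
    ("ann", "dan"),
    ("bob", "cat"),
    ("dan", "eve"),
]
scores = eigenvector_centrality(edges)
for person in sorted(scores, key=scores.get, reverse=True):
    print(f"{person}: {scores[person]:.3f}")
```

In this toy graph, dropping a single edge changes everyone's score, not just the two endpoints, which is why a truncated graph is a worry for this kind of analysis.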
For the PageRank chunk of their search algorithm? If so, that is fascinating, because that means I could work with a graph subset and get good results. Could you link me?
Unfortunately, this is something I heard from a Googler at SMX East. And I don't know how applicable it is--a human would find a long list of navigation links boring, but to a search bot they have much more content than, e.g., a list of "related pages".
I don't think this data set is complete in that sense, since only public profiles can be crawled in this way. At least that's what I gathered from the article.
Pete still uses us for his crawling. He switched over to AWS for some of his crawling because we had a throttle on some of the sites he wanted to crawl. But now that we know we can crawl those sites more quickly, we're looking into relaxing the throttle.
"You can make up a new title if you want, but if you put gratuitous editorial spin on it, the editors may rewrite it."
I much prefer HN submissions to use an original article title, the better to avoid duplicate submissions. And usually those titles are more interesting than titles users here make up.