The Man Who Looked Into Facebook's Soul (readwriteweb.com)
38 points by raju on Feb 9, 2010 | 13 comments


Why is this kind of data ever going to be useful? People have 200+ friends on their friends lists, but only a handful of those are meaningful. The rest of those links are just about Facebook's values, in the same way that capitalism is about the acquisition of money. So I'm a little afraid of people using this data to generate social "insight".


People have 200+ friends on their friends lists but only a handful of those are meaningful

What I have found, by making most of my Facebook behavior submitting links for comment (a la participation on HN), is that many more of my friends have become more meaningful as they interact with one another based on the only commonality they have--that they are all my friends. People who have never met one another even once get into great conversations on my Facebook link threads, and I learn a lot from them by lurking in the discussions. Eventually, I gain a bigger, more cohesive social network that way, including people all over the world who are talking about the day they have a "cocktail party" with me while enjoying serious conversation on thought-provoking issues. The value of the graph is all in how you use the graph.

Why is this kind of data ever going to be useful?

Accurately gathered data are almost always useful. The mark of a brilliant researcher is figuring out new questions to ask about data that are already gathered.


Everything you say is probably true. But like many trivial things, this could be a stepping stone into deeper insights. Some academic/marketer researches this data, which leads him to a question that can't be answered by the data (and which he may not have thought of otherwise). So he does that data collection and analysis.


One example I'm interested in exploring: eigenvector centrality analysis on the social graph. It would be like a PageRank algorithm for people, quantifying how influential individuals are. This necessitates a full social graph. Having only a subset of it would be like Google only looking at the first 10 links on a page for its search algorithm.
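
To make the idea concrete, here is a minimal sketch of PageRank-style power iteration on a tiny directed graph. PageRank is a damped variant of eigenvector centrality, so this is not plain eigenvector centrality. The names in the `friends` graph are made up, and the damping factor of 0.85 is just the commonly quoted default, not something taken from the article.

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Rank nodes by power iteration.

    graph: dict mapping each node to the list of nodes it links to.
    Returns a dict mapping node -> score, with scores summing to 1.
    """
    # Collect every node, including ones that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start from a uniform distribution.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Score held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}

        # Each node passes an equal share of its score along each link.
        for v in nodes:
            targets = graph.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share

        # Stop early once the scores have effectively stopped changing.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break

    return rank


# Hypothetical toy graph: an edge means "lists as a friend".
friends = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol"],
}
scores = pagerank(friends)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 4))
```

On a mutual-friendship graph every edge would go both ways, so the more interesting input is probably something directed, like who comments on whose posts. Dropping edges from `friends` and rerunning is a cheap way to see how much a partial graph shifts the ranking.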


Google used to limit the number of links they looked at. Now they don't explicitly limit it, but they'll stop if the links seem boring.


For the PageRank chunk of their search algorithm? If so, that is fascinating, because that means I could work with a graph subset and get good results. Could you link me?


Unfortunately, this is something I heard from a Googler at SMX East. And I don't know how applicable it is--a human would find a long list of navigation links boring, but to a search bot they have much more content than, e.g., a list of "related pages".


I don't think this data set is complete in that sense, since only public profiles can be crawled in this way. At least that's what I gathered from the article.


Where false insight is most troubling is in the hands of a totalitarian regime. Don't think it can't happen here.


He is using Amazon cloud services instead of 80legs for the crawling.

Weren't the economics of 80legs supposed to be much better than Amazon's?


Pete still uses us for his crawling. He switched over to AWS for some of his crawling because we had a throttle on some of the sites he wanted to crawl. But now that we know we can crawl those sites more quickly, we're looking into relaxing the throttle.


Can we edit the title to not be so emo?


It's the original article title, which is preferred under the HN guidelines.

http://ycombinator.com/newsguidelines.html

"You can make up a new title if you want, but if you put gratuitous editorial spin on it, the editors may rewrite it."

I much prefer HN submissions to use an original article title, the better to avoid duplicate submissions. And usually those titles are more interesting than titles users here make up.



