
Exactly. If they're not tapping all communications on the net, then why are they building a data centre to store and process all of that data?

Because I personally can't imagine another datastream 1) of such magnitude and 2) worth billions of dollars to build a data centre for.



Even at 100k square feet, I don't think they have the capacity to record and store all data on the Internet backbone (they're recording a lot of data outside the US, too).

First, they'd need a shadow Internet backbone capable of transporting all that data to the DC. Second, they'd have to build a system with the same write-rate as all of the internet backbone. This stuff isn't cheap, and the NSA's budget is big, but not that big.

As 1/3rd of the US's internet traffic is porn, which you can really just de-dup by URL, it's a safe bet to say that they're filtering before they transmit data back home. The same goes for streaming video traffic. The remainder is mostly web with a small fraction being actual communication between people. For the NSA's mission, even implemented in the most evil way, they just don't have the money, means, or motivation to record everything. Instead, filtering on *hotmail.com connections, anything over SMTP ports, etc. makes substantially more sense.
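To make the filtering idea concrete, here is a minimal sketch of that kind of selector. Everything in it is an assumption for illustration: the `Flow` record, the `is_interesting` name, the watched domain list, and the choice of 25, 465 and 587 as the mail ports. It says nothing about how any real system works.

```python
from dataclasses import dataclass


# Hypothetical record type; the field names are illustrative assumptions.
@dataclass(frozen=True)
class Flow:
    host: str      # server hostname seen for the connection
    dst_port: int  # destination TCP port


# Assumption: 25, 465 and 587 are the usual SMTP / mail submission ports.
SMTP_PORTS = frozenset({25, 465, 587})
# Assumption: a watch list of domain suffixes, e.g. the *hotmail.com case.
WATCHED_SUFFIXES = ("hotmail.com",)


def is_interesting(flow: Flow) -> bool:
    """Keep mail traffic and connections to watched domains, drop the rest."""
    if flow.dst_port in SMTP_PORTS:
        return True
    host = flow.host.lower().rstrip(".")
    # Match the domain itself or any subdomain, but not "nothotmail.com".
    return any(host == s or host.endswith("." + s) for s in WATCHED_SUFFIXES)


flows = [
    Flow("mail.hotmail.com", 443),
    Flow("mx.example.org", 25),
    Flow("video.example.net", 80),
]
kept = [f for f in flows if is_interesting(f)]
print(kept)
```

The point is only that a cheap per-connection test like this can run at the tap, so the bulk video traffic never has to be shipped anywhere.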


After some digging, I found it rather hard to find data on the amount of traffic sent into, out of, and within the US. The best number I could find was from almost 10 years ago, which said around 970 Gbit/s. Say it has gone up 100x, and we get a nice round number of 10 TB of disk space needed per second.

We know, thanks to XKCD, that 7523 hard drives per second are produced by the storage industry (globally, a total of 650 million drives per year). Say that the average drive capacity over the last 5 years is around 500 GB.

That means that as long as the NSA buys 0.265% of all hard drives produced each year, it will have enough hard drives to record all data transmitted inside the US borders.
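The back-of-envelope numbers above can be rerun in a few lines of Python. This is only a sketch using the figures as quoted in the comment; the variable names are mine. One thing it surfaces: 650 million drives per year works out to roughly 20 drives per second, not 7523, so the two production figures do not agree with each other and the resulting share depends heavily on which one is right.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Figures as quoted in the comment above (not independently verified).
old_rate_gbit_s = 970          # Gbit/s, the roughly 10-year-old traffic figure
growth_factor = 100            # assumed growth since then
drive_capacity_tb = 0.5        # 500 GB average drive
quoted_drives_per_s = 7523     # per-second production figure as quoted
quoted_drives_per_year = 650e6 # annual production figure as quoted

# 970 Gbit/s * 100, converted from bits to bytes and from GB to TB (decimal).
rate_tb_s = old_rate_gbit_s * growth_factor / 8 / 1000

# The comment rounds this to 10 TB/s; keep that rounding for the rest.
rounded_rate_tb_s = 10
drives_needed_per_s = rounded_rate_tb_s / drive_capacity_tb

# Share of production using the per-second figure as quoted.
share_per_second_figure = drives_needed_per_s / quoted_drives_per_s

# Share of production if the annual figure is the right one instead.
drives_per_s_from_annual = quoted_drives_per_year / SECONDS_PER_YEAR
share_annual_figure = drives_needed_per_s / drives_per_s_from_annual

print(f"unrounded rate: {rate_tb_s:.3f} TB/s")
print(f"drives needed: {drives_needed_per_s:.0f} per second")
print(f"share using 7523 drives/s: {share_per_second_figure:.3%}")
print(f"650M drives/year is {drives_per_s_from_annual:.1f} drives/s")
print(f"share using 650M drives/year: {share_annual_figure:.1%}")
```

With the per-second figure the share comes out at about 0.27%, in line with the comment. With the annual figure it is close to the entire output of the industry, before any filtering or deduplication.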

They would still need to write the data. One obvious way would be to store it on site, and transport the drives to a central place. Drives are not big, but it is a noticeable amount of work, so if people were doing this, there should be more verifiable evidence of it. If we include post-storage compression, deduplication, and any other tricks, the numbers could probably be lowered by 50-75% or so, which might be enough to send some over the wire and only the overflow on drives loaded onto trucks.

One could also ask what 0.265% of the storage industry output is in raw cash. My answer to that is: I don't know :).


There could also be an incoming relevance filter. I'm sure the NSA has no interest in archiving all the pornography, spam, and cat pictures ever transmitted.


Also note that the majority of bandwidth nowadays is spent on videos and similar multimedia content. So if they notice that 100M people are watching Bieber's latest music video, they just need to store the YouTube URL. No need to store 100M copies of the video itself.


Why do you assume the same URL retrieves the same data every time? :-)


Why do you think that a massive block level deduplication system would care?
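For anyone wondering why the URL would not matter, here is a toy sketch of block-level deduplication. It assumes fixed 4096-byte blocks and SHA-256 as the content key, and `BlockStore` is a made-up name for illustration. The store keys on what the bytes are, not on where they came from.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


class BlockStore:
    """Toy content-addressed store: each distinct block is kept once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> block bytes

    def put(self, data: bytes) -> list:
        """Store data and return its recipe (ordered list of block digests)."""
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            self.blocks.setdefault(digest, block)  # no-op if already stored
            recipe.append(digest)
        return recipe

    def get(self, recipe: list) -> bytes:
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.blocks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


store = BlockStore()

# Four distinct blocks standing in for a video file.
video = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
# Same content behind a "different URL", except for the last block.
variant = video[:-BLOCK_SIZE] + b"\xff" * BLOCK_SIZE

r1 = store.put(video)
r2 = store.put(video)    # second viewer: nothing new is stored
r3 = store.put(variant)  # only the one changed block is new

raw_total = 2 * len(video) + len(variant)
print(raw_total, store.stored_bytes())
```

Fixed-size blocks are the simplest version. They stop matching as soon as bytes are inserted and everything after shifts, which is why real deduplication systems often use content-defined chunking instead.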


They don't need to store the traffic. All they need to store is the endpoints and timestamps, which compress quite well and still give usable information, such as who talked to whom and when.

Your actual data is just a subpoena away anyway in most cases.
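A rough sketch of how small an "endpoints and timestamps" record can be. The 12-byte layout (IPv4 source, IPv4 destination, 32-bit Unix timestamp) and the helper names are assumptions for illustration, and the synthetic events below are far more regular than real traffic, so the compression ratio it prints is not representative.

```python
import socket
import struct
import zlib

# Assumed layout: 4-byte IPv4 source, 4-byte IPv4 destination,
# 32-bit Unix timestamp, network byte order, no padding (12 bytes total).
RECORD = struct.Struct("!4s4sI")


def pack(src: str, dst: str, ts: int) -> bytes:
    """Pack one (source, destination, timestamp) event into 12 bytes."""
    return RECORD.pack(socket.inet_aton(src), socket.inet_aton(dst), ts)


def unpack(buf: bytes):
    """Inverse of pack()."""
    src, dst, ts = RECORD.unpack(buf)
    return socket.inet_ntoa(src), socket.inet_ntoa(dst), ts


# Synthetic events: the same two documentation-range hosts every 30 seconds.
events = [
    ("192.0.2.10", "198.51.100.7", 1_360_000_000 + 30 * i)
    for i in range(1000)
]

raw = b"".join(pack(*e) for e in events)
compressed = zlib.compress(raw, 9)

print(len(raw), len(compressed))
```

Even uncompressed, 12 bytes per connection is orders of magnitude less than the traffic itself, which is the whole argument for keeping metadata rather than content.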


Exactly, why store all the data in your super-expensive top-secret warehouse, when Facebook and Google will keep it for you for free?

Free == didn't come out of the NSA's budget


Money is no object to that industry. The combined budget of the NSA, CIA, FBI, Pentagon, foreign bases, the Afghanistan and Iraq occupations, etc. is in the trillions. Why trust FB or Google if all it costs to capture and store the information is money?


The NSA is an intelligence agency, not a law enforcement agency. They don't operate within the court process, which means they don't really do subpoenas or warrants. Furthermore, any data covertly collected by the NSA would be inadmissible in court, which would make it impossible to get a subpoena or warrant based on it anyway.


In 1971 we put wire taps on Soviet underwater communication lines within territorial waters despite sound detection devices placed along the seabed. I'm sure the country that had the ingenuity and balls to pull off Operation Ivy Bells (http://en.wikipedia.org/wiki/Operation_Ivy_Bells) when it faced the existential threat of Communism is perfectly capable of figuring out how to do the exact same thing to packet communication in the continental United States as it faces the threat of Terrorism.


It is a fundamental issue of volume. Unless we assume that backbone providers and Internet companies habitually over-provision, then it would take about the same size pipe, and the same size data centers, to "copy" the Internet in real time.

Consider how big the Internet is. Even if the NSA has five 100,000-square-foot data centers, think about adding up the aggregate data center footprint of Microsoft, Facebook, Apple, Google, Yahoo, Amazon, Rackspace, etc. That comes to a lot more than 500,000 square feet. And this does not even get into the enormous data resources in Asia and Europe.

A much more likely scenario is that they are heavily filtering the data in real time and keeping just what looks useful or suspicious. That is still scary, but less fantastic than the idea that every packet is getting stored by the NSA.


I agree that they are not keeping everything.

I've looked at my own traffic and how much of it would be worth storing, and it's probably less than 500 MB per month. And that's counting downstream and upstream.

But the net is not the only thing the NSA is interested in storing: banking, other records, communications over satellites, etc. go in there as well.

What's truly scary is that people are sort of "meh" about this. Or they don't believe that this is really happening; most of this thread seems to be a case in point.


Then the most secure type of encryption could be using porn?


Based on published statistics from Tor exit nodes, porn surfers are indeed using the most secure type of encryption.


Steganography is big business.


Citation needed.


Porn is in the top ten of their priority list. It's a great way to find the weak points and blackmail possibilities for current and future opposition leaders.



