Hacker News

Agreed and not to mention that most people won't use anything close to the 50GB, combine that with the deduplication plans they have and you have a very different scenario.


They say all files will be client-side encrypted, which shouldn't allow them to "deduplicate" anything.


There was a discussion the other day about how this is possible: a key would be derived from a hash of the file, which would allow for both encryption and deduplication. Let me see if I can find it.


I understand this would be possible, but then they are essentially lying about their privacy claims.


Doesn't seem like an unreasonable assumption to me.


Dropbox used to say: "All files stored on Dropbox servers are encrypted (AES256) and are inaccessible without your account password." even though the files were accessible without the account password.

Mega might be doing the same thing: saying one thing to attract early adopters, and changing the marketing language once it gets broader adoption by people who don't care about that attribute. It's dishonest, but it would hardly be shocking to learn that a multiple felon was being dishonest.


Just because a file is encrypted doesn't mean that a small block can't match another encrypted block, even if the two come from different files.

This would allow them to do deduplication at the block level (see ZFS for example).


Think about that one for a second. An encrypted block is supposed to look like random data. That is, if two people encrypt the same file with different keys, you shouldn't be able to tell that they're the same file (or your encryption sucks). So your block-level de-duping would depend on incidental matches between random data.

What's the probability of two 4KB (or whatever) blocks of random data being identical? Basically zero even with petabytes of data.
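The "basically zero" claim can be checked with a birthday-bound estimate: among n uniformly random b-bit blocks, the expected number of colliding pairs is roughly n²/2^(b+1). A quick stdlib-only sketch for 4 KB blocks over a petabyte of data (block size and data volume are illustrative assumptions):

```python
import math

BLOCK_BITS = 4096 * 8           # one 4 KB block, in bits
PETABYTE = 2 ** 50              # bytes
n_blocks = PETABYTE // 4096     # number of 4 KB blocks in a petabyte

# Birthday bound: expected colliding pairs among n random b-bit blocks
# is about n^2 / 2^(b+1). Work in log2 to avoid astronomical numbers.
log2_expected = 2 * math.log2(n_blocks) - (BLOCK_BITS + 1)
print(f"log2(expected collisions) ~ {log2_expected:.0f}")
```

The result is on the order of 2^-32693 expected collisions, i.e. no accidental matches will ever occur, supporting the point that deduplicating properly encrypted blocks is hopeless.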


see: convergent encryption

The encryption on the client doesn't use a random key. The key is a hash of the unencrypted contents of the file.
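A minimal sketch of that idea (stdlib-only; a toy SHA-256 XOR keystream stands in for a real cipher, so this is not production crypto): because the key is a hash of the plaintext, two independent uploads of the same file produce byte-identical ciphertexts that the server can deduplicate without reading either copy.

```python
import hashlib

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Toy convergent encryption: the key is a hash of the plaintext,
    so identical files always yield identical ciphertexts.
    (Sketch only -- a real system would use AES, not an XOR keystream.)"""
    key = hashlib.sha256(plaintext).digest()
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return key, ciphertext

# Two users encrypting the same file independently derive the same key
# and the same ciphertext, so the server can store it once.
k1, c1 = convergent_encrypt(b"the same episode rip")
k2, c2 = convergent_encrypt(b"the same episode rip")
assert k1 == k2 and c1 == c2
```

The trade-off is exactly what the thread goes on to discuss: determinism is what makes deduplication work, and also what leaks information.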


Reading list-ed. Interesting.


The issue is that you don't get the full benefits of encryption.

If you upload the map to the rayiner family treasure, which only you have seen, you're good. No one else will be able to read it.

But if you upload the latest episode of Modern Family and Disney gets ahold of the same rip you used, they can (if they can get a government to help them out) see what you did and charge you with copyright infringement (or whatever the appropriate offense would be).
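That confirmation risk follows directly from determinism: the deduplication tag is a function of the file alone (the exact tag construction below is an illustrative assumption), so anyone who holds the same rip derives the same tag and can ask the provider who stored it.

```python
import hashlib

# Under convergent encryption, the dedup tag depends only on the file's
# contents, so anyone with an identical copy can recompute it.
def dedup_tag(contents: bytes) -> str:
    return hashlib.sha256(contents).hexdigest()

# Hypothetical rip held by both the uploader and the rights holder.
rip = b"...identical video bytes..."
servers_tag = dedup_tag(rip)     # what the provider indexes for dedup

# The rights holder derives the same tag from its own copy and can
# demand to know which accounts stored that tag.
studio_tag = dedup_tag(rip)
assert studio_tag == servers_tag
```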


After 3-4 years of high-profile CPA-2 attacks on TLS, .NET, Java, and other systems, you'd think we'd all be a lot more skeeved out by cryptosystems that demand known plaintexts. There's already an obvious conceptual attack (beyond file confirmation) on naive "convergent encryption": you can leverage small amounts of known plaintext to learn unknown plaintext.
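That known-plaintext leverage can be sketched as an offline guessing attack: if most of a document is known and the unknown part is low-entropy, an attacker enumerates candidates and compares convergent tags, with no decryption required (the document template and values below are hypothetical):

```python
import hashlib

def dedup_tag(doc: bytes) -> str:
    # convergent scheme: the tag (and key) depend only on the plaintext
    return hashlib.sha256(doc).hexdigest()

# A mostly known document with one low-entropy secret field.
secret_doc = b"Offer letter: salary = $83,000"
observed_tag = dedup_tag(secret_doc)   # visible to the storage provider

# Anyone who knows the template just enumerates the unknown part and
# compares tags offline until one matches.
recovered = None
for k in range(50, 200):
    guess = f"Offer letter: salary = ${k},000".encode()
    if dedup_tag(guess) == observed_tag:
        recovered = guess
        break
```

This is why later convergent-encryption designs mix in a per-user or per-group secret, at the cost of losing cross-user deduplication for exactly those files.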


In practice, any strong encryption (i.e., not AES-ECB) is indistinguishable from random noise, and that's by design. Even trying to deduplicate 4 KB blocks of random noise would be a completely fruitless task. If it were possible, storage would probably still be cheaper than the CPU time needed to find matching blocks.


I don't think this is true. 50 GB of free storage per user would be cost-prohibitive without (at least) block-level deduplication.


Some kind of homomorphic encryption scheme maybe?



