This is yet another reminder that good JS minification tools exist that can absolutely rename object properties to short, minimal strings instead of descriptive names: the Closure Compiler in advanced mode. You do need quite a bit of discipline in how you write the JS for that to work, though. Some languages, like ClojureScript, do this by default, so there it doesn't take much effort.
Also it helps if you don't have to use objects (with keys) to transfer data. What I mean is that there's little reason to use a keyed object when you can instead use a simple array, as long as you have some post-processing to transform array indices back into object keys. Both of these approaches also cut down on the amount of data transferred over the wire, so it saves data and helps speed up the site for users too.
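A rough sketch of what that can look like (the field names and helper here are made up for illustration):

```js
// Wire format: a plain array instead of a keyed object.
// The index-to-key mapping is agreed on out of band (e.g. generated code).
const USER_FIELDS = ["id", "name", "isAdmin"];

// Server sends: [42, "Ada", false] instead of {"id":42,"name":"Ada","isAdmin":false}
function fromWire(row) {
  const obj = {};
  USER_FIELDS.forEach((key, i) => { obj[key] = row[i]; });
  return obj;
}

console.log(fromWire([42, "Ada", false])); // { id: 42, name: "Ada", isAdmin: false }
```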
The thought of the latter just made me shudder. Removing keys locks your API in really unpleasant ways. More importantly, it's less human readable and harder to reason about. Please don't do this unless you've got a very specific need for the performance.
Keys are not really removed in the code, only from the wire format. In code you can still write `myObject.property`, but the minifier translates that to, say, `a[6]` instead. Naturally there would be tools for the developer to translate these arrays back into full objects when debugging in production. Harder to reason about, yes, but few devs reason about their code by reading minified output, so why should they have to reason about it via the minified wire format?
(Of course I must admit that this is only suitable for private APIs, not APIs published explicitly for third parties to use.)
Have you tried looking at, say, Gmail's XHR requests and responses?
Because if you use the protobuf wire format, all the parsing code needs to be written in JS and won't be as performant as JSON parsing built into the browser.
But yes you can in fact transform protocol buffers into JS arrays in the way I described. I'm essentially describing protobuf designed for JS. Imagine your protobuf definitions are read by a compiler which spits out JS classes with getters and setters. These getters and setters access the underlying array with an assigned index. Your minifier inlines these getters and setters into direct array access. Voilà.
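A hypothetical sketch of what such generated code might look like (the message and field names are invented, this is not actual jspb output):

```js
// Generated from a hypothetical `message User { string name = 1; bool admin = 2; }`.
// Each field gets a fixed slot in a backing array; the wire format is just that array.
class User {
  constructor(data = []) { this._data = data; }
  getName()   { return this._data[0]; }
  setName(v)  { this._data[0] = v; return this; }
  getAdmin()  { return this._data[1]; }
  setAdmin(v) { this._data[1] = v; return this; }
  serialize() { return JSON.stringify(this._data); }
}

// After inlining by an advanced minifier, `user.getName()` becomes a direct
// `a[0]` access, so the getters and setters cost nothing at runtime.
const user = new User().setName("Ada").setAdmin(true);
console.log(user.serialize()); // ["Ada",true]
```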
Right, but the bottleneck is generally not "how long it takes to parse the payload". For anything short of megabytes of data, I doubt you'll notice much of a difference between parsing JSON and flatbuffers. The bottleneck is how large the payload is and how long it takes to send it over the wire.
But transferring the parser happens once, transferring JSON happens every time.
Really depends on the use case I guess. But any situation where I'm using JSON arrays instead of keyed objects for efficiency reasons is probably a situation where flatbuffers makes just as much (if not more) sense.
Having spent a LOT of time looking at this, browsers come with built-in (nil cost) JSON parsers.
You need a proto parsing lib and a collection of .proto schemas to even begin using protobufs, so the payload savings have to cover at least that much before proto even starts being a win. While the parsing lib can be cached and is largely a rounding error over the long term, every iteration of the .proto files means fetching a new version which contains all the contents of the previous version (or else sacrificing backwards compatibility).
Beyond the additional payload costs you also have to factor in the API itself. Any win from dropping keys can largely be obtained via compression, so that's only a nominal gain. APIs with many string values are not going to see much benefit either, and may actually be better served by compression. The real wire-size win for proto is on large numbers, but there aren't many APIs sending lots of values in the 256-65k range (let alone higher). Proto does do really well with booleans and null, though. Unpacked arrays aren't a particularly strong win for it either (though packed ones are a win for large arrays), and it also has weird quirks for maps that don't let it achieve parity with JSON, IIRC.
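To make the "large numbers" point concrete, here's a back-of-the-envelope comparison of varint-encoded sizes versus the JSON text representation (ignoring the extra field-tag byte proto adds per field):

```js
// Minimal unsigned varint length calculation (7 payload bits per byte).
function varintLength(n) {
  let len = 1;
  while (n >= 0x80) { n = Math.floor(n / 128); len++; }
  return len;
}

for (const n of [7, 300, 50000, 5000000]) {
  console.log(n, "-> varint:", varintLength(n), "bytes, JSON text:", String(n).length, "bytes");
}
// 7       -> 1 vs 1
// 300     -> 2 vs 3
// 50000   -> 3 vs 5
// 5000000 -> 4 vs 7
```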
Parsing time is not a huge win given normal API response sizes. I was parsing a JSON blob with 100k values four years ago on a shitty Dell in 2 seconds, and I can't think of anything near that size in the wild. Most API responses are going to be parsed faster than human perception, rendering the point mostly moot.
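If you want to sanity-check that on your own machine, a quick test along these lines (payload shape made up) does the trick:

```js
// Build a JSON blob with ~100k records and time the browser/Node built-in parser.
const blob = JSON.stringify(
  Array.from({ length: 100000 }, (_, i) => ({ id: i, name: "user" + i, active: i % 2 === 0 }))
);

console.log("payload size:", (blob.length / 1024).toFixed(0), "KiB");
console.time("JSON.parse");
JSON.parse(blob);
console.timeEnd("JSON.parse");
```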
The real win is the direct impact on bandwidth spend, which scales with size, but that comes at the cost of developer productivity, and not everyone has Google's war chest and can afford SWEs memeing about how they got promoted by spending 2 years updating protos.
Having worked at Google, Protobuf is a solid choice when you're working in multiple languages on multiple internal machines and haven't already bought into other means of serializing data. But they do not particularly shine when targeting browsers unless there is a LOT of data going back and forth and your front-end engineering team doesn't mind working around jspb's quirks, opaque errors, and subtle nuances.
Potentially to prevent MITM proxies on company computers from being able to sniff the traffic, which would make sense given what Blind is about. Otherwise, if it's secure TLS, there's no reason at all.
Edit: maybe the reason they use public-key encryption for transmission is that you can't reverse it, and the request direction is where your anonymous complaints (or whatever you do on Blind) would be?
Note that as thenewnewguy says, anyone who can MITM your connection can also inject JS spyware (well, more so than usual) to exfiltrate your comments. That's harder (and much harder still to avoid discovery) than just sniffing the traffic, so it might be a useful stopgap, but for real security you need to fix your web browser to reject MITMed connections.
Security through obscurity. Pretty sure Blind did this just to piss off scrapers. I mean I've been scraping them for some time until they pulled this encryption trick about a year ago. I didn't have time to reverse engineer their js and keep my scraper up to date, that'd take way more time than it took me to write some 100 lines to scrape them originally. So score 1 for them.
For Blind? Yes. It is designed to keep you anonymous from your employer. Many employers, especially those whose employees would most enjoy anonymous complaining, have TLS-intercepting middleboxes.
sure, my point was really: there's no way for your employer to intercept the packet contents unless they've got their own cert bundle installed, and you'd have to be pretty clueless to use this from a corp machine of any kind.
Reasons I can think of: it depends on whether you assume TLS is never going to be broken again, and on whether TLS termination happens before the component that needs to process the data for any sort of check, e.g., a web application firewall. The goal would be reducing insider threat and limiting the exposure of sensitive data to components that don't need to know it. Usually, though, it's a bad idea to rely solely on JavaScript crypto: https://www.nccgroup.trust/us/about-us/newsroom-and-events/b...
Nice article! I always wonder about the legal aspects of publishing a reverse-engineering write-up for a private API. Does the company the API belongs to have the right to demand a takedown?
This is what's frustrating about accessing content online. Is it fair game if it's on a web server since the requester cannot determine intent? Legally it doesn't appear so.
I've gone through this same exercise in the past in order to mass-delete a large number of comments on different threads, because I was afraid that Blind may one day suffer a data leak. I attempted to reimplement the crypto in Ruby, but ultimately failed and went the JS route, same as the author. I also had to roll my own session-token refresh logic. Finally, I wondered whether any kind of data mining could be done with the tool, but I never took it that far. Thanks for the writeup!
A lot of engineers make money off Blind through referrals, if not through Blind's service ( https://www.rooftopslushie.com/ ) then through private messages.
So they used asymmetric encryption for the request, so that a MITM can't read it, but symmetric encryption for the response. A MITM does have to fully analyse the client code first, but after that it can decrypt any response. The possible reason cited in the conclusion is performance.
I think you don't have to resort to symmetric encryption here, even keeping performance in mind. What you do is generate a new asymmetric keypair on the client for every session, then send the public key over to the server. Then the server encrypts every response with that public key, allowing only the client to decrypt it.
Doing that, one can only read a session's network traffic, in either direction, by reading the values of variables on the client -- but anyone who can do that can read everything anyway. ;)
EDIT: forgot to talk about performance -- you just use a so-called "envelope", where the sending party first encrypts the data symmetrically with a randomly generated key, then encrypts that random key with the asymmetric crypto. The pair (the symmetrically encrypted data plus the asymmetrically encrypted key) is sent to the receiver, which uses its private key to decrypt the symmetric key and then decrypts the data with it.
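A minimal sketch of that envelope scheme using the Web Crypto API (the function name and parameter choices here are my assumptions, not whatever Blind actually does):

```js
// Encrypts `plaintext` (a Uint8Array) so only the holder of the matching RSA
// private key can read it, without paying asymmetric-crypto cost on the bulk data.
async function sealEnvelope(recipientPublicKey /* RSA-OAEP CryptoKey */, plaintext) {
  // 1. Fresh random symmetric key for the actual payload.
  const dataKey = await crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 }, /* extractable */ true, ["encrypt"]);
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, dataKey, plaintext);

  // 2. Encrypt ("wrap") just the 32-byte symmetric key with the receiver's public key.
  const rawKey = await crypto.subtle.exportKey("raw", dataKey);
  const wrappedKey = await crypto.subtle.encrypt({ name: "RSA-OAEP" }, recipientPublicKey, rawKey);

  // The receiver decrypts wrappedKey with its private key, then decrypts ciphertext with AES-GCM.
  return { wrappedKey, iv, ciphertext };
}
```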
From the silly comments about "infinite scrolling" being definitive proof of a solid REST API behind it, and about whether PHP is or isn't capable of that (the writing is too ambiguous), to the roundabout amateur obfuscation (which the author calls encryption) that is entirely akin to the JavaScript that disabled right-click to "copyright" a page's content in the 90s.