This is yet another reminder that good JS minification tools exist that can absolutely rename object properties to short, minimal strings instead of descriptive names: the Closure Compiler in advanced mode. You do need quite a bit of discipline in how you write the JS for that to work, though. Some languages, like ClojureScript, do this by default, so there it doesn't take much effort.
Also it helps if you don't have to use objects (with keys) to transfer data. What I mean is that there's little reason to use a keyed object when you can instead use a simple array, as long as you have some post-processing to transform array indices back into object keys. Both of these approaches also cut down on the amount of data transferred over the wire, so it saves data and helps speed up the site for users too.
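A rough sketch of what that can look like (the field names and helper here are made up for illustration):

```js
// Wire format: a plain array instead of a keyed object.
// The index-to-key mapping is agreed on out of band (e.g. generated code).
const USER_FIELDS = ["id", "name", "isAdmin"];

// Server sends: [42, "Ada", false] instead of {"id":42,"name":"Ada","isAdmin":false}
function fromWire(row) {
  const obj = {};
  USER_FIELDS.forEach((key, i) => { obj[key] = row[i]; });
  return obj;
}

console.log(fromWire([42, "Ada", false])); // { id: 42, name: "Ada", isAdmin: false }
```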
The thought of the latter just made me shudder. Removing keys locks your API in really unpleasant ways. More importantly, it's less human readable and harder to reason about. Please don't do this unless you've got a very specific need for the performance.
Keys are not really removed in the code, only from the wire format. In code you can still write `myObject.property`, but the minifier translates that to, say, `a[6]` instead. Naturally there would be tools for the developer to translate these arrays back into full objects when debugging in production. Harder to reason about, yes, but few devs reason about their code by reading minified output, so why should they have to reason about it via the minified wire format?
(Of course I must admit that this is only suitable for private APIs, not APIs published explicitly for third parties to use.)
Have you tried looking at, say, Gmail's XHR requests and responses?
Because if you use the protobuf wire format, all the parsing code needs to be written in JS and won't be as performant as JSON parsing built into the browser.
But yes you can in fact transform protocol buffers into JS arrays in the way I described. I'm essentially describing protobuf designed for JS. Imagine your protobuf definitions are read by a compiler which spits out JS classes with getters and setters. These getters and setters access the underlying array with an assigned index. Your minifier inlines these getters and setters into direct array access. Voilà.
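A hypothetical sketch of what such generated code might look like (the message and field names are invented, this is not actual jspb output):

```js
// Generated from a hypothetical `message User { string name = 1; bool admin = 2; }`.
// Each field gets a fixed slot in a backing array; the wire format is just that array.
class User {
  constructor(data = []) { this._data = data; }
  getName()   { return this._data[0]; }
  setName(v)  { this._data[0] = v; return this; }
  getAdmin()  { return this._data[1]; }
  setAdmin(v) { this._data[1] = v; return this; }
  serialize() { return JSON.stringify(this._data); }
}

// After inlining by an advanced minifier, `user.getName()` becomes a direct
// `a[0]` access, so the getters and setters cost nothing at runtime.
const user = new User().setName("Ada").setAdmin(true);
console.log(user.serialize()); // ["Ada",true]
```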
Right, but the bottleneck is generally not "how long it takes to parse the payload". For anything short of megabytes of data, I doubt you'll notice much of a difference between parsing JSON and flatbuffers. The bottleneck is how large the payload is and how long it takes to send it over the wire.
But transferring the parser happens once, transferring JSON happens every time.
Really depends on the use case I guess. But any situation where I'm using JSON arrays instead of keyed objects for efficiency reasons is probably a situation where flatbuffers makes just as much (if not more) sense.
Having spent a LOT of time looking at this, browsers come with built-in (nil cost) JSON parsers.
You need a proto parsing lib and a collection of .proto schemas to even begin using protobufs, so the payload savings have to cover at least that much before proto even starts being a win. While the parsing lib can be cached and is largely a rounding error over the long term, every iteration of the .proto files means fetching a new version which contains all the contents of the previous version (or else sacrificing backwards compatibility).
Beyond the additional payload costs you also have to factor in the API itself. Any win from dropping keys can largely be obtained via compression, so that's only a nominal gain. APIs with many string values are not going to see much benefit either, and may actually be better served by compression. The real wire-size win for proto is on large numbers, but there aren't many APIs sending lots of values in the 256-65k range (let alone higher). Proto does do really well with booleans and null, though. Unpacked arrays aren't a particularly strong win for it either (though packed ones are a win for large arrays), and it also has weird quirks for maps that don't let it achieve parity with JSON, IIRC.
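To make the "large numbers" point concrete, here's a back-of-the-envelope comparison of varint-encoded sizes versus the JSON text representation (ignoring the extra field-tag byte proto adds per field):

```js
// Minimal unsigned varint length calculation (7 payload bits per byte).
function varintLength(n) {
  let len = 1;
  while (n >= 0x80) { n = Math.floor(n / 128); len++; }
  return len;
}

for (const n of [7, 300, 50000, 5000000]) {
  console.log(n, "-> varint:", varintLength(n), "bytes, JSON text:", String(n).length, "bytes");
}
// 7       -> 1 vs 1
// 300     -> 2 vs 3
// 50000   -> 3 vs 5
// 5000000 -> 4 vs 7
```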
Parsing time is not a huge win given normal API response sizes. I was parsing a JSON blob with 100k values four years ago on a shitty Dell in 2 seconds, and I can't think of anything near that size in the wild. Most API responses are going to be parsed faster than human perception, rendering the point mostly moot.
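If you want to sanity-check that on your own machine, a quick test along these lines (payload shape made up) does the trick:

```js
// Build a JSON blob with ~100k records and time the browser/Node built-in parser.
const blob = JSON.stringify(
  Array.from({ length: 100000 }, (_, i) => ({ id: i, name: "user" + i, active: i % 2 === 0 }))
);

console.log("payload size:", (blob.length / 1024).toFixed(0), "KiB");
console.time("JSON.parse");
JSON.parse(blob);
console.timeEnd("JSON.parse");
```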
The real win is the direct impact on bandwidth spend, which scales with size, but that comes at the cost of developer productivity, and not everyone has Google's war chest and can afford SWEs memeing about how they got promoted by spending 2 years updating protos.
Having worked at Google, Protobuf is a solid choice when you're working in multiple languages on multiple internal machines and haven't already bought into other means of serializing data. But they do not particularly shine when targeting browsers unless there is a LOT of data going back and forth and your front-end engineering team doesn't mind working around jspb's quirks, opaque errors, and subtle nuances.
Potentially to prevent MITM proxies on company computers from being able to sniff the traffic, which would make sense given what Blind is about. Otherwise, if it's secure TLS, there's no reason at all.
Edit: maybe the reason they use public-key encryption for transmission is that you can't reverse it, and the request direction is where your anonymous complaints (or whatever you do on Blind) would be?
Note that as thenewnewguy says, anyone who can MITM your connection can also inject JS spyware (well, more so than usual) to exfiltrate your comments. That's harder (and much harder still to avoid discovery) than just sniffing the traffic, so it might be a useful stopgap, but for real security you need to fix your web browser to reject MITMed connections.
Security through obscurity. Pretty sure Blind did this just to piss off scrapers. I mean I've been scraping them for some time until they pulled this encryption trick about a year ago. I didn't have time to reverse engineer their js and keep my scraper up to date, that'd take way more time than it took me to write some 100 lines to scrape them originally. So score 1 for them.
For Blind? Yes. It is designed to keep you anonymous from your employer. Many employers, especially those whose employees would most enjoy anonymous complaining, have TLS-intercepting middleboxes.
sure, my point was really: there's no way for your employer to intercept the packet contents unless they've got their own cert bundle installed, and you'd have to be pretty clueless to use this from a corp machine of any kind.
Reasons I can think of: it depends on whether you assume TLS is never going to be broken again, and on whether TLS termination happens before the component that needs to process the data for any sort of check, e.g., a web application firewall. The goal would be reducing insider threat and limiting the exposure of sensitive data to components that don't need to know it. Usually, though, it's a bad idea to rely solely on JavaScript crypto: https://www.nccgroup.trust/us/about-us/newsroom-and-events/b...
Nice article! I always wonder about the legal aspects of publishing a reverse-engineering write-up for a private API. Does the company the API belongs to have the right to demand a takedown?
This is what's frustrating about accessing content online. Is it fair game if it's on a web server since the requester cannot determine intent? Legally it doesn't appear so.
I've gone through this same exercise in the past in order to mass-delete a large number of comments on different threads, because I was afraid that Blind may one day suffer a data leak. I attempted to reimplement the crypto in Ruby, but ultimately failed and went the JS route, same as the author. I also had to roll my own session-token refresh logic. Finally, I wondered whether any kind of data mining could be done with the tool, but I never took it that far. Thanks for the writeup!
A lot of engineers make money off Blind through referrals, if not through Blind's service ( https://www.rooftopslushie.com/ ) then through private messages.
So they used asymmetric encryption for the request, so that a MITM can't read it, but symmetric encryption for the response. A MITM does have to fully analyse the client code first, but after that it can decrypt any response. The possible reason cited in the conclusion is performance.
I think you don't have to resort to symmetric encryption here, even keeping performance in mind. What you do is generate a new asymmetric keypair on the client for every session, then send the public key over to the server. Then the server encrypts every response with that public key, allowing only the client to decrypt it.
Doing that, one can only read a session's network traffic, in either direction, by reading the values of variables on the client -- but anyone who can do that can read everything anyway. ;)
EDIT: forgot to talk about performance -- you just use a so-called "envelope", where the sending party first encrypts the data symmetrically with a randomly generated key, then encrypts that random key with the asymmetric crypto. The pair (the symmetrically encrypted data plus the asymmetrically encrypted key) is sent to the receiver, which uses its private key to decrypt the symmetric key and then decrypts the data with it.
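A minimal sketch of that envelope scheme using the Web Crypto API (the function name and parameter choices here are my assumptions, not whatever Blind actually does):

```js
// Encrypts `plaintext` (a Uint8Array) so only the holder of the matching RSA
// private key can read it, without paying asymmetric-crypto cost on the bulk data.
async function sealEnvelope(recipientPublicKey /* RSA-OAEP CryptoKey */, plaintext) {
  // 1. Fresh random symmetric key for the actual payload.
  const dataKey = await crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 }, /* extractable */ true, ["encrypt"]);
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, dataKey, plaintext);

  // 2. Encrypt ("wrap") just the 32-byte symmetric key with the receiver's public key.
  const rawKey = await crypto.subtle.exportKey("raw", dataKey);
  const wrappedKey = await crypto.subtle.encrypt({ name: "RSA-OAEP" }, recipientPublicKey, rawKey);

  // The receiver decrypts wrappedKey with its private key, then decrypts ciphertext with AES-GCM.
  return { wrappedKey, iv, ciphertext };
}
```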
From the silly comments about "infinite scrolling" being definitive proof of a solid REST API behind it, and about whether PHP is or isn't capable of that (the writing is too ambiguous), to the roundabout amateur obfuscation (which the author calls encryption) that is entirely akin to the JavaScript that disabled right-click to "copyright" a page's content in the 90s.