I'm unclear on why you'd transform to a bytecode as a first step.
Wouldn't it be simpler to instead transform to an AST and work against that for everything? Wouldn't it make sense to generate the bytecode as a last step, right before doing heavy-duty optimizations?
Seems like with JS you would be constantly transforming from bytecode -> AST -> bytecode -> machine code, such as every time a method adds a new field to an object or some optimization assumption is violated. It doesn't seem like it would be easier or faster to work with. I'm guessing there wouldn't be a whole lot of memory benefits either.
Would someone mind illuminating me on this design choice? (Granted, I've not taken a compilers course, so feel free to call me an idiot for not knowing something basic about how compilers work).
* Adding a new field or method to an object at runtime doesn't change the AST.
* You don't translate from bytecode to AST.
* In tiered JIT architectures (typical for dynamically-typed languages like JavaScript where, as you point out, effective types can mutate at runtime), most code is executed by an interpreter, not as compiled machine code.[1] Interpreters are more efficient executing a bytecode than walking an AST.
* Translating an AST straight to machine code is much more difficult than translating a low-level intermediate representation, and such representations are often easy to express as simple bytecodes. Compilers like GCC and LLVM use intermediate representations: RTL and LLVM IR, respectively. Both are effectively assembly languages, and like all assembly languages are trivial to express as bytecode. GCC also has GIMPLE, a higher-level intermediate representation earlier in the pipeline that expresses a generic AST. The fact that GIMPLE and RTL coexist drives home the point that you don't want to be translating tree-like representations straight to machine code.
* You don't need to take my word for this, or the word of anyone else. Every programmer should have experience writing parsers, compilers, and interpreters, regardless of whether they took a class or even went to university. By doing this, most of the reasons for the way things are done will become immediately clear to you. Note that splitting strings with regular expressions does not count as parsing for these purposes. A good project, however, would be to write a regular expression parser, compiler, and interpreter. There are many examples to follow and you can copy their designs exactly; just don't copy-paste, as actually going through the motions and understanding the complexities inherent in the implementations is how everything will become clear.
[1] How this mutation is handled and why tiered architectures are preferable is another topic entirely.
Definitely agree that most programmers would benefit from studying these topics. They have broader applications than many realize.
You can use tools like jscodeshift [0] to refactor code! I've previously written codemods to migrate away from bad design decisions. The React team has also used codemods to introduce breaking changes. AST Explorer [1] is the best tool I've found for writing codemods, since it gives you nearly instant feedback and allows for a good amount of exploration.
Swift and Rust also have their own higher-level intermediate representations that they translate their AST to before even going to LLVM IR (SIL and MIR respectively). Rust used to translate their AST directly into LLVM IR but they switched to going through MIR first because it unlocks a lot of optimizations as well as improved user-facing functionality such as non-lexical lifetimes in the borrow checker.
That being said, consistency of internal data structures is probably one of those areas that separates a JIT from an offline compiler. I can see the benefit of switching to a bytecode representation pretty early in that circumstance.
Every programmer should have experience writing parsers, compilers, and interpreters, regardless of whether they took a class or even went to university.
The majority of developers doing yet another software-as-a-service CRUD app, or your typical "dark matter developer" who writes software that will never see the light of day outside the company they work for, would never benefit from writing a compiler or interpreter.
And you personally have studied the curriculum of every single computer science/computer engineering degree and can say that every single one required writing a compiler or interpreter and that no person who hasn’t written one can call themselves an engineer?
Should we do an exercise of which US tech was actually researched by immigrants with a proper engineering degree, and how much of that tech is actually built on US?
Should we do an exercise in how much knowing how to write compilers helps in the day to day life of your average SAAS CRUD developer or bespoke app developer - my original statement?
It depends on whether the bespoke developer ever needs to parse some kind of data file to import into their CRUD application, provide customization points to their customers, or write a configuration file parser.
Then there is the whole question of whether HR is hiring actual engineers.
For a very long time JSC did use an AST, and interpreting and executing an AST is expensive. The very first version of the byte code interpreter (all hail squirrelfish) was in the order of twice the speed of the AST, and the AST had been extensively micro-optimized at that point, to the extent that the dispatch cost itself (a single indirect call) was more than 10% of execution time.
There were also a bunch of correctness issues:
* Handling exceptions required manually checking call results everywhere, and there was an unending stream of minor correctness bugs caused by this. Looping semantics were similarly exciting. Because every call technically required those checks there was a significant cost, even though there were numerous cases where the checks were missed.
* Changes in language semantics required duplicate changes in many different places
* New language features required a lot of very similar behaviour with minor variations, which resulted in duplicated code and introduced new places to accidentally miss exception and loop checks.
* Memory overhead was massive, both in terms of actual usage, and also allocation costs.
* Significant reductions in maximum recursion depth. Back in the days of the AST interpreter JSC had a hard limit of 500 levels of recursion, with plenty of checks, and it would still periodically blow the system stack and crash.
The cost of producing the byte code in the first place is super low, and because the byte code is all essentially the most primitive of JS operations, new language features generally don't require the optimizing passes to be aware of any changes, because they're mostly just syntax around existing primitive operations.
AST is less transformable. We transform our bytecode as if it was just an IR.
ASTs are a pretty annoying form to use as a source of shared truth for OSR. JSC has 4 tiers so we need a convenient-to-use shared truth IR. That’s what bytecode is for. For example, bytecode offers high-scalability answers to questions like “where should I exit” and “what is live when I get there”.
Compiler IRs tend to have the property that operands are not themselves expressions; they can only be variables or maybe constants. That makes it easy to identify sequence points, which makes it easier to insert code or reshape the control flow graph.
A bytecode is easier to keep stable than an AST is; an AST changes as the language changes, but that doesn't mean you have to change the bytecode, only change the transformation from the AST to the bytecode.
I have no idea if this is the reason, but it's a possible one.
It's a bit of a micro-optimisation, but ASTs are also usually represented fairly sparsely, so processing bytecode could be much faster due to more efficient cache utilisation: some of the benefit comes from memory locality, and some from bytecode usually being smaller in memory.
Kudos to Tadeu Zagallo and the JSC team for landing this awesome patch!
I've worked on WebKit memory performance in the past, so I'm well aware that these aren't low-hanging fruit we're talking about. The type-safety bonus features look great too. :)
Re: Direct vs. indirect threading (aka ordinary dispatch). I had the impression that recent Intel x86 chips did enough trace caching to render any performance distinction basically irrelevant. Is that right?
(Of course, Apple has their own ARM implementations to consider.)
EDIT: Here’s the paper I was thinking of in this regard:
My first thought when they said they switched from direct threading to indirect threading was that it may have been the worst possible time to do that given Spectre.
The negligible performance difference on modern Intel chips between direct and indirect threading is largely a result of their deep, heavily buffered branch predictors. Spectre mitigations are going to increase the performance differences. Most userspace applications will forgo mitigations in favor of keeping performance, but browsers are the big exception, as they have to be concerned about in-process side channels.
I can't speak to their existing mitigations directly, but more generally Spectre will be an ongoing saga for years to come.
I seriously doubt the mitigations in place, whatever they are, are comprehensive even for existing proven Spectre channels. The engines haven't even solved RowHammer. It's a very difficult problem domain, particularly for an application JIT'ing random, untrusted code from the Internet. In many respects they're (IMHO) quietly punting because there just aren't satisfactory solutions. Browsers are really stuck between a rock and a hard place, more so than VM providers like AWS EC2.
One of the more general solutions is simply to stop trusting the browser, if not your entire operating system. Keep your most sensitive secrets inaccessible to software to the greatest degree possible. Use smartcards and other hardware tokens for authentication, for example.
RowHammer is particularly frustrating because it's a combination of hardware/firmware/OS/software vulnerabilities; you need multiple levels of protection and it's hard to count on having them all.
I remember having a discussion about rowhammer on the v8 team when the original paper first came out - most of us thought it wouldn't be possible to exploit in JS but I suspected it would. Unfortunately I was right :/
I suspect at some point we'll discover WebGL can be used for rowhammering somehow too.
My point is you can't say "oh this approach may have poor behavior once Spectre mitigations are put in place", because the browser has already implemented Spectre mitigations.
There's more than one mitigation, and not all at the browser level. The kernel may also flush the branch cache on syscall entry, which is going to hurt performance if you depend on predictions being fast. Though as far as I know pretty much nobody uses that mitigation precisely because it's too slow.
Direct threading and indirect threading both rely heavily on the indirect predictor. You might even say that direct threading relies more heavily on indirect predictor state and in particular the contents of the IBTB because it has so many more branches and so more opportunity to store per-location history for particular instructions and hence more to lose if the IBTB is flushed.
Indirect threading relies more on strong and deep history pattern matching at a single location, and probably uses less state overall.
They’ve tried a bunch of things and basically landed on “we’ll do process isolation for Spectre but for Meltdown/Fallout/ZombieLoad/Other MDS vulnerabilities we’re basically fucked, so it’s on lower levels”
I believe that JSC is in the lead when it comes to throughput and latency. But that is at least partly based on benchmarks that I had a hand in designing.
I don’t know how the VMs stand against each other on memory. JSC is improving in this area a lot recently but I don’t know if it’s just catching up or leaping ahead or whatever.
When they talked about getting rid of their threaded interpreter, I was reminded of this article from 2008 about writing fast interpreters, if that's interesting to this audience:
This is interesting! If anyone from WebKit is in the comments, can you provide link(s) to the bugs discussing the bytecode caching API? I'd love to take a look but my Bugzilla search skills are evidently weak.
WASM bytecode is structured as a tree, which means an interpreter loop wouldn't perform well. You'd really want to flatten it into a simpler bytecode before interpreting it, and you'd want to do other transforms on it before optimizing it.
I don't think WASM bytecode is a good format for execution, and it's only mediocre as a compilation target.
Bad question on my part. More relevant: Why wasm? From what I gather, js seems to already have a bytecode for their JIT runtime. Why not expose that so that we can have C++ in the web?
Oh wow!! In a way it's even better than bytecode because it allows marking unused functions for lazy loading, giving speedups of 97% (in theoretical settings;)
WASM's (abandoned, because brotli is good enough) Binary Encoding proposal was essentially a binary AST compression system that worked for both JS trees (esprima) and wasm ASTs.
You get pretty good improvements by converting ASTs into SoA representation, interleaving them, and then compressing them. Fast to decode, too.
Because then any changes require duplicating the byte code parsers, and adding translation logic to manage those changes.
You must understand: any change would be an API break, and APIs cannot change. Deprecating an API takes years. You also can't change or add API in security updates, which means that if a security fix required changing some aspect of the bytecode, that change couldn't be made.
Let's say you change the numbering of registers? That's an API change.
Let's say you change the number of opcodes? API change.
Any new API in macOS and iOS requires weeks of review cycles to ensure:
* It solves a problem in a generic manner: e.g. it doesn't solve a very specific version of a general issue
* It can be kept stable: e.g. it doesn't expose implementation details that may change
* It is ABI stable: direct memory access APIs cannot expose layout that can vary based on any internal details.
Shipping API is incredibly difficult if you care about ABI stability for software.
binary-ast doesn't give any significant benefits over JS beyond potentially a small reduction in size vs standard gzipped responses, and I've yet to see compelling evidence that binast offers any significant time saving to first execution.
It's important to realize: binary-ast is no more readily executable than plain JS, it is no more trustable than plain JS, and even if it were, parsing hasn't been a significant part of time-to-first-execution in JSC in a long time. It's the interpreter codegen that eats up time. As for execution speed: JSC used an AST for a looonng time, and its final builds beat the contemporary bytecode interpreters of other engines when it was replaced. Even so, an AST interpreter is so much slower than a bytecode interpreter that it takes only a tiny amount of JS before the cost of codegen+execution beats just executing the AST.
JSC makes breaking changes to the bytecode format on a regular basis. It’s an internal implementation detail and isn’t going to be exposed to standards.
Also the bytecode format has no memory safety guarantees so standardizing it wouldn’t be a great idea. It’s only trusted to obey basic VM invariants if we know that we generated it ourselves.
This will never happen because it turns internal IR (subject to change) into a formalized format that people will depend on every detail of, including bugs.
No, because that would require exposing implementation details. e.g. the first API version of the bytecode would be the fixed API, and any future changes would require a translation layer for converting from one version of the bytecode to the other.
The bytecode is also an internal format, so it is considered trusted; once you expose it you have to worry about invalid bytecode.
Bytecode is a broad category. The design of a bytecode suitable for interpreting JS has little in common with a bytecode for AOT compilation of static languages without tracing GC, other than that they're both made of bytes.
bin ast is/was an attempt to make a more compact version of JS that was easier to parse.
But parsing JS isn't a significant bottleneck, it's the subsequent work to produce the stuff that actually runs. IIRC proponents also claimed it meant you didn't have to worry as much about validation, which isn't true because it's untrusted content that comes from the internet. It must be validated.
binast is also not designed to be any more readily executable than JS, so even if it were supported, step one would be "produce a bytecode that is fast".
And it has already shipped in Safari 12.1 (which means many are already using it).
Time and time again it seems WebKit is the only team that wants to make the Web better for Web pages with JavaScript, while all the others seem to want the Web to be fat apps.
In the hope that someone on the Safari team is reading this:
Please make the Tab Overview cache the thumbnails, or offer a list format; currently pressing Tab Overview reloads all the tabs in the background. I don't know if this is for generating thumbnails or for some other reason, but it kills my machine when I have 300 tabs, most of them "cold" and not loaded.
"The new bytecode format uses approximately 50% less memory, which means a reduction of 10% in overall memory usage for JavaScript-heavy websites, such as Facebook or Reddit."
This is squarely to help webapps be faster and more memory efficient. This change is great, and doesn't align with the cargo cult of, "Make web pages great again."
> Time and time again it seems WebKit is the only team that wants to make the Web better for Web pages (and we are far from perfecting it), while all the others seem to want the Web to be apps.
I don’t see how this blog post supports that view, since it’s talking about optimizing JavaScript.
By Web page I mean one with minimal JavaScript: making initial JS loading faster, with lower latency, lower memory, and in general a much better UX. To me Chrome seems to be optimising for the wrong things, like maximum throughput and WASM; it's super fast in compute-intensive usage but eats memory like crazy.
"V8 team has built a new JavaScript interpreter, called Ignition, which can replace V8’s baseline compiler, executing code with less memory overhead and paving the way for a simpler script execution pipeline."
"V8 built-in functions (builtins) consume memory in every instance of V8. The builtin count, average size, and the number of V8 instances per Chrome browser tab have been growing significantly. This blog post describes how we reduced the median V8 heap size per website by 19% over the past year."
"V8 uses code caching to cache the generated code for frequently-used scripts. Starting with Chrome 66, we are caching more code by generating the cache after top-level execution. This leads to a 20-40% reduction in parse and compilation time during the initial load."