Looking back to when the Mozilla source code was released in 1998, the DOM bindings that existed at that point were generated from IDL files that described the various DOM APIs. That generation happened using a compiler called midl, which was part of the Mozilla build system at the time (though it didn’t run by default), but it was only runnable on Windows. If you were developing on other platforms you needed to get your hands on a Windows computer in order to change or add a DOM API. The output of running midl was a C++ function per method/getter/setter in the DOM API, plus some other stuff to get constants and other details right. The methods/getters/setters that were generated this way did what you’d expect: they found the pointer to the C++ object that was being touched, did the appropriate argument conversions, made the call to the actual C++ method, and then potentially converted the result to a type that was suitable to pass back to JS. In the midst of that they also dealt with throwing exceptions in case we ran into problems, or the caller called a DOM method incorrectly. The generated code was then committed to the CVS source repository so that others who were not working on DOM APIs didn’t need to re-generate the bindings every time. The generated code also grew to be a significant amount of code, on the order of 2MB of compiled code if memory serves me right. This was a significant chunk of code back in the late ’90s.
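To make those steps concrete, here is a rough model of what one such generated stub did. The real stubs were C++; this is only a TypeScript sketch of the same sequence, and every name in it (NativeElement, nativeFor, elementGetAttributeStub) is made up for illustration rather than taken from the generated code.

```typescript
// Hypothetical stand-in for a C++ DOM object.
class NativeElement {
  private attrs = new Map<string, string>();
  setAttribute(name: string, value: string): void {
    this.attrs.set(name, value);
  }
  getAttribute(name: string): string | null {
    const value = this.attrs.get(name);
    return value === undefined ? null : value;
  }
}

// Hypothetical link from a script-visible object to its native object.
const nativeFor = new WeakMap<object, NativeElement>();

// Model of one per-method stub, following the steps described above.
function elementGetAttributeStub(thisObj: object, args: unknown[]): string | null {
  // 1. Find the native object behind the script object.
  const native = nativeFor.get(thisObj);
  if (!native) {
    throw new TypeError("getAttribute called on an object that is not an Element");
  }
  // 2. Check and convert the arguments.
  if (args.length < 1) {
    throw new TypeError("getAttribute requires one argument");
  }
  const name = String(args[0]);
  // 3. Call the actual implementation.
  const result = native.getAttribute(name);
  // 4. Hand the result back in a form script can use.
  return result;
}
```

One function of this shape per method, getter, and setter is roughly why the generated code added up to so much compiled code.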
The work to make the above change a reality landed on the CVS trunk on 5/8/2001, in what was then known as the XPCDOM landing. It was a massive change, done by several Netscape developers, including John Bandhauer, Peter Van der Beken, Mike Shaver, Mitch Stoltz, myself, and probably more people whose contributions I’ve since forgotten. The change gave us significant code size savings, memory usage savings, and it made the DOM code much easier to hack on. So overall, a good change, one that served us well for many years.
It also introduced some problems though, some indirect and some direct. One of the problems was that because we used this generic XPConnect library for reflecting the DOM to JS on the web, some of the guts of XPConnect also became accessible to the web. That meant we exposed the notion of calling QueryInterface() on things in the DOM to the web, and it also meant we exposed the global “Components” property on our global objects (i.e. window.Components). Neither of those things belonged on the web. Their use was never pushed, at least not intentionally, but they were there nonetheless, which inevitably meant that some sites started depending on them (fortunately only in small numbers AFAIK, but still).
Another problem was that XPConnect depended on a particular prototype setup, in which the XPConnect-wrapped object’s immediate prototype was a flattened view of all interfaces that the object in question implemented. This led to one problem in particular, which was that people were unable to override existing methods on inherited interfaces in Mozilla’s DOM. To give an example, if a site wanted to override Node.prototype.appendChild, they could do that, but their change would be shadowed by the flattened view of all DOM nodes’ interfaces that XPConnect put on the immediate prototype of every DOM node.
With this setup a JS developer could still add to prototypes like Node.prototype, and those additions would be visible on all nodes. But changes didn’t work, and web developers kept stumbling over this problem.
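The shadowing is easy to reproduce with plain objects. The sketch below is a model, not XPConnect code: nodeProto stands in for Node.prototype, flattenedProto for the flattened per-object prototype, and myHelper is a made-up method that a page might add.

```typescript
interface NodeLike {
  appendChild(): string;
  myHelper?(): string; // hypothetical method added by a page
}

// Stand-in for Node.prototype with the built-in appendChild.
const nodeProto: NodeLike = { appendChild: () => "native appendChild" };

// Stand-in for the flattened view: it sits directly in front of nodeProto
// and carries its own copy of every method the object implements.
const flattenedProto: NodeLike = Object.create(nodeProto);
flattenedProto.appendChild = nodeProto.appendChild;

// A "DOM node" whose immediate prototype is the flattened view.
const node: NodeLike = Object.create(flattenedProto);

// A page overrides an existing method further up the chain. Lookup on the
// node finds the flattened copy first, so the override is never reached.
nodeProto.appendChild = () => "page override";
const overrideResult = node.appendChild();

// A page adds a brand new method. Nothing in front shadows it, so lookup
// falls through to nodeProto and the addition is visible.
nodeProto.myHelper = () => "page addition";
const additionResult = node.myHelper?.();

console.log(overrideResult); // "native appendChild"
console.log(additionResult); // "page addition"
```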
Then over time this overall approach started showing other problems as well, beyond the functional problems I touched on above. The quirkiness of the browser DOM, plus the fact that more and more DOM APIs were added, led to nsDOMClassInfo.cpp growing significantly in size and in complexity, which led to that file being pretty unwieldy and not very hacker friendly. Performance of the DOM bindings also started to become a problem. At first performance wasn’t a problem: the JS engine (JS was fully interpreted back then) wasn’t very fast (at least not by any current standards), and likewise the C++ DOM wasn’t necessarily all that fast, which meant that the overhead of the bindings between the two worlds generally got lost in the noise of JS and the C++ code executing. But as the JS engine grew faster, and the C++ DOM likewise, the overhead of the bindings started standing out more and more.
Now XPConnect wasn’t necessarily slow, but it wasn’t necessarily fast either. It was a generic cross language communication layer, one that was even thread safe, which meant it needed to do a lot of stuff, including locking of various data structures etc. And the generic nature of the library of course meant that there were few corners that could be cut to speed up the cases that really mattered for performance. The point is, in roughly the year 2005 or so, it was starting to become more and more of a bottleneck.
At that point, we started looking at optimizing XPConnect, w/o really changing how we used it in fundamental ways. There was some fat that got trimmed, and that helped, but those changes resulted in comparatively small improvements, not the significant gains we’d need long term.
Sometime before this point we had also added the cycle collector, which had the unfortunate side effect of making reference counting more expensive, and XPConnect was pretty reference counting happy. Peter Van der Beken, myself, and others pulled a good bit of heroics to eliminate a lot of the extra reference counting that was done, and that gave some good gains as well.
Then came 2008, with even more optimizations in the JS engine, including a tracing JIT. That made the overhead of the bindings stand out even more, again. Around that point, we had two plans to make significant improvements. The first one was Jason Orendorff’s work on quick stubs, which gave us shortcuts that bypassed a good bit of the slow paths in invoking certain methods/getters/setters on DOM objects. It wasn’t a catch-all approach, but it was one that we could explicitly use for things we believed were performance critical. What quick stubs gave us was a code generator that could generate specific code for specific methods (based on a configuration file and the relevant IDL files), and this code could be made very fast. That was a big improvement. But it still left us with some XPConnect overhead in places where we didn’t want it, in particular with DOM object wrapping. Wrapping still went through the fairly heavy weight code that created new DOM object wrappers, or even looked up existing ones for objects that had already been wrapped (i.e. touched by JS before).
The second of the two significant optimizations we did in 2008 was Peter Van der Beken’s work on caching the XPConnect wrapper on the DOM objects themselves. This was what became known as nsWrapperCache. That work still left us with significant overhead in wrapping new objects, as in, the XPConnect wrapper construction code was still hurting us. But in the case where we were touching a DOM object from JS that had already been touched, we got a lot faster, partially because we were able to look up a wrapper for a DOM object w/o calling into QueryInterface(), which meant we didn’t do any reference counting on that path at all. Plus, we also avoided some thread safe hash table traffic, which helped too.
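The difference between finding a wrapper through a shared table and reading it straight off the DOM object can be modeled roughly like this. It is a simplified sketch with made-up names (JsWrapper, TableWrapping, CachingNative, wrapCached), not the real nsWrapperCache interface, and it leaves out locking, QueryInterface(), and reference counting entirely.

```typescript
// Hypothetical stand-in for a script-side wrapper object.
class JsWrapper {
  constructor(readonly native: object) {}
}

// Before: every wrap request goes through a side table keyed by the
// native object, even when the object has been wrapped already.
class TableWrapping {
  lookups = 0;
  private table = new Map<object, JsWrapper>();

  wrap(native: object): JsWrapper {
    this.lookups++;
    let wrapper = this.table.get(native);
    if (!wrapper) {
      wrapper = new JsWrapper(native);
      this.table.set(native, wrapper);
    }
    return wrapper;
  }
}

// After: the native object itself remembers its wrapper.
class CachingNative {
  cachedWrapper: JsWrapper | null = null;
}

let slowPathCalls = 0;

function wrapCached(native: CachingNative): JsWrapper {
  // Fast path: one field read, no table involved.
  if (native.cachedWrapper) {
    return native.cachedWrapper;
  }
  // Slow path: first touch from script still pays for wrapper creation.
  slowPathCalls++;
  native.cachedWrapper = new JsWrapper(native);
  return native.cachedWrapper;
}
```

As in the real thing, the first wrap of an object is no cheaper in this model; only repeat touches get the shortcut.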
The next significant optimization after all that was Peter Van der Beken’s work in 2009 on lightweight DOM wrappers (a.k.a. slim wrappers, which is what I’ll call them from here on). These slim wrappers gave us the ability to wrap a DOM object w/o creating a heavy weight XPConnect wrapper (XPCWrappedNative). A slim wrapper is basically just a JSObject that we create and give enough smarts to make it look like a real DOM object. And a slim wrapper has built-in smarts that can morph it into a real XPCWrappedNative object should the need arise, which it did if, for instance, someone explicitly asked for the wrapper from C++, and there were other triggers too which caused a slim wrapper to morph. So with all that, we got to bypass even more of the thread safe hashes etc in XPConnect, which again sped us up. At this stage, the combination of slim wrappers, the wrapper cache, and quick stubs finally started to give us some serious speed out of our DOM bindings.
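The morphing idea can be pictured as a wrapper that starts out cheap and only builds its expensive state when something asks for it. This is a loose model under my own assumptions: ModelWrapper, HeavyWrapperState, and getFullWrapper are illustrative names, and the real morph triggers were more varied than the single one shown here.

```typescript
// Hypothetical stand-in for the bookkeeping a full wrapper carries.
class HeavyWrapperState {
  readonly interfaceTable = new Map<string, object>();
}

class ModelWrapper {
  // A slim wrapper has no heavy state at all.
  private heavy: HeavyWrapperState | null = null;

  constructor(readonly native: object) {}

  get isSlim(): boolean {
    return this.heavy === null;
  }

  // Ordinary calls from script work without the heavy state.
  callMethod(name: string): string {
    return `called ${name}`;
  }

  // Model of a morph trigger, e.g. native code explicitly asking for the
  // full wrapper. The heavy state is built once and then reused.
  getFullWrapper(): HeavyWrapperState {
    if (this.heavy === null) {
      this.heavy = new HeavyWrapperState();
    }
    return this.heavy;
  }
}
```

In this model, most wrappers never leave the slim state, which is where the savings come from.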
Now, around this same time the JS engine team was in full swing making JS faster yet, the tracing JIT was getting even faster, and it was being used more frequently. And there was talk about JaegerMonkey, a full method JIT that would (and did) make JS performance significantly better once again. That meant that DOM binding performance became more important yet again. We invested even more work in making our current infrastructure even faster. We started writing specialized hand coded quick stubs which would avoid even more QueryInterface() calls, and could also call straight into non-virtual methods in the C++ DOM. And we pulled all kinds of other tricks to cut out even more overhead, both in the bindings themselves and in the C++ DOM code. Lots of this work was done by Peter Van der Beken and Boris Zbarsky. Some of this work led us to some interesting realizations in the DOM code, one of which was that DOM tree traversal performance was heavily dependent on CPU cache utilization rather than actual binding instruction overhead. And this was not for the obvious reason of us traversing a tree structure with not necessarily good memory locality. Instead, the bigger problem was the vtable reads that the code triggered by calling virtual methods during the tree traversal. The vtable reads were causing CPU cache misses, and that ended up being a significant performance hit.
At this point we had fairly well hit a performance wall with the current setup. We’d squeezed out pretty much all the performance we could realistically squeeze out of this code. We had created shortcuts around XPConnect, we had done what we could in the scriptable helpers, and we had optimized the C++ DOM implementation fairly heavily as well. Yet we were still behind the competition (i.e. WebKit) in raw DOM access performance.
And then there was type inference support on the horizon, which again made the JS engine faster, and DOM binding performance mattered even more again.
All this led us to start thinking seriously about a different approach to what we had here. And that will be the topic of my next blog post here, which is about our “new” DOM bindings.