Apparently, there are only two hard problems in computer science: cache invalidation and the naming of things (or so Phil Karlton’s dictum goes). Earlier this month, we invited representatives of Twitter, Facebook, SproutCore, Palm’s webOS, Microsoft’s “Office On The Web”, Yahoo, and Google to talk to us about the former problem (amongst other things), though we also learned something about the latter.
Caching is an important issue to get right on the web, not least of all because of the proliferation of web applications on mobile devices. The goals of our caching summit were to identify use cases that would help us move forward with caching and with HTTP request efficiency. How desirable was rolling up our sleeves to look at HTTP/1.1 Pipelining in Firefox, for instance? What else was needed at the HTTP layer? And was the vaunted HTML5 AppCache, implemented in Firefox 3.5 onwards, actually useful to developers? What else needed to be exposed to web applications, either within content or via additional headers?
Developer feedback is invaluable, and is increasingly the basis of how we want to evolve the next integral pieces of the web platform. Web developers are one of our primary constituencies; going forward, we want them to help us prioritize what we should implement, and what we need to focus on with respect to web standards. We chose our attendees wisely; if any group of people could talk about web applications at scale, the current performance of the cache, and their wishlist for future browser caching behavior on the web platform, it was this group of people. And the feedback they gave us was copious and useful — our work is cut-out for us. Notably, we’ve got a few actions we’re going to follow-up on:
-
Increase the default size of our disk and memory caches. Firefox’s disk cache is currently set at 50MB, a small-ish number given the amount of disk space available on hardware currently (and although this limit can be increased using advanced preferences, few users actually change the default). This is low-hanging fruit for us to fix. An interesting open question is whether we should determine disk cache size heuristically, in lieu of choosing a new upper bound. Steve Souders, who attended our caching summit, blogs about cache sizes, as well as premature cache evictions.
-
Conduct a “Mozilla Test-Pilot” project to get more data about how exactly we cache resources currently. This is part of a larger question about updating our caching algorithm. Like other browsers, we use an optimization of the Least Recently Used (LRU) caching algorithm, called LRU-SP. Data that we would want to gather includes determining what the hit rate, mean, variance and distribution of cached resources are. What’s the average lifetime? How about different modes where our LRU-SP replacement policy doesn’t work well for certain apps, where big resources (such as an essential script file) may get eliminated before smaller ones (such as map tiles)? We’ll also have to research the role that anti-virus software plays in routinely flushing out the cache, leading to further occurrences of untimely eviction of relevant resources.
-
Explore prioritization of resources in the cache based on MIME type. For instance, allowing for JavaScript (text/javascript) to always get higher priority in terms of what gets pruned by our LRU-SP algorithm. A good follow-up for this would be to get Chrome, IE, Apple, and Opera to discuss this with us, and then document what we come up with as a MIME-type based “naive” prioritization. We also want to allow developers to set resource priority on their own, perhaps through a header. This is likely to be an ongoing discussion.
-
Really figure out what hampers the adoption of HTTP/1.1 Pipelining on the web, including data from proxies and how they purge. While Pipelining is enabled in Mobile Firefox builds (such at that running on the Nokia N900 and on Android devices) by default, we have it turned off in desktop Firefox builds. We do this for a variety of reasons, not least of all the risk of performance slow downs if one of the pipelined requests slows down the others. It’s clear from our discussion that many who attended our caching summit think pipelining will help their applications perform much better. The situation on the web now with respect to pipelining is a kind of hostage’s dilemma: of the main desktop browsers, nobody has turned on pipelining, for fear of a scenario that slows down performance (leading to that particular browser being accused of “being slow”). The developers who visited us threw down the proverbial gauntlet; at a minimum we’ve got to figure out what hamstrings the use of pipelining on the web, and determine what we can actually do to remove those obstacles.
-
Figure out how to evolve the HTML5 AppCache, which frankly hasn’t seen the adoption we initially expected. While we tend to view parts of HTML5 such as Cache Manifests and
window.applicationCache
as yet another performance tool (to ensure web applications rapidly load upon subsequent accesses), it is different than the general browser cache. What’s in a name, and why is naming something amongst the hardest problems in computer science? The use of the word “cache” to describe the parts of HTML5 that deal with offline web applications has confused some developers. What we’re calling the HTML5 AppCache was primarily conceived of to enable offline web application use, but many applications (such as those built with SproutCore) treat it as an auxiliary cache, to ensure that applications have a rapid start-up time and generally perform well. Why, we were asked, should we have two things: a general purpose browser cache, and something else, uniquely for offline usage? On the one hand, the HTML5 AppCache allows web apps to act like native apps (enabling “rapid-launch icons” on some mobile handsets), perhaps even eventually integrating with native application launchers. And on the other hand, the HTML5 AppCache’s separateness from the general cache may mean that we coin different APIs to work with the general cache. In general, simultaneously evolving multiple APIs with “cache” in the name may be confusing. But that’s why naming is amongst the hard problems, and that’s why we have to architect the next iteration mindful of the potential for both redundancy as well as confusion.
We’ve got a tracking bug in place with a bold moniker: “Improve HTTP Cache.” You’ll see the gamut of changes we’d like to introduce here, including benchmarking our cache against Chromium’s (and perhaps just using Chromium’s cache code, if we need to).
Caching is important, but difficult. It would be fair to describe most of the near-term evolution of the web that way, whether that is the introduction of an indexable database capability, streaming video, or new layout models within CSS. These evolutions won’t necessarily happen within a standards body or on a listserv, but rather through rapid prototyping and meaningful feedback. That’s why we have to talk to web developers to help us do the right thing, and that’s why we’ll keep organizing meet-ups such as the recent caching summit.
23 comments