This fulfills the vision set out in December 2022 to bring experts across the industry together in order to rethink how we measure browser performance, guided by a shared goal to reflect the real-world Web as much as possible. This is the first time the Speedometer benchmark, or any major browser benchmark, has been developed through a cross-industry collaboration supported by each major browser engine: Blink, Gecko, and WebKit. Working together means we can build a shared understanding of what matters to optimize, and facilitates broad review of the benchmark itself: both of which make it a stronger lever for improving the Web as a whole.
And we’re seeing results: Firefox got faster for real users in 2023 as a direct result of optimizing for Speedometer 3. This took a coordinated effort from many teams: understanding real-world websites, building new tools to drive optimizations, and making a huge number of improvements inside Gecko to make web pages run more smoothly for Firefox users. In the process, we’ve shipped hundreds of bug fixes across JS, DOM, Layout, CSS, Graphics, frontend, memory allocation, profile-guided optimization, and more.
We’re happy to see core optimizations in all the major browser engines turning into improved responsiveness for real users, and are looking forward to continuing to work together to build performance tests that improve the Web.
The post Improving Performance in Firefox and Across the Web with Speedometer 3 appeared first on Mozilla Hacks - the Web developer blog.
The web platform is built on interoperability based on common standards. This offers users a degree of choice and control that sets the web apart from proprietary platforms defined by a single implementation. A commitment to ensuring that the web remains open and interoperable forms a fundamental part of Mozilla’s manifesto and web vision, and is why we’re so committed to shipping Firefox with our own Gecko engine.
However, interoperability requires care and attention to maintain. When implementations ship with differences between the standard and each other, this creates a pain point for web authors; they have to choose between avoiding the problematic feature entirely and coding to specific implementation quirks. Over time, if enough authors produce implementation-specific content, interoperability is lost, and along with it user agency.
This is the problem that the Interop Project is designed to address. By bringing browser vendors together to focus on interoperability, the project makes it possible to identify areas where interoperability issues are causing problems, or may do so in the near future. Tracking progress on those issues with a public metric provides accountability to the broader web community on addressing the problems.
The project works by identifying a set of high-priority focus areas: parts of the web platform where everyone agrees that making interoperability improvements will be of high value. These can be existing features where we know browsers have slightly different behaviors that are causing problems for authors, or they can be new features which web developer feedback shows are in high demand and which we want to launch across multiple implementations with high interoperability from the start. For each focus area, a set of web-platform-tests is selected to cover that area, and the score is computed from the pass rate of these tests.
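To make the scoring idea concrete, here is a simplified sketch, added for illustration only; it is not the project's actual scoring code, which weights tests and subtests more carefully. It simply treats a focus area's score as the fraction of its selected tests that pass:

// Hypothetical results for one focus area's selected web-platform-tests.
const results = [
  { test: 'css/selectors/has-basic.html', pass: true },
  { test: 'css/selectors/has-argument.html', pass: true },
  { test: 'css/selectors/has-invalidation.html', pass: false },
];

function focusAreaScore(testResults) {
  const passed = testResults.filter((r) => r.pass).length;
  return testResults.length ? passed / testResults.length : 0;
}

console.log(focusAreaScore(results)); // 0.666..., reported as roughly 67%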
The Interop 2023 project covered high profile features like the new :has() selector, and web-codecs, as well as areas of historically poor interoperability such as pointer events.
The results of the project speak for themselves: every browser ended the year with scores in excess of 97% for the prerelease versions of their browsers. Moreover, the overall Interoperability score — that is the fraction of focus area tests that pass in all participating browser engines — increased from 59% at the start of the year to 95% now. This result represents a huge improvement in the consistency and reliability of the web platform. For users this will result in a more seamless experience, with sites behaving reliably in whichever browser they prefer.
For the :has() selector — which we know from author feedback has been one of the most in-demand CSS features for a long time — every implementation is now passing 100% of the web-platform-tests selected for the focus area. Launching a major new platform feature with this level of interoperability demonstrates the power of the Interop project to progress the platform without compromising on implementation diversity, developer experience, or user choice.
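To give a sense of what this unlocks for developers, here is a small usage sketch (added for illustration, not taken from the original post); the same selector works in stylesheets and through the Selectors API:

// In CSS, a parent can now be styled based on its contents, e.g.:
//   .card:has(img) { border: 2px solid teal; }
// The selector is also usable from script, with feature detection:
if (CSS.supports('selector(:has(img))')) {
  const cardsWithImages = document.querySelectorAll('.card:has(img)');
  cardsWithImages.forEach((card) => card.classList.add('has-media'));
}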
As well as focus areas, the Interop project also has “investigations”. These are areas where we know that we need to improve interoperability, but aren’t at the stage of having specific tests which can be used to measure that improvement. In 2023 we had two investigations. The first was for accessibility, which covered writing many more tests for ARIA computed role and accessible name, and ensuring they could be run in different browsers. The second was for mobile testing, which has resulted in both Mobile Firefox and Chrome for Android having their initial results in wpt.fyi.
Following the success of Interop 2023, we are pleased to confirm that the project will continue in 2024 with a new selection of focus areas, representing areas of the web platform where we think we can have the biggest positive impact on users and web developers.
The full list of new focus areas for 2024 is available in the project README.
In addition to the new focus areas, we will carry over some of the 2023 focus areas where there’s still more work to be done. Of particular interest is the Layout focus area, which will combine the previous Flexbox, Grid and Subgrid focus areas into one area covering all the most important layout primitives for the modern web. On top of that, the Custom Properties, URL, and Mouse and Pointer Events focus areas will be carried over. These represent cases where, even though we’ve already seen large improvements in interoperability, we believe that users and web authors will benefit from even greater convergence between implementations.
As well as focus areas, Interop 2024 will also feature a new investigation into improving the integration of WebAssembly testing into web-platform-tests. This will open up the possibility of including WASM features in future Interop projects. In addition we will extend the Accessibility and Mobile Testing investigations, as there is more work to be done to make those aspects of the platform fully testable across different implementations.
The post Announcing Interop 2024 appeared first on Mozilla Hacks - the Web developer blog.
During the Firefox 120 beta cycle, a new crash signature appeared on our radars with significant volume.
At that time, the distribution across operating systems revealed that more than 50% of the crash volume originated from Ubuntu 18.04 LTS users.
The main process crashes in a CanvasRenderer thread, with the following call stack:
0 firefox std::locale::operator=
1 firefox std::ios_base::imbue
2 firefox std::basic_ios<char, std::char_traits<char> >::imbue
3 libxul.so sh::InitializeStream<std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> > > /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/Common.h:238
3 libxul.so sh::TCompiler::setResourceString /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/Compiler.cpp:1294
4 libxul.so sh::TCompiler::Init /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/Compiler.cpp:407
5 libxul.so sh::ConstructCompiler /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/ShaderLang.cpp:368
6 libxul.so mozilla::webgl::ShaderValidator::Create /build/firefox-ZwAdKm/firefox-120.0~b2+build1/dom/canvas/WebGLShaderValidator.cpp:215
6 libxul.so mozilla::WebGLContext::CreateShaderValidator const /build/firefox-ZwAdKm/firefox-120.0~b2+build1/dom/canvas/WebGLShaderValidator.cpp:196
7 libxul.so mozilla::WebGLShader::CompileShader /build/firefox-ZwAdKm/firefox-120.0~b2+build1/dom/canvas/WebGLShader.cpp:98
At first glance, we want to blame WebGL. The C++ standard library functions cannot be at fault, right?
But when looking at the WebGL code, the crash occurs in the perfectly valid lines of C++ summarized below:
std::ostringstream stream;
stream.imbue(std::locale::classic());
This code should never crash, and yet it does. In fact, taking a closer look at the stack gives a first lead for investigation:
Although we crash into functions that belong to the C++ standard library, these functions appear to live in the firefox binary.
This is an unusual situation that never occurs with official builds of Firefox.
It is however very common for distributions to change configuration settings and apply downstream patches to an upstream source, so no worries about that.
Moreover, there is only a single build of Firefox Beta that is causing this crash.
We know this thanks to a unique identifier associated with any ELF binary.
Here, if we choose any specific version of Firefox 120 Beta (such as 120b9), the crashes all embed the same unique identifier for firefox.
Now, how can we guess what build produces this weird binary?
A useful user comment mentions that they regularly experience this crash since updating to 120.0~b2+build1-0ubuntu0.18.04.1.
And by looking for this build identifier, we quickly reach the Firefox Beta PPA.
Then indeed, we are able to reproduce the crash by installing it in an Ubuntu 18.04 LTS virtual machine: it occurs when loading any WebGL page!
With the binary now at hand, running nm -D ./firefox confirms the presence of several symbols related to libstdc++ that live in the text section (T marker). Templated and inline symbols from libstdc++ usually appear as weak (W marker), so there is only one explanation for this situation: firefox has been statically linked with libstdc++, probably through -static-libstdc++.
Fortunately, the build logs are available for all Ubuntu packages.
After some digging, we find the logs for the 120b9 build, which indeed contain references to -static-libstdc++.
But why?
Again, everything is well documented, and thanks to well trained digging skills we reach a bug report that provides interesting insights.
Firefox requires a modern C++ compiler, and hence a modern libstdc++, which is unavailable on old systems like Ubuntu 18.04 LTS.
The build uses -static-libstdc++ to close this gap.
This just explains the weird setup though.
What about the crash?
Since we can now reproduce it, we can launch Firefox in a debugger and continue our investigation.
When inspecting the crash site, we seem to crash because std::locale::classic() is not properly initialized.
Let’s take a peek at the implementation.
const locale& locale::classic()
{
  _S_initialize();
  return *(const locale*)c_locale;
}
_S_initialize() is in charge of making sure that c_locale will be properly initialized before we return a reference to it. To achieve this, _S_initialize() calls another function, _S_initialize_once().
void locale::_S_initialize()
{
#ifdef __GTHREADS
  if (!__gnu_cxx::__is_single_threaded())
    __gthread_once(&_S_once, _S_initialize_once);
#endif
  if (__builtin_expect(!_S_classic, 0))
    _S_initialize_once();
}
In _S_initialize(), we first go through a wrapper for pthread_once(): the first thread that reaches this code consumes _S_once and calls _S_initialize_once(), whereas other threads (if any) are stuck waiting for _S_initialize_once() to complete.
This looks rather fail-proof, right?
There is even an extra direct call to _S_initialize_once() if _S_classic is still uninitialized after that.
Now, _S_initialize_once() itself is rather straightforward: it allocates _S_classic and puts it within c_locale.
void locale::_S_initialize_once() throw()
{
  // Need to check this because we could get called once from _S_initialize()
  // when the program is single-threaded, and then again (via __gthread_once)
  // when it's multi-threaded.
  if (_S_classic)
    return;

  // 2 references.
  // One reference for _S_classic, one for _S_global
  _S_classic = new (&c_locale_impl) _Impl(2);
  _S_global = _S_classic;
  new (&c_locale) locale(_S_classic);
}
The crash looks as if we never went through _S_initialize_once(), so let’s put a breakpoint there and see what happens.
And just by doing this, we already notice something suspicious.
We do reach _S_initialize_once(), but not within the firefox binary: instead, we only ever reach the version exported by liblgpllibs.so.
In fact, liblgpllibs.so is also statically linked with libstdc++, such that firefox and liblgpllibs.so both embed and export their own _S_initialize_once() function.
By default, symbol interposition applies, and _S_initialize_once() should always be called through the procedure linkage table (PLT), so that every module ends up calling the same version of the function.
If symbol interposition were happening here, we would expect that liblgpllibs.so would reach the version of _S_initialize_once() exported by firefox rather than its own, because firefox was loaded first.
So maybe there is no symbol interposition.
This can occur when using -fno-semantic-interposition.
Each version of the standard library would live on its own, independent from the other versions.
But neither the Firefox build system nor the Ubuntu maintainer seem to pass this flag to the compiler.
However, by looking at the disassembly for _S_initialize() and _S_initialize_once(), we can see that the exported global variables (_S_once, _S_classic, _S_global) are subject to symbol interposition.
These accesses all go through the global offset table (GOT), so that every module ends up accessing the same version of the variable.
This seems strange given what we said earlier about _S_initialize_once().
Non-exported global variables (c_locale, c_locale_impl), however, are accessed directly without symbol interposition, as expected.
We now have enough information to explain the crash.
When we reach _S_initialize() in liblgpllibs.so, we actually consume the _S_once that lives in firefox, and initialize the _S_classic and _S_global that live in firefox. But we initialize them with pointers to the well initialized variables c_locale_impl and c_locale that live in liblgpllibs.so!
The variables c_locale_impl and c_locale that live in firefox, however, remain uninitialized.
So if we later reach _S_initialize() in firefox, everything looks as if initialization has happened. But then we return a reference to the version of c_locale that lives in firefox, and this version has never been initialized.
Boom!
Now the main question is: why do we see interposition occur for _S_once but not for _S_initialize_once()?
If we step back for a minute, there is a fundamental distinction between these symbols: one is a function symbol, the other is a variable symbol.
And indeed, the Firefox build system uses the -Bsymbolic-functions flag!
The ld man page describes it as follows:
-Bsymbolic-functions
    When creating a shared library, bind references to global function symbols to the definition within the shared library, if any. This option is only meaningful on ELF platforms which support shared libraries.
As opposed to:
-Bsymbolic
    When creating a shared library, bind references to global symbols to the definition within the shared library, if any. Normally, it is possible for a program linked against a shared library to override the definition within the shared library. This option is only meaningful on ELF platforms which support shared libraries.
Nailed it!
The crash occurs because this flag makes us use a weird variant of symbol interposition, where symbol interposition happens for variable symbols like _S_once and _S_classic but not for function symbols like _S_initialize_once().
This results in a mismatch regarding how we access global variables: exported global variables are unique thanks to interposition, whereas every non-interposed function will access its own version of any non-exported global variable.
With all the knowledge that we have now gathered, it is easy to write a reproducer that does not involve any Firefox code:
/* main.cc */
#include <iostream>

extern void pain();

int main() {
  pain();
  std::cout << "[main] " << std::locale::classic().name() << "\n";
  return 0;
}

/* pain.cc */
#include <iostream>

void pain() {
  std::cout << "[pain] " << std::locale::classic().name() << "\n";
}

# Makefile
all:
	$(CXX) pain.cc -fPIC -shared -o libpain.so -static-libstdc++ -Wl,-Bsymbolic-functions
	$(CXX) main.cc -fPIC -c -o main.o
	$(CC) main.o -fPIC -o main /usr/lib/gcc/x86_64-redhat-linux/13/libstdc++.a -L. -Wl,-rpath=. -lpain -Wl,-Bsymbolic-functions
	./main

clean:
	$(RM) libpain.so main
Understanding the bug is one step, and solving it is yet another story.
Should it be considered a libstdc++ bug that the code for locales is not compatible with -static-libstdc++ -Bsymbolic-functions?
It feels like combining these flags is a very nice way to dig our own grave, and that seems to be the opinion of the libstdc++ maintainers indeed.
Overall, perhaps the strangest part of this story is that this combination did not cause any trouble up until now.
Therefore, we suggested that the maintainer of the package stop using -static-libstdc++.
There are other ways to use a different libstdc++ than the one available on the system, such as using dynamic linking and setting an RPATH to link with a bundled version.
Doing that allowed them to successfully deploy a fixed version of the package.
A few days after that, with the official release of Firefox 120, we noticed a very significant bump in volume for the same crash signature. Not again!
This time the volume was coming exclusively from users of NixOS 23.05, and it was huge!
After we shared the conclusions from our beta investigation with them, the maintainers of NixOS were able to quickly associate the crash with an issue that had not yet been backported for 23.05 and was causing the compiler to behave like -static-libstdc++.
To avoid such a mess in the future, we added detection for this particular setup in Firefox’s configure.
We are grateful to the people who have helped fix this issue.
The post Option Soup: the subtle pitfalls of combining compiler flags appeared first on Mozilla Hacks - the Web developer blog.
The WebDriver BiDi protocol is supported starting with Puppeteer v21.6.0. When calling puppeteer.launch, pass in "firefox" as the product option, and "webDriverBiDi" as the protocol option:
const browser = await puppeteer.launch({
product: 'firefox',
protocol: 'webDriverBiDi',
})
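For context, a complete minimal script built on that configuration could look like the following (an illustrative sketch, not taken from the original post; the URL and file name are placeholders):

import puppeteer from 'puppeteer';

// Launch Firefox over the WebDriver BiDi protocol.
const browser = await puppeteer.launch({
  product: 'firefox',
  protocol: 'webDriverBiDi',
});

const page = await browser.newPage();
await page.goto('https://example.com');
console.log(await page.title());                // print the page title
await page.screenshot({ path: 'example.png' }); // capture a screenshot
await browser.close();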
You can also use the "webDriverBiDi" protocol when testing in Chrome, reflecting the fact that WebDriver BiDi offers a single standard for modern cross-browser automation.
In the future we expect "webDriverBiDi" to become the default protocol when using Firefox in Puppeteer.
Puppeteer has had experimental support for Firefox based on a partial re-implementation of the proprietary Chrome DevTools Protocol (CDP). This approach had the advantage that it worked without significant changes to the existing Puppeteer code. However, the CDP implementation in Firefox is incomplete and has significant technical limitations. In addition, the CDP protocol itself is not designed to be cross-browser and undergoes frequent breaking changes, making it unsuitable as a long-term solution for cross-browser automation.
To overcome these problems, we’ve worked with the WebDriver Working Group at the W3C to create a standard automation protocol that meets the needs of modern browser automation clients: this is WebDriver BiDi. For more details on the protocol design and how it compares to the classic HTTP-based WebDriver protocol, see our earlier posts.
As the standardization process has progressed, the Puppeteer team has added a WebDriver BiDi backend in Puppeteer, and provided feedback on the specification to ensure that it meets the needs of Puppeteer users, and that the protocol design enables existing CDP-based tooling to easily transition to WebDriver BiDi. The result is a single protocol based on open standards that can drive both Chrome and Firefox in Puppeteer.
Not yet; WebDriver BiDi is still a work in progress, and doesn’t yet cover the full feature set of Puppeteer.
Compared to the Chrome+CDP implementation, there are some feature gaps, including support for accessing the cookie store, network request interception, some emulation features, and permissions. These features are actively being standardized and will be integrated as soon as they become available. For Firefox, the only missing feature compared to the Firefox+CDP implementation is cookie access. In addition, WebDriver BiDi already offers improvements, including better support for multi-process Firefox, which is essential for testing some websites. More information on the complete set of supported APIs can be found in the Puppeteer documentation, and as new WebDriver-BiDi features are enabled in Gecko we’ll publish details on the Firefox Developer Experience blog.
Nevertheless, we believe that the WebDriver-based Firefox support in Puppeteer has reached a level of quality which makes it suitable for many real automation scenarios. For example at Mozilla we have successfully ported our Puppeteer tests for pdf.js from Firefox+CDP to Firefox+WebDriver BiDi.
We currently don’t have a specific timeline for removing CDP support. However, maintaining multiple protocols is not a good use of our resources, and we expect WebDriver BiDi to be the future of remote automation in Firefox. If you are using the CDP support outside of the context of Puppeteer, we’d love to hear from you (see below), so that we can understand your use cases, and help transition to WebDriver BiDi.
For any issues you experience when porting Puppeteer tests to BiDi, please open issues in the Puppeteer issue tracker, unless you can verify the bug is in the Firefox implementation, in which case please file a bug on Bugzilla.
If you are currently using CDP with Firefox, please join the #webdriver matrix channel so that we can discuss your use case and requirements, and help you solve any problems you encounter porting your code to WebDriver BiDi.
Update: The Puppeteer team have published “Harness the Power of WebDriver BiDi: Chrome and Firefox Automation with Puppeteer”.
The post Puppeteer Support for the Cross-Browser WebDriver BiDi Standard appeared first on Mozilla Hacks - the Web developer blog.
A .deb package is now available for Developer Edition and Beta!
We’ve set up a new APT repository for you to install Firefox as a .deb package. These packages are compatible with the same Debian and Ubuntu versions as our traditional binaries.
Your feedback is invaluable, so don’t hesitate to report any issues you encounter to help us improve the overall experience.
Adopting Mozilla’s Firefox .deb package offers multiple benefits; among other things, the .deb is integrated into Firefox’s release process.
To install the .deb package, simply follow these steps:
# Create a directory to store APT repository keys if it doesn't exist:
sudo install -d -m 0755 /etc/apt/keyrings
# Import the Mozilla APT repository signing key:
wget -q https://packages.mozilla.org/apt/repo-signing-key.gpg -O- | sudo tee /etc/apt/keyrings/packages.mozilla.org.asc > /dev/null
# The fingerprint should be 35BAA0B33E9EB396F59CA838C0BA5CE6DC6315A3
gpg -n -q --import --import-options import-show /etc/apt/keyrings/packages.mozilla.org.asc | awk '/pub/{getline; gsub(/^ +| +$/,""); print "\n"$0"\n"}'
# Next, add the Mozilla APT repository to your sources list:
echo "deb [signed-by=/etc/apt/keyrings/packages.mozilla.org.asc] <a class="c-link" href="https://packages.mozilla.org/apt" target="_blank" rel="noopener noreferrer" data-stringify-link="https://packages.mozilla.org/apt" data-sk="tooltip_parent">https://packages.mozilla.org/apt</a> mozilla main" | sudo tee -a /etc/apt/sources.list.d/mozilla.list > /dev/null
# Update your package list and install the Firefox .deb package:
sudo apt-get update && sudo apt-get install firefox-beta # Replace "beta" by "devedition" for Developer Edition
To install a language pack, replace fr in the example below with the desired language code:
sudo apt-get install firefox-beta-l10n-fr
To list the available language packs, run the following command after sudo apt-get update:
apt-cache search firefox-beta-l10n
The post Firefox Developer Edition and Beta: Try out Mozilla’s .deb package! appeared first on Mozilla Hacks - the Web developer blog.
Today we’re announcing the first release of llamafile and inviting the open source community to participate in this new project.
llamafile lets you turn large language model (LLM) weights into executables.
Say you have a set of LLM weights in the form of a 4GB file (in the commonly-used GGUF format). With llamafile you can transform that 4GB file into a binary that runs on six OSes without needing to be installed.
This makes it dramatically easier to distribute and run LLMs. It also means that as models and their weight formats continue to evolve over time, llamafile gives you a way to ensure that a given set of weights will remain usable and perform consistently and reproducibly, forever.
We achieved all this by combining two projects that we love: llama.cpp (a leading open source LLM chatbot framework) with Cosmopolitan Libc (an open source project that enables C programs to be compiled and run on a large number of platforms and architectures). It also required solving several interesting and juicy problems along the way, such as adding GPU and dlopen() support to Cosmopolitan; you can read more about it in the project’s README.
This first release of llamafile is a product of Mozilla’s innovation group and developed by Justine Tunney, the creator of Cosmopolitan. Justine has recently been collaborating with Mozilla via MIECO, and through that program Mozilla funded her work on the 3.0 release (Hacker News discussion) of Cosmopolitan. With llamafile, Justine is excited to be contributing more directly to Mozilla projects, and we’re happy to have her involved.
llamafile is licensed Apache 2.0, and we encourage contributions. Our changes to llama.cpp itself are licensed MIT (the same license used by llama.cpp itself) so as to facilitate any potential future upstreaming. We’re all big fans of llama.cpp around here; llamafile wouldn’t have been possible without it and Cosmopolitan.
We hope llamafile is useful to you and look forward to your feedback.
The post Introducing llamafile appeared first on Mozilla Hacks - the Web developer blog.
Our vision is for the AI Guide to be the starting point for every new developer to the space and a place to revisit for clarity and inspiration, ensuring that AI innovations enrich everyday life. The AI Guide’s initial focus begins with language models and the aim is to become a collaborative community-driven resource covering other types of models.
To start, the first few sections in the Mozilla AI Guide go in-depth on the most asked questions about Large Language Models (LLMs). AI Basics covers the concepts of AI, ML, and LLMs, what these concepts mean and how they are related. This section also breaks down the pros and cons of using an LLM. Language Models 101 continues to build on the shared knowledge of AI basics and dives deeper into the next level with language models. It will answer questions such as “What does ‘training’ an ML model mean?” or “What is the ‘human in the loop’ approach?”
We will jump to the last section on Choosing ML Models and demonstrate in code below what can be done using open source models to summarize certain text. You can access the Colab Notebook here or continue reading:
Unlike other guides, this one is designed to help you pick the right model for whatever it is you’re trying to do. We’re going to home in on “text summarization” as our first task.
Great question. Most available LLMs worth their salt can do many tasks, including summarization, but not all of them may be good at what specifically you want them to do. We should figure out how to evaluate whether they actually can or not.
Also, many of the current popular LLMs are not open, are trained on undisclosed data and exhibit biases. Responsible AI use requires careful choices, and we’re here to help you make them.
Finally, most large LLMs require powerful GPU compute to use. While there are many models that you can use as a service, most of them cost money per API call, which is unnecessary when some of the more common tasks can be done at good quality with already available open models and off-the-shelf hardware.
Over the last few decades, engineers have been blessed with being able to onboard by starting with open source projects, and eventually shipping open source to production. This default state is now at risk.
Yes, there are many open models available that do a great job. However, most guides don’t discuss how to get started with them using simple steps and instead bias towards existing closed APIs.
Funding is flowing to commercial AI projects, which have larger budgets than open source contributors to market their work. This inevitably leads to engineers starting with closed source projects and shipping expensive closed projects to production.
We’re going to find a suitable evaluation dataset, identify the metrics used for the summarization task, and pick an open model that we can run and evaluate locally.
For simplicity’s sake, let’s grab Mozilla’s Trustworthy AI Guidelines in string form.
Note that in the real world, you will likely have to use other libraries to extract content for any particular file type.
import textwrap
content = """Mozilla's "Trustworthy AI" Thinking Points:
PRIVACY: How is data collected, stored, and shared? Our personal data powers everything from traffic maps to targeted advertising. Trustworthy AI should enable people to decide how their data is used and what decisions are made with it.
FAIRNESS: We’ve seen time and again how bias shows up in computational models, data, and frameworks behind automated decision making. The values and goals of a system should be power aware and seek to minimize harm. Further, AI systems that depend on human workers should protect people from exploitation and overwork.
TRUST: People should have agency and control over their data and algorithmic outputs, especially considering the high stakes for individuals and societies. For instance, when online recommendation systems push people towards extreme, misleading content, potentially misinforming or radicalizing them.
SAFETY: AI systems can carry high risk for exploitation by bad actors. Developers need to implement strong measures to protect our data and personal security. Further, excessive energy consumption and extraction of natural resources for computing and machine learning accelerates the climate crisis.
TRANSPARENCY: Automated decisions can have huge personal impacts, yet the reasons for decisions are often opaque. We need to mandate transparency so that we can fully understand these systems and their potential for harm."""
Great. Now we’re ready to start summarizing.
The AI space is moving so fast that it requires a tremendous amount of catching up on scientific papers each week to understand the lay of the land and the state of the art.
It takes real effort for an engineer who is brand new to AI to keep up with all of this.
For the working engineer on a deadline, this is problematic. There’s not much centralized discourse on working with open source AI models. Instead there are fragmented X (formerly Twitter) threads, random private groups and lots of word-of-mouth transfer.
However, once we have a workflow to address all of the above, you will have the means to forever be on the bleeding edge of published AI research.
For now, we recommend Huggingface and their large directory of open models broken down by task. This is a great starting point. Note that larger LLMs are also included in these lists, so we will have to filter.
In this huge list of summarization models, which ones do we choose?
We don’t know what any of these models are trained on. For example, a summarizer trained on news articles vs Reddit posts will perform better on news articles.
What we need is a set of metrics and benchmarks that we can use to do apples-to-apples comparisons of these models.
The steps below can be used to evaluate any available model for any task. It requires hopping between a few sources of data for now, but we will be making this a lot easier moving forward.
The steps: find a suitable dataset, identify the metrics commonly reported for the task, then pick and evaluate candidate models.
The easiest way to do this is using Papers With Code, an excellent resource for finding the latest scientific papers by task that also have code repositories attached.
First, filter Papers With Code’s “Text Summarization” datasets by most cited text-based English datasets.
Let’s pick (as of this writing) the most cited dataset — the “CNN/DailyMail” dataset. Usually most cited is one marker of popularity.
Now, you don’t need to download this dataset. But we’re going to review the info Papers With Code have provided to learn more about it for the next step. This dataset is also available on Huggingface.
You want to check 3 things:
First, check the license. In this case, it’s MIT licensed, which means it can be used for both commercial and personal projects.
Next, see if the papers using this dataset are recent. You can do this by sorting Papers in descending order. This particular dataset has many papers from 2023 – great!
Finally, let’s check whether the data is from a credible source. In this case, the dataset was generated by IBM in partnership with the University of Montréal. Great.
Now, let’s dig into how we can evaluate models that use this dataset.
Next, we look for measured metrics that are common across datasets for the summarization task. BUT, if you’re not familiar with the literature on summarization, you have no idea what those are.
To find out, pick a “Subtask” that’s close to what you’d like to see. We’d like to summarize the CNN article we pulled down above, so let’s choose “Abstractive Text Summarization”.
Now we’re in business! This page contains a significant amount of new information.
There are mentions of three new terms: ROUGE-1, ROUGE-2 and ROUGE-L. These are the metrics that are used to measure summarization performance.
There is also a list of models and their scores on these three metrics – this is exactly what we’re looking for.
Assuming we’re looking at ROUGE-1 as our metric, we now have the top 3 models that we can evaluate in more detail. All 3 are close to 50, which is a promising ROUGE score (read up on ROUGE).
OK, we have a few candidates, so let’s pick a model that will run on our local machines. Many models get their best performance when running on GPUs, but there are many that also generate summaries fast on CPUs. Let’s pick one of those to start – Google’s Pegasus.
# first we install huggingface's transformers library
%pip install transformers sentencepiece
Then we find Pegasus on Huggingface. Note that part of the datasets Pegasus was trained on includes CNN/DailyMail, which bodes well for our article summarization. Interestingly, there’s a variant of Pegasus from Google that’s only trained on our dataset of choice, so we should use that.
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
import torch
# Set the seed, this will help reproduce results. Changing the seed will
# generate new results
from transformers import set_seed
set_seed(248602)
# We're using the version of Pegasus specifically trained for summarization
# using the CNN/DailyMail dataset
model_name = "google/pegasus-cnn_dailymail"
# If you're following along in Colab, switch your runtime to a
# T4 GPU or other CUDA-compliant device for a speedup
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the tokenizer
tokenizer = PegasusTokenizer.from_pretrained(model_name)
# Load the model
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(device)
# Tokenize the entire content
batch = tokenizer(content, padding="longest", return_tensors="pt").to(device)
# Generate the summary as tokens
summarized = model.generate(**batch)
# Decode the tokens back into text
summarized_decoded = tokenizer.batch_decode(summarized, skip_special_tokens=True)
summarized_text = summarized_decoded[0]
# Compare
def compare(original, summarized_text):
print(f"Article text length: {len(original)}\n")
print(textwrap.fill(summarized_text, 100))
print()
print(f"Summarized length: {len(summarized_text)}")
compare(content, summarized_text)
Article text length: 1427

Trustworthy AI should enable people to decide how their data is used.<n>values and goals of a system should be power aware and seek to minimize harm.<n>People should have agency and control over their data and algorithmic outputs.<n>Developers need to implement strong measures to protect our data and personal security.

Summarized length: 320
Alright, we got something! Kind of short though. Let’s see if we can make the summary longer…
set_seed(860912)
# Generate the summary as tokens, with a max_new_tokens
summarized = model.generate(**batch, max_new_tokens=800)
summarized_decoded = tokenizer.batch_decode(summarized, skip_special_tokens=True)
summarized_text = summarized_decoded[0]
compare(content, summarized_text)
Article text length: 1427

Trustworthy AI should enable people to decide how their data is used.<n>values and goals of a system should be power aware and seek to minimize harm.<n>People should have agency and control over their data and algorithmic outputs.<n>Developers need to implement strong measures to protect our data and personal security.

Summarized length: 320
Well, that didn’t really work. Let’s try a different approach called ‘sampling’. This allows the model to pick the next word according to its conditional probability distribution (specifically, the probability that said word follows the word before).
We’ll also be setting the ‘temperature’. This variable works to control the levels of randomness and creativity in the generated output.
set_seed(118511)
summarized = model.generate(**batch, do_sample=True, temperature=0.8, top_k=0)
summarized_decoded = tokenizer.batch_decode(summarized, skip_special_tokens=True)
summarized_text = summarized_decoded[0]
compare(content, summarized_text)
Article text length: 1427

Mozilla's "Trustworthy AI" Thinking Points:.<n>People should have agency and control over their data and algorithmic outputs.<n>Developers need to implement strong measures to protect our data.

Summarized length: 193
Shorter, but the quality is higher. Adjusting the temperature up will likely help.
set_seed(108814)
summarized = model.generate(**batch, do_sample=True, temperature=1.0, top_k=0)
summarized_decoded = tokenizer.batch_decode(summarized, skip_special_tokens=True)
summarized_text = summarized_decoded[0]
compare(content, summarized_text)
Article text length: 1427

Mozilla's "Trustworthy AI" Thinking Points:.<n>People should have agency and control over their data and algorithmic outputs.<n>Developers need to implement strong measures to protect our data and personal security.<n>We need to mandate transparency so that we can fully understand these systems and their potential for harm.

Summarized length: 325
Now let’s play with one other generation approach called top_k sampling — instead of considering all possible next words in the vocabulary, the model only considers the top ‘k’ most probable next words.
This technique helps to focus the model on likely continuations and reduces the chances of generating irrelevant or nonsensical text.
It strikes a balance between creativity and coherence by limiting the pool of next-word choices, but not so much that the output becomes deterministic.
set_seed(226012)
summarized = model.generate(**batch, do_sample=True, top_k=50)
summarized_decoded = tokenizer.batch_decode(summarized, skip_special_tokens=True)
summarized_text = summarized_decoded[0]
compare(content, summarized_text)
Article text length: 1427

Mozilla's "Trustworthy AI" Thinking Points look at ethical issues surrounding automated decision making.<n>values and goals of a system should be power aware and seek to minimize harm.People should have agency and control over their data and algorithmic outputs.<n>Developers need to implement strong measures to protect our data and personal security.

Summarized length: 355
Finally, let’s try top_p sampling, also known as nucleus sampling: a strategy where the model considers only the smallest set of top words whose cumulative probability exceeds a threshold ‘p’.
Unlike top_k which considers a fixed number of words, top_p adapts based on the distribution of probabilities for the next word. This makes it more dynamic and flexible. It helps create diverse and sensible text by allowing less probable words to be selected when the most probable ones don’t add up to ‘p’.
set_seed(21420041)
summarized = model.generate(**batch, do_sample=True, top_p=0.9, top_k=50)
summarized_decoded = tokenizer.batch_decode(summarized, skip_special_tokens=True)
summarized_text = summarized_decoded[0]
compare(content, summarized_text)
# saving this for later.
pegasus_summarized_text = summarized_text
Article text length: 1427

Mozilla's "Trustworthy AI" Thinking Points:.<n>People should have agency and control over their data and algorithmic outputs.<n>Developers need to implement strong measures to protect our data and personal security.<n>We need to mandate transparency so that we can fully understand these systems and their potential for harm.

Summarized length: 325
To continue with the code example and see a test with another model, and to learn how to evaluate ML model results (a whole another section), click here to view the Python Notebook and click “Open in Colab” to experiment with your own custom code.
Note this guide will be constantly updated and new sections on Data Retrieval, Image Generation, and Fine Tuning will be coming next.
Shortly after today’s launch of the Mozilla AI Guide, we will be publishing our community contribution guidelines. It will provide guidance on the type of content developers can contribute and how it can be shared. Get ready to share any great open source AI projects, implementations, video and audio models.
Together, we can bring together a cohesive, collaborative and responsible AI community.
A special thanks to Kevin Li and Pradeep Elankumaran who pulled this great blog post together.
The post Mozilla AI Guide Launch with Summarization Code Example appeared first on Mozilla Hacks - the Web developer blog.
This can be especially challenging with complex client software running third-party code like Firefox, and it’s a big reason why we’ve undertaken the Speedometer 3 effort alongside other web browsers. Our goal is to build performance tests that simulate real-world user experiences so that browsers have better tools to drive improvements for real users on real webpages. While it’s easy to see that benchmarks have improved in Firefox throughout the year as a result of this work, what we really care about is how much those wins are being felt by our users.
In order to measure the user experience, Firefox collects a wide range of anonymized timing metrics related to page load, responsiveness, startup and other aspects of browser performance. Collecting data while holding ourselves to the highest standards of privacy can be challenging. For example, because we rely on aggregated metrics, we lack the ability to pinpoint data from any particular website. But perhaps even more challenging is analyzing the data once collected and drawing actionable conclusions. In the future we’ll talk more about these challenges and how we’re addressing them, but in this post we’d like to share how some of the metrics that are fundamental to how our users experience the browser have improved throughout the year.
Let’s start with page load. First Contentful Paint (FCP) is a better metric for felt performance than the `onload` event. We’re tracking the time it takes between receiving the first byte from the network to FCP. This tells us how much faster we are giving feedback to the user that the page is successfully loading, so it’s a critical metric for understanding the user experience. While much of this is up to web pages themselves, if the browser improves performance across the board, we expect this number to go down.
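Pages can approximate this measurement for themselves with standard web APIs. Here is a rough sketch (added for illustration; it is not Firefox’s internal telemetry probe) that reports the time from the first response byte to FCP:

// Time from first response byte (responseStart) to First Contentful Paint.
const nav = performance.getEntriesByType('navigation')[0];
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.name === 'first-contentful-paint') {
      console.log('responseStart to FCP:', entry.startTime - nav.responseStart, 'ms');
    }
  }
}).observe({ type: 'paint', buffered: true });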
We can see that this time improved from roughly 250ms at the start of the year to 215ms in October. This means that a user receives feedback on page loads almost 15% faster than they did at the start of the year. And it’s important to note that this is all the result of optimization work that didn’t even explicitly target pageload.
In order to understand where this improvement is coming from, let’s look at another piece of timing data: the amount of time that was spent executing JavaScript code during a pageload. Here we are going to look at the 95th percentile, representing the most JS heavy pages and highlighting a big opportunity for us to remove friction for users.
This shows the 95th percentile improving from ~1560ms at the beginning of the year, to ~1260ms in October. This represents a considerable improvement of 300ms, or almost 20%, and is likely responsible for a significant portion of the reduced FCP times. This makes sense, since Speedometer 3 work has led to significant optimizations to the SpiderMonkey JavaScript engine (a story for another post).
We’d also like to know how responsive pages are after they are loaded. For example, how smooth is the response when typing on the keyboard as I write this blogpost! The primary metric we collect here is the “keypress present latency”; the time between a key being pressed on the keyboard and its result being presented onto the screen. Rendering some text to the screen may sound simple, but there’s a lot going on to make that happen – especially when web pages run main thread JavaScript to respond to the keypress event. Most typing is snappy and primarily limited by hardware (e.g. the refresh rate of the monitor), but it’s extremely disruptive when it’s not. This means it’s important to mitigate the worst cases, so we’ll again look at the 95th percentile.
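Firefox’s “keypress present latency” probe is internal to the browser, but the Event Timing API gives web pages a rough analogue of the same idea. The sketch below (added for illustration, not from the original post) logs slow keyboard interactions:

new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.name === 'keydown' || entry.name === 'keypress') {
      // duration spans from the input timestamp to the next paint after the handlers ran.
      const inputDelay = entry.processingStart - entry.startTime;
      console.log(`${entry.name}: input delay ${inputDelay.toFixed(1)}ms, total ${entry.duration}ms`);
    }
  }
}).observe({ type: 'event', durationThreshold: 16, buffered: true });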
Once again we see a measurable improvement. The 95th percentile hovered around 65ms for most of the year and dropped to under 59ms after the Firefox 116 and 117 releases in August. A 10% improvement to the slowest keypresses means users are experiencing more instantaneous feedback and fewer disruptions while typing.
We’ve been motivated by the improvements we’re seeing in our telemetry data, and we’re convinced that our efforts this year are having a positive effect on Firefox users. We have many more optimizations in the pipeline and will share more details about those and our overall progress in future posts.
The post Down and to the Right: Firefox Got Faster for Real Users in 2023 appeared first on Mozilla Hacks - the Web developer blog.
Mozilla builds a number of tools that help people defend their privacy online, but the need for these tools reflects a world where companies view invasive data collection as necessary for building good products and making money. A zero-sum game between privacy and business interests is not a healthy state of affairs. Therefore, we dedicate considerable effort to developing and advancing new technologies that enable businesses to achieve their goals without compromising peoples’ privacy. This is a focus of our work on web standards, as well as in how we build Firefox itself.
Building an excellent browser while maintaining a high standard for privacy sometimes requires this kind of new technology. For example: we put a lot of effort into keeping Firefox fast. This involves extensive automated testing, but also monitoring how it’s performing for real users. Firefox currently reports generic performance metrics like page-load time but does not associate those metrics with specific sites, because doing so would reveal peoples’ browsing history. These internet-wide averages are somewhat informative but not particularly actionable. Sites are constantly deploying code changes and occasionally those changes can trigger performance bugs in browsers. If we knew that a specific site got much slower overnight, we could likely isolate the cause and fix it. Unfortunately, we lack that visibility today, which hinders our ability to make Firefox great.
This is a classic problem in data collection: We want aggregate data, but the naive way to get it involves collecting sensitive information about individual people. The solution is to develop technology that delivers the same insights while keeping information about any individual person verifiably private.
In recent years, Mozilla has worked with others to advance two such technologies — Oblivious HTTP and the Prio-based Distributed Aggregation Protocol (DAP) — towards being proper internet standards that are practical to deploy in production. Oblivious HTTP works by routing encrypted data through an intermediary to conceal its source, whereas DAP/Prio splits the data into two shares and sends each share to a different server [1]. Despite their different shapes, both technologies rely on a similar principle: By processing the data jointly across two independent parties, they ensure neither party holds the information required to reveal sensitive information about someone.
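To make the splitting idea concrete, here is a toy sketch (added for illustration; the real Prio/DAP protocol adds zero-knowledge validity proofs and careful aggregation on top of this): a value is split into two random shares that only reveal the original when combined, so neither server alone learns anything about it.

const MOD = 1n << 64n; // work modulo 2^64

function randomShare() {
  const words = crypto.getRandomValues(new Uint32Array(2));
  return (BigInt(words[0]) << 32n) | BigInt(words[1]);
}

function split(value) {
  const share1 = randomShare();
  const share2 = ((value - share1) % MOD + MOD) % MOD;
  return [share1, share2]; // share1 goes to one server, share2 to the other
}

// Each server sums the shares it receives across many users; adding the two
// totals (mod 2^64) yields the aggregate without exposing any individual value.
const [a, b] = split(42n);
console.log((a + b) % MOD === 42n); // true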
We therefore need to partner with another independent and trustworthy organization to deploy each technology in Firefox. Having worked for some time to develop and validate both technologies in staging environments, we’ve now taken the next step to engage Fastly to operate an OHTTP relay and Divvi Up to operate a DAP aggregator. Both Fastly and ISRG (the nonprofit behind Divvi Up and Let’s Encrypt) have excellent reputations for acting with integrity, and they have staked those reputations on the faithful operation of these services. So even in a mirror universe where we tried to persuade them to cheat, they have a strong incentive to hold the line.
Our objective at Mozilla is to develop viable alternatives to the things that are wrong with the internet today and move the entire industry by demonstrating that it’s possible to do better. In the short term, these technologies will help us keep Firefox competitive while adhering to our longstanding principles around sensitive data. Over the long term, we want to see these kinds of strong privacy guarantees become the norm, and we will continue to work towards such a future.
Footnotes:
[1] Each approach is best-suited to different scenarios, which is why we’re investing in both. Oblivious HTTP is more flexible and can be used in interactive contexts, whereas DAP/Prio can be used in situations where the payload itself might be identifying.
The post Built for Privacy: Partnering to Deploy Oblivious HTTP and Prio in Firefox appeared first on Mozilla Hacks - the Web developer blog.
This requires a deliberate analysis of the ecosystem — starting with real user experiences and identifying the essential technical elements underlying them. We built several new tests from scratch, and also updated some existing tests from Speedometer 2 to use more modern versions of widely-used JavaScript frameworks.
When the Vue.js test was updated from Vue 2 to Vue 3, we noticed some performance issues in Firefox. The root of the problem was Proxy object usage that was introduced in Vue 3.
Proxies are hard to optimize because they’re generic by design and can behave in all sorts of surprising ways (e.g., modifying trap functions after initialization, or wrapping a Proxy with another Proxy). They also weren’t used much on the performance-critical path when they were introduced, so we focused primarily on correctness in the original implementation.
Speedometer 3 developed evidence that some Proxies today are well-behaved, critical-path, and widely-used. So we optimized these to execute completely in the JIT — specializing for the shapes of the Proxies encountered at the call-site and avoiding redundant work. This makes reactivity in Vue.js significantly faster, and we also anticipate improvements on other workloads.
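As a sketch of why Proxies end up on the critical path (this is the general pattern that Vue 3-style reactivity relies on, not Vue’s actual implementation): every property read and write of reactive state goes through the Proxy’s get and set traps.

function reactive(target, onChange) {
  return new Proxy(target, {
    get(obj, key, receiver) {
      // A real framework would record here that the current effect depends on `key`.
      return Reflect.get(obj, key, receiver);
    },
    set(obj, key, value, receiver) {
      const ok = Reflect.set(obj, key, value, receiver);
      onChange(key); // A real framework would re-run the effects that depend on `key`.
      return ok;
    },
  });
}

const state = reactive({ count: 0 }, (key) => console.log(`${key} changed`));
state.count++; // both the read and the write hit the Proxy traps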
This change landed in Firefox 118, so it’s currently on Beta and will ride along to Release by the end of September.
Over the course of the year Firefox has improved by around 40% on the Vue.js benchmark from work like this. More importantly, and as we hoped, we’re observing real user metric improvements across all page loads in Firefox as we optimize Speedometer 3. We’ll share more details about this in a subsequent post.
The post Faster Vue.js Execution in Firefox appeared first on Mozilla Hacks - the Web developer blog.
The post Autogenerating Rust-JS bindings with UniFFI appeared first on Mozilla Hacks - the Web developer blog.
I work on the Firefox sync team at Mozilla. Four years ago, we wrote a blog post describing our strategy to ship cross-platform Rust components for syncing and storage on all our platforms. The vision was to consolidate the separate implementations of features like history, logins, and syncing that existed on Firefox Desktop, Android, and iOS.
We would replace those implementations with a core written in Rust and a set of hand-written foreign language wrappers for each platform: JavaScript for Desktop, Kotlin for Android, and Swift for iOS.
Since then, we’ve learned some lessons and had to modify our strategy. It turns out that creating hand-written wrappers in multiple languages is a huge time-sink. The wrappers required a significant amount of time to write, but more importantly, they were responsible for many serious bugs.
These bugs were easy to miss, hard to debug, and often led to crashes. One of the largest benefits of Rust is memory safety, but these hand-written wrappers were negating much of that benefit.
To solve this problem, we developed UniFFI: a Rust library for auto-generating foreign language bindings. UniFFI allowed us to create wrappers quickly and safely, but there was one issue: UniFFI supported Kotlin and Swift, but not JavaScript, which powers the Firefox Desktop front-end. UniFFI helped us ship shared components for Firefox Android and iOS, but Desktop remained out of reach.
This changed with Firefox 105 when we added support for generating JavaScript bindings via UniFFI which enabled us to continue pushing forward on our single component vision. This project validated some core concepts that have been in UniFFI from the start but also required us to extend UniFFI in several ways. This blog post will walk through some of the issues that arose along the way and how we handled them.
This project has already been tried at least once before at Mozilla. The team was able to get some of the functionality supported, but some parts remained out of reach. One of the first things we realized was that the general approach the previous attempts took would probably not support the UniFFI features we were using in our components.
Does this mean the previous work was a failure? Absolutely not. The team left behind a wonderful trove of design documents, discussions, and code that we made sure to study and steal from. In particular, there was an ADR that discussed different approaches which we studied, as well as a working C++/WebIDL code that we repurposed for our project.
UniFFI bindings live on top of an FFI layer using the C ABI that we call “the scaffolding.” Then the user API is defined on top of the scaffolding layer, in the foreign language. This allows the user API to support features not directly expressible in C and also allows the generated API to feel idiomatic and natural. However, JavaScript complicates this picture because it doesn’t have support for calling C functions. Privileged code in Firefox can use the Mozilla js-ctypes library, but its use is deprecated.
The previous project solved this problem by using C++ to call into the scaffolding functions, then leveraged the Firefox WebIDL code generation tools to create the JavaScript API. That code generation tool is quite nice and allowed us to define the user API using a combination of WebIDL and C++ glue code. However, it was limited and did not support all UniFFI features.
Our team decided to use the same WebIDL code generation tool, but to generate just the scaffolding layer instead of the entire user API. Then we used JavaScript to define the user API on top of that, just like for other languages. We were fairly confident that the code generation tool would no longer be a limiting factor, since the scaffolding layer is designed to be minimalistic and expressible in C.
The threading model for UniFFI interfaces is not very flexible: all function and method calls are blocking. It’s the caller’s responsibility to ensure that calls don’t block the wrong thread. Typically this means executing UniFFI calls in a thread pool.
The threading model for Firefox frontend JavaScript code is equally inflexible: you must never block the main thread. The main JavaScript thread is responsible for all UI updates and blocking it means an unresponsive browser. Furthermore, the only way to start another thread in JavaScript is using Web Workers, but those are not currently used by the frontend code.
To resolve the unstoppable force vs. immovable object situation we found ourselves in, we simply reversed the UniFFI model and made all calls asynchronous. This means that all functions return a promise rather than their return value directly.
The “all functions are async” model seems reasonable, at least for the first few projects we intend to use with UniFFI. However, not all functions really need to be async – some are quick enough that they aren’t blocking. Eventually, we plan to add a way for users to customize which functions are blocking and which are async. This will probably happen alongside some general work for async UniFFI, since we’ve found that async execution is an issue for many components using UniFFI.
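The generated bindings themselves are C++ and JavaScript, but the underlying pattern is easy to sketch in Python with asyncio as an analogy: the blocking call runs off the main thread, and the caller gets back an awaitable (the moral equivalent of a promise). The function names below are hypothetical stand-ins, not UniFFI APIs.

import asyncio
import time

def blocking_rust_call(x):
    # Stand-in for a synchronous call into a Rust component.
    time.sleep(0.1)
    return x * 2

async def async_wrapper(x):
    # Run the blocking call on a worker thread and return an awaitable,
    # so the main event loop (think: the browser's main thread) never blocks.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_rust_call, x)

async def main():
    print(await async_wrapper(21))

asyncio.run(main())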
Since landing UniFFI support in Firefox 105, we’ve slowly started adding some UniFFI’ed Rust components to Firefox. In Firefox 108 we added the Rust remote tabs syncing engine, making it the first component shared by Firefox on all three of our platforms. The new tabs engine uses UniFFI to generate JS bindings on Desktop, Kotlin bindings on Android, and Swift bindings on iOS.
We’ve also been continuing to advance our shared component strategy on Mobile. Firefox iOS has historically lagged behind Android in terms of shared component adoption, but the Firefox iOS 116 release will use our shared sync manager component. This means that both mobile browsers will be using all of the shared components we’ve written so far.
We also use UniFFI to generate bindings for Glean, a Mozilla telemetry library, which was a bit of an unusual case. Glean doesn’t generate JS bindings; it only generates the scaffolding API, which ends up in the GeckoView library that powers Firefox Android. Firefox Android can then consume Glean via the generated Kotlin bindings, which link to the scaffolding in GeckoView.
If you’re interested in this project or UniFFI in general, please join us in #uniffi on the Mozilla Matrix chat.
The post Autogenerating Rust-JS bindings with UniFFI appeared first on Mozilla Hacks - the Web developer blog.
]]>The post So you want to build your own open source ChatGPT-style chatbot… appeared first on Mozilla Hacks - the Web developer blog.
]]>Artificial intelligence may well prove one of the most impactful and disruptive technologies to come along in years. This impact isn’t theoretical: AI is already affecting real people in substantial ways, and it’s already changing the Web that we know and love. Acknowledging the potential for both benefit and harm, Mozilla has committed itself to the principles of trustworthy AI. To us, “trustworthy” means AI systems that are transparent about the data they use and the decisions they make, that respect user privacy, that prioritize user agency and safety, and that work to minimize bias and promote fairness.
Right now, the primary way that most people are experiencing the latest AI technology is through generative AI chatbots. These tools are exploding in popularity because they provide a lot of value to users, but the dominant offerings (like ChatGPT and Bard) are all operated by powerful tech companies, often utilizing technologies that are proprietary.
At Mozilla, we believe in the collaborative power of open source to empower users, drive transparency, and — perhaps most importantly — ensure that technology does not develop only according to the worldviews and financial motivations of a small group of corporations. Fortunately, there’s recently been rapid and exciting progress in the open source AI space, specifically around the large language models (LLMs) that power these chatbots and the tooling that enables their use. We want to understand, support, and contribute to these efforts because we believe that they offer one of the best ways to help ensure that the AI systems that emerge are truly trustworthy.
With this goal in mind, a small team within Mozilla’s innovation group recently undertook a hackathon at our headquarters in San Francisco. Our objective: build a Mozilla internal chatbot prototype, one that’s…
As a bonus, we set a stretch goal of integrating some amount of internal Mozilla-specific knowledge, so that the chatbot can answer employee questions about internal matters.
The Mozilla team that undertook this project — Josh Whiting, Rupert Parry, and myself — brought varying levels of machine learning knowledge to the table, but none of us had ever built a full-stack AI chatbot. And so, another goal of this project was simply to roll up our sleeves and learn!
This post is about sharing that learning, in the hope that it will help or inspire you in your own explorations with this technology. Assembling an open source LLM-powered chatbot turns out to be a complicated task, requiring many decisions at multiple layers of the technology stack. In this post, I’ll take you through each layer of that stack, the challenges we encountered, and the decisions we made to meet our own specific needs and deadlines. YMMV, of course.
Ready, then? Let’s begin, starting at the bottom of the stack…
A visual representation of our chatbot exploration.
The first question we faced was where to run our application. There’s no shortage of companies both large and small who are eager to host your machine learning app. They come in all shapes, sizes, levels of abstraction, and price points.
For many, these services are well worth the money. Machine learning ops (aka “MLOps”) is a growing discipline for a reason: deploying and managing these apps is hard. It requires specific knowledge and skills that many developers and ops folks don’t yet have. And the cost of failure is high: poorly configured AI apps can be slow, expensive, deliver a poor quality experience, or all of the above.
What we did: Our explicit goal for this one-week project was to build a chatbot that was secure and fully-private to Mozilla, with no outside parties able to listen in, harvest user data, or otherwise peer into its usage. We also wanted to learn as much as we could about the state of open source AI technology. We therefore elected to forego any third-party AI SaaS hosting solutions, and instead set up our own virtual server inside Mozilla’s existing Google Cloud Platform (GCP) account. In doing so, we effectively committed to doing MLOps ourselves. But we could also move forward with confidence that our system would be private and fully under our control.
Using an LLM to power an application requires having a runtime engine for your model. There are a variety of ways to actually run LLMs, but due to time constraints we didn’t come close to investigating all of them on this project. Instead, we focused on two specific open source solutions: llama.cpp and the Hugging Face ecosystem.
For those who don’t know, Hugging Face is an influential startup in the machine learning space that has played a significant role in popularizing the transformer architecture for machine learning. Hugging Face provides a complete platform for building machine learning applications, including a massive library of models, and extensive tutorials and documentation. They also provide hosted APIs for text inference (which is the formal name for what an LLM-powered chatbot is doing behind the scenes).
Because we wanted to avoid relying on anyone else’s hosted software, we elected to try out the open source version of Hugging Face’s hosted API, which is found at the text-generation-inference project on GitHub. text-generation-inference is great because, like Hugging Face’s own Transformers library, it can support a wide variety of models and model architectures (more on this in the next section). It’s also optimized for supporting multiple users and is deployable via Docker.
Unfortunately, this is where we first started to run into the fun challenges of learning MLOps on the fly. We had a lot of trouble getting the server up and running. This was in part an environment issue: since Hugging Face’s tools are GPU-accelerated, our server needed a specific combination of OS, hardware, and drivers. It specifically needed NVIDIA’s CUDA toolkit installed (CUDA being the dominant API for GPU-accelerated machine learning applications). We struggled with this for much of a day before finally getting a model running live, but even then the output was slower than expected and the results were vexingly poor — both signs that something was still amiss somewhere in our stack.
Now, I’m not throwing shade at this project. Far from it! We love Hugging Face, and building on their stack offers a number of advantages. I’m certain that if we had a bit more time and/or hands-on experience we would have gotten things working. But time was a luxury we didn’t have in this case. Our intentionally-short project deadline meant that we couldn’t afford to get too deeply mired in matters of configuration and deployment. We needed to get something working quickly so that we could keep moving and keep learning.
It was at this point that we shifted our attention to llama.cpp, an open source project started by Georgi Gerganov. llama.cpp accomplishes a rather neat trick: it makes it easy to run a certain class of LLMs on consumer grade hardware, relying on the CPU instead of requiring a high-end GPU. It turns out that modern CPUs (particularly Apple Silicon CPUs like the M1 and M2) can do this surprisingly well, at least for the latest generation of relatively-small open source models.
llama.cpp is an amazing project, and a beautiful example of the power of open source to unleash creativity and innovation. I had already been using it in my own personal AI experiments and had even written up a blog post showing how anyone can use it to run a high-quality model on their own MacBook. So it seemed like a natural thing for us to try next.
While llama.cpp itself is simply a command-line executable — the “cpp” stands for “C++” — it can be dockerized and run like a service. Crucially, a set of Python bindings is available which exposes an implementation of the OpenAI API specification. What does all that mean? Well, it means that llama.cpp makes it easy to slot in your own LLM in place of ChatGPT. This matters because OpenAI’s API is being rapidly and widely adopted by machine learning developers. Emulating that API is a clever bit of Judo on the part of open source offerings like llama.cpp.
What we did: With these tools in hand, we were able to get llama.cpp up and running very quickly. Instead of worrying about CUDA toolkit versions and provisioning expensive hosted GPUs, we were able to spin up a simple AMD-powered multicore CPU virtual server and just… go.
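To make the “drop-in replacement” idea a bit more concrete, here is a minimal sketch of what talking to a model served this way can look like. This is not our actual code: the endpoint URL, port, and model name are assumptions for illustration, matching the OpenAI-style chat completions API that the llama.cpp Python bindings can serve.

import requests

# Assumed local endpoint exposed by the llama.cpp Python bindings' server.
API_URL = "http://localhost:8000/v1/chat/completions"

def ask(question):
    payload = {
        "model": "llama-2-13b-chat",  # placeholder: whatever model the server was started with
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
        "temperature": 0.7,
    }
    # Same request shape a ChatGPT client would send, but nothing leaves our own server.
    response = requests.post(API_URL, json=payload, timeout=120)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

print(ask("What is the Mozilla Manifesto?"))

Because the request and response shapes mirror OpenAI’s API, tooling written against ChatGPT can usually be pointed at a server like this with little more than a change of base URL.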
An emerging trend you’ll notice in this narrative is that every decision you make in building a chatbot interacts with every other decision. There are no easy choices, and there is no free lunch. The decisions you make will come back to haunt you.
In our case, choosing to run with llama.cpp introduced an important consequence: we were now limited in the list of models available to us.
Quick history lesson: in early 2023, Facebook announced LLaMA, its own large language model. To grossly overgeneralize, LLaMA consists of two pieces: the model data itself, and the architecture upon which the model is built. Facebook open sourced the LLaMA architecture, but they didn’t open source the model data. Instead, people wishing to work with this data need to apply for permission to do so, and their use of the data is limited to non-commercial purposes.
Even so, LLaMA immediately fueled a Cambrian explosion of model innovation. Stanford released Alpaca, which they created by building on top of LLaMA via a process called fine-tuning. A short time later, LMSYS released Vicuna, an arguably even more impressive model. There are dozens more, if not hundreds.
So what’s the fine print? These models were all developed using Facebook’s model data — in machine learning parlance, the “weights.” Because of this, they inherit the legal restrictions Facebook imposed upon those original weights. This means that these otherwise-excellent models can’t be used for commercial purposes. And so, sadly, we had to strike them from our list.
But there’s good news: even if the LLaMA weights aren’t truly open, the underlying architecture is proper open source code. This makes it possible to build new models that leverage the LLaMA architecture but do not rely on the LLaMA weights. Multiple groups have done just this, training their own models from scratch and releasing them as open source (via MIT, Apache 2.0, or Creative Commons licenses). Some recent examples include OpenLLaMA, and — just days ago — LLaMA 2, a brand new version of Facebook’s LLaMA model, from Facebook themselves, but this time expressly licensed for commercial use (although its numerous other legal encumbrances raise serious questions of whether it is truly open source).
Remember llama.cpp? The name isn’t an accident. llama.cpp runs LLaMA architecture-based models. This means we were able to take advantage of the above models for our chatbot project. But it also meant that we could only use LLaMA architecture-based models.
You see, there are plenty of other model architectures out there, and many more models built atop them. The list is too long to enumerate here, but a few leading examples include MPT, Falcon, and Open Assistant. These models utilize different architectures than LLaMA and thus (for now) do not run on llama.cpp. That means we couldn’t use them in our chatbot, no matter how good they might be.
Now, you may have noticed that so far I’ve only been talking about model selection from the perspectives of licensing and compatibility. There’s a whole other set of considerations here, and they’re related to the qualities of the model itself.
Models are one of the focal points of Mozilla’s interest in the AI space. That’s because your choice of model is currently the biggest determiner of how “trustworthy” your resulting AI will be. Large language models are trained on vast quantities of data, and are then further fine-tuned with additional inputs to adjust their behavior and output to serve specific uses. The data used in these steps represents an inherent curatorial choice, and that choice carries with it a raft of biases.
Depending on which sources a model was trained on, it can exhibit wildly different characteristics. It’s well known that some models are prone to hallucinations (the machine learning term for what are essentially nonsensical responses invented by the model from whole cloth), but far more insidious are the many ways that models can choose to — or refuse to — answer user questions. These responses reflect the biases of the model itself. They can result in the sharing of toxic content, misinformation, and dangerous or harmful information. Models may exhibit biases against concepts, or groups of people. And, of course, the elephant in the room is that the vast majority of the training material available online today is in the English language, which has a predictable impact both on who can use these tools and the kinds of worldviews they’ll encounter.
While there are plenty of resources for assessing the raw power and “quality” of LLMs (one popular example being Hugging Face’s Open LLM leaderboard), it is still challenging to evaluate and compare models in terms of sourcing and bias. This is an area in which Mozilla thinks open source models have the potential to shine, through the greater transparency they can offer versus commercial offerings.
What we did: After limiting ourselves to commercially-usable open models running on the LLaMA architecture, we carried out a manual evaluation of several models. This evaluation consisted of asking each model a diverse set of questions to compare their resistance to toxicity, bias, misinformation, and dangerous content. Ultimately, we settled on Facebook’s new LLaMA 2 model for now. We recognize that our time-limited methodology may have been flawed, and we are not fully comfortable with the licensing terms of this model and what they may represent for open source models more generally, so don’t consider this an endorsement. We expect to reevaluate our model choice in the future as we continue to learn and develop our thinking.
As you may recall from the opening of this post, we set ourselves a stretch goal of integrating some amount of internal Mozilla-specific knowledge into our chatbot. The idea was simply to build a proof-of-concept using a small amount of internal Mozilla data — facts that employees would have access to themselves, but which LLMs ordinarily would not.
One popular approach for achieving such a goal is to use vector search with embedding. This is a technique for making custom external documents available to a chatbot, so that it can utilize them in formulating its answers. This technique is both powerful and useful, and in the months and years ahead there’s likely to be a lot of innovation and progress in this area. There are already a variety of open source and commercial tools and services available to support embedding and vector search.
In its simplest form, it works generally like this: you split your documents into chunks and convert each chunk into a numerical vector (an embedding) that captures its meaning, then store those vectors in a vector database. When a user asks a question, you embed the question the same way, look up the chunks whose vectors are closest to it, and paste the most relevant chunks into the prompt so the LLM can draw on them when answering.
What we did: Because we wanted to retain full control over all of our data, we declined to use any third-party embedding service or vector database. Instead, we coded up a manual solution in Python that utilizes the all-mpnet-base-v2 embedding model, the SentenceTransformers embedding library, LangChain (which we’ll talk about more below), and the FAISS vector database. We only fed in a handful of documents from our internal company wiki, so the scope was limited. But as a proof-of-concept, it did the trick.
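For readers who want to see the shape of this, here is a stripped-down sketch of the same approach (not our actual code); the documents and the question are placeholders:

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Embed a handful of document chunks into vectors.
model = SentenceTransformer("all-mpnet-base-v2")
documents = [
    "Placeholder: a paragraph from an internal wiki page.",
    "Placeholder: another paragraph from a different page.",
]
doc_vectors = model.encode(documents, normalize_embeddings=True)

# Store the vectors in a FAISS index; with normalized vectors,
# inner product behaves like cosine similarity.
index = faiss.IndexFlatIP(doc_vectors.shape[1])
index.add(np.asarray(doc_vectors, dtype="float32"))

# At question time: embed the question, pull back the closest chunks,
# and paste them into the prompt that gets sent to the LLM.
question = model.encode(["What is our vacation policy?"], normalize_embeddings=True)
_, nearest = index.search(np.asarray(question, dtype="float32"), 2)
relevant_chunks = [documents[i] for i in nearest[0]]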
If you’ve been following the chatbot space at all you’ve probably heard the term “prompt engineering” bandied about. It’s not clear that this will be an enduring discipline as AI technology evolves, but for the time being prompt engineering is a very real thing. And it’s one of the most crucial problem areas in the whole stack.
You see, LLMs are fundamentally empty-headed. When you spin one up, it’s like a robot that’s just been powered on for the first time. It doesn’t have any memory of its life before that moment. It doesn’t remember you, and it certainly doesn’t remember your past conversations. It’s tabula rasa, every time, all the time.
In fact, it’s even worse than that. Because LLMs don’t even have short-term memory. Without specific action on the part of developers, chatbots can’t even remember the last thing they said to you. Memory doesn’t come naturally to LLMs; it has to be managed. This is where prompt engineering comes in. It’s one of the key jobs of a chatbot, and it’s a big reason why leading bots like ChatGPT are so good at keeping track of ongoing conversations.
The first place that prompt engineering rears its head is in the initial instructions you feed to the LLM. This system prompt is a way for you, in plain language, to tell the chatbot what its function is and how it should behave. We found that this step alone merits a significant investment of time and effort, because its impact is so keenly felt by the user.
In our case, we wanted our chatbot to follow the principles in the Mozilla Manifesto, as well as our company policies around respectful conduct and nondiscrimination. Our testing showed us in stark detail just how suggestible these models are. In one example, we asked our bot to give us evidence that the Apollo moon landings were faked. When we instructed the bot to refuse to provide answers that are untrue or are misinformation, it would correctly insist that the moon landings were in fact not faked — a sign that the model seemingly “understands” at some level that claims to the contrary are conspiracy theories unsupported by the facts. And yet, when we updated the system prompt by removing this prohibition against misinformation, the very same bot was perfectly happy to recite a bulleted list of the typical Apollo denialism you can find in certain corners of the Web.
You are a helpful assistant named Mozilla Assistant.
You abide by and promote the principles found in the Mozilla Manifesto.
You are respectful, professional, and inclusive.
You will refuse to say or do anything that could be considered harmful, immoral, unethical, or potentially illegal.
You will never criticize the user, make personal attacks, issue threats of violence, share abusive or sexualized content, share misinformation or falsehoods, use derogatory language, or discriminate against anyone on any basis.
The system prompt we designed for our chatbot.
Another important concept to understand is that every LLM has a maximum length to its “memory”. This is called its context window, and in most cases it is determined when the model is trained and cannot be changed later. The larger the context window, the longer the LLM’s memory about the current conversation. This means it can refer back to earlier questions and answers and use them to maintain a sense of the conversation’s context (hence the name). A larger context window also means that you can include larger chunks of content from vector searches, which is no small matter.
Managing the context window, then, is another critical aspect of prompt engineering. It’s important enough that there are solutions out there to help you do it (which we’ll talk about in the next section).
What we did: Since our goal was to have our chatbot behave as much like a fellow Mozilian as possible, we ended up devising our own custom system prompt based on elements of our Manifesto, our participation policy, and other internal documents that guide employee behaviors and norms at Mozilla. We then massaged it repeatedly to reduce its length as much as possible, so as to preserve our context window. As for the context window itself, we were stuck with what our chosen model (LLaMA 2) gave us: 4096 tokens, or roughly 3000 words. In the future, we’ll definitely be looking at models that support larger windows.
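To illustrate what managing the context window can look like in practice, here is a simplified sketch (not our actual code) that drops the oldest turns of the conversation until the assembled prompt fits under a token budget. The four-characters-per-token estimate is a rough stand-in for a real tokenizer, and the constants are assumptions for illustration.

SYSTEM_PROMPT = "You are a helpful assistant named Mozilla Assistant. ..."
CONTEXT_WINDOW = 4096        # tokens available with LLaMA 2
RESERVED_FOR_ANSWER = 512    # leave room for the model's reply

def rough_token_count(text):
    # Crude estimate: roughly four characters per token.
    return len(text) // 4

def build_prompt(history, question):
    # history is a list of (user_message, bot_reply) tuples, oldest first.
    turns = list(history)
    while True:
        transcript = "".join(f"User: {u}\nAssistant: {a}\n" for u, a in turns)
        prompt = f"{SYSTEM_PROMPT}\n{transcript}User: {question}\nAssistant:"
        if rough_token_count(prompt) <= CONTEXT_WINDOW - RESERVED_FOR_ANSWER or not turns:
            return prompt
        turns.pop(0)  # drop the oldest exchange and try again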
I’ve now taken you through (*checks notes*) five whole layers of functionality and decisions. So what I say next probably won’t come as a surprise: there’s a lot to manage here, and you’ll need a way to manage it.
Some people have lately taken to calling that orchestration. I don’t personally love the term in this context because it already has a long history of other meanings in other contexts. But I don’t make the rules, I just blog about them.
The leading orchestration tool right now in the LLM space is LangChain, and it is a marvel. It has a feature list a mile long, it provides astonishing power and flexibility, and it enables you to build AI apps of all sizes and levels of sophistication. But with that power comes quite a bit of complexity. Learning LangChain isn’t necessarily an easy task, let alone harnessing its full power. You may be able to guess where this is going…
What we did: We used LangChain only very minimally, to power our embedding and vector search solution. Otherwise, we ended up steering clear. Our project was simply too short and too constrained for us to commit to using this specific tool. Instead, we were able to accomplish most of our needs with a relatively small volume of Python code that we wrote ourselves. This code “orchestrated” everything going on in the layers I’ve already discussed, from injecting the agent prompt, to managing the context window, to embedding private content, to feeding it all to the LLM and getting back a response. That said, given more time we most likely would not have done this all manually, as paradoxical as that might sound.
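To give a feel for what that hand-rolled orchestration amounted to, here is a condensed sketch of the overall flow. It is not our actual code, and retrieve_chunks, build_prompt, and ask are hypothetical stand-ins for the retrieval, prompt assembly, and inference pieces sketched earlier in this post.

def answer(question, history):
    # 1. Vector search: find internal document chunks relevant to the question.
    chunks = retrieve_chunks(question)

    # 2. Prompt engineering: system prompt + retrieved context + trimmed history.
    context = "\n".join(chunks)
    prompt = build_prompt(history, f"Context:\n{context}\n\nQuestion: {question}")

    # 3. Inference: send the assembled prompt to the local LLM and record the turn.
    reply = ask(prompt)
    history.append((question, reply))
    return reply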
Last but far from least, we have reached the top layer of our chatbot cake: the user interface.
OpenAI set a high bar for chatbot UIs when they launched ChatGPT. While these interfaces may look simple on the surface, that’s more a tribute to good design than evidence of a simple problem space. Chatbot UIs need to present ongoing conversations, keep track of historical threads, manage a back-end that produces output at an often inconsistent pace, and deal with a host of other eventualities.
Happily, there are several open source chatbot UIs out there to choose from. One of the most popular is chatbot-ui. This project implements the OpenAI API, and thus it can serve as a drop-in replacement for the ChatGPT UI (while still utilizing the ChatGPT model behind the scenes). This also makes it fairly straightforward to use chatbot-ui as a front-end for your own LLM system.
What we did: Ordinarily we would have used chatbot-ui or a similar project, and that’s probably what you should do. However, we happened to already have our own internal (and as yet unreleased) chatbot code, called “Companion”, which Rupert had written to support his other AI experiments. Since we happened to have both this code and its author on-hand, we elected to take advantage of the situation. By using Companion as our UI, we were able to iterate rapidly and experiment with our UI more quickly than we would have otherwise been able to.
I’m happy to report that at the end of our hackathon, we achieved our goals. We delivered a prototype chatbot for internal Mozilla use, one that is entirely hosted within Mozilla, that can be used securely and privately, and that does its best to reflect Mozilla’s values in its behavior. To achieve this, we had to make some hard calls and accept some compromises. But at every step, we were learning.
The path we took for our prototype.
This learning extended beyond the technology itself. We learned that:
As we look to the road ahead, we at Mozilla are interested in helping to address each of these challenges. To begin, we’ve started working on ways to make it easier for developers to onboard to the open-source machine learning ecosystem. We are also looking to build upon our hackathon work and contribute something meaningful to the open source community. Stay tuned for more news very soon on this front and others!
With open source LLMs now widely available and with so much at stake, we feel the best way to create a better future is for us all to take a collective and active role in shaping it. I hope that this blog post has helped you better understand the world of chatbots, and that it encourages you to roll up your own sleeves and join us at the workbench.
The post So you want to build your own open source ChatGPT-style chatbot… appeared first on Mozilla Hacks - the Web developer blog.
]]>The post Letting users block injected third-party DLLs in Firefox appeared first on Mozilla Hacks - the Web developer blog.
]]>Let’s talk about what this means and when it might be useful.
On Windows, third-party products have a variety of ways to inject their code into other running processes. This is done for a number of reasons; the most common is for antivirus software, but other uses include hardware drivers, screen readers, banking (in some countries) and, unfortunately, malware.
Having a DLL from a third-party product injected into a Firefox process is surprisingly common – according to our telemetry, over 70% of users on Windows have at least one such DLL! (to be clear, this means any DLL not digitally signed by Mozilla or part of the OS).
Most users are unaware when DLLs are injected into Firefox, as most of the time there’s no obvious indication this is happening, other than checking the about:third-party page.
Unfortunately, having DLLs injected into Firefox can lead to performance, security, or stability problems. This is for a number of reasons:
Indeed, our data shows that just over 2% of all Firefox crash reports on Windows are in third-party code. This is despite the fact that Firefox already blocks a number of specific third-party DLLs that are known to cause a crash (see below for details).
This also undercounts crashes that are caused indirectly by third-party DLLs, since our metrics only look for third-party DLLs directly in the call stack. Additionally, third-party DLLs are a bit more likely to cause crashes at startup, which are much more serious for users.
Firefox has a third-party injection policy, and whenever possible we recommend third parties instead use extensions to integrate into Firefox, as this is officially supported and much more stable.
For maximum stability and performance, Firefox could try to block all third-party DLLs from being injected into its processes. However, this would break some useful products like screen readers that users want to be able to use with Firefox. This would also be technically challenging and it would probably be impossible to block every third-party DLL, especially third-party products that run with higher privilege than Firefox.
Since 2010, Mozilla has had the ability to block specific third-party DLLs for all Windows users of Firefox. We do this only as a last resort, after trying to communicate with the vendor to get the underlying issue fixed, and we tailor it as tightly as we can to make Firefox users stop crashing. (We have the ability to only block specific versions of the DLL and only in specific Firefox processes where it’s causing problems). This is a helpful tool, but we only consider using it if a particular third-party DLL is causing lots of crashes such that it shows up on our list of top crashes in Firefox.
Even if we know a third-party DLL can cause a crash in Firefox, there are times when the functionality that the DLL provides is essential to the user, and the user would not want us to block the DLL on their behalf. If the user’s bank or government requires some software to access their accounts or file their taxes, we wouldn’t be doing them any favors by blocking it, even if blocking it would make Firefox more stable.
With Firefox 110, users can block third-party DLLs from being loaded into Firefox. This can be done on the about:third-party page, which already lists all loaded third-party modules. The about:third-party page also shows which third-party DLLs have been involved in previous Firefox crashes; along with the name of the publisher of the DLL, hopefully this will let users make an informed decision about whether or not to block a DLL. Here’s an example of a DLL that recently crashed Firefox; clicking the button with a dash on it will block it:
Here’s what it looks like after blocking the DLL and restarting Firefox:
If blocking a DLL causes a problem, launching Firefox in Troubleshoot Mode will disable all third-party DLL blocking for that run of Firefox, and DLLs can be blocked or unblocked on the about:third-party page as usual.
Blocking DLLs from loading into a process is tricky business. In order to detect all DLLs loading into a Firefox process, the blocklist has to be set up very early during startup. For this purpose, we have the launcher process, which creates the main browser process in a suspended state. Then it sets up any sandboxing policies, loads the blocklist file from disk, and copies the entries into the browser process before starting that process.
The copying is done in an interesting way: the launcher process creates an OS-backed file mapping object with CreateFileMapping(), and, after populating that with blocklist entries, duplicates the handle and uses WriteProcessMemory() to write that handle value into the browser process. Ironically, WriteProcessMemory() is often used as a way for third-party DLLs to inject themselves into other processes; here we’re using it to set a variable at a known location, since the launcher process and the browser process are run from the same .exe file!
Because everything happens so early during startup, well before the Firefox profile is loaded, the list of blocked DLLs is stored per Windows user instead of per Firefox profile. Specifically, the file is in %AppData%\Mozilla\Firefox, and the filename has the format blocklist-{install hash}, where the install hash is a hash of the location on disk of Firefox. This is an easy way of keeping the blocklist separate for different Firefox installations.
To detect when a DLL is trying to load, Firefox uses a technique known as function interception or hooking. This modifies an existing function in memory so another function can be called before the existing function begins to execute. This can be useful for many reasons; it allows changing the function’s behavior even if the function wasn’t designed to allow changes. Microsoft Detours is a tool commonly used to intercept functions.
In Firefox’s case, the function we’re interested in is NtMapViewOfSection(), which gets called whenever a DLL loads. The goal is to get notified when this happens so we can check the blocklist and forbid a DLL from loading if it’s on the blocklist.
To do this, Firefox uses a homegrown function interceptor to intercept calls to NtMapViewOfSection() and return that the mapping failed if the DLL is on the blocklist. To do this, the interceptor tries two different techniques:
The first technique applies when the function begins with a two-byte instruction that effectively does nothing (mov edi, edi) and has five one-byte unused instructions before that (either nop or int 3). For example:

nop
nop
nop
nop
nop
DLLFunction:
mov edi, edi
(actual function code starts here)
If the interceptor detects that this is the case, it can replace the five bytes of unused instructions with a jmp to the address of the function to call instead (since we’re on a 32-bit platform, we just need one byte to indicate a jump and four bytes for the address). So, this would look like:

jmp <address of Firefox patched function>
DLLFunction:
jmp $-5    # encodes in two bytes: EB F9
(actual function code starts here)
When the patched function wants to call the unpatched version of DLLFunction(), it simply jumps 2 bytes past the address of DLLFunction() to start the actual function code.
If the function doesn’t start with this hot-patch pattern, the interceptor has to do more work: it copies the first instructions of the function off to a separate block of memory (a “trampoline”) and then overwrites them with a jump to the patched function. As a concrete example, take this C function:

int fn(int aX, int aY) {
  if (aX + 1 >= aY) {
    return aX * 3;
  }
  return aY + 5 - aX;
}

and the assembly, with corresponding raw instructions. Note that this was compiled with -O3, so it’s a little dense:

fn(int,int):
    lea eax,[rdi+0x1]       # 8d 47 01
    mov ecx,esi             # 89 f1
    sub ecx,edi             # 29 f9
    add ecx,0x5             # 83 c1 05
    cmp eax,esi             # 39 f0
    lea eax,[rdi+rdi*2]     # 8d 04 7f
    cmovl eax,ecx           # 0f 4c c1
    ret                     # c3
Now, counting 13 bytes from the beginning of fn() puts us in the middle of the lea eax,[rdi+rdi*2] instruction, so we’ll have to copy everything down to that point to the trampoline.
The end result looks like this:
fn(int,int) (address 0x100000000):
    # overwritten code
    mov r11, 0x600000000    # 49 bb 00 00 00 00 06 00 00 00
    jmp r11                 # 41 ff e3
    # leftover bytes from the last instruction
    # so the addresses of everything stays the same
    # We could also fill these with nop’s or int 3’s,
    # since they won’t be executed
    .byte 04
    .byte 7f
    # rest of fn() starts here
    cmovl eax,ecx           # 0f 4c c1
    ret                     # c3

Trampoline (address 0x300000000):
    # First 13 bytes worth of instructions from fn()
    lea eax,[rdi+0x1]       # 8d 47 01
    mov ecx,esi             # 89 f1
    sub ecx,edi             # 29 f9
    add ecx,0x5             # 83 c1 05
    cmp eax,esi             # 39 f0
    lea eax,[rdi+rdi*2]     # 8d 04 7f
    # Now jump past first 13 bytes of fn()
    jmp [RIP+0x0]           # ff 25 00 00 00 00
    # implemented as jmp [RIP+0x0], then storing
    # address to jump to directly after this
    # instruction
    .qword 0x10000000f

Firefox patched function (address 0x600000000):
    <whatever the patched function wants to do>
If the Firefox patched function wants to call the unpatched fn(), the patcher has stored the address of the trampoline (0x300000000 in this example). In C++ code we encapsulate this in the FuncHook class, and the patched function can just call the trampoline with the same syntax as a normal function call.
This whole setup is significantly more complicated than the first case; you can see that the patcher for the first case is only around 200 lines long while the patcher that handles this case is more than 1700 lines long! Some additional notes and complications:
- Since a jmp is used to get from fn() to the patched function instead of a ret, and similarly to get from the trampoline back into the main code of fn(), this keeps the code stack-neutral. So calling other functions and returning from fn() all work correctly with respect to the position of the stack.
- If code elsewhere in fn() jumps back into the first 13 bytes, it will now be jumping into the middle of the jump to the patched function and bad things will almost certainly happen. Luckily this is very rare; most functions are doing function prologue operations in their beginning, so this isn’t a problem for the functions that Firefox intercepts.
- Sometimes fn() has some data stored in the first 13 bytes that is used by later instructions, and moving this data to the trampoline will result in the later instructions getting the wrong data. We have run into this, and can work around it by using a shorter mov instruction if we can allocate space for a trampoline that’s within the first 2 GB of address space. This results in a 10 byte patch instead of a 13 byte patch, which in many cases is good enough to avoid problems.

If the DLL is on the blocklist, our patched version of NtMapViewOfSection() will return that the mapping fails, which causes the whole DLL load to fail. This will not work to block every kind of injection, but it does block most of them.
One added complication is that some DLLs will inject themselves by modifying firefox.exe’s Import Address Table, which is a list of external functions that firefox.exe calls into. If one of these functions fails to load, Windows will terminate the Firefox process. So if Firefox detects this sort of injection and wants to block the DLL, we will instead redirect the DLL’s DllMain() to a function that does nothing.
Principle 4 of the Mozilla Manifesto states that “Individuals’ security and privacy on the internet are fundamental and must not be treated as optional”, and we hope that this will give Firefox users the power to access the internet with more confidence. Instead of having to choose between uninstalling a useful third-party product and having stability problems with Firefox, now users have a third option of leaving the third-party product installed and blocking it from injecting into Firefox!
As this is a new feature, if you have problems with blocking third-party DLLs, please file a bug. If you have issues with a third-party product causing problems in Firefox, please don’t forget to file an issue with the vendor of that product – since you’re the user of that product, any report the vendor gets means more coming from you than it does coming from us!
Special thanks to David Parks and Yannis Juglaret for reading and providing feedback on many drafts of this post and Toshihito Kikuchi for the initial prototype of the dynamic blocklist.
The post Letting users block injected third-party DLLs in Firefox appeared first on Mozilla Hacks - the Web developer blog.
]]>The post Mozilla Launches Responsible AI Challenge appeared first on Mozilla Hacks - the Web developer blog.
]]>We want entrepreneurs and builders to join us in creating a future where AI is developed through this responsible lens. That’s why we are relaunching our Mozilla Builders program with the Responsible AI Challenge.
At Mozilla, we believe in AI: in its power, its commercial opportunity, and its potential to solve the world’s most challenging problems. But now is the moment to make sure that it is developed responsibly to serve society.
If you want to build (or are already building) AI solutions that are ambitious but also ethical and holistic, the Mozilla Builders Responsible AI Challenge is for you. We will be inviting the top nominees to join a gathering of the brightest technologists, community leaders and ethicists working on trustworthy AI to help get your ideas off the ground. Participants will also have access to mentorship from some of the best minds in the industry, the ability to meet key contributors in this community, and an opportunity to win some funding for their project.
Mozilla will be investing $50,000 into the top applications and projects, with a grand prize of $25,000 for the first-place winner.
Up for the challenge?
For more information, please visit the Responsible AI Challenge website.
Applications open on March 30, 2023.
The post Mozilla Launches Responsible AI Challenge appeared first on Mozilla Hacks - the Web developer blog.
]]>The post Announcing Interop 2023 appeared first on Mozilla Hacks - the Web developer blog.
]]>For Mozilla, interoperability based on standards is an essential element of what makes the web special and sets it apart from other, proprietary, platforms. Therefore it’s no surprise that maintaining this is a key part of our vision for the web.
However, interoperability doesn’t just happen. Even with precise and well-written standards it’s possible for implementations to have bugs or other deviations from the agreed-upon behavior. There is also a tension between the desire to add new features to the platform, and the effort required to go back and fix deficiencies in already shipping features.
Interoperability gaps can result in sites behaving differently across browsers, which generally creates problems for everyone. When site authors notice the difference, they have to spend time and energy working around it. When they don’t, users suffer the consequences. Therefore it’s no surprise that authors consider cross-browser differences to be one of the most significant frustrations when developing sites.
Clearly this is a problem that needs to be addressed at the source. One of the ways we’ve tried to tackle this problem is via web-platform-tests. This is a shared testsuite for the web platform that everyone can contribute to. This is run in the Firefox CI system, as well as those of other vendors. Whenever Gecko engineers implement a new feature, the new tests they write are contributed back upstream so that they’re available to everyone.
Having shared tests allows us to find out where platform implementations are different, and gives implementers a clear target to aim for. However, users’ needs are large, and as a result, the web platform is large. That means that simply trying to fix every known test failure doesn’t work: we need a way to prioritize and ensure that we strike a balance between fixing the most important bugs and shipping the most useful new features.
The Interop project is designed to help with this process, and enable vendors to focus their energies in the way that’s most helpful to the long term health of the web. Starting in 2022, the Interop project is a collaboration between Apple, Bocoup, Google, Igalia, Microsoft and Mozilla (and open to any organization implementing the web platform) to set a public metric to measure improvements to interoperability on the web.
Interop 2022 showed significant improvements in the interoperability of multiple platform features, along with several cross-browser investigations that looked into complex, under-specified, areas of the platform where interoperability has been difficult to achieve. Building on this, we’re pleased to announce Interop 2023, the next iteration of the Interop project.
Like Interop 2022, Interop 2023 considers two kinds of platform improvement:
Focus areas cover parts of the platform where we already have a high quality specification and good test coverage in web-platform-tests. Therefore progress is measured by looking at the pass rate of those tests across implementations. “Active focus areas” are ones that contribute to this year’s scores, whereas “inactive” focus areas are ones from previous years where we don’t anticipate further improvement.
As well as calculating the test pass rate for each browser engine, we’re also computing the “Interop” score: how many tests are passed by all of Gecko, WebKit and Blink. This reflects our goal not just to improve one browser, but to make sure features work reliably across all browsers.
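As a rough illustration of how a metric like this can be computed, here is a small sketch; it is not the actual scoring code behind the dashboard, and the test names are made up.

# Map each engine to the set of web-platform-tests it passes (illustrative data only).
results = {
    "gecko":  {"css/subgrid-001", "css/has-001", "dom/custom-elements-001"},
    "webkit": {"css/subgrid-001", "dom/custom-elements-001"},
    "blink":  {"css/subgrid-001", "css/has-001"},
}
all_tests = set.union(*results.values())

for engine, passed in results.items():
    print(f"{engine}: {len(passed) / len(all_tests):.0%} of tests pass")

# The "Interop" score only counts tests that pass in all three engines.
interop = set.intersection(*results.values())
print(f"Interop: {len(interop) / len(all_tests):.0%} pass everywhere")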
Investigations are for areas where we know interoperability is lacking, but can’t make progress just by passing existing tests. These could include legacy parts of the platform which shipped without a good specification or tests, or areas which are hard to test due to missing test infrastructure. Progress on these investigations is measured according to a set of mutually agreed goals.
The complete list of focus areas can be seen in the Interop 2023 readme. This was the result of a consensus based process, with input from web authors, for example using the results of the State of CSS 2022 survey, and MDN “short surveys”. That process means you can have confidence that all the participants are committed to meaningful improvements this year.
Rather than looking at all the focus areas in detail, I’ll just call out some of the highlights.
Over the past several years CSS has added powerful new layout primitives — flexbox and grid, followed by subgrid — to allow sophisticated, easy to maintain, designs. These are features we’ve been driving & championing for many years, and which we were very pleased to see included in Interop 2022. They have been carried forward into Interop 2023, adding additional tests, reflecting the importance of ensuring that they’re totally dependable across implementations.
As well as older features, Interop 2023 also contains some new additions to CSS. Based on feedback from web developers we know that two of these in particular are widely anticipated: Container Queries and parent selectors via :has(). Both of these features are currently being implemented in Gecko; Container Queries are already available to try in prerelease versions of Firefox and are expected to be released in Firefox 110 later this month, whilst :has() is under active development. We believe that including these new features in Interop 2023 will help ensure that they’re usable cross-browser as soon as they’re shipped.
Several of the features included in Interop 2023 are those that extend and enhance the capability of the platform; either allowing authors to achieve things that were previously impossible, or improving the ergonomics of building web applications.
The Web Components focus area is about ergonomics; components allow people to create and share interactive elements that encapsulate their behavior and integrate into native platform APIs. This is especially important for larger web applications, and success depends on the implementations being rock solid across all browsers.
Offscreen Canvas and Web Codecs are focus areas which are really about extending the capabilities of the platform; allowing rich video and graphics experiences which have previously been difficult to implement efficiently using web technology.
Unlike the other focus areas, Web Compatibility isn’t about a specific feature or specification. Instead the tests in this focus area have been written and selected on the basis of observed site breakage, for example from browser bug reports or via webcompat.com. The fact that these bugs are causing sites to break immediately makes them a very high priority for improving interoperability on the web.
Unfortunately not all interoperability challenges can be simply defined in terms of a set of tests that need to be fixed. In some cases we need to do preliminary work to understand the problem, or to develop new infrastructure that will allow testing.
For 2023 we’re going to concentrate on two areas in which we know that our current test infrastructure is insufficient: mobile platforms and accessibility APIs.
Mobile browsing interaction modes often create web development and interoperability challenges that don’t occur on desktop. For example, the browser viewport is significantly more dynamic and complex on mobile, reflecting the limited screen size. Whilst browser vendors have ways to test their own mobile browsers, we lack shared infrastructure required to run mobile-specific tests in web-platform-tests and include the results in Interop metrics. The Mobile Testing investigation will look at plugging that gap.
Users who make use of assistive technology (e.g., screen readers) depend on parts of the platform that are currently difficult to test in a cross-browser fashion. The Accessibility Testing investigation aims to ensure that accessibility technologies are just as testable as other parts of the web technology stack and can be included in future rounds of Interop as focus areas.
Together these investigations reflect the importance of ensuring that the web works for everyone, irrespective of how they access it.
To follow progress on Interop 2023, see the dashboard on wpt.fyi. This gives detailed scores for each focus area, as well as overall progress on Interop and the investigations.
The Interop project is an important part of Mozilla’s vision for a safe & open web where users are in control, and can use any browser on any device. Working with other vendors to focus efforts towards improving cross-browser interoperability is a big part of making that vision a reality. We also know how important it is to lead through our products, and look forward to bringing these improvements to Firefox and into the hands of users.
The post Announcing Interop 2023 appeared first on Mozilla Hacks - the Web developer blog.
]]>Now that it's 2023 and we're deep into preparations for the next iteration of Interop, it's a good time to reflect on how the first year of Interop has gone.
The post Interop 2022: Outcomes appeared first on Mozilla Hacks - the Web developer blog.
]]>Now that it’s 2023 and we’re deep into preparations for the next iteration of Interop, it’s a good time to reflect on how the first year of Interop has gone.
Happily, Interop 2022 appears to have been a big success. Every browser has made significant improvements to their test pass rates in the Interop focus areas, and now all browsers are scoring over 90%. A particular success can be seen in the Viewport Units focus area, which went from 0% pass rate in all browsers to 100% in all browsers in less than a year. This almost never happens with web platform features!
Looking at the release version of browsers — reflecting what actually ships to users — Firefox started the year with a score of around 60% in Firefox 95 and reached 90% in Firefox 108, which was released in December. This reflects a great deal of effort put into Gecko, both in adding new features and improving the quality of implementation of existing features like CSS containment, which jumped from 85% pass rate to 98% with the improvements that were part of Firefox 103.
One of the big new web-platform features in 2022 was Cascade Layers, which first shipped as part of Firefox 97 in February. This was swiftly followed by implementations shipping in Chrome 99 and Safari 15.4, again showing the power of Interop to rapidly drive a web platform feature from initial implementation to something production-quality and available across browsers.
Another big win that’s worth highlighting was the progress of all browsers to >95% on the “Web Compatibility” focus area. This focus area consisted of a small set of tests from already implemented features where browser differences were known to cause problems for users (e.g. through bug reports to webcompat.com). In an environment where it’s easy to fixate on the new, it’s very pleasing to see everyone come together to clean up these longstanding problems that broke sites in the wild.
Other new features that have shipped, or become interoperable, as part of Interop 2022 have been written about in retrospectives by Apple and Google. There’s a lot of work there to be proud of, and I’d suggest you check out their posts.
Along with the “focus areas” based on counts of passing tests, Interop 2022 had three “investigations”, covering areas where there’s less clarity on what’s required to make the web interoperable, and progress can’t be characterized by a test pass rate.
The Viewport investigation resulted in multiple spec bugs being filed, as well as agreement with the CSSWG to start work on a Viewport Specification. We know that viewport-related differences are a common source of pain, particularly on mobile browsers; so this is very promising for future improvements in this area.
The Mouse and Pointer Events investigation collated a large number of browser differences in the handling of input events. A subset of these issues got tests and formed the basis for a proposed Interop 2023 focus area. There is clearly still more to be done to fix other input-related differences between implementations.
The Editing investigation tackled one of the most historically tricky areas of the platform, where it has long been assumed that complex tasks require the use of libraries that smooth over differences with bespoke handling of each browser engine. One thing that became apparent from this investigation is that IME input (used to input characters that can’t be directly typed on the keyboard) has behavioral differences for which we lack the infrastructure to write automated cross-browser tests. This Interop investigation looks set to catalyze future work in this area.
All the signs are that Interop 2022 was helpful in aligning implementations of the web and ensuring that users are able to retain a free choice of browser without running into compatibility problems. We plan to build on that success with the forthcoming launch of Interop 2023, which we hope will further push the state of the art for web developers and help web browser developers focus on the most important issues to ensure the future of a healthy open web.
The post Interop 2022: Outcomes appeared first on Mozilla Hacks - the Web developer blog.
]]>The post How the Mozilla Community helps shape our products appeared first on Mozilla Hacks - the Web developer blog.
]]>What do all these stages have in common?
Here at Mozilla, our awesome community is there every step of the way to support and contribute to our products. None of what we do would be possible without this multicultural, multilingual community of like-minded people working together to be a better internet.
Of course, contributions to our products are not everything that the community does. There is much more that our community creates, contributes, and discusses.
However, as a major release recently happened, we want to take the occasion to celebrate our community by giving you a peek at how their great contributions helped with version 106 (as well as all versions!) of Firefox.
Ideas for new features and products come from many different sources: research, data, internal ideas, and feature requests during Foxfooding. At Mozilla, one of those sources is Mozilla Connect.
Mozilla Connect is a collaborative space for ideas, feedback, and discussions that help shape future product releases. Anyone can propose and vote for new ideas. The ideas that gain more support are brought to the appropriate team for review.
Firefox’s Picture-in-Picture subtitles were a feature requested by the Mozilla Connect community!
Connect is also a place where Mozilla Product Managers ask for feedback from the community when thinking about ways to improve our product, and where the community can interact directly with Product Managers and engineers.
In this way, the community contributes to continuous product improvement and introduces diverse perspectives and experiences to our product cycle.
Connect played a role in the latest Firefox Major release on both sides of the ideation cycle.
Are you enjoying the new PDF editor's functionality? Then you should know that the community discussed this idea in Connect. After many upvotes, the idea was officially brought to the product team.
After the release, the community joined discussions with Firefox Product Managers to give feedback and new suggestions on the new features.
Interested? Get started here.
Mozilla developers work side by side with the Community.
Community members find and help solve product bugs, and contribute to the development of different features.
The community is fundamental to the development of Firefox, as community members routinely add their code contributions to the Nightly version of Firefox!
You can check out how staff members and contributors work together to solve issues in the Nightly version of Firefox.
Interested? Check out how you can submit your first code contribution. You can also discover more about Nightly here.
There are many ways in which the Community helps find and report bugs. One of these is a Foxfooding campaign.
Because we have yet to meet a Mozillian who does not enjoy a good (and… less good) pun, Foxfooding is the Firefox version of dogfooding.
This is where we make a feature or a product available to our community (and staff) before it is released to the public. Then we ask them to use it, test it, and submit bugs, product feedback, and feature requests.
This is an incredibly valuable process, as it ensures that the product is tested by a very diverse (and enthusiastic) group of people, bringing unexpected feedback and testing in far more diverse conditions than we could manage internally.
Plus it is, you know, fun ;)
We ran a Foxfooding campaign for the last Major Release too! And community members from all over the world submitted more than 60 bugs.
Foxfooding campaigns are published here. You can subscribe to our Community Newsletter to be notified when one is starting.
Furthermore, community members find, report, and help solve Firefox Nightly bugs, as well as bugs that appear in other Firefox versions.
Finding and reporting bugs is a great contribution, helping to continuously improve Mozilla Products.
In fact, simply using Firefox Nightly (or Beta) is an easy way to contribute to the Mozilla project. Just using Nightly sends anonymous usage data and crash reports that help us discover issues before we ship to the general public.
Currently, Firefox is localized in 98 languages (110 in the Nightly version) and that is entirely thanks to the effort of a determined international community.
Localization is important because we are committed to a Web that is open and accessible to all, where a person's demographic characteristics do not determine their online access, opportunities, or quality of experience.
The Mozilla Localization effort represents a commitment to advancing these aspirations: localizers work together with people everywhere who share the goal of making the internet an even better place for everyone.
The community worked really hard on the global launch for the Major Release! Thank you to all the localizers who took part in this global launch: more than 274 folks worked on it, contributing approximately 67,094 translations!
Once a product is out into the world, the work is far from done! There are always bugs that need reporting, users who need troubleshooting help, and new features that need explanation…
At Mozilla, the Mozilla Support (a.k.a. SUMO) community is the one supporting users all over the world: answering support questions through the forum, social media, and mobile app stores, creating helpdesk articles, and localizing those articles.
When it's done right, providing high-quality support contributes to our users' loyalty and retention. Plus, it can help improve the product: by bringing the data back to the product team, we establish a feedback loop that feeds into product improvements as well.
The SUMO community is actively helping users during the Major release. Up until now:
And they are still going strong!
Would you also like to contribute? Our products are one of the ways in which we shape the web and protect the privacy of our users. Getting involved is a great way to contribute to the mission and get in touch with like-minded people.
Please check our /contribute page for more information, subscribe to our Community Newsletter, or join our #communityroom in Matrix.
The post How the Mozilla Community helps shape our products appeared first on Mozilla Hacks - the Web developer blog.
The post Improving Firefox stability with this one weird trick appeared first on Mozilla Hacks - the Web developer blog.
As such, at Mozilla, we spend significant resources trimming down Firefox memory consumption and carefully monitoring the changes. Extra effort has been spent on the Windows platform because Firefox was more likely to run out of memory there than on macOS or Linux. And yet none of those efforts had the impact of a cool trick we deployed in Firefox 105.
But first things first: to understand why applications running on Windows are more prone to running out of memory than on other operating systems, it's important to understand how Windows handles memory.
All modern operating systems allow applications to allocate chunks of the address space. Initially these chunks only represent address ranges; they aren't backed by physical memory until data is stored in them. When an application starts using a bit of address space it has reserved, the OS will dedicate a chunk of physical memory to back it, possibly swapping out some existing data if need be. Both Linux and macOS work this way, and so does Windows, except that it requires an extra step compared to the other OSes.
After an application has requested a chunk of address space, it needs to commit it before being able to use it. Committing a range requires Windows to guarantee that it can always find some physical memory to back it. Afterwards, it behaves just like Linux and macOS. As such, Windows limits how much memory can be committed to the sum of the machine's physical memory plus the size of the swap file.
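For illustration, here's a minimal sketch of that two-step dance using the Win32 VirtualAlloc API (an illustrative example, not code taken from Firefox):

```cpp
#include <windows.h>

int main() {
  const SIZE_T size = 64 * 1024 * 1024;  // 64 MiB

  // Step 1: reserve address space only. This consumes neither physical
  // memory nor commit space.
  void* region = VirtualAlloc(nullptr, size, MEM_RESERVE, PAGE_NOACCESS);

  // Step 2: commit the range. Windows now guarantees backing store for it,
  // so the full 64 MiB counts against the commit limit even before any
  // page is touched.
  VirtualAlloc(region, size, MEM_COMMIT, PAGE_READWRITE);

  // Physical memory is only assigned once pages are actually used.
  static_cast<char*>(region)[0] = 1;

  VirtualFree(region, 0, MEM_RELEASE);
  return 0;
}
```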
This resource – known as commit space – is a hard limit for applications. Memory allocations will start to fail once the limit is reached. In operating-system parlance this means that Windows does not allow applications to overcommit memory.
One interesting aspect of this system is that an application can commit memory that it won’t use. The committed amount will still count against the limit even if no data is stored in the corresponding areas and thus no physical memory has been used to back the committed region. When we started analyzing out of memory crashes we discovered that many users still had plenty of physical memory available – sometimes gigabytes of it – but were running out of commit space instead.
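Windows exposes both numbers separately, so the distinction is easy to observe; here's a small illustrative sketch (not Firefox code) using GlobalMemoryStatusEx:

```cpp
#include <windows.h>
#include <cstdio>

int main() {
  MEMORYSTATUSEX status = {};
  status.dwLength = sizeof(status);
  GlobalMemoryStatusEx(&status);

  // ullAvailPhys is the free physical RAM; ullAvailPageFile is the
  // remaining commit space (RAM + swap file minus what is already committed).
  printf("Available physical memory: %llu MiB\n",
         static_cast<unsigned long long>(status.ullAvailPhys) / (1024 * 1024));
  printf("Available commit space:    %llu MiB\n",
         static_cast<unsigned long long>(status.ullAvailPageFile) / (1024 * 1024));
  return 0;
}
```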
Why was that happening? We don't know for sure, but we made some educated guesses. Firefox tracks all the memory it uses, and we could account for all the memory that we committed directly.
However, we have no control over Windows system libraries and in particular graphics drivers. One thing we noticed is that graphics drivers commit memory to make room for textures in system memory. This allows them to swap textures out of GPU memory if there isn't enough and keep them in system memory instead, a mechanism similar to how regular memory can be swapped out to disk when there is not enough RAM available. In practice, this rarely happens, but these areas still count against the commit limit.
We had no way of fixing this issue directly, but we still had an ace up our sleeve: when an application runs out of memory on Windows it isn't outright killed by the OS; its allocation simply fails, and it can then decide by itself what to do next.
In some cases, Firefox could handle the failed allocation, but in most cases, there is no sensible or safe way to handle the error and it would need to crash in a controlled way… but what if we could recover from this situation instead? Windows automatically resizes the swap file when it’s almost full, increasing the amount of commit space available. Could we use this to our advantage?
It turns out that the answer is yes, we can. So we adjusted Firefox to wait for a bit instead of crashing and then retry the failed memory allocation. This leads to a bit of jank as the browser can be stuck for a fraction of a second, but it’s a lot better than crashing.
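Conceptually, the change looks something like the following minimal sketch (the retry count and delay here are made-up values for illustration; this is not Firefox's actual allocator code):

```cpp
#include <windows.h>
#include <cstdlib>

// Hypothetical wrapper: retry a failed allocation a few times, giving
// Windows a chance to grow the swap file and free up commit space.
void* AllocateWithRetry(size_t aSize) {
  for (int attempt = 0; attempt < 10; ++attempt) {
    if (void* ptr = malloc(aSize)) {
      return ptr;
    }
    // Stall briefly; this is the "jank" mentioned above, but it beats crashing.
    Sleep(100);  // milliseconds
  }
  return nullptr;  // still out of memory: fall back to a controlled crash
}
```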
There’s also another angle to this: Firefox is made up of several processes and can survive losing all of them but the main one. Delaying a main process crash might lead to another process dying if memory is tight. This is good because it would free up memory and let us resume execution, for example by getting rid of a web page with runaway memory consumption.
If a content process died, we would need to reload it; if it was the GPU process instead, the browser would briefly flash while we relaunched it. Either way, the result is less disruptive than a full browser crash. We used a similar trick in Firefox for Android, and in Firefox OS before that, and it worked well on both platforms.
This little trick shipped in Firefox 105 and had an enormous impact on Firefox stability on Windows. The chart below shows how many out-of-memory browser crashes were experienced by users per active usage hour:
You’re looking at a >70% reduction in crashes, far more than our rosiest predictions.
And we're not done yet! Stalling the main process led to a small increase in tab crashes – which are also unpleasant for the user, even if not nearly as annoying as a full browser crash – so we're cutting those down too.
Last but not least, we want to improve Firefox behavior in low-memory scenarios by responding differently to cases where we're low on commit space and cases where we're low on physical memory. This will reduce swapping and help shrink Firefox's footprint to make room for other applications.
I’d like to send special thanks to my colleague Raymond Kraesig who implemented this “trick”, carefully monitored its impact and is working on the aforementioned improvements.
The post Improving Firefox stability with this one weird trick appeared first on Mozilla Hacks - the Web developer blog.
The post Revamp of MDN Web Docs Contribution Docs appeared first on Mozilla Hacks - the Web developer blog.
The contribution docs are an essential resource that helps authors navigate the MDN project. Both the community and the partner and internal teams reference them regularly whenever we want to cross-check our policies or how-tos. Therefore, it was becoming important that we spruce up these pages to keep them relevant and up to date.
This article describes the updates we made to the “Contribution Docs”.
To begin with, we grouped and reorganized the content into two distinct buckets – Community guidelines and Writing guidelines. This is what the project outline looks like now:
Next, we shuffled the information around a bit so that logically similar pieces sit together. We also collated information that was scattered across multiple pages into more logical chunks.
For example, the Writing style guide now also includes information about “Write with SEO in mind”, which was earlier a separate page elsewhere.
We also restructured some documents, such as the Writing style guide. This document is now divided into the sections “General writing guidelines”, “Writing style”, and “Page components”. In the previous version of the style guide, everything was grouped under “Basics”.
In general, we reviewed and removed outdated as well as repeated content. The cleanup effort also involved doing the following:
As a result of the cleanup effort, the new “Contributor Docs” structure looks like this:
The list below will give you an idea of the new home for some of the content in the previous version:
The Contribution Docs are working documents — they are reviewed and edited regularly to keep them up to date with editorial and community policies. Giving them a good spring cleaning makes maintenance easier for us and our partners.
The post Revamp of MDN Web Docs Contribution Docs appeared first on Mozilla Hacks - the Web developer blog.
The post Improving Firefox responsiveness on macOS appeared first on Mozilla Hacks - the Web developer blog.
Firefox uses a highly customized version of the jemalloc memory allocator across all architectures. We've diverged significantly from upstream jemalloc in order to guarantee optimal performance and memory usage in Firefox.
Memory allocators have to be thread safe and – in order to be performant – need to be able to serve a large number of concurrent requests from different threads. To achieve this, jemalloc uses locks within its internal structures that are usually only held very briefly.
Locking within the allocator is implemented differently than in the rest of the codebase. Specifically, creating mutexes and using them must not issue new memory allocations because that would lead to infinite recursion within the allocator itself. To achieve this the allocator tends to use thin locks native to the underlying operating system. On macOS we relied for a long time on OSSpinLock locks.
As the name suggests these are not regular mutexes that put threads trying to acquire them to sleep if they’re already taken by another thread. A thread attempting to lock an already locked instance of OSSpinLock will busy-poll the lock instead of waiting for it to be released, which is commonly referred to as spinning on the lock.
This might seem counter-intuitive, as spinning consumes CPU cycles and power and is usually frowned upon in modern codebases. However, putting a thread to sleep has significant performance implications and thus is not always the best option.
In particular, putting a thread to sleep and then waking it up requires two context switches as well as saving and restoring the thread state to/from memory. Depending on the CPU and workload the thread state can range from several hundred bytes to a few kilobytes. Putting a thread to sleep also has indirect performance effects.
For example, the caches associated with the core the thread was running on were likely holding useful data. When a thread is put to sleep another thread from an unrelated workload might then be selected to run in its place, replacing the data in the caches with new data.
When the original thread is restored it might end up on a different core, or on the same core but with cold caches, filled with unrelated data. Either way, the thread will proceed execution more slowly than if it had kept running undisturbed.
Because of all the above, it might be advantageous to let a thread spin briefly if the lock it's trying to acquire is only held for a short period of time. This can result in both higher performance and lower power consumption, as the cost of spinning is lower than that of sleeping and waking up.
However spinning has a significant drawback: if it goes on for too long it can be detrimental, as it just wastes cycles. Worse still, if the machine is heavily loaded, spinning might put additional load on the system, potentially slowing down precisely the thread that owns the lock and increasing the chance that further threads will need the lock and spin some more.
As you might have guessed by now, OSSpinLock offered very good performance on a lightly loaded system but behaved poorly as load ramped up. More importantly, it had two fundamental flaws: it spun in user-space and it never slept.
Spinning in user-space is a bad idea in general, as user-space doesn’t know how much load the system is currently experiencing. In kernel-space a lock might make an informed decision, for example not to spin at all if the load is high, but OSSpinLock had no such provision, nor did it adapt.
But more importantly, when it couldn’t really grab a lock it would yield instead of sleeping. This is particularly bad because the kernel has no clue that the yielding thread is waiting on a lock, so it might wake up another thread that is also fighting for the same lock instead of the one that owns it.
This leads to more spinning and yielding, and the resulting user experience is terrible. On heavily loaded systems this could lead to a near live-lock and Firefox effectively hanging. This problem with OSSpinLock was known within Apple, hence its deprecation.
Enter os_unfair_lock, Apple's official replacement for OSSpinLock. If you still use OSSpinLock, you'll get explicit deprecation warnings telling you to use os_unfair_lock instead.
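For reference, the documented os_unfair_lock API is simple to adopt; a minimal usage sketch looks like this:

```cpp
#include <os/lock.h>

// A statically initialized unfair lock.
static os_unfair_lock gLock = OS_UNFAIR_LOCK_INIT;

void DoProtectedWork() {
  os_unfair_lock_lock(&gLock);  // puts the thread to sleep on contention
  // ... critical section ...
  os_unfair_lock_unlock(&gLock);
}
```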
So I went ahead and used it, but the results were terrible. Performance in some of our automated tests degraded by as much as 30%. os_unfair_lock might be better behaved than OSSpinLock, but it sucked.
As it turns out, os_unfair_lock doesn't spin on contention; it puts the calling thread to sleep right away when it finds a contended lock.
For the memory allocator this behavior was suboptimal and the performance regression unacceptable. In some ways, os_unfair_lock had the opposite problem of OSSpinLock: it was too willing to sleep when spinning would have been a better choice. While we're at it, it's worth mentioning that pthread_mutex locks are even slower on macOS, so those weren't an option either.
However, as I dug into Apple's libraries and kernel, I noticed that some spin locks were indeed available, and they did the spinning in kernel-space, where they could make a more informed choice with regard to load and scheduling. Those would have been an excellent choice for our use-case.
So how do you use them? Well, it turns out they’re not documented. They rely on a non-public function and flags which I had to duplicate in Firefox.
The function is os_unfair_lock_with_options() and the options I used are OS_UNFAIR_LOCK_DATA_SYNCHRONIZATION and OS_UNFAIR_LOCK_ADAPTIVE_SPIN.
The latter asks the kernel to use kernel-space adaptive spinning, and the former prevents it from spawning additional threads in the thread pools used by Apple’s libraries.
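Because the API is non-public, the declarations have to be duplicated by hand to compile against it. In the sketch below, the symbol name (often reported as os_unfair_lock_lock_with_options), its signature, and the numeric flag values are all assumptions reconstructed from public discussion of this undocumented interface, so treat it purely as an illustration:

```cpp
#include <os/lock.h>
#include <cstdint>

// Assumed declaration of the non-public entry point; undocumented by Apple
// and potentially subject to change between macOS releases.
extern "C" void os_unfair_lock_lock_with_options(os_unfair_lock_t aLock,
                                                 uint32_t aOptions);

// Assumed values for OS_UNFAIR_LOCK_DATA_SYNCHRONIZATION and
// OS_UNFAIR_LOCK_ADAPTIVE_SPIN.
constexpr uint32_t kDataSynchronization = 0x00010000;
constexpr uint32_t kAdaptiveSpin = 0x00040000;

static os_unfair_lock gAllocLock = OS_UNFAIR_LOCK_INIT;

void LockAllocatorMutex() {
  // Kernel-mediated adaptive spinning, without growing Apple's thread pools.
  os_unfair_lock_lock_with_options(&gAllocLock,
                                   kDataSynchronization | kAdaptiveSpin);
}
```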
Did they work? Yes! Performance on lightly loaded systems was about the same as with OSSpinLock, but on loaded ones they provided massively better responsiveness. They also did something extremely useful for laptop users: they cut down power consumption, as far fewer cycles were wasted by CPUs spinning on locks that couldn't be acquired.
Unfortunately, my woes weren't over. The OS_UNFAIR_LOCK_ADAPTIVE_SPIN flag is supported only starting with macOS 10.15, but Firefox also runs on older versions (all the way back to 10.12).
As an intermediate solution, I initially fell back to OSSpinLock on older systems. Later I managed to get rid of it for good by relying on os_unfair_lock plus manual spinning in user-space.
This isn't ideal, but it's still better than relying on OSSpinLock, especially because it's needed only on x86-64 processors, where I can use pause instructions in the loop, which should reduce the performance and power impact when a lock can't be acquired.
When two threads are running on the same physical core, one using pause instructions leaves almost all of the core’s resources available to the other thread. In the unfortunate case of two threads spinning on the same core they’ll still consume very little power.
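A minimal sketch of that fallback path might look like the following (the iteration count is an arbitrary placeholder, and this is not Firefox's actual implementation):

```cpp
#include <os/lock.h>
#include <immintrin.h>  // _mm_pause(); x86-64 only

// Spin briefly in user-space with pause instructions, then fall back to
// sleeping inside os_unfair_lock if the lock is still contended.
static void LockWithBriefSpin(os_unfair_lock* aLock) {
  for (int i = 0; i < 50; ++i) {
    if (os_unfair_lock_trylock(aLock)) {
      return;
    }
    // Keeps most of the physical core's resources available to the
    // sibling hyper-thread while we wait.
    _mm_pause();
  }
  // Give up spinning and let the kernel put this thread to sleep.
  os_unfair_lock_lock(aLock);
}
```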
At this point, you might wonder if os_unfair_lock – possibly coupled with the undocumented flags – would be a good fit for your codebase. My answer is likely yes but you’ll have to be careful when using it.
If you're using the undocumented flags, be sure to routinely test your software on new beta versions of macOS, as they might break in future releases. And even if you're only using the public os_unfair_lock interface, beware that it doesn't play well with fork(). That's because the lock internally stores Mach thread IDs to ensure consistent acquisition and release.
These IDs change after a call to fork(), because the copied threads in the child process are new threads with new IDs. This can lead to crashes in the child process. If your application uses fork(), or your library needs to be fork()-safe, you'll need to register at-fork handlers using pthread_atfork() to acquire all the locks in the parent before the fork, then release them after the fork (also in the parent), and reset them in the child.
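As a minimal sketch of that at-fork discipline (using a single hypothetical lock rather than Firefox's real allocator locks):

```cpp
#include <os/lock.h>
#include <pthread.h>

// Hypothetical lock that must stay consistent across fork().
static os_unfair_lock gLock = OS_UNFAIR_LOCK_INIT;

static void PrepareFork() { os_unfair_lock_lock(&gLock); }       // parent, before fork
static void ParentAfterFork() { os_unfair_lock_unlock(&gLock); }  // parent, after fork
static void ChildAfterFork() {
  // Don't unlock in the child: the stored thread ID is stale. Reset instead.
  gLock = OS_UNFAIR_LOCK_INIT;
}

void InstallForkHandlers() {
  pthread_atfork(PrepareFork, ParentAfterFork, ChildAfterFork);
}
```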
Here’s how we do it in our code.
The post Improving Firefox responsiveness on macOS appeared first on Mozilla Hacks - the Web developer blog.