An update on MDN Web Docs’ localization strategy

In our previous post — MDN Web Docs evolves! Lowdown on the upcoming new platform — we talked about many aspects of the new MDN Web Docs platform that we’re launching on December 14th. In this post, we’ll look at one aspect in more detail — how we are handling localization going forward. We’ll talk about how our thinking has changed since our previous post, and detail our updated course of action.

Updated course of action

Based on thoughtful feedback from the community, we did some additional investigation and determined a stronger, clearer path forward.

First of all, we want to keep a clear focus on work leading up to the launch of our new platform, and making sure the overall system works smoothly. This means that upon launch, we still plan to display translations in all existing locales, but they will all initially be frozen — read-only, not editable.

We had been considering automated translations as the main way forward. One key issue was that while automated translations into European languages are seen as an acceptable solution, automated translations into CJK languages are far from ideal: these languages have a very different structure from English and other European languages. In addition, many Europeans can read English well enough to fall back on English documentation when required, whereas some CJK communities do not commonly read English and so do not have that luxury.

Many folks we talked to said that automated translations wouldn’t be acceptable in their languages. Not only would they be substandard, but a lot of MDN Web Docs communities center around translating documents. If manual translations went away, those vibrant and highly involved communities would probably go away — something we certainly want to avoid!

We are therefore focusing on limited manual translations as our main way forward instead, looking to unfreeze a number of key locales as soon as possible after the new platform launch.

Limited manual translations

Rigorous testing has been done, and it looks like building translated content as part of the main build process is doable. We are separating locales into two tiers in order to determine which will be unfrozen and which will remain locked.

  • Tier 1 locales will be unfrozen and manually editable via pull requests. These locales are required to have at least one representative who will act as a community lead. The community members will be responsible for monitoring the localized pages, updating translations of key content once the English versions are changed, reviewing edits, etc. The community lead will additionally be in charge of making decisions related to that locale, and acting as a point of contact between the community and the MDN staff team.
  • Tier 2 locales will be frozen, and not accept pull requests, because they have no community to maintain them.

The Tier 1 locales we will unfreeze first are:

  • Simplified Chinese (zh-CN)
  • Traditional Chinese (zh-TW)
  • French (fr)
  • Japanese (ja)

If you wish for a Tier 2 locale to be unfrozen, you need to come to us with a proposal, including evidence of an active team willing to take responsibility for the work associated with that locale. If the proposal holds up, we can promote the locale to Tier 1, and you can start work.

We will monitor the activity on the Tier 1 locales. If a Tier 1 locale is not being maintained by its community, we shall demote it to Tier 2 after a certain period of time, and it will become frozen again.

We are looking at this new system as a reasonable compromise: it provides a path for you, the community, to continue work on MDN translations wherever the interest is there, while also ensuring that locale maintenance is viable and content won’t get any further out of date. With most locales unmaintained, changes weren’t being reviewed effectively, and readers of those locales were often unsure whether to use their preferred locale or English, their experience suffering as a result.

Review process

The review process will be quite simple.

  • The content for each Tier 1 locale will be kept in its own separate repo.
  • When a PR is made against that repo, the localization community will be pinged for a review.
  • When the content has been reviewed, an MDN admin will be pinged to merge the change. We should be able to set up the system so that this happens automatically.
  • There will also be user-submitted content bugs filed at https://github.com/mdn/sprints/issues, as well as on the issue trackers for each locale repo. Once triaged, the “sprints” issues will be assigned to the relevant localization team to fix; each localization team is responsible for triaging and resolving issues filed on its own repo.
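As a sketch of how the review ping could be wired up (the team name below is an assumption for illustration, not a confirmed MDN team), each locale repo can include a CODEOWNERS file so that GitHub automatically requests a review from the locale’s team on every pull request:

```
# CODEOWNERS: hypothetical example for the French locale repo.
# GitHub automatically requests a review from this team on any PR
# that touches any file in the repository.
*    @mdn/yari-content-fr
```

Combined with a branch protection rule requiring an approving review, merges can then happen automatically once the localization team signs off, matching the workflow described above.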

Machine translations alongside manual translations

We previously talked about the potential involvement of machine translations to enhance the new localization process. We still have this in mind, but we are looking to keep the initial system simple, in order to make it achievable. The next step in Q1 2021 will be to start looking into how we could most effectively make use of machine translations. We’ll give you another update in mid-Q1, once we’ve made more progress.

About Chris Mills

Chris Mills is a senior tech writer at Mozilla, where he writes docs and demos about open web apps, HTML/CSS/JavaScript, A11y, WebAssembly, and more. He loves tinkering around with web technologies, and gives occasional tech talks at conferences and universities. He used to work for Opera and W3C, and enjoys playing heavy metal drums and drinking good beer. He lives near Manchester, UK, with his good lady and three beautiful children.

More articles by Chris Mills…


8 comments

  1. Janet Swisher

    Glad to hear that you’ve found a way to support active localization communities, while minimizing the maintenance burden of inactive locales.

    December 8th, 2020 at 10:22

    1. Chris Mills

      Thanks Janet, lovely to hear from you!

      December 9th, 2020 at 08:18

  2. Eric Shepherd

    While I’m thrilled that a solution has been found to provide good support for localization, I’m still disappointed by the loss of the user-friendly aspects of the current contribution workflow. I definitely agree that having a contribute-review-publish model is key to improving the overall quality of MDN content and the communication among the contributors and staff.

    But as most people who’ve spent a long stretch of time working with me know, I’m a huge believer in the WYSIWYG editing approach for documentation content. I’m frustrated to see MDN throw away this key feature of its current design. Obviously, while I was still on staff, I was frequently vocal about my concerns on this. I was also very surprised to find out how many people prefer to edit the HTML by hand over using a nice WYSIWYG editor.

    Perhaps the problem was the quirks and flaws in the existing editor experience, many of which are caused by the HTML spec’s poor support for editing at this time. Regardless, once you learned about these problems, you could usually avoid them easily enough.

    It’s probably for the best that I’m not on staff anymore (even though I totally miss it). Going to an edit-the-source approach would have at least doubled how long it took me to get anything done, and would have driven me absolutely insane to boot. I’d have hated it. And I can say that any chances of my continued contributions — even small ones — have, sadly, died entirely. ☹️

    December 8th, 2020 at 10:52

    1. Chris Mills

      Lovely to hear from you Sheppy, and we miss you too! You’ve made your feelings on this subject abundantly clear throughout ;-)

      It is worth noting that we are intending to add to the Yari toolset as we move forward after the launch, based on feedback that comes in. It is also worth noting that having the workflow based on GitHub makes it easy to plug it into whatever workflow/toolset you want. I’m sure there are more folks out there than just you who value the WYSIWYG approach, and that a solution will emerge for such contributors.

      December 9th, 2020 at 08:21

  3. Daniele Mte90 Scasciafratte

    Honestly, I am against automatic localization even for European languages, in my case Italian.

    In our experience (also as developers), this is very bad because it means technical terms get localized (not every language has an equivalent for a term, and many simply use the English version). At the same time, terms that are localized one way inside Firefox or Chrome, for example, can end up with different terms in the documentation (for the same feature), with all the confusion this can cause for users.

    This way, volunteers will spend their time fixing the automatic localization. The WordPress community had this in the past, and it was removed after a few months because of all these issues; the cleanup took years, and some of those issues still linger.

    These are the same reasons big companies don’t automatically localize content without humans involved: such errors create an issue for the project’s brand.

    December 9th, 2020 at 05:36

    1. Chris Mills

      Hi Daniele!

      In an ideal world, we wouldn’t use automated translations, and I appreciate that they are not completely ideal. But the suggested solution is a compromise between allowing active locales to keep being manually maintained where there is interest in doing so, and not keeping around too many outdated, unmaintained translations. We don’t have the resources to properly maintain all these locales without the help of community teams.

      We are intending to use a glossary of terms (maintained inside the Firefox team) along with the automated translation system, to ensure that technical terms are either not translated at all or translated consistently, as appropriate, with the aim of reducing such confusion.

      December 9th, 2020 at 08:17

  4. Jean-Baptiste

    If your machine translation mechanism is generic, it could be used by everyone, and therefore receive contributions from everyone.

    Let’s say the content follows these steps:

    1. Human-managed English repository
    2. Machine-managed translation template repository
    3. Human-managed translation repository (via a translation platform)
    4. Machine-completed translation repository
    5. Publication repository

    The translation machine would only intervene between steps 3 and 4; the human would go through the “normal” translation process to correct what is not right. This correction process would therefore be generic (what would be specific is the way to easily access the strings to be translated).

    As a result, the translation machine is generic, everyone can add content to it, or test alternative engines depending on the language.

    In existing translation memories, we have for example what the amagama project is doing: https://github.com/translate/amagama-updater/

    or what I do with Fedora content https://jibecfed.fedorapeople.org/partage/fedora-localization-statistics/f32/language/fr/

    or the different Weblate around the world: https://weblate.org/fr/news/archive/weblate-even-more-open-now/

    or the Ubuntu langpacks: https://launchpad.net/~ubuntu-langpack/+maintained-packages

    I can help in this work, with the hope that:
    * any free project can query the translation engine for its own needs (although prior authorization might be required)
    * every language has a right to exist, even if Mozilla will focus on its own shortlist.

    Realistic scenario:

    * could we fill in the gaps in the Fedora translation, for example?
    * could we fill in the gaps in the LibreOffice translation?
    * etc…

    December 12th, 2020 at 03:41

  5. Harvey Liu

    Haven’t visited MDN for some time and really surprised that the whole website shifted to GitHub. (I planned to comment on the Oct. article but comments there were closed.)
    Like many others already suggested, a WYSIWYG editor is way better than the current solution where everyone sends PRs on GitHub. Other sites like Wikipedia have better management of user edits.
    Personally, I find the edit history important as well. Through it I can see how a technology evolved and how it links to other technologies in history.
    The activity feed of user profiles also disappeared along with the “decouple”, and now a profile is nothing more than a description. I used to imagine the MDN community growing into a forum like StackOverflow.

    December 18th, 2020 at 13:54

Comments are closed for this article.