In our previous post, An update on MDN Web Docs’ localization strategy, we explained our broad strategy for moving forward with allowing translation edits on MDN again. The MDN localization communities are waiting for news of our progress on unfreezing the top-tier locales, and here we are. In this post we’ll look at where we’ve got to so far in 2021, and what you can expect moving forward.
Normalizing slugs between locales
Previously on MDN, we allowed translators to localize document URL slugs as well as the document title and body content. This sounds good in principle, but has created a bunch of problems. It has resulted in situations where it is very difficult to keep document structures consistent.
If you want to change the structure or location of a set of documentation, it can be nearly impossible to verify that you’ve moved all of the localized versions along with the en-US versions. Some of them will be under differently-named slugs both in the original and new locations, meaning that you’d have to spend time tracking them down, and time creating new parent pages with the correct slugs, etc.
As a knock-on effect, this has also resulted in a number of localized pages being orphaned (not being attached to any parent en-US pages), and a number of en-US pages being translated more than once (e.g. localized once under the existing en-US slug, and then again under a localized slug).
For example, the following table shows the top-level directories in the en-US locale as of Feb 1, 2021, compared to that of the
To make the non-en-US locales consistent and manageable, we are going to move to having en-US slugs only: all localized pages will be moved under their equivalent location in the en-US tree. In cases where that location cannot be reliably determined (e.g. where the documents are orphans or duplicates), we will put those documents into a specific storage directory, give them an appropriate prefix, and ask the maintenance communities for each unfrozen locale to sort out what to do with them.
- Every localized document will be kept in a separate repo to the en-US content, but will have a corresponding en-US document with the same slug (folder path).
- At first this will be enforced during deployment: we will move all the localized documents so that their locations are synchronized with their en-US equivalents. Every document that does not have a corresponding en-US document will be prefixed with orphaned during deployment. We plan to further automate this to check whenever a PR is created against the repo. We will also funnel back changes from the main en-US content repo, i.e. if an en-US page is moved, the localized equivalents will be automatically moved too.
- All locales will be migrated. Unfortunately, some documents will be marked as orphaned and some others will be marked as conflicting (by adding the prefix conflicting to their slug). Conflicting documents have a corresponding en-US document with multiple translations in the same locale.
- We plan to delete, archive, or move out orphaned/conflicting content.
- Nothing will be lost since everything is in a git repo (even if something is deleted, it can still be recovered from the git history).
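As a rough illustration of the rules in the list above, the following Python sketch plans where each localized document would end up. It is not the actual migration tooling. The function name plan_moves, the input shapes, the "prefix/slug" path format, and the rule for choosing which of several conflicting translations keeps the en-US slug are all assumptions made for this example.

```python
from collections import defaultdict

# Prefix names come from the post; the "prefix/slug" layout is an assumption.
ORPHANED = "orphaned"
CONFLICTING = "conflicting"


def plan_moves(localized_docs, en_us_slugs):
    """Return a mapping of current localized slug -> new slug.

    localized_docs: dict of localized slug -> en-US slug it translates
                    (None when the source page is unknown).
    en_us_slugs:    set of slugs that exist in the en-US tree.
    """
    # Group the localized documents by the en-US page they translate.
    by_target = defaultdict(list)
    for local_slug, target in localized_docs.items():
        by_target[target].append(local_slug)

    moves = {}
    for target, local_slugs in by_target.items():
        if target not in en_us_slugs:
            # No corresponding en-US document: mark as orphaned.
            for slug in local_slugs:
                moves[slug] = f"{ORPHANED}/{slug}"
            continue

        # Assumption: a translation already at the en-US slug wins,
        # otherwise the alphabetically first one is kept.
        keep = target if target in local_slugs else sorted(local_slugs)[0]
        for slug in local_slugs:
            if slug == keep:
                moves[slug] = target
            else:
                # Extra translations of the same en-US page.
                moves[slug] = f"{CONFLICTING}/{slug}"
    return moves
```

Calling plan_moves with a dictionary of one locale's documents and the set of en-US slugs returns the planned destination for every document, which a community could then review before anything is moved.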
Processes for identifying unmaintained content
The other problem we have been wrestling with is how to identify what localized content is worth keeping, and what isn’t. Since many locales have been largely unmaintained for a long time, they contain a lot of content that is very out-of-date and getting further out-of-date as time goes on. Many of these documents are either not relevant any more at all, incomplete, or simply too much work to bring up to date (it would be better to just start from nothing).
It would be better for everyone involved to just delete this unmaintained content, so we can concentrate on higher-value content.
The criteria we have identified so far to indicate unmaintained content are as follows:
- Pages that should have compat tables, which are missing them.
- Pages that should have interactive examples and/or embedded examples, which are missing them.
- Pages that should have a sidebar, but don’t.
- Pages where the KumaScript is breaking so much that it’s not really renderable in a usable way.
These criteria are largely measurable; we ran some scripts on the translated pages to calculate which ones could be marked as unmaintained (they match one or more of the above). The results are as follows:
If you look for compat, interactive examples, live samples, orphans, and all sidebars:
- Unmaintained: 30.3%
- Disconnected (orphaned): 3.1%
If you look for compat, interactive examples, live samples, orphans, but not sidebars:
- Unmaintained: 27.5%
- Disconnected (orphaned): 3.1%
This would allow us to get rid of a large number of low-quality pages, and make dealing with localizations easier.
We created a spreadsheet that lists all the pages that would be put in the unmaintained category under the above rules, in case you are interested in checking them out.
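To make the criteria above more concrete, here is a minimal Python sketch of how such a check might look. It is not the script we ran. It assumes that "should have" means "the en-US version has it", and the PageFeatures fields and function names are hypothetical. Orphan detection is left out because it depends on slugs rather than page content.

```python
from dataclasses import dataclass


@dataclass
class PageFeatures:
    """Hypothetical summary of what was detected on one page."""
    has_compat_table: bool = False
    has_examples: bool = False      # interactive and/or embedded examples
    has_sidebar: bool = False
    kumascript_broken: bool = False


def unmaintained_reasons(en_us, localized, check_sidebar=True):
    """List the criteria a localized page matches.

    Assumption: a page "should have" a feature when its en-US
    counterpart has that feature.
    """
    reasons = []
    if en_us.has_compat_table and not localized.has_compat_table:
        reasons.append("missing compat table")
    if en_us.has_examples and not localized.has_examples:
        reasons.append("missing examples")
    if check_sidebar and en_us.has_sidebar and not localized.has_sidebar:
        reasons.append("missing sidebar")
    if localized.kumascript_broken:
        reasons.append("broken KumaScript")
    return reasons


def unmaintained_share(pairs, check_sidebar=True):
    """Percentage of (en_us, localized) pairs matching one or more criteria."""
    if not pairs:
        return 0.0
    flagged = sum(
        1 for en_us, localized in pairs
        if unmaintained_reasons(en_us, localized, check_sidebar)
    )
    return 100.0 * flagged / len(pairs)
```

The check_sidebar flag mirrors the two sets of figures above: the share is computed once with the sidebar criterion and once without it.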
Stopping the display of non-tier 1 locales
After we have unfrozen the “tier 1” locales (
zh-TW), we are planning to stop displaying other locales. If no-one has the time to maintain a locale, and it is getting more out-of-date all the time, it is better to just not show it rather than have potentially harmful unmaintained content available to mislead people.
This makes sense considering how the system currently works. If someone has their browser language set to, say, fr, we will automatically serve them the fr version of a page, if it exists, rather than the en-US version, even if the fr version is old and really out-of-date, and the en-US version is high-quality and up-to-date.
Going forward, we will show en-US and the tier 1 locales that have active maintenance communities, but we will not display the other locales. To get a locale displayed again, we require an active community to step up and agree to have responsibility for maintaining that locale (which means reviewing pull requests, fixing issues filed against that locale, and doing a reasonable job of keeping the content up to date as new content is added to the
If you are interested in maintaining an unmaintained locale, we are more than happy to talk to you. We just need a plan. Please get in touch!
Note: Not showing the non-tier 1 locales doesn’t mean that we will delete all the content. We are intending to keep it available in our archived-content repo in case anyone needs to access it.
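The difference between the current and the planned behavior described in this section can be sketched in a few lines of Python. This is a simplification, not the real serving code: the function name pick_locale and its parameters are hypothetical, real browser language matching is more involved than a single string, and the set of displayed locales in the usage note below is a placeholder rather than the actual list.

```python
def pick_locale(preferred, translated, displayed=None):
    """Choose which locale of a page to serve.

    preferred:  the browser language, e.g. "fr".
    translated: set of locales in which the page exists.
    displayed:  locales we are willing to show; None means "all of
                them", which models the current behavior.
    """
    is_shown = displayed is None or preferred in displayed
    if preferred in translated and is_shown:
        # A translation exists and its locale is displayed: serve it,
        # regardless of how up to date it is.
        return preferred
    # Otherwise fall back to the en-US version.
    return "en-US"
```

With displayed left as None, pick_locale("fr", {"fr"}) returns "fr", however old that page is. With a placeholder set such as {"en-US", "zh-TW"}, a request for any locale outside that set returns "en-US", while the translated content itself stays untouched in the repo.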
The immediate next step is to get the tier 1 locales unfrozen, so we can start to get those communities active again and make that content better. We are hoping to get this done by the start of March. The normalizing slugs work will happen as part of this.
After that we will start to look at stopping the display of non-tier 1 localized content — that will follow soon after.
Identifying and removing unmaintained content will be a longer game to play — we want to involve our active localization communities in this work for the tier 1 locales, so this will be done after the other two items.
About Chris Mills