November 30 CX Update: New Template Editor

Content Translation is getting a major new feature: completely rewritten support for templates. It has been in design, testing and development since June 2016. The first version of this feature was released today, Wednesday, November 30, to the Catalan and Hebrew Wikipedias, and will be released tomorrow, December 1, to Wikipedias in all languages.

The goal of this new feature is to make it easy to translate templates across languages.

We want to give more control to all the people who use the Content Translation feature directly or are affected by it: translators, other editors of articles that were created as translations, and template maintainers.

Templates are used heavily in all Wikimedia projects. When Content Translation’s development started in 2014, the developers gave it very basic template support. Templates that used a whole paragraph, such as infoboxes and long quotations, were usually skipped completely. Shorter templates inside paragraphs, such as references, unit conversions, quotes in other languages, “citation needed”, etc., were adapted to a corresponding template in the target language when possible, or substituted with wiki syntax.

While this helped create well over 100,000 new articles in many languages, it was far from perfect. It was confusing that infoboxes and whole paragraphs of quotations were not shown during the translation and had to be inserted manually after the first version of the translated article was created. References were frequently adapted incorrectly, inserting a lot of hard-to-maintain wiki syntax.

We are now starting to address this issue by letting translators choose what to do with each template. Templates are no longer silently ignored, so infoboxes and all other templates are shown in the source article column during the translation. Clicking on a template opens a card in the sidebar where the translator can choose what to do with it. It’s possible to skip the template entirely (“Skip template”) or to insert the wiki syntax of the template as it appears in the original language (“Keep original template”). If an equivalent template is available in the target language, it can be inserted and its parameters edited one by one (“Use equivalent template”).
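The three choices on the template card can be summarized in a short Python sketch. It is a hypothetical model, not Content Translation’s actual code: `TemplateAction`, `to_wikitext` and `apply_action` are names made up for illustration, and “Keep original template” is approximated here by re-serializing the parsed parameters, whereas the real feature inserts the original wiki syntax as it appears in the source.

```python
from enum import Enum


class TemplateAction(Enum):
    # Hypothetical model of the three options shown on the template card.
    SKIP = "Skip template"
    KEEP_ORIGINAL = "Keep original template"
    USE_EQUIVALENT = "Use equivalent template"


def to_wikitext(name, params):
    """Serialize a template call as {{name|key=value|...}}."""
    parts = [name] + [f"{key}={value}" for key, value in params.items()]
    return "{{" + "|".join(parts) + "}}"


def apply_action(action, source_name, source_params,
                 target_name=None, target_params=None):
    """Return the wikitext to insert into the translation for one template."""
    if action is TemplateAction.SKIP:
        return ""  # nothing is inserted
    if action is TemplateAction.KEEP_ORIGINAL:
        # Approximation: rebuild the source-language call from its parameters.
        return to_wikitext(source_name, source_params)
    if target_name is None:
        raise ValueError("no equivalent template in the target language")
    return to_wikitext(target_name, target_params or {})
```

For example, `apply_action(TemplateAction.KEEP_ORIGINAL, "Infobox person", {"name": "Sofia Kovalevskaya"})` yields `{{Infobox person|name=Sofia Kovalevskaya}}`.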

The template editor, while translating the article Shalom Meir Tower from English into Catalan. All the parameter names are shown, and can be added one by one. After adding all the needed parameters, close the editor and the template will be shown.

If the equivalent templates have the same parameter names, their values will be copied automatically. If the parameter names are different, but the target-language template has TemplateData whose parameter names or aliases match the parameter names in the source language, the values can also be adapted automatically. You can read more about TemplateData at mediawiki.org.
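The matching rule described above (identical names first, then TemplateData aliases) can be sketched in Python. This is an illustrative model, not the extension’s implementation: the target template’s TemplateData is reduced to a hypothetical dictionary of canonical parameter names to alias lists, and the Catalan-style parameter names in the example are invented.

```python
def map_parameters(source_params, target_templatedata):
    """Map source-template parameters onto a target template.

    source_params: parameter name -> value, taken from the source article.
    target_templatedata: canonical target parameter name -> list of aliases,
        a simplified stand-in for the target template's TemplateData.
    Returns (mapped, unmapped); unmapped values are left for the translator.
    """
    lookup = {}
    for canonical, aliases in target_templatedata.items():
        # Canonical names always take precedence over aliases.
        lookup[canonical] = canonical
        for alias in aliases:
            lookup.setdefault(alias, canonical)

    mapped, unmapped = {}, {}
    for name, value in source_params.items():
        target_name = lookup.get(name)
        if target_name is None:
            unmapped[name] = value  # no same-named parameter and no alias
        else:
            mapped[target_name] = value
    return mapped, unmapped
```

With a hypothetical target template whose TemplateData declares `nom` with alias `name` and `imatge` with alias `image`, the source values for `name` and `image` are adapted automatically, while `image_size` is returned as unmapped for the translator to handle.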

The template, inserted after translation. Notice that the template is rendered during the translation and the differences between the design in the different languages are easy to see.

Wikis have people who develop and maintain their templates. This is also an opportunity for all wikis, whether large, medium, or small, to take a look at their templates and improve them. Here are several things that can be done:

  • Add TemplateData (https://www.mediawiki.org/wiki/Help:TemplateData) to templates that don’t have it yet. This will allow Content Translation and Visual Editor to show template insertion and editing forms in which all the parameters are displayed conveniently.
  • Consider adding aliases to template parameters so that they match the parameter names used in the wikis from which articles are most frequently translated into your language. You can see which languages these are on the Special:CXStats page in your wiki.
  • Consider making the types of parameters more similar across languages. For example, in some languages images are provided as complete file links (e.g. “{{Infobox person|image=[[File:Sophie Kowalevski.jpg|thumb|300px|Sofia Kovalevskaya, 1880]]}}”), while in others there are separate parameters for the file name, size and caption (e.g. “{{Infobox person|name=Sofia Kovalevskaya|image=Sophie Kowalevski.jpg|image_size=300|caption=Sofia Kovalevskaya, 1880}}”). Making the parameter structure similar to the structure in the language from which articles are often translated will make the work considerably more efficient for translators and article maintainers.
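To illustrate the structural mismatch in the image example above, the following Python sketch splits a complete file link into the separate `image`, `image_size` and `caption` values used by the second form. It is a simplified, hypothetical helper, not part of Content Translation: it recognizes only the `File:` and `Image:` prefixes, a short list of layout keywords and sizes written as `NNNpx`, skips named options such as `alt=...`, and does not handle captions that contain links, pipes or an equals sign.

```python
import re

# Simplified list of layout keywords that are not captions.
LAYOUT_KEYWORDS = {"thumb", "thumbnail", "frame", "framed", "frameless",
                   "border", "left", "right", "center", "centre", "none",
                   "upright"}
SIZE_RE = re.compile(r"^(\d+)px$")
LINK_RE = re.compile(r"^\[\[(?:File|Image):([^|\]]+)((?:\|[^\]]*)?)\]\]$")


def split_file_link(link):
    """Split [[File:Name|options...]] into image, image_size and caption."""
    match = LINK_RE.match(link.strip())
    if match is None:
        raise ValueError("not a simple file link: " + link)
    result = {"image": match.group(1).strip()}
    options = [opt.strip() for opt in match.group(2).split("|") if opt.strip()]
    for opt in options:
        size = SIZE_RE.match(opt)
        if size:
            result["image_size"] = size.group(1)
        elif opt.lower() in LAYOUT_KEYWORDS or "=" in opt:
            continue  # layout keyword or named option such as alt=...
        else:
            result["caption"] = opt  # the last free-text option is kept
    return result
```

Applied to the link from the first example, it returns `{"image": "Sophie Kowalevski.jpg", "image_size": "300", "caption": "Sofia Kovalevskaya, 1880"}`, which matches the parameters of the second example.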

As noted earlier, this is only the first release of this feature. Templates on Wikimedia projects are very diverse, and while the developers tested the new template editor with many templates in many languages, it is impossible for us to test it with every template: there are just too many of them. Because of this, some templates may not be adaptable at first. As always, we’d love to hear from you about templates that can’t be adapted, and about other bugs. We nevertheless believe that this feature is already an improvement over the way templates were handled until now, and we are continuing its development, based on your input, to make template translation easier and more efficient.

You can read more about the design and development of this feature, as well as plans for its future improvement, in Phabricator task T139332.

August 29 CX Update: Easier machine translation control, fewer saving errors, and more wiki syntax and template clean-up

Highlights of recently deployed Content Translation changes:

  • One of the most common complaints about the Content Translation editing interface was that it’s too easy to remove a paragraph and there is no way to undo it. The button that removes the paragraph was in the “Automatic translation” card, which confused many translators. To address this, the card was completely redesigned to make editing and configuring machine translation easier. (task description)
  • For several days, links to pages in other languages were inserted instead of internal links. This was fixed. (bug report)
  • ISBN links were frequently added with <nowiki> tags. This is now fixed. (bug report)
  • Some users couldn’t save translations and saw an “Internal database error” message. This was fixed. (bug report)
  • Many fixes were made for common citation templates in Spanish, Portuguese, Polish, Welsh and other languages (see T142753 for an example of such a fix). This is a step towards generally more robust support for template adaptation (in progress), which will give translators and wiki editing communities more flexibility, ease and control of the translated content.

June 24 CX Update: Cleaner wiki syntax, better AbuseFilter support, and more improvements

Welcome back to CX updates!

For some time the development team took a break from developing Content Translation frontend features to focus on some background fixes and on other projects that were on the back-burner. Now we are back to making major updates to our article translation platform.

The areas we are focusing on now and for the next couple of months are making the wiki syntax of published pages cleaner and easier to maintain after the first version of the translated article is created, and making template and reference adaptation more stable. There is much to do there, but here are some changes that were already deployed:

If there was no corresponding template in the target language, but there was a template with the same name, it was used for adapting the template to the translation. This was wrong and sometimes completely unrelated templates were adapted, creating confusing content. This will not happen any longer, and only templates that are directly linked using an interlanguage link in Wikidata will be used for adaptation. (bug report)
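The new rule can be pictured as a lookup through Wikidata sitelinks with no same-name fallback. The Python sketch below is only an illustration: `SITELINKS` is a hypothetical in-memory stand-in for Wikidata data (the item ID and template titles are placeholders), not a call to the real Wikidata API.

```python
from typing import Optional

# Hypothetical stand-in for Wikidata sitelinks:
# item ID -> {wiki database name -> page title on that wiki}.
SITELINKS = {
    "Q-example-infobox": {
        "enwiki": "Template:Infobox person",
        "cawiki": "Plantilla:Infotaula persona",
    },
}


def find_target_template(source_title, source_wiki, target_wiki,
                         sitelinks=SITELINKS) -> Optional[str]:
    """Return the target-language template linked to the source template
    through the same Wikidata item, or None if there is no such link.

    There is deliberately no fallback to a same-named template on the
    target wiki, since such a template may be unrelated.
    """
    for links in sitelinks.values():
        if links.get(source_wiki) == source_title:
            return links.get(target_wiki)
    return None
```

A template that is not connected to a Wikidata item, or whose item has no sitelink for the target wiki, yields `None`, so no adaptation is attempted.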

Some pages were published with HTML tags with ContentTranslation-specific attributes such as “data-cx-draft”, “cx-segment”, “cx-link” and others. They are unnecessary in articles, and had to be removed manually by editors. This was fixed and is not supposed to happen any longer. (bug report)
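A clean-up pass of this kind can be sketched with Python’s standard `xml.etree.ElementTree`. This is a hypothetical illustration, not the actual fix: it assumes the fragment is well-formed XML, and it treats attribute names starting with `data-cx` or `cx-`, and class names starting with `cx-`, as editor-only markup.

```python
import xml.etree.ElementTree as ET


def is_cx_attribute(name):
    # Assumed rule: data-cx-* and cx-* attribute names are editor-only.
    return name.startswith("data-cx") or name.startswith("cx-")


def strip_cx_markup(html_fragment):
    """Remove editor-only attributes and cx-* class names from a
    well-formed HTML fragment before it is published."""
    root = ET.fromstring("<root>" + html_fragment + "</root>")
    for element in root.iter():
        for name in list(element.attrib):
            if is_cx_attribute(name):
                del element.attrib[name]
        classes = element.get("class")
        if classes is not None:
            kept = [c for c in classes.split() if not c.startswith("cx-")]
            if kept:
                element.set("class", " ".join(kept))
            else:
                del element.attrib["class"]
    # Serialize the children without the temporary wrapper element;
    # tostring() also emits each child's trailing text.
    inner = root.text or ""
    for child in root:
        inner += ET.tostring(child, encoding="unicode")
    return inner
```

For example, `<p data-cx-draft="true" id="mwAB">Text <a class="cx-link" href="./Foo">Foo</a> end.</p>` becomes `<p id="mwAB">Text <a href="./Foo">Foo</a> end.</p>`.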

Adapting certain kinds of references generated errors that made it impossible to publish the translation. This was fixed. (code change)

Some other things we worked on recently:

  • All messages generated by AbuseFilter were shown while writing a translation. This included some messages that don’t affect translation publishing, and this was very confusing. Now only warnings that affect page publishing are shown. (bug report)
  • Some users were seeing a list of gray interlanguage links that was too long to be useful. The list is now limited to three items. (bug report)
  • When support for a new machine translation engine is added to a language pair, it will be shown as a tip in the Automatic translation card in the sidebar. (task description)
  • Translation from namespaces other than the article namespace was sometimes failing when the namespace name was translated in the other wiki. In particular, this affected the Medical Translations Projects. This is now fixed. (bug report)
  • A pop-up window that invites users to create an article in Content Translation was shown when creating user pages using the Visual Editor. Content Translation is not intended for user pages, so we no longer show this pop-up on user pages. (bug report)
  • Some language codes, most notably Norwegian, were handled incorrectly because of inconsistencies between the actual language codes and the domain codes that Wikipedias use. We now normalize language codes. (bug report)
  • Using the “Clear paragraph” button could generate errors that prevent publishing. This was fixed. (bug report)
  • Paragraph-level parallel corpora are now fully accessible through an API. We are also preparing to make dumps of parallel corpora available for download. This should be useful to all machine translation developers and researchers.
  • The gray interlanguage links that suggest translation to a different language were not shown in Internet Explorer. This was fixed. (bug report)
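The language-code normalization mentioned above can be sketched as a two-way mapping between language codes and Wikipedia domain codes. The table below is a partial, illustrative sample, not the extension’s actual data; the `nb`/`no` pair is the Norwegian Bokmål case (language code `nb`, hosted at no.wikipedia.org).

```python
# Partial, illustrative table: language code -> Wikipedia domain code.
LANGUAGE_TO_DOMAIN = {
    "nb": "no",               # Norwegian Bokmål, hosted at no.wikipedia.org
    "yue": "zh-yue",          # Cantonese
    "lzh": "zh-classical",    # Classical Chinese
}
DOMAIN_TO_LANGUAGE = {domain: lang for lang, domain in LANGUAGE_TO_DOMAIN.items()}


def to_domain_code(code):
    """Return the domain code for a language code (unchanged if not listed)."""
    code = code.strip().lower()
    return LANGUAGE_TO_DOMAIN.get(code, code)


def to_language_code(code):
    """Return the language code for a domain code (unchanged if not listed)."""
    code = code.strip().lower()
    return DOMAIN_TO_LANGUAGE.get(code, code)


def same_language(a, b):
    # Compare codes only after normalizing both to language codes.
    return to_language_code(a) == to_language_code(b)
```

Normalizing both sides before comparing means that `no` and `nb` are recognized as the same language, while codes that need no mapping, such as `ca`, pass through unchanged.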

March 3 CX Update: Personal Statistics, AbuseFilter Handling, Less Unnecessary HTML, and More

A bunch of new features and many bug fixes in ContentTranslation were deployed today!

  • We noticed that AbuseFilter was preventing the publishing of dozens of translations every day. This very often happened with translations that were made in good faith. For example, community-defined filters in some Wikipedias disallow linking to certain websites that are allowed in the Wikipedia from which the article is being translated; when a user copied such a link into the translation, publishing of the whole article was blocked. Now the Content Translation interface shows a warning that this may happen and highlights the paragraphs that include the content AbuseFilter is complaining about. (bug report)
  • Simple personal statistics about translations made by the user are shown on the dashboard. (bug report)
  • Pressing Enter in a translation section was inserting unnecessary HTML tags (like <div>) in some browsers. This was fixed. (bug report)
  • The links tool is not shown any longer in headings. It’s technically possible to put links in wiki page headings, but it’s rare and it was creating issues, so now it’s prevented. (bug report)
  • Links are now correctly adapted to the Belarusian Taraškievica Wikipedia. This was failing because of the migration from the be-x-old language code to be-tarask. (bug report)
  • Gray interlanguage links to unnecessary language variants were appearing in the sidebar. This is now fixed. (bug report)
  • There is now a link from the ContentTranslation extension description on Special:Version directly to Special:ContentTranslation. (code change)
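The idea of pointing the translator to the offending paragraph, from the AbuseFilter item above, can be illustrated with a small Python sketch. It is not how AbuseFilter works: real filters are community-defined rules evaluated on the server. Here a hypothetical set of blocked domains stands in for such a rule, and the sketch only shows how a complaint about a link can be mapped back to the paragraphs that contain it.

```python
import re
from urllib.parse import urlparse

# Rough pattern for external URLs in wikitext or plain text.
URL_RE = re.compile(r"https?://[^\s\]|<>\"]+")


def paragraphs_to_warn(paragraphs, blocked_domains):
    """Return indexes of paragraphs that link to a blocked domain.

    blocked_domains is a hypothetical stand-in for a community-defined
    filter on the target wiki; subdomains of a blocked domain also match.
    """
    flagged = []
    for index, text in enumerate(paragraphs):
        for url in URL_RE.findall(text):
            host = (urlparse(url).hostname or "").lower()
            if any(host == d or host.endswith("." + d) for d in blocked_domains):
                flagged.append(index)
                break  # one match is enough to highlight this paragraph
    return flagged
```

Highlighting only the flagged paragraphs lets the translator remove or replace the problematic link without losing the rest of the translation.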

February 14 CX Update: A New Way to Enable the Feature, Machine Translation for Persian, and Other Fixes

Only a small update this time, as we prepare for major changes in handling AbuseFilter and saving translations.

  • A user who hasn’t enabled the Content Translation beta feature will now be able to enable it directly from the Special:ContentTranslation page without having to go through the beta preferences. This should make it easier, for example, to send your friends direct links to the special page without long instructions on how to enable the beta feature. (bug report)
  • The notifications from Content Translation that appear at the top of the screen were rephrased to conform to the recently refreshed notifications format. Thanks to Stephane Bisson from the Collaboration team for doing this update. (bug report)
  • Under some conditions, sub-pages in non-main namespaces couldn’t be loaded for translation. This is now fixed. (bug report)
  • Machine translation using Yandex was enabled for the Persian language. (bug report)

February 8 CX Update: Fixed Infinite Loops, More Machine Translation Support, and Improved Suggestions

First of all, congratulations to all Content Translation users: There are now 50,000 published articles! In the Wikimedia Blog you can read more on that, along with a round-up of Content Translation’s first year.

Because of some technical issues, scheduled deployments of new features were again delayed for a few weeks. On February 4th they were finally resumed, and here are the most important updates:

  • If a user started a translation, deleted it, and then started it again, the translation interface would go into an “infinite loop” of loading, and become unusable. This is now fixed. (bug report)
  • Featured articles are now shown as suggestions only if there are no other useful suggestions to show. (bug report)
  • The link from the dashboard to the tool that shows articles that don’t exist in your language was removed, on the premise that the integrated suggestions are more useful.
  • Machine translation using Yandex is now available for Albanian, Armenian, Bashkir, Polish and Uzbek.

January 15 CX Update: Personalized Suggestions, More Machine Translations, and Other Fixes

Happy new year, happy birthday Wikipedia, and happy birthday ContentTranslation, which was deployed to the first eight languages a year ago!

ContentTranslation updates are back after a delay, during which there were no usual scheduled software deployments because of year-end holidays, fundraising and absences.

The most important recently released feature is Personalized Suggestions. The suggestions tab now shows automatically selected suggestions of articles to translate according to the user’s editing history. This feature was developed together with Leila Zia, Ellery Wulczyn and Robert West from the Wikimedia Foundation’s Research team, as well as Jure Leskovec, Robert’s advisor in the Computer Science department at Stanford University.

Translation using the Apertium engine from Hindi to Urdu is now enabled by default. Translation using Yandex was enabled from English, Ukrainian and Belarusian into Russian. More languages may be added soon.

Failure to publish a translation because of AbuseFilter is now shown more clearly: the AbuseFilter message is displayed to the translator, so it will be much easier to fix it. We are researching other common publishing errors daily in order to get them fixed, too.

The colors of the alerts and the notifications at the top of the translation interface were updated, and now they are in shades of red for errors and shades of green for positive notifications.

Finally, a bug that was showing incorrect data for the last week of the year in the statistics chart was fixed.

The team is now working on improving the suggestions system further, monitoring and fixing errors at restoring and publishing translations, improving performance, and upgrading the translation storage in a way that will allow machine translation developers to improve their translation engines (the “Parallel Corpora” feature). Expect more details about this in future posts.