June 24 CX Update: Cleaner Wiki Syntax, Better AbuseFilter Support, and More Improvements

Welcome back to CX updates!

For some time the development team took a break from developing Content Translation frontend features to focus on some background fixes and on other projects that were on the back-burner. Now we are back to making major updates to our article translation platform.

The areas on which we are focusing at the moment and for the next couple of months are making the wiki syntax of the published pages cleaner and easier to maintain after the first version of the translated article is created, and making template and reference adaptation more stable. There is much to do there, but here are some changes that were already deployed:

If there was no corresponding template in the target language, but there was a template with the same name, it was used for adapting the template to the translation. This was wrong and sometimes completely unrelated templates were adapted, creating confusing content. This will not happen any longer, and only templates that are directly linked using an interlanguage link in Wikidata will be used for adaptation. (bug report)

Some pages were published with HTML tags with ContentTranslation-specific attributes such as “data-cx-draft”, “cx-segment”, “cx-link” and others. They are unnecessary in articles, and had to be removed manually by editors. This was fixed and is not supposed to happen any longer. (bug report)

Adapting references of some kinds was generating errors, and it made it impossible to publish a translation. This was fixed. (code change)
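
The template rule described above (adapt only templates that are connected through an interlanguage link, never ones that merely share a name) can be sketched roughly as follows. This is a minimal illustration and not the actual ContentTranslation code: the `SITELINKS` data, the template titles in it and the `find_target_template` function are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical sitelink data: one dict per Wikidata item, mapping a language
# code to the template title on that wiki. Titles are for illustration only.
SITELINKS: List[Dict[str, str]] = [
    {"en": "Template:Infobox person", "es": "Plantilla:Ficha de persona"},
    {"en": "Template:Citation"},  # no Spanish sitelink on this item
]


def find_target_template(
    source_lang: str,
    source_title: str,
    target_lang: str,
    sitelinks: List[Dict[str, str]] = SITELINKS,
) -> Optional[str]:
    """Return the template linked to the source template, or None.

    There is deliberately no fallback to a same-named template in the
    target wiki: if no interlanguage link exists, nothing is adapted.
    """
    for item in sitelinks:
        if item.get(source_lang) == source_title:
            # None when the item has no sitelink for the target language.
            return item.get(target_lang)
    return None
```

With this data, `find_target_template("en", "Template:Citation", "es")` returns `None`, so the template would be left unadapted even if the target wiki had some template with the same name.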

Some other things we worked on recently:

  • All messages generated by AbuseFilter were shown while writing a translation. This included some messages that don’t affect translation publishing, and this was very confusing. Now only warnings that affect page publishing are shown. (bug report)
  • Some users were seeing a list of gray interlanguage links that was too long to be useful. Its length is now limited to three items. (bug report)
  • When support for a new machine translation engine is added to a language pair, it will be shown as a tip in the Automatic translation card in the sidebar. (task description)
  • Translation from namespaces other than the article namespace was sometimes failing when the namespace name was translated in the other wiki. In particular, this affected the Medical Translations Projects. This is now fixed. (bug report)
  • A pop-up window that invites users to create an article in Content Translation was shown when creating user pages using the Visual Editor. Content Translation is not intended for user pages, so we no longer show this pop-up on user pages. (bug report)
  • Some language codes, most notably Norwegian, were handled incorrectly because of inconsistencies between the actual language codes and the domain codes that the Wikipedias use. We now normalize language codes. (bug report)
  • Using the “Clear paragraph” button could generate errors that prevent publishing. This was fixed. (bug report)
  • Paragraph-level parallel corpora are now fully accessible through an API. We are also preparing to make dumps of parallel corpora available for download. This should be useful to all machine translation developers and researchers.
  • The gray interlanguage links that suggest translation to a different language were not shown in Internet Explorer. This was fixed. (bug report)
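
To illustrate the language code normalization mentioned in the list above, here is a minimal sketch. It assumes a simple lookup table from Wikipedia domain codes to language codes. The table and the `normalize_language_code` function are hypothetical and only cover Norwegian (where the "no" domain hosts the Bokmål Wikipedia, whose language code is "nb") and the be-x-old to be-tarask case from the March 3 update; the real extension may handle this differently.

```python
# Assumed mapping from Wikipedia domain codes to language codes.
# Only two illustrative entries; a real table would be longer.
DOMAIN_TO_LANGUAGE = {
    "no": "nb",               # Norwegian Bokmål Wikipedia uses the "no" domain
    "be-x-old": "be-tarask",  # Belarusian Taraškievica
}


def normalize_language_code(code: str) -> str:
    """Map a domain code to its language code; pass other codes through."""
    code = code.strip().lower()
    return DOMAIN_TO_LANGUAGE.get(code, code)
```

Codes that are not in the table, such as "he", are returned unchanged, so the function can be applied to every code before it is compared or used in a lookup.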

March 3 CX Update: Personal Statistics, AbuseFilter Handling, Less Unnecessary HTML, and More

A bunch of new features and many bug fixes in ContentTranslation were deployed today!

  • We noticed that AbuseFilter was preventing the publishing of dozens of translations every day. This very often happened in translations that were made in good faith: for example, community-defined filters in Wikipedias in some languages disallow linking to certain websites that are allowed in the Wikipedia from which the article is being translated, and when a user copied such a link to the translation, the publishing of the whole article was blocked. From now on, the Content Translation interface shows a warning that such a thing may happen and highlights the paragraphs that include the problem that AbuseFilter is complaining about. (bug report)
  • Simple personal statistics about translations made by the user are shown on the dashboard. (bug report)
  • Pressing Enter in a translation section was inserting unnecessary HTML tags (like <div>) in some browsers. This was fixed. (bug report)
  • The links tool is not shown any longer in headings. It’s technically possible to put links in wiki page headings, but it’s rare and it was creating issues, so now it’s prevented. (bug report)
  • Links are now correctly adapted to the Belarusian Taraškievica Wikipedia. This was failing because of the migration from the be-x-old language code to be-tarask. (bug report)
  • Gray interlanguage links to unnecessary language variants were appearing in the sidebar. This is now fixed. (bug report)
  • There is now a link from the ContentTranslation extension description on Special:Version directly to Special:ContentTranslation. (code change)

February 14 CX Update: A New Way to Enable the Feature, Machine Translation for Persian, and Other Fixes

Only a small update this time, as we prepare for major changes in handling AbuseFilter and saving translations.

  • A user who hasn’t enabled the Content Translation beta feature will now be able to enable it directly from the Special:ContentTranslation page without having to go through the beta preferences. This should make it easier, for example, to send links to the special page directly to your friends without long instructions on how to enable the beta feature. (bug report)
  • The notifications from Content Translation that appear at the top of the screens were rephrased to conform to the recently refreshed notifications format. Thanks to Stephane Bisson from the Collaboration team for doing this update. (bug report)
  • Under some conditions, sub-pages in non-main namespaces couldn’t be loaded for translation. This is now fixed. (bug report)
  • Machine translation using Yandex was enabled for the Persian language. (bug report)

February 8 CX Update: Fixed Infinite Loops, More Machine Translation Support, and Improved Suggestions

First of all, congratulations to all Content Translation users: There are now 50,000 published articles! In the Wikimedia Blog you can read more on that, along with a round-up of Content Translation’s first year.

Because of some technical issues, scheduled deployments of new features were again delayed for a few weeks. On February 4th they were finally resumed, and here are the most important updates:

  • If a user started a translation, deleted it, and then started it again, the translation interface would go into an “infinite loop” of loading, and become unusable. This is now fixed. (bug report)
  • Featured articles are now shown as suggestions only if there are no other useful suggestions to show. (bug report)
  • The link from the dashboard to the tool that shows articles that don’t exist in your language is removed, on the premise that the integrated suggestions are more useful.
  • Machine translation using Yandex is now available for Albanian, Armenian, Bashkir, Polish and Uzbek.

January 15 CX Update: Personalized Suggestions, More Machine Translations, and Other Fixes

Happy new year, happy birthday Wikipedia, and happy birthday ContentTranslation, which was deployed to the first eight languages a year ago!

ContentTranslation updates are back after a delay, during which there were no usual scheduled software deployments because of year-end holidays, fundraising and absences.

The most important recently released feature is Personalized Suggestions. The suggestions tab now shows automatically selected suggestions of articles to translate according to the user’s editing history. This feature was developed together with Leila Zia, Ellery Wulczyn and Robert West from the Wikimedia Foundation’s Research team, as well as Jure Leskovec, Robert’s advisor at the CS faculty at Stanford.

Translation using the Apertium engine from Hindi to Urdu is now enabled by default. Translation using Yandex was enabled from English, Ukrainian and Belarusian into Russian. More languages may be added soon.

Failure to publish a translation because of AbuseFilter is now shown more clearly: the AbuseFilter message is displayed to the translator, so it will be much easier to fix it. We are researching other common publishing errors daily in order to get them fixed, too.

The colors of the alerts and the notifications at the top of the translation interface were updated, and now they are in shades of red for errors and shades of green for positive notifications.

Finally, a bug which was showing incorrect data for the last week of the year in the statistics chart was fixed.

The team is now working on improving the suggestions system further, monitoring and fixing errors in restoring and publishing translations, improving performance, and upgrading the translation storage in a way that will allow machine translation developers to improve their translation engines (the “Parallel Corpora” feature). Expect more details about this in future posts.

November 1 CX Update: Starred Suggestions and Translation Interface Bug Fixes

After a delay in the deployment of new features to Wikimedia sites for the last couple of weeks, this week we are back to the normal deployment schedule, and we have several significant updates.

The major new feature is the ability to mark suggested articles as something that you want to translate later by “starring” them (task description), as well as to discard suggestions in which you are not interested. This update is another step toward making sophisticated and personalized lists of articles to translate, which are designed to help translators be more efficient in completing the coverage of encyclopedic topics in their languages. For more details about the state of the Translation Suggestions, see the recently published Wikimedia Blog post: Article suggestions—a new feature for Content Translation.

Other than that, these bug fixes were deployed:

  • Images without captions were not properly published, which was confusing, because they appeared in the translation view, but not in the article. This is now fixed. (bug report)
  • Adding a red link was not, by itself, triggering auto-saving. Now it does. (bug report)
  • Long words in the titles of the columns in the translation interface were shown only partially. Now they are wrapped so that the whole title is visible. (bug report)

October 23 CX Update: Machine Translation and Suggestions in More Languages, and Other Fixes

Last week there was no automatic software deployment to Wikipedia sites for technical reasons, so there are relatively few CX updates this time. The usual update schedule is supposed to resume next week.

The following fixes were deployed recently:

  • While a translation in progress was being loaded, the translation column was empty. This was confusing. Now a loading indicator is shown at the top. (bug report)
  • The automatic selection of languages in which suggestions are shown is improved. (bug report)
  • Suggestions are now enabled in the following languages:
    • from English to all languages
    • from German, Hebrew, Italian, Polish, Swedish, Vietnamese, Finnish and Dutch to English
    • from Simple English to Gujarati and Hindi
    • from Swedish to Finnish
    • from Swedish and Norwegian Bokmål to Norwegian Nynorsk
  • New language pairs are added to Apertium machine translation:
    • Arabic to Maltese
    • Maltese to Arabic
    • Spanish to Italian
    • Italian to Spanish
    • Icelandic to Swedish
    • Swedish to Icelandic
    • Romanian to Spanish