
Image: Fabien Barral via Unsplash
Fri 22 Sep 2017

The challenge

The Slovenian Social Science Data Archives (ADP) are keen supporters of CESSDA and of the cross-national harmonization of archives. We firmly believe that translating the European Language Social Science Thesaurus (ELSST) is an important step towards this goal. However, as a small team, we lack the resources to engage fully in the translation project.

Over the past few years, ADP has been liaising with the Slovenian Common Language Resources and Technology Infrastructure (CLARIN), sharing our knowledge and experience. For ADP, digital language technologies offered a promising way to reduce the time and effort required for translation.

Automatic translation

Translating ELSST into Slovenian was carried out as a joint project consisting of two steps: automatic translation undertaken by a team of language technology experts, followed by manual editing of the translation by ADP with the support of terminology experts from the relevant subject domains.

In the first phase, the expert team selected and prepared several translation sources: the linguistic expert chose general-purpose translation resources, while ADP proposed subject-specific dictionaries. Before translation, all terms and translations in the various sources were converted to upper case (as required by ELSST), and all plural ELSST source-language terms were changed to the singular to match the form used in the translation sources.
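The post does not describe how this preprocessing was implemented. The Python sketch below is one hypothetical way to express it, assuming each translation source is held as a dictionary from English entries to lists of Slovenian translations, and that the last word of a plural ELSST term is the head noun to be singularised. The rule-based `singularise_word` function and its `IRREGULAR` and `UNCHANGED` word lists are illustrative assumptions, not ADP's method; simple suffix rules like these misfire on many English words, so a curated list or a manual check would be safer in practice.

```python
from typing import Dict, List

# Hypothetical word lists for illustration only; a real run would need far more.
IRREGULAR = {"CHILDREN": "CHILD", "WOMEN": "WOMAN", "MEN": "MAN", "PEOPLE": "PERSON"}
UNCHANGED = {"ECONOMICS", "POLITICS", "NEWS", "STATISTICS", "SERIES"}


def singularise_word(word: str) -> str:
    """Naively turn one upper-case English word into its singular form (assumed rules)."""
    if word in UNCHANGED:
        return word
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("IES") and len(word) > 4:
        return word[:-3] + "Y"  # POLICIES -> POLICY
    if word.endswith(("SSES", "SHES", "CHES", "XES")):
        return word[:-2]  # CLASSES -> CLASS, TAXES -> TAX
    if word.endswith("S") and not word.endswith(("SS", "US", "IS")):
        return word[:-1]  # MARKETS -> MARKET
    return word


def normalise_term(term: str) -> str:
    """Upper-case an ELSST term and singularise its last word (assumed to be the head noun)."""
    words = term.upper().split()
    if words:
        words[-1] = singularise_word(words[-1])
    return " ".join(words)


def normalise_source(source: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Upper-case both the English entries and the Slovenian translations of one source."""
    return {en.upper(): [sl.upper() for sl in sls] for en, sls in source.items()}


print(normalise_term("Labour markets"))  # LABOUR MARKET
print(normalise_source({"emotion": ["čustvo"]}))  # {'EMOTION': ['ČUSTVO']}
```

Python's `str.upper` handles Slovenian letters such as č, š and ž, so the same function can be applied to both sides of each source.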

Next, each complete English term was looked up in every translation source and the results were collated; often the same translation appeared in several sources. If no translation of the whole term was found, one was constructed: the English term was split into subparts, each subpart was translated independently, and the translated subparts were then combined to produce a Slovenian translation of the source term.
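The post leaves the lookup algorithm unspecified. The following Python sketch illustrates the general idea under stated assumptions: sources are dictionaries of already normalised (upper-case, singular) entries, collated candidates are ranked by how many sources propose them, and a term with no whole-term match is split into subparts by a greedy longest match over words. The splitting strategy, the ranking and all function names (`collate`, `split_into_subparts`, `translate`) are hypothetical, and the two sample sources are made up for the example.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: one dict per translation source, mapping an
# upper-case English term to its upper-case Slovenian translations.
Source = Dict[str, List[str]]


def collate(term: str, sources: List[Source]) -> Counter:
    """Count how many sources propose each translation of `term`."""
    counts: Counter = Counter()
    for source in sources:
        # dict.fromkeys drops duplicates within one source but keeps their order
        for translation in dict.fromkeys(source.get(term, [])):
            counts[translation] += 1
    return counts


def split_into_subparts(term: str, sources: List[Source]) -> Optional[List[str]]:
    """Greedy longest-match split of `term` into word runs that some source translates."""
    words = term.split()
    parts: List[str] = []
    start = 0
    while start < len(words):
        for end in range(len(words), start, -1):
            candidate = " ".join(words[start:end])
            if collate(candidate, sources):
                parts.append(candidate)
                start = end
                break
        else:
            return None  # no source translates words[start]
    return parts


def translate(term: str, sources: List[Source]) -> Tuple[List[str], bool]:
    """Return ranked candidate translations and whether they had to be constructed."""
    whole = collate(term, sources)
    if whole:
        # Translations found in more sources come first
        return [t for t, _ in whole.most_common()], False
    parts = split_into_subparts(term, sources)
    if parts is None:
        return [], True  # nothing usable: left entirely to manual editing
    options = [[t for t, _ in collate(p, sources).most_common()] for p in parts]
    # Naive combination: English word order, no Slovenian inflection
    return [" ".join(combo) for combo in product(*options)], True


example_sources = [
    {"LABOUR MARKET": ["TRG DELA"], "UNEMPLOYMENT": ["BREZPOSELNOST"]},
    {"LABOUR MARKET": ["TRG DELA"], "YOUTH": ["MLADINA", "MLADI"]},
]
print(translate("LABOUR MARKET", example_sources))  # (['TRG DELA'], False)
print(translate("YOUTH UNEMPLOYMENT", example_sources))
# (['MLADINA BREZPOSELNOST', 'MLADI BREZPOSELNOST'], True)
```

Note how the constructed candidates keep the English word order and uninflected forms, whereas idiomatic Slovenian would be closer to BREZPOSELNOST MLADIH. This is the kind of problem the manual editing phase was meant to catch.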

Manual editing

In the second phase, the ADP team manually checked the automatic translations, verifying them and editing them where needed. This phase was subdivided into five tasks:

  • Choosing the best option among the candidate translations that the automatic step produced from the various sources
  • Checking and highlighting the terms with potentially problematic translations (e.g. no appropriate translation, multiple options)
  • Checking the translations where issues were detected and seeking advice from subject experts
  • Consulting the linguistic expert on Slovenian grammar rules
  • Confirming the final list of translations

This was the first time an ELSST translation had been carried out semi-automatically. We believe this approach allowed us not only to produce appropriate Slovenian translations but also to reduce our workload.

Further information

A more detailed explanation of this process and the algorithms used is beyond the scope of this blog post. If you would like to know more, we would be happy to hear from you and provide additional information. You can contact us by replying to this blog post or by emailing arhiv.podatkov@fdv.uni-lj.si.

Sonja Bezjak and Irena Bolko

See also CESSDA ELSST New Release 21 September 2017