Frequently Asked Questions – and their Answers
The data will go to the European Commission (DG Translation) to support the improvement of its machine translation system, MT@EC.
Supporting your own language is supporting Europe, and vice versa. Only with your help and your language resources can CEF AT be adapted to your needs. Within the CEF programme, CEF AT is available for free to public administrations in all EU member states and CEF-affiliated countries (Iceland and Norway). In return for your data, you receive a better service.
If translations are done externally, you, as the contracting party, can typically ask for the translation memories of your data. It is best to negotiate with the language service provider and make sure that you obtain the translation memories and/or additional data relating to your translation. This is important in any case, because you may be able to negotiate a better price for subsequent translations. In addition, there are several benefits associated with the language data you hold (see point 4.2 below).
Most data held by the public sector is in fact public data. Many ministries have, at a minimum, various types of information available online for their citizens (e.g. news, legal texts, official communications, interviews, brochures, background information), and this information is typically also provided in at least one foreign language. In Germany, for instance, the website of the national government is always at least trilingual: all information is provided in German, English, and French. Moreover, in addition to several thousand trilingual news articles published by the federal government and ministries, there are more than 4,000 trilingual official brochures created by the different German federal ministries on all topics relevant to these ministries (http://www.bundesregierung.de/SiteGlobals/Forms/Webs/Breg/Suche/EN/Infomaterial/Solr_Infomaterial_Startseite_Formular.html?nn=771722). The translation memories and original files of such data can all be shared without any concerns.
You can upload data to the ELRC repository in three simple steps:
2. Provide a basic description for the language resource (title, short description, language(s))
3. Upload the .zip file
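Preparing the upload package can be scripted. The following is a minimal sketch in Python, assuming that the files to be shared sit in one local folder and are bundled into a single .zip archive before upload. The function name `build_resource_zip`, the folder and file names, and the keys of the description dictionary are hypothetical and chosen for illustration only; they are not part of the ELRC repository. The upload itself is still done through the repository.

```python
import zipfile
from pathlib import Path


def build_resource_zip(source_dir, zip_path):
    """Bundle every file under source_dir into a single .zip archive.

    Returns the list of archive member names, relative to source_dir.
    """
    source = Path(source_dir)
    zip_path = Path(zip_path).resolve()
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories, and the archive itself if it lives inside source_dir.
            if path.is_file() and path.resolve() != zip_path:
                member = path.relative_to(source).as_posix()
                archive.write(path, member)
                added.append(member)
    return added


# Hypothetical basic description (title, short description, languages).
# The key names are assumptions made for this sketch.
description = {
    "title": "Ministry brochures 2015",
    "short_description": "Translation memories and source files of official brochures",
    "languages": ["de", "en", "fr"],
}
```

Calling `build_resource_zip("my_language_data", "resource.zip")` would write `resource.zip` and return the names of the files it contains, which makes it easy to check that nothing is missing before uploading.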
MT@EC is the current machine translation tool used and provided by the EC. It has been available since 26 June 2013. It has a web user interface in 24 languages for the human-to-machine use case, and it can also be used as a web service in a machine-to-machine scenario. It uses a highly secured protocol (sTESTA) coupled with the European identification service ECAS, which guarantees the confidentiality of data. MT@EC can be used by any Member State administration free of charge. More detailed information is available online at http://ec.europa.eu/dgs/translation/translationresources/machine_transla...
CEF AT, the Automated Translation platform of the Connecting Europe Facility (CEF), is part of CEF Digital. It provides automatic translation services with the goal of making digital services accessible to anyone, anywhere, by translating from any language into the user’s language. In particular, CEF AT should empower European public online services such as Europeana (http://www.europeana.eu/), the Open Data Portal (https://open-data.europa.eu/en/data/), and the Online Dispute Resolution Platform. More detailed information on CEF is available online at http://ec.europa.eu/digital-agenda/en/connecting-europe-facility
MT@EC can be used by any Member State administration free of charge. It can be accessed as follows:
- Staff working for EU institutions or agencies can use MT@EC with their ECAS account credentials.
Staff working for a public administration in an EU country should follow these steps:
- Sign up for your personal ECAS account and password (using only your professional email address).
- Send an email to DGT-MT@ec.europa.eu asking for access to MT@EC. Indicate what your job involves and which public administrative body you work for. Don't forget to include your full signature.
- DGT will create your MT@EC account and notify you.
Machine translation, and MT@EC in particular, substantially helps to make the translation process more productive and efficient. Translators at the EC are responsible for translating content into all official EU languages. More than 2 000 translators are currently employed by DG Translation, while another 5 000 work at EU institutions. In 2014, they translated more than 2.3 million pages.
MT@EC is used daily for French, Spanish, Portuguese and Italian to produce initial translations that are then post-edited very efficiently. For some other languages (e.g. German), the quality of translation is not yet high enough for heavy usage, although significant progress was made over the last year through domain-specific engines. In particular, MT@EC is used successfully to translate reports and similar texts (e.g. from economics) automatically. In other cases, the tool is helpful for rapidly scanning long texts in a foreign language and deciding, for example, which passages to focus on for human translation.
Overall, the quality of translation is directly linked to the availability of good data in the language concerned: if the data for MT is good, then the MT system will be good as well. This is why we are here: to improve the MT@EC tool with your data so that it is of better use to you.
Not necessarily. Machine translation can actually provide a good basis for learning languages.
Initially, it can be used to bridge the gap for people who do not speak a particular language until they have acquired some basic language skills. For instance, at university level, machine translation is used to cover the initial language gap of foreign students. Lectures are translated automatically and simultaneously into English, so that foreign students can get an idea of what they are about while they catch up on the language.
For users who already have some proficiency in a foreign language, machine translation can be used to improve their language skills. For example, when translating books or texts, users get direct experience with the foreign language: they may see new sentence constructions, or how things they did not know before can be translated. In this way, they learn by practice and by experience.
It is true that certain languages are more difficult for current MT systems to handle because of their rich morphology or their free constituent order. However, other methods are being explored. For example, in the Hungarian workshop the MT expert presented a new model (based on neural networks) that seems more suitable for languages like Hungarian, Finnish or Estonian. Moreover, the European Commission funds several actions (see e.g. http://www.qt21.eu/) to investigate and develop MT solutions for languages which currently receive only sub-optimal MT support.
What needs to be stressed is that, whatever the methodology, huge amounts of (preferably parallel) resources are needed to build these systems, since they rely on machine learning. Addressing this need for data is the primary objective of the workshops.
For a start, if you translate internally, human translation can be done more easily and faster if you share and can build on previous translations from the same domain. Moreover, if you outsource your translations, you can at least negotiate a better price with your language service providers in return for sharing data. Last but not least, keeping hold of your data and managing it adequately can have several additional benefits (see point 4.2 below).
Especially in the public sector there is a great diversity in the organization (or absence of organization) of the management of translations and corresponding data flows: from digitized workflows with term lists and storage of translation memories up to almost purely paper-based workflows.
From an organizational point of view, even small changes in how language data is handled can bring substantial benefits. Suggested actions that can be taken without major effort include:
- Analysis of all phases of data development
- Based on this, creation of a “data management plan” (DMP), even a very basic one:
  - Which data is important?
  - Where is it stored?
  - Can it be further processed?
- Document all relevant data
- If possible, use the web as an additional publication channel and reap the benefits of linked data (see http://www.w3.org/DesignIssues/LinkedData.html)
(Check presentation “Best practice for the future: Capitalize on your valuable data”)
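The questions of a very basic data management plan can be captured in a simple inventory. The following is a minimal sketch in Python, assuming one record per data set with fields that mirror the questions listed in the answer (which data is important, where it is stored, whether it can be further processed). The class `DataRecord`, its field names, and the example entry are hypothetical and only illustrate the idea; any spreadsheet with the same columns serves the same purpose.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DataRecord:
    """One entry in a basic data inventory (field names are assumptions)."""
    title: str              # what the data set is
    location: str           # where it is stored
    important: bool         # is it important for future translations?
    can_be_processed: bool  # can it be further processed (e.g. is it digital)?
    notes: str = ""


def inventory_to_csv(records):
    """Serialize the inventory to CSV text so it can be shared or archived."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(DataRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Hypothetical example entry.
inventory = [
    DataRecord("Brochure translations 2015", "shared drive", True, True),
]
print(inventory_to_csv(inventory))
```

Keeping the inventory in a plain, machine-readable form such as CSV makes it easy to update as new translations arrive and to hand over together with the data itself.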