What are Language Resources?

The term "language resources" refers to sets of language data and descriptions in machine-readable form, including written and spoken corpora, grammars, and terminology databases. Language resources can be used to build, improve, or evaluate natural language systems such as machine translation engines.

To develop the automated translation systems for the CEF Automated Translation platform, the ELRC initiative aims to gather language resources in all official languages of the EU. The initiative seeks large general-domain corpora, whether monolingual (e.g. official corpora of national languages) or multilingual, as well as domain-specific language resources in fields such as consumer rights, culture, law, social security, health, and public procurement.

Discover Automated Translation

Automated translation, also known as machine translation, allows users to instantly translate words, sentences, full documents, and websites from one language to another. Translations are performed at speeds of up to one sentence per second, far faster than any human translator can work. Although automated translation does not match the quality and accuracy of human translation, it provides quick insight into the general meaning, or “gist”, of a text, helping to cross language barriers between nations and to facilitate multilingual communication.

To produce translations quickly, automated translation systems are trained on huge amounts of existing human translations, known as parallel data. Using sophisticated algorithms, the systems then mine this parallel data to produce instant translations of new texts.
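
As a rough illustration of what "mining" existing human translations can mean, the sketch below counts how often words appear together in aligned sentence pairs and scores each word pair with the Dice coefficient. It is a toy example only and does not represent how the CEF Automated Translation engines work: the four English-French sentence pairs are invented, and the names PARALLEL, dice_table, and best_translation are hypothetical.

```python
from collections import Counter
from itertools import product

# Toy parallel corpus, invented for illustration: (English, French) pairs.
PARALLEL = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]


def dice_table(pairs):
    """Score every (source word, target word) pair by the Dice coefficient."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)   # sentences containing each source word
        tgt_count.update(t_words)   # sentences containing each target word
        joint.update(product(s_words, t_words))  # sentence pairs containing both
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def best_translation(word, table):
    """Return the highest-scoring target word, or None for an unseen word."""
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


TABLE = dice_table(PARALLEL)
print(best_translation("house", TABLE))
```

With these four pairs, "house" co-occurs with "maison" in every sentence where either word appears, so that pair receives the top score. Real systems rely on far larger corpora and far more sophisticated models, which is why large volumes of parallel data matter.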

Automated translation systems can be further improved by adding industry-specific terminology, linguistic rules, monolingual data, and other language resources. This effectively customizes, or tailors, a system to a particular domain or industry. 

How to contribute?

Any contributor may submit Language Resources to us at any stage of preparation: simple internet links to websites (Sources), raw data, or fully packaged data (Language Resources).

