Parallel text


Parallel text, or multilingual concordance, is a computer tool for managing a parallel corpus. By metonymy, the term multilingual concordance also designates the corpus itself.

A parallel corpus is a collection of texts arranged in groups such that, within each group, the texts are translations of one another. The European acquis communautaire is an example: each group contains one text in each of the official languages of the European Union, and together the groups make up the body of law governing the European Community.

Many parallel corpora are bilingual. The Hansard corpus of the Canadian Parliament (English↔French) is a well-known example, because it was one of the first to be digitized and made available to linguistics researchers. In such cases, the tools are bilingual concordances.

These corpora are increasingly numerous and accessible. They originate from:

  • the obligation of international organizations to publish in several official languages

  • states with several official languages

  • newspaper publications in several languages

  • book translation

  • software documentation

Digital parallel corpora are an important resource for machine translation tools. From such a corpus, one can:

  • query the documents directly, searching for a word or phrase and viewing each occurrence in the source text alongside the corresponding passage of the target text; the advantage is that the expression is shown in its full context

  • segment the documents into sentences and align those sentences, producing a translation memory or a training corpus for machine translation

  • analyze the co-occurrences of terms across the documents in different languages and thus build a multilingual lexicon.
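As a rough illustration of the first and third uses, here is a minimal Python sketch. It assumes the corpus has already been segmented and aligned sentence by sentence. The toy English-French pairs, the function names `concordance` and `cooccurrence_lexicon`, and the choice of the Dice coefficient as the association score are all assumptions made for this example, not something prescribed by the text above.

```python
from collections import Counter, defaultdict

# Toy sentence-aligned English-French corpus (invented for illustration).
ALIGNED = [
    ("the house is small", "la maison est petite"),
    ("the house is big", "la maison est grande"),
    ("the car is small", "la voiture est petite"),
    ("the car is big", "la voiture est grande"),
]


def concordance(pairs, query):
    """Return every aligned pair whose source sentence contains the query word."""
    query = query.lower()
    return [(src, tgt) for src, tgt in pairs if query in src.lower().split()]


def cooccurrence_lexicon(pairs):
    """Map each source word to the target word it is most associated with.

    Association is scored with the Dice coefficient:
    2 * (sentence pairs containing both words) / (pairs with s + pairs with t).
    """
    src_freq, tgt_freq = Counter(), Counter()
    joint = defaultdict(Counter)
    for src, tgt in pairs:
        # Sets: count each word at most once per sentence pair.
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        for s in src_words:
            joint[s].update(tgt_words)

    lexicon = {}
    for s, counts in joint.items():
        lexicon[s] = max(
            counts,
            key=lambda t: 2 * counts[t] / (src_freq[s] + tgt_freq[t]),
        )
    return lexicon


if __name__ == "__main__":
    for src, tgt in concordance(ALIGNED, "house"):
        print(f"{src}  |  {tgt}")
    print(cooccurrence_lexicon(ALIGNED))
```

On this toy corpus, content words such as "house" and "small" pair with "maison" and "petite". Function words such as "the" and "is" occur in every sentence, so their scores tie and the chosen target word is arbitrary. A real corpus would need far more data and a more careful alignment model.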

Translation from Wikipedia
