The translation process, as carried out by a human translator, can be divided into three phases:
- understanding: assimilation of the meaning conveyed by a text, of what its author intends to say …;
- deverbalisation: forgetting the words while retaining the meaning; “the operation by which a subject becomes aware of the meaning of a message while losing awareness of the words and phrases that gave it form”;
- re-expression: reformulation of the meaning into the target language.
In computational terms, understanding corresponds to analysis, deverbalisation to transfer, and re-expression to generation. These steps are modeled by the Vauquois triangle. The model is useful because there are several possible paths from the source to the target, and these paths correspond to the different approaches proposed to date. The higher the level of conceptualization reached by analysis, the shorter the transfer path. There are four main options:
- Direct transfer: no conceptualization; the whole translation relies on transfer. Example-based and statistical translation work at this level, where translation is seen as a decoding process.
- Syntactic transfer: transfer takes place at the syntactic level, generally on syntax trees. Analysis produces a syntactic representation of the source sentence, transfer derives a syntactic representation for the target language from it, and generation produces the target-language sentence. Rule-based machine translation is representative of this category: its rules define the transformations between representations.
- Semantic transfer: transfer takes place at the semantic level. This path involves human intervention. The semantic representation models of the language are described by pragmatics, and the semantics can be described by an ontology. Only a few machine translation approaches are representative of semantic transfer.
- Interlanguage: at this level, no transfer is needed. The interlanguage is universal, so only the analysis and generation processes remain. The interlanguage is also referred to as the pivot language. DLT is an unfinished attempt at this approach; the UNL language is another example, a formal computer language for representing the meaning of a statement. The approach is appealing because, for a given language, the only effort is to build an analyzer into the interlanguage and a generator from it; one then obtains translations to and from every other language that also has an analyzer and a generator. In practice the approach is difficult and has not succeeded on a large scale.
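The appeal of a pivot can be made concrete by counting modules. The sketch below is a minimal illustration under a simplifying assumption that is not stated in the text: a transfer-based system needs an analyzer and a generator for each of n languages plus one transfer module per ordered language pair, whereas an interlanguage system needs only the analyzers and generators. The function names are hypothetical.

```python
def transfer_modules(n: int) -> int:
    """Modules for pairwise transfer (simplified counting model).

    n analyzers + n generators, plus one transfer module per ordered
    pair of distinct languages: n * (n - 1).
    """
    return 2 * n + n * (n - 1)


def interlingua_modules(n: int) -> int:
    """Modules with a pivot: one analyzer and one generator per language."""
    return 2 * n


for n in (2, 5, 10, 24):
    print(f"{n:>2} languages: transfer={transfer_modules(n):>3}, "
          f"interlingua={interlingua_modules(n):>2}")
```

Under this counting model, 10 languages need 110 modules with pairwise transfer but only 20 with a pivot, and the gap grows quadratically with the number of languages.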
Currently, translation engines are mainly rule-based or statistical, and a hybrid path is emerging. Systran, Google Translate, and Bing Translator use hybrid approaches.
The requirements depend on the intended approach: rule-based translation (word-for-word, transfer, pivot), example-based translation, or statistical translation.
Rule-based machine translation requires:
- dictionary entries
- linguistic rules
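Syntactic transfer in a rule-based system combines these two resources. A minimal sketch, assuming the analysis step has already produced a syntax tree (built by hand here); the `Node` class, the toy English-French `DICTIONARY` and the `reorder_adjectives` rule are hypothetical illustrations. The rule is a deliberate simplification: many French adjectives follow the noun, but some precede it, and gender agreement is ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Syntax-tree node: internal nodes have children, leaves carry a word."""
    label: str
    children: list[Node] = field(default_factory=list)
    word: str | None = None


# Hypothetical dictionary entries (English -> French).
DICTIONARY = {"the": "le", "black": "noir", "cat": "chat", "sleeps": "dort"}


def reorder_adjectives(node: Node) -> Node:
    """Hypothetical linguistic rule: inside an NP, move an ADJ after the N it precedes."""
    children = [reorder_adjectives(child) for child in node.children]
    if node.label == "NP":
        i = 0
        while i < len(children) - 1:
            if children[i].label == "ADJ" and children[i + 1].label == "N":
                children[i], children[i + 1] = children[i + 1], children[i]
                i += 2
            else:
                i += 1
    return Node(node.label, children, node.word)


def translate_words(node: Node) -> Node:
    """Lexical transfer: replace each leaf word using the dictionary."""
    if node.word is not None:
        return Node(node.label, word=DICTIONARY.get(node.word, node.word))
    return Node(node.label, [translate_words(c) for c in node.children])


def generate(node: Node) -> str:
    """Generation: read the leaves of the target tree from left to right."""
    if node.word is not None:
        return node.word
    return " ".join(generate(c) for c in node.children)


# Source tree, assumed to come from analysis: "the black cat sleeps".
source = Node("S", [
    Node("NP", [Node("DET", word="the"), Node("ADJ", word="black"),
                Node("N", word="cat")]),
    Node("VP", [Node("V", word="sleeps")]),
])

target = translate_words(reorder_adjectives(source))
print(generate(target))  # le chat noir dort
```

Keeping structural rules and dictionary lookups in separate functions mirrors the split between the two resources: either can be extended without touching the other.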
Example-based and statistical translation require:
- translation memories (sets of translated texts)
They may also require linguistic analysis tools such as:
- tokenizers (which split text into tokens)
- morphosyntactic tagger
- possibly: chunker, parser
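Example-based translation begins by retrieving, from the translation memory, stored examples whose source side resembles the input. A minimal sketch, assuming a toy in-memory translation memory and a regular-expression tokenizer standing in for a real one; the similarity score is Python's `difflib.SequenceMatcher.ratio()` over token lists, chosen for illustration rather than the measure any particular system uses. `TRANSLATION_MEMORY`, `tokenize` and `best_match` are hypothetical names.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher

# Hypothetical translation memory: (source sentence, stored translation) pairs.
TRANSLATION_MEMORY = [
    ("The printer is out of paper.", "L'imprimante n'a plus de papier."),
    ("The printer is turned off.", "L'imprimante est éteinte."),
    ("Close the window.", "Fermez la fenêtre."),
]


def tokenize(text: str) -> list[str]:
    """Minimal tokenizer: lowercased word tokens and separate punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two sentences, compared token by token."""
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


def best_match(sentence: str) -> tuple[str, str, float]:
    """Return the memory entry whose source side is closest to `sentence`."""
    src, tgt = max(TRANSLATION_MEMORY, key=lambda pair: similarity(sentence, pair[0]))
    return src, tgt, similarity(sentence, src)


src, tgt, score = best_match("The printer is out of toner.")
print(f"{score:.2f}  {tgt}")  # 0.86  L'imprimante n'a plus de papier.
```

The retrieved pair is only a starting point: an example-based system would still have to adapt the stored translation to the difference between the input and the stored source (toner instead of paper), a step this sketch leaves out.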