BLEU Performance

BLEU has frequently been reported as correlating well with human judgement,[1][2][3] and remains a benchmark for the assessment of any new evaluation metric. There are, however, a number of criticisms that have been voiced. It has been noted that although …
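To make the metric concrete, the sketch below shows how a sentence-level BLEU score is typically computed in practice using NLTK; the reference and candidate sentences are invented examples for illustration, not drawn from the studies cited above.

```python
# A minimal sketch of computing a sentence-level BLEU score with NLTK.
# The reference and candidate sentences are invented for illustration.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# One or more human reference translations, tokenized into words.
references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]

# A machine translation hypothesis to be scored against the references.
hypothesis = "the cat sat on the mat".split()

# Smoothing avoids a zero score when some higher-order n-grams are absent,
# which is common for short sentences.
smooth = SmoothingFunction().method1
score = sentence_bleu(references, hypothesis, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```

In practice, whole test sets are scored at once (for example with NLTK's corpus_bleu), since BLEU is more reliable at the corpus level than for individual sentences.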

Automatic Language Processing Advisory Committee (ALPAC)

One of the constituent parts of the ALPAC report was a study comparing different levels of human translation with machine translation output, using human subjects as judges. The human judges were specially trained for the purpose. The evaluation study compared …

Round-trip translation

Although this may intuitively seem a good method of evaluation, it has been shown that round-trip translation is a “poor predictor of quality”. The reason why it is such a poor predictor of quality is reasonably intuitive. When a round-trip …
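To make the protocol concrete, the sketch below assumes a hypothetical translate(text, source, target) function standing in for any MT system: it translates a sentence into a pivot language and back, then scores the back-translation against the original. A high score only shows that the two passes happened to cancel out, which is the intuition behind the “poor predictor” finding.

```python
# A minimal sketch of round-trip translation evaluation.
# `translate` is a hypothetical stand-in for an arbitrary MT system;
# substitute any real translation API or model here.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def translate(text: str, source: str, target: str) -> str:
    """Placeholder for an MT system call (hypothetical)."""
    raise NotImplementedError("plug in a real MT system")

def round_trip_score(original: str, pivot: str = "fr") -> float:
    # Translate out to the pivot language and back again.
    forward = translate(original, source="en", target=pivot)
    back = translate(forward, source=pivot, target="en")
    # Compare the back-translation with the original source text,
    # not with a reference translation.
    smooth = SmoothingFunction().method1
    return sentence_bleu([original.split()], back.split(),
                         smoothing_function=smooth)
```

Note that the comparison is made against the original source rather than a reference translation, and that two translation passes are being judged at once, so errors made on the way out can be masked by compensating errors on the way back.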

Evaluation of machine translation

Various methods for the evaluation of machine translation have been employed. This article will focus on the evaluation of the output of machine translation, rather than on performance or usability evaluation. Before covering the large-scale studies, a brief comment …