Evaluation of machine translation

Various methods have been employed to evaluate machine translation. This article focuses on evaluating the output of machine translation systems, rather than their performance or usability.

Before covering the large-scale studies, a brief comment is in order on one of the more pervasive evaluation techniques: round-trip translation (or “back translation”). A typical way for lay people to assess the quality of a machine translation engine is to translate a text from a source language into a target language, then back into the source language using the same engine, and compare the result with the original text.
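The round-trip procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the two dictionaries are a hypothetical toy "engine" standing in for a real MT system, and `unigram_f1` is a simple token-overlap score chosen for this sketch, not a standard MT metric. The toy dictionary is deliberately incomplete, so the forward pass leaves one word untranslated while the round trip still reproduces the source exactly.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical toy "engine": word-by-word dictionary lookup in each direction.
# "sleeps" is deliberately missing from EN_ES, so the forward pass leaves it untranslated.
EN_ES: Dict[str, str] = {"the": "el", "cat": "gato"}
ES_EN: Dict[str, str] = {"el": "the", "gato": "cat"}


def make_translator(table: Dict[str, str]) -> Callable[[str], str]:
    """Return a function that maps each known word and passes unknown words through unchanged."""
    def translate(text: str) -> str:
        return " ".join(table.get(word, word) for word in text.lower().split())
    return translate


def round_trip(source: str, forward: Callable[[str], str], backward: Callable[[str], str]) -> str:
    """Translate source -> target -> source with the same (toy) engine."""
    return backward(forward(source))


def unigram_f1(reference: str, candidate: str) -> float:
    """Harmonic mean of unigram precision and recall (clipped counts); a crude overlap score."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # multiset intersection clips repeated words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    to_es = make_translator(EN_ES)
    to_en = make_translator(ES_EN)
    source = "The cat sleeps"
    target = to_es(source)                   # English word leaks into the Spanish output
    back = round_trip(source, to_es, to_en)  # back-translation into English
    print(target, "|", back, "|", unigram_f1(source, back))
```

Running the script prints `el gato sleeps | the cat sleeps | 1.0`: the target-language output is flawed, yet the round trip scores perfectly against the source. A round-trip comparison reflects both passes together, so errors in one direction can be masked (or introduced) by the other.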


This guide is licensed under the GNU Free Documentation License. It uses material from Wikipedia.
