Evaluation of Machine Translation

Machine translation software can attain different levels of quality depending on the linguistic method applied, its degree of sophistication, and the size of the incorporated dictionaries. Development never stops, and the quality of machine translation software is constantly improving.

The MetaMorpho system contains a thorough linguistic description of the source language and attempts to identify the entire structure of each sentence. If it cannot do so, it compiles the translation from the parts it did recognize. Because this happens quite often, the translations are imperfect.

Who is it good for?

A translation of such quality is primarily useful to those who do not speak the language at all, since it gives them an insight into foreign texts. It is also useful to those familiar with the language who want to take a quick glance at a text or skim through several pages, as machine translation takes only approximately 15 seconds to translate a typewritten page. Our aim is to extend the grammatical and lexical database until it reaches a level where an experienced translator can produce the target text in less time by post-editing machine-translated text than by translating from scratch.


The quality of machine translation software can be evaluated using several methods, all of which are based on comparison. Comparisons can be made between two machine translation programs, or between human-translated and machine-translated texts. Below are the results of two tests. In the first, translations produced by MetaMorpho are compared with translations done by human translators, and the result is then set against the results that other (English to German) translation programs achieve against human translations. In the second test, human evaluators grade several English to Hungarian translation programs. Note that the relative ranking of the programs should be considered more reliable than the marks themselves.

Comparison with English to German translation software

The most widely used procedure is the so-called BLEU test. In essence, the test compares a machine-translated text with three human translations of the same text. The BLEU index is calculated from the number of word sequences of various lengths that occur in both the human and the machine translations. The method can even be used to compare the quality of programs that translate into different languages, provided that human translations into those languages are also available. For this test, we prepared the German translation of the English source text, since software working in this language pair currently provides the best quality. The results show that the quality attained by MetaMorpho is nearly on par with that of the best-known programs; moreover, our program produced the longest matching word sequence.
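
As a rough illustration of how such an index can be computed, the following Python sketch implements the commonly published form of BLEU: clipped precision for word sequences of one to four words, combined with a brevity penalty. It is not the script used for the test described above; the function names and the sample sentences are assumptions made for this example, and no smoothing is applied.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, keep the highest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of the clipped
    precisions, multiplied by the brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # The brevity penalty uses the reference length closest to the
    # candidate length (the shorter one in case of a tie).
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical example: one machine output and three human references.
references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
    "a cat sits on the mat".split(),
]
candidate = "the cat is on a mat".split()
print(round(bleu(candidate, references), 3))
```

In practice the index is usually computed over a whole test text rather than a single sentence, so a score for one sentence pair such as this is only indicative.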

Comparison with English to Hungarian translation software

English to Hungarian translation programs can be compared directly. For the comparison, we used a test sequence prepared for English to German translation. Translations were graded on a scale of 1 to 5, as this system is easy to apply when rating machine translation software. The test results show that MetaMorpho provides the best English to Hungarian translations among the commercially available products.
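
The step from individual 1 to 5 grades to a comparison of programs can be sketched as follows. This is only an illustration: the system names and grades are invented, and averaging followed by ranking is an assumed aggregation method, not necessarily the one used in the test.

```python
from statistics import mean

# Hypothetical grades (1 = worst, 5 = best) given by human evaluators
# to the same test sentences; names and numbers are invented.
grades = {
    "system_a": [4, 3, 5, 4],
    "system_b": [3, 3, 4, 2],
    "system_c": [2, 4, 3, 2],
}


def rank_systems(grades_by_system):
    """Return the system names ordered from best to worst mean grade."""
    averages = {name: mean(marks) for name, marks in grades_by_system.items()}
    return sorted(averages, key=averages.get, reverse=True)


print(rank_systems(grades))
```

Only the resulting order is meant to be read off; the average marks themselves depend on how strictly the evaluators grade.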

Sample translation in MorphoWord:

