MetaMorpho TermX is an English and Hungarian terminology extractor program. Possible fields of application:
- Translators, especially when they are working in teams, create glossaries so that terminology is used consistently. Ideally, this is done before they start translating the text. This tool helps translators mark frequent words and phrases as terminology and also provides translations. Automatic translation works both from Hungarian to English and from English to Hungarian.
- Glossaries can also help those who want to get the gist of a text without reading it in full. A list of frequent words can be produced in seconds and gives a comprehensive overview of the contents.
- This tool can also help to create the index of a book.
- Terminology extraction is also helpful in machine translation. Defining terminology and its correct translation before automatic translation can significantly improve the quality of the output.
The program incorporates several kinds of linguistic knowledge. The most frequent words are excluded by stopword lists. Besides the built-in language-dependent stopword lists, users can define their own stopwords, so that unwanted terms do not turn up again each time the extractor is run. Words are analyzed and stemmed by morphological analyzers. This ensures that word occurrences are counted correctly and differences in inflection are ignored. Phrases are identified using syntactic information. Finally, MetaMorpho TermX contains built-in machine translation software that can automatically translate the terminology. The machine-translated glossary might need post-editing, but in most cases the translation is acceptable.
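The counting step described above can be illustrated with a minimal Python sketch. It is not the program's own implementation: the stopword set, the `toy_stem` function (a crude plural stripper standing in for a real morphological analyzer) and the `count_terms` helper are all hypothetical names chosen for this example.

```python
import re
from collections import Counter

# Hypothetical, tiny built-in stopword list (the real program ships its own lists).
STOPWORDS = {"the", "a", "of", "is", "are", "and", "in"}


def toy_stem(word: str) -> str:
    """Crude stand-in for a morphological analyzer: strips a plural -s only."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def count_terms(text, extra_stopwords=frozenset()):
    """Count stems in the text, skipping built-in and user-defined stopwords."""
    stop = STOPWORDS | set(extra_stopwords)
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = (toy_stem(token) for token in tokens)
    return Counter(stem for stem in stems if stem not in stop)


sample = "The glossary lists terms. A term in the glossary is a translated term."
counts = count_terms(sample)

# "terms" and "term" are counted together because both reduce to the same stem.
print(counts.most_common(2))

# A user-defined stopword removes an unwanted candidate on the next run.
print(count_terms(sample, extra_stopwords={"list"}))
```

Because inflected forms are reduced to one stem before counting, "term" and "terms" contribute to a single frequency, which is the effect the morphological analysis is meant to achieve.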
The program offers many filters: users can set the minimum frequency of words and the number of words in phrases. There are also filters for unknown words or proper names, and for the part of speech of words. The list can be sorted alphabetically or by frequency, and the sort direction can be changed easily.
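As a rough illustration of how such filters and sort options combine, the following Python sketch filters a small candidate list by frequency, phrase length and part of speech, and then orders it. The `Candidate` class, the function names and the sample data are assumptions made for this example and do not reflect the program's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    term: str
    frequency: int
    pos: str  # part of speech, e.g. "noun" or "verb" (hypothetical labels)


def filter_candidates(cands, min_freq=1, min_words=1, max_words=None, pos=None):
    """Keep candidates that pass the frequency, phrase-length and POS filters."""
    kept = []
    for cand in cands:
        n_words = len(cand.term.split())
        if cand.frequency < min_freq or n_words < min_words:
            continue
        if max_words is not None and n_words > max_words:
            continue
        if pos is not None and cand.pos not in pos:
            continue
        kept.append(cand)
    return kept


def sort_candidates(cands, by="frequency", descending=True):
    """Order by frequency or alphabetically, in either direction."""
    if by == "frequency":
        return sorted(cands, key=lambda c: c.frequency, reverse=descending)
    return sorted(cands, key=lambda c: c.term.lower(), reverse=descending)


# Invented sample data for illustration only.
candidates = [
    Candidate("translation memory", 7, "noun"),
    Candidate("glossary", 12, "noun"),
    Candidate("extract", 3, "verb"),
    Candidate("stopword list", 2, "noun"),
]

nouns = filter_candidates(candidates, min_freq=3, pos={"noun"})
by_frequency = sort_candidates(nouns, by="frequency", descending=True)
alphabetical = sort_candidates(candidates, by="alphabet", descending=False)

print([c.term for c in by_frequency])
print([c.term for c in alphabetical])
```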
The program recognizes the most important file formats: DOC, DOCX, XLS, RTF, HTML, PDF, XML and TXT. To handle these formats (except TXT and PDF), the program requires Microsoft Office to be installed on your computer.
Lists can be exported in TSV and TMX formats. TSV (tab-separated values, a close relative of CSV) is a simple plain text table in which the fields are separated by tab characters. This format is highly portable: tab-separated tables can be opened in practically any spreadsheet application, e.g. Microsoft Excel. The TMX format may be familiar to translators. TMX (Translation Memory eXchange) is used to export and import data between different translation memory (TM) applications.
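To give an idea of what the two export formats look like, here is a minimal Python sketch that writes the same glossary as a tab-separated table and as a small TMX document. It uses only the standard library; the function names, the column headings, the header attribute values and the sample term pairs are assumptions for illustration, and the files produced by the program itself may differ in detail.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_tsv(pairs):
    """Render (source, target) pairs as a tab-separated table with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target"])  # hypothetical column names
    writer.writerows(pairs)
    return buf.getvalue()


def to_tmx(pairs, src_lang="en", tgt_lang="hu"):
    """Render (source, target) pairs as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",  # placeholder values, not the program's
        "creationtoolversion": "0.1",
        "segtype": "phrase",
        "o-tmf": "tsv",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per term pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Invented sample glossary entries (English to Hungarian).
pairs = [("glossary", "szójegyzék"), ("translation memory", "fordítómemória")]

tsv_text = to_tsv(pairs)
tmx_text = to_tmx(pairs)
print(tsv_text)
print(tmx_text)
```

The TSV output can be opened directly in a spreadsheet, while the TMX output wraps each term pair in a translation unit (`tu`) with one variant (`tuv`) per language.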
This program is a novelty. Efficient terminology extraction without linguistic background is impossible. TermX uses stopword lists, morphology, syntactic and translation rules. Its intuitive user interface is easy to use. TermX is reasonably priced, as it is not an add-on of an expensive translation memory tool.
Test
A free trial version is available for download. The trial offers full functionality for a 7-day evaluation period, but the number of characters it can process is limited.
System
IBM compatible PC, 512 MB RAM, 500 MB HDD; Windows 2000, XP, Vista or Windows 7 operating system.
Extraction from Microsoft Office, HTML and XML files requires Microsoft Office 2000, XP or 2003; the DOCX format requires Microsoft Office 2007.
Activation
The program requires activation before use. Activation is automatic on computers with an internet connection; otherwise it is done by e-mail.
Screenshots
An extraction result:
Glossary compilation is supported by machine translation:
Filters:
Selection of the right meaning is supported by concordances: