References

Please cite the following articles if you use the resources on this site. (BibTeX sources are available in UTF-8; use \usepackage[utf8]{inputenc} in your LaTeX source.)

Attila Novák (2008). Language resources for Uralic minority languages. In: Briony Williams, Mikel L Forcada, Kepa Sarasola (eds.) Proceedings of the SALTMIL Workshop at LREC 2008: Collaboration: interoperability between people in the creation of language resources for less-resourced languages. pp. 27–32.

Gábor Prószéky, Attila Novák (2005). Computational Morphologies for Small Uralic Languages. In: Antti Arppe, Lauri Carlson, Krister Linden, Jussi Piitulainen, Mickael Suominen, Martti Vainio, Hanna Westerlund, Anssi Yli-Jyrä (eds.) Inquiries into Word, Constraints and Contexts. (Festschrift in the Honour of Kimmo Koskenniemi on his 60th Birthday). Stanford (CA): CSLI Publications. pp. 150–157.

and, specifically, for Nganasan:

István Endrédy, László Fejes, Attila Novák, Beatrix Oszkó, Gábor Prószéky, Sándor Szeverényi, Zsuzsa Várnai, Beáta Wagner-Nagy (2010). Nganasan – Computational Resources of a Language on the Verge of Extinction. In: Kepa Sarasola, Francis M. Tyers, Mikel L. Forcada (eds.) Creation and Use of Basic Lexical Resources for Less-Resourced Languages: 7th SaLTMiL Workshop (LREC-2010). pp. 41–44.

for Komi:

Attila Novák (2004). Creating a Morphological Analyzer and Generator for the Komi language. In: Julie Carson-Berndsen (ed.) Proceedings of the SALTMIL Workshop at LREC 2004. pp. 64–67.

for Mansi and Khanti:

László Fejes, Attila Novák (2010). Obi-ugor morfológiai elemzők és korpuszok [Ob Ugric morphological analyzers and corpora]. In: Attila Tanács, Veronika Vincze (eds.) VII. Magyar Számítógépes Nyelvészeti Konferencia [Seventh Hungarian Conference on Computational Linguistics]: MSZNY 2010. Szeged: Szegedi Tudományegyetem. pp. 284–291.

About the resources available on this site

The computational morphologies available on this site were created in a series of research projects by researchers of the Department of Finno-Ugric and Historical Linguistics of the Research Institute for Linguistics of the Hungarian Academy of Sciences, together with Attila Novák, a computational linguist working at MorphoLogic.

The following projects, funded by the Hungarian Scientific Research Fund (OTKA) and the National Research and Development Programme (NKFP), laid the foundations of the tools presented here:

OTKA 71707	Ob Ugric morphological analyzers and corpora
OTKA K 60807	Development of a morphological analyzer for Nganasan
OTKA T 048309	Linguistic databases for Permic languages
NKFP-5/135/01	A Complex Uralic Linguistic Database

The linguists who took part in creating the individual analyzers and analyzed corpora were:

Nganasan	Beáta Wagner-Nagy, Zsuzsa Várnai, Sándor Szeverényi
Komi	László Fejes
Udmurt	László Fejes
Northern Mansi (3 versions for 3 corpora: Chr. Vog., WT, VNGY)	László Fejes, Nóra Wenszky (translation of glosses and texts into English), Csilla Horváth, Attila Novák (digitization of stem lexicons, disambiguation of texts)
Synya Khanti	Eszter Ruttkay-Miklián (stem database, disambiguation of analyses), László Fejes (grammar, suffix database), Nóra Wenszky (translation of glosses and texts into English)
Kazym Khanti	Mária Sipos (stem database, disambiguation of analyses), László Fejes (grammar, suffix database), Nóra Wenszky (translation of glosses into English)

For the Finno-Ugric languages (Komi, Udmurt, Mansi and Khanti), the Humor morphological analyzer engine of MorphoLogic is used. The Nganasan morphology was implemented using the Xerox xfst toolset.

The web interface was created by István Endrédy and Attila Novák and is hosted by MorphoLogic.

The analyzers

Komi-Zyryan

The analyzer for standard Komi-Zyryan with Cyrillic orthography is based on the Humor morphological analyzer engine of MorphoLogic. The lexicon was derived from the Коми-роч кывчукӧр (Komi-Russian Dictionary) by L. M. Beznosikova, Ye. A. Aybabina and R. I. Kosnyreva (2000, Коми небӧг лэдзанін / Коми книжное издательство, Syktyvkar). The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist).

Acknowledgements

We'd like to thank

G. V. Fedyuneva, Institute of Language, Literature and History of the Komi Science Centre of the Ural Division of the Russian Academy of Sciences, for the electronic sources of the Komi-Russian Dictionary;
Nadezhda Manova and Ilya Mityushev for providing the first texts in an electronic format;
Nikolay Kuznetsov, Tartu University, for consultation.

Mansi (WT)

Analyzers for the Northern Mansi dialect are based on the Humor morphological analyzer engine of MorphoLogic. This version uses the transcription of the text collection Wogulische Texte mit einem Glossar (1976, Akadémiai Kiadó, Budapest) by Béla Kálmán. The lexicon is based on the vocabulary of the same book. Glosses for the stems are presented in English, German and Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist). The English translation of the glosses was added by Nóra Wenszky.

Acknowledgements

We'd like to thank

Katalin Sipőcz, Szeged University, for consultation;
Csilla Horváth, Szeged University, for digitization of the corpus and lexicon sources;
Nóra Wenszky for English glossing and translation of the corpus into English.

Mansi (Chr. Vog.)

Analyzers for the Northern Mansi dialect are based on the Humor morphological analyzer engine of MorphoLogic. This version uses the transcription of the text collection Chrestomathia Vogulica (1963, Tankönyvkiadó, Budapest) by Béla Kálmán. The lexicon is based on the vocabulary of the same book. Glosses for the stems are presented in English, German and Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist). The English translation of the glosses was added by Nóra Wenszky. Morphological annotation of the corpus was disambiguated by Csilla Horváth and Attila Novák.

Acknowledgements

We'd like to thank

Katalin Sipőcz, Szeged University, for consultation;
Csilla Horváth, Szeged University, for digitization of the corpus and lexicon sources and disambiguation of morphological analyses;
Nóra Wenszky for English glossing and translation of the corpus into English.

Mansi (VNGY)

Analyzers for the Northern Mansi dialect are based on the Humor morphological analyzer engine of MorphoLogic. This version uses the transcription of the text collection Vogul Népköltési Gyűjtemény (1892–96, Magyar Tudományos Akadémia, Budapest) by Bernát Munkácsi. The lexicon is based on the vocabulary Wogulisches Wörterbuch (Gesammelt von Bernát Munkácsi, geordnet, bearbeitet und herausgegeben von Béla Kálmán, Akadémiai Kiadó, Budapest, 1986), digitized by Attila Novák and corrected by László Fejes. Glosses for the stems are presented in English, German and Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist). The English translation of the glosses was added by Nóra Wenszky.

Acknowledgements

We'd like to thank

Katalin Sipőcz, Szeged University, for consultation;
Nóra Wenszky for English glossing.

Kazym Khanti

The analyzer for the Kazym Khanti dialect is based on the Humor morphological analyzer engine of MorphoLogic. The stem lexicon was created by Mária Sipos. Glosses for the stems are presented in English and Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist). The English translation of the glosses was added by Nóra Wenszky. Morphological annotation of the corpus was disambiguated by Mária Sipos.

Synya Khanti

The analyzer for the Synya Khanti dialect is based on the Humor morphological analyzer engine of MorphoLogic. The stem lexicon was created by Eszter Ruttkay-Miklián. Glosses for the stems are presented in English and Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist). The English translation of the glosses was added by Nóra Wenszky. Morphological annotation of the corpus was disambiguated by Eszter Ruttkay-Miklián. The corpus presented on the website was collected and transcribed by Eszter Ruttkay-Miklián.

Nganasan

The Nganasan analyzer uses a Latin-based phonemic transcription. Several transcription versions are available on the web interface of the tools; all are variants of the transcription used in the text collection Chrestomathia Nganasanica (2002, SZTE Finnugor Tanszék – MTA Nyelvtudományi Intézet, Szeged – Budapest), edited by Beáta Wagner-Nagy. The lexicon is based on the vocabulary of the same book and the following dictionary: N. T. Kosťerkina, A. Č. Momďe, T. Ju. Ždanova. Slovar’ nganasansko-russkij i russko-nganasanskij. Prosvesčen’ije, Sankt-Peťerburg, 2001. Glosses are provided in Hungarian. The analyzer was developed by Attila Novák (grammar and implementation of the computational morphology) and Beáta Wagner-Nagy, Zsuzsa Várnai and Sándor Szeverényi (language specialists, lexicon).

Acknowledgements

We'd like to thank

Valentin Gusev, Institute of Linguistics, Russian Academy of Sciences, for his assistance during the development of the morphology.

Udmurt

The analyzer for standard Udmurt with Cyrillic orthography is based on the Humor morphological analyzer engine of MorphoLogic. The lexicon is derived from the Udmurt–magyar szótár (Udmurt–Hungarian Dictionary) by István Kozmács (2002, Savaria University Press, Szombathely). Glosses for the stems are given in Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist).

Acknowledgements

We'd like to thank

István Kozmács, Szeged University, for the electronic sources of the Udmurt–Hungarian Dictionary;
Jorma Luutonen, University of Turku, for providing text in electronic format;
the publishing house Удмуртия (Udmurtiya) and the editorial offices of Kenesh, Udmurt Dunnye, Vordskem Kyl, Invozho and Kizili for providing text in electronic format;
Galina Lesnikova, Elena Rodionova, Olga Ignatyeva, and Olga Urasinova for consultation.