References

Please cite the following articles if you use the resources on this site (BibTeX sources are available in UTF-8; use \usepackage[utf8]{inputenc} in your LaTeX source):

Attila Novák (2008). Language resources for Uralic minority languages. In: Briony Williams, Mikel L. Forcada, Kepa Sarasola (eds.) Proceedings of the SALTMIL Workshop at LREC 2008: Collaboration: interoperability between people in the creation of language resources for less-resourced languages. pp. 27–32.

Gábor Prószéky, Attila Novák (2005). Computational Morphologies for Small Uralic Languages. In: Antti Arppe, Lauri Carlson, Krister Lindén, Jussi Piitulainen, Mickael Suominen, Martti Vainio, Hanna Westerlund, Anssi Yli-Jyrä (eds.) Inquiries into Words, Constraints and Contexts. (Festschrift in the Honour of Kimmo Koskenniemi on his 60th Birthday). Stanford (CA): CSLI Publications. pp. 150–157.

and, specifically, for Nganasan:

István Endrédy, László Fejes, Attila Novák, Beatrix Oszkó, Gábor Prószéky, Sándor Szeverényi, Zsuzsa Várnai, Beáta Wagner-Nagy (2010). Nganasan – Computational Resources of a Language on the Verge of Extinction. In: Kepa Sarasola, Francis M. Tyers, Mikel L. Forcada (eds.) Creation and Use of Basic Lexical Resources for Less-Resourced Languages: 7th SaLTMiL Workshop (LREC-2010). pp. 41–44.

for Komi:

Attila Novák (2004). Creating a Morphological Analyzer and Generator for the Komi language. In: Julie Carson-Berndsen (ed.) Proceedings of the SALTMIL Workshop at LREC 2004. pp. 64–67.

for Mansi and Khanti:

László Fejes, Attila Novák (2010). Obi-ugor morfológiai elemzők és korpuszok [Ob-Ugric morphological analyzers and corpora]. In: Attila Tanács, Veronika Vincze (eds.) VII. Magyar Számítógépes Nyelvészeti Konferencia [Seventh Hungarian Conference on Computational Linguistics]: MSZNY 2010. Szeged: Szegedi Tudományegyetem. pp. 284–291.
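
As a minimal sketch of how the UTF-8 BibTeX sources mentioned above might be used in a LaTeX document: the file name uralic-resources.bib and the citation key novak2004komi are hypothetical placeholders (the actual names depend on the downloaded BibTeX file), and loading fontenc with T1 is an assumption that is often useful for accented Hungarian names but is not required by this site.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc} % read the UTF-8 encoded BibTeX entries correctly
\usepackage[T1]{fontenc}    % assumption: T1 font encoding for accented letters

\begin{document}

% "novak2004komi" is a hypothetical key; use the key found in the BibTeX file.
The Komi analyzer is described in \cite{novak2004komi}.

\bibliographystyle{plain}
% "uralic-resources" is a hypothetical name for the downloaded .bib file.
\bibliography{uralic-resources}

\end{document}
```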

About the resources available on this site

The computational morphologies available on this site were created in a series of research projects by researchers of the Department of Finno-Ugric and Historical Linguistics of the Research Institute for Linguistics of the Hungarian Academy of Sciences, together with Attila Novák, a computational linguist working at MorphoLogic.

The following projects, funded by the Hungarian Scientific Research Fund (OTKA) and the National Research and Development Programme (NKFP), laid the foundations of the tools presented here:

The following linguists participated in creating the individual analyzers and analyzed corpora:

For the Finno-Ugric languages (Komi, Udmurt, Mansi and Khanti), the Humor morphological analyzer engine of MorphoLogic is used. The Nganasan morphology was implemented using the Xerox xfst toolset.

The web interface was created by István Endrédy and Attila Novák and is hosted by MorphoLogic.

The analyzers

Komi-Zyryan

The analyzer for standard Komi-Zyryan with Cyrillic orthography is based on the Humor morphological analyzer engine of MorphoLogic. The lexicon was derived from the Коми-роч кывчукӧр (Komi-Russian Dictionary) by L. M. Beznosikova, Ye. A. Aybabina and R. I. Kosnyreva (2000, Коми небӧг лэдзанін / Коми книжное издательство, Syktyvkar). The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist).

Acknowledgements

We'd like to thank

Mansi (WT)

Analyzers for the Northern Mansi dialect are based on the Humor morphological analyzer engine of MorphoLogic. This version uses the transcription of the text collection Wogulische Texte mit einem Glossar (1976, Akadémiai Kiadó, Budapest) by Béla Kálmán. The lexicon is based on the vocabulary of the same book. Glosses for the stems are presented in English, German and Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist). The English translation of the glosses was added by Nóra Wenszky.

Acknowledgements

We'd like to thank

Mansi (Chr. Vog.)

Analyzers for the Northern Mansi dialect are based on the Humor morphological analyzer engine of MorphoLogic. This version uses the transcription of the text collection Chrestomathia Vogulica (1963, Tankönyvkiadó, Budapest) by Béla Kálmán. The lexicon is based on the vocabulary of the same book. Glosses for the stems are presented in English, German and Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist). The English translation of the glosses was added by Nóra Wenszky. Morphological annotation of the corpus has been disambiguated by Csilla Horváth and Attila Novák.

Acknowledgements

We'd like to thank

Mansi (VNGY)

Analyzers for the Northern Mansi dialect are based on the Humor morphological analyzer engine of MorphoLogic. This version uses the transcription of the text collection Vogul Népköltési Gyűjtemény (1892–96, Magyar Tudományos Akadémia, Budapest) by Bernát Munkácsi. The lexicon is based on the dictionary Wogulisches Wörterbuch (Gesammelt von Bernát Munkácsi, geordnet, bearbeitet und herausgegeben von Béla Kálmán, Akadémiai Kiadó, Budapest, 1986), digitized by Attila Novák and corrected by László Fejes. Glosses for the stems are presented in English, German and Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist). The English translation of the glosses was added by Nóra Wenszky.

Acknowledgements

We'd like to thank

Kazym Khanti

The analyzer for the Kazym Khanti dialect is based on the Humor morphological analyzer engine of MorphoLogic. The stem lexicon was created by Mária Sipos. Glosses for the stems are presented in English and Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist). The English translation of the glosses was added by Nóra Wenszky. Morphological annotation of the corpus has been disambiguated by Mária Sipos.

Synya Khanti

The analyzer for the Synya Khanti dialect is based on the Humor morphological analyzer engine of MorphoLogic. The stem lexicon was created by Eszter Ruttkay-Miklián. Glosses for the stems are presented in English and Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist). The English translation of the glosses was added by Nóra Wenszky. Morphological annotation of the corpus was disambiguated by Eszter Ruttkay-Miklián. The corpus presented on the website was collected and transcribed by Eszter Ruttkay-Miklián.

Nganasan

The Nganasan analyzer uses a Latin-based phonemic transcription. Various transcription versions are available on the web interface of the tools; they are variants of the transcription used in the text collection Chrestomathia Nganasanica (2002, SZTE Finnugor Tanszék – MTA Nyelvtudományi Intézet, Szeged – Budapest), edited by Beáta Wagner-Nagy. The lexicon is based on the vocabulary of the same book and the following dictionary: N. T. Kosťerkina, A. Č. Momďe, T. Ju. Ždanova. Slovar’ nganasansko-russkij i russko-nganasanskij. Prosvesčen’ije, Sankt-Peťerburg, 2001. Glosses are provided in Hungarian. The analyzer was developed by Attila Novák (grammar and implementation of the computational morphology) and Beáta Wagner-Nagy, Zsuzsa Várnai and Sándor Szeverényi (language specialists, lexicon).

Acknowledgements

We'd like to thank

Udmurt

The analyzer for standard Udmurt with Cyrillic orthography is based on the Humor morphological analyzer engine of MorphoLogic. The lexicon is derived from the Udmurt–magyar szótár (Udmurt–Hungarian Dictionary) by István Kozmács (2002, Savaria University Press, Szombathely). Glosses for the stems are given in Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist).

Acknowledgements

We'd like to thank