References
Please cite the following articles if you use the resources on this site. BibTeX sources are available in UTF-8; use \usepackage[utf8]{inputenc} in your LaTeX source.
Attila Novák (2008). Language resources for Uralic minority languages. In: Briony Williams, Mikel L. Forcada, Kepa Sarasola (eds.) Proceedings of the SALTMIL Workshop at LREC 2008: Collaboration: interoperability between people in the creation of language resources for less-resourced languages. pp. 27–32.
Gábor Prószéky, Attila Novák (2005). Computational Morphologies for Small Uralic Languages. In: Arppe Antti, Lauri Carlson, Krister Linden, Jussi Piitulainen, Mickael Suominen, Martti Vainio, Hanna Westerlund, Anssi Yli-Jyrä (eds.)
Inquiries into Word, Constraints and Contexts. (Festschrift in the Honour of Kimmo Koskenniemi on his 60th Birthday). Stanford (CA): CSLI Publications. pp. 150–157.
and, specifically, for Nganasan:
István Endrédy, László Fejes, Attila Novák, Beatrix Oszkó, Gábor Prószéky, Sándor Szeverényi, Zsuzsa Várnai, Beáta Wagner-Nagy (2010). Nganasan – Computational Resources of a Language on the Verge of Extinction.
In: Kepa Sarasola, Francis M. Tyers, Mikel L. Forcada (eds.) Creation and Use of Basic Lexical Resources for Less-Resourced Languages: 7th SaLTMiL Workshop (LREC-2010). pp. 41–44.
for Komi:
Attila Novák (2004). Creating a Morphological Analyzer and Generator for the Komi language. In: Julie Carson-Berndsen (ed.) Proceedings of the SALTMIL Workshop at LREC 2004. pp. 64–67.
for Mansi and Khanti:
László Fejes, Attila Novák (2010). Obi-ugor morfológiai elemzők és korpuszok [Ob-Ugric morphological analyzers and corpora]. In: Attila Tanács, Veronika Vincze (eds.)
VII. Magyar Számítógépes Nyelvészeti Konferencia [Seventh Hungarian Conference on Computational Linguistics]: MSZNY 2010. Szeged: Szegedi Tudományegyetem, 2010. pp. 284–291.
About the resources available on this site
The computational morphologies available on this site have been created in a
series of research projects by researchers of the Department of Finno-Ugric and
Historical Linguistics of the Research Institute for Linguistics of the
Hungarian Academy of Sciences and Attila Novák, a computational linguist
working at MorphoLogic.
The projects that laid the foundations of the tools presented here were funded by the Hungarian Scientific Research Fund (OTKA) and the National Research and Development Programme (NKFP).
The linguists participating in the creation of the individual analyzers and analyzed corpora have
been:
| Nganasan | Beáta Wagner-Nagy, Zsuzsa Várnai, Sándor Szeverényi |
| Komi | László Fejes |
| Udmurt | László Fejes |
| Northern Mansi (3 versions for 3 corpora: Chr. Vog., WT, VNGY) | László Fejes, Nóra Wenszky (translation of glosses and texts to English), Csilla Horváth, Attila Novák (digitization of stem lexicons, disambiguation of texts) |
| Synya Khanti | Eszter Ruttkay-Miklián (stem database, disambiguation of analyses), László Fejes (grammar, suffix database), Nóra Wenszky (translation of glosses and texts to English) |
| Kazym Khanti | Mária Sipos (stem database, disambiguation of analyses), László Fejes (grammar, suffix database), Nóra Wenszky (translation of glosses to English) |
For the Finno-Ugric languages (Komi, Udmurt, Mansi and Khanti), the Humor
morphological analyzer engine of MorphoLogic is used. The Nganasan morphology
was implemented using the Xerox xfst toolset.
The web interface was created by István Endrédy and Attila Novák and is hosted by MorphoLogic.
The analyzers
Komi-Zyryan
The analyzer for standard Komi-Zyryan with Cyrillic orthography is based on the Humor
morphological analyzer engine of MorphoLogic. The lexicon was derived from the Коми-роч кывчукӧр (Komi-Russian
Dictionary) by L. M. Beznosikova, Ye. A. Aybabina and R. I. Kosnyreva
(2000, Коми небӧг лэдзанін / Коми книжное издательство, Syktyvkar). The analyzer was developed by
Attila Novák (technical background) and László Fejes (language specialist).
Acknowledgements
We'd like to thank
- G. V. Fedyuneva, Institute of Language, Literature and History of the Komi Science Center of the Ural Branch of the Russian Academy of Sciences, for the electronic sources of the Komi-Russian Dictionary;
- Nadezhda Manova and Ilya Mityushev for providing the first texts in an electronic format;
- Nikolay Kuznetsov, Tartu University, for consultation.
Mansi (WT)
Analyzers for the Northern Mansi dialect are based on the Humor morphological
analyzer engine of MorphoLogic. This version uses the transcription of the
text collection Wogulische Texte mit einem Glossar (1976, Akadémiai
Kiadó, Budapest) by Béla Kálmán. The lexicon is based on the vocabulary
of the same book. Glosses for the stems are presented in English, German and
Hungarian. The analyzer was developed by Attila Novák (technical background)
and László Fejes (language specialist). The English translation of the
glosses was added by Nóra Wenszky.
Mansi (Chr. Vog.)
Analyzers for the Northern Mansi dialect are based on the Humor morphological
analyzer engine of MorphoLogic. This version uses the transcription of the
text collection Chrestomathia Vogulica (1963, Tankönyvkiadó,
Budapest) by Béla Kálmán. The lexicon is based on the vocabulary of the
same book. Glosses for the stems are presented in English, German and
Hungarian. The analyzer was developed by Attila Novák (technical background)
and László Fejes (language specialist). The English translation of the
glosses was added by Nóra Wenszky. Morphological annotation of the corpus
has been disambiguated by Csilla Horváth and Attila Novák.
Acknowledgements
We'd like to thank
- Katalin Sipőcz, Szeged University, for consultation;
- Csilla Horváth, Szeged University, for digitization of the corpus and lexicon sources and disambiguation of morphological analyses;
- Nóra Wenszky for English glossing and translation of the corpus to English.
Mansi (VNGY)
Analyzers for the Northern Mansi dialect are based on the Humor morphological
analyzer engine of MorphoLogic. This version uses the transcription of the
text collection Vogul Népköltési Gyűjtemény (1892-96, Magyar
Tudományos Akadémia, Budapest) by Bernát Munkácsi. The lexicon is based
on the dictionary Wogulisches Wörterbuch (Gesammelt von Bernát
Munkácsi, geordnet, bearbeitet und herausgegeben von Béla Kálmán,
Akadémiai Kiadó, Budapest, 1986), digitized by Attila Novák and corrected by
László Fejes. Glosses for the stems are presented in English, German and
Hungarian. The analyzer was developed by Attila Novák (technical background)
and László Fejes (language specialist). The English translation of the
glosses was added by Nóra Wenszky.
Kazym Khanti
The analyzer for the Kazym Khanti dialect is based on the Humor morphological
analyzer engine of MorphoLogic. The stem lexicon was created by Mária Sipos.
Glosses for the stems are presented in English and Hungarian. The analyzer
was developed by Attila Novák (technical background) and László Fejes
(language specialist). The English translation of the glosses was added by
Nóra Wenszky. Morphological annotation of the corpus has been disambiguated
by Mária Sipos.
Synya Khanti
The analyzer for the Synya Khanti dialect is based on the Humor morphological
analyzer engine of MorphoLogic. The stem lexicon was created by Eszter
Ruttkay-Miklián. Glosses for the stems are presented in English and
Hungarian. The analyzer was developed by Attila Novák (technical background)
and László Fejes (language specialist). The English translation of the
glosses was added by Nóra Wenszky. Morphological annotation of the corpus
was disambiguated by Eszter Ruttkay-Miklián. The corpus presented on
the website was collected and transcribed by Eszter Ruttkay-Miklián.
Nganasan
The Nganasan analyzer uses a Latin-based phonemic transcription. Several
transcription versions are available on the web interface of the tools; all
are variants of the transcription used in the text collection Chrestomathia
Nganasanica (2002, SZTE Finnugor Tanszék – MTA Nyelvtudományi
Intézet, Szeged – Budapest), edited by Beáta Wagner-Nagy. The lexicon is
based on the vocabulary of the same book and the following dictionary: N. T.
Kosťerkina, A. Č. Momďe, T. Ju. Ždanova. Slovar’ nganasansko-russkij
i russko-nganasanskij. Prosvesčen’ije, Sankt-Peťerburg, 2001.
Glosses are provided in Hungarian. The analyzer was developed by Attila
Novák (grammar and implementation of the computational morphology) and Beáta
Wagner-Nagy, Zsuzsa Várnai and Sándor Szeverényi (language specialists,
lexicon).
Acknowledgements
We'd like to thank
- Valentin Gusev, Institute of Linguistics, Russian Academy of Sciences, for the assistance that he provided to us while working on the development of the morphology.
Udmurt
The analyzer for standard Udmurt with Cyrillic orthography is based on the Humor morphological analyzer engine of MorphoLogic. The lexicon
is derived from the Udmurt–magyar szótár (Udmurt–Hungarian Dictionary) by István Kozmács (2002, Savaria University Press, Szombathely). The meaning
of the stems is given in Hungarian. The analyzer was developed by Attila Novák
(technical background) and László Fejes (language specialist).
Acknowledgements
We'd like to thank
- István Kozmács, Szeged University, for the electronic sources of the Udmurt–Hungarian Dictionary;
- Jorma Luutonen, University of Turku, for providing text in electronic format;
- the publishing house Удмуртия (Udmurtiya) and the editorial offices of Kenesh, Udmurt Dunnye, Vordskem Kyl, Invozho and Kizili for providing text in electronic format;
- Galina Lesnikova, Elena Rodionova, Olga Ignatyeva, and Olga Urasinova for consultation.