The data-driven Bulgarian WordNet: BTBWN
DOI: https://doi.org/10.11649/cs.1713
Keywords: Bulgarian WordNet, WordNet mappings, data-driven WordNet construction
Abstract
The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a treebank manually annotated with semantic information. Such an approach requires synchronizing the word senses in both resources, the syntactic and the lexical one, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on identifying the senses used in BulTreeBank, while senses of a lemma that are missing from the treebank have also been covered by exploring bigger corpora. The identified senses have been organized into synsets for the Bulgarian WordNet and then aligned to the Princeton WordNet synsets. Various types of mappings between the two resources are considered from a cross-lingual perspective, with the aim of ensuring maximum connectivity and leaving room for incorporating language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval.
License
Copyright (c) 2018 Petya Osenova, Kiril Simov

This work is licensed under a Creative Commons Attribution 3.0 Unported License.



