Application of multilingual corpus in contrastive studies (on the example of the Bulgarian-Polish-Lithuanian parallel corpus)

Authors

  • Ludmila Dimitrova, Институт по математика и информатика, Българска академия на науките [Institute of Mathematics and Informatics, Bulgarian Academy of Sciences], София [Sofia]
  • Violetta Koseska-Toszewa, Instytut Slawistyki PAN [Institute of Slavic Studies, Polish Academy of Sciences], Warszawa [Warsaw]
  • Danuta Roszko, Instytut Slawistyki PAN [Institute of Slavic Studies, Polish Academy of Sciences], Warszawa [Warsaw]
  • Roman Roszko, Instytut Slawistyki PAN [Institute of Slavic Studies, Polish Academy of Sciences], Warszawa [Warsaw]

DOI:

https://doi.org/10.11649/cs.2010.013

Keywords:

multilingual electronic corpora, parallel and comparable corpora, corpus annotation, lexical databases, multilingual electronic dictionaries

Abstract

In this paper we present applications of a trilingual corpus in language research. Comparative and contrastive studies of Polish and Bulgarian, as well as of Polish and Lithuanian, have already been conducted, but to the best of our knowledge no such studies exist for Bulgarian and Lithuanian. On the one hand, it is interesting to note that two Slavic languages are compared with a Baltic language (Lithuanian). On the other hand, the three languages are only marginally present in the EU because the three countries acceded to the EU relatively late. The paper briefly describes the first electronic Bulgarian–Polish–Lithuanian experimental corpus, which is currently under development for research purposes only. We also focus on the morphosyntactic annotation of the parallel trilingual corpus according to the Corpus Encoding Standard: we review the Part-of-Speech (POS) classification of the participle in the three languages (Bulgarian, Polish, and Lithuanian) in comparison with another POS, the adjective. Using some examples, we briefly discuss tagsets for corpus annotation with a view to their possible future unification.

Published

2015-11-24

Section

Around the Problems of Language Corpora and Electronic Dictionaries in Slavic Languages