Application of multilingual corpus in contrastive studies (on the example of the Bulgarian-Polish-Lithuanian parallel corpus)

Ludmila Dimitrova; Violetta Koseska-Toszewa; Danuta Roszko; Roman Roszko

doi:10.11649/cs.2010.013

Authors

Ludmila Dimitrova Институт по математикa и информатика, Българска академия на науките [Institute of Mathematics and Informatics, Bulgarian Academy of Sciences], София [Sofia]
Violetta Koseska-Toszewa Instytut Slawistyki PAN [Institute of Slavic Studies, Polish Academy of Sciences], Warszawa [Warsaw]
Danuta Roszko Instytut Slawistyki PAN [Institute of Slavic Studies, Polish Academy of Sciences], Warszawa [Warsaw]
Roman Roszko Instytut Slawistyki PAN [Institute of Slavic Studies, Polish Academy of Sciences], Warszawa [Warsaw]

DOI:

https://doi.org/10.11649/cs.2010.013

Keywords:

multilingual electronic corpora, parallel and comparable corpora, corpus annotation, lexical databases, multilingual electronic dictionaries

Abstract

In this paper we present applications of a trilingual corpus in language research. Comparative and contrastive studies of Polish and Bulgarian, as well as of Polish and Lithuanian, have already been conducted, but to the best of our knowledge no such studies exist for Bulgarian and Lithuanian. On the one hand, it is interesting to note that two Slavic languages are compared to a Baltic language (Lithuanian). On the other hand, the three languages are marginally present in the EU because of the later accession of the three countries to the EU. The paper briefly describes the first electronic Bulgarian–Polish–Lithuanian experimental corpus, which is currently under development for research purposes only. We also focus on the morphosyntactic annotation of the parallel trilingual corpus according to the Corpus Encoding Standard: we review the Part-of-Speech (POS) classification of the participle in the three languages (Bulgarian, Polish, and Lithuanian) in comparison to another POS, the adjective. With some examples, we briefly discuss tagsets for corpus annotation with a view to their possible future unification.
