The System of Register Labels in plWordNet

Authors

  • Marek Maziarz, Politechnika Wrocławska [Wrocław University of Technology], Wrocław
  • Maciej Piasecki, Politechnika Wrocławska [Wrocław University of Technology], Wrocław
  • Stan Szpakowicz, School of Electrical Engineering & Computer Science, University of Ottawa, Ottawa

DOI:

https://doi.org/10.11649/cs.2015.013

Keywords:

wordnets, plWordNet, lexical register, large-scale wordnet expansion, inter-annotator agreement

Abstract

Stylistic registers influence word usage. Both traditional dictionaries and wordnets assign lexical units to registers, and there is a wide range of solutions. A system of register labels can be flat or hierarchical, with few labels or many, homogeneous or decomposed into sets of elementary features. We review the register label systems in lexicography, and then discuss our model, designed for plWordNet, a large wordnet for Polish. There follows a detailed comparative analysis of several register systems in Polish lexical resources. We also present the practical effect of the adoption of our flat, small and homogeneous system: a relatively high consistency of register assignment in plWordNet, as measured by inter-annotator agreement on a manageable sample. Large-scale conclusions for the whole plWordNet remain to be made once the annotation has been completed, but the experience half-way through this labour-intensive exercise is very encouraging.
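
As an illustration of how agreement between two annotators on register labels can be quantified, the sketch below computes Cohen's kappa, a chance-corrected agreement coefficient. The abstract does not say which coefficient was used, so the choice of Cohen's kappa is an assumption; the function name and the register labels in the example are hypothetical placeholders, not the plWordNet label set.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    Assumption: both annotators labelled the same lexical units, in the
    same order, each with exactly one register label.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty sample")

    n = len(labels_a)

    # Observed agreement: share of units given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    all_labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[label] * freq_b[label] for label in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical register labels for four lexical units.
annotator_a = ["general", "colloquial", "general", "vulgar"]
annotator_b = ["general", "colloquial", "colloquial", "vulgar"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

In this toy sample the annotators agree on three of four units, and kappa discounts the agreement expected by chance. A small, flat label set keeps the chance-agreement term easy to interpret, which is one reason such a coefficient is practical for a sample of manageable size.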

Published

2015-12-31

Section

Semantics, Corpus Linguistics and Computer Linguistics
