Inside Baseball: Coverage, quality, and culture in the Global WordNet
DOI:
https://doi.org/10.11649/cs.1712
Keywords:
wordnet, lexicography, vocabulary, named entities, multilingual
Abstract
The Global WordNet is succeeding in producing relatively open linguistic data that is coordinated to a degree among numerous languages. The project has grown organically, with no overall plan or direction. The result is a certain amount of incoherence in determining what items should be treated in wordnets, and how the various wordnets should aspire to consistent quality. Using the example of terms related to baseball, which constitute a non-trivial portion of the Princeton WordNet, this paper discusses problems of coverage selection both for English and for other languages, as well as methods to improve quality and depth through public review of current content, and contribution of missing terms and definitions. It is proposed that proper names be removed entirely from WordNet and treated as a separate project, and that individual languages produce annexes of indigenous concepts that can be readily considered within sister projects as a supplement to the Anglo-American weighting of the current endeavor. To produce a consistent product that transmits inter-intelligible understanding at a high level across languages, it is proposed that an open committee of interested stakeholders convene to consider the project's goals and develop a roadmap for how to achieve them.
License
Copyright (c) 2018 Martin Benjamin

This work is licensed under a Creative Commons Attribution 3.0 Unported License.



