Inside Baseball: Coverage, quality, and culture in the Global WordNet

Authors

  • Martin Benjamin, Kamusi Project International, Lausanne

DOI:

https://doi.org/10.11649/cs.1712

Keywords:

wordnet, lexicography, vocabulary, named entities, multilingual

Abstract

The Global WordNet is succeeding in producing relatively open linguistic data that is coordinated to a degree among numerous languages. The project has grown organically, with no overall plan or direction. The result is a certain amount of incoherence in determining what items should be treated in wordnets, and how the various wordnets should aspire to consistent quality. Using the example of terms related to baseball, which constitute a non-trivial portion of the Princeton WordNet, this paper discusses problems of coverage selection both for English and for other languages, as well as methods to improve quality and depth through public review of current content, and contribution of missing terms and definitions. It is proposed that proper names be removed entirely from WordNet and treated as a separate project, and that individual languages produce annexes of indigenous concepts that can be readily considered within sister projects as a supplement to the Anglo-American weighting of the current endeavor. To produce a consistent product that transmits inter-intelligible understanding at a high level across languages, it is proposed that an open committee of interested stakeholders convene to consider the project's goals and develop a roadmap for how to achieve them.

Published

2018-12-20

Section

Challenges for Wordnets