Neural Semantic Parsing with Extremely Rich Symbolic Meaning Representations (2404.12698v2)
Abstract: Current open-domain neural semantic parsers show impressive performance. However, closer inspection of the symbolic meaning representations they produce reveals significant weaknesses: they sometimes merely copy character sequences from the source text to form symbolic concepts, defaulting to the most frequent word sense in the training distribution. By leveraging the hierarchical structure of a lexical ontology, we introduce a novel compositional symbolic representation for concepts based on their position in the taxonomical hierarchy. This representation provides richer semantic information and enhances interpretability. We introduce a neural "taxonomical" semantic parser that uses this new representation system of predicates, and compare it with a standard neural semantic parser trained on the traditional meaning representation format, using a novel challenge set and a novel evaluation metric. Our experiments show that the taxonomical model, trained on much richer and more complex meaning representations, performs slightly worse than the traditional model on the standard evaluation metrics, but outperforms it on out-of-vocabulary concepts. This finding is encouraging for research in computational semantics that aims to combine data-driven distributional meanings with knowledge-based symbolic representations.
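To make the idea of representing a concept by its position in a taxonomy concrete, here is a minimal sketch. It is not the encoding used in the paper: the toy hierarchy `PARENT`, the helper names, and the choice of child indices as code symbols are all assumptions made for illustration. The point it shows is that concepts close together in the hierarchy receive codes with a shared prefix, which an opaque label such as a lemma plus sense number does not provide.

```python
# Toy hypernym hierarchy (hypothetical; not taken from WordNet or the paper).
# Each concept maps to its single parent; the root has parent None.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "artifact",
    "car": "vehicle",
}


def path_from_root(concept):
    """Return the list of concepts from the root down to `concept`."""
    path = []
    node = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def children(node):
    """Return the children of `node` in a fixed (alphabetical) order."""
    return sorted(c for c, p in PARENT.items() if p == node)


def encode(concept):
    """Encode a concept as the tuple of child indices taken along its root path.

    Assumption for illustration: one symbol per taxonomy level.
    """
    path = path_from_root(concept)
    return tuple(
        children(parent).index(child) for parent, child in zip(path, path[1:])
    )


def shared_prefix_len(code_a, code_b):
    """Count the leading positions on which two codes agree."""
    n = 0
    for x, y in zip(code_a, code_b):
        if x != y:
            break
        n += 1
    return n


if __name__ == "__main__":
    for name in ("dog", "cat", "car"):
        print(name, encode(name))
    print(shared_prefix_len(encode("dog"), encode("cat")))
    print(shared_prefix_len(encode("dog"), encode("car")))
```

In this toy setting, `dog` and `cat` agree on their first symbol because both sit under `animal`, while `dog` and `car` diverge at the first level. A parser that predicts such a code for an unseen concept can be partly right by getting the leading symbols correct, which is one intuition for why a hierarchy-based representation may help with out-of-vocabulary concepts.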