ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation (2303.13716v2)
Abstract: Compositional generalization benchmarks for semantic parsing seek to assess whether models can accurately compute meanings for novel sentences, but operationalize this in terms of logical form (LF) prediction. This raises the concern that semantically irrelevant details of the chosen LFs could shape model performance. We argue that this concern is realized for the COGS benchmark. COGS poses generalization splits that appear impossible for present-day models, which could be taken as an indictment of those models. However, we show that the negative results trace to incidental features of COGS LFs. Converting these LFs to semantically equivalent ones and factoring out capabilities unrelated to semantic interpretation, we find that even baseline models get traction. A recent variable-free translation of COGS LFs suggests similar conclusions, but we observe this format is not semantically equivalent; it is incapable of accurately representing some COGS meanings. These findings inform our proposal for ReCOGS, a modified version of COGS that comes closer to assessing the target semantic capabilities while remaining very challenging. Overall, our results reaffirm the importance of compositional generalization and careful benchmark task design.