
Synthetic Dataset for Evaluating Complex Compositional Knowledge for Natural Language Inference (2307.05034v3)

Published 11 Jul 2023 in cs.CL

Abstract: We introduce a synthetic dataset called Sentences Involving Complex Compositional Knowledge (SICCK) and a novel analysis that investigates the ability of Natural Language Inference (NLI) models to understand compositionality in logic. We produce 1,304 sentence pairs by modifying 15 examples from the SICK dataset (Marelli et al., 2014). To this end, we modify the original texts using a set of phrases: modifiers that correspond to universal quantifiers, existential quantifiers, negation, and other concept modifiers in Natural Logic (NL) (MacCartney, 2009). We use these phrases to modify the subject, verb, and object parts of the premise and hypothesis. Lastly, we annotate these modified texts with the corresponding entailment labels following NL rules. We conduct a preliminary verification of how well neural NLI models capture the change in structural and semantic composition, in both zero-shot and fine-tuned scenarios. We find that the performance of NLI models in the zero-shot setting is poor, especially for modified sentences with negation and existential quantifiers. After fine-tuning on this dataset, models continue to perform poorly on negation, existential, and universal modifiers.
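To make the construction concrete, the sketch below illustrates one way such modified pairs could be generated: prepending quantifier and negation phrases to the subject of a SICK-style premise/hypothesis pair. The modifier inventory, the naive determiner-swap heuristic, and the example sentences are illustrative assumptions, not the authors' pipeline, which also modifies the verb and object and assigns entailment labels using Natural Logic rules.

```python
# Minimal sketch (not the authors' code) of SICCK-style pair generation:
# prepend Natural Logic modifiers to the subject of a premise/hypothesis pair.
from itertools import product

# Hypothetical modifier inventory grouped by Natural Logic category.
MODIFIERS = {
    "universal": ["every", "all"],
    "existential": ["some", "a few"],
    "negation": ["no", "not every"],
}

def modify_subject(sentence: str, modifier: str) -> str:
    """Swap a leading determiner ('a'/'an'/'the') for the given modifier.

    A real pipeline would locate the subject NP with a parser; this naive
    string split is only for illustration.
    """
    words = sentence.split()
    if words and words[0].lower() in {"a", "an", "the"}:
        words = words[1:]
    return f"{modifier} {' '.join(words)}"

def generate_pairs(premise: str, hypothesis: str):
    """Yield (modified premise, modified hypothesis, modifier categories)."""
    for (p_cat, p_mods), (h_cat, h_mods) in product(MODIFIERS.items(), repeat=2):
        for p_mod, h_mod in product(p_mods, h_mods):
            yield (
                modify_subject(premise, p_mod),
                modify_subject(hypothesis, h_mod),
                (p_cat, h_cat),
            )

if __name__ == "__main__":
    # Example pair in the style of SICK (Marelli et al., 2014).
    premise = "A man is playing a guitar"
    hypothesis = "A person is playing an instrument"
    for p, h, cats in list(generate_pairs(premise, hypothesis))[:5]:
        print(cats, "|", p, "=>", h)
```

Crossing each modifier category on both sides of the pair is what yields the combinatorial growth from 15 seed examples to roughly 1,300 labeled pairs; the entailment label for each combination then follows from the monotonicity behavior of the chosen modifiers.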

References (34)
  1. Lasha Abzianidze and Johan Bos. 2017. Towards universal semantic tagging. arXiv preprint arXiv:1709.10381.
  2. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642, Lisbon, Portugal. Association for Computational Linguistics.
  3. The SNLI corpus.
  4. Recursive neural networks can learn logical semantics.
  5. Using the framework. The FraCaS Consortium. Technical report, FraCaS deliverable D-16.
  6. Evaluating compositionality in sentence embeddings. arXiv preprint arXiv:1802.04302.
  7. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
  8. Maayan Geffet and Ido Dagan. 2005. The distributional inclusion hypotheses and lexical entailment. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), pages 107–114.
  9. Logical inferences with comparatives and generalized quantifiers.
  10. An analysis of negation in natural language understanding corpora. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 716–723, Dublin, Ireland. Association for Computational Linguistics.
  11. An analysis of natural language inference benchmarks through the lens of negation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9106–9118, Online. Association for Computational Linguistics.
  12. MonaLog: A lightweight system for natural language inference based on monotonicity. arXiv preprint arXiv:1910.08772.
  13. Does data augmentation improve generalization in NLP? arXiv preprint arXiv:2004.15012.
  14. TaxiNLI: Taking a ride up the NLU hill. arXiv preprint arXiv:2009.14505.
  15. ALBERT: A Lite BERT for self-supervised learning of language representations.
  16. Bill MacCartney. 2009. Natural language inference. Stanford University.
  17. A SICK cure for the evaluation of compositional distributional semantic models. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), pages 216–223, Reykjavik, Iceland. European Language Resources Association (ELRA).
  18. George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41.
  19. Adversarial NLI: A new benchmark for natural language understanding. arXiv preprint arXiv:1910.14599.
  20. Adversarial NLI: A new benchmark for natural language understanding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.
  21. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), pages 79–86. Association for Computational Linguistics.
  22. A decomposable attention model for natural language inference. ArXiv, abs/1606.01933.
  23. EQUATE: A benchmark evaluation framework for quantitative reasoning in natural language inference.
  24. Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks.
  25. Probing natural language inference models through semantic fragments. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):8713–8721.
  26. Decomposing natural logic inferences in neural NLI. arXiv preprint arXiv:2112.08289.
  27. Mark Steedman and Jason Baldridge. 2011. Combinatory categorial grammar. Non-Transformational Syntax: Formal and Explicit Models of Grammar. Wiley-Blackwell, pages 181–224.
  28. Parsing as tagging. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5225–5231, Marseille, France. European Language Resources Association.
  29. InfoBERT: Improving robustness of language models from an information theoretic perspective. In International Conference on Learning Representations.
  30. Entailment as few-shot learner. arXiv preprint arXiv:2104.14690.
  31. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122. Association for Computational Linguistics.
  32. Can neural networks understand monotonicity reasoning? In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 31–40.
  33. HELP: A dataset for identifying shortcomings of neural models in monotonicity reasoning. arXiv preprint arXiv:1904.12166.
  34. XLNet: Generalized autoregressive pretraining for language understanding.
Authors (5)
  1. Sushma Anand Akoju (1 paper)
  2. Robert Vacareanu (12 papers)
  3. Haris Riaz (5 papers)
  4. Eduardo Blanco (26 papers)
  5. Mihai Surdeanu (53 papers)
Citations (1)