CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning (2401.07286v2)

Published 14 Jan 2024 in cs.CL

Abstract: The sequential process of conceptualization and instantiation is essential to generalizable commonsense reasoning as it allows the application of existing knowledge to unfamiliar scenarios. However, existing works tend to undervalue the step of instantiation and heavily rely on pre-built concept taxonomies and human annotations to collect both types of knowledge, resulting in a lack of instantiated knowledge to complete reasoning, high cost, and limited scalability. To tackle these challenges, we introduce CANDLE, a distillation framework that iteratively performs contextualized conceptualization and instantiation over commonsense knowledge bases by instructing LLMs to generate both types of knowledge with critic filtering. By applying CANDLE to ATOMIC, we construct a comprehensive knowledge base comprising six million conceptualizations and instantiated commonsense knowledge triples. Both types of knowledge are firmly rooted in the original ATOMIC dataset, and intrinsic evaluations demonstrate their exceptional quality and diversity. Empirical results indicate that distilling CANDLE on student models provides benefits across four downstream tasks. Our code, data, and models are publicly available at https://github.com/HKUST-KnowComp/CANDLE.
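The iterative pipeline the abstract describes can be sketched as a loop that conceptualizes each triple's head event, instantiates the resulting concepts into new events, and keeps only candidates that pass a critic filter. The sketch below is a hypothetical illustration, not the authors' implementation: `generate` and `critic` are toy stand-ins for the instruction-following LLM and the trained plausibility critic used in the paper.

```python
# Hypothetical sketch of a CANDLE-style iterative distillation round.
# `generate` and `critic` are toy stubs standing in for an LLM and a
# trained plausibility critic, so the control flow runs end to end.

def generate(step, text):
    """Toy LLM call: 'conceptualize' abstracts a head event into concepts;
    'instantiate' expands a concept into concrete candidate events."""
    if step == "conceptualize":
        return [f"concept of {text}"]
    return [f"{text} example {i}" for i in range(3)]

def critic(triple):
    """Toy plausibility score in [0, 1]; a real critic would be a
    discriminatively trained filter over candidate triples."""
    return 0.2 if "example 1" in triple[0] else 0.9

def candle_round(kb, threshold=0.5):
    """One round over the knowledge base: conceptualize each head,
    instantiate each concept, and keep triples scoring >= threshold."""
    new_triples = []
    for head, rel, tail in kb:
        for concept in generate("conceptualize", head):
            for instance in generate("instantiate", concept):
                candidate = (instance, rel, tail)
                if critic(candidate) >= threshold:
                    new_triples.append(candidate)
    return kb + new_triples

# Seed with one ATOMIC-style triple and iterate; each round distills
# new instantiated knowledge grounded in the previous round's output.
kb = [("PersonX drinks coffee", "xWant", "to stay awake")]
for _ in range(2):
    kb = candle_round(kb)
print(len(kb))
```

With the stubs above, each round adds two surviving instantiations per head, so the toy knowledge base grows from 1 to 3 to 9 triples; in the real framework the same loop, run over ATOMIC with an actual LLM and critic, yields the six-million-triple resource.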

Authors (12)
  1. Weiqi Wang
  2. Tianqing Fang
  3. Chunyang Li
  4. Haochen Shi
  5. Wenxuan Ding
  6. Baixuan Xu
  7. Zhaowei Wang
  8. Jiaxin Bai
  9. Xin Liu
  10. Jiayang Cheng
  11. Chunkit Chan
  12. Yangqiu Song