NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation
Abstract: We present NovaCOMET, an open commonsense knowledge model that combines the best aspects of knowledge models and general task models. Compared to previous knowledge models, NovaCOMET allows open-format relations, enabling direct application to reasoning tasks; compared to general task models like Flan-T5, it explicitly centers knowledge, enabling superior performance on commonsense reasoning. NovaCOMET leverages the knowledge of opaque proprietary models to create an open knowledge pipeline. First, knowledge is symbolically distilled into NovATOMIC, a publicly released discrete knowledge graph that can be audited, critiqued, and filtered. Next, we train NovaCOMET on NovATOMIC by fine-tuning an open-source pretrained model. NovaCOMET uses an open-format training objective that replaces the fixed relation sets of past knowledge models, so that arbitrary structures within the data can serve as inputs or outputs. The resulting generation model, optionally augmented with human annotation, matches or exceeds comparable open task models like Flan-T5 on a range of commonsense generation tasks. NovaCOMET serves as a counterexample to the contemporary focus on instruction tuning alone, demonstrating a distinct advantage to explicitly modeling commonsense knowledge as well.
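To make the open-format objective more concrete, here is a minimal sketch of how one knowledge record might be turned into a training pair in which any subset of fields is the output. The field names (`context`, `query`, `inference`), the `make_example` helper, and the `<extra_id_N>` sentinel convention (borrowed from T5-style span infilling) are all assumptions for illustration; the abstract does not specify the actual schema or masking scheme.

```python
from typing import Dict, List, Tuple

# Hypothetical field names: the abstract does not define a record schema.
FIELDS = ("context", "query", "inference")


def make_example(record: Dict[str, str], targets: List[str]) -> Tuple[str, str]:
    """Turn one knowledge record into an (input, output) text pair.

    Fields named in `targets` are hidden behind sentinel tokens in the
    input and must be generated in the output; all other fields stay
    visible. Any subset of fields can therefore act as input or output.
    """
    unknown = set(targets) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    source_parts, target_parts = [], []
    sentinel = 0
    for name in FIELDS:
        if name in targets:
            # Assumed T5-style sentinel marking a span to be filled in.
            token = f"<extra_id_{sentinel}>"
            sentinel += 1
            source_parts.append(f"{name}: {token}")
            target_parts.append(f"{token} {record[name]}")
        else:
            source_parts.append(f"{name}: {record[name]}")
    return " | ".join(source_parts), " ".join(target_parts)


# Illustrative record, not taken from NovATOMIC.
record = {
    "context": "Alex forgot an umbrella on a rainy day.",
    "query": "What happens to Alex?",
    "inference": "Alex gets wet.",
}

# Predict the inference from context and query.
src, tgt = make_example(record, ["inference"])

# Predict both the context and the inference from the query alone.
src2, tgt2 = make_example(record, ["context", "inference"])
```

Under these assumptions, the same record yields several training pairs depending on which fields are masked, which is one way a single model could avoid a fixed relation set.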