DiffuCOMET: Contextual Commonsense Knowledge Diffusion (2402.17011v2)

Published 26 Feb 2024 in cs.CL

Abstract: Inferring contextually-relevant and diverse commonsense to understand narratives remains challenging for knowledge models. In this work, we develop a series of knowledge models, DiffuCOMET, that leverage diffusion to learn to reconstruct the implicit semantic connections between narrative contexts and relevant commonsense knowledge. Across multiple diffusion steps, our method progressively refines a representation of commonsense facts that is anchored to a narrative, producing contextually-relevant and diverse commonsense inferences for an input context. To evaluate DiffuCOMET, we introduce new metrics for commonsense inference that more closely measure knowledge diversity and contextual relevance. Our results on two different benchmarks, ComFact and WebNLG+, show that knowledge generated by DiffuCOMET achieves a better trade-off between commonsense diversity, contextual relevance and alignment to known gold references, compared to baseline knowledge models.

Contextual Commonsense Knowledge Diffusion with DiffuCOMET

Introduction to Contextual Commonsense Knowledge Generation

Generating contextually relevant and diverse commonsense knowledge for natural language understanding and generation has advanced considerably with the advent of knowledge models. However, these models often fall short in producing diverse inferences and in aligning generated inferences with their corresponding narrative contexts. To address these shortcomings, this work introduces DiffuCOMET, a series of knowledge models that leverage diffusion to generate contextually relevant and diverse commonsense knowledge.

Diffusion Models for Knowledge Generation

Diffusion models learn to generate data by refining a latent representation over multiple steps, progressively denoising a sample from a random noise distribution toward the target data distribution. The DiffuCOMET models use diffusion-based decoding to refine commonsense knowledge embeddings anchored to the narrative context. This process encourages the generation of contextually relevant knowledge and also yields a diverse set of inferences by reconstructing the implicit semantic connections unique to each narrative. The models come in two variants, targeting fact-level and entity-level knowledge generation, with the entity-level variant performing slightly better at producing novel and relevant inferences.
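As a rough illustration of this idea, the sketch below denoises a set of fact embeddings conditioned on a narrative context over several diffusion steps. The architecture, dimensions, update rule, and function names are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of diffusion-based fact decoding conditioned on a
# narrative context. Dimensions, the noise schedule, and the denoiser
# architecture are assumptions for illustration only.

class FactDenoiser(nn.Module):
    def __init__(self, dim: int = 256, n_heads: int = 4, n_layers: int = 2, max_steps: int = 1000):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.step_embed = nn.Embedding(max_steps, dim)

    def forward(self, noisy_facts, context, t):
        # Cross-attend from the noisy fact embeddings to the narrative context,
        # with a learned embedding of the diffusion step t added to every token.
        h = noisy_facts + self.step_embed(t)[:, None, :]
        return self.decoder(tgt=h, memory=context)

@torch.no_grad()
def sample_facts(denoiser, context, n_facts=8, dim=256, n_steps=50):
    """Iteratively refine random noise into fact embeddings anchored to the context."""
    x = torch.randn(context.size(0), n_facts, dim)        # start from pure noise
    for step in reversed(range(n_steps)):
        t = torch.full((context.size(0),), step, dtype=torch.long)
        x_hat = denoiser(x, context, t)                   # predict denoised embeddings
        # Simple interpolation toward the prediction; a real sampler would follow
        # the trained noise schedule (e.g. DDPM/DDIM updates).
        x = x + (x_hat - x) / (step + 1)
    # The final embeddings would be mapped back to fact strings by a separate decoder.
    return x
```

Here, `context` stands in for the encoder output of the narrative, and the returned embeddings would still need to be decoded into textual facts by a separate head.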

Evaluation Metrics for Commonsense Inference

To assess the effectiveness of the DiffuCOMET models, the authors introduce novel metrics that capture the diversity and contextual relevance of generated knowledge. These clustering-based metrics evaluate generated knowledge in clusters of similar facts, where similarity is computed from either word-level edit distance or Euclidean distance between embeddings. This yields a more faithful measurement of the diversity of generated inferences and their relevance to the given context, addressing the limitations of traditional NLG metrics such as BLEU and ROUGE-L.
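To make the clustering idea concrete, the sketch below groups generated facts whose normalized word-level edit distance falls under a threshold and counts the resulting clusters as a diversity proxy. The distance normalization, the DBSCAN parameters, and the example facts are assumptions for illustration, not the paper's exact metric.

```python
from itertools import product

import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative sketch of a clustering-based diversity measure: group generated
# facts whose word-level edit distance is small, then count distinct clusters.
# The normalization and the eps threshold are assumptions, not the paper's settings.

def word_edit_distance(a: str, b: str) -> float:
    """Levenshtein distance over word tokens, normalized by the longer sequence."""
    x, y = a.split(), b.split()
    d = np.zeros((len(x) + 1, len(y) + 1))
    d[:, 0] = np.arange(len(x) + 1)
    d[0, :] = np.arange(len(y) + 1)
    for i, j in product(range(1, len(x) + 1), range(1, len(y) + 1)):
        d[i, j] = min(d[i - 1, j] + 1,                           # deletion
                      d[i, j - 1] + 1,                           # insertion
                      d[i - 1, j - 1] + (x[i - 1] != y[j - 1]))  # substitution
    return d[-1, -1] / max(len(x), len(y), 1)

def count_fact_clusters(facts: list, eps: float = 0.5) -> int:
    """Cluster similar facts (DBSCAN over pairwise edit distances) and count groups."""
    n = len(facts)
    dist = np.array([[word_edit_distance(facts[i], facts[j]) for j in range(n)]
                     for i in range(n)])
    labels = DBSCAN(eps=eps, min_samples=1, metric="precomputed").fit(dist).labels_
    return len(set(labels))   # more clusters -> more diverse generations

facts = [
    "PersonX buys a car. xNeed: to have money",
    "PersonX buys a car. xNeed: to save money",            # near-duplicate of the first fact
    "PersonY cooks dinner. xEffect: PersonY feels proud",
]
print(count_fact_clusters(facts))   # the two near-duplicates collapse into one cluster
```

The relevance side of the evaluation, which compares generations against context-relevant references, is omitted here for brevity.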

Findings and Implications

The DiffuCOMET models demonstrated the ability to generate knowledge that is both contextually relevant and diverse across multiple benchmarks. They outperformed baseline knowledge models in generating contextually aligned commonsense knowledge, particularly in producing inferences that are novel and absent from the training set. The models also generalized robustly, generating useful knowledge for out-of-distribution narrative contexts. These findings underscore the potential of diffusion models for contextually relevant and diverse commonsense knowledge generation, opening new avenues for research in natural language understanding and generation.

Future Directions

While the DiffuCOMET models mark a significant step forward, applying them to longer narrative contexts and to other linguistic settings is an exciting avenue for future work. It also remains to be seen how the methodology can be adapted to knowledge generation tasks beyond commonsense inference, which would broaden the applicability of diffusion models in AI-driven natural language processing.

In summary, the DiffuCOMET models represent a promising advance in generating contextually relevant and diverse commonsense knowledge, offering insights and methods that can inform future research in natural language processing and artificial intelligence.

Authors (6)
  1. Silin Gao
  2. Mete Ismayilzada
  3. Mengjie Zhao
  4. Hiromi Wakaki
  5. Yuki Mitsufuji
  6. Antoine Bosselut