MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning (2402.13625v2)

Published 21 Feb 2024 in cs.CL

Abstract: Because commonsense information is recorded in text far less often than it occurs in the world, LLMs pre-trained on text generation have difficulty learning sufficient commonsense knowledge. Several studies have leveraged text retrieval to augment models' commonsense ability. Unlike text, images capture commonsense information inherently, yet little effort has been made to utilize them effectively. In this work, we propose a novel Multi-mOdal REtrieval (MORE) augmentation framework that leverages both text and images to enhance the commonsense ability of LLMs. Extensive experiments on the CommonGen task demonstrate the efficacy of MORE with pre-trained models of both single and multiple modalities.

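To make the framework's recipe concrete, here is a minimal sketch of multi-modal retrieval augmentation for a CommonGen-style input (a set of concepts to be woven into one everyday-scene sentence). It is illustrative, not the paper's implementation: text_index, image_index, and generate_fn are hypothetical stand-ins for a text retriever, an image retriever, and a generator, and the prompt-level fusion shown here is a simplification of however MORE actually injects retrieved evidence into the model.

# Minimal sketch of multi-modal retrieval-augmented generation (illustrative only).
# `text_index`, `image_index`, and `generate_fn` are hypothetical interfaces,
# not APIs from the paper or any specific library.

from dataclasses import dataclass
from typing import List

@dataclass
class Retrieved:
    texts: List[str]      # sentences retrieved for the concept set
    captions: List[str]   # captions standing in for retrieved images

def retrieve(concepts: List[str], text_index, image_index, k: int = 3) -> Retrieved:
    """Query both modalities with the concatenated concept set."""
    query = " ".join(concepts)
    texts = text_index.search(query, k=k)        # e.g. BM25 over a sentence corpus
    images = image_index.search(query, k=k)      # e.g. CLIP similarity over an image pool
    captions = [img.caption for img in images]   # expose images to a text-only generator
    return Retrieved(texts=texts, captions=captions)

def build_prompt(concepts: List[str], ret: Retrieved) -> str:
    """Prepend retrieved evidence from both modalities to the instruction."""
    evidence = "\n".join(ret.texts + ret.captions)
    return (
        f"Evidence:\n{evidence}\n\n"
        f"Write one sentence describing an everyday scene "
        f"using all of these concepts: {', '.join(concepts)}."
    )

# Usage with any seq2seq or causal LM wrapped as generate_fn:
#   ret = retrieve(["dog", "frisbee", "catch"], text_index, image_index)
#   sentence = generate_fn(build_prompt(["dog", "frisbee", "catch"], ret))
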