
MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering (2404.12926v1)

Published 19 Apr 2024 in cs.AI

Abstract: Recent advancements in LLMs have shown significant potential in tasks such as text summarization and generation. Yet they often struggle with complex physics problems that require arithmetic calculation and a solid grasp of concepts. Moreover, many physics problems include images containing details essential to understanding the problem's context. We propose an LMM-based chatbot to answer multimodal physics MCQs. For domain adaptation, we use the MM-PhyQA dataset, which comprises Indian high-school-level multimodal physics problems. To improve the LMM's performance, we experiment with two techniques: Reinforcement Learning from Human Feedback (RLHF) and image captioning. In image captioning, we add a detailed explanation of the diagram in each image, minimizing hallucinations and image-processing errors. Our RLHF methodology, inspired by the ranking approach used in RLHF, aims to enhance the models' human-like problem-solving abilities. RLHF incorporates human feedback into the learning process, improving the model's problem-solving skills, truthfulness, and reasoning, reducing hallucinations in its answers, and yielding higher-quality responses than vanilla supervised fine-tuned models. We employ the open-source LLaVA model to answer multimodal physics MCQs and compare its performance with and without RLHF.
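The ranking-based RLHF approach described above typically starts from a reward model trained on human preference rankings of candidate answers. As a minimal sketch (not the paper's exact implementation), the standard pairwise Bradley–Terry loss scores a preferred answer against a rejected one; the function name and scalar-reward framing here are illustrative assumptions:

```python
import math

def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).
    The loss shrinks as the reward model scores the human-preferred
    answer increasingly higher than the rejected one."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A correctly ranked pair (chosen answer scored higher) incurs a
# smaller loss than a misranked pair.
good = pairwise_preference_loss(2.0, 1.0)   # small loss
bad = pairwise_preference_loss(1.0, 2.0)    # larger loss
```

In a full RLHF pipeline, a reward model trained with a loss of this shape on human rankings would then guide policy optimization (e.g. via PPO) of the supervised fine-tuned model.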

Authors (11)
  1. Avinash Anand (19 papers)
  2. Janak Kapuriya (5 papers)
  3. Chhavi Kirtani (6 papers)
  4. Apoorv Singh (14 papers)
  5. Jay Saraf (2 papers)
  6. Naman Lal (7 papers)
  7. Jatin Kumar (5 papers)
  8. Adarsh Raj Shivam (2 papers)
  9. Astha Verma (5 papers)
  10. Rajiv Ratn Shah (108 papers)
  11. Roger Zimmermann (76 papers)
Citations (7)