
Multimodal Multi-Hop Question Answering Through a Conversation Between Tools and Efficiently Finetuned Large Language Models (2309.08922v1)

Published 16 Sep 2023 in cs.CL

Abstract: We employ a tool-interacting divide-and-conquer strategy enabling LLMs to answer complex multimodal multi-hop questions. In particular, we harness the power of LLMs to divide a given multimodal multi-hop question into unimodal single-hop sub-questions, each answered by the appropriate tool from a predefined set of tools. After the corresponding tools provide the LLM with their answers, the LLM generates the next relevant unimodal single-hop sub-question. To increase the reasoning ability of LLMs, we prompt ChatGPT to generate a tool-interacting divide-and-conquer dataset, which is then used to efficiently finetune the corresponding LLM. To assess the effectiveness of this approach, we conduct an evaluation on two recently introduced complex question-answering datasets. The experimental analysis demonstrates substantial improvements over existing state-of-the-art solutions, indicating the efficacy and generality of our strategy.
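The loop described in the abstract — the LLM emits a unimodal single-hop sub-question, a tool answers it, and the answer is fed back until a final answer emerges — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`llm_next_step`, `answer`), the toy tools, and the hard-coded two-hop example are all hypothetical stand-ins for the finetuned LLM and its real tool set.

```python
def llm_next_step(question, history):
    """Hypothetical stand-in for the finetuned LLM: given the original
    multimodal question and the (sub-question, tool answer) history so far,
    return either the next (tool_name, sub_question) pair, or a final answer."""
    if not history:
        return ("image_qa", "Who directed the film shown in the image?"), None
    if len(history) == 1:
        return ("text_qa", "In which year was that director born?"), None
    return None, history[-1][1]  # enough hops resolved; emit final answer

# Toy unimodal tools; the paper routes each single-hop sub-question
# to the appropriate tool from a predefined set.
TOOLS = {
    "image_qa": lambda q: "Christopher Nolan",
    "text_qa":  lambda q: "1970",
}

def answer(question, max_hops=5):
    """Divide-and-conquer loop: alternate between the LLM proposing a
    sub-question and a tool answering it, until a final answer is produced."""
    history = []
    for _ in range(max_hops):
        step, final = llm_next_step(question, history)
        if final is not None:
            return final
        tool_name, sub_q = step
        history.append((sub_q, TOOLS[tool_name](sub_q)))
    return None

print(answer("When was the director of the pictured film born?"))
```

In the paper this conversation is driven by an efficiently finetuned LLM rather than the hard-coded branches above, but the control flow — decompose, route to a tool, fold the answer back in, repeat — is the same.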

Authors (4)
  1. Hossein Rajabzadeh (8 papers)
  2. Suyuchen Wang (16 papers)
  3. Hyock Ju Kwon (5 papers)
  4. Bang Liu (93 papers)
Citations (2)