Exploring Hybrid Question Answering via Program-based Prompting (2402.10812v1)
Abstract: Question answering over heterogeneous data requires reasoning over diverse data sources, which is challenging because of the large scale of information and the organic coupling of heterogeneous data. Various approaches have been proposed to address these challenges. One approach trains specialized retrievers to select relevant information, thereby reducing the input length. Another transforms diverse modalities of data into a single modality, reducing task difficulty and enabling more straightforward processing. In this paper, we propose HProPro, a novel program-based prompting framework for the hybrid question answering task. HProPro follows the code generation and execution paradigm. In addition, HProPro integrates various functions to tackle the hybrid reasoning scenario. Specifically, HProPro comprises function declaration and function implementation to perform hybrid information-seeking over data from various sources and modalities, which enables reasoning over such data without training specialized retrievers or performing modality transformations. Experimental results on two typical hybrid question answering benchmarks, HybridQA and MultiModalQA, demonstrate the effectiveness of HProPro: it surpasses all baseline systems and achieves the best performance in the few-shot setting on both datasets.
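The code generation and execution paradigm described in the abstract, with its split between function declaration and function implementation, can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration and is not taken from the abstract: the name `extract_info`, the table and passages, the hard-coded `GENERATED_PROGRAM` standing in for model output, and the keyword-based implementation standing in for a model call.

```python
from typing import Dict, List, Optional

# Toy hybrid data (hypothetical): table rows whose cells link to text passages.
TABLE: List[Dict[str, str]] = [
    {"player": "Ann Lee", "team": "Hawks"},
    {"player": "Bo Chen", "team": "Owls"},
]
PASSAGES: Dict[str, str] = {
    "Ann Lee": "Ann Lee was born in 1990 in Oslo.",
    "Bo Chen": "Bo Chen was born in 1985 in Lima.",
}

# Function declaration: the signature and docstring a prompt could show the model.
# The name `extract_info` is an illustrative assumption.
DECLARATION = '''
def extract_info(cell: str, query: str) -> str:
    """Return the answer to `query` found in the passage linked to `cell`."""
'''

# Function implementation: a keyword stub standing in for a model call.
def extract_info(cell: str, query: str) -> str:
    passage = PASSAGES.get(cell, "")
    if "born" in query and "born in " in passage:
        # Take the first token that follows "born in".
        return passage.split("born in ")[1].split()[0]
    return ""

# Stand-in for the program a model would generate from the question
# "When was the Owls player born?" given DECLARATION and the table schema.
GENERATED_PROGRAM = '''
def solve(table):
    for row in table:
        if row["team"] == "Owls":
            return extract_info(row["player"], "When was the player born?")
    return None
'''

def run(program: str, table: List[Dict[str, str]]) -> Optional[str]:
    """Execute the generated program with the implemented functions in scope."""
    namespace = {"extract_info": extract_info}
    exec(program, namespace)  # defines solve() inside namespace
    return namespace["solve"](table)

answer = run(GENERATED_PROGRAM, TABLE)
print(answer)
```

In this sketch the table lookup is done by ordinary code, while the text lookup is delegated to a function whose body can be swapped for a model call, so neither a trained retriever nor a conversion of the table to text is involved.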