- The paper introduces a two-phase framework that adaptively compresses prompts, achieving a 71.1% reduction in input token length on visual question answering benchmarks.
- The method uses a single frozen large language model for both compression and code generation, requiring no additional training.
- Experiments on GQA, VQAv2, and NLVR2 show improved computational efficiency over baselines while maintaining or improving answer quality.
Overview of "AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering"
The paper "AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering" addresses a notable challenge in the field of visual question answering (VQA): the extensive computational cost associated with long input prompts needed by Visual Programmatic Models (VPMs). These models involve generating executable programs via LLMs to answer questions about visual input. AdaCoder introduces an innovative approach to mitigate this challenge by employing adaptive prompt compression.
Key Contributions and Methodology
- Framework Design: AdaCoder operates in two phases. In the compression phase, it constructs a set of compressed preprompts, each tailored to a specific question type, thereby shortening the input. At inference, it predicts the type of the incoming question and selects the matching precompressed prompt to generate code (a sketch of this pipeline appears after this list).
- Adaptive Prompt Compression: By specializing compression per question type, AdaCoder substantially reduces computational load. A single frozen LLM handles both compression and code generation, so no additional training is required; this keeps AdaCoder applicable to black-box LLMs such as GPT and Claude.
- Experimental Results: The paper reports a reduction in token length of about 71.1% while maintaining or even improving question-answering accuracy. AdaCoder is evaluated against the ViperGPT baseline and the LLMLingua compression method on three VQA datasets: GQA, VQAv2, and NLVR2.
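The sketch below illustrates the two-phase flow described above, assuming a generic chat-completion callable `llm(prompt) -> str`. The question-type taxonomy, prompt wording, and function names are assumptions for illustration, not the paper's actual implementation:

```python
QUESTION_TYPES = ["counting", "spatial", "attribute", "comparison"]  # assumed taxonomy

def compress_preprompts(llm, full_preprompt: str) -> dict[str, str]:
    """Offline compression phase: ask the same frozen LLM to compress the
    long API preprompt once per question type, keeping only what that
    type of question needs."""
    compressed = {}
    for qtype in QUESTION_TYPES:
        compressed[qtype] = llm(
            f"Compress the following prompt, keeping only the API definitions "
            f"and examples needed to answer {qtype} questions:\n\n{full_preprompt}"
        )
    return compressed

def answer(llm, compressed: dict[str, str], question: str) -> str:
    """Inference phase: classify the question, pick the matching short
    preprompt, and generate the program with far fewer input tokens."""
    qtype = llm(
        f"Classify this question into one of {QUESTION_TYPES}: {question}"
    ).strip()
    # Fallback choice is a hedge for unexpected classifier output.
    preprompt = compressed.get(qtype, compressed["attribute"])
    return llm(f"{preprompt}\n\n# Question: {question}\n# Program:")
```

The key design point is that compression is paid once per question type offline, while every inference call benefits from the shorter preprompt; the same frozen model plays both roles, which is why no extra training is needed.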
Implications and Future Directions
The implications of AdaCoder are both practical and theoretical. Practically, it reduces the computational cost of programmatic VQA, making VPMs viable in resource-constrained deployments. Theoretically, it opens new avenues for research on prompt compression and LLM utilization, showing how powerful LLMs can be leveraged without exhaustive computational demands.
Looking ahead, extending AdaCoder beyond VQA to a broader range of program-generation tasks may prove beneficial. Advances in adaptively extracting only the API definitions a question needs, and in incremental learning over growing datasets, could further enhance the utility of VPMs.
In conclusion, AdaCoder marks a significant step toward computationally efficient VPMs for visual question answering through adaptive prompt compression. The paper offers a compelling methodology with robust results, contributing to the ongoing discourse on efficient use of LLMs.