Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems (2406.06865v1)
Abstract: Multimodal LLMs (MLLMs) have demonstrated proficiency in processing diverse modalities, including text, images, and audio. These models leverage extensive pre-existing knowledge, enabling them to address complex problems with minimal to no specific training examples, as evidenced in few-shot and zero-shot in-context learning scenarios. This paper investigates the use of MLLMs' visual capabilities to 'eyeball' solutions for the Traveling Salesman Problem (TSP) by analyzing images of point distributions on a two-dimensional plane. Our experiments aimed to validate the hypothesis that MLLMs can effectively 'eyeball' viable TSP routes. The results from zero-shot, few-shot, self-ensemble, and self-refine zero-shot evaluations show promising outcomes. We anticipate that these findings will inspire further exploration into MLLMs' visual reasoning abilities to tackle other combinatorial problems.
- Mohammed Elhenawy
- Ahmed Abdelhay
- Taqwa I. Alhadidi
- Shadi Jaradat
- Ahmed Jaber
- Sebastien Glaser
- Andry Rakotonirainy
- Huthaifa I Ashqar