Training-Free Pyramid Token Pruning for Efficient Large Vision-Language Models via Region, Token, and Instruction-Guided Importance (2509.15704v1)
Abstract: Large Vision-Language Models (LVLMs) have significantly advanced multimodal understanding but still struggle to process high-resolution images efficiently. Recent approaches partition high-resolution images into multiple sub-images, dramatically increasing the number of visual tokens and causing computational overhead that grows quadratically with token count during inference. To address these limitations, we propose a training-free token pruning strategy, Pyramid Token Pruning (PTP), that integrates bottom-up visual saliency at both the region and token levels with top-down instruction-guided importance. Inspired by human visual attention mechanisms, PTP retains more tokens from visually salient regions and further leverages textual instructions to pinpoint the tokens most relevant to a given multimodal task. Extensive experiments across 13 diverse benchmarks demonstrate that our method substantially reduces computational overhead and inference latency with minimal performance loss.
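To make the abstract's pipeline concrete, below is a minimal, illustrative sketch of the kind of three-level scoring it describes: a region-level token budget driven by bottom-up saliency, a token-level saliency cue, and a top-down instruction-relevance cue. The tensor shapes, the mixing weight `alpha`, and the use of cosine similarity for text relevance are assumptions made for illustration; this is not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def pyramid_token_prune(
    sub_image_tokens: torch.Tensor,  # (R, N, D): R sub-image regions, N tokens each
    region_saliency: torch.Tensor,   # (R,): bottom-up saliency per region (assumed given)
    token_saliency: torch.Tensor,    # (R, N): bottom-up saliency per token (assumed given)
    text_embedding: torch.Tensor,    # (D,): pooled instruction embedding (assumed given)
    keep_total: int = 576,           # total visual tokens to retain
    alpha: float = 0.5,              # assumed trade-off between visual and textual cues
) -> torch.Tensor:
    """Hedged sketch of a PTP-style pruning step, not the authors' code.

    1. Allocate a token budget to each region in proportion to region saliency.
    2. Score tokens by mixing bottom-up saliency with top-down instruction
       relevance (here, cosine similarity to the text embedding).
    3. Keep the top-scoring tokens within each region's budget.
    """
    R, N, D = sub_image_tokens.shape

    # Region level: salient regions receive a larger share of the token budget.
    budgets = (region_saliency / region_saliency.sum() * keep_total).long()
    budgets = budgets.clamp(min=1, max=N)

    # Top-down cue: similarity between each visual token and the instruction.
    text_rel = F.cosine_similarity(
        sub_image_tokens, text_embedding.view(1, 1, D), dim=-1
    )  # (R, N)

    # Normalize both cues to [0, 1] per region before mixing.
    def minmax(x: torch.Tensor) -> torch.Tensor:
        lo = x.min(dim=-1, keepdim=True).values
        hi = x.max(dim=-1, keepdim=True).values
        return (x - lo) / (hi - lo + 1e-6)

    score = alpha * minmax(token_saliency) + (1 - alpha) * minmax(text_rel)

    # Token level: keep each region's top-scoring tokens up to its budget.
    kept = []
    for r in range(R):
        idx = score[r].topk(budgets[r].item()).indices
        kept.append(sub_image_tokens[r, idx])
    return torch.cat(kept, dim=0)  # (<= keep_total, D): pruned visual tokens
```

In a full pipeline, `region_saliency` and `token_saliency` would plausibly come from the vision encoder's attention maps and `text_embedding` from the language model's encoding of the instruction; they are left as inputs here to keep the sketch self-contained. The proportional budget allocation is what makes the selection "pyramid-like": pruning decisions cascade from coarse regions down to individual tokens.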