Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 69 tok/s

Gemini 2.5 Pro 52 tok/s Pro

GPT-5 Medium 37 tok/s Pro

GPT-5 High 28 tok/s Pro

GPT-4o 119 tok/s Pro

Kimi K2 218 tok/s Pro

GPT OSS 120B 456 tok/s Pro

Claude Sonnet 4.5 33 tok/s Pro

2000 character limit reached

One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-Tuning (2408.03001v2)

Published 6 Aug 2024 in cs.CV and cs.MM

Abstract: Large-scale models have exhibited remarkable capabilities across diverse domains, including automated medical services and intelligent customer support. However, as most large models are trained on single-modality corpora, enabling them to effectively process and understand multimodal signals remains a significant challenge. Current research often focuses on designing task-specific or scenario-specific tuning strategies, which limits the scalability and versatility. To address this limitation, we propose a unified framework that concurrently handles multiple tasks and modalities. In this framework, all modalities and tasks are represented as unified tokens and trained using a single, consistent approach. To enable efficient multitask processing, we introduce a novel tuning strategy termed neural tuning, inspired by the concept of sparse distributed representation in the human brain, where only specific subsets of neurons are activated for each task. Furthermore, to advance research in multimodal and multitask learning, we present a new benchmark, MMUD, which includes samples annotated with multiple task labels spanning reasoning segmentation, referring segmentation, image captioning, and text-to-image generation. By applying neural tuning to pretrained large models on the MMUD benchmark, we demonstrate the ability to handle multiple tasks simultaneously in a streamlined and efficient manner. All models, code, and datasets will be released publicly upon publication, fostering further research and innovation in this field.