FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning (2404.15182v1)

Published 12 Apr 2024 in cs.LG and cs.AI

Abstract: In the rapidly evolving field of artificial intelligence, multimodal models, e.g., those integrating vision and language into vision-language models (VLMs), have become pivotal for many applications, ranging from image captioning to multimodal search engines. Among these models, the Contrastive Language-Image Pre-training (CLIP) model has demonstrated remarkable performance in understanding and generating nuanced relationships between text and images. However, the conventional training of such models often requires centralized aggregation of vast datasets, posing significant privacy and data governance challenges. To address these concerns, this paper proposes a novel approach that leverages Federated Learning and parameter-efficient adapters, i.e., Low-Rank Adaptation (LoRA), to train VLMs. This methodology preserves data privacy by training models across decentralized data sources and ensures model adaptability and efficiency through LoRA's parameter-efficient fine-tuning. Our approach accelerates training by up to 34.72 times and requires 2.47 times less memory than full fine-tuning.
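
The core idea described in the abstract can be pictured with a short sketch: each client freezes the pretrained weights, trains only low-rank LoRA adapters on its local data, and the server averages just those adapter tensors. The following is a minimal illustration in PyTorch, not the authors' implementation; the layer structure, rank `r`, scaling `alpha`, and the equal-weight FedAvg aggregation are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): LoRA adapters fine-tuned locally,
# with only the adapter weights communicated and averaged by the server.
import copy
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Frozen base output plus the scaled low-rank correction.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale

def lora_state(model: nn.Module):
    """Extract only the LoRA parameters; this is all a client uploads."""
    return {k: v.detach().clone()
            for k, v in model.state_dict().items() if "lora_" in k}

def fedavg(client_states):
    """Server-side FedAvg over clients' LoRA-only state dicts (equal weights)."""
    avg = copy.deepcopy(client_states[0])
    for k in avg:
        for s in client_states[1:]:
            avg[k] += s[k]
        avg[k] /= len(client_states)
    return avg
```

Because only the small `lora_A`/`lora_B` tensors are trained and exchanged, per-round computation, memory, and communication scale with the adapter rank rather than with the full VLM parameter count, which is consistent with the training-time and memory savings the abstract reports.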

Authors (3)
  1. Duy Phuong Nguyen (6 papers)
  2. Ali Jannesari (56 papers)
  3. J. Pablo Munoz (4 papers)
Citations (3)