Improving LoRA in Privacy-preserving Federated Learning
The paper "Improving LoRA in Privacy-preserving Federated Learning" addresses several challenges that arise when Low-Rank Adaptation (LoRA) is applied in privacy-preserving federated learning (FL). The authors focus on mitigating the instability caused by data heterogeneity, multi-step local updates, the additive noise used to enforce differential privacy (DP), and LoRA's sensitivity to hyper-parameters.
Key Insights
LoRA in Federated Learning Context
LoRA is a widely used method for parameter-efficient fine-tuning (PEFT) of large pre-trained language models, prized for its computational efficiency and strong performance. However, when integrated into federated learning, a decentralized paradigm in which participants collaboratively train a model without sharing raw data, LoRA becomes unstable. The paper identifies three primary sources of this instability: the effects of data heterogeneity, noise amplification under DP, and sensitivity to hyper-parameters.
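To make the setup concrete, the following is a minimal NumPy sketch of a LoRA-adapted linear layer (an illustration of the standard LoRA formulation, not the paper's code; all variable names and dimensions are chosen here for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: input/output width, LoRA rank, scaling hyper-parameter.
d_in, d_out, r, alpha = 8, 8, 2, 16

W0 = rng.standard_normal((d_out, d_in))    # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01  # randomly initialized (trainable in vanilla LoRA)
B = np.zeros((d_out, r))                   # zero-initialized, so B @ A = 0 at the start

def lora_forward(x):
    # Effective weight is W0 plus the scaled low-rank update (alpha / r) * B A.
    return x @ (W0 + (alpha / r) * (B @ A)).T

x = rng.standard_normal((1, d_in))
# Because B starts at zero, the adapted layer initially matches the base model.
assert np.allclose(lora_forward(x), x @ W0.T)
```

In federated fine-tuning, each client trains its own copies of A and B and the server aggregates them, which is where the instabilities discussed above enter.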
Federated Freeze A LoRA (FFA-LoRA)
To address these challenges, the authors propose Federated Freeze A LoRA (FFA-LoRA). The approach freezes the randomly initialized matrices (A) and fine-tunes only the zero-initialized matrices (B), which halves the communication cost of federated fine-tuning. FFA-LoRA keeps client-side local updates aligned with server-side aggregation, reduces the noise amplification caused by DP, and removes the dependence on the scaling factor that is critical for convergence in vanilla LoRA.
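The alignment between local updates and server aggregation can be illustrated with a small two-client simulation (a simplified sketch, not the paper's implementation; matrices are random stand-ins for trained client updates):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 6, 2

A = rng.standard_normal((r, d))   # shared A, frozen on every client (FFA-LoRA)
B1 = rng.standard_normal((d, r))  # client 1's locally trained B
B2 = rng.standard_normal((d, r))  # client 2's locally trained B

# FFA-LoRA: averaging the B matrices and then applying the frozen A
# is exactly the average of the clients' effective updates B_i A.
avg_then_multiply = ((B1 + B2) / 2) @ A
multiply_then_avg = (B1 @ A + B2 @ A) / 2
assert np.allclose(avg_then_multiply, multiply_then_avg)

# Vanilla LoRA: if each client also trains its own A_i, averaging A and B
# separately does NOT reproduce the average of the products B_i A_i.
A1, A2 = rng.standard_normal((r, d)), rng.standard_normal((r, d))
naive = ((B1 + B2) / 2) @ ((A1 + A2) / 2)
exact = (B1 @ A1 + B2 @ A2) / 2
assert not np.allclose(naive, exact)
```

Because the effective update is linear in B when A is frozen, server-side averaging of B matrices is mathematically equivalent to averaging the full low-rank updates, which vanilla LoRA's separate averaging of A and B cannot guarantee.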
Experimental Evaluation
In extensive experiments with RoBERTa-Large on the GLUE benchmark and LLaMA on the GSM-8K dataset, FFA-LoRA consistently outperformed traditional LoRA, proving particularly advantageous under strong data heterogeneity and differential privacy constraints. On GSM-8K, FFA-LoRA reached 17.12% accuracy, surpassing LoRA's 15.68%. Across tasks, FFA-LoRA also produced more stable results with lower variance, especially on tasks with more complex data distributions.
Theoretical Implications
The paper also examines the theoretical benefits of FFA-LoRA, arguing that freezing one of the matrices aligns better with the Lipschitz-smoothness requirements of gradient-descent optimization. This yields better convergence properties, especially under the noisy gradients typical of DP training.
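A hedged sketch of the smoothness intuition (the notation here is ours, not lifted from the paper): with A frozen, the adapter output is linear in the trainable parameters, so smoothness of the loss in the weights transfers to the optimization over B.

```latex
% Effective update with frozen A:  \Delta W = \tfrac{\alpha}{r}\, B A .
% If the loss \ell is L-smooth in the weight W, then
% f(B) = \ell\!\left(W_0 + \tfrac{\alpha}{r}\, B A\right) satisfies
\left\| \nabla f(B_1) - \nabla f(B_2) \right\|
  \;\le\; L \left(\tfrac{\alpha}{r}\right)^{2} \|A\|_2^{2}\, \|B_1 - B_2\|,
% so gradient descent on B inherits a global smoothness constant.
% With both A and B trainable, \Delta W = \tfrac{\alpha}{r}\, B A is
% bilinear in the parameters, and no parameter-independent Lipschitz
% constant for the gradient exists.
```

This pseudo-linearity in B is also what makes the DP noise added to the B updates enter the effective weight update linearly rather than through a product of two noisy factors.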
Future Directions
The paper outlines potential avenues for future work, including alternative initialization schemes for the frozen matrices, further reductions in the number of trainable parameters, and investigating connections between FFA-LoRA and random kernel methods suggested by its pseudo-linear structure.
Conclusion
The paper offers a practical solution for improving LoRA in privacy-preserving federated learning settings, delivering gains in computational and communication efficiency while maintaining robust performance under challenging conditions.
These contributions advance the application of LoRA in federated learning frameworks and lay the groundwork for further research on PEFT methods and their integration with decentralized learning paradigms.