Improving LoRA in Privacy-preserving Federated Learning
The paper "Improving LoRA in Privacy-preserving Federated Learning" addresses several challenges that arise when Low-Rank Adaptation (LoRA) is applied in privacy-preserving federated learning (FL). The authors focus on mitigating the instability caused by data heterogeneity, multi-step local updates, the additive noise used to enforce differential privacy (DP), and LoRA's sensitivity to hyper-parameters.
Key Insights
LoRA in Federated Learning Context
LoRA is a widely used method for parameter-efficient fine-tuning (PEFT) of large pre-trained language models, prized for its computational efficiency and strong performance. However, when integrated into federated learning, a decentralized paradigm in which participants collaboratively train a model without sharing raw data, LoRA becomes unstable. The paper identifies three primary sources of this instability: the effects of data heterogeneity, noise amplification under DP, and sensitivity to hyper-parameters.
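To make the setup concrete, the following is a minimal NumPy sketch of a LoRA-adapted linear layer (an illustration of the standard LoRA formulation, not the paper's code; all variable names and dimensions are chosen here for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: input/output width, LoRA rank, scaling hyper-parameter.
d_in, d_out, r, alpha = 8, 8, 2, 16

W0 = rng.standard_normal((d_out, d_in))    # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01  # randomly initialized (trainable in vanilla LoRA)
B = np.zeros((d_out, r))                   # zero-initialized, so B @ A = 0 at the start

def lora_forward(x):
    # Effective weight is W0 plus the scaled low-rank update (alpha / r) * B A.
    return x @ (W0 + (alpha / r) * (B @ A)).T

x = rng.standard_normal((1, d_in))
# Because B starts at zero, the adapted layer initially matches the base model.
assert np.allclose(lora_forward(x), x @ W0.T)
```

In federated fine-tuning, each client trains its own copies of A and B and the server aggregates them, which is where the instabilities discussed above enter.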
Federated Freeze A LoRA (FFA-LoRA)
To address these challenges, the authors propose Federated Freeze A LoRA (FFA-LoRA). The approach freezes the randomly initialized matrices (A) and fine-tunes only the zero-initialized matrices (B), which halves the communication cost of federated fine-tuning. FFA-LoRA keeps client-side local updates aligned with server-side aggregation, reduces the noise amplification caused by DP, and removes the dependence on the scaling factor that is critical for convergence in vanilla LoRA.
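The alignment between local updates and server aggregation can be illustrated with a small two-client simulation (a simplified sketch, not the paper's implementation; matrices are random stand-ins for trained client updates):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 6, 2

A = rng.standard_normal((r, d))   # shared A, frozen on every client (FFA-LoRA)
B1 = rng.standard_normal((d, r))  # client 1's locally trained B
B2 = rng.standard_normal((d, r))  # client 2's locally trained B

# FFA-LoRA: averaging the B matrices and then applying the frozen A
# is exactly the average of the clients' effective updates B_i A.
avg_then_multiply = ((B1 + B2) / 2) @ A
multiply_then_avg = (B1 @ A + B2 @ A) / 2
assert np.allclose(avg_then_multiply, multiply_then_avg)

# Vanilla LoRA: if each client also trains its own A_i, averaging A and B
# separately does NOT reproduce the average of the products B_i A_i.
A1, A2 = rng.standard_normal((r, d)), rng.standard_normal((r, d))
naive = ((B1 + B2) / 2) @ ((A1 + A2) / 2)
exact = (B1 @ A1 + B2 @ A2) / 2
assert not np.allclose(naive, exact)
```

Because the effective update is linear in B when A is frozen, server-side averaging of B matrices is mathematically equivalent to averaging the full low-rank updates, which vanilla LoRA's separate averaging of A and B cannot guarantee.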
Experimental Evaluation
In extensive experiments with RoBERTa-Large on the GLUE benchmark and LLaMA on the GSM-8K dataset, FFA-LoRA consistently outperformed traditional LoRA, proving particularly advantageous under strong data heterogeneity and differential privacy constraints. On GSM-8K, FFA-LoRA reached 17.12% accuracy, surpassing LoRA's 15.68%. Across tasks, FFA-LoRA also produced more stable results with lower variance, especially on tasks with more complex data distributions.
Theoretical Implications
The paper also examines the theoretical benefits of FFA-LoRA, arguing that freezing one of the matrices aligns better with the Lipschitz-smoothness requirements of gradient-descent optimization. This yields better convergence properties, especially under the noisy gradients typical of DP training.
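A hedged sketch of the smoothness intuition (the notation here is ours, not lifted from the paper): with A frozen, the adapter output is linear in the trainable parameters, so smoothness of the loss in the weights transfers to the optimization over B.

```latex
% Effective update with frozen A:  \Delta W = \tfrac{\alpha}{r}\, B A .
% If the loss \ell is L-smooth in the weight W, then
% f(B) = \ell\!\left(W_0 + \tfrac{\alpha}{r}\, B A\right) satisfies
\left\| \nabla f(B_1) - \nabla f(B_2) \right\|
  \;\le\; L \left(\tfrac{\alpha}{r}\right)^{2} \|A\|_2^{2}\, \|B_1 - B_2\|,
% so gradient descent on B inherits a global smoothness constant.
% With both A and B trainable, \Delta W = \tfrac{\alpha}{r}\, B A is
% bilinear in the parameters, and no parameter-independent Lipschitz
% constant for the gradient exists.
```

This pseudo-linearity in B is also what makes the DP noise added to the B updates enter the effective weight update linearly rather than through a product of two noisy factors.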
Future Directions
The paper outlines potential avenues for future work, including alternative initialization schemes for the frozen matrices, further reductions in the number of trainable parameters, and investigating connections between FFA-LoRA and random kernel methods suggested by its pseudo-linear structure.
Conclusion
The paper offers a practical solution for improving LoRA in privacy-preserving federated learning settings, delivering gains in computational and communication efficiency while maintaining robust performance under challenging conditions.
These contributions advance the application of LoRA in federated learning frameworks and lay the groundwork for further research on PEFT methods and their integration with decentralized learning paradigms.