- The paper introduces ZipLoRA, a novel method that merges independently trained style and subject LoRAs while preserving their distinct attributes.
- It leverages the sparsity and orthogonality of LoRA weight matrices to learn disjoint merger coefficients that effectively avoid interference.
- Experimental results demonstrate that ZipLoRA outperforms direct merging and joint training, consistently generating high-fidelity personalized images.
Introduction
Concept-driven personalization of generative models, such as style-driven or subject-driven image generation, has advanced rapidly in recent years. Low-rank adaptations (LoRAs) offer a parameter-efficient way to personalize such models for individual concepts. However, existing techniques that attempt to combine separately trained style and subject LoRAs struggle to maintain both style and subject fidelity at once. ZipLoRA addresses this gap: it merges independently trained style and subject LoRAs effectively and efficiently.
Approach
ZipLoRA's design rests on two observations about LoRA weight matrices: most of their elements contribute little to model output, and naively summing two LoRAs lets their highly aligned components interfere. ZipLoRA therefore learns disjoint sets of merger coefficients, one per LoRA, that keep the combined weight columns close to orthogonal, so the merged model retains the distinctiveness of both subject and style, much like a zipper joining two separate pieces without compromising the integrity of either.
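The per-layer merge can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: each LoRA's weight update is scaled column-wise by a learnable coefficient vector, and an interference penalty (here, the absolute dot product of the two coefficient vectors) discourages the LoRAs from claiming the same columns. The names `merge_deltas` and `interference` are invented for this sketch.

```python
import numpy as np

def merge_deltas(delta_c, delta_s, m_c, m_s):
    """Merge two per-layer LoRA weight updates (d_out x d_in) by scaling
    each column with its learnable merger coefficient and summing."""
    return delta_c * m_c + delta_s * m_s

def interference(m_c, m_s):
    """Penalty discouraging overlap between the two coefficient vectors;
    it is zero when they select disjoint column subsets."""
    return np.abs(np.dot(m_c, m_s))

# Toy example: dense random updates on a 4x4 layer.
rng = np.random.default_rng(0)
delta_c = rng.normal(size=(4, 4))  # content (subject) LoRA update
delta_s = rng.normal(size=(4, 4))  # style LoRA update

# Disjoint coefficients: content keeps columns 0-1, style keeps 2-3.
m_c = np.array([1.0, 1.0, 0.0, 0.0])
m_s = np.array([0.0, 0.0, 1.0, 1.0])

merged = merge_deltas(delta_c, delta_s, m_c, m_s)
print(interference(m_c, m_s))  # 0.0: no overlap between the two LoRAs
```

In training, the coefficients would be optimized against reconstruction losses on the subject and style reference images plus this penalty; the sketch only shows the merge itself.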
Experiments and Results
Empirically, ZipLoRA generates compelling personalized images, consistently outperforming baselines across a broad range of subject and style combinations. The merged model faithfully renders both the style and the subject, and crucially retains the ability to recontextualize the subject in new scenes. Against both a direct arithmetic merge and computationally expensive joint training from scratch, ZipLoRA proves superior in qualitative and quantitative evaluations, including user preference studies.
Conclusion
ZipLoRA marks a significant step forward in personalized image generation. It can combine any subject with any style, leveraging the power and flexibility of models such as Stable Diffusion XL, and it stands out for its efficiency and simplicity, offering a hyperparameter-free solution that removes the need for intricate manual tuning. This opens new avenues for artists and users to explore personalized creations with contemporary diffusion models.