
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs (2311.13600v1)

Published 22 Nov 2023 in cs.CV, cs.GR, and cs.LG

Abstract: Methods for finetuning generative models for concept-driven personalization generally achieve strong results for subject-driven or style-driven generation. Recently, low-rank adaptations (LoRA) have been proposed as a parameter-efficient way of achieving concept-driven personalization. While recent work explores the combination of separate LoRAs to achieve joint generation of learned styles and subjects, existing techniques do not reliably address the problem; they often compromise either subject fidelity or style fidelity. We propose ZipLoRA, a method to cheaply and effectively merge independently trained style and subject LoRAs in order to achieve generation of any user-provided subject in any user-provided style. Experiments on a wide range of subject and style combinations show that ZipLoRA can generate compelling results with meaningful improvements over baselines in subject and style fidelity while preserving the ability to recontextualize. Project page: https://ziplora.github.io

Citations (68)

Summary

  • The paper introduces ZipLoRA, a novel method that merges independently trained style and subject LoRAs while preserving their distinct attributes.
  • It leverages the sparsity and orthogonality of LoRA weight matrices to learn disjoint merger coefficients that effectively avoid interference.
  • Experimental results demonstrate that ZipLoRA outperforms direct merging and joint training, consistently generating high-fidelity personalized images.

Introduction

Generative models for concept-driven personalization, such as style-driven or subject-driven image generation, have advanced rapidly in recent years. Low-rank adaptations (LoRAs) offer a parameter-efficient way to personalize such concepts within these models. However, existing techniques that combine separate style and subject LoRAs struggle to maintain both style fidelity and subject fidelity at the same time. ZipLoRA addresses this gap: it merges independently trained style and subject LoRAs effectively and efficiently.
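For readers unfamiliar with LoRA, the following is a minimal illustrative sketch (not the paper's code) of the low-rank update that makes the method parameter-efficient: a frozen base weight is adapted by the product of two small trainable matrices. All variable names here are our own.

```python
import numpy as np

# A frozen base weight W is adapted by a low-rank residual B @ A,
# where the rank r is much smaller than the layer dimensions, so only
# (d_out + d_in) * r parameters are trained instead of d_out * d_in.
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2

W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
B = rng.normal(size=(d_out, r)) * 0.01   # trainable "up" projection
A = rng.normal(size=(r, d_in)) * 0.01    # trainable "down" projection

delta_W = B @ A                # low-rank update; rank is at most r
W_adapted = W + delta_W        # effective finetuned weight

print(delta_W.shape, np.linalg.matrix_rank(delta_W))
```

Because the update is a product of two rank-`r` factors, the adapted layer differs from the base layer only within an `r`-dimensional subspace, which is what makes storing and swapping many LoRAs cheap.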

Approach

ZipLoRA’s design rests on two observations about LoRA weight matrices: their updates are sparse, with most elements contributing little to model performance, and highly aligned columns from different LoRAs interfere when summed directly. ZipLoRA therefore minimizes destructive overlap when merging two LoRAs: it learns disjoint sets of merger coefficients that encourage orthogonality between the combined weight columns. The final weights retain the distinctiveness of both the subject and the style, akin to a zipper joining two separate halves without compromising the integrity of either.
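The merging idea can be sketched as follows. This is an illustrative simplification under our own naming conventions, not the authors' implementation: each LoRA's weight delta is scaled column-wise by a learned coefficient vector, and a penalty term discourages both coefficient vectors from being active on the same column, pushing the merge toward disjoint (orthogonal) column supports.

```python
import numpy as np

# Illustrative ZipLoRA-style merge (variable names are ours): per-column
# merger coefficients m_c (content) and m_s (style) scale each LoRA's
# delta before summation. In the real method these coefficients are
# optimized; here they are just sampled to show the mechanics.
rng = np.random.default_rng(1)
d_out, d_in = 6, 6

delta_content = rng.normal(size=(d_out, d_in))  # subject LoRA delta
delta_style = rng.normal(size=(d_out, d_in))    # style LoRA delta

m_c = rng.uniform(size=d_in)   # one merger coefficient per column
m_s = rng.uniform(size=d_in)

# Merged delta: column j of each LoRA is scaled by its coefficient.
delta_merged = delta_content * m_c + delta_style * m_s

# Interference penalty: sum over columns of |m_c * m_s|. It is zero
# exactly when the two coefficient vectors select disjoint columns,
# which is the "zipper" behavior the method optimizes toward.
penalty = float(np.abs(m_c * m_s).sum())
print(delta_merged.shape, penalty >= 0.0)
```

During optimization, this penalty is minimized jointly with reconstruction objectives that keep the merged model's outputs close to each individual LoRA's outputs on its own concept, so reducing overlap never comes at the cost of losing either concept.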

Experiments and Results

Empirically, ZipLoRA generates compelling personalized images, consistently outperforming baselines across a broad range of subject and style combinations. The merged model represents both the style and the subject faithfully while retaining the ability to recontextualize the subject. Compared with a direct arithmetic merge and with computationally expensive joint training from scratch, ZipLoRA proves superior in both qualitative and quantitative evaluations, including user preference studies.
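The weakness of the direct-merge baseline can be illustrated with a toy construction of our own (not an experiment from the paper): when two LoRA deltas place significant values in the same columns, a plain sum mixes both concepts in those columns, whereas disjoint column supports would keep each concept intact.

```python
import numpy as np

# Toy illustration of merge interference: two sparse deltas whose
# active-column supports partially overlap. A direct sum delta_a +
# delta_b superimposes both concepts wherever the supports intersect.
rng = np.random.default_rng(2)
d = 8
delta_a = np.zeros((d, d))
delta_b = np.zeros((d, d))
delta_a[:, :5] = rng.normal(size=(d, 5))   # columns 0-4 active
delta_b[:, 3:] = rng.normal(size=(d, 5))   # columns 3-7 active

col_a = np.linalg.norm(delta_a, axis=0) > 0
col_b = np.linalg.norm(delta_b, axis=0) > 0
overlap = int((col_a & col_b).sum())       # columns where a sum mixes both
print(overlap)                             # prints 2 (columns 3 and 4)
```

ZipLoRA's learned coefficients shrink one LoRA's contribution in exactly such contested columns, which is why it preserves both concepts where a direct arithmetic merge degrades them.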

Conclusion

The introduction of ZipLoRA marks a significant step forward in personalized image generation. The method enables the combination of any subject with any style, leveraging the power and flexibility of models like Stable Diffusion XL. ZipLoRA stands out for its efficiency and simplicity, offering a hyperparameter-free solution that removes the need for intricate manual tuning. It opens new avenues for artists and users to explore personalized creations, backed by the capabilities of contemporary diffusion models.
