Mechanism behind LoRA’s advantage over full fine-tuning in AE-Adapt-V
Ascertain whether the superior performance of LoRA-based end-to-end fine-tuning over full fine-tuning during AE-Adapt-V is attributable to LoRA’s improved preservation of the knowledge encoded in the pre-trained diffusion transformer backbone.
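One concrete way to probe this is to measure how far the adapted weights drift from the pre-trained ones under each regime. The sketch below is a minimal, hypothetical illustration (not the DC-VideoGen implementation): a LoRA wrapper around a linear layer of a transformer block, where the frozen base weight W is perturbed only by a rank-r residual BA. The rank `r`, scaling `alpha`, and the `weight_drift` helper are assumptions introduced for illustration.

```python
# Minimal LoRA sketch (hypothetical; not the paper's implementation).
# The pre-trained weight stays frozen; only the rank-r factors A and B are
# trained, so the effective update (alpha/r) * B @ A is confined to a
# low-rank subspace -- one plausible reason the backbone's knowledge is
# better preserved than under full fine-tuning.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze pre-trained weights
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r                # standard LoRA scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path + trainable low-rank residual
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

    def weight_drift(self) -> float:
        # Frobenius norm of the effective weight change; comparing this
        # against ||W_full_ft - W_pretrained|| for a fully fine-tuned copy
        # is one way to quantify the knowledge-preservation conjecture.
        delta = (self.lora_B @ self.lora_A) * self.scaling
        return delta.norm().item()


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(1024, 1024), r=8)
    x = torch.randn(2, 1024)
    y = layer(x)                     # identical to the frozen layer at init,
    print(y.shape, layer.weight_drift())  # since lora_B is zero-initialized
```

Because B is zero-initialized, the adapted model starts out identical to the base model, and the drift metric gives one concrete handle on "knowledge preservation" for the proposed comparison against full fine-tuning.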
References
We find that LoRA not only reduces training cost by requiring fewer trainable parameters, but also achieves higher VBench scores and improved visual quality compared with full finetuning. We conjecture that this is because LoRA better preserves the knowledge of the base model.
— DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
(arXiv:2509.25182, Chen et al., 29 Sep 2025), Section 3.3.2, "AE-Adapt-V Stage 2: End-to-End Fine-Tuning with LoRA"