Mechanism behind LoRA’s advantage over full fine-tuning in AE-Adapt-V
Ascertain whether the superior performance of LoRA-based end-to-end fine-tuning over full fine-tuning during AE-Adapt-V is attributable to LoRA’s improved preservation of the knowledge encoded in the pre-trained diffusion transformer backbone.
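One concrete way to probe this is to measure how far the adapted weights drift from the pre-trained ones under each regime. The sketch below is a minimal, hypothetical illustration (not the DC-VideoGen implementation): a LoRA wrapper around a linear layer of a transformer block, where the frozen base weight W is perturbed only by a rank-r residual BA. The rank `r`, scaling `alpha`, and the `weight_drift` helper are assumptions introduced for illustration.

```python
# Minimal LoRA sketch (hypothetical; not the paper's implementation).
# The pre-trained weight stays frozen; only the rank-r factors A and B are
# trained, so the effective update (alpha/r) * B @ A is confined to a
# low-rank subspace -- one plausible reason the backbone's knowledge is
# better preserved than under full fine-tuning.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze pre-trained weights
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r                # standard LoRA scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path + trainable low-rank residual
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

    def weight_drift(self) -> float:
        # Frobenius norm of the effective weight change; comparing this
        # against ||W_full_ft - W_pretrained|| for a fully fine-tuned copy
        # is one way to quantify the knowledge-preservation conjecture.
        delta = (self.lora_B @ self.lora_A) * self.scaling
        return delta.norm().item()


if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(1024, 1024), r=8)
    x = torch.randn(2, 1024)
    y = layer(x)                     # identical to the frozen layer at init,
    print(y.shape, layer.weight_drift())  # since lora_B is zero-initialized
```

Because B is zero-initialized, the adapted model starts out identical to the base model, and the drift metric gives one concrete handle on "knowledge preservation" for the proposed comparison against full fine-tuning.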
References
We find that LoRA not only reduces training cost by requiring fewer trainable parameters, but also achieves higher VBench scores and improved visual quality compared with full finetuning. We conjecture that this is because LoRA better preserves the knowledge of the base model.
— DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
(arXiv:2509.25182, Chen et al., 29 Sep 2025), Section 3.3.2, "AE-Adapt-V Stage 2: End-to-End Fine-Tuning with LoRA"