Sufficiency of perturbing only late self-attention layers in deep VLA flow models for ACG
Determine whether, in deeper transformer-based Vision-Language-Action flow matching policies employing Action Coherence Guidance (ACG), perturbing only a small subset of the late self-attention layers is sufficient to construct an effective incoherent guidance vector field for coherent action generation.
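This entry does not spell out how ACG perturbs the network or combines the two vector fields, so the toy sketch below rests on stated assumptions. It assumes a perturbed-attention-style perturbation, in which a chosen layer's self-attention map is replaced by the identity so each action token attends only to itself. It also assumes a classifier-free-guidance-style extrapolation v + scale * (v - v_incoherent). All names (`init_layers`, `velocity`, `guided_velocity`, `sample`, `num_late`, `scale`) are hypothetical. The model is a stack of residual single-head attention blocks with random weights and no MLPs, normalization, timestep, or observation conditioning. It only shows where the "late layers only" choice enters, and it is not the paper's implementation.

```python
import numpy as np

# Toy sketch (hypothetical, not the ACG reference implementation).
# Assumptions: single-head attention blocks with residual connections only
# (no MLPs, norms, timestep or observation conditioning); the "incoherent"
# perturbation replaces a layer's attention map with the identity matrix.


def init_layers(num_layers, dim, seed=0):
    """Random (W_q, W_k, W_v) weights for each self-attention layer."""
    rng = np.random.default_rng(seed)
    s = 0.5 / np.sqrt(dim)
    return [tuple(rng.standard_normal((dim, dim)) * s for _ in range(3))
            for _ in range(num_layers)]


def self_attention(h, wq, wk, wv, identity=False):
    """Self-attention over action tokens h of shape (horizon, dim)."""
    v = h @ wv
    if identity:
        # Identity attention map: each action token attends only to itself,
        # which removes token-to-token (temporal) communication.
        return v
    q, k = h @ wq, h @ wk
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v


def velocity(x, layers, perturbed=frozenset()):
    """Toy velocity field: sum of residual attention updates applied to x."""
    h = x
    for i, (wq, wk, wv) in enumerate(layers):
        h = h + self_attention(h, wq, wk, wv, identity=i in perturbed)
    return h - x


def guided_velocity(x, layers, num_late, scale):
    """Assumed CFG-style rule: v + scale * (v - v_incoherent)."""
    n = len(layers)
    late = frozenset(range(n - num_late, n))  # perturb only the last num_late layers
    v = velocity(x, layers)
    v_incoherent = velocity(x, layers, perturbed=late)
    return v + scale * (v - v_incoherent)


def sample(layers, horizon, dim, num_late, scale, steps=10, seed=0):
    """Euler integration of the guided field starting from Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((horizon, dim))
    dt = 1.0 / steps
    for _ in range(steps):
        x = x + dt * guided_velocity(x, layers, num_late, scale)
    return x
```

For instance, `sample(init_layers(24, 16), horizon=8, dim=16, num_late=4, scale=1.0)` perturbs only the last 4 of 24 layers. Sweeping `num_late` on a deep trained policy and measuring action coherence and task success is the kind of ablation the question calls for. A random toy network like this one cannot answer it.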
References
Still, it remains an open question whether perturbing only a small fraction of the latter layers suffices for deeper networks.
— ACG: Action Coherence Guidance for Flow-based VLA models
(2510.22201 - Park et al., 25 Oct 2025) in Conclusion