Mechanism behind effectiveness gains from SPLADE++SelfDistil initialization
Determine the causal mechanism by which initializing SPLADE-v3 training from the SPLADE++SelfDistil checkpoint yields better effectiveness than initializing from CoCondenser or DistilBERT, and investigate whether a curriculum learning effect of the kind described by Zeng et al. is responsible for the observed improvements.
References
We are still not sure about the cause(s) of this effect, but we believe that a sort of curriculum learning -- as the one investigated in Zeng et al. -- could happen and lead to the observed improvements, but it still needs to be better investigated.
— SPLADE-v3: New baselines for SPLADE
(2403.06789 - Lassance et al., 11 Mar 2024) in Section 2.4, Further Fine-Tuning SPLADE