Mechanism behind effectiveness gains from SPLADE++SelfDistil initialization

Determine the causal mechanism by which initializing SPLADE-v3 training from the SPLADE++SelfDistil checkpoint yields better effectiveness than initializing from CoCondenser or DistilBERT, and investigate whether curriculum learning effects, as described by Zeng et al., are responsible for the observed improvements.

Background

The SPLADE-v3 authors (Lassance et al.) report that starting SPLADE-v3 training from SPLADE++SelfDistil yields better effectiveness than initializing from CoCondenser or DistilBERT. They hypothesize that a form of curriculum learning may be at play.

Although the effect is observed empirically, its precise cause remains unknown, and the authors call for further investigation to confirm and explain it.

References

We are still not sure about the cause(s) of this effect, but we believe that a sort of curriculum learning -- as the one investigated in Zeng et al. -- could happen and lead to the observed improvements, but it still needs to be better investigated.

SPLADE-v3: New baselines for SPLADE (2403.06789 - Lassance et al., 11 Mar 2024) in Section 2.4, Further Fine-Tuning SPLADE