Effect of Text Length on Embedding Inversion and Information Retention
Determine whether inversion performance on MS-Marco passages encoded by the Contriever text encoder declines with text length for the conjectured reason: longer passages contain more information, which makes exact reconstruction harder and causes the resulting embeddings to discard more details of the input.
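One way to probe the question empirically is to score each reconstruction against its source passage and check whether the score falls as the passage gets longer. The sketch below assumes the inversion outputs are already available as (original, reconstruction) string pairs. Token-level F1 over whitespace tokens and the Spearman rank correlation are stand-ins chosen for illustration, not necessarily the metrics used in the cited paper, and the names `token_f1`, `spearman`, `length_effect` and `bucket_size` are hypothetical.

```python
from collections import Counter
from statistics import mean


def token_f1(reference: str, reconstruction: str) -> float:
    """Bag-of-tokens F1 between an original text and its reconstruction (illustrative metric)."""
    ref = reference.lower().split()
    rec = reconstruction.lower().split()
    if not ref or not rec:
        return 0.0
    overlap = sum((Counter(ref) & Counter(rec)).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / len(rec)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def _ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = _ranks(xs), _ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / (vx * vy) ** 0.5


def length_effect(pairs, bucket_size=16):
    """Return (rho between length and F1, mean F1 per length bucket) for (original, reconstruction) pairs."""
    lengths = [len(ref.split()) for ref, _ in pairs]
    scores = [token_f1(ref, rec) for ref, rec in pairs]
    buckets = {}
    for n, s in zip(lengths, scores):
        buckets.setdefault(n // bucket_size * bucket_size, []).append(s)
    per_bucket = {b: mean(v) for b, v in sorted(buckets.items())}
    return spearman(lengths, scores), per_bucket
```

A clearly negative rho together with falling per-bucket means would be consistent with a length effect, but on its own it would not show that the details are lost in the embedding rather than in the decoder used for inversion.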
References
Our conjectured explanation is that longer texts contain more information, making exact reconstruction harder and leading to embeddings that discard more details from the input.
— Universal Zero-shot Embedding Inversion
(2504.00147 - Zhang et al., 31 Mar 2025) in Subsection "Effect of Text Length" (Section 5); Table \ref{tab:length_effect}