Benefit of the reconstruction objective in a full-JEPA variant of LLM2Vec-Gen

Determine whether the reconstruction objective (next-token prediction conditioned on the learned compression tokens) remains beneficial in a full Joint Embedding Predictive Architecture (JEPA) variant of LLM2Vec-Gen, in which the teacher and student are the same frozen large language model and training uses only an alignment objective that matches the model's mean-pooled embedding of its generated response. If the objective does help, ascertain whether the benefit is primarily for interpretability rather than for embedding quality.

Background

LLM2Vec-Gen learns generative embeddings by appending trainable thought and compression tokens to inputs and optimizing two losses while keeping the backbone LLM frozen: (1) a reconstruction objective that conditions the model on the compression tokens to reconstruct its own response, and (2) an embedding alignment objective that matches the compression-token-derived embedding to an external unsupervised teacher’s embedding of the response.
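The two losses can be sketched numerically as follows. This is an illustrative toy with numpy stand-ins for model outputs, not the paper's implementation: the cross-entropy reconstruction term and the cosine-distance alignment term are common choices, but the paper's exact loss forms, weighting, and embedding extraction are assumptions here.

```python
import numpy as np

rng = np.random.default_rng(1)
V, T, d = 10, 4, 8  # toy vocab size, response length, embedding dim

# Stand-ins for frozen-LLM outputs when conditioned on compression tokens.
logits = rng.normal(size=(T, V))        # next-token logits over the response
targets = rng.integers(0, V, size=T)    # response tokens to reconstruct

def reconstruction_loss(logits, targets):
    """Mean next-token cross-entropy for reconstructing the response."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -float(log_probs[np.arange(len(targets)), targets].mean())

def alignment_loss(student, teacher):
    """Cosine distance between embeddings (one common alignment choice)."""
    s = student / np.linalg.norm(student)
    t = teacher / np.linalg.norm(teacher)
    return 1.0 - float(s @ t)

student = rng.normal(size=d)  # compression-token-derived embedding (stand-in)
teacher = rng.normal(size=d)  # external teacher's response embedding (stand-in)

total = reconstruction_loss(logits, targets) + alignment_loss(student, teacher)
```

In the actual method only the appended thought and compression tokens are trainable, so gradients of `total` would flow to those token embeddings while the backbone stays frozen.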

The Open frontiers section proposes a full JEPA variant that eliminates the external teacher by using the same frozen model as both generator and target encoder. In this setup, the teacher embedding is produced by mean pooling over the model’s representation of the generated response under a reconstruction-oriented prompt, and the student is trained to predict this target from the query using only the alignment objective. The authors explicitly state that it remains an open empirical question whether retaining the reconstruction objective provides additional benefits, particularly for interpretability rather than embedding quality.
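The self-teacher target in this variant can be sketched as below. All names are illustrative: the hidden states are random stand-ins for the frozen model's activations, and cosine distance is assumed for the alignment objective.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8  # toy response length and hidden size

# Stand-in for the frozen model's per-token representations of its own
# generated response under a reconstruction-oriented prompt.
response_hidden = rng.normal(size=(T, d))

# JEPA target: mean pooling over the response representations.
teacher_embedding = response_hidden.mean(axis=0)

# Student prediction from the query via compression tokens (stand-in).
student_embedding = rng.normal(size=d)

def alignment_loss(student, teacher):
    """Cosine distance between student prediction and pooled target."""
    s = student / np.linalg.norm(student)
    t = teacher / np.linalg.norm(teacher)
    return 1.0 - float(s @ t)

loss = alignment_loss(student_embedding, teacher_embedding)
```

Since teacher and student share the same frozen backbone, this alignment term is the only training signal unless the reconstruction objective is retained, which is exactly the open question posed here.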

References

Whether the reconstruction objective remains beneficial in this setting, and whether any benefit is primarily for interpretability rather than embedding quality, is an open empirical question.

LLM2Vec-Gen: Generative Embeddings from Large Language Models (2603.10913 - BehnamGhader et al., 11 Mar 2026), Section: Open frontiers, Full JEPA mode