- The paper introduces Inv, a framework that leverages an invariant latent space to enhance language model inversion from outputs.
- It uses a two-stage training process, contrastive alignment followed by reinforcement learning, to enforce semantic and cyclic invariance.
- Experiments across nine datasets show Inv outperforms previous methods by 4.77% in BLEU score with less training data, highlighting its robustness.
An Invariant Latent Space Perspective on LLM Inversion
Introduction to LLM Inversion
LLM Inversion (LMI) is an emerging threat to privacy and security: the reconstruction of hidden prompts from an LLM's outputs. This paper presents a novel approach, reframing LMI around the reusability of large language models' (LLMs) latent space. The proposed Invariant Latent Space Hypothesis (ILSH) posits that outputs sharing a source prompt should be semantically consistent, and that cyclic mappings between inputs and outputs should be self-consistent, within a unified latent space.
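A compact way to state the two invariances may help; the notation below is ours, not the paper's: let x be a hidden prompt, g the (frozen) LLM, and f_theta the inverse encoder.

```latex
% Source invariance: outputs sampled from the same prompt map to
% nearby pseudo-representations in the shared latent space.
f_\theta(y_1) \;\approx\; f_\theta(y_2)
  \qquad \text{for } y_1, y_2 \sim g(\cdot \mid x)

% Cyclic invariance: decoding the inferred representation with the
% same LLM should recover the original prompt.
g\!\left(f_\theta(y)\right) \;\approx\; x
  \qquad \text{for } y \sim g(\cdot \mid x)
```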
Method: Invariant Inverse Attacker (Inv)
The paper introduces Inv, a framework that leverages the ILSH to make LMI more efficient and effective. The target LLM is reused as an invariant decoder, while a specialized inverse encoder is trained to map outputs back into consistent pseudo-representations in that LLM's latent space. This restructuring reduces the dependency on extensive inverse-training data and improves inversion fidelity.
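A minimal sketch of this restructuring, assuming a Transformer-based inverse encoder that emits soft-prompt embeddings for the frozen LLM; the module names, sizes, and the `gpt2` checkpoint are illustrative stand-ins, not the paper's choices.

```python
# Sketch: inverse encoder + frozen LLM reused as the invariant decoder.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

class InverseEncoder(nn.Module):
    """Maps observed LLM outputs to pseudo-representations (soft prompt
    embeddings) living in the frozen LLM's own latent space."""
    def __init__(self, vocab_size: int, d_model: int, n_pseudo: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # Learned queries pool variable-length outputs into n_pseudo vectors.
        self.queries = nn.Parameter(torch.randn(n_pseudo, d_model))

    def forward(self, output_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(output_ids))            # (B, T, d)
        attn = torch.softmax(self.queries @ h.transpose(1, 2), dim=-1)
        return attn @ h                                     # (B, n_pseudo, d)

# The target LLM is reused, frozen, as the invariant decoder.
tok = AutoTokenizer.from_pretrained("gpt2")
llm = AutoModelForCausalLM.from_pretrained("gpt2")
for p in llm.parameters():
    p.requires_grad = False

enc = InverseEncoder(tok.vocab_size, llm.config.hidden_size)
outputs = tok(["an observed model response"], return_tensors="pt")
pseudo = enc(outputs.input_ids)          # denoised pseudo-representations
# Decode the pseudo-representations with the frozen LLM to recover the prompt.
recon_ids = llm.generate(inputs_embeds=pseudo, max_new_tokens=32,
                         pad_token_id=tok.eos_token_id)
print(tok.decode(recon_ids[0], skip_special_tokens=True))
```

Because the decoder is the unmodified target LLM, only the inverse encoder's parameters need training, which is what reduces the demand for inverse data.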

Figure 1: Overview of Inv. An inverse encoder maps one or more outputs into denoised pseudo-representations in the LLM's latent space, and the LLM is reused to recover the prompt.
Training and Evaluation
Inv employs a two-stage training process that enforces source and cyclic invariance in turn. The first stage uses contrastive alignment to ensure semantic consistency across diverse outputs of the same prompt; the second uses reinforcement learning to strengthen cyclic invariance, optimizing the inverse encoder for high-fidelity prompt reconstruction. A sketch of both objectives follows.
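The InfoNCE form of the contrastive loss and the BLEU-based reward below are plausible instantiations assumed for illustration, not confirmed details of the paper.

```python
# Hedged sketch of the two training stages.
import torch
import torch.nn.functional as F
from nltk.translate.bleu_score import sentence_bleu

def contrastive_alignment_loss(z_a, z_b, temperature=0.07):
    """Stage 1 (source invariance): pull together pooled pseudo-
    representations of two outputs of the same prompt; other prompts
    in the batch act as negatives. z_a, z_b: (B, d) pooled latents."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / temperature       # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0))      # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

def cyclic_reward(reconstructed: str, prompt: str) -> float:
    """Stage 2 (cyclic invariance): reward high-fidelity reconstruction
    of the hidden prompt; BLEU is one natural choice of reward signal."""
    return sentence_bleu([prompt.split()], reconstructed.split())

# Stage 2 maximizes the expected cyclic reward over the inverse encoder's
# parameters with a policy-gradient method (e.g., REINFORCE or PPO).
```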

Figure 2: Evaluation of cyclic invariance. Synonym Replacement, Random Swap (randomly swapping words within a sentence), and Random Noise (replacing words with random WordNet entries) represent different perturbation types. Numbers in parentheses indicate the proportion of perturbed words. The brown dashed line marks the mean under the original setting.
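The three perturbations can be reconstructed from the caption roughly as below; the exact sampling and filtering details are assumptions, not the paper's code.

```python
# Sketch of the three perturbation types used in the cyclic-invariance study.
import random
from nltk.corpus import wordnet  # requires nltk.download("wordnet") once

def synonym_replacement(words, p):
    """Replace a proportion p of words with a WordNet synonym."""
    out = list(words)
    for i in random.sample(range(len(out)), k=int(p * len(out))):
        lemmas = [l.name() for s in wordnet.synsets(out[i])
                  for l in s.lemmas() if l.name() != out[i]]
        if lemmas:
            out[i] = random.choice(lemmas).replace("_", " ")
    return out

def random_swap(words, p):
    """Randomly swap the positions of a proportion p of word pairs."""
    out = list(words)
    for _ in range(int(p * len(out))):
        i, j = random.sample(range(len(out)), k=2)
        out[i], out[j] = out[j], out[i]
    return out

def random_noise(words, p):
    """Replace a proportion p of words with random WordNet entries."""
    lemma_pool = list(wordnet.all_lemma_names())
    out = list(words)
    for i in random.sample(range(len(out)), k=int(p * len(out))):
        out[i] = random.choice(lemma_pool).replace("_", " ")
    return out
```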
Inv exhibits notable resilience, outperforming prior methods by 4.77% in BLEU score while requiring significantly less training data. Experiments across nine datasets demonstrate this advantage, which the authors attribute to the invariant latent geometry inherent in LLMs; it also challenges conventional defense strategies that rely on sampling randomness.
Figure 3: Robustness study of diverse sampling strategies (Top-k and Top-p).
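One way to probe such sampling-based defenses is to sweep decoding parameters and re-run the inversion. The grid below is illustrative and reuses the `llm` and `tok` objects from the architecture sketch above.

```python
# Illustrative robustness probe: vary the sampling strategy used to produce
# outputs and check whether inversion quality degrades.
import torch

@torch.no_grad()
def sample_outputs(llm, tok, prompt, top_k=0, top_p=1.0, n=4):
    """Draw n outputs from the target LLM under a given sampling strategy."""
    ids = tok(prompt, return_tensors="pt").input_ids
    gen = llm.generate(ids, do_sample=True, top_k=top_k, top_p=top_p,
                       num_return_sequences=n, max_new_tokens=64,
                       pad_token_id=tok.eos_token_id)
    return [tok.decode(g[ids.size(1):], skip_special_tokens=True) for g in gen]

# A defense based on sampling randomness assumes noisier outputs break the
# inverter; Inv's results suggest the shared latent geometry survives it.
for top_k in (10, 50, 100):
    outs = sample_outputs(llm, tok, "the hidden prompt", top_k=top_k)
    # feed `outs` through the inverse encoder and the frozen LLM as above
```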
Practical Implications and Future Directions
The findings underline the vulnerability of existing LLMs and make a pressing case for stronger security measures. By exposing the latent inversion properties of LLMs, this research motivates a shift toward hardened defenses and privacy-preserving mechanisms that strengthen model robustness without compromising utility.
Conclusion
This work offers a fresh perspective on LMI, showcasing Inv as a strong framework that explicitly harnesses the latent invariances within LLMs to improve inversion. It marks a meaningful step for the field, prompting further exploration of effective defenses and responsible handling of LMI capabilities.