Unordered Word-Bag Text Reconstruction Attack
- The UTR attack reconstructs text by first recovering the unordered set of words or subwords present in a secret sequence, via combinatorial subword counts or gradient inversion, achieving near-perfect reconstruction accuracy.
- It comprises two strategies: a theoretical subword-count approach in idealized string models, and a gradient-based attack on adapter-trained federated LLMs that exploits the adapters' low-rank structure.
- The approach significantly lowers query complexity relative to previous methods, with empirical results showing 99–100% ROUGE scores on models such as BERT-Base and GPT2-Large.
Unordered-word-bag-based Text Reconstruction (UTR) attacks target text reconstruction from highly restricted information, either in idealized string models or as privacy-compromising attacks on federated LLMs with frozen or highly constrained parameters. The central mechanism is to infer the set of vocabulary tokens (word bags) or subword units present in a secret text and then reconstruct the original sequence, leveraging combinatorial or linear-algebraic properties of the leakage channel. UTR attacks have been developed both in the theoretical context of reconstructing unknown words from subword queries (Richomme et al., 2023), and as advanced gradient inversion attacks against adapter-based federated models (Chen et al., 24 Jan 2026).
1. Attack Formalisms: Query and Gradient Settings
In the word-combinatorial model (Richomme et al., 2023), the UTR attack is posed as follows: given an unknown word $w$ of length $n$ over an alphabet of size $q$, the adversary may issue adaptive queries about a chosen word $u$ and observe responses such as:
- “#-subword”: the response is the number of occurrences of $u$ in $w$ as a (possibly scattered) subword (notation: $\binom{w}{u}$).
- “$\exists$-factor” or “$\exists$-subword” existence: a binary response indicating whether $u$ is present in $w$ as a factor or as a subword.
The goal is to reconstruct $w$ exactly with as few queries as possible. The attack design hinges on capturing enough structural information through subword counts to uniquely determine the sequence.
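The “#-subword” oracle is easy to simulate: a standard dynamic program counts scattered-subword occurrences. A minimal sketch of the oracle an adversary would query (illustrative, not from the paper):

```python
def scattered_subword_count(w: str, u: str) -> int:
    """Number of occurrences of u in w as a (possibly scattered) subword."""
    # dp[j] = number of ways to embed u[:j] into the prefix of w seen so far
    dp = [0] * (len(u) + 1)
    dp[0] = 1
    for c in w:
        # iterate j downward so each character of w is consumed at most once per embedding
        for j in range(len(u) - 1, -1, -1):
            if u[j] == c:
                dp[j + 1] += dp[j]
    return dp[-1]
```

For example, "ab" occurs three times as a scattered subword of "abab" (positions (1,2), (1,4), (3,4)).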
In the adapter-based federated LLM context (Chen et al., 24 Jan 2026), the UTR attack proceeds with access to low-rank gradient updates from adapter modules trained with the backbone frozen. Here, the adversary receives:
- $G_{\text{emb}}$: the gradient update from the embedding adapter
- $G_{\text{layer}}$: the gradient update from a layer adapter
From these, the attack reconstructs the full input sequence submitted for training, even where conventional gradient inversion attacks fail due to the low dimensionality or sparsity of the update.
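A minimal numpy sketch (toy linear layer, synthetic data, all names hypothetical) of why adapter updates leak inputs: for a LoRA-style layer $y = (W_0 + BA)x$ with a squared-norm loss, the gradient with respect to the down-projection $A$ is $B^\top y\, x^\top$, whose rows are all scalar multiples of the private input $x$:

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_out, r = 16, 16, 4
W0 = rng.standard_normal((d_out, d_in))   # frozen backbone weight
A = rng.standard_normal((r, d_in))        # LoRA down-projection (trainable)
B = rng.standard_normal((d_out, r))       # LoRA up-projection (trainable)

x = rng.standard_normal(d_in)             # private input activation
y = (W0 + B @ A) @ x
# loss = 0.5 * ||y||^2  =>  dL/dy = y, so dL/dA = B^T y x^T
grad_A = B.T @ np.outer(y, x)             # shape (r, d_in): rank 1, rows parallel to x
```

The leaked update is rank-one here, and any nonzero row of `grad_A` reveals the direction of $x$; this is the structural leakage UTR exploits.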
2. Core Reconstruction Algorithms
2.1. Subword-Count UTR (Combinatorial Word Model)
The attack on binary alphabets writes $w$ in run-length form $w = 0^{e_1} 1^{e_2} 0^{e_3} \cdots$, where the $e_i$ are block lengths summing to $n = |w|$. Recovery proceeds in two phases:
- Phase A: Large block identification. A threshold $\tau$ partitions blocks into “large” ($e_i \geq \tau$) and “small” ($e_i < \tau$). The indices of the large blocks are found using recursive existence tests of the form “is $u$ a subword of $w$?”, in binary-search fashion.
- Phase B: Small block group recovery. The remaining blocks are recovered in groups of size $g$ via a single sum query whose combinatorial expansion is uniquely invertible (Lemma 3), unlocking $g$ block values per query.
The resulting total query complexity is sublinear in $n$ for the binary case; general alphabets are handled by lifting via all binary projections.
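As a simplified stand-in for the unique invertibility exploited in Phase B (not the paper's actual query, which works through subword counts): once every small block length is below the threshold $\tau$, a single base-$\tau$ aggregate determines an entire group of blocks, so one number unlocks $g$ values at once:

```python
def encode_blocks(blocks, tau):
    # Single aggregate value: base-tau encoding, invertible when every block < tau.
    total = 0
    for i, e in enumerate(blocks):
        assert 0 <= e < tau, "invertibility requires block lengths below the threshold"
        total += e * tau**i
    return total

def decode_blocks(total, g, tau):
    # Recover all g block lengths from the single aggregate.
    blocks = []
    for _ in range(g):
        blocks.append(total % tau)
        total //= tau
    return blocks
```

This mirrors the information-theoretic point: one sufficiently structured scalar response can carry $g \log_2 \tau$ bits, which is why group queries beat one-value-per-query schemes.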
2.2. Adapter-Gradient UTR (FedLLM Setting)
The UTR attack for LoRA-style adapter-based FedLLMs (Chen et al., 24 Jan 2026) is structured into:
- Stage 1: Word Bag Inference. For each vocabulary token $t$, its frozen embedding $e_t$ is tested for membership in the subspace induced by the embedding-adapter gradient, using the ratio of weight-to-bias gradients (“RWBG”) extracted from the adapter's down-projection layer. Tokens passing the test form the candidate word bag $\mathcal{B}$.
- Stage 2: Sentence Inference. Candidate sequences of length $L$ assembled from the word bag are scored by their alignment to the subspace induced by the layer-adapter gradient, subject to a consistency constraint relating each sequence to the inferred bag.
- Language Prior Filtering. To curb combinatorial explosion, candidates are filtered via non-repetition constraints, grammar checks (e.g., language_tool_python), and SBERT-based semantic-coherence checks.
This two-stage approach leverages the low-rank nature of adapters (internal rank $r$ far below the model's hidden dimension) and the fact that all evaluation occurs in the (frozen) embedding spaces.
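A toy numpy sketch of Stage 1 under simplifying assumptions (random synthetic embeddings; the leaked gradient modeled as random combinations of the secret tokens' embeddings, rather than the paper's RWBG statistic): membership in the gradient's column space singles out exactly the tokens that occurred:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, r = 50, 32, 8

E = rng.standard_normal((vocab_size, dim))   # frozen token embeddings
secret = [3, 17, 29]                         # tokens in the private batch

# Toy stand-in for the embedding-adapter gradient: its columns span the
# subspace generated by the embeddings of tokens that actually occurred.
G = np.stack([E[t] for t in secret], axis=1) @ rng.standard_normal((len(secret), r))

def in_span(v, G, tol=1e-6):
    # Relative residual of the least-squares projection of v onto col(G).
    coef, *_ = np.linalg.lstsq(G, v, rcond=None)
    return np.linalg.norm(G @ coef - v) < tol * np.linalg.norm(v)

bag = [t for t in range(vocab_size) if in_span(E[t], G)]   # recovers the word bag
```

With generic embeddings, only the secret tokens' embeddings lie (numerically) in the leaked subspace, so `bag` equals the true word bag.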
3. Theoretical Guarantees and Complexity Bounds
The combinatorial UTR’s query complexity is derived using two principal lemmas:
- Group-recovery lemma: For group size $g$ and block-length upper bound $\tau$, a single aggregate sum of subword counts uniquely resolves all $g$ block lengths by a Vandermonde-structured argument.
- Large-block localization lemma: A recursive binary search on existence queries locates all large blocks, using a number of queries logarithmic in the word length per block.
In the adapter-gradient setting, the maximum number of tokens uniquely recoverable from a single batch is bounded in terms of the adapter's internal rank $r$ (Chen et al., 24 Jan 2026).
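The rank bound can be seen in a toy continuation of the subspace test (synthetic data, hypothetical names): with adapter rank $r = 2$, a batch of two tokens is recovered exactly, while a batch of five tokens collapses into a subspace from which no individual embedding can be certified:

```python
import numpy as np

rng = np.random.default_rng(2)
vocab, dim, r = 20, 16, 2
E = rng.standard_normal((vocab, dim))   # frozen token embeddings (toy)

def bag_from_gradient(secret, r):
    # Leaked update: rank limited by the adapter rank r, so its column space
    # can hold at most r independent directions of information.
    G = E[secret].T @ rng.standard_normal((len(secret), r))
    def member(v):
        coef, *_ = np.linalg.lstsq(G, v, rcond=None)
        return np.linalg.norm(G @ coef - v) < 1e-6 * np.linalg.norm(v)
    return [t for t in range(vocab) if member(E[t])]
```

When the batch contains more distinct tokens than the rank allows, their embeddings are mixed into a subspace too small to contain any of them individually, and the membership test returns nothing.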
Both approaches exploit the high information yield per query or inference that careful combinatorial or algebraic structuring enables: attention restricted to low-rank subspaces, or combinatorially independent block groupings. This yields a sharp separation from previous linear or quasilinear query bounds.
4. Comparison with Prior Approaches and Defenses
Table 1: UTR Attack Models and Settings

| Model/Setting | Query/Inference Mode | Source |
|---|---|---|
| Subword count, $q$-ary alphabet | #-subword count queries | (Richomme et al., 2023) |
| Adapter-gradient (FedLLM) | Low-rank gradient inversion, pruned via filtered/beam search | (Chen et al., 24 Jan 2026) |
| Existence-of-subword [prior] | Binary subword-existence queries | (Richomme et al., 2023) |
| Existence-of-factor [prior] | Binary factor-existence queries | (Richomme et al., 2023) |
UTR substantially improves on the prior Skiena–Sundaram bounds for both subword- and factor-existence queries. In federated settings, UTR outperforms optimization-based GIAs (LAMP) and embedding-span GIAs (DAGER), particularly at large batch sizes or with frozen backbones (Chen et al., 24 Jan 2026). Differential privacy (sufficiently large Gaussian noise) and extreme gradient pruning (>99.9%) remain effective defenses, at the cost of significant model utility.
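A toy illustration of the differential-privacy defense (synthetic data; noise scale chosen for illustration only): Gaussian noise on the leaked low-rank update inflates the projection residual that the subspace-membership test relies on, so occurred tokens no longer sit cleanly inside the leaked subspace:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, r = 32, 8
E = rng.standard_normal((10, dim))              # frozen token embeddings (toy)
G = E[:3].T @ rng.standard_normal((3, r))       # leaked low-rank update (tokens 0..2)

def residual(v, G):
    # Relative residual of the least-squares projection of v onto col(G).
    coef, *_ = np.linalg.lstsq(G, v, rcond=None)
    return np.linalg.norm(G @ coef - v) / np.linalg.norm(v)

clean = residual(E[0], G)                       # ~0: token 0 is cleanly detectable
noisy_G = G + 0.1 * rng.standard_normal(G.shape)  # DP-style Gaussian perturbation
noisy = residual(E[0], noisy_G)                 # residual no longer near zero
```

Any fixed detection threshold now either misses occurred tokens or admits spurious ones, which is why sufficient noise collapses recovery, though the same perturbation also degrades the model update itself.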
5. Experimental Results and Empirical Performance
Empirical results (Chen et al., 24 Jan 2026) show:
- Reconstruction Accuracy: UTR achieves 99–100% ROUGE-1/2 scores across BERT-Base, GPT2-Large, and Qwen2.5-7B on the CoLA, SST-2, and Rotten Tomatoes datasets. Performance is maintained even at large batch sizes for Qwen, and for GPT2 on CoLA, where baseline GIAs degrade sharply.
- Comparison with Baselines: LAMP degrades to 20–60% ROUGE as batch size grows, while DAGER fails outright (0% on Qwen with a frozen backbone).
- Defensive Efficacy: Differential privacy with sufficient noise drops UTR's recovery below 2% ROUGE, but with a corresponding loss of model utility (accuracy ≈ 69%).
This demonstrates that UTR attacks exploit new leakage surfaces exposed by low-rank adaptation even when classical GIA channels are blocked.
6. Core Insights and Significance
The effectiveness of UTR attacks demonstrates a critical privacy vulnerability: low-rank adapters in federated LLMs enable unauthorized textual recovery despite backbones being frozen or highly restricted. Similarly, in string-combinatorial settings, scattered subword counts supply enough global constraints to reconstruct arbitrary unknown words with sublinear query complexity, outperforming earlier methods.
Key methodological advances include:
- The translation from global, structured leakage (subword statistics; adapter-induced subspaces) to combinatorially efficient and uniquely invertible recovery algorithms.
- The use of cascade filters—syntactic and semantic priors—to prune false positive combinations without material loss of coverage.
- The insight that low-rank parameterization, while efficient, offers a “high-leakage” channel that can be systematically exploited.
A plausible implication is that parameter-efficient training regimes cannot be assumed to enhance privacy, and that defenses must be specifically tailored to block subspace membership inference or disrupt gradient structure without sacrificing utility (Chen et al., 24 Jan 2026). For the string-theoretic model, the results close major gaps in the combinatorial complexity of sequence reconstruction and suggest broader applications wherever only coarse or aggregate string statistics are observable.