Textual Self-Attention Network (TSAN)
- TSAN is a test-time preference optimization framework that leverages natural-language self-attention to synthesize and integrate multiple candidate responses.
- It employs a multi-step pipeline including candidate generation, external scoring, textual QKV encoding, and iterative refinement using PAS_model and PAU_model.
- Empirical results demonstrate TSAN’s significant performance gains over standard fine-tuning and other test-time methods with minimal additional computational cost.
The Textual Self-Attention Network (TSAN) is a test-time preference optimization framework for LLM alignment that operates entirely in the natural-language domain and requires no parameter updates. By recasting the evaluation and synthesis of candidate completions as a self-attention problem—implemented through LLM prompting—TSAN systematically analyzes, weighs, and integrates the strengths of multiple model outputs, achieving interpretable iterative optimization. Empirical results demonstrate that TSAN significantly outperforms both standard supervised fine-tuning and previous test-time methods, including single-candidate revision, across a broad range of benchmarks (Mo et al., 10 Nov 2025).
1. Architectural Paradigm and Test-Time Pipeline
TSAN departs from conventional alignment mechanisms such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), which require compute-intensive weight updates during training. Instead, TSAN performs all preference optimization at inference, keeping the base policy model πθ frozen. It leverages a multi-candidate, self-attention-inspired approach rather than single-candidate "critique+revise" loops.
Key steps in the TSAN inference loop:
- Candidate Generation: Sample diverse responses from the frozen policy πθ.
- External Scoring: Use a reward model (RM) to score each response and select the top-k candidates.
- Textual QKV Construction: Initialize the query Q as the prompt x, and the keys K and values V as the concatenated top-k responses, each paired with its RM score.
- Textual Attention Scoring: Use a Prompted Attention Scorer LLM (PAS_model) to generate an attention analysis A—an explicit, natural-language assessment assigning relative weights and rationale to each candidate (e.g., “Candidate 1 is very clear and accurate (50% weight)…”).
- Aggregation & Synthesis: Pass the attention analysis A to a Prompted Aggregation Updater LLM (PAU_model) to synthesize y', a new response integrating the best aspects of all candidates in accordance with the attention analysis.
- Iterative Refinement: Optionally, iterate the process for up to T steps, each time re-evaluating, reselecting, and further refining the response set via "textual gradient descent."
- Output Selection: Return the highest-scoring response (per RM) from all iterations.
TSAN’s key innovation is reframing the evaluation and aggregation of the candidate set as a Query–Key–Value attention operation, executed in natural-language space at each step (Mo et al., 10 Nov 2025).
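As a concrete sketch of the textual QKV construction step, the helper below formats RM-scored candidates into the shared K/V string; the function name and prompt layout are illustrative assumptions, not the paper's exact templates.

```python
def build_textual_qkv(prompt, scored_candidates, top_k=3):
    """Build the textual Q, K, V for one TSAN step.

    `scored_candidates` is a list of (response_text, rm_score) pairs.
    The layout of the K/V string is an illustrative assumption.
    """
    # Q is simply the raw user prompt.
    q = prompt
    # Keep the top-k candidates by reward-model score.
    top = sorted(scored_candidates, key=lambda c: c[1], reverse=True)[:top_k]
    # K and V are the same text block: each candidate paired with its score.
    kv = "\n\n".join(
        f"Candidate {i + 1} (RM score {score:.2f}):\n{text}"
        for i, (text, score) in enumerate(top)
    )
    return q, kv, kv  # K and V coincide in the textual setting

q, k, v = build_textual_qkv(
    "Explain photosynthesis.",
    [("Plants convert light...", 0.91), ("Leaves are green...", 0.42),
     ("Chlorophyll absorbs...", 0.77), ("Sunlight is warm...", 0.13)],
)
```

Because the "keys" and "values" are both just the scored candidate texts, a single formatted block serves for both roles downstream.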
2. Mechanized Textual Self-Attention: QKV Encoding and Attention Analysis
TSAN draws an explicit analogy to transformer self-attention mechanisms. In a standard transformer, the operation is:
- A = softmax(QKᵀ / √d_k)  (attention weights)
- Output = A · V  (weighted aggregation of the values)
TSAN implements this entirely in text as follows:
Textual Encoding:
- Q = x (the raw prompt)
- K = V = the concatenated top-k candidate responses, each paired with its RM score.
Attention Score Generation:
- PAS_model receives Q and K and produces the attention analysis A: an interpretive breakdown of each candidate’s strengths/weaknesses and a summary from which an equivalent of the attention weights a_i can be extracted from the narrative.
Aggregation Update:
- PAU_model, supplied with the analysis A and the candidate values V, produces y'—a synthesized response composed to reflect the proportional strengths as captured in A:
y' ≈ Σ_i a_i · v_i
However, this process is performed via prompt-engineered natural language instructions, rather than learned numeric matrices.
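For contrast, the numeric operation that TSAN textualizes fits in a few lines of NumPy; this toy block is purely illustrative, since TSAN itself never materializes these matrices.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Standard scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k))  # attention weights over candidates
    return A @ V, A

rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 4))   # one query (the prompt)
K = rng.normal(size=(3, 4))   # three "candidates" as keys
V = rng.normal(size=(3, 4))   # the same candidates as values
out, A = attention(Q, K, V)
```

In TSAN, the row of weights A becomes a paragraph of prose, and the weighted sum A @ V becomes a synthesized response written by the PAU_model.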
3. Iterative Textual Gradient Descent and Inference Pseudocode
TSAN expands the single-pass synthesis into an iterative optimization, structured around natural-language “gradients.” Each iteration t involves:
- Computation of the attention analysis A_t over the current candidate set.
- Synthesis of the aggregated response y_t.
- Critique of y_t by a dedicated loss LLM (LLM_L) using a critique prompt, producing structured feedback and “natural-language gradients” identifying concrete weaknesses.
- Translation of these gradients into actionable update instructions using another prompted LLM.
- Production of H parallel, refined “attention-head” candidates from y_t and the update instructions.
- Rescoring the expanded candidate set with the RM, reselection of the top-k for the next iteration, and repetition until the iteration budget T is reached or rewards plateau.
Algorithm 1 (pseudocode) formalizes this loop: generate and score candidates, build the textual Q/K/V, compute the attention analysis, synthesize, critique, produce refined heads, and reselect, repeating until the iteration budget is exhausted or rewards plateau.
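Assuming stub `llm` and `rm` callables in place of the prompted models and the reward model (all prompt wordings below are illustrative, not the paper's templates), the loop described above can be sketched as:

```python
def tsan(prompt, llm, rm, n_candidates=4, top_k=3, n_heads=2, n_iters=2):
    """Sketch of the TSAN inference loop.

    `llm(instruction)` returns a string; `rm(prompt, response)` returns a
    scalar reward. Both stand in for the prompted LLMs and the reward model;
    the prompt strings below are illustrative assumptions.
    """
    pool = [llm(prompt) for _ in range(n_candidates)]        # candidate generation
    best, best_score = None, float("-inf")
    for _ in range(n_iters):
        # External scoring: keep the top-k candidates by reward.
        scored = sorted(((r, rm(prompt, r)) for r in pool),
                        key=lambda x: x[1], reverse=True)[:top_k]
        kv = "\n".join(f"[score {s:.2f}] {r}" for r, s in scored)   # textual K/V
        analysis = llm(f"Weigh each candidate's strengths:\n{kv}")  # PAS_model
        synth = llm(f"Merge per this analysis:\n{analysis}\n{kv}")  # PAU_model
        critique = llm(f"Critique this response:\n{synth}")         # textual "gradient"
        update = llm(f"Turn the critique into edits:\n{critique}")  # update instructions
        heads = [llm(f"Refinement head {h}:\n{synth}\n{update}")    # parallel heads
                 for h in range(n_heads)]
        pool = [r for r, _ in scored] + [synth] + heads             # expanded set
        for r in pool:                                              # track global best
            s = rm(prompt, r)
            if s > best_score:
                best, best_score = r, s
    return best
```

In practice each `llm` call would hit the frozen policy or a prompted evaluator, and `rm` a trained reward model; the stubs only fix the control flow.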
4. Weighted Synthesis and Natural-Language Aggregation
Upon obtaining the attention analysis A, TSAN’s aggregation step operationalizes guidance such as “focus proportionally more on the aspects highlighted as strengths in the attention scores.” Thus, the synthesized response merges clarity from one candidate, factuality from another, tone from a third, etc. This enables structured, interpretable, and preference-aligned output construction, closely mirroring a weighted Key–Value merge:
y' ≈ Σ_i a_i · v_i, where a_i is the narrative weight assigned to candidate i and v_i its response,
but rendered in human-interpretable text rather than as a numeric tensor.
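If, as in the example quoted earlier, the PAS narrative states explicit percentages (“50% weight”), an equivalent of the numeric weights a_i can be recovered with a simple parse-and-normalize step; the phrasing matched below is an illustrative assumption.

```python
import re

def extract_weights(analysis, n_candidates):
    """Pull 'Candidate i ... (P% weight)' statements out of a textual
    attention analysis and normalize them into weights summing to 1.
    The phrasing matched here is an illustrative assumption."""
    weights = [0.0] * n_candidates
    for m in re.finditer(r"Candidate\s+(\d+).*?\((\d+(?:\.\d+)?)%\s*weight\)",
                         analysis):
        idx, pct = int(m.group(1)) - 1, float(m.group(2))
        if 0 <= idx < n_candidates:
            weights[idx] = pct
    total = sum(weights)
    return [w / total for w in weights] if total else weights

a = extract_weights(
    "Candidate 1 is very clear and accurate (50% weight). "
    "Candidate 2 adds useful detail (30% weight). "
    "Candidate 3 is off-topic (20% weight).",
    3,
)
```

TSAN itself leaves the weights in prose and lets the PAU_model interpret them; parsing them out like this is only one way to inspect or audit the analysis.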
5. Illustrative Case Studies: Multi-Aspect Synthesis
The framework's efficacy is demonstrated through detailed case studies:
- Mathematical Reasoning: For the alternating sum 1 − 2 + 3 − 4 + ⋯ − 100, multiple candidates paired terms in various (sometimes flawed) ways. Attention analysis by PAS_model correctly identified which candidates applied the right grouping, and the PAU_model synthesized an explanation—“Group into 50 pairs yielding –1 each, so sum = 50×(–1)= –50. Thus boxed answer = –50.”—that was more explicit and accurate than any single candidate.
- Instruction Following and Tone: Given an instruction to “channel pure love” in a dialogue about consciousness and biology, candidates varied in warmth and depth. TSAN attention analysis integrated emotional intelligence from one and specificity from another, allowing the aggregation to combine an open-hearted tone with precise scientific inquiry (e.g., “love is not just an emotion…” alongside “How does neuroscience explain the reward circuits of love?”).
These examples illustrate TSAN’s core advantage in systematically harvesting complementary strengths from diverse outputs (Mo et al., 10 Nov 2025).
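Assuming the alternating sum in the first case study is 1 − 2 + 3 − 4 + ⋯ − 100 (the reading consistent with “50 pairs yielding –1 each”), the synthesized answer checks out numerically:

```python
# Alternating sum 1 - 2 + 3 - 4 + ... - 100: odd terms positive, even negative.
terms = [k if k % 2 else -k for k in range(1, 101)]
pairs = [terms[i] + terms[i + 1] for i in range(0, 100, 2)]  # (1-2), (3-4), ...
total = sum(terms)
```

Each of the 50 pairs contributes −1, giving a total of −50, matching the aggregated explanation.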
6. Empirical Performance, Ablations, and Computational Overhead
TSAN was benchmarked against base SFT models, the state-of-the-art TPO method, and commercially aligned models (e.g., Llama-3.1-Instruct, Qwen-3-Plus), across instruction following, open-ended preference, safety, and math tasks.
Performance improvements in the reported setting (N sampled candidates, H parallel heads, T refinement iterations):
| Benchmark | SFT Base | TSAN | TPO |
|---|---|---|---|
| AlpacaEval 2 LC | 3.01% | 18.57% | 17.95% |
| Raw win-rate WR | 4.91% | 17.05% | 20.18% |
| Arena-Hard | 5.5% | 8.5% | 6.0% |
| HH-RLHF avg reward | –6.65 | –2.88 | –2.96 |
| XSTest safety | 75.2% | 78.8% | 76.6% |
| MATH-500 | 22.0% | 28.2% | 32.0% |
Further summary: Aligned models (e.g., Llama-3.1-Instruct) also gain substantial accuracy when paired with TSAN (e.g., AlpacaEval 2 WR improves from 18.18% to 23.19%). Qwen-3-Plus+TSAN achieves an Arena-Hard score of 72.1%, compared to 47.3% without TSAN. In several metrics, gpt-oss 20B + TSAN matches or surpasses gpt-oss 120B (Mo et al., 10 Nov 2025).
Ablation studies reveal:
- Increasing the number of candidates N from 2 to 4 steadily increases reward scores.
- Increasing the number of parallel textual heads H yields richer multi-headed “gradient” signals, further boosting performance.
TSAN’s per-query computational cost is ≈11.78 PFLOPs—only ~0.016% of Llama-3.1-70B-DPO’s training cost (72,840 PFLOPs), and marginally higher than TPO’s 9.3 PFLOPs.
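The quoted overhead figures follow directly from the reported numbers; as a sanity check:

```python
# Reported costs (PFLOPs), from the summary above.
tsan_pflops = 11.78        # TSAN per-query inference cost
dpo_train_pflops = 72_840  # Llama-3.1-70B-DPO training cost
tpo_pflops = 9.3           # TPO per-query inference cost

ratio_pct = 100 * tsan_pflops / dpo_train_pflops  # share of DPO training cost
overhead_vs_tpo = tsan_pflops / tpo_pflops        # relative to TPO
```

The per-query cost is roughly 0.016% of the DPO training budget and about 1.27× TPO's per-query cost, consistent with the claim of marginal overhead.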
7. Significance and Novel Contributions
TSAN introduces a principled, interpretable workflow for combining the strengths of diverse LLM outputs, reframing candidate aggregation as an attention problem in the natural language domain. It operates entirely at inference with no parameter updates, and its iterative, textual-gradient-based process is both structured and interpretable. TSAN systematically outperforms both prior test-time optimization strategies and strong training-time baselines in alignment tasks, with only marginal extra compute cost (Mo et al., 10 Nov 2025).