
Anchor-Conditioned Token Generation (ACTG)

Updated 21 January 2026
  • ACTG is a conditional text generation framework that uses fixed anchor tokens to guide output, ensuring inclusion and ordered placement.
  • It leverages hierarchical task decomposition with multi-agent reinforcement learning and adversarial signals to optimize token insertion and replacement.
  • The framework integrates differential privacy measures, enabling fine-grained, privacy-preserving control in applications like targeted editing and instruction following.

Anchor-Conditioned Token Generation (ACTG) denotes a family of conditional text generation frameworks in which generation is guided by a set of “anchor” tokens or attributes that act as explicit conditions or control points. These systems enforce the inclusion and order of the given anchors in the generated text, which enables fine-grained instruction following, targeted editing, and privacy-preserving synthesis by decoupling the specification of control features from the underlying generative mechanism. Contemporary ACTG frameworks employ hierarchical decompositions, multi-agent reinforcement learning, adversarial signal optimization, and differentially private mechanisms to realize anchored control at scale (Jo, 2020; Hu et al., 21 Oct 2025).

1. Formal Problem Definition

ACTG formulates generation as learning a mapping from a set of anchors (A or f) to a target sequence S (or text x), enforcing that the output sequence respects anchor constraints. Specifically, the basic structure is as follows:

  • Anchor tokens: $A = \{a_1, ..., a_m\}$ are provided as immutable sequence constraints (“seed” tokens or structured attributes).
  • Target sequence: $S = \{s_1, ..., s_n\}$ must include all anchors in the prescribed order, with intervening or manipulated content generated to fill in gaps or extend context.
  • Blank positions: $B \subseteq \{1, ..., n\}$ denotes positions selected for replacement or insertion by a dedicated agent.
  • Objective: Maximize the conditional likelihood $P(S|A) = \prod_t P(s_t | s_{<t}, A)$, optionally under adversarial and privacy constraints.
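
The ordering constraint on anchors can be made concrete with a short check. The following is a minimal sketch, assuming anchors are matched greedily from left to right and that positions are 0-indexed; the function names `anchor_positions` and `respects_anchors` are hypothetical and do not come from either cited paper.

```python
from typing import Hashable, List, Optional, Sequence


def anchor_positions(
    sequence: Sequence[Hashable], anchors: Sequence[Hashable]
) -> Optional[List[int]]:
    """Greedy left-to-right match of anchors inside `sequence`.

    Returns the 0-indexed position of each anchor, or None if the anchors
    do not all occur in the prescribed order.
    """
    positions: List[int] = []
    start = 0
    for anchor in anchors:
        for i in range(start, len(sequence)):
            if sequence[i] == anchor:
                positions.append(i)
                start = i + 1  # later anchors must come strictly after this one
                break
        else:
            return None  # anchor missing or out of order
    return positions


def respects_anchors(sequence: Sequence[Hashable], anchors: Sequence[Hashable]) -> bool:
    """True if `sequence` contains every anchor in the given order."""
    return anchor_positions(sequence, anchors) is not None


tokens = ["a", "dog", "runs", "in", "the", "park"]
found = anchor_positions(tokens, ["dog", "park"])
reversed_ok = respects_anchors(tokens, ["park", "dog"])
```

Positions of `tokens` that are not returned by `anchor_positions` are the candidates for the blank set $B$ under this reading.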

In privacy-constrained ACTG (Hu et al., 21 Oct 2025), anchors are extracted as categorical schema fields $f = \phi_S(x)$, and both anchor synthesis $G_f$ and conditional sequence generation $G_{x|f}$ are required to be $(\epsilon, \delta)$-differentially private.

2. Hierarchical Task Decomposition and Multi-Agent RL

Token Manipulation GAN (Token-MANGAN) (Jo, 2020) operationalizes ACTG via a hierarchical multi-agent reinforcement learning architecture:

  • Make-a-Blank Agent ($G_{man}$): Given partial output $y_{<t}$, full anchor set $A$, and an anchor pointer idx, this agent decides to (a) insert a blank for generation, (b) consume the next anchor, (c) replace an anchor to be refilled, or (d) pass (rarely used).
  • Fill-in-the-Blank Agent ($G_{tok}$): Activated upon “add” or “replace” actions, it fills the specified position with a token sampled from the vocabulary $V$.
  • Policy updates: Both agents are updated via policy gradients, maximizing expected cumulative reward $R(\tau) = \sum_{t=1}^T \gamma^{T-t} r_t$ with advantage estimation $A_t = R(\tau) - V_\phi(s_t)$, where $V_\phi$ is a learned critic.
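
The return and advantage in the last bullet can be computed directly from the formulas as written. This is a minimal sketch: the weighting $\gamma^{T-t}$ follows the expression above (so later rewards receive larger weights), and the critic outputs passed as `values` are hypothetical numbers rather than the output of a trained $V_\phi$.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """R(tau) = sum_{t=1}^{T} gamma^(T - t) * r_t, with t starting at 1."""
    T = len(rewards)
    return sum(gamma ** (T - t) * r for t, r in enumerate(rewards, start=1))


def advantages(
    rewards: Sequence[float], values: Sequence[float], gamma: float
) -> List[float]:
    """A_t = R(tau) - V(s_t), using one scalar return for the whole trajectory."""
    if len(rewards) != len(values):
        raise ValueError("need one critic value per time step")
    R = discounted_return(rewards, gamma)
    return [R - v for v in values]


# Toy trajectory: three steps, reward 1 at each step, gamma = 0.5.
R = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 0.25 + 0.5 + 1.0
adv = advantages([1.0, 1.0, 1.0], values=[1.0, 0.75, 1.75], gamma=0.5)
```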

This architecture allows dynamic manipulation of anchor placement and content in the output sequence, supporting both insertion and selective replacement in an end-to-end setup.
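
To illustrate how the two agents interact, the sketch below replays a fixed list of manipulator actions. It is a toy under stated assumptions, not the Token-MANGAN implementation: the `Action` names, the `fill` callback standing in for $G_{tok}$, and the reading of “replace” as “skip the next anchor and let the fill agent write its slot” are all hypothetical.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Action(Enum):
    ADD = "add"          # insert a blank and let the fill agent write it
    CONSUME = "consume"  # copy the next anchor into the output
    REPLACE = "replace"  # assumed: drop the next anchor and refill its slot
    PASS = "pass"        # leave the output unchanged


def apply_actions(
    anchors: Sequence[str],
    actions: Sequence[Action],
    fill: Callable[[List[str]], str],
) -> Tuple[List[str], int]:
    """Replay manipulator actions; return the output and the final anchor pointer."""
    output: List[str] = []
    idx = 0  # pointer to the next unconsumed anchor
    for action in actions:
        if action is Action.ADD:
            output.append(fill(output))
        elif action in (Action.CONSUME, Action.REPLACE):
            if idx >= len(anchors):
                raise ValueError("no anchors left to consume or replace")
            token = anchors[idx] if action is Action.CONSUME else fill(output)
            output.append(token)
            idx += 1
        # Action.PASS: nothing to do
    return output, idx


# Stand-in for the fill-in agent: emits a placeholder named after its position.
def toy_fill(context: List[str]) -> str:
    return f"w{len(context)}"


plan = [Action.ADD, Action.CONSUME, Action.ADD, Action.CONSUME]
generated, pointer = apply_actions(["dog", "park"], plan, toy_fill)
```

In a trained system the action list would be sampled step by step from the manipulator policy rather than fixed in advance.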

3. Conditional Adversarial and Privacy-Preserving Learning

ACTG frameworks integrate adversarial learning and differential privacy as follows:

  • Adversarial loss (Token-MANGAN): The generator optimizes against a discriminator $D(S,A)$ trained to distinguish human-written $(S,A)$ pairs from machine-generated $(\hat{S},A)$ pairs; generator and discriminator losses are formulated analogously to the GAN literature.
  • Differential privacy (ACTG-ARL): Anchor synthesis $G_f$ (via AIM, an adaptive tabular synthesizer) and the conditional text generator $G_{x|f}$ (via DP-Adam) enforce $(\epsilon_1, \delta)$-DP and $(\epsilon_2, \delta)$-DP respectively, with total budget $\epsilon_1 + \epsilon_2 \leq \epsilon$. Feature extraction stages incur no privacy cost when implemented via trusted LLM oracles.
  • Reward design: In privacy-preserving settings, the reinforcement learning reward is defined by field-wise anchor matching, $r(f, x) = \frac{1}{K} \sum_{k=1}^K \mathbf{1}[f_k = \phi_S(x)_k]$.
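
The field-wise anchor-matching reward is simple enough to write out. The sketch below assumes that the schema extractor $\phi_S$ has already been applied, so the function receives the $K$ target fields and the $K$ fields extracted from the generated text; the name `anchor_match_reward` and the example field values are illustrative only.

```python
from typing import Hashable, Sequence


def anchor_match_reward(
    target_fields: Sequence[Hashable], extracted_fields: Sequence[Hashable]
) -> float:
    """r(f, x) = (1/K) * sum_k 1[f_k == phi_S(x)_k].

    `extracted_fields` plays the role of phi_S(x); the extractor itself
    is outside this sketch.
    """
    K = len(target_fields)
    if K == 0:
        raise ValueError("schema must contain at least one field")
    if len(extracted_fields) != K:
        raise ValueError("target and extracted fields must have the same length")
    matches = sum(f == g for f, g in zip(target_fields, extracted_fields))
    return matches / K


# Two of three hypothetical schema fields agree.
reward = anchor_match_reward(
    ("biology", "methods", "long"),
    ("biology", "results", "long"),
)
```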

Table 1: Summary of ACTG Optimization Objectives

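| Framework    | Conditioning        | Adversarial Signal    | Privacy Guarantee         |
|--------------|---------------------|-----------------------|---------------------------|
| Token-MANGAN | Hard anchor tokens  | Discriminator loss    | None                      |
| ACTG-ARL     | Rich tabular schema | RL + best-of-N anchor | $(\epsilon, \delta)$-DP   |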

4. Model Architectures and Training Schedules

Architectural designs in ACTG systems reflect the decomposition of generation and control tasks:

  • Token-MANGAN (Jo, 2020):
    • Both agents $G_{man}$, $G_{tok}$: 2-layer LSTMs (embedding dim $d=300$, hidden $h=512$), action heads for manipulation and vocab.
    • Critic: single-layer MLP on hidden state.
    • Discriminator: 2-layer uni-LSTM ($h=512$) with sigmoid output.
    • Hyperparameters: generator LR 1e-4, discriminator LR 5e-5, RL discount $\gamma=0.95$, batch size 64, vocab $\sim$5,000.
  • ACTG-ARL (Hu et al., 21 Oct 2025):
    • Feature extractor via LLM oracle.
    • DP tabular synthesizer (AIM) for anchors.
    • Conditional LM (gemma-3-1b-pt, fine-tuned via DP-Adam and LoRA).
    • Post-training RL (PPO surrogate) using anchor-matching reward.
    • Hybrid loss combining RL objective and best-of-$N$ supervised fine-tuning.

Training schedules typically involve an initial MLE phase (teacher-forcing on reference data), followed by adversarial RL or RL-boosted control with privacy constraints.
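
For reference, the Token-MANGAN settings listed above can be collected in a single configuration object. This is a sketch only: the class and field names are hypothetical, the vocabulary size is the approximate figure quoted above, and epoch counts are left out because they are not stated here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TokenManganConfig:
    """Hyperparameters as reported above; field names are illustrative."""

    embedding_dim: int = 300        # d
    hidden_dim: int = 512           # h, shared by agents and discriminator
    lstm_layers: int = 2
    generator_lr: float = 1e-4
    discriminator_lr: float = 5e-5
    gamma: float = 0.95             # RL discount
    batch_size: int = 64
    vocab_size: int = 5000          # approximate
    # Training order described in the text: MLE first, then adversarial RL.
    phases: Tuple[str, ...] = ("mle", "adversarial_rl")


config = TokenManganConfig()
```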

5. Pseudocode and Workflow

Token-MANGAN’s adversarial multi-agent RL training can be summarized:

for epoch in 1…E_adversarial:
  for batch in data:
    A ← sample_anchors(batch)
    Ŝ ← rollout(G_man, G_tok | A)                      # generate with both agents
    rewards ← [ log D(Ŝₜ | Ŝ<ₜ, A) for t in 1…T ]      # discriminator-based reward
    R ← discounted_sum(rewards, γ)
    adv ← R − Vϕ(Ŝ states)                             # advantage against the critic
    θ_man ← θ_man + α * Σₜ advₜ ∇ log G_man(aₜ|sₜ)     # policy gradient, manipulator
    θ_tok ← θ_tok + α * Σₜ advₜ ∇ log G_tok(yₜ|sₜ)     # policy gradient, token agent
    ϕ ← ϕ − α_c * ∇ϕ (Vϕ(sₜ) − R)²                     # critic regression
    D ← D − α_d * ∇_D L_D(real=(S,A), fake=(Ŝ,A))      # discriminator update

In ACTG-ARL (Hu et al., 21 Oct 2025), the workflow comprises:

  1. Feature extraction: anchors from private corpus.
  2. DP tabular anchor synthesis with AIM.
  3. DP fine-tuned conditional LM.
  4. Best-of-$N$ anchor dataset construction.
  5. RL rounds with PPO updates for anchor matching.
  6. Hybrid SFT + RL loss for final model selection.
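
Step 4 can be sketched as a selection loop: for each anchor, draw $N$ candidate texts and keep the one whose extracted fields best match the anchor. This is one plausible reading rather than the procedure of Hu et al.; `generate` and `extract` are hypothetical callables standing in for the conditional LM and the schema extractor, and the scoring reuses the field-wise match fraction.

```python
from typing import Callable, Hashable, Optional, Sequence, Tuple

Anchor = Sequence[Hashable]


def best_of_n(
    anchor: Anchor,
    generate: Callable[[Anchor], str],
    extract: Callable[[str], Anchor],
    n: int,
) -> Tuple[Optional[str], float]:
    """Sample `n` candidates for `anchor`; return the best text and its reward."""
    if not anchor:
        raise ValueError("anchor must contain at least one field")
    best_text: Optional[str] = None
    best_reward = -1.0
    for _ in range(n):
        text = generate(anchor)
        fields = extract(text)
        # Fraction of anchor fields recovered from the candidate text.
        reward = sum(a == b for a, b in zip(anchor, fields)) / len(anchor)
        if reward > best_reward:  # ties keep the earliest candidate
            best_text, best_reward = text, reward
    return best_text, best_reward


# Deterministic stand-ins so the example is reproducible.
candidates = iter(["chem|short", "bio|short", "bio|long"])
best_text, best_reward = best_of_n(
    anchor=("bio", "short"),
    generate=lambda anchor: next(candidates),
    extract=lambda text: tuple(text.split("|")),
    n=3,
)
```

The selected (anchor, text) pairs would then form the supervised fine-tuning set used alongside the RL objective in steps 5 and 6.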

6. Evaluation Criteria and Comparative Results

ACTG systems are evaluated on both traditional and specialized metrics:

  • Content quality: BLEU-$n$, perplexity, semantic alignment (MAUVE).
  • Diversity: self-BLEU.
  • Control accuracy: instruction-following accuracy (IFAcc), per-field anchor match.
  • Distributional alignment: mean Jensen–Shannon distance $d_{JS}^f$ between private and synthetic anchor distributions.
  • Privacy metrics: $(\epsilon, \delta)$-DP compliance, error decomposition traced by RDP/PLD accountants.
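
The distributional-alignment metric can be computed without external libraries. The sketch below assumes categorical distributions given as dictionaries of probabilities, uses base-2 logarithms so the distance lies in [0, 1], and takes the mean over schema fields; the function names and the per-field averaging are assumptions about how $d_{JS}^f$ is aggregated.

```python
import math
from typing import Dict, Hashable, Mapping

Dist = Mapping[Hashable, float]


def js_distance(p: Dist, q: Dist) -> float:
    """Jensen-Shannon distance (square root of the JS divergence, base 2)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a: Dist) -> float:
        # Terms with zero probability contribute nothing to the sum.
        return sum(
            a[k] * math.log2(a[k] / m[k]) for k in keys if a.get(k, 0.0) > 0.0
        )

    divergence = 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative round-off


def mean_js_distance(private: Dict[str, Dist], synthetic: Dict[str, Dist]) -> float:
    """Average the per-field JS distance over the fields of `private`."""
    if not private:
        raise ValueError("need at least one field")
    return sum(js_distance(private[f], synthetic[f]) for f in private) / len(private)


identical = js_distance({"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5})
disjoint = js_distance({"a": 1.0}, {"b": 1.0})
```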

Representative results include:

  • On COCO captions (Jo, 2020), Token-MANGAN improves the quality-diversity tradeoff over MaskGAN and SeqGAN at high mask rates (BLEU-5 at mask=0.5: MaskGAN −0.23, Token-MANGAN −0.19; lower is better).
  • On bioRxiv (Hu et al., 21 Oct 2025), ACTG (“schema + AIM + DP-FT”) achieves MAUVE = 0.775 vs CTCL = 0.647 (+20%), $d_{JS}^f = 0.087$ vs CTCL = 0.175 (−50%), IFAcc under DP = 0.534, and ACTG-ARL boosts IFAcc to ≈0.65 without collapse in MAUVE (≈0.76).
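
The relative changes quoted for the bioRxiv comparison follow from the reported scores. The short check below uses only the numbers given above; `relative_change` is a helper introduced for illustration.

```python
def relative_change(new: float, baseline: float) -> float:
    """Signed change of `new` relative to `baseline`."""
    return (new - baseline) / baseline


mauve_gain = relative_change(0.775, 0.647)  # ACTG vs CTCL, MAUVE
js_drop = relative_change(0.087, 0.175)     # ACTG vs CTCL, JS distance

print(f"MAUVE: {mauve_gain:+.1%}")        # MAUVE: +19.8%
print(f"JS distance: {js_drop:+.1%}")     # JS distance: -50.3%
```

Both values round to the +20% and −50% figures stated in the results.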

7. Limitations and Prospective Directions

Noted limitations of current ACTG instantiations:

  • Hierarchical RL with discrete action spaces exhibits sample inefficiency (Jo, 2020).
  • Scalability to long output sequences ($n \gg 20$) is constrained by policy gradient variance and architectural bottlenecks.
  • Fixed manipulator action sets; limited span-level editing and context-aware operation.
  • Reward hacking can occur in RL-boosted setups, necessitating hybrid objectives (best-of-$N$ SFT anchors) (Hu et al., 21 Oct 2025).
  • Token-MANGAN still relies on LSTMs; Transformer-based replacements are proposed for improved context handling.
  • Rich tabular schema selection substantially affects anchor distributional fidelity; the greatest improvement opportunity lies in conditional text generation error.

Future enhancements include replacing LSTMs with Transformers, augmenting reward signals (coherence, topic coverage), and extending manipulators to span-level control. The hybrid ACTG-ARL approach is shown to restore instruction fidelity under privacy constraints and sets new benchmarks in differentially private conditional text generation (Hu et al., 21 Oct 2025).
