
Progress Ratio Embeddings (PRE)

Updated 14 December 2025
  • Progress Ratio Embeddings (PRE) are continuous, trigonometric embeddings that encode the progress ratio of generated text for precise length control in Transformer decoders.
  • PRE replace discrete countdown signals with a smooth impatience signal, improving stability and generalization across various output lengths in tasks like summarization and question generation.
  • By integrating seamlessly into existing architectures with minimal modifications, PRE maintain high output quality and low error rates even on out-of-distribution target lengths.

Progress Ratio Embeddings (PRE) are continuous, trigonometric embeddings designed to provide robust and generalizable length control for neural text generation models, specifically those employing Transformer-based architectures. PRE operate by introducing a smoothly varying impatience signal tied to a normalized progress ratio r_t = t/\ell at each decoding step, where t is the current token position and \ell the user-specified target length. This approach replaces previous techniques relying on discrete countdown signals, offering improved stability, length fidelity, and generalization to unseen output lengths in sequence-to-sequence tasks such as abstractive summarization and question generation. PRE are injected with minimal architectural modification and have demonstrated effective control over text length without degrading output quality under standard evaluation metrics (Botcazou et al., 7 Dec 2025).

1. Motivation and Definition

PRE address the problem of explicit length planning in neural sequence generation. Traditional autoregressive decoders for tasks like summarization, question generation, and dialog typically lack mechanisms to precisely satisfy a user-specified output length \ell, instead relying on stochastic EOS token prediction. Reverse Positional Embeddings (RPE) attempted to remedy this by injecting a fixed countdown signal (\ell - t) at each decoding position, but exhibited poor generalization when the target length fell outside the training distribution. PRE propose a continuous signal: the progress ratio r_t = t/\ell \in [0, 1]. This ratio is used to generate a smoothly evolving impatience signal embedded into the decoder, indicating the fraction of output generated and promoting more reliable adherence to desired lengths.

2. Mathematical Formulation of PRE

For a decoding step t (with 0 \leq t \leq \ell), the PRE mechanism is instantiated as follows:

  • Progress ratio: r_t = t/\ell.
  • Decoder input embedding:

X_t = E_t + P_t + \xi(r_t)

Here, E_t is the token embedding, P_t is the standard positional embedding, and \xi(r_t) denotes the PRE vector.

  • PRE vector construction: Defining d as the embedding dimension with frequencies \omega_k = \pi k, for k = 0, \ldots, d/2 - 1,

\xi(r_t)_{2k} = \cos(\omega_k r_t), \quad \xi(r_t)_{2k+1} = \sin(\omega_k r_t)

Each consecutive (cos, sin) pair encodes a sinusoid whose frequency grows linearly in k, producing a dense, continuous signature of generation progress.
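The construction can be sketched in a few lines of NumPy. This is a minimal illustration, assuming interleaved (cos, sin) pairs with frequencies growing linearly as \omega_k = \pi k; the paper's exact frequency schedule is an assumption here, not a confirmed detail.

```python
import numpy as np

def pre_vector(t: int, target_len: int, d_model: int) -> np.ndarray:
    """xi(r_t): interleaved (cos, sin) pairs over the progress ratio.

    Sketch assuming frequencies omega_k = pi * k, one per (cos, sin)
    pair; the published frequency schedule may differ.
    """
    r = t / target_len                 # progress ratio r_t in [0, 1]
    k = np.arange(d_model // 2)        # one frequency index per pair
    omega = np.pi * k                  # linearly growing frequencies
    xi = np.empty(d_model)
    xi[0::2] = np.cos(omega * r)       # even slots: cosine components
    xi[1::2] = np.sin(omega * r)       # odd slots: sine components
    return xi
```

At t = 0 the ratio is 0, so every cosine slot is 1 and every sine slot is 0; at t = \ell the sine slots return to 0, giving the model an unambiguous "finished" signature.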

3. Integration into Transformer Architectures

PRE are incorporated into standard encoder–decoder Transformer models by injecting \xi(r_t) as part of the input embedding at every decoding step for every decoder layer. The core self-attention, cross-attention, feed-forward blocks, and output head remain unchanged. At inference, at each decoding step t, the model calculates r_t = t/\ell, forms \xi(r_t), and sums it with the existing embeddings before decoding the next token. Decoding continues until EOS is predicted or the ratio saturates at r_t = 1, discouraging generation beyond the requested length.

4. Training Objective and Ratio Noise Regularization

Models employing PRE are fine-tuned under teacher forcing to maximize conditional probabilities over reference sequences of target length \ell, using the cross-entropy objective:

\mathcal{L} = -\sum_{t=1}^{\ell} \log p_\theta(y_t \mid y_{<t}, x, r_t)

To promote smooth interpolation and prevent overfitting to discrete r_t values, Gaussian noise is injected into each ratio before embedding:

\tilde{r}_t = r_t + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)

This procedure exposes the model to a spectrum of ratio values, enhancing generalization for arbitrary output lengths.
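The noise-injection step can be sketched as below. The noise scale `sigma` and the clipping of perturbed ratios back into [0, 1] are illustrative assumptions; the paper's exact values and handling may differ.

```python
import numpy as np

def noisy_ratios(target_len: int, sigma: float = 0.05, seed: int = 0) -> np.ndarray:
    """Training-time progress ratios r_t = t/l with Gaussian perturbation.

    sigma and the [0, 1] clipping are illustrative assumptions, not
    values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    r = np.arange(target_len) / target_len          # clean ratios r_t
    noisy = r + rng.normal(0.0, sigma, target_len)  # r~_t = r_t + eps
    return np.clip(noisy, 0.0, 1.0)                 # keep ratios valid
```

Because each training example sees slightly different ratio values, the embedding \xi is exercised over a continuum rather than a fixed grid, which is what enables interpolation to unseen target lengths.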

5. Comparative Analysis: PRE vs Reverse Positional Embeddings

RPE encode the countdown by indexing the positional table with the number of remaining tokens:

X_t = E_t + P_{\ell - t}

This discrete representation leads to instability for out-of-training-distribution length requests: mean absolute error (MAE) spikes, and the number of large-error outliers rises significantly. In contrast, PRE's continuous embedding avoids discretization artifacts, keeps its highest frequency within the Nyquist–Shannon sampling criterion for the ratio's step size 1/\ell, and maintains stable behavior for all \ell within model capacity.

Approach | Embedding Structure | Generalization (OOD \ell)
RPE | Discrete countdown | Poor (outliers, MAE spikes)
PRE | Continuous impatience signal | Robust (low error, few outliers)

A plausible implication is that PRE’s mathematical structure inherently supports interpolation and generalization across arbitrary lengths, whereas RPE is constrained by the granularity of its countdown basis.
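The contrast can be made concrete with a toy comparison. `TRAIN_MAX_LEN` and both helper functions are hypothetical illustrations of the two indexing schemes, not code from the paper.

```python
TRAIN_MAX_LEN = 128  # hypothetical longest target length seen in training

def rpe_countdown_index(t: int, target_len: int) -> int:
    """RPE looks up a discrete embedding at countdown index (l - t)."""
    return target_len - t

def pre_progress_ratio(t: int, target_len: int) -> float:
    """PRE needs only the continuous ratio t / l, always in [0, 1]."""
    return t / target_len

# Out-of-distribution request: target length 1000, at the first step t = 0.
ood_len = 1000
print(rpe_countdown_index(0, ood_len) > TRAIN_MAX_LEN)  # True: unseen index
print(0.0 <= pre_progress_ratio(0, ood_len) <= 1.0)     # True: familiar range
```

For any requested length, PRE's input stays inside the interval the model was trained on, whereas RPE must extrapolate to countdown indices it has never embedded.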

6. Empirical Validation and Results

Rigorous experiments on BART-Large (400M parameters) and T5-Large (770M parameters) were conducted for CNN/DailyMail and XSum summarization, as well as SQuAD question generation.

  • Length Fidelity (MAE ± SD):
    • CNN/DM: No-control 19.2±17; RPE 1.6±3.6; PRE 0.5±0.3.
    • XSum: No-control 5.8±5; RPE 0.7±1.1; PRE 0.1±0.2.
  • Content Quality (ROUGE/BERTScore):
    • CNN/DM: PRE 45.3/21.9/42.2/69.8 vs RPE 44.5/21.2/41.3/69.4.
    • XSum: PRE 45.2/21.3/36.4/72.7 vs RPE 44.5/20.8/35.6/72.3.
  • Out-of-Distribution Target Lengths:
    • For out-of-distribution target lengths on CNN/DM, the RPE outlier rate (>20-token error) exceeds 50%, while PRE remains below 10% for target lengths up to 1000.
  • SQuAD Question Generation:
    • MAE: PRE 0.0±0.1; RPE 0.8±3.6; baseline 3.12±3.3.

Gaussian ratio noise proved essential; ablation reveals its necessity for smooth interpolation. PRE's MAE improvement over baselines is statistically significant.

7. Limitations and Prospective Developments

Current PRE research targets encoder–decoder architectures exclusively. Application in large, decoder-only LLMs remains an open question. Its efficacy beyond summarization and question generation, for tasks such as dialog or code synthesis, is unknown. Integrating PRE into chain-of-thought reasoning to control inference depth may reduce hallucinations and computational cost. This suggests potential extensions into reasoning-intensive generation domains, contingent on future empirical validation.

In summary, Progress Ratio Embeddings (PRE) constitute a continuous, trigonometric impatience signal for robust sequence length control, generalizing across broad length distributions while preserving or enhancing text generation metrics and requiring minimal architectural modification (Botcazou et al., 7 Dec 2025).
