Adaptive Duplicate Token Suppression (ADTS)
- Adaptive Duplicate Token Suppression (ADTS) is a decoding algorithm that tracks token frequencies in structured outputs and applies an exponential penalty to suppress redundancy.
- Empirical results show ADTS can reduce output tokens by up to 41.4%, inference time by up to 48.8%, and decoding FLOPs by 44.9% in UI code generation.
- Integrated into MLLM workflows, ADTS dynamically updates CSS/HTML parsers and adjusts logits in real time, ensuring concise, valid, and efficient code generation.
Adaptive Duplicate Token Suppression (ADTS) is a decoding-stage algorithmic mechanism specifically designed to suppress redundant or repetitive token generation in sequence output tasks, most notably within Multimodal LLM (MLLM)–based automatic UI code generation. It dynamically penalizes the likelihood of outputting duplicate structures or tokens by combining online frequency tracking and exponential logit attenuation. ADTS substantially improves both computational efficiency and output quality in domains characterized by syntactic redundancy and repetition, such as HTML/CSS code generation for user interface construction (Xiao et al., 15 Sep 2025).
1. Principle of ADTS: Real-Time Duplicate Structure Tracking
At its core, ADTS operates by parsing the stream of tokens generated during decoding and tracking the appearance frequency of syntactically meaningful structures:
- CSS Parsing: Frequencies are maintained for selectors, properties, and combined <selector, property> pairs; each CSS-related token incrementally updates the relevant counts within `css_counts` dictionaries.
- HTML Parsing: ADTS identifies quadruples (tuples comprising tag, property, value, and content) via customized parsing logic, tracking their frequencies in `html_counts`.
- Textual Content Tracking: Token strings of non-structural content are managed by `text_counts`.
These frequency counters are updated synchronously at each decoding step as new tokens are appended to the output sequence. The parsers are tailored to efficiently handle token recognition at syntactic boundaries, e.g., detection of special tokens such as "{", ":", ";", "}" in CSS or HTML tag open/close events.
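The counter bookkeeping described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the counter names (`css_counts`, `html_counts`, `text_counts`) come from the source, but the method names and the assumption that structures arrive pre-parsed are hypothetical simplifications of the streaming, boundary-aware parsers ADTS actually uses.

```python
from collections import defaultdict


class DuplicateTracker:
    """Sketch of ADTS-style frequency tracking.

    Assumes callers hand over already-parsed structures; the real
    parsers recognize them incrementally at syntactic boundaries
    such as "{", ":", ";", "}" and HTML tag open/close events.
    """

    def __init__(self):
        self.css_counts = defaultdict(int)   # selectors, properties, pairs
        self.html_counts = defaultdict(int)  # (tag, property, value, content)
        self.text_counts = defaultdict(int)  # non-structural text tokens

    def observe_css(self, selector, prop):
        # Count the selector, the property, and the combined pair.
        self.css_counts[selector] += 1
        self.css_counts[prop] += 1
        self.css_counts[(selector, prop)] += 1

    def observe_html(self, tag, prop, value, content):
        self.html_counts[(tag, prop, value, content)] += 1

    def observe_text(self, token):
        self.text_counts[token] += 1
```

Because each observation is a dictionary increment, tracking adds only constant overhead per decoding step.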
2. Exponential Penalty Mechanism for Logit Modification
Upon detection of repeated tokens or code blocks, ADTS modifies their generation probabilities using an exponential decay strategy applied directly to the model logits. Given an original logit $z_t$ associated with a candidate token $t$, its effective logit after ADTS adjustment becomes:

$$\tilde{z}_t = z_t \cdot \alpha^{n_t}$$

where:
- $n_t$ denotes the observed repetition count for the token or code structure (as tracked by the frequency parsers),
- $\alpha$ is a decay factor strictly less than 1 (its optimal value is determined empirically in the EfficientUICoder experiments).
This formulation causes the probability of generating further duplicates to decrease exponentially with each occurrence, effectively disincentivizing cyclic redundancy. The penalty thus rapidly attenuates the likelihood that the model continues to output syntactically or semantically repetitive structures.
3. Integration with MLLM Decoding Workflows
ADTS is integrated into the EfficientUICoder decoding pipeline as follows (Xiao et al., 15 Sep 2025):
- At each decoding step, the model produces a token.
- The token is appended to the in-progress output sequence.
- CSS/HTML parsers update the frequency counters with the new token.
- The logit modification strategy is applied: tokens that would continue a detected repetition have their logits attenuated by the exponential penalty factor (the decay factor raised to the repetition count).
- Softmax normalization is performed on the adjusted logits before sampling or selection of the next token.
This online, stepwise adjustment enables ADTS to respond immediately to emergent repetition patterns and preemptively suppress continuation, preventing output “loops” where models become stuck in redundant generation.
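The stepwise pipeline above can be condensed into a toy greedy-decoding loop. Everything here is a hypothetical simplification: the model is a stub returning fixed logits, per-token counts stand in for the structural CSS/HTML trackers, and `alpha = 0.9` is an assumed value.

```python
def decode_with_adts(model_logits_fn, vocab, steps, alpha=0.9):
    """Toy greedy decoding loop with ADTS-style suppression.

    model_logits_fn(output) stands in for the MLLM forward pass;
    counts is a simplified per-token stand-in for the structural
    frequency parsers.
    """
    counts = {}
    output = []
    for _ in range(steps):
        logits = model_logits_fn(output)                 # model step (stub)
        adjusted = [z * (alpha ** counts.get(tok, 0))    # exponential penalty
                    for z, tok in zip(logits, vocab)]
        # Greedy selection on the adjusted logits (softmax is monotone,
        # so argmax over logits equals argmax over probabilities).
        nxt = vocab[max(range(len(vocab)), key=lambda i: adjusted[i])]
        output.append(nxt)
        counts[nxt] = counts.get(nxt, 0) + 1             # update tracker
    return output
```

Running this with a stub model that always prefers one token shows the loop-breaking behavior: the favored token is emitted only until its accumulated penalty pushes its adjusted logit below the alternative's, at which point generation escapes the repetition.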
4. Empirical Impact on Efficiency and Quality
The adoption of ADTS yields marked improvements in both computational metrics and code quality outcomes:
- Generated Token Reduction: ADTS contributes to up to 41.4% reduction in output tokens when combined with other compression and refinement modules in EfficientUICoder.
- Computational Overhead: There are observed efficiency gains in MLLM inference, with prefill time decreased by up to 46.6% and total inference time reduced by up to 48.8% (on 34B-level models).
- Output Validity and Semantic Integrity: By suppressing repetitive blocks—such as recurrent selectors in CSS or duplicate HTML element definitions—ADTS decreases the number of samples flagged for redundancy (RS metric), with reductions over 50% in some test settings. Automated evaluations (CLIP, BLEU) also record substantial improvements when ADTS is deployed.
- FLOPs Savings: The shorter, non-redundant output sequences lower total required floating point operations during decoding by up to 44.9%, facilitating practical deployment in latency-constrained scenarios.
A plausible implication is that ADTS not only addresses the technical inefficiency inherent to UI2Code but also mitigates validity errors caused by ill-formed or unnecessarily long code artifacts.
5. Relationship with Other Token Suppression Mechanisms
ADTS is conceptually distinct from contrastive repetition suppression methodologies as in CTSD for neural machine translation (Dai et al., 30 Sep 2024), which rely on modulating suppression via both inter-token attention similarity and positional decay. While such techniques attenuate the penalty according to temporal distance and attention overlap, ADTS operates strictly on syntactic structure frequencies and applies an unconditional exponential penalty once a duplicate pattern is detected.
Unlike duplicate elimination paradigms in self-supervised representation learning (such as the memory control policy in DUEL (Choi et al., 2022), which replaces the most redundant negative sample in a working memory buffer), ADTS does not discard samples but continuously penalizes the output probabilities of repeated tokens at the generation stage.
| Mechanism | Domain | Suppression Signal |
|---|---|---|
| ADTS | UI code (HTML/CSS) | Structural frequency + exponential penalty |
| DUEL | SSL (negatives in memory) | Pair-wise latent similarity/collision |
| CTSD | NMT (token generation) | Attention similarity + positional decay |
6. Significance and Application Context
ADTS enables compression and quality assurance for code generation settings featuring high intrinsic redundancy, such as automatic HTML/CSS synthesis via MLLMs. Its structural tracking logic, combined with the exponential penalty, ensures that output not only requires fewer computational resources but also better matches human judgment for conciseness and validity. This method is key to the operational viability of advanced compression frameworks such as EfficientUICoder (Xiao et al., 15 Sep 2025), supporting real-world deployment in web development toolchains where output volume and correctness are essential constraints.
This suggests that ADTS—or closely related frequency-based exponential suppression strategies—will remain central in both the evolution of code-generating LLMs and other structured output domains in which uncontrolled repetition undermines performance, efficiency, and semantic fidelity.