Partial YaRN Methods
- Partial YaRN is a method that selectively compresses, interpolates, or implicitly models parts of a system’s compositional structure to extend context or detail while preserving key fidelity.
- It utilizes split-frequency reparameterization in domains like transformer RoPE, audio-language models, and photorealistic cloth rendering to achieve substantial computational savings and targeted generalization.
- Empirical validations demonstrate that Partial YaRN recovers 80–90% of full-method performance with far fewer fine-tuning tokens and efficient scaling for complex, hierarchical systems.
Partial YaRN refers to a class of “partial” methods in different domains—transformers for long-context language and audio models, photorealistic cloth rendering, and differentiable cloth physics—where a traditional compositional structure (context, geometry, or physical degrees of freedom) is selectively compressed, interpolated, or implicitly modeled, rather than fully reparameterized or explicitly simulated. Across these contexts, Partial YaRN methods deliver significant computational savings or targeted generalization, while preserving most domain-specific fidelity.
1. Definition and General Principle
Partial YaRN, as formalized in the context of transformer-based Rotary Position Embedding (RoPE) extension (Peng et al., 2023), is a "by-parts" (i.e., partial) adaptation strategy. Instead of globally altering positional encodings or model structure, it applies a split mechanism, modifying only predetermined segments or feature regimes. This principle generalizes to rendering (Khattar et al., 2024) and physics simulation (Gong et al., 2022) as well: Partial YaRN schemes operate by either (a) selective interpolation (by frequency or region), (b) collapsing compositional structure (e.g., substituting explicit sub-elements by on-the-fly or statistical surrogates), or (c) strictly local augmentation. The core aim is to retain high-frequency (local) or critical structure while scaling context window, geometric detail, or physical state space.
2. Partial YaRN for Transformer Context Extension
The archetypal Partial YaRN method is the NTK-by-parts interpolation used in context window extension for transformer models with RoPE. The method applies only the piecewise interpolation component of YaRN, omitting attention-temperature scaling.
- Partition of hidden dimensions: Each hidden dimension $d$ is classified by its wavelength $\lambda_d = 2\pi/\theta_d$, and the ratio $r_d = L/\lambda_d$ identifies how many full rotations fit in the trained context length $L$.
- Partitioning regimes: Using two thresholds $\alpha < \beta$, dimensions are split into low-frequency ($r_d \le \alpha$, interpolated), high-frequency ($r_d \ge \beta$, left unchanged), and a linear regime in between.
- Modified angular frequencies: The new frequency assignment per dimension is
$$\theta'_d = \bigl(1 - \gamma(r_d)\bigr)\,\frac{\theta_d}{s} + \gamma(r_d)\,\theta_d,$$
with the piecewise ramp $\gamma(r) = \min\bigl(1, \max(0, \tfrac{r-\alpha}{\beta-\alpha})\bigr)$, where $s$ is the context scaling factor.
- No logit rescaling: Unlike full YaRN, the attention softmax logits retain their original scaling; there is no temperature modulation.
- Empirical trade-offs: When evaluated on perplexity and passkey-retrieval benchmarks, Partial YaRN recovers 80–90% of full YaRN performance with ~10x fewer fine-tuning tokens than prior PI or NTK-aware interpolation baselines. Only at extreme context scaling does logit temperature provide an additional 5–8% reduction in perplexity (Peng et al., 2023).
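The piecewise remapping above can be sketched numerically. The dimension count, base, scale, and threshold values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

# Toy settings (assumed, not the paper's): 64 rotary pairs, standard base.
D, base = 64, 10000.0
L, s = 2048, 8.0            # trained length and context scaling factor
alpha, beta = 1.0, 32.0     # ramp thresholds on the rotation count r_d

d = np.arange(D)
theta = base ** (-2.0 * d / D)             # original angular frequencies
r = L * theta / (2.0 * np.pi)              # rotations completed within L
gamma = np.clip((r - alpha) / (beta - alpha), 0.0, 1.0)  # piecewise ramp
theta_prime = (1.0 - gamma) * theta / s + gamma * theta

# Highest-frequency dims (r >= beta) keep their original frequency;
# lowest-frequency dims (r <= alpha) are fully interpolated by 1/s.
print(theta_prime[0] / theta[0])    # 1.0
print(theta_prime[-1] / theta[-1])  # 0.125 (= 1/s)
```

The endpoints make the "partial" behavior concrete: local (high-frequency) structure is preserved exactly, while only the long-wavelength dimensions are compressed.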
3. Algorithmic Details and Pseudocode
Partial YaRN's implementation for transformer context extension comprises two main stages: fine-tuning and inference. Both adopt the same split-frequency RoPE reparameterization, using precomputed frequency maps that depend on the scale factor $s$ and the thresholds $\alpha, \beta$.
- Fine-tuning routine: At each forward pass, token positions are processed with per-dimension rescaled frequencies. Gradients update all model parameters in standard backpropagation.
- Inference routine: The rescaled angular frequencies are recomputed according to the effective scaling $s_{\mathrm{eff}}$, which is either static or dynamically updated to match the current sequence length.
Representative pseudocode (cf. (Peng et al., 2023)):
```python
for d in range(num_rope_dims):
    theta_d = base_freq ** (-2 * d / num_rope_dims)
    r_d = L * theta_d / (2 * np.pi)
    if r_d <= alpha:
        gamma_d = 0
    elif r_d >= beta:
        gamma_d = 1
    else:
        gamma_d = (r_d - alpha) / (beta - alpha)
    theta_prime_d = (1 - gamma_d) * (theta_d / s) + gamma_d * theta_d

for generation_step in steps:
    if dynamic:
        s_eff = max(1, curr_length / L)
    else:
        s_eff = s
    # repeat the per-dimension frequency update with s_eff in place of s
```
This structure ensures that all computation can be implemented as a simple drop-in frequency table adjustment with no custom CUDA or model surgery required.
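As a sketch of that drop-in usage, the adjusted frequency table can simply be handed to an ordinary rotary-rotation step. The helper name, tensor shapes, and the uniform stand-in table below are assumptions for illustration; in practice the piecewise table $\theta'_d$ would be substituted:

```python
import numpy as np

def rope_rotate(x, positions, freqs):
    """Rotate paired features x[:, :D], x[:, D:] by angle position * freq."""
    D = x.shape[-1] // 2
    ang = positions[:, None] * freqs[None, :]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, :D], x[:, D:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

D = 8
stock = 10000.0 ** (-2.0 * np.arange(D) / D)  # standard RoPE frequency table
remapped = stock / 4.0   # stand-in; use the per-dimension theta_prime table
q = np.random.default_rng(0).standard_normal((16, 2 * D))
q_rot = rope_rotate(q, np.arange(16.0), remapped)
# Rotation preserves per-token feature norms; only relative phases change,
# which is why swapping the table requires no other model changes.
```

Because only the frequency table changes, attention kernels, caching, and weights are untouched, matching the "no model surgery" claim above.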
4. Empirical Validation and Trade-Offs
Partial YaRN has been quantitatively benchmarked on LLaMA-2 models (7B/13B) extended from their original context length to substantially longer windows (up to 128k), trained for 400–600 steps on only 0.1% of the pre-training corpus tokens. In sliding-window perplexity evaluation, Partial YaRN matches or surpasses both Position Interpolation (PI) and prior NTK-aware interpolation across the 2–20x context-extension regime, with only a modest deficit (5–8% at extreme lengths) against full YaRN. Qualitative extrapolation on information-retrieval tasks at context lengths beyond the fine-tuning window demonstrates robust generalization—a plausible implication is that preservation of high-frequency RoPE dimensions is the key determinant of local order retention (Peng et al., 2023).
Ablation confirms that the choice of $\alpha$ and $\beta$ is robust across model sizes: thresholds set too low allow unwanted extrapolation of low-frequency dimensions, while thresholds set too high interpolate high-frequency dimensions and degrade local positional information.
5. Application to Audio-LLMs
Partial YaRN has been adapted as a modality-selective, training-free extension method for large audio-LLMs (LALMs) (Chaichana et al., 17 Oct 2025). In this setting, only audio-region positions are linearly remapped to fill the available RoPE frequency range, preserving all original text token positions, and hence the pre-trained LLM reasoning. Distinctively:
- For each hybrid audio-text segment, audio token positions are linearly compressed or stretched into the trained range (using the same split-frequency partition, in practice with a single cutoff on RoPE dimensions), while text tokens retain their original positional information.
- No weight updates are needed—this extension is inference-only.
- When used in combination with Virtual Longform Audio Training (VLAT), where fine-tuning samples virtualized audio resolutions, the approach enables LALMs to generalize robustly to much longer, previously unseen audio contexts.
Empirical results show Partial YaRN delivers up to 13.5-point absolute accuracy gains in zero-shot extension, and when incorporated into training via VLAT, enables substantial generalization performance at unseen lengths (Chaichana et al., 17 Oct 2025).
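A minimal sketch of the modality-selective remapping idea, assuming a single contiguous audio span and a hypothetical helper name (the actual method's mapping and cutoff details may differ):

```python
import numpy as np

def remap_audio_positions(positions, audio_mask, scale):
    """Compress audio-token positions by `scale`, anchored at the start of
    the audio span; text-token positions pass through unchanged."""
    pos = positions.astype(float)
    if audio_mask.any():
        start = pos[audio_mask].min()
        pos[audio_mask] = start + (pos[audio_mask] - start) / scale
    return pos

# 4 text tokens, 8 audio tokens, 4 text tokens; compress the audio span 2x.
positions = np.arange(16)
audio_mask = (positions >= 4) & (positions < 12)
new_pos = remap_audio_positions(positions, audio_mask, scale=2.0)
# Text positions 0-3 and 12-15 are untouched; audio positions now span
# 4.0 to 7.5, fitting twice the audio into the same RoPE range.
```

Because the remapping touches only position indices at inference time, no weights are updated, which is what makes the extension training-free.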
6. Partial YaRN in Computer Graphics and Physics
In the physically-based rendering domain, "Partial YaRN" refers to geometric aggregation, where explicit ply and fiber curves within a yarn are collapsed into a single tube (a partial yarn curve), with local detail generated on the fly only at the intersection points relevant to visible rays (Khattar et al., 2024). Shadowing and sub-yarn normal variation are injected back via analytically correct horizon-map corrections. As a result, substantial memory savings and significant rendering acceleration are achieved, with visual fidelity maintained via integrated shading models.
A closely related idea appears in differentiable fabrics simulation, where only a partial subset of yarn-level degrees of freedom is explicitly simulated, with contact, sliding, shear, and even collision constraints all made analytically differentiable. This enables high data-efficiency (as few as 5 ground-truth frames needed), robust parameter recovery (1–5% relative error), and physically plausible extrapolation across a range of woven patterns (Gong et al., 2022).
7. Limitations and Future Directions
Partial YaRN, while achieving most of the performance benefits of full context/geometry extension, does not provide the final gains available through logit temperature scaling in language models, or explicit sub-geometry in rendering. Additional hyperparameter tuning is required (e.g., $\alpha$ and $\beta$ in transformers; the frequency cutoff in audio; curve densities in rendering), and optimal settings may be model- or data-dependent (Peng et al., 2023, Chaichana et al., 17 Oct 2025). Further, in the audio-LLM context, evaluation is thus far limited to multiple-choice QA rather than generative tasks, and extension to other modalities (e.g., video) remains open.
This suggests that while Partial YaRN methods offer a high-efficiency, zero-overhead avenue for extending context or structure in complex, hierarchical systems, domain-specific residual refinements (e.g., logit scaling, finer dynamic adaptation) retain importance at the highest-fidelity settings.