Input Time Scaling (ITS) Overview
- Input Time Scaling (ITS) is a set of techniques that transform and augment model inputs without altering parameters, enhancing accuracy and robustness across domains.
- ITS methods, such as prompt expansion, retrieval augmentation, and log-time history transformation, are tailored to optimize performance in LLMs, time series, and physical simulations.
- ITS emphasizes train–test co-design and strategic compute allocation, leading to measurable improvements including enhanced multi-modal generation and near-optimal forecasting in time series.
Input Time Scaling (ITS) encompasses a spectrum of methodologies that allocate computational resources to the manipulation or augmentation of model inputs—at training, inference, or both—to enhance downstream performance without altering model parameters. Distinct from scaling paradigms based on model size or pre-training data volume, ITS operates by enriching or transforming the input space, often yielding substantial gains in accuracy, reasoning ability, or robustness. Applications span LLMs, vision, time-series forecasting, audio-video generation, and physical systems, with techniques tailored to each domain’s invariances and data constraints.
1. Conceptual Foundations and Taxonomy
ITS refers to strategies where additional compute is expended at input processing time (typically at inference, but potentially also during training) to refine, augment, or otherwise transform queries, prompts, or time series so as to elicit superior responses from a fixed pretrained model (Wang et al., 12 Oct 2025, Huang et al., 19 Aug 2025). This contrasts with:
- Model scaling: Performance gains through increased parameter count or training FLOPs;
- Inference-time scaling (output-focused): Enhanced computation via modified decoding (e.g., self-consistency, tree search, multi-step reasoning).
ITS covers two broad subtypes (Wang et al., 12 Oct 2025):
- Input-focused ITS: Expansion, demonstration selection, retrieval-augmentation, or meta-cognitive transformations applied to the input;
- Output-focused scaling: Decoding-time manipulations, not covered here except where intertwined with input transformations.
Within input-focused ITS, key methodologies include: few-shot prompting, retrieval-augmented generation (RAG), multi-modal input expansion, meta-knowledge persona insertion, log-time history transformation in time-series, and similarity-scaling protocols in physical simulations (Wang et al., 12 Oct 2025, Huang et al., 19 Aug 2025, Jacques et al., 2021, Ruiz et al., 2022, Shi et al., 2024, Jung et al., 2 Jun 2026).
2. Formal Definitions and Canonical Algorithms
ITS can be formalized as a transformation function that maps each input query, together with a chosen refinement strategy, to a processed query (Huang et al., 19 Aug 2025). At training, the model is trained on queries refined in this way; at inference, each test query is refined by the same kind of transformation and then presented to the trained model.
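Generic ITS algorithmic template: (1) choose a refinement strategy for training and one for testing; (2) refine every training query with the training strategy and train the model on the result; (3) refine each test query with the test strategy; (4) present the refined query to the trained model.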
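A minimal Python sketch of such a template, under stated assumptions: `identity`, `with_persona`, `refine_all`, `its_pipeline`, and the `fit` callback are hypothetical names introduced here for illustration, and `toy_fit` is a stand-in for real training rather than anything taken from the cited work.

```python
from typing import Callable, List

Strategy = Callable[[str], str]  # maps a raw query to a refined query
Model = Callable[[str], str]     # maps a refined query to a response


def identity(query: str) -> str:
    """Baseline strategy: leave the query unchanged."""
    return query


def with_persona(persona: str) -> Strategy:
    """Build a strategy that prepends a persona description to the query."""
    def apply(query: str) -> str:
        return f"{persona}\n\n{query}"
    return apply


def refine_all(queries: List[str], strategy: Strategy) -> List[str]:
    """Apply one strategy to every query in a list."""
    return [strategy(q) for q in queries]


def its_pipeline(
    train_queries: List[str],
    test_query: str,
    train_strategy: Strategy,
    test_strategy: Strategy,
    fit: Callable[[List[str]], Model],
) -> str:
    """Refine the training set, fit a model, then refine and answer the test query."""
    model = fit(refine_all(train_queries, train_strategy))
    return model(test_strategy(test_query))


def toy_fit(refined_train: List[str]) -> Model:
    """Toy stand-in for training (assumption): the 'model' echoes its prompt."""
    n_seen = len(refined_train)
    return lambda prompt: f"[trained on {n_seen} prompts] {prompt}"
```

Calling `its_pipeline(["q1", "q2"], "What is 2+2?", with_persona("You are a chemist."), with_persona("You are a poet."), toy_fit)` exercises the case where the training and test strategies differ, which is the setting the train/test strategy comparisons in this article are about.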
In LLMs, refinement strategies include persona conditioning, example selection, input paraphrasing, or augmentation with retrieved documents (Huang et al., 19 Aug 2025, Wang et al., 12 Oct 2025).
3. Core ITS Techniques Across Domains
3.1. LLMs: Prompt Expansion and Meta-Knowledge
ITS in LLMs is realized via prompt augmentation—few-shot demonstrations, chain-of-thought prefixing, persona conditioning, and RAG pipelines (Wang et al., 12 Oct 2025, Huang et al., 19 Aug 2025). Key elements:
- Few-shot prompting: Concatenation of labeled examples to the query; this can boost accuracy substantially (e.g., GPT-3 on arithmetic tasks).
- Persona strategies: Conditioning queries with similar, dissimilar, or random "persona" context, constructed via meta-models and concatenated to the input (Huang et al., 19 Aug 2025).
- Prompt–train/test co-design: Alignment of input strategies at both training and inference is critical; mismatched train/test strategies result in large performance drops.
- Retrieval-Augmented Generation: Top-K document retrieval, query expansion (paraphrase, hypothetical answer, draft-based refinement), reranking, and sequence/token-level fusion (Wang et al., 12 Oct 2025).
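The few-shot and retrieval elements listed above can be sketched as a single prompt builder. This is an illustrative sketch only: the word-overlap scorer is a deliberately crude stand-in for a real retriever, and `tokenize`, `retrieve_top_k`, and `build_prompt` are hypothetical helper names, not part of any cited system.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve_top_k(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Rank documents by word overlap with the query (toy scorer, assumption)."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(
    query: str,
    demonstrations: List[Tuple[str, str]],
    documents: List[str],
    k: int = 2,
) -> str:
    """Concatenate retrieved context, labeled examples, and the query itself."""
    parts = [f"Context: {doc}" for doc in retrieve_top_k(query, documents, k)]
    parts += [f"Q: {question}\nA: {answer}" for question, answer in demonstrations]
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)
```

The prompt grows by one block per retrieved document and one per demonstration, so the input-side compute is controlled by `k` and by the length of `demonstrations`.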
Pass@1 accuracies (%) for typical persona strategies (Qwen2.5-32B on AIME24, OT-1k); rows give the training strategy and columns the test strategy (Huang et al., 19 Aug 2025):
| Train\Test | N | R | S | U |
|---|---|---|---|---|
| S | 43.3 | 60.0 | 66.7 | 76.7* |
| U | 33.3 | 66.7 | 70.0* | 60.0 |
(*: best results occur for strategies with "dissimilar" or "random" personas)
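To make the train/test interaction in the table easier to inspect, the snippet below encodes the two rows exactly as printed and reports, for each training strategy, the best test strategy and the spread between best and worst. The function names are hypothetical, and the strategy letters are used as opaque labels without assuming what each one stands for.

```python
from typing import Dict, Tuple

# Pass@1 (%) copied from the table: outer key = train strategy, inner key = test strategy.
PASS_AT_1: Dict[str, Dict[str, float]] = {
    "S": {"N": 43.3, "R": 60.0, "S": 66.7, "U": 76.7},
    "U": {"N": 33.3, "R": 66.7, "S": 70.0, "U": 60.0},
}


def best_test_strategy(train: str) -> Tuple[str, float]:
    """Return the test strategy with the highest pass@1 for a given train strategy."""
    row = PASS_AT_1[train]
    best = max(row, key=row.get)
    return best, row[best]


def spread(train: str) -> float:
    """Difference between the best and worst test strategy in one row."""
    row = PASS_AT_1[train]
    return round(max(row.values()) - min(row.values()), 1)


for train in PASS_AT_1:
    print(train, best_test_strategy(train), spread(train))
```

In both rows the best test strategy differs from the training strategy, and the spread within a row exceeds 30 points, which is why the choice of test-time strategy cannot be made independently of the training-time one.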
3.2. Multimodal Generation and Verification
ITS generalizes to audio-video and multi-modal domains by leveraging multi-verifier selection and reward aggregation during generative sampling (Jung et al., 2 Jun 2026). Typical process:
- For each generated candidate, evaluate quality via multiple verifiers (e.g., semantic alignment, synchronization).
- Aggregate verifier signals using Adaptive Reward Weighting (ARW), which adaptively rescales each reward based on its variance and online statistics.
- Candidate selection proceeds via best-of-N sampling or evolutionary search.
ARW outperforms static aggregation methods, yielding overall improvements on JavisDiT audio-video benchmarks.
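The selection loop above can be sketched as best-of-N sampling with variance-aware reward aggregation. This is not the ARW formulation of Jung et al.: as a stand-in assumption, each verifier's reward is z-scored with running statistics before summation, so that a verifier with a large numeric range does not dominate. `RunningStat` and `select_best` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class RunningStat:
    """Online mean and variance of one verifier's rewards (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        if self.count < 2:
            return 1.0
        # Fall back to 1.0 when all observed rewards are identical.
        return math.sqrt(self.m2 / (self.count - 1)) or 1.0


def select_best(
    candidates: Sequence[Any],
    verifiers: Sequence[Callable[[Any], float]],
    stats: List[RunningStat],
) -> Any:
    """Score every candidate with every verifier, normalize, and return the best."""
    raw = [[verify(c) for verify in verifiers] for c in candidates]
    for row in raw:
        for stat, reward in zip(stats, row):
            stat.update(reward)

    def combined(row: List[float]) -> float:
        return sum((r - s.mean) / s.std for s, r in zip(stats, row))

    best_index = max(range(len(candidates)), key=lambda i: combined(raw[i]))
    return candidates[best_index]
```

With rewards on very different scales (say, alignment in [0, 1] and synchronization in [0, 100]), a plain sum would follow the synchronization score almost exclusively; the normalized sum lets a candidate that is strong on alignment win when its synchronization deficit is small relative to that verifier's spread.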
3.3. Time Series: Log-Time History and Horizon Scaling
For time-series, ITS appears in:
- Scale-Invariant Memory (SITHCon): History windows are represented via logarithmically spaced kernels; convolution and pooling over log-time achieve invariance to temporal rescaling of the input signal (Jacques et al., 2021). SITHCon maintains high accuracy across a wide range of time-stretch factors, generalizing substantially better than Temporal Convolutional Networks.
- Horizon scaling: The optimal look-back horizon for forecasting is set by a trade-off between irreducible (Bayesian) error, which decays as the horizon grows, and model/data approximation error, which rises with the horizon because of finite capacity and dataset size. The optimal horizon increases with data size but shrinks with model size (Shi et al., 2024).
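The log-time idea in the first bullet can be illustrated with a toy example: when a signal is sampled at geometrically spaced lags, stretching the signal in time by the spacing ratio only shifts the samples by one index, and pooling over the index removes the shift. This is a sketch of the principle, not the SITHCon architecture; `log_spaced_taps`, `sample_history`, and `bump` are hypothetical names.

```python
import math
from typing import Callable, List


def log_spaced_taps(n_taps: int, ratio: float, tau_min: float = 1.0) -> List[float]:
    """Lags tau_min, tau_min*ratio, tau_min*ratio^2, ... (evenly spaced in log-time)."""
    return [tau_min * ratio ** k for k in range(n_taps)]


def sample_history(signal: Callable[[float], float], taps: List[float]) -> List[float]:
    """Read the signal at each lag."""
    return [signal(tau) for tau in taps]


def bump(t: float) -> float:
    """Toy signal: a smooth bump centered at log(t) = 2 (requires t > 0)."""
    return math.exp(-(math.log(t) - 2.0) ** 2)


ratio = 2.0
taps = log_spaced_taps(8, ratio)

original = sample_history(bump, taps)
# The same signal stretched in time by `ratio`.
stretched = sample_history(lambda t: bump(t / ratio), taps)

# Stretching by one ratio step equals shifting the log-time samples by one index.
shift_holds = all(
    math.isclose(stretched[k], original[k - 1]) for k in range(1, len(taps))
)
# Max-pooling over the log-time axis discards the shift.
pooled_equal = math.isclose(max(original), max(stretched))
print(shift_holds, pooled_equal)
```

The pooled values agree only while the bump stays inside the window of lags; once a stretch pushes it past the last tap, the invariance is lost, which is why the number of taps bounds the usable range of time scales.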
3.4. Physical Systems: Similarity Scaling of Inputs
In fusion science, rise-time scaling for MagLIF implosions prescribes precise power-law transformations of all input parameters when the drive timescale (the current rise time) changes, so that the dimensionless physics stays invariant (Ruiz et al., 2022). Variables such as liner radii, liner mass, preheat energies, and circuit elements are scaled as functions of the rise time; load voltage declines only weakly.
4. Theoretical Principles and Empirical Observations
4.1. Coverage and Diversity Laws
In few-shot and sampling-based ITS, the improvement in performance with $N$ candidates follows the coverage law $C(N) = 1 - (1 - p)^N$, where $p$ is the per-sample probability of a correct output (Wang et al., 12 Oct 2025). Diminishing returns set in as $N$ increases.
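A short numerical sketch of the coverage law, assuming the standard independent-sample form in which the chance that at least one of N candidates is correct is 1 - (1 - p)^N. The function names `coverage` and `marginal_gain` are introduced here for illustration.

```python
def coverage(p: float, n: int) -> float:
    """Probability that at least one of n independent samples is correct."""
    return 1.0 - (1.0 - p) ** n


def marginal_gain(p: float, n: int) -> float:
    """Extra coverage obtained by going from n to n + 1 samples."""
    return coverage(p, n + 1) - coverage(p, n)


p = 0.2  # illustrative per-sample success probability (assumption)
for n in (1, 2, 4, 8, 16):
    print(n, round(coverage(p, n), 3), round(marginal_gain(p, n), 3))
```

Under this form the marginal gain equals p(1 - p)^N, so each additional candidate helps less than the previous one; this is the diminishing-returns behavior described above.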
4.2. Latency-Accuracy-Compute Trade-Offs
ITS shifts compute from model training to inference or input processing.
- Few-shot prompting increases prompt length linearly with the number of examples.
- RAG incurs additional retrieval (10–100 ms/query), encoding, and context-token cost.
- Despite increased inference-time FLOPs, these are typically sublinear compared to full model-scaling, and are tunable per query/task (Wang et al., 12 Oct 2025).
- In SITHCon, the size of the log-time memory (its number of nodes) sets the range of scale invariance; adding nodes extends the invariance range exponentially at only linear cost (Jacques et al., 2021).
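The exponential-range, linear-cost point in the last bullet follows directly from geometric spacing and can be checked in a few lines. This is a sketch assuming nodes spaced by a constant ratio; `covered_range` and `nodes_needed` are hypothetical helpers, not functions from the SITHCon code base.

```python
def covered_range(n_nodes: int, ratio: float) -> float:
    """Ratio between the longest and shortest lag for geometrically spaced nodes."""
    return ratio ** (n_nodes - 1)


def nodes_needed(target_range: float, ratio: float) -> int:
    """Smallest node count whose covered range reaches target_range."""
    n_nodes, span = 1, 1.0
    while span < target_range:
        span *= ratio
        n_nodes += 1
    return n_nodes


for n in (4, 8, 16, 32):
    print(n, covered_range(n, 2.0))
```

With a spacing ratio of 2, doubling the node count from 8 to 16 raises the covered range from 128 to 32768, while the memory cost only doubles.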
4.3. Surprising Empirical Findings
- Small, diverse, and minimally filtered datasets (OT-1k) outperform curated, larger ones (OT-15k, LIMO) in LLM ITS (Huang et al., 19 Aug 2025).
- Injecting “irrelevant” persona information often produces higher accuracy than closely matched conditioning, contrary to “garbage in, garbage out” intuition.
- In time series, an excessive look-back horizon degrades forecasting; the optimal horizon must be tuned jointly with data size and model capacity (Shi et al., 2024).
5. Methodological Comparisons and Design Constraints
A distinctive property of ITS, emphasized in multiple domains, is the necessity of aligning input transformation strategies between training and inference stages. This "train–test co-design" is essential to avoid out-of-distribution prompt collapse (Huang et al., 19 Aug 2025).
ITS is orthogonal and complementary to output-side inference scaling (e.g., tree-of-thought, self-consistency) and may be combined for further gains. For instance, chain-of-thought traces can be embedded atop ITS-refined inputs (Wang et al., 12 Oct 2025).
6. Representative Case Studies and Metrics
- LLMs: Qwen2.5-32B-Instruct, trained on OpenThoughts-1k with S–U persona ITS (train S, test U), achieves 76.7% pass@1 on AIME24, with U–S reaching 70.0%, outperforming much larger or more data-intensive settings (Huang et al., 19 Aug 2025).
- Audio–video generation: ITS-guided multi-verifier + ARW achieves 29.9% (text), 68.3% (AV sync), and 46.4% (overall) improvement on MMDisCo/VGGSound (Jung et al., 2 Jun 2026).
- Time series: SITHCon yields nearly 100% accuracy across a wide range of time scales in Morse code classification; TCNs fail outside the training scale (Jacques et al., 2021).
- Physical systems: Analytic scaling of fusion implosion inputs preserves non-dimensional similarity, achieving simulation–theory agreement at the 10–30% level over the range of scale factors studied (Ruiz et al., 2022).
7. Implications, Challenges, and Future Directions
ITS reframes the locus of AI scaling from purely model-centric or data-centric to a query-centric perspective, emphasizing structured, meta-cognitive, or information-theoretic transformations at the input layer (Wang et al., 12 Oct 2025, Huang et al., 19 Aug 2025). Implications include:
- Realization of the "less is more" phenomenon: moderate amounts of diverse data, with appropriate input augmentations, can elicit high-level reasoning, circumventing expensive model or dataset scaling (Huang et al., 19 Aug 2025).
- Invariance and few-shot generalization: Log-time (SITHCon) and other self-similar input strategies unlock robust generalization across data scales.
- Multi-objective inference: Adaptive reward weighting (ARW) is essential for multi-modal domains, preventing single-metric collapse (Jung et al., 2 Jun 2026).
Key research challenges include: formal quantification of coverage/diversity, extension to reinforcement learning from human feedback (RLHF), principled automation of input strategy selection, and systematic evaluation of potential overfitting and domain adaptation in ITS-augmented pipelines.
Input Time Scaling is thereby established as a foundational paradigm for maximizing downstream performance in fixed-model settings, pivotal across reasoning, retrieval, multi-modal generation, and scientific simulation (Wang et al., 12 Oct 2025, Huang et al., 19 Aug 2025, Jacques et al., 2021, Ruiz et al., 2022, Jung et al., 2 Jun 2026).