Less-is-More Hypothesis
- The Less-is-More Hypothesis is the principle that careful selection and pruning of inputs or model components can improve performance and interpretability in systems such as neural networks and sensor-fusion pipelines.
- Empirical evidence shows that curated subsets, such as selected radar points or instruction samples, reduce error metrics and enhance convergence compared to full datasets.
- The hypothesis is underpinned by theoretical scaling laws and phase transition analyses that guide optimal data curation and pruning strategies across diverse domains.
The Less-is-More Hypothesis refers to the principle that reducing the number or diversity of elements in a system can, under certain conditions, yield improved performance, generalization, interpretability, or efficiency compared to naïvely maximizing quantity. This hypothesis is rigorously supported across domains such as neural network training, data curation, sensor fusion, physics, and social decision-making, where judicious selection, pruning, or filtering of data, model components, or system interactions frequently leads to outcomes that outperform those obtained using all available resources.
1. Formal Definitions and Domain-Specific Instantiations
The Less-is-More Hypothesis is not a single mathematical statement, but recurs in problem-specific formulations:
- Radar-Inertial Odometry (RIO): Accurate trajectory estimates are achieved by using strictly fewer radar points, selected via physical properties such as Doppler velocity and radar cross-section. This leads to cleaner, more informative trajectory constraints and demonstrably lower localization error (Huang et al., 2024).
- Instruction Tuning for LLMs: Subsets of 1k–6k high-quality instructions suffice for strong generalization and style alignment in large models, rivaling performance obtained from ≥50k samples (Jha et al., 2023).
- Data Curation and Learning Theory: Small, highly curated subsets of training examples, chosen for difficulty and correctness, can yield lower test error than training on the entire set. Theoretical phase transition curves govern when curation improves generalization (Dohmatob et al., 5 Nov 2025).
- Lottery Ticket Hypothesis: Sparse subnetworks (“winning tickets”) can be found efficiently using only a pruning-aware critical subset of the training data, matching full-data accuracy while massively reducing training iterations (Zhang et al., 2021).
- Time Series Forecasting: Structured pruning of foundation models before task-specific fine-tuning leads to superior accuracy, faster convergence, and improved generalization over full-parameter adaptation (Zhao et al., 29 May 2025).
- Physical Transport: In anomalous conductors, increasing the density of scatterers (impurities) can decrease resistance, overturning classical intuitions about additive scattering (Znidaric, 2021).
The general principle: quality and relevance of constraints, examples, or parameters dominate sheer quantity.
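As a concrete illustration of the lottery-ticket entry above, one-shot magnitude pruning with weight rewinding can be sketched as follows. The layer shape, sparsity level, and surrogate "training" perturbation are illustrative assumptions, not the cited authors' exact procedure:

```python
import numpy as np

def winning_ticket_mask(trained_w: np.ndarray, sparsity: float) -> np.ndarray:
    """Keep the top (1 - sparsity) fraction of weights by magnitude."""
    threshold = np.quantile(np.abs(trained_w), sparsity)
    return (np.abs(trained_w) >= threshold).astype(trained_w.dtype)

# Toy example: "train" a random weight matrix (here, a small perturbation
# stands in for training), prune 80% by magnitude, then rewind the
# surviving weights to their initial values.
rng = np.random.default_rng(0)
w_init = rng.normal(size=(256, 256))
w_trained = w_init + 0.1 * rng.normal(size=w_init.shape)

mask = winning_ticket_mask(w_trained, sparsity=0.8)
ticket = w_init * mask  # sparse subnetwork at initialization

print(f"kept {mask.mean():.2%} of weights")
```

In the full method this sparse subnetwork would then be retrained in isolation; the sketch only shows how the mask is derived.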
2. Theoretical Foundations and Mathematical Formulations
The hypothesis is substantiated by diverse mathematical frameworks:
- Data Curation Scaling Laws: Let $E(n)$ denote the test error after training on $n$ examples, and let $p \in (0, 1]$ be the fraction of examples retained by a pruning oracle. Closed-form scaling curves reveal when $E_{\text{curated}}(pn) < E_{\text{full}}(n)$, i.e., when curated subsets outperform the full set given sufficient data abundance and oracle reliability. Analytical phase transitions are observed: for strong generators in the data-rich regime, best performance may arise at aggressive pruning levels ($p \ll 1$) (Dohmatob et al., 5 Nov 2025).
- Few-shot Learning and Entropy: The Asymptotic Equipartition Property (AEP) implies that almost all probability mass in high-dimensional data lies in a typical set of size roughly $2^{nH}$ (with $H$ the entropy rate). Training and generalizing on this set (the “less”) suffices for prediction across the full support (the “more”) (Pereg et al., 2022).
- Radar–Inertial Optimization: The state-space model leverages residual functions derived from physical properties, filtering out ~80% of points as redundant, and casting the estimation as a constrained sliding-window nonlinear optimization with Doppler and cross-section constraints (Huang et al., 2024).
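The AEP argument can be checked numerically for a Bernoulli source: the typical set is exponentially smaller than the full sequence space yet captures nearly all probability mass. A self-contained sketch (the source parameter, block length, and tolerance are illustrative):

```python
import math

q, n, eps = 0.3, 1000, 0.05  # Bernoulli parameter, block length, tolerance
H = -q * math.log2(q) - (1 - q) * math.log2(1 - q)  # entropy rate (bits)

typical_mass, typical_size = 0.0, 0
for k in range(n + 1):
    # All C(n, k) sequences with k ones share probability q^k (1-q)^(n-k).
    log2_p = k * math.log2(q) + (n - k) * math.log2(1 - q)
    # Typical set: empirical per-symbol log-loss within eps of H.
    if abs(-log2_p / n - H) <= eps:
        typical_size += math.comb(n, k)
        # Accumulate mass in log space to avoid float underflow.
        typical_mass += 2.0 ** (math.log2(math.comb(n, k)) + log2_p)

print(f"mass in typical set: {typical_mass:.4f}")
print(f"log2|typical set|/n = {math.log2(typical_size)/n:.3f} "
      f"(<= H + eps = {H + eps:.3f})")
```

The typical set here holds well over 99% of the probability mass while occupying a vanishing fraction of the $2^n$ possible sequences.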
3. Mechanisms Underlying Less-is-More Effects
Several mechanisms recur in empirical and theoretical investigations:
- Redundancy Suppression: Pruning, coreset selection, or coverage optimization removes redundant or low-informative constraints or examples, thus shrinking noise and variance while preserving essential learning signals.
- Activation of Task-Relevant Substructures: Pre-training often produces networks with rich, but highly redundant subnets; fine-tuning over a minimal, task-aligned subnetwork regularizes adaptation and improves generalization (Zhao et al., 29 May 2025).
- Improved Signal-to-Noise Ratio: Maximum coverage algorithms for synthetic data select diverse, non-overlapping examples, boosting classifier accuracy and data diversity, often at a reduction of ~90% in data volume (Tavakkol et al., 20 Apr 2025).
- Interference and Bottleneck Removal: In physics, frequent scatterers break up long sub-diffusive bottlenecks, decreasing resistance even as more scatterers are added (Znidaric, 2021).
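The coverage mechanism above can be sketched as greedy maximum-coverage selection over token sets. This toy version (the helper `greedy_coverage_select` and its token-overlap objective are illustrative, not the cited paper's algorithm) shows how near-duplicates get skipped in favor of diverse examples:

```python
def greedy_coverage_select(examples: list[str], k: int) -> list[str]:
    """Greedily pick k examples that maximize coverage of unseen tokens."""
    covered: set[str] = set()
    chosen: list[str] = []
    pool = [(ex, set(ex.lower().split())) for ex in examples]
    for _ in range(min(k, len(pool))):
        # Pick the example contributing the most previously unseen tokens.
        best = max(pool, key=lambda item: len(item[1] - covered))
        pool.remove(best)
        chosen.append(best[0])
        covered |= best[1]
    return chosen

docs = [
    "the movie was great",
    "the movie was great fun",       # near-duplicate adds almost nothing
    "terrible plot and weak acting",
    "soundtrack elevated every scene",
]
picked = greedy_coverage_select(docs, k=3)
print(picked)
```

The redundant near-duplicate pair contributes only one selection; the remaining picks cover new vocabulary, mirroring the signal-to-noise argument.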
4. Empirical Evidence and Quantitative Validation
Rigorous quantitative studies confirm the hypothesis in practice:
| Domain / Task | Full-data performance | Pruned / Curated subset | Relative data used | References |
|---|---|---|---|---|
| RIO localization | 0.80m RMSE | 0.29m RMSE | 20% | (Huang et al., 2024) |
| LLM instruction tuning | 0.360 accuracy | 0.356 (5k subset) | 8% | (Jha et al., 2023) |
| Lottery ticket finding | Full test accuracy | Matched/↑ accuracy | 35–78% | (Zhang et al., 2021) |
| Synthetic data F1 (SST-2) | 0.80 | 0.81 (500 ACS) | 10% | (Tavakkol et al., 20 Apr 2025) |
| TSFM forecasting (ETTm2) | N/A | –22.8% MSE | <50% parameters | (Zhao et al., 29 May 2025) |
Experiments reveal sweet spots (e.g., 50% sparsity in CNNs; Merkle et al., 2023), benchmarks whose accuracy plateaus at small curated subsets, and efficiency gains (92.7% fewer iterations in ticket finding).
5. Domain-Specific Applications and Implications
- Radar-Inertial Sensor Fusion: Filtering radar points with Doppler and RCS yields trajectory estimates with lower RMSE and computational cost, reflecting that physically validated constraints trump raw data quantity (Huang et al., 2024).
- LLM Instruction Tuning: Small, high-quality, mixed datasets optimize both factual recall and style alignment, reducing GPU costs and maximizing reproducibility (Jha et al., 2023).
- Data Curation Strategies: Precise analytic conditions reveal when “less is more” applies: aggressive pruning when data are abundant and the oracle is strong; otherwise, the full dataset remains necessary. This framework reconciles seemingly contradictory data strategies reported for mathematical reasoning (e.g., LIMO vs. Sun et al.) (Dohmatob et al., 5 Nov 2025).
- Synthetic Data for Classifiers: Adaptive coverage sampling outperforms random or k-means selection, slashing required data and boosting F1-score (Tavakkol et al., 20 Apr 2025).
- Sparse CNN Explainability: Moderate pruning increases both classifier accuracy and human-rated explainability, linking network sparsity to clearer attribution maps (Merkle et al., 2023).
- Human Social Learning: Minimal social signals (frequency-only information) produce higher cumulative payoffs than richer rating systems, indicating that the effect extends to human collective behavior (Toyokawa et al., 2014).
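The physical gating described in the sensor-fusion entry above can be sketched as simple thresholding on Doppler residual and radar cross-section (RCS). Field names and threshold values here are illustrative assumptions, not the cited system's parameters:

```python
from dataclasses import dataclass

@dataclass
class RadarPoint:
    doppler: float            # measured radial velocity (m/s)
    rcs: float                # radar cross-section (dBsm)
    predicted_doppler: float  # radial velocity expected from ego-motion

def select_points(points, max_doppler_residual=0.5, min_rcs=-10.0):
    """Keep points whose Doppler agrees with ego-motion and whose RCS
    suggests a stable reflector; discard the rest as clutter/outliers."""
    return [p for p in points
            if abs(p.doppler - p.predicted_doppler) <= max_doppler_residual
            and p.rcs >= min_rcs]

pts = [
    RadarPoint(doppler=2.0, rcs=5.0,   predicted_doppler=2.1),  # static, strong
    RadarPoint(doppler=7.5, rcs=5.0,   predicted_doppler=2.1),  # moving object
    RadarPoint(doppler=2.0, rcs=-25.0, predicted_doppler=2.1),  # weak clutter
]
kept = select_points(pts)
print(f"kept {len(kept)} of {len(pts)} points")
```

Only the physically consistent return survives; the surviving minority of points is what feeds the downstream sliding-window optimization.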
6. Limitations, Caveats, and Domain Dependency
The Less-is-More effect is not universal. Conditions for its validity include:
- Generator/Oracle Strength: Weak generators (or noisy models/data) may instead require “more is more”: retaining all examples, or favoring easy ones.
- Abundance: Data-rich regimes support aggressive curation; data-poor scenarios do not.
- Task Complexity and Representational Completeness: In reasoning, incomplete pre-training nullifies LIMO benefits (Ye et al., 5 Feb 2025); in TSFMs, over-pruning may disrupt pre-trained priors.
- Application-specific Constraints: Excessive pruning can compromise linguistic fidelity in LLMs or degrade classifier accuracy beyond certain thresholds.
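The conditions above can be compressed into a toy decision rule; the thresholds are illustrative placeholders, not the phase-transition constants of the cited analysis:

```python
def curation_strategy(n_examples: int, oracle_accuracy: float,
                      data_rich_threshold: int = 100_000) -> str:
    """Toy heuristic echoing the phase-transition picture: aggressive
    pruning pays off only with abundant data and a reliable oracle;
    otherwise keep (most of) the data."""
    if oracle_accuracy < 0.7:
        return "keep all"            # weak oracle: more is more
    if n_examples >= data_rich_threshold:
        return "prune aggressively"  # data-rich regime + strong oracle
    return "prune lightly"           # strong oracle but scarce data

print(curation_strategy(1_000_000, 0.95))  # → prune aggressively
```

The point of the sketch is the branching structure, not the specific numbers: each caveat above corresponds to one branch.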
7. Broader Impact and Future Directions
The hypothesis informs best practices for model fine-tuning, synthetic data pipelines, sensor front ends, and materials engineering. Future work aims to formalize adaptive curation, quantify optimal pruning thresholds, and extend Less-is-More methodologies to new modalities and emerging research areas.
In summary, the Less-is-More Hypothesis provides a cross-disciplinary principle: rigorous selection, filtering, or pruning of constraints, examples, or model components can promote robust learning, efficiency, interpretability, and—paradoxically—sometimes even yield strictly better performance than indiscriminate maximization of quantity. Its domain-specific realizations are governed by well-defined mathematical and empirical conditions (Huang et al., 2024, Jha et al., 2023, Dohmatob et al., 5 Nov 2025, Zhang et al., 2021, Zhao et al., 29 May 2025, Tavakkol et al., 20 Apr 2025, Merkle et al., 2023, Znidaric, 2021, Toyokawa et al., 2014, Pereg et al., 2022, Ye et al., 5 Feb 2025).