Less is More (LiMo) Paradigm
- Less is More (LiMo) is a paradigm focused on strategically reducing model complexity and data to achieve efficient, scalable, and high-performing AI systems.
- It demonstrates practical gains through unified Code LLM pruning, targeted data curation, efficient context masking, and even quantum transport improvements.
- LiMo consistently reduces resource consumption while meeting or exceeding benchmark performance across machine learning, agentic systems, and physical experiments.
The “Less is More” (LiMo) paradigm refers to a series of empirical and algorithmic findings across machine learning, AI systems, scientific computing, and physics where strategic reduction—rather than brute-force expansion—of model size, training data, or contextual elements yields improved efficiency, scalability, and unexpectedly strong or even superior performance. This principle is substantiated in recent works spanning unified LLM pruning, function tool selection, agency demonstration curation, multimodal token reduction, RL sample efficiency, and quantum transport, collectively redefining conventional wisdom around scaling laws and resource utilization.
1. Unified Structural Pruning in Code LLMs
LiMo, in the context of generative Code LLMs, formalizes the problem as minimizing the KL divergence between the dense model's output distribution $p_{\theta}$ and that of the pruned model $p_{\hat{\theta}}$, across all generative coding contexts (Yang et al., 2024). Unlike traditional methods limited to single structural components, the Flab-Pruner framework addresses three axes simultaneously: vocabulary entries (unused tokens), transformer layers, and feed-forward network (FFN) neurons. The pruning objective can be cast (in reconstructed form) as

$$\min_{\hat{\theta}} \; \mathbb{E}_{x \sim \mathcal{D}}\left[ D_{\mathrm{KL}}\!\left( p_{\theta}(\cdot \mid x) \,\|\, p_{\hat{\theta}}(\cdot \mid x) \right) \right] + \lambda\, \Omega(\hat{\theta}),$$

where the regularization term $\Omega(\hat{\theta})$ incentivizes removal of low-utility parameters.
Algorithmic Steps:
- Vocabulary Pruning: Retain only tokens observed in a representative corpus; remove others from embedding/output heads.
- Layer Pruning: Iteratively remove layers yielding minimal KL divergence increase.
- FFN Pruning: Mask search over neuron selections, choosing the minimal-KL mask.
- Customized Code-Instruction Recovery: Label ground truths via original model pass/fail cases, LoRA-tune on this set for rapid, complete performance recovery.
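The layer-pruning step above can be sketched as a greedy search: repeatedly drop the layer whose removal increases KL divergence from the dense model the least. The snippet below is a minimal illustration on a toy residual stack, not Flab-Pruner's actual implementation; the model, data, and layer parameterization are all stand-ins.

```python
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def kl(p, q):
    # KL(p || q) with a small epsilon for numerical safety.
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def forward(x, layers, skip=frozenset()):
    # Toy residual stack: each kept "layer" adds W @ h to the hidden state.
    h = x
    for i, W in enumerate(layers):
        if i not in skip:
            h = h + W @ h
    return softmax(h)

def greedy_layer_prune(x, layers, n_remove):
    # Iteratively remove the layer whose absence least perturbs the
    # dense model's output distribution (minimal-KL criterion).
    dense = forward(x, layers)
    removed = set()
    for _ in range(n_remove):
        best = min((i for i in range(len(layers)) if i not in removed),
                   key=lambda i: kl(dense, forward(x, layers, removed | {i})))
        removed.add(best)
    return removed

rng = np.random.default_rng(0)
layers = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(6)]
x = rng.normal(size=8)
pruned = greedy_layer_prune(x, layers, n_remove=2)
print(sorted(pruned))
```

The same greedy KL criterion extends naturally to the FFN-neuron mask search, with neuron subsets taking the place of whole layers.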
Extensive benchmarking across CodeQwen-1.5, NxCode, and CodeSlerp models demonstrates that pruning 22% of parameters retains 97% of performance, increases throughput, and reduces GPU/memory requirements (e.g., BF16 usage drops from 13.55 GB to 10.72 GB; tokens/sec rises from 30 to 38). INT4 quantization further compresses runtime requirements. Under diverse code perturbations, robustness is largely unaffected; post-training can even improve pruned-model accuracy relative to dense baselines.
2. Data-Efficient Reasoning and Agency: Selection Over Scale
The LiMo principle is critical in supervised reasoning, RL scaling, and agentic generalization. The LIMO framework (Ye et al., 5 Feb 2025) shows that a foundation model pretrained for mathematics (Qwen2.5-32B) achieves state-of-the-art pass@1 scores on AIME24 (63.3%) and MATH500 (95.6%) with roughly 1% of the usual training data (800 curated high-quality "cognitive template" examples), beating baselines trained on 100× more data. The "Less-Is-More Reasoning Hypothesis" formalizes the saturation of reasoning performance once pretrained knowledge is complete and a small set of high-impact exemplars serves as robust meta-guidance.
Similarly, LIMR (Li et al., 17 Feb 2025) introduces Learning Impact Measurement (LIM), an algorithm that assigns an alignment score to each RL training sample, quantifying its trajectory's contribution to the aggregate learning curve. By training on only the top 1,389 of 8,523 samples (an ≈84% reduction), RL performance matches or outstrips full-data runs (78.0% MATH500; +16.7% AIME24). Empirical analyses show that only a minority of samples with steadily rising rewards drive long-term improvement—the rest dilute the gradient signal.
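The LIM idea can be illustrated with a simple stand-in scoring rule: measure how well each sample's per-step reward trajectory aligns with the aggregate learning curve, then keep only the top-scoring samples. This sketch uses cosine similarity of mean-centered trajectories as the alignment proxy; the paper's exact LIM formula may differ.

```python
import numpy as np

def lim_scores(reward_traj):
    """reward_traj: (n_samples, n_steps) per-sample reward across RL training.
    Score = cosine similarity between a sample's mean-centered trajectory and
    the mean-centered aggregate curve (a stand-in for the LIM measure)."""
    agg = reward_traj.mean(axis=0)
    a = agg - agg.mean()
    scores = []
    for r in reward_traj:
        c = r - r.mean()
        denom = np.linalg.norm(c) * np.linalg.norm(a) + 1e-12
        scores.append(float(c @ a / denom))
    return np.array(scores)

def select_top(reward_traj, k):
    # Keep only the k samples most aligned with the aggregate learning curve.
    return np.argsort(-lim_scores(reward_traj))[:k]

rng = np.random.default_rng(1)
steps = np.linspace(0, 1, 20)
rising = rng.normal(0, 0.05, (30, 20)) + steps   # rewards steadily improve
flat = rng.normal(0.5, 0.05, (70, 20))           # no learning signal
traj = np.vstack([rising, flat])
keep = select_top(traj, k=30)
print(int(np.sum(keep < 30)))  # most selected samples come from the rising group
```

The selection concentrates on the rising-reward samples, mirroring the paper's finding that a minority of trajectories carry the learning signal.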
3. Theory of Data Curation: When Is Less Better?
Recent formalizations (Dohmatob et al., 5 Nov 2025) resolve the paradox of aggressive curation outperforming full-data training. In the high-dimensional proportional limit (sample count and dimension growing together at a fixed ratio), the optimal curation strategy depends on generator and oracle strengths:
- If the model is strong on the task, keep-hard (high-difficulty) curation (LiMo) is optimal.
- If the model is weak, retain easy examples (More-is-More). Phase boundaries are precisely computable; ImageNet experiments confirm model-collapse avoidance and phase transitions matching theoretical predictions. This framework underpins contradictory results seen in LLM math reasoning and supports practical recommendations for selective data pruning.
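The two regimes above can be made concrete with a minimal selection rule. In this sketch, the model's probability on the correct label serves as a difficulty proxy (low = hard); the regime switch between keep-hard and keep-easy is the paper's qualitative prescription, while the thresholds and proxy are illustrative assumptions.

```python
import numpy as np

def curate(scores_correct, model_is_strong, keep_frac=0.3):
    """scores_correct: per-example probability the current model assigns to
    the correct label (difficulty proxy: low = hard, high = easy).
    Strong model -> keep-hard examples (Less-is-More regime);
    weak model   -> keep-easy examples (More-is-More regime)."""
    k = max(1, int(len(scores_correct) * keep_frac))
    order = np.argsort(scores_correct)          # ascending: hardest first
    return order[:k] if model_is_strong else order[-k:]

p = np.array([0.05, 0.2, 0.5, 0.8, 0.95])
print(curate(p, model_is_strong=True, keep_frac=0.4))   # hardest two: [0 1]
print(curate(p, model_is_strong=False, keep_frac=0.4))  # easiest two: [3 4]
```

In practice the phase boundary between the two regimes is computed from the generator/oracle strengths rather than assumed, but the downstream selection mechanics are this simple.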
4. Efficient Context Pruning in GUI and Function-Calling Agents
LiMo-inspired simplification frameworks extend to agentic tool selection and GUI navigation:
- Function-Calling (Edge Devices): A dynamic tool-selection layer for LLMs (Paramanayakam et al., 2024), based on retrieval-augmented clustering and MPNet embeddings, dramatically reduces execution time (up to 70%) and power (up to 40%) on edge hardware by optimizing the number and relevance of provided API tools, raising success rates without fine-tuning.
- SimpAgent for GUI Navigation: Context-aware masking of unrelated visual elements and history compression via consistency-guided KL regularization (Chen et al., 4 Jul 2025) drop computation by 27% while raising step success rates across mobile and web environments (e.g., Qwen2VL-2B baseline 69.0%, SimpAgent 71.3% on AITW). Ablation trials confirm that both aggressive element pruning and token drop, when regularized for consistency, jointly yield maximal performance with minimal resource expense.
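The tool-selection layer described above can be sketched as embedding-based retrieval: embed the query and all candidate tool descriptions, then pass only the most similar tool schemas to the LLM. Here toy random vectors stand in for MPNet sentence embeddings, and the tool names are hypothetical.

```python
import numpy as np

def select_tools(query_vec, tool_vecs, tool_names, top_k=2):
    """Rank candidate API tools by cosine similarity to the query embedding
    and expose only the top_k to the model (retrieval-style narrowing)."""
    q = query_vec / np.linalg.norm(query_vec)
    t = tool_vecs / np.linalg.norm(tool_vecs, axis=1, keepdims=True)
    sims = t @ q
    idx = np.argsort(-sims)[:top_k]
    return [tool_names[i] for i in idx]

rng = np.random.default_rng(2)
names = ["get_weather", "send_email", "play_music", "set_alarm"]
tools = rng.normal(size=(4, 16))
query = tools[0] + rng.normal(scale=0.1, size=16)  # query resembles get_weather
selected = select_tools(query, tools, names, top_k=2)
print(selected)
```

Shrinking the tool list this way reduces prompt length (and hence latency and energy on edge hardware) while keeping the relevant tool in context.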
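SimpAgent's consistency-guided regularization can be illustrated as a KL penalty between the action distribution computed from the full context and the one computed after masking. The snippet below is a sketch of that idea on raw logits, not SimpAgent's training code.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def consistency_kl(full_logits, masked_logits):
    """KL(full || masked): penalizes the masked-context policy for drifting
    from the full-context policy, so aggressive element/history pruning
    stays behaviorally consistent."""
    p, q = softmax(full_logits), softmax(masked_logits)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

full = np.array([2.0, 0.5, -1.0])
close = np.array([1.8, 0.6, -0.9])   # masking kept the policy similar
far = np.array([-1.0, 2.0, 0.5])     # masking changed the policy
print(consistency_kl(full, close) < consistency_kl(full, far))  # True
```

Minimizing this term during training lets the agent drop context aggressively while being pulled back toward its full-context behavior.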
5. Multimodal and 3D Task Complexity Reduction
In multimodal LLMs, the TRIM module (Song et al., 2024) exploits CLIP-based cosine similarity and adaptive IQR outlier selection to prune 79% of image tokens per input—retaining only those strongly aligned with the instruction—while aggregating discarded regions as a summary token. Benchmarking across 12 datasets demonstrates near-baseline accuracy with 67% reduced inference time and 30% reduced GPU memory. Comparative ablations find that CLIP-driven selection plus aggregation outperforms uniform pooling and fixed ratios, showing that non-uniform, semantically-aware pruning is especially effective.
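TRIM's selection rule can be sketched as IQR-based outlier detection over token-instruction similarities: keep only tokens whose similarity is an upper outlier, and pool the rest into one summary token. The exact thresholding and aggregation in TRIM may differ; this is a minimal illustration with synthetic similarities.

```python
import numpy as np

def trim_tokens(img_tokens, sims):
    """Keep image tokens whose instruction similarity is an upper IQR
    outlier (> Q3 + 1.5*IQR); mean-pool the discarded tokens into a
    single 'summary' token."""
    q1, q3 = np.percentile(sims, [25, 75])
    keep = sims > q3 + 1.5 * (q3 - q1)
    if not keep.any():                   # fall back: keep the single best token
        keep = sims == sims.max()
    kept = img_tokens[keep]
    summary = img_tokens[~keep].mean(axis=0) if (~keep).any() else None
    return kept, summary

rng = np.random.default_rng(3)
tokens = rng.normal(size=(100, 8))
sims = rng.normal(0.1, 0.02, 100)
sims[:5] = 0.9                           # five tokens match the instruction
kept, summary = trim_tokens(tokens, sims)
print(kept.shape[0])                     # only the high-similarity tokens survive
```

Because the threshold adapts to the similarity distribution of each input, the kept-token budget varies per image rather than being a fixed ratio, which is the behavior the ablations favor.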
In 3D LiDAR segmentation (Li et al., 2023), less-is-more is realized by combining:
- Spatio-temporal frame downsampling (maximizing scan diversity via SSIM),
- Model-size reductions via Sparse Depthwise Separable Convolution (SDSC; delivering ≈2.3× fewer params and 641× fewer multiply-adds than full Cylinder3D),
- Reflectivity-informed soft pseudo-labeling, yielding substantially higher mIoU (SemanticKITTI: 59.5% on 5% labels) than larger baselines with uniform sampling.
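The spatio-temporal downsampling step above amounts to picking a maximally diverse subset of scans. A common way to do this is greedy farthest-point selection; the paper scores similarity with SSIM on the scans, whereas this sketch accepts any dissimilarity function and uses Euclidean distance on synthetic feature vectors.

```python
import numpy as np

def diverse_subset(frames, k, dist):
    """Greedy farthest-point selection: repeatedly add the frame most
    dissimilar from everything already chosen."""
    chosen = [0]
    while len(chosen) < k:
        rest = [i for i in range(len(frames)) if i not in chosen]
        nxt = max(rest, key=lambda i: min(dist(frames[i], frames[j]) for j in chosen))
        chosen.append(nxt)
    return chosen

rng = np.random.default_rng(4)
# Three clusters of near-duplicate scans; diversity selection should span them.
centers = rng.normal(size=(3, 32))
frames = np.vstack([c + rng.normal(scale=0.01, size=(10, 32)) for c in centers])
picked = diverse_subset(frames, k=3, dist=lambda a, b: float(np.linalg.norm(a - b)))
print(sorted(p // 10 for p in picked))   # → [0, 1, 2]: one frame per cluster
```

Selecting one representative per cluster of near-duplicate scans is exactly the "maximize scan diversity" objective, just with SSIM swapped for a simpler distance here.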
6. Less is More in Physical Systems: Quantum Transport
In physics, LiMo manifests as anomalous impurity-driven transport in one-dimensional conductors (Znidaric, 2021). Contrary to Matthiessen's rule (resistivity proportional to impurity density), in systems whose clean segments exhibit subdiffusive transport, increasing the impurity density yields a diffusion constant that *grows* with the density of scatterers, so more impurities lead to less resistance. Physically, introducing scatterers breaks up ultra-slow anomalous domains into faster composite segments. This nonlinear, non-monotonic scaling has direct experimental implications for materials engineering.
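A back-of-envelope illustration (not the paper's calculation) shows why cutting subdiffusive segments can speed up transport: if crossing a clean segment of length l takes time ~ l**z with dynamical exponent z > 2, then splitting a chain of total length L into L/l segments gives a total transit time ~ (L/l) * l**z = L * l**(z-1), which *decreases* as segments get shorter.

```python
# Toy arithmetic: shorter clean segments (more impurities) -> faster transit
# when single-segment crossing time scales superlinearly (z > 1, and in
# particular for subdiffusive z > 2). Values of z and L are illustrative.
z, L = 3.0, 1024.0

def transit_time(seg_len):
    # (number of segments) * (time to cross one segment)
    return (L / seg_len) * seg_len**z

for l in (256.0, 64.0, 16.0):
    print(l, transit_time(l))
```

The printed times fall monotonically as the segment length shrinks, the qualitative signature of the "more scattering, less resistance" effect.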
7. Broader Implications and Future Directions
LiMo shifts prevailing paradigms away from brute-force data/model scaling toward strategically guided reduction and curation:
- Practical systems can realize significant savings in compute, storage, and labeling budgets without sacrificing, and often improving, functional robustness.
- The principle operates across architectures (transformers, vision models), task types (reasoning, navigation, segmentation), and scientific domains (agentic AI, physical transport).
- Future directions include automating curation via active learning, exploring closed-loop RL on agentic tasks, and extending theoretical analyses to multi-modal and cross-domain generalization. Notably, rigorous quantification of sample informativeness, model-context interaction, and scaling law phase transitions will shape best practices in deployment and research for years ahead.