Optimal Data Fraction Selection Under Latency or Cost Constraints

Determine the optimal fraction p of training samples to retain when training telecom models with the gradient-norm–based sample importance selection framework, subject to specified latency or monetary cost constraints. The goal is to derive operational guidelines that balance predictive accuracy against computational and energy efficiency.
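One way to state this problem precisely is as a constrained optimization over p. The formalization below is hypothetical and is not given in the paper. A(p), T(p) and C(p) denote validation accuracy, training latency and monetary cost when a fraction p of the samples is kept, and T_max and C_max are the budgets. The second line adds two further assumptions. First, cost grows linearly with the number of retained samples: T_s and C_s are the one-off costs of scoring samples by gradient norm, and T_f and C_f are the costs of training on the full dataset. Second, A(p) is nondecreasing in p. Under these assumptions the optimum is the largest fraction that satisfies both budgets.

```latex
% Constrained selection of the data fraction (hypothetical formalization)
p^\star = \arg\max_{p \in (0,1]} A(p)
  \quad \text{s.t.} \quad T(p) \le T_{\max},\; C(p) \le C_{\max}

% Closed form assuming T(p) = T_s + p\,T_f,\; C(p) = C_s + p\,C_f and A(p) nondecreasing
p^\star = \min\left\{ 1,\ \frac{T_{\max} - T_s}{T_f},\ \frac{C_{\max} - C_s}{C_f} \right\},
  \qquad \text{feasible only if } p^\star > 0
```

For example, with the hypothetical values T_s = 10 min, T_f = 120 min, T_max = 70 min and no cost budget, the latency term gives p* = (70 − 10)/120 = 0.5. The paper reports comparable performance from only a subset of samples, which suggests a practical variant: pick the smallest p whose accuracy lies within a tolerance ε of the best feasible accuracy.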

Background

The paper introduces a gradient-norm–based sample importance framework that prioritizes training samples with higher influence on model updates, enabling reduced computation and energy consumption without compromising accuracy in telecom applications. Empirical results across multiple datasets show that comparable performance can be achieved using only a subset of the most impactful samples, revealing practical trade-offs between data fraction, training time, energy use, and accuracy.
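The sketch below makes the selection step concrete. It ranks samples by per-sample gradient norm and keeps the top fraction p. It assumes a linear model with squared loss, for which the per-sample gradient norm has the closed form |w·x − y|·‖x‖. The function names and toy data are hypothetical. The paper's framework may compute and aggregate importance scores differently, for example over several training steps or with a neural model.

```python
import math
from typing import List, Sequence


def per_sample_grad_norms(
    w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[float]
) -> List[float]:
    """Per-sample gradient norm for a linear model with loss 0.5 * (w.x - y)^2.

    The gradient w.r.t. w is (w.x - y) * x, so its L2 norm is |w.x - y| * ||x||.
    (Assumed model; not necessarily the model used in the paper.)
    """
    norms = []
    for xi, yi in zip(X, y):
        residual = sum(wj * xj for wj, xj in zip(w, xi)) - yi
        norms.append(abs(residual) * math.sqrt(sum(xj * xj for xj in xi)))
    return norms


def select_top_fraction(scores: Sequence[float], p: float) -> List[int]:
    """Indices (in original order) of the ceil(p * N) highest-scoring samples."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # round() guards against float artifacts such as 0.3 * 10 == 3.0000000000000004
    k = max(1, math.ceil(round(p * len(scores), 9)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): 4 samples with 2 features and current weights w.
w = [1.0, 0.0]
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
y = [1.0, 1.0, 0.0, 0.0]

scores = per_sample_grad_norms(w, X, y)
print(select_top_fraction(scores, p=0.5))  # [2, 3]
```

Because p·N is rounded up, at least one sample is kept for any p > 0. Returning indices in their original order preserves any ordering the downstream data loader relies on.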

While the empirical findings demonstrate the viability of selective training, the authors note the absence of clear operational guidelines for choosing the fraction of data to use in practice. Specifically, they highlight the need to formalize how to select the optimal subset size given constraints such as training latency and cost, thereby translating empirical trade-offs into actionable deployment strategies.
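One simple operational rule in the spirit of the guidelines the authors call for works as follows. Run a few pilot trainings at candidate fractions, record accuracy, latency and cost for each, and pick the best one that fits the budget. The sketch below assumes such measurements are available. The `Trial` record, the `choose_fraction` helper and all numbers are hypothetical placeholders, not results from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Trial:
    """Hypothetical record of one pilot training run at data fraction p."""
    p: float          # fraction of training samples kept
    accuracy: float   # validation accuracy at this fraction
    latency_s: float  # wall-clock training time in seconds
    cost: float       # monetary cost, in any consistent currency unit


def choose_fraction(
    trials: Sequence[Trial],
    max_latency_s: Optional[float] = None,
    max_cost: Optional[float] = None,
    tolerance: float = 0.0,
) -> Optional[Trial]:
    """Return the smallest-p trial within `tolerance` of the best feasible accuracy.

    A constraint set to None is ignored. Returns None if no trial meets the budgets.
    """
    feasible: List[Trial] = [
        t for t in trials
        if (max_latency_s is None or t.latency_s <= max_latency_s)
        and (max_cost is None or t.cost <= max_cost)
    ]
    if not feasible:
        return None
    best_accuracy = max(t.accuracy for t in feasible)
    # Keep every feasible trial whose accuracy is close enough to the best one...
    near_best = [t for t in feasible if t.accuracy >= best_accuracy - tolerance]
    # ...and prefer the cheapest of them, i.e. the smallest data fraction.
    return min(near_best, key=lambda t: t.p)


# Illustrative placeholder measurements (not results from the paper).
trials = [
    Trial(p=0.25, accuracy=0.90, latency_s=30.0, cost=0.5),
    Trial(p=0.50, accuracy=0.93, latency_s=60.0, cost=1.0),
    Trial(p=0.75, accuracy=0.94, latency_s=90.0, cost=1.5),
    Trial(p=1.00, accuracy=0.94, latency_s=120.0, cost=2.0),
]

chosen = choose_fraction(trials, max_latency_s=100.0)
print(chosen.p if chosen else "no feasible fraction")  # 0.75
```

The `tolerance` parameter controls the accuracy-versus-efficiency trade-off. With `tolerance=0` the rule maximizes accuracy within budget. A small positive tolerance accepts a bounded accuracy loss in exchange for a smaller p, and therefore lower training time and energy.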

References

Finally, translating the observed trade-offs into operational guidelines—such as how to select the optimal data fraction under given latency or cost constraints—remains an open area for exploration.

Through the telecom lens: Are all training samples important? (2511.21668 - Bothe et al., 26 Nov 2025) in Section "Conclusion and future works" (final paragraph)