Neural Dynamic Data Valuation
- NDDV is a dynamic framework that assigns per-sample data values at scale by integrating optimal control principles into neural network training.
- It employs neural surrogate models and adjoint-based methods to efficiently approximate influence scores and capture data utility.
- The framework enhances model performance and fairness through bilevel optimization and re-weighting strategies, addressing computational scaling and data heterogeneity.
Neural Dynamic Data Valuation (NDDV) is a methodological framework for quantifying the importance of individual data points to the performance and generalization of machine learning models, particularly neural networks. Distinct from classical retraining-based or marginal-contribution methods, NDDV frames data valuation as a dynamic process embedded within model training, formulated either control-theoretically or through neural surrogates, allowing scalable, efficient, and theoretically grounded assignment of per-sample values. This approach leverages the sensitivity of model states, adjoint dynamics, or learned surrogate functions for data valuation, and has been instantiated in various neural architectures, learning pipelines, and data selection tasks.
1. Mathematical Foundations and Optimal Control Formulation
NDDV recasts data valuation as an optimal control problem in continuous or discrete time, establishing a dynamics-based framework for assigning value to each data point. The evolution of the state of each data point is governed by a controlled stochastic differential equation
$$dx_t^i = b\big(x_t^i, \bar{x}_t, u_t^i\big)\,dt + \sigma\,dW_t^i,$$
where $x_t^i$ is the state of data point $i$, $u_t^i$ is a control parameter (interpreted as a data weight), $\bar{x}_t = \frac{1}{N}\sum_j x_t^j$ is the empirical population mean, and $W_t^i$ denotes Brownian motion. The typical linear-quadratic choice uses a drift affine in the state, the mean, and the control, $b(x, \bar{x}, u) = A x + \bar{A}\,\bar{x} + B u$, with constant diffusion $\sigma > 0$.
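The sketch below illustrates this forward dynamics with a plain Euler–Maruyama rollout in NumPy; the matrices `A`, `A_bar`, `B`, the noise level `sigma`, and the horizon are illustrative placeholders rather than values from the cited papers.

```python
import numpy as np

def simulate_states(x0, u, A, A_bar, B, sigma, T=1.0, steps=20, seed=0):
    """Euler-Maruyama rollout of the controlled mean-field SDE
    dx_t^i = (A x_t^i + A_bar xbar_t + B u^i) dt + sigma dW_t^i.

    x0 : (N, d) initial states (one row per data point)
    u  : (N,)   per-sample controls / data weights (held fixed here)
    """
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = x0.copy()
    trajectory = [x.copy()]
    for _ in range(steps):
        xbar = x.mean(axis=0, keepdims=True)                # empirical population mean
        drift = x @ A.T + xbar @ A_bar.T + u[:, None] * B   # drift affine in state, mean, control
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        trajectory.append(x.copy())
    return np.stack(trajectory)                             # (steps + 1, N, d)

# Toy usage: 5 samples in 3 dimensions, one scalar control per sample.
N, d = 5, 3
traj = simulate_states(
    x0=np.random.randn(N, d), u=np.ones(N),
    A=-0.1 * np.eye(d), A_bar=0.05 * np.eye(d),
    B=0.1 * np.ones(d), sigma=0.01,
)
print(traj.shape)  # (21, 5, 3)
```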
The NDDV objective seeks to minimize the population-averaged cost
$$\min_{u}\;\mathbb{E}\!\left[\frac{1}{N}\sum_{i=1}^{N}\left(\int_0^T R\big(x_t^i, u_t^i\big)\,dt \;+\; h_{\phi}\big(\Phi(x_T^i)\big)\right)\right],$$
where $R$ is the running cost, $\Phi$ the terminal cost, and $h_{\phi}$ is a learnable meta-weighting network for fairness and heterogeneity.
The stochastic maximum principle yields a coupled forward-backward SDE system:
- Forward: $dx_t^i = b\big(x_t^i, \bar{x}_t, u_t^i\big)\,dt + \sigma\,dW_t^i$ integrates the state trajectories.
- Backward: computes adjoints via $dp_t^i = -\nabla_x H\big(x_t^i, u_t^i, p_t^i, q_t^i\big)\,dt + q_t^i\,dW_t^i$ with terminal condition $p_T^i = -\nabla_x\, h_{\phi}\big(\Phi(x_T^i)\big)$,
where the Hamiltonian $H(x, u, p, q) = \big\langle p,\, b(x, \bar{x}, u)\big\rangle + \langle q, \sigma\rangle - R(x, u)$ captures control, cost, and adjoint interactions.
For each sample, a dynamic utility is defined along its controlled state trajectory, and the value of data point $i$ is its marginal contribution to the population-averaged utility. This valuation satisfies properties such as efficiency, symmetry, dummy, additivity, and marginalism (Liang et al., 30 Apr 2024, Liang et al., 9 Nov 2025).
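As a concrete, deliberately simplified illustration of the backward pass, the PyTorch sketch below takes one explicit adjoint step driven by the gradient of the Hamiltonian; the drift `b`, running cost `R`, and terminal cost are placeholder quadratics, the diffusion term of the Hamiltonian is omitted, and none of these choices should be read as the papers' exact specification.

```python
import torch

def hamiltonian(x, u, p, b, R):
    """Deterministic part of the SMP Hamiltonian: H(x, u, p) = <p, b(x, u)> - R(x, u)."""
    return (p * b(x, u)).sum(dim=-1) - R(x, u)

# Placeholder dynamics and costs (illustrative only).
b = lambda x, u: -x + u.unsqueeze(-1)   # simple controlled drift
R = lambda x, u: 0.5 * u ** 2           # quadratic running cost on the control

def adjoint_step(x, u, p, dt):
    """One explicit Euler step of the backward adjoint equation,
    p_{t-dt} = p_t + grad_x H(x_t, u_t, p_t) * dt  (noise term omitted)."""
    x = x.detach().requires_grad_(True)
    H = hamiltonian(x, u, p, b, R).sum()
    grad_x_H, = torch.autograd.grad(H, x)
    return p + grad_x_H * dt

# Toy usage: pull the adjoint back one step from the terminal condition
# p_T = -grad of a terminal cost, here 0.5 * ||x_T||^2.
N, d, dt = 4, 3, 0.05
x_T = torch.randn(N, d)
u = torch.zeros(N)
p_T = -x_T
p_prev = adjoint_step(x_T, u, p_T, dt)
print(p_prev.shape)  # torch.Size([4, 3])
```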
2. Neural Surrogate Architectures and Dynamic Inference
NDDV has been realized as neural surrogate models that learn to approximate expensive influence function computations or importance scores. The NN-CIFT approach ("Neural Networks for effiCient Instruction Fine-Tuning") (Agarwal et al., 14 Feb 2025) employs a compact "InfluenceNetwork" to regress pairwise influence metrics.
InfluenceNetwork Architecture
- Embedding: Precompute an embedding $e_i \in \mathbb{R}^{d}$ for each sample using BGE embeddings.
- Input: For a pair $(i, j)$, concatenate the two embeddings, yielding $[e_i; e_j] \in \mathbb{R}^{2d}$.
- Hidden: Two dense layers of width 100, ReLU activations.
- Output: Scalar influence score $\hat{I}_{ij}$.
- Parameters: about 0.0027% of the size of a 7B LLM (a PyTorch sketch follows below).
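A minimal PyTorch rendering of this architecture is given below; the embedding dimension is a placeholder (BGE models commonly emit 768- or 1024-dimensional vectors), and the plain linear output head is an assumption.

```python
import torch
import torch.nn as nn

class InfluenceNetwork(nn.Module):
    """Small MLP that regresses the influence score for a pair of samples
    from the concatenation of their precomputed embeddings."""

    def __init__(self, embed_dim=768, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden),  # input: [e_i ; e_j]
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),              # scalar influence score
        )

    def forward(self, e_i, e_j):
        return self.net(torch.cat([e_i, e_j], dim=-1)).squeeze(-1)

model = InfluenceNetwork()
print(sum(p.numel() for p in model.parameters()))  # on the order of 1e5 parameters
```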
Training
- Train on a small fraction of data pairs with "ground-truth" labels from an expensive influence metric (e.g., DELIFT).
- Minimize the mean squared error $\mathcal{L}(\theta) = \frac{1}{|S|}\sum_{(i,j)\in S}\big(\hat{I}_{\theta}(e_i, e_j) - I_{ij}\big)^2$ over the labelled pairs $S$, using the Adam optimizer for 20 epochs with a small learning rate (a training sketch follows after this list).
- Once trained, the network infers influence values for all pairs via efficient forward passes, supporting dynamic, on-the-fly data valuation as new samples arrive (Agarwal et al., 14 Feb 2025).
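A hedged sketch of this training procedure: `teacher_influence` stands in for precomputed scores from the expensive metric (e.g., DELIFT) on the small labelled subset, and the learning rate and batch size are arbitrary placeholders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train_influence_network(model, pair_embeddings, teacher_influence,
                            epochs=20, lr=1e-3, batch_size=256):
    """Fit the surrogate to ground-truth influence labels with a mean-squared-error loss.

    pair_embeddings   : (M, 2, embed_dim) embeddings (e_i, e_j) of the labelled pairs
    teacher_influence : (M,) scores from the expensive influence metric
    """
    loader = DataLoader(TensorDataset(pair_embeddings, teacher_influence),
                        batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # lr is a placeholder value
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for pairs, target in loader:
            pred = model(pairs[:, 0], pairs[:, 1])
            loss = loss_fn(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Once trained, influence for all remaining pairs comes from cheap forward passes:
#   scores = model(e_i_batch, e_j_batch)   # no further calls to the expensive metric
```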
3. Data Re-Weighting, Fairness, and Meta-Optimization
To handle heterogeneous data and encourage fairness, NDDV employs a meta-weighting network that re-weights the terminal cost for each sample, adapting the effect of each point on the global objective (Liang et al., 30 Apr 2024, Liang et al., 9 Nov 2025). This forms a bilevel optimization:
- Inner level: for fixed meta-parameters $\phi$, optimize the controls/weights, $u^{*}(\phi) = \arg\min_{u} J(u; \phi)$.
- Outer level: update the meta-network by descending a holdout or validation loss, $\phi \leftarrow \phi - \eta\,\nabla_{\phi}\,\mathcal{L}_{\mathrm{val}}\big(u^{*}(\phi)\big)$.
Optimization alternates gradient steps on (data weights/controls) and (meta-network), using forward/backward SDE integrations and backpropagation through the Hamiltonian. This yields adaptive, context-sensitive data values and ensures data points with harmful contributions are down-weighted.
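The sketch below shows one alternating update in the style of meta-weighted bilevel training; the one-step lookahead used to differentiate the validation loss with respect to the meta-parameters is a common approximation and an assumption here, and `train_loss_fn` / `val_loss_fn` are hypothetical callables standing in for the meta-weighted objective and the holdout loss.

```python
import torch

def bilevel_step(u, meta_net, train_loss_fn, val_loss_fn, lr_u=1e-2, lr_phi=1e-3):
    """One alternating update of the bilevel scheme (one-step-lookahead sketch).

    u             : per-sample controls / data weights, a leaf tensor with requires_grad=True
    meta_net      : the meta-weighting network h_phi
    train_loss_fn : (u, meta_net) -> meta-weighted training objective J(u; phi)
    val_loss_fn   : (u,)          -> held-out validation loss
    """
    # Virtual inner update: one gradient step on u, keeping the graph so the
    # result still depends on the meta-network parameters phi.
    grad_u, = torch.autograd.grad(train_loss_fn(u, meta_net), u, create_graph=True)
    u_lookahead = u - lr_u * grad_u

    # Outer update: differentiate the validation loss at the looked-ahead u
    # with respect to phi and descend the meta-network parameters.
    grads_phi = torch.autograd.grad(val_loss_fn(u_lookahead), list(meta_net.parameters()))
    with torch.no_grad():
        for p, g in zip(meta_net.parameters(), grads_phi):
            p -= lr_phi * g

    # Actual inner update on the controls with the refreshed meta-weights.
    grad_u, = torch.autograd.grad(train_loss_fn(u, meta_net), u)
    with torch.no_grad():
        u -= lr_u * grad_u
    return u
```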
4. Algorithmic Pipelines and Computational Efficiency
NDDV frameworks provide single-pass, unified data valuation, avoiding repeated model retraining common in classical Shapley value or leave-one-out schemes (Liang et al., 30 Apr 2024, Wibiral et al., 5 Dec 2024):
- Forward step: Integrate state SDE/ODE for minibatches.
- Backward step: Integrate adjoint updates via reverse-mode autodiff.
- Gradient updates: Perform SGD/Adam steps on controls and meta-parameters.
- Valuation: After training, evaluate each sample's dynamic marginal contribution to obtain its value (a runnable toy of the full loop follows below).
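The toy below runs this loop end to end on synthetic data: deterministic Euler integration as the forward step, reverse-mode autodiff as the backward/adjoint step, Adam updates on the per-sample controls, and a final sensitivity read-out as a stand-in for the valuation formula; the dynamics, costs, and read-out are illustrative, not the papers' exact choices.

```python
import torch

# A compact, runnable toy of the single-pass pipeline: forward integration,
# reverse-mode adjoints via autograd, gradient updates on the per-sample
# controls, and a final sensitivity read-out standing in for the valuation.
torch.manual_seed(0)
N, d, steps, dt = 8, 2, 10, 0.1
x0 = torch.randn(N, d)
u = torch.zeros(N, requires_grad=True)            # per-sample controls / weights

def rollout_cost(x0, u):
    x = x0
    for _ in range(steps):                        # forward step (Euler, noise omitted)
        x = x + (-x + u.unsqueeze(-1)) * dt
    running = 0.5 * (u ** 2).mean()               # quadratic running cost
    terminal = 0.5 * (x ** 2).sum(dim=-1).mean()  # terminal cost
    return running + terminal

opt = torch.optim.Adam([u], lr=0.05)
for _ in range(100):                              # gradient updates on the controls
    loss = rollout_cost(x0, u)
    opt.zero_grad()
    loss.backward()                               # backward step: adjoints via autodiff
    opt.step()

# Valuation read-out: sensitivity of the final cost to each sample's control.
values, = torch.autograd.grad(rollout_cost(x0, u), u)
print(values.argsort())                           # samples ranked by (toy) value
```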
Complexity per epoch is linear in the data: $O(Nd)$ for $N$ samples with feature dimension $d$ (processed in minibatches of size $B$), giving $O(ENd)$ total scaling over $E$ epochs. NN-CIFT achieves further gains: for a 7–8B LLM, pairwise DELIFT requires about 67,000 s while NN-CIFT needs about 215 s, yielding 77–99% wall-clock speedups for instruction fine-tuning subset selection (Agarwal et al., 14 Feb 2025).
LossVal (Wibiral et al., 5 Dec 2024) embeds per-sample weights directly into the loss, $\mathcal{L}_{\mathrm{LossVal}} = \mathcal{L}_{w}(y, \hat{y}) \cdot \mathrm{OT}_{w}(D_{\mathrm{train}}, D_{\mathrm{val}})$, where $\mathcal{L}_{w}$ is a weighted target loss and $\mathrm{OT}_{w}$ a Sinkhorn-regularized optimal transport distance to a validation set. LossVal performs joint gradient descent on both the network and the weights, updating the weights dynamically to reflect per-sample importance, without retraining loops.
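A sketch of a LossVal-style objective follows; the Sinkhorn routine is implemented inline rather than taken from a library, the task loss is an arbitrary classification cross-entropy, and the way the per-sample weights enter the OT marginals and the normalizations are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sinkhorn_distance(x, y, a, eps=0.1, iters=100):
    """Entropy-regularized OT distance between points x (marginal a) and
    points y (uniform marginal), computed with plain Sinkhorn iterations."""
    b = torch.full((y.shape[0],), 1.0 / y.shape[0])
    C = torch.cdist(x, y, p=2) ** 2              # pairwise squared-Euclidean cost
    K = torch.exp(-C / eps)
    u = torch.ones_like(a)
    for _ in range(iters):
        v = b / (K.t() @ u + 1e-12)
        u = a / (K @ v + 1e-12)
    P = u.unsqueeze(1) * K * v.unsqueeze(0)      # transport plan
    return (P * C).sum()

def lossval_objective(model, w, x_train, y_train, x_val):
    """Per-sample-weighted task loss multiplied by a Sinkhorn OT distance between
    the weighted training inputs and the validation inputs."""
    weights = torch.softmax(w, dim=0)            # normalized per-sample weights
    per_sample = F.cross_entropy(model(x_train), y_train, reduction="none")
    weighted_task_loss = (weights * per_sample).sum()
    ot = sinkhorn_distance(x_train.flatten(1), x_val.flatten(1), weights)
    return weighted_task_loss * ot

# Joint optimization over the model parameters and the per-sample weights w:
#   opt = torch.optim.Adam(list(model.parameters()) + [w], lr=1e-3)
#   lossval_objective(model, w, x_train, y_train, x_val).backward(); opt.step()
```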
5. Theoretical Guarantees: Stability, Error Bounds, and Convergence
Under routine convexity, Lipschitz, and smoothness assumptions, NDDV admits quadratic loss bounds of the form $J(\tilde{u}) - J(u) \le C\big(\lVert \tilde{u} - u \rVert^{2} + N^{-1} + \Delta t\big)$, where $C$ is a universal constant, $N$ the sample size, and $\Delta t$ the time discretization step. This ensures that minor weight/control perturbations have only controlled quadratic effects on the global loss, underpinning the stability of the forward–backward SDE routines (Liang et al., 9 Nov 2025).
Convergence analysis covers both the inner control and meta-optimization:
- For training-loss gradients: $\min_{k \le K} \mathbb{E}\,\lVert \nabla_{u} J(u_k) \rVert^{2} = O(1/\sqrt{K})$ after $K$ control updates.
- For the meta-loss: sublinear convergence to a stationary point, so $O(\varepsilon^{-2})$ meta-iterations suffice to hit stationarity within accuracy $\varepsilon$ (Liang et al., 9 Nov 2025).
The dynamic character—single unified forward–backward sweeps, closed-form sensitivity-based value assignment—enables all data-point values to be computed in one run, a substantial improvement over classical influence or Shapley approaches (Liang et al., 30 Apr 2024, Liang et al., 9 Nov 2025).
6. Empirical Performance and Applications
Comprehensive experiments on tabular, text, and image benchmarks demonstrate that NDDV and its variants surpass classical methods in efficiency, accuracy, and robustness:
- Corrupted data detection: NDDV achieves 10–20% higher F1 in noisy-label settings compared to KNNShapley, AME, Data-OOB, and influence-based methods (Liang et al., 30 Apr 2024).
- Subset selection: In instruction fine-tuning for LLMs, NN-CIFT yields only 1.4% average drop in performance metrics (ROUGE, BGE, LAJ) relative to original influence functions, despite dramatic computational speedup (Agarwal et al., 14 Feb 2025).
- Computational scaling: NDDV scales linearly with dataset size; LossVal is 2–10x faster than Data-OOB and dramatically outpaces retraining-based approaches (Wibiral et al., 5 Dec 2024).
- Data addition/removal: Removal of high-value points degrades generalization fastest, while addition of high-value points yields the largest performance boost—demonstrating effective capture of data utility (Liang et al., 30 Apr 2024).
7. Limitations, Open Problems, and Future Directions
NDDV's efficacy rests partly on the quality of "ground-truth" influence signals (for surrogate networks) and meta-losses (for re-weighting), as surrogate models inherit any noise from these sources (Agarwal et al., 14 Feb 2025). Although the method achieves quadratic cost for pairwise scoring, whether fully linear NDDV is possible remains open. LossVal requires access to a clean validation set for optimal transport computations, and the OT step can become costly for very large training or validation sets (Wibiral et al., 5 Dec 2024).
Future research aims at:
- Meta-learning influence networks capable of few-shot adaptation to new influence metrics (Agarwal et al., 14 Feb 2025).
- Continual/online updates to NDDV models as new data arrives.
- Extending NDDV approaches to other loss families, scalable optimal-transport layers, and task-specific continual learning setups (Wibiral et al., 5 Dec 2024).
- Investigation of standardized benchmarks for dynamic data valuation.
The convergence guarantees, empirical scalability, and dynamic responsiveness mark NDDV as a prominent class of data valuation methods, with ongoing advances expected in both theoretical and applied aspects across data-centric machine learning.