- The paper introduces UFXP, which removes the nested Bellman fixed-point problem by decoupling neural network utility updates from dynamic programming recursions.
- The estimator is proven to be consistent and asymptotically normal, with the OUFXP modification achieving maximum likelihood efficiency.
- The method achieves dramatic computational speedups through one-time dual fixed point computation, enabling massive parallel optimization in high-dimensional settings.
Training Neural Networks Embedded in Dynamic Discrete Choice Models
Introduction and Motivation
This work introduces UFXP, the first general-purpose estimator for infinite-horizon dynamic discrete choice models (DDCMs) in which the utility function is represented nonparametrically by a neural network. The key technical advancement is the removal of the computational bottleneck induced by nested solutions to Bellman’s equation, which is typically required at every iteration of parameter estimation as the utility function's nonlinearity precludes the classic “unnesting” afforded by a linear parameterization. The proposed estimator exploits the duality of the Bellman operator to fully decouple utility parameter updates from dynamic programming recursions, such that after a small up-front cost, it is possible to reuse these computations for many parameter updates.
The authors provide a rigorous analysis of the statistical properties, establishing consistency and asymptotic normality for UFXP, and show that a simple modification, OUFXP, achieves maximum likelihood efficiency. The estimator generalizes previous approaches by working in a space defined by the choice-specific value functions (CSVs) rather than policies directly, and is compatible with arbitrary neural network architectures for the utility function.
Technical Contributions
Model and Structural Framework
The paper addresses a canonical dynamic discrete choice setup with A actions and X discrete states. The agent's utility function is parameterized as uθ(x), with x observed and e∼g unobserved preference shocks. A neural network substitutes for linear-in-parameters utility, enabling flexible, high-dimensional, non-additive functional forms.
The dynamic programming problem is reframed to work on expected value functions: $v_\theta(x) = \nu\big(u_\theta(x) + \beta \sum_{x'} f(x' \mid x)\, v_\theta(x')\big)$
where ν is the social surplus function, accommodating the integration over unobserved shocks. A key insight is that policy functions are mappings from X to the probability simplex, and the main objects of the theory are the conditional choice probabilities (CCPs).
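As a concrete illustration of the recursion above, the following minimal sketch iterates it to a fixed point on a toy model. It rests on two assumptions that the summary does not state: the shocks are i.i.d. type-1 extreme value, so that ν reduces to a log-sum-exp (an additive constant is omitted), and transitions may depend on the chosen action. The names `solve_value`, `u`, and `f` are hypothetical and not taken from the paper.

```python
import math


def logsumexp(values):
    """Numerically stable log-sum-exp.

    ASSUMPTION: stands in for the social surplus function nu under
    i.i.d. type-1 extreme value shocks (additive constant omitted).
    """
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def solve_value(u, f, beta, tol=1e-10, max_iter=10_000):
    """Iterate v(x) = nu(u(x, .) + beta * sum_y f(y | x, .) v(y)) to convergence.

    u[x][a]    : flow utility of action a in state x (e.g. a network's output)
    f[a][x][y] : probability of moving from state x to y under action a
                 (ASSUMPTION: action-specific transitions)
    Returns the value vector and the number of iterations used.
    """
    n_states, n_actions = len(u), len(u[0])
    v = [0.0] * n_states
    for iteration in range(1, max_iter + 1):
        v_new = []
        for x in range(n_states):
            # Choice-specific values for every action in state x.
            csv = [
                u[x][a] + beta * sum(f[a][x][y] * v[y] for y in range(n_states))
                for a in range(n_actions)
            ]
            v_new.append(logsumexp(csv))
        gap = max(abs(p - q) for p, q in zip(v_new, v))
        v = v_new
        if gap < tol:
            return v, iteration
    raise RuntimeError("value iteration did not converge")


# Toy model: 2 states, 2 actions (all numbers are made up for illustration).
u_demo = [[0.0, -1.0], [1.0, 0.5]]
f_demo = [
    [[0.9, 0.1], [0.2, 0.8]],  # transitions under action 0
    [[0.5, 0.5], [0.5, 0.5]],  # transitions under action 1
]
v_demo, n_iter = solve_value(u_demo, f_demo, beta=0.9)
print("value function:", v_demo, "iterations:", n_iter)
```

In a nested estimator, a solve of this kind would be repeated every time the utility parameters change, which is the cost the paper's dual construction is designed to avoid.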
Duality and Unnesting Fixed Points
Classical approaches (NFXP, CCP, SC, MPEC) all require nested fixed points during estimation, which are prohibitive when the utility function is highly flexible. The central advance is a dual decomposition of the Bellman equation: the (primal) value function recursion is replaced by an equivalent dual recursion for a vector λ, which does not depend on the utility parameter θ. Thus, for linear functionals of the value function, the fixed point need only be computed once and can be amortized over many evaluations and parameter updates.
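The amortization argument can be illustrated with a generic adjoint identity. The sketch below assumes a simplified recursion that is linear in the value function, v = r + βFv, where the payoff vector r stands in for whatever depends on θ; the paper's actual dual construction may differ. Under that assumption, any linear functional cᵀv equals λᵀr, where λ solves λ = c + βFᵀλ and does not involve r. All names (`fixed_point`, `F`, `c`, `lam`) are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def fixed_point(b, M, beta, tol=1e-12, max_iter=100_000):
    """Iterate z = b + beta * M z until successive iterates agree to tol."""
    z = list(b)
    for _ in range(max_iter):
        z_new = [b_i + beta * dot(row, z) for b_i, row in zip(b, M)]
        if max(abs(p - q) for p, q in zip(z_new, z)) < tol:
            return z_new
        z = z_new
    raise RuntimeError("fixed-point iteration did not converge")


# Toy inputs (made up): a 3-state transition matrix and a weight vector c
# defining the linear functional c . v of interest.
F = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.3, 0.3, 0.4]]
c = [1.0, 0.0, 2.0]
beta = 0.9

# Dual fixed point: solved once, independent of the payoff vector r.
lam = fixed_point(c, transpose(F), beta)

# Two candidate payoff vectors, standing in for two parameter values.
for r in ([1.0, 0.0, -2.0], [0.3, 0.7, 5.0]):
    primal = dot(c, fixed_point(r, F, beta))  # one new recursion per candidate
    dual = dot(lam, r)                        # a single inner product
    print(primal, dual)
```

The primal route needs a fresh recursion for each candidate r, whereas the dual route reuses `lam` and pays only an inner product per candidate, which is the sense in which the up-front cost is amortized.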
The UFXP estimator is defined as the minimizer of a random projection of the Bellman first-order conditions (FONCs) in the CSV space. For each random projection indexed by X0, only one dual fixed point needs to be solved; these are independent of the parameter vector. The gradient and Hessian of the criterion can also be evaluated without further recursions, producing dramatic speedups for high-dimensional parameterizations.
Statistical Guarantees
The paper proves that under standard regularity conditions, the UFXP estimator is consistent and asymptotically normal. An optimal weighting modification, OUFXP, is introduced which achieves the efficiency bound of the maximum likelihood estimator (MLE), equating its information matrix to that of likelihood-based estimators.
Numerical Experiments
Extensive simulation studies are conducted on high-dimensional inventory management DDCMs, with both synthetic and empirical data. The structural utility—the holding cost as a function of state—is intentionally irregular, and the estimator's neural network flexibly adapts to its shape.
The experimental setup includes several state-of-the-art benchmarks (NFXP, CCP, SC, MPEC) and evaluates four neural architectures, multiple optimizers, and dozens of random initializations. Results consistently demonstrate:
- Successful recovery of complex utility surfaces by UFXP/OUFXP and high X1 (see below).
- Orders of magnitude computational savings in both “workload” (total fixed point recursions) and “span” (depth of parallelizable computation).
- Robustness to optimizer and initialization, with substantially higher rates of convergence to high-quality minima.
- Amortized costs in multi-start optimization: a one-time dual fixed point solve allows hundreds or thousands of parallel parameter searches with negligible additional cost.
Numerical results for the 540-state and 5400-state models (showcasing recovery of the holding cost function) are shown below for key experiments.
Figure 1: True and estimated holding cost functions for the 540-state model; black is ground truth, colored lines are each estimator’s best neural network fit.
Multiple metrics are reported, e.g., the distribution of estimation errors across initializations.
Figure 2: Empirical CDFs of holding cost X2 errors for 540-state model, for 100 optimization runs per neural net and optimizer; smaller is better.
The empirical case study employs a multi-echelon supply chain with large state space and panel data, estimating a highly nonlinear holding cost. Random ensemble solutions reveal the heterogeneity and uncertainty in the neural fit.
Figure 3: Ensemble of 383 estimated holding cost functions whose UFXP objectives are within 0.5% of the minimum over 1000 runs, showing variability and measured ensemble inference.
Theoretical and Practical Implications
The primary advantage of UFXP/OUFXP is a shift from scaling “in X3” (the number of parameters, i.e. neural network size) for each gradient/Hessian evaluation, to just X4 (or slightly more) fixed point recursions, which are parallelizable and reusable. Computational experiments show speedups of up to four orders of magnitude in estimation time over classic methods. Under multi-start optimization in particular, the cost of traditional estimators becomes prohibitive, while UFXP's cost grows sublinearly with the number of restarts and architectures.
This makes it possible to train flexible neural network utilities in DDCMs with large state spaces and high-dimensional covariates—settings where previous approaches are effectively intractable. It also enables best practices from machine learning: architecture searching, optimizer benchmarking, and ensemble estimation.
Statistical Implications and Interpretability
The estimator’s consistency, root-X5 asymptotics, and MLE efficiency extend the statistical foundations for inference in dynamic structural models with nonparametric function approximation. The flexibility of neural network utilities admits intricate state dependence, allowing the field to move beyond restrictive low-dimensional polynomial forms. Empirical results reveal that true utility surfaces can be nonconvex, non-monotonic, and multidimensional, departing sharply from classic economic theory assumptions.
The statistical results also clarify that random projection for the estimation criterion yields identification almost surely as soon as the number of projections matches or slightly exceeds the number of parameters.
Limitations and Future Directions
The methodology presumes knowledge of transition kernels and focuses on the single-agent setting. Extending UFXP/OUFXP to partially observed or strategic multi-agent settings (e.g., dynamic games) remains an open problem. Moreover, while efficient, the criterion is in general nonconvex in neural network parameters, so practical implementations are subject to the usual local minima issues, albeit mitigated by the ability to run large optimization ensembles.
Further theoretical work might consider non-asymptotic guarantees, adaptive projection schemes, and robust inference under model misspecification. The empirical findings suggest revisiting modeling conventions in operations research, industrial organization, and management science that have historically defaulted to simple quadratic or linear utility specifications.
Conclusion
This paper supplies a fundamentally more scalable approach to estimating dynamic discrete choice models with flexible, nonparametric utility functions embedded as neural networks. By organizing the estimation problem via Bellman duality, the UFXP and OUFXP estimators remove the prohibitive recursion bottleneck for general nonlinear specifications. Massively parallelizable and ensemble-friendly, these estimators unlock high-dimensional, empirically relevant dynamic systems for rigorous structural analysis—propelling DDCMs toward current standards in statistical and machine learning. The methods are supported by strong theoretical guarantees and validated through challenging synthetic and empirical tasks well beyond the previous practical frontier.