Monotonic Neural Networks: Neural Fine-Gray
- Monotonic neural networks are specialized function approximators that enforce non-decreasing relationships between selected inputs and outputs, enhancing interpretability and fairness.
- The Neural Fine-Gray approach models cause-specific cumulative incidence functions in survival analysis by integrating softmax probabilities with monotonic subnetworks.
- Empirical evaluations show these models achieve competitive calibration and discrimination at substantially lower computational cost, making them well suited to high-stakes domains.
Monotonic neural networks are a class of artificial neural networks (ANNs) designed to enforce monotonic relationships between selected inputs and outputs, motivated by requirements in domains such as medicine, finance, or fairness-sensitive applications where interpretability and domain alignment are essential. The neural Fine-Gray approach specifically addresses time-to-event data with competing risks, enforcing monotonicity in the modeling of cause-specific cumulative incidence functions (CIFs) to permit exact, efficient likelihood-based inference in survival analysis settings.
1. Monotonic Neural Networks: Formal Principles and Motivation
Monotonic neural networks are defined as function approximators $f : \mathbb{R}^d \to \mathbb{R}^m$ such that, for a designated subset $S \subseteq \{1, \dots, d\}$ of input coordinates and corresponding output indices $q$, the mappings $x \mapsto f_q(x)$ are componentwise non-decreasing in $x_i$ for all $i \in S$, holding all other features fixed. This property is formalized by the implication $x \preceq_S x' \implies f_q(x) \le f_q(x')$, where $\preceq_S$ denotes componentwise ordering on the coordinates in $S$ and equality elsewhere (Kitouni et al., 2023).
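As a quick illustration of the definition, the following sketch empirically tests a function for non-decreasing behaviour in a chosen coordinate subset by raising only those coordinates and holding the rest fixed. The helper name and sampling scheme are illustrative, not from the cited work.

```python
import math
import random

def is_monotone_nondecreasing(f, x, mono_idx, eps=1e-6, trials=200, seed=0):
    """Empirically check that raising only the coordinates in mono_idx
    (holding all others fixed) never lowers f's output."""
    rng = random.Random(seed)
    for _ in range(trials):
        x_up = list(x)
        for i in mono_idx:
            x_up[i] += rng.uniform(0.0, 1.0)  # raise only the coordinates in S
        if f(x_up) < f(x) - eps:
            return False
    return True

# Example: f(x) = x0^3 + sin(x1) is monotone in x0 but not in x1.
f = lambda x: x[0] ** 3 + math.sin(x[1])
```

A sampled check like this can only falsify monotonicity; architectural constraints (next section) certify it by construction.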
Enforcing monotonicity aligns model predictions with expert-imposed constraints, ensures interpretability, and mitigates risks of model-induced unfairness. Standard multilayer perceptrons (MLPs) cannot guarantee monotonic dependence without additional architectural modifications.
2. Weight Constraints, Monotonicity Certification, and Spline-Based Approaches
Several architectural mechanisms exist for enforcing monotonicity:
- Positive-Weight Networks: Constraining all weights along paths from monotonic input features to output nodes to be nonnegative, and employing non-decreasing activation functions, yields a function that is monotonically non-decreasing in those features. The constraints can be enforced by parameterizing each constrained weight as $w = \exp(\theta)$ or $w = \theta^2$ for an unconstrained parameter $\theta \in \mathbb{R}$ (Jeanselme et al., 2023, Rindt et al., 2021).
- Certified Monotonicity via Splines (MonoKAN): The MonoKAN architecture replaces standard piecewise polynomial activations with cubic Hermite splines. For each edge $(i,j)$, the activation is given by a spline interpolant parameterized via knots $\{x_k\}$, values $\{y_k\}$, and derivatives $\{d_k\}$. Sufficient conditions for segment-wise monotonicity enforce non-decreasing knot values $y_{k+1} \ge y_k$, nonnegative normalized derivatives $\alpha_k = d_k/\Delta_k \ge 0$ and $\beta_k = d_{k+1}/\Delta_k \ge 0$ (with $\Delta_k$ the secant slope of segment $k$), and the norm bound $\alpha_k^2 + \beta_k^2 \le 9$, following Fritsch–Carlson (1980). If each spline attached to a monotonic feature's path is monotone and the corresponding linear weights are nonnegative, the MonoKAN output is globally certified to be non-decreasing in the marked monotonic coordinates (Polo-Molina et al., 2024).
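The segment-wise Fritsch–Carlson (1980) condition can be sketched as follows for a single knot interval; the helper names are illustrative, and the exact constants used in the MonoKAN implementation may differ.

```python
def hermite(x, x0, x1, y0, y1, d0, d1):
    """Evaluate the cubic Hermite interpolant on [x0, x1] with endpoint
    values (y0, y1) and endpoint derivatives (d0, d1)."""
    h = x1 - x0
    t = (x - x0) / h
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * y0 + h10 * h * d0 + h01 * y1 + h11 * h * d1

def fritsch_carlson_ok(y0, y1, d0, d1, x0=0.0, x1=1.0):
    """Sufficient condition for segment-wise monotonicity: nonnegative
    secant-normalized derivatives alpha, beta with alpha^2 + beta^2 <= 9."""
    s = (y1 - y0) / (x1 - x0)          # secant slope of the segment
    if s == 0:
        return d0 == 0 and d1 == 0     # flat segment: derivatives must vanish
    a, b = d0 / s, d1 / s
    return a >= 0 and b >= 0 and a * a + b * b <= 9
```

The condition is sufficient rather than necessary: it carves out a disc of radius 3 in the $(\alpha, \beta)$ plane inside the full monotonicity region.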
The MonoKAN constraint-enforcement routine proceeds after each gradient step via (i) projecting weights and spline coefficients to their feasible (nonnegative) region, (ii) rescaling derivatives as needed, and (iii) zeroing derivatives on flat intervals.
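A minimal sketch of steps (i)–(iii) for one spline's derivative coefficients, assuming knot vectors `xs`, `ys`, `ds` (names hypothetical; the actual routine also projects the network's linear weights):

```python
def project_spline(xs, ys, ds):
    """Project knot derivatives ds into the Fritsch-Carlson feasible region.
    Shrinking a derivative can only help neighboring segments, so a single
    left-to-right pass suffices."""
    ds = [max(d, 0.0) for d in ds]                     # (i) clip to nonnegative
    for k in range(len(xs) - 1):
        s = (ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k])  # secant slope
        if s == 0.0:
            ds[k] = ds[k + 1] = 0.0                    # (iii) flat interval: zero out
            continue
        a, b = ds[k] / s, ds[k + 1] / s
        norm = (a * a + b * b) ** 0.5
        if norm > 3.0:                                 # (ii) rescale into the
            ds[k] *= 3.0 / norm                        #      radius-3 disc
            ds[k + 1] *= 3.0 / norm
    return ds
```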
3. Neural Fine-Gray: Model Architecture for Competing Risks
In survival analysis with competing risks, one models the time to the first of multiple mutually exclusive potential failure types. The cumulative incidence function (CIF) for event $k$ is $F_k(t \mid x) = P(T \le t, D = k \mid x)$. The neural Fine-Gray (NeuralFG) approach models $F_k(t \mid x) = \pi_k(x)\bigl(1 - \exp(-M_k(x, t))\bigr)$, where $\pi_k(x)$ is a learned softmax output encoding the marginal probability of cause $k$, and $M_k(x, t)$ is a monotonic function of $t$ implemented via a feedforward subnetwork with enforced monotonicity in $t$. Monotonicity is achieved by constraining all weights along the $t$-input path to be nonnegative and using non-decreasing activations such as Softplus or Tanh. The final subnetwork output is passed through a Softplus to guarantee non-negativity, and therefore $F_k(t \mid x)$ is strictly increasing in $t$ (Jeanselme et al., 2023).
This construction enables exact, efficient training under the full Fine–Gray likelihood by computing the derivative $\partial F_k(t \mid x)/\partial t$ directly via automatic differentiation.
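A minimal sketch of the construction, assuming the CIF factorizes as a softmax cause probability times a term $1 - \exp(-M)$ with $M$ nonnegative and non-decreasing in $t$. The class name, layer sizes, and the use of constant cause logits (rather than a network of $x$) are simplifications for illustration, not the paper's architecture.

```python
import math
import random

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)

class NeuralFGSketch:
    def __init__(self, n_causes=2, hidden=8, seed=0):
        rng = random.Random(seed)
        # Cause logits: constant here; a function of x in the real model.
        self.pi_logits = [rng.gauss(0, 1) for _ in range(n_causes)]
        # Per cause/unit: (weight on x, log-weight on t, bias, log output weight).
        # Weights on the t-path are stored as logs so exp(.) >= 0 by construction.
        self.units = [[(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
                       for _ in range(hidden)] for _ in range(n_causes)]

    def cif(self, x, t, k):
        norm = sum(math.exp(l) for l in self.pi_logits)
        pi_k = math.exp(self.pi_logits[k]) / norm           # softmax cause probability
        m = sum(math.exp(lv) * math.tanh(wx * x + math.exp(lt) * t + b)
                for wx, lt, b, lv in self.units[k])
        m = softplus(m)                                     # nonnegative, non-decreasing in t
        return pi_k * (1.0 - math.exp(-m))                  # CIF bounded above by pi_k
```

Because every weight multiplying $t$ is an exponential of an unconstrained parameter and every activation is non-decreasing, the returned CIF is non-decreasing in $t$ for any parameter values, with no projection step needed.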
4. Likelihood-Based Training and Computational Frameworks in Survival Analysis
The neural Fine-Gray model is trained by maximizing the exact competing-risks log-likelihood $\ell = \sum_{i :\, \delta_i > 0} \log \frac{\partial F_{\delta_i}}{\partial t}(t_i \mid x_i) + \sum_{i :\, \delta_i = 0} \log\bigl(1 - \sum_k F_k(t_i \mid x_i)\bigr)$, where the cause-specific density $\partial F_k / \partial t$ is computed via automatic differentiation. This approach eliminates the need for parametric assumptions or costly numerical integration schemes (such as Gauss–Legendre quadrature), delivering a per-epoch computational complexity of $O(n)$ for a dataset of $n$ subjects, substantially outperforming integral-based methods with complexity $O(nm)$ for $m$-point discretizations (Jeanselme et al., 2023).
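The likelihood can be sketched for a generic fitted CIF `F(k, t, x)`: an observed event of cause $k$ contributes the cause-specific density at the event time, and a censored subject contributes the overall event-free probability. The paper obtains the derivative by automatic differentiation; a central finite difference stands in here, and the function and data layout are hypothetical.

```python
import math

def competing_risks_log_likelihood(F, data, n_causes, h=1e-5):
    """Exact competing-risks log-likelihood for CIFs F(k, t, x).
    `data` rows are (x, t, delta): delta = 0 for censoring, delta = k
    (1-indexed) for an observed event of cause k."""
    total = 0.0
    for x, t, delta in data:
        if delta == 0:
            # Censored: event-free for every cause at time t.
            total += math.log(1.0 - sum(F(k, t, x) for k in range(n_causes)))
        else:
            # Observed event: cause-specific density dF_k/dt at time t.
            k = delta - 1
            density = (F(k, t + h, x) - F(k, t - h, x)) / (2.0 * h)
            total += math.log(density)
    return total
```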
Monotonic neural networks also allow for direct optimization of proper scoring rules (e.g., the right-censored log-likelihood in standard survival analysis), ensuring statistical consistency of fitted models (Rindt et al., 2021).
5. Empirical Results: Benchmarks, Calibration, and Efficiency
Empirical evaluations on real-world and synthetic survival datasets (PBC, Framingham, SEER) demonstrate that NeuralFG achieves either the lowest or near-lowest Brier scores (calibration) and competitive C-Index (discrimination), matching or outperforming established baselines such as DeepHit and DeSurv at reduced computational cost. For example, on SEER, NeuralFG attains a C-Index of 0.855 and a Brier score of 0.069 at the median event time, closely matching the best baseline (0.860, 0.070) (Jeanselme et al., 2023).
Similar monotonic MLP approaches (SurvivalMonotonic-net/SuMo-net) yield state-of-the-art right-censored log-likelihoods and drastically reduced inference times compared to neural-ODE and other non-monotonic neural competitors (20–100× speedup) (Rindt et al., 2021).
Recent monotonic models outside survival analysis—such as MonoKAN—have shown superior or competitive performance on partially monotonic tabular benchmarks, with test accuracy and RMSE that either lead or are second-best among state-of-the-art approaches (Polo-Molina et al., 2024).
6. Interpretability, Universality, and Practical Constraints
Monotonic neural networks preserve interpretability by permitting direct visualization of per-input, per-layer monotonic 1D mappings. MonoKAN, for example, achieves this by plotting each edge's learned spline activation, giving an explicit graphical summary of feature influence (Polo-Molina et al., 2024). The parameterization of monotonicity via splines or nonnegative weights is compatible with universal approximation properties—any continuous monotonic function can be approximated arbitrarily well by a sufficiently deep and wide monotonic MLP, and similarly for partially monotonic splines (Lang, 2005).
Practical monotonic network designs maintain linear growth in parameter count with respect to key architectural hyperparameters and incur negligible computational overhead. Monotonicity constraints are typically enforced by weight projection, normalization, or construction (via parameterization), compatible with standard optimizers and backpropagation (Kitouni et al., 2023, Polo-Molina et al., 2024).
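The two enforcement mechanisms named above can be contrasted on a single scalar weight; these toy helpers are illustrative only.

```python
import math

def projected_step(w, grad, lr=0.1):
    """Projection: take the ordinary gradient step, then clip back onto [0, inf)."""
    return max(w - lr * grad, 0.0)

def constructed_weight(theta):
    """Construction: optimize an unconstrained theta and use w = exp(theta) >= 0,
    so feasibility holds at every step with no post-hoc correction."""
    return math.exp(theta)
```

Both are compatible with standard optimizers and backpropagation; projection keeps the native parameterization at the cost of a post-step correction, while construction bakes the constraint into the computation graph.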
7. Application Domains and Implications in Risk Modeling
Monotonic neural networks are particularly suited to domains where domain knowledge stipulates monotonic dependencies, including but not limited to clinical risk prediction, financial scoring, and regulatory environments. The neural Fine-Gray approach demonstrates that direct parameterization of cause-specific cumulative incidence functions, with monotonicity guaranteed by construction, corrects bias from naive handling of competing risks and enables accurate calibration—critical for high-stakes applications such as drug allocation or statin eligibility (Jeanselme et al., 2023).
A plausible implication is that monotonic neural networks could become a standard modeling tool in domains where fidelity to domain knowledge, interpretability, and calibration are simultaneously required. Modelers are cautioned, however, that strict monotonicity may reduce expressive power in settings where the monotonicity assumption is itself misspecified.