Ordinal-ResLogit: Interpretable Ordered Choice Model
- Ordinal-ResLogit is an interpretable deep residual neural network framework for ordered choice modeling that integrates CORAL to enforce ordinal consistency.
- It uses residual blocks with skip connections to embed traditional ordered logit structure, enabling recovery of interpretable coefficients and latent heterogeneity.
- The model yields economically meaningful metrics, accurately capturing substitution patterns and elasticities while outperforming standard ordered choice models.
The Ordinal-ResLogit model is an interpretable deep residual neural network framework for ordered choice modeling, designed to capture ordinal responses while maintaining interpretability and statistical structure. It achieves this by embedding deep residual learning within the COnsistent RAnk Logits (CORAL) architecture, creating a binary classification-based ordinal regression model that guarantees consistency across ordered thresholds. Unlike standard neural approaches, Ordinal-ResLogit is specifically constructed to address the “black-box” limitation in deep learning, enabling the recovery of interpretable coefficients, latent heterogeneity, threshold parameters, and economically meaningful metrics such as elasticities and substitution patterns (Kamal et al., 2022).
1. Model Structure and Mathematical Formulation
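Consider a dataset of $N$ i.i.d. samples $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^{P}$ are features and $y_i \in \{r_1, \dots, r_K\}$, with $r_1 \prec r_2 \prec \dots \prec r_K$, denote ordinal labels. The Ordinal-ResLogit model introduces a latent utility $U_i$ for each observation. It starts from the linear systematic utility $V_i = \boldsymbol{\beta}^\top \mathbf{x}_i$ and refines it with residual blocks.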
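In the CORAL framework, the probability that $y_i$ exceeds threshold $r_k$ is estimated as
$$\hat{P}(y_i > r_k) = \sigma(U_i + b_k), \qquad k = 1, \dots, K-1,$$
where $\sigma(z) = 1/(1 + e^{-z})$ is the logistic sigmoid. Monotonicity is enforced via non-increasing biases $b_1 \ge b_2 \ge \dots \ge b_{K-1}$.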
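The CORAL head maps one latent utility $U$ and a set of non-increasing biases to exceedance probabilities $P(y > r_k) = \sigma(U + b_k)$. Differencing these gives category probabilities. The plain-Python sketch below illustrates that mapping. The function names and example numbers are hypothetical; only the formulas come from the text.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function sigma(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def exceedance_probs(u: float, biases: list[float]) -> list[float]:
    """CORAL head: P(y > r_k) = sigmoid(U + b_k) for k = 1..K-1."""
    return [sigmoid(u + b) for b in biases]

def category_probs(u: float, biases: list[float]) -> list[float]:
    """P(y = r_k) = P(y > r_{k-1}) - P(y > r_k), with P(y > r_0) = 1 and P(y > r_K) = 0."""
    exceed = [1.0] + exceedance_probs(u, biases) + [0.0]
    return [exceed[k] - exceed[k + 1] for k in range(len(exceed) - 1)]

# Hypothetical example: K = 4 categories, hence 3 non-increasing biases.
biases = [1.5, 0.0, -1.5]
exceed = exceedance_probs(0.3, biases)
probs = category_probs(0.3, biases)
print([round(p, 3) for p in probs])
```

The biases are non-increasing, so the exceedance probabilities fall with $k$ and every differenced category probability is non-negative. With unordered biases, some of these differences could turn negative.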
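The ResLogit component augments the linear utility with $M$ residual blocks. The computation proceeds as follows:
- Set $h_i^{(0)} = V_i = \boldsymbol{\beta}^\top \mathbf{x}_i$.
- For $m = 1, \dots, M$:
$$h_i^{(m)} = h_i^{(m-1)} + \sigma\big(W^{(m)} h_i^{(m-1)}\big),$$
where $W^{(m)}$ is the residual-layer weight, and $\sigma$ is applied elementwise.
The final latent utilities are $U_i = h_i^{(M)}$, and the threshold logits are $z_{ik} = U_i + b_k$ for $k = 1, \dots, K-1$.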
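The forward pass described above can be sketched in plain Python. The sketch assumes a scalar latent utility, so each residual weight $W^{(m)}$ is a single number. It also follows this section's description of a block as a linear map, a sigmoid, and a skip connection. The function names are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def latent_utility(x: list[float], beta: list[float], residual_weights: list[float]) -> float:
    """h0 = beta^T x, then h_m = h_{m-1} + sigmoid(W_m * h_{m-1}) for each block; U = h_M."""
    h = sum(b * xj for b, xj in zip(beta, x))   # linear systematic utility V_i
    for w in residual_weights:                   # assumption: one scalar weight per block
        h = h + sigmoid(w * h)                   # skip connection plus sigmoid correction
    return h

def threshold_logits(u: float, biases: list[float]) -> list[float]:
    """z_k = U + b_k for the K-1 thresholds."""
    return [u + b for b in biases]

x = [1.0, 2.0]
beta = [0.5, -0.25]
u_linear = latent_utility(x, beta, residual_weights=[])          # M = 0: plain linear utility
u_deep = latent_utility(x, beta, residual_weights=[0.8, -0.3])   # M = 2 residual blocks
z = threshold_logits(u_deep, biases=[1.0, -1.0])
```

With an empty weight list ($M = 0$), the forward pass reduces to the linear Ordered Logit utility.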
2. Loss Function and Optimization
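The Ordinal-ResLogit model is trained by minimizing the sum of binary cross-entropy losses across the $K-1$ thresholds:
$$\mathcal{L}(\Theta) = -\sum_{i=1}^{N} \sum_{k=1}^{K-1} \Big[ y_i^{(k)} \log \sigma(z_{ik}) + \big(1 - y_i^{(k)}\big) \log\big(1 - \sigma(z_{ik})\big) \Big],$$
where $y_i^{(k)} = \mathbb{1}\{y_i > r_k\}$ and $z_{ik} = U_i + b_k$ is the $k$-th threshold logit. The parameter set is $\Theta = \{\boldsymbol{\beta}, W^{(1)}, \dots, W^{(M)}, b_1, \dots, b_{K-1}\}$.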
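The loss can be computed as follows. Each ordinal label is first expanded into $K-1$ binary targets $\mathbb{1}\{y > r_k\}$, and binary cross-entropy is then summed over samples and thresholds. This is a plain-Python sketch. It assumes labels coded $0, \dots, K-1$, and the function names are hypothetical.

```python
import math

def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def extended_labels(y: int, num_classes: int) -> list[int]:
    """Binary targets y^(k) = 1{y > r_k}, k = 1..K-1, for a label coded 0..K-1."""
    return [1 if y > k else 0 for k in range(num_classes - 1)]

def coral_loss(utilities: list[float], labels: list[int], biases: list[float]) -> float:
    """Sum over samples and thresholds of binary cross-entropy on z_ik = U_i + b_k."""
    num_classes = len(biases) + 1
    total = 0.0
    for u, y in zip(utilities, labels):
        for b, target in zip(biases, extended_labels(y, num_classes)):
            z = u + b
            # log(1 - sigmoid(z)) equals log_sigmoid(-z)
            total -= target * log_sigmoid(z) + (1 - target) * log_sigmoid(-z)
    return total

labels = [0, 1, 2]                  # K = 3 categories coded 0..2
biases = [1.0, -1.0]
loss = coral_loss([-2.0, 0.0, 2.5], labels, biases)
better = coral_loss([-5.0, 0.0, 5.0], labels, biases)   # utilities that separate classes more
```

Writing $\log(1 - \sigma(z))$ as $\log \sigma(-z)$ avoids taking the logarithm of a number that rounds to zero for large logits.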
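Residual blocks include skip connections. Each block's output is its input plus a correction, so the Jacobian of every block contains an identity term. As a result, the gradient reaching the early layers (and hence $\boldsymbol{\beta}$) is far less prone to vanishing as the number of blocks grows. This mitigates the vanishing-gradient problem and keeps optimization well conditioned.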
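The gradient argument can be checked numerically on a toy scalar network. The sketch below multiplies per-block derivatives along the chain rule, once for plain sigmoid blocks and once for blocks with a skip connection. The depth, weights, and function names are hypothetical, and the result says nothing about the paper's own experiments.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def input_gradient(h0: float, weights: list[float], skip: bool) -> float:
    """d h_M / d h_0 via the chain rule for blocks h_m = [h_{m-1} +] sigmoid(w_m * h_{m-1})."""
    h, grad = h0, 1.0
    for w in weights:
        s = sigmoid(w * h)
        local = w * s * (1.0 - s)     # derivative of sigmoid(w * h) with respect to h
        if skip:
            local += 1.0              # identity term contributed by the skip connection
        grad *= local
        h = h + s if skip else s
    return grad

weights = [0.5] * 20                  # hypothetical depth M = 20
g_plain = input_gradient(0.1, weights, skip=False)
g_resid = input_gradient(0.1, weights, skip=True)
print(f"without skips: {g_plain:.3e}, with skips: {g_resid:.3e}")
```

Without skips, each factor is at most $0.5 \cdot 0.25 = 0.125$, so the product collapses geometrically with depth. With skips and these positive weights, each factor is at least 1.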
3. Neural Network Architecture and Monotonicity Constraints
The architecture consists of:
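- Input layer: $\mathbf{x}_i \in \mathbb{R}^{P}$.
- Linear deterministic layer: $V_i = \boldsymbol{\beta}^\top \mathbf{x}_i$.
- $M$ residual blocks: Each block applies a linear transformation ($W^{(m)}$), a sigmoid activation, and a skip connection.
- Final CORAL layer: Produces $U_i = h_i^{(M)}$, which is thresholded via $K-1$ logits $z_{ik} = U_i + b_k$.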
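To ensure monotonicity, biases are parameterized as $b_{k+1} = b_k - \Delta_k$ with $\Delta_k \ge 0$, which enforces $b_1 \ge b_2 \ge \dots \ge b_{K-1}$. This guarantees that the predicted exceedance probabilities $\hat{P}(y_i > r_k)$ are non-increasing in $k$, so every implied category probability is non-negative, as ordinal data require.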
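The increment parameterization is shown below, together with the projection step that clips increments at zero after a gradient update. This is a minimal sketch. The names `biases_from_increments` and `project_increments` are hypothetical, and clipping is one simple way to project onto $\Delta_k \ge 0$.

```python
def biases_from_increments(b1: float, deltas: list[float]) -> list[float]:
    """b_{k+1} = b_k - Delta_k, which is non-increasing whenever every Delta_k >= 0."""
    biases = [b1]
    for d in deltas:
        biases.append(biases[-1] - d)
    return biases

def project_increments(deltas: list[float]) -> list[float]:
    """Euclidean projection onto the feasible set {Delta_k >= 0}: clip negatives to zero."""
    return [max(0.0, d) for d in deltas]

# A gradient step may push an increment negative; projecting restores the ordering.
raw = [0.8, -0.2, 1.1]
biases = biases_from_increments(2.0, project_increments(raw))
```

A clipped increment yields two equal adjacent biases. That is still consistent with the non-strict ordering $b_k \ge b_{k+1}$.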
4. Behavioral Derivatives and Choice Model Metrics
Ordinal-ResLogit enables closed-form computation of market shares, substitution patterns, and elasticities:
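- Market share for category $r_k$:
$$S_k = \frac{1}{N} \sum_{i=1}^{N} \hat{P}(y_i = r_k), \qquad \hat{P}(y_i = r_k) = \hat{P}(y_i > r_{k-1}) - \hat{P}(y_i > r_k),$$
with the conventions $\hat{P}(y_i > r_0) = 1$ and $\hat{P}(y_i > r_K) = 0$.
- Substitution patterns: Effects of a covariate shift $\Delta x_p$ in attribute $x_p$ are captured by comparing $S_k(\mathbf{x} + \Delta x_p) - S_k(\mathbf{x})$ for all $k$.
- Arc elasticities:
$$E_k^{(p)} = \frac{\Delta S_k / S_k}{\Delta x_p / x_p},$$
where $\Delta S_k$ is the change in market share for $r_k$, given a relative change $\Delta x_p / x_p$ in attribute $x_p$.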
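The three metrics are computed by averaging category probabilities over the sample, re-evaluating after a relative shift in one attribute, and dividing the relative share change by the relative attribute change. This is a plain-Python sketch. The callable `linear_utility` is a hypothetical stand-in for a trained network's utility, and the data and coefficients are invented for illustration.

```python
import math
from typing import Callable, Sequence

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def category_probs(u: float, biases: Sequence[float]) -> list[float]:
    """P(y = r_k) as differences of CORAL exceedance probabilities."""
    exceed = [1.0] + [sigmoid(u + b) for b in biases] + [0.0]
    return [exceed[k] - exceed[k + 1] for k in range(len(exceed) - 1)]

UtilityFn = Callable[[Sequence[float]], float]

def market_shares(X: list[list[float]], utility_fn: UtilityFn, biases: Sequence[float]) -> list[float]:
    """S_k = (1/N) * sum_i P(y_i = r_k | x_i)."""
    shares = [0.0] * (len(biases) + 1)
    for x in X:
        for k, p in enumerate(category_probs(utility_fn(x), biases)):
            shares[k] += p / len(X)
    return shares

def shift_attribute(X: list[list[float]], p: int, rel_change: float) -> list[list[float]]:
    """Apply x_p -> x_p * (1 + rel_change) to every observation."""
    return [[xj * (1.0 + rel_change) if j == p else xj for j, xj in enumerate(x)] for x in X]

def arc_elasticities(X, utility_fn: UtilityFn, biases, p: int, rel_change: float):
    """Return baseline shares, share changes Delta S_k, and E_k = (Delta S_k / S_k) / rel_change."""
    base = market_shares(X, utility_fn, biases)
    new = market_shares(shift_attribute(X, p, rel_change), utility_fn, biases)
    substitution = [s1 - s0 for s0, s1 in zip(base, new)]
    elasticity = [(d / s0) / rel_change for d, s0 in zip(substitution, base)]
    return base, substitution, elasticity

def linear_utility(x: Sequence[float]) -> float:
    # Hypothetical coefficients: attribute 0 behaves like a cost (negative effect).
    return -0.8 * x[0] + 0.3 * x[1]

X = [[1.0, 2.0], [0.5, 1.0], [2.0, 0.0]]
base, substitution, elasticity = arc_elasticities(X, linear_utility, [1.0, -1.0], p=0, rel_change=0.1)
```

Because both share vectors sum to one, the share changes sum to zero. A 10% cost increase here moves probability mass from the top category toward the bottom one.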
5. Interpretability and Modeling Heterogeneity
The interpretability of Ordinal-ResLogit arises from its decomposition:
- $\boldsymbol{\beta}$ parameters mirror those in Ordered Logit models: in the linear component, a unit increase in $x_p$ shifts the latent utility, and therefore every threshold logit, by $\beta_p$.
- Residual weights $W^{(m)}$ represent non-additive and unobserved heterogeneity. They allow the model to flexibly recover alternative-specific effects and to break the parallel regression assumption.
- Threshold biases $b_k$ correspond to learned cutpoints separating adjacent ordinal categories and are always constrained to maintain ordinal consistency.
This structure enables extraction of economically meaningful behavioral derivatives, identification of dominant attributes, and statistically valid inference on the contribution of observed and latent factors.
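The $\boldsymbol{\beta}$ interpretation can be checked by raising one attribute by a unit and measuring how far each threshold logit moves. The sketch assumes a scalar latent utility and sigmoid residual blocks, and the names and numbers are hypothetical. Under that assumption, the residual blocks change the size of the shift but not its uniformity across thresholds. The sketch therefore does not attempt to reproduce the non-parallel effects the text attributes to the residual weights.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def latent_utility(x: list[float], beta: list[float], residual_weights: list[float]) -> float:
    """Linear utility followed by scalar residual blocks h <- h + sigmoid(w * h)."""
    h = sum(b * xj for b, xj in zip(beta, x))
    for w in residual_weights:
        h = h + sigmoid(w * h)
    return h

def logit_shifts(x, beta, residual_weights, biases, p):
    """Change in each threshold logit z_k = U + b_k when attribute x_p rises by one unit."""
    x_up = list(x)
    x_up[p] += 1.0
    u0 = latent_utility(x, beta, residual_weights)
    u1 = latent_utility(x_up, beta, residual_weights)
    return [(u1 + b) - (u0 + b) for b in biases]

beta = [0.7, -0.4]
x = [1.0, 3.0]
biases = [1.0, 0.0, -1.0]
linear = logit_shifts(x, beta, [], biases, p=0)           # M = 0: every logit moves by beta_0
deep = logit_shifts(x, beta, [1.2, -0.5], biases, p=0)    # residual blocks alter the marginal effect
```

In the $M = 0$ case, the shift equals $\beta_0 = 0.7$ at every threshold, which is the Ordered Logit reading of the coefficient.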
6. Training Algorithm
Training follows a minibatch stochastic optimization procedure, typically using RMSprop or Adam. Monotonicity constraints on the biases $b_k$ are enforced via projection after each gradient update. Early stopping is performed based on validation error.
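The training loop below runs Adam on minibatches, projects the bias increments onto $\Delta_k \ge 0$ after each update, and stops early when validation performance stops improving. It is a self-contained plain-Python sketch with several assumptions:
- Central finite differences replace backpropagation for brevity.
- Validation loss stands in for validation error.
- The data are synthetic.
- All sizes, hyperparameters, and function names are hypothetical.

```python
import math
import random

P, M, K = 2, 1, 3   # hypothetical sizes: features, residual blocks, ordinal categories

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_sigmoid(z):
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def init_theta():
    # Layout: [beta (P), residual weights (M), b_1, Delta_1..Delta_{K-2}]
    return [0.0] * P + [0.1] * M + [0.0] + [1.0] * (K - 2)

def unpack(theta):
    beta, weights = theta[:P], theta[P:P + M]
    biases = [theta[P + M]]
    for delta in theta[P + M + 1:]:
        biases.append(biases[-1] - delta)          # b_{k+1} = b_k - Delta_k
    return beta, weights, biases

def mean_loss(theta, data):
    """Average CORAL binary cross-entropy; labels are coded 0..K-1."""
    beta, weights, biases = unpack(theta)
    total = 0.0
    for x, y in data:
        h = sum(b * xj for b, xj in zip(beta, x))
        for w in weights:
            h += sigmoid(w * h)                    # residual block with skip connection
        for k, b in enumerate(biases):
            z = h + b
            total -= log_sigmoid(z) if y > k else log_sigmoid(-z)
    return total / len(data)

def numeric_grad(theta, batch, eps=1e-5):
    """Central finite differences (a stand-in for backpropagation)."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((mean_loss(up, batch) - mean_loss(down, batch)) / (2 * eps))
    return grad

def train(train_data, val_data, epochs=50, batch_size=16, lr=0.05, patience=5, seed=0):
    rng = random.Random(seed)
    theta = init_theta()
    m1, m2, step = [0.0] * len(theta), [0.0] * len(theta), 0
    best_theta, best_val, stale = list(theta), mean_loss(theta, val_data), 0
    for _ in range(epochs):
        data = list(train_data)
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            grad = numeric_grad(theta, data[start:start + batch_size])
            step += 1
            for j, g in enumerate(grad):           # Adam update
                m1[j] = 0.9 * m1[j] + 0.1 * g
                m2[j] = 0.999 * m2[j] + 0.001 * g * g
                m_hat = m1[j] / (1 - 0.9 ** step)
                v_hat = m2[j] / (1 - 0.999 ** step)
                theta[j] -= lr * m_hat / (math.sqrt(v_hat) + 1e-8)
            for j in range(P + M + 1, len(theta)): # projection onto Delta_k >= 0
                theta[j] = max(0.0, theta[j])
        val = mean_loss(theta, val_data)
        if val < best_val:                         # early stopping on validation loss
            best_theta, best_val, stale = list(theta), val, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_theta, best_val

def make_data(n, rng):
    """Synthetic ordinal data generated from a noisy latent linear index."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        latent = 1.5 * x[0] - 1.0 * x[1] + rng.gauss(0, 0.5)
        data.append((x, 0 if latent < -0.7 else 1 if latent < 0.7 else 2))
    return data

rng = random.Random(42)
train_data, val_data = make_data(200, rng), make_data(100, rng)
initial_val = mean_loss(init_theta(), val_data)
theta, best_val = train(train_data, val_data)
beta, weights, biases = unpack(theta)
print(f"validation loss {initial_val:.3f} -> {best_val:.3f}; biases {biases}")
```

Projecting the increments, rather than the biases themselves, keeps the ordering constraint a simple per-coordinate clip.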
7. Empirical Performance and Practical Considerations
Ordinal-ResLogit demonstrates strong empirical performance, particularly on large-scale revealed preference data. There it reaches a validation accuracy of 81.9% vs 56.0% for standard Ordered Logit, with a substantially higher log-likelihood and lower AIC. On stated preference data, predictive improvements are more modest, but the model captures stronger, economically interpretable attribute effects.
The architecture supports the recovery of substitution effects and elasticities. The extracted attributes align with substantive knowledge in transport choice modeling, for example travel costs for revealed preference data and traffic conditions for stated preference data. Residual blocks enable the model to fit unobserved heterogeneity and complex alternative-specific effects that are unattainable with classic Ordered Logit, while maintaining statistical consistency and interpretability.