Ordinal-ResLogit: Interpretable Ordered Choice Model

Updated 14 May 2026

Ordinal-ResLogit is an interpretable deep residual neural network framework for ordered choice modeling that integrates CORAL to enforce ordinal consistency.
It uses residual blocks with skip connections to embed traditional ordered logit structure, enabling recovery of interpretable coefficients and latent heterogeneity.
The model yields economically meaningful metrics, accurately capturing substitution patterns and elasticities while outperforming standard ordered choice models.

The Ordinal-ResLogit model is an interpretable deep residual neural network framework for ordered choice modeling, designed to capture ordinal responses while maintaining interpretability and statistical structure. It achieves this by embedding deep residual learning within the COnsistent RAnk Logits (CORAL) architecture, creating a binary classification-based ordinal regression model that guarantees consistency across ordered thresholds. Unlike standard neural approaches, Ordinal-ResLogit is specifically constructed to address the “black-box” limitation in deep learning, enabling the recovery of interpretable coefficients, latent heterogeneity, threshold parameters, and economically meaningful metrics such as elasticities and substitution patterns (Kamal et al., 2022).

1. Model Structure and Mathematical Formulation

Consider a dataset of $N$ i.i.d. samples $\{(\mathbf{x}_n, y_n)\}$, where $\mathbf{x}_n \in \mathbb{R}^d$ are features and $y_n \in \{1,2,\dots,K\}$ denote ordinal labels. The Ordinal-ResLogit model introduces latent utilities

$\mathbf{u}_n = (u_{n,1}, \ldots, u_{n,K}) \in \mathbb{R}^K.$

In the CORAL framework, the probability that $y_n$ exceeds threshold $k$ is estimated as

$P(y_n > k \mid \mathbf{x}_n) = \sigma(h_k(\mathbf{x}_n)) = \frac{1}{1 + \exp(-h_k(\mathbf{x}_n))}, \quad k=1,\ldots,K-1,$

with monotonicity enforced via nonincreasing biases $b_1 \geq b_2 \geq \dots \geq b_{K-1}$.
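
The threshold probabilities above can be turned into per-category probabilities by differencing adjacent thresholds. The following is a minimal sketch, assuming (as in CORAL) that every threshold logit is a shared scalar score plus its own bias; the function names, the score, and the bias values are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def exceedance_probs(score, biases):
    """P(y > k | x) for k = 1..K-1, assuming logits of the form score + b_k."""
    return [sigmoid(score + b) for b in biases]

def category_probs(exceed):
    """P(y = k) = P(y > k-1) - P(y > k), using P(y > 0) = 1 and P(y > K) = 0."""
    padded = [1.0] + list(exceed) + [0.0]
    return [padded[k] - padded[k + 1] for k in range(len(padded) - 1)]

biases = [1.0, 0.0, -1.5]               # hypothetical nonincreasing biases, K = 4
exceed = exceedance_probs(0.3, biases)  # hypothetical shared score of 0.3
probs = category_probs(exceed)
```

Because the biases are nonincreasing and the score is shared, the exceedance probabilities are nonincreasing in $k$, so every differenced category probability is non-negative and the probabilities sum to one.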

The ResLogit component augments the linear utility $V(\mathbf{x}_n) = \boldsymbol{\beta}^\top \mathbf{x}_n \in \mathbb{R}^K$ with $M$ residual blocks. The computation proceeds as:

Set $\mathbf{h}^{(0)} = V(\mathbf{x}_n)$.
For $m = 1, \ldots, M$:

$\mathbf{h}^{(m)} = \mathbf{h}^{(m-1)} + \sigma\big(\mathbf{W}^{(m)} \mathbf{h}^{(m-1)}\big),$

where $\mathbf{W}^{(m)} \in \mathbb{R}^{K \times K}$ is the residual-layer weight, and $\sigma(\cdot)$ is applied elementwise.

The final latent utilities are $\mathbf{u}_n = \mathbf{h}^{(M)}$, and the threshold logits are $h_k(\mathbf{x}_n)$, $k = 1, \ldots, K-1$, obtained by adding the threshold bias $b_k$ to the latent utility.
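
The residual recursion can be sketched in a few lines. This is an illustrative sketch only, assuming each block adds an elementwise sigmoid of a linear transformation to its input (matching the description of a linear transformation, sigmoid activation, and skip connection); `residual_forward`, `matvec`, and the numeric values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, h):
    """Multiply a K x K matrix (list of rows) by a length-K vector."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def residual_forward(v, weights):
    """Apply h <- h + sigmoid(W h) once per weight matrix, starting from h = v."""
    h = list(v)
    for W in weights:
        z = matvec(W, h)
        h = [h_i + sigmoid(z_i) for h_i, z_i in zip(h, z)]  # skip connection
    return h

v = [0.5, -0.2, 0.1]                       # hypothetical linear utilities, K = 3
zero_W = [[0.0] * 3 for _ in range(3)]     # assumed all-zero residual weights
u = residual_forward(v, [zero_W, zero_W])  # two residual blocks
```

With an empty list of blocks the function returns the linear utilities unchanged, which is the sense in which the linear ordered-logit structure sits inside the deeper model.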

2. Loss Function and Optimization

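The Ordinal-ResLogit model is trained by minimizing the sum of binary cross-entropy losses across the $K-1$ thresholds:

$\mathcal{L}(\Theta) = -\sum_{n=1}^{N} \sum_{k=1}^{K-1} \big[ t_{n,k} \log p_{n,k} + (1 - t_{n,k}) \log(1 - p_{n,k}) \big],$

where $t_{n,k} = \mathbb{1}[y_n > k]$ and $p_{n,k} = \sigma(h_k(\mathbf{x}_n))$. The parameter set is $\Theta = \{\boldsymbol{\beta}, \mathbf{W}^{(1)}, \ldots, \mathbf{W}^{(M)}, b_1, \ldots, b_{K-1}\}$, that is, the utility coefficients, the residual-layer weights, and the threshold biases.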
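
A sum of threshold-wise binary cross-entropies can be computed directly from the threshold logits. The sketch below assumes binary targets equal to 1 when the label exceeds threshold $k$ and 0 otherwise, and uses the standard softplus identity for numerical stability; the function names and sample values are hypothetical.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def threshold_bce_loss(logits, labels, num_classes):
    """Sum of binary cross-entropies over samples and the K-1 thresholds.

    logits[n][k-1] is the logit of threshold k for sample n; labels[n] is in 1..K.
    """
    total = 0.0
    for h_n, y_n in zip(logits, labels):
        for k in range(1, num_classes):
            t = 1.0 if y_n > k else 0.0
            h = h_n[k - 1]
            # -[t*log(s(h)) + (1-t)*log(1-s(h))] simplifies to softplus(h) - t*h.
            total += softplus(h) - t * h
    return total

logits = [[5.0, 5.0, -5.0]]  # hypothetical logits for one sample, K = 4
loss_matching = threshold_bce_loss(logits, [3], num_classes=4)
loss_mismatch = threshold_bce_loss(logits, [1], num_classes=4)
```

The logits here favour category 3 (the first two thresholds exceeded, the third not), so the loss is small for a label of 3 and large for a label of 1.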

Residual blocks include skip connections, which ensure that the gradient with respect to the input $\mathbf{h}^{(m-1)}$ of each block is nonzero for all $m = 1, \ldots, M$, mitigating the vanishing-gradient problem and helping to keep optimization well conditioned.
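
To make the role of the skip connection concrete, here is a short derivation under the assumption that a block has the form $\mathbf{h}^{(m)} = \mathbf{h}^{(m-1)} + \sigma(\mathbf{W}^{(m)} \mathbf{h}^{(m-1)})$ (a linear transformation, an elementwise sigmoid, and a skip connection; the symbols are illustrative notation). Differentiating with respect to the block input gives

$\frac{\partial \mathbf{h}^{(m)}}{\partial \mathbf{h}^{(m-1)}} = \mathbf{I} + \mathrm{diag}\big(\sigma'(\mathbf{z}^{(m)})\big)\,\mathbf{W}^{(m)}, \quad \mathbf{z}^{(m)} = \mathbf{W}^{(m)} \mathbf{h}^{(m-1)}, \quad \sigma'(z) = \sigma(z)\,(1 - \sigma(z)).$

The identity term $\mathbf{I}$ passes the upstream gradient through unchanged even where $\sigma'$ is close to zero, so a direct gradient path to the linear utility layer remains open regardless of depth.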

3. Neural Network Architecture and Monotonicity Constraints

The architecture consists of:

Input layer: $\mathbf{x}_n \in \mathbb{R}^d$.
Linear deterministic layer: $V(\mathbf{x}_n) = \boldsymbol{\beta}^\top \mathbf{x}_n$.
$M$ residual blocks: Each block applies a linear transformation ($\mathbf{W}^{(m)}$), sigmoid activation, and skip connection.
Final CORAL layer: Produces $\mathbf{u}_n$, which is thresholded via $K-1$ logits $h_k(\mathbf{x}_n)$.

To ensure monotonicity, biases are parameterized as $b_k = b_{k-1} - \delta_k$ with $\delta_k \geq 0$ for $k = 2, \ldots, K-1$, thereby enforcing $b_1 \geq b_2 \geq \dots \geq b_{K-1}$ by construction. This guarantees that the predicted exceedance probabilities $P(y_n > k \mid \mathbf{x}_n)$ are non-increasing in $k$, as required for ordinal data.
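
One way to obtain a non-increasing bias sequence by construction is to subtract non-negative decrements from a free first bias. The sketch below is an assumption-laden illustration: it squares unconstrained values to make the decrements non-negative, which is just one possible choice and is not taken from the source; `monotone_biases` and the inputs are hypothetical.

```python
def monotone_biases(b1, raw_steps):
    """Build b_1 >= b_2 >= ... by subtracting non-negative decrements.

    Each decrement is raw**2 (an assumed choice), which is non-negative
    for any real raw value, so the sequence can never increase.
    """
    biases = [b1]
    for raw in raw_steps:
        biases.append(biases[-1] - raw * raw)
    return biases

biases = monotone_biases(1.0, [0.5, -1.0, 0.0])  # hypothetical unconstrained values
```

With this kind of parameterization an optimizer can update the unconstrained values freely, and the ordering of the biases holds after every step without a separate correction.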

4. Behavioral Derivatives and Choice Model Metrics

Ordinal-ResLogit enables closed-form computation of market shares, substitution patterns, and elasticities:

Market share for category $k$:

$S_k = \frac{1}{N} \sum_{n=1}^{N} P(y_n = k \mid \mathbf{x}_n)$

Substitution patterns: Effects of a covariate shift $\Delta x_j$ in $x_j$ are captured by comparing the resulting changes $\Delta S_k$ for all $k = 1, \ldots, K$.
Arc elasticities:

$E_{k,j} = \frac{\Delta S_k / S_k}{\Delta x_j / x_j}$

where $\Delta S_k$ is the change in market share for category $k$, given a relative change in $x_j$.
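
As a worked illustration of these metrics, the sketch below averages per-sample category probabilities into market shares and then forms arc elasticities relative to the base shares. It assumes the base-point form (relative change in share divided by relative change in the covariate); the probability values are made up for the example and the function names are hypothetical.

```python
def market_shares(prob_rows):
    """Average per-sample category probabilities into one share per category."""
    n = len(prob_rows)
    num_classes = len(prob_rows[0])
    return [sum(row[k] for row in prob_rows) / n for k in range(num_classes)]

def arc_elasticities(shares_before, shares_after, relative_change):
    """(change in share / base share) / (relative change in the covariate)."""
    return [((after - before) / before) / relative_change
            for before, after in zip(shares_before, shares_after)]

# Hypothetical predicted probabilities for two samples and K = 3 categories,
# before and after a 10% increase in one covariate.
before = market_shares([[0.2, 0.5, 0.3], [0.4, 0.4, 0.2]])
after = market_shares([[0.1, 0.5, 0.4], [0.3, 0.4, 0.3]])
elasticities = arc_elasticities(before, after, relative_change=0.10)
```

In this made-up case the share lost by the first category is gained by the third while the middle category is unchanged, which is the kind of substitution pattern the comparison of shares is meant to reveal.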

5. Interpretability and Modeling Heterogeneity

The interpretability of Ordinal-ResLogit arises from its decomposition:

$\boldsymbol{\beta}$ parameters mirror those in Ordered Logit models: a unit increase in $x_j$ shifts all thresholds by $\beta_j$.
Residual weights $\mathbf{W}^{(m)}$ represent non-additive and unobserved heterogeneity, allowing the model to flexibly recover alternative-specific effects and break the parallel regression assumption.
Threshold biases $b_k$ correspond to learned cutpoints separating adjacent ordinal categories, always constrained to maintain ordinal consistency.

This structure enables extraction of economically meaningful behavioral derivatives, identification of dominant attributes, and statistically valid inference on the contribution of observed and latent factors.

6. Training Algorithm

Training follows a minibatch stochastic optimization procedure, typically using RMSprop or Adam. Monotonicity constraints on the threshold biases $b_k$ are enforced via projection after each gradient update. Early stopping is performed based on validation error (Kamal et al., 2022).
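
A projection step of this kind can be sketched as follows. This is not the source's algorithm: it assumes a plain gradient step on the biases followed by a Euclidean projection onto non-increasing sequences, implemented with the pool-adjacent-violators procedure; the function names, gradients, and learning rate are hypothetical.

```python
def project_nonincreasing(values):
    """Euclidean projection onto {b : b_1 >= b_2 >= ...} via pool-adjacent-violators."""
    blocks = []  # each block is [sum, count]; block means must be non-increasing
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean is below the newest block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)  # every member of a block takes the block mean
    return out

def step_with_projection(biases, grads, lr):
    """One plain gradient step on the biases, then restore the ordering."""
    stepped = [b - lr * g for b, g in zip(biases, grads)]
    return project_nonincreasing(stepped)

# Hypothetical step in which the second bias overshoots the first.
updated = step_with_projection([1.0, 0.0, -1.0], [0.0, -20.0, 0.0], lr=0.1)
```

Here the raw update produces the sequence 1.0, 2.0, -1.0, which violates the ordering; the projection replaces the first two entries with their mean and leaves the third untouched.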

7. Empirical Performance and Practical Considerations

Ordinal-ResLogit demonstrates strong empirical performance, particularly for large-scale revealed preference data, where validation accuracy reaches 81.9% versus 56.0% for standard Ordered Logit, with substantially better log-likelihood and AIC. On stated preference data, predictive improvements are more modest, but the model captures stronger, economically interpretable attribute effects.

The architecture ensures the recovery of substitution effects and elasticities, and the extracted attributes (such as travel costs for revealed preference data, or traffic conditions for stated preference data) align with substantive knowledge in transport choice modeling. Residual blocks enable the model to fit unobserved heterogeneity and complex alternative-specific effects that are unattainable with classic Ordered Logit, while maintaining statistical consistency and interpretability.

(Kamal et al., 2022)

References

Kamal et al. (2022). Ordinal-ResLogit: Interpretable Deep Residual Neural Networks for Ordered Choices.
