
Ordinal-ResLogit: Interpretable Ordered Choice Model

Updated 14 May 2026
  • Ordinal-ResLogit is an interpretable deep residual neural network framework for ordered choice modeling that integrates CORAL to enforce ordinal consistency.
  • It uses residual blocks with skip connections to embed traditional ordered logit structure, enabling recovery of interpretable coefficients and latent heterogeneity.
  • The model yields economically meaningful metrics, accurately capturing substitution patterns and elasticities while outperforming standard ordered choice models.

The Ordinal-ResLogit model is an interpretable deep residual neural network framework for ordered choice modeling, designed to capture ordinal responses while maintaining interpretability and statistical structure. It achieves this by embedding deep residual learning within the COnsistent RAnk Logits (CORAL) architecture, creating a binary classification-based ordinal regression model that guarantees consistency across ordered thresholds. Unlike standard neural approaches, Ordinal-ResLogit is specifically constructed to address the “black-box” limitation in deep learning, enabling the recovery of interpretable coefficients, latent heterogeneity, threshold parameters, and economically meaningful metrics such as elasticities and substitution patterns (Kamal et al., 2022).

1. Model Structure and Mathematical Formulation

Consider a dataset of $N$ i.i.d. samples $\{(\mathbf{x}_n, y_n)\}$, where $\mathbf{x}_n \in \mathbb{R}^d$ are features and $y_n \in \{1,2,\dots,K\}$ denote ordinal labels. The Ordinal-ResLogit model introduces latent utilities

$$\mathbf{u}_n = (u_{n,1}, \ldots, u_{n,K}) \in \mathbb{R}^K.$$

In the CORAL framework, the probability that $y_n$ exceeds threshold $k$ is estimated as

$$P(y_n > k \mid \mathbf{x}_n) = \sigma(h_k(\mathbf{x}_n)) = \frac{1}{1 + \exp(-h_k(\mathbf{x}_n))}, \quad k=1,\ldots,K-1,$$

with monotonicity enforced via nonincreasing biases $b_1 \geq b_2 \geq \dots \geq b_{K-1}$.
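The link between exceedance probabilities and category probabilities can be sketched in a few lines. This is a minimal illustration, assuming the standard CORAL form in which each threshold logit is a shared scalar score plus a threshold-specific bias; the function names, the score `0.3`, and the bias values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def exceedance_probs(score, biases):
    """P(y > k | x) for k = 1..K-1, assuming logit h_k = score + b_k."""
    return [sigmoid(score + b) for b in biases]

def category_probs(score, biases):
    """P(y = k | x) for k = 1..K as differences of adjacent exceedance probabilities."""
    # Pad with P(y > 0) = 1 and P(y > K) = 0 so every category is a difference.
    exceed = [1.0] + exceedance_probs(score, biases) + [0.0]
    return [exceed[k] - exceed[k + 1] for k in range(len(exceed) - 1)]

biases = [1.0, 0.0, -1.5]   # hypothetical non-increasing biases, so K = 4
probs = category_probs(0.3, biases)
```

Because the biases are non-increasing, the padded exceedance sequence is non-increasing too, so every category probability is nonnegative and the probabilities sum to one.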

The ResLogit component augments the linear utility $V(\mathbf{x}_n) = \boldsymbol{\beta}^\top \mathbf{x}_n \in \mathbb{R}^K$ with $M$ residual blocks. The computation proceeds as:

  • Set $\mathbf{u}_n^{(0)} = V(\mathbf{x}_n)$.
  • For $m = 1, \ldots, M$:

$$\mathbf{u}_n^{(m)} = \mathbf{u}_n^{(m-1)} + \sigma\big(\mathbf{W}^{(m)} \mathbf{u}_n^{(m-1)}\big),$$

where $\mathbf{W}^{(m)} \in \mathbb{R}^{K \times K}$ is the residual-layer weight, and $\sigma(\cdot)$ is applied elementwise.

The final latent utilities are $\mathbf{u}_n = \mathbf{u}_n^{(M)}$, and the threshold logits are $h_k(\mathbf{x}_n)$, $k = 1, \ldots, K-1$, each formed by adding the threshold-specific bias $b_k$ to a score computed from $\mathbf{u}_n$ that is shared across thresholds.
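The forward pass through the linear utility and the residual blocks might look like the following sketch. It assumes each block adds an elementwise sigmoid of a linear transformation to its input, which is one reading of the block description rather than a confirmed formula; all function names and numeric values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def linear_utility(beta, x):
    """V(x) = beta^T x, with beta stored as d rows of K coefficients."""
    K = len(beta[0])
    return [sum(beta[i][k] * x[i] for i in range(len(x))) for k in range(K)]

def residual_forward(beta, weights, x):
    """Start from V(x), then apply each block as u <- u + sigmoid(W u) (assumed form)."""
    u = linear_utility(beta, x)
    for W in weights:
        u = [u_k + sigmoid(a_k) for u_k, a_k in zip(u, matvec(W, u))]
    return u

beta = [[0.5, -0.2, 0.1], [0.3, 0.0, -0.4]]   # hypothetical d = 2, K = 3
x = [1.0, 2.0]
zero_block = [[0.0] * 3 for _ in range(3)]     # W = 0, so the block adds sigmoid(0) = 0.5
u = residual_forward(beta, [zero_block], x)
```

With an empty list of blocks the function returns the plain linear utility, which makes the ordered-logit baseline a special case of the same code path.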

2. Loss Function and Optimization

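The Ordinal-ResLogit model is trained by minimizing the sum of binary cross-entropy losses across the $K-1$ thresholds:

$$\mathcal{L}(\boldsymbol{\theta}) = -\sum_{n=1}^{N} \sum_{k=1}^{K-1} \Big[ t_{n,k} \log p_{n,k} + (1 - t_{n,k}) \log(1 - p_{n,k}) \Big],$$

where $t_{n,k} = \mathbb{1}[y_n > k]$ and $p_{n,k} = \sigma(h_k(\mathbf{x}_n))$. The parameter set is $\boldsymbol{\theta}$, comprising the coefficients $\boldsymbol{\beta}$, the residual-layer weights, and the threshold biases $b_1, \ldots, b_{K-1}$.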
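A minimal sketch of this threshold-wise loss is given below. It takes precomputed threshold logits as input and uses the identity that the binary cross-entropy of a logit $h$ with target $t$ equals $\mathrm{softplus}(h) - t\,h$, which avoids taking the logarithm of a probability that has rounded to zero. The function names and the input layout are assumptions made for illustration.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def coral_loss(logits, labels):
    """Sum of binary cross-entropies over samples and the K-1 thresholds.

    logits[n][k-1] holds the threshold logit h_k(x_n); labels[n] is in 1..K.
    """
    total = 0.0
    for h_n, y_n in zip(logits, labels):
        for k, h in enumerate(h_n, start=1):
            t = 1.0 if y_n > k else 0.0      # binary target: does y_n exceed threshold k?
            total += softplus(h) - t * h     # equals -[t log p + (1 - t) log(1 - p)]
    return total

# Two samples, K = 3 (two thresholds), all logits zero: each term is log 2.
uninformed = coral_loss([[0.0, 0.0], [0.0, 0.0]], [1, 3])
```

A label of 2 with $K = 3$ yields targets (1, 0), so logits such as (3, -3) give a small loss while (-3, 3) give a large one.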

Residual blocks include skip connections, which add an identity term to each block's Jacobian, so the gradient reaching earlier blocks does not vanish even when the sigmoid saturates. This mitigates the vanishing-gradient problem and helps keep optimization stable as blocks are stacked.
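The effect of the skip connection on the gradient can be made explicit with a short derivation. It assumes, as a reading of the block description rather than a confirmed formula, that a block maps its input $\mathbf{u}$ to $\mathbf{u} + \sigma(\mathbf{W}\mathbf{u})$:

```latex
\frac{\partial}{\partial \mathbf{u}} \Big[ \mathbf{u} + \sigma(\mathbf{W}\mathbf{u}) \Big]
  = \mathbf{I} + \operatorname{diag}\!\big( \sigma'(\mathbf{W}\mathbf{u}) \big)\, \mathbf{W},
\qquad
\sigma'(z) = \sigma(z)\,\big(1 - \sigma(z)\big) \in \big(0, \tfrac{1}{4}\big].
```

When the sigmoid saturates, $\sigma'$ tends to zero and the Jacobian approaches $\mathbf{I}$, so the backpropagated gradient passes through the block largely unchanged instead of shrinking toward zero.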

3. Neural Network Architecture and Monotonicity Constraints

The architecture consists of:

  • Input layer: $\mathbf{x}_n \in \mathbb{R}^d$.
  • Linear deterministic layer: $V(\mathbf{x}_n) = \boldsymbol{\beta}^\top \mathbf{x}_n$.
  • $M$ residual blocks: Each block applies a linear transformation (weight matrix $\mathbf{W}^{(m)}$), sigmoid activation, and skip connection.
  • Final CORAL layer: Produces the latent utilities $\mathbf{u}_n$, which are thresholded via $K-1$ logits $h_k(\mathbf{x}_n)$, $k = 1, \ldots, K-1$.

To ensure monotonicity, biases are parameterized as $b_k = b_{k-1} - \delta_k$ with $\delta_k \geq 0$, thereby strictly enforcing $b_1 \geq b_2 \geq \dots \geq b_{K-1}$. This guarantees that the predicted exceedance probabilities $P(y_n > k \mid \mathbf{x}_n)$ are non-increasing in $k$, as ordinal data require, so every implied category probability is nonnegative.
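One way to realize such a parameterization is to store the first bias and a list of nonnegative decrements, then accumulate them. The sketch below is an illustration under that assumption; the function name and the example values are hypothetical.

```python
def monotone_biases(b1, deltas):
    """Build b_1 >= b_2 >= ... from a free first bias and nonnegative decrements."""
    if any(d < 0 for d in deltas):
        raise ValueError("decrements must be nonnegative")
    biases = [b1]
    for d in deltas:
        biases.append(biases[-1] - d)   # b_k = b_{k-1} - delta_k
    return biases

biases = monotone_biases(1.0, [0.5, 0.0, 2.0])   # -> [1.0, 0.5, 0.5, -1.5]
```

A zero decrement yields two equal biases, which collapses the category between them to zero probability; the ordering constraint is still satisfied.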

4. Behavioral Derivatives and Choice Model Metrics

Ordinal-ResLogit enables closed-form computation of market shares, substitution patterns, and elasticities:

  • Market share for category $k$:

$$S_k = \frac{1}{N} \sum_{n=1}^{N} P(y_n = k \mid \mathbf{x}_n)$$

  • Substitution patterns: Effects of a covariate shift $\Delta x_j$ in $x_j$ are captured by comparing $S_k$ before and after the shift for all $k$.
  • Arc elasticities:

$$E_{k,j} = \frac{\Delta S_k / S_k}{\Delta x_j / x_j}$$

where $\Delta S_k$ is the change in market share for category $k$, given a relative change in $x_j$.
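These aggregate metrics can be computed directly from per-individual category probabilities. The sketch below assumes the probabilities have already been predicted under a baseline and a shifted scenario; the function names and all numbers are hypothetical, and the elasticity uses the baseline values as the reference point.

```python
def market_shares(prob_rows):
    """Average P(y_n = k | x_n) over individuals, one share per category."""
    n = len(prob_rows)
    K = len(prob_rows[0])
    return [sum(row[k] for row in prob_rows) / n for k in range(K)]

def arc_elasticity(share_before, share_after, x_before, x_after):
    """Relative change in share divided by relative change in the attribute."""
    rel_share = (share_after - share_before) / share_before
    rel_x = (x_after - x_before) / x_before
    return rel_share / rel_x

# Hypothetical predicted probabilities for two individuals and K = 2 categories.
baseline = market_shares([[0.2, 0.8], [0.4, 0.6]])   # shares close to [0.3, 0.7]
shifted = market_shares([[0.25, 0.75], [0.41, 0.59]])

# Share of category 1 rises from about 0.30 to about 0.33 as x_j goes from 10 to 11.
e = arc_elasticity(baseline[0], shifted[0], 10.0, 11.0)
```

Comparing `baseline` and `shifted` category by category gives the substitution pattern. A common variant of the arc elasticity divides by the midpoint of the before and after values instead of the baseline.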

5. Interpretability and Modeling Heterogeneity

The interpretability of Ordinal-ResLogit arises from its decomposition:

  • $\boldsymbol{\beta}$ parameters mirror those in Ordered Logit models: a unit increase in $x_j$ shifts all threshold logits by $\beta_j$.
  • Residual weights $\mathbf{W}^{(m)}$ represent non-additive and unobserved heterogeneity, allowing the model to flexibly recover alternative-specific effects and break the parallel regression assumption.
  • Threshold biases $b_k$ correspond to learned cutpoints separating adjacent ordinal categories, always constrained to maintain ordinal consistency.

This structure enables extraction of economically meaningful behavioral derivatives, identification of dominant attributes, and statistically valid inference on the contribution of observed and latent factors.
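The Ordered Logit reading of a coefficient can be checked numerically. The sketch below looks only at the linear part (no residual blocks) and assumes each threshold logit is a shared score plus a bias: raising the score by a coefficient multiplies the odds of exceeding every threshold by the same factor, the exponential of that coefficient. All values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

beta_j = 0.4                 # hypothetical coefficient on attribute x_j
score = -0.7                 # hypothetical shared score before the unit increase
biases = [1.0, 0.0, -1.5]    # hypothetical non-increasing threshold biases

ratios = []
for b in biases:
    before = odds(sigmoid(score + b))
    after = odds(sigmoid(score + beta_j + b))   # unit increase in x_j adds beta_j
    ratios.append(after / before)

expected = math.exp(beta_j)   # every entry of ratios matches this up to rounding
```

Once residual blocks are active, the odds ratio can differ across thresholds and individuals, which is the departure from the parallel regression assumption described above.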

6. Training Algorithm

Training follows a minibatch stochastic optimization procedure, typically using RMSprop or Adam. Monotonicity constraints on the threshold biases $b_k$ are enforced via projection after each gradient update. Early stopping is performed based on validation error.
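The control flow of such a procedure can be sketched independently of the optimizer. In the sketch below, `step_fn` stands in for one epoch of minibatch updates (RMSprop or Adam in the text) and `val_error_fn` for the validation metric; both are hypothetical callables, as are the `"deltas"` parameter key, the nonnegative-decrement representation of the biases, and the `patience` setting.

```python
import math

def project_nonnegative(deltas):
    """Project bias decrements onto the feasible set delta >= 0."""
    return [max(0.0, d) for d in deltas]

def train(step_fn, val_error_fn, params, max_epochs=100, patience=5):
    """Alternate updates and projection; stop when validation error stalls."""
    best_err, best_params, wait = math.inf, dict(params), 0
    for _ in range(max_epochs):
        params = step_fn(params)                       # gradient updates (placeholder)
        params = {**params, "deltas": project_nonnegative(params["deltas"])}
        err = val_error_fn(params)
        if err < best_err:
            best_err, best_params, wait = err, dict(params), 0
        else:
            wait += 1
            if wait >= patience:                       # early stopping
                break
    return best_params, best_err

# Toy stand-ins: each "epoch" pushes the decrements down by 0.3.
def toy_step(p):
    return {**p, "deltas": [d - 0.3 for d in p["deltas"]]}

def toy_val_error(p):
    return sum(p["deltas"])

best, err = train(toy_step, toy_val_error, {"b1": 0.0, "deltas": [0.5, 0.1]})
```

In the toy run the unprojected decrements would become negative after the second epoch; the projection clips them to zero, so the ordering of the biases is never violated.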

(Kamal et al., 2022)

7. Empirical Performance and Practical Considerations

Ordinal-ResLogit demonstrates strong empirical performance, particularly for large-scale revealed preference data (e.g., validation accuracy 81.9% vs 56.0% for standard Ordered Logit; substantially superior log-likelihood and AIC). On stated preference datasets, predictive improvements are more modest, but the model captures stronger, economically interpretable attribute effects.

The architecture ensures the recovery of substitution effects and elasticities, and the extracted attributes (such as travel costs for revealed preference data, or traffic conditions for stated preference data) align with substantive knowledge in transport choice modeling. Residual blocks enable the model to fit unobserved heterogeneity and complex alternative-specific effects that are unattainable with classic Ordered Logit, while maintaining statistical consistency and interpretability.

(Kamal et al., 2022)
