XNNTab: Interpretable Neural Architecture

Updated 22 December 2025
  • XNNTab is a neural network architecture for tabular data that uniquely combines an MLP with a sparse autoencoder to decompose latent representations into distinct, interpretable features.
  • The framework employs a rule induction process using Skope-rules to assign human-readable semantics to each latent feature, enhancing both local and global model transparency.
  • Empirical evaluations demonstrate that XNNTab delivers competitive predictive performance on benchmarks like Adult and CHURN, while ensuring clear and auditable model explanations.

XNNTab is a neural network architecture for tabular data that addresses the need for intrinsic interpretability while retaining predictive performance competitive with state-of-the-art black-box models. XNNTab augments a standard feed-forward multilayer perceptron (MLP) with a sparse autoencoder (SAE) that decomposes the MLP's latent representations into monosemantic features, each aligned with a simple, human-interpretable concept. This construction enables model predictions to be expressed as explicit linear combinations of semantically meaningful components, situating XNNTab as a transparent alternative to conventional neural and classical machine learning models for tabular data domains where interpretability is critical (Elhadri et al., 10 Sep 2025, Elhadri et al., 15 Dec 2025).

1. Architectural Components

The XNNTab framework comprises three principal modules: an MLP backbone, a sparse autoencoder, and a linear decision layer. The MLP $g_\theta:\mathbb{R}^{d_0}\to\mathbb{R}^{d_\text{in}}$ maps an input $x$ to an embedding $h_\ell = g_\theta(x)$. The SAE, defined by tied weights $M\in\mathbb{R}^{d_\text{hid}\times d_\text{in}}$, encodes $h_\ell$ into sparse codes $h_\text{SAE}=\mathrm{ReLU}(Mh_\ell+b)\in\mathbb{R}^{d_\text{hid}}$ and reconstructs $\hat h_\ell = M^\top h_\text{SAE}$. The final (originally black-box) linear layer $W$ acts on either $h_\ell$ or $\hat h_\ell$; since both the SAE decoder and the classifier are linear, their weights can be merged into a single interpretable map $W' = WM^\top$ such that

$$\hat{y} = W' h_\text{SAE} + b',$$

which expresses predictions as explicitly weighted sums of monosemantic features. The typical SAE expansion is $d_\text{hid} = R\,d_\text{in}$ with $R>1$ (Elhadri et al., 15 Dec 2025).
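The weight-merging identity can be verified directly; the NumPy sketch below uses random, illustrative dimensions and parameters (a one-layer stand-in for the MLP backbone), not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
d0, d_in, R = 6, 4, 3                # input dim, embedding dim, expansion factor
d_hid = R * d_in                     # overcomplete SAE width
K = 2                                # number of classes

# Random stand-ins for the trained parameters (illustrative only).
W_mlp = rng.normal(size=(d_in, d0))  # one-layer stand-in for the MLP g_theta
M = rng.normal(size=(d_hid, d_in))   # tied SAE encoder/decoder weights
b = np.zeros(d_hid)
W = rng.normal(size=(K, d_in))       # original linear classifier

x = rng.normal(size=d0)
h = W_mlp @ x                                   # embedding h_ell = g_theta(x)
h_sae = np.maximum(M @ h + b, 0.0)              # sparse codes h_SAE
h_hat = M.T @ h_sae                             # reconstruction

W_prime = W @ M.T                               # merged map W' = W M^T
assert np.allclose(W @ h_hat, W_prime @ h_sae)  # same logits either way
```

Because the decoder and classifier are both linear, the merged map is exact, not an approximation: applying $W$ to the reconstruction is identical to applying $W'$ to the sparse codes.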

2. Training Procedure and Loss Functions

XNNTab is trained in a staged process:

  1. MLP Pretraining: a standard predictive loss (cross-entropy for classification, MSE for regression) plus $\ell_1$ regularization on the final weights:

$$L_\text{stage1}(\theta_g, W) = \frac{1}{N}\sum_{i=1}^N \left(-\sum_{k=1}^K y_{i,k} \log \hat{y}_{i,k}\right) + \lambda_W \|W\|_1.$$

  2. SAE Training: the encoder and decoder weights $(M, b)$ are optimized to minimize a combined $\ell_2$ reconstruction and $\ell_1$ sparsity cost on the codes:

$$\mathcal{L}_\text{SAE} = \frac{1}{N}\sum_{i=1}^N \|h_\ell^{(i)} - M^\top h_\text{SAE}^{(i)}\|_2^2 + \alpha \sum_{i=1}^N \|h_\text{SAE}^{(i)}\|_1.$$

  3. Decision Layer Finetuning: with the SAE and MLP fixed, $W$ is retrained on $\hat{h}_\ell$ to minimize the predictive loss again.
  4. Merging Linearity for Interpretability: final predictions reduce to

$$\hat{y}_c(x) = \sum_{j=1}^{d_\text{hid}} W'_{j,c}\, \phi_j(x) + b'_c,$$

where $\phi_j(x) = h_\text{SAE}[j]$ and each $j$ is semantically aligned (Elhadri et al., 10 Sep 2025, Elhadri et al., 15 Dec 2025).
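Stage 2 can be sketched with plain full-batch gradient descent on $\mathcal{L}_\text{SAE}$. The toy NumPy illustration below trains a tied-weight SAE on random embeddings; the dimensions, learning rate, and sparsity weight are assumptions, not values from the papers:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d_in, d_hid = 256, 4, 12      # samples, embedding dim, overcomplete SAE width
alpha, lr = 1e-4, 0.005          # sparsity weight and step size (illustrative)

H = rng.normal(size=(N, d_in))   # stands in for frozen stage-1 embeddings h_ell
M = 0.3 * rng.normal(size=(d_hid, d_in))   # tied encoder/decoder weights
b = np.zeros(d_hid)

def forward(M, b):
    Z = H @ M.T + b                       # encoder pre-activations
    S = np.maximum(Z, 0.0)                # sparse codes h_SAE = ReLU(M h + b)
    return Z, S, S @ M                    # reconstruction M^T h_SAE (row form)

mse_init = float(np.mean((forward(M, b)[2] - H) ** 2))

for _ in range(2000):                     # full-batch gradient descent on L_SAE
    Z, S, H_hat = forward(M, b)
    err = H_hat - H
    dS = (2.0 / N) * err @ M.T + alpha * np.sign(S)   # l2 term + l1 subgradient
    dZ = dS * (Z > 0.0)                               # back through the ReLU
    M -= lr * (dZ.T @ H + (2.0 / N) * S.T @ err)      # encoder + decoder terms
    b -= lr * dZ.sum(axis=0)

mse_final = float(np.mean((forward(M, b)[2] - H) ** 2))
assert mse_final < mse_init               # reconstruction error decreases
```

Because the weights are tied, the gradient with respect to $M$ collects two terms, one through the encoder and one through the decoder; in practice an autodiff framework and a stochastic optimizer would replace the hand-derived updates.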

3. Monosemantic Feature Extraction and Semantics Assignment

The overcomplete dimension and $\ell_1$ sparsity induce each code in $h_\text{SAE}$ to represent a distinct, interpretable data pattern. After SAE training, semantics for each latent neuron are assigned through a rule induction process:

  • High-activation samples for neuron $j$ (activations above a threshold, typically the 90th percentile, i.e., quantile $t=0.9$) are labeled positive; the remaining samples are labeled negative.
  • Skope-rules, a rule-based classifier, identifies Boolean conjunctions (with up to 4 conditions) in the input space that best separate the positives from the negatives.
  • The rule with maximum recall (coverage) over the high-activation set is retained.
  • Each $h_\text{SAE}[j]$ is thus mapped to a logical statement, e.g., "marital_status ≠ Married AND education_num < 13 AND capital_gain ≤ 8028" (Elhadri et al., 15 Dec 2025).

This mechanism yields monosemantic, human-readable features, transforming model explanations into white-box logical summaries.
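The labeling-then-induction procedure can be illustrated end to end. The sketch below uses synthetic activations and a hand-written candidate set scored by recall in place of Skope-rules' search; all column names, thresholds, and rules are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500
# Toy tabular inputs: columns (age, education_num, capital_gain) -- illustrative.
X = np.column_stack([
    rng.integers(18, 70, N),
    rng.integers(1, 17, N),
    rng.integers(0, 10000, N),
])
# Toy activations for one SAE neuron j: it happens to fire on low education.
act = np.exp(-0.5 * (X[:, 1] / 4.0)) + 0.05 * rng.normal(size=N)

# Step 1: label high-activation samples positive (90th-percentile threshold).
thr = np.quantile(act, 0.9)
y = act >= thr

# Step 2: score candidate Boolean conjunctions by recall over the positives
# (a stand-in for Skope-rules' induction of conjunctions of <= 4 conditions).
def recall(mask):
    return (mask & y).sum() / y.sum()

candidates = {
    "education_num < 5": X[:, 1] < 5,
    "education_num < 5 AND capital_gain <= 8000": (X[:, 1] < 5) & (X[:, 2] <= 8000),
    "age > 40": X[:, 0] > 40,
}
best = max(candidates, key=lambda name: recall(candidates[name]))
print(best, recall(candidates[best]))
```

In the actual pipeline, Skope-rules generates the candidate conjunctions automatically from tree ensembles; only the retained rule with maximum recall becomes the neuron's semantic label.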

4. Predictive Formulation and Interpretability

By merging the decoder and classifier weights, each model prediction becomes an interpretable linear combination of semantically-characterized rules:

$$\hat{y}_k(x) = \sum_{j=1}^{d_\text{hid}} W'_{k,j}\, r_j(x),$$

where $r_j(x)$ is $1$ if the Boolean rule for feature $j$ holds on $x$ and $0$ otherwise. This functional form allows for both local explanations (which rules contributed to an individual prediction) and global model summaries (all rules and their associated weights). Empirically, rule length per feature is low (median 2.2 clauses on ADULT; 1.8 on CHURN), and the number of rules firing per instance is modest (on average 8 of 21 for ADULT; 21.5 of 48 for CHURN), supporting transparency and tractability (Elhadri et al., 10 Sep 2025, Elhadri et al., 15 Dec 2025).
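The binarized formulation can be evaluated directly as a matrix-vector product. In the sketch below, $W'$, $b'$, and the fired-rule vector are made-up illustrative values, not weights from the papers:

```python
import numpy as np

# Hypothetical merged weights W' and bias b' for K = 2 classes over
# d_hid = 4 rule features (values are illustrative only).
W_prime = np.array([[ 0.10, -0.30,  0.05,  0.40],
                    [-0.10,  0.30, -0.05, -0.40]])
b_prime = np.array([0.2, -0.2])

# r(x): Boolean activations of the per-neuron rules on one instance x.
r = np.array([1, 0, 1, 1])           # rules 0, 2, and 3 hold on this x

logits = W_prime @ r + b_prime       # prediction as a weighted sum of fired rules
contrib = W_prime * r                # per-rule contribution to each class logit
print(logits)                        # [ 0.75 -0.75]
```

The `contrib` matrix is the local explanation: each nonzero column gives one fired rule's additive contribution to every class logit.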

5. Empirical Performance and Benchmarking

Extensive evaluation on public tabular datasets demonstrates that XNNTab closes the performance gap between classical interpretable algorithms and black-box ensemble or neural architectures:

Dataset     XNNTab Macro F1    XGBoost F1    RF F1
Adult       0.795              0.815         0.799
Spambase    0.948              0.957         0.947
Covertype   0.878              0.942         0.838
Gesture     0.634              0.638         0.596

(Elhadri et al., 15 Dec 2025)

On CHURN: Accuracy = 0.861±0.005, F1 = 0.759±0.001, outperforming even XGBoost (Acc = 0.854, F1 = 0.730) (Elhadri et al., 10 Sep 2025).

XNNTab systematically exceeds or matches interpretable baselines (Logistic Regression, Decision Trees), while performing within a few points of leading black-box models (e.g., XGBoost, TabNet, NODE, FT-Transformer). The intrinsic interpretability mechanism imposes negligible or, in some cases, no loss in predictive performance (Elhadri et al., 10 Sep 2025, Elhadri et al., 15 Dec 2025).

6. Model Explanations: Local and Global Transparency

XNNTab explanations are characterized at both individual and aggregate levels:

  • Local explanations: For a given input xx, only a small subset of dictionary rules activates. Each rule's contribution to the prediction can be directly read off its weight and Boolean activation status.
  • Global explanations: The model can be summarized as a finite set of logical rules and a tabulated weight matrix. This supports global auditability—users may identify high-weight features, investigate the cumulative effect of patterns, or perform model pruning without loss of semantic transparency.

For example, a prediction for an Adult instance with $x = \{$age = 28, marital_status = Never-married, education_num = 10, capital_gain = 0$\}$ may activate only rules $j=9$ and $j=40$; their contributions (e.g., $-0.15$ and $+0.23$) sum to the final classification logit, making the inference process explicit (Elhadri et al., 15 Dec 2025, Elhadri et al., 10 Sep 2025).
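A local explanation of this form amounts to listing the fired rules with their weights. The sketch below echoes the worked example above, but the rule texts, indices, and weights are hypothetical:

```python
# Hypothetical rule dictionary and merged weights for one class; indices and
# contribution values echo the worked example but are illustrative only.
rules = {
    9:  "marital_status = Never-married AND education_num < 13",
    40: "capital_gain <= 8028",
}
weights = {9: -0.15, 40: 0.23}
bias = 0.0

fired = [9, 40]                      # rules whose Boolean condition holds on x
logit = bias + sum(weights[j] for j in fired)

for j in fired:                      # local explanation: weight per fired rule
    print(f"rule {j}: {weights[j]:+.2f}  [{rules[j]}]")
print(f"logit: {logit:+.2f}")        # +0.08
```

The global summary is the same data in aggregate: the full rule dictionary and the weight matrix, which can be audited or pruned without retraining.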

7. Broader Impact and Significance

XNNTab demonstrates that neural models for tabular data can achieve both expressiveness and transparency with little or no predictive tradeoff. The SAE yields a directly interpretable intermediate representation, while rule assignment via Skope-rules formalizes the semantic meaning of each hidden component. This enables deployment in domains where regulatory or scientific audit trails are mandatory, and supports model interrogation at a mechanistic, rather than post hoc, level.

A plausible implication is that other neural architectures for tabular or similar structured data may also benefit from monosemantic feature decomposition and systematic rule extraction, potentially catalyzing further progress toward fully explainable deep learning systems for critical applications (Elhadri et al., 10 Sep 2025, Elhadri et al., 15 Dec 2025).
