Interpretable Machine Learning in Physics
- Interpretable machine learning in physics is an approach that uses models designed to yield physically meaningful explanations aligned with known laws.
- It integrates techniques like symbolic regression, decision trees, and physics-informed neural networks to reveal internal model logic.
- This paradigm empowers researchers to validate theories and discover new empirical laws with actionable, verifiable insights.
Interpretable machine learning (IML) in physics seeks to bridge the gap between the predictive power of complex data-driven models and the fundamental requirement for scientific understanding, hypothesis formation, and trust. Interpretability—understood as the model’s capacity to yield physically meaningful insight or to align its logic with known principles—enables domain experts to transcend the “black-box” opacity of expressive machine-learned functions and ensures that automated analyses contribute actionable, verifiable, and human-comprehensible information. IML in physics thus encompasses a spectrum of mathematical, computational, and methodological strategies, uniting statistical, symbolic, and physics-informed paradigms to produce models whose internal structures and outputs correspond to physically relevant observables, laws, or mechanisms.
1. Interpretability: Concepts and Taxonomy
Interpretability in the context of physics divides along several principal axes:
- Intrinsic vs. post-hoc: Some models, such as decision trees or symbolic regressions, are interpretable by construction (intrinsic). Others, such as neural networks, require auxiliary interpretation techniques (post-hoc) to extract meaning from their function (Wetzel et al., 30 Mar 2025).
- Mechanistic vs. functional: Mechanistic interpretability refers to analyses of internal model computations or parameters (e.g., feature importance via the Hessian), while functional interpretability concerns the mapping from input variables to output predictions as an encapsulated law or formula.
- Local vs. global: Local interpretability addresses explanations of individual predictions (e.g., saliency maps, SHAP values), whereas global interpretability seeks an overall summary of decision logic or learned relationships (e.g., decision rules, order parameters, symbolic formulas) (Wetzel et al., 30 Mar 2025, Ghiringhelli, 2021).
- Verifying vs. discovering: Interpretability frameworks can either verify known scientific concepts (e.g., order parameter identification in phase transitions) or discover new latent quantities (e.g., previously unrecognized invariants or empirical laws).
- Low-level vs. high-level features: Some approaches emphasize interpretations using near-input quantities (e.g., raw variables, pixels, instrument readings), while others target latent, emergent, or concept-like internal features.
Consensus emphasizes that for physics, interpretable models must align compact explanations—mechanistic or functional—with known symmetries, conservation laws, or hypothesized mechanisms, and must yield actionable insights for model refinement or experimental design (Wetzel et al., 30 Mar 2025, Ghiringhelli, 2021).
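The intrinsic end of this taxonomy can be made concrete with a small sketch (assuming scikit-learn is available; the feature name, threshold, and synthetic "phase" data are purely illustrative, not drawn from any cited study): a shallow decision tree is interpretable by construction, because each root-to-leaf path is a human-readable rule over the input features.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Synthetic toy data: |magnetization| separates an "ordered" phase (label 1)
# from a "disordered" phase (label 0) -- purely illustrative.
magnetization = rng.uniform(-1.0, 1.0, size=(200, 1))
labels = (np.abs(magnetization[:, 0]) > 0.5).astype(int)

# A shallow tree is intrinsically interpretable: its learned split
# thresholds can be read off directly as decision rules.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(np.abs(magnetization), labels)

rules = export_text(tree, feature_names=["abs_magnetization"])
print(rules)  # prints the root-to-leaf rules as indented text
```

Here the extracted rule set is the explanation; no post-hoc technique is needed, in contrast to the neural-network case.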
2. Model Classes and Formalism
IML in physics encompasses a wide array of model classes and formal strategies:
- Symbolic regression: Analytical expressions are distilled from data (or from learned surrogates of deep models) using genetic programming or sparse regression; the result is a closed-form formula balancing accuracy and complexity (e.g., via PySR) (Maheshwari et al., 7 Dec 2025, Mengel et al., 2023, Faucett et al., 2020). These formulas often recover known physical laws, such as volume and surface terms in nuclear charge radii.
- Decision trees and rule lists: Tree-based classifiers partition feature space into axis-aligned or hyperplane-defined regions; each path forms a human-readable rule, providing explicit logical structure and interpretability (Kapteyn et al., 2020, Wetzel et al., 30 Mar 2025).
- Functional and linear operator surrogates: Generalized functional linear models (GFLMs) use additive (often integral) kernels on fields or images to yield tractable surrogates for physics operator learning or for interpreting deep networks (Arzani et al., 2023).
- Physics-informed neural networks (PINNs) and hybrid architectures: Neural networks are constrained by embedding governing equations in the loss (e.g., PDE residuals), thereby ensuring the output respects physical laws and enabling interpretation via inspection of layers or terms in physically meaningful bases (Rudin et al., 2021, Mohammed et al., 25 Feb 2026).
- Port-Hamiltonian (p-H) modeling: Model parameters are mapped one-to-one onto physical quantities such as energy, damping, or coupling, ensuring that learned models retain classical system-theoretic interpretability (Matei et al., 2020).
- Kernel methods and SVMs: Polynomial or physically informed kernels in support vector machines can explicitly recover analytic order parameters or Hamiltonian constraints, particularly in many-body spin and gauge systems (Ponte et al., 2017).
- Meta-learning with affine or factorized heads: Models such as CAMEL constrain adaptation to affine transformations in task parameters, enabling direct correspondence between learned task embeddings and physically meaningful context variables (Blanke et al., 2023).
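The physics-informed idea above can be sketched with a drastically simplified stand-in (a toy linear fit, not any cited paper's PINN implementation; the problem and trial form are assumptions for illustration): fit a parameterized trial solution to a 1D Poisson problem by imposing the governing-equation residual alongside the boundary data.

```python
import numpy as np

# Toy physics-informed fit: find u(x) = a*x^2 + b*x + c solving
#   u''(x) = 2  on [0, 1],  with  u(0) = 0  and  u(1) = 1.
# Exact solution: u(x) = x^2, i.e. (a, b, c) = (1, 0, 0).
xs = np.linspace(0.0, 1.0, 11)

# Each row encodes one linear condition on the parameters (a, b, c):
# PDE-residual rows play the role of the physics term in a PINN loss,
# boundary rows play the role of the data term.
rows, targets = [], []
for x in xs:                       # PDE residual: u''(x) - 2 = 0  ->  2a = 2
    rows.append([2.0, 0.0, 0.0])
    targets.append(2.0)
rows.append([0.0, 0.0, 1.0]); targets.append(0.0)   # boundary u(0) = 0
rows.append([1.0, 1.0, 1.0]); targets.append(1.0)   # boundary u(1) = 1

a, b, c = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)[0]
print(a, b, c)  # -> approximately 1.0 0.0 0.0
```

Minimizing this combined residual-plus-data objective is the same structural move a PINN makes, except that the PINN's trial function is a neural network and the residual is evaluated by automatic differentiation.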
The following table summarizes key model classes and their associated interpretability modes:
| Model Class | Interpretability Mode | Example Applications |
|---|---|---|
| Symbolic regression | Functional, global | Nuclear radii formulas, jet subtraction |
| Decision trees/rule lists | Mechanistic, local-global | Digital twins, structural health |
| PINNs/physics-informed NN | Functional via constraints | PDE modeling, vessel power prediction |
| SVMs/kernel methods | Mechanistic, analytic mapping | Ising/gauge theory order parameters |
| GFLMs | Functional kernel-based | Surrogate for DL, operator learning |
| p-H modeling | Mechanistic via system theory | Robotics, swarm dynamics |
| Affine meta-learning | Mechanistic task embedding | System ID, robotic adaptation |
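The sparse-regression route to compact formulas can be sketched as follows (an illustrative hand-rolled thresholded least squares on synthetic data; the cited works use dedicated tools such as PySR, and the "hidden law" here is invented for the example): build a library of candidate terms and iteratively prune small coefficients until only a short formula remains.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data obeying a hidden "law": y = 3*x1 - 0.5*x2^2 (+ small noise).
x1 = rng.uniform(-2, 2, 200)
x2 = rng.uniform(-2, 2, 200)
y = 3.0 * x1 - 0.5 * x2**2 + 0.01 * rng.standard_normal(200)

# Candidate-term library; sparse regression should keep only x1 and x2^2.
names = ["1", "x1", "x2", "x1^2", "x2^2", "x1*x2"]
library = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

# Sequentially thresholded least squares (STLSQ-style pruning).
coefs = np.linalg.lstsq(library, y, rcond=None)[0]
for _ in range(5):
    small = np.abs(coefs) < 0.1          # zero out near-negligible terms
    coefs[small] = 0.0
    active = ~small
    coefs[active] = np.linalg.lstsq(library[:, active], y, rcond=None)[0]

formula = " + ".join(f"{c:.2f}*{n}" for c, n in zip(coefs, names) if c != 0.0)
print(formula)  # a compact, human-readable surrogate law
```

The pruning threshold trades accuracy against complexity: raising it yields shorter (more interpretable) formulas at the cost of fit quality, the same frontier that genetic-programming symbolic regressors explore with explicit complexity penalties.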
3. Methodologies for Interpretability
The methodologies used to imbue models with interpretability or to extract interpretation vary by model class and objective:
- Explicit feature engineering: Features are deliberately chosen for physical relevance (e.g., proton/neutron numbers, binding energies in nuclear ML) (Maheshwari et al., 7 Dec 2025, Mumpower et al., 2022).
- Architectural constraints: Sparse, piecewise-linear, or additive structures are imposed (e.g., hierarchical mixtures of experts, GFLMs, KANs) to ensure decomposability (Iwasaki et al., 2019, Mohammed et al., 25 Feb 2026, Arzani et al., 2023).
- Regularization and penalty design: Sparsity-inducing penalties (L0 or L1), complexity penalizations, or physically based penalties (e.g., Garvey–Kelson relations in nuclear ML, physics-informed residuals in PINNs) are used to align learned models with known laws or to avoid overfitting (Mumpower et al., 2022, Maheshwari et al., 7 Dec 2025, Rudin et al., 2021).
- Post-hoc analysis and model distillation: Surrogates are trained on black-box model outputs via symbolic regression, boosting, or rule extraction; relevant techniques include PySR, decision-tree surrogates, SHAP, and attention mechanism visualization (Mengel et al., 2023, Faucett et al., 2020, Wetzel et al., 30 Mar 2025).
- Loss landscape and landscape-theoretic analysis: Tools from energy-landscape theory—disconnectivity graphs, identification of conserved weights in loss minima clusters—are imported to investigate feature importance and network symmetries (Niroomand et al., 2023).
- Sensitivity and attribution techniques: Gradients, Hessians, and influence functions are computed to quantify feature contributions, uncertainty, and extrapolation risks (Dawid et al., 2021, Grojean et al., 2022).
Key practical steps include:
- Enforcing or extracting sparsity to ensure each model decision can be attributed to a small number of terms or features.
- Applying symbolic regression post-processing to neural networks to reveal the form of physical dependencies underlying learned predictions.
- Designing hybrid workflows where numerical regressors smooth experimental data, and symbolic search is performed on the interpolated model outputs for tractable analytic expressions (Maheshwari et al., 7 Dec 2025).
- Employing clustering or rule-analysis in tree-based or mixture-of-expert architectures to partition the feature space into physically meaningful regimes.
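The sensitivity-and-attribution route above can be sketched with finite differences on a stand-in black-box model (the function and its feature dependencies are hypothetical, chosen only to make the attribution readable):

```python
import numpy as np

def black_box(x):
    # Stand-in for a trained model: depends strongly on x[0],
    # weakly on x[2], and not at all on x[1].
    return np.sin(x[0]) * 4.0 + 0.1 * x[2] ** 2

def finite_diff_grad(f, x, eps=1e-5):
    """Central-difference gradient of f at x: a local attribution vector."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

x0 = np.array([0.3, 1.7, 2.0])
grad = finite_diff_grad(black_box, x0)
importance = np.abs(grad)             # local feature-importance scores
ranking = np.argsort(importance)[::-1]
print(ranking)  # feature 0 ranked first, feature 1 last
```

This is the crudest member of the family: gradient-based saliency, Hessians, and influence functions refine the same idea, and autodiff replaces finite differences for real networks.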
4. Applications and Case Studies
IML has seen broad adoption and impact across multiple domains of physics:
- Statistical Mechanics and Phase Transitions: Neural network outputs calibrated as physical observables allow the direct application of histogram reweighting, finite-size scaling, and conjugate field analysis without prior knowledge of the true order parameter. This enables extraction of critical points and exponents from ML predictions alone (Aarts et al., 2021).
- High-Energy and Nuclear Physics: Symbolic regression distilled deep background-subtraction neural networks into transparent, experimentally verifiable formulas that match the networks' performance in jet studies and explain it physically (Mengel et al., 2023). Hybrid regressors and symbolic approaches in nuclear charge radii reconstruct closed-form expressions consistent with, and extending beyond, traditional semi-empirical models (Maheshwari et al., 7 Dec 2025).
- Materials Science and Thermoelectricity: Factorized Bayesian hierarchical mixtures of experts uncovered nontrivial composition–spin interaction laws in spin-driven thermoelectric materials, leading directly to synthesis of a record thermopower compound (Iwasaki et al., 2019).
- Quantum Many-Body Systems: Influence-function and Hessian analysis enables identification of relevant configurations, quantification of uncertainty, and detection of extrapolation regimes in black-box models applied to condensed matter and quantum phase transitions (Dawid et al., 2021).
- Operator Learning and Surrogates: GFLMs have provided interpretable, kernel-based surrogates for deep models in mechanics and fluid dynamics, outperforming neural networks in out-of-distribution generalization and yielding physically meaningful analytic mappings (Arzani et al., 2023).
- Meta-Learning and Multi-Environment Adaptation: Affine meta-learning methods demonstrate direct correspondence between learned weights and physical parameters, supporting zero-shot adaptation and parameter recovery in control and system identification (Blanke et al., 2023).
- Physics-Informed Digital Twins and Engineering: Optimal decision trees built on data generated by libraries of physics-based models construct interpretable and explainable digital twins for health monitoring in complex systems such as unmanned aerial vehicles (Kapteyn et al., 2020).
5. Metrics and Trade-Offs
Interpretability is fundamentally multi-dimensional; in physics it must be evaluated alongside predictive accuracy and domain-aligned behavior:
- Sparsity: Fraction of nonzero parameters or terms indicates complexity and simulatability.
- Explanation-model fidelity: The agreement of a surrogate or explanation model with the full model’s predictions, e.g., via accuracy, ROC AUC, or decision ordering metrics (Faucett et al., 2020).
- Rule/path complexity: In tree-based or rule-list models, the number and length of rules quantify their interpretability.
- Concept alignment: Mutual information or correlation between learned latent variables and known physical parameters reflects the model’s alignment with domain concepts (Wetzel et al., 30 Mar 2025).
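Explanation-model fidelity, for instance, can be estimated by scoring a surrogate against the full model it explains. The sketch below is a toy version (the "full model" and surrogate are invented stand-ins, and the ordering check is a crude proxy for the decision-ordering metrics of Faucett et al., 2020):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins: scores from a "full" model and an interpretable surrogate.
inputs = rng.uniform(-1, 1, size=(500, 2))
full_scores = np.tanh(3 * inputs[:, 0] + inputs[:, 1] ** 2)
surrogate_scores = 3 * inputs[:, 0] + inputs[:, 1] ** 2  # additive-term proxy

# Fidelity as agreement of binarized decisions at a common threshold.
fidelity = np.mean((full_scores > 0) == (surrogate_scores > 0))

# Crude ordering check: do the two models rank all events identically?
# (tanh is monotonic, so here the surrogate preserves the full ordering.)
same_order = np.array_equal(np.argsort(full_scores),
                            np.argsort(surrogate_scores))
print(fidelity, same_order)
```

High decision fidelity with low score correlation (or vice versa) signals that the surrogate explains *what* the model decides without capturing *how strongly*, which is why several such metrics are reported together.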
A fundamental trade-off exists: more complex (deep, nonlinear, or high-dimensional) models offer superior predictive performance, especially in high-noise or high-dimensional regimes, but at the cost of interpretability. Methods such as symbolic regression, mixture-of-expert sparsification, and complexity-constrained tree models explicitly negotiate this frontier by seeking compact, accurate approximators (Wetzel et al., 30 Mar 2025, Iwasaki et al., 2019, Mohammed et al., 25 Feb 2026).
6. Philosophical and Epistemological Considerations
IML in physics raises essential questions about the nature of scientific explanation. While explainable AI approaches deliver algorithmic intelligibility, scientific interpretability demands that models embed or recover explanatory information consistent with theoretical frameworks, symmetries, and causal structures. Explanations that are mechanistically transparent but not physically aligned may not support scientific understanding or trust (Wetzel et al., 30 Mar 2025, Ghiringhelli, 2021). Fully interpretable models can serve as toy models or idealizations that foster conceptual advances and hypothesis generation, even if not exhaustive or fully accurate in every regime.
Emergent research continues to connect IML to deeper epistemological constructs: symmetries and invariances (Noether’s theorem, tensor SVMs), causal and generative constraints, and unifying frameworks that enable cross-domain scientific understanding.
7. Future Directions and Open Challenges
Key research frontiers in interpretable machine learning for physics include:
- Development of standardized interpretability metrics and benchmarking protocols for scientific use cases, beyond subjective or domain-specific heuristics (Wetzel et al., 30 Mar 2025).
- Automated scientific discovery, closing the loop from data-driven hypothesis generation to explanation, validation, and theory refinement (Wetzel et al., 30 Mar 2025).
- Robustness to distribution shift and uncertainty quantification, particularly in out-of-sample or extrapolative regimes (Dawid et al., 2021, Arzani et al., 2023).
- Integration of geometric, topological, and category-theoretic insights to advance the understanding and interpretability of deep learning representations (Wetzel et al., 30 Mar 2025).
- Generalization of physics-informed and interpretable learning to quantum, stochastic, and multi-scale domains, while respecting domain-specific epistemic constraints and physical law enforceability (Blanke et al., 2023, Rudin et al., 2021).
- Broader adoption of hybrid and modular architectures, combining physics-encoded, symbolic, and data-driven layers for tractable, trustworthy modeling of complex phenomena (Maheshwari et al., 7 Dec 2025, Mohammed et al., 25 Feb 2026).
- Formalizing and extending the relationships between interpretability, discoverability, and computational tractability in high-dimensional and high-noise science applications.
A guiding principle is that interpretable ML must not only recover existing physical knowledge, but also provide a systematic avenue for uncovering new laws and mechanisms, thus acting as a true companion to scientific inquiry.