- The paper introduces a phase-based gradient extrapolation method that visualizes class transitions by leveraging steerable decompositions.
- It employs Complex Steerable Pyramid decomposition and Wirtinger calculus to compute and amplify model sensitivity gradients.
- Experimental results on MNIST and facial data illustrate coherent morphing between classes, enhancing interpretability.
The paper "Through a Steerable Lens: Magnifying Neural Network Interpretability via Phase-Based Extrapolation" introduces a novel approach to enhancing the interpretability of neural networks. It proposes a framework leveraging phase-based extrapolation to visualize the implicit paths neural networks perceive between different classes.
Summary of Contributions
The research addresses a gap left by existing interpretability methods, which often highlight influential input regions without explaining how a model distinguishes between classes. The authors instead treat the network gradient as an infinitesimal motion, drawing inspiration from phase-based motion magnification techniques. Images are decomposed with an invertible transform, specifically the Complex Steerable Pyramid (CSP), class-conditional gradients are computed in the transformed space, and those gradients are then amplified to reveal the model's internal transition paths between classes.
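As a rough illustration of this pipeline, the sketch below (not the authors' code) uses a plain FFT as a stand-in for the invertible CSP transform and a toy linear classifier as a stand-in for the trained model; the names `decompose`, `reconstruct`, and `coefficient_step` are placeholders introduced here.

```python
import torch

def decompose(image):
    # Placeholder for the CSP analysis (forward) transform: any invertible
    # complex decomposition serves for this sketch.
    return torch.fft.fft2(image)

def reconstruct(coeffs):
    # Placeholder for the CSP synthesis (inverse) transform.
    return torch.fft.ifft2(coeffs).real

def coefficient_step(model, image, source_cls, target_cls, alpha=10.0):
    coeffs = decompose(image).requires_grad_(True)   # complex coefficients
    logits = model(reconstruct(coeffs)[None, None])  # [1, 1, H, W] input
    # Class-conditional objective: raise the target logit, lower the source logit.
    (logits[0, target_cls] - logits[0, source_cls]).backward()
    # For a real-valued objective, PyTorch's complex autograd returns the
    # conjugate Wirtinger derivative, so +grad is an ascent direction;
    # amplifying it by `alpha` pushes the image toward the target class.
    return reconstruct(coeffs.detach() + alpha * coeffs.grad)

# Toy stand-in classifier for 28x28 grayscale inputs (e.g., MNIST-sized).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
morphed = coefficient_step(model, torch.rand(28, 28), source_cls=3, target_cls=8)
```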
Key contributions of this framework include:
- A novel extrapolation-based method for visualizing neural network sensitivities using transformed amplitude-phase spaces.
- Demonstration of semantically meaningful morphing sequences by extrapolating phase components.
- Use of Wirtinger calculus to compute gradients with respect to complex coefficients in transformed domains.
- Empirical validation through experiments on synthetic and real-world datasets, showing perceptually aligned transformations.
Technical Approach
The framework builds on the CSP decomposition to probe neural network decision boundaries. The CSP separates an image into amplitude (feature strength) and phase (feature position) components, providing a more structured manipulation space than pixel-based or purely frequency-domain approaches. Gradient extrapolation is applied primarily to the phase to trace the model's perceived transition from a source class to a target class.
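To make the amplitude/phase split concrete, the sketch below builds a single oriented complex band with a crude frequency-domain wedge mask; this is only a stand-in for one CSP subband (the real pyramid is multi-scale and uses smooth steerable filters), and `oriented_band`, `theta`, and `bandwidth` are illustrative names rather than the paper's.

```python
import torch

def oriented_band(image, theta=0.0, bandwidth=0.5):
    # Keep only frequencies whose direction lies within `bandwidth` radians of
    # `theta` (a single half-plane wedge), so the inverse FFT is complex-valued,
    # much like one subband of a Complex Steerable Pyramid.
    h, w = image.shape
    fy = torch.fft.fftfreq(h).reshape(-1, 1)
    fx = torch.fft.fftfreq(w).reshape(1, -1)
    direction = torch.atan2(fy, fx)
    mask = ((direction - theta).abs() < bandwidth).to(image.dtype)
    return torch.fft.ifft2(torch.fft.fft2(image) * mask)

band = oriented_band(torch.rand(64, 64))
amplitude = band.abs()    # local feature strength
phase = band.angle()      # local feature position
# Shifting phase while holding amplitude fixed moves features without changing
# their strength; this is the manipulation space used for extrapolation.
shifted_band = torch.polar(amplitude, phase + 0.3)
```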
Using Wirtinger calculus, the authors compute gradients with respect to complex-valued coefficients, giving a principled treatment of the amplitude-phase space. This formulation enables linear extrapolation of the phase while leaving the amplitude untouched, so the resulting transformations preserve visual integrity while exposing the directions to which the model's decision is most sensitive.
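A minimal sketch of this step follows, assuming the same FFT placeholders and toy classifier as the earlier sketch rather than the authors' implementation: each coefficient is reparameterized as amplitude times e^{i*phase} with the amplitude frozen, the real-valued class objective is differentiated with respect to the phase (PyTorch's autograd handles the intermediate complex operations under its conjugate-Wirtinger convention), and the phase is then extrapolated linearly to produce a morphing sequence.

```python
import torch

def decompose(image):
    return torch.fft.fft2(image)          # placeholder CSP analysis transform

def reconstruct(coeffs):
    return torch.fft.ifft2(coeffs).real   # placeholder CSP synthesis transform

def phase_morph(model, image, source_cls, target_cls, alpha=30.0, steps=8):
    coeffs = decompose(image)
    amplitude = coeffs.abs()                      # frozen feature strength
    phase = coeffs.angle().requires_grad_(True)   # differentiable feature position
    recon = reconstruct(amplitude * torch.exp(1j * phase))
    logits = model(recon[None, None])             # [1, 1, H, W] input
    (logits[0, target_cls] - logits[0, source_cls]).backward()
    # Linear extrapolation along the phase gradient; amplitude never changes.
    return [
        reconstruct(amplitude * torch.exp(1j * (phase.detach() + alpha * t * phase.grad)))
        for t in torch.linspace(0.0, 1.0, steps + 1)
    ]

# Toy stand-in classifier for 28x28 grayscale inputs.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
frames = phase_morph(model, torch.rand(28, 28), source_cls=3, target_cls=8)
```

Each returned frame corresponds to a point along the extrapolated trajectory, so visualizing the list in order yields the kind of morphing sequence the experiments describe.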
Experimental Findings
The method is applied to several datasets, including a synthetic arcade dataset, MNIST, and facial expression data from FER2013. The results show coherent, semantically meaningful morphing paths between classes, offering insight into the network's decision-making process. On MNIST, for example, the approach produces intuitive morphs, such as a '3' turning into an '8' by closing off its loops, indicating how the model internally encodes class distinctions.
Implications and Future Directions
The research presents both theoretical and practical implications for advancing interpretability in neural networks. By illuminating the decision processes underlying class transitions, this method provides a dynamic alternative to static saliency maps or adversarial perturbations. The extrapolation framework suggests that visualizing decision boundaries in structured transform spaces can yield intuitive insights aligned with human perception.
Potential future directions include exploring adaptive gradient steps, alternative decomposition transformations, and quantitative metrics for evaluating trajectory quality. Furthermore, extending this approach to generative models or regression contexts could broaden its applicability in understanding complex AI systems.
In conclusion, the paper offers a compelling framework to improve neural network interpretability through phase-based gradient extrapolation in structured transform domains, paving the way for deeper insights into complex decision-making processes in AI systems.