
Representational Dynamics Analysis (RDA)

Updated 27 June 2026
  • Representational Dynamics Analysis (RDA) is a framework that quantifies and visualizes the evolution of data representations using methods like RDM, MDS, and Procrustes alignment.
  • It integrates geometric, topological, and information-theoretic tools to capture temporal, layerwise, and training-induced changes in both neural and artificial systems.
  • RDA enables practical insights for model selection, pruning, and mechanistic interpretability by correlating novel metrics with performance and functional change.

Representational Dynamics Analysis (RDA) is a broad methodological framework for quantifying, visualizing, and interpreting how internal data representations evolve over time, across layers, or during optimization in neural and biological systems. RDA formally connects measurement tools from representational similarity analysis, multidimensional scaling, topology, subspace geometry, neighborhood graphs, and information theory to enable rigorous comparison of representational changes. Core use cases encompass time-resolved neural recordings, model training or transfer, and layer-wise evolution in deep networks, with implications for mechanistic interpretability, model selection, pruning, and understanding representation learning regimes (Lin et al., 2019, Barannikov et al., 2021, Jiang et al., 12 May 2026, Kokilepersaud et al., 18 May 2025).

1. Core Methodological Pipeline: RDM, MDS, and Procrustes Alignment

RDA originated in analyses of time-dependent neural recordings, where the challenge was to extract, compare, and visualize the evolving representational geometry. Lin and colleagues established a canonical pipeline composed of the following steps (Lin et al., 2019):

  1. Representational Dissimilarity Matrix (RDM): For a set of $N$ stimuli, at each time window $t$, the response-pattern vectors $\mathbf{x}_i(t) \in \mathbb{R}^p$ (for $p$ measured units) are used to compute a pairwise dissimilarity matrix $D(t)$, using either correlation distance

$$D_{ij}^{\mathrm{corr}}(t) = 1 - \mathrm{corr}(\mathbf{x}_i(t), \mathbf{x}_j(t))$$

or Euclidean distance.

  2. RDM Movie: Sliding-window RDMs $D(t)$, $t = 1, \ldots, T$, are stacked to create an RDM trajectory or “movie” capturing representational dynamics throughout the stimulus period.
  3. Multidimensional Scaling (MDS): Each $D(t)$ is embedded into a low-dimensional Euclidean space ($X(t) \in \mathbb{R}^{N \times d}$, $d = 2$ or $d = 3$), using classical or nonmetric MDS, such that pairwise distances in $X(t)$ approximate $D(t)$.
  4. Procrustes Alignment (pMDS): Because each $X(t)$ lives in its own arbitrary reference frame (due to indeterminacies of rotation, reflection, and scaling), all MDS configurations are aligned into a common, temporally coherent frame via generalized Procrustes analysis, producing aligned embeddings.
  5. Trajectory-Based Quantification: Category centroids, inter-centroid distances, trajectory lengths, instantaneous speeds, convex-hull areas, and oscillatory analysis (Fourier or wavelet) are extracted from the aligned embeddings for further quantification.
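
The RDM, MDS, and alignment steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation of Lin et al.: all function names are hypothetical, classical (Torgerson) MDS stands in for the classical-or-nonmetric choice, and each frame is aligned to the previously aligned frame instead of running a full generalized Procrustes analysis. Scaling is switched off by default so that genuine changes in spread over time are preserved.

```python
import numpy as np

def correlation_rdm(X):
    """X: (N stimuli, p units) -> (N, N) correlation-distance RDM."""
    return 1.0 - np.corrcoef(X)

def classical_mds(D, d=2):
    """Classical (Torgerson) MDS of a dissimilarity matrix D into d dimensions."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N            # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                     # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:d]               # indices of the top-d eigenpairs
    return evecs[:, idx] * np.sqrt(np.clip(evals[idx], 0.0, None))

def procrustes_align(X, ref, allow_scaling=False):
    """Rotate/reflect (and optionally scale) centered X to best match centered ref."""
    Xc = X - X.mean(axis=0)
    Rc = ref - ref.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Rc)
    Q = U @ Vt                                      # optimal orthogonal map
    scale = s.sum() / (Xc ** 2).sum() if allow_scaling else 1.0
    return scale * (Xc @ Q)

def rdm_movie_embedding(responses, d=2):
    """responses: (T, N, p) array -> aligned embeddings of shape (T, N, d).

    Assumption: sequential alignment to the previous aligned frame,
    a simplification of generalized Procrustes analysis.
    """
    frames = [classical_mds(correlation_rdm(X_t), d) for X_t in responses]
    aligned = [frames[0] - frames[0].mean(axis=0)]
    for X_t in frames[1:]:
        aligned.append(procrustes_align(X_t, aligned[-1]))
    return np.stack(aligned)

# Synthetic example: T=5 time windows, N=8 stimuli, p=20 units.
rng = np.random.default_rng(0)
embeddings = rdm_movie_embedding(rng.normal(size=(5, 8, 20)), d=2)
```

Aligning to the previous frame keeps consecutive frames visually coherent, but small alignment errors can accumulate over long recordings; aligning every frame to a shared mean configuration, as generalized Procrustes analysis does, avoids that drift.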

This RDA pipeline enabled novel insights, such as the hierarchical and stagewise emergence of categorical information in the monkey IT cortex, and the presence of oscillatory post-stimulus convergence (Lin et al., 2019).
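
Several of the trajectory-based quantities named in the pipeline (category centroids, inter-centroid distances, trajectory lengths, instantaneous speeds, and a Fourier view of oscillations) reduce to a few array operations once aligned embeddings are available. The sketch below assumes an array of shape `(T, N, d)` and one category label per stimulus; the function names and the unit time step `dt` are illustrative choices, not taken from the source.

```python
import numpy as np

def category_centroids(aligned, labels):
    """aligned: (T, N, d); labels: length-N sequence -> {label: (T, d) centroid path}."""
    labels = np.asarray(labels)
    return {c: aligned[:, labels == c, :].mean(axis=1)
            for c in np.unique(labels).tolist()}

def inter_centroid_distance(traj_a, traj_b):
    """Euclidean distance between two centroid paths at every time window."""
    return np.linalg.norm(traj_a - traj_b, axis=1)

def speeds(traj, dt=1.0):
    """Instantaneous speed between consecutive time windows of a (T, d) path."""
    return np.linalg.norm(np.diff(traj, axis=0), axis=1) / dt

def path_length(traj):
    """Total length of a (T, d) trajectory."""
    return float(np.linalg.norm(np.diff(traj, axis=0), axis=1).sum())

def speed_spectrum(traj, dt=1.0):
    """Frequencies and power of the mean-removed speed signal (Fourier analysis)."""
    s = speeds(traj, dt)
    power = np.abs(np.fft.rfft(s - s.mean())) ** 2
    return np.fft.rfftfreq(s.size, d=dt), power
```

Convex-hull areas and wavelet analysis are omitted here because they need additional libraries.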

2. Topological and Multi-Scale Geometric Analysis: RTD-Based RDA

Representation Topology Divergence (RTD) extends RDA to the topological domain, quantifying multi-scale (“persistent homology”-based) differences between two point cloud representations (e.g., model epochs or layers), even if embedded in spaces of differing dimensions (Barannikov et al., 2021). The RTD protocol is as follows:

  1. Point Cloud Extraction: For a fixed batch of samples with one-to-one correspondence, two point-cloud representations of the same samples are constructed.
  2. Distance Graphs and Vietoris–Rips Filtration: Weighted complete graphs are formed using pairwise Euclidean distances. The Vietoris–Rips filtration defines a simplicial complex at every distance scale, tracking the emergence and disappearance of topological features (clusters, loops, voids).
  3. R-Cross-Barcode Construction: The combined distance matrix merges information from both representations, capturing topological features present in one but not the other.
  4. RTD Computation: The total persistence (sum of lifetimes of topological features in the cross-barcode) is the RTD score. This is symmetrized between both directions and tracked across epochs to measure convergence.
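
Computing the actual R-Cross-Barcode requires a persistent-homology library and is not reproduced here. The sketch below illustrates only the simplest ingredient of the protocol: in a Vietoris–Rips filtration, the finite zero-dimensional bars die exactly at the edge weights of a minimum spanning tree, so the total H0 persistence is the sum of those weights. The `h0_divergence_proxy` function is a hypothetical, H0-only stand-in for the idea of comparing one representation against the elementwise minimum of both distance matrices; it is not the published RTD score, and the median-distance normalization is an assumption made for this example.

```python
import numpy as np

def pairwise_distances(P):
    """P: (n, dim) point cloud -> (n, n) Euclidean distance matrix."""
    return np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)

def mst_edge_weights(D):
    """Edge weights of a minimum spanning tree of the complete graph D (Prim)."""
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()                      # cheapest known link into the tree
    weights = []
    for _ in range(n - 1):
        masked = np.where(in_tree, np.inf, best)
        j = int(np.argmin(masked))          # nearest vertex not yet in the tree
        weights.append(masked[j])
        in_tree[j] = True
        best = np.minimum(best, D[j])
    return np.sort(np.array(weights))

def h0_total_persistence(D):
    """Sum of finite H0 bar lengths of the Vietoris-Rips filtration of D."""
    return float(mst_edge_weights(D).sum())

def _normalized(D):
    # Assumption: rescale by the median off-diagonal distance for comparability.
    off_diag = D[~np.eye(D.shape[0], dtype=bool)]
    return D / np.median(off_diag)

def h0_divergence_proxy(P, P_tilde):
    """Hypothetical H0-only proxy (NOT the published RTD): symmetrized drop in
    H0 total persistence when each cloud's distances are replaced by the
    elementwise minimum over both clouds. Rows of P and P_tilde must match."""
    D1 = _normalized(pairwise_distances(P))
    D2 = _normalized(pairwise_distances(P_tilde))
    D_min = np.minimum(D1, D2)
    drop_1 = h0_total_persistence(D1) - h0_total_persistence(D_min)
    drop_2 = h0_total_persistence(D2) - h0_total_persistence(D_min)
    return 0.5 * (drop_1 + drop_2)
```

Because only distance matrices are compared, the two point clouds may live in spaces of different dimension, as long as their rows refer to the same samples.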

Empirically, RTD was shown to correlate almost perfectly with test-time label disagreement and outperform kernel-matrix-based similarity (CKA/HSIC) in detecting functionally-relevant representational change during network training (Barannikov et al., 2021).

3. Layerwise Measurement of Representation Dynamics: LRD Framework

The Layer-wise Representation Dynamics (LRD) framework advances RDA by decomposing the evolution of hidden states across network depth into three mathematically distinct diagnostic classes (Jiang et al., 12 May 2026):

  1. Frenet Family (Global Subspace Motion):
    • Adopts Grassmannian geometry to compute sequential subspace displacement between principal directions at each layer. The Grassmann distance between consecutive layers is computed from principal angles. The overall end-to-end displacement and curvature measures further characterize the trajectory.
  2. Neighborhood Retention Score (NRS, Local Stability):
    • For anchor points, the Jaccard overlap of $k$-nearest neighbors is computed between consecutive layers, with the late-layer mean NRS summarizing local retention.
  3. Graph Filtration Mutual Information (GFMI, Alignment to Final Layer):
    • Cosine $k$-NN graphs are thresholded by distance percentile to yield a filtration per layer; mutual information is computed between connected-component partitions at a given layer and at the final layer, integrated over percentiles.
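
The first two diagnostics can be illustrated with a short NumPy sketch. It assumes hidden states are stored as `(n_samples, dim)` arrays with the same `dim` at both layers, takes the top-`r` right singular vectors of the centered states as the principal directions, and uses Euclidean distance for the neighbor search. The function names, `r`, and the choice of metric are assumptions for illustration and are not taken from the LRD paper.

```python
import numpy as np

def principal_subspace(H, r):
    """Orthonormal basis (dim, r) of the top-r principal directions of H (n, dim)."""
    Hc = H - H.mean(axis=0)
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Vt[:r].T

def grassmann_distance(U, V):
    """Geodesic Grassmann distance: 2-norm of the principal angles between
    the subspaces spanned by the orthonormal columns of U and V."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))    # principal angles
    return float(np.linalg.norm(theta))

def knn_indices(H, k):
    """Indices of the k nearest neighbors (Euclidean, self excluded) per row."""
    D = np.linalg.norm(H[:, None, :] - H[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    return np.argsort(D, axis=1)[:, :k]

def neighborhood_retention(H_a, H_b, k):
    """Mean Jaccard overlap of k-NN sets between two layers (same sample order)."""
    scores = []
    for a, b in zip(knn_indices(H_a, k), knn_indices(H_b, k)):
        sa, sb = set(a.tolist()), set(b.tolist())
        scores.append(len(sa & sb) / len(sa | sb))
    return float(np.mean(scores))

# Example: displacement and retention between two consecutive "layers".
rng = np.random.default_rng(0)
H_prev = rng.normal(size=(30, 6))
H_next = H_prev + 0.1 * rng.normal(size=(30, 6))
step = grassmann_distance(principal_subspace(H_prev, 3), principal_subspace(H_next, 3))
nrs = neighborhood_retention(H_prev, H_next, k=5)
```

Summing `grassmann_distance` over consecutive layer pairs gives a path length through subspace space, while applying it to the first and last layers gives an end-to-end displacement.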

Large-scale application to a curated model/task matrix (31 models, 30 datasets) identified one LRD summary statistic as the strongest unsupervised correlate of downstream task performance, while GFMI excelled at identifying layers that could be pruned without substantial performance loss. NRS was sensitive in retrieval tasks but less stable for pruning (Jiang et al., 12 May 2026).
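
The GFMI diagnostic described above (cosine k-NN graphs, percentile thresholds, connected components, mutual information against the final layer) can be approximated as follows. Many details are not specified in this article, so the sketch fixes them by assumption: edges are thresholded against the percentiles of each layer's own k-NN edge distances, mutual information is left unnormalized and measured in nats, and "integration" over percentiles is a plain average over a small grid. All names are hypothetical, and rows are assumed to be nonzero.

```python
import numpy as np

def cosine_knn_edges(H, k):
    """Directed k-NN edges (rows, cols, cosine distances) for hidden states H (n, dim)."""
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    D = 1.0 - Hn @ Hn.T
    np.fill_diagonal(D, np.inf)                  # exclude self-neighbors
    nbrs = np.argsort(D, axis=1)[:, :k]
    rows = np.repeat(np.arange(H.shape[0]), k)
    cols = nbrs.ravel()
    return rows, cols, D[rows, cols]

def components(n, rows, cols):
    """Connected-component label per node (edges treated as undirected), union-find."""
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]        # path halving
            i = parent[i]
        return i
    for i, j in zip(rows.tolist(), cols.tolist()):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    return np.array([find(i) for i in range(n)])

def mutual_information(a, b):
    """Mutual information (nats) between two partitions given as label arrays."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1.0)              # contingency counts
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def gfmi(H_layer, H_final, k=5, percentiles=(20, 40, 60, 80, 100)):
    """Average MI between component partitions of two layers over a percentile grid."""
    n = H_layer.shape[0]
    edge_sets = (cosine_knn_edges(H_layer, k), cosine_knn_edges(H_final, k))
    values = []
    for q in percentiles:
        parts = []
        for rows, cols, dist in edge_sets:
            keep = dist <= np.percentile(dist, q)   # edges present at this scale
            parts.append(components(n, rows[keep], cols[keep]))
        values.append(mutual_information(parts[0], parts[1]))
    return float(np.mean(values))
```

Under these assumptions, a layer whose score is close to that of the final layer compared with itself already shares the final layer's cluster structure, which is the kind of redundancy a pruning criterion would look for.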

4. Entropy, Mutual Information, and RDA in SSL: Dynamics and Dimensionality

Recent work has connected RDA with quantitative information-theoretic analysis of self-supervised learning (SSL) representations (Kokilepersaud et al., 18 May 2025):

  • Dimensionality ($H(R)$): Estimated by the von Neumann or Rényi entropy of the normalized eigenvalue spectrum of the representation covariance.
  • Mutual Information ($I(R; Z)$): Closed-form or matrix Rényi estimator between the high-dimensional representation $R$ and the projected embedding $Z$.

Key findings:

  • Early Training: Increases in entropy $H(R)$, driven by feature decorrelation, are accompanied by increasing $I(R; Z)$.
  • Late Training: Increased uniformity (spread) in $R$ yields further entropy gains but causes $I(R; Z)$ to plateau or decrease, due to information bottleneck effects in the projection to $Z$.
  • Performance Manifold: The best-performing SSL models settle at intermediate $H(R)$ and $I(R; Z)$ values; neither maximizing entropy nor minimizing mutual information alone suffices.
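
Both quantities can be estimated from a batch of features. The sketch below computes the von Neumann entropy of the trace-normalized covariance spectrum, its exponential (the effective rank), and a matrix-based Rényi mutual information built from Gaussian Gram matrices. The kernel, its bandwidth `sigma`, the order `alpha`, and the function names are assumptions made for illustration; the estimator settings used in the cited work may differ.

```python
import numpy as np

def spectral_entropy(R, eps=1e-12):
    """Von Neumann entropy (nats) of the normalized covariance spectrum of R (n, dim)."""
    Rc = R - R.mean(axis=0)
    evals = np.linalg.eigvalsh(Rc.T @ Rc / (R.shape[0] - 1))
    p = np.clip(evals, 0.0, None)
    p = p / p.sum()
    p = p[p > eps]                               # drop numerically zero modes
    return float(-(p * np.log(p)).sum())

def effective_rank(R):
    """Exponential of the spectral entropy; equals dim for an isotropic spectrum."""
    return float(np.exp(spectral_entropy(R)))

def _rbf_gram(X, sigma):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))      # unit diagonal

def _renyi_entropy(A, alpha):
    """Matrix-based Renyi entropy (nats) of a unit-trace PSD matrix, alpha != 1."""
    evals = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    return float(np.log((evals ** alpha).sum()) / (1.0 - alpha))

def matrix_renyi_mi(R, Z, alpha=2.0, sigma=1.0):
    """Matrix-based Renyi mutual information between paired samples R and Z."""
    n = R.shape[0]
    K_r, K_z = _rbf_gram(R, sigma), _rbf_gram(Z, sigma)
    A, B = K_r / n, K_z / n                      # unit-trace Gram matrices
    AB = (K_r * K_z) / n                         # normalized Hadamard product (joint)
    return _renyi_entropy(A, alpha) + _renyi_entropy(B, alpha) - _renyi_entropy(AB, alpha)

# Example: an isotropic batch attains the maximal effective rank (here, 4).
R_iso = np.vstack([np.eye(4), -np.eye(4)])
rank_iso = effective_rank(R_iso)
```

Tracking `effective_rank` and `matrix_renyi_mi` over training checkpoints gives the two curves whose early joint rise and late divergence are described in the findings above.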

These dynamics motivated AdaDim, an adaptive algorithm that interpolates between feature-decorrelating (e.g., VICReg covariance) and sample-uniformizing (e.g., InfoNCE) objectives according to the empirical effective rank of representations during training (Kokilepersaud et al., 18 May 2025).

5. Applications and Empirical Insights

RDA methods have enabled:

  • Visualization and Interpretation: Smooth temporal movies or layerwise trajectories revealing category separation, recurrence, and reorganization in both neural and artificial systems (Lin et al., 2019).
  • Model Selection: Unsupervised RDA criteria (e.g., the LRD diagnostics, including GFMI, and RTD) predict downstream accuracy and can guide pre-benchmarking evaluation (Barannikov et al., 2021, Jiang et al., 12 May 2026).
  • Inference-Time Layer Pruning: RDA identifies structurally redundant layers (GFMI achieves the lowest performance drop at 15–20% pruned budgets, outperforming both random and last-$k$ removals) (Jiang et al., 12 May 2026).
  • Optimization and Training Regimes: RDA timecourses reveal mechanistic phase transitions—e.g., cluster formation, subspace rotation, topology stabilization—that are invisible at the level of simple average similarity (Kokilepersaud et al., 18 May 2025).
  • Comparative Model Analysis: LRD signatures distinguish between encoder-/decoder-based embedders and base LLMs and explain architectural and task-level variation invisible at the final embedding layer (Jiang et al., 12 May 2026).

6. Methodological Comparison and Theoretical Considerations

The following table summarizes major RDA methodologies:

| Method family | Signal captured | Main strengths |
| --- | --- | --- |
| RDM+pMDS (Lin et al., 2019) | Pairwise geometry and its evolution over time | Category and trajectory visualization |
| RTD (Barannikov et al., 2021) | Multiscale topology (homology) | Topology-aware, correlates with function |
| LRD: Frenet/NRS/GFMI (Jiang et al., 12 May 2026) | Subspace, local, and graph alignment | Predicts accuracy, guides pruning |
| Entropy/Mutual Information (Kokilepersaud et al., 18 May 2025) | Global dimensionality and channel information | Dissects learning phases in SSL |

While classical methods such as CKA or SVCCA capture linear and kernel similarity, they miss topological phenomena detected by RTD and cannot track phasewise geometry revealed by LRD. RDA methodologies with topological and geometric sensitivity have been shown to strongly correlate with functional dissimilarity and to capture true manifold evolution during optimization (Barannikov et al., 2021).

7. Open Questions and Future Directions

Open lines of investigation include:

  • Robustness to hyperparameters: Sensitivity of RDA metrics to subsample size, choice of distance or graph type, and preprocessing in large models remains underexplored (Jiang et al., 12 May 2026).
  • Extension to new modalities: Application of RDA with mathematically coherent measures across vision, NLP, and speech is an emerging direction (Kokilepersaud et al., 18 May 2025, Jiang et al., 12 May 2026).
  • Relation to mechanistic/causal interpretability: Linking RDA trajectories, topological events, or phase transitions to circuit-level or conceptual units in neural/AI systems is unresolved.
  • Beyond one-to-one correspondence: RTD and related methods require matched samples; extending RDA to domains (e.g., cross-domain transfer) without alignment is an open challenge (Barannikov et al., 2021).
  • Task-specific diagnostic strategies: Refining RDA as a toolkit for early stopping, remodularization, or targeted fine-tuning via task-informed metric weighting (Jiang et al., 12 May 2026, Kokilepersaud et al., 18 May 2025).

RDA thus represents a mathematically grounded, empirically validated, and still rapidly evolving set of techniques for probing the dynamics of learned and biological representations.
