- The paper demonstrates model stitching as a robust tool for comparing neural representations, complementing traditional similarity metrics such as CKA.
- It introduces "stitching connectivity": independently trained networks, and even networks trained under different paradigms, learn intermediate representations compatible enough to be stitched together with little loss in accuracy.
- Empirical results show that representations from models trained with more data, width, or compute can be swapped into less well-resourced networks and improve their performance, with practical implications for transfer learning pipelines.
An Analysis of Model Stitching and Neural Representations
The paper "Revisiting Model Stitching to Compare Neural Representations" investigates the internal representations of neural networks through the lens of model stitching. It provides a framework for connecting the lower layers of one neural network to the upper layers of another, using a trainable layer in between. This methodology offers a unique perspective for exploring neural representations, supplementing traditional approaches such as Centered Kernel Alignment (CKA).
The authors present a thorough experimental study of model stitching, contrasting it with existing similarity measures and drawing conclusions about the nature of learned representations. The key contributions can be summarized as follows:
- Model Stitching as a Comparative Tool: The authors emphasize that model stitching reveals aspects of neural representations not captured by traditional representational similarity metrics like CKA. They argue that stitching provides an operational perspective: layers from different networks are "plugged in" to one another, directly testing the compatibility and interchangeability of the learned representations.
- Stitching Connectivity: The paper introduces the notion of "stitching connectivity": models initialized and trained independently with Stochastic Gradient Descent (SGD) can be stitched at various layers with little or no drop in performance, i.e., with near-zero "stitching penalty" (see the evaluation sketch after this list). This property suggests that networks trained under similar conditions converge to compatible representations, even when their individual weights differ.
- Exploring Training Variations: The authors assess different training paradigms, such as supervised versus self-supervised learning, and find that despite the differing objectives, networks of the same architecture tend to learn representations that can be stitched effectively. Their empirical results indicate that networks trained with different objectives maintain compatible intermediate representations, allowing them to be interchanged with little loss.
- Scaling Effects and Representation Quality: A further dimension explored is the effect of scale: larger datasets, increased network width, and longer training. The findings show that representations learned in these richer training setups can be swapped into networks trained with fewer resources, often improving their performance. This gives operational meaning to the widely held intuition that "more is better" when it comes to data and compute.
- Theoretical and Practical Implications: The findings have both theoretical and practical ramifications. Theoretically, they inform our understanding of layer-wise learning dynamics and the modularity of representations. Practically, they suggest that model stitching could streamline transfer learning pipelines by reusing and recombining trained subcomponents across tasks and models.
- Methodological Insights: A useful methodological contrast is drawn between stitching, which tests task-relevant compatibility up to a learned (typically affine) transformation, and more abstract statistical similarity measures like CKA; a minimal CKA sketch follows this list. Model stitching emerges as a task-sensitive approach whose verdicts align more closely with practical performance impacts.
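Stitching connectivity can be checked operationally by comparing the stitched model's test accuracy against that of the intact network. The sketch below assumes hypothetical `net_b`, `stitched_model`, and `test_loader` objects:

```python
import torch

@torch.no_grad()
def accuracy(model: torch.nn.Module, loader, device: str = "cpu") -> float:
    """Top-1 accuracy of `model` over a test DataLoader."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

# The stitching penalty is the accuracy lost by swapping network A's lower
# layers into network B (after the stitching layer has been trained):
# penalty = accuracy(net_b, test_loader) - accuracy(stitched_model, test_loader)
# Stitching connectivity corresponds to this penalty being near zero; a
# negative penalty means the swapped-in representations actually help.
```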
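For contrast with stitching's task-sensitive verdicts, linear CKA is computed directly from two activation matrices, with no reference to the task. A minimal NumPy implementation of the standard linear-CKA formula (Kornblith et al., 2019):

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two representations of the same n examples.

    X has shape (n, d1) and Y has shape (n, d2); rows are examples.
    """
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    # With linear kernels, CKA reduces to norms of (cross-)covariance terms:
    #   CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, ord="fro")
                    * np.linalg.norm(Y.T @ Y, ord="fro"))
```

A high CKA score and a low stitching penalty often agree, but they can diverge, since CKA ignores whether a downstream network can actually use the representation.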
In conclusion, "Revisiting Model Stitching to Compare Neural Representations" expands our understanding of neural representations and training dynamics. By establishing model stitching not just as a conceptual novelty but as a robust analytical tool, the authors provide a means to compare and analyze deep learning architectures in ways that could influence future research and application design. The paper invites further investigation into how different training setups and architectures might exploit the inherent connectivity of neural representations.