Git Re-Basin: Canonical Model & Code Merging

Updated 11 January 2026
  • Git Re-Basin is a canonicalization technique for aligning neural network weights and software patches by algorithmically transforming them into a unified form.
  • It employs activation matching, weight matching, and straight-through estimators to optimize alignment and achieve near-zero loss barriers in model interpolation.
  • The method extends to refactoring-aware patch integration in version control, addressing structural discrepancies and enhancing merging accuracy.

Git Re-Basin refers to a canonicalization methodology applied to weight matching in deep neural networks and to refactoring-aware patch integration in software version control systems. Across domains, its unifying purpose is to reconcile structurally divergent entities—be they neural network models or software repositories—by algorithmically realigning them into a shared "basin" or canonical form that facilitates seamless merging, transfer, or generative modeling. The principal mechanisms rely on permutation symmetry, assignment algorithms, and structural transformation pipelines tailored to the underlying data modality, either weights or abstract syntax trees.

1. Permutation Symmetry and Model Canonicalization

Permutation symmetry in feedforward neural networks arises from the fact that permuting the order of hidden units, with corresponding adaptations to the adjacent weight matrices, leaves the network's functional output invariant. In mathematical terms, for an $L$-layer MLP with weights $W_e \in \mathbb{R}^{d_{e+1} \times d_e}$, applying a permutation matrix $P \in \{0,1\}^{d_{e+1} \times d_{e+1}}$ to the outputs of layer $e$ and $P^\top$ to the inputs of layer $e+1$ yields functionally equivalent weights:

$$z_{e+1} = \sigma(W_e z_e + b_e) \implies P z_{e+1} = \sigma(P W_e z_e + P b_e) = \sigma(W'_e z_e + b'_e)$$

where $W'_e = P W_e$, $b'_e = P b_e$, and $W'_{e+1} = W_{e+1} P^\top$ (Gupta et al., 8 Jan 2026, Ainsworth et al., 2022). This invariance leads to multi-modality in weight space: multiple networks differing only by permutations are equivalent under the loss function but encoded at distant points in parameter space.
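
A minimal NumPy sketch of this invariance for a two-layer ReLU MLP (illustrative only; the layer sizes and random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3
relu = lambda v: np.maximum(v, 0)

# Two-layer MLP: y = W1 @ relu(W0 x + b0) + b1
W0, b0 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W1, b1 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)
x = rng.normal(size=d_in)
y = W1 @ relu(W0 @ x + b0) + b1

# Permute the hidden units: W0' = P W0, b0' = P b0, W1' = W1 P^T
P = np.eye(d_hidden)[rng.permutation(d_hidden)]
y_perm = (W1 @ P.T) @ relu(P @ W0 @ x + P @ b0) + b1

assert np.allclose(y, y_perm)  # the permuted network computes the same function
```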

Git Re-Basin is thus introduced as a weight-matching technique to align multiple trained models to a single reference, breaking permutation-induced multi-modality and producing a unified canonical representation. The canonicalization objective for $L$ layers seeks permutations $\{P_1,\dots,P_L\}$ maximizing the summed inner product over layers:

$$\arg\max_{P_1,\dots,P_L} \sum_{e=1}^{L} \langle W^*_e,\, P_e W_e P_{e-1}^\top \rangle$$

subject to each $P_e$ being a permutation matrix; permutations acting on the network's input and output dimensions are fixed to the identity so that the model's interface is preserved. This is a Sum-of-Bilinear-Assignment Problem (SOBLAP), which is NP-hard in general (Gupta et al., 8 Jan 2026, Ainsworth et al., 2022).

2. Assignment Algorithms and Practical Canonicalization

Three principal algorithms operationalize Git Re-Basin for neural networks (Ainsworth et al., 2022):

  • Activation Matching ("data-aware LAP"): Collect layer-wise activations across $n$ data examples from the reference and candidate models, then solve the layer-wise linear assignment problem via the Hungarian or Jonker–Volgenant algorithms. Each permutation $P_\ell$ aligns units by either maximizing activation similarity or minimizing the Frobenius norm of the difference.
  • Weight Matching ("coordinate-descent SOBLAP"): Directly matches the weights by iteratively solving for $P_\ell$ with a block-coordinate descent heuristic. Each update is a linear assignment on a cost matrix derived from inner products with the adjacent layers, and the alignment objective improves monotonically until convergence.
  • Straight-through Estimator (STE): Uses a continuous parameterization of permutation to minimize the midpoint loss along the interpolation path, applying projection in the forward pass but identity in the backward pass to facilitate gradient computation.

The weight-matching approach is both data-agnostic and highly efficient, computing alignments that establish linear mode connectivity in seconds even for large architectures. Activation matching requires a pass over data, while STE yields the lowest interpolation barrier at greater computational cost.
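
The following sketch illustrates the weight-matching coordinate descent for a plain MLP, using SciPy's linear assignment solver. It is a simplified reading of the published algorithm (biases, convolutions, and residual structure omitted), not the reference implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def weight_matching(ref_ws, cand_ws, n_iters=50):
    """Align cand_ws to ref_ws by permuting hidden units.

    ref_ws, cand_ws: lists of L matrices; ws[l] has shape (d_{l+1}, d_l).
    Returns one index permutation per hidden layer (L - 1 in total).
    """
    L = len(cand_ws)
    perms = [np.arange(w.shape[0]) for w in cand_ws[:-1]]   # identity init

    for _ in range(n_iters):
        changed = False
        for l in range(L - 1):                               # one hidden layer at a time
            # cost[i, j]: agreement if reference unit i is matched to candidate unit j,
            # summing incoming (layer l) and outgoing (layer l + 1) weight alignment.
            w_in = cand_ws[l][:, perms[l - 1]] if l > 0 else cand_ws[l]
            w_out = cand_ws[l + 1][perms[l + 1], :] if l < L - 2 else cand_ws[l + 1]
            cost = ref_ws[l] @ w_in.T + ref_ws[l + 1].T @ w_out
            _, new_perm = linear_sum_assignment(cost, maximize=True)
            changed |= not np.array_equal(new_perm, perms[l])
            perms[l] = new_perm
        if not changed:                                      # fixed point reached
            break
    return perms
```

After matching, the candidate's layer $\ell$ weights are re-indexed as cand_ws[l][perms[l], :][:, perms[l-1]] (identity at the input and output) before interpolation or averaging.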

3. Empirical Outcomes and Theoretical Insights

Experiments across MLPs, VGG-16, and ResNet architectures demonstrate that:

  • Naïve linear interpolation between independently trained models produces substantial loss barriers.
  • Weight- and activation-matched models exhibit nearly zero loss interpolants, and in some cases, midpoint networks outperform endpoints.
  • Wider networks (e.g., ResNet variants) achieve near-zero barrier connectivity after re-basin, while under-parameterized and early-trained models remain resistant.
  • In federated learning and model ensembling, matching enables merging and post-hoc calibration with minimal inference cost (Ainsworth et al., 2022).

The theoretical conjecture posits that stochastic gradient descent (SGD)-trained solutions form a set $\mathcal{S}$ such that any two can be permutation-aligned for barrier-free interpolation. Counterexamples indicate this property hinges on architecture and optimization bias, not on permutation symmetry alone.
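
The loss barrier referenced above is typically measured along the straight line between two (aligned) weight vectors; a minimal sketch, assuming a user-supplied loss_fn evaluated on held-out data:

```python
import numpy as np

def interpolation_barrier(loss_fn, params_a, params_b, n_points=25):
    """Largest excess of the interpolated loss over the linear baseline.

    loss_fn: maps a flat parameter vector to a scalar loss.
    A value near zero indicates linear mode connectivity between the endpoints.
    """
    lambdas = np.linspace(0.0, 1.0, n_points)
    losses = np.array([loss_fn((1 - t) * params_a + t * params_b) for t in lambdas])
    baseline = (1 - lambdas) * losses[0] + lambdas * losses[-1]
    return float(np.max(losses - baseline))
```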

4. Git Re-Basin in Generative Weight Modeling

Within generative models for neural network weights, e.g., Flow Matching in DeepWeightFlow (Gupta et al., 8 Jan 2026), Git Re-Basin plays a central role in reducing the complexity of the weight space. Pre-aligning training models to a common basin enables the generative model to learn a smooth mapping, bypassing modal jumps caused by permutation symmetries. Empirically, canonicalizing a training set of 100 ResNet-18 models (11M parameters each) with Git Re-Basin takes approximately 2 minutes. For moderate-capacity generative models:

  • Generated accuracies with Re-Basin: Iris (MLP): 91.87 ± 2.23%, MNIST (MLP): 96.19 ± 0.27%.
  • Generated accuracies without Re-Basin: Iris (MLP): 90.80 ± 4.86%, MNIST (MLP): 91.74 ± 10.37%.

The canonicalization facilitates scaling to larger models (on the order of 100M parameters) by yielding a smoother training distribution for the generative model.
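
As an illustration, the pre-alignment step amounts to matching every checkpoint against a fixed reference before fitting the generative model. The loop below reuses the weight_matching sketch from Section 2; permute_model and the checkpoints list are assumptions introduced for this example:

```python
import numpy as np

def permute_model(ws, perms):
    """Apply hidden-unit permutations to an MLP weight list (input/output fixed)."""
    L = len(ws)
    out = []
    for l, w in enumerate(ws):
        rows = perms[l] if l < L - 1 else np.arange(w.shape[0])
        cols = perms[l - 1] if l > 0 else np.arange(w.shape[1])
        out.append(w[rows][:, cols])
    return out

# checkpoints: assumed list of trained MLP weight lists (one per model).
# Canonicalize them into the reference model's basin before training a
# flow-matching model over their (flattened) weights.
reference = checkpoints[0]
canonical = [reference] + [
    permute_model(ckpt, weight_matching(reference, ckpt)) for ckpt in checkpoints[1:]
]
```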

5. Refactoring-Aware Rebasin Mechanisms in Software Repositories

Git Re-Basin methodologies have also been extended to version control, specifically for integrating patches across long-lived, structurally divergent forks (Ogenrwot et al., 8 Aug 2025). Structural drift, primarily from independent refactorings, invalidates line-based correspondence, causing git cherry-pick or rebase to fail in 64.4% of studied cases.

The refactoring-aware rebasin mechanism, as realized in RePatch, models refactorings as invertible transformations on program abstract syntax trees (ASTs). The integration algorithm follows an invert-apply-replay pipeline (sketched in code after the list):

  • Detection: Use RefactoringMiner to identify refactorings in both patch and target history.
  • Inversion: Rewind the target repository and patch context by inverting detected refactorings.
  • Application: Apply the de-refactored patch in the structurally aligned context.
  • Replay: Restore the target variant’s structure by replaying refactorings post-patch.
  • Commit: Add the merged changes to the target repository.
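
A schematic sketch of the pipeline's control flow, parameterized over the underlying tooling; the callables and the repository/patch attributes below are hypothetical placeholders (e.g. a wrapper around RefactoringMiner for detection), not RePatch's actual API:

```python
from typing import Callable

def integrate_patch(target_repo, patch,
                    detect: Callable,        # history -> detected refactorings
                    invert: Callable,        # (tree or patch, refactorings) -> de-refactored form
                    apply_patch: Callable,   # (tree, patch) -> patched tree
                    replay: Callable,        # (tree, refactorings) -> re-refactored tree
                    commit: Callable):       # tree -> new commit in the target repo
    """Invert-apply-replay integration of a patch across divergent forks."""
    # 1. Detection: refactorings applied independently on each side
    patch_refs = detect(patch.source_history)
    target_refs = detect(target_repo.history)

    # 2. Inversion: rewind both sides into a structurally aligned context
    aligned_repo = invert(target_repo.tree, target_refs)
    aligned_patch = invert(patch, patch_refs)

    # 3. Application: the de-refactored patch matches the aligned ASTs
    patched_tree = apply_patch(aligned_repo, aligned_patch)

    # 4. Replay: restore the target variant's own structure on top of the fix
    restored_tree = replay(patched_tree, target_refs)

    # 5. Commit: record the merged result in the target repository
    return commit(restored_tree)
```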

Across 478 bug-fix pull requests, RePatch successfully resolved 52.8% of patches that failed via vanilla cherry-pick, mainly by mitigating misalignments due to renames, moves, and parameter changes.

6. Re-Basin Operations in Versioned Agent Contexts

In LLM-based agent systems, versioned memory hierarchies can also benefit from rebasin operations (Wu, 30 Jul 2025). Here, agent memory is structured as a Git-like file-system hierarchy (.GCC/), with explicit operations:

  • COMMIT: Snapshot current workspace and metadata.
  • BRANCH: Diverge from a commit to isolate experimental plans.
  • MERGE: Perform three-way merges after lowest-common-ancestor identification.
  • CONTEXT: Stream back summaries at requested granularity.

The Re-Basin operator in GCC formally takes the sequence of commits from a branch and replays them onto a new base, producing a new branch with updated snapshots and conflict resolution during application. Complexity is linear in commit count and patch size, and in typical agent workflow scenarios, rebasin executes in under one second.
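
A toy sketch of such a replay operator over an in-memory commit model; the Commit structure, diff representation, and conflict hook below are assumptions for illustration and do not reflect GCC's actual data structures:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Commit:
    snapshot: Dict[str, str]                       # workspace path -> contents
    parent: Optional["Commit"] = None
    meta: Dict[str, str] = field(default_factory=dict)

def diff(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Per-file patch: new contents, or None for a deletion."""
    patch = {k: v for k, v in new.items() if old.get(k) != v}
    patch.update({k: None for k in old if k not in new})
    return patch

def rebasin(branch: List[Commit], new_base: Commit,
            resolve: Callable[[str, str, str], str] = lambda path, ours, theirs: theirs) -> List[Commit]:
    """Replay a branch's commits onto a new base; linear in commit count and patch size."""
    replayed, tip = [], new_base
    for c in branch:
        parent_snap = c.parent.snapshot if c.parent else {}
        patch = diff(parent_snap, c.snapshot)
        snap = dict(tip.snapshot)
        for path, contents in patch.items():
            if contents is None:
                snap.pop(path, None)
            elif path in snap and snap[path] != parent_snap.get(path):
                snap[path] = resolve(path, snap[path], contents)   # conflict resolution hook
            else:
                snap[path] = contents
        tip = Commit(snapshot=snap, parent=tip, meta=dict(c.meta))
        replayed.append(tip)
    return replayed
```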

7. Limitations, Integration Guidelines, and Future Directions

Known constraints in Git Re-Basin for neural network matching include the requirement for consistent layer widths, inapplicability to non-hidden-unit symmetries, and observed failures for under-parameterized or early-training architectures (Ainsworth et al., 2022). In refactoring-aware patching, accuracy depends on refactoring detection tool precision and recall, and current implementations are language-specific (Java), with possible extensions to other languages or LLM-based patch adaptation.

Recommended practice includes using weight matching for speed and reliability in neural network re-basin, recalculating normalization statistics after merging, and integrating refactoring-aware mechanisms into git rebase workflows (e.g., a git rebase --re-basin option or custom Gerrit/GitLab CI hooks).
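
For the normalization-statistics step, a common approach is to reset BatchNorm running statistics in the merged model and refresh them with one forward pass over data; a minimal PyTorch sketch, assuming a standard DataLoader of (input, label) batches:

```python
import torch

@torch.no_grad()
def refresh_batchnorm_stats(model: torch.nn.Module, loader, device: str = "cpu") -> None:
    """Recompute BatchNorm running mean/var after weight interpolation or averaging.

    Merged weights shift the activation distributions, so the endpoints' stale
    running statistics no longer apply.
    """
    for module in model.modules():
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            module.reset_running_stats()
    model.to(device).train()          # train mode so BN accumulates fresh statistics
    for inputs, _ in loader:
        model(inputs.to(device))
    model.eval()
```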

Git Re-Basin thus furnishes a unifying technical framework enabling robust, permutation-invariant model merging, efficient generative modeling in weight space, and semantically aware patch transfer in software engineering. Its adoption addresses major impediments in model and codebase integration, with empirical evidence of substantial improvements in merging accuracy, calibration, and scalability (Gupta et al., 8 Jan 2026, Ainsworth et al., 2022, Ogenrwot et al., 8 Aug 2025, Wu, 30 Jul 2025).
