
FlowBack: All-Atom Protein Backmapping

Updated 8 August 2025
  • FlowBack is a deep generative framework that converts Cα backbone traces into chemically detailed all-atom protein structures using conditional flow-matching.
  • It employs a six-layer equivariant graph neural network trained by drift-field matching under an $L_1$ loss, achieving high-fidelity atomistic reconstruction, though minor bond-length errors persist in a small fraction of samples.
  • FlowBack-Adjoint enhances the initial model with physics-aware, energy-guided post-training corrections that dramatically reduce bond-length errors and atomic clashes for MD-ready ensembles.

FlowBack is a deep generative framework for all-atom (AA) protein backmapping from coarse-grained (CG) traces, specifically Cα-backbone representations, to chemically detailed atomistic structures. The architecture is based on conditional flow-matching, where a continuous equivariant vector field transforms CG configurations to atomistic ensembles. FlowBack-Adjoint is an enhancement that applies a physics-aware and energy-guided post-training pass to a pre-trained FlowBack model, incorporating molecular mechanics energy gradients via adjoint matching, along with auxiliary correction fields, to produce lower-energy, physically plausible AA reconstructions capable of direct use in molecular dynamics simulations.

1. FlowBack Architecture and Training Procedure

FlowBack utilizes a conditional flow-matching framework, learning a continuous, equivariant vector field $v_\gamma(x_t, t)$ that evolves initial random AA configurations (sampled around each Cα via a tractable prior) toward the target distribution of AA structures conditioned on the CG backbone. The model is implemented as a six-layer Equivariant Graph Neural Network (EGNN) that operates on combined Cα and AA atomic positions.
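As a concrete (and deliberately simplified) illustration of the equivariant message passing such a model performs, the PyTorch sketch below implements one EGNN-style layer in which invariant node features gate coordinate updates along relative displacement vectors; the layer widths and feature choices are illustrative assumptions, not FlowBack's published hyperparameters.

```python
import torch
import torch.nn as nn

class EGNNLayer(nn.Module):
    """One E(3)-equivariant message-passing layer (Satorras-style sketch).

    Node features h are invariant; coordinates x are updated only along
    relative displacement vectors, which preserves rotation/translation
    equivariance. Widths are illustrative, not FlowBack's hyperparameters.
    """
    def __init__(self, h_dim: int = 64):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * h_dim + 1, h_dim), nn.SiLU(),
            nn.Linear(h_dim, h_dim), nn.SiLU(),
        )
        self.coord_mlp = nn.Sequential(
            nn.Linear(h_dim, h_dim), nn.SiLU(), nn.Linear(h_dim, 1),
        )
        self.node_mlp = nn.Sequential(
            nn.Linear(2 * h_dim, h_dim), nn.SiLU(), nn.Linear(h_dim, h_dim),
        )

    def forward(self, h, x, edge_index):
        src, dst = edge_index                    # (E,), (E,) atom indices
        rel = x[src] - x[dst]                    # relative vectors (E, 3)
        d2 = (rel ** 2).sum(-1, keepdim=True)    # invariant squared distance
        m = self.edge_mlp(torch.cat([h[src], h[dst], d2], dim=-1))
        # Equivariant coordinate update: scalar gate times relative vector.
        dx = torch.zeros_like(x).index_add_(0, dst, rel * self.coord_mlp(m))
        # Invariant feature update: aggregate incoming messages per node.
        agg = torch.zeros_like(h).index_add_(0, dst, m)
        h = h + self.node_mlp(torch.cat([h, agg], dim=-1))
        return h, x + dx
```

Stacking six such layers over the combined Cα/AA point cloud, with time and conditioning information folded into the node features, would yield an equivariant vector field of the general form described above.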

The key training step is drift-field matching: for each training pair (Cα trace, AA configuration) and noise time $t$, the model learns to reproduce the optimal drift by minimizing an $L_1$ loss between $v_\gamma(x_t, t)$ and the true step from the perturbed configuration to the reference AA ensemble. The generative flow is a deterministic ODE,

$$\frac{dx_t}{dt} = v_\gamma(x_t, t), \qquad x_0 \sim \mathcal{N}(0, \sigma_p^2 I)$$

where initial positions $x_0$ are sampled from a Gaussian and $t \in [0, 1]$ parameterizes the flow from random to fully reconstructed atomistic states.
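A minimal sketch of what one such drift-matching training step could look like, assuming a linear interpolant between prior and data samples (a common choice in flow matching; the interpolant, prior width, and model signature below are assumptions, not the paper's exact recipe):

```python
import torch

def drift_matching_loss(model, ca_cond, x1, sigma_p=0.1):
    """One drift-field matching step (sketch).

    ca_cond: (N, 3) Calpha position assigned to each heavy atom; the prior
    is a Gaussian centered on it. x1: (N, 3) reference AA coordinates.
    The linear interpolant and sigma_p value are illustrative assumptions.
    """
    x0 = ca_cond + sigma_p * torch.randn_like(x1)  # prior sample near Calpha
    t = torch.rand(())                             # noise time t ~ U(0, 1)
    xt = (1.0 - t) * x0 + t * x1                   # perturbed configuration
    target = x1 - x0                               # optimal drift for this pair
    v = model(xt, t, ca_cond)                      # v_gamma(x_t, t | CG trace)
    return (v - target).abs().mean()               # L1 drift-matching loss
```

At inference time, the learned field is integrated from $t = 0$ to $t = 1$ (e.g., with a fixed-step Euler scheme) to transport prior samples onto the conditional atomistic ensemble.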

FlowBack achieves state-of-the-art Cα-to-AA backmapping performance—producing diverse, chemically plausible ensembles—but, absent explicit physical supervision, residual errors in bond lengths and atomic overlaps persist in a small fraction of samples.

2. Energy-Driven Enhancement: FlowBack-Adjoint

FlowBack-Adjoint upgrades a pre-trained FlowBack model with a one-pass, physics-aware post-training procedure. This method adds both structure-based and energy-based velocity corrections to the generative flow:

  • Auxiliary Drift Fields:
    • Chirality correction: Enforces the natural L-enantiomeric configuration at each residue's Cα stereocenter.
    • Bond-length regularization: Introduces harmonic forces to maintain covalent bond lengths near equilibrium.
    • Lennard-Jones repulsion: Suppresses atomic clashes by adding repulsive pairwise corrections for heavy atoms below a threshold separation.
    • These fields are "gated" to activate predominantly at late integration times, allowing the generative process to first reach structural diversity before imposing energetic realism (a schematic composition of these corrections appears in the sketch after this list).
  • Adjoint Matching:

    • A differentiable molecular mechanics force field (CHARMM27) defines an energy $U(x)$ for every configuration. The adjoint matching procedure uses a reward

    $$R(x) = -\lambda U(x)$$

    and integrates a lean adjoint ODE:

    $$\frac{d a_t}{dt} = -a_t^\top \nabla_x \left( 2 v_\theta(x_t, t) - \frac{1}{t} x_t \right), \qquad a_1 = -\nabla_x R(x_1)$$

    The adjusted drift is

    $$v_\varphi(x_t, t) = v_\theta(x_t, t) - \frac{\sigma_t^2}{2} a_t$$

    The adjoint loss ensures the flow matches the energy-guided trajectory:

    $$\mathcal{L}_{\text{adj}} = \sigma_p^2 \sum_t \left\| \frac{2}{\sigma_t} \left( v_\varphi - v_\theta \right) + \sigma_t a_t \right\|^2$$

    This aligns the generated AA ensemble with low-energy regions in the molecular force field without retraining the original neural network.
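The PyTorch sketch below shows how the pieces above could be composed into the adjusted drift $v_\varphi$: gated harmonic bond and short-range repulsive corrections added to the base velocity, plus the adjoint term $-\tfrac{\sigma_t^2}{2} a_t$. The gate shape, force constants, cutoff, and omission of the chirality term are illustrative assumptions; the adjoint state $a_t$ is presumed to come from integrating the lean adjoint ODE backward from $a_1 = -\nabla_x R(x_1)$ (e.g., via vector-Jacobian products).

```python
import torch

def corrected_drift(v_theta, x, t, a_t, bonds, r0,
                    k_bond=100.0, lj_cut=2.5, eps=1.0,
                    sigma_t2=1.0, gate_power=4.0):
    """Compose v_phi = v_theta + gate(t)*(bond + repulsion) - sigma_t^2/2 * a_t.

    bonds: (B, 2) indices of covalently bonded heavy atoms; r0: (B,)
    equilibrium bond lengths. Constants and the t**p gate are illustrative;
    the chirality correction is omitted for brevity.
    """
    gate = t ** gate_power                          # ~0 early, ~1 near t = 1
    # Harmonic bond-length regularization: restoring force along each bond.
    rij = x[bonds[:, 0]] - x[bonds[:, 1]]
    d = rij.norm(dim=-1, keepdim=True)
    f = -k_bond * (d - r0[:, None]) * rij / d
    v_bond = torch.zeros_like(x)
    v_bond.index_add_(0, bonds[:, 0], f)
    v_bond.index_add_(0, bonds[:, 1], -f)
    # Lennard-Jones-style repulsion for heavy-atom pairs below the cutoff
    # (O(N^2) for clarity; bonded pairs should be excluded in practice).
    diff = x[:, None, :] - x[None, :, :]
    dist = diff.norm(dim=-1).clamp_min(1e-6)
    mask = (dist < lj_cut) & ~torch.eye(len(x), dtype=torch.bool,
                                        device=x.device)
    rep = torch.where(mask, 12.0 * eps / dist**13, torch.zeros_like(dist))
    v_rep = (rep[..., None] * diff / dist[..., None]).sum(dim=1)
    # Energy-guided adjoint correction from adjoint matching.
    return v_theta + gate * (v_bond + v_rep) - 0.5 * sigma_t2 * a_t
```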

3. Performance Metrics and Benchmarks

Benchmarking demonstrates that FlowBack-Adjoint substantially improves upon the vanilla FlowBack model:

| Metric | FlowBack | FlowBack-Adjoint | Improvement |
|---|---|---|---|
| Median energy (kcal/mol·res.) | High (baseline) | ↓ ~78 | Lowered single-point energies |
| Bond-length error | Residual (few %) | >92% reduction | Near-perfect covalent geometries |
| Clash score | Up to a few % | >98% eliminated | Most structures clash-free |
| Diversity | High | Maintained | AA ensemble variability preserved |
| MD stability | Variable | Near 100% | Direct MD initialization, no minimization needed |

  • Bond quality scores reach 99–100%.
  • Clash scores are driven to nearly zero.
  • Trajectory stability: AA structures sampled from FlowBack-Adjoint can be immediately used as initial configurations in all-atom molecular dynamics, typically resulting in energy-conserving, stable evolutions.
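To make these metrics concrete, the NumPy sketch below computes a bond-length error rate and a simple clash score from reconstructed coordinates; the 0.1 Å tolerance and the 0.8 × (sum of van der Waals radii) clash criterion are assumed conventions, not necessarily the paper's exact definitions.

```python
import numpy as np

def bond_length_error(x, bonds, r0, tol=0.1):
    """Fraction of covalent bonds deviating more than tol (Angstrom)
    from their equilibrium lengths r0; tol is an assumed convention."""
    d = np.linalg.norm(x[bonds[:, 0]] - x[bonds[:, 1]], axis=-1)
    return float(np.mean(np.abs(d - r0) > tol))

def clash_score(x, radii, scale=0.8):
    """Fraction of heavy-atom pairs closer than scale * (r_i + r_j).
    Bonded pairs should be excluded in practice; criterion is illustrative."""
    d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    cutoff = scale * (radii[:, None] + radii[None, :])
    iu = np.triu_indices(len(x), k=1)          # unique pairs, no self-pairs
    return float(np.mean(d[iu] < cutoff[iu]))
```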

4. Technical Implementation Details

  • Noise schedule: To avoid numerical divergence at $t \to 0$, a memoryless schedule is used, $\sigma_t^2 = 2\eta_t$ with $\eta_t = (1 - t + \Delta t)/(t + \Delta t)$, where $\Delta t$ is a small discretization parameter (a quick numerical check follows this list).
  • Post-training pass: Only the velocity field is modified; there is no retraining on atomic structures—thus FlowBack-Adjoint is computationally efficient and modular.
  • Equivariant architecture: The use of an EGNN ensures all updates respect SE(3) symmetry, i.e., they are equivariant to rotations and translations.
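A small numerical check of the schedule as stated, showing how $\Delta t$ keeps $\sigma_t^2$ finite at $t = 0$ (the print formatting is illustrative):

```python
def sigma_t_squared(t, dt=1e-3):
    """Memoryless noise schedule: sigma_t^2 = 2 * eta_t with
    eta_t = (1 - t + dt) / (t + dt); dt tames the t -> 0 divergence."""
    return 2.0 * (1.0 - t + dt) / (t + dt)

for t in (0.0, 0.01, 0.5, 1.0):
    print(f"t={t:.2f}  sigma_t^2={sigma_t_squared(t):8.3f}")
# dt = 1e-3: sigma^2 ~ 2002 at t=0 (finite only because of dt), ~0.002 at t=1
```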

5. Applications and Implications

  • Enhancing Backbone-Only Predictions: Tools such as AlphaFold or BioEmu often output Cα-only or backbone models; FlowBack-Adjoint can recover full AA ensembles that are both structurally and energetically competent.
  • Seeding Atomistic Simulations: FlowBack-Adjoint structures can directly initialize MD (without further minimization), expediting conformational sampling and property prediction in computational protein studies.
  • Compatibility with CG Trajectories: FlowBack-Adjoint can backmap MARTINI or other CG time series, lifting entire CG trajectories to AA ensembles for kinetics and mechanism analysis.
  • Docking and Structure-Based Applications: The reductions in clashes and improved stereochemistry enable use in protein-ligand and protein-protein modeling, where physically plausible side chain orientation and lack of high-energy artifacts are critical.

A plausible implication is that post-trained energy-guided corrections such as those in FlowBack-Adjoint could be adapted to multiscale generative models in other domains where the alignment of learned distributions with known energy landscapes is essential.

6. Broader Context and Future Directions

The adjoint matching paradigm introduced in FlowBack-Adjoint bridges data-driven generative modeling and physics-based simulation. By injecting force-field aware corrections after neural network training, it avoids the need for end-to-end co-training or differentiable force field inclusion in the original model. Modular upgrades of this type could generalize to other generative molecular models, or to domains requiring energy or constraint-informed sampling (e.g., materials, small molecules, nucleic acids).

Extensions may include incorporating learned interatomic potentials in place of classical force fields for end-to-end differentiability, operating at varying CG/AA mapping schemes, and generalized use of adjoint-driven corrections in generative normalizing flows for physics-constrained generative modeling.


In summary, FlowBack provides a scalable, equivariant, and efficient framework for all-atom reconstruction from coarse-grained protein traces, and FlowBack-Adjoint upgrades this process via lightweight, physics-aware corrections, yielding MD-ready, low-energy AA ensembles. This enables new levels of physical fidelity for multiscale biomolecular modeling and broadens the applicability of neural generative models in structural biology and computational chemistry (Berlaga et al., 5 Aug 2025).

References

  1. Berlaga et al., 5 Aug 2025.