
ProxelGen

Updated 1 July 2025
  • ProxelGen is a generative modeling framework that represents protein structures as fixed-size 3D voxelized densities (proxels), a departure from conventional variable-size point cloud or atomistic representations.
  • This proxel representation enables novel generative tasks, including motif scaffolding through spatial inpainting and protein generation conditioned on arbitrary 3D shapes.
  • Empirical results demonstrate that ProxelGen achieves state-of-the-art sample quality (lower FID), high novelty, and competitive designability compared to existing protein generation models.

ProxelGen is a generative modeling framework for protein structures based on 3D voxelized density representations—termed "proxels"—instead of the prevalent 3D point cloud or atomistic coordinate forms. By casting proteins as fixed-size density grids, ProxelGen enables new generative tasks in protein design, including structure generation under flexible spatial constraints, motif scaffolding without explicit sequence mapping, and conditioning on arbitrary 3D shapes. Empirical benchmarks show that ProxelGen outperforms or matches contemporary state-of-the-art models in sample quality, novelty, and designability, demonstrating its efficacy for protein engineering and structure-function studies.

1. Model Architecture

ProxelGen consists of three main architectural stages that collectively generate protein structures as 3D densities and optionally recover full atomistic models:

  • 3D CNN-based Variational Autoencoder (VAE):

The encoder uses 3D convolutions to map high-dimensional proxel arrays $P \in \mathbb{R}^{C \times H \times W \times D}$ into a reduced-dimensional latent space $L \in \mathbb{R}^{c \times h \times w \times d}$. The decoder reconstructs proxel grids from latent representations. The reconstruction objective is:

$$\mathcal{L}_{AE} = \mathbb{E}_{L \sim E(P),\, P \sim \mu} \left[ \| D(L) - P \|_2^2 \right] + \beta \operatorname{KL}\left(E(P) \,\|\, \mathcal{N}(0, I)\right)$$

where $E$ and $D$ are the encoder and decoder, $\mu$ is the data distribution, and $\beta$ is a KL-divergence weight.

  • Latent Diffusion/Flow Model:

A linear stochastic interpolant in the latent space enables generative sampling. Using a 3D UNet with self-attention, the model learns to bridge between prior noise $Z$ and encoded latents $E(P)$ with a velocity predictor $s$:

$$\mathcal{L}_{\text{Flow}} = \mathbb{E}_{P \sim \mu,\, Z \sim \mathcal{N},\, t \sim \nu} \left[ \| s(t \cdot E(P) + (1-t) Z) - (E(P) - Z) \|^2 \right]$$

Timesteps $t$ are sampled from a biased distribution $\nu$.

  • Coordinate Decoder (Atomistic Flow Model):

The latent proxel representation is mapped back to atomic (or Cα/backbone) coordinates via an adapted version of the Proteina model. Spatial patches of the latent density grid are tokenized and embedded for atomistic prediction, supporting sequence design and assessment workflows.
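The two training objectives above (autoencoder and latent flow) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes the encoder outputs a diagonal Gaussian parameterized by `mu` and `logvar` (the source does not state the posterior form), the value of `beta` is arbitrary, and the function names `vae_loss` and `flow_loss` are hypothetical.

```python
import numpy as np

def vae_loss(P, P_recon, mu, logvar, beta=1e-3):
    """Squared-error reconstruction plus beta-weighted KL to N(0, I)."""
    recon = np.sum((P_recon - P) ** 2)
    # Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent entries.
    # Assumption: a diagonal-Gaussian encoder posterior.
    kl = 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar)
    return float(recon + beta * kl)

def flow_loss(velocity_fn, latent, noise, t):
    """Squared error between predicted and target velocity at time t."""
    x_t = t * latent + (1.0 - t) * noise  # linear interpolant between Z and E(P)
    target = latent - noise               # time derivative of the interpolant
    return float(np.sum((velocity_fn(x_t) - target) ** 2))

rng = np.random.default_rng(0)
P = rng.random((7, 8, 8, 8))              # toy proxel grid, C x H x W x D
mu = np.zeros((4, 2, 2, 2))               # toy latent mean, c x h x w x d
logvar = np.zeros_like(mu)
perfect = vae_loss(P, P, mu, logvar)      # exact reconstruction, posterior equals prior

latent = rng.normal(size=mu.shape)        # stands in for E(P)
noise = rng.normal(size=mu.shape)         # stands in for Z
oracle = lambda x_t: latent - noise       # a predictor that returns the true velocity
oracle_loss = flow_loss(oracle, latent, noise, t=0.3)
```

Both `perfect` and `oracle_loss` evaluate to zero, which is a quick sanity check that each loss vanishes exactly when its target is matched.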

The standard inference pipeline is: Latent sampling → VAE decoding to proxels → Atomistic decoding (optional).
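The first step of this pipeline, latent sampling, amounts to integrating the learned velocity field from noise (t = 0) toward data (t = 1). The sketch below uses plain Euler integration as an assumed solver; the source does not specify the integrator or step count. `euler_sample` is a hypothetical name, and `velocity_fn` takes only the current state, mirroring the flow loss as written above, although a practical model would likely also receive t.

```python
import numpy as np

def euler_sample(velocity_fn, noise, n_steps=50):
    """Integrate dx/dt = velocity_fn(x) from t = 0 (noise) to t = 1 with Euler steps."""
    x = noise.copy()
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        x = x + dt * velocity_fn(x)
    return x

rng = np.random.default_rng(1)
noise = rng.normal(size=(4, 2, 2, 2))     # prior sample Z in latent space
target = rng.normal(size=noise.shape)     # toy "data" latent

# Toy velocity field: the constant velocity of the straight line from noise to target.
straight_line = lambda x: target - noise
sample = euler_sample(straight_line, noise)
```

With this constant field the integration lands on `target`. In the full pipeline the resulting latent would then be passed to the VAE decoder and, optionally, the coordinate decoder.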

2. Voxelized Densities: Proxel Representation

ProxelGen represents protein structures as multi-channel, fixed-dimensional 3D grids, where each voxel (proxel) encodes localized chemical or geometric information:

  • Backbone Channels (3):

Gaussian-smeared densities for carbonyl (C), alpha-carbon (Cα), and nitrogen (N) backbone atoms:

$$f(x) = \sum_{i=1}^N \delta_{x_i}(x), \qquad g(x) = (G * f)(x) = \sum_{i=1}^N G(x - x_i)$$

Proxels are generated by sampling $g(x)$ onto a regular grid.

  • Bond Density Channel (1):

Density is placed at the midpoints between consecutive backbone atoms, then convolved with a Gaussian kernel.

  • Chain Flow Vector Channels (3):

Each grid point aggregates vectors aligned along the backbone from N to C terminus, weighted by backbone-proximal density:

$$v(x) = \sum_{i=1}^N G(x - x_i) \cdot v_i$$

where $v_i = x_{i+1} - x_i$.
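The channel definitions above can be put together into a small voxelization sketch. Several details are assumptions, since the source does not give them: an unnormalized isotropic Gaussian for G, the grid size, voxel spacing, and sigma values, the channel ordering, and the choice to take the chain-flow vectors between consecutive backbone atoms (N, Cα, C, N, ...). The function names are hypothetical.

```python
import numpy as np

def grid_points(size, voxel):
    """Centers of a size^3 grid, centered on the origin -> array of shape (S, S, S, 3)."""
    axis = (np.arange(size) - (size - 1) / 2.0) * voxel
    return np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

def gaussian_weights(points, centers, sigma):
    """G(x - x_i) for every grid point x and every center x_i -> (S, S, S, N)."""
    sq_dist = np.sum((points[..., None, :] - centers) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def build_proxels(n_xyz, ca_xyz, c_xyz, size=16, voxel=1.5, sigma=1.0):
    pts = grid_points(size, voxel)
    # Backbone channels: one Gaussian-smeared density per atom type (C, CA, N).
    backbone = [gaussian_weights(pts, a, sigma).sum(-1) for a in (c_xyz, ca_xyz, n_xyz)]
    # Interleave atoms in chain order: N1, CA1, C1, N2, CA2, C2, ...
    chain = np.stack([n_xyz, ca_xyz, c_xyz], axis=1).reshape(-1, 3)
    # Bond channel: density at midpoints between consecutive backbone atoms.
    midpoints = 0.5 * (chain[:-1] + chain[1:])
    bond = gaussian_weights(pts, midpoints, sigma).sum(-1)
    # Chain-flow channels: v(x) = sum_i G(x - x_i) v_i with v_i = x_{i+1} - x_i.
    # The last atom has no successor, so it contributes no vector.
    v = chain[1:] - chain[:-1]
    w = gaussian_weights(pts, chain[:-1], sigma)
    flow = np.einsum("xyzm,mc->cxyz", w, v)
    return np.concatenate([np.stack(backbone), bond[None], flow], axis=0)

# Toy 5-residue backbone laid out along the x axis (coordinates are made up).
ca_xyz = np.arange(5)[:, None] * np.array([3.8, 0.0, 0.0]) - np.array([7.6, 0.0, 0.0])
n_xyz = ca_xyz + np.array([-1.2, 0.5, 0.0])
c_xyz = ca_xyz + np.array([1.2, 0.5, 0.0])

proxels = build_proxels(n_xyz, ca_xyz, c_xyz)
print(proxels.shape)  # (7, 16, 16, 16)
```

Calling `build_proxels` on only the first three residues returns an array of the same shape, which illustrates the fixed-size property regardless of chain length.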

Distinction from Conventional Point Clouds:

Point clouds list discrete atomic coordinates and are variable in size. Proxels, by contrast, are fixed-size, multi-channel 3D arrays—facilitating the use of 3D CNNs and enabling model architectures independent of sequence length or atom count.

Methodological Advantages:

  • Locality and translation invariance, exploitable by modern CNNs.
  • Input and output size independence accommodates proteins of varying lengths within a uniform generative schema.
  • Supports spatial operations (cropping, masking, inpainting, shape constraints) with ease.
  • Flexibility in conditioning: arbitrary spatial masks, surface shapes, or motif densities.

3. Sample Quality, Designability, and Diversity Metrics

ProxelGen is systematically evaluated on protein sample realism, novelty, and utility for downstream design through several quantitative metrics:

| Metric | ProxelGen | Proteina (best) | Notable Comparison |
|---|---|---|---|
| FID (lower is better) | 6.05 | 7.25 | Proteina low-T samples: FID > 16 |
| Novelty (TMScore, lower is better) | 0.73 | 0.74–0.88 | Lower is more novel |
| Designability | 53.13 | ≈ training set | High-novelty Proteina: much lower |

  • Fréchet Inception Distance (FID):

Computed on proxel grids using ProxCLR (SimCLR-trained 3D ResNet) representations.

  • Novelty:

Maximum TMScore between a generated sample and any training-set structure, i.e., the similarity to its closest training match. Lower values indicate greater novelty.

  • Designability:

Number of generated structures that tolerate blind sequence design/refolding (ProteinMPNN + ESMFold).

  • Diversity and Higher-Order Contacts:

Number of unique structural clusters (Foldseek) and prevalence of nonlocal residue contacts.

  • Secondary Structure Content:

Helix and sheet fractions among sampled proteins.
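As an illustration of the FID metric listed above, the Fréchet distance between two feature sets can be computed from their means and covariances. In the sketch below the feature arrays are random placeholders standing in for ProxCLR embeddings, and `frechet_distance` is a hypothetical name; this is the standard Gaussian Fréchet formula, not the authors' evaluation code.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n_samples, dim) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # tr((cov_a cov_b)^{1/2}) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b; tiny negative or imaginary parts are numerical noise.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 8))     # placeholder for embeddings of real proteins
shifted = reference + 2.0                 # same covariance, mean shifted by 2 per dimension

fid_same = frechet_distance(reference, reference)
fid_shifted = frechet_distance(reference, shifted)
```

Here `fid_same` is zero up to floating-point error, and `fid_shifted` is approximately 32 (eight dimensions times a squared shift of 4), since only the mean term differs.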

Summary:

ProxelGen achieves superior FID, competitive or greater novelty, and high designability—simultaneously offering diverse and designable folds, a challenge for prior models.

4. Motif Scaffolding and Spatial Inpainting

Motif scaffolding, a pivotal conditional protein design task, is addressed in ProxelGen by spatial inpainting on the proxel grid:

  • Approach:

A fixed-space motif is embedded as a density region within the proxel grid. The model is conditioned to inpaint the surrounding structure, filling in the remainder of the density.

  • Motif Agnostic Conditioning:

No requirement to map the motif to specific sequence positions or chain lengths. Both position and context within the grid are freely determined.

  • Benchmarking:

Performance is assessed on the standard RFdiffusion motif scaffolding suite (24 diverse motifs, both contiguous and split/disconnected). Baselines are Proteina, Genie2, RFdiffusion, and FrameFlow.
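The source does not describe how the motif conditioning is implemented. One common setup for inpainting, shown here purely as an assumed sketch, is to give the model the known region plus a binary mask channel, and optionally to paste the known region back into the output. The function names and the box-shaped motif region are hypothetical.

```python
import numpy as np

def inpainting_condition(proxels, motif_mask):
    """Stack the motif-only density with a mask channel.

    proxels:    (C, H, W, D) density grid containing the motif.
    motif_mask: (H, W, D) boolean array, True where the motif density is given.
    Returns a (C + 1, H, W, D) conditioning tensor.
    """
    known = np.where(motif_mask[None], proxels, 0.0)
    mask_channel = motif_mask[None].astype(proxels.dtype)
    return np.concatenate([known, mask_channel], axis=0)

def paste_motif(generated, proxels, motif_mask):
    """Overwrite the motif region of a generated grid with the given motif density."""
    return np.where(motif_mask[None], proxels, generated)

rng = np.random.default_rng(0)
proxels = rng.random((7, 8, 8, 8))        # toy grid holding the motif density
motif_mask = np.zeros((8, 8, 8), dtype=bool)
motif_mask[2:5, 2:5, 2:5] = True          # toy motif region (a small box)

condition = inpainting_condition(proxels, motif_mask)
generated = rng.random((7, 8, 8, 8))      # stands in for a model output
merged = paste_motif(generated, proxels, motif_mask)
```

Because the mask lives on the grid rather than on sequence positions, nothing in this setup fixes where the motif residues sit along the chain, which matches the motif-agnostic property described above.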

Key Results:

  • ProxelGen demonstrates robust performance on motifs that fragment across multiple chains or sequence segments (e.g., 1QJG, 1BCF), producing up to 12 unique successful designs for the most challenging motifs (1BCF), compared to a maximum of 1 for all comparator models.
  • For single-contiguous motifs, larger search spaces (via enforced length) in other models sometimes yield higher diversity, but ProxelGen's results remain competitive.
  • The atomistic coordinate decoder sometimes struggles to guarantee motif topology preservation in disconnected or entangled arrangements, which is acknowledged as a practical limitation.

A plausible implication is that the voxel-based, motif-agnostic conditioning unlocks more general motif scaffolding tasks than traditional sequence-mapped approaches.

5. Shape Conditioning and Spatial Control

ProxelGen uniquely supports protein generation conditioned on arbitrary 3D shapes, realized as binary or real-valued volumetric masks:

  • Mechanism:

Target surface masks are derived from real protein surfaces, and proxel grids are conditioned to match these shapes. The model is fine-tuned or trained to respect these volumetric constraints.

  • Evaluation:

Overlap between generated structure and input mask is quantified by F1-score, alongside TMScore to the original protein for novelty assessment.

| Model | Shape Fidelity (F1) | Novelty (TMScore) |
|---|---|---|
| ProxelGen | ≈ 0.89 | Low (high novelty) |
| Chroma | ≈ 0.67 | N/A |

  • Observations:

ProxelGen produces high-fidelity shape-conforming proteins that remain structurally novel and sequence-designable. This demonstrates a capacity for geometric constraint satisfaction that is impractical in coordinate-based architectures.
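The F1 overlap used for shape fidelity can be sketched on binary occupancy grids. How the generated structure is converted to an occupancy mask is not stated in the source, so the example simply starts from two boolean grids; `voxel_f1` is a hypothetical name, and returning 1.0 for two empty masks is an arbitrary convention.

```python
import numpy as np

def voxel_f1(pred_mask, target_mask):
    """F1 score between two boolean occupancy grids of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    target = np.asarray(target_mask, dtype=bool)
    tp = np.logical_and(pred, target).sum()    # occupied in both
    fp = np.logical_and(pred, ~target).sum()   # occupied only in the prediction
    fn = np.logical_and(~pred, target).sum()   # occupied only in the target
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 1.0

target = np.zeros((8, 8, 8), dtype=bool)
target[2:6, 2:6, 2:6] = True                   # 4 x 4 x 4 target shape
pred = np.zeros_like(target)
pred[3:7, 2:6, 2:6] = True                     # same cube shifted by one voxel in x

score = voxel_f1(pred, target)
print(score)  # 0.75
```

The shifted cube shares 48 of 64 voxels with the target, giving precision and recall of 0.75 each and therefore an F1 of 0.75.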

Applications Enabled:

  • Arbitrary geometric conditioning (tunnels, channels, convex or concave pockets).
  • Surface-driven binder protein design.
  • Denoising or refinement tasks for experimental maps (e.g., cryo-EM).

This suggests that the proxel paradigm may facilitate new classes of protein design problems that are difficult or impossible to formulate with atomistic models.

6. Significance, Opportunities, and Limitations

ProxelGen advances the field by:

  • Establishing a fixed-size, multi-channel 3D density-based representation as an effective generative substrate for proteins.
  • Demonstrating sample generation with state-of-the-art FID, improved novelty, and competitive designability and structural diversity, notably without requiring sequence- or length-conditioned generation.
  • Enabling new conditioning and control modalities, such as volumetric shape masks and spatial inpainting.
  • Supporting integration with atomistic decoding, sequence design, and foldability assessment pipelines.

Limitations and Open Problems:

  • The atomistic decoder is motif-agnostic and may permit chain connectivity violations or incomplete motif embedding, especially for large or topologically complex motifs.
  • Ensuring single-chain connectivity and fully motif-aware decoding remains an unsolved issue.
  • Further work is suggested toward leveraging existing experimental densities (e.g., cryo-EM), improving the granularity of channel information, and increasing the fidelity of atomistic reconstruction.

A plausible implication is that density-based generative models could serve as a unifying interface between computational protein design and experimental structure determination workflows.

7. Broader Impact and Future Directions

ProxelGen's voxelized density representation ("proxels") provides a foundation for diverse protein engineering tasks:

  • Flexible protein design under arbitrary spatial, topological, and geometric constraints.
  • Direct compatibility with volume-based imaging modalities and datasets.
  • Feasibility for arbitrary-domain spatial conditioning, including inpainting, shape fitting, and catalysis-site scaffolding.

Foreseeable directions include refinement of motif-aware atomistic decoders, exploiting the representation for end-to-end structure refinement from cryo-EM, and adaptation to other macromolecular or supramolecular systems where density-based representations are beneficial.

In summary, ProxelGen demonstrates that 3D density-based generative modeling of proteins is a viable, effective, and versatile approach, achieving high sample quality and new conditional generation tasks previously challenging for coordinate-based methods.