
FastCSP: High-Throughput CSP Workflow

Updated 9 August 2025
  • FastCSP Workflow is an open-source protocol for molecular crystal structure prediction that leverages a universal MLIP and advanced random structure generation to explore diverse packing arrangements.
  • The methodology employs a two-stage pipeline combining Genarris 3.0 for candidate generation with UMA for systematic geometry relaxation and free energy evaluation.
  • FastCSP places experimental structures within 5 kJ/mol of the lowest-energy prediction, recovers over 94% of known polymorphs, and runs orders of magnitude faster than traditional DFT workflows.

FastCSP Workflow is an open-source, high-throughput computational protocol for molecular crystal structure prediction (CSP) utilizing machine learning interatomic potentials (MLIPs). Its principal aim is to deliver rapid, accurate exploration and ranking of crystal structures for molecular solids, circumventing the traditionally prohibitive costs of quantum mechanical methods such as dispersion-corrected density functional theory (DFT). The workflow is engineered for application across diverse chemical systems without per-system potential retraining, capitalizing on a universal MLIP architecture (Universal Model for Atoms, UMA). FastCSP achieves efficiency and accuracy by integrating advanced random structure generation with systematic geometry optimization and free energy evaluation, allowing CSP tasks that previously required days with DFT to be routinely completed in hours using modern GPU clusters (Gharakhanyan et al., 4 Aug 2025).

1. Architecture and Workflow Overview

The FastCSP pipeline operates as a two-stage protocol: candidate crystal structure generation followed by high-throughput geometry relaxation and ranking. In Stage 1, Genarris 3.0—an established random structure generator—enumerates likely molecular packings spanning multiple space groups and Z values starting from a single molecular conformer. Stage 2 leverages the Universal Model for Atoms (UMA), which is based on an equivariant graph neural network with Mixture of Linear Experts (MoLE) layers, to perform full periodic geometry relaxation and free energy calculations. UMA's training on the OMC25 dataset (>25 million configurations) enables transferability across a wide chemical space without further system-specific tuning.

The overall workflow is characterized by the following:

  • Input: Single molecular structure (conformer).
  • Generation: Exhaustive sampling over space group symmetries and molecular arrangements using probabilistic volume estimation (via PyMoVE within Genarris 3.0).
  • Packing Validation: Enforcement of physical plausibility via van der Waals radius overlap checks and hard-sphere compression.
  • Deduplication: Post-processing with Pymatgen's StructureMatcher to remove redundant candidates.
  • Relaxation and Property Evaluation: All structures are relaxed and evaluated for lattice and free energies (Helmholtz and Gibbs) via UMA. Convergence and integrity checks (including force thresholds and bond connectivity monitoring) ensure only well-optimized structures are retained.
  • Output: Ranked list of unique candidate structures with computed energies and optional vibrational thermodynamic properties.
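
The ranked output described above can be illustrated with a minimal Python sketch. It assumes each relaxed candidate is summarized by a total cell energy in eV and its Z value; the function name, labels, and energies are hypothetical and are not taken from the FastCSP code. Because every candidate contains the same molecule, ranking by energy per molecule gives the same order as ranking by lattice energy.

```python
EV_TO_KJ_PER_MOL = 96.485  # 1 eV per molecule expressed in kJ/mol


def rank_by_energy_per_molecule(candidates):
    """Return [(label, relative energy in kJ/mol)] sorted from most to least stable.

    candidates: iterable of (label, total cell energy in eV, Z).
    """
    per_molecule = [
        (label, total_energy_ev / z * EV_TO_KJ_PER_MOL)
        for label, total_energy_ev, z in candidates
    ]
    lowest = min(energy for _, energy in per_molecule)
    ranked = sorted(per_molecule, key=lambda item: item[1])
    return [(label, energy - lowest) for label, energy in ranked]


# Hypothetical relaxed candidates (illustrative numbers only).
candidates = [
    ("cand_a", -400.00, 4),
    ("cand_b", -200.02, 2),
    ("cand_c", -799.60, 8),
]
ranking = rank_by_energy_per_molecule(candidates)
for label, rel_energy in ranking:
    print(f"{label}: {rel_energy:.2f} kJ/mol above the minimum")
```

Normalizing by Z before comparing is what allows cells with different numbers of molecules to share one ranking.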

2. Random Structure Generation and Candidate Sampling

Random structure generation in FastCSP is conducted by Genarris 3.0 with the following procedural steps:

  • Volume Assignment: Candidate unit cell volumes are sampled from a Gaussian centered at a value estimated by PyMoVE and scaled (typically by 1.5×) to ensure broad packing possibilities.
  • Space Group and Z Sampling: All space groups compatible with the molecular symmetry are targeted for varying numbers of formula units in the cell (Z = 1, 2, 3, 4, 6, 8), with typical per-group candidate quotas of 500–1000 structures.
  • Site-Symmetric Placement: Molecules are positioned according to crystallographic requirements, including special Wyckoff sites, ensuring physical and group-theoretical correctness.
  • Inter-molecular Distance Check: Structures must satisfy minimal contact separation constraints:

$$d_{ij} > s_r \left(r^{\mathrm{vdW}}_i + r^{\mathrm{vdW}}_j\right)$$

where $s_r = 0.95$.

  • Rigid Press Step: A regularized hard-sphere potential (with static internal geometry) is applied to enforce dense packing prior to optimization.
  • Redundancy Filtering: Structures are deduplicated using graph matching algorithms based on structural fingerprints.

This strategy yields a structurally diverse population, efficiently sampling the relevant configurational space for subsequent MLIP-based assessment.
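
The inter-molecular distance criterion can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Genarris implementation: it compares two molecules given in Cartesian coordinates, ignores periodic images, and uses Bondi van der Waals radii for a handful of elements. The function name and the toy coordinates are hypothetical.

```python
import math

# Bondi van der Waals radii in angstrom for a few common elements (assumed table).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def passes_contact_check(mol_a, mol_b, s_r=0.95):
    """Return True if every atom pair across the two molecules satisfies
    d_ij > s_r * (r_i + r_j).

    Each molecule is a list of (element, (x, y, z)) with coordinates in angstrom.
    """
    for elem_i, pos_i in mol_a:
        for elem_j, pos_j in mol_b:
            d_ij = math.dist(pos_i, pos_j)
            if d_ij <= s_r * (VDW_RADII[elem_i] + VDW_RADII[elem_j]):
                return False
    return True


# Toy O-H fragments placed far apart and then too close (illustrative only).
mol_a = [("O", (0.0, 0.0, 0.0)), ("H", (0.96, 0.0, 0.0))]
far = [("O", (4.0, 0.0, 0.0)), ("H", (4.96, 0.0, 0.0))]
near = [("O", (2.5, 0.0, 0.0)), ("H", (3.46, 0.0, 0.0))]

print(passes_contact_check(mol_a, far))   # separated molecules
print(passes_contact_check(mol_a, near))  # overlapping contact
```

A production check would also loop over periodic images of the second molecule so that contacts across cell boundaries are caught.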

3. Geometry Relaxation, Energy, and Free Energy Evaluation

Once candidate structures are prepared, full periodic geometry relaxation is performed using UMA-Small (v1.1):

  • Optimization Protocol: Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimization within the Atomic Simulation Environment (ASE) under periodic boundary conditions; convergence threshold on forces is set to 0.01 eV/Å; maximum 1,000 steps per structure.
  • Failure Handling: Candidates that do not converge or exhibit molecular fragmentation (as evidenced by altered connectivity graphs) are culled.
  • Energetics: UMA yields lattice energies at 0 K for all relaxed structures.
  • Vibrational Free Energies: The workflow supports free energy evaluation at finite temperature ($T$) and pressure ($P$) using the harmonic and quasi-harmonic approximations:

    • Phonon Calculations: Finite displacement method (Phonopy) is used to compute phonon spectra needed for vibrational contributions.
    • Thermal Expansion: Gibbs free energy at non-zero pressure is obtained by minimizing over unit cell volumes,

    $$G(T, P) = \min_V \left\{F(T, V) + PV\right\}$$

    where $F(T, V)$ is the Helmholtz free energy as a function of $(T, V)$, fit to a Vinet equation of state over a $\pm 6\%$ volume grid centered on the equilibrium volume.
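
The free energy steps above can be sketched in plain Python. The first function evaluates the standard harmonic vibrational free energy from a list of phonon frequencies; the second minimizes $F(T, V) + PV$ over a tabulated volume grid. Two simplifications are assumptions of this sketch and not part of the workflow described here: frequencies are a flat list in THz with no q-point weights, and a local parabola through the grid minimum stands in for the Vinet fit. All names and numbers are hypothetical.

```python
import math

H_EV_S = 4.135667696e-15               # Planck constant in eV*s
KB_EV_K = 8.617333262e-5               # Boltzmann constant in eV/K
GPA_TO_EV_PER_A3 = 1.0 / 160.21766208  # 1 eV/A^3 = 160.2177 GPa


def harmonic_vibrational_free_energy(frequencies_thz, temperature):
    """Sum of h*nu/2 + kB*T*ln(1 - exp(-h*nu/(kB*T))) over modes, in eV."""
    total = 0.0
    for nu in frequencies_thz:
        if nu <= 0.0:
            continue  # simplification: skip zero-frequency and imaginary modes
        quantum = H_EV_S * nu * 1e12
        total += 0.5 * quantum
        if temperature > 0.0:
            kt = KB_EV_K * temperature
            total += kt * math.log1p(-math.exp(-quantum / kt))
    return total


def gibbs_from_grid(volumes, helmholtz, pressure_gpa=0.0):
    """Approximate G(T, P) = min_V [F(T, V) + P*V]; returns (V_min, G_min)."""
    p = pressure_gpa * GPA_TO_EV_PER_A3
    g = [f + p * v for v, f in zip(volumes, helmholtz)]
    k = min(range(len(g)), key=g.__getitem__)
    if k == 0 or k == len(g) - 1:
        return volumes[k], g[k]  # minimum sits on the grid edge
    x0, x1, x2 = volumes[k - 1], volumes[k], volumes[k + 1]
    y0, y1, y2 = g[k - 1], g[k], g[k + 1]
    denom = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if denom == 0.0:
        return x1, y1  # flat neighbourhood; keep the grid point
    numer = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    x = x1 - 0.5 * numer / denom  # vertex of the interpolating parabola
    y = (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
         + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
         + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))
    return x, y


# Toy F(T, V) curve on a grid of about +/-6% around 500 A^3 (illustrative only).
volumes = [470.0, 485.0, 500.0, 515.0, 530.0]
helmholtz = [-10.0 + 0.001 * (v - 500.0) ** 2 for v in volumes]
v_eq, g_min = gibbs_from_grid(volumes, helmholtz, pressure_gpa=1.0)
print(v_eq, g_min)
```

Applying pressure shifts the minimizing volume below the zero-pressure equilibrium, which is the behaviour the $PV$ term is meant to capture.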

This approach enables fast evaluation of both static and finite-temperature stabilities across the candidate ensemble, at an accuracy close to that of the DFT reference.
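
The failure-handling rule in this section, which discards structures whose connectivity graph changes during relaxation, can be illustrated as follows. This is a sketch under assumptions: bonds are assigned by a distance cutoff of 1.2 times the sum of covalent radii (Cordero values for a few elements), the molecule is treated as a single unwrapped fragment, and the function names are hypothetical. The source does not specify how FastCSP builds its connectivity graphs.

```python
import math

# Covalent radii in angstrom for a few elements (assumed table, Cordero et al. values).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def bond_graph(atoms, tolerance=1.2):
    """Return the set of bonded index pairs (i, j) with i < j.

    atoms: list of (element, (x, y, z)) in angstrom. Two atoms count as bonded
    when their distance is below tolerance * (r_cov_i + r_cov_j).
    """
    bonds = set()
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (elem_i, pos_i), (elem_j, pos_j) = atoms[i], atoms[j]
            cutoff = tolerance * (COVALENT_RADII[elem_i] + COVALENT_RADII[elem_j])
            if math.dist(pos_i, pos_j) < cutoff:
                bonds.add((i, j))
    return bonds


def is_intact(before, after):
    """True if relaxation left the bonding pattern unchanged."""
    return bond_graph(before) == bond_graph(after)


# Toy water molecule before relaxation, after a normal relaxation,
# and after a relaxation that pulled one hydrogen away (illustrative only).
initial = [("O", (0.0, 0.0, 0.0)), ("H", (0.96, 0.0, 0.0)), ("H", (-0.24, 0.93, 0.0))]
relaxed = [("O", (0.0, 0.0, 0.0)), ("H", (0.98, 0.0, 0.0)), ("H", (-0.24, 0.93, 0.0))]
broken = [("O", (0.0, 0.0, 0.0)), ("H", (1.80, 0.0, 0.0)), ("H", (-0.24, 0.93, 0.0))]

print(is_intact(initial, relaxed))
print(is_intact(initial, broken))
```

Comparing bond sets rather than geometries lets ordinary bond-length changes pass while flagging broken or newly formed bonds.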

4. Accuracy, Performance, and Benchmarks

Benchmarked on 28 diverse, mostly rigid molecules, FastCSP shows:

  • Experimental Reproducibility: For 17 molecules, UMA ranks the known structure as the absolute lattice energy minimum at 0 K; for others, the experimental form is among the top 4–10.
  • Energy Resolution: All experimental polymorphs are within 5 kJ/mol (often <3 kJ/mol) of the lowest predicted lattice energy.
  • Recall Rate: More than 94% of known polymorphs are retrieved within the top 10 candidates.
  • Agreement with DFT: Lattice energy MAE vs. PBE-D3 is 1.16 kJ/mol; Spearman correlation of 0.94 for energy rankings.
  • Throughput: Each geometry relaxation requires ~15 seconds on an NVIDIA H100 GPU. Entire CSP campaigns, with thousands of structures, are routinely completed within hours on modest GPU clusters.

These results indicate that a universal MLIP can replace classical force fields in the early screening stage and DFT-based ranking in the final stage, with little loss of accuracy for the rigid molecules tested.
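
For readers who want to reproduce the style of metrics quoted in this section, the sketch below computes a mean absolute error, a Spearman rank correlation, and a top-k recall. The input numbers are invented for illustration and are not the benchmark data; the Spearman formula used is the simple form that assumes no tied values.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute deviation between two equal-length sequences."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def ranks(values):
    """Rank positions starting at 1 for the smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(predicted, reference):
    """Spearman rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    n = len(predicted)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(predicted), ranks(reference)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


def recall_at_k(experimental_ranks, k=10):
    """Fraction of experimental structures ranked within the top k."""
    return sum(1 for r in experimental_ranks if r <= k) / len(experimental_ranks)


# Hypothetical relative lattice energies in kJ/mol for five structures.
mlip_energies = [0.0, 1.2, 2.9, 4.1, 6.0]
dft_energies = [0.0, 2.0, 1.5, 4.5, 7.0]
# Hypothetical ranks at which four experimental polymorphs were found.
experimental_ranks = [1, 1, 3, 12]

mae = mean_absolute_error(mlip_energies, dft_energies)
rho = spearman(mlip_energies, dft_energies)
recall = recall_at_k(experimental_ranks, k=10)
print(mae, rho, recall)
```

The rank correlation matters more than the absolute error for CSP, because the final answer is an ordering of candidates.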

5. Deployment, Applicability, and Open Source Availability

FastCSP is positioned for high-throughput materials discovery in fields where polymorphism control is critical:

  • Pharmaceuticals: Reliable ranking of polymorphs aids regulatory, manufacturing, and intellectual property decisions.
  • Organic Electronics: Structure–property linkage, particularly charge transport, is highly sensitive to packing motifs captured by CSP.
  • Open-Source Ecosystem: The workflow, Genarris 3.0, and the UMA MLIP models (training data, weights, and inference code) are all available in public repositories, significantly lowering technical barriers for academic and industrial groups.

FastCSP requires no system-specific parameter tuning and applies to organic molecules within the chemical space covered by the MLIP training data, enabling previously impractical high-throughput CSP tasks.

6. Limitations and Future Directions

Identified limitations and prospects for enhancement include:

  • Complex Systems: Broader coverage for flexible molecules, Z′ > 1 (multiple molecules in the asymmetric unit), co-crystals, salts, and hydrates is a target for further workflow generalization.
  • Potential Adaptation: For molecules with functional groups or elements underrepresented in the UMA training set (e.g., transition metals), the workflow may require retraining or adaptation of the MLIP.
  • Algorithmic Acceleration: Plans include improving UMA batch throughput (e.g., via batched GPU operations or torch compilation) and expanding free energy evaluation modules.
  • Accuracy Refinement: For cases with minute energy differences (e.g., fine polymorph distinctions), further free energy corrections or hybrid approaches (possibly DFT re-ranking) may be explored, though UMA is already competitive for most systems tested.

These directions aim to make CSP as routine and robust as contemporary computational screening of gas-phase molecules.

7. Significance and Implications

FastCSP demonstrates that robust, universal MLIPs can deliver CSP performance nearly equivalent in ranking fidelity to state-of-the-art DFT for rigid molecular systems, enabling practical high-throughput CSP without system-specific parametrization. Combining advanced random generation, ML-accelerated structure relaxation, and vibrational free energy evaluation reduces the computational cost by orders of magnitude relative to traditional methods. This positions FastCSP to significantly broaden the accessibility and impact of crystal structure prediction within the molecular sciences (Gharakhanyan et al., 4 Aug 2025).
