Inverse Design of Inorganic Compounds with Generative AI

Published 11 Apr 2026 in physics.chem-ph, cond-mat.mtrl-sci, and cs.LG | (2604.11827v1)

Abstract: Machine learning is revolutionizing chemistry. Beyond the value of predictive models accelerating virtual screening, generative AI aims at enabling inverse design, reversing the compound-to-property prediction paradigm into property-to-compound generation. Chemists now have access to a rich AI toolbox for organic chemistry, including drug discovery. However, the application of these methods to inorganic compounds remains limited by the challenges posed by their intrinsic nature. This Review analyzes how these challenges have been addressed, considering widely diverse systems ranging from molecules to crystals, including transition metal complexes and microporous materials. The analysis focuses on how generative AI methods have evolved towards data-representation-model pipelines that address the full complexity of inorganic compounds, including their chemical composition, geometry, symmetry, and electronic structure. Future directions, like benchmark standardization and the development of synthesizability metrics, are also discussed.


Authors (5)

Summary

The Review surveys generative AI approaches that reverse traditional predictive modeling by mapping target properties directly to candidate inorganic compounds.
It covers genetic algorithms (GAs), variational autoencoders (VAEs), diffusion models (DMs), and large language models (LLMs), and how they address challenges in composition, geometry, and electronic structure across transition metal complexes (TMCs), crystals, and metal-organic frameworks (MOFs).
The study underscores the potential of coupling AI-driven design with experimental workflows to accelerate discovery and optimize material validity, stability, and synthesizability.

Inverse Design of Inorganic Compounds with Generative AI: A Comprehensive Technical Perspective

Introduction and Scope

The inverse design of inorganic compounds via generative artificial intelligence extends rational materials discovery beyond predictive modeling. Unlike discriminative approaches that map compound to property ($X \rightarrow y$), generative AI methods establish the $X \leftarrow y$ paradigm, mapping target properties directly to compound generation. While generative modeling is well established in organic chemistry and drug design, its extension to inorganic systems, which include transition metal complexes (TMCs), non-porous crystals, metal-organic frameworks (MOFs), and zeolites, presents unique representational and algorithmic challenges related to chemical composition, geometry, and electronic structure.
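One common probabilistic reading of the two paradigms, offered here as an illustrative formalization rather than something stated in the source, treats the predictive model as $p(y \mid X)$ and the inverse-design model as $p(X \mid y)$, linked by Bayes' rule:

$$
p(X \mid y) \;=\; \frac{p(y \mid X)\, p(X)}{p(y)} \;\propto\; p(y \mid X)\, p(X)
$$

Under this reading, $p(X)$ is a prior over plausible compounds (what an unconditional generative model learns) and $p(y \mid X)$ is the property predictor; conditional generation amounts to sampling compounds that are both plausible and consistent with the target property $y$.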

Figure 1: Diverse classes of inorganic compounds: (a) TMCs, (b) non-porous inorganic crystals, (c) microporous materials (MOFs, zeolites).

Significant progress in the past decade includes adapting data representations, integrating domain knowledge into model architectures, and leveraging dataset expansion strategies to address the combinatorics of inorganic spaces.

Evolution and Taxonomy of Generative AI for Inorganics

The historical trajectory of generative modeling in this domain runs from evolutionary algorithms, notably genetic algorithms (GAs), to increasingly expressive and domain-aware deep learning methodologies, including variational autoencoders (VAEs), generative adversarial networks (GANs), diffusion models (DMs), flow models, and, most recently, large language models (LLMs) and quantum natural language processing (QNLP).

Figure 2: Timeline illustrating the advancement of generative AI algorithms (GAs, GANs, VAEs, DMs, LLMs) and their applications across TMCs, non-porous crystals, MOFs, and zeolites.

Key advances include invertible, physics-aware representations and conditional generative pipelines targeting high-dimensional compositional-structural property manifolds.

Transition Metal Complexes: Modular Inverse Design

The modularity of TMCs has made them an exemplary testing ground for GAs and structural generative modeling. GAs allow systematic exploration by encoding metal and ligand components as chromosomes, enabling multi-objective optimization (e.g., stability, catalytic efficiency, synthetic accessibility). The PL-MOGA algorithm enables targeted Pareto front navigation in billion-scale design spaces, a capability validated on the [PdL$_4$] scaffold [Kneiding263].
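The chromosome encoding and Pareto filtering described above can be sketched as follows. This is a minimal illustration, not PL-MOGA: the `METALS` and `LIGANDS` libraries, the `toy_objectives` scores, and all function names are hypothetical placeholders, and a real workflow would replace the toy scores with computed or predicted properties.

```python
import random

# Hypothetical building-block libraries (illustrative labels only).
METALS = ["Pd", "Pt", "Ni"]
LIGANDS = ["L1", "L2", "L3", "L4", "L5"]


def random_chromosome(rng, n_ligands=4):
    """A chromosome is one metal gene followed by n_ligands ligand genes."""
    return (rng.choice(METALS),) + tuple(rng.choice(LIGANDS) for _ in range(n_ligands))


def mutate(chrom, rng):
    """Redraw one randomly chosen gene from its own library."""
    genes = list(chrom)
    i = rng.randrange(len(genes))
    genes[i] = rng.choice(METALS if i == 0 else LIGANDS)
    return tuple(genes)


def crossover(a, b, rng):
    """Single-point crossover; the metal gene always comes from parent a."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def toy_objectives(chrom):
    """Placeholder scores (both maximized); stand-ins for real property models."""
    metal, ligands = chrom[0], chrom[1:]
    obj1 = METALS.index(metal) + sum(LIGANDS.index(lig) for lig in ligands)
    obj2 = len(set(ligands)) - METALS.index(metal)
    return (obj1, obj2)


def dominates(p, q):
    """p dominates q if it is no worse in every objective and better in one."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def pareto_front(population):
    """Return the non-dominated chromosomes of a population."""
    scored = [(c, toy_objectives(c)) for c in set(population)]
    return [c for c, s in scored if not any(dominates(t, s) for _, t in scored)]


def evolve(generations=20, size=30, seed=0):
    """Evolve a population and return its final Pareto front."""
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(size)]
    for _ in range(generations):
        parents = pareto_front(population)
        children = []
        while len(children) < size:
            a, b = rng.choice(parents), rng.choice(population)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children  # elitism: keep the current front
    return pareto_front(population)
```

Calling `evolve()` returns a list of chromosomes such as `("Ni", "L5", "L4", ...)`, none of which is dominated by another under the toy objectives. Selecting parents from the current front is one simple design choice; directional control of the kind attributed to PL-MOGA would need an additional preference mechanism that this sketch does not attempt.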

Deep learning (DL) approaches have overcome the limitations of fixed ligand libraries. VAE-based generation, optimized for smooth latent manifolds and structured via junction tree representations, supports the inverse design of ligands and full TMCs conditioned on spectroscopic and redox properties [Lee1095], [Strandgaard2294]. DMs built on equivariant graph neural networks have advanced the generation of 3D TMC structures and site-specific ligands with high geometric fidelity [Jin4377], [Cornet1793], which in turn enables property-guided transition state and catalyst discovery.

LLMs serve as both interfaces and surrogates for synthetic logic, automating reaction planning, multi-property optimization, and broadening accessibility to domain experts [Boiko570], [Lu32377].

Figure 3: Representative generative AI strategies for TMCs: (a) GA with Pareto optimization, (b) PL-MOGA directional control, (c) VAE with ligand encoding, (d) CatDRX VAE incorporating reaction conditions.

Non-porous Inorganic Crystals: Geometry–Symmetry–Electronic Interplay

Generative modeling for non-porous crystals must explicitly handle crystallographic periodicity and symmetry (230 space groups), together with composition and atomic positions. Early work employed GAs and evolutionary computation (EC) for crystal structure prediction (CSP), with software frameworks like CALYPSO and USPEX becoming standard in exploring composition–structure–stability landscapes.
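To make the periodicity requirement concrete, the sketch below represents a crystal by a lattice (three row vectors) and fractional coordinates, and computes an interatomic distance under the minimum-image convention. The function names are illustrative, and the per-axis wrapping used here is assumed to be adequate only for cells that are not strongly skewed; production codes search neighboring images explicitly.

```python
import math


def frac_to_cart(frac, lattice):
    """Convert fractional coordinates to Cartesian; lattice rows are a, b, c."""
    return tuple(sum(frac[i] * lattice[i][k] for i in range(3)) for k in range(3))


def wrap(frac):
    """Map fractional coordinates back into the unit cell."""
    return tuple(f % 1.0 for f in frac)


def min_image_distance(f1, f2, lattice):
    """Distance between two sites, using the nearest periodic image.

    Each fractional difference is shifted by an integer so that it lies in
    [-0.5, 0.5]. This per-axis shortcut is exact for orthogonal cells and an
    approximation for skewed ones.
    """
    delta = [a - b for a, b in zip(f1, f2)]
    delta = [d - round(d) for d in delta]
    return math.dist(frac_to_cart(delta, lattice), (0.0, 0.0, 0.0))


# Example: a cubic cell with a 4 Angstrom edge (illustrative values).
cubic = ((4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0))
d = min_image_distance((0.05, 0.0, 0.0), (0.95, 0.0, 0.0), cubic)
```

In the example, the two sites sit on opposite sides of a cell boundary, so the minimum-image distance is the short path across the boundary rather than the long path through the cell. A generative model that ignores this equivalence would treat identical crystals as different data points.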

GANs and VAEs incrementally addressed the need for symmetry- and geometry-preserving representations, with models such as PGCGM leveraging symmetry constraints to improve validity. Recent state-of-the-art models employ equivariant graph neural networks (GNNs) to encode and denoise crystal structures through DMs. MatterGen simultaneously generates chemical composition, lattice parameters, and atomic positions, achieving structure precision close to that of density functional theory (DFT) and enabling property-conditional generation with high stability, uniqueness, and novelty scores [Zeni624]. ChargeDIFF integrates quantum-chemical descriptors (e.g., charge density) into crystal generation.
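The noising half of a diffusion model for atomic positions can be sketched in a few lines. This is an assumption-laden toy, not MatterGen's actual procedure: the geometric `sigma_schedule`, its default bounds, and the function names are hypothetical, and the learned denoising network (the part that makes generation possible) is omitted entirely.

```python
import random


def sigma_schedule(t, n_steps, sigma_min=0.005, sigma_max=0.5):
    """Geometric interpolation from sigma_min at t=0 to sigma_max at t=n_steps."""
    return sigma_min * (sigma_max / sigma_min) ** (t / n_steps)


def noise_fractional_coords(coords, sigma, rng):
    """Add Gaussian noise to fractional coordinates and wrap into [0, 1)."""
    noisy = []
    for atom in coords:
        wrapped = []
        for x in atom:
            v = (x + rng.gauss(0.0, sigma)) % 1.0
            if v == 1.0:  # guard against floating-point round-up at the boundary
                v = 0.0
            wrapped.append(v)
        noisy.append(tuple(wrapped))
    return noisy


def forward_trajectory(coords, n_steps, seed=0):
    """Return progressively noisier copies of the same structure."""
    rng = random.Random(seed)
    return [
        noise_fractional_coords(coords, sigma_schedule(t, n_steps), rng)
        for t in range(n_steps + 1)
    ]
```

Wrapping after each perturbation keeps every noised structure inside the unit cell, which is one simple way to respect periodicity. A trained model would learn to reverse these steps, jointly with analogous processes for lattice parameters and atom types.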

Figure 4: (a) Schematic of DM framework for crystal structure representation and denoising; (b) MatterGen supports both unconditional generation and property-conditional sampling with performance metrics and experimental validation.

LLMs, such as CrystaLLM and Chemeleon, are rapidly gaining traction, tokenizing CIF representations and enabling prompt-driven inverse design in a co-intelligent chemist-agent architecture [Antunes10570], [Park4379].
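As a rough illustration of what tokenizing a CIF representation involves, the sketch below splits CIF-like text into tags, numbers, words, and newlines, then maps tokens to integer ids. The regular expression and helper names are assumptions made for this example; they are not the tokenizer used by CrystaLLM or any other published model.

```python
import re

# Order matters: CIF tags, decimals, integers, words, newlines, other symbols.
TOKEN_PATTERN = re.compile(
    r"_[A-Za-z0-9_.\[\]-]+"       # data names such as _cell_length_a
    r"|-?\d+\.\d+"                # decimal numbers
    r"|-?\d+"                     # integers
    r"|[A-Za-z][A-Za-z0-9_+-]*"   # words, element symbols, block headers
    r"|\n"                        # newlines carry structure in CIF
    r"|\S"                        # any remaining non-space character
)


def tokenize_cif(text):
    """Split CIF-like text into a flat list of string tokens."""
    return TOKEN_PATTERN.findall(text)


def build_vocab(tokens):
    """Assign a stable integer id to each distinct token."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}


def encode(tokens, vocab):
    """Map tokens to integer ids."""
    return [vocab[tok] for tok in tokens]


def decode(ids, vocab):
    """Map integer ids back to tokens."""
    inverse = {i: tok for tok, i in vocab.items()}
    return [inverse[i] for i in ids]


sample = "data_NaCl\n_cell_length_a 5.64\nNa 0.0 0.0 0.0\n"
tokens = tokenize_cif(sample)
vocab = build_vocab(tokens)
ids = encode(tokens, vocab)
```

Treating each number as a single token keeps sequences short but makes the vocabulary grow with the data; splitting numbers into digits is the opposite trade-off. Which choice a given model makes is not specified in the source.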

Microporous Inorganic Materials: MOFs and Zeolites

For high-topology systems such as MOFs and zeolites, genetic and data-driven approaches have been vital. GAs, with frameworks like PORMAKE, optimize MOF modular combinations for applications in gas capture and separation, achieving rapid convergence and experimentally verified high-performance materials [Chunge1600909]. Component-level evolution (e.g., MOFF-GA) and parallel-population strategies have expanded both fitness and chemical diversity.

GANs and DMs have become increasingly important as MOF and zeolite datasets have grown large enough to train physically realistic models. ZeoGAN and ZeoDiff illustrate a transition from low-validity unconditional GANs to DMs with validity rates more than three orders of magnitude higher, incorporating 3D grid and energy representations [Kimeaax9324], [Park6507].

For MOFs, multi-modal DMs like MOFDiff and MOFFUSION, which integrate signed distance functions and builder modules, now yield 81% structure validity and facilitate multi-property conditional generation [Park34], [fu2023mofdiff]. LLMs act as both property predictors and generative agents, interfacing with GA systems for inverse design and assisting with synthesis recipes and data mining [Kang4705], [Zheng369].

Of particular note, QNLP frameworks push the field toward hybrid quantum–classical generative modeling, efficiently encoding multi-property constraints on MOF architectures via qubits and quantum circuits [Kang321].

Figure 5: (a) Comparison between classical token and quantum-qubit representations in LLMs for MOF generation; (b) QNLP generative process for four-category property classification.

Metrics, Validation, and Limitations

Robust validation metrics—stability, uniqueness, novelty (SUN), rediscovery, validity, diversity, and synthesizability—are increasingly standardized, yet specific challenges persist:

Stability for crystals remains difficult to define universally; convex hull energies or phonon analysis are often needed.
Synthesizability for inorganics is not well correlated with stability and still lacks universal, retro-synthesis-aware scoring, though ML and LLM-based approaches are emerging [Antoniuk155], [Song6530].
Experimental verification of AI-generated inorganics is limited; most validation is computational (DFT-level).
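The stability, uniqueness, and novelty (SUN) metrics named above can be sketched as simple set operations. This toy version assumes each generated structure has already been reduced to a hashable identity key and has a precomputed energy above the convex hull; the function name, the input format, and the default threshold of 0.1 eV/atom are assumptions for illustration. Real pipelines compare structures with a dedicated structure matcher and obtain hull energies from DFT or a surrogate model.

```python
def sun_metrics(generated, known_keys, hull_threshold=0.1):
    """Compute fractions of stable, unique, novel, and SUN structures.

    generated: list of (key, e_above_hull) pairs, energies in eV/atom.
    known_keys: set of keys already present in the reference dataset.
    hull_threshold: maximum energy above hull counted as stable (assumed).
    """
    n = len(generated)
    if n == 0:
        raise ValueError("no generated structures to evaluate")

    all_keys = {key for key, _ in generated}
    stable_keys = {key for key, e in generated if e <= hull_threshold}
    n_stable = sum(1 for _, e in generated if e <= hull_threshold)

    return {
        "stable": n_stable / n,                       # per-sample stability rate
        "unique": len(all_keys) / n,                  # distinct structures per sample
        "novel": len(all_keys - known_keys) / n,      # distinct and not in reference
        "sun": len(stable_keys - known_keys) / n,     # stable, unique, and novel
    }


# Illustrative input: keys stand in for matched structures.
samples = [("A", 0.0), ("A", 0.02), ("B", 0.05), ("C", 0.3), ("D", 0.08)]
report = sun_metrics(samples, known_keys={"B"})
```

Normalizing every rate by the total number of samples makes the four numbers directly comparable, but papers differ on denominators and thresholds, which is part of why the metrics are hard to compare across studies.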

Recent advances in integrating evolutionary and deep learning approaches (e.g., GAs in LLM latent spaces, DFT surrogate models in GAs) indicate synergistic potential for more robust, sample-efficient multi-objective optimization [Kneiding15522], [Qiu184].

Implications and Prospects

Generative AI is reconfiguring the inorganic design pipeline, enabling large-scale, property-focused exploration previously inaccessible to human or rule-based approaches. Theoretical implications include the unveiling of out-of-distribution materials, polynuclear and multi-electronic systems, high-disorder and amorphous states, and excited-state landscapes. Practically, rapid AI-driven design and experimental iteration loops—the emerging self-driving lab paradigm—are converging on automated workflows for catalysts, energy, and quantum materials.

Future developments will center on:

Integrating general-purpose foundation models, enabling transfer learning across compound classes.
Benchmarking platforms harmonizing validation metrics to ensure comparability and reproducibility [Betala2025].
Seamless coupling of generative pipelines with experimental robotics for high-throughput verification and accelerated discovery.
Expansion into quantum-assisted and QNLP-based generative design, especially for high-throughput, high-fidelity property targeting.
Pursuing sustainability via pre-training/fine-tuning strategies and mindful data/compute usage [Sandonas2026].

Conclusion

Generative AI for inorganic compound inverse design has advanced from modular heuristic searches to data-driven, symmetry- and physics-aware pipelines capable of multi-property, multi-scale structure generation. Current state-of-the-art DMs, VAEs, and LLMs—augmented by quantum and evolutionary frameworks—enable scalable, computationally efficient discovery, but further development in metric standardization, synthesizability prediction, and experimental coupling is necessary for routine practical impact across chemical domains. The integration of these innovations promises to extend AI-enabled discovery into domains of unprecedented chemical complexity and functionality.
