Principled sampling of synthetic generators to match target domains
Develop principled methods to guide sampling from neural cellular automata–based synthetic data generators so that the resulting structures match the computational and statistical characteristics of specified downstream domains. The aim is to move beyond relying solely on coarse complexity measures such as gzip compressibility and alphabet size.
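To make the two coarse measures named above concrete, here is a minimal Python sketch of how gzip compressibility and alphabet size might be computed for a generated sequence. It is not taken from the paper: the function names (`gzip_ratio`, `alphabet_size`, `coarse_profile`), the one-byte-per-symbol encoding, and the toy sequences are all assumptions made for illustration.

```python
import gzip
import random


def gzip_ratio(symbols):
    """Compressed size divided by raw size (lower means more compressible).

    Assumption: each symbol is an integer in range(256), stored as one byte.
    """
    raw = bytes(symbols)
    if not raw:
        raise ValueError("sequence must be non-empty")
    return len(gzip.compress(raw)) / len(raw)


def alphabet_size(symbols):
    """Number of distinct symbols that actually occur in the sequence."""
    return len(set(symbols))


def coarse_profile(symbols):
    """Bundle both coarse measures into one summary (hypothetical helper)."""
    return {
        "gzip_ratio": gzip_ratio(symbols),
        "alphabet_size": alphabet_size(symbols),
    }


# Toy sequences standing in for generator samples (illustrative only).
rng = random.Random(0)
periodic = [i % 4 for i in range(4096)]           # highly regular
noisy = [rng.randrange(4) for _ in range(4096)]   # same alphabet, no regularity

print(coarse_profile(periodic))
print(coarse_profile(noisy))
```

Both toy sequences use the same four symbols, so alphabet size alone cannot tell them apart, while the gzip ratio only separates them along a single axis of regularity. Each sample is reduced to two scalars, which illustrates why the entry describes these measures as coarse compared with the computational and statistical characteristics of a target domain.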
References
This points to a key open problem for future work: developing principled methods to guide synthetic generators to sample structures that match those of target domains.
— Training Language Models via Neural Cellular Automata
(2603.10055 - Lee et al., 9 Mar 2026) in Discussion — Limitations and open problems subsection