Constrained Random Generation
- Constrained Random Generation is the process of sampling objects uniformly (or with prescribed weights) from among those that strictly satisfy given hard constraints, ensuring a precisely controlled distribution over the solution space.
- Key methodologies include Markov chain edge-switching, hash-based cell partitioning, and combinatorial decompositions, each offering unique trade-offs between exactness and computational efficiency.
- Advanced techniques such as diffusion-guided sampling and decision-procedure methods extend these concepts to high-dimensional and hybrid domains, improving scalability and performance in complex applications.
The Constrained Random Generation Problem encompasses the uniform (or otherwise weighted) generation of objects—such as graphs, vectors, codewords, images, logic programs, or sequences—that satisfy specified hard constraints. This problem is fundamental in areas including random graph modeling, statistical simulation, constraint satisfaction, coding theory, randomized testing, probabilistic inference, constrained sampling for generative models, and the design of randomized benchmarks. Unlike unconstrained generation, constrained random generation requires the output distribution to be uniform (or otherwise specified) on the solution space defined by the constraints, which precludes naive rejection or ad hoc sampling except in special cases. Methodologies span Markov-chain-based strategies, hash and parity constraint algorithms, combinatorial decompositions, algebraic sampling, decision-procedure enumeration, constraint programming, and optimization-guided generative modeling.
1. Formal Problem Statement and Core Principles
Let $\Omega$ denote the space of combinatorial objects, which could be graphs, sequences, vectors, or assignment tuples. Let $C$ be the set of constraints, defined either as a predicate $\phi_C(x)$ for $x \in \Omega$, a system of equations/inequalities, or a logical/structural specification. The goal is to sample from the restricted set
$\Omega_C = \{\, x \in \Omega : x \text{ satisfies } C \,\}$
such that $x$ is drawn uniformly (or as otherwise prescribed) from $\Omega_C$, i.e., $\Pr[x] = 1/|\Omega_C|$ for all $x \in \Omega_C$.
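A baseline worth stating explicitly is rejection sampling: it is exactly uniform on $\Omega_C$ by construction, but its cost scales inversely with the solution density $|\Omega_C|/|\Omega|$, which is why it is ruled out except in special cases. A minimal sketch in Python (the predicate and helper names are illustrative, not taken from any cited system):

```python
import random

def sample_constrained(domain_sampler, satisfies, max_tries=100_000):
    """Naive rejection sampling: exactly uniform on the solution set,
    but viable only when solutions are not too rare in the ambient domain."""
    for _ in range(max_tries):
        x = domain_sampler()
        if satisfies(x):
            return x  # uniform over {x : satisfies(x)} by construction
    raise RuntimeError("solution density too low; use a specialized method")

# Toy instance: binary vectors of length 12 whose coordinates sum to 4
# (solution density is about 12%, so rejection is still practical here).
n, k = 12, 4
print(sample_constrained(
    lambda: tuple(random.randint(0, 1) for _ in range(n)),
    lambda v: sum(v) == k,
))
```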
The problem arises in many domains:
- Generation of graphs with prescribed degree sequences, forbidden subgraphs, and other structural invariants (Tabourier et al., 2010).
- Sampling vectors or matrices under affine or nonlinear equality/inequality constraints, e.g., with prescribed sums, bounds, or moment properties (Willemsen et al., 28 Jan 2025).
- Stochastic encoding for channel/source coding under hash-based coset constraints (Muramatsu, 2013).
- Generation of logic programs, sentences, images, or samples with user-imposed semantic, topological, or syntactic requirements (Dilkas et al., 2020, Korneev et al., 2018, Bonlarron et al., 15 Jun 2024).
2. Classical Markov Chain Edge-Switching and Generalizations
In random graph generation, the edge-switching Markov chain is a principal methodology (Tabourier et al., 2010). Given a constrained family (e.g., graphs with a fixed degree sequence), a Markov chain is constructed as follows:
- 2-edge switch (classical swap): Iteratively select two distinct edges $(u_1, v_1)$ and $(u_2, v_2)$; attempt to rewire them as $(u_1, v_2), (u_2, v_1)$ or $(u_1, u_2), (v_1, v_2)$ and accept only if the resultant graph satisfies $C$ (a minimal sketch of this chain follows the list below).
- The transition matrix is constructed to be symmetric, with probability mass held in “self-loops” when rewiring fails; every state then has equal in- and out-degree in the transition graph, making the uniform distribution stationary on each connected component of the Markov chain.
- However, for many constraint families (notably for certain directed graphs), the chain is disconnected, restricting uniformity to a single reachable component.
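A minimal sketch of the classical swap chain for a simple directed graph given as a list of (source, target) pairs; the `constraint` argument is a hypothetical hook for invariants beyond the degree sequence, which the swap preserves by construction:

```python
import random

def edge_switch_chain(edges, steps, constraint=lambda E: True, rng=random):
    """2-edge-switch Markov chain on a simple directed graph.  Rejected
    proposals leave the state unchanged (a "self-loop" of the chain),
    which keeps the stationary distribution uniform on each connected
    component of the chain's state graph."""
    E = list(edges)
    present = set(E)
    for _ in range(steps):
        i, j = rng.sample(range(len(E)), 2)
        (u1, v1), (u2, v2) = E[i], E[j]
        a, b = (u1, v2), (u2, v1)  # propose swapping the two targets
        # Reject swaps creating self-loops, parallel edges, or C-violations.
        if u1 == v2 or u2 == v1 or a in present or b in present:
            continue
        candidate = (present - {E[i], E[j]}) | {a, b}
        if not constraint(candidate):
            continue
        present = candidate
        E[i], E[j] = a, b
    return E
```

The $k$-edge generalization described next replaces the sampled pair with $k$ edges and a random permutation of their targets.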
To mitigate these irreducibility limits, a $k$-edge switch is introduced:
- At each step, select $k$ edges and apply a random permutation to their destinations (subject to the constraint $C$), generalizing the transition to larger “macroscopic” moves.
- The connectivity of the induced transition graph increases monotonically with $k$; for $k = m$ (the total number of edges), the chain is irreducible (full support) and aperiodic, hence yields uniform sampling on $\Omega_C$.
- There is no general polynomial bound for the mixing time $\tau_{\text{mix}}$, but increasing $k$ can accelerate mixing provided per-step success probabilities do not fall prohibitively low.
Relevant theoretical statements:
- The stationary distribution under the $k$-switch chain is uniform on each connected component, and becomes uniform on all of $\Omega_C$ for $k = m$ (Tabourier et al., 2010).
3. Uniform Generation via Hashing and Random Constraints
For Boolean and arithmetic CSPs, hash-based cell partitioning forms the basis for scalable, near-uniform sampling of solutions (e.g., uniform SAT witness generation) (Chakraborty et al., 2014):
- Identify an independent support $S$ such that each full solution is uniquely determined by its projection on $S$.
- Partition the projected solution space via random $3$-wise independent hash functions $h : \{0,1\}^{|S|} \to \{0,1\}^m$, often realized as linear XOR constraints: $h_i(x) = b_i \oplus \bigoplus_{j \in A_i} x_j$.
- Conjoin the original formula with a randomly sampled hash constraint (e.g., $h(x) = \alpha$ for a uniformly random cell label $\alpha \in \{0,1\}^m$) to carve the solution space into “cells”.
- Efficient SAT solving within random cells of targeted size (controlled by the hash length $m$) enables sampling within “small” solution slivers, uniformly at random from within, yielding almost-uniformity guarantees: $\frac{1}{(1+\varepsilon)\,|\Omega_C|} \le \Pr[x \text{ is output}] \le \frac{1+\varepsilon}{|\Omega_C|}$ for all $x \in \Omega_C$ and tunable $\varepsilon > 0$ (Chakraborty et al., 2014).
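The brute-force toy below illustrates the hash-cell principle: enumerate a small solution set, partition it with random XOR constraints, and sample uniformly inside one random cell. Real tools in this family never enumerate solutions; the XORs are conjoined with the formula and a SAT solver searches the cell. Everything named here is purely illustrative:

```python
import random

def xor_hash(x, rows):
    """Evaluate m random parity constraints; each row is (index_set, offset)."""
    return tuple((sum(x[j] for j in idx) + off) % 2 for idx, off in rows)

def hash_cell_sample(solutions, n_vars, m, rng=random):
    """Partition an (enumerable, toy-sized) solution set into 2^m cells with
    random XOR constraints, then sample uniformly inside one random cell.
    Larger m means smaller cells; an empty cell would trigger a retry in a
    real implementation."""
    rows = [([j for j in range(n_vars) if rng.random() < 0.5], rng.randint(0, 1))
            for _ in range(m)]
    target = tuple(rng.randint(0, 1) for _ in range(m))  # random cell label
    cell = [x for x in solutions if xor_hash(x, rows) == target]
    return rng.choice(cell) if cell else None  # None ~ "retry with a new hash"

# Toy constraint: length-10 binary strings with no two adjacent ones.
n = 10
sols = [tuple((v >> i) & 1 for i in range(n)) for v in range(2 ** n)
        if all(((v >> i) & 3) != 3 for i in range(n - 1))]
print(hash_cell_sample(sols, n, m=4))
```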
4. Combinatorial and Algebraic Techniques for Structured Objects
Algebraic combinatorics, Boltzmann sampling, and CSP-based decompositions enable uniform generation of structured and recursively defined data:
- Boltzmann samplers generate recursive shapes (e.g., trees, lists, algebraic data types) with a target expected size, tuned via the ordinary generating function (OGF) parameter (Ziat et al., 2022).
- Each drawn “shape” is instantiated by solving a finite-domain CSP for the numeric or categorical labels, under high-level constraints (“membership predicates”), often via uniform RE or PRT (path-oriented random testing).
- This two-stage approach (shape, then label assignment) achieves uniformity provided both sampler stages are uniform; empirical work shows linear scaling in data structure size for many recursive combinatorial classes (Ziat et al., 2022).
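To make the shape stage concrete, here is a minimal Boltzmann sampler for binary trees, derived from the OGF equation $C(z) = 1 + z\,C(z)^2$ (a textbook instance, not code from the cited work). Pushing the parameter $x$ toward the singularity at $1/4$ raises the expected size; a second stage, omitted here, would label the shape by solving a finite-domain CSP:

```python
import math, random

def boltzmann_binary_tree(x, rng=random):
    """Boltzmann sampler for binary trees counted by internal nodes, from
    C(z) = 1 + z*C(z)^2.  Requires 0 < x < 1/4.  Under the Boltzmann model
    each tree t is drawn with probability x**size(t) / C(x), so the sampler
    is uniform conditioned on any fixed size."""
    C = (1 - math.sqrt(1 - 4 * x)) / (2 * x)  # OGF value at parameter x
    def gen():
        if rng.random() < 1 / C:   # leaf: weight 1/C(x)
            return "leaf"
        return (gen(), gen())      # internal node: weight x*C(x)^2 / C(x)
    return gen()

tree = boltzmann_binary_tree(x=0.24)  # ~2 internal nodes expected; x -> 1/4 grows it
```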
For logic programs and complex formulas, constraint programming (CP) modeling uses:
- Explicit construction of all syntax elements (heads, bodies, variable binding, symmetry breaking).
- Global constraints for structure, e.g., no negative cycles, stratification, and predicate independence (through propagation over dependency graphs).
- Empirical validation: independently derived combinatorial counts match the model's reported solution counts (Dilkas et al., 2020).
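A hypothetical stand-in for this style of model, written against Google OR-tools CP-SAT rather than the cited system: integer variables pick body atoms, a strict-ordering constraint breaks permutation symmetry, a reified constraint imposes a toy independence requirement, and exhaustive enumeration lets the solution count be checked against the hand count $\binom{6}{3} - 4 = 16$:

```python
from ortools.sat.python import cp_model  # assumes a recent OR-tools release

n_preds, body_len = 6, 3
model = cp_model.CpModel()
atoms = [model.NewIntVar(0, n_preds - 1, f"a{i}") for i in range(body_len)]

# Symmetry breaking: a body is a set of atoms, so force a strict order.
for i in range(body_len - 1):
    model.Add(atoms[i] < atoms[i + 1])

# Toy "independence" constraint: predicates 0 and 5 never co-occur.
# Because atoms are sorted, co-occurrence means atoms[0]==0 and atoms[-1]==5.
first_is_0 = model.NewBoolVar("first_is_0")
model.Add(atoms[0] == 0).OnlyEnforceIf(first_is_0)
model.Add(atoms[0] != 0).OnlyEnforceIf(first_is_0.Not())
model.Add(atoms[-1] != n_preds - 1).OnlyEnforceIf(first_is_0)

class Counter(cp_model.CpSolverSolutionCallback):
    def __init__(self):
        super().__init__()
        self.count = 0
    def on_solution_callback(self):
        self.count += 1

solver = cp_model.CpSolver()
solver.parameters.enumerate_all_solutions = True
counter = Counter()
solver.Solve(model, counter)
print(counter.count)  # 16 = C(6,3) - 4, matching the combinatorial count
```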
5. Decision Procedures and Search Heuristics for Constrained Random Sampling
For domains where constraints are highly non-local, decision procedure–based methods are applied:
- Example: constrained image generation under topological (grain/void, connectivity) and process constraints, with satisfaction encoded using SMT or ILP solvers, and randomization introduced at seed selection, DAG rooting, and in solver search (Korneev et al., 2018).
- To obtain multiple diverse samples, one either adds blocking clauses after each accepted solution or randomizes seeds and orderings (see the sketch after this list).
- Solver performance and sample diversity strongly depend on both logical constraint structure and the underlying solver’s search heuristics.
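A minimal blocking-clause loop, assuming the python-sat (pysat) package is available; the CNF is a toy instance, and, as noted above, diversity beyond mere distinctness still depends on the solver's heuristics and any seed randomization:

```python
from pysat.solvers import Glucose3  # pip install python-sat

def distinct_solutions(clauses, n_samples):
    """Enumerate distinct satisfying assignments: after each accepted model,
    add a blocking clause (its negation) so the solver cannot return it again."""
    samples = []
    with Glucose3(bootstrap_with=clauses) as solver:
        for _ in range(n_samples):
            if not solver.solve():
                break  # solution space exhausted
            model = solver.get_model()
            samples.append(model)
            solver.add_clause([-lit for lit in model])  # block this model
    return samples

# Toy CNF: (x1 or x2) and (not x1 or x3).
print(distinct_solutions([[1, 2], [-1, 3]], n_samples=5))
```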
Variable-ordering/biasing methods such as YORO pre-rolling (for procedural content generation (PCG) via constraint solvers) optimize for population-level statistical properties:
- Predefine a “score” on decision variables reflecting desired population-level statistics (e.g., tile frequency distribution), then permute the variable order such that constraint solvers inherently reflect the statistical bias in their first solutions, while still enforcing all hard constraints (Katz et al., 1 Sep 2024).
- This mechanism controls output statistics while retaining the global-constraint expressivity of generic CSP/SAT solving.
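A toy stand-in for pre-rolling (not the cited system itself): preferred values are rolled once per variable from the target distribution, and a small backtracking solver tries the preference first, so its first solution is statistically biased toward the target frequencies while the hard constraint (adjacent grid cells differ) remains strictly enforced:

```python
import random

def prerolled_grid(n, colors, target_probs, rng=random):
    """Pre-roll a preferred color per cell from the target distribution, then
    backtrack with the preference tried first: soft statistics via ordering,
    hard constraints via search."""
    prefs = [rng.choices(colors, weights=target_probs)[0] for _ in range(n * n)]
    grid = [None] * (n * n)

    def ok(i, c):  # hard constraint: differ from left and upper neighbors
        r, col = divmod(i, n)
        return (col == 0 or grid[i - 1] != c) and (r == 0 or grid[i - n] != c)

    def solve(i):
        if i == n * n:
            return True
        for c in [prefs[i]] + [c for c in colors if c != prefs[i]]:
            if ok(i, c):
                grid[i] = c
                if solve(i + 1):
                    return True
                grid[i] = None
        return False

    return grid if solve(0) else None

print(prerolled_grid(4, ["grass", "water", "rock"], [0.7, 0.2, 0.1]))
```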
6. Continuous and Hybrid Domain Methods
For continuous or mixed discrete-continuous settings, direct mathematical transformation and rejection-based approaches are central:
- DRSC (Dirichlet-Rescale-Constraints): Uniformly samples the standard simplex (for fixed-sum vectors), then applies successive affine maps to navigate into feasible regions defined by linear and nonlinear constraints, preserving uniform measure by construction and ensuring exact uniformity even over complex feasible regions (Willemsen et al., 28 Jan 2025).
- Failures of the naive DRS (Dirichlet-Rescale) method under overlapping or non-simplex constraints are explicitly analyzed, and DRSC's correctness is established via Jacobian preservation and careful induction over a partition of the feasible region into simplices (Willemsen et al., 28 Jan 2025).
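For contrast, the rejection baseline that DRSC is designed to replace can be sketched directly (illustrative code, not the DRSC algorithm): uniform simplex sampling via normalized exponentials, followed by rejection against per-coordinate bounds. Its acceptance rate collapses in exactly the tightly constrained regimes that DRSC handles by measure-preserving rescaling:

```python
import random

def simplex_with_bounds(k, total, upper, max_tries=100_000, rng=random):
    """Uniform sampling of {x >= 0, sum(x) = total, x_i <= upper_i} by
    rejection.  Normalized i.i.d. exponentials are uniform on the simplex
    (a Dirichlet(1,...,1) draw); bounds are enforced by retrying."""
    for _ in range(max_tries):
        e = [rng.expovariate(1.0) for _ in range(k)]
        s = sum(e)
        x = [total * v / s for v in e]  # uniform on the scaled simplex
        if all(xi <= ui for xi, ui in zip(x, upper)):
            return x
    raise RuntimeError("acceptance rate too low; use rescaling (DRS/DRSC)")

print(simplex_with_bounds(k=4, total=1.0, upper=[0.5, 0.5, 0.5, 0.5]))
```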
In modern generative modeling, stochastic process–based sampling is adapted:
- Constraint-aware flow-matching combines learned ODE flows from base to target distributions with constraint penalties or policy-gradient corrections, balancing high-dimensional distribution fitting against constraint-satisfaction probability, even with non-differentiable oracle constraints (Huan et al., 18 Aug 2025).
- Diffusion-based guided sampling (e.g., GuidedDiffTime) injects differentiable constraint guidance at each denoising step—enabling enforcement of complex, hard or soft constraints during DDPM-based time-series or image generation—avoiding retraining when constraints change (Coletta et al., 2023).
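Schematically, both families interleave a learned generative update with a constraint correction. The sketch below is a generic guided-denoising loop, not the GuidedDiffTime implementation: `denoise_step` and `constraint_grad` are placeholders for a trained reverse-diffusion step and the gradient of a differentiable constraint penalty:

```python
import numpy as np

def guided_denoise(x_T, denoise_step, constraint_grad, steps, eta=0.1):
    """Generic guidance loop: after each reverse-diffusion step, nudge the
    sample down the gradient of a constraint penalty.  Constraints can be
    changed by swapping constraint_grad, with no retraining of the model."""
    x = x_T
    for t in reversed(range(steps)):
        x = denoise_step(x, t)            # learned reverse step (stub here)
        x = x - eta * constraint_grad(x)  # gradient step on the penalty
    return x

# Toy run: the "denoiser" just shrinks noise; the penalty mean(x)^2 pulls
# the sample mean toward zero (its gradient is 2*mean(x)/len(x) per entry).
rng = np.random.default_rng(0)
x0 = guided_denoise(
    x_T=rng.normal(size=64),
    denoise_step=lambda x, t: 0.9 * x + 0.1 * (t / 50) * rng.normal(size=x.shape),
    constraint_grad=lambda x: np.full_like(x, 2 * x.mean() / x.size),
    steps=50,
)
```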
7. Exactness, Bias, and Efficiency: Analysis and Trade-offs
Uniformity, efficiency, and coverage are key comparative axes across methods:
- Markov Chain Monte Carlo (MCMC)-type methods can reach exact uniformity as $k$ increases, but at greater computational cost per trial and with possible impracticality for complex constraint families unless $k$ is large (Tabourier et al., 2010).
- Hash/XOR-based SAT witness generation achieves scalability and adjustable approximate uniformity for large solution spaces, with provable statistical bounds (Chakraborty et al., 2014).
- Controlled-bias generators (e.g., for minimal CSP/Sudoku instances) can be analyzed for explicit sample bias (e.g., as a function of clue count) and reweighted post hoc to produce unbiased estimators of statistics over the true uniform population (Berthier, 2011); see the reweighting sketch after this list.
- Flow-guided and diffusion-guided samplers incur domain-specific costs (differentiability, constraint query complexity) but allow conditional, soft, and hard constraints to be incorporated directly in high-dimensional, structured generative tasks (Huan et al., 18 Aug 2025, Coletta et al., 2023).
- Decisional approaches (SAT/SMT/CP) can guarantee satisfaction with full constraint expressivity but may only approximate uniformity depending on variable-order randomization and solver heuristics (Korneev et al., 2018, Dilkas et al., 2020, Katz et al., 1 Sep 2024).
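The post hoc reweighting mentioned for controlled-bias generators amounts to self-normalized importance sampling; a minimal sketch, assuming the generator's per-sample probability is known up to a common constant:

```python
def reweighted_mean(samples, f, sampling_prob):
    """Estimate the mean of f under the uniform distribution from samples
    drawn with known bias: weight each sample by 1/sampling_prob(x) and
    self-normalize, giving a consistent (asymptotically unbiased) estimator."""
    weights = [1.0 / sampling_prob(x) for x in samples]
    return sum(w * f(x) for w, x in zip(weights, samples)) / sum(weights)
```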
The theory and practice of constrained random generation continue to evolve, with current research focusing on the trade-off between guarantee strength (e.g., exactness/uniformity), scalability to high-dimensional and highly constrained domains, and the flexibility to handle new constraint classes with minimal retraining or reparameterization.