Generative Matrix Product States

Updated 4 July 2026

Generative Matrix Product States are tensor-network models that factorize high-order tensors into a one-dimensional chain, enabling efficient probability estimation via the Born rule.
They support tractable normalization, sequential sampling, and scalable training through techniques like DMRG-like updates and Riemannian optimization.
Recent advances extend MPS with shortcuts, comb networks, and quantum circuitry to capture long-range correlations and facilitate applications from synthetic data generation to quantum state preparation.

Generative Matrix Product States (MPS) are tensor-network generative models in which a probability distribution over configurations is defined from an MPS wavefunction or unnormalized amplitude, most commonly through a Born-rule prescription of the form $\mathbb{P}(\tau)=|\psi(\tau)|^2/Z$ (Li et al., 2018). In this setting, an MPS (also known as a tensor train) factorizes a high-order tensor into a one-dimensional chain of low-order tensors, yielding exact polynomial-time contraction, exact or tractable normalization in important gauges, and sequential sampling procedures that make the ansatz attractive for unsupervised modeling, density estimation, synthetic data generation, and quantum state preparation (Li et al., 2018). Across recent work, generative MPS have been developed for binary images, mixed-type tabular data, structured continuous functions, stochastic processes, and circuit-based quantum generative models, while also motivating extensions such as Shortcut MPS, comb tensor networks, unitary MPS, and continuous-variable Gaussian MPS (Li et al., 2018, R. et al., 8 Aug 2025, Kolesnyk et al., 2024, Yang et al., 2018, Schuch et al., 2012).

1. Formal definition and probabilistic interpretation

In the standard one-dimensional construction, an MPS for $N$ variables is a chain of local tensors $A^{(k)}$, each with one physical index and two virtual indices, with open-boundary conditions $D_0=D_N=1$ (Li et al., 2018). For discrete configurations $\tau=(\tau_1,\dots,\tau_N)$, the amplitude is the contraction

$\psi(\tau_1,\dots,\tau_N) = \sum_{a_1,\dots,a_{N-1}} A^{(1)}_{\tau_1,a_1} A^{(2)}_{a_1,\tau_2,a_2} \cdots A^{(N)}_{a_{N-1},\tau_N},$

and the associated generative model is typically specified by

$\mathbb{P}(\tau)=\frac{|\psi(\tau)|^2}{Z}, \qquad Z=\sum_\tau |\psi(\tau)|^2$

(Li et al., 2018). This is the Born-machine formulation restated in the MPS setting (Li et al., 2018).
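The two formulas above can be checked numerically on a small chain. The following is a minimal sketch, assuming real-valued random cores stored with shape $(D_{k-1}, d, D_k)$; the helper names `random_mps`, `amplitude`, and `partition_function` are illustrative and not taken from the cited papers.

```python
import itertools
import numpy as np

def random_mps(n_sites, phys_dim, bond_dim, seed=0):
    """Random real open-boundary MPS; core k has shape (D_{k-1}, d, D_k), D_0 = D_N = 1."""
    rng = np.random.default_rng(seed)
    dims = [1] + [bond_dim] * (n_sites - 1) + [1]
    return [rng.normal(size=(dims[k], phys_dim, dims[k + 1])) for k in range(n_sites)]

def amplitude(cores, tau):
    """psi(tau): multiply the matrices selected by the physical indices."""
    vec = np.ones(1)
    for core, t in zip(cores, tau):
        vec = vec @ core[:, t, :]
    return vec[0]

def partition_function(cores):
    """Z = sum_tau psi(tau)^2, contracting the doubled chain one site at a time."""
    env = np.ones((1, 1))
    for core in cores:
        # Sum over the shared physical index t and the two incoming bond indices.
        env = np.einsum("ac,atb,ctd->bd", env, core, core)
    return env[0, 0]

n_sites, phys_dim = 6, 2
cores = random_mps(n_sites, phys_dim, bond_dim=3)
configs = list(itertools.product(range(phys_dim), repeat=n_sites))

Z = partition_function(cores)                                  # polynomial-time contraction
Z_brute = sum(amplitude(cores, tau) ** 2 for tau in configs)   # exponential-time check
probs = np.array([amplitude(cores, tau) ** 2 / Z for tau in configs])
print(Z, Z_brute, probs.sum())
```

The chain contraction costs a number of operations linear in $N$, whereas the brute-force sum enumerates all $d^N$ configurations; the two agree up to floating-point error.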

The generative objective is negative log-likelihood,

$L = -\frac{1}{N_T}\sum_{\tau\in T} \log\left(\frac{|\psi(\tau)|^2}{Z}\right),$

where $T$ is the training set of $N_T$ samples, and minimizing $L$ is equivalent to minimizing the Kullback–Leibler divergence $D_{\text{KL}}(Q\|\mathbb{P})$ between the empirical data distribution $Q$ and the model, up to the constant data entropy (Li et al., 2018). This same maximum-likelihood perspective reappears in later work on privacy-preserving tabular synthesis, where the MPS cores parametrize amplitudes and the squared amplitudes define a normalized probability distribution over encoded tabular rows (R. et al., 8 Aug 2025).
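The equivalence between the two objectives follows in one line. As an assumption of this sketch, $Q$ denotes the empirical distribution that places mass $1/N_T$ on each training sample (counted with multiplicity), and $H(Q)$ denotes its entropy:

$D_{\text{KL}}(Q\|\mathbb{P}) = \sum_{\tau} Q(\tau)\log\frac{Q(\tau)}{\mathbb{P}(\tau)} = -H(Q) - \frac{1}{N_T}\sum_{\tau\in T}\log\mathbb{P}(\tau) = L - H(Q).$

Since $H(Q)$ does not depend on the MPS tensors, $L$ and $D_{\text{KL}}(Q\|\mathbb{P})$ have the same minimizers.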

A central technical advantage is that for standard MPS in canonical form, the normalization constant $Z$ can be calculated exactly and straightforwardly by one-dimensional contraction (Li et al., 2018). This exact, polynomial-time evaluation of the partition function is one of the defining distinctions between MPS generative models and many energy-based models. In open-boundary or canonical gauges, left and right environments also enable efficient evaluation of marginals, gradients, and conditional distributions (Li et al., 2018, Duan et al., 12 Mar 2026).
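To illustrate why canonical form makes normalization trivial, the sketch below brings a random real MPS into left-canonical form with QR decompositions. This is one common way to canonicalize, assumed here rather than taken from the cited papers. It then checks that $Z$ reduces to the squared Frobenius norm of the last core.

```python
import numpy as np

def partition_function(cores):
    """Z by contracting the doubled chain site by site (real-valued cores)."""
    env = np.ones((1, 1))
    for core in cores:
        env = np.einsum("ac,atb,ctd->bd", env, core, core)
    return env[0, 0]

def left_canonicalize(cores):
    """QR sweep from left to right; every core except the last becomes an isometry."""
    cores = [c.copy() for c in cores]
    for k in range(len(cores) - 1):
        d_left, d_phys, d_right = cores[k].shape
        q, r = np.linalg.qr(cores[k].reshape(d_left * d_phys, d_right))
        cores[k] = q.reshape(d_left, d_phys, q.shape[1])
        # Absorb the triangular factor into the next core; amplitudes are unchanged.
        cores[k + 1] = np.einsum("ab,btc->atc", r, cores[k + 1])
    return cores

rng = np.random.default_rng(0)
dims = [1, 3, 3, 3, 3, 1]
cores = [rng.normal(size=(dims[k], 2, dims[k + 1])) for k in range(5)]

canon = left_canonicalize(cores)
Z_chain = partition_function(cores)    # full contraction of the original chain
Z_last = np.sum(canon[-1] ** 2)        # only the last core is needed in canonical form
print(Z_chain, Z_last)
```

Because each canonicalized core is an isometry, its left environment is the identity, which is also what makes marginals and conditionals cheap to evaluate.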

The expressive bias of the model is determined by the chain structure and the bond dimensions. In the tabular setting, this implies locality or Markov-like structure along the chain and a low-rank constraint across bipartitions, so strongly interacting variables are easiest to model when placed close together in the chosen ordering (R. et al., 8 Aug 2025). In the continuous-data setting with compression layers, the same principle appears as a dependence of representational efficiency on site ordering, compressed physical dimension $d$, and bond dimension $D$ (Kolesnyk et al., 2024).

2. Representational structure, gauge, and identifiability

MPS define a restricted algebraic family rather than an arbitrary distribution class. For translation-invariant periodic boundary conditions, amplitudes take the form

$\psi(\tau_1,\dots,\tau_N) = \operatorname{tr}\left(A_{\tau_1} A_{\tau_2} \cdots A_{\tau_N}\right),$

and the resulting image is an algebraic variety in the ambient amplitude space (Critch et al., 2012). For $D=d=2$, the representable family is highly constrained: for $N=4$, a four-qubit state is a limit of binary periodic translation-invariant MPS if and only if a specific sextic polynomial vanishes, and for $N=5$, the ideal of the variety is minimally generated by 3 quartics and 27 sextics (Critch et al., 2012). This establishes that fixed-bond-dimension generative MPS are structured low-dimensional model classes, not generic distributions.

The same work explains the relation between MPS and hidden Markov models through trace varieties and trace algebras (Critch et al., 2012). Open-boundary MPS with

$\psi(\tau_1,\dots,\tau_N) = b_L^{\top} A_{\tau_1} A_{\tau_2} \cdots A_{\tau_N}\, b_R$

become classical HMMs when the matrices and boundary vectors are constrained to be non-negative stochastic objects (Critch et al., 2012). This suggests that generative MPS can be viewed as complex or quantum generalizations of finite-state sequence models. The connection is sharpened further by the EHMM correspondence: a significant class of MPS can be derived as the outcomes of entangled hidden Markov models, and conversely every MPS satisfying a standard gauge condition can be turned into an EHMM associated to it (Souissi, 18 Feb 2025).
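The HMM special case can be made concrete. The sketch below assumes one particular convention (hypothetical, not taken from the cited paper): `M[x][i, j]` is the probability of emitting symbol `x` and moving to hidden state `j` from state `i`, the left boundary vector is the initial state distribution, and the right boundary vector is all ones. With these non-negative stochastic choices the matrix product yields probabilities directly, with no squaring.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_states, n_symbols, length = 3, 2, 4

# joint[i, x, j] = P(emit x, move to state j | current state i); each row i sums to 1.
joint = rng.random((n_states, n_symbols, n_states))
joint /= joint.sum(axis=(1, 2), keepdims=True)
M = [joint[:, x, :] for x in range(n_symbols)]

pi = np.full(n_states, 1.0 / n_states)   # left boundary vector: initial state distribution
ones = np.ones(n_states)                 # right boundary vector: sums out the final state

def sequence_probability(seq):
    """P(x_1, ..., x_N) = pi^T M[x_1] ... M[x_N] 1, an open-boundary matrix product."""
    vec = pi
    for x in seq:
        vec = vec @ M[x]
    return vec @ ones

sequences = list(itertools.product(range(n_symbols), repeat=length))
hmm_probs = np.array([sequence_probability(s) for s in sequences])
print(hmm_probs.sum())
```

Relaxing the non-negativity constraint and squaring the amplitude gives back the Born-rule model, which is the sense in which generative MPS generalize finite-state sequence models.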

Gauge freedom is intrinsic. Simultaneous conjugation $A_{\tau_k} \mapsto P A_{\tau_k} P^{-1}$ by an invertible matrix $P$ leaves periodic MPS amplitudes invariant, and for $D=d=2$ the invariant trace algebra yields a five-parameter description of the gauge-quotiented model (Critch et al., 2012). The same paper makes four conjectures on identifiability; in particular, for the periodic case it conjectures that a generic state has a fixed finite number of preimages under the trace-parameterization map (Critch et al., 2012). This suggests finite algebraic non-uniqueness rather than uncontrolled degeneracy.
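Gauge invariance is easy to verify numerically. The sketch below assumes a translation-invariant periodic MPS with two $2\times 2$ matrices and an arbitrary invertible $P$ chosen for illustration. It also evaluates five traces that form one standard generating set for the conjugation invariants of a pair of $2\times 2$ matrices; whether these are exactly the coordinates used in the cited paper is not claimed here.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
D, d, N = 2, 2, 5
A = rng.normal(size=(d, D, D))             # A[t] is the D x D matrix for physical value t
P = np.array([[2.0, 1.0], [1.0, 1.0]])     # illustrative invertible gauge matrix (det = 1)
P_inv = np.linalg.inv(P)
B = np.array([P @ A[t] @ P_inv for t in range(d)])   # gauge-transformed matrices

def periodic_amplitude(mats, tau):
    """psi(tau) = tr(A[tau_1] ... A[tau_N]) for a translation-invariant periodic MPS."""
    prod = np.eye(D)
    for t in tau:
        prod = prod @ mats[t]
    return np.trace(prod)

def trace_invariants(mats):
    """Five conjugation-invariant traces of a pair of 2 x 2 matrices."""
    a0, a1 = mats
    return np.array([np.trace(a0), np.trace(a1),
                     np.trace(a0 @ a0), np.trace(a0 @ a1), np.trace(a1 @ a1)])

configs = list(itertools.product(range(d), repeat=N))
amps_A = np.array([periodic_amplitude(A, tau) for tau in configs])
amps_B = np.array([periodic_amplitude(B, tau) for tau in configs])
print(np.max(np.abs(amps_A - amps_B)))
```

All $2^5$ amplitudes and all five invariants coincide for the two parameter sets, so the data can only ever determine the tensors up to this conjugation.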

A complementary structural perspective comes from the theory of parent Hamiltonians. For fixed bond and physical dimensions, once there exists one MPS with a simple local parent Hamiltonian, generic MPS with those dimensions possess an equally simple Hamiltonian (Schuch et al., 13 Mar 2025). For generative MPS, this gives a physical interpretation: many models can be regarded as unique gapped ground states of local frustration-free Hamiltonians, implying short-range correlated inductive bias and robustness under perturbations (Schuch et al., 13 Mar 2025).

3. Correlations, expressivity, and nonlocal extensions

A standard limitation of one-dimensional MPS is exponential decay of connected correlations with distance. In the generative setting, this becomes restrictive when data are not naturally ordered along a one-dimensional lattice, such as flattened images (Li et al., 2018). The paper introducing Shortcut Matrix Product States makes this explicit by defining a correlation function between pairs of sites through the second singular value of an interaction matrix built from pairwise marginals, and showing that this quantity inherits the exponential decay of correlations in a finite-correlated MPS (Li et al., 2018).

Empirical correlation maps reveal the mismatch between natural data geometry and a plain chain. For MNIST, correlations are strongly concentrated near the diagonal but also exhibit significant off-diagonal structure, especially near corners, indicating long-range top–bottom dependencies after row-wise flattening (Li et al., 2018). Standard MPS with modest bond dimension underrepresent these off-diagonal correlations; increasing the bond dimension extends the correlation span, but at higher computational cost and with a tendency to memorize small datasets (Li et al., 2018).

Shortcut Matrix Product States address this by adding long-range virtual bonds between non-adjacent sites. A shortcut connecting sites $i$ and $j$ introduces an extra virtual index shared by the corresponding local tensors, reducing effective graph distance while preserving polynomial contractibility (Li et al., 2018). Periodic MPS or tensor rings appear as a special case with a shortcut between the first and last tensors (Li et al., 2018). A structural theorem states that an SMPS can always be written as a sum of multiple ordinary MPS components, indexed here by the value $s$ of the shortcut index,

$\psi_{\text{SMPS}}(\tau) = \sum_{s} \psi^{(s)}_{\text{MPS}}(\tau),$

which suggests a mixture-like increase in representational capacity (Li et al., 2018).
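The decomposition is easiest to see in the tensor-ring special case, where the single shortcut joins the first and last sites. In the sketch below (random real cores, illustrative names), fixing the shortcut index to a value `s` turns the ring into an ordinary open-boundary MPS, and summing the components over `s` recovers the trace.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
D, d, N = 3, 2, 5
ring = [rng.normal(size=(D, d, D)) for _ in range(N)]   # every core keeps both bond legs

def ring_amplitude(tau):
    """Periodic amplitude: trace of the product of the selected matrices."""
    mat = np.eye(D)
    for core, t in zip(ring, tau):
        mat = mat @ core[:, t, :]
    return np.trace(mat)

def open_component(tau, s):
    """Open-boundary MPS obtained by pinning the shortcut index to the value s."""
    vec = np.zeros(D)
    vec[s] = 1.0                      # left boundary vector e_s
    for core, t in zip(ring, tau):
        vec = vec @ core[:, t, :]
    return vec[s]                     # right boundary vector e_s

configs = list(itertools.product(range(d), repeat=N))
ring_amps = np.array([ring_amplitude(tau) for tau in configs])
summed = np.array([sum(open_component(tau, s) for s in range(D)) for tau in configs])
print(np.max(np.abs(ring_amps - summed)))
```

The number of components equals the dimension of the shortcut bond, so a wider shortcut corresponds to a sum over more ordinary MPS.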

In MNIST experiments based on 50 images, SMPS with well-chosen shortcuts achieve lower NLL than a plain MPS with the same bond dimension, and the improvement depends strongly on shortcut placement (Li et al., 2018). The shortcut heuristic is correlation-based: compare the empirical correlation map of the data with the one produced by the trained model, then place shortcuts between distant positions with high data correlation and large discrepancy (Li et al., 2018). This makes expressivity explicitly data-adaptive.

For continuous and high-dimensional data with compression layers, an alternative architectural response is to move beyond a chain altogether. A recent comparison between regular MPS and comb tensor networks derives explicit contraction costs and a threshold condition in bond dimension $D$ and compressed dimension $d$ beyond which comb contractions are cheaper than standard MPS contractions (Kolesnyk et al., 2024). This suggests that, for high-dimensional continuous generative modeling, the chain geometry itself may become the bottleneck rather than merely the bond dimension.

4. Training objectives and optimization algorithms

The canonical generative training objective is maximum likelihood or NLL minimization (Li et al., 2018, R. et al., 8 Aug 2025). In the Born-machine setting of SMPS and MPS, the tensor gradient for a given local tensor $A^{(k)}$ is

$\frac{\partial L}{\partial A^{(k)}} = \frac{Z'}{Z} - \frac{2}{N_T}\sum_{\tau\in T}\frac{\psi'(\tau)}{\psi(\tau)},$

with $\psi'(\tau)=\partial\psi(\tau)/\partial A^{(k)}$ and $Z'=2\sum_\tau \psi'(\tau)\psi(\tau)$ for real-valued tensors (Li et al., 2018). Training proceeds via a DMRG-like two-site update: merge two neighboring tensors, compute the gradient, update the merged tensor, perform truncated SVD, and split back into two local tensors, sweeping left-to-right and right-to-left until convergence (Li et al., 2018). An appendix bound controls the NLL drift introduced by SVD truncation (Li et al., 2018).
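The merge, update, and split steps of the two-site scheme can be sketched as follows. This is a minimal illustration and not the authors' implementation: the NLL gradient is replaced by a random perturbation as a stand-in, and the function names are hypothetical. It checks that the Frobenius error of the split equals the norm of the discarded singular values.

```python
import numpy as np

def merge(left, right):
    """Contract two neighboring cores over their shared bond: (Dl, d, d, Dr)."""
    return np.einsum("atb,buc->atuc", left, right)

def split(merged, max_bond):
    """Truncated SVD back into two cores, keeping at most max_bond singular values."""
    dl, d1, d2, dr = merged.shape
    u, s, vh = np.linalg.svd(merged.reshape(dl * d1, d2 * dr), full_matrices=False)
    keep = min(max_bond, len(s))
    discarded = np.sqrt(np.sum(s[keep:] ** 2))     # weight lost to truncation
    left = u[:, :keep].reshape(dl, d1, keep)
    right = (np.diag(s[:keep]) @ vh[:keep]).reshape(keep, d2, dr)
    return left, right, discarded

rng = np.random.default_rng(4)
left = rng.normal(size=(4, 2, 4))
right = rng.normal(size=(4, 2, 4))

merged = merge(left, right)
# Stand-in for the gradient step: a real trainer would subtract lr * dL/d(merged) here.
merged = merged - 0.01 * rng.normal(size=merged.shape)

new_left, new_right, discarded = split(merged, max_bond=4)
error = np.linalg.norm(merge(new_left, new_right) - merged)
print(error, discarded)
```

After the update the merged tensor generally has rank above the allowed bond dimension, which is why the truncation step and a bound on its effect on the NLL are needed.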

A more recent line of work proposes unitary MPS and Riemannian optimization. In that formulation, the model distribution is again $\mathbb{P}(\tau)=|\psi(\tau)|^2/Z$, but a unit Frobenius-norm constraint is imposed on the MPS so that $Z=1$ in mixed-canonical form (Duan et al., 12 Mar 2026). This removes global scale ambiguity, simplifies the gradient, and turns training into constrained optimization over a smooth manifold via a space-decoupling construction (Duan et al., 12 Mar 2026). The resulting UMPS-SD algorithm uses Riemannian gradients and retractions rather than Euclidean gradients plus projection, and on EMNIST it reaches NLL comparable to the baseline MPS method in 3 loops whereas the baseline MPS needs about 25 loops, roughly an eightfold reduction in loop count in the reported setting (Duan et al., 12 Mar 2026).
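The idea of a Riemannian step under a unit-norm constraint can be illustrated on the simplest possible case. The sketch below is not the UMPS-SD algorithm or its space-decoupling construction. It is plain gradient descent on the unit sphere for a single amplitude vector, with a hypothetical three-outcome target distribution, showing the tangent-space projection and the normalization retraction.

```python
import numpy as np

q = np.array([0.5, 0.3, 0.2])      # toy target distribution (hypothetical data)
x = np.ones(3) / np.sqrt(3.0)      # unit-norm amplitudes, so Z = sum(x**2) = 1

def nll(amps):
    """Negative log-likelihood of q under the Born distribution amps**2 (Z = 1)."""
    return -np.sum(q * np.log(amps ** 2))

initial_nll = nll(x)
for _ in range(200):
    euclid = -2.0 * q / x                          # Euclidean gradient of the NLL
    riem = euclid - np.dot(euclid, x) * x          # project onto the sphere's tangent space
    x = x - 0.1 * riem                             # step along the tangent direction
    x = x / np.linalg.norm(x)                      # retraction: back onto the unit sphere

final_nll = nll(x)
print(x ** 2, initial_nll, final_nll)
```

Because the iterate never leaves the sphere, the normalization term drops out of the objective altogether, which is the simplification the unit-norm constraint is meant to provide.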

The same paper gives a sequential exact sampling procedure for binary data: compute the marginal $\mathbb{P}(\tau_N)$ from the last tensor in canonical form, then sample conditionals recursively from right to left using partial contractions (Duan et al., 12 Mar 2026). This preserves one of the main practical attractions of MPS generative models: exact tractable sampling without MCMC.
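A right-to-left sampler of this kind can be sketched as follows. This version is an assumption-laden illustration, not the paper's code. It works on a general (non-canonical) real MPS by precomputing left environments; in left-canonical form these environments would all be identities. It returns the product of the conditionals so that exactness can be checked against $\psi(\tau)^2/Z$.

```python
import numpy as np

def amplitude(cores, tau):
    vec = np.ones(1)
    for core, t in zip(cores, tau):
        vec = vec @ core[:, t, :]
    return vec[0]

def left_environments(cores):
    """envs[k]: doubled contraction of all sites to the left of site k."""
    envs = [np.ones((1, 1))]
    for core in cores[:-1]:
        envs.append(np.einsum("ac,atb,ctd->bd", envs[-1], core, core))
    return envs

def sample(cores, rng):
    """Draw tau_N first, then tau_{N-1} given tau_N, and so on down to tau_1."""
    envs = left_environments(cores)
    right = np.ones(1)                  # partial contraction of the already-sampled sites
    tau, prob = [], 1.0
    for k in reversed(range(len(cores))):
        # candidates[t] = A_k[:, t, :] @ right, one vector per physical value t
        candidates = np.einsum("atb,b->ta", cores[k], right)
        weights = np.einsum("ta,ac,tc->t", candidates, envs[k], candidates)
        cond = weights / weights.sum()  # conditional distribution of tau_k
        t = int(rng.choice(len(cond), p=cond))
        tau.append(t)
        prob *= cond[t]
        right = candidates[t]
    return tuple(reversed(tau)), prob

rng = np.random.default_rng(5)
dims = [1, 3, 3, 3, 3, 1]
cores = [rng.normal(size=(dims[k], 2, dims[k + 1])) for k in range(5)]

Z = np.einsum("ac,atb,ctd->bd", left_environments(cores)[-1], cores[-1], cores[-1])[0, 0]
tau, prob = sample(cores, rng)
print(tau, prob, amplitude(cores, tau) ** 2 / Z)
```

Each sample costs one sweep over the chain, and the product of conditionals reproduces the exact model probability of the drawn configuration.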

A distinct training setting appears in simultaneous classification and generation. There, an MPS ensemble defines class scores whose squared values are normalized into class probabilities, while each class-specific MPS also acts as a class-conditional generator (Mossi et al., 2024). The same work introduces a differentiable sequential sampling method for non-normalized MPS based on reduced density matrices and one-dimensional CDF inversion, enabled by orthogonal local embedding functions such as Fourier features (Mossi et al., 2024). A GAN-inspired training loop then alternates adversarial updates with supervised retraining to prevent degradation of classification accuracy (Mossi et al., 2024). This suggests a hybrid regime in which the same tensor network is used for both conditional density modeling and discriminative supervision.

5. Data modalities and application domains

The best-known early applications of generative MPS are binarized images and simple discrete sequences (Li et al., 2018). In MNIST experiments, each $28\times 28$ image is vectorized to length 784 and treated as a sequence of binary or discrete physical indices; the model then represents the joint distribution $\mathbb{P}(\tau_1,\dots,\tau_{784})$ (Li et al., 2018). Correlation maps, NLL curves, and generated subclasses demonstrate that even a single shortcut can materially change the generative behavior (Li et al., 2018).

Tabular synthetic data generation extends the framework to heterogeneous variables by digitization and feature ordering (R. et al., 8 Aug 2025). Categorical features are represented by single cores with physical dimension equal to the number of categories, while continuous and integer features are scaled and decomposed into base-$b$ digits, each digit becoming its own tensor core (R. et al., 8 Aug 2025). The chain order is chosen so that high-cardinality or strongly correlated features are near the center (R. et al., 8 Aug 2025). Under differential privacy, the method integrates gradient clipping and either Gaussian or Laplacian noise with Rényi Differential Privacy accounting (R. et al., 8 Aug 2025). On Adult Income, unclipped and unnoised MPS achieve near-perfect fidelity metrics, and under full DP the paper reports “more than 10%” improvement in overall utility at standard privacy levels and “over 8%” improvement even at the highest privacy settings compared to PrivBayes (R. et al., 8 Aug 2025). This suggests that the MPS structure is comparatively robust to privacy noise on this benchmark.
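The digit decomposition of a numeric feature can be sketched as below. The scaling range, the uniform binning, and the bin-midpoint decoding are assumptions of this illustration; the paper's exact preprocessing is not reproduced here, and the function names are hypothetical.

```python
def encode_feature(value, low, high, base, n_digits):
    """Scale value from [low, high] to a bin index and return its digits, most significant first."""
    n_bins = base ** n_digits
    frac = (value - low) / (high - low)
    bin_index = min(int(frac * n_bins), n_bins - 1)   # clamp value == high into the last bin
    digits = []
    for _ in range(n_digits):
        bin_index, digit = divmod(bin_index, base)
        digits.append(digit)
    return digits[::-1]

def decode_feature(digits, low, high, base):
    """Map digits back to the midpoint of the corresponding bin."""
    n_bins = base ** len(digits)
    bin_index = 0
    for digit in digits:
        bin_index = bin_index * base + digit
    return low + (bin_index + 0.5) * (high - low) / n_bins

# Example: an age of 37 on an assumed range [17, 90], using three base-4 digits (64 bins).
digits = encode_feature(37, low=17, high=90, base=4, n_digits=3)
restored = decode_feature(digits, low=17, high=90, base=4)
print(digits, restored)
```

Each digit then occupies one site of the chain with physical dimension equal to the base, so resolution is traded against chain length.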

Continuous-data generative modeling with MPS often introduces learned compression layers so that a site’s continuous feature vector is mapped into a low-dimensional physical space before tensor contraction (Kolesnyk et al., 2024). The underlying probabilistic semantics remain Born-machine-like or positive-amplitude-based, but the focus shifts to contraction cost and architecture (Kolesnyk et al., 2024). This approach is especially relevant for high-dimensional continuous features or image patches.

Another application class concerns stochastic processes. A stationary stochastic process with $\varepsilon$-machine transition probabilities $T^{x}_{jk}$ can be encoded as an infinite MPS by setting

$A^{x}_{jk} = \sqrt{T^{x}_{jk}},$

and the resulting iMPS reproduces the correct finite-block measurement statistics (Yang et al., 2018). The q-sample state

$|P(\overleftrightarrow{X})\rangle = \sum_{\overleftrightarrow{x}} \sqrt{P(\overleftrightarrow{x})}\,|\overleftrightarrow{x}\rangle$

becomes a generative MPS for the process, while the entanglement entropy across a past–future cut equals the quantum memory cost $C_q$ of the optimal quantum predictor (Yang et al., 2018). This provides an exact operational interpretation of bond dimension as predictive memory.
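A finite-chain check conveys why square-root site matrices reproduce word statistics. The sketch below rests on several assumptions that are not taken from the paper: the site matrices are elementwise square roots of the transition probabilities, the process is the two-state perturbed coin with a hypothetical flip probability `p`, and the chain starts in a known causal state. For such a unifilar machine each word follows at most one path, so the Born-rule probability matches the classical one.

```python
import itertools
import numpy as np

p = 0.3   # hypothetical flip probability of the perturbed coin
# T[x, j, k] = P(emit x and move to state k | current state j); next state equals emitted symbol.
T = np.zeros((2, 2, 2))
T[0, 0, 0] = 1 - p
T[1, 0, 1] = p
T[1, 1, 1] = 1 - p
T[0, 1, 0] = p
A = np.sqrt(T)   # assumed site matrices: elementwise square roots

def classical_probability(word, start):
    """P(word | start state) from the transition matrices."""
    vec = np.zeros(2)
    vec[start] = 1.0
    for x in word:
        vec = vec @ T[x]
    return vec.sum()

def born_probability(word, start):
    """Squared norm of the amplitude vector built from the square-root matrices."""
    vec = np.zeros(2)
    vec[start] = 1.0
    for x in word:
        vec = vec @ A[x]
    return np.sum(vec ** 2)

words = list(itertools.product(range(2), repeat=4))
classical = np.array([[classical_probability(w, s) for w in words] for s in range(2)])
born = np.array([[born_probability(w, s) for w in words] for s in range(2)])
print(np.max(np.abs(classical - born)))
```

This only verifies conditional word probabilities on a finite chain; the infinite-chain construction and the memory-cost statement are beyond what the sketch tests.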

In the continuous-variable setting, Gaussian Matrix Product States generalize the projected-entangled-pair picture to lattices of harmonic oscillators (Schuch et al., 2012). GMPS can approximate arbitrary translationally invariant pure Gaussian states in one dimension, have computable correlation functions, and exhibit exponential correlation decay in one dimension (Schuch et al., 2012). They therefore serve as continuous-variable generative MPS for Gaussian wavefunctions over bosonic modes.

6. Quantum-circuit realizations and state-preparation viewpoints

Generative MPS are closely tied to quantum state preparation because a Born-machine MPS becomes a quantum generative model once encoded into a circuit and measured in the computational basis (Ran, 2019). One constructive route is to build unitary matrix product disentanglers (MPDs) that map a target MPS to the product state $|0\rangle^{\otimes N}$, then reverse the circuit (Ran, 2019). For bond dimension $D=2$, the encoding is exact in one layer; for $D>2$, a deep circuit with multiple MPD layers approximates the target with high fidelity (Ran, 2019). For Ising, Heisenberg, and XY ground-state MPS, the negative log-fidelity per site decreases substantially with circuit depth, and the per-site error saturates rather than blowing up with system size (Ran, 2019). This suggests that classical MPS generative models can be compiled into practical quantum Born machines.

A related approach uses classical variational disentanglement. Given an MPS in $\Gamma$-$\Lambda$ form, one optimizes local two-qubit disentanglers to reduce Rényi entropies across bonds layer by layer; the inverse circuit then prepares the target MPS approximately (Mansuroglu et al., 30 Apr 2025). Because the Schmidt coefficients on every bond are directly accessible, the optimization is local and parallelizable (Mansuroglu et al., 30 Apr 2025). The resulting compilation provides controlled error bounds in terms of MPS truncation error and discarded Schmidt weight, with no barren plateau for fixed bond dimension in the analyzed regime (Mansuroglu et al., 30 Apr 2025).

An end-to-end state-preparation framework further combines heuristic staircase-like or brick-wall disentanglers with Evenbly–Vidal or Riemannian refinement, targeting MPS that arise either from physical states or from amplitude encoding of classical datasets (Szołdra et al., 12 Feb 2026). The pipeline includes entanglement-based qubit reordering via a quadratic assignment problem and low-level circuit reductions yielding depths reduced by up to 50% and CNOT counts by 33% (Szołdra et al., 12 Feb 2026). On 19–50 qubits, optimized brick-wall circuits minimize depth whereas optimized staircase-like circuits minimize gate counts (Szołdra et al., 12 Feb 2026). This suggests that generative MPS are increasingly being treated not only as classical tensor models but also as deployable circuit-based generative primitives.

The same circuit perspective has recently been applied to amplitude encodings of structured functions and images. Smooth functions, low-degree piecewise polynomials, and DWT-compressed images often admit low-bond-dimension MPS approximations, which can then be compiled into ancilla-free circuits by Matrix Product Disentangler methods (Green et al., 23 Feb 2025). The paper reports accuracy exceeding 99.99% for low-degree piecewise polynomials and fidelity exceeding 99.1% for a $128\times 128$ ChestMNIST image encoded on 14 qubits, with a total circuit depth of 425 in single-qubit rotations and two-qubit CNOT gates (Green et al., 23 Feb 2025). A plausible implication is that MPS-based generative encodings of structured classical data may be one of the most realistic near-term pathways to amplitude-based quantum generative models.

Generative MPS therefore occupy a dual status. On one hand they are classical probabilistic tensor networks with exact or controllable contraction, explicit inductive biases, and interpretable low-rank structure. On the other hand they are circuit-preparable quantum states whose measurement statistics define quantum generative models. The recent literature indicates that both viewpoints are now active: one line improves density estimation, sampling, and privacy-aware learning on classical datasets (Li et al., 2018, R. et al., 8 Aug 2025, Duan et al., 12 Mar 2026), while another line optimizes the compilation of those same structured states into shallow quantum circuits (Ran, 2019, Mansuroglu et al., 30 Apr 2025, Szołdra et al., 12 Feb 2026, Green et al., 23 Feb 2025). This suggests that generative MPS are best understood not as a single algorithm, but as a family of low-entanglement generative representations spanning classical tensor-network learning, probabilistic modeling, and quantum state generation.