Perfect Probabilistic Mappings

Updated 13 April 2026

Perfect probabilistic mappings are structured transformations that preserve key properties like distances, probability mass functions, and functional relationships almost everywhere.
They employ concrete constructs such as measure-theoretic isometries, greedy divergence minimization for distribution shaping, and smooth bijections between the probability simplex and Euclidean space.
This unified framework underpins applications in high-dimensional statistics, communications, and knowledge graph alignment with rigorous theoretical guarantees and practically efficient algorithms.

A perfect probabilistic mapping describes a structural or statistical correspondence between elements, distributions, or entities that satisfies a specified notion of "perfection" with respect to a probability measure, target distribution, or categorical law. The precise meaning of perfection varies by context (geometric, information-theoretic, compositional, or relational), but it always involves the exact or almost-everywhere preservation of essential properties such as distances, probability mass functions, or functional relationships under a probabilistic or measure-theoretic formalism.

1. Measure-Theoretic Isometric Mappings

A principal formulation arises in the context of measurable distance-preserving maps between subsets of Euclidean spaces with respect to a probability measure. Given a Borel-measurable set $A \subset \mathbb{R}^d$ equipped with a probability measure $\mu$ whose support is not contained in any affine hyperplane (full-dimensional support), a measurable mapping

$h: A \to \mathbb{R}^d$

is called a perfect probabilistic mapping (or a $\mu$-almost-everywhere isometry) if there exists a Borel set $D \subset A \times A$ with $(\mu \times \mu)(D^c) = 0$ such that

$\|h(x) - h(y)\| = \|x - y\|, \quad \forall (x, y) \in D.$

The probabilistic Mazur–Ulam theorem then asserts that under these conditions, there is an orthogonal matrix $U \in O(d)$ and vector $b \in \mathbb{R}^d$ such that

$h(x) = Ux + b \quad \text{for } \mu\text{-almost all } x \in A.$

No surjectivity is required of $h$; the only essential requirements are Borel-measurability, almost-everywhere distance preservation, and full-dimensional support of $\mu$ (Zaliaduonis et al., 7 Jan 2026).

Conditions and Explicit Construction

Support: $\mu$ must have full-dimensional support.
Measurability: $h$ must be Borel-measurable.
Determination: An explicit orthogonal transformation $U$ and translation $b$ can be computed by selecting $d+1$ affinely independent points $x_0, \dots, x_d$ in $A$ (chosen so that $h$ preserves their pairwise distances), setting $v_i = x_i - x_0$, $w_i = h(x_i) - h(x_0)$, constructing $V = [v_1, \dots, v_d]$, $W = [w_1, \dots, w_d]$, and then $U = W V^{-1}$, $b = h(x_0) - U x_0$.
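
The determination step can be sketched numerically. The following is a minimal illustration, assuming NumPy; the function name `recover_rigid_motion` and the demo isometry built from a random orthogonal matrix `Q` and vector `shift` are hypothetical and serve only to exercise the construction of $U$ and $b$ from $d+1$ affinely independent points and their images.

```python
import numpy as np

def recover_rigid_motion(points, images):
    """Return (U, b) such that images[i] = U @ points[i] + b.

    points, images: arrays of shape (d + 1, d); the points must be
    affinely independent and their pairwise distances preserved.
    """
    points = np.asarray(points, dtype=float)
    images = np.asarray(images, dtype=float)
    x0, y0 = points[0], images[0]
    V = (points[1:] - x0).T            # columns v_i = x_i - x_0
    W = (images[1:] - y0).T            # columns w_i = h(x_i) - h(x_0)
    U = np.linalg.solve(V.T, W.T).T    # solves U V = W, i.e. U = W V^{-1}
    b = y0 - U @ x0
    return U, b

# Demo with a hypothetical isometry h(x) = Q x + shift.
rng = np.random.default_rng(0)
d = 3
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix
shift = rng.normal(size=d)
pts = rng.normal(size=(d + 1, d))               # affinely independent almost surely
U, b = recover_rigid_motion(pts, pts @ Q.T + shift)
print(np.allclose(U, Q), np.allclose(U.T @ U, np.eye(d)), np.allclose(b, shift))
```

Because the pairwise distances among the chosen points are preserved, the Gram matrices $V^\top V$ and $W^\top W$ coincide, which is what makes the recovered $U$ orthogonal. Solving a linear system is used here instead of forming $V^{-1}$ explicitly, purely as a numerical convenience.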

This paradigm ensures that in high-dimensional statistics, any learned transformation that preserves pairwise distances almost surely on a random cloud of points must, almost surely, be an orthogonal transformation plus translation (Zaliaduonis et al., 7 Jan 2026).

2. Probabilistic Mappings for Distribution Shaping

In information theory, perfect probabilistic mappings are central to the realization of prescribed probability mass functions (pmfs) for discrete channel inputs via variable-length encoding. Given a uniform binary source and a discrete target alphabet with a desired pmf, a prefix-free mapping from source bits to target symbols is said to induce a perfect mapping if the resulting pmf approaches the desired pmf arbitrarily closely. The Kraft inequality, met with equality for a complete prefix-free code, guarantees that the aggregate induced distribution is a valid pmf.

Finite Precision and Divergence Minimization

To operationalize this, for a chosen block length $n$, one constructs $n$-type pmfs, whose entries are integer multiples of $1/n$:

$q(x) = \frac{c_x}{n}, \quad c_x \in \{0, 1, \dots, n\}, \quad \sum_{x} c_x = n.$

The goal is to minimize the relative entropy between the $n$-type pmf $q$ and the target pmf over all $n$-type pmfs. An explicit greedy allocation algorithm iteratively assigns one unit of mass $1/n$ to the symbol with the smallest resulting increment in relative entropy, guaranteeing the global optimum after $n$ steps (Böcherer, 2012).
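
A greedy allocation of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the cited algorithm verbatim: it assumes the objective is $D(q \| t)$ with $q$ the $n$-type pmf and $t$ a strictly positive target, and the names `greedy_n_type` and `divergence` are hypothetical. Each per-symbol cost is convex in its count, so the increments grow with the count, which is the property that makes unit-by-unit greedy allocation optimal for this separable objective.

```python
import heapq
import math

def divergence(counts, target):
    """D(q || target) in nats for q = counts / sum(counts), with 0 log 0 = 0."""
    n = sum(counts)
    return sum((c / n) * math.log(c / (n * t))
               for c, t in zip(counts, target) if c > 0)

def greedy_n_type(target, n):
    """Integer counts summing to n, chosen greedily to minimize D(q || target).

    Assumes every target probability is strictly positive.
    """
    def cost(c, t):
        # Contribution of one symbol with count c to D(q || target).
        return 0.0 if c == 0 else (c / n) * math.log(c / (n * t))

    counts = [0] * len(target)
    # Heap entries: (increase in divergence from one more unit, symbol index).
    heap = [(cost(1, t) - cost(0, t), i) for i, t in enumerate(target)]
    heapq.heapify(heap)
    for _ in range(n):
        _, i = heapq.heappop(heap)
        counts[i] += 1
        t = target[i]
        heapq.heappush(heap, (cost(counts[i] + 1, t) - cost(counts[i], t), i))
    return counts

target = [0.5, 0.3, 0.2]
counts = greedy_n_type(target, 10)
print(counts, divergence(counts, target))
```

In this example the target is itself a 10-type pmf, so the allocation reproduces it and the divergence is zero up to rounding. The heap keeps each of the $n$ steps logarithmic in the alphabet size.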

Convergence: The divergence between the induced and target pmfs decays to zero as $n$ grows; thus, for sufficiently large $n$, the induced pmf is an essentially perfect match to the target pmf.

Application: For probabilistic shaping in AWGN channels with finite constellations, this methodology produces mappings that strictly outperform CLT-based binomial shaping for moderate to large $n$, enabling an asymptotically vanishing SNR gap as the mapping precision increases (Böcherer, 2012).

3. Bijective Mappings Between Probability Simplex and Euclidean Space

Perfect probabilistic mapping is also realized through smooth, invertible transformations between the open probability simplex over $K$ categories

$\Delta^{K-1} = \{ x \in \mathbb{R}^K : x_i > 0, \ \sum_{i=1}^{K} x_i = 1 \}$

and $\mathbb{R}^{K-1}$. Two canonical bijections are the isometric log-ratio (ILR) transform and the shifted stick-breaking (SB) transform:

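Transform	Forward Map	Inverse Map	Jacobian Determinant
ILR	$z = V^\top \log x$	$x = \operatorname{softmax}(V z)$	$\left|\det \partial x_{1:K-1} / \partial z\right| = \sqrt{K} \prod_{i=1}^{K} x_i$
Shifted SB	sequential definition (see text)	sequential recursion (see text)	product of the diagonal entries of the triangular Jacobian
Here $K$ is the number of categories, $\log$ acts elementwise, and $V$ is a $K \times (K-1)$ matrix with orthonormal columns orthogonal to the all-ones vector (for example, a Helmert basis).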

The ILR map is an isometry between the open simplex (Aitchison geometry) and $\mathbb{R}^{K-1}$ (Euclidean geometry), preserving inner products and thereby straight-line geodesics (Williams et al., 31 Oct 2025).
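
The isometry property can be checked numerically. The sketch below assumes NumPy and one particular Helmert-type orthonormal basis; the names `helmert_basis`, `ilr`, and `ilr_inv` are hypothetical, and any other orthonormal basis of the subspace orthogonal to the all-ones vector would serve equally well.

```python
import numpy as np

def helmert_basis(K):
    """K x (K-1) matrix with orthonormal columns, each orthogonal to the ones vector."""
    V = np.zeros((K, K - 1))
    for j in range(1, K):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))   # normalize the column to unit length
    return V

def ilr(x, V):
    """Forward map: open simplex -> R^(K-1). Rows of x are compositions."""
    return np.log(x) @ V

def ilr_inv(z, V):
    """Inverse map: R^(K-1) -> open simplex, via a numerically stable softmax."""
    y = z @ V.T
    y = y - y.max(axis=-1, keepdims=True)
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Euclidean distance between centered log-ratio coordinates."""
    cx = np.log(x) - np.log(x).mean(axis=-1, keepdims=True)
    cy = np.log(y) - np.log(y).mean(axis=-1, keepdims=True)
    return np.linalg.norm(cx - cy, axis=-1)

K = 4
V = helmert_basis(K)
rng = np.random.default_rng(1)
x = rng.dirichlet(np.ones(K), size=5)
y = rng.dirichlet(np.ones(K), size=5)

print(np.allclose(ilr_inv(ilr(x, V), V), x))                      # round trip
print(np.allclose(np.linalg.norm(ilr(x, V) - ilr(y, V), axis=-1),
                  aitchison_distance(x, y)))                      # isometry
```

The round trip works because $V V^\top$ is the centering projection, so $V z$ equals the centered log-ratio of $x$, and the softmax of a centered log-ratio returns the original composition.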

Dirichlet Interpolation and Exact Discretization

To allow for perfect recovery of categorical distributions, observed one-hot data are dequantized into the open simplex by Dirichlet interpolation, constructed so that the supports for different categories are disjoint. Any continuous density approximating the resulting mixture in $L^1$ norm then recovers the categorical law under the $\arg\max$ mapping up to that approximation error, and exactly when the density matches the mixture (Williams et al., 31 Oct 2025).
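
The role of disjoint supports can be illustrated with a toy dequantizer. The scheme below is a hypothetical stand-in, not the interpolation rule of the cited work: it draws Dirichlet noise and swaps the largest coordinate into the labelled position, so each category occupies its own region of the simplex. The function name `dequantize` and the parameter `alpha` are assumptions made for this sketch, which uses NumPy.

```python
import numpy as np

def dequantize(labels, K, alpha, rng):
    """Hypothetical dequantizer: Dirichlet noise with the largest
    coordinate moved to the labelled position (assumption, for illustration)."""
    labels = np.asarray(labels)
    x = rng.dirichlet(np.full(K, alpha), size=len(labels))
    rows = np.arange(len(labels))
    top = np.argmax(x, axis=1)
    top_vals = x[rows, top].copy()
    x[rows, top] = x[rows, labels]      # move the labelled value out
    x[rows, labels] = top_vals          # put the largest value at the label
    return x

rng = np.random.default_rng(2)
K = 4
law = np.array([0.1, 0.2, 0.3, 0.4])            # categorical law to recover
labels = rng.choice(K, size=20000, p=law)
x = dequantize(labels, K, alpha=2.0, rng=rng)

recovered = np.argmax(x, axis=1)                # discretize back to categories
print(np.array_equal(recovered, labels))
print(np.bincount(recovered, minlength=K) / len(labels))
```

Since every dequantized point lies in the region where its own category has the largest coordinate, the $\arg\max$ map sends each sample back to its label, and the empirical law of the discretized samples matches that of the original labels.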

4. Knowledge Graph Alignment via Probabilistic Mappings

In relational domains, perfect probabilistic mappings emerge in the context of unsupervised knowledge graph (KG) alignment. The PRASE framework iteratively fuses global probabilistic reasoning (PARIS) with local semantic embedding similarity. For two KGs $\mathcal{G}_1$ and $\mathcal{G}_2$, PARIS computes equivalence probabilities for pairs of entities and for pairs of relations through measure-theoretic updates over the observed triples.

The PRASE system enhances this by initializing and updating these probabilities using learned embedding similarities, weighted through a convex combination parameter, ultimately yielding high-quality mappings with empirical F1 scores above 0.95 on three of the four reference datasets listed below (Qi et al., 2021).
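
The fusion step and its evaluation can be sketched in a few lines. This is a toy illustration under assumptions, not the PRASE implementation: the exact update rule is not given here, so the sketch assumes a plain convex combination with a hypothetical weight `beta`, a hypothetical decision threshold, and made-up entity names and scores.

```python
def fuse(p_reasoning, similarity, beta):
    """Convex combination of reasoning probabilities and embedding similarities.

    beta in [0, 1] weights the reasoning side (assumed form, for illustration).
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    pairs = set(p_reasoning) | set(similarity)
    return {pair: beta * p_reasoning.get(pair, 0.0)
                  + (1.0 - beta) * similarity.get(pair, 0.0)
            for pair in pairs}

def f1_score(predicted, gold):
    """F1 of a predicted set of aligned pairs against a gold set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Made-up scores for candidate entity pairs across two toy KGs.
p_reasoning = {("a1", "b1"): 0.9, ("a2", "b3"): 0.6, ("a2", "b2"): 0.5}
similarity = {("a1", "b1"): 0.8, ("a2", "b3"): 0.2, ("a2", "b2"): 0.9}
gold = {("a1", "b1"), ("a2", "b2")}
threshold = 0.55

reasoning_only = {pair for pair, p in p_reasoning.items() if p > threshold}
fused = {pair for pair, p in fuse(p_reasoning, similarity, beta=0.5).items()
         if p > threshold}
print(f1_score(reasoning_only, gold), f1_score(fused, gold))
```

In this toy case the embedding evidence overturns one wrong reasoning-based match, which is the kind of correction the iterative feedback is meant to provide. The numbers say nothing about real benchmark behavior.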

Quantitative Alignment Accuracy

Dataset	PARIS F1	PRASE F1
EN–FR–100K	0.926	0.954
EN–DE–100K	0.948	0.972
D–Y–100K	0.983	0.996
MED–BBK–9K	0.499	0.711

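Empirically, near-perfect mappings (F1 > 0.99) are attained on D–Y–100K, and PRASE improves on PARIS on every dataset in the table; the harder MED–BBK–9K benchmark remains well below that level at 0.711 (Qi et al., 2021).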

5. Theoretical Guarantees and Operational Implications

The structures outlined above reveal several key principles:

Measure-theoretic rigidity: For almost-everywhere distance-preserving maps on $\mathbb{R}^d$ with a measure of full-dimensional support, the only permissible mappings (modulo null sets) are orthogonal transformations plus translations (Zaliaduonis et al., 7 Jan 2026).
Statistical shaping: For block-input source-channel mappings, greedy allocation within the space of $n$-type pmfs guarantees minimization of relative entropy and ensures convergence to the target pmf as $n$ grows (Böcherer, 2012).
Information geometry: Smooth bijections (ILR, SB) respecting the geometry of the simplex permit invertible, volume-correct mappings between compositional distributions and Euclidean space, facilitating exact synthesis and recovery of target distributions (Williams et al., 31 Oct 2025).
Structural alignment: Probabilistically defined entity and relation mappings, iteratively refined using both global logical and local semantic evidence, reach high-precision alignment in unsupervised knowledge graph matching (Qi et al., 2021).

Each of these formulations highlights the essential duality of probabilistic mappings, balancing global structure (distance, mass preservation) against local or discrete statistical fidelity, and underscores their universality across geometric, statistical, and relational domains.

6. Representative Examples and Implementation Guidance

Rigid geometric isometry: If a dataset is sampled from a distribution with a Lebesgue density, any measurable transformation that preserves inter-point distances almost everywhere is, up to sets of measure zero, a rigid Euclidean motion, that is, an orthogonal transformation plus a translation (Zaliaduonis et al., 7 Jan 2026).
Discrete probabilistic shaping: For a sufficiently large type parameter $n$, the optimal mapping procedure brings the divergence between the induced and target pmf below any prescribed threshold (Böcherer, 2012).
Categorical flow matching: With a suitable Dirichlet concentration parameter, training a flow matching model in the mapped Euclidean space achieves arbitrarily low total variation error to the categorical law; discretization by $\arg\max$ recovers the law exactly (Williams et al., 31 Oct 2025).
Knowledge graph alignment: PRASE achieves F1 scores up to $0.996$ on public datasets, with empirically stable convergence of the outer iterations (Qi et al., 2021).

7. Cross-Domain Significance and Limitations

Perfect probabilistic mappings unify geometric, algebraic, and statistical perspectives under a measure-theoretic paradigm, yielding explicit characterizations, computationally efficient algorithms, and practical implementation schemes. The various frameworks surveyed provide completeness theorems, performance bounds, and algorithmic blueprints for high-precision mapping in geometry, communications, statistics, and relational learning.

A limitation is that global perfection is attained only under specific regularity and support conditions (e.g., full-dimensional support for geometric isometries, full-rank Helmert matrices for simplex mappings, or sufficient functional diversity for knowledge graph probabilistic reasoning). In practical applications, exactness is constrained by the discretization parameter (e.g., the type parameter $n$ for $n$-type shaping, or the concentration parameter for simplex dequantization), but operational error can be made arbitrarily small. No formal proof of global optimality is provided for iterative feedback KG alignment, though empirical evidence supports stable convergence (Qi et al., 2021).

In summary, perfect probabilistic mappings encapsulate a broad, precise framework for realizing structure-preserving transformations across probabilistically weighted domains, guaranteeing almost-everywhere preservation of essential properties under explicit, checkable conditions.