Asymmetric Norms in State Embedding
- Asymmetric norms are generalized norms that relax symmetry to capture intrinsic directional costs in statistical manifolds and control systems.
- They are constructed as gauges of divergence sublevel sets, such as those of the KL divergence, and realized by parametric architectures like WideNorm for state embeddings in MDPs.
- Empirical evaluations demonstrate that these norms enhance planning performance by accurately modeling irreversible dynamics and directional transition costs.
Asymmetric norms are generalizations of classical norms that relax the requirement of symmetry, thereby enabling faithful mathematical modeling of settings where natural “directions” or “arrows” exist in the underlying geometry or dynamics. In the context of state embedding for statistical manifolds and Markov decision process (MDP) models, asymmetric norms quantify non-reversible behavior and directional costs, ranging from information geometry (notably Kullback–Leibler divergence-induced topologies) to action-based distances in reinforcement learning.
1. Formal Definition and Construction of Asymmetric Norms
Given a vector space $X$, an asymmetric quasi-norm is a functional $p : X \to [0, \infty)$ satisfying positive homogeneity ($p(\lambda x) = \lambda\, p(x)$ for $\lambda \geq 0$) and subadditivity ($p(x + y) \leq p(x) + p(y)$), but not generally symmetry ($p(-x) \neq p(x)$). For statistical manifolds, let $X$ and $Y$ be dual via a bilinear pairing $\langle \cdot, \cdot \rangle$ (with $X$ representing random variables and $Y$ measures). Suppose a convex functional $F : Y \to \mathbb{R} \cup \{\infty\}$ is closed and subdifferentiable. Define the divergence:

$$D_F[y, z] = F(y) - F(z) - \langle x, y - z \rangle, \qquad x \in \partial F(z).$$
Two dual asymmetric seminorms then arise:
- On $Y - z$ (differences of measures), the Minkowski gauge of the sublevel set $M = \{u : D_F[z + u, z] \leq 1\}$:

$$\|u\|_M = \inf\{\lambda > 0 : u \in \lambda M\}$$

- On $X$ (random variables), the gauge of the dual set $N = \{x : D_{F^{*}}[x, 0] \leq 1\}$:

$$\|x\|_N = \inf\{\lambda > 0 : x \in \lambda N\}$$
These retain positive homogeneity and subadditivity, but not symmetry, so the resulting distance,

$$\rho(y, z) = \|y - z\|_M,$$
is generally a quasipseudometric. In embedding contexts, parametric variants are also used, e.g., the WideNorm (Steccanella et al., 2023):
$$\|v\|_{\mathrm{WN}} = \big\| W\, \big[\mathrm{ReLU}(v);\ \mathrm{ReLU}(-v)\big] \big\|_{2},$$

where $W$ is a learned weight matrix and $[\mathrm{ReLU}(v);\ \mathrm{ReLU}(-v)]$ is the concatenation of the positive and negative parts of $v$; ReLU enforces asymmetry.
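A minimal numerical sketch (with an arbitrary fixed matrix; the entrywise nonnegativity of $W$ is an assumption of this sketch to guarantee subadditivity) confirms that the functional satisfies the quasi-norm axioms while failing symmetry:

```python
import numpy as np

rng = np.random.default_rng(0)
# Entrywise nonnegative W preserves subadditivity of the seminorm (assumption).
W = np.abs(rng.normal(size=(8, 6)))

def widenorm(v):
    """Asymmetric seminorm ||W [ReLU(v); ReLU(-v)]||_2."""
    h = np.concatenate([np.maximum(v, 0.0), np.maximum(-v, 0.0)])
    return float(np.linalg.norm(W @ h))

u = np.array([1.0, -2.0, 0.5])
v = np.array([-0.3, 1.2, 0.7])
print(np.isclose(widenorm(2.0 * u), 2.0 * widenorm(u)))   # positive homogeneity
print(widenorm(u + v) <= widenorm(u) + widenorm(v))       # subadditivity: True
print(widenorm(u), widenorm(-u))                          # asymmetry: values differ
```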
2. Asymmetric Norms Induced by Information Divergences
A canonical realization employs Bregman-type divergences on probability manifolds, notably the Kullback–Leibler (KL) divergence (Belavkin, 2015). Let $F(y) = \int (\ln y - 1)\, dy$ on $Y \subseteq \mathcal{M}_{+}$ (positive measures). The Bregman divergence,

$$D_{KL}[y, z] = \int \left( y \ln \frac{y}{z} - y + z \right) d\mu,$$
induces the sublevel sets

\begin{align*}
M &= \{u = y - z : D_{KL}[y, z] \leq 1\} \subset Y - z, \\
N &= \{x : D_{KL}^{*}[x, 0] \leq 1\} \subset X.
\end{align*}

The resulting asymmetric “norms” are:
- On $Y - z$:

$$\|u\|_M = \inf\{\lambda > 0 : u \in \lambda M\}$$

- On $X$:

$$\|x\|_N = \inf\{\lambda > 0 : x \in \lambda N\}$$
The asymmetry reflects the fundamental non-reversibility of the KL divergence: $D_{KL}[y, z] \neq D_{KL}[z, y]$ in general, mirrored by $\|u\|_M \neq \|{-u}\|_M$.
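These gauges can be evaluated numerically. The sketch below (a hypothetical three-point example; the measure $z$ and direction $u$ are arbitrary choices) computes $\|u\|_M$ by bisecting on the condition $D_{KL}[z + u/\lambda,\, z] \leq 1$, exhibiting $\|u\|_M \neq \|{-u}\|_M$:

```python
import numpy as np

z = np.array([0.5, 0.3, 0.2])     # reference positive measure
u = np.array([0.4, -0.3, -0.1])   # direction u = y - z

def kl(y, z):
    """Extended KL divergence for positive measures."""
    return float(np.sum(y * np.log(y / z) - y + z))

def gauge(u, lo=1e-6, hi=1e6, tol=1e-10):
    """Minkowski gauge of M = {u : KL[z+u, z] <= 1} via bisection on lambda."""
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        y = z + u / lam
        if np.all(y > 0) and kl(y, z) <= 1.0:
            hi = lam      # u/lam still inside M: try a smaller lambda
        else:
            lo = lam
        if hi - lo < tol:
            break
    return hi

print(gauge(u), gauge(-u))   # the two values differ: the gauge is asymmetric
```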
3. Embedding Methods and Parametric Architectures
For state embedding in MDPs, the WideNorm construction (Steccanella et al., 2023) provides a learnable asymmetric semi-norm. A parametric state encoder $\phi_\theta : \mathcal{S} \to \mathbb{R}^{d}$ (typically an MLP or CNN) is paired with a learnable weight matrix $W$. Given two states $s_1, s_2 \in \mathcal{S}$, the distance is:

$$d_W(s_1, s_2) = \big\| W \big[\mathrm{ReLU}(\phi_\theta(s_2) - \phi_\theta(s_1));\ \mathrm{ReLU}(\phi_\theta(s_1) - \phi_\theta(s_2))\big] \big\|_{2}$$

Because ReLU discards negative components, negating the latent difference swaps the two halves of the concatenated feature, so $d_W(s_1, s_2) \neq d_W(s_2, s_1)$ in general.
The embedding function and WideNorm parameters are trained end-to-end by stochastic gradient descent to approximate task-specific quasidistances, e.g., minimum action distances in MDPs.
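A PyTorch sketch of this architecture follows; the layer widths, the softplus reparameterization keeping $W$ entrywise positive, and all names are assumptions of this illustration, not the reference implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AsymmetricEmbedding(nn.Module):
    """State encoder phi_theta paired with a learnable WideNorm quasidistance."""

    def __init__(self, state_dim, embed_dim=32, width=64):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )
        # Unconstrained parameter; softplus keeps the effective W entrywise
        # positive, which preserves subadditivity of the induced seminorm.
        self.W_raw = nn.Parameter(torch.randn(width, 2 * embed_dim))

    def norm(self, v):
        """Asymmetric seminorm ||W [ReLU(v); ReLU(-v)]||_2 on latent differences."""
        h = torch.cat([torch.relu(v), torch.relu(-v)], dim=-1)
        return torch.linalg.vector_norm(h @ F.softplus(self.W_raw).T, dim=-1)

    def distance(self, s1, s2):
        """Quasidistance d_W(s1, s2); not symmetric in its arguments."""
        return self.norm(self.phi(s2) - self.phi(s1))
```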
4. Quasimetric Topologies and Topological Properties
Given a positively homogeneous, subadditive but asymmetric functional $p$, the quasipseudometric $\rho(x, y) = p(y - x)$ yields a topology with closed balls:

$$\bar{B}_{\lambda}(x) = \{\, y : p(y - x) \leq \lambda \,\}$$
The induced topology is non-symmetric: neighborhoods and convergence are “one-sided.” In the KL case (Belavkin, 2015), both $X$ (random variables) and $Y$ (measures) equipped with their respective asymmetric norms become Hausdorff (T₂), T₁ (nonzero vectors have positive “length”), and sequentially complete (nested closed sets with vanishing diameter have nonempty intersection; Cauchy sequences converge). Classical Orlicz subspaces (with even gauge functions) embed densely and metrically inside these asymmetric normed topologies.
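A toy illustration of this one-sidedness (with a hand-picked asymmetric functional, not one of the cited constructions): ball membership depends on direction.

```python
import numpy as np

def p(v):
    """Toy asymmetric norm: forward moves cost 1 per unit, backward moves cost 3."""
    v = np.asarray(v, dtype=float)
    return float(np.sum(np.maximum(v, 0) + 3.0 * np.maximum(-v, 0)))

x, y = np.zeros(2), np.array([0.8, 0.2])
print(p(y - x) <= 1.0)  # True:  y lies in the closed ball B_1(x)
print(p(x - y) <= 1.0)  # False: x does not lie in B_1(y)
```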
5. Approximation and Applications in Control, Inference, and Planning
The asymmetry of quasipseudometrics provides a faithful model when transition costs or information flows are direction-dependent. In reinforcement learning, asymmetric norms enable the approximation of the minimum action distance (MAD), defined as the least number of actions needed to transition between two states in an MDP, which is inherently non-symmetric for environments with, e.g., drift or irreversible dynamics (Steccanella et al., 2023).
The objective is to train $\phi_\theta$ and $W$ such that $d_W(s_i, s_j)$ approximates $\mathrm{MAD}(s_i, s_j)$. Supervision comes from trajectory-derived temporal distances: for states $s_i, s_j$ visited at steps $i \leq j$ of the same trajectory, the gap $j - i$ is an observable upper bound on the MAD. The loss function,

$$\mathcal{L}(\theta, W) = \sum_{(s_i, s_j)} \big( d_W(s_i, s_j) - (j - i) \big)^{2} \;+\; \beta \sum_{t} \max\big( 0,\; d_W(s_t, s_{t+1}) - 1 \big)^{2},$$
drives the learned WideNorm to match trajectory distances and upper-bound single-step transitions. No explicit symmetry or positive semi-definiteness constraints are imposed.
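Continuing the AsymmetricEmbedding sketch above, such a loss might be implemented as follows in PyTorch (the pair sampling scheme and penalty weight are illustrative assumptions; the paper's exact objective may differ):

```python
import torch

def mad_loss(model, traj, n_pairs=256, step_penalty=1.0):
    """Sketch of the loss for one trajectory tensor of shape (T, state_dim)."""
    T = traj.shape[0]
    idx = torch.randint(0, T, (2, n_pairs)).sort(dim=0).values
    i, j = idx[0], idx[1]                                  # random pairs with i <= j
    # Match learned quasidistances to temporal gaps along the trajectory.
    fit = ((model.distance(traj[i], traj[j]) - (j - i).float()) ** 2).mean()
    # Penalize single-step distances exceeding 1 (one action per step).
    d_step = model.distance(traj[:-1], traj[1:])
    bound = torch.relu(d_step - 1.0).pow(2).mean()
    return fit + step_penalty * bound
```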
Once learned, the embedding supports differentiable goal-conditioned planning: a transition model $f_\psi$ is learned so that $f_\psi(\phi_\theta(s), a) \approx \phi_\theta(s')$ for observed transitions $(s, a, s')$. Action selection for goal-reaching exploits the asymmetric embedding metric,

$$a^{*} = \underset{a \in \mathcal{A}}{\arg\min}\; \big\|\, \phi_\theta(g) - f_\psi(\phi_\theta(s), a) \,\big\|_{\mathrm{WN}},$$

where $g$ is the goal state.
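A sketch of this action selection in PyTorch, reusing the AsymmetricEmbedding above (the latent transition model `f_psi` and a finite action set are assumptions of this sketch):

```python
import torch

def select_action(model, f_psi, s, goal, actions):
    """Greedy one-step lookahead: pick the action whose predicted next
    embedding is closest to the goal under the asymmetric norm."""
    with torch.no_grad():
        z, z_goal = model.phi(s), model.phi(goal)
        z_next = torch.stack([f_psi(z, a) for a in actions])  # (|A|, d)
        d = model.norm(z_goal - z_next)                       # asymmetric distances
        return actions[int(torch.argmin(d))]
```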
6. Empirical Evaluation and Comparative Analysis
In environments with symmetric transitions, both symmetric ($\ell_2$) and asymmetric (WideNorm) parameterizations yield low mean squared error (MSE) against the ground-truth MAD and high planning success. In asymmetric environments, such as PointMass with drift or one-way wind, the symmetric $\ell_2$ parameterization saturates at nonzero MSE (effectively learning the symmetrized average $\tfrac{1}{2}(\mathrm{MAD}(s_1, s_2) + \mathrm{MAD}(s_2, s_1))$) and suffers reduced planning success (≈50%), whereas WideNorm achieves MSE approaching zero and sustains ≈90–100% planning success, matching the optimal number of actions (Steccanella et al., 2023). This consistently demonstrates the advantage of asymmetric norms when inherent directional costs or irreversibilities are present.
7. Embedding of Distributions and Implications
Equipping the statistical manifold of distributions with an asymmetric KL-norm provides a non-symmetric linear embedding into a topological vector space where the local structure encodes the forward divergence. The dual space inherits an asymmetric norm from the reverse KL, reflecting the one-way behavior of information flows. Unlike Orlicz–Banach space embeddings—which require even gauge functions and thus symmetrize—the asymmetric normed setting preserves the natural “wedge” structure, crucial for modeling cost/utility with one-sided boundedness and non-finite cumulant generating functions (Belavkin, 2015). This approach enables vector-space–style embeddings for states in control and inference tasks, retaining the intrinsic arrow of information.
A plausible implication is that asymmetric norm constructions offer a unified mathematical language for embedding spaces where symmetry would erase critical directional information, facilitating advances in control, planning, and statistical inference in environments characterized by irreversibility and directional asymmetry.