
Observed Transition Factorization (OTF)

Updated 2 July 2026
  • Observed Transition Factorization is a method that decomposes state transitions into sparse, interpretable primitives, enabling robust structure discovery in high-dimensional and ambiguous environments.
  • It uses a two-stage process—primitive extraction and latent action aggregation—to model transitions in both visual dynamical systems and Markov processes.
  • Empirical results reveal that OTF improves policy learning, enhances transfer across morphologies and visual modes, and effectively partitions complex networks.

Observed Transition Factorization (OTF) is a methodology for decomposing observed state transitions into interpretable, sparse, and reusable primitives. Developed to identify structured transitions in high-dimensional, ambiguous, or partially observed environments, OTF provides a bottom-up representation of transitions, enabling robust latent action modeling, domain transfer, and efficient policy learning under challenging conditions such as distractors and morphology shifts. The approach has been instantiated both as online matrix factorization for Markov processes in complex networks (Yang et al., 2017) and as latent action inference in visual dynamical systems (Nam et al., 29 Jun 2026).

1. Mathematical Foundation of Observed Transition Factorization

OTF is predicated on the insight that observed transitions, whether in discrete Markovian state spaces or high-dimensional continuous domains (e.g., images), can be approximated by a sparse linear combination of transition primitives. Formally, given observed transitions $\Delta s_t = s_{t+1} - s_t$ or, in visual domains, $\Delta x_t = x_{t+\tau} - x_t$, OTF seeks a dictionary $\{p_k\}_{k=1}^K$ and nonnegative activations $\alpha_{t,k}$ satisfying

$$\Delta s_t \approx \sum_{k=1}^K \alpha_{t,k}\,p_k, \qquad \alpha_{t,k}\ge 0,\quad \|\alpha_t\|_0\ll K,$$

with a canonical factorization loss

$$\min_{p_{1:K},\{\alpha_t\}} \sum_t \Big\|\Delta s_t-\sum_{k=1}^K\alpha_{t,k}\,p_k\Big\|_2^2 + \lambda_{1}\sum_t\|\alpha_t\|_{1} + \lambda_{2}\sum_{k\neq k'}\bigl\langle p_k,p_{k'}\bigr\rangle^2,$$

promoting accurate reconstruction, activation sparsity, and diverse primitive representations (Nam et al., 29 Jun 2026).
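To make the three terms of the factorization loss concrete, the following is a minimal NumPy sketch that only evaluates the objective for given primitives and activations. The function name `otf_loss`, the array layout, and the default values of `lam1` and `lam2` are assumptions made for illustration and are not taken from the source.

```python
import numpy as np

def otf_loss(delta_s, alpha, primitives, lam1=0.1, lam2=0.01):
    """Evaluate the sparse factorization objective (illustrative sketch).

    delta_s:    (T, D) observed transitions, one row per time step
    alpha:      (T, K) nonnegative activations
    primitives: (K, D) dictionary, one primitive p_k per row
    lam1, lam2: hypothetical penalty weights
    """
    recon = alpha @ primitives                    # (T, D) reconstruction
    recon_err = np.sum((delta_s - recon) ** 2)    # squared L2 error summed over t
    sparsity = np.sum(np.abs(alpha))              # sum over t of the L1 norm of alpha_t
    gram = primitives @ primitives.T              # (K, K) inner products <p_k, p_k'>
    off_diag = gram - np.diag(np.diag(gram))      # keep only the k != k' entries
    diversity = np.sum(off_diag ** 2)
    return recon_err + lam1 * sparsity + lam2 * diversity

# Toy check: orthonormal primitives and exact reconstruction leave only the L1 term.
primitives = np.eye(3)
alpha = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
delta_s = alpha @ primitives
loss = otf_loss(delta_s, alpha, primitives)
```

In this toy case the reconstruction and diversity terms vanish, so the loss reduces to `lam1` times the total activation mass.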

In network analysis, an analogous matrix factorization is posed for an unknown, low-rank Markov operator $P \in \mathbb{R}^{d \times d}$ with observed transitions $(i_t \to j_t)$:

$$P \approx X X^\top, \quad X \in \mathbb{R}^{d \times k},\quad X^\top X = I_k,$$

with minimization of $\|P - X X^\top\|_F^2$ under orthogonality constraints (Yang et al., 2017).
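A short worked expansion may help show why this objective leads to an eigenspace problem. It is a standard linear-algebra identity and is not quoted from the source; it uses only the constraint $X^\top X = I_k$.

```latex
\begin{aligned}
\|P - XX^\top\|_F^2
  &= \|P\|_F^2 - 2\,\mathrm{tr}\!\left(X^\top P X\right) + \|XX^\top\|_F^2 \\
  &= \|P\|_F^2 - 2\,\mathrm{tr}\!\left(X^\top P X\right) + k,
  \qquad \text{since } \|XX^\top\|_F^2 = \mathrm{tr}\!\left(X^\top X\, X^\top X\right) = \mathrm{tr}(I_k) = k .
\end{aligned}
```

Minimizing the Frobenius error is therefore equivalent to maximizing $\mathrm{tr}(X^\top P X) = \mathrm{tr}\bigl(X^\top \tfrac{1}{2}(P + P^\top) X\bigr)$ over orthonormal $X$. The maximum is attained by the top-$k$ eigenvectors of the symmetric part of $P$, which coincides with $P$ itself when $P$ is symmetric.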

2. Algorithmic Frameworks: OTF in World Modeling and Network Analysis

In the latent action modeling context, OTF is structured as a two-stage process:

Stage 1: Primitive Extraction

  • Input: Sets of state pairs or transitions.
  • Motion-centric input (e.g., frame differences or spatial gradients) is encoded patchwise via a learned encoder, producing patch features.
  • Quantization: Each patch feature is mapped to a codebook vector by nearest-neighbor search, forming code assignments and quantized codes.
  • Statistical representations: For each code, an occupancy map tracks patch spatial assignments, and an activation value tracks activation strength.
  • A small decoder performs reconstruction from these factors, optimizing the combined loss for reconstruction, vector quantization, commitment, and code orthogonality.
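The quantization and occupancy steps of Stage 1 can be sketched as follows. This is a minimal NumPy illustration, not the authors' tokenizer: the encoder is omitted (patch features are passed in directly), the function name `quantize_patches` is hypothetical, and defining activation strength as the fraction of patches assigned to each code is an assumption made here, since the source does not spell out that definition.

```python
import numpy as np

def quantize_patches(features, codebook):
    """Nearest-neighbor quantization of patch features (illustrative sketch).

    features: (H, W, D) patch features from some encoder (not modeled here)
    codebook: (K, D) code vectors
    Returns code assignments (H, W), quantized features (H, W, D),
    per-code occupancy maps (K, H, W), and per-code activations (K,).
    """
    H, W, D = features.shape
    K = codebook.shape[0]
    flat = features.reshape(-1, D)
    # Squared Euclidean distance from every patch to every code: (H*W, K)
    d2 = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    assign = d2.argmin(axis=1)
    quantized = codebook[assign].reshape(H, W, D)
    # Occupancy map: 1 where a patch is assigned to code k, else 0
    one_hot = (assign[:, None] == np.arange(K)[None, :]).astype(float)
    occupancy = one_hot.T.reshape(K, H, W)
    # ASSUMPTION: activation strength = fraction of patches assigned to each code
    activation = occupancy.reshape(K, -1).mean(axis=1)
    return assign.reshape(H, W), quantized, occupancy, activation

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
features = np.array([[[0.1, 0.0], [0.9, 1.0]],
                     [[1.2, 0.8], [0.0, -0.1]]])
assign, quantized, occupancy, activation = quantize_patches(features, codebook)
```

Here each 2x2 patch grid position snaps to the closer of the two codes, and the occupancy maps record where each code fired.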

Stage 2: Latent Action Aggregation (OTF-LAM)

  • For each time step, the frozen OTF tokenizer yields codes, occupancy, and activations.
  • Each primitive is embedded (state-aware factor embedding) via a learned network.
  • Gating: Factors are softly gated, producing a sparse weighted set.
  • Aggregation yields a compact latent action via averaging and optional projection.
  • Forward Model: A decoder predicts the future state or frame.
  • Training minimizes the next-frame prediction error with the OTF tokenizer held fixed.
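The gating and aggregation steps of Stage 2 can be illustrated with a small NumPy sketch. The exact gate used in the source is not recoverable here, so a sigmoid gate is assumed; the function name `aggregate_latent_action`, the gate-weighted average, and the optional `proj` matrix are likewise hypothetical choices for illustration.

```python
import numpy as np

def aggregate_latent_action(embeddings, gate_logits, proj=None):
    """Soft gating followed by weighted averaging (illustrative sketch).

    embeddings:  (K, D) one embedding per primitive
    gate_logits: (K,) unnormalized gate scores
    proj:        optional (D_out, D) projection matrix
    """
    # ASSUMPTION: sigmoid gates in (0, 1); the source's gate form is not specified here
    gates = 1.0 / (1.0 + np.exp(-gate_logits))
    weighted = gates[:, None] * embeddings
    # Gate-weighted average; the small constant guards against all gates being ~0
    z = weighted.sum(axis=0) / (gates.sum() + 1e-8)
    if proj is not None:
        z = proj @ z
    return z, gates

embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
gate_logits = np.array([10.0, 10.0, -10.0])   # third primitive is gated off
z, gates = aggregate_latent_action(embeddings, gate_logits)
```

With the third gate nearly closed, the latent action is close to the plain average of the first two embeddings, which shows how gating yields an effectively sparse weighted set.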

OTF-LAM-DINO replaces the pixelwise decoder with prediction in a frozen DINOv2 representation space, with the prediction loss computed on those features, benefiting from learned, domain-agnostic visual features (Nam et al., 29 Jun 2026).

For online network factorization, a stochastic generalized Hebbian algorithm updates the factor estimate per observed transition, using a projection onto the Stiefel manifold. Under proper step-size schedules and spectral gap assumptions, convergence to principal eigenspaces and optimal sample complexity are achieved (Yang et al., 2017).

3. Empirical Performance and Transferability

OTF provides substantial empirical improvements in factor reusability, policy learning, and network partitioning.

  • Zero-shot transfer: OTF primitives transfer robustly across agent morphologies (e.g., Walker→Cheetah) and across visual modes (e.g., MovingMNIST digit classes), with relative MSE degradation ("drop") on the order of 20–50% (depending on transform), compared to 58–72% for monolithic vector quantization methods. This indicates an improved separation between local, reusable transition effects and global, context-specific templates (Nam et al., 29 Jun 2026).
  • Policy learning: In downstream task imitation, OTF-LAM and OTF-LAM-DINO demonstrate competitive or superior average returns versus several baselines under distractors. For example, OTF-LAM-DINO is compared against FLAM-4 and HiLAM on Cheetah-Run and is also evaluated on Walker-Run (Nam et al., 29 Jun 2026).
  • Capacity: Increasing the code vocabulary size improves OTF-LAM performance up to a point on specific tasks, while OTF-LAM-DINO performance peaks at a particular vocabulary size, suggesting that the vocabulary size acts as a tunable capacity rather than a critical hyperparameter.
  • Network partitioning: OTF methods recover meaningful city partitions in Manhattan taxi flow, evaluated by modularity and showing tight correspondence with known neighborhoods (Yang et al., 2017).
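The relative MSE degradation ("drop") quoted for zero-shot transfer can be read as a simple ratio. The sketch below assumes the drop is defined as the increase in MSE on the target domain relative to the source-domain MSE; the source does not state the formula, and the numbers used are made up purely to show the arithmetic.

```python
def relative_mse_drop(mse_source, mse_target):
    """Relative degradation in MSE when moving from source to target domain.

    ASSUMPTION: drop = (mse_target - mse_source) / mse_source.
    """
    if mse_source <= 0:
        raise ValueError("mse_source must be positive")
    return (mse_target - mse_source) / mse_source

# Hypothetical values, not results from the paper.
drop = relative_mse_drop(0.020, 0.026)
```

Under this reading, a source MSE of 0.020 and a target MSE of 0.026 correspond to a drop of about 30%, which would fall inside the 20–50% range described for OTF primitives.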

4. Exact Recovery and Theoretical Guarantees

OTF admits strong guarantees under Markov process lumpability and spectral separation. Specifically:

  • Exact recovery: If the Markov chain's partition is lumpable and a sufficient spectral gap is present, OTF achieves exact block recovery with high probability after sufficiently many samples (Yang et al., 2017).
  • Sample complexity: A finite number of transitions suffices to reach a target accuracy with high probability; the explicit bound is given in (Yang et al., 2017).
  • Global convergence: Under properly diminishing stepsizes and mixing, stochastic OTF updates converge almost surely to the span of the top eigenvectors of the underlying Markov operator.
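The lumpability condition behind the exact-recovery guarantee can be checked directly on a known transition matrix. The sketch below tests the standard notion of strong lumpability (every state in a block has the same total probability of jumping into each other block); the function name `is_lumpable` and the 4-state example matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def is_lumpable(P, labels, tol=1e-9):
    """Check strong lumpability of a Markov chain with respect to a partition.

    P:      (d, d) row-stochastic transition matrix
    labels: length-d block label for each state
    """
    labels = np.asarray(labels)
    blocks = np.unique(labels)
    # M[i, b] = probability of moving from state i into block b
    M = np.stack([P[:, labels == b].sum(axis=1) for b in blocks], axis=1)
    for a in blocks:
        rows = M[labels == a]
        # All states in block a must share the same block-level transition row
        if np.max(np.abs(rows - rows[0])) > tol:
            return False
    return True

# Hypothetical 4-state chain that is lumpable into blocks {0, 1} and {2, 3}.
P = np.array([
    [0.5, 0.3, 0.1, 0.1],
    [0.2, 0.6, 0.2, 0.0],
    [0.1, 0.2, 0.3, 0.4],
    [0.3, 0.0, 0.6, 0.1],
])
good = is_lumpable(P, [0, 0, 1, 1])
bad = is_lumpable(P, [0, 1, 0, 1])
```

States 0 and 1 both move into the first block with probability 0.8, and states 2 and 3 both do so with probability 0.3, so the first partition passes while the second does not.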

This suggests that in both controlled and complex environments, OTF enables interpretable and reliable structure discovery from transition data, without explicit supervision or prior knowledge of state/action semantics.

5. Implementation and Algorithmic Details

OTF is instantiated via distinct but conceptually unified procedures in world modeling and Markov network contexts.

OTF Primitives (World Modeling):

  • Training involves an encoder–decoder pipeline with a learned codebook, quantizing patchwise motion or state-difference signals.
  • Sparsity is encoded via $\ell_1$ loss terms on activations; diversity via pairwise orthogonality penalties.
  • The full pipeline is optimized end-to-end until convergence, then encoder and codebook are frozen for downstream tasks.

OTF-LAM and OTF-LAM-DINO:

Algorithmic stages include:

  • Tokenization of observed motion or transition via the OTF encoder;
  • Embedding, gating, and aggregation of primitive activations;
  • Prediction of subsequent state or DINO feature via a learned dynamic model.

Pseudocode for each stage is provided in (Nam et al., 29 Jun 2026).

Markov Chain Setting:

  • Upon observing a transition $(i_t \to j_t)$, the factor $X$ is updated via a projected stochastic gradient step on the nonconvex objective.
  • Orthonormality is preserved by projection onto the Stiefel manifold (e.g., via QR factorization).
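A per-transition update with QR-based re-orthonormalization can be sketched as below. This is an Oja-style subspace iteration written for illustration and is not claimed to be the exact update of Yang et al. (2017): the symmetrized rank-one sample, the step-size schedule, the function name `online_subspace_update`, and the two-block symmetric toy chain are all assumptions.

```python
import numpy as np

def online_subspace_update(X, i, j, step):
    """One stochastic update from an observed transition i -> j (sketch).

    Uses the symmetrized sample A = (e_i e_j^T + e_j e_i^T) / 2, so A @ X is
    nonzero only in rows i and j, then retracts to orthonormal columns via QR.
    """
    G = np.zeros_like(X)
    G[i] += 0.5 * X[j]
    G[j] += 0.5 * X[i]
    Q, R = np.linalg.qr(X + step * G)
    # Fix column signs (positive diagonal of R) so the QR output is unambiguous
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

rng = np.random.default_rng(0)
d, k = 6, 2
# Hypothetical symmetric chain with two blocks of three states each
P = np.full((d, d), 0.1 / 3)
P[:3, :3] = 0.3
P[3:, 3:] = 0.3

X, _ = np.linalg.qr(rng.standard_normal((d, k)))   # random orthonormal start
for t in range(2000):
    i = rng.integers(d)                # start state drawn uniformly
    j = rng.choice(d, p=P[i])          # next state drawn from row i of P
    X = online_subspace_update(X, i, j, step=0.5 / np.sqrt(t + 1))
```

The QR step plays the role of the projection onto the Stiefel manifold: after every update the columns of `X` are orthonormal again, regardless of the step size.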

6. Applications, Limitations, and Open Questions

OTF is demonstrated in large-scale partitioning of city traffic networks and in visually complex dynamical systems with distractors or ambiguous transition sources.

  • Applications: Traffic region discovery, zero-shot motion code transfer, improved latent action policy learning, and partition recovery in large networks.
  • Limitations: Success depends on the adequacy of the primitive vocabulary size and the representational adequacy of patchwise encoders. Performance can be sensitive to the specified transforms (e.g., velocity vs. acceleration, Sobel vs. gradient filters).
  • Open questions: How to optimally select or adapt the vocabulary size for unseen domains; how to guarantee interpretability of primitives under nonlinear context; and extension to non-Markovian or temporally extended transition factorizations.

7. Connections to Broader Literature

OTF generalizes online matrix factorization paradigms from implicit large-scale networks (Yang et al., 2017) to high-dimensional continuous observation spaces and world modeling for reinforcement learning (Nam et al., 29 Jun 2026).

  • Compared to monolithic vector quantization, OTF provides increased transferability and interpretability by imposing spatial, sparse, and orthogonal structuring on primitive codes.
  • In comparison to classical spectral clustering, OTF is applicable in settings lacking explicit transition matrices, relying instead on raw transition samples or temporal state observations.
  • The OTF-LAM-DINO framework leverages self-supervised vision representation learning (e.g., DINOv2) as a fixed representation space for decoder-free world modeling.

In summary, OTF establishes a scalable, flexible framework for unsupervised structure discovery in dynamically observed systems, bridging graph factorization, visual representation learning, and latent action modeling (Yang et al., 2017, Nam et al., 29 Jun 2026).
