
Percept-WAM: Neural and Autonomous Models

Updated 19 December 2025
  • In computational neuroscience, Percept–WAM encodes perceptual metrics via coupled oscillator interactions that converge to a state exactly representing spatial relationships (Kraikivski, 2019).
  • In autonomous driving, Percept-WAM integrates 2D/3D world-token embeddings within a vision–language model, boosting object detection and trajectory prediction (Han et al., 24 Nov 2025).
  • The driving framework employs grid-conditioned dense prediction with parallel autoregressive (AR) decoding; together, the two usages advance both neuroscientific modeling and robust world-awareness for AI control.

Percept-WAM refers to two distinct frameworks in contemporary computational neuroscience and embodied AI, each focused on embedding world structure into dynamic representations. In neuroscience, Percept–WAM (Weighted Adjacency Matrix) denotes a formal system for encoding perceptual metrics via coupled neural-like oscillators, such that mutual interactions precisely reflect distances in perceptual space (Kraikivski, 2019). In autonomous driving, Percept-WAM denotes a unified World-Awareness-Action Model that explicitly incorporates learned 2D/3D spatial world tokens within a vision–language model, enabling robust perception and a direct mapping from perception to action (Han et al., 24 Nov 2025).

1. Mathematical Formulation of Percept–WAM in Conscious Perception

Percept–WAM, as established by Kraikivski (Kraikivski, 2019), models the encoding of a specific conscious percept as a system of $n$ coupled oscillator processes $P = (P_1, \ldots, P_n)^T$. A perceptual structure is specified by points $x_1, \ldots, x_n$ in a chosen metric space, for which the Weighted Adjacency Matrix (WAM) $A$ encodes all inter-process relationships:

  • For $i \neq j$, $A_{ij} = \gamma \|x_i - x_j\|^2$;
  • $A_{ii} = 0$, where $\gamma$ is a scaling parameter often set such that $1$ is an eigenvalue of $A$, enforcing the steady-state constraint $\det(A - I) = 0$ (see the construction sketch below).
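
The following minimal NumPy sketch builds such a WAM. It is my own construction, under the assumption that $\gamma$ is normalized by the largest eigenvalue of the unscaled distance matrix, which is one natural way to satisfy the constraint; the paper only requires that $1$ be an eigenvalue of $A$.

```python
import numpy as np

def build_wam(points, gamma=None):
    """Construct the Weighted Adjacency Matrix A with A_ij = gamma * ||x_i - x_j||^2.

    If gamma is None, it is set to 1 / lambda_max so that 1 becomes the largest
    eigenvalue of A, satisfying det(A - I) = 0 (an assumption, not the paper's
    prescribed choice).
    """
    x = np.asarray(points, dtype=float)
    diff = x[:, None, :] - x[None, :, :]          # pairwise coordinate differences
    A0 = np.sum(diff**2, axis=-1)                 # squared distances; zero diagonal
    if gamma is None:
        lam_max = np.max(np.linalg.eigvalsh(A0))  # A0 is symmetric
        gamma = 1.0 / lam_max
    return gamma * A0

A = build_wam([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(np.isclose(np.linalg.det(A - np.eye(len(A))), 0.0))  # True: 1 is an eigenvalue
```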

The network's evolution is governed either by the first-order system:

$$\frac{dZ_i}{dt} = P_i, \qquad \frac{dP_i}{dt} = \sum_{j=1}^n A_{ij} P_j - (Z_i + P_i),$$

or equivalently, the second-order system:

$$\frac{d^2 Z_i}{dt^2} + \frac{dZ_i}{dt} + Z_i - \sum_{j=1}^n A_{ij} \frac{dZ_j}{dt} = 0,$$

enforcing that, in the long-time limit, the amplitude vector $P$ converges to the self-interpretable state $P = AP$.
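
To see the dynamics concretely, here is a minimal simulation sketch using SciPy, under the same eigenvalue-normalization assumption as the construction snippet above; it integrates the first-order system from a random start and checks that the amplitudes settle onto $P = AP$.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Rebuild the WAM for three example points (same construction as above).
x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
A0 = np.sum((x[:, None, :] - x[None, :, :])**2, axis=-1)
A = A0 / np.max(np.linalg.eigvalsh(A0))   # scale so 1 is the largest eigenvalue
n = len(A)

def rhs(t, y):
    """First-order Percept-WAM dynamics on the stacked state y = [Z, P]."""
    Z, P = y[:n], y[n:]
    return np.concatenate([P, A @ P - (Z + P)])

y0 = np.concatenate([np.zeros(n), np.random.default_rng(0).normal(size=n)])
sol = solve_ivp(rhs, (0.0, 200.0), y0, rtol=1e-9, atol=1e-9)

P_end = sol.y[n:, -1]
print(np.allclose(A @ P_end, P_end, atol=1e-6))  # True once transients decay
```

Changing the point set or $n$ leaves the check itself unchanged, which mirrors the robustness to initializations reported in the paper.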

2. Properties and Interpretation of the WAM Oscillator System

By design, the WAM $A$ is symmetric, hollow ($A_{ii} = 0$), and parameterized to satisfy the uniqueness condition $1 \in \operatorname{Spec}(A)$. This ensures the following operational property: at steady state, each process $P_j$ is representable solely by a weighted sum of its complement, i.e., $P_j = \sum_{i \neq j} A_{ji} P_i$. This completeness, or self-interpretation, guarantees that the amplitude vector $P$ encodes the metric relations among the points, preserving the perceptual structure in the oscillatory regime.

Empirically, system trajectories initialized away from $P = AP$ converge quickly, with all $P_j(t)$ satisfying $P_j(t) = \sum_{i} A_{ji} P_i(t)$ after a transient phase. Numerical studies confirm robustness to variations in initial conditions and demonstrate stable convergence for various $n$ and $\gamma$.
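
Why this convergence occurs can be seen from a short modal argument (a sketch, assuming $\gamma$ is normalized so that $1$ is the largest eigenvalue of $A$): diagonalizing the symmetric WAM as $A v_k = \mu_k v_k$ and writing $Z(t) = \sum_k z_k(t)\, v_k$ decouples the second-order system into

$$\ddot{z}_k + (1 - \mu_k)\,\dot{z}_k + z_k = 0.$$

Every mode with $\mu_k < 1$ has positive damping $1 - \mu_k$ and decays; the $\mu_k = 1$ mode reduces to $\ddot{z}_k + z_k = 0$ and oscillates indefinitely. Hence $P = \dot{Z}$ relaxes onto the eigenvector satisfying $Av = v$, which is precisely the self-interpretable state $P = AP$.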

3. Neural-Inspired Encoding and Applications

Functionally, the WAM acts as a memory trace for perceptual similarity, and the oscillator system provides a dynamic method of encoding spatial (or feature-based) relationships via amplitude patterns rather than mean firing rates. For points $x_i$ arranged in $\mathbb{R}^d$, the amplitude relationships remain isomorphic to the metric of the chosen perceptual geometry. This model has been proposed as a functional analogy to how neural populations maintain relational spatial or feature maps in cortex, where oscillatory amplitude carries computational meaning.

Illustrative examples include systems of $n = 2, 5, 9, 10$ units, where for carefully chosen $\gamma$ the system reliably converges to an amplitude profile matching the squared-Euclidean-distance structure among the $x_i$. The construction is robust, reproducing the targeted perceptual map across varied initializations.
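
As a worked instance of the smallest case (my own illustration under the same eigenvalue-normalization assumption, not an example reproduced from the paper): for $n = 2$ points separated by distance $d$,

$$A = \gamma \begin{pmatrix} 0 & d^2 \\ d^2 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{for } \gamma = \tfrac{1}{d^2},$$

so $P = AP$ forces $P_1 = P_2$: the symmetric mode ($\mu = 1$) oscillates indefinitely, while the antisymmetric mode ($\mu = -1$, damping coefficient $1 - \mu = 2$) decays, leaving the two processes amplitude- and phase-locked.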

4. Percept-WAM for Robust World-Awareness-Action in Autonomous Driving

Distinct from the dynamical-systems context, Percept-WAM in embodied AI (Han et al., 24 Nov 2025) extends the principle of world-state embedding to perception and control for autonomous driving via deep learning. The architecture integrates 2D/3D scene understanding within a single vision–language model (VLM), avoiding explicit spatial reasoning by instead using two token families:

  • World-PV tokens, $\{t^{pv}_{i,j}\} \in \mathbb{R}^{H \times W \times C}$, represent perspective-view spatial features;
  • World-BEV tokens, $\{t^{bev}_{u,v}\} \in \mathbb{R}^{H' \times W' \times C}$, encode metric bird's-eye-view features.

Detection heads decode these tokens into sequences expressing object class, geometric parameters, and confidence, discretized into $B$ bins and trained with a cross-entropy objective. The grid-conditioned dense prediction mechanism interpolates object-centric queries directly from the world-token grids, supporting parallel autoregressive (AR) decoding and explicit IoU-aware scoring, which empirically reduces false positives in challenging settings.
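
The general pattern can be sketched in PyTorch as follows; the grid shapes, bin count, and the single-parameter head are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

B_BINS, C = 256, 64                 # bin count and channel width are assumed values
tokens = torch.randn(1, C, 32, 32)  # stand-in for a World-PV/BEV token grid

# Grid-conditioned dense prediction: bilinearly interpolate object-centric
# queries from the token grid at normalized candidate locations in [-1, 1].
centers = torch.tensor([[[[0.1, -0.3]], [[0.5, 0.2]]]])       # (1, 2, 1, 2)
queries = F.grid_sample(tokens, centers, align_corners=False)  # (1, C, 2, 1)
queries = queries.squeeze(-1).transpose(1, 2)                  # (1, 2, C)

# A hypothetical head maps each query to logits over B discretized bins
# for one geometric parameter (e.g., a box coordinate).
head = torch.nn.Linear(C, B_BINS)
logits = head(queries)                                         # (1, 2, B_BINS)

# Continuous targets are discretized into bins and trained with cross-entropy.
target = torch.tensor([[0.42, 0.77]])                          # normalized ground truth
target_bins = (target * B_BINS).long().clamp(0, B_BINS - 1)
loss = F.cross_entropy(logits.view(-1, B_BINS), target_bins.view(-1))
print(loss.item())
```

Bilinear interpolation (`F.grid_sample` here) is one standard way to realize grid-conditioned query extraction, and the discretize-then-classify step is what makes a cross-entropy objective applicable to continuous geometry.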

5. Model Architecture, Training, and Evaluation Protocols

Percept-WAM leverages an InternVL2-8B VLM backbone, retaining general-purpose visual–linguistic representations while extending them with BEV cross-attention, Transformer-based decoding heads for both PV and BEV, and a trajectory Action Head. Training is structured in two stages: first, spatial perception and driving QA (combining detection, segmentation, and auxiliary tasks); second, trajectory imitation learning using a SmoothL1 loss on predicted waypoints.
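
A minimal sketch of the stage-two imitation objective follows; the tensor shapes and eight-waypoint horizon are assumptions, and only the SmoothL1-on-waypoints idea comes from the text.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: batch of 4 trajectories, 8 future waypoints, (x, y) each.
pred_waypoints = torch.randn(4, 8, 2, requires_grad=True)  # Action Head output
gt_waypoints = torch.randn(4, 8, 2)                        # expert demonstrations

# Trajectory imitation: SmoothL1 (Huber-like) loss on predicted waypoints,
# less sensitive to occasional large deviations than a plain L2 loss.
loss = F.smooth_l1_loss(pred_waypoints, gt_waypoints)
loss.backward()
print(loss.item())
```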

Experiments cover a diverse range of datasets (COCO, nuImages, nuScenes, Waymo, NAVSIM), with metrics including 2D/3D detection mean Average Precision (mAP), segmentation IoU, and open- and closed-loop motion-control statistics. Main results indicate that Percept-WAM achieves 51.7 mAP on COCO 2D detection and 58.9 mAP on nuScenes BEV 3D detection, outperforming detectors such as DINO and PointPillars. Closed-loop performance on NAVSIM (PDMS = 90.2) demonstrates improved planning relative to DiffusionDrive.

6. Key Contributions, Limitations, and Future Directions

Percept-WAM's central innovations include:

  • Explicit world-token embeddings in both perspective and metric (BEV) spaces, encoding coordinates and confidence within a unified VLM.
  • Grid-conditioned dense prediction with IoU-aware scoring and parallel AR decoding, boosting reliability in long-tail and far-range conditions.
  • A unified perception-to-action paradigm that supports both rich scene understanding and low-latency trajectory prediction.

Identified limitations include uniform task mixing in training (suggesting mixture-of-expert routing could yield additional efficiency), a reliance on imitation learning for planning (where reinforcement learning could better align to closed-loop objectives), and latency bottlenecks in streaming scenarios (potentially addressable through adaptive cache optimization). A plausible implication is that further development of world-tokenized VLMs could generalize the paradigm to more complex, open-set environments and multi-agent interaction scenarios.

7. Comparative Table: Percept-WAM Across Domains

Domain | Core Representation | Key Mechanism
Conscious Perception | WAM of perceptual space | Coupled oscillators satisfying $P = AP$ at steady state
Autonomous Driving | World-PV/BEV token grids | VLM-based dense prediction with AR decoding and IoU-aware scoring

The unifying theme across both usages is the explicit embedding of world-structure into process dynamics—be it oscillatory interactions (conscious percept) or spatial-tokenized deep models (autonomous driving)—to support self-interpretable and robust world-awareness (Kraikivski, 2019, Han et al., 24 Nov 2025).
