Attractor-Based Frame-Level Modeling
- Attractor-based frame-level modeling is a framework that binds spatiotemporal features into stable Gestalt states using recurrent neural dynamics.
- The approach employs LSTM architectures with retrospective inference and mutually-exclusive softmax to dynamically bind features and predict canonical motions.
- Experimental demonstrations, such as the silhouette illusion, validate its ability to resolve ambiguous sensory inputs by switching attractor states.
Attractor-based frame-level modeling is a computational framework for dynamic perceptual inference that combines recurrent neural architectures, feature-binding mechanisms, and latent bias adaptation to resolve ambiguous sensory inputs into stable, interpretable Gestalt states. This approach, exemplified by the model described in "Binding Dancers Into Attractors" (Kaltenberger et al., 2022), implements perception as an online process that binds spatiotemporal features into canonical entities and infers observer-centric viewpoints through attractor dynamics in recurrent networks.
1. LSTM-based Gestalt Encoding
The core of attractor-based modeling in this paradigm is a long short-term memory (LSTM) architecture that predicts canonical 3D motion dynamics from input sequences of feature markers. At each time step $t$, the network receives $m$ body-marker features, each comprising a 3D position $p_i(t) \in \mathbb{R}^3$ and velocity $v_i(t) \in \mathbb{R}^3$. For $m$ markers, the input vector is $x_t = \big(p_1(t), v_1(t), \ldots, p_m(t), v_m(t)\big) \in \mathbb{R}^{6m}$.
The LSTM cell is instantiated with hidden state $h_t$ and cell state $c_t$ (silhouette task: $300$ units). The read-out layer $W_{\mathrm{out}}$, with bias $b_{\mathrm{out}}$, predicts the next frame, $\hat{x}_{t+1} = W_{\mathrm{out}} h_t + b_{\mathrm{out}}$; the LSTM update itself follows the standard gate equations. In closed-loop operation, the LSTM's state converges to a periodic orbit, a limit-cycle attractor, that corresponds to a learned Gestalt motion pattern (walking, spinning, etc.).
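As a concrete sketch, a minimal NumPy LSTM cell with a linear read-out can illustrate the closed-loop operation described above. The weights, sizes, and function names here are hypothetical placeholders; the paper's trained model and exact parameterization are not reproduced.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMCell:
    """Minimal LSTM cell with a linear read-out predicting the next frame."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, cell, output gates.
        self.W = rng.normal(0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.W_out = rng.normal(0, 0.1, (n_in, n_hidden))  # read-out layer
        self.b_out = np.zeros(n_in)
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)                 # standard gate equations
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        x_pred = self.W_out @ h + self.b_out        # next-frame prediction
        return x_pred, h, c

def closed_loop(cell, x0, n_frames):
    """Feed the cell's own predictions back as input (no teacher forcing)."""
    h = np.zeros(cell.n_hidden)
    c = np.zeros(cell.n_hidden)
    x, traj = x0, []
    for _ in range(n_frames):
        x, h, c = cell.step(x, h, c)
        traj.append(x)
    return np.array(traj)
```

With trained weights, `closed_loop` is where the state would settle into a limit-cycle attractor; with the random weights above it merely demonstrates the feedback wiring.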
2. Retrospective Inference of Binding and Perspective
Online, the model infers latent bias variables at each frame: the binding activities $B$, the translational bias $b \in \mathbb{R}^3$, and the rotational bias $q$ (a unit quaternion). Retrospective inference minimizes the prediction error over a temporal window of length $T$ using the smooth-L1 (Huber) loss, $\mathcal{L}_{\mathrm{RI}} = \sum_{\tau = t-T}^{t} \ell_{\delta}(x_\tau - \hat{x}_\tau)$. Gradients of $\mathcal{L}_{\mathrm{RI}}$ with respect to each latent $\theta$ are computed via backpropagation-through-time and applied with momentum: $\Delta\theta \leftarrow \mu\,\Delta\theta - \eta\,\nabla_{\theta}\mathcal{L}_{\mathrm{RI}}$, then $\theta \leftarrow \theta + \Delta\theta$. The quaternion is normalized after each update to enforce $\|q\| = 1$. Sign-damping is optionally applied to stabilize gradients.
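The update loop can be sketched on a toy problem: inferring a rotational and translational bias that aligns observed points with targets. This is a simplified stand-in, with hypothetical names and finite-difference gradients in place of the paper's backpropagation-through-time; only the smooth-L1 loss, the momentum update, and the quaternion re-normalization mirror the description above.

```python
import numpy as np

def smooth_l1(err, delta=1.0):
    """Elementwise Huber / smooth-L1 loss, summed."""
    a = np.abs(err)
    return float(np.sum(np.where(a < delta, 0.5 * a**2 / delta, a - 0.5 * delta)))

def rotate(q, p):
    """Rotate point p by unit quaternion q = (w, x, y, z)."""
    w, v = q[0], q[1:]
    return p + 2.0 * np.cross(v, np.cross(v, p) + w * p)

def window_loss(q, b, points, targets):
    """Smooth-L1 prediction error accumulated over the temporal window."""
    return sum(smooth_l1(rotate(q, p) + b - t) for p, t in zip(points, targets))

def ri_step(q, b, vq, vb, points, targets, lr=0.01, mom=0.8, eps=1e-5):
    """One retrospective-inference step: finite-difference gradients of the
    windowed loss w.r.t. the latent biases (a stand-in for BPTT), a momentum
    update, then quaternion re-normalization."""
    gq, gb = np.zeros(4), np.zeros(3)
    for i in range(4):
        d = np.zeros(4); d[i] = eps
        gq[i] = (window_loss(q + d, b, points, targets)
                 - window_loss(q - d, b, points, targets)) / (2 * eps)
    for i in range(3):
        d = np.zeros(3); d[i] = eps
        gb[i] = (window_loss(q, b + d, points, targets)
                 - window_loss(q, b - d, points, targets)) / (2 * eps)
    vq = mom * vq - lr * gq          # momentum updates
    vb = mom * vb - lr * gb
    q, b = q + vq, b + vb
    q = q / np.linalg.norm(q)        # enforce ||q|| = 1
    return q, b, vq, vb
```

Iterating `ri_step` drives the latent rotation and translation toward values that explain the observations, the same error-minimization logic the model applies per frame.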
3. Mutual-Exclusive Softmax Feature Binding
The model binds observed features to canonical entities using a mutually-exclusive softmax scheme, enforcing near one-to-one assignments. Given raw binding logits $Z$, a row-wise softmax yields selection probabilities, $S^{\mathrm{row}}_{ij} = \exp(Z_{ij}/\tau) / \sum_k \exp(Z_{ik}/\tau)$, while a column-wise softmax discourages duplicate assignments, $S^{\mathrm{col}}_{ij} = \exp(Z_{ij}/\tau) / \sum_k \exp(Z_{kj}/\tau)$. The final binding assignment combines both: $B_{ij} \propto S^{\mathrm{row}}_{ij}\, S^{\mathrm{col}}_{ij}$. The temperature $\tau$ is annealed over time to sharpen assignments, and outcast features are channeled to a "reject" row for asymmetric binding.
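A minimal sketch of this binding scheme, assuming the row/column softmaxes are combined multiplicatively and row-renormalized (function names are illustrative, not from the paper):

```python
import numpy as np

def softmax(z, axis, temp):
    z = z / temp
    z = z - z.max(axis=axis, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mutual_exclusive_binding(logits, temp=1.0):
    """Combine row-wise selection with column-wise exclusion: each observed
    feature (row) prefers one canonical slot, while each slot (column)
    discourages being claimed by several features."""
    s_row = softmax(logits, axis=1, temp=temp)     # selection probabilities
    s_col = softmax(logits, axis=0, temp=temp)     # exclusion probabilities
    b = s_row * s_col                              # combined assignment
    return b / b.sum(axis=1, keepdims=True)        # renormalize rows
```

Annealing `temp` toward small values sharpens the soft assignment matrix toward a near-permutation, as described above.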
4. Attractor States and Dynamic Stability
Each trained motion pattern yields a distinct attractor, a limit cycle in the LSTM state-space $(h_t, c_t)$. In closed-loop mode, after teacher-forcing on the initial frames, the network's state converges to a periodic orbit, $(h_{t+P}, c_{t+P}) \approx (h_t, c_t)$ for some period $P$. Perturbations decay exponentially, $\|\delta_t\| \lesssim e^{-\lambda t}\,\|\delta_0\|$ with $\lambda > 0$, implying asymptotic convergence into the basin of a Gestalt attractor. Prediction error is minimized as the system settles onto the most plausible canonical interpretation.
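The stability notion can be demonstrated on a toy planar system whose unit circle is an attracting limit cycle (radial dynamics $\dot r = r(1 - r^2)$, constant angular speed). This is a generic illustration of limit-cycle convergence, not the LSTM's actual state dynamics.

```python
import numpy as np

def step(x, y, dt=0.01, omega=2.0):
    """Euler step of a planar system with a stable limit cycle on the
    unit circle: dr/dt = r(1 - r^2), dtheta/dt = omega."""
    r = np.hypot(x, y)
    th = np.arctan2(y, x)
    r = r + dt * r * (1.0 - r**2)
    th = th + dt * omega
    return r * np.cos(th), r * np.sin(th)

def radius_after(x, y, n):
    """Iterate the dynamics and report the final radius."""
    for _ in range(n):
        x, y = step(x, y)
    return np.hypot(x, y)
```

Starting either outside or inside the cycle, the radius converges to 1: perturbations away from the orbit decay, exactly the basin behavior attributed to the Gestalt attractors above.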
5. Training Paradigms and Hyper-parameters
Training proceeds offline with a mean-squared-error loss for LSTM fitting, $\mathcal{L}_{\mathrm{MSE}} = \frac{1}{T}\sum_{t} \|x_{t+1} - \hat{x}_{t+1}\|^2$. The Adam optimizer is used (lr $= 0.01$), with Gaussian noise injected on the inputs (separate noise magnitudes for the walker and dancer tasks). Batches comprise 10 (walker) or 20 (dancer) consecutive frames; epochs: 2000 (walker), 500 (dancer); hidden units: 100 (walker), 300 (dancer).
Online retrospective inference optimizes the smooth-L1 loss over a horizon of $T$ frames, with learning rates and momentum terms taken from Table I in (Kaltenberger et al., 2022). The number of tuning cycles per inference step is task-specific (walker vs. silhouette), with sign-damping factors up to $0.95$.
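The offline training paradigm (MSE loss, Adam, input-noise injection) can be sketched as follows. To stay short and self-contained, the LSTM is swapped for a linear next-frame predictor and Adam is hand-rolled; names, sizes, and the noise level are illustrative assumptions.

```python
import numpy as np

def train_offline(frames, lr=0.01, sigma=0.05, epochs=500, seed=0):
    """Fit a linear next-frame predictor x_{t+1} ~ W x_t with MSE loss and a
    hand-rolled Adam update; Gaussian noise is injected on the inputs.
    (The paper's LSTM is replaced by a linear map for brevity.)"""
    rng = np.random.default_rng(seed)
    n = frames.shape[1]
    W = rng.normal(0, 0.1, (n, n))
    m = np.zeros_like(W); v = np.zeros_like(W)      # Adam moment estimates
    b1, b2, eps = 0.9, 0.999, 1e-8
    X, Y = frames[:-1], frames[1:]
    for t in range(1, epochs + 1):
        Xn = X + rng.normal(0, sigma, X.shape)      # input noise injection
        err = Xn @ W.T - Y                          # prediction residual
        g = 2.0 * err.T @ Xn / len(X)               # MSE gradient w.r.t. W
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        mh = m / (1 - b1**t); vh = v / (1 - b2**t)  # bias correction
        W -= lr * mh / (np.sqrt(vh) + eps)          # Adam step
    return W
```

Trained on a sequence generated by a fixed 2D rotation, the learned map recovers the underlying next-frame dynamics; the noise injection acts as a regularizer, as in the paper's training setup.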
6. Silhouette Illusion: Experimental Demonstration
To probe perceptual bistability, the system is trained on four Gestalt attractors, combining the un-mirrored dancer D and the mirrored dancer E with CCW and CW rotation directions. During inference, observation is partial: only the two image-plane coordinates are observed, while depth is treated as a latent variable:
- Initial 80 frames: the binding matrix $B$ is clamped and the perspective is fixed; the network settles into an attractor (e.g., D).
- After frame 80: $B$ is unclamped and retrospective inference proceeds; the attractor persists.
- At frame 200, a true depth cue for one feature (the left hand) is injected from the opposite Gestalt (E); the binding temperature $\tau$ is reset.
- Within 50 frames, the hidden state flips to the E attractor and the binding matrix reconfigures to the mirrored structure.
Read-outs monitor the feature-binding error (the deviation of $B$ from the correct assignment) and the prediction MSE, which spikes at the attractor switch and then stabilizes. Reconstruction of depth and rotation direction confirms ambiguity resolution and the CW $\leftrightarrow$ CCW flip.
7. Claims on Universality and Broader Applicability
The model’s mechanisms—temporal Gestalt encoding via RNN attractors, retrospective inference, and mutually-exclusive softmax binding—are proposed as general solutions for perceptual interpretation, conceptual event binding, and language (binding words to semantic roles). Connections are drawn to predictive coding and free-energy models: adaptation of latent biases approximates inferring causes by prediction-error minimization.
A plausible implication is extension to multi-object scenes, audio streams, and static bistable phenomena (e.g., Necker cube) via hybrid dynamic/static Gestalt modules. This suggests that attractor-based frame-level modeling can serve as a universal schema for real-time perceptual inference across domains (Kaltenberger et al., 2022).