
I2PERCEPTION: Cognitive & Computational Perception

Updated 24 December 2025
  • I2PERCEPTION is a multifaceted framework combining Bayesian, empirical, and information-theoretic models to formalize both biological and artificial perception.
  • It bridges high-level cognitive principles with computational mechanisms, enabling robust applications in robotics, neural decoding, and computer vision.
  • Key techniques such as KL divergence, perceptual information metrics, and active sampling strategies support systematic design and analysis of perceptual systems.

I2PERCEPTION comprises a set of technical frameworks, computational architectures, empirical models, and theoretical constructs that formalize the processes by which agents (biological or artificial) perceive, interpret, and adapt to their environment. The term spans applications ranging from 3D object recognition via Bayesian inference and interactive robotic perception to formalizations of perception engineering, information-theoretic metrics of perceptual quality, and distributed models of multistable perception. I2PERCEPTION systematically bridges high-level cognitive principles with computational mechanisms, providing a principled basis for perception modeling, experimental analysis, and engineering design.

1. Foundations of I2PERCEPTION: Bayesian, Empirical, and Information-Theoretic Models

At the core of many I2PERCEPTION implementations is the Bayesian generative-inference model. In "Bayesian Brain: Computation with perception to recognize 3D objects," I2PERCEPTION models the recognition of a 3D object from multiple 2D projections as sequential Bayesian binomial inference over a latent recognition success probability $Q$ (Ray, 2022). Observations are summarized as a count of $k$ correct recognitions out of $n$ views, giving the binomial likelihood

$$p(k \mid n, Q) = \binom{n}{k} Q^k (1-Q)^{n-k}.$$

An empirical-Bayes $\mathrm{Beta}(a,b)$ prior is maintained over $Q$, updated iteratively via the closed-form conjugate posterior

$$p(Q \mid k, n) = \mathrm{Beta}(a+k,\; b+n-k).$$

The model accommodates sequential updates over batches of observed aspects, integrating empirical data and prior beliefs to converge on a robust estimate of object identity. This formalism exemplifies "computation with perception": dynamic inference that mimics cognitive adaptation.
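The conjugate update above admits a direct implementation. The following is a minimal sketch of the sequential Beta-Binomial update; the prior parameters and the batch counts below are illustrative, not taken from the paper.

```python
def update_posterior(a, b, k, n):
    """Conjugate update: Beta(a, b) prior + Binomial (k correct of n views)
    likelihood gives the posterior Beta(a + k, b + n - k)."""
    return a + k, b + (n - k)

def posterior_mean(a, b):
    """Posterior mean estimate of the recognition probability Q."""
    return a / (a + b)

# Start from a weakly informative Beta(1, 1) prior over Q.
a, b = 1.0, 1.0

# Sequential batches of (correct recognitions k, total views n).
batches = [(7, 10), (9, 10), (8, 10)]
for k, n in batches:
    a, b = update_posterior(a, b, k, n)
    print(f"after batch ({k}/{n}): E[Q] = {posterior_mean(a, b):.3f}")
```

Each batch tightens the posterior; after the three batches above the estimate of $Q$ concentrates near the empirical success rate, mirroring the paper's convergence on a robust object-identity estimate.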

In perceptual quality modeling, I2PERCEPTION instantiates information-theoretic objectives such as the Perceptual Information Metric (PIM) (Bhardwaj et al., 2020), which maximizes the multivariate mutual information $I(X; Y; Z)$ for temporally adjacent frames. Key architectural choices (multi-scale divisive normalization, translation equivariance) reflect properties of human vision. The final perceptual metric is given by the symmetric KL divergence of learned latent distributions:

$$\mathrm{PIM}(x, y) = D_{\mathrm{KL}}\bigl[q(z \mid x) \,\|\, q(z \mid y)\bigr] + D_{\mathrm{KL}}\bigl[q(z \mid y) \,\|\, q(z \mid x)\bigr].$$

PIM demonstrates robust alignment with human judgment in image assessment without supervised training, emphasizing the value of unsupervised, efficient-coding-based I2PERCEPTION architectures.
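The symmetric-KL form has a closed-form expression when, as in many variational encoders, each latent posterior $q(z \mid x)$ is a diagonal Gaussian. The sketch below assumes that parameterization; the encoder outputs are made-up numbers for illustration.

```python
import numpy as np

def kl_diag_gauss(mu0, var0, mu1, var1):
    """KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ) for diagonal Gaussians."""
    return 0.5 * np.sum(np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def pim_symmetric_kl(mu_x, var_x, mu_y, var_y):
    """Symmetric KL between the latent posteriors q(z|x) and q(z|y)."""
    return (kl_diag_gauss(mu_x, var_x, mu_y, var_y)
            + kl_diag_gauss(mu_y, var_y, mu_x, var_x))

# Illustrative encoder outputs for two images x and y.
mu_x, var_x = np.array([0.0, 1.0]), np.array([1.0, 0.5])
mu_y, var_y = np.array([0.1, 0.9]), np.array([1.2, 0.6])
print(pim_symmetric_kl(mu_x, var_x, mu_y, var_y))
```

By construction the metric is symmetric in its arguments and zero exactly when the two latent distributions coincide.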

2. Cognitive, Biological, and Distributed Interpretations

Cognitively motivated I2PERCEPTION frameworks postulate that perception is a product of passive, distributed, or inference-driven processes. In "Illusions – a model of mind" (Maniatis, 2017), perception and consciousness are modeled as unfree, passive cascades of neural activity, with all apparent agency or selfhood arising as emergent illusions from physiological dynamics. No explicit symbolic framework is provided; instead, the model is cast as a critique of "inner agent" metaphors, positing that the felt sense of perception arises from internally phrased, automatic associations (the "silent communication" in the grammar of "I").

Distributed communication models of I2PERCEPTION formalize multistable and monostable perception in terms of information complementarity among sensory components (Kong, 2023). A sensory vector $\mathbf{X} = (X_1, \ldots, X_n)$ conveys information about a perceptual variable $W$. If components are substitutes ($I(X_i; X_j \mid W) = 0$), monostability ensues; complementarity ($I(X_i, X_j; W) > I(X_i; W) + I(X_j; W)$) enables multistability and order effects. The communication protocol, with deterministic agent policies and log-score utility

$$v(h, \omega) = \log \mathbf{q}_h(\omega),$$

balances communication cost and consensus regret. Associated optimization and market-theoretic perspectives illuminate local-optimum behavior and strategic neural coordination.

3. Perception Engineering: Formal I2SPACE and Intent-to-Perception Dynamics

Perception engineering frames I2PERCEPTION as the deliberate design of environmental manipulations to steer a receiver agent's perception (LaValle et al., 27 Mar 2024). Producer and receiver traverse their respective information spaces ($I^p$, $I^r$); the producer's actions induce environmental transitions that are measured by the receiver's sensor model $h^r$. The minimal I-state is formalized as the preimage

$$I = \{\, h^{-1}(y) : y \in Y \,\}$$

and advances through prediction–update cycles:

$$X_{k+1}^- = \{\, f(x, u_k) : x \in X_k^+ \,\}, \qquad X_{k+1}^+ = X_{k+1}^- \cap h^{-1}(y_{k+1}).$$

An I2PERCEPTION design is successful if each $I^r_k$ is plausible under the receiver model but systematically diverges from reality: a perceptual illusion constructed for a specific agent.
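The prediction–update cycle is naturally expressed as set operations. The following minimal sketch runs it on a toy 1-D cyclic world; the dynamics $f$ and the sensor $h$ (which only reports whether the state lies in the left half) are invented for illustration.

```python
# Toy nondeterministic-free filter over a cyclic 8-state world.
STATES = range(8)

def f(x, u):
    """Assumed dynamics: move by u steps around the cycle."""
    return (x + u) % 8

def h(x):
    """Assumed sensor: True iff the state is in the left half."""
    return x < 4

def predict(X_plus, u):
    """X_{k+1}^- = { f(x, u_k) : x in X_k^+ }"""
    return {f(x, u) for x in X_plus}

def update(X_minus, y):
    """X_{k+1}^+ = X_{k+1}^- intersected with the preimage h^{-1}(y)."""
    return {x for x in X_minus if h(x) == y}

X = set(STATES)  # initial I-state: total uncertainty
for u, y in [(1, True), (1, True), (1, False)]:
    X = update(predict(X, u), y)
print(sorted(X))  # the I-state after three prediction-update cycles
```

Even with a one-bit sensor, three cycles shrink the I-state from all eight states down to a single state, illustrating how the preimage intersections accumulate information.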

Metrics such as plausibility robustness and forced fusion ($\delta_P$, $\epsilon_P$) quantify the resilience and depth of perceived illusions. This formalism supports analysis of art, virtual reality, robotics, and adversarial perception manipulation, enabling scientific, engineering, and cognitive advances within a common language.

4. Interactive and Task-Driven Perception: Robotics and Active Sampling

I2PERCEPTION supports action-perception coupling, exemplified in interactive robotics where perception is constructed through exploration and manipulation. In (Goff et al., 2019), robots build a moveability relevance map via interactive probing: regions of an RGB-D scene are repeatedly pushed, and movement feedback is used to train an online Collaborative Mixture Models (CMMs) classifier. The probability of moveability for region $i$ is

$$R_i = P(L = 1 \mid X_i) = \frac{1 + \Gamma_{1}(X_i)}{2 + \Gamma_{1}(X_i) + \Gamma_{0}(X_i)},$$

where $\Gamma_l(X)$ is the density of a learned Gaussian mixture for class $l$. The region to probe next is sampled according to a weighting that favors high classifier uncertainty and unexplored areas:

$$P_c(X_i) = u(X_i)\, [1 - c(X_i)].$$

This active strategy ensures efficient data collection and model improvement, yielding high segmentation accuracy on both simulated and real systems.
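The sampling rule above can be sketched directly. Note that the uncertainty proxy `u` and confidence proxy `c` below are simple assumptions for illustration, not the paper's exact definitions.

```python
import random

def moveability(gamma1, gamma0):
    """R_i = (1 + Gamma_1) / (2 + Gamma_1 + Gamma_0)."""
    return (1.0 + gamma1) / (2.0 + gamma1 + gamma0)

def probe_weight(gamma1, gamma0, visits):
    """P_c(X_i) = u(X_i) * [1 - c(X_i)], with assumed proxies:
    u high when R_i is near 0.5, c growing with visit count."""
    r = moveability(gamma1, gamma0)
    u = 1.0 - abs(2.0 * r - 1.0)
    c = visits / (1.0 + visits)
    return u * (1.0 - c)

regions = [
    {"gamma1": 5.0, "gamma0": 0.2, "visits": 4},  # confidently moveable
    {"gamma1": 1.0, "gamma0": 1.1, "visits": 0},  # uncertain, unexplored
    {"gamma1": 0.1, "gamma0": 6.0, "visits": 3},  # confidently static
]
weights = [probe_weight(**r) for r in regions]
next_region = random.choices(range(len(regions)), weights=weights)[0]
print(weights, "->", next_region)
```

As intended, the uncertain, unexplored region receives by far the largest probing weight, so exploration concentrates where the classifier can learn the most.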

For deformable object manipulation, the interaction of perception and action is formalized as a constrained POMDP, with camera and manipulator actions optimized within a dynamically constructed manifold (DAVS) that aligns camera viewpoints with structure-of-interest (Weng et al., 8 Mar 2024). The camera's action space is encoded as a low-dimensional submanifold, reshaped via sequential geometric computation to maximize exposure of occluded features and synergy between sensing and manipulation.

5. Brain Decoding, Semantic Intervention, and Neural-Perceptual Coupling

Neural decoding instantiates I2PERCEPTION as the mapping from brain activity to semantic and spatial perceptual predictions (Xu et al., 3 Jul 2025). Here, fMRI-derived representations are injected into multi-scale features of an object detection and segmentation network via shared cross-attention at each level $\ell$:

$$A^{(\ell)} = \mathrm{Softmax}\!\Bigl(\frac{Q^{(\ell)} K^\top}{\tau\sqrt{d}}\Bigr),$$

where $Q^{(\ell)}$ and $K$ are projected feature and fMRI tokens, and the resulting attended features

$$\widetilde{F}^{(\ell)} = \mathrm{LayerNorm}\bigl(F^{(\ell)} + A^{(\ell)} V W_O\bigr)$$

demonstrate measurable improvements in object recall (AR) and the emergence of feature distributions and attention activations closely aligned with semantic categories encoded in fMRI. This architecture confirms that brain signals contain rich multi-object information and that I2PERCEPTION can enhance model interpretability and performance by directly intervening within network feature hierarchies.
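A shape-level sketch of this injection step is below: image features attend over fMRI tokens, then pass through the residual-plus-LayerNorm. All dimensions, random inputs, and the temperature value are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    sd = x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

rng = np.random.default_rng(0)
d, n_feat, n_fmri, tau = 16, 32, 8, 1.0   # assumed toy dimensions

F = rng.normal(size=(n_feat, d))      # F^(l): level-l image features
Q = F @ rng.normal(size=(d, d))       # Q^(l): projected feature tokens
K = rng.normal(size=(n_fmri, d))      # K: projected fMRI tokens
V = rng.normal(size=(n_fmri, d))      # V: fMRI value tokens
W_O = rng.normal(size=(d, d))         # output projection

A = softmax(Q @ K.T / (tau * np.sqrt(d)))   # attention over fMRI tokens
F_tilde = layer_norm(F + A @ V @ W_O)        # residual + LayerNorm
print(F_tilde.shape)                         # (32, 16): same shape as F^(l)
```

Because the output keeps the shape of $F^{(\ell)}$, the same injection can be applied at every feature level of the detection network without altering downstream heads.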

6. Perception as Structured Interpretation and Its Extension via IIT

In theories rooted in Integrated Information Theory (IIT), such as "Intrinsic meaning, perception, and matching" (Mayner et al., 30 Dec 2024), perception is modeled as a process where environmental stimuli serve merely as triggers that select from the system's pre-existing maximally irreducible cause-effect structures (Φ-structures). For any stimulus $x$, the system's perceptual structure is characterized by the triggering coefficient $\tau(x, m)$ for each mechanism $m$, reflecting how much $x$ contributes to activating that internal distinction, and the overall perceptual richness

$$\mathcal{P}(x, s) = \sum_{c \in \mathrm{C}^\Phi(s)} \pi(x, c),$$

where $\pi(x, d(m)) = \tau(x, m)\, \varphi_d(m)$ and $\varphi_d(m)$ is the irreducibility of the distinction.

Perceptual differentiation is then defined as the diversity of triggered internal structures across sequences of stimuli, with “matching” between systems and environments maximizing this diversity and thus the meaningfulness of perception. This theoretical construct formalizes perception as a weighted, structured selection from intrinsic meaning, independent of representational fidelity to the environment.
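As a worked instance of the richness formula above, the toy computation below sums the $\tau$-weighted irreducibilities over a handful of distinctions; the mechanism labels, triggering coefficients, and $\varphi$ values are invented numbers.

```python
def perceptual_richness(tau, phi):
    """P(x, s) = sum over distinctions d(m) of pi(x, d(m)) = tau(x, m) * phi_d(m)."""
    return sum(tau[m] * phi[m] for m in tau)

# Invented example values for three mechanisms of a small system.
tau = {"A": 0.9, "B": 0.2, "AB": 0.5}   # how strongly x triggers each mechanism
phi = {"A": 0.4, "B": 0.7, "AB": 1.1}   # irreducibility of each distinction

print(perceptual_richness(tau, phi))    # 0.9*0.4 + 0.2*0.7 + 0.5*1.1 = 1.05
```

A stimulus that triggers many highly irreducible distinctions yields a large $\mathcal{P}(x, s)$, which is the quantity that "matching" between system and environment is said to maximize.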

7. Applications, Limitations, and Forthcoming Directions

I2PERCEPTION spans experimental design, sensory processing models, robotic affordance mapping, neural decoding, and perception engineering:

  • In dense perception tasks such as depth or normal estimation, editing-based diffusion models (Edit2Perceive) optimize for pixel-space consistency as well as latent structural priors to outperform traditional text-to-image–based backbones (Shi et al., 24 Nov 2025).
  • In semiotic networks, perceptual inference is realized as convergent attractor dynamics in an autoencoder, yielding perceptualized classifiers with strong generalization under data scarcity (Kupeev et al., 2023).
  • In decision and group cognition, I2PERCEPTION is operationalized using cognitive move diagrams (CMD), visualizing agent-wise or group-wise transitions and exposing the magnitude and heterogeneity of information perception biases (Iorio et al., 2014).

Common limitations include abstraction from high-dimensional structure, reliance on simplifying assumptions, and challenges in extending compact formalisms to complex, noisy, or embodied real-world settings (Ray, 2022, Goff et al., 2019, Kong, 2023). Current research trends focus on scaling hierarchical Bayesian models, connecting neural and machine perception, and formalizing intervention-based and end-to-end interpretability frameworks.

Continued development of I2PERCEPTION promises advances in robust, adaptive perception systems, interpretable neural and robotic cognition, and principled design of environments for both artificial and biological receivers.
