Synthetic Perceptual Proxies

Updated 28 January 2026

Synthetic perceptual proxies are engineered systems that replace or approximate natural perception using synthetic data, algorithmic mapping, or physical surrogates.
They are constructed via methods like physical-virtual alignment, neural emulation, and synthetic data-driven techniques to ensure accurate representation and task alignment.
Evaluations combine quantitative metrics and user studies to measure fidelity, adaptivity, and overall utility in diverse applications such as VR/AR and multimodal AI.

A synthetic perceptual proxy is a learned or constructed system—algorithmic, physical, or hybrid—that replaces, approximates, or abstracts one or more aspects of natural perception for the purposes of interaction, representation, measurement, or optimization. Proxies encode, simulate, or emulate perceptual qualities by leveraging synthetic data, algorithmic mapping, or physical surrogates, and are evaluated with respect to their fidelity, adaptability, and alignment with human or task-relevant perception. Applications span virtual and mixed reality, perceptual metric learning, dataset evaluation, scene understanding, object manipulation, sound synthesis, and multimodal AI mediation.

1. Foundations and Definitions

Synthetic perceptual proxies act as intermediaries between unobservable or non-differentiable systems and algorithms designed for perception, interaction, or evaluation. They include:

Physical-digital surrogates in VR/AR, where an object in the physical world (or an abstracted version of it) is rigidly mapped to a virtual counterpart for high-fidelity haptic and proprioceptive feedback (Savino, 2020; Kahl et al., 2023).
Learned metrics and embeddings that encapsulate human or task-relevant similarity relations between signals (images, audio, text) using synthetic or automatically annotated data (Fu et al., 2023; Combes et al., 9 Sep 2025).
Algorithmic stand-ins for non-differentiable or black-box systems, such as neural networks trained to emulate perceptual responses for downstream optimization (Combes et al., 9 Sep 2025).
Abstracted representations that enable decoupled selection and manipulation of objects, mediating input independently from physical constraints (Liu et al., 23 Jul 2025).

The key requirement is that the proxy is explicitly optimized, constructed, or designed to replace real perceptual signals or affordances without requiring complete fidelity to the original.

2. Construction Methodologies

2.1 Physical–Virtual Alignment and Registration (VR/AR)

Synthetic perceptual proxies in mixed or virtual reality often involve tracked physical dummies matched geometrically and semantically to virtual objects. Calibration minimizes both positional and rotational alignment error. For example, the Umeyama algorithm, a closed-form least-squares solution for rigid-body transforms, is used to register physical coordinate frames to their virtual twins, with residuals typically on the order of $E_\text{pos} \approx 2~\text{mm}$ and $E_\text{rot} < 0.5^\circ$ (Savino, 2020). Interaction is grounded in real touch or manipulation but overlaid with virtual content.
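The registration step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the scale-free (rigid) case of Umeyama's closed-form solution, not the calibration code of any cited system. The marker layout, noise level, and the names `rigid_align` and `rotation_angle_deg` are assumptions made for the example.

```python
import numpy as np

def rigid_align(physical, virtual):
    """Least-squares R, t such that virtual ~= physical @ R.T + t.

    Scale-free case of Umeyama's closed-form solution.
    Both inputs are (N, 3) arrays of corresponding points.
    """
    mu_p, mu_v = physical.mean(axis=0), virtual.mean(axis=0)
    # Cross-covariance between the centred point sets.
    cov = (virtual - mu_v).T @ (physical - mu_p) / len(physical)
    U, _, Vt = np.linalg.svd(cov)
    # Reflection guard: force det(R) = +1 so R is a proper rotation.
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U) * np.linalg.det(Vt))])
    R = U @ S @ Vt
    t = mu_v - R @ mu_p
    return R, t

def rotation_angle_deg(R_a, R_b):
    """Angle of the relative rotation between two rotation matrices, in degrees."""
    cos_angle = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

# Synthetic check with hypothetical values: 30 tracked markers on a prop (metres).
rng = np.random.default_rng(0)
physical = rng.uniform(-0.15, 0.15, size=(30, 3))
angle = np.radians(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.50, -0.20, 1.10])
noise = rng.normal(scale=0.0005, size=physical.shape)  # 0.5 mm tracking noise (assumed)
virtual = physical @ R_true.T + t_true + noise

R_est, t_est = rigid_align(physical, virtual)
residual = virtual - (physical @ R_est.T + t_est)
e_pos_mm = 1000.0 * np.linalg.norm(residual, axis=1).mean()  # mean positional residual
e_rot_deg = rotation_angle_deg(R_true, R_est)                # rotational error vs. ground truth
print(f"E_pos = {e_pos_mm:.2f} mm, E_rot = {e_rot_deg:.3f} deg")
```

In a real setup the ground-truth rotation is unknown, so $E_\text{rot}$ would have to be estimated against an independent reference measurement rather than `R_true`.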

2.2 Neural/Algorithmic Proxies

Neural synthetic perceptual proxies are trained to approximate a complex, often non-differentiable, perceptual process. For example, audio proxies $f_\phi(\theta)$ map synthesizer control vectors $\theta$ to a perceptually informed embedding space. They are supervised against the embedding $E[s(\theta)]$ that a pretrained feature model $E$ assigns to the output of the black-box synthesizer $s(\theta)$ (Combes et al., 9 Sep 2025). Proxies are optimized with respect to norms in embedding space (commonly $L_1$ or $L_2$), and architectures include MLPs, highway networks, RNNs/GRUs, and transformers.
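A minimal sketch of this training setup, in plain NumPy, may help make the roles of $f_\phi$, $s$, and $E$ concrete. Everything specific here is an assumption for illustration: `synth` is a toy stand-in for a black-box synthesizer, `encoder` is a fixed log-magnitude spectrum rather than a pretrained model, and the proxy is a small one-hidden-layer MLP trained with an $L_2$ objective by full-batch gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

def synth(theta):
    """Hypothetical black-box synthesizer s(theta): two controls -> 64-sample waveform."""
    t = np.linspace(0.0, 1.0, 64, endpoint=False)
    freq = 2.0 + 5.0 * theta[:, :1]   # control 0 sets frequency
    amp = 0.2 + 0.8 * theta[:, 1:2]   # control 1 sets amplitude
    return amp * np.sin(2.0 * np.pi * freq * t)

def encoder(x):
    """Hypothetical frozen feature model E: first 8 log-magnitude spectrum bins."""
    return np.log1p(np.abs(np.fft.rfft(x, axis=1)))[:, :8]

# Supervision targets E[s(theta)]; no gradient flows through synth or encoder.
theta = rng.uniform(size=(512, 2))
target = encoder(synth(theta))

# Proxy f_phi: theta -> embedding, a one-hidden-layer MLP with tanh units.
W1 = rng.normal(scale=0.5, size=(2, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 8)); b2 = np.zeros(8)

def proxy(controls):
    hidden = np.tanh(controls @ W1 + b1)
    return hidden, hidden @ W2 + b2

lr = 0.05
losses = []
for step in range(500):
    hidden, pred = proxy(theta)
    err = pred - target
    losses.append(float(np.mean(err ** 2)))        # L2 objective in embedding space
    d_pred = 2.0 * err / err.size                  # gradient of the mean squared error
    d_hidden = (d_pred @ W2.T) * (1.0 - hidden ** 2)
    W2 -= lr * hidden.T @ d_pred
    b2 -= lr * d_pred.sum(axis=0)
    W1 -= lr * theta.T @ d_hidden
    b1 -= lr * d_hidden.sum(axis=0)

# Evaluate the mean L1 embedding distance on fresh control vectors.
theta_test = rng.uniform(size=(128, 2))
_, pred_test = proxy(theta_test)
l1_distance = float(np.abs(pred_test - encoder(synth(theta_test))).sum(axis=1).mean())
print(f"train MSE {losses[0]:.3f} -> {losses[-1]:.3f}, held-out L1 distance {l1_distance:.3f}")
```

Once trained, the proxy is differentiable with respect to $\theta$ even though `synth` is not, which is what makes it usable for downstream gradient-based optimization.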

2.3 Synthetic Data-Driven Proxy Construction

Perceptual metrics or models are often learned from synthetic data in which perturbations systematically span perceptual dimensions. DreamSim, for instance, uses diffusion models to generate image triplets with controlled variations in pose, layout, and appearance, sampling a space of mid-level visual similarity. Metrics learned from these triplets align with human judgments of holistic similarity (Fu et al., 2023).
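To illustrate how triplet data of this kind can supervise a metric, the sketch below scores triplets with a cosine distance on embeddings and computes both a hinge loss and the fraction of triplets where the metric agrees with the human choice. It is an assumption-laden illustration, not DreamSim's published implementation: the function names, the margin value, and the hand-written 2-D "embeddings" are all hypothetical.

```python
import numpy as np

def cosine_distance(a, b):
    """Row-wise cosine distance between two (N, D) arrays."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=-1)

def triplet_hinge_and_agreement(ref, img0, img1, choice, margin=0.05):
    """choice[i] is 0 if humans judged img0 closer to ref, 1 if img1."""
    d0 = cosine_distance(ref, img0)
    d1 = cosine_distance(ref, img1)
    sign = 2.0 * choice - 1.0  # map {0, 1} to {-1, +1}
    # Penalise triplets where the preferred image is not closer by at least `margin`.
    loss = np.maximum(0.0, margin - (d0 - d1) * sign)
    metric_choice = (d1 < d0).astype(int)
    agreement = np.mean(metric_choice == choice)
    return float(loss.mean()), float(agreement)

# Three toy triplets with made-up 2-D embeddings; the third contradicts the metric.
ref = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
img0 = np.array([[1.0, 0.1], [1.0, 0.0], [0.0, 1.0]])
img1 = np.array([[0.0, 1.0], [0.1, 1.0], [1.0, 0.0]])
choice = np.array([0, 1, 0])

loss, agreement = triplet_hinge_and_agreement(ref, img0, img1, choice)
print(f"hinge loss = {loss:.3f}, agreement = {agreement:.3f}")
```

In training, the embeddings would come from a learnable backbone and the loss would be minimised over its parameters; here they are fixed so that the scoring logic is easy to follow.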

2.4 Abstraction and Simplification Tolerances

Proxy abstraction studies quantify the bounds within which simplified representations remain perceptually functional. In optical see-through (OST) AR, deviations of up to $1.4~\text{cm}$ in the principal shape parameters of proxy objects do not degrade users' alignment accuracy, sense of presence, or task performance for everyday manipulation (Kahl et al., 2023). Beyond such empirically determined thresholds, users detect mismatches and usability degrades.

3. Evaluation Metrics and User Studies

Evaluation of synthetic perceptual proxies combines quantitative error measurement, human subject studies, and proxy-task correlation analysis.

Physical-digital proxies: Misplacement ($E_\text{pos}$), rotation error ($E_\text{rot}$), touch accuracy, and subjective intuitiveness via System Usability Scale (SUS) (Savino, 2020).

Learned neural proxies: Embedding distance (e.g., mean $\|f_\phi(\theta)-E[s(\theta)]\|_1$), retrieval mean reciprocal rank (MRR), and generalization gap from synthetic to human-crafted data (Combes et al., 9 Sep 2025).

Metric learning proxies: 2AFC agreement with human similarity judgments, JND triplets, retrieval/classification accuracy, and out-of-domain generalization performance on real-world datasets (Fu et al., 2023).

Abstraction proxies: Task completion time, alignment error, and Likert-scale ratings of fit/realism/ease, analyzed for statistical significance via Friedman and Wilcoxon tests (Kahl et al., 2023).

Proxy-based dataset evaluation: Correlation of proxy-based metrics (MMD, PAD, LENS) with downstream task accuracy on real data, with top-3 selection deltas and Pearson/Spearman coefficients as primary indicators (Chen et al., 6 Nov 2025).
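Two of the quantitative measures listed above, retrieval MRR and rank correlation between a proxy metric and downstream accuracy, can be computed as in the following sketch. The helper names, the toy embeddings, and the score and accuracy values are made up for illustration and are not results from any cited work. The `spearman` helper assumes there are no tied values.

```python
import numpy as np

def mean_reciprocal_rank(queries, candidates, true_index):
    """MRR of the true candidate under L1 distance in embedding space."""
    # Pairwise L1 distances, shape (num_queries, num_candidates).
    dist = np.abs(queries[:, None, :] - candidates[None, :, :]).sum(axis=-1)
    true_dist = dist[np.arange(len(queries)), true_index]
    # Rank = 1 + number of candidates strictly closer than the true one.
    rank = 1 + (dist < true_dist[:, None]).sum(axis=1)
    return float(np.mean(1.0 / rank))

def spearman(x, y):
    """Spearman rank correlation (assumes no ties in x or y)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Retrieval: proxy embeddings as queries, encoder embeddings as candidates.
candidates = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
queries = np.array([[0.1, 0.0], [0.9, 0.2], [0.4, 0.7]])
true_index = np.array([0, 1, 3])  # the third query retrieves its target at rank 2
mrr = mean_reciprocal_rank(queries, candidates, true_index)

# Dataset evaluation: a distance-like proxy score per synthetic dataset versus
# downstream accuracy on real data (illustrative numbers only).
proxy_score = np.array([0.9, 0.4, 0.7, 0.1])
accuracy = np.array([0.62, 0.66, 0.71, 0.80])
rho = spearman(proxy_score, accuracy)

print(f"MRR = {mrr:.3f}, Spearman rho = {rho:.2f}")
```

For a distance-like proxy, a strongly negative correlation is the desirable outcome, since a smaller distance to the real data should go with higher downstream accuracy.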

4. Proxy Fidelity, Adaptivity, and Utility Trade-Offs

The design of synthetic perceptual proxies is characterized by explicit trade-offs:

Fidelity ($F$): Decays exponentially as matching error increases, e.g., $F = \exp(-E_\text{pos}/E_\text{crit})$, where $E_\text{crit}$ sets the error scale at which fidelity falls to $1/e$ (Savino, 2020). High fidelity ensures near-identical percepts or interactions.
Adaptivity ($A$): Coverage of the proxy across multiple virtual or task contexts, $A = N_\text{virt}/N_\text{max}$, where $N_\text{virt}$ is the number of virtual objects served out of a maximum of $N_\text{max}$ (Savino, 2020).
Combined Utility ($U$): Weighted sum $U = w_f\,F + w_a\,A$, where the relative preference for fidelity or adaptivity is task-dependent.

Typical systems, such as domain-specific VR proxies, accept low adaptivity (e.g., $A \approx 0.05$–$0.20$) in exchange for extremely high fidelity ($F \approx 0.98$–$1.0$), yielding precise, naturalistic interaction within narrow task constraints. Greater adaptivity requires tolerating increased matching error, abstraction, or complexity in proxy construction.
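The three quantities above are simple enough to evaluate directly. The sketch below plugs in the 2 mm positional residual mentioned earlier. The critical error $E_\text{crit} = 100~\text{mm}$, the object counts, and the weights are assumed values chosen only to land in the ranges quoted above.

```python
import math

def fidelity(e_pos, e_crit):
    """F = exp(-E_pos / E_crit); both arguments in the same length unit."""
    return math.exp(-e_pos / e_crit)

def adaptivity(n_virt, n_max):
    """A = N_virt / N_max, the fraction of virtual objects the proxy can serve."""
    return n_virt / n_max

def utility(f, a, w_f, w_a):
    """U = w_f * F + w_a * A."""
    return w_f * f + w_a * a

# Assumed example: a specialised VR proxy with a 2 mm residual.
F = fidelity(e_pos=2.0, e_crit=100.0)   # about 0.98
A = adaptivity(n_virt=2, n_max=20)      # 0.10
U = utility(F, A, w_f=0.8, w_a=0.2)     # fidelity-weighted task (assumed weights)
print(f"F = {F:.3f}, A = {A:.2f}, U = {U:.3f}")
```

Shifting weight toward $w_a$ lowers the utility of such a specialised proxy, which is one way to express the trade-off numerically.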

5. Proxy Modalities and Applications

Synthetic perceptual proxies are deployed across diverse modalities and systems:

Virtual/Mixed Reality: Size- and shape-matched proxies for realistic haptic feedback and proprioceptive cuing, including paper maps, smartphones, and abstracted physical surrogates (Savino, 2020; Kahl et al., 2023).
Multiview Mesh Recovery: Diffusion-generated dense pixel-wise proxies (segmentation + UV maps) for fitting 3D human meshes in vision pipelines, achieving zero-shot generalization (Wang et al., 5 Jan 2026).
Similarity and Quality Metrics: Learned models (e.g., DreamSim) predicting perceptual similarities between signals, trained on synthetic perturbations and filtered/batched human judgments (Fu et al., 2023).
Black-box System Surrogates: Neural proxies enabling end-to-end differentiable training for non-differentiable processes such as sound synthesis, facilitating downstream matching and retrieval tasks (Combes et al., 9 Sep 2025).
Synthetic Dataset Selection: Proxy metrics (e.g., MMD, PAD, LENS) measuring correspondence between synthetic and real datasets for downstream performance estimation without explicit annotation (Chen et al., 6 Nov 2025).
Semantic Mediation: Systems such as Semantic See-through Goggles (SSTG), where an AI pipeline compresses visual input into language and reconstructs an image, making explicit the information bottleneck introduced by a perceptual proxy (Muramoto et al., 2024).
MR Object Interaction: Reality Proxy mediates physical interaction via abstract, AI-enriched object proxies that support complex selection, semantic filtering, and group navigation tasks (Liu et al., 23 Jul 2025).
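As a concrete instance of the dataset-selection proxies named in the list above, the following sketch estimates the squared maximum mean discrepancy (MMD) between feature sets with a Gaussian kernel. It uses the standard biased estimator. The Gaussian toy features, the bandwidth, and the function names are assumptions for illustration, not the evaluation protocol of the cited work.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between rows of a (N, D) and b (M, D)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=2.0):
    """Biased estimate of squared MMD between samples x and y."""
    return float(rbf_kernel(x, x, bandwidth).mean()
                 + rbf_kernel(y, y, bandwidth).mean()
                 - 2.0 * rbf_kernel(x, y, bandwidth).mean())

# Toy features: one "real" set and two synthetic candidates (hypothetical data).
rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, size=(200, 4))
synthetic_close = rng.normal(loc=0.1, size=(200, 4))
synthetic_far = rng.normal(loc=1.5, size=(200, 4))

mmd_close = mmd2(real, synthetic_close)
mmd_far = mmd2(real, synthetic_far)
print(f"MMD^2 close = {mmd_close:.4f}, far = {mmd_far:.4f}")
```

A selection rule based on this proxy would prefer the candidate with the smaller value, on the assumption that distributional closeness in feature space tracks downstream performance.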

6. Limitations, Bias, and Future Directions

Synthetic perceptual proxies inherit and expose the limitations of their construction methods:

Generalization gaps: Neural proxies trained on synthetic or random parameterizations often underperform on human-crafted or out-of-domain data clusters, suggesting the need for domain adaptation and hybrid/semi-supervised methods (Combes et al., 9 Sep 2025).
Bias and semantic drift: Linguistic bottlenecks (as in SSTG) lead to loss of spatial and temporal detail, and inherent model biases (race, gender) manifest in reconstructed proxies (Muramoto et al., 2024).
Variance in metric evaluation: Embedding-based dataset proxies can exhibit high variance, especially on ambiguous inputs; strong LLM proxies (LENS) incur inference costs and are sensitive to rubric quality (Chen et al., 6 Nov 2025).
Abstraction tolerances: Empirical shape deviation thresholds for abstract proxies are object- and task-dependent, requiring per-domain calibration (Kahl et al., 2023).

Future advances may include task-adaptive proxy selection, learning with richer synthetic data, multimodal or hierarchical proxy design, theoretical modeling of proxy reliability, and explicit bias auditing frameworks.

7. Theoretical Implications and Broader Impact

Synthetic perceptual proxies formalize and render explicit a longstanding theme in cognitive and AI systems: that perception is always mediated by information bottlenecks, abstraction, and task-specific encoding. As such, they serve as:

Tools for AI transparency: By living inside or interacting through a proxy, users and researchers can audit, debug, and understand how abstraction, misalignment, or bias arise (e.g., linguistic VR in SSTG) (Muramoto et al., 2024).
Foundations for universality and transfer: Synthetic proxies enable cross-domain transfer (e.g., dataset selection, metric learning) by abstracting away task-irrelevant idiosyncrasies while preserving actionable information (Chen et al., 6 Nov 2025; Fu et al., 2023).
Blueprints for interaction design: Measured abstraction boundaries in proxy shapes or surrogates inform VR/AR design, haptic rendering, and user experience optimization (Kahl et al., 2023).

These systems collectively represent a paradigm shift from modeling ground-truth perception to engineering, learning, and studying tractable, synthetic proxies that are both actionable and analyzable.