OmniEAR: AI Benchmark & In-Ear Sensing

Updated 3 March 2026

OmniEAR is a dual-purpose platform providing both a comprehensive benchmark for embodied agent reasoning and a novel in-ear multimodal sensing technology.
The benchmark challenges agents with continuous property reasoning, dynamic tool use, and implicit collaboration within graph-structured environments using detailed quantitative metrics.
The in-ear sensing system employs integrated EEG and acoustic sensors with advanced artifact reduction and signal processing, enabling unobtrusive, long-term monitoring of health and neural activity.

OmniEAR refers to two distinct but thematically linked threads in current research: (1) a comprehensive evaluation framework for benchmarking embodied agent reasoning in artificial intelligence, and (2) a technological vision of multimodal in-ear physiological sensing enabling unobtrusive, long-term brain and health monitoring. Both usages share the central motif of holistic, adaptable intelligence, whether in virtual reasoning or in integrated human–machine measurement. They are rooted in separate literatures that nonetheless overlap in technical goals, design challenges, and impact potential.

1. Embodied Agent Reasoning: The OmniEAR Benchmark

OmniEAR is a unified, open-source framework for assessing the capacity of modern LLM–based agents to reason about, and act within, embodied environments. Unlike prior benchmarks constrained to discrete state spaces and static task structures, OmniEAR systematically challenges agents to acquire physical affordances, engage in tool use, and decide when collaboration is necessary—all within a text-based, graph-structured simulation of the physical world (Wang et al., 7 Aug 2025).

Motivation and Novelty

Existing embodied AI benchmarks (e.g., ALFRED, PARTNR, BEHAVIOR-1K) typically discretize environments (e.g., binary object states), predefine each agent’s action set, and articulate collaboration protocols directly. OmniEAR departs by requiring agents to:

Infer action feasibility via reasoning over continuous object attributes (e.g., weight, temperature, material).
Dynamically expand their capability sets through tool discovery and application (e.g., acquiring the function of a mop).
Decide independently when a task necessitates multi-agent cooperation, driven by implicit environmental constraints rather than explicit directives.
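The third requirement can be illustrated with a minimal sketch in which the need for cooperation follows from a continuous attribute rather than from an instruction. The `Agent` class, the `max_load_kg` attribute, and the assumption that carrying capacities simply add up are all hypothetical simplifications, not part of the OmniEAR specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Agent:
    name: str
    max_load_kg: float  # hypothetical capability attribute


def agents_needed(object_weight_kg: float, agents: List[Agent]) -> Optional[List[Agent]]:
    """Return the smallest team whose combined capacity covers the weight.

    Assumes capacities are additive (an illustrative simplification).
    Returns None when even all agents together cannot lift the object.
    """
    # Taking the strongest agents first yields the smallest possible team.
    ranked = sorted(agents, key=lambda a: a.max_load_kg, reverse=True)
    team: List[Agent] = []
    capacity = 0.0
    for agent in ranked:
        team.append(agent)
        capacity += agent.max_load_kg
        if capacity >= object_weight_kg:
            return team
    return None


robots = [Agent("robot_1", 50.0), Agent("robot_2", 45.0)]
solo = agents_needed(30.0, robots)    # one agent suffices
joint = agents_needed(80.0, robots)   # collaboration is implied by the weight
```

Nothing in the instruction text has to mention a second agent: the decision to collaborate falls out of comparing the object's weight with each agent's capacity.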

2. Environment and Task Formalization

States in OmniEAR are expressed as directed, attributed graphs, $G_t = (V_t, E_t, A_t)$, where $V_t$ includes spatial, object, and agent nodes; $A_t$ stores continuous-valued properties per node; and $E_t$ subdivides into static containment and dynamic interaction (proximity) edges. Each task is specified as $\mathcal{T} = (S_{init}, I, G_{goal}, \mathcal{A}_{task})$, with $S_{init}$ the initial world graph, $I$ a natural-language instruction, $G_{goal}$ a set of logical predicates over $(V, E, A)$, and $\mathcal{A}_{task}$ the agent set. Agent plans are action sequences that transform $S_{init}$ into a state satisfying $G_{goal}$.
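One way to picture this formalization is the following sketch, which encodes a world graph, a task tuple, and a goal check. All class, function, node, and attribute names (`WorldGraph`, `Task`, `place`, `temperature_c`, and so on) are hypothetical and chosen for illustration; they are not taken from the released framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)


@dataclass
class WorldGraph:
    nodes: Set[str] = field(default_factory=set)                      # V_t
    edges: Set[Edge] = field(default_factory=set)                     # E_t
    attrs: Dict[str, Dict[str, float]] = field(default_factory=dict)  # A_t


Predicate = Callable[[WorldGraph], bool]


@dataclass
class Task:
    s_init: WorldGraph     # initial world graph
    instruction: str       # natural-language instruction I
    goal: List[Predicate]  # logical predicates over (V, E, A)
    agents: List[str]      # agent set


def goal_satisfied(graph: WorldGraph, goal: List[Predicate]) -> bool:
    """A state satisfies the goal when every predicate holds."""
    return all(pred(graph) for pred in goal)


def place(graph: WorldGraph, obj: str, container: str) -> WorldGraph:
    """Return a new graph in which obj is contained in container."""
    edges = {e for e in graph.edges if not (e[0] == obj and e[1] == "in")}
    edges.add((obj, "in", container))
    attrs = {node: dict(values) for node, values in graph.attrs.items()}
    return WorldGraph(set(graph.nodes), edges, attrs)


kitchen = WorldGraph(
    nodes={"kitchen", "table", "shelf", "cup", "robot_1"},
    edges={
        ("table", "in", "kitchen"),
        ("shelf", "in", "kitchen"),
        ("cup", "in", "table"),
        ("robot_1", "near", "table"),  # dynamic proximity edge
    },
    attrs={"cup": {"weight_kg": 0.3, "temperature_c": 25.0}},
)
task = Task(
    s_init=kitchen,
    instruction="Put the cup on the shelf.",
    goal=[
        lambda g: ("cup", "in", "shelf") in g.edges,
        lambda g: g.attrs["cup"]["temperature_c"] < 40.0,
    ],
    agents=["robot_1"],
)
final_state = place(task.s_init, "cup", "shelf")
```

Here a one-step plan (`place`) takes the initial graph to a state in which both goal predicates hold, one over edges and one over a continuous attribute.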

The scenario generation pipeline employs LLM-driven scene and instruction synthesis refined by rule-based validators, yielding 1,500 scenarios spanning 64,057 objects, 6,381 property types, and 214 action classes across domains from household to industrial to medical.

3. Task Types and Metrics

Tasks are distributed among seven core categories:

Single-Agent: Direct Command (explicit action following), Attribute Reasoning (inference over continuous properties), Tool Use (dynamic capability acquisition), Compound Reasoning (chained subgoals).
Multi-Agent: Explicit Collaboration (stipulated coordination), Implicit Collaboration (autonomous joint action from constraints), Compound Collaboration (full, unscripted integration).

Evaluation employs multiple quantitative metrics:

Metric	Definition	Use Case
Task Success Rate	Fraction of tasks whose final state satisfies the goal predicates	All tasks
Step Count	Number of actions executed (mean for successful runs)	Efficiency analysis
Relative Step Ratio	Agent step count relative to a reference solution's step count	Path optimality comparison
Reasoning Accuracy	Fraction correct in attribute or tool selection	Subtask granularity
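The first three metrics can be sketched over a batch of episode records as follows. The `Episode` record and its fields are hypothetical, the numbers are made-up illustration data, and the relative step ratio is assumed here to be the agent's step count divided by a reference (for example, expert) step count, averaged over successful runs; the benchmark's exact formula may differ.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Episode:
    success: bool
    steps: int            # actions executed by the agent
    reference_steps: int  # hypothetical reference-solution length


def success_rate(episodes: List[Episode]) -> float:
    """Fraction of episodes that ended in a goal-satisfying state."""
    return sum(e.success for e in episodes) / len(episodes)


def mean_step_count(episodes: List[Episode]) -> Optional[float]:
    """Mean number of steps, computed over successful episodes only."""
    steps = [e.steps for e in episodes if e.success]
    return mean(steps) if steps else None


def relative_step_ratio(episodes: List[Episode]) -> Optional[float]:
    """Mean of agent steps / reference steps over successful episodes (assumed definition)."""
    ratios = [e.steps / e.reference_steps for e in episodes if e.success]
    return mean(ratios) if ratios else None


# Made-up illustration data, not benchmark results.
runs = [
    Episode(True, 12, 10),
    Episode(True, 8, 8),
    Episode(False, 30, 10),
    Episode(True, 15, 10),
]
sr = success_rate(runs)
avg_steps = mean_step_count(runs)
rsr = relative_step_ratio(runs)
```

Restricting the step-based metrics to successful runs keeps failed episodes, which often hit a step limit, from distorting the efficiency figures.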

4. Key Findings in Embodied Reasoning

Experiments reveal a pronounced gradient in agent success rate (SR) as tasks shift from explicit instructions to implicit, constraint-driven reasoning:

Direct Command: 85–96% SR.
Tool Reasoning: 56–85% SR.
Implicit Collaboration: 63–85% SR.
Compound (multi-step, multi-agent): >50% failure rates.

A central result is the information-overload paradox: exposing agents to the full environment state graph ("World Graph") degrades collaboration performance (e.g., Qwen-72B SR drops from 65.4% to 42.5%; paired t-test). The drop is attributed to weak selectivity in standard self-attention, which fails to separate task-relevant constraints from irrelevant ones.

Fine-tuning on expert (oracle) traces leads to dramatic single-agent gains (e.g., 0.6%→76.3% SR for Direct Command, 1.8%→45.0% for Tool Use) but negligible improvements in implicit collaboration (1.5%→5.5%). This asymmetry points to a persistent gap in models’ ability to internalize and apply environmental constraints autonomously.

5. Architectural Implications and Future Directions

Findings from the OmniEAR benchmark underscore fundamental architectural limitations in current LLM-centric agents. Larger model size and increased chain-of-thought planning afford longer action sequences, yet do not confer deep grounding in continuous physical properties. Transformer-based memory saturates above ~7B parameters: smaller models lack planning capacity, while larger ones see diminishing returns in attribute reasoning.

Remediation likely requires hybrid neural–symbolic systems, modules for explicit physical-law instantiation (e.g., neural simulators or symbolic physics engines), and advanced attention mechanisms that can dynamically gate constraints in complex, graph-structured worlds. The OmniEAR corpus—spanning a wide attribute/action space and made available as EAR-Sim and EAR-Bench—facilitates rigorous, comparative ablation and architecture testing for such research (Wang et al., 7 Aug 2025).

6. In-Ear Multimodal Physiological Sensing: OmniEAR as Holistic Health Interface

In parallel, OmniEAR is also used to designate a suite of in-ear, multimodal sensing solutions for long-term, unobtrusive health and neural monitoring (Goverdovsky et al., 2016, Kaveh et al., 2020). These platforms seek to overcome the limitations of current on-body systems (e.g., actigraphy, wrist PPG) by exploiting the ear canal’s stable, proximal position for robust cross-modal signal capture.

Mechanical and Electrical Architecture

OmniEAR devices feature a memory-foam (or thermoformed polycarbonate) substrate with integrated sensors:

EEG: Conductive stretch-fabric or silver spray electrodes (typ. 4–6 per earpiece), placed for optimal scalp proximity and referenced to concha contacts.
Acoustic/Mechanical: In-ear MEMS microphones or ECMs detect jaw movement, speech, heart and respiratory sounds.
Interconnection: Low-impedance (<10 kΩ) maintained through foam or springy carriers, with micro-coax wiring to external or embedded digitization modules (e.g., g.tec g.USBamp, WANDmini).

Analog Front End and Signal Conditioning

Custom analog front ends provide:

EEG: Differential preamplification (gain ≈ 1000 V/V), 1–45 Hz 4th-order Butterworth filtering, 24-bit ADC (1.2 kS/s), input-referred noise <2 μVpp.
Mechanical/acoustic: High-gain preamp (gain ≈ 200 V/V), 100–600 Hz 10th-order Butterworth filtering, 24-bit or 15-bit digitization.
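To get a feel for what these passbands imply, the sketch below evaluates the ideal analog Butterworth magnitude response. It assumes each band-pass is a cascade of a high-pass and a low-pass section of the stated order, which is one possible reading of the specification and not necessarily how the actual front ends are built; the function names are illustrative.

```python
import math


def butter_lowpass_mag(f_hz: float, cutoff_hz: float, order: int) -> float:
    """Ideal Butterworth low-pass magnitude |H(f)|."""
    return 1.0 / math.sqrt(1.0 + (f_hz / cutoff_hz) ** (2 * order))


def butter_highpass_mag(f_hz: float, cutoff_hz: float, order: int) -> float:
    """Ideal Butterworth high-pass magnitude |H(f)| (requires f_hz > 0)."""
    return 1.0 / math.sqrt(1.0 + (cutoff_hz / f_hz) ** (2 * order))


def bandpass_mag(f_hz: float, low_hz: float, high_hz: float, order: int) -> float:
    """Cascade of a high-pass at low_hz and a low-pass at high_hz (assumed topology)."""
    return butter_highpass_mag(f_hz, low_hz, order) * butter_lowpass_mag(f_hz, high_hz, order)


def to_db(magnitude: float) -> float:
    return 20.0 * math.log10(magnitude)


# EEG channel: 1-45 Hz, 4th order. Alpha activity (10 Hz) sits well inside the passband.
eeg_alpha_db = to_db(bandpass_mag(10.0, 1.0, 45.0, 4))
# At the upper cutoff the response is about 3 dB down, as for any Butterworth edge.
eeg_edge_db = to_db(bandpass_mag(45.0, 1.0, 45.0, 4))
# Acoustic channel: 100-600 Hz, 10th order. A 10 Hz component is heavily attenuated.
acoustic_10hz_db = to_db(bandpass_mag(10.0, 100.0, 600.0, 10))
```

The much higher order on the mechanical/acoustic channel gives a far steeper roll-off, which keeps low-frequency motion energy out of the 100–600 Hz band.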

Impedance modeling incorporates constant phase elements (CPE): $Z(\omega) = R_s + \frac{R_{ct}}{1 + R_{ct} Q (j\omega)^n}$, with empirical $R_s$, $R_{ct}$, $Q$, and $n$ parameters fitted per sensor (Kaveh et al., 2020).
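A CPE-based electrode model of this kind can be evaluated numerically as in the sketch below. It assumes the common equivalent circuit in which a series resistance Rs is followed by a charge-transfer resistance Rct in parallel with a CPE of impedance 1/(Q(jω)^n); the parameter values are placeholders for illustration, not fitted values from the cited work.

```python
import math


def cpe_impedance(freq_hz: float, q: float, n: float) -> complex:
    """Impedance of a constant phase element: 1 / (Q * (j*omega)^n)."""
    omega = 2.0 * math.pi * freq_hz
    return 1.0 / (q * (1j * omega) ** n)


def electrode_impedance(freq_hz: float, r_s: float, r_ct: float, q: float, n: float) -> complex:
    """Rs in series with (Rct parallel to CPE); assumed equivalent circuit."""
    z_cpe = cpe_impedance(freq_hz, q, n)
    return r_s + (r_ct * z_cpe) / (r_ct + z_cpe)


# Placeholder parameters (hypothetical, for illustration only).
R_S, R_CT, Q, N = 1_000.0, 100_000.0, 1e-7, 0.8

z_50hz = electrode_impedance(50.0, R_S, R_CT, Q, N)
magnitude_ohm = abs(z_50hz)
phase_deg = math.degrees(math.atan2(z_50hz.imag, z_50hz.real))
```

Two sanity checks follow from the circuit: at very high frequency the CPE shorts out Rct and the impedance approaches Rs, while at very low frequency the CPE becomes an open circuit and the impedance approaches Rs + Rct. With n = 1 the CPE reduces to an ideal capacitor.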

Multimodal Processing and Fusion

The processing chain consists of:

Artifact reduction: Motion- or jaw-clench-induced artifacts in EEG are removed via regression on the co-located ECM channel (e.g., $\mathrm{EEG}_{clean}(t) = \mathrm{EEG}_{raw}(t) - \alpha\,\mathrm{ECM}(t)$), with α empirically estimated.
Spectral analysis: Using Welch’s method, features such as ASSR at 40 Hz, SSVEP, alpha power (8–12 Hz), and VEP latencies (e.g., P1/N1 at ~180 ms) are extracted.
Cardiovascular and respiratory detection: Mechanical plethysmography (MPG) and inward-facing ECM yield beat-to-beat intervals and breath rate estimates, which are cross-correlated against ECG and finger PPG references.
Sleep scoring: Combined EEG and respiratory features improve stage discrimination, yielding substantial wake vs. NREM agreement.
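The artifact-reduction step in the chain above can be sketched as a single-coefficient least-squares regression: the scaling factor α is the projection of the EEG channel onto the reference channel, and the scaled reference is then subtracted. This is a minimal illustration on synthetic, zero-mean signals; the function names are hypothetical, and a real pipeline would likely estimate α per segment or adaptively.

```python
import math
from typing import List, Tuple


def estimate_alpha(eeg: List[float], reference: List[float]) -> float:
    """Least-squares alpha minimising sum((eeg - alpha * reference)^2)."""
    denominator = sum(r * r for r in reference)
    if denominator == 0.0:
        raise ValueError("reference channel carries no signal")
    return sum(e * r for e, r in zip(eeg, reference)) / denominator


def remove_artifact(eeg: List[float], reference: List[float]) -> Tuple[List[float], float]:
    """Subtract the alpha-scaled reference channel from the EEG channel."""
    alpha = estimate_alpha(eeg, reference)
    cleaned = [e - alpha * r for e, r in zip(eeg, reference)]
    return cleaned, alpha


# Synthetic example: 1 s at 200 Hz, a 10 Hz "neural" rhythm plus a 3 Hz artifact
# that also appears (unscaled) on the reference microphone channel.
FS = 200
t = [k / FS for k in range(FS)]
neural = [math.sin(2 * math.pi * 10 * x) for x in t]
artifact_ref = [math.sin(2 * math.pi * 3 * x) for x in t]
contaminated = [n + 3.0 * a for n, a in zip(neural, artifact_ref)]

cleaned, alpha = remove_artifact(contaminated, artifact_ref)
max_residual = max(abs(c - n) for c, n in zip(cleaned, neural))
```

The estimate is exact here because the two synthetic components are uncorrelated over the window; when neural activity and the reference channel are correlated, some genuine EEG is removed along with the artifact.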

Experimental Validation

Testing on 3–5 human subjects demonstrates:

Median in-ear electrode impedance <10 kΩ over 8 h; 231 kΩ (geometric mean) with dry Ag electrodes and a polycarbonate (PC) shell.
EEG SNR: ASSR (ear vs. scalp Cz) ≈10 dB; the alpha modulation ratio measured in the ear exceeds that of other dry in-ear EEG systems.
Jaw-clench artifact attenuation: up to 80% reduction in peak artifact via multimodal regression.
Speech: >80% word recognition of inward-ECM audio in noisy environments.
Battery-powered, wireless streaming operation (44 h) via 2.5×2.5 cm² PCB module (WANDmini).

Applications and Trajectory

Use cases encompass ambulatory sleep assessment, continuous cardiac/respiratory monitoring, artifact-robust EEG for BCI, and covert speech logging. Future developments aim to integrate chemical and haptic sensors, advance embedding of all electronics into the earbud, expand cross-user adaptability, and establish standardized APIs for both biopotential and acoustic channels (Goverdovsky et al., 2016, Kaveh et al., 2020).

7. Concluding Perspective

OmniEAR occupies a dual role: as an embodied agent reasoning testbed it exposes the inability of LLMs to autonomously reason about continuous physics, tool use, and emergent cooperation; as a hardware platform it provides scalable, cross-modal in-ear monitoring for advanced health applications. Both domains reflect the persistent demand for unified, context-sensitive intelligence and measurement—whether in artificial agents or in seamless human–machine interfaces. The open scenarios and datasets released under both threads are foundational resources for benchmarking, algorithmic development, and translational deployment across embodied AI and wearable neurotechnology.

References

Wang et al. (2025). OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks.

Goverdovsky et al. (2016). Hearables: Multimodal physiological in-ear sensing.

Kaveh et al. (2020). Wireless User-Generic Ear EEG.