Robotic Ultrasound System (RUSS)
- Robotic Ultrasound Systems are integrated platforms that autonomously control US probes using advanced robotics and real-time sensing.
- They combine precise robot kinematics, force/torque sensing, and AI-driven perception to standardize imaging and enhance reproducibility.
- RUSS leverages learning from demonstration, deep learning segmentation, and multimodal feedback to improve diagnostic accuracy and workflow automation.
A Robotic Ultrasound System (RUSS) is an integrated hardware–software platform that autonomously or semi-autonomously manipulates an ultrasound (US) probe for medical imaging, replacing or augmenting the human sonographer. By leveraging real-time sensing (US imaging, surface geometry, force/torque), advanced robot kinematics and control, and often deep learning modules for perception and guidance, RUSS aims to standardize image acquisition, improve reproducibility, reduce operator dependence, and enable diagnostic access in settings where skilled sonographers are scarce. Recent research developments focus on anatomy-aware probe servoing, adaptive force and orientation control, learning from expert demonstrations, and closed-loop operation informed by multi-modal feedback.
1. System Architectures and Hardware Integration
Most RUSS implementations are built around collaborative robot arms (6–7 DoF, e.g., Franka Emika Panda, KUKA LBR iiwa, UR5e), often equipped with joint-torque or wrist force/torque sensing and custom end-effectors. A-SEE–class end-effectors use integrated distance sensors (e.g., laser or RGB-D arrays) to sense local surface geometry and keep the probe aligned normal to the patient surface (Zhetpissov et al., 7 Mar 2025, Ma et al., 17 Jun 2024). For compliant and safe interaction, quasi-direct-drive (QDD) mechanisms have been developed, combining passive mechanical compliance with high-bandwidth active force regulation (control rates up to 100 Hz) and low backdrive torque (Chen et al., 4 Oct 2024).
A typical RUSS hardware configuration includes:
- Robotic arm: 6–7 DoF, high kinematic precision, real-time joint sensing and control
- End-effector: Probe-specific gripper, with A-SEE (laser or depth) sensors or QDD actuators
- Ultrasound probe: Rigidly mounted, either wireless or cabled (e.g., Clarius, Siemens, Telemed)
- Force/torque sensors: Inline at probe or wrist for axial and lateral contact force estimation
- Vision sensors: RGB-D cameras or time-of-flight arrays for surface modeling and registration
- Computing infrastructure: Workstations (GPU for segmentation, CPU for control), ROS for integration
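As a minimal sketch, such a configuration can be captured as a single structured object for a ROS-based integration layer; the component names and values below (e.g., `max_contact_force_n`) are purely illustrative and not taken from any cited system.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RUSSConfig:
    """Illustrative description of a generic RUSS platform (hypothetical values)."""
    robot_arm: str = "7-DoF collaborative arm"          # e.g., Franka Emika Panda
    control_rate_hz: float = 1000.0                     # joint-level servo rate
    end_effector: str = "probe gripper with A-SEE distance sensors"
    probe: str = "convex US probe, cabled"              # or wireless (e.g., Clarius)
    force_sensor: str = "6-axis force/torque at the wrist"
    max_contact_force_n: float = 15.0                   # safety limit on axial load
    vision: List[str] = field(default_factory=lambda: ["RGB-D camera"])
    compute: List[str] = field(default_factory=lambda: [
        "GPU workstation (segmentation)", "real-time CPU (control)", "ROS middleware"])

config = RUSSConfig()
print(config)
```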
Some advanced systems integrate lightweight, patient-wearable, cable-driven robots that mount directly on the body, enabling robust scanning in ambulatory or high-motion environments (Li et al., 9 Oct 2025).
2. Perception, Feature Extraction, and Guidance
Modern RUSS heavily employs data-driven perception for accurate probe navigation. Anatomical feature extraction is typically performed with CNN-based segmentation networks (U-Net, attention-gated U-Net, QuickNAT, etc.) trained on B-mode images to delineate key structures (pleural lines, rib shadows, thyroid lobes) (Ma et al., 17 Jun 2024, Zielke et al., 2021). Feature centroids are extracted via first-order moments, supporting both in-plane and out-of-plane guidance.
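As an illustration of the centroid step, the short sketch below computes a feature centroid from a binary segmentation mask via first-order image moments; the mask size and synthetic blob are arbitrary examples.

```python
import numpy as np

def feature_centroid(mask: np.ndarray) -> np.ndarray:
    """Centroid of a binary segmentation mask via first-order image moments.

    mask: H x W array in {0, 1}, e.g., a thresholded network output for a
    pleural line or rib shadow. Returns (row, col) in pixel coordinates.
    """
    m00 = mask.sum()                       # zeroth-order moment (area)
    if m00 == 0:
        raise ValueError("empty mask: no feature detected")
    rows, cols = np.nonzero(mask)
    m10 = rows.sum()                       # first-order moment along rows
    m01 = cols.sum()                       # first-order moment along columns
    return np.array([m10 / m00, m01 / m00])

# Example: centroid of a small synthetic blob
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:30, 40:50] = 1
print(feature_centroid(mask))              # -> [24.5, 44.5]
```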
Template matching aligns live extracted features with a non–patient-specific template of the standardized imaging plane (SIP), solving a minimal assignment between live and template landmarks:

$$\pi^{*} = \arg\min_{\pi} \sum_{i} \left\lVert \mathbf{q}_i - \mathbf{t}_{\pi(i)} \right\rVert^{2},$$

where $\mathbf{q}_i$ are query centroids, $\mathbf{t}_{\pi(i)}$ are template centroids, and the optimal permutation $\pi^{*}$ is found via combinatorial search (Ma et al., 17 Jun 2024).
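A minimal sketch of this assignment, assuming a small number of landmarks so that brute-force combinatorial search over permutations is tractable:

```python
import itertools
import numpy as np

def match_to_template(query: np.ndarray, template: np.ndarray):
    """Brute-force search for the permutation aligning live feature centroids
    (query, N x 2) to template centroids (N x 2), per the assignment above.

    Returns the optimal permutation and its total squared distance. For larger
    N, a Hungarian solver (scipy.optimize.linear_sum_assignment) would be used.
    """
    n = len(query)
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(np.sum((query[i] - template[perm[i]]) ** 2) for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost

# Toy centroids (pixel coordinates)
query = np.array([[10.0, 12.0], [40.0, 45.0], [70.0, 20.0]])
template = np.array([[72.0, 18.0], [11.0, 11.0], [41.0, 46.0]])
perm, cost = match_to_template(query, template)
print(perm, cost)   # (1, 2, 0), small residual cost
```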
Probe surface-normal alignment uses multi-point distance sensing (A-SEE) or local principal component analysis (PCA) over dense RGB-D point patches to estimate surface normals, with angular-velocity commands computed via PD control (Zhetpissov et al., 7 Mar 2025). In addition, force–mechanics models combined with Bayesian optimization enable sample-efficient, calibration-free probe normal alignment, attaining sub-3° mean error (Raina et al., 2023).
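The sketch below illustrates the RGB-D variant: a local surface normal is estimated by PCA over a point patch and converted into an angular-velocity command by an assumed PD law (the gains `kp`, `kd` are illustrative, not taken from the cited work).

```python
import numpy as np

def surface_normal_pca(points: np.ndarray) -> np.ndarray:
    """Local surface normal of an RGB-D point patch (N x 3): the eigenvector of
    the covariance matrix with the smallest eigenvalue (direction of least variance)."""
    centered = points - points.mean(axis=0)
    _, eigvecs = np.linalg.eigh(centered.T @ centered / len(points))  # ascending order
    normal = eigvecs[:, 0]
    return normal if normal[2] > 0 else -normal   # orient away from the body

def alignment_angular_velocity(probe_axis, target_axis, err_prev, dt, kp=1.5, kd=0.1):
    """PD law (illustrative gains) producing an angular velocity that rotates the
    probe axis onto the target axis; the error is the axis-angle rotation vector."""
    axis = np.cross(probe_axis, target_axis)
    angle = np.arccos(np.clip(np.dot(probe_axis, target_axis), -1.0, 1.0))
    err = axis / (np.linalg.norm(axis) + 1e-9) * angle
    return kp * err + kd * (err - err_prev) / dt, err

# Tilted planar patch z = 0.1 x; the probe should press along the inward normal (-n)
pts = np.random.rand(200, 3)
pts[:, 2] = 0.1 * pts[:, 0]
n = surface_normal_pca(pts)
omega, _ = alignment_angular_velocity(np.array([0.0, 0.0, -1.0]), -n,
                                      err_prev=np.zeros(3), dt=0.01)
print(n, omega)
```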
3. Control and Servoing Strategies
RUSS control strategies unify classic robotic kinematics with image- and force-guided visual servoing.
In-plane servoing is classically managed via image-based visual servoing (IBVS):

$$\mathbf{v} = -\lambda\, \mathbf{L}^{+}\, \mathbf{e},$$

with $\mathbf{e}$ the landmark error in the image, $\mathbf{L}^{+}$ the pseudo-inverse of the interaction matrix $\mathbf{L}$, $\lambda$ a positive gain, and $\mathbf{v}$ the probe twist. Multiple feature-derived velocity commands (e.g., pleural line, rib shadow) are summed for interpretable, anatomy-aware navigation (Ma et al., 17 Jun 2024).
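A minimal numerical sketch of this law, summing the per-feature contributions; the simplified point-feature interaction matrix and the gain are assumptions for illustration only.

```python
import numpy as np

def ibvs_twist(errors: dict, interaction: dict, gain: float = 0.5) -> np.ndarray:
    """Image-based visual servoing: v = -lambda * L^+ * e, summed over features.

    errors:      feature name -> image-space error vector e_k (2,)
    interaction: feature name -> interaction matrix L_k (2 x 6) mapping the
                 probe twist to image motion of that feature
    Returns the commanded 6-DoF probe twist (vx, vy, vz, wx, wy, wz).
    """
    twist = np.zeros(6)
    for name, e in errors.items():
        L_pinv = np.linalg.pinv(interaction[name])   # (6 x 2) pseudo-inverse
        twist += -gain * L_pinv @ e                  # each feature adds a velocity term
    return twist

# Toy example with two point features (e.g., pleural-line and rib-shadow centroids)
L_point = np.hstack([np.eye(2), np.zeros((2, 4))])   # simplified interaction matrix
errors = {"pleural_line": np.array([5.0, -2.0]),
          "rib_shadow":   np.array([1.0,  3.0])}
interaction = {"pleural_line": L_point, "rib_shadow": L_point}
print(ibvs_twist(errors, interaction))
```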
Normal alignment and force control are handled via hierarchical fusion of commands: A-SEE for orientation, IBVS for planar correction, and a velocity-based force control loop for axial load. Joint velocities are computed via the robot Jacobian pseudo-inverse mapping.
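A schematic sketch of such hierarchical fusion and the Jacobian mapping, with assumed gains and a random Jacobian standing in for the robot model:

```python
import numpy as np

def fused_probe_twist(v_ibvs, omega_align, f_meas, f_des,
                      kf=0.002, probe_z=np.array([0.0, 0.0, 1.0])):
    """Illustrative hierarchical fusion: IBVS supplies the in-plane translation,
    surface-normal alignment supplies the orientation rate, and a velocity-based
    force loop regulates the axial contact load along the probe axis."""
    v = np.zeros(6)
    v[:3] = v_ibvs[:3]                          # in-plane correction from IBVS
    v[3:] = omega_align                         # orientation from normal alignment
    v[:3] += kf * (f_des - f_meas) * probe_z    # push/retract to track desired force
    return v

def joint_velocities(jacobian: np.ndarray, twist: np.ndarray) -> np.ndarray:
    """Map the fused Cartesian twist to joint velocities: qdot = J^+ v."""
    return np.linalg.pinv(jacobian) @ twist

# Toy 7-DoF example with a random Jacobian
rng = np.random.default_rng(0)
J = rng.standard_normal((6, 7))
v = fused_probe_twist(v_ibvs=np.array([0.01, 0.0, 0.0, 0.0, 0.0, 0.0]),
                      omega_align=np.array([0.0, 0.05, 0.0]),
                      f_meas=4.0, f_des=5.0)
print(joint_velocities(J, v))
```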
Recent architectures exploit flow-matching and neural imitation learning for real-time, closed-loop 3D tracking, achieving >60 Hz control cycles and sub-7 mm mean tracking error on dynamic targets (Qian et al., 2 Nov 2025). Compliance is increasingly implemented through QDD actuators, offering bandwidth sufficient to maintain <1 N RMSE under simulated breathing or abrupt shift disturbances (Chen et al., 4 Oct 2024).
4. Learning from Demonstration and Human Integration
Robotic ultrasound scanning demands expert-level skill and contextual adaptation. High-performing systems leverage learning from demonstration (LfD), imitation learning, and reinforcement learning:
- Multi-modal policies: Deep networks fuse US images, probe pose/orientation, and force/torque to generate incremental control actions. Policies are typically trained on large datasets of expert demonstrations and refined via post-optimization with human-in-the-loop corrections (Deng et al., 2021).
- Coaching frameworks: Combining off-policy DRL (e.g., Soft Actor-Critic) with sparse expert “coaching” (kinesthetic corrections) models the pedagogy process as a POMDP, accelerating convergence (25% faster) and yielding 74.5% more high-quality frames in phantom studies (Raina et al., 3 Sep 2024).
- Imitation learning with trajectory-force coupling: Kernelized movement primitives (KMP) learn task-specific position-force couplings from expert trajectories, outperforming naive or constant-force baselines for complex Doppler or vessel-compression scans (Dall'Alba et al., 11 Jul 2024).
A key unifying trend is the explicit separation of distinct control axes (translation, orientation, force) informed by learned policies, with state-quality classifiers gating execution based on real-time image feedback.
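The sketch below outlines one plausible form of such a multi-modal policy in PyTorch; the encoder sizes, the 13-dimensional pose-plus-wrench state, and the 6-DoF incremental action are assumptions rather than the architecture of any cited system.

```python
import torch
import torch.nn as nn

class MultiModalUSPolicy(nn.Module):
    """Sketch of a multi-modal LfD policy (sizes are assumptions): a CNN encodes
    the B-mode image, an MLP encodes probe pose and force/torque, and a fused
    head predicts an incremental 6-DoF probe action."""
    def __init__(self):
        super().__init__()
        self.image_enc = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 16, 128), nn.ReLU())
        self.state_enc = nn.Sequential(          # 7-D pose (pos + quat) + 6-D wrench
            nn.Linear(13, 64), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(128 + 64, 64), nn.ReLU(),
            nn.Linear(64, 6))                    # incremental translation + rotation

    def forward(self, image, state):
        z = torch.cat([self.image_enc(image), self.state_enc(state)], dim=-1)
        return self.head(z)

policy = MultiModalUSPolicy()
action = policy(torch.randn(1, 1, 128, 128), torch.randn(1, 13))
print(action.shape)   # torch.Size([1, 6]); trained by behavior cloning on expert demos
```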
5. Path Planning, Automation, and Workflow Integration
Full and partial automation in RUSS is achieved via modular planning pipelines:
- Long-range guidance: Cubic Bézier curve planners, fused with real-time RGB-D pose estimation, autonomously guide the probe to the anatomic target, handing off to local feature-based guidance for the final approach (see the sketch after this list) (Liu et al., 2023).
- Autonomous plane navigation: Anatomy-aware frameworks support SIP localization in the last centimeter with template-guided IBVS, reaching under 2 mm/2° accuracy consistently in phantoms and human subjects (Ma et al., 17 Jun 2024).
- Gel dispensing: Autonomous gel-applicator modules (UltraGelBot) integrate image-based presence detection and PI-controlled syringe actuation, improving US coupling and cutting scan time by 37.2% (Raina et al., 28 Jun 2024).
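The sketch referenced above evaluates a cubic Bézier path between the current probe position and the anatomic target; the control points are hypothetical waypoints, not values from the cited work.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=50):
    """Sample a cubic Bezier curve B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1
    + 3(1-t) t^2 p2 + t^3 p3 for t in [0, 1]. Control points are 3-D positions:
    start pose, two shaping points from the RGB-D surface model, and the target."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# Hypothetical waypoints (metres, robot base frame)
path = cubic_bezier(np.array([0.40, 0.00, 0.30]),   # current probe position
                    np.array([0.45, 0.05, 0.35]),   # shaping point 1
                    np.array([0.55, 0.10, 0.25]),   # shaping point 2
                    np.array([0.60, 0.12, 0.15]))   # target on patient surface
print(path.shape)   # (50, 3) waypoints handed to the local feature-based controller
```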
Integration of LLM-driven task planning and conversation (e.g., USPilot, IVS) allows user intent (from speech/text) to be parsed by LLMs, mapped to API call subgraphs by GNN planners, and sequenced into safe, efficient scan execution with end-to-end feedback (Chen et al., 18 Feb 2025, Song et al., 17 Jul 2025).
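The toy sketch below conveys only the intent-to-plan idea: a hypothetical skill registry and keyword lookup stand in for the LLM parsing and GNN subgraph planning in the cited systems, and none of the skill names correspond to a real API.

```python
from typing import List

# Hypothetical skill registry; real systems (e.g., USPilot) map parsed intent to
# API-call subgraphs with an LLM + GNN planner rather than a lookup table.
SKILLS = {
    "carotid scan": ["apply_gel", "move_to_region:neck", "align_probe_normal",
                     "servo_to_standard_plane:carotid", "record_cine_loop"],
    "lung scan":    ["apply_gel", "move_to_region:chest", "align_probe_normal",
                     "servo_to_standard_plane:pleural", "record_cine_loop"],
}

def plan_from_intent(utterance: str) -> List[str]:
    """Toy intent-to-plan mapping standing in for the LLM/GNN planning stage."""
    for task, steps in SKILLS.items():
        if task.split()[0] in utterance.lower():
            return steps
    return ["ask_clarification"]

print(plan_from_intent("Please perform a carotid ultrasound on the left side"))
```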
6. Evaluation and Clinical Performance
Validation protocols typically combine quantitative biomechanical and imaging metrics:
- Kinematic and pose precision: Errors are routinely sub-millimeter/sub-degree for both planar and curved anatomies (Zhetpissov et al., 7 Mar 2025, Li et al., 9 Oct 2025).
- Force tracking: Compliant end-effectors reduce RMSE by >80% compared to stiff arms alone, adapting to dynamic contact scenarios (Chen et al., 4 Oct 2024).
- Image similarity gains: Normalized cross-correlation with the target plane improves by >50% over the course of servoing, demonstrating effective convergence to diagnostic planes (see the sketch after this list) (Ma et al., 17 Jun 2024).
- Task completion and accuracy: End-to-end LLM+GNN task planners achieve >78% completion on real robotic pipelines and >97% API selection accuracy on generic benchmarks (Chen et al., 18 Feb 2025).
- In-vivo robustness: Systems achieve expert-level scan plane acquisition, high ICC for anatomical measures (R² >0.9), and maintain field-of-view during substantial patient or phantom motion (Li et al., 9 Oct 2025, Qian et al., 2 Nov 2025).
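The sketch referenced in the image-similarity bullet computes a global normalized cross-correlation between a live frame and a target-plane frame; the synthetic images are illustrative.

```python
import numpy as np

def normalized_cross_correlation(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Global normalized cross-correlation between two B-mode frames, used to
    quantify convergence of the live image toward the target standardized plane."""
    a = img_a.astype(float) - img_a.mean()
    b = img_b.astype(float) - img_b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

target = np.random.rand(256, 256)
live = 0.7 * target + 0.3 * np.random.rand(256, 256)   # partially converged frame
print(normalized_cross_correlation(live, target))       # approaches 1.0 as servoing converges
```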
Reported limitations include computational latency (now being reduced through synergistic perception–control co-design), limited clinical generalization (models often require retraining for new anatomies or image domains), and the regulatory approval still needed for broader adoption.
7. Future Directions and Open Challenges
Current research trajectories for RUSS include:
- Continuous multi-modal sensing: Joint fusion of surface, tactile, Doppler, and US data; AI-based confidence and interpretability (Zhetpissov et al., 7 Mar 2025, Jiang et al., 2023).
- Higher-level autonomy and dialogue: LLM-augmented workflow orchestration, knowledge-augmented dialogue for both patient and physician agency, and dynamic, context-aware re-planning (Xu et al., 18 Jun 2024, Song et al., 17 Jul 2025).
- Generalization and personalized adaptation: Training robust, anatomy- and disease-aware models that cross patient domains and handle variable tissue mechanics.
- Integration with novel imaging tasks: Automated 3D volume compounding, neural-field–based volume reconstruction and localization (AIA-UltraNeRF), and beyond-hospital/field diagnostics with compact, wearable robots (Zhang et al., 23 Nov 2025, Li et al., 9 Oct 2025).
- Safety and regulatory compliance: Embedding barrier policies and formal verification into black-box RL schemes; clinical trials across broader anatomical and demographic populations.
RUSS thus represents a convergence point for robotics, AI-driven perception and planning, and procedural automation, with growing evidence that it can approach human-level competence in standardized scanning, provided continued advances in adaptable perception, robust compliance, and explainability (Jiang et al., 2023, Ma et al., 17 Jun 2024, Qian et al., 2 Nov 2025).