OpenRC: An Open-Source Robotic Colonoscopy Framework for Multimodal Data Acquisition and Autonomy Research

Published 4 Apr 2026 in cs.RO | (2604.03781v1)

Abstract: Colorectal cancer screening critically depends on colonoscopy, yet existing platforms offer limited support for systematically studying the coupled dynamics of operator control, instrument motion, and visual feedback. This gap restricts reproducible closed-loop research in robotic colonoscopy, medical imaging, and emerging vision-language-action (VLA) learning paradigms. To address this challenge, we present OpenRC, an open-source modular robotic colonoscopy framework that retrofits conventional scopes while preserving clinical workflow. The framework supports simultaneous recording of video, operator commands, actuation state, and distal tip pose. We experimentally validated motion consistency and quantified cross-modal latency across sensing streams. Using this platform, we collected a multimodal dataset comprising 1,894 teleoperated episodes (~19 hours) across 10 structured task variations of routine navigation, failure events, and recovery behaviors. By unifying open hardware and an aligned multimodal dataset, OpenRC provides a reproducible foundation for research in multimodal robotic colonoscopy and surgical autonomy.


Authors (7)

Summary

The paper introduces OpenRC, a modular, retrofittable system that synchronizes video, kinematics, operator intent, and EM-tracked tip pose in robotic colonoscopy.
The paper details a ROS 2-based control architecture with precise actuation and multimodal logging; after alignment, the median residual offset between operator action and actuator state is 55.6 ms.
The paper provides an extensive dataset of 1,894 teleoperated episodes, facilitating research in vision-based tracking, policy learning, and failure recovery.

OpenRC: An Open-Source Framework for Multimodal Robotic Colonoscopy and Autonomy Research

Motivation and Significance

Robotic systems for colonoscopy have evolved substantially, yet existing research and clinical platforms offer little support for jointly studying operator-instrument dynamics and acquiring synchronized multimodal data. Existing resources typically provide either video-centric datasets or isolated hardware designs, which restricts reproducibility, multimodal learning, and closed-loop autonomy studies in endoluminal robotics. To address this gap, "OpenRC: An Open-Source Robotic Colonoscopy Framework for Multimodal Data Acquisition and Autonomy Research" (2604.03781) proposes a modular, open-source approach that retrofits conventional colonoscopes and records video, kinematics, operator intent, and EM-tracked distal tip pose.

Framework Architecture

OpenRC is a low-cost (under $5,000 USD, excluding the EM tracker), retrofittable, modular robotic actuation system targeting three clinically salient DoFs: longitudinal insertion, lateral bending, and vertical bending. The framework prioritizes reproducibility and extensibility, supporting rapid assembly and compatibility with commercial flexible endoscopes.

Figure 1: Overview of the OpenRC architecture integrating actuation, sensing, and multimodal logging via ROS 2 and EM tracking.

The actuation subsystem comprises:

A bending module that uses 3D-printed multi-jaw collets and DYNAMIXEL servos to drive the native handle dials directly, with high-resolution proprioception.
An independent, friction-based feeding module that controls insertion and retraction through roller-driven mechanisms whose adjustable preload provides a safe grip across devices.
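The two modules map motor motion to scope motion in simple ways. The sketch below converts a servo position reading into handle-dial rotation and a roller rotation into insertion depth. It assumes an X-series-style servo with 4,096 position ticks per revolution, a 1:1 direct drive between collet and dial, and no slip between rollers and insertion tube; the function names and the 15 mm roller radius used in the note are hypothetical, not values from the paper.

```python
import math

# Assumption: servo reports 4096 position ticks per full revolution.
TICKS_PER_REV = 4096


def ticks_to_dial_deg(ticks: int, zero_ticks: int = 0, gear_ratio: float = 1.0) -> float:
    """Convert a servo position reading into handle-dial rotation in degrees.

    zero_ticks: reading at the dial's neutral (straight-tip) position.
    gear_ratio: servo revolutions per dial revolution; 1.0 models direct drive.
    """
    servo_deg = (ticks - zero_ticks) * 360.0 / TICKS_PER_REV
    return servo_deg / gear_ratio


def roller_insertion_mm(roller_angle_rad: float, roller_radius_mm: float) -> float:
    """Insertion depth under a no-slip assumption: arc length s = r * theta."""
    return roller_radius_mm * roller_angle_rad
```

Under these assumptions, one full roller turn with a 15 mm radius advances the scope by 2π × 15 ≈ 94.2 mm; any slip at the rollers makes the true insertion smaller than this estimate.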
Figure 2: Main OpenRC hardware subsystems, ROS 2 data flow, and experimental testbed implementation.

The software stack runs ROS 2 on embedded NVIDIA hardware, providing temporally consistent communication between operator inputs (abstracted as normalized control vectors from common input devices), actuator states, and external sensors. The system supports both manual teleoperation and direct interfacing with autonomous policy engines, so machine-learning pipelines can be integrated in the closed loop.
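To illustrate what a normalized control vector could look like, the following sketch clips raw device axes to [-1, 1], applies a deadband, and scales the result into per-DoF velocity setpoints for insertion, lateral bending, and vertical bending. The `ControlVector` type, the 0.1 deadband, and the speed limits are hypothetical choices for illustration; the paper does not specify them here, and ROS 2 message publishing is omitted.

```python
from dataclasses import dataclass


@dataclass
class ControlVector:
    """Hypothetical normalized operator command; each component lies in [-1, 1]."""
    insertion: float
    lateral: float
    vertical: float


def _shape(x: float, deadband: float) -> float:
    """Clip to [-1, 1], zero inputs inside the deadband, rescale the remainder."""
    x = max(-1.0, min(1.0, x))
    if abs(x) <= deadband:
        return 0.0
    sign = 1.0 if x > 0 else -1.0
    # Rescale so output rises continuously from 0 at the deadband edge to 1 at full deflection.
    return sign * (abs(x) - deadband) / (1.0 - deadband)


def normalize(raw_axes, deadband: float = 0.1) -> ControlVector:
    """Map three raw device axes (roughly in [-1, 1]) to a ControlVector."""
    ins, lat, ver = (_shape(a, deadband) for a in raw_axes)
    return ControlVector(ins, lat, ver)


def to_velocity_setpoints(cmd: ControlVector, max_feed_mm_s: float = 10.0,
                          max_bend_deg_s: float = 30.0) -> dict:
    """Scale a normalized command into per-DoF velocity setpoints (illustrative limits)."""
    return {
        "feed_mm_s": cmd.insertion * max_feed_mm_s,
        "lateral_deg_s": cmd.lateral * max_bend_deg_s,
        "vertical_deg_s": cmd.vertical * max_bend_deg_s,
    }
```

Rescaling after the deadband keeps the mapping continuous, so a push just past the threshold yields a small command rather than a jump. Because a policy can emit the same normalized vector, the downstream actuation path need not distinguish human from autonomous input.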

Multimodal Data Acquisition and Synchronization

A central feature of OpenRC is unified, timestamped rosbag-based logging across four synchronized modalities: colonoscope video, operator action vectors, actuation state, and 6-DoF EM-tracked tip pose. The architecture supports extensible incorporation of additional data streams as needed for future autonomy paradigms.

Cross-modal latency was characterized extensively by applying controlled periodic excitation and measuring the lag and phase coherence between sensing pipelines.

Figure 3: (a) Hardware actuation and response characterization; (b,c) histograms of post-alignment residual lag for major modality pairs.

The results demonstrate:

Encoder state lags operator command by 102 ms (median)
EM-tracked pose and optical-flow-derived visual motion lag operator command by 435 ms and 412 ms, respectively
After calibration, residual misalignment is small: the median operator action-to-state offset is 55.6 ms and the median state-to-EM-pose offset is 0.0 ms, supporting high-fidelity downstream data fusion and policy learning.

Dataset Collection and Properties

A key contribution is a structured, open dataset of 1,894 teleoperated episodes (approximately 19 hours) acquired on multi-material colon phantoms with embedded lesions. Each episode logs synchronized multimodal trajectories under teleoperated control, and the dataset includes challenging conditions with navigation failures and recoveries, ranging from lumen loss to tissue contact and fold engagement.

Figure 4: Example episode visualization, depicting synchronized trajectories for video, operator actions, EM pose, and robot state.

The dataset exhibits diversity across task types (e.g., insertion along defined walls, centerline tracking, retraction scenarios), task complexity, and operator strategies. Tasks are annotated with natural language instructions and segmented according to maneuver type, supporting hierarchical and language-conditioned policy research.
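Per-episode language annotations and maneuver segments could be stored along the following lines. This is a hypothetical schema (class names, fields, and labels are illustrative, not the released dataset's format) showing how segment boundaries might be validated and paired with the episode's instruction for language-conditioned or hierarchical training.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One maneuver segment, in frame indices on the common timeline."""
    maneuver: str      # hypothetical labels, e.g. "insertion", "recovery"
    start_frame: int   # inclusive
    end_frame: int     # exclusive


@dataclass
class Episode:
    episode_id: str
    instruction: str   # natural-language task instruction
    num_frames: int
    segments: list[Segment] = field(default_factory=list)

    def validate(self) -> None:
        """Require ordered, non-overlapping, non-empty segments inside the episode."""
        prev_end = 0
        for s in self.segments:
            if not (prev_end <= s.start_frame < s.end_frame <= self.num_frames):
                raise ValueError(f"bad segment {s} in {self.episode_id}")
            prev_end = s.end_frame


def language_conditioned_samples(ep: Episode):
    """Yield (instruction, maneuver, start, end) tuples for hierarchical training."""
    ep.validate()
    for s in ep.segments:
        yield ep.instruction, s.maneuver, s.start_frame, s.end_frame
```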

Distributional analysis reveals substantial behavioral variability; navigation tasks have longer trajectory lengths and durations than failure/recovery episodes.
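Trajectory-length and duration statistics of this kind can be computed directly from the logged streams. The sketch below assumes tip positions are available as an (N, 3) array, for example from the EM tracker, and that timestamps are in seconds; the function names are hypothetical.

```python
import numpy as np


def trajectory_length(positions) -> float:
    """Path length: sum of Euclidean distances between consecutive tip positions."""
    p = np.asarray(positions, dtype=float)
    if len(p) < 2:
        return 0.0
    # np.diff gives per-step displacement vectors; norm along axis 1 gives step lengths.
    return float(np.linalg.norm(np.diff(p, axis=0), axis=1).sum())


def duration_s(timestamps_s) -> float:
    """Elapsed time between the first and last sample of an episode."""
    t = np.asarray(timestamps_s, dtype=float)
    return float(t[-1] - t[0]) if len(t) else 0.0
```

Because sensor jitter adds small spurious steps, a path length computed this way is biased upward unless the positions are smoothed first.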

Figure 5: Distributions of episode duration, trajectory length, and task composition across the OpenRC dataset.

All streams are temporally aligned at a common rate of 30 Hz, so the data can be used directly in VLA learning setups and multimodal autonomy research.
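Streams recorded at different rates can be brought onto a common 30 Hz timeline by resampling. The sketch below assumes per-stream timestamps in seconds that have already been corrected for the calibrated lags; it linearly interpolates continuous signals such as actuator state and picks the nearest sample for discrete streams such as video frames. The paper's actual alignment procedure is not described here, so this is one plausible approach, and all function names are hypothetical.

```python
import numpy as np


def common_timebase(t_start: float, t_end: float, rate_hz: float = 30.0) -> np.ndarray:
    """Uniform timestamps from t_start up to t_end at rate_hz."""
    # Small epsilon guards against floating-point error dropping the final sample.
    n = int(np.floor((t_end - t_start) * rate_hz + 1e-9)) + 1
    return t_start + np.arange(n) / rate_hz


def resample_continuous(t_src, values, t_out) -> np.ndarray:
    """Linearly interpolate each column of an (N,) or (N, D) signal onto t_out."""
    v = np.asarray(values, dtype=float)
    if v.ndim == 1:
        v = v[:, None]
    return np.column_stack([np.interp(t_out, t_src, v[:, d]) for d in range(v.shape[1])])


def nearest_indices(t_src, t_out) -> np.ndarray:
    """Index of the nearest source sample for each output time (for video frames)."""
    t_src = np.asarray(t_src, dtype=float)
    t_out = np.asarray(t_out, dtype=float)
    idx = np.clip(np.searchsorted(t_src, t_out), 1, len(t_src) - 1)
    left, right = t_src[idx - 1], t_src[idx]
    # Prefer the earlier sample on ties.
    return np.where((t_out - left) <= (right - t_out), idx - 1, idx)
```

`np.interp` holds the end values constant outside the source time range, so the output window should be restricted to the interval covered by every stream.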

Validation and Performance

System-level experiments confirmed robust actuation, tight synchronization, and latency bounds suitable for closed-loop policy learning. Validation on heterogeneous phantom anatomies established the generalizability of the platform and dataset beyond a single geometry or appearance domain. The combination of direct kinematics, video, pose, and operator command histories enables comprehensive benchmarking of vision-based tracking, policy learning, and SLAM formulations under realistic, deformable conditions.

Limitations and Future Directions

The present dataset was collected on phantoms rather than in vivo, so it does not capture in vivo anatomical or physiological variability, nor does it simulate patient-specific compliance and peristaltic effects. More sophisticated phantom materials and a transition to animal and in vivo studies are natural next steps. The open-source design and modular interfaces allow rapid adaptation and extension, including higher-DoF actuation, tactile sensors, and real-time feedback channels.

From an autonomy research perspective, the release supports rapid benchmarking of not only reinforcement and imitation learning policies but also language-guided, control-aware perception models. It also enables systematic study of failure modes and data-efficient recovery policy design, which are underrepresented in existing colonoscopy datasets that focus nearly exclusively on visual perception.

Conclusion

OpenRC provides the field with a rigorously validated, open-source hardware and software platform for robotic colonoscopy experimentation, as well as the first large-scale, tightly time-aligned multimodal dataset capturing the closed-loop dynamics of teleoperated endoluminal navigation. The platform overcomes key experimental barriers impeding reproducible autonomy research and VLA learning in surgical robotics, and positions the community for development, evaluation, and benchmarking of next-generation autonomous colonoscopic navigation and decision-making systems (2604.03781).
