Safety Bot: Autonomous Hazard Monitoring

Updated 11 November 2025

Safety Bot is a hybrid system that employs distributed sensors and integrated computation to dynamically monitor, predict, and intervene in hazardous situations.
The system fuses data from RGB, depth, and infrared sensors with machine learning and classical planning techniques to ensure real-time safe navigation.
Validated with high accuracy in PPE compliance and accident detection, Safety Bots deliver industrial-grade safety and robust regulatory compliance.

A Safety Bot is a system or subsystem—physical, virtual, or hybrid—that dynamically monitors, predicts, and enforces safety constraints in interactive environments involving robots or autonomous systems. Safety Bots span a spectrum of domains: laboratory automation, human-robot collaboration, language-based agents, conversational AI, and industrial manufacturing. Their primary function is to detect hazardous situations in real time and intervene, either by modulating autonomous decisions, interrupting dangerous behavior, or alerting stakeholders, to maintain operation within provably safe bounds and compliance with standards or formal specifications.

1. Architectures and Sensing Modalities

Safety Bots employ a distributed network of multimodal sensors and embedded computation for comprehensive hazard perception, context representation, and actuation. The “Chemist Eye” system for self-driving laboratories exemplifies this design, integrating:

Distributed RGB–Depth Sensing: Stations combine Intel RealSense D435i RGB-D cameras (RGB: 1920×1080 @ 30 Hz, Depth: 1280×720 @ 30 Hz, FOV: 86°×57°), mounted on NVIDIA Jetson Orin Nano, supported by speakers (Amazon Echo Dot) for on-site feedback.
Infrared (IR) Sensing: Separate stations employ Raspberry Pi 5 with long-wave IR cameras (20 °C–400 °C, 640×480 @ 9 Hz, FOV: 45°×35°), providing thermal context for fire and heat incident detection.
ROS-Based Backbone: All sensor streams are synchronized and time-stamped under ROS topics (e.g., /camera/rgb, /camera/depth, /camera/ir). Stations are physically arrayed for overlapping 2–4 meter coverage, with rigid transform-based extrinsic calibration to guarantee 3D localization consistency.
Integration with Actuators: Downstream actuation is realized on KUKA KMR iiwa robots (or any platform supporting ROS move_base API and standardized control interfaces).
Third-Party Notification: Interfacing with systems such as Slack via rosbridge allows for low-latency alert propagation to human operators.

Calibration uses standard checkerboard or AprilTag procedures to estimate $\mathbf{R}\in SO(3),\mathbf{t}\in \mathbb{R}^3$ for cross-modality alignment.

2. Hazard Detection and Scene Understanding

Safety Bots implement multi-stage hazard detection pipelines leveraging the latest in computer vision and machine learning:

Preprocessing and Data Alignment:

Incoming RGB images are resized (640×480) and normalized; depth and IR data are aligned via intrinsic/extrinsic calibration.

Object and Event Detection:

YOLOv8 networks detect persons, PPE, and thermal hotspots using a multi-task loss combining classification, bounding-box regression, and objectness terms.
Mask R-CNN refines bounding boxes to pixel-level masks for more accurate localization.

Multi-Sensor Fusion:

Spatial risk is modeled by fusing depth $D_d(x,y)$ and IR temperature $T_{ir}(x,y)$ into a composite risk map $R(x,y)$ :

$R(x,y) = \alpha\frac{D_d(x,y)}{D_{max}} + \beta\frac{T_{ir}(x,y)-T_{ambient}}{T_{max}-T_{ambient}}$
This risk map is used to spatially weight path planning and avoidance behaviors.

Vision–LLM (VLM) Integration:

Systems query large-scale VLMs (LLaVA-7B, LLaVA-Phi3) on composite images (RGB-Depth overlays, with map context) to determine PPE compliance, incident presence, or medical emergencies.
Prompt engineering replaces domain-specific fine-tuning. Each VLM output is mapped via softmax to per-class confidence, $p_i = \exp(z_i)/\sum_j\exp(z_j)$ ; events are flagged if $\max_i p_i \ge T_{conf}$ (e.g., $T_{conf}=0.7$ ).
Calibration against annotated data attains state-of-the-art accuracy—97.5% on PPE, 97.0% on accident detection, with $<5\%$ hallucination rates.

3. Safe Decision-Making and Robot Control

The intervention logic in Safety Bots incorporates both classical and learning-based planning:

Planning Algorithms:

A* search is performed on a dynamic occupancy grid, using a cost function $g(n) = g_{\mathrm{euclid}}(n) + w_{\mathrm{risk}} R(n)$ , where $R(n)$ is the local risk estimate.
Potential Fields: Goals exert attractive potentials; hazards yield repulsive terms, with control $u=-\nabla U_{\mathrm{tot}}(x)$ , $U_{\mathrm{tot}}=U_{\mathrm{att}}+U_{\mathrm{rep}}$ .

Constraint Violations and Emergency Intervention:

If the risk map exceeds $R_{crit}$ $R_{cr i t}$ or VLM confidence $<T_{safe}$ $< T_{s a f e}$ , the robot issues an immediate stop via ROS:
- robot.stop_movement() disables motion until hazards are cleared.
- If a robot is near a detected hazard or an individual not in PPE, active plans are canceled and replanned to a “safe node”.

Pseudocode for Intervention:

if hazard_detected and p_conf >= T_conf:
    foreach robot:
        if dist(robot, hazard) < d_safe:
            cancel_current_goal()
            new_goal = PlanSafeNode(robot, hazard_map)
            send_goal(robot, new_goal)
        else if R_max >= R_crit:
            robot.stop()

Performance: The end-to-end response time (sensor event to robot action) is empirically bounded at $<250$ ms ( $\approx$ 20 ms network latency, $\approx$ 200 ms processing), ensuring real-time hazard avoidance.

4. Communication, Latency, and System Integration

Safety Bots depend on robust, low-latency messaging architectures for distributed decision-making:

ROS 2.0 Middleware: All sensors and actuators publish/subscribe to topics for RGB, depth, IR, hazard flags, and robot goals.
Custom Services: VLM queries are dispatched as RPC services, with hazard labels collated system-wide.
Actuation & Alerts: Robot navigation commands use /move_base/goal, while /emergency_stop (Bool) messages can override any motion. Alerts to human operators are relayed via HTTP API calls to Slack channels or equivalent.
Empirical System Metrics:
- Sensor–ROS ingress: $\tau_{\text{sensor}\to\text{ros}} \approx 20$ ms.
- Processing time: $T_{\text{proc}} \approx 200$ ms.
- ROS–robot actuation: $\tau_{\text{ros}\to\text{robot}}$ negligible in local networks.

5. Experimental Validation and Quantitative Results

Deployed in a fully instrumented SDL with three KUKA KMR iiwa robots, the Safety Bot was evaluated under multiple real-world hazard scenarios:

Metric	Value
PPE compliance accuracy	97.5% (LLaVA-Phi3)
Accident detection accuracy	97.0%
Decision-making success	95%
Hallucination rate (VLM)	<5%
Precision (PPE)	0.98
Recall (PPE)	0.96
F₁ (PPE)	0.97
Precision (accident)	0.95
Recall (accident)	0.99
F₁ (accident)	0.97

Scenario-based navigation tasks (accident or fire) achieved success rates of 90–100% in filtered contexts.

These results demonstrate that integrating distributed sensing, robust per-frame hazard detection with VLMs, and reactive control yields industrial-grade safety and rapid recovery from multiple classes of laboratory hazard.

6. Limitations, Robustness, and Standards Compliance

Identified limitations of current Safety Bot architectures include:

Coverage Gaps: Physical occlusions or sensor blind spots may miss hazards; overlapping fields of view and regular calibration are essential.
VLM Hallucinations: While hallucination rates are $<5\%$ , rare instances of missed detections or phantom alarms remain.
Threshold Sensitivity: Variation in $T_{conf}$ or $R_{\text{crit}}$ can trade off sensitivity versus false positives; empirical tuning is required for each deployment.
Integration Overhead: Processing time is dominated by VLM inference. No GPU-induced latency bottleneck was observed on Jetson Orin Nano or similar edge accelerators at 30 Hz.
Standards Compliance: While the system matches or exceeds typical stop/reaction times required for lab-scale safety (i.e., maximum $T_{resp} < 250$ ms), integration with broader regulatory requirements (e.g., ISO 10218 or ISO/TS 15066 for laboratory robots) should be context-validated.

Safety Bots as realized in Chemist Eye build on and integrate methodological components found across industrial safety and robotics:

Control Barrier Functions (CBFs): For formally verified force or contact constraints in soft robotics, as in (Dickson et al., 20 Apr 2025).
Semantic Constraint Certification: For enforcing human-expected semantic safety (e.g., object relations), using LLM-assisted analysis plus CBF-based filtering (Brunke et al., 19 Oct 2024).
Formal Verification and Runtime Monitoring: For chatbots in safety domains and for collaborative robot controllers (Gatti et al., 21 Nov 2024, Gleirscher et al., 2020).
Learning-Based Approaches: For adapting hazard detection and response in environments where explicit rules or models are insufficient, via reinforcement learning and continual validation.
Multi-Agent and Human-in-the-Loop Schemes: Combining autonomous hazard avoidance with distributed human notification, rapid alerting via industry-standard communication APIs, and real-time logging for traceability.

Safety Bots represent a compositional paradigm, combining distributed perception, machine learning, classical planning, and formal safety contracts for robust, adaptive safety in complex, real-world environments.