- The paper introduces CAIRN, which integrates Bayesian context modeling and LLM reasoning to enhance autonomous decision-making in drone swarm SAR operations.
- It uses a multi-stage LLM pipeline to evaluate clues and update search strategies dynamically through discrete probability updates.
- Cognitive guardrails, including cost-benefit analysis and ethical oversight, ensure safe and reliable mission execution under uncertainty.
This paper presents the CAIRN (Context-Aware Inference for Reasoning and plaNning) framework, designed to enhance autonomous decision-making for small Uncrewed Aerial System (sUAS) swarms operating in open-world environments, specifically focusing on Search and Rescue (SAR) missions.
The core problem addressed is the limitation of traditional computer vision (CV) systems in identifying and interpreting novel objects in dynamic, unstructured settings. While large language models (LLMs) can supply valuable reasoning for open-world scene interpretation and mission adaptation, they are prone to errors such as hallucination, which can lead to unsafe or incorrect decisions. CAIRN therefore integrates LLMs with cognitive guardrails so that autonomous decisions remain safe, justifiable, and aligned with mission goals under uncertainty.
The CAIRN framework consists of several key components:
- Bayesian Model for Contextual Reasoning: A discrete Bayesian model represents the mission context, including environmental conditions, lost person profiles, and candidate search strategies (e.g., Trail Search, Waterways Search, Region Search). It maintains a probability distribution over the candidate strategies, updated dynamically as new evidence emerges. Initial probabilities are derived from the known mission context (terrain, weather, lost person profile). Runtime updates, triggered by detected clues or changed conditions, use Equation 2 to adjust the belief in relevant strategies, either positively (increasing belief for supporting evidence) or negatively (decreasing belief, e.g., when a strategy fails to yield results after sufficient coverage). The strength of a positive evidence update is a weighted combination of clue relevance, CV classification confidence, and tactical interpretation confidence (Equation 3); a minimal sketch of this update loop follows the component list.
- LLM-Based Reasoning Pipeline: When an sUAS detects a potential clue with its onboard CV, an offboard LLM pipeline is triggered. The pipeline, orchestrated with LangChain and employing Retrieval-Augmented Generation (RAG) over a curated knowledge base of SAR best practices, processes the clue through seven stages (a runnable skeleton follows the component list):
- Stages 1-2 (CV and Visual LLM): Detect and classify the object, producing initial confidence scores.
- Stage 3 (Clue Relevance): Assess how relevant the clue is to finding the lost person based on the person's description and the clue's characteristics.
- Stage 4 (Tactical Inference): Infer the immediate implications of the clue for the search area and tasks.
- Stage 5 (Strategic Plan): Identify the most relevant overarching search strategy and a prioritized list of tasks.
- Stage 6 (Ethical, Safety, Regulatory Concerns): Use advocate personas (independent LLM agents grounded in specific standards such as MIL-STD-882E, FAA Part 107, and GDPR) to evaluate the proposed plan for potential issues.
- Stage 7 (Human Engagement): Determine if human oversight is required for the proposed actions based on various factors, including guardrail checks.
- Cognitive Guardrails: These safeguards are integrated into both decision-making and runtime execution (illustrative checks are sketched after this list):
- Decision-Time Guardrails: Applied before actions are executed.
- Belief Entropy: Controls autonomous decision-making based on the certainty of the Bayesian model's strategy distribution. Low entropy (clear dominant strategy) allows more autonomy within that strategy, while high entropy (uncertainty) or switching to a less dominant strategy may require human notification or approval.
- Cost-Benefit Analysis: Evaluates the expected value of a proposed action against its mission cost (e.g., time, resources). Autonomous execution is permitted only when the expected benefit outweighs the cost by a set margin; tasks that fail this check are deferred for human or Global Planner review.
- Ethical and Regulatory Oversight: Advocate personas (Stage 6) flag potential conflicts with safety, ethical, and regulatory standards, escalating to humans if conflicts cannot be resolved autonomously.
- Runtime Guardrail:
- Safety Envelope: A hard physical constraint layer that enforces non-negotiable boundaries during execution (e.g., minimum altitude, geofencing, battery limits), independent of the mission-level reasoning.
- Dual-Phase Human Oversight: Humans are kept "on the loop" during both decision-making (consulted on uncertain or high-risk actions) and runtime (monitoring situational awareness, task status, and system intent, with the ability to intervene).
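The paper's exact formulas are not reproduced in this summary, so the following Python sketch only mirrors their described roles: `evidence_strength` stands in for Equation 3 (a weighted combination of clue relevance, CV confidence, and tactical confidence, with assumed weights), and `update_beliefs` stands in for Equation 2 (scaling a strategy's belief up or down and renormalizing).

```python
def evidence_strength(relevance, cv_conf, tactical_conf, weights=(0.4, 0.3, 0.3)):
    """Stand-in for Equation 3: convex combination of clue relevance,
    CV classification confidence, and tactical interpretation confidence.
    The weights here are illustrative assumptions, not the paper's values."""
    w_r, w_c, w_t = weights
    return w_r * relevance + w_c * cv_conf + w_t * tactical_conf

def update_beliefs(beliefs, strategy, strength, supporting=True):
    """Stand-in for Equation 2: scale belief in the affected strategy up
    (supporting evidence) or down (e.g., no results after sufficient
    coverage), then renormalize the distribution."""
    updated = dict(beliefs)
    factor = 1.0 + strength if supporting else 1.0 - strength
    updated[strategy] = max(updated[strategy] * factor, 1e-9)
    total = sum(updated.values())
    return {s: p / total for s, p in updated.items()}

# A clue (e.g., clothing found near a creek) strengthens Waterways Search.
beliefs = {"Trail Search": 0.5, "Waterways Search": 0.3, "Region Search": 0.2}
s = evidence_strength(relevance=0.9, cv_conf=0.8, tactical_conf=0.7)
beliefs = update_beliefs(beliefs, "Waterways Search", s)
```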
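The seven-stage pipeline can be pictured as a chain of stage functions threading a shared context. Every stage body below is a trivial stub invented for illustration; in CAIRN the stages are LLM calls orchestrated with LangChain and grounded via RAG over the SAR knowledge base.

```python
# Runnable skeleton of the seven-stage clue-processing pipeline (stubs only).

def detect_and_classify(ctx):   # Stages 1-2: CV detection + visual LLM
    ctx["clue"], ctx["cv_conf"] = "red jacket", 0.82
    return ctx

def assess_relevance(ctx):      # Stage 3: clue vs. lost-person profile
    ctx["relevance"] = 0.9      # stubbed LLM judgment
    return ctx

def infer_tactics(ctx):         # Stage 4: immediate search implications
    ctx["tasks"] = ["expand search 500 m downstream"]
    return ctx

def plan_strategy(ctx):         # Stage 5: strategy + prioritized tasks
    ctx["strategy"] = "Waterways Search"
    return ctx

def run_advocates(ctx):         # Stage 6: safety/ethics/regulatory personas
    ctx["concerns"] = []        # e.g., FAA Part 107 altitude conflicts
    return ctx

def check_oversight(ctx):       # Stage 7: does a human need to approve?
    ctx["needs_human"] = bool(ctx["concerns"]) or ctx["relevance"] < 0.5
    return ctx

PIPELINE = [detect_and_classify, assess_relevance, infer_tactics,
            plan_strategy, run_advocates, check_oversight]

def process_clue(image):
    ctx = {"image": image}
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx

result = process_clue("frame_0042.png")
```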
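The guardrail checks can likewise be sketched as simple gates. The thresholds, the entropy cutoff, and the safety-envelope limits below are assumptions for illustration; the paper does not publish numeric values.

```python
import math

def belief_entropy(beliefs):
    """Shannon entropy (bits) of the strategy distribution; low entropy
    means one strategy clearly dominates."""
    return -sum(p * math.log2(p) for p in beliefs.values() if p > 0)

def may_act_autonomously(beliefs, benefit, cost,
                         entropy_max=1.0, net_benefit_min=0.2):
    # Belief-entropy guardrail: high strategy uncertainty defers to a human.
    if belief_entropy(beliefs) > entropy_max:
        return False, "high uncertainty: notify or ask the operator"
    # Cost-benefit guardrail: expected value must clear the margin.
    if benefit - cost < net_benefit_min:
        return False, "insufficient net benefit: defer for review"
    return True, "within guardrails: execute within dominant strategy"

def within_safety_envelope(altitude_m, battery_pct, in_geofence,
                           min_altitude_m=30.0, min_battery_pct=25.0):
    """Runtime hard constraints, enforced independently of mission reasoning."""
    return altitude_m >= min_altitude_m and battery_pct >= min_battery_pct and in_geofence

beliefs = {"Trail Search": 0.1, "Waterways Search": 0.8, "Region Search": 0.1}
ok, reason = may_act_autonomously(beliefs, benefit=0.7, cost=0.3)
```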
The framework was initially developed and validated in a custom Python-based simulation environment using GPT-4o for the LLM components and pgmpy for the Bayesian model. The simulation allowed rapid iteration and demonstrated expected behaviors, such as strategy shifts based on detected clues in various SAR scenarios. The authors are integrating CAIRN into their existing DroneResponse platform, which uses the Edge Matrix Cube (mCube) for onboard processing and a Ground Control Station (GCS) for centralized services. The Local Planner will run on the mCube, while the Global Planner, Bayesian model, and LLM pipeline will be hosted on the GCS.
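For a sense of what the simulation's discrete context model might look like in pgmpy (the library the authors used), here is a toy network; the variables, states, and probability tables are invented for illustration and are not the paper's model.

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Toy context model: terrain and a detected clue jointly influence strategy.
model = BayesianNetwork([("Terrain", "Strategy"), ("Clue", "Strategy")])

cpd_terrain = TabularCPD("Terrain", 2, [[0.6], [0.4]],
                         state_names={"Terrain": ["forest", "riverine"]})
cpd_clue = TabularCPD("Clue", 2, [[0.7], [0.3]],
                      state_names={"Clue": ["none", "clothing_near_water"]})
cpd_strategy = TabularCPD(
    "Strategy", 3,
    # columns: (forest,none) (forest,clothing) (riverine,none) (riverine,clothing)
    [[0.5, 0.2, 0.3, 0.1],   # Trail Search
     [0.2, 0.6, 0.5, 0.8],   # Waterways Search
     [0.3, 0.2, 0.2, 0.1]],  # Region Search
    evidence=["Terrain", "Clue"], evidence_card=[2, 2],
    state_names={"Strategy": ["Trail", "Waterways", "Region"],
                 "Terrain": ["forest", "riverine"],
                 "Clue": ["none", "clothing_near_water"]})

model.add_cpds(cpd_terrain, cpd_clue, cpd_strategy)
infer = VariableElimination(model)
posterior = infer.query(["Strategy"], evidence={"Clue": "clothing_near_water"})
print(posterior)
```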
Future work includes refining the Bayesian model probabilities and LLM prompts based on expert feedback and field experiments, broader evaluation against baselines, and exploring applications in other domains requiring real-time reasoning and adaptation, such as inspections and environmental monitoring.