Enhancing Accuracy and Maintainability in Nuclear Plant Data Retrieval: A Function-Calling LLM Approach Over NL-to-SQL (2506.08757v1)

Published 10 Jun 2025 in cs.CL and cs.LG

Abstract: Retrieving operational data from nuclear power plants requires exceptional accuracy and transparency due to the criticality of the decisions it supports. Traditionally, natural language to SQL (NL-to-SQL) approaches have been explored for querying such data. While NL-to-SQL promises ease of use, it poses significant risks: end-users cannot easily validate generated SQL queries, and legacy nuclear plant databases -- often complex and poorly structured -- complicate query generation due to decades of incremental modifications. These challenges increase the likelihood of inaccuracies and reduce trust in the approach. In this work, we propose an alternative paradigm: leveraging function-calling LLMs to address these challenges. Instead of directly generating SQL queries, we define a set of pre-approved, purpose-specific functions representing common use cases. Queries are processed by invoking these functions, which encapsulate validated SQL logic. This hybrid approach mitigates the risks associated with direct NL-to-SQL translations by ensuring that SQL queries are reviewed and optimized by experts before deployment. While this strategy introduces the upfront cost of developing and maintaining the function library, we demonstrate how NL-to-SQL tools can assist in the initial generation of function code, allowing experts to focus on validation rather than creation. Our study includes a performance comparison between direct NL-to-SQL generation and the proposed function-based approach, highlighting improvements in accuracy and maintainability. This work underscores the importance of balancing user accessibility with operational safety and provides a novel, actionable framework for robust data retrieval in critical systems.

Summary

  • The paper demonstrates a hybrid function-calling LLM method that outperforms traditional NL-to-SQL in nuclear plant data retrieval.
  • It uses specialized sub-agents and validated SQL query functions to manage complex legacy databases with enhanced robustness.
  • Evaluation results show significantly higher human-rated correctness and maintainability compared to non-function-calling approaches.

Function-Calling LLMs for Nuclear Plant Data Retrieval

This paper introduces a function-calling LLM approach to improve the accuracy and maintainability of operational data retrieval in nuclear power plants, addressing the risks that traditional NL-to-SQL methods incur on complex legacy databases. The authors propose a hybrid design in which expert-validated SQL queries are encapsulated in pre-approved, purpose-specific functions, combined with expert review and automated NL-to-SQL tools to bootstrap the function library.
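The core idea can be sketched as follows: each pre-approved function wraps a fixed, expert-reviewed SQL statement with parameterized inputs, and is advertised to the LLM through a tool schema. The function name, table layout, and SQL below are illustrative assumptions, not the paper's actual function library.

```python
import sqlite3

# Hypothetical pre-approved query: the SQL is fixed and expert-validated;
# only the bound parameters vary at call time.
GET_OPEN_WORK_ORDERS_SQL = """
SELECT wo_id, description, priority
FROM work_orders
WHERE status = 'OPEN' AND unit = ?
ORDER BY priority
"""

def get_open_work_orders(conn: sqlite3.Connection, unit: str) -> list[tuple]:
    """Execute the validated query; the LLM never writes SQL itself."""
    return conn.execute(GET_OPEN_WORK_ORDERS_SQL, (unit,)).fetchall()

# Tool schema advertised to the function-calling LLM (OpenAI-style format).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_open_work_orders",
        "description": "List open work orders for a plant unit.",
        "parameters": {
            "type": "object",
            "properties": {"unit": {"type": "string"}},
            "required": ["unit"],
        },
    },
}]
```

Because the SQL never changes at inference time, experts review it once before deployment, which is what shifts their effort from creation to validation.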

Challenges in Nuclear Plant Data Retrieval

The authors identify several technical challenges in building an LLM-based data retrieval system for nuclear plants: managing the agent's autonomous behavior, particularly deciding when to perform data retrieval and when a follow-up question can be answered without it; handling technical jargon and acronyms that are common in nuclear plant operations but rare in general natural language; and enforcing structured outputs against a strict JSON schema to avoid crashes and unpredictable behavior. The system uses Pydantic models and OpenAI's constrained decoding to enforce data types and validate parameters, and retries incorrect agent calls. The authors also acknowledge the need to address chained queries that require multiple tool calls.

Proposed Methodology

The proposed system architecture is a multi-agent, function-calling design. A central "main agent" routes user queries to specialized "sub-agents", each responsible for a specific domain such as work orders. These sub-agents execute predefined SQL queries through function calls. The system maintains a history of interactions and employs retry logic to handle errors. For comparison, the authors also implemented a non-function-calling baseline: a blended RAG system with a custom NL-to-SQL query agent. For model selection, they used OpenAI's GPT-4o, citing its strong performance on function-calling and general benchmarks.
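The routing layer above can be sketched as a registry of domain sub-agents behind a main agent; the class names, the direct `domain`/`function_name` arguments, and the history format are illustrative assumptions (in the real system, an LLM selects the route).

```python
from typing import Callable

class SubAgent:
    """Owns the pre-approved functions for one domain, e.g. work orders."""
    def __init__(self, name: str, functions: dict[str, Callable[..., str]]):
        self.name = name
        self.functions = functions

    def handle(self, function_name: str, **kwargs) -> str:
        return self.functions[function_name](**kwargs)

class MainAgent:
    """Dispatches queries to sub-agents and records the interaction history."""
    def __init__(self):
        self.sub_agents: dict[str, SubAgent] = {}
        self.history: list[tuple[str, str]] = []  # (call, answer) pairs

    def register(self, agent: SubAgent) -> None:
        self.sub_agents[agent.name] = agent

    def route(self, domain: str, function_name: str, **kwargs) -> str:
        # Here the caller names the route directly; in the paper's system
        # the LLM chooses it via function calling.
        answer = self.sub_agents[domain].handle(function_name, **kwargs)
        self.history.append((f"{domain}.{function_name}", answer))
        return answer
```

Keeping the history on the main agent is what lets follow-up questions be answered without a fresh retrieval when the needed data was already fetched.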

Evaluation Metrics and Results

The authors evaluated the function-calling methodology against the non-function-calling baseline on several metrics: answer relevance (query only), relevance (query plus context), faithfulness (context only), and answer correctness (against ground truth). On LLM-computed metrics the two methods performed similarly, with the function-calling approach slightly ahead. On human-evaluated correctness, however, the function-calling method significantly outperformed the baseline, which the authors attribute to its partially deterministic nature.
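The comparison boils down to averaging per-query scores per metric for each method and taking the difference; a toy sketch follows, where the metric names echo the paper but the score values and function names are made up.

```python
def mean_scores(per_query: list[dict[str, float]]) -> dict[str, float]:
    """Average each metric over all evaluated queries."""
    metrics = per_query[0].keys()
    return {m: sum(q[m] for q in per_query) / len(per_query) for m in metrics}

def compare(method_a: list[dict[str, float]],
            method_b: list[dict[str, float]]) -> dict[str, float]:
    """Positive values mean method A scored higher on that metric."""
    a, b = mean_scores(method_a), mean_scores(method_b)
    return {m: a[m] - b[m] for m in a}
```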

Conclusion and Future Directions

The authors conclude that the function-calling methodology offers clear improvements in accuracy and maintainability compared to traditional NL-to-SQL approaches. They propose several avenues for future work, including fine-tuning LLMs for domain jargon, optimizing function selection, exploring a pure function-calling approach, enhancing multi-agent reasoning for chained queries, implementing dynamic function filtering, refining logging and error handling, and investigating model scalability. These enhancements aim to further improve the robustness and efficiency of data retrieval in critical nuclear plant environments.
