Synthesizing Precise Protocol Specs from Natural Language for Effective Test Generation (2511.17977v1)

Published 22 Nov 2025 in cs.SE, cs.LG, and cs.NI

Abstract: Safety- and security-critical systems must be thoroughly tested against their specifications. In current practice, specifications are written in natural language and test cases are derived from them manually, a process that is slow, error-prone, and difficult to scale. Formal specifications, on the other hand, are well suited to automated test generation but are tedious to write and maintain. In this work, we propose a two-stage pipeline that uses LLMs to bridge the gap: first, we extract protocol elements from natural-language specifications; second, leveraging a protocol implementation, we synthesize and refine a formal protocol specification from these elements, which we can then use to test further implementations at scale. We consider this two-stage approach superior to end-to-end LLM-based test generation because (1) it produces an inspectable specification that preserves traceability to the original text; (2) generating the actual test cases no longer requires an LLM; (3) the resulting formal specifications are human-readable and can be reviewed, version-controlled, and incrementally refined; and (4) over time, we can build a corpus of natural-language-to-formal-specification mappings that can be used to further train and refine LLMs for more automatic translation. Our prototype, AUTOSPEC, demonstrates the feasibility of the approach on five widely used internet protocols (SMTP, POP3, IMAP, FTP, and ManageSieve), working from their natural-language RFC specifications and using the recent I/O grammar formalism for protocol specification and fuzzing. In our evaluation, AUTOSPEC recovers on average 92.8% of client and 80.2% of server message types and achieves 81.5% message acceptance across diverse, real-world systems.
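To make the two reported metrics concrete, the following minimal Python sketch shows one plausible way to compute a message-type recovery rate and a message acceptance rate. The abstract does not state how AUTOSPEC computes these figures, so the definitions here (set overlap against a reference list, and the fraction of accepted messages) are assumptions. The function names `recovery_rate` and `acceptance_rate` are hypothetical, and the sample data is invented for illustration only; it does not come from the paper's evaluation.

```python
def recovery_rate(recovered, reference):
    """Fraction of reference message types that appear in the recovered set.

    Assumed definition: case-insensitive set overlap divided by the size
    of the reference set. Extra recovered types do not affect the score.
    """
    reference = {m.upper() for m in reference}
    recovered = {m.upper() for m in recovered}
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(recovered & reference) / len(reference)


def acceptance_rate(outcomes):
    """Fraction of generated messages that an implementation accepted.

    Assumed definition: `outcomes` holds one boolean per sent message,
    True if the system under test accepted it.
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


# Invented example data: a subset of SMTP client commands as the reference,
# and a hypothetical extraction result that misses two of them and adds
# one spurious type ("XFOO").
reference_client = {"HELO", "EHLO", "MAIL", "RCPT", "DATA", "RSET", "NOOP", "QUIT"}
recovered_client = {"EHLO", "MAIL", "RCPT", "DATA", "NOOP", "QUIT", "XFOO"}

client_recovery = recovery_rate(recovered_client, reference_client)
accepted = acceptance_rate([True, True, False, True])

print(f"client message types recovered: {client_recovery:.1%}")
print(f"messages accepted: {accepted:.1%}")
```

Under these assumed definitions, a spurious type such as `XFOO` leaves the recovery rate unchanged, so a precision-style measure would be needed as well to penalize over-extraction.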