
Large Language Model-Driven Concolic Execution for Highly Structured Test Input Generation

Published 24 Apr 2025 in cs.SE | arXiv:2504.17542v1

Abstract: How can we perform concolic execution to generate highly structured test inputs for systematically testing parsing programs? Existing concolic execution engines are significantly restricted by (1) input structure-agnostic path constraint selection, which wastes testing effort or misses coverage; (2) limited constraint-solving capability, which yields many syntactically invalid test inputs; and (3) reliance on manual acquisition of highly structured seed inputs, which prevents continuous testing. This paper proposes Cottontail, a new LLM-driven concolic execution engine, to mitigate these limitations. A more complete program path representation, named Expressive Structural Coverage Tree (ESCT), is first constructed to select structure-aware path constraints. Next, an LLM-driven constraint solver based on a Solve-Complete paradigm solves the path constraints so that the resulting test inputs not only satisfy the constraints but also conform to the input syntax. Finally, history-guided seed acquisition obtains new highly structured test inputs either before testing starts or after testing saturates. We implemented Cottontail on top of SymCC and evaluated it on eight extensively tested open-source libraries across four formats (XML, SQL, JavaScript, and JSON). The results are promising: Cottontail outperforms the state-of-the-art approaches SymCC and Marco by 14.15% and 14.31%, respectively, in line coverage. In addition, Cottontail found six previously unknown vulnerabilities (six new CVEs have been assigned). We have reported these issues to developers, and four of them have been fixed so far.

Summary

Overview of Large Language Model-Driven Concolic Execution for Highly Structured Test Input Generation

This paper presents Cottontail, a concolic execution engine designed to generate highly structured test inputs for parsing programs. Traditional concolic executors struggle to consistently produce syntactically valid inputs for such programs. Cottontail leverages Large Language Models (LLMs) to overcome this limitation, providing a structure-aware approach to test input generation.

Key Components and Contributions

The study discusses several critical innovations in Cottontail:

  1. Expressive Structural Coverage Tree (ESCT): This new representation is a cornerstone of Cottontail, addressing the first limitation by providing a structure-aware path constraint selection mechanism. It significantly reduces redundant path exploration and maintains comprehensive coverage information, ensuring that only meaningful constraints are solved.

  2. LLM-driven Constraint Solver: A novel "Solve-Complete" paradigm employs LLMs to solve path constraints for both satisfiability and syntactic validity. This mitigates a key weakness of traditional constraint solvers, which often produce inputs that satisfy the constraints but violate the input syntax.

  3. History-Guided Seed Acquisition: By integrating historical coverage data, Cottontail dynamically acquires seed inputs that either enrich the initial seed pool before testing starts or are newly generated when testing reaches saturation. This enables continuous exploration of code paths.
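The interplay between the first two components can be illustrated with a toy sketch: a structure-aware selection walk over an ESCT-like tree, followed by the Solve-Complete two-step. All class and function names here are hypothetical, and the `llm` function is a hard-coded stub standing in for a real LLM call; the actual implementation is built on SymCC and a real LLM backend.

```python
import json

class ESCTNode:
    """One node of an ESCT-like tree (illustrative): a path constraint
    annotated with the input-structure context it belongs to."""
    def __init__(self, constraint, context, explored=False):
        self.constraint = constraint
        self.context = context      # e.g. which grammar element the bytes belong to
        self.explored = explored
        self.children = []

def select_constraint(root):
    # Structure-aware selection: depth-first pick of the first unexplored
    # node, so constraints inside an already-covered structural context
    # are not re-solved redundantly.
    if not root.explored:
        return root
    for child in root.children:
        picked = select_constraint(child)
        if picked is not None:
            return picked
    return None

def llm(prompt):
    # Stub standing in for a real LLM call.
    if prompt.startswith("SOLVE"):
        return '"AB'            # satisfies the toy constraint, but not valid JSON
    return '{"k": "AB"}'        # completed into a syntactically valid input

def solve_complete(node):
    # Solve: obtain bytes that satisfy the selected path constraint.
    partial = llm(f"SOLVE {node.constraint}")
    # Complete: extend/repair the solved bytes into a syntactically
    # valid input for the target format.
    return llm(f"COMPLETE {partial} as {node.context}")

root = ESCTNode('input contains "AB"', context="JSON string value", explored=True)
root.children.append(ESCTNode('input[0:2] == "AB"', context="JSON string value"))

node = select_constraint(root)
test_input = solve_complete(node)
print(json.loads(test_input))   # parses: the completed input is valid JSON
```

The point of the two-step split is visible in the stub: the Solve answer alone (`'"AB'`) satisfies the byte-level constraint but would be rejected by a JSON parser, while the Complete step wraps it into an input that both satisfies the constraint and survives parsing.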
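The third component, history-guided seed acquisition, can be sketched in a similarly minimal way: detect coverage saturation from per-round history, then request fresh structured seeds steered away from already-covered features. Again, every name is illustrative and `acquire_seeds` is a hard-coded stand-in for an LLM request.

```python
def coverage_saturated(history, window=3):
    # Simple saturation heuristic (assumed, not the paper's exact rule):
    # line coverage unchanged over the last `window` testing rounds.
    return len(history) >= window and len(set(history[-window:])) == 1

def acquire_seeds(fmt, covered_features):
    # Stub for an LLM request along the lines of: "generate a valid <fmt>
    # input exercising features NOT in <covered_features>".
    catalog = {
        "XML": "<a attr='1'><![CDATA[x]]></a>",
        "JSON": '{"new": [1, 2]}',
    }
    return [catalog[fmt]]

history = [412, 418, 418, 418]   # line coverage per testing round
if coverage_saturated(history):
    seeds = acquire_seeds("JSON", covered_features={"object", "string"})
    print(seeds)
```

Triggering acquisition only at saturation keeps the LLM in a supporting role: the concolic engine drives exploration, and the LLM is consulted when path-constraint solving alone has stopped yielding new coverage.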

Empirical Validation

The implementation of Cottontail is evaluated against state-of-the-art concolic execution engines, SymCC and Marco, across eight open-source libraries handling different formats including XML, SQL, JavaScript, and JSON. The results demonstrate that Cottontail achieves significantly better code coverage, with improvements exceeding 14% in line coverage and 11% in branch coverage compared to these established engines.

Furthermore, Cottontail's ability to detect previously unknown vulnerabilities highlights its practical impact on software security. Six new vulnerabilities were discovered and reported, each assigned a CVE, and four have been fixed so far, underscoring the effectiveness of its structured testing approach.

Implications for AI Development

The integration of LLMs in concolic execution represents a significant shift towards the use of AI in systematic software testing. By harnessing LLMs' semantic capabilities, Cottontail sets the stage for further exploration into combining AI with traditional program analysis techniques. The potential applications of this approach extend beyond security testing, suggesting pathways for automated software validation in complex systems.

Future research could focus on optimizing the interaction between LLMs and concolic execution engines, exploring open-source alternatives for broader accessibility, and extending this approach to binary execution systems. The study presents a robust framework conducive to further innovations in software testing, particularly in environments demanding stringent validity checks and comprehensive code path explorations.

In conclusion, Cottontail exemplifies the power of combining AI with program analysis, marking a substantial step toward syntactically valid, security-focused test input generation.
