
RESTifAI: LLM-Powered API Testing

Updated 14 December 2025
  • RESTifAI is an LLM-based framework for automated, reusable, CI/CD-ready REST API testing that synthesizes both happy-path and negative test scenarios from an OpenAPI Specification.
  • It leverages a modular architecture—including an OAS parser, happy-path and negative test generators—to systematically map API call sequences and execute tests with key–value execution traces.
  • Empirical evaluations demonstrate that RESTifAI achieves competitive coverage and error detection while seamlessly integrating into industrial CI/CD pipelines.

RESTifAI is an LLM-based framework for automated, reusable, and CI/CD-ready REST API testing, focused on generating comprehensive “happy-path” and negative test scenarios from an OpenAPI Specification (OAS). It leverages decomposed LLM prompting to systematically map valid call sequences, synthesize both positive and negative oracles, and emit executable test suites suited to repeatable, industrial use. RESTifAI addresses limitations of existing LLM-powered API testing tools with respect to test reusability, oracle design, and pipeline integration, while achieving competitive or superior empirical results against reference tools such as AutoRestTest and LogiAgent (Kogler et al., 9 Dec 2025).

1. Architectural Organization and Pipeline

RESTifAI’s architecture comprises four principal modules, orchestrated to transform an OAS document into reusable, executable API test suites:

  1. OAS Parser: Ingests the OAS (v3.x), extracting endpoints, HTTP methods, parameter schemas (body, query, header, path, cookie), and defined response codes for the Service Under Test (SUT).
  2. Happy Path Generator (LLM Prompt Engine, Value Generator, Request Sender): Uses an LLM (e.g., Azure OpenAI GPT-4.1-mini) to determine dependency orderings and construct a valid invocation sequence for each endpoint. A usage guide for parameter computation is generated. Parameter instantiation occurs via an LLM-driven value synthesis module, with iterative error-driven refinement using feedback from any 4xx responses. Calls are executed, and successful responses (2xx) are parsed into an execution trace—a key–value store aggregating reusable outputs for downstream operations.
  3. Negative Value Generator (Oracle Module): Applies LLM-based prompting to generate negative test cases. For each parameter in the happy-path scenario, structural oracles (violating OAS constraints) and functional oracles (business rule violations not captured in OAS, e.g., negative payment amounts) are synthesized. Expected behavior for these cases is a 4xx HTTP response.
  4. Test Case Builder (CI/CD integration): Assembles all cases (happy and negative paths) into parameterized Postman collections, supporting both CLI (restifai generate ...) and lightweight Python GUI modalities. Initialization scripts (e.g., SQL resets, mocks) are supported for test isolation. Parameterization uses the execution trace to maximize test reusability across runs and environments.
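
The parser stage can be sketched as a simple walk over an already-loaded OpenAPI v3 document; the function and field names below are illustrative, not RESTifAI's actual API:

```python
# Minimal sketch of the OAS Parser step: walk an OpenAPI v3 dict and
# collect (method, path, parameters, response codes) per operation.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}

def parse_oas(spec):
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as path-level "parameters"
            operations.append({
                "method": method.upper(),
                "path": path,
                "parameters": [p["name"] for p in op.get("parameters", [])],
                "responses": sorted(op.get("responses", {})),
            })
    return operations
```

A real parser must additionally resolve `$ref` schemas and request bodies, which this sketch omits.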

The following pseudocode summarizes the core test generation procedures:

HappyPathGenerator(OAS, targetOp):
    trace ← ∅
    sequence, guide ← LLM₁.prompt(“Given OAS, list dependent ops and usage guide for targetOp”)
    for op in sequence do
        repeat up to K times:
            inputs ← LLM₂.prompt(op, OAS.schema(op), guide, trace)
            resp ← sendRequest(op, inputs)
            if resp.code ∈ 200…299 then
                newPairs ← parseJSON(resp.payload)
                trace ← trace ∪ newPairs
                break
            else
                guide ← guide ∪ “Error message: ” + resp.body
        end
    end
    return (sequence, trace)

NegativePathGenerator(sequence, trace, targetOp):
    tests ← []
    for each parameter p used by targetOp do
        invalids ← LLM₃.prompt(“Generate structural and functional invalid values for p given trace”)
        for iv in invalids do
            negTrace ← trace; negTrace[p] ← iv
            tests.append((sequence, negTrace, expectedCode ∈ 400…499))
        end
    end
    return tests
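
The negative-path routine above can be rendered in Python as follows; `propose_invalids` stands in for the LLM call (LLM₃ in the pseudocode), and all names are illustrative rather than RESTifAI's actual API:

```python
def generate_negative_tests(sequence, trace, target_params, propose_invalids):
    """For each parameter of the target operation, clone the happy-path
    execution trace, substitute exactly one invalid value, and expect a
    4xx response. `propose_invalids(p, trace)` stands in for the LLM
    prompt that returns structural and functional invalid values."""
    tests = []
    for p in target_params:
        for invalid in propose_invalids(p, trace):
            neg_trace = dict(trace)   # copy so the happy-path trace stays intact
            neg_trace[p] = invalid    # single-parameter mutation
            tests.append({"sequence": sequence, "trace": neg_trace,
                          "expected": range(400, 500)})
    return tests
```

Mutating one parameter at a time keeps each failing test attributable to a single oracle violation.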

2. LLM Prompt Engineering and Error Feedback

Prompt engineering in RESTifAI is intentionally modularized to minimize LLM “hallucinations” and maximize prompt interpretability and controllability:

  • Happy-path prompt: Requests dependency analysis (“Which prior operations … so that operation X’s parameters are satisfiable?”) and a natural-language usage guide for parameter values.
  • Parameter value synthesis prompt: Given the operation schema, guide, and accumulated trace, requests concrete instantiations for all parameters.
  • Error feedback loop: If parameter synthesis triggers a 4xx server response, the error message is appended to subsequent prompts, refining context and discouraging repeated mistakes.
  • Negative test prompt: Separate LLM queries generate (i) structurally invalid values based on the parameter’s type, format, requiredness, and constraints; and (ii) functionally invalid values based on domain semantics (e.g., business rules such as a date range’s “from” not exceeding its “to”).

This decomposed design increases reliability by reducing prompt scope and enabling targeted error correction.
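
The error feedback loop can be sketched as a bounded retry in which each 4xx body is appended to the usage guide before the next synthesis prompt; `synthesize_inputs` and `send_request` stand in for the LLM value-synthesis prompt and the HTTP client, and all names are illustrative:

```python
def happy_path_call(op, synthesize_inputs, send_request, guide, trace, k=3):
    """Retry parameter synthesis up to k times, feeding each 4xx error
    message back into the guide so the next prompt can avoid repeating
    the same mistake. On a 2xx, merge the response payload into the
    key-value execution trace."""
    for _ in range(k):
        inputs = synthesize_inputs(op, guide, trace)
        resp = send_request(op, inputs)
        if 200 <= resp["code"] < 300:
            trace.update(resp.get("payload", {}))
            return resp
        guide = guide + "\nError message: " + resp.get("body", "")
    return None  # give up after k failed refinements
```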

3. Reusability and Oracle Simplification

Three architectural features underpin RESTifAI’s reusability:

  • Key–value execution trace: Output values from each API call are stored and propagated, decoupling test logic from fixed concrete values and facilitating reuse across test runs.
  • Framework-agnostic output: Tests are emitted as Postman collections by default, but the architecture allows output targeting different frameworks by adapting only the Test Case Builder.
  • Isolated initialization and environmental parameterization: User-provided initialization scripts and environment variable management ensure test suites can be re-run deterministically, essential for CI.
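
The trace-based decoupling can be illustrated with a Postman-style placeholder resolver; the `{{key}}` syntax matches Postman's variable convention, while the function itself is a hypothetical sketch, not RESTifAI code:

```python
import re

def resolve_placeholders(template, trace):
    """Replace {{key}} placeholders with values from the key-value
    execution trace, Postman-style, so test logic never hard-codes
    concrete IDs produced by earlier calls."""
    def sub(match):
        key = match.group(1)
        if key not in trace:
            raise KeyError(f"unresolved placeholder: {key}")
        return str(trace[key])
    return re.sub(r"\{\{(\w+)\}\}", sub, template)

# resolve_placeholders("/guests/{{guest_id}}/reviews", {"guest_id": 42})
# → "/guests/42/reviews"
```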

Oracle complexity is minimized by asserting only on HTTP status codes—2xx for happy-path and 4xx for negative-path tests. This “two-tier oracle” approach reduces brittleness and overhead compared to payload-level assertions, with potential extensions to deeper payload validation indicated as future work.
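
A minimal sketch of this two-tier oracle, with the "5xx or unexpected 2xx on a negative test" rule used by the #DetectedServerErrors metric (function name is illustrative):

```python
def classify(expected_happy, status):
    """Two-tier status-code oracle: happy-path tests expect 2xx and
    negative tests expect 4xx; a 5xx, or an unexpected 2xx on a
    negative test, is flagged as a potential server error."""
    if expected_happy:
        return "pass" if 200 <= status < 300 else "fail"
    if 400 <= status < 500:
        return "pass"
    if status >= 500 or 200 <= status < 300:
        return "server_error"
    return "fail"
```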

Key coverage and effectiveness metrics used include:

  • Operation Coverage (OC): OC = |{ops for which a 2xx happy-path was generated}| / |{all ops in OAS}|
  • Branch Coverage (BC), Line Coverage (LC), Method Coverage (MC): Measured via JaCoCo for instrumented local services.
  • #DetectedServerErrors: Number of 5xx or unexpected 2xx returned from negative tests.
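
The OC definition reduces to a one-line set computation (a sketch; argument names are illustrative):

```python
def operation_coverage(covered_ops, all_ops):
    """Operation Coverage: fraction of OAS operations for which a
    2xx happy-path test was generated."""
    return len(set(covered_ops) & set(all_ops)) / len(set(all_ops))
```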

4. Empirical Results and Comparative Analysis

RESTifAI was evaluated against AutoRestTest and LogiAgent on five representative APIs (three local, two remote), all using GPT-4.1-mini as the LLM backend. Core quantitative results are summarized below.

Operation Coverage and Server Errors

| Service | AutoRestTest (OC, SE) | LogiAgent (OC, SE) | RESTifAI (OC, SE) |
|---|---|---|---|
| FDIC | 6, 41 | 6, 0 | 6, 0 |
| Genome Nexus | 23, 0 | 19, 0 | 23, 0 |
| Languagetool | 2, 0 | 1, 0 | 2, 1 |
| OhSome | 33, 0 | 4, 3 | 128, 50 |
| Restcountries | 22, 83 | 13, 0 | 20, 0 |

Code Coverage (Local Services, JaCoCo Metrics)

| Metric | Service | AutoRestTest | LogiAgent | RESTifAI |
|---|---|---|---|---|
| BC | GenomeNexus | 0% | 0% | 0% |
| BC | Languagetool | 16% | 7% | 19% |
| BC | Restcountries | 88% | 40% | 74% |
| LC | GenomeNexus | 65% | 60% | 65% |
| LC | Languagetool | 27% | 16% | 28% |
| LC | Restcountries | 78% | 49% | 72% |
| MC | GenomeNexus | 45% | 39% | 45% |
| MC | Languagetool | 28% | 16% | 30% |
| MC | Restcountries | 85% | 68% | 83% |

Test-Suite Size and Prompt Cost

| Service | LogiAgent (#TC / Tokens per TC) | RESTifAI (#TC / Tokens per TC) |
|---|---|---|
| FDIC | 26 / 37,001 | 133 / 88,762 |
| Genome Nexus | 38 / 62,071 | 326 / 22,258 |
| Languagetool | 6 / 46,689 | 35 / 3,585 |
| OhSome | 234 / 48,036 | 2,213 / 37,947 |
| Restcountries | 27 / 62,143 | 191 / 9,296 |

In a production evaluation at CASABLANCA Hotelsoftware, RESTifAI was run on two microservices. Breakdown of failed tests:

| Service | Bugs | Enhancements | Invalid (structural) | Passed |
|---|---|---|---|---|
| Guest-Review | 2 | 12 | 9 | 51 |
| Inventory | 11 | 6 | 30 | 91 |

In Guest-Review, 60.87% of non-passing tests revealed true bugs or enhancement opportunities, rather than simple invalid input rejection, demonstrating the value of negative oracle generation over crash-centric approaches.

5. Integration Practices and Open-Source Availability

RESTifAI is designed for seamless CI/CD integration. Recommended practices include:

  • Committing generated Postman collections and initialization scripts alongside application code for traceable, reproducible pipelines.
  • Invoking the restifai generate CLI from within typical CI orchestrators (GitHub Actions, Jenkins, GitLab CI), followed by test execution via newman run. The CLI supports reproducibility features such as fixed seeds and human-provided hints (e.g., secrets or credentials) to insulate test suites from LLM nondeterminism.
  • Initialization scripts can be supplied for environment reinitialization, ensuring test isolation and data integrity across runs.
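
The two pipeline steps can be assembled programmatically, for example from a Python CI helper; the `restifai generate` and `newman run` commands come from the text above, while the flag names here are assumptions — consult the project README for the actual options:

```python
def build_ci_commands(oas_path, init_script, collection_out):
    """Assemble the two CI steps as argument lists (not executed here):
    test generation via the restifai CLI, then execution via Newman.
    Flag names (--init-script, --output) are illustrative assumptions."""
    generate = ["restifai", "generate", oas_path,
                "--init-script", init_script,
                "--output", collection_out]
    run = ["newman", "run", collection_out]
    return [generate, run]
```

In a pipeline these lists would be handed to `subprocess.run`, failing the build on a nonzero exit code.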

RESTifAI is open-source (GitHub: https://github.com/casablancahotelsoftware/RESTifAI), implemented in Python 3.9+ with dependencies on requests, openai, click, and optionally postman-collection. The workflow comprises cloning the repository, installing dependencies, exporting the OpenAI API key, running the generator with OAS and init scripts, and executing the tests via Postman Newman.

6. Significance, Limitations, and Outlook

RESTifAI demonstrates that decomposing LLM prompting and explicitly engineering error feedback yields state-of-the-art operation coverage, streamlined oracles, and high test reusability. The simplified assertion model (status-code only) is both lightweight and robust; however, extension to payload-level validations remains as future work.

By outperforming or matching prior tools in coverage and surfacing true business-rule issues (not just server crashes), RESTifAI evidences the utility of LLM-based test pipeline optimization in industrial API development contexts. Its modular, open-source architecture and explicit support for deterministic CI/CD integration support broad practical adoption and extensibility (Kogler et al., 9 Dec 2025).
