
RESTifAI: LLM-Powered API Testing

Updated 14 December 2025
  • RESTifAI is an LLM-based framework for automated, reusable, CI/CD-ready REST API testing that synthesizes both happy-path and negative test scenarios from an OpenAPI Specification.
  • It leverages a modular architecture—including an OAS parser, happy-path and negative test generators—to systematically map API call sequences and execute tests with key–value execution traces.
  • Empirical evaluations demonstrate that RESTifAI achieves competitive coverage and error detection while seamlessly integrating into industrial CI/CD pipelines.

RESTifAI is an LLM-based framework for automated, reusable, and CI/CD-ready REST API testing, focused on generating comprehensive “happy-path” and negative test scenarios from an OpenAPI Specification (OAS). It leverages decomposed LLM prompting to systematically map valid call sequences, synthesize both positive and negative oracles, and emit executable test suites suited to repeatable, industrial use. RESTifAI addresses limitations of existing LLM-powered API testing tools with respect to test reusability, oracle design, and pipeline integration, while achieving competitive or superior empirical results against reference tools such as AutoRestTest and LogiAgent (Kogler et al., 9 Dec 2025).

1. Architectural Organization and Pipeline

RESTifAI’s architecture comprises four principal modules, orchestrated to transform an OAS document into reusable, executable API test suites:

  1. OAS Parser: Ingests the OAS (v3.x), extracting endpoints, HTTP methods, parameter schemas (body, query, header, path, cookie), and defined response codes for the Service Under Test (SUT).
  2. Happy Path Generator (LLM Prompt Engine, Value Generator, Request Sender): Uses an LLM (e.g., Azure OpenAI GPT-4.1-mini) to determine dependency orderings and construct a valid invocation sequence for each endpoint. A usage guide for parameter computation is generated. Parameter instantiation occurs via an LLM-driven value synthesis module, with iterative error-driven refinement using feedback from any 4xx responses. Calls are executed, and successful responses (2xx) are parsed into an execution trace—a key–value store aggregating reusable outputs for downstream operations.
  3. Negative Value Generator (Oracle Module): Applies LLM-based prompting to generate negative test cases. For each parameter in the happy-path scenario, structural oracles (violating OAS constraints) and functional oracles (business rule violations not captured in OAS, e.g., negative payment amounts) are synthesized. Expected behavior for these cases is a 4xx HTTP response.
  4. Test Case Builder (CI/CD integration): Assembles all cases (happy and negative paths) into parameterized Postman collections, supporting both CLI (restifai generate ...) and lightweight Python GUI modalities. Initialization scripts (e.g., SQL resets, mocks) are supported for test isolation. Parameterization uses the execution trace to maximize test reusability across runs and environments.
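
The parser stage can be sketched as a simple walk over an already-loaded OpenAPI v3 document; the function and field names below are illustrative, not RESTifAI's actual API:

```python
# Minimal sketch of the OAS Parser step: walk an OpenAPI v3 dict and
# collect (method, path, parameters, response codes) per operation.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}

def parse_oas(spec):
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as path-level "parameters"
            operations.append({
                "method": method.upper(),
                "path": path,
                "parameters": [p["name"] for p in op.get("parameters", [])],
                "responses": sorted(op.get("responses", {})),
            })
    return operations
```

A real parser must additionally resolve `$ref` schemas and request bodies, which this sketch omits.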

The following pseudocode summarizes the core test generation procedures:

HappyPathGenerator(OAS, targetOp):
    trace ← ∅
    sequence, guide ← LLM₁.prompt(“Given OAS, list dependent ops and usage guide for targetOp”)
    for op in sequence do
        repeat up to K times:
            inputs ← LLM₂.prompt(op, OAS.schema(op), guide, trace)
            resp ← sendRequest(op, inputs)
            if resp.code ∈ 200…299 then
                newPairs ← parseJSON(resp.payload)
                trace ← trace ∪ newPairs
                break
            else
                guide ← guide ∪ “Error message: ” + resp.body
        end
    end
    return (sequence, trace)

NegativePathGenerator(sequence, trace, targetOp):
    tests ← []
    for each parameter p used by targetOp do
        invalids ← LLM₃.prompt(“Generate structural and functional invalid values for p given trace”)
        for iv in invalids do
            negTrace ← trace; negTrace[p] ← iv
            tests.append((sequence, negTrace, expectedCode ∈ 400…499))
        end
    end
    return tests
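
The negative-path routine above can be rendered in Python as follows; `propose_invalids` stands in for the LLM call (LLM₃ in the pseudocode), and all names are illustrative rather than RESTifAI's actual API:

```python
def generate_negative_tests(sequence, trace, target_params, propose_invalids):
    """For each parameter of the target operation, clone the happy-path
    execution trace, substitute exactly one invalid value, and expect a
    4xx response. `propose_invalids(p, trace)` stands in for the LLM
    prompt that returns structural and functional invalid values."""
    tests = []
    for p in target_params:
        for invalid in propose_invalids(p, trace):
            neg_trace = dict(trace)   # copy so the happy-path trace stays intact
            neg_trace[p] = invalid    # single-parameter mutation
            tests.append({"sequence": sequence, "trace": neg_trace,
                          "expected": range(400, 500)})
    return tests
```

Mutating one parameter at a time keeps each failing test attributable to a single oracle violation.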

2. LLM Prompt Engineering and Error Feedback

Prompt engineering in RESTifAI is intentionally modularized to minimize LLM “hallucinations” and maximize prompt interpretability and controllability:

  • Happy-path prompt: Requests dependency analysis (“Which prior operations … so that operation X’s parameters are satisfiable?”) and a natural-language usage guide for parameter values.
  • Parameter value synthesis prompt: Given the operation schema, guide, and accumulated trace, requests concrete instantiations for all parameters.
  • Error feedback loop: If parameter synthesis triggers a 4xx server response, the error message is appended to subsequent prompts, refining context and discouraging repeated mistakes.
  • Negative test prompt: Separate LLM queries generate (i) structurally invalid values based on the parameter’s type, format, requiredness, and constraints; and (ii) functionally invalid values based on domain semantics (e.g., business rules such as a date range’s “from” not exceeding its “to”).

This decomposed design increases reliability by reducing prompt scope and enabling targeted error correction.
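
The error feedback loop can be sketched as a bounded retry in which each 4xx body is appended to the usage guide before the next synthesis prompt; `synthesize_inputs` and `send_request` stand in for the LLM value-synthesis prompt and the HTTP client, and all names are illustrative:

```python
def happy_path_call(op, synthesize_inputs, send_request, guide, trace, k=3):
    """Retry parameter synthesis up to k times, feeding each 4xx error
    message back into the guide so the next prompt can avoid repeating
    the same mistake. On a 2xx, merge the response payload into the
    key-value execution trace."""
    for _ in range(k):
        inputs = synthesize_inputs(op, guide, trace)
        resp = send_request(op, inputs)
        if 200 <= resp["code"] < 300:
            trace.update(resp.get("payload", {}))
            return resp
        guide = guide + "\nError message: " + resp.get("body", "")
    return None  # give up after k failed refinements
```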

3. Reusability and Oracle Simplification

Three architectural features underpin RESTifAI’s reusability:

  • Key–value execution trace: Output values from each API call are stored and propagated, decoupling test logic from fixed concrete values and facilitating reuse across test runs.
  • Framework-agnostic output: Tests are emitted as Postman collections by default, but the architecture allows output targeting different frameworks by adapting only the Test Case Builder.
  • Isolated initialization and environmental parameterization: User-provided initialization scripts and environment variable management ensure test suites can be re-run deterministically, essential for CI.
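
The trace-based decoupling can be illustrated with a Postman-style placeholder resolver; the `{{key}}` syntax matches Postman's variable convention, while the function itself is a hypothetical sketch, not RESTifAI code:

```python
import re

def resolve_placeholders(template, trace):
    """Replace {{key}} placeholders with values from the key-value
    execution trace, Postman-style, so test logic never hard-codes
    concrete IDs produced by earlier calls."""
    def sub(match):
        key = match.group(1)
        if key not in trace:
            raise KeyError(f"unresolved placeholder: {key}")
        return str(trace[key])
    return re.sub(r"\{\{(\w+)\}\}", sub, template)

# resolve_placeholders("/guests/{{guest_id}}/reviews", {"guest_id": 42})
# → "/guests/42/reviews"
```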

Oracle complexity is minimized by asserting only on HTTP status codes—2xx for happy-path and 4xx for negative-path tests. This “two-tier oracle” approach reduces brittleness and overhead compared to payload-level assertions, with potential extensions to deeper payload validation indicated as future work.
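
A minimal sketch of this two-tier oracle, with the "5xx or unexpected 2xx on a negative test" rule used by the #DetectedServerErrors metric (function name is illustrative):

```python
def classify(expected_happy, status):
    """Two-tier status-code oracle: happy-path tests expect 2xx and
    negative tests expect 4xx; a 5xx, or an unexpected 2xx on a
    negative test, is flagged as a potential server error."""
    if expected_happy:
        return "pass" if 200 <= status < 300 else "fail"
    if 400 <= status < 500:
        return "pass"
    if status >= 500 or 200 <= status < 300:
        return "server_error"
    return "fail"
```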

Key coverage and effectiveness metrics used include:

  • Operation Coverage (OC): OC = |{ops for which a 2xx happy-path was generated}| / |{all ops in OAS}|
  • Branch Coverage (BC), Line Coverage (LC), Method Coverage (MC): Measured via JaCoCo for instrumented local services.
  • #DetectedServerErrors: Number of 5xx or unexpected 2xx returned from negative tests.
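
The OC definition reduces to a one-line set computation (a sketch; argument names are illustrative):

```python
def operation_coverage(covered_ops, all_ops):
    """Operation Coverage: fraction of OAS operations for which a
    2xx happy-path test was generated."""
    return len(set(covered_ops) & set(all_ops)) / len(set(all_ops))
```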

4. Empirical Results and Comparative Analysis

RESTifAI was evaluated against AutoRestTest and LogiAgent on five representative APIs (three local, two remote), all using GPT-4.1-mini as the LLM backend. Core quantitative results are summarized below.

Operation Coverage and Server Errors

| Service | AutoRestTest (OC, SE) | LogiAgent (OC, SE) | RESTifAI (OC, SE) |
|---|---|---|---|
| FDIC | 6, 41 | 6, 0 | 6, 0 |
| Genome Nexus | 23, 0 | 19, 0 | 23, 0 |
| Languagetool | 2, 0 | 1, 0 | 2, 1 |
| OhSome | 33, 0 | 4, 3 | 128, 50 |
| Restcountries | 22, 83 | 13, 0 | 20, 0 |

Code Coverage (Local Services, JaCoCo Metrics)

| Metric | Service | AutoRestTest | LogiAgent | RESTifAI |
|---|---|---|---|---|
| BC | GenomeNexus | 0% | 0% | 0% |
| BC | Languagetool | 16% | 7% | 19% |
| BC | Restcountries | 88% | 40% | 74% |
| LC | GenomeNexus | 65% | 60% | 65% |
| LC | Languagetool | 27% | 16% | 28% |
| LC | Restcountries | 78% | 49% | 72% |
| MC | GenomeNexus | 45% | 39% | 45% |
| MC | Languagetool | 28% | 16% | 30% |
| MC | Restcountries | 85% | 68% | 83% |

Test-Suite Size and Prompt Cost

| Service | LogiAgent (#TC / Tokens per TC) | RESTifAI (#TC / Tokens per TC) |
|---|---|---|
| FDIC | 26 / 37,001 | 133 / 88,762 |
| Genome Nexus | 38 / 62,071 | 326 / 22,258 |
| Languagetool | 6 / 46,689 | 35 / 3,585 |
| OhSome | 234 / 48,036 | 2,213 / 37,947 |
| Restcountries | 27 / 62,143 | 191 / 9,296 |

In a production evaluation at CASABLANCA Hotelsoftware, RESTifAI was run on two microservices. Breakdown of failed tests:

| Service | Bugs | Enhancements | Invalid (structural) | Passed |
|---|---|---|---|---|
| Guest-Review | 2 | 12 | 9 | 51 |
| Inventory | 11 | 6 | 30 | 91 |

In Guest-Review, 60.87% of non-passing tests revealed true bugs or enhancement opportunities, rather than simple invalid input rejection, demonstrating the value of negative oracle generation over crash-centric approaches.

5. Integration Practices and Open-Source Availability

RESTifAI is designed for seamless CI/CD integration. Recommended practices include:

  • Committing generated Postman collections and initialization scripts alongside application code for traceable, reproducible pipelines.
  • Invoking the restifai generate CLI from within typical CI orchestrators (GitHub Actions, Jenkins, GitLab CI), followed by test execution via newman run. The CLI supports reproducibility features such as fixed seeds and human-provided hints (e.g., secrets or credentials) to insulate test suites from LLM nondeterminism.
  • Initialization scripts can be supplied for environment reinitialization, ensuring test isolation and data integrity across runs.
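
The two pipeline steps can be assembled programmatically, for example from a Python CI helper; the `restifai generate` and `newman run` commands come from the text above, while the flag names here are assumptions — consult the project README for the actual options:

```python
def build_ci_commands(oas_path, init_script, collection_out):
    """Assemble the two CI steps as argument lists (not executed here):
    test generation via the restifai CLI, then execution via Newman.
    Flag names (--init-script, --output) are illustrative assumptions."""
    generate = ["restifai", "generate", oas_path,
                "--init-script", init_script,
                "--output", collection_out]
    run = ["newman", "run", collection_out]
    return [generate, run]
```

In a pipeline these lists would be handed to `subprocess.run`, failing the build on a nonzero exit code.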

RESTifAI is open-source (GitHub: https://github.com/casablancahotelsoftware/RESTifAI), implemented in Python 3.9+ with dependencies on requests, openai, click, and optionally postman-collection. The workflow comprises cloning the repository, installing dependencies, exporting the OpenAI API key, running the generator with OAS and init scripts, and executing the tests via Postman Newman.

6. Significance, Limitations, and Outlook

RESTifAI demonstrates that decomposing LLM prompting and explicitly engineering error feedback yields state-of-the-art operation coverage, streamlined oracles, and high test reusability. The simplified assertion model (status-code only) is both lightweight and robust; however, extension to payload-level validations remains as future work.

By outperforming or matching prior tools in coverage and surfacing true business-rule issues (not just server crashes), RESTifAI evidences the utility of LLM-based test pipeline optimization in industrial API development contexts. Its modular, open-source architecture and explicit support for deterministic CI/CD integration support broad practical adoption and extensibility (Kogler et al., 9 Dec 2025).
