AutoRestTest: Automated REST API Testing

Updated 14 December 2025

AutoRestTest is a framework for automated REST API testing that employs evolutionary algorithms and real-time model inference to generate complex, multi-step test sequences.
The approach uses advanced search heuristics such as reinforcement learning and automaton synthesis to uncover rare, stateful inter-service interactions.
Empirical results show improvements over the MOSA baseline on complex benchmarks, with gains of up to 42% in coverage and up to 76% in HTTP 500 error detection.

AutoRestTest is a class of automated frameworks and methodologies for generating system-level test cases for REST Application Programming Interfaces (APIs), with a focus on complex, distributed microservice architectures. These approaches use search heuristics, including real-time behavioral model inference, evolutionary algorithms, and reinforcement learning, to automatically generate and prioritize test sequences that exercise realistic, multi-step workflows across interdependent services. Recent research has moved AutoRestTest from code-centric, unit-level coverage maximization toward system-level behavioral diversity and stateful dependency exploration, and reports gains in code coverage and unique fault detection over earlier black-box and evolutionary approaches (Cao et al., 2024).

1. Problem Motivation in Microservices and REST API Testing

As microservice architectures proliferate, REST APIs serve as the primary communication channel between distributed services. This architectural style introduces two primary challenges for test automation:

Emergent Inter-Service Dependencies: System faults commonly arise from sequences of API calls that manifest only at the system level, involving dependent states across multiple services. These cannot be inferred by code analysis on isolated units or endpoints.
Limitations of Unit-Level Coverage Heuristics: Dominant automated test generators (e.g., EvoMaster) employ heuristics such as branch distance to guide search. While effective for exercising line and branch coverage within a single service, such metrics are agnostic to call sequencing and inter-service workflows, leaving protocol violations and multi-service bugs untested.

Manual construction of comprehensive system tests for each potential cross-service scenario is infeasible in realistic microservice ecosystems (Cao et al., 2024).

2. Model Inference Search Heuristic (MISH): Core Principle

AutoRestTest addresses the above by incorporating the Model Inference Search Heuristic (MISH), which replaces or supplements traditional unit-level fitness functions with a global, system-driven behavioral objective. The key mechanism is real-time automaton learning from log event streams:

Log Event Abstraction: During execution of each test case, all timestamped log lines across microservices are mapped, in-stream, to template symbols via Drain3 (an online log template miner).
Trace Construction: All log events strictly between test start and end are grouped and ordered to yield a trace $t = t_1 t_2 \dots t_k$ over a finite alphabet $\Sigma$.
Automaton Synthesis: The set of traces is incrementally generalized using a streaming DFA learner (FlexFringe), yielding $A = (Q, \Sigma, \delta, q_0, F)$, where all states are final ($F = Q$).
Path Fitness via State Visit Statistics: Deeper or less-visited regions of the automaton correspond to rarely exercised system behaviors.
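The first two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `LogEvent` record, the `Alphabet` class, and the regex-based `to_template` function are hypothetical names, and the regex masking is only a standard-library stand-in for an online template miner such as Drain3.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical parsed log line from one microservice."""
    timestamp: float  # seconds, on a clock shared by all services (assumption)
    service: str
    message: str


_NUMBER = re.compile(r"\b\d+\b")


def to_template(message: str) -> str:
    """Stand-in for a template miner: mask numeric tokens as <NUM>."""
    return _NUMBER.sub("<NUM>", message)


class Alphabet:
    """Assigns a stable integer symbol to each (service, template) pair."""

    def __init__(self):
        self._ids = {}

    def symbol(self, service: str, template: str) -> int:
        return self._ids.setdefault((service, template), len(self._ids))


def build_trace(events, start, end, alphabet):
    """Return the ordered symbol sequence for events strictly inside (start, end)."""
    window = [e for e in events if start < e.timestamp < end]
    window.sort(key=lambda e: e.timestamp)
    return [alphabet.symbol(e.service, to_template(e.message)) for e in window]
```

Two log lines that differ only in a numeric identifier map to the same symbol, so repeated runs of the same workflow produce the same trace and can be generalized by the automaton learner.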

Two complementary fitness metrics are provided:

Metric	Formula	Intuition
Lower-Than-Median (LM)	$f_{LM}(t) = \frac{\lvert\{s_i \mid \mathrm{visits}(s_i) < \mathrm{median}\}\rvert}{n}$	Fraction of the $n$ states in the trace with below-median visit counts
Weighted State-Visit (WS)	$f_{WS}(t) = \frac{1}{\sum_{i=1}^{n} i \cdot f_i}$	Inverse of the position-weighted sum of the visit counts $f_i$ of the states along the trace

These fitnesses prioritize the exploration of under-sampled, complex interaction pathways (Cao et al., 2024).
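As a rough illustration of how the two metrics could be computed, the sketch below keeps visit counts on a prefix tree. Several points are assumptions rather than details from the source: the prefix tree (`PrefixTreeModel`, a hypothetical class) performs no state merging and is only a simplified stand-in for the DFA learned by FlexFringe, the median is taken over all states currently in the model, the root state counts as the first state of every trace, and $f_i$ is read as the visit count of the $i$-th state on the trace.

```python
from statistics import median


class PrefixTreeModel:
    """Simplified stand-in for the learned automaton: a prefix tree with visit counts."""

    def __init__(self):
        self.children = {0: {}}  # state -> {symbol: next_state}
        self.visits = {0: 0}     # state -> number of traces that passed through it

    def add_trace(self, trace):
        """Insert a trace, update visit counts, and return the list of visited states."""
        state = 0
        self.visits[0] += 1
        path = [0]
        for symbol in trace:
            nxt = self.children[state].get(symbol)
            if nxt is None:
                nxt = len(self.visits)  # allocate a fresh state id
                self.children[state][symbol] = nxt
                self.children[nxt] = {}
                self.visits[nxt] = 0
            self.visits[nxt] += 1
            path.append(nxt)
            state = nxt
        return path


def fitness_lm(path, visits):
    """Fraction of states on the path whose visit count is below the model-wide median."""
    med = median(visits.values())
    return sum(1 for s in path if visits[s] < med) / len(path)


def fitness_ws(path, visits):
    """Inverse of the position-weighted sum of visit counts along the path."""
    return 1.0 / sum(i * visits[s] for i, s in enumerate(path, start=1))
```

For example, after inserting the trace `[0, 1]` three times and then `[0, 2]` once, the last path ends in a state visited only once, so its weighted sum is smaller (and its `fitness_ws` larger) than that of another `[0, 1]` run.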

3. Evolutionary Search and Execution Workflow

AutoRestTest is implemented as an extension of EvoMaster, inheriting its evolutionary search infrastructure but altering the objective. The main workflow operates over a population $P$ of test sequences as follows:

Population Initialization: Generate $S$ random sequences of HTTP calls.
Execution and Logging: Run all sequences against the live microservice system, recording all relevant logs.
Trace Processing: Transform test-aligned log events into automaton traces.
Automaton Learning: Update the global DFA via online state merging.
Fitness Evaluation: Compute $f_{LM}(t)$ or $f_{WS}(t)$ for each sequence.
Selection and Offspring Generation: Use tournament selection and mutation on $P$ to generate $O$ new candidate sequences.
Replacement: Select the top $S$ from $P \cup O$ for the next generation.
Termination: After the time budget (e.g., 60 min), output tests with maximal fitness.
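The loop above can be sketched generically as follows. This is not EvoMaster code: `evolve`, `random_sequence`, `mutate`, `toy_fitness`, and the `ENDPOINTS` list are hypothetical names chosen for illustration, a fixed generation count replaces the wall-clock budget, and the toy fitness (number of distinct calls in a sequence) replaces $f_{LM}$ or $f_{WS}$, which would require executing tests and updating the automaton.

```python
import random

# Hypothetical endpoints of a system under test.
ENDPOINTS = ["POST /orders", "GET /orders/1", "POST /payments",
             "GET /users/1", "DELETE /orders/1"]


def random_sequence(rng, length=4):
    """Population initialization: one random sequence of HTTP calls."""
    return tuple(rng.choice(ENDPOINTS) for _ in range(length))


def mutate(seq, rng):
    """Replace one randomly chosen call in the sequence."""
    i = rng.randrange(len(seq))
    return seq[:i] + (rng.choice(ENDPOINTS),) + seq[i + 1:]


def toy_fitness(seq):
    """Placeholder objective: number of distinct calls in the sequence."""
    return len(set(seq))


def evolve(init, mutate, fitness, pop_size, offspring, generations, rng, tournament_k=2):
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Re-score parents every generation: with an automaton-based fitness,
        # visit counts (and therefore scores) change as new traces arrive.
        scored = [(fitness(ind), ind) for ind in population]
        children = []
        for _ in range(offspring):
            contenders = rng.sample(scored, tournament_k)       # tournament selection
            parent = max(contenders, key=lambda p: p[0])[1]
            children.append(mutate(parent, rng))
        merged = scored + [(fitness(c), c) for c in children]
        merged.sort(key=lambda p: p[0], reverse=True)           # keep top S of P ∪ O
        population = [ind for _, ind in merged[:pop_size]]
    return population
```

Calling `evolve(random_sequence, mutate, toy_fitness, pop_size=6, offspring=6, generations=20, rng=random.Random(7))` returns the final population sorted by fitness. Because replacement keeps the best of parents and offspring, the best score never decreases under a fixed fitness function.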

This methodology directs the search toward system behaviors previously unexercised, enabling earlier and broader fault finding, especially where multi-service state transitions are poorly documented or change dynamically (Cao et al., 2024).

4. Empirical Evaluation and Quantitative Results

AutoRestTest was evaluated on six real-world microservice benchmarks from the EMB suite. The systems ranged from 2 to 258 endpoints and up to 174k lines of code. Each algorithm (MISH-LM, MISH-WS, and the Many-Objective Sorting Algorithm baseline, MOSA) was run 20 times for 60 minutes per system. Evaluation metrics include:

Target Coverage: Sum of lines, branches, and schema checks reached.
Faults Found: Distinct HTTP 500 errors and schema mismatches.

Key empirical outcomes (selected examples):

Application	Cov. (MOSA)	Cov. (LM)	Cov. (WS)	Faults (MOSA)	Faults (LM)	Faults (WS)
Features-Service	879	883	885	—	—	—
ProxyPrint	2,922	3,302	3,378	66	68	69
OCVN	4,465	6,358	6,084	274	312	303
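To put the coverage columns on a common scale, the relative gain over MOSA can be computed directly from the table. The snippet below uses only the coverage values listed above; the helper name `relative_gain` is introduced here for illustration.

```python
# Coverage targets reached, copied from the table above.
COVERAGE = {
    "Features-Service": {"MOSA": 879, "LM": 883, "WS": 885},
    "ProxyPrint": {"MOSA": 2922, "LM": 3302, "WS": 3378},
    "OCVN": {"MOSA": 4465, "LM": 6358, "WS": 6084},
}


def relative_gain(baseline, value):
    """Percentage improvement of `value` over `baseline`."""
    return 100.0 * (value - baseline) / baseline


def gains_over_mosa(table):
    """Map each application to its LM and WS gains over MOSA, rounded to one decimal."""
    return {
        app: {k: round(relative_gain(row["MOSA"], row[k]), 1) for k in ("LM", "WS")}
        for app, row in table.items()
    }
```

On these rows the largest gain is for OCVN with the LM metric, about 42.4%, while Features-Service improves by less than 1% under either metric.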

Statistical analysis (Wilcoxon rank-sum, Vargha-Delaney $A_{12}$) confirmed large, significant effect sizes on large multi-endpoint systems. AutoRestTest demonstrated up to 42% more coverage and up to 76% higher 500-error detection on complex benchmarks compared to MOSA. For small or highly specialized APIs, both approaches plateaued at similar levels (Cao et al., 2024).
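For readers unfamiliar with the Vargha-Delaney statistic, $A_{12}$ estimates the probability that a run of the first algorithm scores higher than a run of the second, counting ties as half. A minimal sketch follows; the function name is chosen here for illustration and any sample values passed to it would be made up, not taken from the study.

```python
def vargha_delaney_a12(xs, ys):
    """Probability that a value from xs exceeds a value from ys (ties count 0.5)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    equal = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * equal) / (len(xs) * len(ys))
```

A value of 0.5 means neither algorithm tends to score higher, and values near 1 mean the first sample is larger in almost every pairwise comparison.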

5. Distinguishing Features and Comparison with Alternatives

AutoRestTest distinguishes itself in several dimensions:

System-Level Behavior Modeling: MISH enables behavioral coverage not accessible via branch-distance heuristics.
Search Guidance by Behavioral Novelty: Unlike black-box or random approaches, it biases the search toward novel or rare inter-service paths instead of statically defined targets.
Integration with Established Frameworks: Built atop EvoMaster, it leverages mature evolutionary search, execution, and test minimization infrastructure.

Comparison with alternative approaches:

MOSA: Covers more static targets but is less able to find deep-sequence faults.
Branch-Distance EAs: Maximize local code coverage but fail to traverse cross-service workflows.
Black-Box Testing: Lacks feedback to escalate promising test candidates into system-level failures.

6. Limitations and Prospective Research

While AutoRestTest represents a substantial advance in behavioral testing for microservices, several limitations are acknowledged:

Single-Objective Search Plateau: The purely automaton-driven objective can result in early stagnation (local optima) when target diversity or endpoint count is low.
Scalability Bottlenecks: Current file-based communication with the model inference learner induces I/O overhead; a tighter, API-based link is proposed to alleviate this.
Lack of Hybridization: Integrating automaton-based heuristics with many-objective diversity metrics (e.g., MOSA) may increase robustness on varied system types.

Future work includes exploring hybrid search objectives, scaling state-merging to larger trace volumes, and further optimizing search operator design (Cao et al., 2024).

7. Broader Implications and Synthesis

The introduction of MISH-based AutoRestTest marks a shift in REST API testing toward system-wide behavioral coverage as a guiding test objective, with the reported results favoring it on large, multi-endpoint systems. By formalizing the use of inferred global models and rare-path fitness, AutoRestTest identifies faults in the emergent behaviors of distributed microservices that evade earlier approaches. The framework is especially valuable for large-scale APIs where realistic call sequencing, dependency handling, and interaction diversity are essential for uncovering complex bugs.

In sum, AutoRestTest advances automated REST API testing beyond traditional code-centric metrics, providing an extensible foundation for robust, behaviorally-driven, evolution-in-the-loop test generation (Cao et al., 2024).

References

Cao et al. (2024). Automated Test-Case Generation for REST APIs Using Model Inference Search Heuristic.
