Fuzzing REST APIs in Industry: Necessary Features and Open Problems

Published 2 Apr 2026 in cs.SE | (2604.01759v1)

Abstract: REST APIs are widely used in industry, in all different kinds of domains. An example is Volkswagen AG, a German automobile manufacturer. Established testing approaches for REST APIs are time consuming, and require expertise from professional test engineers. Due to its cost and importance, in the scientific literature several approaches have been proposed to automatically test REST APIs. The open-source, search-based fuzzer EvoMaster is one of such tools proposed in the academic literature. However, how academic prototypes can be integrated in industry and have real impact to software engineering practice requires more investigation. In this paper, we report on our experience in using EvoMaster at Volkswagen AG, as an EvoMaster user from 2023 to 2026. We share our learnt lessons, and discuss several features needed to be implemented in EvoMaster to make its use in an industrial context successful. Feedback about value in industrial setups of EvoMaster was given from Volkswagen AG about 4 APIs. Additionally, a user study was conducted involving 11 testing specialists from 4 different companies. We further identify several real-world research challenges that still need to be solved.


Authors (5)

Summary

The paper identifies key industrial challenges and feature requirements for search-based REST API fuzzing through an extensive multi-year collaboration.
It evaluates open-source fuzzers against practical metrics like integration ease, coverage, and adaptability in high-scale industrial environments.
The study highlights open problems including semantic coverage gaps, test evolution issues, and integration with generative AI testing tools.

Fuzzing REST APIs in Industry: Features, Evaluation, and Open Problems

Introduction

This paper analyzes the challenges, required features, and empirical realities of deploying search-based fuzzing for REST APIs in industrial environments. It centers on a multi-year academia-industry collaboration between the EvoMaster team and test engineers at Volkswagen AG. Unlike most of the REST API fuzzing literature, which is typically confined to isolated lab benchmarks, this study captures the feature requirements, technical constraints, and open research challenges encountered in real, large-scale industrial software engineering.

Motivations for Fuzzer Selection and Integration

A critical analysis was undertaken to determine which open-source REST API fuzzers were viable for industrial adoption at Volkswagen AG. Several heuristics guided tool selection: active and sustained project health (e.g., maintenance, contributor activity), explicit use of “AI” or search-based paradigms (which aligns with current industrial priorities), ease of integration into cloud and Java testing workflows, and strong published benchmark results. The study reports that EvoMaster was prioritized over alternative fuzzers (e.g., RESTler, Schemathesis, CATS) chiefly because of its maturity, explicit AI branding, open architecture, and output format compatibility with existing testing infrastructure.

Industrial Feature Requirements

The study identifies and implements a granular set of required features for search-based REST API fuzzing in production:

Domain Expert Inputs (Examples and Links): Instrumental for guiding fuzzers toward business-relevant input invariants and parameter dependencies. Integrating OpenAPI “example(s)” and “links” affords both standardized guidance and minimized adaptation costs across fuzzers.
Schema Validation: Essential for making reliable use of OpenAPI artifacts; industrial schemas frequently contain subtle specification errors that basic YAML linters do not detect and that degrade fuzzer precision.
Robust Authentication Handling: Both static and dynamic (token-based) authentication flows must be supported, with the latter requiring automated token acquisition and reuse. Establishment of the WFC (Web Fuzzing Commons) configuration standard addresses cross-tool portability.
Resource and Rate Control: Adaptive early termination and endpoint subsetting allow practical resource scheduling at enterprise scale.
Coverage Criteria: Black-box coverage criteria had to move beyond trivial endpoint–status code tuples, incorporating combinatorial and semantic criteria from recent research.
Test Artifact Enhancements: Machine-generated tests require precise summaries, descriptive names (per Garrett et al., 1 Dec 2025), DTO wrappers for payloads, and automatic data cleanup strategies to be legible, maintainable, and modifiable by engineers.
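To make the baseline black-box criterion mentioned above concrete, the following is a minimal sketch of endpoint and status code coverage computed from an OpenAPI document that has already been parsed into a dictionary. It is not EvoMaster's implementation; the function names `declared_targets` and `coverage` and the sample schema are assumptions made for illustration. Response keys such as `default` or `2XX` are treated as literal strings here, which a real tool would need to handle more carefully.

```python
# Hypothetical sketch: coverage of (method, path, status) triples declared in an OpenAPI schema.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def declared_targets(schema: dict) -> set:
    """Collect every (METHOD, path, status) triple declared under "paths"."""
    targets = set()
    for path, item in schema.get("paths", {}).items():
        for method, operation in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            for status in operation.get("responses", {}):
                targets.add((method.upper(), path, str(status)))
    return targets


def coverage(schema: dict, observed) -> tuple:
    """Return (ratio of declared triples observed, set of declared triples never observed)."""
    targets = declared_targets(schema)
    seen = {(m.upper(), p, str(s)) for m, p, s in observed}
    covered = targets & seen
    ratio = len(covered) / len(targets) if targets else 1.0
    return ratio, targets - seen


if __name__ == "__main__":
    # Illustrative schema and observations, not taken from the paper.
    schema = {
        "paths": {
            "/users": {
                "get": {"responses": {"200": {}, "404": {}}},
                "post": {"responses": {"201": {}}},
            }
        }
    }
    observed = [("GET", "/users", 200), ("POST", "/users", 201)]
    ratio, missing = coverage(schema, observed)
    print(f"covered {ratio:.0%}, missing: {sorted(missing)}")
```

A criterion of this kind only records which declared responses were triggered. It says nothing about parameter combinations or business semantics, which is why the study argues for richer combinatorial and semantic criteria.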

Empirical Evaluation in Industry

Across two years, EvoMaster was deployed on several APIs at Volkswagen AG with joint evaluation by seasoned QA engineers. Notable findings include:

In the first year, on the auth-service and user-service APIs, 100% and 84% of generated tests, respectively, were directly useful or modifiable into final suites, but critical scenarios were always missing from the generated suites, particularly negative tests and edge cases. Conversely, EvoMaster-generated suites systematically included boundary and random cases that experienced engineers would consider low value, but that are useful for robustness testing.
In the second year, enhanced OpenAPI schemas and the latest fuzzer versions narrowed the coverage gap, but the lack of precise higher-level API semantics in schemas (e.g., workflows, inter-endpoint logic) remains a bottleneck.
A user study involving 11 testing specialists from 4 different companies provided evidence that engineers relying on LLM-facilitated test generation underperformed EvoMaster in both quality and coverage when given identical schemas and infrastructure.

Testing as a Service and Tooling Ecosystem

The industrial adoption of EvoMaster at Volkswagen AG extends beyond standalone use, with integration into the internal TaaS/Tanius orchestration system.

Figure 2: Overview of Volkswagen AG’s TaaS architecture, positioning EvoMaster as a core test case generation microservice.

This integration enables seamless pipeline execution and sharing of results among enterprise QA teams and supports automated regression, provisioning, and maintenance strategies at scale.

Key Open Problems and Research Directions

Several open challenges remain and are substantiated by direct industrial feedback:

Domain Knowledge and Arazzo Workflows: Full test coverage is gated by lack of declarative, higher-order semantics (e.g., business workflows, parameter constraints beyond type/format) in OpenAPI. The recent Arazzo specification introduces the potential for portable sequence/workflow semantics, which fuzzers should eventually exploit.
Test Seeding from Existing Test Assets: Translating legacy Postman, Python, or DSL-based tests into effective fuzzer seeds (possibly via proxy recording or LLM translation) presents practical and technical hurdles around dynamic data tracking and flaky test mitigation.
Test Suite Lifetime and Evolution: Effective fuzzer deployment in continuous integration requires robust handling of test migration, API evolution, and automated revalidation in the face of changing schemas and application logic.
Database State Handling: Keeping database state consistent across test runs, especially after destructive operations, is non-trivial; partial rollback and resource copy-on-write strategies are presently only partially automated.
Integration with Generative AI: While industry demand for LLM-assisted testing is high, empirical evidence shows current LLM-generated tests are weak on semantic correctness and scenario diversity. Hybridizing SBST with LLM-guided input selection and scenario mining may address these limitations (Augusto et al., 29 Sep 2025, Sartaj et al., 26 May 2025).
Low-Code and Human-in-the-Loop UX: Improving test readability, modifiability (DTOs, natural language comments), and domain expert access (low-code dashboards) is an active research and engineering direction.
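As an illustration of the test seeding problem listed above, the sketch below extracts candidate seed requests from a Postman collection (v2.1 JSON layout, where folders nest further `item` arrays and a request URL is either a string or an object with a `raw` field). The seed dictionary shape and the function name `extract_seeds` are assumptions made for this example, not a format that EvoMaster defines. The sketch deliberately ignores the harder issues the study raises, such as dynamic data tracking and flaky tests.

```python
# Hypothetical sketch: flatten a Postman v2.1 collection into seed requests for a fuzzer.
def extract_seeds(collection: dict) -> list:
    """Walk the collection tree and return one seed dict per request."""
    seeds = []

    def walk(items):
        for item in items:
            if "item" in item:  # a folder: recurse into its children
                walk(item["item"])
                continue
            request = item.get("request")
            if not isinstance(request, dict):
                continue  # skip entries without a structured request
            url = request.get("url", "")
            raw_url = url.get("raw", "") if isinstance(url, dict) else url
            body = request.get("body") or {}
            seeds.append({
                "name": item.get("name", ""),
                "method": request.get("method", "GET").upper(),
                "url": raw_url,
                # Only raw bodies are kept; form-data and other modes are dropped here.
                "body": body.get("raw") if body.get("mode") == "raw" else None,
            })

    walk(collection.get("item", []))
    return seeds


if __name__ == "__main__":
    # Illustrative collection, not taken from the paper.
    collection = {
        "item": [
            {"name": "users", "item": [
                {"name": "create user",
                 "request": {"method": "post",
                             "url": {"raw": "{{baseUrl}}/users"},
                             "body": {"mode": "raw", "raw": "{\"name\": \"a\"}"}}},
            ]},
            {"name": "health", "request": {"method": "GET", "url": "{{baseUrl}}/health"}},
        ]
    }
    for seed in extract_seeds(collection):
        print(seed["method"], seed["url"])
```

Postman variables such as `{{baseUrl}}` are left unresolved on purpose: resolving them, and tracking values produced by earlier requests, is part of what makes seeding from existing assets an open problem.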

Broader Industrial Feedback and Domain-Specific Extensions

A notable outcome is the diverse feature requests from the community. Some (e.g., support for complex encryption flows with derived parameters) require tight integration with hand-written stubs and runtime code reuse, and necessitate new architecture in the fuzzer core. Open-source collaboration and representative API artifacts from the field are critical for extending academic fuzzers into specialized verticals (e.g., finance).

Conclusion

This paper establishes, based on extensive industrial collaboration, that research fuzzers—when architected and extended with industry-led feature sets—offer non-trivial practical value and can exceed the productivity of both manual and LLM-aided black-box testing in enterprise settings. However, no fuzzer to date is capable of fully supplanting expert-written suites or matching coverage for high-level semantic scenarios, highlighting the need for richer specifications (Arazzo, domain-specific schemas) and better black-box inference. Industry adoption is limited not just by algorithmic performance, but by the breadth and flexibility of integration, documentation, and workflow features.

The feature requirements, artifact compatibility concerns, and empirical limitations presented here can inform the design of next-generation fuzzing platforms, and they highlight the need for closer integration of search-based testing, foundation models, and structured specification artifacts.

The EvoMaster fuzzer and related artifacts are available open source at https://github.com/WebFuzzing/EvoMaster.

References

For an extensive survey of REST API testing, see "Testing RESTful APIs: A Survey" [golmohammadi2023testing]. For SBST and LLMs, see [fraser2025retrospective]. For naming and summary generation in automated tests, see (Garrett et al., 1 Dec 2025), [djajadi2025using].