- The paper identifies key industrial challenges and feature requirements for search-based REST API fuzzing through an extensive multi-year collaboration.
- It evaluates open-source fuzzers against practical metrics like integration ease, coverage, and adaptability in high-scale industrial environments.
- The study highlights open problems including semantic coverage gaps, test evolution issues, and integration with generative AI testing tools.
Fuzzing REST APIs in Industry: Features, Evaluation, and Open Problems
Introduction
This paper presents a comprehensive analysis of the challenges, features, and empirical realities of deploying search-based fuzzing for REST APIs in industrial environments. It is centered on a multi-year academia-industry collaboration between the EvoMaster team and test engineers at Volkswagen AG. Most REST API fuzzing literature is confined to isolated, lab-based benchmarks. This study instead captures the feature requirements, technical constraints, and open research challenges encountered in real, large-scale industrial software engineering.
Motivations for Fuzzer Selection and Integration
A critical analysis was undertaken to determine which open-source REST API fuzzers were viable for industrial adoption at Volkswagen AG. Several criteria guided tool selection:
- sustained project health (e.g., active maintenance and contributors);
- explicit use of “AI” or search-based techniques, which aligns with current industrial priorities;
- ease of integration into cloud and Java testing workflows;
- strong results on published benchmarks.

EvoMaster was prioritized over alternative fuzzers (e.g., RESTler, Schemathesis, CATS) chiefly because of its maturity, explicit AI branding, open architecture, and output formats compatible with the existing testing infrastructure.
Industrial Feature Requirements
The study identifies and implements a granular set of required features for search-based REST API fuzzing in production:
- Domain Expert Inputs (Examples and Links): Instrumental for guiding fuzzers toward business-relevant input invariants and parameter dependencies. Because “example(s)” and “links” are standard OpenAPI keywords, encoding this knowledge in the schema gives fuzzers guidance in a standardized form and keeps adaptation costs low across tools.
- Schema Validation: Essential for making robust use of OpenAPI artifacts. Industrial schemas frequently contain subtle specification errors that generic YAML linters do not detect and that severely degrade fuzzer precision.
- Robust Authentication Handling: Both static and dynamic (token-based) authentication flows must be supported; dynamic flows require automated token acquisition and reuse. The WFC (Web Fuzzing Commons) configuration standard was established to address cross-tool portability.
- Resource and Rate Control: Adaptive early termination and endpoint subsetting allow practical resource scheduling at enterprise scale.
- Coverage Criteria: Black-box coverage criteria had to move beyond trivial endpoint–status code tuples, incorporating combinatorial and semantic criteria from recent research.
- Test Artifact Enhancements: Machine-generated tests require precise summaries, descriptive names (Garrett et al., 1 Dec 2025), DTO wrappers for payloads, and automatic data cleanup strategies to be readable, maintainable, and modifiable by engineers.
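The seeding idea behind domain expert inputs can be illustrated with a minimal Python sketch. It assumes the OpenAPI document has already been parsed into a dictionary. The `/users` API and the helper names (`body_seeds`, `parameter_seeds`, `resolve_link`) are hypothetical, as is the restriction to top-level `$response.body#/<field>` link expressions. This is not EvoMaster's implementation.

```python
from typing import Any

# Hypothetical OpenAPI fragment, already parsed from YAML/JSON into a dict.
SPEC: dict[str, Any] = {
    "paths": {
        "/users": {"post": {
            "operationId": "createUser",
            "requestBody": {"content": {"application/json": {
                "schema": {"type": "object"},
                "examples": {
                    "admin": {"value": {"name": "alice", "role": "admin"}},
                    "guest": {"value": {"name": "bob", "role": "guest"}},
                },
            }}},
            "responses": {"201": {
                "description": "created",
                "links": {"GetUser": {
                    "operationId": "getUser",
                    "parameters": {"userId": "$response.body#/id"},
                }},
            }},
        }},
        "/users/{userId}": {"get": {
            "operationId": "getUser",
            "parameters": [{"name": "userId", "in": "path", "required": True,
                            "schema": {"type": "string"}, "example": "42"}],
            "responses": {"200": {"description": "found"}},
        }},
    }
}


def _example_values(obj: dict[str, Any]) -> list[Any]:
    """Values from a single `example` plus every `examples` entry with a `value`."""
    values = [obj["example"]] if "example" in obj else []
    values += [ex["value"] for ex in obj.get("examples", {}).values() if "value" in ex]
    return values


def body_seeds(operation: dict[str, Any]) -> list[Any]:
    """Request-body seeds taken from every media type of the operation."""
    content = operation.get("requestBody", {}).get("content", {})
    return [v for media in content.values() for v in _example_values(media)]


def parameter_seeds(operation: dict[str, Any]) -> dict[str, list[Any]]:
    """Per-parameter seed values; parameters without examples are left out."""
    seeds = {p["name"]: _example_values(p) for p in operation.get("parameters", [])}
    return {name: values for name, values in seeds.items() if values}


def resolve_link(link: dict[str, Any], response_body: dict[str, Any]) -> dict[str, Any]:
    """Turn a link's parameter expressions into concrete values for the next call.

    Only top-level `$response.body#/<field>` pointers are resolved; any other
    value is passed through unchanged as a constant.
    """
    prefix = "$response.body#/"
    return {
        name: (response_body[expr[len(prefix):]]
               if isinstance(expr, str) and expr.startswith(prefix) else expr)
        for name, expr in link.get("parameters", {}).items()
    }
```

For instance, applying `resolve_link` to the `GetUser` link with a creation response of `{"id": "7"}` yields `{"userId": "7"}`. That value becomes the path parameter of the follow-up `GET /users/{userId}` call, so the dependency between the two endpoints comes from the schema rather than from fuzzer-specific configuration.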
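To give a flavor of the specification errors a YAML linter accepts, here is a minimal Python sketch of three semantic checks over an already-parsed OpenAPI document:
- `required` fields that are absent from `properties`;
- `example` values that contradict the declared `type`;
- path templates whose `{placeholders}` have no matching `in: path` parameter.

The function names are hypothetical, `$ref`s are assumed to be resolved beforehand, and a real schema validator covers far more cases.

```python
import re
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}
PY_TYPES = {"string": str, "integer": int, "number": (int, float),
            "boolean": bool, "array": list, "object": dict}


def check_schema(name: str, schema: dict[str, Any]) -> list[str]:
    """Recursively check `required` and `example` consistency of a schema object."""
    problems = []
    props = schema.get("properties", {})
    for req in schema.get("required", []):
        if req not in props:
            problems.append(f"{name}: required field '{req}' not in properties")
    declared = schema.get("type")
    expected = PY_TYPES.get(declared) if isinstance(declared, str) else None
    if expected is not None and "example" in schema:
        ex = schema["example"]
        # bool is a subclass of int in Python, so reject it unless the type is boolean
        if not isinstance(ex, expected) or (isinstance(ex, bool) and declared != "boolean"):
            problems.append(f"{name}: example {ex!r} is not of type {declared}")
    for prop, sub in props.items():
        problems.extend(check_schema(f"{name}.{prop}", sub))
    return problems


def check_path_params(spec: dict[str, Any]) -> list[str]:
    """Report path-template placeholders that no operation parameter declares."""
    problems = []
    for path, item in spec.get("paths", {}).items():
        placeholders = set(re.findall(r"\{([^}]+)\}", path))
        shared = item.get("parameters", [])  # path-item-level parameters apply to all operations
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters" or "summary"
            declared = {p["name"] for p in shared + op.get("parameters", [])
                        if p.get("in") == "path"}
            for missing in sorted(placeholders - declared):
                problems.append(f"{method.upper()} {path}: path parameter '{missing}' not declared")
    return problems
```

Each check reports a human-readable message rather than failing fast, so an engineer can fix every problem in a schema in one pass before handing it to a fuzzer.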
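Automated token acquisition and reuse for dynamic authentication can be sketched as a small cache. It logs in once and re-acquires the token only shortly before it expires. The `login` callable, its `(token, lifetime)` return shape, and the 30-second refresh margin are assumptions made for illustration. None of this reflects the WFC configuration format.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TokenProvider:
    """Caches a bearer token and re-acquires it only when missing or near expiry."""

    login: Callable[[], tuple[str, float]]  # hypothetical: returns (token, lifetime in s)
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    margin: float = 30.0  # refresh this many seconds before the token expires
    _token: Optional[str] = field(default=None, init=False)
    _expires_at: float = field(default=0.0, init=False)
    logins: int = field(default=0, init=False)  # how often login() was actually called

    def auth_header(self) -> dict[str, str]:
        now = self.clock()
        if self._token is None or now >= self._expires_at - self.margin:
            self._token, lifetime = self.login()
            self._expires_at = now + lifetime
            self.logins += 1
        return {"Authorization": f"Bearer {self._token}"}
```

Every generated request would call `auth_header()`. As a result, the login endpoint is hit once per token lifetime rather than once per request, which also keeps authentication noise out of the generated tests.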
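Black-box coverage targets and adaptive early termination can be combined in one toy search loop:
- Targets are `(method, path, status)` tuples.
- The search stops early once `patience` consecutive requests add no new target.

Random request selection, the `execute` callable, and the `budget`/`patience` parameters are placeholders assumed for this sketch. EvoMaster's actual search algorithms and its richer combinatorial and semantic criteria are considerably more elaborate.

```python
import random
from typing import Callable

Target = tuple[str, str, int]  # (HTTP method, path, status code)


def fuzz(
    candidates: list[tuple[str, str]],
    execute: Callable[[str, str], int],
    budget: int,
    patience: int,
    rng: random.Random,
) -> tuple[set[Target], int]:
    """Randomly exercise operations; return covered targets and iterations used."""
    covered: set[Target] = set()
    since_improvement = 0
    iterations = 0
    for iterations in range(1, budget + 1):
        method, path = rng.choice(candidates)
        target = (method, path, execute(method, path))  # execute returns the status code
        if target in covered:
            since_improvement += 1
            if since_improvement >= patience:
                break  # adaptive early termination: no new target for `patience` calls
        else:
            covered.add(target)
            since_improvement = 0
    return covered, iterations
```

With a single operation that always returns 200 and `patience=5`, the loop covers its only target on the first call. It then stops after five unproductive calls, using 6 of its budget instead of the full allowance. The freed budget is what lets a scheduler spread resources across many APIs.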
Empirical Evaluation in Industry
Over two years, EvoMaster was deployed on several APIs at Volkswagen AG and evaluated jointly with experienced QA engineers. Notable findings include:
- In the first year, on the auth-service and user-service APIs, 100% and 84% of the generated tests, respectively, were directly useful or could be modified into final suites. However, the generated suites always lacked some critical scenarios, particularly negative tests and edge cases. Conversely, EvoMaster-generated suites systematically included boundary and random cases that experienced engineers would consider low value but that are useful for robustness.
- In the second year, improved OpenAPI schemas and the latest fuzzer versions narrowed the coverage gap. However, the lack of precise higher-level API semantics in the schemas (e.g., workflows, inter-endpoint logic) remains a bottleneck.
- A user study with 11 AI-testing specialists from 4 leading consulting companies found that engineers relying on LLM-assisted test generation consistently underperformed EvoMaster in both test quality and coverage. Both sides were given identical schemas and infrastructure.
The industrial adoption of EvoMaster at Volkswagen AG extends beyond standalone use, with integration into the internal TaaS/Tanius orchestration system.
Figure 2: Overview of Volkswagen AG’s TaaS architecture, positioning EvoMaster as a core test case generation microservice.
This integration enables seamless pipeline execution and sharing of results among enterprise QA teams and supports automated regression, provisioning, and maintenance strategies at scale.
Key Open Problems and Research Directions
Several open challenges remain and are substantiated by direct industrial feedback:
- Domain Knowledge and Arazzo Workflows: Full test coverage is limited by the lack of declarative, higher-order semantics in OpenAPI (e.g., business workflows, parameter constraints beyond type/format). The recent Arazzo specification offers a portable way to describe sequences of API calls as workflows, which fuzzers should eventually exploit.
- Test Seeding from Existing Test Assets: Translating legacy Postman, Python, or DSL-based tests into effective fuzzer seeds (possibly via proxy recording or LLM translation) presents practical and technical hurdles around dynamic data tracking and flaky test mitigation.
- Test Suite Lifetime and Evolution: Effective fuzzer deployment in continuous integration requires robust handling of test migration, API evolution, and automated revalidation in the face of changing schemas and application logic.
- Database State Handling: Keeping the database in a consistent state is non-trivial, especially when tests perform destructive operations. Partial rollback and copy-on-write strategies for resources are currently only partially automated.
- Integration with Generative AI: Industry demand for LLM-assisted testing is high, yet empirical evidence shows that current LLM-generated tests are weak on semantic correctness and scenario diversity. Hybridizing search-based software testing (SBST) with LLM-guided input selection and scenario mining may address these limitations (Augusto et al., 29 Sep 2025, Sartaj et al., 26 May 2025).
- Low-Code and Human-in-the-Loop UX: Improving test readability, modifiability (DTOs, natural language comments), and domain expert access (low-code dashboards) is an active research and engineering direction.
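Arazzo can express inter-step semantics in which a later call consumes values produced by an earlier one. A minimal Python sketch of executing such a workflow follows. The step layout only loosely mirrors Arazzo, which uses runtime expressions such as `$steps.<stepId>.outputs.<name>`. The `expectedStatus` field (standing in for success criteria), the in-memory `fake_api`, and all function names are hypothetical.

```python
from typing import Any, Callable

Api = Callable[[str, dict[str, Any]], tuple[int, dict[str, Any]]]


def resolve(value: Any, outputs: dict[str, dict[str, Any]]) -> Any:
    """Replace '$steps.<stepId>.outputs.<name>' strings with earlier step outputs."""
    if isinstance(value, str) and value.startswith("$steps."):
        _, step_id, kind, name = value.split(".", 3)
        if kind != "outputs":
            raise ValueError(f"unsupported expression: {value}")
        return outputs[step_id][name]
    return value


def run_workflow(steps: list[dict[str, Any]], call: Api) -> dict[str, dict[str, Any]]:
    """Execute steps in order, feeding each step's outputs to later steps."""
    outputs: dict[str, dict[str, Any]] = {}
    for step in steps:
        params = {k: resolve(v, outputs) for k, v in step.get("parameters", {}).items()}
        status, body = call(step["operationId"], params)
        if status != step["expectedStatus"]:  # simplified stand-in for successCriteria
            raise AssertionError(f"{step['stepId']}: unexpected status {status}")
        # each output names a top-level response-body field (simplification)
        outputs[step["stepId"]] = {k: body[f] for k, f in step.get("outputs", {}).items()}
    return outputs


def fake_api() -> Api:
    """Hypothetical in-memory user service used only to exercise the sketch."""
    users: dict[str, dict[str, Any]] = {}

    def call(op: str, params: dict[str, Any]) -> tuple[int, dict[str, Any]]:
        if op == "createUser":
            uid = str(len(users) + 1)
            users[uid] = {"id": uid, "name": params["name"]}
            return 201, users[uid]
        if op == "getUser":
            user = users.get(params["userId"])
            return (200, user) if user else (404, {})
        return 400, {}

    return call


WORKFLOW = [
    {"stepId": "create", "operationId": "createUser", "parameters": {"name": "alice"},
     "expectedStatus": 201, "outputs": {"userId": "id"}},
    {"stepId": "fetch", "operationId": "getUser",
     "parameters": {"userId": "$steps.create.outputs.userId"},
     "expectedStatus": 200, "outputs": {"name": "name"}},
]
```

A fuzzer that understood such declarations could mutate inputs within a valid workflow instead of guessing the order of calls and how their data flows between them.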
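For test suite evolution, a first triage step can be sketched by diffing the operations of two OpenAPI versions:
- tests that call a removed operation are likely obsolete;
- tests that call an operation whose definition changed need revalidation.

The mapping from test names to the operations they call is assumed to be available (e.g., gathered while executing the tests), and `triage` is a hypothetical name. Operation objects are compared wholesale, so any change, even to a description, triggers revalidation.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}
Operation = tuple[str, str]  # (HTTP method, path template)


def operations(spec: dict[str, Any]) -> dict[Operation, Any]:
    """Map each (METHOD, path) pair to its operation object."""
    return {(method.upper(), path): op
            for path, item in spec.get("paths", {}).items()
            for method, op in item.items() if method in HTTP_METHODS}


def triage(old: dict[str, Any], new: dict[str, Any],
           tests: dict[str, set[Operation]]) -> dict[str, list[str]]:
    """Classify tests as obsolete (use a removed operation) or needing revalidation."""
    before, after = operations(old), operations(new)
    removed = before.keys() - after.keys()
    changed = {key for key in before.keys() & after.keys() if before[key] != after[key]}
    report: dict[str, list[str]] = {"obsolete": [], "revalidate": []}
    for name, used in sorted(tests.items()):
        if used & removed:
            report["obsolete"].append(name)
        elif used & changed:
            report["revalidate"].append(name)
    return report
```

Tests that appear in neither list touch only unchanged operations and could be kept as-is in continuous integration, narrowing the set that needs regeneration or manual review.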
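Transaction rollback is one way to handle destructive operations. It can be sketched with Python's built-in `sqlite3` module: each generated test runs inside a transaction that is always rolled back, so deletions leave no trace. The `users` table and `destructive_test` are hypothetical. The sketch also assumes the test and the data share a single connection. Against a deployed API, whose database connections the fuzzer does not control, this needs white-box support or other strategies such as copy-on-write of resources, which are not shown here.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def rolled_back(conn: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Run the body in an explicit transaction and always roll it back.

    Assumes `conn` was opened with isolation_level=None, so that sqlite3 does not
    manage transactions implicitly and BEGIN/ROLLBACK are fully explicit.
    """
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")  # undo everything, even if the test failed


def count_users(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def destructive_test(conn: sqlite3.Connection) -> None:
    """Hypothetical generated test that deletes every user."""
    conn.execute("DELETE FROM users")
    assert count_users(conn) == 0


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice'), ('bob')")

with rolled_back(conn):
    destructive_test(conn)

print(count_users(conn))  # 2: the deletion was rolled back
```

Rolling back in `finally` means a failing assertion inside a test still restores the data, so one broken test cannot corrupt the state seen by the next.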
Broader Industrial Feedback and Domain-Specific Extensions
A notable outcome is the diversity of feature requests from the community. Some, such as support for complex encryption flows with derived parameters, require tight integration with hand-written stubs and reuse of runtime code, and call for architectural changes in the fuzzer core. Open-source collaboration and access to representative API artifacts from the field are critical for extending academic fuzzers into specialized domains (e.g., finance).
Conclusion
Based on extensive industrial collaboration, this paper establishes that research fuzzers offer non-trivial practical value when they are architected and extended with industry-driven feature sets. In enterprise settings they can exceed the productivity of both manual and LLM-aided black-box testing. However, no fuzzer to date can fully replace expert-written suites or match their coverage of high-level semantic scenarios. This highlights the need for richer specifications (Arazzo, domain-specific schemas) and better black-box inference. Industry adoption is limited not only by algorithmic performance but also by the breadth and flexibility of integration, documentation, and workflow features.
The feature requirements, artifact compatibility concerns, and empirical limitations formalized here can inform the design of next-generation fuzzing platforms. They also underline the pressing need to combine search-based testing, foundation models, and structured specification artifacts.
The EvoMaster fuzzer and related artifacts are available open source at https://github.com/WebFuzzing/EvoMaster.
References
For an extensive survey of REST API testing, see "Testing RESTful APIs: A Survey" [golmohammadi2023testing]. For SBST and LLMs, see [fraser2025retrospective]. For name and summary generation in automated tests, see Garrett et al. (1 Dec 2025) and [djajadi2025using].
Acknowledgments
This body of work is supported in part by the European Research Council (EAST project, grant No. 864972).