
Fuzzing Frameworks for Server-side Web Applications: A Survey (2406.03208v1)

Published 5 Jun 2024 in cs.SE and cs.CR

Abstract: There are around 5.3 billion Internet users, amounting to 65.7% of the global population, and web technology is the backbone of the services delivered via the Internet. To ensure web applications are free from security-related bugs, web developers test the server-side web applications before deploying them to production. The tests are commonly conducted through the interfaces (i.e., Web API) that the applications expose since they are the entry points to the application. Fuzzing is one of the most promising automated software testing techniques suitable for this task; however, the research on (server-side) web application fuzzing has been rather limited compared to binary fuzzing which is researched extensively. This study reviews the state-of-the-art fuzzing frameworks for testing web applications through web API, identifies open challenges, and gives potential future research. We collect papers from seven online repositories of peer-reviewed articles over the last ten years. Compared to other similar studies, our review focuses more deeply on revealing prior work strategies in generating valid HTTP requests, utilising feedback from the Web Under Tests (WUTs), and expanding input spaces. The findings of this survey indicate that several crucial challenges need to be solved, such as the ineffectiveness of web instrumentation and the complexity of handling microservice applications. Furthermore, some potential research directions are also provided, such as fuzzing for web client programming. Ultimately, this paper aims to give a good starting point for developing a better web fuzzing framework.

This paper, "Fuzzing Frameworks for Server-side Web Applications: A Survey" (Dharmaadi et al., 5 Jun 2024), provides a comprehensive review of the state-of-the-art in fuzzing techniques applied to server-side web applications via their Web APIs. It highlights the growing importance of securing web applications due to their widespread use and the increasing complexity of their APIs, noting that traditional binary fuzzing techniques are often not directly applicable. Web APIs, especially RESTful ones, present unique challenges, such as requiring syntactically and semantically valid inputs to interact effectively and the need to handle request dependencies and state. The survey aims to bridge this gap by analyzing how existing frameworks address these challenges and identifying areas for future research.

The authors collected 53 relevant papers published between 2013 and 2023 from various research repositories. They filtered papers based on their focus, specifically targeting studies that propose or improve web API fuzzers for server-side applications. The analysis categorizes existing web API fuzzers into two main approaches:

  1. Standard Fuzzing: Focuses on finding any bug or crash by exploring as much of the application's code and states as possible.
  2. Vulnerability-driven Fuzzing: Targets specific types of vulnerabilities using tailored mutators and checkers.
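
The practical difference between the two approaches shows up in the bug oracle. As a minimal sketch (the `Response` type and the reflected-payload heuristic are illustrative assumptions, not taken from any surveyed framework): a standard fuzzer treats any server error as a finding, while a vulnerability-driven fuzzer runs a targeted checker such as looking for an unescaped injected payload.

```python
# Hedged sketch: a crash oracle (standard fuzzing) versus a
# vulnerability-specific checker (vulnerability-driven fuzzing).
# Response and the payload-reflection heuristic are illustrative
# assumptions, not any specific framework's API.
from dataclasses import dataclass

@dataclass
class Response:
    status: int
    body: str

def crash_oracle(resp: Response) -> bool:
    """Standard fuzzing: any 5xx response is treated as a bug signal."""
    return 500 <= resp.status < 600

XSS_PAYLOAD = "<script>alert(1)</script>"

def xss_checker(resp: Response) -> bool:
    """Vulnerability-driven fuzzing: flag unescaped payload reflection."""
    return XSS_PAYLOAD in resp.body

print(crash_oracle(Response(502, "")))                       # True
print(xss_checker(Response(200, f"<p>{XSS_PAYLOAD}</p>")))   # True
```

The crash oracle needs no knowledge of the vulnerability class, which is why standard fuzzers generalise easily but mostly surface crashes; the checker can catch bugs that never produce a 5xx response.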

The paper details a general workflow for web API fuzzers, breaking it down into four crucial processes:

  • Request Template Generation: This process involves creating the structure of HTTP requests. A major challenge is understanding request dependencies (e.g., creating a resource before deleting it). Solutions include building dependency graphs (inferring relationships from OpenAPI specs or CRUD semantics), using tree-based dependency models for efficiency, employing pre-defined templates, annotating OpenAPI specifications, or simply having no explicit dependency model for single-request fuzzing. Another problem is generating long request sequences to reach deeper states, addressed by techniques like length-oriented selection and smart sampling. For web applications without OpenAPI specifications, frameworks use HTML or JavaScript crawlers to discover endpoints and parameters or rely on manual input/corpus. Inferring specific data types beyond basic strings/numbers (Format-encoded Type inference) is also discussed to narrow down input space effectively.
  • Template Rendering: This step populates the generated templates with concrete values. A key issue is obtaining initial valid values, often solved by using default dictionaries or extracting example values from OpenAPI specifications. Generating semantically valid values is challenging as OpenAPI typically lacks this information. Approaches include leveraging external knowledge bases like DBpedia or grouping semantically similar parameters. Handling inter-parameter dependencies (where one parameter's value depends on another's) is also critical and addressed through NLP-based dependency inference, using specific languages like IDL, or employing deep learning models to predict request validity.
  • Execution and Getting Feedback: Fuzzers send generated requests and analyze the responses. Feedback is vital for guiding the mutation process. Common feedback types include HTTP response codes (especially 5xx for crashes), code coverage (requires instrumentation), branch distance (how close an input is to satisfying a condition), taint feedback (tracking input flow), Test Coverage Level (TCL, a black-box metric), and sometimes specific application-level messages like XML responses. Instrumentation, necessary for grey/white-box feedback like code coverage or branch distance, is often done via source code augmentation (e.g., using Java agents, Babel plugins for JavaScript, or AST parsing for PHP) or interpreter augmentation (modifying the web server's interpreter). Regarding observed vulnerabilities, most standard fuzzers report crashes (5xx responses). Vulnerability-driven fuzzers look for specific issues like resource violations (use-after-free, resource leaks), mass assignment, XSS, SQL injection, command injection, etc., often using custom checkers or analyzing responses for specific patterns or payloads.
  • Mutation: This is the core fuzzing process where existing inputs are modified to explore the input space. Simple value mutation (random changes) is adapted for web parameters (string/number modifications). Generating more valid requests often uses response dictionaries (reusing values from successful responses, especially for generated IDs), corpus mutation (blending parameters from stored requests), adaptive hypermutation (focusing mutation on parameters with higher impact), attention-based models (learning parameter relationships), or Many Independent Objective (MIO) algorithms for covering multiple objectives. Generating more invalid requests, crucial for triggering error handling code, uses techniques like constraint violation (ignoring schema rules), rule-based schema mutators (modifying the structure of request bodies), vulnerability dictionaries (injecting known exploit payloads), and tracked fault generators (systematically inserting faults).
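
Of the feedback signals listed above, branch distance is the least self-explanatory. A hedged sketch, following common search-based testing practice rather than any one framework: for a condition like `x == 42`, the distance measures how close an input came to taking the branch, so the fuzzer can prefer mutants that shrink it.

```python
# Hedged sketch of branch-distance feedback: how far an input is
# from satisfying a branch condition such as `if x == target`.
# The normalisation d / (d + 1) is a common convention in
# search-based testing, assumed here for illustration.
def branch_distance_eq(x: int, target: int) -> float:
    """Distance to satisfying `x == target`; 0.0 means the branch is taken."""
    d = abs(x - target)
    return d / (d + 1)  # normalise into [0, 1)

# A grey-box fuzzer keeps the mutant with the smallest distance:
candidates = [0, 30, 41, 42]
scores = {c: branch_distance_eq(c, 42) for c in candidates}
best = min(scores, key=scores.get)  # best == 42, distance 0.0
```

Unlike plain code coverage, which only says whether a branch was hit, this gradient tells the mutator which inputs are getting warmer.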

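The four processes above can be sketched end-to-end in a toy loop. Everything here is an illustrative assumption, not a surveyed framework's implementation: the spec, the CRUD ordering, the `send` stub standing in for a real HTTP round trip, and the null-byte "bug" it contains.

```python
# Hedged end-to-end sketch of the surveyed workflow: template
# generation, rendering, execution/feedback, and mutation, over a
# toy OpenAPI-like spec. All names and the buggy `send` stub are
# illustrative assumptions.
import random

# 1. Request template generation: order operations by CRUD semantics
#    so POST /items runs before GET/DELETE on a created resource.
SPEC = [
    {"method": "DELETE", "path": "/items/{id}", "params": ["id"]},
    {"method": "POST",   "path": "/items",      "params": ["name"]},
    {"method": "GET",    "path": "/items/{id}", "params": ["id"]},
]
CRUD_ORDER = {"POST": 0, "GET": 1, "PUT": 2, "DELETE": 3}
templates = sorted(SPEC, key=lambda op: CRUD_ORDER[op["method"]])

# 2. Template rendering: seed values from a default dictionary, then
#    reuse IDs seen in earlier responses (a "response dictionary").
default_dict = {"name": ["widget", "x"], "id": ["1"]}
response_ids: list[str] = []

def render(op):
    values = {}
    for p in op["params"]:
        pool = response_ids if p == "id" and response_ids else default_dict[p]
        values[p] = random.choice(pool)
    return op["method"], op["path"].format(**values), values

# 3. Execution and feedback: a stub server that "crashes" (5xx) on a
#    malformed name, standing in for a real web application under test.
def send(method, path, values):
    if values.get("name") == "\x00":
        return 500, {}
    if method == "POST":
        return 201, {"id": str(len(response_ids) + 2)}
    return 200, {}

# 4. Mutation: occasionally corrupt a rendered value to exercise
#    the application's error-handling code.
def mutate(values):
    values[random.choice(list(values))] = "\x00"
    return values

random.seed(0)
bugs = []
for _ in range(20):
    for op in templates:
        method, path, values = render(op)
        if random.random() < 0.3:
            values = mutate(values)
        status, body = send(method, path, values)
        if status >= 500:              # crash oracle: 5xx means a bug
            bugs.append((method, path, values))
        if "id" in body:               # feed the response dictionary
            response_ids.append(body["id"])
print(f"found {len(bugs)} crashing requests")
```

The response dictionary is what lets later GET/DELETE requests target IDs the server actually generated, which is exactly the request-dependency problem the template-generation step exists to solve.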
The paper reviews how researchers evaluate their fuzzers, primarily using:

  • Public WUTs (43% of studies): Online services such as those listed on APIs.guru, Azure, GitLab, and GitHub. Challenges include their black-box nature and rate-limiting and authentication issues.
  • Self-developed benchmarks (39%): Open-source web applications (Java, PHP, Node.js, etc.), sometimes with manually injected bugs (e.g., EvoMaster Benchmark, PHP benchmarks with XSS).
  • Third-party benchmarks (18%): Known vulnerable test-bed applications like WebGoat, Gruyere, Nodegoat, DVWA.

Finally, the survey identifies several open challenges:

  • Effectiveness of Instrumentation: While beneficial for coverage, instrumentation introduces overhead and complexity, requiring recompilation/rebuilding, which can be slow in fast-paced development environments.
  • Complexity of Handling Microservice Architecture: Testing interactions between independent services is difficult, especially in identifying which specific service is responsible for a failure when testing the API gateway.
  • Difficulty of Testing Public WUTs: Lack of source-code access limits deep testing, and rate limits and authentication complexity hinder extensive campaigns. Testing large open-source WUTs is also difficult due to their complexity.
  • Low-Quality Corpus: Initial corpora derived solely from documentation might be incomplete, limiting the fuzzer's exploration. Minimizing the corpus to keep it powerful is a potential area for improvement.
  • Lack of Web API Fuzzing Benchmarks: Unlike binary fuzzing, there is no widely accepted, comprehensive benchmark with annotated bugs and standardized metrics for fair comparison of web API fuzzers.

Potential future research directions highlighted include extending fuzzing to web client programming (e.g., WebAssembly), mobile web applications, leveraging generative AI for input generation, and developing strategies to detect a wider variety of web security vulnerabilities beyond crashes and common injection/XSS flaws, including novel or complex bugs.

Authors (3)
  1. I Putu Arya Dharmaadi
  2. Elias Athanasopoulos
  3. Fatih Turkmen