
LlamaRestTest: Effective REST API Testing with Small Language Models (2501.08598v2)

Published 15 Jan 2025 in cs.SE and cs.AI

Abstract: Modern web services rely heavily on REST APIs, typically documented using the OpenAPI specification. The widespread adoption of this standard has resulted in the development of many black-box testing tools that generate tests based on OpenAPI specifications. Although LLMs have shown promising test-generation abilities, their application to REST API testing remains mostly unexplored. We present LlamaRestTest, a novel approach that employs two custom LLMs, created by fine-tuning and quantizing the Llama3-8B model using mined datasets of REST API example values and inter-parameter dependencies, to generate realistic test inputs and uncover inter-parameter dependencies during the testing process by analyzing server responses. We evaluated LlamaRestTest on 12 real-world services (including popular services such as Spotify), comparing it against RESTGPT, a GPT-powered specification-enhancement tool, as well as several state-of-the-art REST API testing tools, including RESTler, MoRest, EvoMaster, and ARAT-RL. Our results demonstrate that fine-tuning enables smaller models to outperform much larger models in detecting actionable parameter-dependency rules and generating valid inputs for REST API testing. We also evaluated different tool configurations, ranging from the base Llama3-8B model to fine-tuned versions, and explored multiple quantization techniques, including 2-bit, 4-bit, and 8-bit integer formats. Our study shows that small LLMs can perform as well as, or better than, larger LLMs in REST API testing, balancing effectiveness and efficiency. Furthermore, LlamaRestTest outperforms state-of-the-art REST API testing tools in code coverage achieved and internal server errors identified, even when those tools use RESTGPT-enhanced specifications.

Summary

  • The paper introduces a novel testing framework that leverages fine-tuned small language models based on Llama3-8B to dynamically generate test inputs from server feedback.
  • It demonstrates significant improvements over state-of-the-art tools by achieving up to 195.3% higher branch coverage and enhanced detection of internal server errors.
  • Empirical evaluation on 12 real-world APIs confirms the framework's efficiency and adaptability, as detailed through rigorous ablation studies.

LlamaRestTest: Effective REST API Testing with Small LLMs

The paper "LlamaRestTest: Effective REST API Testing with Small LLMs" introduces a novel testing framework that enhances REST API testing by employing fine-tuned small LLMs derived from the Llama3-8B model. It addresses a limitation of traditional REST API testing tools, which rely solely on static API specifications and fail to exploit the dynamic server feedback available during testing.

Summary

Modern REST APIs, often documented using OpenAPI specifications, enable web-service transactions over HTTP. REST APIs have become integral to software architecture because they allow clients and servers to interact without extensive prior knowledge of each other. The rapid growth of REST API deployments has spurred the development of black-box testing tools. These tools leverage the machine-readable parts of API specifications to automate testing aimed at detecting errors such as internal server errors (500 status codes).
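The black-box workflow described above can be sketched as follows. The spec fragment and parameter names are hypothetical, and real tools parse full OpenAPI documents, but the core loop is the same: derive a request from the machine-readable specification, then flag any 5xx response as a potential bug.

```python
# Minimal sketch of specification-driven black-box REST API testing.
# The spec fragment below is a hypothetical stand-in for a parsed
# OpenAPI document, not taken from the paper.

SPEC = {
    "/playlists": {
        "post": {
            "parameters": [
                {"name": "name", "required": True, "type": "string"},
                {"name": "public", "required": False, "type": "boolean"},
            ]
        }
    }
}

def generate_request(path, method, spec):
    """Build a naive test input: fill every parameter with a default value.
    (Tools like LlamaRestTest aim to replace these defaults with
    realistic, model-generated values.)"""
    defaults = {"string": "test", "boolean": True, "integer": 0}
    params = spec[path][method]["parameters"]
    return {p["name"]: defaults[p["type"]] for p in params}

def is_server_error(status_code):
    """Black-box tools flag 5xx responses as potential server-side bugs."""
    return 500 <= status_code < 600

req = generate_request("/playlists", "post", SPEC)
print(req)                   # {'name': 'test', 'public': True}
print(is_server_error(500))  # True
print(is_server_error(404))  # False
```

A real harness would send `req` to a live service and record coverage; the sketch stops at request construction to stay self-contained.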

The use of NLP and LLMs in enhancing REST API testing has evolved, primarily focusing on static specifications. However, these approaches haven't fully utilized the potential of iterative refinement from server feedback. To bridge this gap, LlamaRestTest is presented, which uses fine-tuned smaller LLMs to dynamically adjust test inputs based on server responses.

LlamaRestTest utilizes two core components: LlamaREST-IPD and LlamaREST-EX. These models were developed by fine-tuning the Llama3-8B model on mined datasets of REST API example values and inter-parameter constraints. This fine-tuning significantly improved the models' ability to detect actionable testing rules and to generate effective test inputs. The models are further compressed through quantization to improve their efficiency and make them deployable on a broader range of hardware.
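The division of labor between the two models might be illustrated as below. The model calls are stubbed with canned outputs purely for illustration; in the actual tool these would be inference calls to the fine-tuned Llama3-8B variants (LlamaREST-EX for realistic example values, LlamaREST-IPD for inter-parameter dependency rules mined from server responses), and the rule format shown is an assumption, not the paper's.

```python
# Illustrative sketch of LlamaRestTest's two model roles.
# Both functions are stand-ins with canned behavior, not real model calls.

def llamarest_ex(param_name, description):
    """Stand-in for LlamaREST-EX: propose a realistic example value
    for a parameter, given its name and spec description."""
    canned = {"country_code": "US", "currency": "USD"}
    return canned.get(param_name, "example")

def llamarest_ipd(error_message):
    """Stand-in for LlamaREST-IPD: extract an inter-parameter dependency
    rule from a server error message, or None if no rule is found.
    The (kind, param, depends_on) tuple format is hypothetical."""
    if " requires " in error_message:
        left, _, right = error_message.partition(" requires ")
        return ("IF_PRESENT", left.split()[-1], right.split()[0])
    return None

value = llamarest_ex("country_code", "ISO country code of the market")
rule = llamarest_ipd("parameter market requires country_code to be set")
print(value)  # US
print(rule)   # ('IF_PRESENT', 'market', 'country_code')
```

The key idea the sketch captures is that the dependency extractor runs during testing, on live server responses, rather than only on the static specification.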

Empirical Evaluation

LlamaRestTest's evaluation on 12 real-world APIs, including major services such as Spotify, showed that fine-tuned smaller LLMs can outperform much larger models and existing state-of-the-art tools such as RESTler, MoRest, EvoMaster, and ARAT-RL in terms of both code coverage and the number of internal server errors identified.

Key Findings

  1. Performance Over Traditional Tools: LlamaRestTest significantly outperformed state-of-the-art REST API testing tools on key metrics, including branch coverage (by up to 195.3%) and internal server error detection.
  2. Efficiency and Adaptability: LlamaRestTest achieves a balance between the precision typically associated with LLMs and the efficiency needed for real-world application. The quantized models showed similar performance to non-quantized versions with drastically reduced computational requirements, highlighting a pragmatic approach to deploying effective REST API testing frameworks.
  3. Ablation Studies: The paper further demonstrated through ablation studies that both LlamaREST-IPD and LlamaREST-EX components critically contribute to its superior performance by directly leveraging server feedback for improved testing efficacy.
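The efficiency gains from quantization noted in the second finding can be grounded with back-of-the-envelope arithmetic: for the 2-, 4-, and 8-bit integer formats the paper explores, the weight footprint of an 8B-parameter model shrinks roughly in proportion to the bit width (this estimate ignores activation memory and per-block scale overhead).

```python
# Approximate weight memory of an 8B-parameter model at different
# bit widths, relative to an fp16 baseline. Overheads are ignored.

PARAMS = 8e9  # Llama3-8B parameter count (approximate)

def weight_gib(bits_per_weight):
    """Bytes of weight storage, expressed in GiB."""
    return PARAMS * bits_per_weight / 8 / 2**30

for bits in (16, 8, 4, 2):  # fp16 baseline vs. quantized formats
    print(f"{bits:>2}-bit: ~{weight_gib(bits):.1f} GiB")
```

At 4 bits the weights fit in under 4 GiB, which is consistent with the paper's point that quantized small models become deployable on commodity hardware without a large accuracy penalty.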

Implications and Future Developments

The implications of LlamaRestTest extend both practically and theoretically. In practice, it could substantially enhance the robustness of web applications by identifying errors that arise only from dynamic interactions, which static testing approaches may miss. Theoretically, it provides a framework showing how dynamic server feedback can be systematically integrated with machine learning models to improve application testing outcomes.

Future research might focus on expanding the training datasets for LlamaREST-IPD and LlamaREST-EX to further refine their capabilities in complex API scenarios, particularly those involving more intricate inter-parameter dependencies. Additionally, integrating these models with other testing frameworks could position LlamaRestTest as a pivotal tool in the evolution of REST API testing.

In conclusion, LlamaRestTest offers a notable advancement in REST API testing by employing LLMs refined through domain-specific fine-tuning and enhanced by dynamic feedback integrations, achieving a marked improvement over existing state-of-the-art API testing solutions.
