Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer (2001.04107v2)

Published 13 Jan 2020 in cs.CR and cs.LG

Abstract: JavaScript (JS) engine vulnerabilities pose significant security threats affecting billions of web browsers. While fuzzing is a prevalent technique for finding such vulnerabilities, there have been few studies that leverage the recent advances in neural network language models (NNLMs). In this paper, we present Montage, the first NNLM-guided fuzzer for finding JS engine vulnerabilities. The key aspect of our technique is to transform a JS abstract syntax tree (AST) into a sequence of AST subtrees that can directly train prevailing NNLMs. We demonstrate that Montage is capable of generating valid JS tests, and show that it outperforms previous studies in terms of finding vulnerabilities. Montage found 37 real-world bugs, including three CVEs, in the latest JS engines, demonstrating its efficacy in finding JS engine bugs.

Authors (4)
  1. Suyoung Lee (13 papers)
  2. HyungSeok Han (3 papers)
  3. Sang Kil Cha (3 papers)
  4. Sooel Son (6 papers)
Citations (73)

Summary

Overview of "Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer"

The paper presents Montage, a pioneering fuzzer designed to identify vulnerabilities in JavaScript (JS) engines by leveraging advances in neural network language models (NNLMs). With billions of users relying on browsers that embed JS engines, these vulnerabilities pose significant security risks, underscoring the need for effective detection techniques. Montage introduces an innovative approach by using a language model to guide the fuzzing process, enhancing its ability to discover complex bugs within JS engines.

Key Contributions

The main contribution of Montage is the integration of NNLMs into the fuzzing process, specifically through the structural transformation of a JavaScript abstract syntax tree (AST) into sequences of AST subtrees referred to as fragments. This representation lets prevailing NNLMs be trained directly on JS code, which in turn improves the generation of valid JS test inputs capable of revealing security vulnerabilities. The system demonstrates substantial improvements over prior techniques, both in generating valid JS test cases and in the number of vulnerabilities detected.
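
To make the fragment representation concrete, the following is a minimal sketch in Python, assuming an ESTree-style AST such as the JSON that parsers like Esprima emit. The fragment encoding, field handling, and identifier normalization in Montage itself may differ, and the function names here are illustrative only.

```python
# Sketch: slice an ESTree-style AST (nested dicts) into depth-1 "fragments",
# i.e., each node together with the types of its immediate children.

def children(node):
    """Yield the AST child nodes of an ESTree-style dict node."""
    for value in node.values():
        if isinstance(value, dict) and "type" in value:
            yield value
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict) and "type" in item:
                    yield item

def fragmentize(node, out=None):
    """Pre-order traversal emitting one fragment per AST node."""
    if out is None:
        out = []
    out.append((node["type"], tuple(c["type"] for c in children(node))))
    for child in children(node):
        fragmentize(child, out)
    return out

# Tiny AST for `var x = 1;`, written out by hand in Esprima's JSON shape.
ast = {
    "type": "VariableDeclaration",
    "declarations": [{
        "type": "VariableDeclarator",
        "id": {"type": "Identifier", "name": "x"},
        "init": {"type": "Literal", "value": 1},
    }],
    "kind": "var",
}

print(fragmentize(ast))
# [('VariableDeclaration', ('VariableDeclarator',)),
#  ('VariableDeclarator', ('Identifier', 'Literal')),
#  ('Identifier', ()), ('Literal', ())]
```

Training on fragment sequences rather than raw token streams is what allows an off-the-shelf sequence model to consume tree-structured programs directly.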

Methodology

Montage operates through a three-phase workflow:

  1. Fragment Sequence Generation: JS files from regression test suites are parsed into ASTs. These ASTs are normalized and sliced into fragments, allowing NNLMs to learn the semantic relationships embedded within them.
  2. Training the Neural Model: An LSTM model is trained on these fragment sequences to predict a probability distribution over the next fragment, thereby learning syntactic and semantic patterns inherent in the JS test suites (a minimal sketch follows this list).
  3. Test Generation and Execution: Using the trained model, Montage generates new JS test cases by mutating existing seed tests: it replaces AST subtrees guided by the model's predictions and resolves reference errors before executing the result against a target JS engine.
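
To illustrate phases 2 and 3, here is a minimal PyTorch sketch of a next-fragment LSTM and a sampling routine that regrows a sequence fragment by fragment. It is a toy under stated assumptions: fragment IDs index a small stand-in vocabulary, the training data is random, and the class and function names (FragmentLM, extend) are hypothetical; Montage's actual model sizes, vocabulary handling, subtree selection, and reference-error resolution are not shown.

```python
import torch
import torch.nn as nn

class FragmentLM(nn.Module):
    """Toy LSTM that predicts the next fragment ID from the preceding ones."""
    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, seq):                      # seq: (batch, time)
        hidden, _ = self.lstm(self.embed(seq))   # (batch, time, hidden)
        return self.proj(hidden)                 # logits over the next fragment

vocab_size = 10                                  # stand-in fragment vocabulary
model = FragmentLM(vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Phase 2: next-fragment prediction on fragment-ID sequences (random here).
sequences = torch.randint(0, vocab_size, (8, 12))
for _ in range(5):
    inputs, targets = sequences[:, :-1], sequences[:, 1:]
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Phase 3 (sketch): after removing a subtree from a seed test, regrow it by
# sampling one fragment at a time from the model's predicted distribution.
def extend(prefix, steps=5, temperature=1.0):
    seq = list(prefix)
    for _ in range(steps):
        logits = model(torch.tensor([seq]))[0, -1] / temperature
        next_id = torch.multinomial(torch.softmax(logits, dim=-1), 1).item()
        seq.append(next_id)
    return seq

print(extend([1, 2, 3]))                         # e.g. [1, 2, 3, 7, 0, 4, 9, 2]
```

In the full pipeline the regrown fragments are assembled back into an AST, reference errors are resolved, and the resulting JS file is executed against the target engine.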

Key Findings

Montage's efficacy is demonstrated by its ability to uncover 133 bugs, including 15 security vulnerabilities, across multiple trials. The fuzzer outperformed several existing techniques in discovering unique crashes and uncovering CVEs in JS engines such as ChakraCore. Furthermore, the NNLM-guided configuration achieved notable improvements over variants without such neural guidance, underscoring the advantage of employing a structured language model in fuzzing.

Implications and Future Directions

Montage's approach of leveraging trained language models sets a precedent for integrating machine learning techniques into security testing, particularly for structured code environments such as JS engines. The practical implications are significant: it offers developers and vendors better tools for detecting vulnerabilities that could otherwise be exploited in drive-by download attacks.

Theoretically, this work opens pathways for further exploration in utilizing deep learning models for other structured data analysis tasks in software security beyond JS engines. Future research could expand on refining NNLM architectures to enhance predictive accuracy and efficiency, potentially employing more complex neural structures or hybrid models combining rule-based approaches.

In conclusion, Montage serves as a critical step forward in fuzz testing methodology, showcasing the potential of NNLMs in uncovering security vulnerabilities in complex software environments like JavaScript engines. The implications of this research highlight not only advancements in security measures but also the broader application of AI methodologies to practical security challenges.
