
MULVULN: Multilingual Vulnerability Detection

Updated 12 October 2025
  • MULVULN is a multilingual vulnerability detection framework that integrates shared semantic learning with language-dedicated modules to capture both common and unique code patterns.
  • The architecture employs a dual mechanism combining query-key matching and language-aware parameter masking, achieving a test F1-score of ~72.20% and high Recall across several programming languages.
  • By leveraging a CodeT5 backbone and adaptive parameter pooling, MULVULN facilitates practical integration into CI/CD pipelines for automated vulnerability detection in heterogeneous codebases.

MULVULN is a multilingual vulnerability detection framework that augments pre-trained language models (PLMs) by combining knowledge shared across programming languages with language-specific information, improving software vulnerability detection in heterogeneous, polyglot codebases. It achieves strong detection capability by structuring its architecture to extract common semantic and syntactic patterns via a deep backbone (such as the encoder of CodeT5), while adapting to unique coding conventions through language-dedicated parameter modules. This section synthesizes MULVULN in terms of architecture, training mechanisms, evaluation, empirical results, practical implications, and limitations, based on the implementation and empirical details reported in the paper.

1. Architectural Design: Shared and Language-Specific Knowledge Integration

MULVULN leverages a dual-pronged modeling approach. The core is a pre-trained language model (such as CodeT5) that captures common cross-lingual code regularities, including broad syntactic and semantic cues shared across modern programming languages. To address the complementary challenge of language-specific patterns, MULVULN adds a pool of parameter matrices, each associated with distinct language traits (e.g., Pythonic idioms vs. C-style pointer manipulation).

Integration is realized through two primary mechanisms:

  • Parameter Selection via Key–Parameter Query: For each code sample $X$, the PLM’s embedding layer extracts a query vector $q(X)$ (for example, the [CLS] token embedding). This vector is then compared to a set of learnable key vectors $\{k_1, k_2, \ldots, k_S\}$ (each linked to distinct parameter matrices $P_1, \ldots, P_S$) using a similarity function $\varphi$, such as cosine similarity:

$$i^* = \arg\max_i \varphi(q(X), k_i)$$

where $i^*$ determines the selected parameter matrix $P_X$ for this instance. The adapted input is realized as

$$X_p = \text{concat}(P_X, X_e)$$

where $X_e$ is the original token embedding of $X$. This concatenation is then processed by the multi-head attention layers of the PLM backbone.

  • Language-Aware Parameter Masking: In supervised settings, MULVULN constrains the query matching to parameter indices that correspond to the known language of a given input, ensuring language-consistent adaptation during training while permitting dynamic selection during inference.

The combined effect is to enable robust transfer learning: common aspects of vulnerability are learned by the backbone, while subtleties unique to individual languages are captured by the added matrix pool.
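The query-key selection, language-aware masking, and concatenation steps above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the dimensions, the random initialization, and the `select_and_adapt` helper are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

S, p, d = 4, 2, 8                       # pool size, prompt length, embedding dim (illustrative)
keys = rng.normal(size=(S, d))          # learnable key vectors k_1..k_S
pool = rng.normal(size=(S, p, d))       # language-dedicated parameter matrices P_1..P_S

def cosine(a, b):
    """Cosine similarity, standing in for the similarity function phi."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def select_and_adapt(q, X_e, allowed=None):
    """Pick the matrix whose key best matches the query, then prepend it.

    `allowed` restricts candidate indices to the known language of the input
    (language-aware masking during training); None means unrestricted
    selection, as at inference time.
    """
    candidates = range(S) if allowed is None else allowed
    i_star = max(candidates, key=lambda i: cosine(q, keys[i]))
    X_p = np.concatenate([pool[i_star], X_e], axis=0)  # concat(P_X, X_e)
    return i_star, X_p

q = rng.normal(size=d)                  # query, e.g. the [CLS] embedding of X
X_e = rng.normal(size=(5, d))           # token embeddings of X

i_free, X_p = select_and_adapt(q, X_e)              # unrestricted selection
i_masked, _ = select_and_adapt(q, X_e, allowed=[1, 3])  # masked to known-language indices
```

The adapted sequence `X_p` (here 2 prompt rows plus 5 token rows) is what would then flow into the PLM's multi-head attention layers.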

2. Training Protocol and Objective Functions

MULVULN is trained end-to-end using a joint objective that explicitly encourages correct vulnerability classification as well as effective parameter selection:

$$\min_\Theta L = L_{CE}(g(f_{\text{mha}}(X_p)), Y) - \lambda\, \varphi(q(X), k_{i^*})$$

where $L_{CE}$ denotes the standard cross-entropy classification loss, $g(\cdot)$ is a classifier head, $f_{\text{mha}}$ is the multi-head attention module operating on the adapted embedding $X_p$, and $\lambda$ balances the main loss against a surrogate term that encourages queries to stay close to their selected language keys. Here, $\Theta$ encompasses all model parameters, including the PLM, the dedicated parameter pool, and the classifier.

During mini-batch processing (as outlined in Algorithm 1 of the paper), the parameter pool, language keys, and classifier head are all updated. The workflow proceeds by selecting the appropriate matrix from the pool, concatenating with the token embeddings, and forwarding through the PLM and classifier for prediction.
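The joint objective can be illustrated numerically. The toy logits, vectors, and $\lambda$ value below are assumptions for the example, not values from the paper; the point is only that better query-key alignment lowers the total loss alongside the classification term.

```python
import numpy as np

def cross_entropy(logits, y):
    """Standard cross-entropy loss for one example with integer label y."""
    z = logits - logits.max()                  # stabilize before exponentiating
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[y]

def cosine(a, b):
    """Cosine similarity, standing in for phi(q(X), k_{i*})."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def joint_loss(logits, y, q, k_star, lam=0.1):
    """L = L_CE(g(f_mha(X_p)), Y) - lambda * phi(q(X), k_{i*})."""
    return cross_entropy(logits, y) - lam * cosine(q, k_star)

logits = np.array([0.2, 1.5])   # classifier output for classes {safe, vulnerable}
q = np.array([1.0, 0.0, 0.0])   # query vector for the input
k = np.array([0.8, 0.6, 0.0])   # selected key; higher alignment lowers the loss
loss = joint_loss(logits, y=1, q=q, k_star=k)
```

Because the similarity term enters with a negative sign, gradient descent simultaneously improves classification and pulls each query toward its selected language key, which is what drives the parameter selection to become language-consistent over training.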

3. Dataset and Empirical Evaluation

The REEF dataset is used for empirical validation. This dataset comprises:

  • 4,466 unique CVEs
  • 30,987 code patches
  • 7 programming languages: C, C++, C#, Go, Java, JavaScript, and Python
  • After data curation (including restricting function length to match PLM tokenization constraints), 20,165 labeled functions are available (16,126 train / 2,013 val / 2,026 test).

MULVULN is evaluated against 13 state-of-the-art baselines (spanning classical and LLM-based models). The most effective instantiation uses the Language-Aware Parameter Masking mechanism and achieves a test F1-score of approximately 72.20%, with relative improvements between 1.45% and 23.59% over all baselines. The ablation results indicate that integrating the parameter pool notably improves both Recall (up to 96–100% for major languages) and F1-score relative to the PLM backbone alone.

4. Performance Metrics and Analysis

The metrics adopted for performance assessment are Precision, Recall, and their harmonic mean, the F1-score. In vulnerability detection, Recall is particularly important given the cost of missed vulnerabilities, while high Precision is also needed to limit false positives.

Metric     Achieved (Language-Aware Masking)   Improvement over Baselines
F1-score   ~72.20%                             +1.45% to +23.59%
Recall     96–100% (varies by language)        Exceeds all baselines

The empirical studies corroborate that language-specific parameterization is crucial, especially for languages with substantial code in the dataset; where training data is limited (for example, C#), query-key alignment is weaker, suggesting the need for adaptive strategies.

5. Applications and Utility in Modern Development

MULVULN is directly applicable to security assessment and automated vulnerability detection in polyglot (multi-language) codebases. By capturing both shared semantics and language idiosyncrasies, the system provides reliable detection across diverse code repositories—a major advancement for modern DevSecOps practices. This enables:

  • Cross-language vulnerability detection without building and maintaining separate models for each language.
  • Automated identification and triage of vulnerabilities in complex projects mixing, for example, C, Java, and Python subcomponents.
  • Integration into CI/CD pipelines for regular multilingual security checks.
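One hypothetical way such a CI step could be wired is sketched below. The `detect` function is a placeholder standing in for inference with a trained MULVULN model, and the extension-to-language map and `scan` helper are illustrative assumptions, not an API the paper provides.

```python
from pathlib import Path

# Map file extensions to the seven languages covered by the REEF dataset.
EXT_TO_LANG = {".c": "C", ".cpp": "C++", ".cs": "C#", ".go": "Go",
               ".java": "Java", ".js": "JavaScript", ".py": "Python"}

def detect(code: str, language: str) -> bool:
    """Placeholder for MULVULN inference; always reports 'not vulnerable' here.

    In a real pipeline this would embed the function, select the matching
    language-dedicated parameters, and run the classifier head.
    """
    return False

def scan(paths):
    """Scan changed files and collect (path, language) pairs flagged as vulnerable."""
    findings = []
    for p in map(Path, paths):
        lang = EXT_TO_LANG.get(p.suffix)
        if lang is None:
            continue  # unsupported language: skip rather than guess
        if detect(p.read_text(errors="ignore"), lang):
            findings.append((str(p), lang))
    return findings
```

A CI job would call `scan` on the files touched by a commit and fail the build when `findings` is non-empty, giving one multilingual check instead of a per-language toolchain.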

6. Limitations and Future Directions

Several challenges remain:

  • For underrepresented languages (with fewer samples, such as C#), query-parameter alignment is less robust; visualization indicates greater distance between queries and their respective language keys.
  • Multi-parameter selection improves Recall, but may degrade Precision and F1-score due to redundancy or representation overlap—indicating a trade-off between coverage and overfitting.
  • Hyperparameters, particularly the parameter pool size $L_p$, require careful tuning; the paper finds that an intermediate value (e.g., $L_p = 5$) yields the best results.

Future enhancements proposed include:

  • Adaptive parameter selection schemes that vary number and strength of language-specific components by language data size and complexity.
  • Regularization or gating mechanisms on the pool to control information flow and prevent over-parameterization.
  • Exploration of broader multi-task or continual learning paradigms, targeting further extension as new languages or domains are added to the codebase.

7. Significance in the Context of Multilingual Vulnerability Detection

MULVULN represents a concrete advancement in the field of vulnerability detection for multilingual, real-world code. By designing a parameterized adapter mechanism over a deep pre-trained LLM, the approach bridges the gap between generalization and specialization: enabling broad pattern learning while remaining sensitive to the subtleties of individual programming languages.

Its empirical superiority on the REEF benchmark, robust Recall, and the interpretability of its query-key mechanism render it well suited for deployment in industry-scale heterogeneous software environments. MULVULN thus provides essential methodology for tackling the pervasive challenge of multilingual vulnerability detection in safety-critical and large-scale software systems (Nguyen et al., 5 Oct 2025).
