Security Bug Reports (SBRs)

Updated 6 February 2026

Security Bug Reports (SBRs) are issue-tracker entries that document vulnerabilities affecting confidentiality, integrity, or availability.
Researchers identify and categorize them with techniques ranging from classic machine learning to Transformers, supporting automated detection and triage in both software and hardware projects.
Integrated workflows and tooling aim to speed resolution and mitigation, but must contend with data imbalance, mislabeling, and adversarial exploits.

A Security Bug Report (SBR) is an issue-tracker entry that documents a defect or vulnerability with direct or potential impact on a system's confidentiality, integrity, or availability. SBRs play a crucial role in vulnerability management for both software and hardware, supporting identification, triage, and coordinated remediation of security failures across the software development lifecycle. This article surveys the current understanding of SBRs, including core definitions, categorization, data-driven characteristics, automated identification methodologies, practical workflows, and persistent challenges in both open-source and industrial contexts.

1. Definitions, Taxonomies, and Distinctions

An SBR is distinguished from a generic bug report by its explicit or implicit association with a security impact on the confidentiality, integrity, and availability (CIA) triad. Formal criteria vary by context:

Manual triage: Human analysis of an issue and its associated fix determines whether a bug has CIA impact. In OpenTitan RTL, for example, 52.9% of sampled design bugs were labeled security-relevant, meaning their manifestation could compromise confidentiality, integrity, or availability; all others were classified as functional (Ah-kiow et al., 2024).
Security labeling: Some studies identify SBRs by the presence of explicit "security" tags, CVE references, or labels in issue trackers (Bühlmann et al., 2021, Nakano et al., 2020).
Regex-based tagging: Automated schemes sometimes treat issues referencing a CVE identifier as SBRs, independent of whether the vulnerability lies in the project or a third-party dependency (Nakano et al., 2020).
Hardware-specific SBRs: RTL design errors in hardware, if they directly affect security properties under the project's threat model, are classified as security bugs (Ah-kiow et al., 2024).
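The regex-based tagging criterion above can be sketched in a few lines. This is a minimal illustration, not the tagging scheme of any cited study: the function names find_cve_ids and is_candidate_sbr are hypothetical, and the pattern assumes the public CVE identifier syntax (CVE, a four-digit year, then four or more digits).

```python
import re

# CVE identifiers have the form CVE-<4-digit year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def find_cve_ids(issue_text: str) -> list[str]:
    """Return distinct CVE identifiers in order of first appearance, upper-cased."""
    seen: list[str] = []
    for match in CVE_PATTERN.findall(issue_text):
        cve_id = match.upper()
        if cve_id not in seen:
            seen.append(cve_id)
    return seen


def is_candidate_sbr(issue_text: str) -> bool:
    """Hypothetical rule: any CVE reference marks the issue as a candidate SBR."""
    return bool(find_cve_ids(issue_text))
```

Because the rule only looks for an identifier, it flags an issue even when the CVE belongs to a third-party dependency, which is the behavior described above.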

Major categories include:

Confidentiality: Bugs leading to information leakage.
Integrity: Flaws granting unauthorized manipulation.
Availability: Defects causing denial-of-service or stability breakdown.

2. Empirical Characteristics Across Domains

Quantitative Distribution

SBRs are relatively rare in large-scale software trackers, but far more common among sampled hardware design bugs:

On GitHub, only 1.4% of issues carried "security" labels across 182 Java repositories (Bühlmann et al., 2021).
In hardware (OpenTitan), over half of the sampled RTL bugs (90 of 170) had potential security implications, far exceeding rates observed in software (Ah-kiow et al., 2024).

Impact, Localization, and Fix Patterns

Impact across the CIA triad: A single security bug may affect more than one CIA property, so the shares overlap and sum to more than 100%. In hardware RTL security bugs, 37.8% compromised confidentiality, 52.2% affected integrity, and 47.8% harmed availability (Ah-kiow et al., 2024).
Module distribution: SBRs disproportionately affect components with cryptographic or memory management responsibility (cryptography blocks accounted for 45.6% of security bugs in OpenTitan) (Ah-kiow et al., 2024).
Fix size and locality: In hardware, 55.3% of bug fixes modified a single file, and 61% changed 30 lines or fewer; software fixes for SBRs may be similarly localized (Ah-kiow et al., 2024). Software SBRs referencing CVEs were often resolved rapidly, with a median fix time of 0–1 days for the Version Update and Fixing Code categories (Nakano et al., 2020).

3. SBR Detection and Classification Methodologies

Classic Machine Learning and Feature Engineering

Preprocessing: Tokenization, stop-word removal, stemming, and TF–IDF feature construction are standard (Sawadogo et al., 2021).
Classifiers: Ensemble methods (Random Forest, XGBoost) trained on TF–IDF features yield high recall and F1 (e.g., RF: recall 0.960, F1 0.974) with the full text of bug reports (Sawadogo et al., 2021).
Imbalance handling: SMOTE-based oversampling with careful hyperparameter optimization (SMOTUNED) greatly increases recall (by 35–65% absolute) with tolerable increases in false positives (Shu et al., 2019).
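The core idea behind SMOTE-style oversampling is to create synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows only that interpolation step in plain Python; it is not SMOTUNED and performs no tuning. The function name smote_like_oversample and the defaults for k, n_new, and seed are assumptions chosen for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic vectors by interpolating a randomly chosen
    minority vector toward one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority vectors by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy feature vectors standing in for TF-IDF rows of SBRs (made-up values).
sbr_vectors = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]]
new_points = smote_like_oversample(sbr_vectors, n_new=4)
```

Each synthetic point lies on the line segment between two real minority samples. Settings such as k and the number of synthetic samples are the kind of hyperparameters a tuned variant would search over.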

Deep Learning, Transformers, and Prompting

BERT-based models: Transformers (BERT, DistilBERT) fine-tuned on labeled issue text outperform Random Forest in cross-project scenarios and achieve high recall when sufficient labeled data is available (Soltaniani et al., 28 Apr 2025).
Semantic surrogate masking: SeBERTis, a BERT variant trained to mask class-specific surrogate keywords, improves generalization and achieves in-distribution F1 ≈ 0.99, outperforming both ML and LLM baselines. Out-of-distribution F1 remains competitive (≈0.68) (Masoumzadeh et al., 17 Dec 2025).
Few-shot learning: SetFit, leveraging parameter-efficient transformer fine-tuning, identifies SBRs with AUC up to 0.865 using as few as 5–20 labeled samples per class (Laiq, 6 Jan 2026).
Prompt-based approaches: Prompted proprietary LLMs (GPT-4.1, Gemini-2.5) can reach a G-measure of up to 0.77 and recall of 0.74, but produce many false positives (precision 0.22). Fine-tuned local models reach precision up to 0.75, though with reduced recall (Soltaniani et al., 30 Jan 2026).
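The G-measure quoted above is commonly defined in the SBR-prediction literature as the harmonic mean of recall and specificity (one minus the false positive rate). The sketch below assumes that definition, which the cited papers may or may not share. It plugs in the recall of 0.74 together with the false positive rate of about 0.18 listed for the prompted LLM in the comparative summary table. The confusion-matrix counts are hypothetical (1,000 reports, 50 of them SBRs) and only illustrate how a moderate false positive rate becomes low precision when SBRs are rare.

```python
def g_measure(recall: float, false_positive_rate: float) -> float:
    """Harmonic mean of recall and specificity (1 - false positive rate)."""
    specificity = 1.0 - false_positive_rate
    if recall + specificity == 0:
        return 0.0
    return 2 * recall * specificity / (recall + specificity)


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of flagged reports that really are SBRs."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


# Recall 0.74 and FPR 0.18 give a G-measure of about 0.78.
g = g_measure(0.74, 0.18)

# Hypothetical tracker: 50 SBRs and 950 non-security reports.
tp = round(0.74 * 50)    # 37 SBRs correctly flagged
fp = round(0.18 * 950)   # 171 non-security reports wrongly flagged
p = precision(tp, fp)    # about 0.18
```

The computed G-measure is close to the reported 0.77, although the reported figures may come from different configurations. Precision in the toy example depends entirely on the assumed 5% SBR prevalence.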

Data Augmentation and Generative Models

CVAE-based oversampling (SEDAC): Combines DistilBERT embeddings with conditional variational autoencoders to generate synthetic SBR feature vectors, fully rebalancing the security and non-security classes and achieving a 14.24–50.10% improvement in g-measure over prior baselines (Liao et al., 2024).

Secret and Adversarial SBRs

Secret disclosure detection: LM-based models (RoBERTa) trained with high-entropy regular expression prefiltering and local context achieve 0.635 F1 on a heavily imbalanced secret breach benchmark, compared to 0.0341 for regex-only baselines (Wahab et al., 2024).
Adversarial bug reports: SBRs crafted to mislead automated program repair (APR) systems expose a major attack vector; 90% attack success rates were observed on state-of-the-art APR pipelines, with current pre-filtering and post-repair detection insufficient to block most crafted SBRs (Przymus et al., 4 Sep 2025).
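The prefiltering step mentioned for secret disclosure detection can be approximated by combining a candidate regular expression with a Shannon-entropy threshold. The sketch below is an assumption about how such a prefilter might look, not the pipeline of Wahab et al.: the CANDIDATE pattern, the 20-character minimum, and the 3.5 bits-per-character threshold are all hypothetical, and the downstream language model that classifies the surviving candidates is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical candidate pattern: long runs of characters common in keys and tokens.
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(token: str) -> float:
    """Shannon entropy of the token in bits per character."""
    counts = Counter(token)
    total = len(token)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def secret_candidates(text: str, min_entropy: float = 3.5) -> list[str]:
    """Return regex matches whose character entropy meets the threshold."""
    return [t for t in CANDIDATE.findall(text) if shannon_entropy(t) >= min_entropy]


# Made-up issue text: one random-looking token and one low-entropy run.
report = "token: q9X2mLp7Zr4TvB8nKd3WsY6h padding aaaaaaaaaaaaaaaaaaaaaaaa"
flagged = secret_candidates(report)
```

The entropy check discards long but repetitive strings that the regex alone would match, which is one reason a regex-only baseline tends to over-flag.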

4. SBR Management Workflows and Developer Practices

Reporting and Labeling

Label assignment: Security tag or CVE reference usage is inconsistent; mislabeling remains a key source of disclosure risk (Bühlmann et al., 2021, Nakano et al., 2020).
Reproducibility and mitigation suggestions: Only 14.7% of SBRs in manual samples contained reproduction artifacts, and prototype fixes were seldom proposed in initial reports (0.6%) (Bühlmann et al., 2021).

Triaging and Resolution

Reaction and resolution time: A prompt first response correlates strongly with shorter resolution time (r = 0.637, p < 0.001) (Bühlmann et al., 2021). Issues with a CVE reference in the issue or discussion show less than half the mean resolution time of those without (52.7 vs. 113.2 days) (Bühlmann et al., 2021).
Assignment and involvement: SBRs are often assigned to a small set of core developers, with "security assignees" resolving issues faster and at higher acceptance rates (Bühlmann et al., 2021).
Automation: SBR classifiers can be integrated into CI/CD and issue-tracker workflows to flag likely SBRs for manual review, triage, or embargoed handling (Sawadogo et al., 2021).
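The automation step above amounts to routing each incoming issue according to a classifier score. The following sketch shows one way such a rule could look. Everything in it is hypothetical: the Issue fields, the queue names, and both thresholds are illustrative choices, and the score is assumed to come from an upstream SBR classifier that is not shown.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    issue_id: int
    title: str
    sbr_score: float  # assumed classifier probability in [0, 1]


def route_issue(
    issue: Issue,
    review_threshold: float = 0.5,
    embargo_threshold: float = 0.9,
) -> str:
    """Map a classifier score to a hypothetical triage queue."""
    if issue.sbr_score >= embargo_threshold:
        return "embargoed-review"   # restrict visibility until a human confirms
    if issue.sbr_score >= review_threshold:
        return "security-review"    # flag for manual security triage
    return "standard-triage"        # handle as an ordinary bug report


queue = route_issue(Issue(101, "Crash when parsing long header", 0.93))
```

Lowering review_threshold trades more manual review for fewer missed SBRs, which matters when the classifier has limited precision.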

Privacy and Disclosure Risks

Open-source risks: Even when SBRs are withheld from public trackers, the metadata of landed patches (author, directory, etc.) can leak enough signal for attackers to identify security updates and exploit the pre-disclosure "window of vulnerability" (Barth et al., 2011).
Countermeasures: The only robust method is to keep security fixes secret (in private branches) until coordinated release; metadata obfuscation alone achieves limited benefit (Barth et al., 2011).

5. Tooling, Frameworks, and Real-world Deployment

Production deployment: SBMBot, a browser extension for secret detection, performs real-time analysis of issue descriptions using an LLM backend and proactively warns users of secret disclosures before submission (Wahab et al., 2024).
SBR classification frameworks: Reusable pipelines built on BERT, Random Forest, or SetFit combined with project-specific pre-processing and data augmentation modules enable integration into bug-tracker or CI systems for continuous SBR surveillance (Masoumzadeh et al., 17 Dec 2025, Liao et al., 2024, Soltaniani et al., 28 Apr 2025).
Automated attack simulation: APR-Sec enables generation of adversarial SBRs to red-team APR pipelines, supporting continuous evaluation of repair robustness and detection strategies (Przymus et al., 4 Sep 2025).

6. Limitations, Challenges, and Future Directions

Data imbalance: Minority-class SBRs remain rare; synthetic oversampling, contrastive and semi-supervised learning, and domain-adapted data augmentation are critical for robust classifier performance (Liao et al., 2024, Laiq, 6 Jan 2026).
Overfitting and shortcut learning: Classifiers risk memorizing lexical patterns (e.g., "vulnerability") rather than learning deeper semantic or contextual features unless architectural/augmentation interventions like semantic surrogate masking are applied (Masoumzadeh et al., 17 Dec 2025).
Domain shift and generalization: Models trained on GitHub or open-source data may not generalize to proprietary or industrial issue trackers; periodic adaptation and retraining are necessary (Laiq, 6 Jan 2026).
Explainability: Most high-performing approaches do not provide model explanations or feature attributions suitable for audited workflows (Laiq, 6 Jan 2026).
Adversarial resilience: LLM-based APR systems remain highly vulnerable to adversarial SBRs, with current filtering and detection insufficient. Layered defenses and continuous adversarial evaluation are recommended (Przymus et al., 4 Sep 2025).
Open research questions: Cross-project transfer, explainable SBR detection, active learning for annotation reduction, multi-modal integration (metadata, stack traces), and semantic validation workflows constitute dominant directions (Masoumzadeh et al., 17 Dec 2025, Przymus et al., 4 Sep 2025, Soltaniani et al., 30 Jan 2026).

7. Comparative Summary Table

The table below lists representative SBR detection approaches with their reported best metric, domain, and primary limitation.

Approach	Best Metric (F1/G/AUC)	Domain	Primary Limitation
RF/TF–IDF (Sawadogo et al., 2021)	F1 ≈ 0.97	Software	May overfit lexical features
BERT-based (Soltaniani et al., 28 Apr 2025)	G ≈ 0.66–0.83	Software	Degrades in low-SBR projects
SEDAC (Liao et al., 2024)	G ≈ 0.99	Software	Needs DistilBERT/CVAE integration
SeBERTis (Masoumzadeh et al., 17 Dec 2025)	In-dist. F1 ≈ 0.99	Software	Manual surrogate curation
SetFit (Laiq, 6 Jan 2026)	AUC ≈ 0.87	Software	Limited explainability
RoBERTa/LSTM (Wahab et al., 2024)	F1 ≈ 0.63	Secrets	Minority-class scarcity
Prompted LLM (Soltaniani et al., 30 Jan 2026)	G ≈ 0.77	Software	High false positives (FPR ≈ 0.18)

A plausible implication is that the highest in-distribution performance can be achieved with deep masking-aware Transformers, but robust handling of class imbalance and domain shift remains challenging in practice.

Security Bug Reports form a critical control surface in modern vulnerability management, requiring rigorous, context-dependent definition, multidimensional feature extraction, high-precision workflow integration, and resilience against adversarial exploitation. Current research demonstrates both the potential and complexity of automating SBR detection, labeling, triage, and risk mitigation, and suggests that future advances must address data quality, model transparency, adversarial robustness, and practical operationalization across the software and hardware security landscape.