SysPro: Automated Concurrency Bug Reproduction

Updated 16 January 2026

SysPro is a fully automated framework that reproduces system-level concurrency bugs in Linux C/C++ utilities from natural-language bug reports by orchestrating precise system-call scheduling.
It integrates system-call extraction, input-data generation via a Test Specification Language, and dynamic instrumentation with Intel PIN to ensure deterministic bug manifestation.
Empirical benchmarks demonstrate SysPro’s robustness and efficiency: it outperforms baseline methods and reproduces 95.8% of bugs across diverse real-world Linux utilities.

SysPro is a fully automated framework designed to reproduce system-level concurrency bugs from natural-language bug reports. It targets the challenging problem of creating minimal, precise test cases, comprising both input data and a schedule of system calls, that manifest non-deterministic bugs in Linux C/C++ utilities. By extracting, localizing, and orchestrating the system-call interleavings referenced (even implicitly) in bug reports, SysPro addresses a limitation of prior tools, which cannot control interleavings at the required granularity. Empirical evidence indicates SysPro achieves a 95.8% bug reproduction rate on a diverse set of real-world concurrency bugs (Zaman et al., 14 Jan 2026).

1. Architectural Overview

SysPro’s architecture consists of three principal modules (Figure 1, p. 12):

System-Call Extraction Module: Identifies the system calls implicated in the bug, using both direct keyword matching and TF-IDF-based information retrieval (IR) over man-page NAME sections to handle reports that reference syscalls only implicitly or indirectly.
Input-Data Generation Module: Produces concrete commands, options, and test data using regular-expression matching for explicit values; otherwise, it employs the category-partition method with a concise Test Specification Language (TSL) to systematically assemble plausible inputs.
Dynamic Instrumentation & Scheduler Module: Instruments ranked syscall locations using the Intel PIN tool, inserting small controlled delays before or after syscalls to force the interleaving required for bug manifestation. The scheduler systematically explores the ranked syscall locations within a configurable run/time budget (up to 100 runs or 120 minutes).

These modules interact to convert unstructured bug reports and source code into a prioritized, orchestrated sequence of test executions likely to elicit the target concurrency bug.

2. Algorithms and Core Workflows

The core workflow proceeds as follows (see Figure 2, p. 12).

2.1 Extracting and Locating System Calls

KeySysCalls Extraction: Matches bug report terms to a canonical syscall list (SysCallM). If no direct match, computes similarity between the bug report and each syscall’s man-page mini-document using TF-IDF (Figure 3, p. 13). Top-n matches (n=10) form the working set.
Structured IR-Based Code Search: Converts all source files to XML with srcML, then performs structured IR by mapping queries comprising the report's subject, body, and KeySysCalls onto code fields (FileNames, FunctionNames, VariableNames, Code+Comments). For each (query, field) pair, TF-IDF similarity is computed, and the mean across all pairs, $S_{\textrm{avg}}$, yields a ranked file list.
Apriori Mining of Call Pairs: Applies the Apriori algorithm to report sentences to detect frequent syscall pairs, ranking them for likely buggy interleaving.
Location Search: For each syscall pair (or singleton) in the ranked list, systematically scans the top-K files for code locations (file, function, line) where the calls may interleave, and aggregates the resulting locations.
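The retrieval steps above can be sketched in Python with only the standard library. This is a minimal illustration under stated assumptions, not SysPro's implementation: the tokenizer, the smoothed IDF formula, the use of cosine similarity over TF-IDF vectors, and the function names `key_syscalls`, `rank_files`, and `frequent_pairs` are all hypothetical. For the Apriori step, each report sentence is assumed to be one transaction whose items are the syscall names it mentions.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase word tokens (letters, digits, underscore)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_vectors(docs):
    """One sparse {term: tf * idf} vector per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def key_syscalls(report, man_names, n=10):
    """Direct name matches first; otherwise the top-n syscalls ranked by TF-IDF
    similarity between the report and each syscall's man-page NAME text."""
    words = set(tokenize(report))
    direct = [s for s in man_names if s in words]
    if direct:
        return direct[:n]
    names = list(man_names)
    vecs = tfidf_vectors([report] + [man_names[s] for s in names])
    score = {s: cosine(vecs[0], vecs[i + 1]) for i, s in enumerate(names)}
    return sorted(names, key=score.get, reverse=True)[:n]


def rank_files(queries, files):
    """Rank files by S_avg, the mean similarity over every (query, field) pair.
    files maps a file name to its field texts (e.g. FileNames, FunctionNames,
    VariableNames, Code+Comments)."""
    texts = list(queries)
    spans = {}
    for name, fields in files.items():
        spans[name] = range(len(texts), len(texts) + len(fields))
        texts.extend(fields)
    vecs = tfidf_vectors(texts)  # IDF shared across all queries and fields
    s_avg = {}
    for name, span in spans.items():
        sims = [cosine(vecs[q], vecs[f]) for q in range(len(queries)) for f in span]
        s_avg[name] = sum(sims) / len(sims) if sims else 0.0
    return sorted(s_avg, key=s_avg.get, reverse=True)


def frequent_pairs(sentences, syscalls, min_support=2):
    """Two-level Apriori: frequent single syscalls, then frequent pairs built
    only from frequent singles, ranked by support (ties broken by name)."""
    known = set(syscalls)
    transactions = [set(tokenize(s)) & known for s in sentences]
    single = Counter(i for t in transactions for i in t)
    frequent = {i for i, c in single.items() if c >= min_support}
    pairs = Counter()
    for t in transactions:
        # Apriori property: a pair can be frequent only if both members are.
        for pair in combinations(sorted(t & frequent), 2):
            pairs[pair] += 1
    kept = [(p, c) for p, c in pairs.items() if c >= min_support]
    return sorted(kept, key=lambda pc: (-pc[1], pc[0]))
```

In use, `man_names` would map each syscall to its man-page NAME line (for example, `"rename"` to `"rename renameat renameat2 change the name or location of a file"`), so a report that says "moving a file into a new location" can still surface `rename` without naming it.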

2.2 Input Data Generation

Explicit Extraction: Applies regular expressions to discover any explicit commands, flags, or file references in the report.
Category-Partition Enumeration: If any required element is missing, defines a TSL with parameters (Command, Options, Data) and employs category-partition to enumerate plausible valid test cases.
Annotation Agreement: Incomplete reports sometimes require TSL elements to be supplied manually; inter-annotator agreement $P$ for these annotations is 0.958 (Eq. 1, p. 17).
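A minimal sketch of this two-stage input generation follows, assuming a toy TSL specification. The parameter choices in `TSL_SPEC`, the `KNOWN_COMMANDS` set, the token-level regular expressions, and the single constraint in `is_valid` are hypothetical stand-ins, not SysPro's actual TSL. Explicit values found in the report fix a parameter; otherwise every TSL choice for that parameter is enumerated, and the Cartesian product is filtered by constraints in the spirit of the category-partition method.

```python
import itertools
import re

# Hypothetical TSL-style specification: each parameter lists its choices.
TSL_SPEC = {
    "Command": ["mv", "rm"],
    "Options": ["", "-f", "-r"],
    "Data": ["file.txt", "dir/"],
}
KNOWN_COMMANDS = {"mv", "rm", "mkdir", "gzip", "bzip2"}  # illustrative only


def extract_explicit(report):
    """Regex-based extraction of explicit commands, flags, and file references."""
    tokens = [t.strip("`'\"()").rstrip(".,;:") for t in report.split()]
    found = {
        "Command": [t for t in tokens if t in KNOWN_COMMANDS],
        "Options": [t for t in tokens if re.fullmatch(r"-{1,2}[A-Za-z][\w-]*", t)],
        # Path-like tokens: contain a slash, or look like name.ext
        "Data": [t for t in tokens
                 if re.fullmatch(r"[\w.-]*/[\w./-]*|[\w-]+\.\w+", t)],
    }
    # Drop duplicates while keeping first-seen order.
    return {k: list(dict.fromkeys(v)) for k, v in found.items()}


def is_valid(case):
    """Example TSL constraint: removing a directory with rm requires -r."""
    return not (case["Command"] == "rm" and case["Data"].endswith("/")
                and case["Options"] != "-r")


def build_test_cases(report, spec=TSL_SPEC):
    """Explicit values fix a parameter; otherwise enumerate all of its choices."""
    found = extract_explicit(report)
    params = list(spec)
    choices = [found[p] or spec[p] for p in params]
    cases = (dict(zip(params, combo)) for combo in itertools.product(*choices))
    return [c for c in cases if is_valid(c)]
```

A report such as "Running `mv -f a.txt b/` while another process deletes b/ loses data." fixes all three parameters, while a vaguer report that only names `rm` falls back to enumerating the option and data choices, minus the combinations the constraint rules out.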

2.3 Controlled Interleaving Orchestration

Ranks call locations, inserts PIN-based instrumentation (using Sleep()) immediately before or after syscalls, and organizes iterative test runs.
Terminates upon either bug manifestation (observed crash or assertion failure) or exhaustion of run/time budget.
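The orchestration loop can be sketched as follows, assuming a hypothetical callback `run_instrumented(location, position)` that would launch the target under a PIN tool with a delay injected before or after the syscall at `location`, and return whether a crash or assertion failure was observed. The budget defaults mirror the 100-run / 120-minute limits given in the architectural overview; the `Location` type and the strict ranked-order traversal are illustrative assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Location:
    file: str
    function: str
    line: int
    syscall: str


def orchestrate(
    locations: Iterable[Location],
    run_instrumented: Callable[[Location, str], bool],
    max_runs: int = 100,
    max_minutes: float = 120,
) -> Tuple[Optional[Tuple[Location, str]], int]:
    """Try each ranked location with a delay before, then after, its syscall.
    Stops at the first run that manifests the bug or when the budget is spent.
    Returns ((location, position) or None, number of runs used)."""
    deadline = time.monotonic() + max_minutes * 60
    runs = 0
    for loc in locations:  # assumed already ranked, most likely first
        for position in ("before", "after"):
            if runs >= max_runs or time.monotonic() >= deadline:
                return None, runs  # budget exhausted
            runs += 1
            if run_instrumented(loc, position):  # crash/assertion observed
                return (loc, position), runs
    return None, runs
```

Passing the run function in as a callback keeps the budget logic independent of how instrumentation is actually performed, so the same loop can be exercised with a stub that simulates which delay exposes the bug.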

3. Mathematical Formulation and Evaluation Metrics

SysPro’s retrieval and orchestration effectiveness is assessed quantitatively with three metrics:

Mean Average Precision (MAP) for ranked syscall identification:

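$\textrm{MAP} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \textrm{AP}(Q_i)$, where $Q$ is the set of queries and $\textrm{AP}(Q_i)$ is the average precision of the ranking returned for query $Q_i$.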
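As a concrete check of the formula, the sketch below assumes the standard definition of average precision, $\textrm{AP}(Q_i) = \frac{1}{R_i}\sum_{k} P@k \cdot \textrm{rel}_i(k)$, where $R_i$ is the number of relevant items for $Q_i$ and $\textrm{rel}_i(k)$ is 1 if the item at rank $k$ is relevant and 0 otherwise. The exact AP variant is not spelled out here, so this definition and the function names are assumptions.

```python
def average_precision(ranked, relevant):
    """Sum of precision@k at each rank k holding a relevant item, divided by R."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """queries: list of (ranked_list, relevant_set) pairs, one per query Q_i."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# AP is (1/1 + 2/3) / 2 = 5/6 for the first query and 1/2 for the second,
# so MAP is (5/6 + 1/2) / 2 = 2/3.
example_map = mean_average_precision([
    (["read", "write", "rename"], {"read", "rename"}),
    (["open", "unlink"], {"unlink"}),
])
```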

Recall@K for call localization:

$\textrm{Recall}@K = \frac{\#\textrm{identified buggy calls in top } K}{\textrm{total buggy calls}}$

Annotation Agreement:

$P = \frac{\#\textrm{agreements}}{\#\textrm{items}}$
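Both remaining metrics are simple ratios. In this minimal sketch, representing rankings as lists and annotations as parallel label lists is an assumption for illustration. As in the formula above, $P$ is raw percent agreement, not a chance-corrected statistic such as Cohen's kappa.

```python
def recall_at_k(ranked, buggy, k):
    """Fraction of the truly buggy calls that appear in the top-k of the ranking."""
    buggy = set(buggy)
    return len(buggy & set(ranked[:k])) / len(buggy) if buggy else 0.0


def agreement(labels_a, labels_b):
    """Percent agreement P between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty item list")
    agree = sum(a == b for a, b in zip(labels_a, labels_b))
    return agree / len(labels_a)
```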

These metrics underpin the empirical analysis of both retrieval and reproduction accuracy (Section 4.2, p. 19).

4. Empirical Benchmarks and Comparative Performance

Evaluation spans 24 bug reports across 17 representative Linux utilities (e.g., mv, rm, mkdir, gzip, bzip2, bash, coreutils) with program sizes ranging from 3 KLOC to 39 KLOC (Table 4, p. 20). SysPro’s key empirical outcomes are summarized below (Table 5, pp. 21–23):

Reproduction rate: 23/24 (95.8%). The sole failure (findutils) involved a report lacking any process or syscall identifiers.
Retrieval/Discovery: Recall@K = 0.75; MAP = 0.77 (averages).
Efficiency: Average of 2 instrumentation runs per bug (range 1–14); average total reproduction time 8.9 min (range 0.6–59 min).
Algorithmic Contributions: Structured IR improves file ranking over basic IR by up to 100% (average +54.5%), and Apriori mining increases MAP by 0–80% (average +46%) (Figure 4, p. 25).
Baseline Comparison: SysPro outperforms random stress testing (SysPro_man): under the same resource constraints, stress testing reproduces none of the bugs, whereas SysPro succeeds on 18 of 19 (Figure 5, p. 26).

Robustness analyses indicate that randomly deleting up to 10% of report words does not affect success; removing 50% causes timeouts only for reports already classified as “incomplete” (Figure 6, p. 27).
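The report perturbation behind such a robustness study can be sketched as below. This assumes words are whitespace-separated tokens deleted uniformly at random without replacement; the study's exact deletion procedure is not described here, so `drop_words` and its seeding are hypothetical.

```python
import random


def drop_words(report, fraction, seed=0):
    """Delete round(fraction * n) randomly chosen words, keeping the rest in order."""
    words = report.split()
    n_drop = round(len(words) * fraction)
    rng = random.Random(seed)  # seeded so each perturbed report is reproducible
    dropped = set(rng.sample(range(len(words)), n_drop))
    return " ".join(w for i, w in enumerate(words) if i not in dropped)
```

Running the full pipeline on `drop_words(report, 0.1)` and `drop_words(report, 0.5)` over several seeds would then yield the success and timeout counts for each deletion level.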

5. Strengths, Assumptions, and Limitations

SysPro advances automated bug reproduction in two ways:

Automated, report-driven extraction: Driven solely by natural-language bug reports, bypassing manual triage.
Direct system-call orchestration: Achieves deterministic execution paths for inherently non-deterministic concurrency issues.

Several constraints remain (Section 5):

Reliance on bug report specificity: Requires explicit mention of the application or syscall names; cases lacking these are infeasible.
Manual input supplementation: When reports are incomplete, TSL-based manual annotation must bridge the gaps; this requires negligible developer time, and the annotations are consistent across annotators ($P = 0.958$).
Assumption of process knowledge: Presumes the user provides or knows the relevant process(es) and signal handlers; the extraction of multi-process relationships from free-form text is unresolved.
Platform/Language scope: Currently limited to C/C++ Linux utilities at the syscall level; extending to other languages (e.g., Java), platforms (e.g., Windows), or intra-process races would necessitate significant architectural adaptation.
Prototype tooling: The current implementation is a prototype; user-facing tooling (IDE/CLI integration) and anomaly detection are left for future work.

6. Broader Impact and Future Directions

Its authors present SysPro as the first system to ingest natural-language bug reports and deterministically reproduce system-level concurrency bugs via precise source localization and orchestrated syscall scheduling; it achieves a high automated success rate on challenging real-world benchmarks. Anticipated future extensions include fully automated extraction of all input data via NLP or LLMs, handling multi-process interleavings, extension to other languages and platforms, and more user-centric tooling (Zaman et al., 14 Jan 2026). A plausible implication is that the techniques pioneered in SysPro may inform broader research into automated program understanding, test generation, and bug reproduction from unstructured documentation.


References

Zaman et al. (2026). SysPro: Reproducing System-level Concurrency Bugs from Bug Reports.
