Probability-Weighted Control Laws
- Probability-weighted control laws are mathematical strategies that blend probability measures with deterministic control to manage uncertainty in dynamic systems.
- They employ statistical models and probabilistic computations to optimize decision-making processes and improve system reliability in variable environments.
- Recent advancements demonstrate these laws can enhance test-case generation and fault tolerance, bridging theory with practical applications in system design.
Report: LLMCFG-TGen: Using LLM-Generated Control Flow Graphs to Automatically Create Test Cases from Use Cases
- Formal Definition of the CFG Representation
1.1 Control Flow Graph (CFG) as a Mathematical Object
We define a Control Flow Graph (CFG) as a directed graph G = (V, E, v₀, V_exit) where
* V is a finite set of nodes (basic steps),
* E ⊆ V × V is a set of directed edges (control transfers),
* v₀ ∈ V is the unique start node (root),
* V_exit ⊆ V is the set of exit (leaf) nodes.
Each node v ∈ V is labeled with a natural-language statement extracted from a use-case step (main, alternative, or exception). An edge (u, v) ∈ E represents a possible transition from step u to step v. Conditional edges are annotated with guard conditions g(u, v) ∈ {“true‐branch”, “false‐branch”} when they arise from an if-then-else or branching sentence.
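The tuple G = (V, E, v₀, V_exit) maps naturally onto a small data structure. A minimal Python sketch follows; the class name, field names, and the `successors` helper are illustrative, not the tool's actual representation:

```python
from dataclasses import dataclass

@dataclass
class CFG:
    """G = (V, E, v0, V_exit) with optional guard labels on edges."""
    nodes: dict   # node id -> natural-language statement (V)
    edges: list   # (from_id, to_id, guard) triples (E); guard is None, "true", or "false"
    root: str     # v0
    exits: set    # V_exit

    def successors(self, node_id):
        """Outgoing (target, guard) pairs of a node, in edge order."""
        return [(v, g) for (u, v, g) in self.edges if u == node_id]

# Example: the login check from the appendix figure
cfg = CFG(
    nodes={"n1": "Start: user opens menu",
           "n2": "Is user logged in?",
           "n3": "Show login screen",
           "n4": "Display dashboard"},
    edges=[("n1", "n2", None),
           ("n2", "n3", "false"),
           ("n2", "n4", "true")],
    root="n1",
    exits={"n3", "n4"},
)
```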
1.2 Correspondence of V and E to Use-Case Elements
- A main-flow sentence “System displays menu” becomes a node vᵢ ∈ V.
- A conditional sentence “If the user is authenticated, then proceed; otherwise show error” yields one node for the condition check and two outgoing edges: one to the “proceed” node (true‐branch) and one to the “show error” node (false‐branch).
- Alternative and exception flows are linked back into the main flow at specified join points.
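Assuming the JSON shape named in Prompt #1 (only the top-level 'nodes' and 'edges' fields are given by the method; the per-entry schema here is a guess), the conditional sentence above might serialize as:

```python
import json

# Guessed serialization of the authentication sentence in 1.2; only the
# top-level 'nodes' and 'edges' fields come from the method description.
raw = """
{
  "nodes": [
    {"id": "n1", "text": "Check whether the user is authenticated"},
    {"id": "n2", "text": "Proceed"},
    {"id": "n3", "text": "Show error"}
  ],
  "edges": [
    {"from": "n1", "to": "n2", "label": "true"},
    {"from": "n1", "to": "n3", "label": "false"}
  ]
}
"""
cfg_json = json.loads(raw)
```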
1.3 Formulas for Coverage and Path Enumeration
Given G, the set of all root-to-exit paths P can be enumerated. Let |P| denote the number of distinct paths. In the absence of loops, if the fan-outs at the k distinct decision nodes are d₁, d₂, …, d_k, then

|P| = ∏_{i=1}^{k} dᵢ.
Branch Coverage BC is defined as
BC = |E_exercised| / |E|.
Full path coverage implies BC = 1 and 100% node coverage.
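Both formulas check out numerically; in this sketch `path_count` and `branch_coverage` are hypothetical helper names:

```python
from math import prod

def path_count(fanouts):
    """|P| = d1 * d2 * ... * dk for a loop-free CFG with k decision nodes."""
    return prod(fanouts)

def branch_coverage(exercised_edges, all_edges):
    """BC = |E_exercised| / |E|."""
    return len(set(exercised_edges)) / len(set(all_edges))

# Three binary decisions -> 2 * 2 * 2 = 8 root-to-exit paths
all_edges = [("a", "b"), ("b", "c"), ("b", "d")]
bc = branch_coverage([("a", "b"), ("b", "c")], all_edges)  # 2 of 3 edges exercised
```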
- LLM-Based CFG Construction Algorithm
2.1 Prompt Engineering: Prompt #1
Prompt #1 to the LLM comprises five parts:
1. Role Definition (“You are a software-engineering expert.”)
2. Task Instructions (how to extract steps and branches)
3. Algorithm Specification (see Algorithm 1 below)
4. Output Specification (“Return valid JSON with fields ‘nodes’ and ‘edges’.”)
5. Input: the raw NL use-case description.
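The five parts might be assembled as a template like the following; the wording is a paraphrase, not the paper's actual prompt text:

```python
# Paraphrased five-part prompt template (role, task, algorithm, output
# spec, input); the exact wording used by the tool is not reproduced here.
PROMPT_1_TEMPLATE = """\
You are a software-engineering expert.

Task: extract the steps and branch points of the use case below and
connect them into a control flow graph, following the algorithm:
map each step to a node; for each conditional step, emit one
true-branch edge and one false-branch edge.

Output: return valid JSON with fields 'nodes' and 'edges', nothing else.

Use case:
{use_case}
"""

def build_prompt(use_case_text):
    return PROMPT_1_TEMPLATE.format(use_case=use_case_text)
```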
2.2 Pseudocode for CFG Generation (Algorithm 1) Algorithm 1 formalizes the mapping of use-case steps into CFG nodes and edges.
Algorithm 1: CFG Generation
Input: Ordered list of use-case steps S₁…Sₙ (main + alternative + exception flows)
Output: CFG G = (V, E), root v₀
- V ← {S₁, S₂, …, Sₙ}
- v₀ ← S₁
- E ← ∅
- For i from 1 to n−1 do
  4.1 If S_{i+1} is a conditional step then
      Add edge (Sᵢ → S_{i+1}) with label “true”,
      Add edge (Sᵢ → S_{i+2}) with label “false”,
      Skip the next index accordingly.
  4.2 Else
      Add edge (Sᵢ → S_{i+1}).
- Return G = (V, E, v₀).
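Algorithm 1 transcribes directly into Python; the function name is illustrative, and conditional steps are assumed to be identified by a set of 0-based indices:

```python
def build_cfg(steps, conditional):
    """Sketch of Algorithm 1. `steps` is the ordered list S1..Sn of step
    texts; `conditional` is a set of 0-based indices of conditional steps."""
    V = list(steps)
    v0 = steps[0]
    E = []
    i, n = 0, len(steps)
    while i < n - 1:
        if (i + 1) in conditional:
            # Step 4.1: true-branch edge to the conditional step,
            # false-branch edge to the step after it
            E.append((steps[i], steps[i + 1], "true"))
            if i + 2 < n:
                E.append((steps[i], steps[i + 2], "false"))
            i += 2  # "skip next index accordingly"
        else:
            # Step 4.2: plain sequential edge
            E.append((steps[i], steps[i + 1], None))
            i += 1
    return V, E, v0
```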
2.3 Post-Processing and Validation
After generation, the JSON is parsed and validated by checking:
* No isolated nodes (each v ∈ V appears in at least one edge).
* All non-root nodes have ≥ 1 incoming edge.
* Every edge’s “from” and “to” IDs exist in V.
On failure, the tool re-prompts the LLM until a valid CFG is returned.
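The three validation checks amount to a short function; a minimal sketch with illustrative names:

```python
def validate_cfg(nodes, edges, root):
    """Return the list of violated 2.3 checks (empty list = valid).
    `nodes` is a set of node IDs, `edges` a list of (from_id, to_id) pairs."""
    errors = []
    # Check 1: no isolated nodes
    endpoints = {u for u, v in edges} | {v for u, v in edges}
    for n in nodes:
        if n not in endpoints:
            errors.append(f"isolated node: {n}")
    # Check 2: every non-root node has at least one incoming edge
    incoming = {v for u, v in edges}
    for n in nodes:
        if n != root and n not in incoming:
            errors.append(f"non-root node without incoming edge: {n}")
    # Check 3: edge endpoints must exist in V
    for u, v in edges:
        if u not in nodes or v not in nodes:
            errors.append(f"edge references unknown node: ({u}, {v})")
    return errors
```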
- Path Enumeration Technique
3.1 Depth-First Search (DFS) with Cycle Pruning Starting at root v₀, a recursive DFS collects all simple paths to exit nodes in V_exit or until a node is revisited (to avoid infinite loops). Conditions on edges are recorded as separate path items.
3.2 Pseudocode for Test-Path Extraction (Algorithm 2)
Algorithm 2: Test-Path Extraction
Input: Verified CFG G = (V, E)
Output: Set of paths P, each a list of (node, condition)

procedure DFS(curr, path):
    if curr ∈ path then record path in P (cycle entry) and return
    append curr to path
    if curr ∈ V_exit then record path in P and return
    for each outgoing edge (curr → nbr) with label cond do
        DFS(nbr, path ∪ [cond])

Call DFS(v₀, []). Translate each vᵢ to its NL statement; inject conditions as statements.
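A runnable Python version of Algorithm 2's DFS, with the same cycle-pruning rule (revisiting a node records the partial path and stops); the names are illustrative:

```python
def extract_paths(edges, root, exits):
    """DFS from `root` collecting root-to-exit paths as (node, guard) lists;
    revisiting a node records the partial path and stops (cycle pruning).
    `edges` is a list of (from_id, to_id, guard) triples."""
    succ = {}
    for u, v, guard in edges:
        succ.setdefault(u, []).append((v, guard))
    paths = []

    def dfs(curr, path, guard_in):
        if any(n == curr for n, _ in path):
            paths.append(path)            # cycle entry: record and return
            return
        path = path + [(curr, guard_in)]
        if curr in exits or curr not in succ:
            paths.append(path)            # reached an exit (or dead end)
            return
        for nbr, guard in succ[curr]:
            dfs(nbr, path, guard)

    dfs(root, [], None)
    return paths
```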
3.3 Complexity In a DAG with branching factor d and depth h, the worst-case path count is O(d^h). DFS thus may be exponential in the number of decision nodes, but practical use cases remain tractable.
- Test Case Generation from Paths
4.1 Prompt #2 for Test-Case Creation Prompt #2 guides the LLM to format each path into an abstract test case with: Title, Preconditions, Step 1…n with expected result. It includes one illustrative example.
4.2 Mapping Rules and Heuristics
- Title derived from the use-case name plus a branch summary.
- Preconditions accumulate any “if” guards encountered.
- Steps list each NL statement in the path.
- Expected results are inferred from system responses in the statements.
No additional concrete inputs are generated; the result is an abstract test case.
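The mapping rules above admit a compact sketch; `path_to_test_case` and its title heuristic are illustrative, not the tool's exact heuristics:

```python
def path_to_test_case(use_case_name, path):
    """Apply the 4.2 mapping rules to one extracted path.
    `path` is a list of (statement, guard) pairs; guards become
    preconditions, statements become steps."""
    guards = [g for _, g in path if g is not None]
    branch_summary = "/".join(guards) if guards else "main flow"
    return {
        "title": f"{use_case_name} [{branch_summary}]",
        "preconditions": guards,
        "steps": [stmt for stmt, _ in path],
    }
```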
- Evaluation and Metrics
5.1 Metrics Definitions
Precision, Recall, and F1 for nodes/edges:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 · Precision · Recall / (Precision + Recall)
Normalized Graph Edit Distance (nGED): nGED(G₁, G₂) = 1 − GED(G₁, G₂) / (|V₁|+|E₁|+|V₂|+|E₂|)
Discrepancy Rate (DR): DR% = (N_diff / N_UC) * 100%
Avg.|Δ|: Avg.|Δ| = (1/N_UC) Σ |LLM_path_i − GT_path_i|
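The metric definitions translate directly into code; function names are illustrative:

```python
def prf1(tp, fp, fn):
    """Precision, Recall, F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)

def nged(ged, size1, size2):
    """nGED = 1 - GED / (|V1|+|E1|+|V2|+|E2|); size_i = |V_i| + |E_i|."""
    return 1 - ged / (size1 + size2)

def discrepancy_rate(n_diff, n_uc):
    """DR% = N_diff / N_UC * 100."""
    return 100.0 * n_diff / n_uc
```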
5.2 Quantitative Results
RQ1 (CFG Accuracy, threshold = 0.75)
- Node F1_avg = 0.895; Edge F1_avg = 0.761; nGED_avg = 0.933.
RQ2 (Test-Case Coverage)
- LLMCFG-TGen DR = 2.38%; Avg.|Δ| = 0.02.
- Baselines: Direct LLM DR=57.0%, Avg.|Δ|=1.12; AGORA DR=33.3%, Avg.|Δ|=0.45.
RQ3 (Practitioner Ratings over 20 UCs, 113 cases)
Average Likert (5-point) scores:
- Relevance: AGORA = 4.25, LLMCFG-TGen = 4.75
- Completeness: AGORA = 3.84, LLMCFG-TGen = 4.64
- Correctness: AGORA = 3.74, LLMCFG-TGen = 4.51
- Clarity: AGORA = 3.73, LLMCFG-TGen = 4.48
RQ4 (LLM Comparison for CFGs)
| Model | Nodes vs. 304 | Paths vs. 103 | DR% | Avg. \|Δ\| | Node F1 | Edge F1 | nGED | Time (s) |
|--------------------|--------------:|--------------:|-------:|-----------:|--------:|--------:|------:|--------:|
| GPT-4o             | 308 | 103 |  2.30% | 0.02 | 0.895 | 0.761 | 0.933 |  336 |
| Gemini 2.5 Flash   | 314 | 110 | 16.67% | 0.17 | 0.862 | 0.683 | 0.912 | 1006 |
| LLaMA4 Scout-Inst. | 314 | 106 | 11.90% | 0.17 | 0.865 | 0.696 | 0.909 |  540 |
- Practitioners’ Assessment and Case Studies
6.1 Study Protocol Four senior practitioners compared side-by-side test suites from AGORA and LLMCFG-TGen, blinded to method. Each test case was scored on four 5-point Likert dimensions and qualitative feedback was collected.
6.2 Key Qualitative Findings
- LLMCFG-TGen test cases were praised for logical consistency (“Steps follow exactly the use-case flow”), comprehensiveness (“No missing edge cases”), and readability (“Clear titles and preconditions”).
- Senior engineers noted occasional verbosity and suggested a concise checklist mode.
- All agreed LLMCFG-TGen reduced manual modeling effort substantially.
- Discussion and Future Work
7.1 Limitations
- Single-use-case processing – no batch handling of related cases.
- No built-in test prioritization or ranking.
- Generates abstract, not executable, test scripts.
7.2 Future Directions
- Extend to batch and inter-use-case CFG generation (include «include»/«extend» relations).
- Add path-level priority annotations for test-case ranking.
- Integrate concrete data generation and script templates for end-to-end executable tests.
- Incorporate human-in-the-loop refinement and multimodal inputs (diagrams, images).
Appendix: Example CFG in TikZ (Sample)
\begin{verbatim}
\begin{tikzpicture}[->, node distance=1.5cm]
  \node (n1) [start]                       {Start: User opens menu};
  \node (n2) [decision, below of=n1]       {Is user logged in?};
  \node (n3) [action,   below left of=n2]  {Show login screen};
  \node (n4) [action,   below right of=n2] {Display dashboard};
  \node (n5) [terminal, below of=n4]       {End};
  \path (n1) edge (n2)
        (n2) edge node[left]{false}  (n3)
        (n2) edge node[right]{true}  (n4)
        (n3) edge (n2)  % alternative: user logs in, then back
        (n4) edge (n5);
\end{tikzpicture}
\end{verbatim}
This report synthesizes the method and its comprehensive evaluation, demonstrating that LLMCFG-TGen effectively bridges NL requirements and systematic test-case generation.