Probability-Weighed Control Laws

Updated 13 December 2025
  • Probability-weighed control laws are mathematical strategies that blend probability measures with deterministic control to manage uncertainty in dynamic systems.
  • They employ statistical models and probabilistic computations to optimize decision-making processes and improve system reliability in variable environments.
  • Recent advancements demonstrate these laws can enhance test-case generation and fault tolerance, bridging theory with practical applications in system design.

Report: LLMCFG-TGen: Using LLM-Generated Control Flow Graphs to Automatically Create Test Cases from Use Cases

  1. Formal Definition of the CFG Representation

1.1 Control Flow Graph (CFG) as a Mathematical Object

We define a Control Flow Graph (CFG) as the directed graph G = (V, E, v₀, V_exit), where

  • V is a finite set of nodes (basic steps),
  • E ⊆ V × V is a set of directed edges (control transfers),
  • v₀ ∈ V is the unique start node (root),
  • V_exit ⊆ V is the set of exit (leaf) nodes.

Each node v ∈ V is labeled with a natural-language statement extracted from a use-case step (main, alternative, or exception). An edge (u, v) ∈ E represents a possible transition from step u to step v. Conditional edges are annotated with guard conditions g(u, v) ∈ {“true‐branch”, “false‐branch”} when they arise from an if-then-else or branching sentence.

1.2 Correspondence of V and E to Use-Case Elements

  • A main-flow sentence “System displays menu” becomes a node vᵢ ∈ V.
  • A conditional sentence “If the user is authenticated, then proceed; otherwise show error” yields one node for the condition check and two outgoing edges: one to the “proceed” node (true‐branch) and one to the “show error” node (false‐branch).
  • Alternative and exception flows are linked back into the main flow at specified join points.
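The tuple G = (V, E, v₀, V_exit) and the conditional-sentence mapping can be pictured with a small data structure. This is a minimal sketch, not the paper's implementation: the class name `CFG`, the node ids `n1`…`n4`, and the `successors` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class CFG:
    """Illustrative container for G = (V, E, v0, V_exit)."""
    nodes: Dict[str, str]                          # V: node id -> NL statement
    edges: List[Tuple[str, str, Optional[str]]]    # E: (source, target, guard or None)
    root: str                                      # v0
    exits: Set[str] = field(default_factory=set)   # V_exit

    def successors(self, node_id: str) -> List[Tuple[str, Optional[str]]]:
        """Return (target, guard) pairs for every edge leaving node_id."""
        return [(dst, guard) for src, dst, guard in self.edges if src == node_id]


# "If the user is authenticated, then proceed; otherwise show error":
# one condition-check node with a true-branch and a false-branch edge.
login_cfg = CFG(
    nodes={
        "n1": "System displays menu",
        "n2": "Check whether the user is authenticated",
        "n3": "Proceed",
        "n4": "Show error",
    },
    edges=[
        ("n1", "n2", None),
        ("n2", "n3", "true-branch"),
        ("n2", "n4", "false-branch"),
    ],
    root="n1",
    exits={"n3", "n4"},
)

print(login_cfg.successors("n2"))
```

Storing the guard on the edge rather than on the node keeps unconditional transitions (guard `None`) and branch transitions in one uniform list.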

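1.3 Formulas for Coverage and Path Enumeration

Given G, the set P of all root-to-exit paths can be enumerated; let |P| denote the number of distinct paths. In the absence of loops, |P| is determined by the fan-out of the decision nodes. If every root-to-exit path passes through the same k decision nodes, with fan-outs d₁, d₂, …, d_k, then

|P| = ∏_{i=1}^{k} dᵢ.

When some branches bypass later decisions (for example, an exception flow that exits early), this product is an upper bound on |P|.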

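Branch coverage BC is the fraction of edges exercised by the generated test paths:

BC = |E_exercised| / |E|,

where E_exercised ⊆ E is the set of edges traversed by at least one test path.

Full path coverage (every path in P exercised) implies BC = 1 and 100% node coverage, provided every node and edge lies on some root-to-exit path.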
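The two quantities above can be checked on a toy graph. The sketch below assumes paths are given as lists of node ids and that the product formula is applied only in the sequential case where every path meets every decision node; the function names are hypothetical.

```python
from math import prod


def sequential_path_count(fan_outs):
    """|P| as the product of fan-outs d_1..d_k (sequential decisions only)."""
    return prod(fan_outs)


def branch_coverage(all_edges, test_paths):
    """BC = |E_exercised| / |E|, with each path given as a list of node ids."""
    edge_set = set(all_edges)
    exercised = set()
    for path in test_paths:
        # Consecutive node pairs along a path are the edges it traverses.
        exercised.update(zip(path, path[1:]))
    return len(exercised & edge_set) / len(edge_set)


# Diamond-shaped graph: a branches to b or c, both rejoin at d.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

print(sequential_path_count([2, 3]))                                   # 6
print(branch_coverage(edges, [["a", "b", "d"]]))                       # 0.5
print(branch_coverage(edges, [["a", "b", "d"], ["a", "c", "d"]]))      # 1.0
```

A single path through the diamond exercises two of four edges, so BC = 0.5; adding the second path reaches BC = 1.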

  2. LLM-Based CFG Construction Algorithm

2.1 Prompt Engineering: Prompt #1

Prompt #1 to the LLM comprises five parts:

  1. Role Definition (“You are a software‐engineering expert.”)
  2. Task Instructions (how to extract steps and branches)
  3. Algorithm Specification (see Algorithm 1 below)
  4. Output Specification (“Return valid JSON with fields ‘nodes’ and ‘edges’.”)
  5. Input: the raw NL use-case description.

2.2 Pseudocode for CFG Generation (Algorithm 1)

Algorithm 1 formalizes the mapping of use-case steps into CFG nodes and edges.

Algorithm 1: CFG Generation
Input: Ordered list of use-case steps S₁…Sₙ (main + alt + ext)
Output: CFG G = (V, E), root v₀

  1. V ← {S₁, S₂, …, Sₙ}
  2. v₀ ← S₁
  3. E ← ∅
  4. For i from 1 to n−1 do
     4.1 If S_{i+1} is a conditional step then
           Add edge (Sᵢ → S_{i+1}) with label “true”,
           Add edge (Sᵢ → S_{i+2}) with label “false”,
           Skip next index accordingly.
     4.2 Else
           Add edge (Sᵢ → S_{i+1}).
  5. Return G = (V, E, v₀).
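For readers who want to trace Algorithm 1 by hand, here is one literal Python reading of it. Several points are assumptions rather than details from the report: the `is_conditional` predicate stands in for the LLM's judgment of which step is conditional, "skip next index" is read as advancing by two, the step list is assumed non-empty, and the "false" edge is simply omitted when S_{i+2} does not exist.

```python
def build_cfg(steps, is_conditional):
    """Literal reading of Algorithm 1; returns (nodes, edges, root)."""
    nodes = list(steps)            # step 1: V <- {S_1..S_n}
    root = steps[0]                # step 2: v0 <- S_1
    edges = []                     # step 3: E <- empty
    n = len(steps)
    i = 0
    while i < n - 1:               # step 4
        if is_conditional(steps[i + 1]):
            edges.append((steps[i], steps[i + 1], "true"))
            if i + 2 < n:          # assumption: no false edge past the last step
                edges.append((steps[i], steps[i + 2], "false"))
            i += 2                 # "skip next index"
        else:
            edges.append((steps[i], steps[i + 1], None))
            i += 1
    return nodes, edges, root      # step 5


steps = [
    "User submits credentials",
    "If credentials are invalid, show error",
    "Display dashboard",
]
# Hypothetical stand-in for the LLM: treat sentences starting with "If" as conditional.
nodes, edges, root = build_cfg(steps, lambda s: s.lower().startswith("if"))
for edge in edges:
    print(edge)
```

Under this reading the conditional step becomes the target of the "true" edge and the following step the target of the "false" edge; other interpretations of the pseudocode are possible.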

2.3 Post-Processing and Validation

After generation, the JSON is parsed and validated by checking:

  • No isolated nodes (each v ∈ V appears in at least one edge).
  • All non-root nodes have ≥ 1 incoming edge.
  • Every edge’s “from” and “to” IDs exist in V.

On failure, the tool re-prompts the LLM until a valid CFG is returned.
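The three validation checks can be sketched as a single function that returns a list of problems, with an empty list meaning the CFG is accepted. The field names "nodes", "edges", "from", and "to" follow the report; the per-node "id" field, the function name, and the error strings are assumptions for illustration, and the re-prompting loop is left out.

```python
import json


def validate_cfg(raw_json, root_id):
    """Return a list of validation errors for an LLM-produced CFG (empty if valid)."""
    cfg = json.loads(raw_json)
    node_ids = {node["id"] for node in cfg["nodes"]}
    errors = []
    touched, has_incoming = set(), set()

    for edge in cfg["edges"]:
        src, dst = edge["from"], edge["to"]
        for endpoint in (src, dst):
            if endpoint not in node_ids:           # check 3: ids must exist in V
                errors.append(f"unknown node id in edge: {endpoint}")
        touched.update((src, dst))
        has_incoming.add(dst)

    for node_id in sorted(node_ids):
        if node_id not in touched:                 # check 1: no isolated nodes
            errors.append(f"isolated node: {node_id}")
        elif node_id != root_id and node_id not in has_incoming:
            errors.append(f"no incoming edge: {node_id}")   # check 2
    return errors


GOOD = '{"nodes": [{"id": "n1"}, {"id": "n2"}], "edges": [{"from": "n1", "to": "n2"}]}'
BAD = ('{"nodes": [{"id": "n1"}, {"id": "n2"}, {"id": "n3"}], '
       '"edges": [{"from": "n1", "to": "n2"}, {"from": "n2", "to": "n9"}]}')

print(validate_cfg(GOOD, "n1"))
print(validate_cfg(BAD, "n1"))
```

Returning error strings rather than a boolean would let a caller include the specific failures in the follow-up prompt, although the report does not say whether the tool does this.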

  3. Path Enumeration Technique

3.1 Depth-First Search (DFS) with Cycle Pruning

Starting at the root v₀, a recursive DFS extends each path until it reaches an exit node in V_exit or revisits a node already on the path; stopping at the revisit avoids infinite loops. Guard conditions on edges are recorded as separate path items.

3.2 Pseudocode for Test-Path Extraction (Algorithm 2)

Algorithm 2: Test-Path Extraction
Input: Verified CFG G = (V, E)
Output: Set of paths P, each a list of (node, condition)

procedure DFS(curr, path):
    if curr ∈ path then
        Record path in P (cycle entry) and return
    Append curr to path
    if curr ∈ V_exit then
        Record path in P and return
    for each outgoing edge (curr → nbr) with label cond do
        DFS(nbr, path ∪ [cond])

Call DFS(v₀, []).
Translate each vᵢ to its NL statement; inject conditions as statements.

3.3 Complexity

In a DAG with branching factor d and depth h, the worst-case path count is O(d^h). DFS may therefore take time exponential in the number of decision nodes, but practical use cases remain tractable.
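A runnable sketch of Algorithm 2 follows. It assumes edges are (source, target, condition) triples with `None` for unguarded edges, and it tracks visited nodes in a separate set so that a condition label can never be mistaken for a node id; both choices, and the function name, are illustrative rather than taken from the report. As in the pseudocode, a dead-end node that is neither an exit nor revisited records nothing.

```python
def enumerate_paths(edges, root, exits):
    """Collect root-to-exit paths by DFS, cutting a path when a node repeats."""
    successors = {}
    for src, dst, cond in edges:
        successors.setdefault(src, []).append((dst, cond))

    paths = []

    def dfs(curr, path, visited):
        if curr in visited:
            paths.append(path)              # cycle entry: record and stop
            return
        path = path + [curr]                # copy, so sibling branches stay independent
        if curr in exits:
            paths.append(path)
            return
        for nbr, cond in successors.get(curr, []):
            # Guard conditions become separate path items.
            next_path = path + [cond] if cond is not None else path
            dfs(nbr, next_path, visited | {curr})

    dfs(root, [], frozenset())
    return paths


# Login flow with a loop: the login screen leads back to the decision node.
edges = [
    ("n1", "n2", None),
    ("n2", "n3", "false"),
    ("n2", "n4", "true"),
    ("n3", "n2", None),
    ("n4", "n5", None),
]
for p in enumerate_paths(edges, "n1", {"n5"}):
    print(p)
# ['n1', 'n2', 'false', 'n3']
# ['n1', 'n2', 'true', 'n4', 'n5']
```

The first path ends at `n3` because its only successor, `n2`, is already on the path; the second reaches the exit `n5`.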

  4. Test Case Generation from Paths

4.1 Prompt #2 for Test-Case Creation

Prompt #2 guides the LLM to format each path into an abstract test case with a Title, Preconditions, and Steps 1…n, each with an expected result. The prompt includes one illustrative example.

4.2 Mapping Rules and Heuristics

  • Title derived from the use-case name plus a branch summary.
  • Preconditions accumulate any “if” guards encountered.
  • Steps list each NL statement in the path.
  • Expected results are inferred from system responses in the statements.

No additional concrete inputs are generated; the result is an abstract test case.
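In the report this mapping is performed by the LLM under Prompt #2. The rule-based sketch below only illustrates the four mapping rules on a single path. Everything in it is an assumption: the ("step" | "cond", text) path encoding, the output field names, and especially the heuristic that a statement beginning with "System" is the expected result of the preceding action.

```python
def path_to_test_case(use_case_name, path):
    """Turn one path into an abstract test case (illustrative heuristic only)."""
    # Preconditions accumulate every guard encountered along the path.
    guards = [text for kind, text in path if kind == "cond"]

    steps = []
    for kind, text in path:
        if kind != "step":
            continue
        is_system_response = text.lower().startswith("system")
        if is_system_response and steps and steps[-1]["expected"] is None:
            # Assumed heuristic: a system response is the expected result
            # of the step that precedes it.
            steps[-1]["expected"] = text
        else:
            steps.append({"action": text, "expected": None})

    # Title: use-case name plus a branch summary.
    summary = "; ".join(guards) if guards else "main flow"
    return {
        "title": f"{use_case_name}: {summary}",
        "preconditions": guards,
        "steps": steps,
    }


path = [
    ("step", "User opens menu"),
    ("step", "System displays menu"),
    ("cond", "user is authenticated"),
    ("step", "User selects dashboard"),
    ("step", "System displays dashboard"),
]
case = path_to_test_case("Login", path)
print(case["title"])
for number, step in enumerate(case["steps"], start=1):
    print(number, step["action"], "->", step["expected"])
```

No concrete input data appears anywhere in the output, which matches the abstract nature of the generated test cases.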

  5. Evaluation and Metrics

5.1 Metrics Definitions

Precision, Recall, and F1 for nodes/edges:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2·Precision·Recall / (Precision + Recall)

Normalized Graph Edit Distance (nGED):

nGED(G₁, G₂) = 1 − GED(G₁, G₂) / (|V₁| + |E₁| + |V₂| + |E₂|)

Discrepancy Rate (DR):

DR% = (N_diff / N_UC) × 100%

Avg.|Δ|:

Avg.|Δ| = (1/N_UC) Σᵢ |LLM_path_i − GT_path_i|
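The formulas above translate directly into a few helper functions. This sketch makes three assumptions that the report does not spell out: elements are matched by exact set membership (the evaluation itself reports a matching threshold of 0.75), N_diff is taken to be the number of use cases whose LLM path count differs from the ground truth, and the GED value is supplied by the caller rather than computed. The sample numbers are made up for illustration.

```python
def precision_recall_f1(predicted, reference):
    """Precision, recall, and F1 over two sets of nodes or edges."""
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def normalized_ged(ged, n_v1, n_e1, n_v2, n_e2):
    """nGED = 1 - GED / (|V1| + |E1| + |V2| + |E2|)."""
    return 1 - ged / (n_v1 + n_e1 + n_v2 + n_e2)


def discrepancy_rate(llm_counts, gt_counts):
    """DR% = share of use cases whose path count differs (assumed meaning of N_diff)."""
    n_diff = sum(a != b for a, b in zip(llm_counts, gt_counts))
    return 100 * n_diff / len(gt_counts)


def avg_abs_delta(llm_counts, gt_counts):
    """Mean absolute difference between LLM and ground-truth path counts."""
    return sum(abs(a - b) for a, b in zip(llm_counts, gt_counts)) / len(gt_counts)


# Made-up example: four use cases, one of which is off by one path.
llm_paths, gt_paths = [3, 2, 4, 1], [3, 2, 5, 1]
print(discrepancy_rate(llm_paths, gt_paths))   # 25.0
print(avg_abs_delta(llm_paths, gt_paths))      # 0.25
print(precision_recall_f1({"a", "b", "c"}, {"b", "c", "d"}))
print(normalized_ged(2, 4, 3, 4, 4))
```

Because nGED is defined as one minus the normalized distance, higher values indicate closer agreement between the two graphs.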

5.2 Quantitative Results

RQ1 (CFG Accuracy, threshold = 0.75)

  • Node F1_avg = 0.895; Edge F1_avg = 0.761; nGED_avg = 0.933.

RQ2 (Test-Case Coverage)

  • LLMCFG-TGen DR = 2.38%; Avg.|Δ| = 0.02.
  • Baselines: Direct LLM DR=57.0%, Avg.|Δ|=1.12; AGORA DR=33.3%, Avg.|Δ|=0.45.

RQ3 (Practitioner Ratings over 20 UCs, 113 cases)

Average Likert (5-point) scores:

  • Relevance: AGORA = 4.25, LLMCFG-TGen = 4.75
  • Completeness: AGORA = 3.84, LLMCFG-TGen = 4.64
  • Correctness: AGORA = 3.74, LLMCFG-TGen = 4.51
  • Clarity: AGORA = 3.73, LLMCFG-TGen = 4.48

RQ4 (LLM Comparison for CFGs)

| Model               | Nodes vs. 304 | Paths vs. 103 | DR%    | Avg.\|Δ\| | Node F1 | Edge F1 | nGED  | Time(s) |
|---------------------|--------------:|--------------:|-------:|----------:|--------:|--------:|------:|--------:|
| GPT-4o              |           308 |           103 |  2.30% |      0.02 |   0.895 |   0.761 | 0.933 |     336 |
| Gemini 2.5 Flash    |           314 |           110 | 16.67% |      0.17 |   0.862 |   0.683 | 0.912 |    1006 |
| LLaMA4 Scout-Inst.  |           314 |           106 | 11.90% |      0.17 |   0.865 |   0.696 | 0.909 |     540 |

  6. Practitioners’ Assessment and Case Studies

6.1 Study Protocol

Four senior practitioners compared test suites from AGORA and LLMCFG-TGen side by side, blinded to the method. Each test case was scored on four 5-point Likert dimensions, and qualitative feedback was collected.

6.2 Key Qualitative Findings

  • LLMCFG-TGen test cases were praised for logical consistency (“Steps follow exactly the use-case flow”), comprehensiveness (“No missing edge cases”), and readability (“Clear titles and preconditions”).
  • Senior engineers noted occasional verbosity and suggested a concise checklist mode.
  • All agreed LLMCFG-TGen reduced manual modeling effort substantially.

  7. Discussion and Future Work

7.1 Limitations

  • Single-use-case processing – no batch handling of related cases.
  • No built-in test prioritization or ranking.
  • Generates abstract, not executable, test scripts.

7.2 Future Directions

  • Extend to batch and inter-use-case CFG generation (include «include»/«extend» relations).
  • Add path-level priority annotations for test-case ranking.
  • Integrate concrete data generation and script templates for end-to-end executable tests.
  • Incorporate human-in-the-loop refinement and multimodal inputs (diagrams, images).

Appendix: Example CFG in TikZ (Sample)

\begin{verbatim}
\begin{tikzpicture}[->, node distance=1.5cm]
  \node[start] (n1) {Start: User opens menu};
  \node[decision, below of=n1] (n2) {Is user logged in?};
  \node[action, below left of=n2] (n3) {Show login screen};
  \node[action, below right of=n2] (n4) {Display dashboard};
  \node[terminal, below of=n4] (n5) {End};
  \path (n1) edge (n2)
        (n2) edge node[left]{false} (n3)
        (n2) edge node[right]{true} (n4)
        (n3) edge (n2) % alternative: user logs in then back
        (n4) edge (n5);
\end{tikzpicture}
\end{verbatim}

This report synthesizes the method and its comprehensive evaluation, demonstrating that LLMCFG-TGen effectively bridges NL requirements and systematic test-case generation.
