CTHS: Risk-Aware Error Allocation Strategy

Updated 3 July 2026

CTHS is a statistical error-budget allocation strategy that spends error budget only at confirmation events, keeping the familywise error rate strictly controlled in recursive self-modification systems.
It uses a preliminary screening step to filter out inferior proposals, directing statistical power toward genuinely promising modifications.
Guarantees based on Hoeffding’s and Ville’s inequalities give valid confidence bounds and limit the probability of harmful acceptances during iterative system updates.

Confirm-Triggered Harmonic Spending (CTHS) is a statistical error-budget allocation strategy for controlling cumulative risk in recursive self-modification frameworks. First introduced in the context of the Statistical Gödel Machine (SGM) paradigm, CTHS ensures that the familywise error rate (FWER) is tightly bounded during iterative, proof-less modification of code or hyperparameters. By allocating α only at confirmation events, rather than at every iteration or proposal, CTHS concentrates statistical power on genuinely promising candidates while preserving global error guarantees across potentially unbounded self-modification trajectories (Wu et al., 11 Oct 2025).

1. Foundational Motivation and Formalism

In recursive self-modification settings such as SGM, each outer round proposes a modification to system code or hyperparameters. This proposal is initially subjected to a lightweight screening procedure, intended to filter out obviously inferior options without committing error budget. Only if the screening is sufficiently promising does the candidate escalate to a confirmation event, where a thorough statistical test is conducted to certify strict improvement.

To maintain a global upper bound δ on the probability of ever accepting a harmful modification, standard approaches split δ among tests a priori, either uniformly or harmonically across all rounds. The naïve split δ/T or harmonic per-round rule

$\delta_t = \frac{\delta}{t H_T},\quad H_T = \sum_{i=1}^T \frac{1}{i}$

spends α regardless of whether a true confirmation occurs, thus dissipating power by allocating to rounds that never escalate beyond screening.

CTHS addresses this inefficiency by indexing the error budget not by attempted rounds t but by the actual sequence of confirmation events k. At the k-th confirmation event (of at most T in total), the test is conducted at level

$\alpha_k = \frac{\delta}{k H_T},\quad k = 1,2,\ldots,T.$

As a consequence,

$\sum_{k=1}^T \alpha_k = \delta \frac{(\sum_{k=1}^T \frac{1}{k})}{H_T} = \delta,$

so the total spend never exceeds δ, with equality only if all T rounds escalate to confirmation, and the global FWER remains controlled at the target δ (Wu et al., 11 Oct 2025).
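As a quick numerical check of the identity above, the following Python sketch computes the CTHS levels and confirms that they sum to δ. The function names `harmonic_number` and `cths_alpha` and the example values δ = 0.1, T = 6 are illustrative assumptions, not taken from the paper.

```python
import math


def harmonic_number(T: int) -> float:
    """Return H_T = sum_{i=1}^T 1/i."""
    return sum(1.0 / i for i in range(1, T + 1))


def cths_alpha(k: int, delta: float, T: int) -> float:
    """Test level for the k-th confirmation event: delta / (k * H_T)."""
    if not 1 <= k <= T:
        raise ValueError("confirmation index k must satisfy 1 <= k <= T")
    return delta / (k * harmonic_number(T))


# Illustrative values (assumed for this example only).
delta, T = 0.1, 6
levels = [cths_alpha(k, delta, T) for k in range(1, T + 1)]

# Spending all T levels exhausts the budget exactly.
assert math.isclose(sum(levels), delta)
# With fewer confirmations, part of the budget is left unspent.
assert sum(levels[:3]) < delta
```

The levels decrease with k, so the earliest confirmations are tested at the largest α.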

2. Operational Algorithm and Integration

The CTHS protocol is implemented as a wrapper for the outer self-modification loop:

Initialization: Set global δ, fix T (maximum rounds), compute $H_T = \sum_{i=1}^T 1/i$, set confirmation index $k \gets 0$.
Proposal and Screening: For each round t, proposer $\Pi$ suggests a candidate $\theta_t'$. A screening evaluation is performed.
- If screening fails, reject instantly; no error budget is spent.
- If screening passes, escalate to confirmation.
Confirmation: Increment $k \gets k+1$, allocate $\alpha_k = \delta/(k H_T)$.
- Gather n paired improvement samples $(\Delta_1, \ldots, \Delta_n)$ via thorough evaluation.
- Compute one-sided lower confidence bound $L_k$ at level $1 - \alpha_k$ for the mean improvement $\mu = \mathbb{E}[\Delta]$.
- If $L_k > 0$, accept the edit and update incumbent parameters.
- Else, reject.
Termination: Halt at T rounds or upon external stopping criteria.

This approach strictly requires that no portion of δ is spent unless a confirmation event occurs, focusing expenditure on decisions with real risk (Wu et al., 11 Oct 2025).

3. Mathematical Guarantees: Confidence Bounds and FWER Control

Given bounded improvement samples $\Delta_i \in [a, b]$ and normalized $Z_i = (\Delta_i - a)/(b - a) \in [0, 1]$ for $i = 1, \ldots, n$, the empirical mean is

$\bar{Z} = \frac{1}{n} \sum_{i=1}^n Z_i.$

Hoeffding’s inequality yields the bound

$\Pr\left(\bar{Z} - \mathbb{E}[Z] \ge \epsilon\right) \le \exp(-2 n \epsilon^2),$

and for chosen $\alpha_k$, solving for $\epsilon$ gives

$\epsilon_k = \sqrt{\frac{\ln(1/\alpha_k)}{2n}}.$

The acceptance criterion is $L_k = a + (b - a)(\bar{Z} - \epsilon_k) > 0$, i.e. the lower confidence bound on the mean improvement is strictly positive.

The total probability of a harmful acceptance is controlled by the union bound:

$\Pr(\text{any harmful acceptance}) \le \sum_{k=1}^T \alpha_k = \delta,$

thus proving FWER control. Replacing the fixed-sample Hoeffding test with anytime-valid e-value tests, whose validity follows from Ville’s inequality, yields equivalent guarantees (Wu et al., 11 Oct 2025).
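A minimal sketch of a Hoeffding-based confirmation test, assuming the improvement samples are independent and lie in a known range [lo, hi]. The default range [-1, 1], the function names `hoeffding_lower_bound` and `confirm`, and the sample values are assumptions made for illustration. The test accepts only if the lower confidence bound on the mean improvement is positive.

```python
import math
from typing import Sequence


def hoeffding_lower_bound(samples: Sequence[float], alpha: float,
                          lo: float = -1.0, hi: float = 1.0) -> float:
    """One-sided (1 - alpha) lower confidence bound on the mean of
    independent samples known to lie in [lo, hi], via Hoeffding's inequality."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if len(samples) == 0:
        raise ValueError("at least one sample is required")
    if any(x < lo or x > hi for x in samples):
        raise ValueError("samples must lie in [lo, hi]")
    n = len(samples)
    mean = sum(samples) / n
    # Radius on the normalized [0, 1] scale, then rescaled to [lo, hi].
    eps = math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    return mean - (hi - lo) * eps


def confirm(samples: Sequence[float], alpha: float,
            lo: float = -1.0, hi: float = 1.0) -> bool:
    """Accept only if the lower bound on the mean improvement is positive."""
    return hoeffding_lower_bound(samples, alpha, lo, hi) > 0.0


# Same observed mean, different sample sizes (hypothetical data).
print(confirm([0.3] * 200, alpha=0.04))  # True: the bound is about 0.12
print(confirm([0.3] * 20, alpha=0.04))   # False: the bound is about -0.27
```

A smaller α widens the radius, which is why the level available at a confirmation directly affects whether a given gain can be certified.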

4. Comparison to Traditional α-Spending Rules

Scheme	Allocation Index	α Spent per Test	Typical Outcome
Fixed per-round	$t$	$\delta/T$	Small α per round, power diluted
Harmonic per-round	$t$	$\delta/(t H_T)$	Somewhat larger α early, but waste
CTHS	$k$ (event count)	$\delta/(k H_T)$	Highest α at first confirmations

Traditional methods spend α at every round, regardless of screening outcome, leading to "wasted" budget when many proposals are rejected early. Even harmonic per-round schedules suffer if confirmations are infrequent or delayed. CTHS, indexed by actual event count k, spends budget only when a full statistical test is required, concentrating resources on proposals that may genuinely yield improvement. This maximizes statistical power, particularly in the first few confirmations, while exactly preserving the global error threshold (Wu et al., 11 Oct 2025).
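The comparison can be made concrete with a short Python sketch that evaluates the level each scheme would use at the same confirmation events. The confirmation rounds, δ = 0.1, T = 6, and the helper `level` are hypothetical choices for illustration. Because the k-th confirmation cannot occur before round k, we always have k ≤ t, so the CTHS level is never below the harmonic per-round level.

```python
def harmonic_number(T: int) -> float:
    """Return H_T = sum_{i=1}^T 1/i."""
    return sum(1.0 / i for i in range(1, T + 1))


def level(scheme: str, delta: float, T: int, t: int, k: int) -> float:
    """Test level used at round t, which is the k-th confirmation event."""
    if scheme == "fixed":
        return delta / T
    if scheme == "harmonic":
        return delta / (t * harmonic_number(T))
    if scheme == "cths":
        return delta / (k * harmonic_number(T))
    raise ValueError(f"unknown scheme: {scheme}")


delta, T = 0.1, 6
confirm_rounds = [3, 4, 5, 6]  # hypothetical rounds at which screening passes

for k, t in enumerate(confirm_rounds, start=1):
    row = {s: level(s, delta, T, t, k) for s in ("fixed", "harmonic", "cths")}
    # k <= t always, so CTHS is never below the harmonic per-round level.
    assert row["cths"] >= row["harmonic"]
    print(f"t={t} k={k} " + " ".join(f"{s}={a:.4f}" for s, a in row.items()))
```

Against the fixed split the picture is mixed: the CTHS level exceeds δ/T only while k < T/H_T (here k = 1, 2), which matches the claim that the gain is concentrated in the first few confirmations.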

5. Pseudocode Template for CTHS Integration

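Input: global budget δ, maximum rounds T, proposer Π, incumbent θ
H_T ← Σ_{i=1}^{T} 1/i;  k ← 0
for t = 1, …, T:
    θ' ← Π(θ)                            (propose a candidate)
    if screening of θ' fails: continue   (reject; no α spent)
    k ← k + 1;  α_k ← δ / (k · H_T)      (confirmation event)
    collect paired improvement samples Δ_1, …, Δ_n
    compute lower confidence bound L_k at level α_k
    if L_k > 0: θ ← θ'                   (accept the edit)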

In this scheme, α is only decremented at actual confirmation events, and each acceptance is validated via a rigorous confidence bound tied to the current α allocation.
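The wrapper loop described in Section 2 could be sketched in runnable Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `run_cths`, the `propose`, `screen`, and `evaluate` callables, the sample range [-1, 1], and the use of a Hoeffding bound for confirmation are all hypothetical choices.

```python
import math
from typing import Callable, List, Sequence, Tuple


def run_cths(
    theta: float,
    propose: Callable[[float], float],
    screen: Callable[[float, float], bool],
    evaluate: Callable[[float, float], Sequence[float]],
    delta: float,
    T: int,
    lo: float = -1.0,
    hi: float = 1.0,
) -> Tuple[float, List[Tuple[int, int, float, bool]]]:
    """Run up to T rounds. Returns the final incumbent and a log of
    (round t, confirmation index k, alpha_k, accepted) per confirmation."""
    H_T = sum(1.0 / i for i in range(1, T + 1))
    k = 0
    log: List[Tuple[int, int, float, bool]] = []
    for t in range(1, T + 1):
        candidate = propose(theta)
        if not screen(theta, candidate):
            continue  # screening reject: no error budget is spent
        k += 1  # confirmation event
        alpha_k = delta / (k * H_T)
        samples = evaluate(theta, candidate)  # paired improvements in [lo, hi]
        n = len(samples)
        if n == 0:
            raise ValueError("evaluate must return at least one sample")
        eps = math.sqrt(math.log(1.0 / alpha_k) / (2.0 * n))
        lower = sum(samples) / n - (hi - lo) * eps
        accepted = lower > 0.0
        if accepted:
            theta = candidate
        log.append((t, k, alpha_k, accepted))
    return theta, log


# Toy demo with hypothetical stand-ins for the proposer, screen, and evaluator.
screen_outcomes = iter([False, True, False, True, True, False])
final_theta, log = run_cths(
    theta=0,
    propose=lambda th: th + 1,
    screen=lambda old, new: next(screen_outcomes),
    evaluate=lambda old, new: [0.5] * 100,
    delta=0.1,
    T=6,
)
print(final_theta, [(t, k) for t, k, _, _ in log])  # 3 [(2, 1), (4, 2), (5, 3)]
```

Only three of the six rounds reach confirmation in the demo, so only α_1 + α_2 + α_3 of the budget is consumed; the screened-out rounds cost nothing.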

6. Empirical Evidence and Experimental Implications

Experimental results from SGM stress tests confirm the superior power of CTHS. In a synthetic power analysis (CIFAR-100), a +4 percentage point true gain was injected at confirmation. Under the harmonic per-round rule with $\delta = 0.1$ and $T = 6$, total α spent over four confirmations (rounds $t = 3, 4, 5, 6$) was 0.0388, and the small per-test levels were insufficient for acceptance (zero accepts). With CTHS ($k = 1, 2, 3$ confirmations), allocation was more aggressive: $\alpha_1 = \delta/H_T \approx 0.0408$ at the first confirmation and 0.0748 in total; the first confirmation yielded an immediate accept (Wu et al., 11 Oct 2025).

Schedule	Confirm. Rounds	Total Spend	Accepts	Outcome
CTHS	1,5,6	0.0748	1	Early accept; later rejects
Harmonic	3,4,5,6	0.0388	0	No acceptance; small αₜ
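The totals in the table can be reproduced with a few lines of Python, assuming δ = 0.1 and T = 6. These two values are an inference: they are the setting under which both reported totals come out to four decimal places.

```python
def harmonic_number(T: int) -> float:
    """Return H_T = sum_{i=1}^T 1/i."""
    return sum(1.0 / i for i in range(1, T + 1))


delta, T = 0.1, 6  # assumed values that reproduce the reported totals
H_T = harmonic_number(T)

# Harmonic per-round: level delta / (t * H_T) at confirmation rounds 3, 4, 5, 6.
harmonic_total = sum(delta / (t * H_T) for t in (3, 4, 5, 6))

# CTHS: the confirmations at rounds 1, 5, 6 are indexed k = 1, 2, 3.
cths_levels = [delta / (k * H_T) for k in (1, 2, 3)]
cths_total = sum(cths_levels)

print(f"harmonic per-round total: {harmonic_total:.4f}")  # 0.0388
print(f"CTHS total:               {cths_total:.4f}")      # 0.0748
print(f"CTHS first level:         {cths_levels[0]:.4f}")  # 0.0408
```

Under these assumptions the first CTHS confirmation is tested at roughly three times the level of the first harmonic per-round confirmation (0.0408 versus 0.0136 at round 3).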

On real CIFAR-100 hyperparameter optimization (Table 2), CTHS certified a genuine +5.51 pp gain under the global budget $\delta$ at iteration 6 (30-seed confirmation), while rejecting spurious tweaks and never exceeding the global error rate (Wu et al., 11 Oct 2025).

7. Context, Applicability, and Significance

CTHS formalizes a principled and computationally parsimonious approach to α-spending in recursive self-modification and continual learning systems where risk control is critical. By linking the error budget to actual confirmation events, CTHS not only preserves familywise validity but also materially increases power relative to traditional splits. Its impact is most pronounced in scenarios with frequent screening rejection, highly variable confirmation timing, or open-ended search processes as seen in AutoML and neural architecture search pipelines.

A plausible implication is that CTHS could generalize to other statistical decision processes with sequential filtering, wherever error budget must be tightly controlled over dynamically triggered, rather than deterministic, test sequences. The method’s adoption in SGM positions it as a foundational building block for scalable, risk-aware self-modifying machine learning (Wu et al., 11 Oct 2025).


References (1)

SGM: A Statistical Godel Machine for Risk-Controlled Recursive Self-Modification (2025)
