CTHS: Risk-Aware Error Allocation Strategy
- CTHS is a statistical error-budget allocation strategy that spends error budget only at confirmation events, keeping the familywise error rate under strict control in recursive self-modification systems.
- It uses a preliminary screening process to filter out inferior proposals, thereby directing statistical power toward genuinely promising modifications.
- Guarantees based on Hoeffding’s and Ville’s inequalities give valid confidence bounds and cap the probability of a harmful acceptance across iterative system updates.
Confirm-Triggered Harmonic Spending (CTHS) is a statistical error-budget allocation strategy for controlling cumulative risk in recursive self-modification frameworks. First introduced in the context of the Statistical Gödel Machine (SGM) paradigm, CTHS ensures that familywise error rate (FWER) is tightly bounded during iterative, proof-less modification of code or hyperparameters. By allocating α only at confirmation events—rather than every iteration or proposal—CTHS concentrates statistical power on genuinely promising candidates while preserving global error guarantees across potentially unbounded self-modification trajectories (Wu et al., 11 Oct 2025).
1. Foundational Motivation and Formalism
In recursive self-modification settings such as SGM, each outer round proposes a modification to system code or hyperparameters. This proposal is initially subjected to a lightweight screening procedure, intended to filter out obviously inferior options without committing error budget. Only if the screening is sufficiently promising does the candidate escalate to a confirmation event, where a thorough statistical test is conducted to certify strict improvement.
To maintain a global upper bound δ on the probability of ever accepting a harmful modification, standard approaches split δ among tests a priori, either uniformly or harmonically across all rounds. The naïve split $\alpha_t = \delta/T$ or the harmonic per-round rule
$$\alpha_t = \frac{\delta}{t\,H_T}, \qquad H_T = \sum_{j=1}^{T} \frac{1}{j},$$
spends α regardless of whether a true confirmation occurs, thus dissipating power by allocating to rounds that never escalate beyond screening.
CTHS addresses this inefficiency by indexing the error budget not by attempted rounds t but by the actual sequence of confirmation events k. The k-th confirmation event (of at most T) is tested at level
$$\alpha_k = \frac{\delta}{k\,H_T}, \qquad H_T = \sum_{j=1}^{T} \frac{1}{j}.$$
As a consequence, for any number of confirmations $K \le T$,
$$\sum_{k=1}^{K} \alpha_k = \delta\,\frac{H_K}{H_T} \le \delta,$$
so the global FWER remains controlled at the target δ (Wu et al., 11 Oct 2025).
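The schedule can be checked numerically. The sketch below assumes the harmonic form $\alpha_k = \delta/(k H_T)$ with $H_T = \sum_{j=1}^{T} 1/j$; the function names and the values δ = 1/10, T = 6 are illustrative choices, not taken from the source. Exact rational arithmetic is used so the budget identity can be verified without rounding error.

```python
from fractions import Fraction


def harmonic_number(T: int) -> Fraction:
    """H_T = 1 + 1/2 + ... + 1/T, computed exactly."""
    return sum(Fraction(1, j) for j in range(1, T + 1))


def cths_alpha(k: int, delta: Fraction, T: int) -> Fraction:
    """Test level for the k-th confirmation event: delta / (k * H_T)."""
    if not 1 <= k <= T:
        raise ValueError("confirmation index k must satisfy 1 <= k <= T")
    return delta / (k * harmonic_number(T))


# Illustrative values (assumed for this example).
delta, T = Fraction(1, 10), 6
levels = [cths_alpha(k, delta, T) for k in range(1, T + 1)]
total = sum(levels)

print([float(a) for a in levels])
print(total == delta)  # the full schedule spends exactly delta when K = T
```

If fewer than T confirmations occur, the partial sum is strictly below δ, so the unspent remainder is simply never used.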
2. Operational Algorithm and Integration
The CTHS protocol is implemented as a wrapper for the outer self-modification loop:
- Initialization: Set global δ, fix T (maximum rounds), compute $H_T = \sum_{j=1}^{T} 1/j$, and set the confirmation index $k = 0$.
- Proposal and Screening: For each round t, the proposer suggests a candidate modification and a screening evaluation is performed.
  - If screening fails, reject immediately; no error budget is spent.
  - If screening passes, escalate to confirmation.
- Confirmation: Increment $k \leftarrow k + 1$ and allocate $\alpha_k = \delta/(k\,H_T)$.
  - Gather n paired improvement samples via thorough evaluation.
  - Compute a one-sided lower confidence bound $\mathrm{LCB}_k$ for the mean improvement $\mu$ at level $\alpha_k$.
  - If $\mathrm{LCB}_k > 0$, accept the edit and update incumbent parameters.
  - Else, reject.
- Termination: Halt at T rounds or upon external stopping criteria.
This approach strictly requires that no portion of δ is spent unless a confirmation event occurs, focusing expenditure on decisions with real risk (Wu et al., 11 Oct 2025).
3. Mathematical Guarantees: Confidence Bounds and FWER Control
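Given bounded improvement samples $d_i \in [a, b]$ and normalized $z_i = (d_i - a)/(b - a) \in [0, 1]$ for $i = 1, \dots, n$, the empirical mean is
$$\bar z = \frac{1}{n} \sum_{i=1}^{n} z_i.$$
Hoeffding’s inequality yields the bound
$$\Pr\left(\bar z - \mathbb{E}[\bar z] \ge \varepsilon\right) \le \exp\left(-2 n \varepsilon^2\right),$$
and for chosen $\alpha_k$, solving $\exp(-2 n \varepsilon^2) = \alpha_k$ for $\varepsilon$ gives
$$\varepsilon_k = \sqrt{\frac{\ln(1/\alpha_k)}{2n}}, \qquad \mathrm{LCB}_k = \bar d - (b - a)\,\varepsilon_k,$$
where $\bar d = a + (b - a)\,\bar z$ is the empirical mean on the original scale.
The acceptance criterion is $\mathrm{LCB}_k > 0$.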
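As a concrete illustration of the bound, the following is a minimal sketch of a one-sided Hoeffding lower confidence bound for samples known to lie in $[a, b]$. The function name `hoeffding_lcb` and its signature are hypothetical, and the sample values are made up for the example; the formula is the standard Hoeffding radius $(b - a)\sqrt{\ln(1/\alpha)/(2n)}$.

```python
import math
from typing import Sequence


def hoeffding_lcb(samples: Sequence[float], alpha: float, a: float, b: float) -> float:
    """One-sided (1 - alpha) lower confidence bound on the mean of samples in [a, b]."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not a < b:
        raise ValueError("need a < b")
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    if any(not a <= d <= b for d in samples):
        raise ValueError("every sample must lie in [a, b]")
    mean = sum(samples) / n
    # Hoeffding radius on the normalized [0, 1] scale, rescaled by (b - a).
    eps = math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    return mean - (b - a) * eps


# Made-up data: 30 paired improvements of 0.05 with values bounded in [-1, 1].
lcb = hoeffding_lcb([0.05] * 30, alpha=2 / 49, a=-1.0, b=1.0)
print(lcb > 0.0)  # False: the radius dominates a small mean at n = 30
```

The example shows why the test level matters: a smaller α widens the radius, so later confirmations under a harmonic schedule need either more samples or a larger observed gain to pass.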
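The total probability of a harmful acceptance is controlled by the union bound:
$$\Pr(\text{any harmful acceptance}) \le \sum_{k=1}^{K} \alpha_k = \frac{\delta}{H_T} \sum_{k=1}^{K} \frac{1}{k} \le \delta,$$
where $K \le T$ is the number of confirmation events, thus proving FWER control. Replacing the fixed-sample test with an anytime-valid e-value test, whose error control follows from Ville’s inequality, yields equivalent guarantees (Wu et al., 11 Oct 2025).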
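To make the e-value alternative concrete, here is a minimal sketch of one common construction, a fixed-fraction betting test. This particular construction is an assumption for illustration and is not stated in the source; the name `betting_e_test` and the parameter `lam` are hypothetical. For samples in $[-1, 1]$ and null hypothesis "mean improvement ≤ 0", the wealth $W_n = \prod_{i \le n} (1 + \lambda x_i)$ with $0 \le \lambda < 1$ is a nonnegative supermartingale, so Ville’s inequality gives $\Pr(\sup_n W_n \ge 1/\alpha) \le \alpha$.

```python
from typing import Iterable


def betting_e_test(samples: Iterable[float], alpha: float, lam: float = 0.5) -> bool:
    """Sequential test of H0: mean improvement <= 0, for samples in [-1, 1].

    Returns True (improvement certified) as soon as the wealth process
    reaches 1 / alpha; returns False if the samples run out first.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1) to keep wealth nonnegative")
    wealth = 1.0
    for x in samples:
        if not -1.0 <= x <= 1.0:
            raise ValueError("samples must lie in [-1, 1]")
        wealth *= 1.0 + lam * x  # expected factor is <= 1 under H0
        if wealth >= 1.0 / alpha:
            return True  # may stop early; validity holds at any stopping time
    return False


print(betting_e_test([1.0] * 8, alpha=0.05))  # True: 1.5**8 is about 25.6, above 20
print(betting_e_test([1.0] * 7, alpha=0.05))  # False: 1.5**7 is about 17.1, below 20
```

Under this sketch the level passed as `alpha` would be the per-confirmation allocation $\alpha_k$, so the union bound over confirmation events applies unchanged.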
4. Comparison to Traditional α-Spending Rules
| Scheme | Allocation Index | α Spent per Test | Typical Outcome |
|---|---|---|---|
| Fixed per-round | $t$ (round) | $\delta/T$ | Small α per round, power diluted |
| Harmonic per-round | $t$ (round) | $\delta/(t\,H_T)$ | Somewhat larger α early, but waste |
| CTHS | $k$ (event count) | $\delta/(k\,H_T)$ | Highest α at first confirmations |
Traditional methods spend α at every round, regardless of screening outcome, leading to "wasted" budget when many proposals are rejected early. Even harmonic per-round schedules suffer if confirmations are infrequent or delayed. CTHS, indexed by actual event count k, spends budget only when a full statistical test is required, concentrating resources on proposals that may genuinely yield improvement. This maximizes statistical power, particularly in the first few confirmations, while exactly preserving the global error threshold (Wu et al., 11 Oct 2025).
5. Pseudocode Template for CTHS Integration
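The original template is not recoverable from this text. The following is a minimal Python sketch of the wrapper loop, assuming the harmonic level $\alpha_k = \delta/(k H_T)$ and a Hoeffding lower confidence bound for samples in a known range. The callables `propose`, `screen`, and `confirm`, the function name `cths_loop`, and the toy deterministic "quality score" example are all hypothetical stand-ins, not the authors' implementation.

```python
import math
from typing import Callable, List, Tuple


def cths_loop(
    propose: Callable[[float], float],
    screen: Callable[[float, float], bool],
    confirm: Callable[[float, float], List[float]],
    incumbent: float,
    delta: float,
    T: int,
    lo: float = -1.0,
    hi: float = 1.0,
) -> Tuple[float, float, list]:
    """Run T rounds; spend error budget only when screening escalates."""
    H_T = sum(1.0 / j for j in range(1, T + 1))
    k = 0          # confirmation index
    spent = 0.0    # cumulative alpha actually spent
    log = []       # (round, k, alpha_k, lcb, accepted)
    for t in range(1, T + 1):
        candidate = propose(incumbent)
        if not screen(incumbent, candidate):
            continue  # rejected at screening: no error budget spent
        k += 1
        alpha_k = delta / (k * H_T)
        spent += alpha_k
        diffs = confirm(incumbent, candidate)  # paired improvements in [lo, hi]
        n = len(diffs)
        radius = (hi - lo) * math.sqrt(math.log(1.0 / alpha_k) / (2.0 * n))
        lcb = sum(diffs) / n - radius
        accepted = lcb > 0.0
        log.append((t, k, alpha_k, lcb, accepted))
        if accepted:
            incumbent = candidate
    return incumbent, spent, log


# Toy, noise-free example: the "system" is a scalar quality score.
steps = iter([0.3, -0.1, 0.0, 0.01, -0.2, 0.05])
final, spent, log = cths_loop(
    propose=lambda inc: inc + next(steps),
    screen=lambda inc, cand: cand > inc,
    confirm=lambda inc, cand: [cand - inc] * 200,
    incumbent=0.0,
    delta=0.1,
    T=6,
)
print(final, round(spent, 4), [entry[4] for entry in log])
```

In the toy run, three of the six proposals pass screening; only the first has a gain large enough to clear its Hoeffding radius, and the three rounds rejected at screening consume none of δ.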
In this scheme, α is only decremented at actual confirmation events, and each acceptance is validated via a rigorous confidence bound tied to the current α allocation.
6. Empirical Evidence and Experimental Implications
Experimental results from SGM stress tests confirm the superior power of CTHS. In a synthetic power analysis (CIFAR-100), a +4 percentage point true gain was injected at confirmation. Under the harmonic per-round rule, the total α spent over four confirmations (rounds 3, 4, 5, 6) was 0.0388, which was insufficient for acceptance (zero accepts). With CTHS (three confirmations, at rounds 1, 5, 6), allocation was more aggressive, totaling 0.0748, and the first confirmation yielded an immediate accept (Wu et al., 11 Oct 2025). Both totals are consistent with δ = 0.1 and T = 6 under the schedules $\alpha_t = \delta/(t\,H_T)$ and $\alpha_k = \delta/(k\,H_T)$.
| Schedule | Confirm. Rounds | Total Spend | Accepts | Outcome |
|---|---|---|---|---|
| CTHS | 1,5,6 | 0.0748 | 1 | Early accept; later rejects |
| Harmonic | 3,4,5,6 | 0.0388 | 0 | No acceptance; small αₜ |
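The two totals in the table can be reproduced by direct arithmetic. The sketch below assumes δ = 0.1 and T = 6, values inferred from the reported totals rather than stated in the surviving text, together with the harmonic forms $\delta/(t H_T)$ and $\delta/(k H_T)$; the function names are illustrative.

```python
def harmonic_number(T: int) -> float:
    """H_T = 1 + 1/2 + ... + 1/T."""
    return sum(1.0 / j for j in range(1, T + 1))


def per_round_spend(confirm_rounds, delta: float, T: int) -> float:
    """Harmonic per-round rule: a confirmation at round t is tested at delta / (t * H_T)."""
    H = harmonic_number(T)
    return sum(delta / (t * H) for t in confirm_rounds)


def cths_spend(num_confirmations: int, delta: float, T: int) -> float:
    """CTHS: the k-th confirmation is tested at delta / (k * H_T), whatever its round."""
    H = harmonic_number(T)
    return sum(delta / (k * H) for k in range(1, num_confirmations + 1))


delta, T = 0.1, 6  # assumed: inferred from the reported totals
harmonic_total = per_round_spend([3, 4, 5, 6], delta, T)
cths_total = cths_spend(3, delta, T)
print(round(harmonic_total, 4), round(cths_total, 4))  # 0.0388 0.0748
```

Note that the CTHS total depends only on the number of confirmations, not on the rounds at which they occur, which is why confirmations at rounds 1, 5, and 6 still receive the three largest levels.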
On real CIFAR-100 hyperparameter optimization (Table 2), CTHS certified a genuine +5.51 pp gain at its allocated confirmation level at iteration 6 (30-seed confirmation), while rejecting spurious tweaks and never exceeding the global error rate (Wu et al., 11 Oct 2025).
7. Context, Applicability, and Significance
CTHS formalizes a principled and computationally parsimonious approach to α-spending in recursive self-modification and continual learning systems where risk control is critical. By linking the error budget to actual confirmation events, CTHS not only preserves familywise validity but also materially increases power relative to traditional splits. Its impact is most pronounced in scenarios with frequent screening rejection, highly variable confirmation timing, or open-ended search processes as seen in AutoML and neural architecture search pipelines.
A plausible implication is that CTHS could generalize to other statistical decision processes with sequential filtering, wherever error budget must be tightly controlled over dynamically triggered, rather than deterministic, test sequences. The method’s adoption in SGM positions it as a foundational building block for scalable, risk-aware self-modifying machine learning (Wu et al., 11 Oct 2025).