
Supervised Iterative Computation (SIC)

Updated 4 December 2025
  • Supervised Iterative Computation (SIC) is a framework that alternates between unconstrained regression and explicit convex projection to enforce constraints such as fairness or physical laws.
  • It decouples model training from constraint enforcement, allowing any regression model to be used while ensuring convergence via a contraction mapping under mild assumptions.
  • Empirical evaluations demonstrate SIC’s improved stability, performance, and constraint satisfaction compared to standard regression and Moving Targets methods on benchmark datasets.

Supervised Iterative Computation (SIC) is an algorithmic framework for supervised learning under constraints, specifically targeted at regression tasks where the predicted outputs must satisfy arbitrary convex constraints, such as fairness, physics, or structural requirements. SIC formulates learning as an alternating sequence of unconstrained regression and explicit constraint enforcement by target adjustment and projection, and provides a convergence guarantee via contraction mapping provided certain mild assumptions hold. This decoupled approach allows the use of any off-the-shelf regression model and enables general handling of convex constraint sets.

1. Formal Mathematical Structure

Let $X\in\mathbb{R}^{n\times d}$ denote a set of $n$ input samples, $y\in\mathbb{R}^n$ the ideal target outputs, and $f(X,\theta)\in\mathbb{R}^n$ a parametric regression model with parameters $\theta\in\mathbb{R}^p$. The SIC framework imposes a closed, convex feasible set $C\subset\mathbb{R}^n$ on the model outputs, encoding required constraints (e.g., fairness, structural properties), and is equipped with a loss function $L:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}_{\ge 0}$, such as mean squared error (MSE) or mean absolute error (MAE).

Denote by $B = \{ f(X, \theta) \mid \theta\in\mathbb{R}^p \}$ the set of outputs achievable by the model. SIC operates via two fundamental operators:

  • Constraint projection: $P_{C,L}(u) = \arg\min_{z\in C} L(z, u)$, projecting onto $C$.
  • Model projection: $P_{B,L}(v) = \arg\min_{\hat{y}\in B} L(\hat{y}, v)$, unconstrained retraining to track a given target $v$.
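For squared-error loss, the constraint projection $P_{C,L}$ is simply the Euclidean projection onto $C$. As a minimal numpy sketch (the box constraint here is an illustrative choice, not one from the paper), projecting onto $C = \{z : \mathrm{lo} \le z_i \le \mathrm{hi}\}$ decomposes per coordinate into clipping:

```python
import numpy as np

def project_box(u, lo, hi):
    """Euclidean projection onto the box {z : lo <= z_i <= hi}.

    For MSE loss, P_{C,L}(u) = argmin_{z in C} ||z - u||^2, which for a
    box constraint decomposes coordinate-wise into clipping.
    """
    return np.clip(u, lo, hi)

print(project_box(np.array([-1.5, 0.2, 3.7]), 0.0, 1.0))  # clips to [0., 0.2, 1.]
```

For more general convex sets $C$ (e.g., fairness constraints), this step becomes a convex program solved with an off-the-shelf solver.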

An affine extension operator $h:\mathbb{R}^n\to\mathbb{R}^n$, $h(x) = (1-\alpha)y + \alpha x$ with $\alpha\in[0,1)$, blends the ideal and current model outputs.

The base SIC iteration is as follows, given the current model prediction $\hat{y}^i$:

  • If $\hat{y}^i\notin C$ (infeasible),
    • Construct the affine-extended target $y^\alpha = (1-\alpha)y + \alpha\hat{y}^i$.
    • Compute $z^i = P_{C,L}(y^\alpha)$.
  • If $\hat{y}^i\in C$ (feasible),
    • Compute $z^i = \arg\min_{z\in C} L(z, y)$ subject to $L(z, \hat{y}^i) \le \beta$.

Then update by solving the unconstrained regression problem $\hat{y}^{i+1} = P_{B,L}(z^i)$.

The overall one-step update operator is

$$\hat{y}^{i+1} = T(\hat{y}^i) = P_{B,L}(P_{C,L}(h(\hat{y}^i))).$$

Initialization is via an unconstrained fit: $\hat{y}^1 = \arg\min_{\hat{y}\in B} L(\hat{y}, y)$ (C. et al., 2022).

2. Algorithm and Convergence

The SIC framework defines a strict contraction mapping provided the following conditions hold:

  1. $B$ and $C$ are closed convex subsets of $\mathbb{R}^n$.
  2. Each projection operator $P_{A,L}$ ($A=B$ or $C$) is Lipschitz continuous with constant $K\ge 1$ under the considered norm.

For $u,v\in B$,

$$\|T(u) - T(v)\| \le K^2\alpha\,\|u - v\|.$$

Thus, if $K^2\alpha < 1$, $T$ is a contraction mapping. Invoking the Banach fixed-point theorem, SIC has a unique fixed point $\bar{y}$, and the iterates converge linearly with

$$\|\hat{y}^i - \bar{y}\| \le (K^2\alpha)^{i-1}\,\|\hat{y}^1 - \bar{y}\|.$$

For MSE loss, $K=1$, so convergence holds for any $\alpha\in[0,1)$; for MAE ($L^1$), $K=2$ and $\alpha<1/4$ is required (C. et al., 2022).
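The contraction bound can be observed numerically. In this toy illustration (not an experiment from the paper), $B=\mathbb{R}^n$ so the model projection is the identity, $C$ is a box, and $L$ is MSE (hence $K=1$); the gap between successive iterates then shrinks by at least a factor $\alpha$ per step:

```python
import numpy as np

# Toy setup: B = R^n (model projection = identity), C a box, L = MSE (K = 1).
alpha = 0.5
y = np.array([2.0, -3.0, 0.5])                 # ideal targets
project_C = lambda z: np.clip(z, -1.0, 1.0)    # P_{C,L} for MSE on a box
T = lambda u: project_C((1 - alpha) * y + alpha * u)

u, prev_gap = np.zeros(3), None
for _ in range(8):
    u_next = T(u)
    gap = np.linalg.norm(u_next - u)
    if prev_gap is not None:
        # contraction: successive gaps shrink geometrically at rate alpha
        assert gap <= alpha * prev_gap + 1e-12
    u, prev_gap = u_next, gap
print(u)  # approaches the fixed point [1.0, -1.0, 0.5]
```

Out-of-box components of $y$ settle on the box boundary, while the in-box component converges to a blend of the target and its projection.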

This contraction property is central: it ensures that the iterates approach a unique fixed point regardless of initialization.

3. Practical Implementation

A typical iteration scheme is summarized in the following pseudocode:

Input: X,y,C,α,β, max_iters

// 1) Initial unconstrained fit
ŷ[1] ← argmin_{ŷ∈B} L(ŷ, y)

for i in 1..max_iters-1:
   if ŷ[i] ∉ C:
     // infeasible adjustment
     yα ← (1−α)·y + α·ŷ[i]
     z ← argmin_{z∈C} L(z, yα)
   else:
     // feasible adjustment
     z ← argmin_{z∈C} L(z, y) subject to L(z, ŷ[i]) ≤ β
   end
   // unconstrained retraining
   ŷ[i+1] ← argmin_{ŷ∈B} L(ŷ, z)
end

return ŷ[max_iters]

The approach decouples the constraint-enforcement logic from the machine learning model, so any off-the-shelf regressor can be plugged into the iteration. Each iteration nonetheless requires solving a (convex) projection and retraining.
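As a minimal executable sketch of this loop (using MSE loss, a linear least-squares model standing in for $B$, and a box constraint standing in for $C$; these choices, and the omission of the feasible-branch trust-region step, are illustrative simplifications rather than the paper's setup):

```python
import numpy as np

def fit_predict(X, z):
    """Model projection P_{B,L}: unconstrained least-squares fit to targets z."""
    Xb = np.hstack([X, np.ones((len(X), 1))])        # add intercept column
    theta, *_ = np.linalg.lstsq(Xb, z, rcond=None)
    return Xb @ theta

def project_C(u, lo, hi):
    """Constraint projection P_{C,L} for MSE onto a box (illustrative C)."""
    return np.clip(u, lo, hi)

def sic(X, y, alpha=0.5, max_iters=30, lo=-1.0, hi=1.0):
    y_hat = fit_predict(X, y)                        # 1) initial unconstrained fit
    for _ in range(max_iters - 1):
        y_alpha = (1 - alpha) * y + alpha * y_hat    # affine-extended target
        z = project_C(y_alpha, lo, hi)               # constraint enforcement
        y_hat = fit_predict(X, z)                    # unconstrained retraining
    return y_hat

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
y_hat = sic(X, y)
```

Note that, consistent with the soft-constraint caveat discussed later, the retraining step can push predictions slightly outside $C$ again; feasibility is only approached over the iterations.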

4. Empirical Evaluation and Benchmarking

SIC's practical impact has been demonstrated on regression tasks with fairness constraints. In these experiments, the constraint set $C$ was specified by the Disparate Impact Discrimination Index (DIDI):

$$\mathrm{DIDI}(z) = \sum_{G\,\in\,\text{protected groups}} \left\lvert \frac{1}{n}\sum_{i=1}^n z_i - \frac{1}{|G|}\sum_{i\in G} z_i \right\rvert \le \epsilon.$$

Benchmarks were conducted on three UCI-style datasets:

  • Student ($n=649$, target = final grade, protected attribute = sex)
  • Crime ($n=2215$, target = crime rate, protected attribute = race)
  • BlackFriday ($n=50{,}000$, target = purchase amount, protected attribute = gender)

Performance was compared to unconstrained regression ($\alpha=0$) and the Moving Targets algorithm of Detassis et al. Key metrics were the $R^2$-score on held-out folds, constraint satisfaction $\mathcal{C} = \mathrm{DIDI}(\hat{y})/\mathrm{DIDI}(y)$, and variability across cross-validation folds.
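The DIDI quantity above can be computed directly. A small numpy sketch (assuming protected groups are encoded as index arrays, an illustrative choice):

```python
import numpy as np

def didi(z, groups):
    """Disparate Impact Discrimination Index: sum over protected groups of
    |overall mean of z - within-group mean of z|."""
    overall = z.mean()
    return sum(abs(overall - z[g].mean()) for g in groups)

z = np.array([1.0, 2.0, 3.0, 4.0])
groups = [np.array([0, 1]), np.array([2, 3])]
print(didi(z, groups))  # |2.5 - 1.5| + |2.5 - 3.5| = 2.0
```

The reported metric $\mathcal{C}$ is then the ratio of this value on the predictions to its value on the original targets.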

Abridged results (mean ± std over 5 folds, after 30 SIC iterations, for MAE loss, $\alpha=0.5$):

Algorithm      | Crime $R^2$   | Crime $\mathcal{C}$ | Student $R^2$ | Student $\mathcal{C}$ | BF $R^2$      | BF $\mathcal{C}$
SIC            | 0.467 (0.019) | 0.239 (0.013)       | 0.874 (0.019) | 0.333 (0.054)         | 0.624 (0.002) | 0.478 (0.018)
MovingTargets  | 0.342 (0.085) | 0.265 (0.005)       | 0.883 (0.029) | 0.327 (0.026)         | 0.590 (0.003) | 0.577 (0.028)

SIC exhibited equal or better $R^2$, improved stability (lower variance), and faster approach to constraint satisfaction at higher $\alpha$ values (C. et al., 2022).

5. Advantages, Limitations, and Discussion

Advantages

  • Decoupling: By modifying targets instead of model parameters, SIC allows incorporation of any underlying regression architecture.
  • Generalizability: Operates with arbitrary convex constraint sets $C$, not limited to specific types or structures.
  • Convergence: Proven fixed-point existence and linear convergence rate under weak assumptions.
  • Numerical Stability: Empirically demonstrates reduced variability in cross-validation, especially at higher trade-off values $\alpha$.

Limitations

  • Soft constraint satisfaction: Constraints are only approached asymptotically; strict hard satisfaction in finite iterations is not assured.
  • Computational cost: Each iteration involves a potentially expensive convex projection and model retraining.
  • Lipschitz continuity: The guarantee relies on the projection operators being Lipschitz; non-Lipschitz or nonconvex constraint sets remain outside the established theory.
  • Limited guarantees beyond regression: Formal convergence guarantees are not currently established for classification or structured prediction tasks (C. et al., 2022).

A plausible implication is that further developments might address projection in nonconvex or non-Lipschitz contexts, and extend the scheme to multi-output or non-regression settings.

6. Context and Relation to Constrained Learning

SIC generalizes and formally unifies previous approaches such as Moving Targets (Detassis et al., ICML 2020) for mean-squared error loss under mild parameter correspondence. It is fundamentally connected to alternating projection methods (e.g., Dykstra's algorithm with Bregman projections [Bauschke & Combettes 1997]) and draws on contraction mappings and the Banach fixed-point theorem for its guarantee [Ciesielski 2007]. Its technical flexibility sets it apart from methods that tie constraint enforcement directly to model parameter updates, facilitating ready deployment across model types so long as convex projections and unconstrained regression are available.

The separation of the constraint-projection and model-learning steps, together with the analytical convergence guarantee, underpins SIC's utility for the practical, stable enforcement of structural or fairness constraints in supervised regression (C. et al., 2022).
