Near-optimality of the adaptive scaling function
Give a formal proof that the adaptive layer-wise scaling function α_ℓ = α_base (1 + β (1 − |ξ_ℓ|)), where ξ_ℓ is the normalized position of layer ℓ within the effective layer set L_eff, is near-optimal for the weighted modification objective max over {α_ℓ} of ∑_{ℓ ∈ L_eff} w_ℓ α_ℓ² subject to ∑_{ℓ ∈ L_eff} α_ℓ = C. Here each weight w_ℓ encodes the importance of layer ℓ and is proportional to its separability metric S_ℓ. Specify the assumptions on the layer-wise contribution functions under which this near-optimality result holds, and justify why each one is needed.
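Before attempting the proof, it may help to see what the objective as written actually rewards. The derivation below is a sketch under assumptions that are ours rather than the source's: n = |L_eff| layers, nonnegative weights w_ℓ ≥ 0 and nonnegative scales α_ℓ ≥ 0. Part (i) shows that the stated maximization is solved by putting the whole budget C on the single layer with the largest w_ℓ, so a profile that spreads C over all layers, such as the adaptive one, cannot be exactly optimal for it. Parts (ii) and (iii) record two nearby formulations whose optima are interior; they are offered only as candidate readings of the intended objective, not as the source's formulation.

```latex
% Assumptions (not from the source): n = |L_eff|, w_\ell \ge 0, \alpha_\ell \ge 0.
% (i) Objective as stated: since \sum_\ell \alpha_\ell^2 \le (\sum_\ell \alpha_\ell)^2 for \alpha \ge 0,
\sum_{\ell \in L_{\mathrm{eff}}} w_\ell \alpha_\ell^2
  \;\le\; \Big(\max_{\ell} w_\ell\Big) \sum_{\ell} \alpha_\ell^2
  \;\le\; \Big(\max_{\ell} w_\ell\Big) \Big(\sum_{\ell} \alpha_\ell\Big)^2
  \;=\; C^2 \max_{\ell} w_\ell ,
% with equality at \alpha_{\ell^*} = C and \alpha_\ell = 0 otherwise, where \ell^* = \arg\max_\ell w_\ell.

% (ii) Minimizing instead (w_\ell > 0): stationarity 2 w_\ell \alpha_\ell = \lambda gives
\alpha_\ell^{\star} \;=\; \frac{C \, w_\ell^{-1}}{\sum_{k \in L_{\mathrm{eff}}} w_k^{-1}}
  \quad \text{for } \min \textstyle\sum_\ell w_\ell \alpha_\ell^2 \ \text{s.t.}\ \sum_\ell \alpha_\ell = C .

% (iii) Linear objective, quadratic budget: by Cauchy--Schwarz, for any \alpha with \|\alpha\|_2 = R,
\sum_{\ell} w_\ell \alpha_\ell \;=\; R \, \|w\|_2 \cos\angle(\alpha, w) \;\le\; R \, \|w\|_2 ,
  \qquad \alpha_\ell^{\star} = \frac{R \, w_\ell}{\|w\|_2} .
```

Under (iii) with w_ℓ ∝ S_ℓ, the adaptive profile is exactly optimal only when S_ℓ ∝ 1 + β(1 − |ξ_ℓ|), and its ratio to the optimum is the cosine of the angle between the two profiles. Bounding that angle is one concrete form the required assumptions on the layer-wise contribution functions could take.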
We conjecture that this scaling is near-optimal under weighted modification objectives, but a formal proof requires additional assumptions about the layer-wise contribution functions and remains future work.
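To make the conjecture concrete before attempting a proof, the scaling can be evaluated numerically. The sketch below is ours, not the source's, and rests on stated assumptions: ξ_ℓ is taken as evenly spaced on [−1, 1] across the n layers of L_eff (0 for a single layer), α_base is chosen so that ∑ α_ℓ = C, and w_ℓ = S_ℓ for a caller-supplied separability profile (rescaling w changes neither ratio). The helper names adaptive_scales, stated_objective_ratio and cosine_ratio are hypothetical. The code reports two ratios. The first is the stated objective ∑ w_ℓ α_ℓ² divided by its maximum over nonnegative scales, C² max_ℓ w_ℓ, which is attained by placing the whole budget on the highest-weight layer. The second, an alternative notion of near-optimality assumed here rather than taken from the source, is the cosine similarity between α and w, which equals the ratio of ∑ w_ℓ α_ℓ to its optimum under a fixed budget on ∑ α_ℓ².

```python
import math
from typing import List, Sequence


def adaptive_scales(n_layers: int, beta: float, budget: float) -> List[float]:
    """alpha_l = alpha_base * (1 + beta * (1 - |xi_l|)) with sum(alpha) == budget.

    Assumption (not from the source): xi_l is evenly spaced on [-1, 1]
    across the effective layers, and 0 when there is only one layer.
    """
    if n_layers == 1:
        xi = [0.0]
    else:
        xi = [-1.0 + 2.0 * i / (n_layers - 1) for i in range(n_layers)]
    profile = [1.0 + beta * (1.0 - abs(x)) for x in xi]
    alpha_base = budget / sum(profile)  # enforces the constraint sum(alpha) = C
    return [alpha_base * p for p in profile]


def stated_objective_ratio(alpha: Sequence[float], w: Sequence[float]) -> float:
    """sum(w * alpha^2) divided by its maximum over {alpha >= 0, sum(alpha) = C},
    which is C^2 * max(w) (whole budget on the highest-weight layer)."""
    budget = sum(alpha)
    value = sum(wi * ai * ai for wi, ai in zip(w, alpha))
    return value / (budget ** 2 * max(w))


def cosine_ratio(alpha: Sequence[float], w: Sequence[float]) -> float:
    """Ratio of sum(w * alpha) to its optimum under a fixed sum(alpha^2) budget,
    i.e. the cosine of the angle between alpha and w (alternative objective, assumed)."""
    dot = sum(wi * ai for wi, ai in zip(w, alpha))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    norm_a = math.sqrt(sum(ai * ai for ai in alpha))
    return dot / (norm_w * norm_a)


if __name__ == "__main__":
    # Hypothetical separability scores S_l for 7 effective layers (made up).
    S = [0.2, 0.5, 0.8, 1.0, 0.8, 0.5, 0.2]
    alpha = adaptive_scales(len(S), beta=1.0, budget=1.0)
    # Both ratios are unchanged by rescaling w, so w_l = S_l stands in for w_l ∝ S_l.
    print("alpha =", [round(a, 4) for a in alpha])
    print("stated objective / maximum =", round(stated_objective_ratio(alpha, S), 4))
    print("cosine ratio =", round(cosine_ratio(alpha, S), 4))
```

With the made-up profile in the example, the first ratio is about 0.11 and the cosine ratio is about 0.97. A low first ratio is expected for any profile that spreads C over several layers, so if near-optimality is meant in the sense of alignment with S_ℓ, a ratio like the second one may be the more natural quantity to bound.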