
Adaptive Coordinate-based Regression Loss

Updated 17 December 2025
  • ACR Loss is an objective function that adaptively weights landmark errors using statistical shape modeling and Smooth-Face constructions.
  • It employs a piecewise loss function that modulates curvature from L2-like to L1-like behavior based on per-landmark difficulty.
  • Empirical evaluations on COFW and 300W show a 15–20% error reduction, bridging the performance gap with heatmap regression methods.

Adaptive Coordinate-based Regression (ACR) Loss is an objective function designed to optimize landmark localization, particularly in face alignment, by adaptively emphasizing harder-to-predict landmark points based on statistical shape modeling. It addresses the limitations of conventional coordinate-based regression (CBR), offering a principled formulation that improves performance via per-landmark adaptive weighting and curvature modulation, ultimately reducing the performance gap with heatmap-based regression methods in resource-constrained or mobile scenarios (Fard et al., 2022).

1. Foundation: Active Shape Model and Smooth-Face Generation

The ACR loss leverages concepts from the Active Shape Model (ASM) to construct "Smooth-Face" objects that provide canonical, low-variation reference configurations for facial landmarks. Given a training set of $N$ faces, each annotated with $M$ 2D landmarks ($\mathrm{Face}_i \in \mathbb{R}^{2M}$), the dataset's mean shape $\mathrm{Mean\_Face}$ and covariance matrix are computed. Principal component analysis is then performed to extract the $k$ leading eigenvectors $V = [v_1, \ldots, v_k]$, with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_k$. Each training face can be approximated by a linear combination:

$$\mathrm{Face} \approx \mathrm{Mean\_Face} + V b, \quad \text{where } b = V^\top (\mathrm{Face} - \mathrm{Mean\_Face}).$$

A Smooth-Face is generated by truncating this expansion to the first $\ell$ modes:

$$\mathrm{Smooth\_Face} = \mathrm{Mean\_Face} + V_{:,1:\ell}\, b_{1:\ell}.$$

This truncation ensures Smooth-Faces vary less from the mean shape, thereby isolating the landmarks whose ground-truth configuration diverges significantly from the mean along modes not captured in the truncated expansion. These landmarks are interpreted as harder to predict.
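As a concrete illustration, the following NumPy sketch builds the ASM basis and a Smooth-Face. It assumes a hypothetical array faces of shape (N, 2M) holding flattened ground-truth landmarks; the function names are illustrative rather than taken from a reference implementation.

import numpy as np

def build_asm(faces):
    """PCA basis of the training shapes; faces has shape (N, 2M)."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]        # sort modes by decreasing variance
    return mean_face, eigvecs[:, order], eigvals[order]

def smooth_face(face, mean_face, V, num_modes):
    """Reconstruct a face from its first `num_modes` ASM modes."""
    V_l = V[:, :num_modes]
    b = V_l.T @ (face - mean_face)           # shape coefficients along retained modes
    return mean_face + V_l @ b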

2. Landmark Difficulty Quantification

Each landmark's prediction difficulty is quantified using a normalized residual:

$$\Phi_{i,m} = \frac{\|\mathrm{Smooth\_Face}_{i,m} - \mathrm{Face}_{i,m}\|_2}{\max_{1 \leq q \leq M} \|\mathrm{Smooth\_Face}_{i,q} - \mathrm{Face}_{i,q}\|_2},$$

where $\mathrm{Face}_{i,m}$ and $\mathrm{Smooth\_Face}_{i,m}$ denote the ground-truth and ASM-smoothed positions of the $m$th landmark for sample $i$. The resulting difficulty weight $\Phi_{i,m} \in [0, 1]$ reflects the degree to which each landmark deviates from typical population behavior: $\Phi \approx 1$ for landmarks in highly variable locations (hard), $\Phi \approx 0$ for those close to the mean (easy).
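A minimal sketch of this weighting, assuming (M, 2) landmark arrays and reusing the hypothetical smooth_face helper above:

import numpy as np

def difficulty_weights(face, smooth):
    """Per-landmark difficulty Φ in [0, 1]; face and smooth have shape (M, 2)."""
    residual = np.linalg.norm(smooth - face, axis=1)   # distance to the Smooth-Face
    return residual / max(residual.max(), 1e-8)        # normalize by the hardest landmark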

3. ACR Loss Formulation

The predicted coordinates for landmark $m$ of image $i$ are denoted $\mathrm{Pr\_Face}_{i,m}$, and the Euclidean error is $\Delta_{i,m} = \|\mathrm{Face}_{i,m} - \mathrm{Pr\_Face}_{i,m}\|_2$. The per-landmark ACR loss is a piecewise function modulated by $\Phi_{i,m}$:

$$\text{loss}_\mathrm{pt}(\Delta, \Phi) = \begin{cases} \lambda \ln\!\left(1 + \Delta^{2-\Phi}\right), & 0 \leq \Delta \leq 1 \\ \Delta^2 + C, & \Delta > 1 \end{cases}$$

where the constant $C = \lambda \ln 2 - 1$ makes the two pieces agree at $\Delta = 1$, and $\lambda > 0$ adjusts the loss sharpness. The total ACR loss for a minibatch of $B$ images is:

$$\mathrm{Loss}_\mathrm{ACR} = \frac{1}{B M} \sum_{i=1}^{B} \sum_{m=1}^{M} \text{loss}_\mathrm{pt}\left(\Delta_{i,m}, \Phi_{i,m}\right).$$

The curvature in the region $\Delta \leq 1$ transitions smoothly from $\ell_2$-like ($\Phi \approx 0$) to $\ell_1$-like ($\Phi \approx 1$) behavior. The gradient for $\Delta \leq 1$ is:

$$\frac{\partial\, \text{loss}_\mathrm{pt}}{\partial \Delta} = \lambda \, \frac{(2 - \Phi)\, \Delta^{1-\Phi}}{1 + \Delta^{2-\Phi}},$$

which increases for small $\Delta$ as $\Phi \rightarrow 1$, driving the network to focus on achieving lower error on "hard" landmarks.
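The piecewise form translates directly into code. The snippet below is a minimal NumPy sketch of the per-landmark loss under the formulation above; the function name and the default $\lambda = 4$ (the value used in the experiments) are the only assumptions.

import numpy as np

def acr_loss_pt(delta, phi, lam=4.0):
    """Per-landmark ACR loss; delta and phi are arrays of the same shape."""
    small = lam * np.log1p(delta ** (2.0 - phi))      # L2-like for phi ~ 0, L1-like for phi ~ 1
    big = delta ** 2 + (lam * np.log(2.0) - 1.0)      # quadratic branch, shifted for continuity
    return np.where(delta <= 1.0, small, big)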

4. Adaptive Scheduling of Difficulty: Mode Progression Strategy

To maintain focus on genuinely hard points as training advances, the number of ASM modes $\ell$ included in the Smooth-Face is increased according to a fixed schedule:

  • Epochs 0–15: $\ell = 80\%$ of available modes
  • Epochs 16–30: $\ell = 85\%$
  • Epochs 31–70: $\ell = 90\%$
  • Epochs 71–100: $\ell = 95\%$
  • Epochs 101–150: $\ell = 97\%$

This progressive refinement ensures early training emphasizes global structure, while later epochs prioritize increasingly fine-grained, outlier-resistant error signals.
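A minimal sketch of this schedule as code, assuming a 150-epoch run and k total ASM modes (the helper name is illustrative):

def num_modes(epoch, k):
    """Number of ASM modes retained for the Smooth-Face at a given epoch."""
    if epoch <= 15:
        frac = 0.80
    elif epoch <= 30:
        frac = 0.85
    elif epoch <= 70:
        frac = 0.90
    elif epoch <= 100:
        frac = 0.95
    else:
        frac = 0.97
    return max(1, int(round(frac * k)))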

5. Training Workflow and Implementation

The following pseudocode details the typical training step using ACR loss:

initialize network weights θ
for epoch = 1 to T:
    ℓ = schedule[epoch]                          # number of ASM modes retained this epoch
    V_ℓ = V[:, :ℓ]                               # truncated eigenvector basis
    for each batch of B images:
        predict Pr_Face_i for i in 1..B          # forward pass of the coordinate-regression network
        for i in 1..B:
            # Smooth-Face built from the first ℓ ASM modes (uses only ground truth)
            b_i = V_ℓ.T @ (Face_i - Mean_Face)
            Smooth_Face_i = Mean_Face + V_ℓ @ b_i
            for m in 1..M:
                Δ_{i,m} = ||Face_{i,m} - Pr_Face_{i,m}||_2      # prediction error
                r_{i,m} = ||Smooth_Face_{i,m} - Face_{i,m}||_2  # ASM residual
            normalizer_i = max_m r_{i,m}
            for m in 1..M:
                Φ_{i,m} = r_{i,m} / normalizer_i                # difficulty weight in [0, 1]
                if Δ_{i,m} <= 1:
                    loss_{i,m} = λ * ln(1 + Δ_{i,m}^(2 - Φ_{i,m}))
                else:
                    loss_{i,m} = Δ_{i,m}^2 + (λ * ln(2) - 1)    # continuity constant at Δ = 1
        Loss_ACR = (1 / (B*M)) * Σ_{i,m} loss_{i,m}
        backward and update θ
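For a concrete, differentiable counterpart to the inner loop above, the following PyTorch sketch vectorizes the loss over a batch. It is a minimal illustration, not the authors' reference implementation; the tensor shapes, names, and numerical-stability clamps are assumptions.

import math
import torch

def acr_loss(pred, gt, smooth, lam=4.0):
    """pred, gt, smooth: (B, M, 2) predicted, ground-truth, and Smooth-Face landmarks."""
    delta = torch.norm(gt - pred, dim=-1)                     # (B, M) prediction errors
    r = torch.norm(smooth - gt, dim=-1)                       # (B, M) ASM residuals
    phi = r / r.amax(dim=1, keepdim=True).clamp_min(1e-8)     # per-landmark difficulty in [0, 1]
    small = lam * torch.log1p(delta.clamp_min(1e-12) ** (2.0 - phi))  # branch for Δ <= 1
    big = delta ** 2 + (lam * math.log(2.0) - 1.0)            # branch for Δ > 1
    return torch.where(delta <= 1.0, small, big).mean()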

6. Experimental Evaluation and Benchmarking

ACR loss was empirically validated on established datasets with strong baselines. The principal architectures were MobileNetV2, EfficientNet-B0, and EfficientNet-B3. Two datasets were used:

  • COFW: 1,345 training, 507 testing images, 29 landmarks, high occlusion
  • 300W: ~3,148 training faces, evaluated on the common, challenging, and full test splits, 68 landmarks

All images were cropped and resized to $224 \times 224$, and random brightness/contrast/color jitter was applied. Networks were trained for 150 epochs using Adam (learning rate $10^{-3}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, weight decay $10^{-5}$), with batch sizes around 32 and the ACR sharpness parameter $\lambda = 4$ chosen by ablation.
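A hedged PyTorch sketch of this training configuration follows; the jitter magnitudes and the MobileNetV2 regression head are illustrative assumptions, since the paper's exact settings are not reproduced here.

import torch
import torchvision
from torchvision import transforms

M = 68  # number of landmarks (e.g., 300W); the network regresses 2M coordinates
net = torchvision.models.mobilenet_v2(num_classes=2 * M)

augment = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # assumed magnitudes
    transforms.ToTensor(),
])

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), weight_decay=1e-5)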

Key metrics:

  • Normalized Mean Error (NME, inter-ocular)
  • Failure Rate (FR, at threshold 0.1)
  • Area Under Curve (AUC)

Dataset | Split | Model | Baseline NME | ACR NME | Baseline FR | ACR FR | Baseline AUC | ACR AUC
--- | --- | --- | --- | --- | --- | --- | --- | ---
COFW | Test | MobileNetV2 | 4.93% | 3.78% | 0.59% | 0.39% | 0.734 | 0.822
COFW | Test | EfficientNet-B3 | 3.71% | 3.47% | 0.39% | 0.39% | 0.828 | 0.842
300W | Challenging | MobileNetV2 | 7.32% | 6.16% | – | – | – | –
300W | Challenging | EfficientNet-B3 | 6.01% | 5.36% | – | – | – | –
300W | Full | EfficientNet-B3 | 4.24% | 3.75% | – | – | – | –
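For reference, a minimal sketch of how NME and FR are typically computed on these benchmarks with inter-ocular normalization; the function and argument names are illustrative.

import numpy as np

def nme_and_fr(preds, gts, interocular, threshold=0.1):
    """preds, gts: (N, M, 2) landmark arrays; interocular: (N,) normalizing distances."""
    per_image = np.linalg.norm(preds - gts, axis=-1).mean(axis=1) / interocular
    nme = per_image.mean()                  # Normalized Mean Error
    fr = (per_image > threshold).mean()     # Failure Rate at the given threshold
    return nme, fr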

EfficientNet-B3 trained with ACR loss achieved state-of-the-art performance on COFW (NME = 3.47%), surpassing previously published methods such as LAB (3.92%) and ACN (3.83%). On the 300W challenging split, ACR remained competitive with heatmap-based regression methods (e.g., CHR2c at 5.15%) while retaining the computational efficiency of a coordinate-based approach.

7. Impact and Conclusion

The ACR loss combines the efficiency of coordinate regression with the robustness of adaptive difficulty weighting derived from statistical shape analysis. By identifying and adaptively emphasizing hard-to-localize points, the method delivers relative error reductions of 15–20% over standard $\ell_2$-based objectives and narrows the performance gap with heatmap regression, even on compact architectures. In practice, it maintains strong performance under occlusion, pose variation, and landmark ambiguity, reinforcing its applicability to real-world, resource-constrained face alignment (Fard et al., 2022).

References

1. Fard, A. P., and Mahoor, M. H. (2022). "ACR Loss: Adaptive Coordinate-based Regression Loss for Face Alignment."
