Adaptive Coordinate-based Regression Loss
- ACR Loss is an objective function that adaptively weights landmark errors using statistical shape modeling and Smooth-Face constructions.
- It employs a piecewise loss function that modulates curvature from L2-like to L1-like behavior based on per-landmark difficulty.
- Empirical evaluations on COFW and 300W show relative error reductions of roughly 15–20%, narrowing the performance gap with heatmap regression methods.
Adaptive Coordinate-based Regression (ACR) Loss is an objective function designed to optimize landmark localization, particularly in face alignment, by adaptively emphasizing harder-to-predict landmark points based on statistical shape modeling. It addresses the limitations of conventional coordinate-based regression (CBR), offering a principled formulation that improves performance via per-landmark adaptive weighting and curvature modulation, ultimately reducing the performance gap with heatmap-based regression methods in resource-constrained or mobile scenarios (Fard et al., 2022).
1. Foundation: Active Shape Model and Smooth-Face Generation
The ACR loss leverages concepts from the Active Shape Model (ASM; Cootes et al., 1995) to construct "Smooth-Face" objects that provide canonical, low-variation reference configurations for facial landmarks. Given a training set of faces, each annotated with $M$ two-dimensional landmarks, the dataset's mean shape $\bar{S}$ and covariance matrix are computed. Principal component analysis is then performed to extract the leading eigenvectors $V = [v_1, \dots, v_K]$, ordered by decreasing eigenvalue. Each training face $S_i$ can be approximated by a linear combination of the mean shape and these modes:

$$S_i \approx \bar{S} + V b_i, \qquad b_i = V^{\top} (S_i - \bar{S}).$$

A Smooth-Face is generated by truncating this expansion to the first $\ell$ modes:

$$\tilde{S}_i = \bar{S} + V_{\ell}\, b_{i,1:\ell},$$

where $V_{\ell}$ contains the first $\ell$ eigenvectors. This truncation ensures Smooth-Faces vary less from the mean shape, thereby isolating the landmarks whose ground-truth configuration significantly diverges from the mean along modes not captured in the truncated expansion. These landmarks are interpreted as more challenging for prediction.
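For illustration, a minimal NumPy sketch of this construction is given below; the helper names (`build_asm`, `smooth_face`) and the flattened `(N, 2*M)` landmark layout are assumptions for exposition, not the reference implementation:

```python
import numpy as np

def build_asm(shapes):
    """Fit a simple ASM: mean shape and PCA modes from training shapes.

    shapes: array of shape (N, 2*M), each row a flattened landmark vector.
    Returns (mean_shape, eigvecs) with eigenvectors sorted by decreasing eigenvalue.
    """
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    return mean_shape, eigvecs[:, order]

def smooth_face(face, mean_shape, eigvecs, n_modes):
    """Project a ground-truth shape onto the first n_modes ASM modes."""
    V = eigvecs[:, :n_modes]
    b = V.T @ (face - mean_shape)               # mode coefficients b_i
    return mean_shape + V @ b                   # truncated reconstruction (Smooth-Face)
```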
2. Landmark Difficulty Quantification
Each landmark's prediction difficulty is quantified using a normalized residual between the ground truth and its Smooth-Face counterpart:

$$\Phi_{i,m} = \frac{\lVert S_{i,m} - \tilde{S}_{i,m} \rVert_2}{\max_{m'} \lVert S_{i,m'} - \tilde{S}_{i,m'} \rVert_2},$$

where $S_{i,m}$ and $\tilde{S}_{i,m}$ denote the ground-truth and ASM-smoothed positions of the $m$-th landmark for sample $i$. The resulting difficulty weight $\Phi_{i,m} \in [0, 1]$ reflects the degree to which each landmark deviates from typical population behavior: $\Phi_{i,m} \to 1$ for landmarks in highly variable locations (hard), $\Phi_{i,m} \to 0$ for those close to the mean (easy).
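A minimal sketch of this normalization, assuming `face` and `smoothed` are `(M, 2)` arrays for a single sample:

```python
import numpy as np

def landmark_difficulty(face, smoothed):
    """Per-landmark difficulty Phi in [0, 1] for one sample."""
    residual = np.linalg.norm(face - smoothed, axis=1)   # r_{i,m}
    return residual / max(residual.max(), 1e-8)          # Phi_{i,m}, guarded against zero max
```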
3. ACR Loss Formulation
The predicted coordinates for landmark $m$ of image $i$ are denoted $P_{i,m}$. The Euclidean error is

$$\Delta_{i,m} = \lVert S_{i,m} - P_{i,m} \rVert_2.$$

The per-landmark ACR loss is defined as a piecewise function modulated by $\Phi_{i,m}$:

$$\ell_{\mathrm{ACR}}(i,m) =
\begin{cases}
\lambda \, \ln\!\left(1 + \Delta_{i,m}^{\,2 - \Phi_{i,m}}\right), & \Delta_{i,m} \le 1, \\[4pt]
\Delta_{i,m}^{2} + C, & \Delta_{i,m} > 1,
\end{cases}$$

where $C = \lambda \ln 2 - 1$ ensures continuity at $\Delta_{i,m} = 1$, and $\lambda > 0$ adjusts the loss sharpness. The total ACR loss for a minibatch of $B$ images is

$$\mathcal{L}_{\mathrm{ACR}} = \frac{1}{BM} \sum_{i=1}^{B} \sum_{m=1}^{M} \ell_{\mathrm{ACR}}(i,m).$$

The curvature in the region $\Delta_{i,m} \le 1$ transitions smoothly from $L_2$-like ($\Phi_{i,m} = 0$) to $L_1$-like ($\Phi_{i,m} = 1$) behavior. The gradient for $\Delta_{i,m} \le 1$ is

$$\frac{\partial \ell_{\mathrm{ACR}}}{\partial \Delta_{i,m}} = \frac{\lambda \,(2 - \Phi_{i,m})\, \Delta_{i,m}^{\,1 - \Phi_{i,m}}}{1 + \Delta_{i,m}^{\,2 - \Phi_{i,m}}},$$

which increases for small $\Delta_{i,m}$ as $\Phi_{i,m} \to 1$, driving the network to focus on achieving lower error on "hard" landmarks.
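Under the piecewise definition above (with the continuity constant as reconstructed here), a vectorized NumPy sketch might read as follows; `lam` is the sharpness hyperparameter left to ablation:

```python
import numpy as np

def acr_loss(delta, phi, lam):
    """Element-wise ACR loss for errors `delta` and difficulties `phi` (same shape)."""
    small = lam * np.log1p(delta ** (2.0 - phi))   # adaptive-curvature branch (delta <= 1)
    C = lam * np.log(2.0) - 1.0                    # constant enforcing continuity at delta = 1
    large = delta ** 2 + C                         # quadratic branch (delta > 1)
    return np.where(delta <= 1.0, small, large)
```

Averaging `acr_loss(delta, phi, lam)` over all landmarks and images yields the minibatch loss $\mathcal{L}_{\mathrm{ACR}}$.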
4. Adaptive Scheduling of Difficulty: Mode Progression Strategy
To maintain focus on genuinely hard points as training advances, the fraction of available ASM modes used to build the Smooth-Faces is increased according to a fixed epoch schedule, with stages spanning epochs 0–15, 16–30, 31–70, 71–100, and 101–150 over the 150-epoch run; each successive stage retains a larger share of the modes (a schedule sketch follows the next paragraph).
This progressive refinement ensures early training emphasizes global structure, while later epochs prioritize increasingly fine-grained, outlier-resistant error signals.
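A minimal sketch of such an epoch-to-mode-count schedule is shown below; the fractions in `MODE_SCHEDULE` are hypothetical placeholders, not the values used in the original experiments:

```python
# Hypothetical fractions: the original per-stage values are not reproduced here.
MODE_SCHEDULE = [
    (15, 0.75),    # epochs 0-15
    (30, 0.80),    # epochs 16-30
    (70, 0.85),    # epochs 31-70
    (100, 0.90),   # epochs 71-100
    (150, 0.95),   # epochs 101-150
]

def modes_for_epoch(epoch, total_modes, schedule=MODE_SCHEDULE):
    """Number of ASM modes to keep at a given epoch (placeholder fractions)."""
    for last_epoch, fraction in schedule:
        if epoch <= last_epoch:
            return max(1, int(round(fraction * total_modes)))
    return total_modes
```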
5. Training Workflow and Implementation
The following pseudocode details the typical training step using ACR loss:
```
initialize network weights θ
for epoch = 1 to T:
    ℓ = schedule[epoch]                        # number of ASM modes for this stage
    V_ℓ = V[:, :ℓ]                             # leading eigenvectors
    for each minibatch of B images:
        predict Pr_Face_i for i = 1 ... B
        for i = 1 ... B:
            b_i = V.T @ (Face_i - Mean_Face)                   # ASM mode coefficients
            Smooth_Face_i = Mean_Face + V_ℓ @ b_i[:ℓ]          # truncated reconstruction
            for m = 1 ... M:
                Δ[i, m] = ||Face_{i,m} - Pr_Face_{i,m}||_2     # prediction error
                r[i, m] = ||Smooth_Face_{i,m} - Face_{i,m}||_2 # residual vs. Smooth-Face
            for m = 1 ... M:
                Φ[i, m] = r[i, m] / max_m' r[i, m']            # difficulty in [0, 1]
                if Δ[i, m] <= 1:
                    loss[i, m] = λ * ln(1 + Δ[i, m]^(2 - Φ[i, m]))
                else:
                    C = λ * ln 2 - 1                           # continuity at Δ = 1
                    loss[i, m] = Δ[i, m]^2 + C
        Loss_ACR = (1 / (B * M)) * Σ_{i,m} loss[i, m]
        backpropagate Loss_ACR and update θ
```
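A batched, differentiable PyTorch sketch of the same loss computation follows; the tensor shapes and the helper name `acr_loss_batch` are assumptions, and the clamping constants are added only for numerical stability:

```python
import math
import torch

def acr_loss_batch(pred, target, smooth, lam):
    """Batched ACR loss.

    pred, target, smooth: (B, M, 2) tensors of predicted, ground-truth, and
    Smooth-Face landmark coordinates; lam is the sharpness hyperparameter.
    """
    delta = torch.linalg.vector_norm(pred - target, dim=-1)           # (B, M) prediction errors
    r = torch.linalg.vector_norm(smooth - target, dim=-1)             # residual vs. Smooth-Face
    phi = r / r.amax(dim=1, keepdim=True).clamp_min(1e-8)             # per-landmark difficulty
    small = lam * torch.log1p(delta.clamp_min(1e-8) ** (2.0 - phi))   # adaptive branch
    large = delta ** 2 + (lam * math.log(2.0) - 1.0)                  # quadratic branch + C
    return torch.where(delta <= 1.0, small, large).mean()
```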
6. Experimental Evaluation and Benchmarking
ACR loss was empirically validated on established datasets with strong baselines. The principal architectures were MobileNetV2, EfficientNet-B0, and EfficientNet-B3. Two datasets were used:
- COFW: 1,345 training, 507 testing images, 29 landmarks, high occlusion
- 300W: ~3,148 training faces, three splits for evaluation, 68 landmarks
All images were cropped and resized to a fixed input resolution, and random brightness, contrast, and color jitter were applied. Networks were trained for 150 epochs using the Adam optimizer with weight decay, with batch sizes around 32 and the ACR curvature parameter $\lambda$ chosen by ablation.
Key metrics:
- Normalized Mean Error (NME, inter-ocular)
- Failure Rate (FR, at threshold 0.1)
- Area Under Curve (AUC)
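For reference, a minimal NumPy sketch of the NME and FR metrics as conventionally computed for the 68-point markup; the eye-corner indices follow the standard 300W convention (outer corners 36 and 45, zero-based) and are not taken from the source, and COFW would use a dataset-specific landmark pair:

```python
import numpy as np

def nme_interocular(pred, gt, left=36, right=45):
    """Inter-ocular NME for (N, 68, 2) arrays of predicted and ground-truth landmarks."""
    iod = np.linalg.norm(gt[:, right] - gt[:, left], axis=1)    # inter-ocular distance
    err = np.linalg.norm(pred - gt, axis=2).mean(axis=1)        # mean point-to-point error
    return (err / iod).mean()

def failure_rate(pred, gt, threshold=0.1, left=36, right=45):
    """Fraction of images whose per-image NME exceeds the threshold."""
    iod = np.linalg.norm(gt[:, right] - gt[:, left], axis=1)
    per_image = np.linalg.norm(pred - gt, axis=2).mean(axis=1) / iod
    return (per_image > threshold).mean()
```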
| Dataset | Model | Baseline NME | ACR NME | Baseline FR | ACR FR | Baseline AUC | ACR AUC |
|---|---|---|---|---|---|---|---|
| COFW | MobileNetV2 | 4.93% | 3.78% | 0.59% | 0.39% | 0.734 | 0.822 |
| COFW | EfficientNet-B3 | 3.71% | 3.47% | 0.39% | 0.39% | 0.828 | 0.842 |
| 300W | MobileNetV2 | 7.32% (Chal) | 6.16% | — | — | — | — |
| 300W | EfficientNet-B3 | 6.01% (Chal) | 5.36% | — | — | — | — |
| 300W | EfficientNet-B3 | 4.24% (Full) | 3.75% | — | — | — | — |
EfficientNet-B3 + ACR achieved state-of-the-art performance on COFW (NME = 3.47%), outperforming all published methods, including LAB (3.92%) and ACN (3.83%). On the 300W challenging split, ACR matched heatmap-based regression methods (e.g., CHR2c at 5.15%) while maintaining the computational efficiency of a coordinate-based approach.
7. Impact and Conclusion
The ACR loss bridges the efficiency of coordinate regression with the robustness of adaptive difficulty weighting derived from statistical shape analysis. By identifying and adaptively emphasizing hard-to-localize points, the method delivers relative error reductions of roughly 15–20% over standard $L_2$-based objectives and narrows the performance gap with heatmap regression, even when deployed on compact architectures. It is empirically validated to perform well under occlusion, pose variation, and landmark ambiguity, reinforcing its applicability to real-world, resource-constrained face alignment scenarios (Fard et al., 2022).