
Human-Aware Loss Functions (HALOs)

Updated 17 May 2026
  • Human-Aware Loss Functions are a class of loss functions that integrate human-derived constraints and perceptual cues into training, enhancing model generalization and interpretability.
  • They incorporate semantic, perceptual, and saliency-based regularizations to align model predictions with human context, particularly improving performance in data-scarce regimes.
  • HALOs are applied in diverse domains including human activity recognition, image restoration, and saliency modeling, enabling context-aware decision-making and state-of-the-art results.

Human-Aware Loss Functions (HALOs) are a class of loss functions that incorporate explicit knowledge about human behaviors, perceptual abilities, or context-dependent constraints directly into the training objectives of neural networks. Established across multiple domains, including human activity recognition, perceptual image restoration, and human-annotated saliency modeling, HALOs are designed to align learning with high-level symbolic rules, human perception, or human annotation without invoking external reasoning modules at inference time. This differentiable guidance has been reported to help deep models generalize better in data-scarce regimes, become more interpretable, and achieve state-of-the-art performance in context-critical tasks.

1. Taxonomy and Definitions of Human-Aware Losses

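HALOs encompass a family of loss functions constructed to inject human-derived constraints or knowledge into neural network optimization. Three principal paradigms emerge in the literature:

  • Semantic (symbolic) losses: penalize predictions that violate context constraints derived from an ontology (Arrotta et al., 2023).
  • Saliency-alignment losses: penalize mismatches between model attention maps, such as CAMs, and human-annotated saliency (Boyd et al., 2021).
  • Perceptual losses: measure error in terms of perceived image quality, e.g., with (MS-)SSIM, rather than only through pixelwise differences (Zhao et al., 2015).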

A canonical HALO is integrated additively with a standard task loss (such as cross-entropy), with a hyperparameter modulating the strength of the human-aware term.

2. Formalization and Variants of HALOs

Several specific HALO formulations have been proposed, each instantiating the principle of human-grounded regularization via different mathematical constructs:

Let $P = \langle p_1, \dots, p_k \rangle$ denote the output probability vector over $k$ classes for input $w$, and $A^*$ the set of activities consistent with the high-level context $C^w$ under an ontology $K$. The following semantic losses guide the network to favor contextually legal activities:

  • AllConsistentActs (All):

$$L_\text{semantic}^{(\text{All})}(P, A^*) = 1 - \sum_{i : a_i \in A^*} p_i$$

Encourages the network to place its total probability mass on the allowed (context-consistent) classes; the loss is zero when all of the mass lies in $A^*$.

  • MinusProb-Prob (–PP):

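$$L_\text{semantic}^{(-PP)}(\hat p, \hat a, A^*) = \begin{cases} 1 - \hat p & \hat a \in A^* \\ \hat p & \text{otherwise} \end{cases}$$

where $\hat a$ is the predicted (most probable) activity and $\hat p$ its probability. The loss rewards confidence in context-consistent predictions and penalizes confidence in inconsistent ones.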

  • Zero-One (01):

$$L_\text{semantic}^{(01)}(\hat p, \hat a, A^*) = \begin{cases} 0 & \hat a \in A^* \\ 1 & \text{otherwise} \end{cases}$$

  • MinusProb-One (–P1) and Zero-Prob (0P): Further variants that penalize confidence and context misalignment in different ways.
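To make the variants concrete, here is a minimal single-example sketch in Python. It assumes $P$ is a plain list of probabilities, $A^*$ is a set of allowed class indices, and $\hat a$ is the arg-max class with probability $\hat p$. The function names are hypothetical, and only the All, –PP and 01 variants defined above are implemented.

```python
from typing import Sequence, Set, Tuple


def loss_all(p: Sequence[float], allowed: Set[int]) -> float:
    """AllConsistentActs: 1 minus the probability mass on context-consistent classes."""
    return 1.0 - sum(p[i] for i in allowed)


def prediction(p: Sequence[float]) -> Tuple[int, float]:
    """Return (a_hat, p_hat): the most probable class index and its probability."""
    a_hat = max(range(len(p)), key=lambda i: p[i])
    return a_hat, p[a_hat]


def loss_minus_prob_prob(p: Sequence[float], allowed: Set[int]) -> float:
    """-PP: 1 - p_hat if the prediction is consistent (rewards confidence),
    p_hat otherwise (penalizes confidence in an inconsistent prediction)."""
    a_hat, p_hat = prediction(p)
    return 1.0 - p_hat if a_hat in allowed else p_hat


def loss_zero_one(p: Sequence[float], allowed: Set[int]) -> float:
    """01: constant penalty of 1 whenever the predicted class is inconsistent."""
    a_hat, _ = prediction(p)
    return 0.0 if a_hat in allowed else 1.0
```

For instance, with $P = \langle 0.1, 0.6, 0.3 \rangle$ and $A^* = \{0, 2\}$, the predicted class 1 violates the context, so the All loss is 0.6, the –PP loss is 0.6, and the 01 loss is 1.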

Semantic losses are integrated as:

$$L_\text{total} = L_\text{cross}(P, y) + \alpha \cdot L_\text{semantic}(P, A^*)$$

where $L_\text{cross}$ is the standard cross-entropy, $y$ the ground-truth label, and $\alpha$ a regularization weight.
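The additive combination can be sketched for a single example as below. This is a minimal illustration assuming the All variant as $L_\text{semantic}$ and an already computed $A^*$; the function names are hypothetical, and a real training loop would average over a batch and operate on framework tensors.

```python
import math
from typing import Sequence, Set


def cross_entropy(p: Sequence[float], y: int, eps: float = 1e-12) -> float:
    """Single-example cross-entropy: -log p_y (clamped to avoid log(0))."""
    return -math.log(max(p[y], eps))


def semantic_all(p: Sequence[float], allowed: Set[int]) -> float:
    """All variant: 1 minus the probability mass on context-consistent classes."""
    return 1.0 - sum(p[i] for i in allowed)


def total_loss(p: Sequence[float], y: int, allowed: Set[int], alpha: float) -> float:
    """L_total = L_cross(P, y) + alpha * L_semantic(P, A*)."""
    return cross_entropy(p, y) + alpha * semantic_all(p, allowed)
```

With $P = \langle 0.7, 0.2, 0.1 \rangle$, $y = 0$, $A^* = \{0, 1\}$ and $\alpha = 7$, the total is $-\log 0.7 + 7 \times 0.1 \approx 1.057$. Larger $\alpha$ values push harder toward context-consistent outputs, at the cost of weighting the label signal relatively less.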

In image classification, HALOs may take the form:

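$$L = (1 - \alpha)\, L_\text{cross}(P, y) + \alpha\, L_\text{saliency}(S^h, S^m), \qquad L_\text{saliency}(S^h, S^m) = \lVert S^h - S^m \rVert_2^2$$

where $S^h$ is the human-provided saliency map and $S^m$ is extracted from the model (e.g., a class activation map, CAM). The hyperparameter $\alpha \in [0, 1]$ (or, equivalently, a relative weight on the saliency term) determines the weighting.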
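A minimal sketch of a saliency-aligned objective follows. It assumes both maps are same-sized 2D grids, min-max normalized to [0, 1], compared with a mean squared error, and blended with cross-entropy through $\alpha$ as a convex combination. These choices and the function names are illustrative assumptions, not the exact formulation of Boyd et al.

```python
import math
from typing import List, Sequence

SaliencyMap = List[List[float]]


def normalize(m: SaliencyMap) -> SaliencyMap:
    """Min-max scale a map to [0, 1]; a constant map becomes all zeros (assumed preprocessing)."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def saliency_loss(human: SaliencyMap, model: SaliencyMap) -> float:
    """Mean squared difference between normalized human (S^h) and model (S^m) maps."""
    h, s = normalize(human), normalize(model)
    n = sum(len(row) for row in h)
    return sum((a - b) ** 2 for hr, sr in zip(h, s) for a, b in zip(hr, sr)) / n


def blended_loss(p: Sequence[float], y: int, human: SaliencyMap,
                 cam: SaliencyMap, alpha: float) -> float:
    """(1 - alpha) * cross-entropy + alpha * saliency term, with alpha in [0, 1]."""
    ce = -math.log(max(p[y], 1e-12))
    return (1.0 - alpha) * ce + alpha * saliency_loss(human, cam)
```

With $\alpha = 0$ this reduces to plain cross-entropy; with $\alpha = 1$ only the agreement between the model CAM and human saliency is optimized.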

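For low-level vision tasks, the "Mix" loss of Zhao et al. (2015) combines multi-scale SSIM and a pixelwise $\ell_1$ term:

$$L^\text{Mix} = \alpha \cdot L^\text{MS-SSIM} + (1 - \alpha) \cdot G_{\sigma_G^M} \cdot L^{\ell_1}$$

Here, $L^\text{MS-SSIM}$ is the multi-scale structural similarity loss, $G_{\sigma_G^M}$ is a Gaussian kernel at the coarsest scale $M$, and $\alpha$ balances structural and absolute fidelity.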
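As a rough illustration, the sketch below evaluates the Mix loss for a single square patch in the single-scale case ($M = 1$), where MS-SSIM reduces to SSIM and the coarsest-scale Gaussian is the only one. Statistics are Gaussian-weighted around the patch center, and the $\ell_1$ term uses the same Gaussian weights. The window width $\sigma = 1.5$ and the constants $C_1 = (0.01L)^2$, $C_2 = (0.03L)^2$ are the usual SSIM defaults, assumed here rather than taken from this page; the function names are hypothetical.

```python
import math
from typing import List

Patch = List[List[float]]


def gaussian_weights(size: int, sigma: float) -> Patch:
    """Normalized 2D Gaussian window centered on the patch center."""
    c = (size - 1) / 2.0
    w = [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
          for j in range(size)] for i in range(size)]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]


def weighted_mean(w: Patch, a: Patch) -> float:
    return sum(wi * ai for wr, ar in zip(w, a) for wi, ai in zip(wr, ar))


def ssim_center(x: Patch, y: Patch, sigma: float = 1.5, data_range: float = 1.0) -> float:
    """Single-scale SSIM at the patch center from Gaussian-weighted local statistics."""
    w = gaussian_weights(len(x), sigma)
    mx, my = weighted_mean(w, x), weighted_mean(w, y)
    vx = weighted_mean(w, [[(v - mx) ** 2 for v in r] for r in x])
    vy = weighted_mean(w, [[(v - my) ** 2 for v in r] for r in y])
    cxy = weighted_mean(w, [[(a - mx) * (b - my) for a, b in zip(rx, ry)]
                            for rx, ry in zip(x, y)])
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def mix_loss(x: Patch, y: Patch, alpha: float, sigma: float = 1.5) -> float:
    """alpha * (1 - SSIM) + (1 - alpha) * Gaussian-weighted l1, for M = 1."""
    w = gaussian_weights(len(x), sigma)
    l1 = weighted_mean(w, [[abs(a - b) for a, b in zip(rx, ry)] for rx, ry in zip(x, y)])
    return alpha * (1.0 - ssim_center(x, y, sigma)) + (1.0 - alpha) * l1
```

Because every statistic is a smooth function of the pixel values (apart from the absolute value at zero), the same computation written with a tensor library can be backpropagated end to end.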

3. Construction and Differentiability from Symbolic or Human-Centric Knowledge

HALOs depend on mapping symbolic predicates or human data sources into differentiable loss terms:

  • Ontology-Based Context Constraints (Arrotta et al., 2023): Activity classes carry symbolic context restrictions encoded in the ontology $K$. At each training step, a reasoner computes which activities are consistent with the current context, and the semantic loss penalizes violations. The penalty is constructed to be differentiable almost everywhere so that it can be used in backpropagation.
  • Saliency Alignment (Boyd et al., 2021): Human saliency maps are aggregated and processed into smooth, real-valued matrices and compared to model CAMs using a squared-error ($\ell_2$) loss, which is differentiable with respect to the model parameters.
  • Perceptual Similarity (Zhao et al., 2015): SSIM and MS-SSIM depend on local statistics (mean, variance, covariance) of image patches and are differentiable via the chain rule with respect to pixel outputs, enabling end-to-end gradient-based optimization.
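The claim that symbolic constraints become ordinary differentiable terms can be checked numerically for the All loss. Writing $S = \sum_{i \in A^*} p_i$ with $P = \mathrm{softmax}(z)$ and using $\partial p_i / \partial z_j = p_i(\delta_{ij} - p_j)$, the chain rule gives $\partial L / \partial z_j = -p_j(\mathbb{1}[j \in A^*] - S)$. The sketch below compares this with central finite differences; the toy ontology mapping contexts to allowed class indices stands in for the reasoner and, like the function names, is purely hypothetical.

```python
import math
from typing import Callable, Dict, List, Set

# Hypothetical toy "ontology": each context maps to the activity indices consistent with it.
ONTOLOGY: Dict[str, Set[int]] = {"indoors": {0, 1}, "outdoors": {1, 2}}


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def loss_all_from_logits(z: List[float], allowed: Set[int]) -> float:
    p = softmax(z)
    return 1.0 - sum(p[i] for i in allowed)


def grad_loss_all(z: List[float], allowed: Set[int]) -> List[float]:
    """Analytic gradient dL/dz_j = -p_j * (1[j in A*] - S), with S the allowed mass."""
    p = softmax(z)
    s = sum(p[i] for i in allowed)
    return [-p[j] * ((1.0 if j in allowed else 0.0) - s) for j in range(len(z))]


def numerical_grad(f: Callable[[List[float]], float], z: List[float],
                   h: float = 1e-6) -> List[float]:
    """Central finite differences, used only to check the analytic gradient."""
    g = []
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += h
        zm[j] -= h
        g.append((f(zp) - f(zm)) / (2 * h))
    return g


allowed = ONTOLOGY["indoors"]  # the "reasoner" step: context -> A*
z = [0.2, -1.0, 0.5]
analytic = grad_loss_all(z, allowed)
numeric = numerical_grad(lambda v: loss_all_from_logits(v, allowed), z)
max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
```

The gradient is positive for logits of inconsistent classes, so gradient descent lowers them, and the components sum to zero, as expected for a function of softmax outputs.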

4. Network Architectures, Datasets, and Optimization Protocols

HALOs have been deployed in diverse neural architectures and evaluated on both synthetic and real-world datasets:

  • Context-Aware HAR (Arrotta et al., 2023):
    • Input: Smartphone and smartwatch inertial streams, high-level context.
    • Architecture: Parallel 1D convolutions for the inertial signals; the one-hot context vector is processed by a dense layer; the features are concatenated and passed through dropout and dense layers with a softmax output.
    • Datasets: DOMINO (scripted, 25 users, 14 activities, with context information) and ExtraSensory (in-the-wild).
    • Training: Adam optimizer, batch size 32, per-user cross-validation.
  • Image Classification with Saliency Supervision (Boyd et al., 2021):
    • Models: DenseNet-121, ResNet-50, Inception-v3, Xception.
    • Input saliency maps: Collected via crowd-sourcing and preprocessed.
    • Training: SGD with momentum, cross-entropy and human saliency loss, batch size 32, early stopping.
  • Perceptual Image Restoration (Zhao et al., 2015):
    • Fully convolutional networks for tasks such as joint denoising and demosaicking, and super-resolution.
    • Loss computed on central pixels or patches using Mix (MS-SSIM + $\ell_1$).
    • Datasets: Images for denoising, super-resolution, JPEG artifact removal.

5. Empirical Results, Efficiency, and Comparative Analysis

Substantial improvements are reported across tasks and settings for models trained with HALOs, particularly under data scarcity or domain shifts.

Quantitative Results

| Domain | Baseline | HALO (best) | Δ | Dataset | Loss variant |
|---|---|---|---|---|---|
| HAR (DOMINO, 100%) | F1 = 0.90 | F1 = 0.93 | +0.026 | 25 users, 14 activities | Semantic (–P1, α = 7) |
| HAR (ExtraSensory, 10%) | F1 = 0.52 | F1 = 0.59 | +0.067 | 31 users, 7 activities | AllConsistentActs (α = 30) |
| Face Detection (ResNet) | AUC = 0.55 | AUC = 0.67 | +0.12 | 600k synthetic faces | CYBORG (α = 0.5) |
| Image Restoration (Mix) | n/a | Highest across all (PSNR/SSIM/MS-SSIM ↑) | n/a | Denoising, SISR, JPEG deblocking | Mix (MS-SSIM + $\ell_1$) |
