ANSUR I: Anthropometric Data & Fairness Benchmark
- The ANSUR I dataset is a comprehensive anthropometric survey capturing diverse body measurements of U.S. Army personnel, with pronounced sex-specific differences across many measurements.
- Researchers apply methods like Archetypal Analysis and its fairness-aware variants to evaluate unsupervised representation learning on this real-world data.
- Empirical results demonstrate that fairness constraints reduce sensitive attribute encoding while preserving data utility for robust analysis.
The ANSUR I dataset is a comprehensive anthropometric survey comprising a variety of body measurements collected from U.S. Army personnel, including both male and female subjects. Its structure and demographic diversity make it a central benchmark for evaluating the interplay between data-driven representation learning and fairness constraints, particularly when sensitive attributes such as sex are inherently encoded in latent representations.
1. Dataset Overview and Characteristics
The ANSUR I dataset consists of detailed anthropometric measurements for U.S. Army personnel, encompassing both sexes. The features captured span a broad range of body metrics, making it suitable for studies in human factors, ergonomics, and fairness within unsupervised learning. Measurements in the dataset differ significantly between sexes, which creates a risk that standard unsupervised methods will yield latent representations encoding sensitive group information, notably sex. Consequently, the dataset is particularly relevant as a real-world testbed for fairness-aware machine learning methods (Alcacer et al., 16 Jul 2025).
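For orientation, the brief preparation sketch below shows one way such experiments might load and standardize the data; the file name (ansur_i.csv) and the Gender column are hypothetical placeholders, not the dataset's actual schema.

```python
import pandas as pd

# Hypothetical file and column names; adjust to the actual ANSUR I export.
df = pd.read_csv("ansur_i.csv")

# Separate the sensitive attribute (sex) from the anthropometric features.
s = (df["Gender"] == "Female").astype(int).to_numpy()
X = df.drop(columns=["Gender"]).select_dtypes("number").to_numpy(dtype=float)

# Standardize features; sex-linked differences in the measurements remain in X
# and tend to surface in unsupervised latent representations.
X = (X - X.mean(axis=0)) / X.std(axis=0)
```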
2. Archetypal Analysis and Fairness Concerns
Archetypal Analysis (AA) is an unsupervised method that models data as convex combinations of archetypes—extreme points or patterns in the dataset. Formally, AA solves

$$\min_{\mathbf{A},\,\mathbf{B}} \; \lVert \mathbf{X} - \mathbf{A}\mathbf{B}\mathbf{X} \rVert_F^2,$$

where $\mathbf{X} \in \mathbb{R}^{n \times d}$ is the data matrix, $\mathbf{A} \in \mathbb{R}^{n \times k}$ represents data point representations, and $\mathbf{B} \in \mathbb{R}^{k \times n}$ defines archetype contributions, subject to nonnegativity and row-wise simplex constraints on $\mathbf{A}$ and $\mathbf{B}$. When applied directly to the ANSUR I dataset, standard AA tends to encode sensitive group information (e.g., sex) in the latent representation. This results in significant group separability, as evidenced by high linear separability (LS) and maximum mean discrepancy (MMD) scores. Such encoding raises fairness concerns in downstream applications.
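A minimal NumPy sketch of this formulation is given below, using alternating projected gradient steps with a row-wise simplex projection; the initialization, learning rate, and iteration count are illustrative choices, not the reference implementation of (Alcacer et al., 16 Jul 2025).

```python
import numpy as np

def project_rows_to_simplex(M):
    """Euclidean projection of each row of M onto the probability simplex."""
    U = np.sort(M, axis=1)[:, ::-1]                    # sort each row descending
    css = np.cumsum(U, axis=1) - 1.0
    ks = np.arange(1, M.shape[1] + 1)
    rho = np.count_nonzero(U - css / ks > 0, axis=1)   # support size per row
    theta = css[np.arange(M.shape[0]), rho - 1] / rho
    return np.maximum(M - theta[:, None], 0.0)

def archetypal_analysis(X, k, n_iter=500, lr=1e-4, seed=0):
    """Plain AA sketch: minimize ||X - A B X||_F^2 with row-stochastic
    A (n x k) and B (k x n) via alternating projected gradient steps."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    A = project_rows_to_simplex(rng.random((n, k)))
    B = project_rows_to_simplex(rng.random((k, n)))
    for _ in range(n_iter):
        Z = B @ X                                      # archetypes: convex combos of data
        A = project_rows_to_simplex(A - lr * 2.0 * (A @ Z - X) @ Z.T)
        grad_B = 2.0 * A.T @ (A @ B @ X - X) @ X.T
        B = project_rows_to_simplex(B - lr * grad_B)
    return A, B
```

The scores A produced by such a routine (or by a library implementation) are exactly what the fairness penalty discussed next acts on.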
3. Fair Archetypal Analysis (FairAA) on ANSUR I
FairAA is a principled modification of AA that aims to obtain archetypes and coefficient matrices while mitigating the influence of sensitive group information. For the ANSUR I dataset, let $\mathbf{s} \in \{0,1\}^{n}$ denote the sensitive attribute (sex). The attribute is centered:

$$\bar{\mathbf{s}} = \mathbf{s} - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}\mathbf{s}.$$

Fairness is operationalized by requiring that, for all projections in the archetypal space, the sensitive attribute is uncorrelated, i.e., $\bar{\mathbf{s}}^{\top}\mathbf{A} = \mathbf{0}$. The FairAA optimization problem becomes:

$$\min_{\mathbf{A},\,\mathbf{B}} \; \lVert \mathbf{X} - \mathbf{A}\mathbf{B}\mathbf{X} \rVert_F^2 + \lambda\,\lVert \bar{\mathbf{s}}^{\top}\mathbf{A} \rVert_2^2,$$

with $\mathbf{A}$ and $\mathbf{B}$ constrained as before, and $\lambda$ controlling the fairness–utility trade-off.
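Assuming the penalty form reconstructed above and a binary sensitive attribute, the fairness term and its gradient are straightforward to add to the alternating updates; the sketch below illustrates this and is not the authors' reference code.

```python
import numpy as np

def fairaa_objective(X, A, B, s, lam):
    """FairAA loss sketch: AA reconstruction error plus a penalty on the
    correlation between the centered sensitive attribute and the scores A."""
    s_bar = s - s.mean()
    recon = np.linalg.norm(X - A @ B @ X) ** 2
    fairness = np.linalg.norm(s_bar @ A) ** 2        # ||s_bar^T A||_2^2
    return recon + lam * fairness

def fairaa_grad_A(X, A, B, s, lam):
    """Gradient of the FairAA loss with respect to A (B held fixed)."""
    s_bar = (s - s.mean())[:, None]                  # shape (n, 1)
    Z = B @ X
    return 2.0 * (A @ Z - X) @ Z.T + 2.0 * lam * s_bar @ (s_bar.T @ A)
```

Swapping fairaa_grad_A into the A-update of the plain AA loop above (keeping the simplex projections) yields a simple FairAA solver; the B-update is unchanged because the penalty does not depend on B.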
For kernelized data, FairKernelAA extends this logic by replacing the Gram matrix $\mathbf{X}\mathbf{X}^{\top}$ with a kernel matrix $\mathbf{K}$, adapting the gradient-based updates accordingly.
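Since the Frobenius objective depends on the data only through inner products, one way to see the kernel extension is to expand it in terms of the Gram matrix; the sketch below illustrates that identity together with a generic RBF kernel (the bandwidth is an illustrative choice).

```python
import numpy as np

def kernel_aa_objective(K, A, B):
    """Kernelized AA reconstruction error, written only in terms of K:
    ||Phi - A B Phi||_F^2 = tr(K) - 2 tr(A B K) + tr(A B K B^T A^T)."""
    ABK = A @ B @ K
    return np.trace(K) - 2.0 * np.trace(ABK) + np.trace(ABK @ B.T @ A.T)

def rbf_kernel(X, gamma=1.0):
    """RBF (Gaussian) kernel matrix with an illustrative bandwidth gamma."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))
```

Choosing the linear kernel $\mathbf{K} = \mathbf{X}\mathbf{X}^{\top}$ recovers the standard objective, while nonlinear kernels let FairKernelAA address group structure that is not linearly expressed in the features.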
4. Experimental Findings: Utility–Fairness Trade-off
Empirical evaluation on the ANSUR I dataset demonstrates that standard AA yields latent projections with clear group separation along the sex attribute, as quantified by elevated values of LS and MMD. With FairAA, however, the additional regularization term $\lambda\,\lVert \bar{\mathbf{s}}^{\top}\mathbf{A} \rVert_2^2$ enforces a marked reduction in group separability (an illustrative sketch of how LS and MMD can be computed follows the list below):
- The explained variance (EV), measuring reconstruction quality, declines minimally under FairAA.
- Both MMD and LS are significantly reduced, indicating diminished encoding of sensitive group information.
- The archetypes themselves remain nearly identical between AA and FairAA, suggesting that data structure and interpretability are preserved, but sensitive group variability is largely “hidden” in the latent representations.
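The following sketch shows how the two separability measures might be computed from the scores A and the sensitive labels s; the paper's exact definitions may differ (choice of classifier, kernel, or data split), so this is only an illustrative proxy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_separability(A, s):
    """LS proxy: accuracy of a linear classifier predicting the sensitive
    attribute s from the archetypal scores A (~0.5 means little leakage
    for balanced groups)."""
    clf = LogisticRegression(max_iter=1000).fit(A, s)
    return clf.score(A, s)

def mmd_rbf(A, s, gamma=1.0):
    """Squared maximum mean discrepancy between the two groups' score
    distributions under an RBF kernel (bandwidth gamma is illustrative)."""
    def k(U, V):
        d2 = (np.sum(U ** 2, 1)[:, None] + np.sum(V ** 2, 1)[None, :]
              - 2.0 * U @ V.T)
        return np.exp(-gamma * d2)
    P, Q = A[s == 0], A[s == 1]
    return k(P, P).mean() - 2.0 * k(P, Q).mean() + k(Q, Q).mean()
```

A held-out split or cross-validation would make the LS estimate more rigorous; the in-sample version here only illustrates the quantity being tracked.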
FairKernelAA is proposed for scenarios in which nonlinear group-to-feature relationships complicate fairness; for ANSUR I, however, the linear formulation suffices to demonstrate the core trade-off.
5. Methodological and Practical Implications
Application of FairAA to ANSUR I establishes several methodological insights:
- Incorporation of fairness constraints at the level of representation mitigates the risk of downstream discriminatory decisions.
- The utility loss, as measured by explained variance, is minimal compared to the reduction of group separability.
- The regularization approach is modular: the fairness constraint can be generalized to multi-group or multiple-sensitive-attribute settings.
- The preservation of archetype interpretability with diluted sensitive attribute encoding increases model trustworthiness in decision-critical domains.
This positions FairAA, and its nonlinear extension FairKernelAA, as practical tools with clear utility for responsible representation learning in sensitive applications—such as hiring, healthcare, or legal analytics—where equitable representation is essential.
6. Summary and Significance
On the ANSUR I dataset, FairAA achieves a robust balance between data utility and fairness. By integrating the regularization term $\lambda\,\lVert \bar{\mathbf{s}}^{\top}\mathbf{A} \rVert_2^2$ into the AA optimization, the resulting representations preserve essential data variability while substantially reducing the potential for sensitive attribute inference. Embedding such fairness-aware mechanisms at the representation-learning stage is a viable approach to bias mitigation, particularly where the stakes of inappropriate group separability are high. The demonstration on ANSUR I exemplifies the approach's effectiveness on a real-world, sensitive dataset and provides a foundation for broader applications in fairness-critical machine learning contexts (Alcacer et al., 16 Jul 2025).