ANSUR I: Anthropometric Data & Fairness Benchmark
- The ANSUR I dataset is a comprehensive anthropometric survey capturing diverse body measurements of U.S. Army personnel, with pronounced sex-specific differences across many measurements.
- Researchers apply methods like Archetypal Analysis and its fairness-aware variants to evaluate unsupervised representation learning on this real-world data.
- Empirical results demonstrate that fairness constraints reduce sensitive attribute encoding while preserving data utility for robust analysis.
The ANSUR I dataset is a comprehensive anthropometric survey comprising a variety of body measurements collected from U.S. Army personnel, including both male and female subjects. Its structure and demographic diversity make it a central benchmark for evaluating the interplay between data-driven representation learning and fairness constraints, particularly when sensitive attributes such as sex are inherently encoded in latent representations.
1. Dataset Overview and Characteristics
The ANSUR I dataset consists of detailed anthropometric measurements for U.S. Army personnel, encompassing both sexes. The features captured span a broad range of body metrics, making it suitable for studies in human factors, ergonomics, and fairness within unsupervised learning. Measurements in the dataset differ significantly between sexes, which creates a risk that standard unsupervised methods will yield latent representations encoding sensitive group information, notably sex. Consequently, the dataset is particularly relevant as a real-world testbed for fairness-aware machine learning methods (Alcacer et al., 16 Jul 2025).
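For orientation, the brief preparation sketch below shows one way such experiments might load and standardize the data; the file name (ansur_i.csv) and the Gender column are hypothetical placeholders, not the dataset's actual schema.

```python
import pandas as pd

# Hypothetical file and column names; adjust to the actual ANSUR I export.
df = pd.read_csv("ansur_i.csv")

# Separate the sensitive attribute (sex) from the anthropometric features.
s = (df["Gender"] == "Female").astype(int).to_numpy()
X = df.drop(columns=["Gender"]).select_dtypes("number").to_numpy(dtype=float)

# Standardize features; sex-linked differences in the measurements remain in X
# and tend to surface in unsupervised latent representations.
X = (X - X.mean(axis=0)) / X.std(axis=0)
```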
2. Archetypal Analysis and Fairness Concerns
Archetypal Analysis (AA) is an unsupervised method that models data as convex combinations of archetypes—extreme points or patterns in the dataset. Formally, AA solves

$$\min_{\mathbf{A},\,\mathbf{B}} \; \lVert \mathbf{X} - \mathbf{A}\mathbf{B}\mathbf{X} \rVert_F^2,$$

where $\mathbf{X} \in \mathbb{R}^{n \times d}$ is the data matrix, $\mathbf{A} \in \mathbb{R}^{n \times k}$ represents data point representations, and $\mathbf{B} \in \mathbb{R}^{k \times n}$ defines archetype contributions, subject to nonnegativity and row-wise simplex constraints on $\mathbf{A}$ and $\mathbf{B}$. When applied directly to the ANSUR I dataset, standard AA tends to encode sensitive group information (e.g., sex) in the latent representation. This results in significant group separability, as evidenced by high linear separability (LS) and maximum mean discrepancy (MMD) scores. Such encoding raises fairness concerns in downstream applications.
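A minimal NumPy sketch of this formulation is given below, using alternating projected gradient steps with a row-wise simplex projection; the initialization, learning rate, and iteration count are illustrative choices, not the reference implementation of (Alcacer et al., 16 Jul 2025).

```python
import numpy as np

def project_rows_to_simplex(M):
    """Euclidean projection of each row of M onto the probability simplex."""
    U = np.sort(M, axis=1)[:, ::-1]                    # sort each row descending
    css = np.cumsum(U, axis=1) - 1.0
    ks = np.arange(1, M.shape[1] + 1)
    rho = np.count_nonzero(U - css / ks > 0, axis=1)   # support size per row
    theta = css[np.arange(M.shape[0]), rho - 1] / rho
    return np.maximum(M - theta[:, None], 0.0)

def archetypal_analysis(X, k, n_iter=500, lr=1e-4, seed=0):
    """Plain AA sketch: minimize ||X - A B X||_F^2 with row-stochastic
    A (n x k) and B (k x n) via alternating projected gradient steps."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    A = project_rows_to_simplex(rng.random((n, k)))
    B = project_rows_to_simplex(rng.random((k, n)))
    for _ in range(n_iter):
        Z = B @ X                                      # archetypes: convex combos of data
        A = project_rows_to_simplex(A - lr * 2.0 * (A @ Z - X) @ Z.T)
        grad_B = 2.0 * A.T @ (A @ B @ X - X) @ X.T
        B = project_rows_to_simplex(B - lr * grad_B)
    return A, B
```

The scores A produced by such a routine (or by a library implementation) are exactly what the fairness penalty discussed next acts on.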
3. Fair Archetypal Analysis (FairAA) on ANSUR I
FairAA is a principled modification of AA that aims to obtain archetypes and coefficient matrices while mitigating the influence of sensitive group information. For the ANSUR I dataset, let $\mathbf{s} \in \{0,1\}^{n}$ denote the sensitive attribute (sex). The attribute is centered:

$$\bar{\mathbf{s}} = \mathbf{s} - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}\mathbf{s}.$$

Fairness is operationalized by requiring that, for all projections in the archetypal space, the sensitive attribute is uncorrelated, i.e., $\bar{\mathbf{s}}^{\top}\mathbf{A} = \mathbf{0}$. The FairAA optimization problem becomes:

$$\min_{\mathbf{A},\,\mathbf{B}} \; \lVert \mathbf{X} - \mathbf{A}\mathbf{B}\mathbf{X} \rVert_F^2 + \lambda\,\lVert \bar{\mathbf{s}}^{\top}\mathbf{A} \rVert_2^2,$$

with $\mathbf{A}$ and $\mathbf{B}$ constrained as before, and $\lambda$ controlling the fairness–utility trade-off.
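Assuming the penalty form reconstructed above and a binary sensitive attribute, the fairness term and its gradient are straightforward to add to the alternating updates; the sketch below illustrates this and is not the authors' reference code.

```python
import numpy as np

def fairaa_objective(X, A, B, s, lam):
    """FairAA loss sketch: AA reconstruction error plus a penalty on the
    correlation between the centered sensitive attribute and the scores A."""
    s_bar = s - s.mean()
    recon = np.linalg.norm(X - A @ B @ X) ** 2
    fairness = np.linalg.norm(s_bar @ A) ** 2        # ||s_bar^T A||_2^2
    return recon + lam * fairness

def fairaa_grad_A(X, A, B, s, lam):
    """Gradient of the FairAA loss with respect to A (B held fixed)."""
    s_bar = (s - s.mean())[:, None]                  # shape (n, 1)
    Z = B @ X
    return 2.0 * (A @ Z - X) @ Z.T + 2.0 * lam * s_bar @ (s_bar.T @ A)
```

Swapping fairaa_grad_A into the A-update of the plain AA loop above (keeping the simplex projections) yields a simple FairAA solver; the B-update is unchanged because the penalty does not depend on B.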
For kernelized data, FairKernelAA extends this logic by replacing the Gram matrix $\mathbf{X}\mathbf{X}^{\top}$ with a kernel matrix $\mathbf{K}$, adapting the gradient-based updates accordingly.
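Since the Frobenius objective depends on the data only through inner products, one way to see the kernel extension is to expand it in terms of the Gram matrix; the sketch below illustrates that identity together with a generic RBF kernel (the bandwidth is an illustrative choice).

```python
import numpy as np

def kernel_aa_objective(K, A, B):
    """Kernelized AA reconstruction error, written only in terms of K:
    ||Phi - A B Phi||_F^2 = tr(K) - 2 tr(A B K) + tr(A B K B^T A^T)."""
    ABK = A @ B @ K
    return np.trace(K) - 2.0 * np.trace(ABK) + np.trace(ABK @ B.T @ A.T)

def rbf_kernel(X, gamma=1.0):
    """RBF (Gaussian) kernel matrix with an illustrative bandwidth gamma."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))
```

Choosing the linear kernel $\mathbf{K} = \mathbf{X}\mathbf{X}^{\top}$ recovers the standard objective, while nonlinear kernels let FairKernelAA address group structure that is not linearly expressed in the features.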
4. Experimental Findings: Utility–Fairness Trade-off
Empirical evaluation on the ANSUR I dataset demonstrates that standard AA yields latent projections with clear group separation along the sex attribute, as quantified by elevated values of LS and MMD. With FairAA, however, the additional regularization term $\lambda\,\lVert \bar{\mathbf{s}}^{\top}\mathbf{A} \rVert_2^2$ enforces a marked reduction in group separability (an illustrative sketch of how LS and MMD can be computed follows the list below):
- The explained variance (EV), measuring reconstruction quality, declines minimally under FairAA.
- Both MMD and LS are significantly reduced, indicating diminished encoding of sensitive group information.
- The archetypes themselves remain nearly identical between AA and FairAA, suggesting that data structure and interpretability are preserved, but sensitive group variability is largely “hidden” in the latent representations.
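The following sketch shows how the two separability measures might be computed from the scores A and the sensitive labels s; the paper's exact definitions may differ (choice of classifier, kernel, or data split), so this is only an illustrative proxy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_separability(A, s):
    """LS proxy: accuracy of a linear classifier predicting the sensitive
    attribute s from the archetypal scores A (~0.5 means little leakage
    for balanced groups)."""
    clf = LogisticRegression(max_iter=1000).fit(A, s)
    return clf.score(A, s)

def mmd_rbf(A, s, gamma=1.0):
    """Squared maximum mean discrepancy between the two groups' score
    distributions under an RBF kernel (bandwidth gamma is illustrative)."""
    def k(U, V):
        d2 = (np.sum(U ** 2, 1)[:, None] + np.sum(V ** 2, 1)[None, :]
              - 2.0 * U @ V.T)
        return np.exp(-gamma * d2)
    P, Q = A[s == 0], A[s == 1]
    return k(P, P).mean() - 2.0 * k(P, Q).mean() + k(Q, Q).mean()
```

A held-out split or cross-validation would make the LS estimate more rigorous; the in-sample version here only illustrates the quantity being tracked.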
FairKernelAA is proposed for scenarios in which nonlinear group-to-feature relationships complicate fairness; for ANSUR I, however, the linear formulation suffices to demonstrate the core trade-off.
5. Methodological and Practical Implications
Application of FairAA to ANSUR I establishes several methodological insights:
- Incorporation of fairness constraints at the level of representation mitigates the risk of downstream discriminatory decisions.
- The utility loss, as measured by explained variance, is minimal compared to the reduction of group separability.
- The regularization approach is modular: the fairness constraint can be generalized to multi-group or multiple-sensitive-attribute settings.
- The preservation of archetype interpretability with diluted sensitive attribute encoding increases model trustworthiness in decision-critical domains.
This positions FairAA, and its nonlinear extension FairKernelAA, as practical tools with clear utility for responsible representation learning in sensitive applications—such as hiring, healthcare, or legal analytics—where equitable representation is essential.
6. Summary and Significance
On the ANSUR I dataset, FairAA achieves a robust balance between data utility and fairness. By integrating the regularization term $\lambda\,\lVert \bar{\mathbf{s}}^{\top}\mathbf{A} \rVert_2^2$ into the AA optimization, the resulting representations preserve essential data variability while substantially reducing the potential for sensitive attribute inference. Embedding such fairness-aware mechanisms at the representation-learning stage is a viable approach to bias mitigation, particularly where the stakes of inappropriate group separability are high. The demonstration on ANSUR I exemplifies the approach's effectiveness on a real-world, sensitive dataset and provides a foundation for broader applications in fairness-critical machine learning contexts (Alcacer et al., 16 Jul 2025).