
NutriScreener: Automated Nutritional Screening

Updated 24 November 2025
  • NutriScreener is a deep learning system for automated nutritional assessment, integrating visual encoding, graph attention, and retrieval augmentation.
  • It fuses multi-pose image inputs with contextual metadata to perform malnutrition classification and anthropometric regression with high accuracy.
  • The system supports scalable, privacy-preserving dietary management and clinical screening, and remains robust to pose variation and domain shift.

NutriScreener denotes a class of intelligent systems for automated nutritional assessment and malnutrition screening, leveraging state-of-the-art visual deep learning, retrieval augmentation, and context-aware fusion. It is used both for precision population screening in clinical and low-resource settings, and for dietary management in the general population. Core functionalities span malnutrition risk classification, anthropometric regression from child images, food recognition, nutrient estimation, and privacy-preserving dietary recommendation.

1. System Architecture and Core Pipeline Components

The foundational architecture of NutriScreener operates as a multi-stage inference pipeline: (i) extraction of visual embeddings via a frozen CLIP image encoder, (ii) multi-pose processing through a fully connected Graph Attention Network (GAT), (iii) retrieval augmentation from an external, demographically matched knowledge base using FAISS, and (iv) context-aware fusion for adaptive prediction weighting (Khan et al., 20 Nov 2025). This enables robust handling of variable pose images, generalization across population cohorts, and explicit correction for class imbalance.

For each subject $i$, up to $P=8$ RGB images $\{x_{i,1},\dots,x_{i,P}\}$ are encoded using CLIP (RN50×64), yielding 1024-dimensional feature vectors, which are concatenated with subject metadata (e.g., age) to form graph nodes $v_{i,j}\in\mathbb{R}^{1025}$. The pose graph $G_i=(V_i,E_i)$ is processed with a two-layer, eight-head GAT, enforcing permutation invariance and modeling cross-view correlations. Global mean pooling produces the subject descriptor $h_i$, which is input to dual MLP heads for malnutrition probability ($\hat y_i^{\mathrm{cls}}$) and anthropometric regression ($\hat y_i^{\mathrm{reg}}\in\mathbb{R}^4$). External retrieval queries FAISS over global embeddings to obtain the top-$k$ neighbors, whose metadata and measurement labels are fused via temperature-scaled softmax weighting, with a boost factor $\gamma$ for malnourished cases.
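The neighbor weighting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the temperature, $\gamma$, and the similarity scores are assumed values.

```python
import math

def retrieval_weights(similarities, is_malnourished, temperature=0.1, gamma=2.0):
    """Temperature-scaled softmax over top-k neighbor similarities, with a
    boost factor gamma applied to malnourished neighbors, then renormalized."""
    scaled = [s / temperature for s in similarities]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    weights = [e / z for e in exps]
    # boost minority-class (malnourished) neighbors, then renormalize to sum to 1
    weights = [w * (gamma if mal else 1.0)
               for w, mal in zip(weights, is_malnourished)]
    z2 = sum(weights)
    return [w / z2 for w in weights]

# top-3 retrieved neighbors: cosine similarities and malnutrition flags
weights = retrieval_weights([0.9, 0.8, 0.7], [True, False, False])
```

The boosted malnourished neighbor ends up dominating the weight mass, which is how the retrieval path raises sensitivity on the minority class.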

The fusion module adaptively combines model-internal and retrieval-informed predictions via a learned scalar coefficient, produced by an MLP over confidence, retrieval-agreement, and knowledge-base density features. The final binary malnutrition score is $\hat y_i = \alpha_{\mathrm{cls}}\,\hat y_i^{\mathrm{cls}} + (1-\alpha_{\mathrm{cls}})\,y_i^{\mathrm{ret}}$, with an analogous fusion for regression.
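The fusion step itself is a convex combination. In a minimal sketch, $\alpha_{\mathrm{cls}}$ is passed in directly rather than produced by the learned MLP:

```python
def fuse_malnutrition_score(y_cls, y_ret, alpha_cls):
    """Final score: alpha * model prediction + (1 - alpha) * retrieval prediction.
    In the described system, alpha_cls comes from an MLP over confidence,
    retrieval-agreement, and KB-density features; here it is a fixed input."""
    assert 0.0 <= alpha_cls <= 1.0
    return alpha_cls * y_cls + (1.0 - alpha_cls) * y_ret

score = fuse_malnutrition_score(0.8, 0.6, 0.7)  # ≈ 0.74
```

When the learned coefficient approaches 1, the system falls back entirely on the GAT head, which is the desired behavior in regions where the knowledge base is sparse or mismatched.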

2. Algorithmic Formulation and Losses

The training objective is a multi-task loss

$$\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \lambda\,\mathcal{L}_{\mathrm{reg}}$$

where $\mathcal{L}_{\mathrm{cls}}$ is the binary cross-entropy for classification,

$$\mathcal{L}_{\mathrm{cls}} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i\log\hat y_i + (1-y_i)\log(1-\hat y_i)\,\right]$$

and $\mathcal{L}_{\mathrm{reg}}$ is the mean squared error over the anthropometric targets (height, weight, MUAC, head circumference):

$$\mathcal{L}_{\mathrm{reg}} = \frac{1}{N}\sum_{i=1}^{N}\left\|\hat y_i^{\mathrm{reg}} - y_i^{\mathrm{reg}}\right\|_2^2$$

Retrieval weights are modulated to correct class imbalance: malnourished neighbors are weighted by a factor $\gamma>1$, after which the weights are renormalized.
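The multi-task objective can be written out numerically as follows; the toy inputs and the default $\lambda$ are illustrative, not reported settings.

```python
import math

def multitask_loss(y_true, y_prob, reg_true, reg_pred, lam=1.0):
    """L = BCE over malnutrition labels + lambda * MSE over the
    anthropometric target vectors (one list of 4 values per subject)."""
    n = len(y_true)
    bce = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(y_true, y_prob)) / n
    mse = sum(sum((ph - th) ** 2 for ph, th in zip(pred, true))
              for pred, true in zip(reg_pred, reg_true)) / n
    return bce + lam * mse

# one malnourished (y=1) and one healthy (y=0) subject, perfect regression
loss = multitask_loss([1, 0], [0.9, 0.2], [[1.0], [2.0]], [[1.0], [2.0]])
```

With perfect regression the loss reduces to the BCE term alone, so $\lambda$ only matters once the regression heads carry error.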

Metrics include recall, AUC (area under the ROC curve), and per-variable regression RMSE:

$$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}, \qquad \mathrm{RMSE}_v = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat y_{i,v} - y_{i,v}\right)^2}$$

(Khan et al., 20 Nov 2025)
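Both metrics are direct to compute; a minimal sketch:

```python
import math

def recall(tp, fn):
    """Fraction of actual positives (malnourished cases) that were detected."""
    return tp / (tp + fn)

def rmse(preds, targets):
    """Root mean squared error for one anthropometric variable."""
    n = len(preds)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / n)
```

Recall is the headline metric here because a missed malnourished child (a false negative) is the costly error in a screening setting.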

3. Datasets and Training Protocols

Primary training and evaluation use the AnthroVision dataset (2,141 Indian children aged 6–59 months, 8-pose RGB image sets, WHO z-score malnutrition labels, 29.9% malnourished), with external validation on ARAN (512 Kurdish children, 16–98 months, 4 multi-pose images) and CampusPose (college-aged adults, BMI proxy) (Khan et al., 20 Nov 2025). Images are captured in frontal, lateral, back, and selfie poses, with subject age as metadata. Cross-validation uses four splits (seed 42), batch size 8, and the Adam optimizer (learning rate $1\times10^{-3}$). Data augmentation is limited to cropping and brightness jitter, since the multi-view input already confers robustness.
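The reported protocol can be collected into a single configuration; the field names below are illustrative, not taken from the authors' code.

```python
# Hypothetical training configuration mirroring the reported protocol.
TRAIN_CONFIG = {
    "cv_splits": 4,
    "seed": 42,
    "batch_size": 8,
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "augmentations": ["random_crop", "brightness_jitter"],
}
```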

Class imbalance (healthy:malnourished $\approx$ 70:30) is handled by (a) upweighting malnourished cases during retrieval, (b) using context-aware fusion to dampen unreliable retrieval regions, and (c) ablations with focal loss to emphasize hard positives.
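The focal-loss ablation mentioned in (c) can be sketched as the standard binary focal loss; the $\gamma=2$, $\alpha=0.25$ defaults below are the common choices from the focal-loss literature, not values reported for this system.

```python
import math

def focal_loss(y_true, y_prob, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy examples via (1 - p_t)^gamma
    so that hard positives dominate the gradient."""
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        pt = p if y == 1 else 1.0 - p          # probability of the true class
        at = alpha if y == 1 else 1.0 - alpha  # class-balance weight
        total += -at * (1.0 - pt) ** gamma * math.log(pt)
    return total / n
```

A confidently correct prediction contributes almost nothing, while a confidently wrong one is penalized heavily, which is the intended "hard positive emphasis".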

4. Performance Evaluation and Ablations

On AnthroVision test folds, the retrieval-augmented, weighted NutriScreener achieves recall $0.79\pm0.02$, AUC $0.82\pm0.01$, height RMSE $6.38$ cm, and weight RMSE $5.32$ kg (Khan et al., 20 Nov 2025). Compared to a DomainAdapt baseline (recall $0.67$, height RMSE $22$ cm), NutriScreener delivers a 12% absolute recall gain and up to a $15$ cm RMSE improvement.

Cross-dataset generalization analysis shows that a demographically matched knowledge base at inference yields recalls up to $25\%$ higher and height RMSEs up to $3.5$ cm lower on ARAN; on the out-of-domain CampusPose set, performance is invariant to retrieval, likely due to feature-space shift.

Ablation studies confirm that the multi-pose GAT with frozen CLIP (RN50×64) outperforms single-view CNNs (GAT recall $0.54$ vs. CNN recall $<0.40$), and that retrieval-only inference is highly sensitive but lacks regression accuracy. Fusing weighted retrieval with the GAT achieves the best balanced metrics (recall $0.79$, AUC $0.82$). Statistical tests (Friedman, $p<0.05$) confirm significant differences between architectures; Cohen's $d\approx7.8$ for recall indicates a very large effect size.

5. Clinical Validation and Deployment

A prospective study with 12 clinicians (mean experience 9.5 years), each rating 15 real cases, found NutriScreener consistent with their clinical judgments (mean 4.3/5), efficient (4.6/5), trustworthy (4.4/5), and deployment-ready (4.1/5) (Khan et al., 20 Nov 2025). Doctors highlighted its role as an "objective second opinion" and suggested adding uncertainty estimation and interpretability features. The system is validated for CPU-only deployment with $<1$ GB of RAM, supporting low-resource operation.

6. Strengths, Limitations, and Prospective Directions

Strengths:

  • Joint classification and continuous anthropometric regression from unconstrained 2D images.
  • Robustness to pose variation and domain shift via multi-pose GAT and CLIP embeddings.
  • Quantitative improvements over prior domain adaptation and CNN-only baselines.
  • Class-boosted retrieval augments sensitivity in minority classes.
  • Fusion module adapts between model- and KB-based inference for calibration (ECE $=0.06$, Brier score $=0.16$).
  • Real-world validation in clinical flows confirms utility in community-health and telemedicine.
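The calibration figures cited above can be computed as follows; the 10-bin ECE and the toy probabilities are illustrative choices, not the paper's exact evaluation code.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    n = len(y_true)
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / n

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE: per-bin gap between mean confidence and accuracy,
    weighted by the fraction of samples falling in each bin."""
    n = len(y_true)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, p in enumerate(y_prob)
               if lo < p <= hi or (b == 0 and p == 0.0)]
        if not idx:
            continue
        conf = sum(y_prob[i] for i in idx) / len(idx)
        acc = sum(y_true[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(conf - acc)
    return ece
```

Low ECE and Brier scores indicate the fused probabilities can be read as approximate risk levels rather than only as a hard positive/negative call.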

Limitations:

  • High dependence on demographic alignment between subject and KB; domain shifts degrade performance.
  • Keeping CLIP frozen to preserve generalization may underutilize domain-specific visual cues.
  • Interpretability for end users is limited by opacity of fusion weights.
  • Unaddressed extensions include explicit uncertainty quantification and explanation support.

Potential extensions include expansion of knowledge bases to cover more geographies, integration of visual explanations, model quantization for ultra-low-power devices, and further robustness via 2D keypoint or segmentation-guided priors (Khan et al., 20 Nov 2025).

7. Relationship to Broader NutriScreener Paradigms

While NutriScreener as described here is focused on pediatric malnutrition screening from images, closely related systems—such as food image recognition and dietary intake estimation platforms—deploy architectures with deep vision backbones, semantic segmentation, and personalized recommendation modules for dietary management and hospital nutrition monitoring (Nossair et al., 2 Jun 2024, Lu et al., 2019, Freitas et al., 2020, Han et al., 20 Aug 2024). Across all use cases, NutriScreener systems emphasize automated, scalable, and privacy-preserving nutritional assessment by harnessing contemporary advances in visual representation learning, robust multi-modal fusion, and context-aware inference.
