Gait Energy Images (GEI) Overview

Updated 19 September 2025
  • Gait Energy Images (GEI) are compact representations that average binary silhouettes over a gait cycle, capturing both static body shape and dynamic motion.
  • They offer enhanced storage efficiency and noise robustness, achieving high recognition accuracies on benchmarks like CASIA-B and USF.
  • Extensions such as Active Energy Image (AEI) and deep learning integrations further improve GEI’s performance in handling covariate variations and cross-view challenges.

Gait Energy Images (GEI) represent a foundational concept in silhouette-based gait recognition. GEI condenses a sequence of binary silhouette images captured over a full gait cycle into a single gray-scale image, where the intensity at each pixel measures the proportion of time that pixel is occupied during the cycle. This compact formulation preserves the global static and dynamic structure of a person's walking pattern, and is inherently robust to noise and intra-cycle variability. Since its introduction, GEI has become a benchmark for biometric identification in computer vision, security, and medical domains.

1. Mathematical Formulation and Construction

The construction of a Gait Energy Image is defined mathematically as an average over N homogeneous silhouette images B_t(x, y) obtained from a walking sequence:

G(x, y) = \frac{1}{N} \sum_{t=1}^{N} B_t(x, y)

where (x, y) denotes the pixel coordinates and N is the number of frames in a complete gait cycle (Isaac et al., 2019, Sepas-Moghaddam et al., 2021, Bakchy et al., 2022). Homogeneity means that each B_t has consistent size and orientation, typically ensured through foreground extraction (e.g., background subtraction and binarization) followed by normalization. This averaging operation compresses both the static body shape and the dynamic limb motion into a single energy map, efficiently representing the temporal periodicity of walking.
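The formula above can be sketched directly in NumPy. This is a minimal illustration that assumes the silhouettes have already been segmented, binarized, and size-normalized (the "homogeneity" condition); it is not tied to any particular paper's pipeline:

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average a sequence of aligned binary silhouettes into a GEI.

    silhouettes: array of shape (N, H, W) with values in {0, 1},
    covering one complete gait cycle.
    Returns a float array of shape (H, W) with values in [0, 1],
    where each pixel's intensity is the fraction of frames in which
    that pixel was part of the silhouette.
    """
    frames = np.asarray(silhouettes, dtype=float)
    return frames.mean(axis=0)

# Toy example: a 2-frame "cycle" on a 2x2 grid.
cycle = np.array([[[1, 0],
                   [1, 1]],
                  [[1, 0],
                   [0, 1]]])
gei = gait_energy_image(cycle)
# Pixel (1, 0) is occupied in 1 of 2 frames, so its intensity is 0.5;
# pixels occupied in every frame have intensity 1.0.
```

Pixels that are always foreground (torso, head) saturate at 1.0, while swinging limbs produce intermediate gray values, which is exactly the static-plus-dynamic encoding described above.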

2. Properties and Merits of GEI

GEI's advantages stem from its ability to synthesize the discriminative aspects of gait into a static template:

  • Storage and Computation Efficiency: By condensing each gait cycle into a single image, GEI requires less storage and computational overhead compared to sequential methods that analyze each frame (Bakchy et al., 2022). For example, processing time for GEI-based recognition is reported as 1.61 seconds versus 2.84 seconds for sequential template methods.
  • Noise Reduction: The averaging process smooths frame-wise variations and random noise, yielding a robust representation. If B_t(x, y) = f_t(x, y) + n_t(x, y) (true silhouette plus noise), averaging cancels n_t(x, y) when the noise distribution is symmetric about zero.
  • Baseline Robustness: GEI has shown strong recognition performance across benchmark datasets (e.g., USF, CASIA-B, TUM-GAID), with reported recognition rates outperforming several alternative approaches (Bakchy et al., 2022, Apostolidis et al., 2021).
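The noise-cancellation argument can be checked numerically. The following is an illustrative simulation with a synthetic rectangular "silhouette" and zero-mean Gaussian noise, not real gait data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Idealized static silhouette f_t: a filled rectangle on a 32x32 grid.
truth = np.zeros((32, 32))
truth[8:24, 12:20] = 1.0

# Corrupt N frames with zero-mean (symmetric) Gaussian noise n_t.
N = 200
noisy_frames = truth + rng.normal(0.0, 0.3, size=(N, 32, 32))

# The GEI is the frame average; averaging shrinks the noise standard
# deviation by roughly a factor of sqrt(N).
gei = noisy_frames.mean(axis=0)

per_frame_err = np.abs(noisy_frames[0] - truth).mean()
gei_err = np.abs(gei - truth).mean()
# gei_err is far smaller than per_frame_err, confirming that
# zero-mean noise largely averages out of the template.
```

With N = 200 frames the residual error of the averaged template is more than an order of magnitude below the single-frame error, matching the symmetric-noise argument in the bullet above.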

The following table summarizes key properties:

Property          Impact                      Reference
Efficiency        Reduces storage/time        (Bakchy et al., 2022)
Noise robustness  Smoother template           (Bakchy et al., 2022)
Recognition       High accuracy on CASIA-B    (Apostolidis et al., 2021)

3. Limitations and Covariate Sensitivity

GEI's principal limitation is its loss of temporal information: the averaging operation discards the order of limb motions (Isaac et al., 2019, Sepas-Moghaddam et al., 2021, Bharadwaj et al., 2020). Distinct gaits with similar averaged silhouettes (in shape or leg-swing amplitude) can therefore become indistinguishable. GEI is also sensitive to silhouette quality: occlusions, segmentation errors, and covariates such as clothing and carried objects can degrade recognition performance (Aggarwal et al., 2016, Isaac et al., 2017).

Covariate challenges are often addressed by segmenting or masking affected regions. For instance, frameworks such as Genetic Template Segmentation (GTS) and covariate-conscious Zernike moment analysis mask out unreliable regions (midsections) or select stable regions (head/leg), improving robustness in real-world scenarios (Isaac et al., 2017, Aggarwal et al., 2016).
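The masking idea can be sketched with fixed row bands as a stand-in for the learned segmentations of GTS; the band fractions below are invented for illustration and are not the cited methods' actual cut points:

```python
import numpy as np

def mask_covariate_band(gei, top_frac=0.3, bottom_frac=0.35):
    """Zero out the midsection rows of a GEI, keeping the head and leg
    bands that are least affected by clothing and carried objects.

    top_frac / bottom_frac are illustrative fixed fractions; approaches
    such as GTS learn the segmentation rather than using fixed cuts.
    """
    h = gei.shape[0]
    out = gei.copy()
    out[int(h * top_frac): h - int(h * bottom_frac), :] = 0.0
    return out

# A uniform 10-row template: rows 3..6 (the midsection) get masked out,
# while the head rows at the top and leg rows at the bottom survive.
template = np.ones((10, 4))
masked = mask_covariate_band(template)
```

In a real pipeline the surviving head/leg bands (or the complementary reliability map) would then feed the matcher instead of the full template.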

4. Extensions, Variants, and Comparative Analysis

To address GEI's loss of temporal dynamics, several template extensions have been proposed:

  • Active Energy Image (AEI): Focuses on dynamic regions, computed by averaging difference images between consecutive silhouettes. AEI provides improved resistance to appearance changes and better highlights motion regions (Bharadwaj et al., 2020).
  • Gait Entropy Image (GEnI), Chrono Gait Image (CGI), Period Energy Image (PEI), Gait Flow Image (GFI): These alternatives attempt to preserve more fine-grained spatio-temporal features, with some (e.g., GFI) leveraging optical flow rather than binary silhouettes (Isaac et al., 2019, Sepas-Moghaddam et al., 2021).

Comparisons demonstrate that while GEI generally offers high computational efficiency and baseline robustness, variants such as AEI can outperform GEI under appearance or covariate variation, given their emphasis on motion changes (Bharadwaj et al., 2020).
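Assuming AEI is computed as the mean of absolute differences between consecutive silhouettes, a minimal sketch contrasting it with GEI looks like this (the toy frames are invented for illustration):

```python
import numpy as np

def active_energy_image(silhouettes):
    """AEI sketch: average the difference images of consecutive
    silhouettes, emphasizing moving regions (swinging limbs) and
    suppressing static ones (torso, head)."""
    frames = np.asarray(silhouettes, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0))  # N-1 difference images
    return diffs.mean(axis=0)

# Toy 2-frame cycle on a 2x2 grid: pixel (0, 0) is always foreground
# (static), pixel (1, 0) toggles between frames (dynamic).
cycle = np.array([[[1, 0],
                   [1, 1]],
                  [[1, 0],
                   [0, 1]]])
aei = active_energy_image(cycle)
# Static pixels contribute 0 energy; only the toggling pixel lights up.
```

This is exactly why AEI is more resilient to appearance covariates: a coat or backpack mostly changes static pixels, which AEI zeroes out, whereas GEI keeps them at full intensity.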

5. Deep Learning Integration and Practical Applications

GEIs are well-suited to deep learning pipelines due to their fixed-size global representation. Popular architectures—ResNet, MobileNet, DenseNet, VGG, and others—have achieved robust performance when retrained on GEI images from large datasets (e.g., CASIA-B) (Apostolidis et al., 2021). Transfer learning on GEI images enables effective adaptation of models pre-trained on unrelated domains. Grad-CAM analyses have shown that CNNs typically attend to the torso and hip regions, which are most discriminative for re-identification tasks.

In addition, deep learning models have addressed practical issues such as occlusion (RGait-NET, Incomplete-To-Complete GEI Network) by reconstructing missing data for effective GEI generation from partial or noisy sequences (Babaee et al., 2018, Das et al., 2019).

Gait-based age estimation frameworks employ GEI inputs to extract both global and local features (head, body, feet), enabling ordinal regression architectures that outperform alternative age estimation methods (Zhu et al., 2019).
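The global-plus-local feature extraction can be sketched as a vertical split of the GEI into head, body, and feet bands; the fractions used here are hypothetical placeholders, not the boundaries used by the cited framework:

```python
import numpy as np

def split_gei_regions(gei, head_frac=0.15, feet_frac=0.15):
    """Split a GEI vertically into head, body, and feet bands for
    local feature extraction alongside the global template.

    head_frac / feet_frac are illustrative; a real age-estimation
    model would tune or learn these region boundaries.
    """
    h = gei.shape[0]
    head = gei[: int(h * head_frac)]
    feet = gei[h - int(h * feet_frac):]
    body = gei[int(h * head_frac): h - int(h * feet_frac)]
    return head, body, feet

# A 20-row template splits into 3 head rows, 14 body rows, 3 feet rows.
gei = np.ones((20, 8))
head, body, feet = split_gei_regions(gei)
```

Each band can then be fed to its own feature extractor, with the outputs concatenated with global GEI features before the ordinal regression head.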

6. Cross-View and Multimodal Fusion Approaches

GEI's view sensitivity—variations caused by changing camera angles—is mitigated by advanced synthesis and disentanglement techniques. Dense-View GAN (DV-GAN) generates intermediate GEIs at 1-degree intervals, enabling dense view coverage and improving cross-view robustness (Liao et al., 2020). Discriminant Gait GAN (DiGGAN) aligns view-invariant features before synthesizing probe GEIs at target views, not only boosting identification accuracy but also providing evidence images for system decisions (Hu et al., 2018).

Multimodal feature fusion strategies further enhance recognition by combining GEI with raw pixel values, optical flow, and depth cues. CNN models operating on these fused modalities yield accuracies that match or surpass those of traditional silhouette-based features, even when operating on downsampled, low-resolution input (Castro et al., 2018).

7. Benchmarks, Datasets, and Applications

GEI-based methods have formed the basis for performance assessment on a range of publicly available datasets:

  • USF Gait Challenge Dataset: Provides a standard testbed for evaluation under covariates such as load, terrain, and shoe type (Isaac et al., 2019).
  • CASIA-B: Offers multi-view, multi-condition gait recordings. GEI approaches achieve greater than 90% accuracy under standard conditions, with specialized methods (e.g., GTS, AESI+ZNK) further improving performance under covariate stress (Apostolidis et al., 2021, Aggarwal et al., 2016, Isaac et al., 2017).
  • TUM-GAID: Includes RGB, depth, and audio synchronizations that have enabled multimodal fusion studies (Castro et al., 2018).

In clinical domains, advances such as GAITGen (Conditional Residual VQ-VAE with Transformer refinement) generate realistic, pathology-conditioned gait sequences, supporting diagnosis and prognosis in Parkinson's Disease gait analysis (Adeli et al., 28 Mar 2025). The disentanglement of motion and pathology representations facilitates synthetic dataset enrichment for downstream model training and clinical studies.


Gait Energy Images represent a compact, noise-robust baseline for biometric identification and behavioral analysis, forming the substrate for a range of template-based, deep learning–driven, and generative approaches. While their loss of temporal nuances is a recognized limitation, ongoing research continues to build on and transcend the GEI paradigm in handling covariate, cross-view, multimodal, and clinical challenges in gait analysis.
