MADRAS Benchmark

Updated 20 November 2025

MADRAS Benchmark is a comprehensive framework designed to evaluate multi-agent trajectory prediction with an emphasis on joint social plausibility and goal awareness.
It leverages a dataset of over 7,000 pedestrian trajectories recorded in diverse urban scenes during a public festival, capturing real-world crowd dynamics.
The benchmark employs a leave-one-out cross-validation protocol with metrics including ADE, FDE, and collision rate, enabling systematic comparison of state-of-the-art models such as VISTA.

The MADRAS benchmark is a rigorously designed evaluation framework for multi-agent trajectory prediction in dense, unstructured urban environments. Developed for assessing models in settings where joint social plausibility and agent-level goal awareness are critical, MADRAS is notable for its high-density crowd data, complex scenarios, and comprehensive set of accuracy and social-compliance metrics. It enables systematic, scene-agnostic comparison of forecasting methods and has become the reference evaluation for recent state-of-the-art approaches such as VISTA (Martins et al., 13 Nov 2025).

1. Dataset Characteristics and Structure

The MADRAS dataset consists of over 7,000 pedestrian trajectories extracted from zenithal (top-down) video recordings acquired during the Festival of Lights in Lyon (France). Data were collected across nine distinct urban scenes featuring diverse geometries (open plazas, narrow alleys, pedestrianized squares) within the city center. Scenes are deliberately unstructured: there are no lane markings or sidewalks, so the recordings reflect authentic, spontaneous crowd dynamics at a public event.

Key dataset statistics:

Maximum localized density reaches 4 pedestrians/m² in the most crowded scenes.
Experimental agent batches typically contain tens of simultaneously active agents; each scene involves dozens to hundreds of agents.
Each trajectory provides a sequence of $(x, y)$ positions in a world-calibrated metric coordinate system, sampled at 2.5 Hz (1 frame per 0.4 s).
The observation window spans $T_{\text{obs}} = 8$ frames (3.2 s), and the prediction target covers the following 12 frames (4.8 s); in the metric formulas, the predicted frames are indexed $T_{\text{obs}}+1, \ldots, T_{\text{pred}}$, so $T_{\text{pred}} = 20$.
No explicit goal annotations are provided; models must infer intent, and the final ground-truth future position serves as the training-time “goal.”
Optionally, static semantic scene segmentation maps with $D$ classes, derived from a pretrained segmentation backbone, are provided.
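The windowing implied by these statistics can be sketched as follows. This is a minimal illustration, not MADRAS tooling: the sliding-window extraction, the `stride` parameter, and the helper names `split_windows` and `training_goal` are assumptions. Only the 2.5 Hz rate, the 8 + 12 frame split, and the use of the last future position as the training goal come from the text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in metres, world coordinates

FPS = 2.5    # MADRAS sampling rate: one frame every 0.4 s
T_OBS = 8    # observed frames (3.2 s)
N_PRED = 12  # predicted frames (4.8 s)


def split_windows(track: List[Point], stride: int = 1) -> List[Tuple[List[Point], List[Point]]]:
    """Slide a 20-frame window over one agent's track.

    Returns (observed, future) pairs of 8 and 12 positions. The sliding
    window and its stride are illustrative assumptions, not MADRAS code.
    """
    window = T_OBS + N_PRED
    pairs = []
    for start in range(0, len(track) - window + 1, stride):
        observed = track[start:start + T_OBS]
        future = track[start + T_OBS:start + window]
        pairs.append((observed, future))
    return pairs


def training_goal(future: List[Point]) -> Point:
    """Hypothetical helper: the last ground-truth future position acts as the goal."""
    return future[-1]


if __name__ == "__main__":
    track = [(0.5 * i, 0.0) for i in range(25)]  # synthetic 25-frame straight walk
    pairs = split_windows(track)
    print(len(pairs), training_goal(pairs[0][1]))
    print(math.isclose(T_OBS / FPS, 3.2), math.isclose(N_PRED / FPS, 4.8))
```

With a stride of 1, a 25-frame track yields six overlapping windows; a larger stride trades sample count for less redundancy between windows.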

This composition ensures that the benchmark reflects the real-world complexity of collective pedestrian motion in heterogeneous environments.

2. Experimental Protocol

MADRAS employs a leave-one-out cross-validation protocol across its nine scenes:

Splitting scheme: For each fold, eight scenes are used for training (optionally with an internal validation split), and the remaining ninth scene is held out for testing.
Aggregation: Reported results are averaged over all nine possible train/test splits to yield scene-agnostic performance metrics.
Prediction task: Models are conditioned on 8 observed positions per agent and tasked to forecast the next 12 future positions.
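The fold construction and averaging can be sketched as below. The scene names and the `train_and_eval` callback are hypothetical placeholders; the sketch assumes an unweighted mean over the nine folds, which is one natural reading of "averaged over all nine possible train/test splits."

```python
from statistics import mean
from typing import Callable, Dict, List

Fold = Dict[str, List[str]]


def leave_one_scene_out(scenes: List[str]) -> List[Fold]:
    """One fold per scene: train on all other scenes, test on the held-out one."""
    return [
        {"train": [s for s in scenes if s != held_out], "test": [held_out]}
        for held_out in scenes
    ]


def cross_validate(
    scenes: List[str],
    train_and_eval: Callable[[List[str], str], Dict[str, float]],
) -> Dict[str, float]:
    """Average every metric over all folds to obtain scene-agnostic scores.

    train_and_eval is a hypothetical callback that trains on the given scenes
    (with any internal validation split) and returns metrics on the test scene.
    """
    per_fold = [train_and_eval(f["train"], f["test"][0]) for f in leave_one_scene_out(scenes)]
    return {name: mean(result[name] for result in per_fold) for name in per_fold[0]}


if __name__ == "__main__":
    scenes = [f"scene_{i}" for i in range(1, 10)]  # placeholder names for the nine scenes
    folds = leave_one_scene_out(scenes)
    print(len(folds), len(folds[0]["train"]))
```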

This protocol prioritizes robust generalization across varied environmental layouts, densities, and crowd flow regimes, minimizing scene-specific overfitting.

3. Metrics for Forecast Quality and Social Compliance

MADRAS evaluates both trajectory accuracy and joint social realism with a standard set of quantitative metrics, measuring displacement error as well as interaction-aware joint feasibility:

Metric	Measures	Formula
ADE	Average displacement error (all timesteps, all samples, all agents)	$\frac{1}{N k \Delta T} \sum_{i=1}^{N} \sum_{j=1}^{k} \sum_{t=T_{\text{obs}}+1}^{T_{\text{pred}}} \lVert \hat{y}_t^{i,j} - y_t^i \rVert_2$
FDE	Final displacement error (at $T_{\text{pred}}$)	$\frac{1}{N k} \sum_{i=1}^{N} \sum_{j=1}^{k} \lVert \hat{y}_{T_{\text{pred}}}^{i,j} - y_{T_{\text{pred}}}^i \rVert_2$
minADE$_k$	Best-of-$k$ ADE (multimodal)	$\frac{1}{N} \sum_{i=1}^{N} \min_{j=1,\ldots,k} \left( \frac{1}{\Delta T} \sum_{t=T_{\text{obs}}+1}^{T_{\text{pred}}} \lVert \hat{y}_t^{i,j} - y_t^i \rVert_2 \right)$
minFDE$_k$	Best-of-$k$ FDE (multimodal)	$\frac{1}{N} \sum_{i=1}^{N} \min_{j=1,\ldots,k} \lVert \hat{y}_{T_{\text{pred}}}^{i,j} - y_{T_{\text{pred}}}^i \rVert_2$
AUC	Area under the error curve for $K = 1, \ldots, k$	$\sum_{i=1}^{N} \sum_{K=1}^{k} E_K^i$
Collision Rate (CR)	Frequency of predicted inter-agent collisions	$\frac{1}{N(N-1)\Delta T} \sum_{t=T_{\text{obs}}+1}^{T_{\text{pred}}} \sum_{i \neq j} \mathbb{1}\left[\lVert \hat{y}_t^i - \hat{y}_t^j \rVert_2 < \epsilon\right]$

$N$: number of agents.
$k$: number of sampled prediction futures per agent.
$\hat{y}_t^{i,j}$: position of agent $i$ at timestep $t$ in its $j$-th predicted future; $y_t^i$: the corresponding ground-truth position. In the CR formula, $i$ and $j$ both index agents.
$\Delta T = T_{\text{pred}} - T_{\text{obs}}$: number of predicted timesteps.
$E_K^i$: error of agent $i$ when $K$ sampled futures are allowed.
$\epsilon$: collision distance threshold, set to the largest inter-agent distance for which the ground-truth trajectories contain zero collisions.
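The four displacement metrics can be computed directly from these definitions. The sketch below is a plain-Python illustration, not the benchmark's evaluation code; it assumes every agent has the same number $k$ of samples and that all trajectories span the same $\Delta T$ future steps. The nested-list layout `preds[i][j][t]` is an assumption chosen to mirror the indices $i$, $j$, $t$.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean (L2) distance between two positions, in metres."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def displacement_metrics(
    preds: Sequence[Sequence[Sequence[Point]]],
    gt: Sequence[Sequence[Point]],
) -> Dict[str, float]:
    """ADE, FDE, minADE_k and minFDE_k following the metrics table.

    preds[i][j][t]: sample j of agent i at future step t (N x k x ΔT)
    gt[i][t]:       ground-truth position of agent i at future step t (N x ΔT)
    Every agent is assumed to have the same number k of samples.
    """
    n_agents = len(preds)
    k = len(preds[0])
    ade = fde = min_ade = min_fde = 0.0
    for samples, truth in zip(preds, gt):
        sample_ade, sample_fde = [], []
        for traj in samples:
            errors = [_dist(p, q) for p, q in zip(traj, truth)]
            sample_ade.append(sum(errors) / len(errors))  # mean over the ΔT steps
            sample_fde.append(errors[-1])                  # error at T_pred
        ade += sum(sample_ade)
        fde += sum(sample_fde)
        min_ade += min(sample_ade)  # best of the k samples for this agent
        min_fde += min(sample_fde)
    return {
        "ADE": ade / (n_agents * k),
        "FDE": fde / (n_agents * k),
        "minADE_k": min_ade / n_agents,
        "minFDE_k": min_fde / n_agents,
    }
```

Because ADE and FDE average over all $k$ samples while the best-of-$k$ variants keep only the closest sample, a model with one accurate sample and many scattered ones can show a small minADE$_k$ alongside a large ADE.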

Displacement errors (ADE, FDE) focus on pointwise prediction fidelity, while CR directly penalizes physically unrealistic, overlapping trajectories in dense crowds. The AUC metric penalizes models for generating excessively dispersed multimodal predictions.
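The table gives AUC as $\sum_i \sum_K E_K^i$ without spelling out $E_K^i$. The sketch below assumes, as a hypothetical reading, that the samples of each agent are ordered and that $E_K^i$ is the best per-sample ADE among agent $i$'s first $K$ samples. Under that assumption, a model whose first samples are already accurate accumulates a small area, whereas one that needs many dispersed samples before hitting the ground truth is penalized.

```python
from typing import Sequence


def error_curve_auc(per_sample_errors: Sequence[Sequence[float]]) -> float:
    """Sum over agents of the best-of-K error curve for K = 1..k.

    ASSUMPTION: per_sample_errors[i][j] is the ADE of agent i's j-th sample
    (samples in model order), and E_K^i = min of the first K errors.
    """
    total = 0.0
    for errors in per_sample_errors:
        best = float("inf")
        for e in errors:         # K = 1, 2, ..., k
            best = min(best, e)  # E_K^i under the assumed definition
            total += best
    return total
```

With this definition, the same three sample errors give an area of 5.0 when ordered `[3.0, 1.0, 2.0]` but 3.0 when ordered `[1.0, 3.0, 2.0]`, so the metric rewards placing good hypotheses early.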

4. Baseline Methods and Comparative Performance

The benchmark includes results for several strong baselines retrained under identical preprocessing, as well as for VISTA, which is designed specifically with goal- and social-awareness components. Summary statistics (ADE, FDE, minADE, and minFDE in meters; CR in percent):

Method	ADE	FDE	minADE	minFDE	AUC	Collision Rate
Y-Net	8.74	15.29	0.50	0.65	118	5.36%
MART	0.69	1.29	0.17	0.24	5.65	2.14%
TUTR	0.91	1.41	0.37	0.56	5.71	N.A.
VISTA	0.64	1.13	0.18	0.25	5.59	0.03%

Notable findings:

Y-Net, a single-agent goal-conditioned predictor, reaches moderate best-of-$k$ accuracy (minADE 0.50 m, minFDE 0.65 m) but incurs very large ADE/FDE (8.74 m / 15.29 m) and AUC (118), as well as frequent collisions (5.36% CR), indicating limited joint realism in dense scenarios.
MART, a multi-agent Transformer, achieves the lowest minADE/minFDE (0.17 m / 0.24 m) and considerably reduces collision rate relative to Y-Net, but still exhibits 2.14% collisions.
VISTA attains the lowest ADE, FDE, and AUC, stays within 0.01 m of MART on minADE/minFDE, and reduces collision rate to 0.03%, demonstrating superior social compliance and feasibility in highly interactive crowds (Martins et al., 13 Nov 2025).
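To put the VISTA/MART comparison in relative terms, the snippet below recomputes percentage changes from the values in the results table. It only restates the reported numbers; the helper name `relative_reduction` is illustrative.

```python
# Values copied from the results table (displacement metrics in metres, CR in percent).
MART = {"ADE": 0.69, "FDE": 1.29, "minADE": 0.17, "minFDE": 0.24, "AUC": 5.65, "CR": 2.14}
VISTA = {"ADE": 0.64, "FDE": 1.13, "minADE": 0.18, "minFDE": 0.25, "AUC": 5.59, "CR": 0.03}


def relative_reduction(baseline: dict, model: dict) -> dict:
    """Percentage reduction of each metric vs. the baseline (negative = model is worse)."""
    return {name: 100.0 * (baseline[name] - model[name]) / baseline[name] for name in baseline}


if __name__ == "__main__":
    for name, value in relative_reduction(MART, VISTA).items():
        print(f"{name:7s} {value:+6.1f}%")
```

Relative to MART, VISTA lowers ADE by about 7%, FDE by about 12%, and collision rate by about 99%, while its minADE and minFDE are roughly 6% and 4% higher.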

5. Rationale and Significance of Social-Compliance Metrics

The inclusion of collision rate (CR) alongside displacement-based metrics addresses a critical shortcoming in crowd trajectory prediction. In highly congested environments, forecasts based solely on ADE/FDE may yield plausible single-agent futures that overlap unrealistically, failing to capture feasible multi-agent coordination. CR directly penalizes such “hallucinated” collisions, incentivizing models to generate jointly plausible, non-overlapping agent paths.
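The CR formula can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's code: it scores one predicted future per agent, and `zero_collision_eps` encodes one reading of the $\epsilon$ definition (the largest threshold with no ground-truth collisions, which under the strict `<` test equals the smallest pairwise ground-truth distance). Both function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def collision_rate(pred: Sequence[Sequence[Point]], eps: float) -> float:
    """Fraction of ordered agent pairs (i, j), i != j, and predicted timesteps
    at which agents i and j are closer than eps. Multiply by 100 for percent.

    pred[i][t]: the (single) predicted position of agent i at future step t.
    """
    n = len(pred)
    if n < 2:
        return 0.0
    steps = len(pred[0])
    hits = 0
    for t in range(steps):
        for i, j in combinations(range(n), 2):
            (xi, yi), (xj, yj) = pred[i][t], pred[j][t]
            if math.hypot(xi - xj, yi - yj) < eps:
                hits += 2  # the formula sums over ordered pairs: (i, j) and (j, i)
    return hits / (n * (n - 1) * steps)


def zero_collision_eps(gt: Sequence[Sequence[Point]]) -> float:
    """Assumed reading of eps: the smallest pairwise ground-truth distance,
    i.e. the largest threshold that yields zero ground-truth collisions.
    Requires at least two agents."""
    return min(
        math.hypot(gt[i][t][0] - gt[j][t][0], gt[i][t][1] - gt[j][t][1])
        for t in range(len(gt[0]))
        for i, j in combinations(range(len(gt)), 2)
    )
```

By construction, evaluating the ground truth itself with `eps = zero_collision_eps(gt)` gives a collision rate of zero, which matches the intent that only predicted overlaps are counted.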

This dual focus on accuracy and interaction-aware realism reflects the demands of autonomous systems in safety-critical settings, where physically plausible group behaviors are as essential as individual prediction fidelity.

6. Benchmark Impact and Research Directions

The MADRAS benchmark establishes a demanding evaluation setting for multi-agent trajectory prediction, characterized by:

Dataset scale and realism: authentic pedestrian crowds at urban events; dense, unstructured, agent-rich scenes.
Scene-agnostic, rigorous evaluation protocol: leave-one-out across diverse spatial contexts.
Emphasis on both goal-oriented forecasting and social compliance.

By exposing the limitations of existing models (notably high collision rates) and rewarding improvements in joint feasibility, MADRAS has spurred the development of advanced architectures such as VISTA that fuse long-horizon intent modeling with fine-grained social attention.

A plausible implication is that widespread adoption of MADRAS-style evaluation encourages progress toward more robust, socially aware prediction algorithms necessary for deploying autonomous agents in real-world, high-density environments (Martins et al., 13 Nov 2025).


References

Martins et al. (13 Nov 2025). VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction.
