MADRAS Benchmark

Updated 20 November 2025
  • MADRAS Benchmark is a comprehensive framework designed to evaluate multi-agent trajectory prediction with an emphasis on joint social plausibility and goal awareness.
  • It leverages a dataset of over 7,000 pedestrian trajectories from diverse urban scenes recorded during a public festival to simulate real-world crowd dynamics.
  • The benchmark employs a leave-one-out cross-validation protocol with metrics like ADE, FDE, and collision rate, enabling systematic comparison of state-of-the-art models such as VISTA.

The MADRAS benchmark is a rigorously designed evaluation framework for multi-agent trajectory prediction in dense, unstructured urban environments. Developed for assessing models in settings where joint social plausibility and agent-level goal awareness are critical, MADRAS is notable for its high-density crowd data, complex scenarios, and comprehensive set of accuracy and social-compliance metrics. It enables systematic, scene-agnostic comparison of forecasting methods and has become the reference evaluation for recent state-of-the-art approaches such as VISTA (Martins et al., 13 Nov 2025).

1. Dataset Characteristics and Structure

The MADRAS dataset consists of over 7,000 pedestrian trajectories extracted from zenithal video recordings acquired during the Festival of Lights in Lyon (France). Data were collected across nine distinct urban scenes featuring diverse geometries (open plazas, narrow alleys, pedestrianized squares) within the city center. Scenes are deliberately unstructured: there are no lane markings or sidewalks, reflecting authentic spontaneous crowd dynamics in a public event.

Key dataset statistics:

  • Maximum localized density reaches 4 pedestrians/m² in the most crowded scenes.
  • Batches used in experiments typically contain tens of simultaneously active agents, with dozens to hundreds of agents per scene.
  • Each trajectory provides a sequence of $(x, y)$ positions in a world-calibrated metric coordinate system, sampled at 2.5 Hz (1 frame per 0.4 s).
  • The observation window spans $T_{\text{obs}} = 8$ frames (3.2 s), and the prediction target covers $T_{\text{pred}} = 12$ frames (4.8 s).
  • No explicit goal annotations are provided; models are expected to infer intent, using the final future position of each trajectory as the training “goal.”
  • Static semantic scene segmentation maps with $D$ classes are provided (optional), derived from a pretrained segmentation backbone.

This composition ensures that the benchmark reflects the real-world complexity of collective pedestrian motion in heterogeneous environments.
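The windowing described above can be illustrated with a minimal sketch. It assumes each agent's track is available as a NumPy array of shape `(T, 2)` in meters; the function name `split_windows`, the `stride` parameter, and the synthetic straight-line track are hypothetical and not part of the MADRAS tooling.

```python
import numpy as np

T_OBS, T_PRED = 8, 12   # frames, as stated in the dataset description
DT = 0.4                # seconds per frame (2.5 Hz)

def split_windows(track: np.ndarray, stride: int = 1):
    """Slice one agent's (T, 2) track into (observed, future) pairs."""
    total = T_OBS + T_PRED
    pairs = []
    for start in range(0, len(track) - total + 1, stride):
        window = track[start:start + total]
        pairs.append((window[:T_OBS], window[T_OBS:]))
    return pairs

# Hypothetical track: 25 frames of straight-line walking at 1.2 m/s.
t = np.arange(25) * DT
track = np.stack([1.2 * t, np.zeros_like(t)], axis=1)

pairs = split_windows(track)
obs, fut = pairs[0]
goal = fut[-1]   # final future position, used as the training "goal"
```

Each pair holds 8 observed and 12 future positions, and the last future position plays the role of the goal since no explicit goal labels exist.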

2. Experimental Protocol

MADRAS employs a leave-one-out cross-validation protocol across its nine scenes:

  • Splitting scheme: For each fold, eight scenes serve for training (an internal validation split may be used), while the uniquely held-out ninth scene is dedicated to testing.
  • Aggregation: Reported results are averaged over all nine possible train/test splits to yield scene-agnostic performance metrics.
  • Prediction task: Models are conditioned on 8 observed positions per agent and tasked to forecast the next 12 future positions.

This protocol prioritizes robust generalization across varied environmental layouts, densities, and crowd flow regimes, minimizing scene-specific overfitting.
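The splitting and aggregation steps can be sketched as follows. The scene names are placeholders, and the unweighted mean over folds is an assumption; the source only states that results are averaged over the nine splits.

```python
def leave_one_out_folds(scenes):
    """Yield (train_scenes, test_scene), holding out each scene once."""
    for held_out in scenes:
        train = [s for s in scenes if s != held_out]
        yield train, held_out

def aggregate(per_fold_scores):
    """Unweighted mean of one metric over all folds (an assumption)."""
    return sum(per_fold_scores) / len(per_fold_scores)

# Hypothetical scene identifiers; the real scene names are not given here.
scenes = [f"scene_{i}" for i in range(1, 10)]
folds = list(leave_one_out_folds(scenes))
```

An internal validation split, if used, would be carved out of the eight training scenes of each fold rather than from the held-out scene.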

3. Metrics for Forecast Quality and Social Compliance

MADRAS evaluates both trajectory accuracy and joint social realism with a standard set of quantitative metrics, measuring displacement error as well as interaction-aware joint feasibility:

| Metric | Measures | Formula |
| --- | --- | --- |
| ADE | Average displacement error (all timesteps, all agents) | $\frac{1}{N k \Delta T} \sum_{i=1}^N \sum_{j=1}^k \sum_{t=T_{\text{obs}}+1}^{T_{\text{pred}}} \lVert \hat{y}_t^{i,j} - y_t^i \rVert_2$ |
| FDE | Final displacement error (at $T_{\text{pred}}$) | $\frac{1}{N k} \sum_{i=1}^N \sum_{j=1}^k \lVert \hat{y}_{T_{\text{pred}}}^{i,j} - y_{T_{\text{pred}}}^i \rVert_2$ |
| minADE$_k$ | Best-of-$k$ ADE (multimodal) | $\frac{1}{N} \sum_{i=1}^N \min_{j=1\ldots k}\left(\frac{1}{\Delta T} \sum_t \lVert \hat{y}_t^{i,j} - y_t^i \rVert_2\right)$ |
| minFDE$_k$ | Best-of-$k$ FDE (multimodal) | $\frac{1}{N} \sum_{i=1}^N \min_{j=1\ldots k} \lVert \hat{y}_{T_{\text{pred}}}^{i,j} - y_{T_{\text{pred}}}^i \rVert_2$ |
| AUC | Area under the error curve for $K = 1, \ldots, k$ | $\sum_{i=1}^N \sum_{K=1}^k E_K^i$ (details as above) |
| Collision Rate (CR) | Frequency of predicted inter-agent collisions | $\frac{1}{N(N-1)\Delta T} \sum_{t=T_{\text{obs}}+1}^{T_{\text{pred}}} \sum_{i \neq j} \mathbb{1}\left[\lVert \hat{y}_t^i - \hat{y}_t^j \rVert_2 < \epsilon\right]$ |

  • $N$: number of agents.
  • $k$: number of sampled prediction futures per agent.
  • $\Delta T = T_{\text{pred}} - T_{\text{obs}}$.
  • $\epsilon$: maximum inter-agent distance yielding zero collisions in ground truth.

Displacement errors (ADE, FDE) focus on pointwise prediction fidelity, while CR directly penalizes physically unrealistic, overlapping trajectories in dense crowds. The AUC metric penalizes models for generating excessively dispersed multimodal predictions.
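The displacement metrics can be computed along the following lines. This is a sketch under stated assumptions: predictions are stored as an array of shape `(N, k, T, 2)` and ground truth as `(N, T, 2)`, with the time average taken over all `T` predicted frames in the array. The function name and array layout are illustrative, not the benchmark's reference implementation.

```python
import numpy as np

def displacement_metrics(preds: np.ndarray, gt: np.ndarray) -> dict:
    """preds: (N, k, T, 2) sampled futures; gt: (N, T, 2) ground truth."""
    # Euclidean error per agent, sample, and timestep: shape (N, k, T).
    dist = np.linalg.norm(preds - gt[:, None], axis=-1)
    ade_per_sample = dist.mean(axis=2)   # (N, k), averaged over time
    fde_per_sample = dist[:, :, -1]      # (N, k), final timestep only
    return {
        "ADE": ade_per_sample.mean(),                  # mean over agents and samples
        "FDE": fde_per_sample.mean(),
        "minADE": ade_per_sample.min(axis=1).mean(),   # best of k, then mean over agents
        "minFDE": fde_per_sample.min(axis=1).mean(),
    }

# Toy check: two agents standing at the origin, two samples per agent,
# one sample offset by 1 m and the other by 3 m at every timestep.
gt = np.zeros((2, 12, 2))
preds = np.zeros((2, 2, 12, 2))
preds[:, 0, :, 0] = 1.0
preds[:, 1, :, 0] = 3.0
metrics = displacement_metrics(preds, gt)
```

In the toy case ADE and FDE average both samples, while minADE and minFDE keep only the closer one, which is why a model can score well on the best-of-$k$ metrics while its average sample is far off.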

4. Baseline Methods and Comparative Performance

The benchmark includes results for several strong baselines retrained under identical preprocessing, as well as the VISTA method, designed specifically with goal and social-awareness components. Summary statistics (all units in meters, CR in percent):

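| Method | ADE | FDE | minADE | minFDE | AUC | Collision Rate |
| --- | --- | --- | --- | --- | --- | --- |
| Y-Net | 8.74 | 15.29 | 0.50 | 0.65 | 118 | 5.36% |
| MART | 0.69 | 1.29 | 0.17 | 0.24 | 5.65 | 2.14% |
| TUTR | 0.91 | 1.41 | 0.37 | 0.56 | 5.71 | N.A. |
| VISTA | 0.64 | 1.13 | 0.18 | 0.25 | 5.59 | 0.03% |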
Notable findings:

  • Y-Net, a single-agent goal-conditioned predictor, reaches sub-meter best-of-k errors (minADE 0.50, minFDE 0.65) but incurs very large ADE/FDE scores (8.74/15.29) and frequent collisions (5.36% CR), indicating limited joint realism in dense scenarios.
  • MART, a multi-agent Transformer, considerably reduces collision rate relative to Y-Net but still exhibits 2.14% collisions.
  • VISTA attains the lowest ADE, FDE, and AUC, trails MART by only 0.01 on minADE and minFDE, and reduces collision rate to 0.03%, demonstrating superior social compliance and feasibility in highly interactive crowds (Martins et al., 13 Nov 2025).
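To put the gaps between the two strongest methods in proportion, the relative reductions can be worked out directly from the reported numbers. The snippet below only does arithmetic on values copied from the table; the helper name `relative_reduction` is illustrative.

```python
# Values copied from the results table (ADE and FDE in meters, CR in percent).
results = {
    "MART": {"ADE": 0.69, "FDE": 1.29, "CR": 2.14},
    "VISTA": {"ADE": 0.64, "FDE": 1.13, "CR": 0.03},
}

def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` is lower than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline

for metric in ("ADE", "FDE", "CR"):
    r = relative_reduction(results["MART"][metric], results["VISTA"][metric])
    print(f"{metric}: {r:.1f}% lower for VISTA than for MART")
```

The displacement improvements are modest compared with the drop in collision rate, which is where the two methods differ most.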

5. Rationale and Significance of Social-Compliance Metrics

The inclusion of collision rate (CR) alongside displacement-based metrics addresses a critical shortcoming in crowd trajectory prediction. In highly congested environments, forecasts based solely on ADE/FDE may yield plausible single-agent futures that overlap unrealistically, failing to capture feasible multi-agent coordination. CR directly penalizes such “hallucinated” collisions, incentivizing models to generate jointly plausible, non-overlapping agent paths.
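A collision-rate computation consistent with the definition given in the metrics section might look like the sketch below. It assumes one joint prediction of shape `(N, T, 2)` and reads $\epsilon$ as the smallest pairwise distance observed in the ground truth, so that the ground truth itself registers zero collisions under a strict inequality. Both function names and this reading of $\epsilon$ are assumptions.

```python
import numpy as np

def pairwise_distances(traj: np.ndarray) -> np.ndarray:
    """traj: (N, T, 2) -> distances between all agent pairs, shape (N, N, T)."""
    return np.linalg.norm(traj[:, None] - traj[None, :], axis=-1)

def calibrate_eps(gt: np.ndarray) -> float:
    """Smallest ground-truth distance between two distinct agents."""
    n = gt.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    return float(pairwise_distances(gt)[off_diag].min())

def collision_rate(preds: np.ndarray, eps: float) -> float:
    """Fraction of ordered agent pairs and timesteps closer than eps."""
    n, t, _ = preds.shape
    off_diag = ~np.eye(n, dtype=bool)               # exclude i == j
    hits = (pairwise_distances(preds) < eps) & off_diag[:, :, None]
    return float(hits.sum() / (n * (n - 1) * t))

# Toy scene: three static agents 1 m apart; the prediction moves the middle
# agent to within 0.5 m of the first one for the first two of four frames.
gt = np.zeros((3, 4, 2))
gt[:, :, 0] = np.array([0.0, 1.0, 2.0])[:, None]
preds = gt.copy()
preds[1, :2, 0] = 0.5

eps = calibrate_eps(gt)
cr_gt = collision_rate(gt, eps)
cr_pred = collision_rate(preds, eps)
```

Here the prediction is only 0.5 m off for one agent, a small displacement error, yet it produces collisions in half of the frames for one agent pair, which is the kind of failure ADE/FDE alone would not flag.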

This dual focus on accuracy and interaction-aware realism reflects the demands of autonomous systems in safety-critical settings, where physically plausible group behaviors are as essential as individual prediction fidelity.

6. Benchmark Impact and Research Directions

The MADRAS benchmark establishes a demanding evaluation setting for multi-agent trajectory prediction, characterized by:

  • Dataset scale and realism: authentic pedestrian crowds at urban events; dense, unstructured, agent-rich scenes.
  • Scene-agnostic, rigorous evaluation protocol: leave-one-out across diverse spatial contexts.
  • Emphasis on both goal-oriented forecasting and social-compliance.

By exposing the limitations of existing models (notably high collision rates) and rewarding improvements in joint feasibility, MADRAS has spurred the development of advanced architectures such as VISTA that fuse long-horizon intent modeling with fine-grained social attention.

A plausible implication is that widespread adoption of MADRAS-style evaluation encourages progress toward more robust, socially aware prediction algorithms necessary for deploying autonomous agents in real-world, high-density environments (Martins et al., 13 Nov 2025).
