
Integrated Multimodal UAM Framework

Updated 28 November 2025
  • Multimodal UAM frameworks are integrated systems that coordinate aerial and ground transport with real-time trajectory management and adaptive AI perception.
  • They employ layered architectures, advanced optimization algorithms, and reinforcement learning to enhance operational efficiency and safety in urban environments.
  • High-fidelity simulations demonstrate significant improvements in ridership, resource utilization, and the reduction of Loss-of-Separation events.

A Multimodal UAM (Urban Air Mobility) Framework refers to an integrated set of methodologies, architectures, and optimization algorithms that jointly coordinate aerial and ground transport, multimodal perception and decision-support, and the multi-actor dynamics required for large-scale, safety- and efficiency-critical deployments. The field draws from advances in transport network optimization, behavioral simulation, deep multimodal learning, reinforcement learning-based traffic management, and AI-driven resource allocation. Leading frameworks in this domain underpin strategic planning, operational coordination, and real-time adaptive control for UAM networks with interactions across physical transport, user interfaces, environmental safety, and multimodal AI perception.

1. Multimodal UAM System Architectures

Multimodal UAM frameworks are structurally composed of layered subsystems that span infrastructure (e.g., vertiports, ground access), transport network optimization modules, decision-support and learning engines, and multimodal user-perception interfaces. A representative architecture couples:

  • Strategic Network Design Layer: Outer-loop optimizers (population-level Genetic Algorithms, MIP solvers) identify key infrastructure placements and joint air-ground resource deployments (Brulin et al., 2023, Jiang et al., 5 Oct 2025).
  • Behavioral Simulation and Equilibrium Layer: Multi-agent simulation platforms such as MATSim, LPSim, or BlueSky model the adaptive interactions of heterogeneous agents (individual travelers, pilots, autonomous vehicles), including full day-to-day replanning, mode switching, and equilibrium search (Brulin et al., 2023, Jiang et al., 5 Oct 2025).
  • Operational and Control Layer: Real-time trajectory management through probabilistic prediction, decentralized RL-based coordination, and robust speed/altitude adjustment (Cho et al., 28 Jan 2025, Murthy et al., 22 Aug 2025).
  • Resource Management Layer: Decentralized holonic architectures where Supervisor, Planner, Task, and Resource Holons orchestrate multi-leg journeys, coordinate disruptions, and balance dynamic resources across air and ground (Sadik et al., 1 May 2025).
  • Multimodal Perception and User Interaction Layer: Components for adaptive multimodal fusion (visual, audio, text, physiological signals), user-centered personalization, and robust perception under sensor uncertainty (Gomaa, 2022, Li et al., 15 Aug 2025, Chen et al., 21 Nov 2025).

2. Optimization and Equilibrium Formulation

A critical element in multimodal UAM frameworks is the optimization of both infrastructure location (vertiports, fleet staging) and operational assignments (routing, scheduling, resource allocation) under endogenous, agent-adaptive demand. For example:

  • Bi-level Activity-Based Network Design: The bi-level loop, as detailed in (Brulin et al., 2023), optimizes upper-level vertiport locations via NSGA-II, with inner-loop MATSim simulations endogenously reassigning agent mode choices to reach an approximate Nash equilibrium. The objectives are Pareto maximization of system-wide UAM demand and minimization of facility cost, subject to constraints on vertiport count:

$$\max_{x_j \in \{0,1\}} f_1(x) = \sum_{j \in N} x_j \, d_j(x)$$

$$\min_{x_j \in \{0,1\}} f_2(x) = \sum_{j \in N} x_j$$

$$\text{s.t.}\quad \sum_{j \in N} x_j \leq P$$

where $d_j(x)$ is the realized, simulation-driven UAM demand under vertiport configuration $x$.

  • System-of-Systems Resource Allocation: Holonic UAM architectures formulate cost-time tradeoff programs over binary assignments for air and ground assets:

$$\min_{x^a_{ij},\, x^g_{ij}} f = \sum_{i,j}\left(c^a_{ij} x^a_{ij} + c^g_{ij} x^g_{ij}\right) + \lambda \sum_{i,j}\left(t^a_{ij} x^a_{ij} + t^g_{ij} x^g_{ij}\right)$$

with constraints on asset capacities and service time windows (Sadik et al., 1 May 2025).

  • Multimodal Equilibrium and Fleet Optimization: Large-Scale Parallel Simulation (LPSim) seeks a multimodal user equilibrium (Wardrop conditions) with no incentive to switch between ground and UAM, co-optimized with integer programming for minimum fleet size subject to timetable and capacity constraints (Jiang et al., 5 Oct 2025).
  • Passenger-Centric Fairness and Class Differentiation: MILP formulations accommodate differentiated service by travel class, quantile-based flight scheduling synchronized with real-time ground access, and lexicographic objectives balancing wait time, fairness indices, and operational costs (Bennaceur et al., 2021).
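The bi-level design loop of (Brulin et al., 2023) can be sketched as an outer search over binary vertiport placements wrapped around an inner demand-simulation call. The sketch below is a minimal illustration, not the authors' implementation: a toy `simulated_demand` function (with made-up base demands and demand cannibalization between neighboring sites) stands in for the MATSim inner loop, and brute-force enumeration stands in for NSGA-II.

```python
from itertools import combinations

# Toy stand-in for the MATSim inner loop: realized demand d_j(x) at an
# open vertiport j depends on which other vertiports are open
# (here: nearby open sites cannibalize each other's demand).
BASE_DEMAND = {0: 120, 1: 80, 2: 100, 3: 60}
NEIGHBORS = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}

def simulated_demand(open_sites):
    """Return total realized UAM demand sum_j x_j * d_j(x)."""
    total = 0
    for j in open_sites:
        competitors = NEIGHBORS[j] & open_sites
        total += BASE_DEMAND[j] / (1 + len(competitors))
    return total

def pareto_front(sites, max_vertiports):
    """Enumerate configurations x with |x| <= P and keep the
    non-dominated (max demand f1, min count f2) pairs."""
    points = []
    for k in range(1, max_vertiports + 1):
        for combo in combinations(sites, k):
            points.append((simulated_demand(set(combo)), k, combo))
    front = [p for p in points
             if not any(q[0] >= p[0] and q[1] <= p[1] and q[:2] != p[:2]
                        for q in points)]
    return sorted(front, key=lambda p: p[1])

for demand, count, combo in pareto_front(range(4), max_vertiports=3):
    print(f"open={combo}  f1(demand)={demand:.1f}  f2(count)={count}")
```

For realistic instance sizes the enumeration would be replaced by NSGA-II or another population-based multi-objective search, with each fitness evaluation running a full agent-based simulation to equilibrium.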

3. Real-Time Trajectory Management and Safety

The integration of UAM into shared airspace and urban environments necessitates robust frameworks for trajectory prediction, dynamic separation assurance, and noise/safety tradeoff management:

  • Probabilistic Trajectory Prediction: Conditional normalizing flows are trained to estimate $p(Y_t \mid X_t)$, where $Y_t$ is the forecast trajectory segment and $X_t$ is the observed history. These models supply Monte Carlo samples to assess Loss-of-Separation (LoS) risks under multimodal (IFR, VFR, UAM) scenarios; UAM vehicles then select speed-adjustment actions that minimize the LoS probability $P^a_{\mathrm{LoS}}(t)$ (Cho et al., 28 Jan 2025).
  • Decentralized Reinforcement Learning for Airspace Management: Multi-agent Markov Decision Processes employ attention-augmented policies learned by PPO, with reward functions balancing ground-noise impact, separation penalties, and energy usage:

$$r_i^t = \rho_{\mathrm{noise}}\, r_{\mathrm{noise}}(s_i^t) + \rho_{\mathrm{sep}}\, r_{\mathrm{separation}}(s_i^t, h^t) + \rho_{\mathrm{energy}}\, r_{\mathrm{energy}}(s_i^t, a_i^t)$$

The framework demonstrates scalable management of hundreds of agents and reveals trade-offs among vertical traffic distribution, ground-noise exposure, safety margins, and energy expenditure (Murthy et al., 22 Aug 2025).
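The prediction-then-action-selection step described above can be sketched with Monte Carlo sampling. This is a hedged illustration, not the cited method: a Gaussian sampler stands in for the trained conditional normalizing flow, and the encounter geometry, separation minimum, and candidate speeds are invented for the example.

```python
import math
import random

SEPARATION_M = 150.0   # assumed separation minimum (illustrative)
N_SAMPLES = 2000

def sample_intruder_position(t, rng):
    """Stand-in for a conditional normalizing flow p(Y_t | X_t):
    Gaussian noise around a straight-line intruder forecast that
    crosses the ownship route at (900, 0) around t = 20 s."""
    return (rng.gauss(900.0, 40.0), rng.gauss(45.0 * t - 900.0, 40.0))

def ownship_position(t, speed):
    """Ownship flies along the x-axis at the chosen speed (m/s)."""
    return (speed * t, 0.0)

def los_probability(speed, horizon=20):
    """Monte Carlo estimate of P_LoS over the horizon for one action."""
    rng = random.Random(0)  # fixed seed: deterministic estimate
    hits = 0
    for _ in range(N_SAMPLES):
        for t in range(horizon):
            ox, oy = ownship_position(t, speed)
            ix, iy = sample_intruder_position(t, rng)
            if math.hypot(ox - ix, oy - iy) < SEPARATION_M:
                hits += 1
                break
    return hits / N_SAMPLES

# Select the speed action that minimizes the estimated LoS probability.
actions = [30.0, 45.0, 60.0]  # candidate cruise speeds, m/s
risk = {a: los_probability(a) for a in actions}
best = min(risk, key=risk.get)
print(risk, "-> chosen speed:", best)
```

Slowing to 30 m/s lets the intruder clear the crossing point first, so its estimated LoS probability is near zero, while 45 m/s arrives at the crossing simultaneously and is rejected.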

4. Multimodal Perception and Robustness

Multimodal UAM frameworks advance adaptive perception and user interaction under diverse and uncertain sensor and interface combinations:

  • Adaptive User-Centered Fusion: Modular approaches hierarchically combine hand gesture, head pose, eye gaze, speech, and vehicle telemetry via configurable late fusion with per-user, per-context adaptation weights:

$$y = \mathrm{softmax}\left(W_G w_G F_G + W_H w_H F_H + W_E w_E F_E + W_S w_S F_S\right)$$

The system supports continual learning, transfer-of-learning personalization, and safety-constrained adaptation, validated in driving-simulator trials with quantitative improvements in referencing accuracy and user trust (Gomaa, 2022).

  • Uncertainty-Aware Multimodal Modeling: Token-mapper frameworks (e.g., CLIP-based UMM (Li et al., 15 Aug 2025)) employ pluggable convolutional and language encoders, synthetic modality generators, and transformer fusion cores. The architecture maintains performance even with arbitrary missing modalities, leveraging frozen vision-language priors, and achieving real-time inference speeds suitable for automotive perception tasks.
  • Unified Attention-Mamba Backbones: Integrated Transformer-Mamba blocks (UAM, here short for Unified Attention-Mamba) fuse radiomics and image data for cell classification and segmentation, outperforming both single-modality and fixed-ratio hybrid baselines and enabling extensible fusion across biomedical modalities (Chen et al., 21 Nov 2025).
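The adaptive late-fusion rule above can be sketched as follows. This is a minimal illustration with invented modality names, random placeholder classifier weights W_m, and assumed per-user adaptation weights w_m; setting a feature to None models a missing modality, illustrating the graceful degradation discussed for uncertainty-aware models.

```python
import math
import random

random.seed(0)

N_CLASSES, FEAT_DIM = 3, 4
MODALITIES = ["gesture", "head_pose", "eye_gaze", "speech"]

# Per-modality classifier heads W_m (illustrative random weights) and
# per-user adaptation weights w_m, assumed learned from user history.
W = {m: [[random.gauss(0, 0.5) for _ in range(FEAT_DIM)]
         for _ in range(N_CLASSES)] for m in MODALITIES}
user_weights = {"gesture": 1.2, "head_pose": 0.6, "eye_gaze": 1.0, "speech": 0.8}

def softmax(z):
    mx = max(z)
    exps = [math.exp(v - mx) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def fuse(features, user_w):
    """Late fusion: y = softmax(sum_m w_m * W_m F_m).
    Missing modalities (feature = None) are simply dropped, so the
    fused posterior degrades gracefully instead of failing."""
    logits = [0.0] * N_CLASSES
    for m, f in features.items():
        if f is None:
            continue
        for k in range(N_CLASSES):
            logits[k] += user_w[m] * sum(W[m][k][d] * f[d] for d in range(FEAT_DIM))
    return softmax(logits)

obs = {m: [random.gauss(0, 1) for _ in range(FEAT_DIM)] for m in MODALITIES}
full = fuse(obs, user_weights)
degraded = fuse({**obs, "eye_gaze": None}, user_weights)  # sensor dropout
print("full:   ", [round(p, 3) for p in full])
print("no gaze:", [round(p, 3) for p in degraded])
```

Per-user, per-context adaptation amounts to updating the `user_weights` entries online; with every modality missing, the posterior falls back to uniform rather than raising an error.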

5. Holonic and Decentralized Planning Paradigms

Large-scale UAM requires decentralized and resilient coordination among diverse actors and modes:

  • LLM-Enhanced Holonic Coordination: An interplay of Supervisor, Planner, Task, and Resource Holons, coordinated by LLM-driven natural language understanding and situational adaptation, delivers flexible, human-in-the-loop multimodal planning. A case study with air taxis and electric scooters showed reduced wait times, improved throughput, low-latency replanning, and rapid recovery from facility disruptions (Sadik et al., 1 May 2025).
  • Broker-Operator Layering: Separation of pooling/scheduling and operational routing, with broker layers ingesting probabilistic ETAs and traveler class to schedule (via MILP/UAM-Beam Search), and operator layers handling dynamic eVTOL assignment and battery-aware routing, yields scalable, fairness-enforceable multimodal synchronization (Bennaceur et al., 2021).
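The cost-time tradeoff the broker layer must resolve (the binary air/ground assignment program of Section 2) can be illustrated with a brute-force sketch over mode choices per journey leg. All costs, times, the lambda weight, and the time window below are invented for the example; a real broker would solve this as a MILP with asset-capacity constraints and live ETAs.

```python
from itertools import product

# Per-leg costs c and travel times t for air (a) and ground (g) assets
# (illustrative numbers; real instances come from live ETAs and tariffs).
LEGS = [
    {"c_a": 40.0, "t_a": 8.0,  "c_g": 10.0, "t_g": 25.0},
    {"c_a": 55.0, "t_a": 12.0, "c_g": 12.0, "t_g": 40.0},
    {"c_a": 35.0, "t_a": 6.0,  "c_g": 8.0,  "t_g": 15.0},
]
LAM = 1.5            # lambda: cost-time tradeoff weight
MAX_TOTAL_TIME = 50  # assumed service time window on the whole journey

def plan(legs, lam, max_time):
    """Brute-force the binary air/ground assignment minimizing
    f = sum(c) + lam * sum(t), subject to the journey time window."""
    best, best_f = None, float("inf")
    for assign in product(("air", "ground"), repeat=len(legs)):
        cost = sum(l["c_a"] if m == "air" else l["c_g"]
                   for l, m in zip(legs, assign))
        time = sum(l["t_a"] if m == "air" else l["t_g"]
                   for l, m in zip(legs, assign))
        if time > max_time:
            continue  # violates the service time window
        f = cost + lam * time
        if f < best_f:
            best, best_f = assign, f
    return best, best_f

assign, f = plan(LEGS, LAM, MAX_TOTAL_TIME)
print("assignment:", assign, "objective:", f)
# -> assignment: ('air', 'air', 'ground') objective: 155.5
```

Note how the time window forces the expensive air mode on the first two legs: the all-ground plan is cheapest in pure cost but infeasible in time.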

6. Evaluation, Deployment, and Generalizability

Multimodal UAM frameworks are validated via high-fidelity simulation or real-world case studies, with performance metrics including ridership share, travel time savings, wait time fairness, LoS rates, resource utilization, and computational efficiency.

  • Simulation Platforms: MATSim for activity-based behavioral equilibrium (Brulin et al., 2023), LPSim for multi-GPU parallelism and mixed-integer fleet optimization (Jiang et al., 5 Oct 2025), BlueSky for airspace dynamics and RL training (Murthy et al., 22 Aug 2025).
  • Empirical Results: Pareto designs with endogenous adaptation outperform static heuristics (89.98% vs. 70.97% UAM demand for the same vertiport deployments) (Brulin et al., 2023). RL policies can nearly eliminate LoS events up to critical safety-noise tradeoff thresholds (Murthy et al., 22 Aug 2025).
  • Modular Generality: The frameworks are modular, so components can be substituted (e.g., swapping UAM for other modes, or extending perception modules with new sensors), enabling transfer to different urban regions, infrastructures, or operational regimes (Brulin et al., 2023, Gomaa, 2022, Li et al., 15 Aug 2025).

7. Future Directions and Open Challenges

While specific open problems vary by subdomain, a plausible implication of the work surveyed above is that simultaneous progress in optimization, learning-based control, resilient planning architectures, and robust multimodal perception is essential for the large-scale, safe, and equitable deployment of Urban Air Mobility systems.
