
On-Board Driving Stack for Autonomous Vehicles

Updated 25 January 2026
  • On-Board Driving Stack is a comprehensive system integrating diverse sensor modalities, perception algorithms, localization techniques, prediction models, planning strategies, and control methods for autonomous driving.
  • The architecture emphasizes modularity and real-time performance by leveraging ROS middleware, strict time synchronization, and high-performance hardware like GPUs and dedicated controllers.
  • Recent enhancements incorporate deep language and vision models for personalized control and dynamic decision-making, improving safety through diagnostic routines and adaptive strategies.

An on-board driving stack refers to the complete integrated suite of sensing, perception, localization, prediction, planning, and control software and hardware deployed within an autonomous vehicle to enable real-time, closed-loop driving without reliance on off-board computation. Such stacks are responsible for real-time acquisition and fusion of multi-modal sensor data, semantic scene understanding, local state estimation, trajectory planning with hard safety guarantees, low-latency actuation, and, increasingly, high-level reasoning via vision-language models (VLMs) or large language models (LLMs), all operating under the compute, bandwidth, and safety constraints of deployment on production or research vehicles.

1. System Architecture and Module Composition

On-board stacks are typically architected as a real-time, modular sense–plan–act pipeline integrating the following primary subsystems:

  • Sensing and Time-Synchronization: Sensor suites typically comprise multi-camera arrays, 3D LiDAR, radar, GNSS-RTK, IMU, wheel encoders, and proprietary vehicle state sensors. Hardware platforms commonly feature dedicated real-time controllers (e.g., dSPACE MicroAutoBox II), GPU-accelerated compute units (e.g., NVIDIA Jetson Orin AGX), or COTS industrial PCs (Rampuria et al., 2024, Kessler et al., 2019, Teikmanis et al., 2023).
  • Perception and Scene Understanding: Fusion of vision (YOLO, Faster R-CNN, ViT, etc.), LiDAR (point cloud ground-plane removal, clustering), and radar is implemented to localize static/dynamic obstacles and track semantics (cones, vehicles, pedestrians). Recent architectures leverage deep learning models (YOLOv5s/TensorRT, RekTNet, ByteTrack) and multiple depth-estimation pipelines, achieving mean depth errors as low as 0.85% via LiDAR–camera fusion (Rampuria et al., 2024, Alvarez et al., 2022, Khaled et al., 18 Jan 2026, Cui et al., 2024).
  • Localization and State Estimation: Techniques range from GNSS/IMU-based EKF or ESKF, graph-based SLAM (g2o, MRPT, Cartographer, iSAM2), to pose-graph optimizers, frequently operating in parallel with robust outlier rejection. State representations often couple ego pose/velocity and mapped landmarks: $x = [x_v, y_v, \psi_v, v, m_1, \ldots, m_N]$ (Rampuria et al., 2024, Alvarez et al., 2022, Ali et al., 23 Sep 2025, Ochs et al., 2024).
  • Prediction: Leading stacks employ interpretable, goal- and intent-based predictors (IGP2, GOFI), tree-ensembles (GRIT), and GNNs for multi-modal distributional forecasting. Bayesian posteriors over agent goals and scene occlusion probabilities inform the planning module, supporting safety by design (2208.00096, Ochs et al., 2024).
  • Planning: Trajectory or path planners leverage curvature or comfort cost minimization, typically as finite-horizon nonlinear optimal control, hybrid QP/NLP stages (2s-OPT), relaxation-based QP solvers, or particle-based optimizers (PSO). Safety constraints are encoded as hard boundaries over reachable state-sets and STL logic (Rampuria et al., 2024, 2208.00096, Ochs et al., 2024, Alvarez et al., 2022, Ali et al., 23 Sep 2025).
  • Control: Lateral and longitudinal controls range from kinematic bicycle models, the Stanley controller, Pure Pursuit, LTV-MPC, and optimal control allocation (torque vectoring) to model-based PID/PI/PID+MPC cascades. Delay compensation and distribution over actuation channels (CAN, dSPACE CAN, Raptor DBW) are required for robust performance (Rampuria et al., 2024, Alvarez et al., 2022, Ali et al., 23 Sep 2025, Kessler et al., 2019).
  • Safety and Diagnostics: Supervisory modules explicitly monitor health/faults, maintain time budgets, enforce fallbacks, and publish diagnostic status. These routines intercept violations of latency, staleness, actuation limits, or anomalous localization (Ochs et al., 2024, Teikmanis et al., 2023, Ali et al., 23 Sep 2025).
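The prediction step of the stacked EKF-SLAM state used in several of these stacks can be sketched as follows. This is a minimal illustration with a unicycle motion model and hypothetical noise values, not the cited teams' implementation; landmarks are treated as static map entries appended after the four vehicle states.

```python
import numpy as np

def ekf_slam_predict(x, P, omega, a, dt, q_pose=1e-3):
    """One EKF-SLAM prediction step for the stacked state
    x = [x_v, y_v, psi_v, v, m_1x, m_1y, ...]; landmarks are static,
    so only the 4 vehicle states and their covariance block change."""
    x = x.copy()
    xv, yv, psi, v = x[:4]
    # Unicycle motion model for the ego vehicle
    x[0] = xv + v * np.cos(psi) * dt
    x[1] = yv + v * np.sin(psi) * dt
    x[2] = psi + omega * dt
    x[3] = v + a * dt
    # Jacobian F is identity except for the vehicle block
    n = x.size
    F = np.eye(n)
    F[0, 2] = -v * np.sin(psi) * dt
    F[0, 3] = np.cos(psi) * dt
    F[1, 2] = v * np.cos(psi) * dt
    F[1, 3] = np.sin(psi) * dt
    # Process noise (hypothetical magnitude) only on vehicle states
    Q = np.zeros((n, n))
    Q[:4, :4] = q_pose * np.eye(4)
    P = F @ P @ F.T + Q
    return x, P
```

Because the landmark block of the Jacobian is the identity and carries no process noise, map uncertainty only shrinks through measurement updates, which is the standard EKF-SLAM behavior.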

Typical inter-module communication is implemented in ROS 1/2 (DDS or Fast-RTPS middleware) with type-safe topics and rigorous time-stamping. Cross-domain CAN/Ethernet bridges and UDP/MQTT relays support hardware interface and V2X scenarios (Rampuria et al., 2024, Teikmanis et al., 2023, Khaled et al., 18 Jan 2026, Kessler et al., 2019).

2. Sensing, Perception, and Localization Pipelines

Modern stacks implement multi-tiered fusion and parallelized perception, combining:

  • Deep Learning Detectors: YOLOv5s on NVIDIA TensorRT achieves mAP≈0.985 for cone detection. ByteTrack and YOLOv8-n are used for real-time object/subclass detection with <20ms latencies per frame at up to 50Hz (Rampuria et al., 2024, Khaled et al., 18 Jan 2026).
  • Depth Estimation: Pipelines include LiDAR–camera fusion (ground-plane removal by RANSAC, DBSCAN clustering, projective fusion for 0.85% avg. error), monocular bounding-box height power-law fits (4.49% error), and stereo keypoint-based triangulation by RekTNet and SIFT (6.39%) (Rampuria et al., 2024). RAFT-Stereo, DeepLabV3, and classical triangulation are also employed (Khaled et al., 18 Jan 2026).
  • SLAM/State Estimation: Feature-based EKF-SLAM or factor-graph SLAM enables vehicle pose and map estimation over $x = [x_v, y_v, \psi_v, v, m_1, \ldots, m_N]$, with motion/measurement updates at 100 Hz/30 Hz and $O(n^2)$ data association in parallelized background threads, yielding <0.2 m RMS error (Rampuria et al., 2024, Alvarez et al., 2022). RMS error of <0.03 m (GNSS RTK) or <0.2 m (pose-graph) is attained for urban deployments (Ochs et al., 2024).
  • Real-Time Synchronization: All sensors disciplined to GPS-PPS, with ROS message_filters ApproximateTime policy (±5ms thresholds), and dedicated buffer nodes to prevent sensor backlog (Rampuria et al., 2024).
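The approximate-time matching used for sensor fusion can be sketched as a greedy nearest-stamp pairing within a ±5 ms window. This is a simplified stand-in for the ROS message_filters ApproximateTime policy, not its exact adaptive algorithm:

```python
from collections import deque

def approx_time_pair(cam_stamps, lidar_stamps, slop=0.005):
    """Greedily pair two sorted message streams whose timestamps fall
    within +/- slop seconds of each other; unmatched stale messages
    are dropped so no backlog accumulates."""
    pairs = []
    lidar = deque(sorted(lidar_stamps))
    for t_cam in sorted(cam_stamps):
        # Drop LiDAR stamps too old to ever match this or later frames
        while lidar and t_cam - lidar[0] > slop:
            lidar.popleft()
        if lidar and abs(lidar[0] - t_cam) <= slop:
            pairs.append((t_cam, lidar.popleft()))
    return pairs
```

A camera frame at t = 0.033 s with the nearest LiDAR sweep at t = 0.040 s (7 ms apart) would be rejected under the 5 ms threshold, which is the behavior the buffer nodes rely on to avoid fusing stale data.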

Depth, object class, velocity, yaw, and safety metrics such as time-to-collision (TTC) and time-headway (THW) are computed per detection and serialized using custom ROS2 messages for distributed multi-agent or infrastructure interaction (Khaled et al., 18 Jan 2026).
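The per-detection safety metrics reduce to simple kinematic ratios. A minimal sketch (constant-velocity assumption, lead-vehicle geometry; the guard values are illustrative):

```python
def ttc_thw(range_m, ego_speed, target_speed):
    """Time-to-collision and time-headway for a lead detection.
    TTC = range / closing speed, defined only while closing;
    THW = range / ego speed, independent of the target's motion."""
    closing = ego_speed - target_speed
    ttc = range_m / closing if closing > 1e-6 else float("inf")
    thw = range_m / ego_speed if ego_speed > 1e-6 else float("inf")
    return ttc, thw
```

For example, a detection 30 m ahead with the ego at 20 m/s and the target at 10 m/s yields TTC = 3.0 s and THW = 1.5 s; a non-closing gap yields an infinite TTC rather than a negative one.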

3. Planning and Control Methodologies

The planning subsystem is typically formalized as:

$$\min_{\text{path }\Gamma} \int_{0}^{L} \kappa(s)^2 \, ds$$

where $\kappa(s)$ is the path curvature and $L$ the path length, subject to static/dynamic safety constraints (Rampuria et al., 2024, Alvarez et al., 2022, Ali et al., 23 Sep 2025). For urban/complex domains, complete nonlinear finite-horizon optimal control problems are posed, with MILP warm-start (2s-OPT) and NLP refinement (2208.00096). For racing, Delaunay triangulation and minimum-curvature cubic splines define racelines, and velocity profiles are shaped by G–G or GGS diagrams and aerodynamic models (Alvarez et al., 2022, Rampuria et al., 2024, Ali et al., 23 Sep 2025). Adaptive sector scaling and overtake planners (Frenet or local spline-based) are applied in head-to-head racing (Baumann et al., 2024).
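The smoothness objective above can be evaluated numerically on a discretized candidate path. The following is an illustrative finite-difference sketch, not any cited team's planner; curvature is computed from arc-length derivatives and the integral is approximated by the trapezoid rule:

```python
import numpy as np

def curvature_cost(pts):
    """Approximate the objective  integral kappa(s)^2 ds  for a 2-D
    polyline, with derivatives taken with respect to arc length."""
    pts = np.asarray(pts, dtype=float)
    ds = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(ds)])
    dx = np.gradient(pts[:, 0], s)
    dy = np.gradient(pts[:, 1], s)
    ddx = np.gradient(dx, s)
    ddy = np.gradient(dy, s)
    # Signed curvature of a parametric curve
    kappa = (dx * ddy - dy * ddx) / np.maximum((dx**2 + dy**2) ** 1.5, 1e-12)
    k2 = kappa**2
    # Trapezoid rule over the (possibly nonuniform) arc-length grid
    return float(np.sum(0.5 * (k2[1:] + k2[:-1]) * np.diff(s)))
```

A straight polyline scores zero, while a circular arc of radius $R$ and length $L$ scores approximately $L/R^2$, so a minimizer of this cost prefers the straightest feasible path through the constraint corridor.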

Control architecture:

  • Lateral Control: Stanley/Pure Pursuit (with $\delta = \psi_{\rm err} + \arctan(k_e e / v)$ or $\delta = \arctan(2L\sin\alpha/d_\mathrm{LA})$), LTV-MPC on a bicycle model (state $x=[y, v_y, \psi, \dot\psi]$, input $u=\delta$), MAP/L1 pursuit for scaled platforms (Rampuria et al., 2024, Alvarez et al., 2022, Baumann et al., 2024).
  • Longitudinal Control: 2-DOF PI/PID (e.g., $u_\mathrm{throttle}=K_p e_v+K_i\int e_v\,dt$), with an optional feed-forward term from the trajectory (Rampuria et al., 2024, Alvarez et al., 2022, Ali et al., 23 Sep 2025).
  • Low-Level Control Allocation: Quadratic programs for four-wheel torque vectoring, active yaw stabilization (Alvarez et al., 2022).
  • Real-Time Budgeting: Module latencies: YOLOv5s ≈12 ms (GPU), SLAM prop. 0.8 ms, measurement upd. ≈15 ms for 30 landmarks, control loops ≤1 ms, CAN round-trip ≈3 ms (Rampuria et al., 2024).
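The two control laws quoted above can be sketched directly; the gains here are hypothetical placeholders, not tuned values from the cited stacks:

```python
import math

def stanley_steer(psi_err, cross_track_err, v, k_e=1.0, eps=1e-3):
    """Stanley lateral law: delta = psi_err + arctan(k_e * e / v).
    The speed is floored at eps to avoid a singularity near standstill."""
    return psi_err + math.atan2(k_e * cross_track_err, max(v, eps))

class PILongitudinal:
    """PI throttle law: u = Kp * e_v + Ki * integral(e_v) dt."""
    def __init__(self, kp=0.5, ki=0.1):
        self.kp, self.ki, self.integ = kp, ki, 0.0

    def step(self, v_ref, v, dt):
        e = v_ref - v
        self.integ += e * dt  # rectangular integration of speed error
        return self.kp * e + self.ki * self.integ
```

In a deployed stack both laws would additionally need actuator saturation and anti-windup handling, which are omitted here for brevity.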

Switching between controllers (Stanley/Pure Pursuit) can reduce cross-track error from 0.33 m to 0.27 m, with performance monitored in simulation and during full-scale vehicle operation (Rampuria et al., 2024). Competitive stacks maintain at least 1.5× the emergency stopping distance at all times (Ochs et al., 2024).
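The 1.5× stopping-distance invariant amounts to a one-line clearance check. A minimal sketch under a point-mass braking model with no reaction delay (both simplifications relative to a real supervisory monitor):

```python
def safe_gap(v, a_max, margin=1.5):
    """Minimum clearance to keep ahead: margin times the emergency
    stopping distance v^2 / (2 * a_max) of a point mass braking at
    constant deceleration a_max (m/s^2)."""
    return margin * v**2 / (2.0 * a_max)
```

At 20 m/s with 8 m/s² of available deceleration this demands 37.5 m of clearance; a supervisory module would trigger a fallback whenever the measured gap drops below this bound.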

4. Integration of High-Level Reasoning and Personalization via Deep Language/Vision Models

Recent architectures extend the stack through on-board deployment of compressed large language models (LLMs) and vision-language models (VLMs):

  • Motion Control Personalization: VLMs (e.g., Qwen-VL, ~9B parameters, INT4 quantization) receive an image, natural-language instructions, and system/user context plus retrieved RAG memory, producing a $2 \times 3$ action matrix that parametrizes the longitudinal PID and lateral MPC controllers. The system learns from human feedback, updating memory via Chroma DB to fuse previous scenarios and user ratings. The end-to-end stack achieves ≈1.6 s VLM inference latency on an RTX A4000, with customization lowering takeover rates by up to 76.9% in human-in-the-loop Level 3 trials (Cui et al., 2024).
  • Knowledge-Driven Adaptive Control: LLM-based DecisionxLLM and MPCxLLM modules operate as slow periodic (0.2–0.3Hz) or on-demand controllers, translating high-level behavioral intent and state summaries into dynamic reparametrization of MPC weights/bounds via prompt-based interaction (LoRA + RAG + Q5 quantization; up to 52.2% improvement in control adaptability, 10× computational speedup compared to FP16). The MPC always enforces hard constraints; the LLM never overrides feasibility (Baumann et al., 15 Apr 2025).
  • HMI and Safety: Human-comprehensible language prompts, continual learning from interaction, and integration of safety fallbacks (e.g., revert to conservative PID if predicted TTC < 1.5s) characterize the state of the art (Cui et al., 2024, Baumann et al., 15 Apr 2025).
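The principle that the LLM tunes behavior but never overrides feasibility can be sketched as a sanitization layer between the language model and the MPC. The parameter names and bounds below are hypothetical illustrations, not the interface of the cited systems:

```python
def sanitize_mpc_params(proposed, current, bounds):
    """Clamp LLM-proposed MPC weights/bounds to hard limits.
    `proposed`: dict of parameter updates emitted by the LLM.
    `current`:  the currently active, known-feasible parameter set.
    `bounds`:   name -> (lo, hi) hard limits enforced unconditionally.
    Unknown parameter names are silently dropped, so a hallucinated
    key can never reach the solver."""
    safe = dict(current)
    for name, value in proposed.items():
        if name in bounds:
            lo, hi = bounds[name]
            safe[name] = min(max(float(value), lo), hi)
    return safe
```

Because the output always stays inside the declared box constraints and falls back to the current values for anything the LLM did not (validly) set, the MPC's feasibility guarantees are preserved regardless of what the model emits.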

5. Communication, Modularity, and Real-Time Guarantees

Stacks universally rely on modular software infrastructure:

  • Middleware: ROS/ROS2 (DDS/Fast-RTPS) for intra-vehicle, CAN bus (up to CAN FD), high-speed Ethernet (VLAN segmented for criticality separation), and V2I via UDP/MQTT/ASN.1 (Teikmanis et al., 2023, Rampuria et al., 2024, Khaled et al., 18 Jan 2026).
  • Time Synchronization: GPS-PPS, broadcasting time-stamped messages. Buffering camera/point-cloud data to match real-time pipeline clearance (Rampuria et al., 2024).
  • Node Modularity: Each functional subsystem operates as a stateless ROS node or lifecycle container with update/activate/deactivate semantics, facilitating hot-swapping, rapid edge-case testing, and real-to-sim/sim-to-real transitions (Ochs et al., 2024, Kessler et al., 2019).
  • System Integration: Full system latencies as low as 49.7ms (end-to-end control), control loops at 10–100Hz, node-level robustness enforced by heartbeat timeouts, hardware modularity (plug-and-play sensor frames), and resilience to node failures (Teikmanis et al., 2023, Ochs et al., 2024, Ali et al., 23 Sep 2025).
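The heartbeat-timeout mechanism behind node-level robustness can be sketched as a small watchdog; the 0.5 s timeout is an illustrative value, not one reported by the cited stacks:

```python
import time

class HeartbeatWatchdog:
    """Track the last-seen heartbeat per node and report nodes whose
    most recent message is older than the timeout, as the trigger
    condition for failover or a safety stop."""
    def __init__(self, timeout_s=0.5):
        self.timeout = timeout_s
        self.last_seen = {}

    def beat(self, node, t=None):
        # Record a heartbeat; t overrides the clock for testing
        self.last_seen[node] = time.monotonic() if t is None else t

    def stale_nodes(self, now=None):
        now = time.monotonic() if now is None else now
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout]
```

A supervisory loop would poll `stale_nodes()` at its own rate and escalate (degrade, stop, or hand over) whenever the list is non-empty.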

Most stacks adopt rigorous performance instrumentation and diagnostic logging, with formal diagnostic rules triggering safety stops or failover in the event of fault (Ochs et al., 2024, Teikmanis et al., 2023, Khaled et al., 18 Jan 2026). This ensures real-world safety and makes them suitable for Level 4 urban and racing deployments spanning >3,000 km of autonomous operation (Ochs et al., 2024).

6. Experimental Performance and Lessons Learned

Quantitative results for competitive stacks:

| Task | Depth Error (%) | Localization RMS (m) | Cross-Track Error (m) | Takeover Rate Reduction (%) | End-to-End Latency (ms) |
|---|---|---|---|---|---|
| Perception (fusion) | 0.85–6.4 | 0.03–0.2 | 0.10–0.33 | 76.9 | 15–70 |
| Planning (PSO/NLP) | N/A | N/A | N/A | N/A | 50–180 |
| Control (PID/MPC) | N/A | N/A | 0.10 | N/A | 1–50 |
| LLM/VLM inference | N/A | N/A | N/A | +52.2 (adaptability) | 900–1980 |
  • Formula Student-AI, Indy Autonomous Challenge, and F1TENTH race teams achieve state-of-the-art by focusing on highly modular, minimal-latency, safety-verified subsystems and rapid iteration. Real-time reporting, code modularity, and system introspection tools are universally recommended for reproducible and safe deployment (Rampuria et al., 2024, Alvarez et al., 2022, Ali et al., 23 Sep 2025, Baumann et al., 2024).
  • Integration of knowledge-driven and learning-based modules facilitates robust handling of rare edge cases and individualized user preferences while maintaining deterministic control guarantees (Cui et al., 2024, Baumann et al., 15 Apr 2025).

Key guidelines arising from empirical experience include early end-to-end instrumentation, mixed-criticality partitioning (e.g., safety-critical controllers physically separated from soft-real-time pipeline), and progressive hardware/software-in-the-loop testing (Kessler et al., 2019, Ochs et al., 2024).

  • Edge Personalization: Wide adoption of vision-language models and LLMs on the vehicle compute edge, with quantization (INT4/INT5), LoRA, and RAG techniques to enable real-time, memory-efficient customization and continual adaptation (Cui et al., 2024, Baumann et al., 15 Apr 2025).
  • Safety-By-Design and Verification: Formal verification of tree-based predictors (e.g., GRIT/SMT invariants), scenario-based ODD parameterization, and dual high/low-fidelity simulation loops characterize modern validation methods (2208.00096, Ochs et al., 2024).
  • Open, Modular Toolchains: Standardization in ROS 2 message contracts, hardware abstraction, and modularization accelerates cross-platform support, facilitating adaptation from high-speed racing to urban shuttles and scaled platforms (Ochs et al., 2024, Baumann et al., 2024).
  • Human-in-the-Loop and Edge Reasoning: RAG-based interaction histories and prompt-driven dynamic control allow true human–machine co-adaptation and explainability, reducing intervention rates and enabling new interaction paradigms (Cui et al., 2024, Baumann et al., 15 Apr 2025).

A plausible implication is that future on-board stacks will increasingly integrate formally verified learning-based reasoning modules, dynamic reconfiguration, and modular hardware/software to maximize both safety and adaptability across a growing diversity of vehicles and operational domains.
