Jetson Xavier NX Board

Updated 23 October 2025
  • Jetson Xavier NX board is a compact, power-efficient embedded module featuring a 384-core Volta GPU, 48 Tensor Cores, and a 6-core Carmel CPU for edge AI applications.
  • The board is rated at up to 21 TOPS (INT8). TensorRT optimizations such as quantization and layer fusion enable real-time inference of deep learning models on it.
  • Its configurable power modes (10W/15W) and robust software ecosystem (JetPack, CUDA, cuDNN, TensorRT) support versatile deployments in robotics, agriculture, and automotive domains.

The Jetson Xavier NX board is a compact, power-efficient embedded computing module from NVIDIA, designed for high-performance AI inferencing and moderate workloads in robotics, UAVs, digital agriculture, and edge computing applications. Integrating a 384-core Volta GPU with 48 Tensor Cores and a 6-core Carmel ARM CPU, the Xavier NX provides up to 21 TOPS (INT8), 8 GB LPDDR4x memory, and a configurable power envelope (10/15 W), enabling deployment of advanced deep learning models on constrained mobile and industrial platforms.

1. Hardware Architecture and System Capabilities

Jetson Xavier NX leverages a heterogeneous compute architecture, combining:

  • 384-core NVIDIA Volta GPU with 48 Tensor Cores (high-throughput parallel processing)
  • 6-core NVIDIA Carmel ARMv8.2 64-bit CPU (1.9 GHz peak)
  • 8 GB LPDDR4x memory (51.2 GB/s bandwidth)
  • I/O: high-speed interfaces (PCIe, USB 3.1, SD card slot, CSI/DSI camera and display serial interfaces)
  • Configurable operation modes (10 W and 15 W) for balancing compute and power

This architecture enables real-time processing of dense sensor data and execution of resource-intensive DNNs directly on edge devices. The Xavier NX supports NVIDIA’s JetPack stack, CUDA, cuDNN, TensorRT, and ONNX, facilitating deployment of advanced, quantized deep learning inference and computer vision pipelines on the board (Wang et al., 2021, Farooq et al., 2022, Lahmer et al., 2022).
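The peak-compute and memory figures above can be related with the classic roofline model. This is an assumption: the model is not something the cited papers prescribe. Under it, a kernel's attainable throughput is capped either by peak compute or by arithmetic intensity times memory bandwidth. The 21 TOPS (INT8) and 51.2 GB/s values come from the specifications above. The function names are hypothetical, and the model ignores caches, clocks, and power mode.

```python
# Idealized roofline estimate for Xavier NX (illustrative sketch only).
PEAK_OPS_PER_S = 21e12        # INT8 peak quoted for the board (21 TOPS)
MEM_BW_BYTES_PER_S = 51.2e9   # LPDDR4x bandwidth quoted above (51.2 GB/s)


def attainable_ops_per_s(intensity_ops_per_byte: float,
                         peak: float = PEAK_OPS_PER_S,
                         bandwidth: float = MEM_BW_BYTES_PER_S) -> float:
    """Roofline bound: min(peak compute, arithmetic intensity x bandwidth)."""
    return min(peak, intensity_ops_per_byte * bandwidth)


def ridge_point(peak: float = PEAK_OPS_PER_S,
                bandwidth: float = MEM_BW_BYTES_PER_S) -> float:
    """Arithmetic intensity (ops/byte) above which a kernel becomes compute-bound."""
    return peak / bandwidth


if __name__ == "__main__":
    print(f"ridge point: {ridge_point():.0f} ops/byte")
    for ai in (10, 100, 1000):
        print(f"{ai:>5} ops/byte -> {attainable_ops_per_s(ai) / 1e12:.2f} TOPS")
```

Under this idealized model, a layer needs roughly 410 INT8 operations per byte of memory traffic to reach peak. This is one reason that techniques reducing memory traffic, such as fusion and lower-precision storage, matter on this board.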

2. Deep Learning Inference: Optimization and Workloads

The Xavier NX can execute various DNN inference workloads—object detection, segmentation, and depth estimation—at edge-appropriate throughput and efficiency:

  • Low-latency Object Detection: Models such as YOLOv3 and YOLOv5 are optimized via TensorRT using two techniques. Layer fusion merges convolution, batch normalization, and activation into a single operation, and quantization runs the model at FP16 or INT8. For example, consider a YOLOv3 detector trained on a multi-dataset corpus and compiled with TensorRT. Layer fusion and quantization together reduce its latency by up to 11× relative to the unoptimized baseline, reaching ≈12 ms per inference (≈83 FPS) on Jetson Xavier. When adapted to the NX, similar methodologies yield high-throughput object detection suitable for real-time UAV surveillance (Vandersteegen et al., 2019). For a YOLOv5-small variant, TensorRT optimization achieves ≈60 FPS at INT8 on the NX board (Farooq et al., 2022).
  • Semantic Segmentation and Depth Estimation: Lightweight encoder–decoder architectures such as GuideDepth use Guided Upsampling Blocks (GUBs) that take the RGB input as guidance. This preserves high-frequency detail while sustaining very high throughput (up to 144.5 FPS on 240×320 images) (Rudolph et al., 2022).
  • Multi-tasking Networks: Multi-output architectures (e.g., joint object detection, drivable-area segmentation, and lane segmentation) can be deployed at the edge through ONNX export, lightweight backbones (MobileNetV2), and reduced input resolution. These reach 10–22 FPS at 384×640 or lower (Miraliev et al., 2022).
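The convolution + batch-normalization part of layer fusion can be illustrated by folding BN's per-channel affine transform into the preceding layer's weights and bias. The sketch below is a minimal pure-Python illustration of that arithmetic, not TensorRT's internal implementation. It assumes each output channel's kernel is flattened to one weight vector, with a dot product standing in for the convolution at a single spatial position. The names `fold_batchnorm`, `linear`, and `batchnorm` are hypothetical.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into conv.

    weights: list of per-output-channel weight vectors (flattened kernels)
    bias, gamma, beta, mean, var: per-output-channel lists
    Returns (fused_weights, fused_bias) so that conv' == BN(conv).
    """
    fused_w, fused_b = [], []
    for c, w in enumerate(weights):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        fused_w.append([wi * scale for wi in w])
        fused_b.append((bias[c] - mean[c]) * scale + beta[c])
    return fused_w, fused_b


def linear(weights, bias, x):
    """Stand-in for one spatial position of a convolution: per-channel dot product."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in zip(weights, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization applied per channel."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


if __name__ == "__main__":
    W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
    b = [0.1, -0.2]
    gamma, beta = [1.2, 0.8], [0.05, -0.1]
    mean, var = [0.3, -0.4], [0.9, 1.1]
    x = [1.0, 2.0, -0.5]

    reference = batchnorm(linear(W, b, x), gamma, beta, mean, var)
    fW, fb = fold_batchnorm(W, b, gamma, beta, mean, var)
    fused = linear(fW, fb, x)
    print(reference, fused)   # equal up to floating-point rounding
```

After folding, the BN layer disappears at inference time. An element-wise activation such as ReLU can then be applied in the same kernel, which is the kind of convolution/BN/activation merging described above.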

These optimizations rely on three things:

  • reducing memory traffic
  • exploiting the memory shared by the CPU and GPU
  • TensorRT's layer fusion and calibrated (post-training) INT8 quantization

INT8 calibration chooses quantization ranges by minimizing the Kullback–Leibler divergence:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$$

where P is the full-precision activation distribution and Q is the INT8-quantized distribution (Vandersteegen et al., 2019).
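The threshold search behind this calibration can be sketched in pure Python, in the spirit of TensorRT's entropy calibrator:

  • Histogram the absolute activations collected on calibration data.
  • For each candidate clipping point, compare the clipped full-precision distribution P with its quantized counterpart Q.
  • Keep the point with the smallest D_KL.

This is a simplified sketch, not TensorRT's implementation. The histogram layout, the 128 quantization levels, the epsilon guard, and all function names (`kl_divergence`, `entropy_threshold`) are assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)); eps guards empty Q bins."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def _normalize(h):
    total = float(sum(h))
    return [v / total for v in h]


def entropy_threshold(hist, bin_width, num_levels=128):
    """Choose a clipping threshold for symmetric INT8 from an |activation| histogram.

    hist: counts of absolute activation values in equal-width bins starting at 0.
    For each candidate cut-off i (in bins), P is hist clipped to i bins with the
    outliers folded into the last bin. Q is hist[:i] merged into num_levels
    groups and expanded back (empty bins stay empty). The cut-off minimizing
    D_KL(P || Q) is returned in activation units.
    """
    best_i, best_kl = None, math.inf
    for i in range(num_levels, len(hist) + 1):
        p = list(hist[:i])
        p[-1] += sum(hist[i:])                  # clipping: outliers -> last bin
        q = [0.0] * i
        for j in range(num_levels):
            start, stop = j * i // num_levels, (j + 1) * i // num_levels
            nonzero = [k for k in range(start, stop) if hist[k] > 0]
            if nonzero:                         # spread group mass over non-empty bins
                avg = sum(hist[start:stop]) / len(nonzero)
                for k in nonzero:
                    q[k] = avg
        if sum(q) == 0:
            continue                            # nothing to compare at this cut-off
        kl = kl_divergence(_normalize(p), _normalize(q))
        if kl < best_kl:
            best_i, best_kl = i, kl
    if best_i is None:
        raise ValueError("histogram has no mass in the candidate range")
    return best_i * bin_width


if __name__ == "__main__":
    # Hypothetical histogram: dense near zero with a thin tail of large values.
    hist = [1000 // (k + 1) for k in range(512)]
    print(entropy_threshold(hist, bin_width=0.01))
```

The selected threshold T would then define the symmetric INT8 scale T/127. Clipping more aggressively trades saturation error on rare outliers for finer resolution on the bulk of the distribution.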

3. Robotics, SLAM, and Visual-Inertial Workloads

Jetson Xavier NX is deployed in mobile robotics—most notably UAVs and ground robots—for high-fidelity SLAM, Visual Odometry (VO), and sensor fusion:

  • Visual(-Inertial) Odometry: The NX has sufficient CPU and GPU resources to run stereo and visual-inertial VO/VIO algorithms (VINS-Mono/Fusion, Kimera, ORB-SLAM2 stereo, Stereo-MSCKF). On the KAIST VIO dataset, stereo VIO on the NX yields significantly lower Absolute Trajectory Error (ATE) RMSE than monocular VIO. The gap is largest on challenging trajectories (circle, fast rotation, pure head rotation). On the resource side, feature-rich stereo pipelines (ORB-SLAM2 can use 1200 features per frame) must be balanced against the board's moderate CPU performance. Stereo pipelines, especially GPU-accelerated variants, also have the largest memory footprint (Jeon et al., 2021).
  • GPU-Accelerated SLAM: Newer SLAM pipelines such as FeatSense split computation between the CPU and the CUDA-accelerated GPU. The CPU handles feature extraction and scan matching; the GPU handles TSDF voxel tube fusion and map generation. The TSDF backend offloads voxel updates to the GPU, using CUDA atomicCAS for thread-safe updates, and achieves a speedup of more than 100× over FPGA-based solutions. On the Xavier NX, full LiDAR scan registration and map updates for a 128-beam Ouster OS1-128 at 10 Hz are sustained at under 26 ms per frame (Gaal et al., 2023).
  • GPU-Accelerated Tracking: In visual SLAM (e.g., FastTrack, built on ORB-SLAM3), the kernels for stereo feature matching and local-map search-by-projection are offloaded to CUDA. This improves throughput by up to 2.8× without affecting map accuracy. Tracking also keeps up with every frame more consistently, owing to lower per-frame timing variance and optimized memory reuse across kernels (Khabiri et al., 13 Sep 2025).
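ATE RMSE is the accuracy metric used above to compare VIO pipelines. Once the estimated and ground-truth trajectories are time-associated and aligned, it reduces to the root mean square of per-pose position errors. The sketch below assumes that association and alignment have already been done; evaluation tools typically perform the alignment with a rigid-body or similarity fit, which is omitted here. It is a minimal illustration, and `ate_rmse` is a hypothetical name.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Absolute Trajectory Error: RMSE of position differences.

    Both inputs are equal-length lists of (x, y, z) positions that are already
    time-associated and expressed in a common, aligned frame.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    sq_err = 0.0
    for (ex, ey, ez), (gx, gy, gz) in zip(estimated, ground_truth):
        sq_err += (ex - gx) ** 2 + (ey - gy) ** 2 + (ez - gz) ** 2
    return math.sqrt(sq_err / len(estimated))


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    est = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.4)]
    print(f"ATE RMSE: {ate_rmse(est, gt):.4f}")
```

Because the error is squared before averaging, a few large drift events (such as those on fast-rotation sequences) raise ATE RMSE much more than many small errors do.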

4. Energy/Performance Trade-offs and Empirical Models

Systematic studies profile the NX’s energy consumption and performance characteristics, providing practical models for deployment:

  • Empirical Layer Energy Models: Energy per network layer (e.g., convolution) is measured as a function of the computational load count (CLC), typically exhibiting a power-law dependence:

$$\log \hat{E} = \beta \log(\mathrm{CLC}) + \log \alpha \quad\Longleftrightarrow\quad \hat{E} = \alpha\,\mathrm{CLC}^{\beta}$$

Larger kernel sizes, increased output feature maps, and reduced strides all drive up energy, while the NX board outperforms earlier Jetson boards in normalized efficiency (Lahmer et al., 2022).

  • Training on Edge: Although most deployments focus on inference, the NX can also train DNNs (e.g., for federated learning). Three factors chiefly determine resource use: pipeline parallelism in the data path (e.g., the number of PyTorch DataLoader workers), batch size, and the I/O storage medium. Because the 8 GB memory is shared between CPU and GPU and compute is moderate, little data can be cached. Disk I/O and pre-processing optimization therefore become key for larger models (such as MobileNet-v3 on GLD). A typical power envelope is 15 W, with modest variability (≈5%) across epochs and devices. Predictive models estimate cumulative energy per epoch by integrating instantaneous power over time (K. et al., 24 Sep 2025):

$$E_{\mathrm{total}} = \sum_{t_i \in T} p_{t_i}\,(t_i - t_{i-1})$$

where T is the set of sampling timestamps and p_{t_i} is the power measured at time t_i.
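Both empirical models in this section are straightforward to evaluate. The sketch below fits α and β by ordinary least squares in log-log space and integrates sampled power into energy. It is a minimal pure-Python illustration with these assumptions:

  • The synthetic numbers are invented.
  • How CLC is counted per layer follows Lahmer et al. and is not reproduced.
  • Power samples are assumed to come from the board's power monitors (for instance as logged by tegrastats, whose parsing is omitted).
  • All function names are hypothetical.

```python
import math


def fit_power_law(clc, energy):
    """Least-squares fit of log E = beta * log(CLC) + log(alpha).

    Returns (alpha, beta) such that E_hat = alpha * CLC ** beta.
    """
    xs = [math.log(c) for c in clc]
    ys = [math.log(e) for e in energy]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                            # slope in log-log space
    alpha = math.exp(mean_y - beta * mean_x)    # intercept is log(alpha)
    return alpha, beta


def cumulative_energy(timestamps, powers):
    """E_total = sum over i >= 1 of p[i] * (t[i] - t[i-1]), in joules.

    timestamps: non-decreasing sample times in seconds
    powers: instantaneous power in watts at each timestamp
    The first sample only marks the start time t_0.
    """
    if len(timestamps) != len(powers):
        raise ValueError("timestamps and powers must have the same length")
    energy = 0.0
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        energy += powers[i] * dt
    return energy


if __name__ == "__main__":
    # Synthetic layer measurements following E = 2e-9 * CLC ** 0.8 (invented values).
    clc = [1e6, 1e7, 1e8, 1e9]
    energy = [2e-9 * c ** 0.8 for c in clc]
    alpha, beta = fit_power_law(clc, energy)
    print(f"alpha={alpha:.3g}, beta={beta:.3f}")

    # Power samples (W) at irregular times (s).
    print(cumulative_energy([0, 1, 2, 4], [15, 15, 14, 16]))   # 61.0
```

As a sanity check on the second model, a steady 15 W sustained for one hour integrates to 15 × 3600 = 54,000 J.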

5. Application Domains: Edge Intelligence in Agriculture, Automotive, and UAV

The Xavier NX enables several real-world application fields:

  • Digital Agriculture: In the ORPHEUS living lab, NX boards process visual and sensor data at the edge (e.g., crop/livestock analytics, anomaly detection) rather than relying on central servers. Their AI throughput (up to 21 TOPS, 8 GB RAM) offsets connectivity limitations in rural deployments. On-board processing also reduces the volume of data that must cross the network. Network stability in the deployment was characterized with Packet Delivery Ratio (PDR), Packet Error Ratio (PER), and Packet Miss Ratio (PMR) measurements (Wang et al., 2021).
  • Automotive Assist/ADAS: Deployments of thermal object detectors (an optimized YOLOv5-small) show that TensorRT with FP16/INT8 quantization reaches 60 FPS on the NX for real-time vehicular thermal imaging. They also meet the domain's accuracy requirements (about 70–72% mAP at 60 FPS) on a real-world dataset of more than 35,000 frames (Farooq et al., 2022).
  • Autonomous Navigation/UAVs: The board's compact, low-mass, energy-efficient design enables real-time deployment on UAVs and small robots. These run optimized object detectors, semantic segmentation, and SLAM stacks at operational frame rates and accuracy (Palmas et al., 2022). In open-field agricultural robotics, RetinaNet (ResNet-50) detectors optimized with TF-TRT at FP16 are projected to reach ≈10–15 FPS on the NX. That approaches real-time operation, with F1 ≈70% and mAP ≈60% (Magalhães et al., 2022).

6. Design, Programming, and Model Deployment Considerations

  • Software Ecosystem: The NX supports JetPack (CUDA, cuDNN, TensorRT, ONNX, DeepStream), enabling deployment via Docker or natively (with standard toolchains). ONNX conversion enables portable and efficient model inference—critical for multitask pipelines (Miraliev et al., 2022).
  • Memory/Power Optimization: Two measures are needed both to stay within device resource limits and to achieve real-time throughput: model architecture selection (e.g., preferring lightweight MobileNetV2 over ResNet50 where possible) and explicit quantization and layer fusion. In segmentation and detection, model size (parameter count and bytes), FLOPs, and batch size must be tuned to maintain real-time output at manageable power usage (≈10–15 W).
  • Edge Device Benchmarking: Application-specific benchmarks (e.g., multi-task driving pipelines, UAV VIO/SLAM, video anomaly detection) consistently show that the Xavier NX is substantially more capable than earlier boards (Nano, TX2) in both raw and energy-normalized performance. It is, however, exceeded in raw throughput and energy efficiency by latest-generation devices (e.g., Orin Nano), which come at higher power and price points (Wang et al., 2021, Pham et al., 2023).
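The model-size part of this tuning is simple arithmetic: weight memory scales with parameter count and numeric precision, while activation memory also scales with resolution and batch size. The sketch below is a back-of-the-envelope estimate only. It ignores TensorRT workspace, framework overhead, and the OS, which share the same 8 GB. The parameter counts are approximate published figures, and the function names are hypothetical.

```python
# Bytes per stored element at each precision.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_bytes(num_params, precision):
    """Memory for the weights alone (excludes activations, workspace, runtime)."""
    return num_params * BYTES_PER_ELEMENT[precision]


def activation_bytes(height, width, channels, batch, precision):
    """Memory for one feature map of shape (batch, channels, height, width)."""
    return batch * channels * height * width * BYTES_PER_ELEMENT[precision]


if __name__ == "__main__":
    # Approximate parameter counts: MobileNetV2 ~3.4M, ResNet-50 ~25.6M.
    for name, params in (("MobileNetV2", 3_400_000), ("ResNet-50", 25_600_000)):
        for prec in ("fp32", "fp16", "int8"):
            print(f"{name:12s} {prec}: {weight_bytes(params, prec) / 1e6:6.1f} MB")
    # One 32-channel feature map at 384x640 in FP16: batch 1 vs batch 8.
    for batch in (1, 8):
        print(batch, activation_bytes(384, 640, 32, batch, "fp16") / 1e6, "MB")
```

Under these assumptions, ResNet-50 needs roughly 102 MB of FP32 weights versus about 3.4 MB for an INT8 MobileNetV2. Activation memory grows linearly with batch size.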

7. Limitations and Performance Trade-offs

  • While the Xavier NX delivers significant advantages in computational density and AI inference throughput, constraints arise from its shared 8 GB RAM, moderate CPU core count, and total power budget (10/15 W). For some heavy stereo VIO or deep semantic models, the AGX Xavier (with 32 GB RAM and more CUDA cores) provides lower latency and higher sustained throughput.
  • Larger batch sizes for training or inference may exhaust the available cache/memory or degrade real-time performance. Minimizing CPU/GPU stalls requires tuning data-pipeline parallelism (e.g., four DataLoader workers) and streamlining pre-processing.
  • The accuracy gap between INT8-quantized and FP32/FP16 models is non-negligible for some sensitive tasks (e.g., small-object detection or low-contrast segmentation). The choice of precision should therefore be evaluated per workload.
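Small or low-contrast signals are the most exposed, and symmetric per-tensor INT8 quantization shows why. Each value maps to an integer code in [-127, 127] with one shared scale. The sketch below assumes that scale is derived from a clipping threshold (for example, one chosen by calibration). Per-channel schemes reduce this error but are not shown. The function names are hypothetical.

```python
def quantize_int8(values, threshold):
    """Symmetric per-tensor INT8 quantization with saturation at +/-threshold.

    Returns (codes, scale) where value ~= code * scale and scale = threshold / 127.
    """
    scale = threshold / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    threshold = 4.0                      # assumed clipping range, e.g., from calibration
    values = [0.01, 0.03, 1.0, 3.9, 10.0]
    codes, scale = quantize_int8(values, threshold)
    for v, r in zip(values, dequantize(codes, scale)):
        print(f"{v:6.2f} -> {r:8.4f}  abs err {abs(v - r):.4f}")
```

Inputs smaller than half a quantization step (here about 0.016) collapse to zero, and inputs beyond the threshold saturate. This is why fine, low-amplitude detail can lose more accuracy than an aggregate metric suggests.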

Summary Table: Representative Xavier NX Workloads

| Application Domain | Model/Stack | Throughput (optimized) | Accuracy / Outputs |
|---|---|---|---|
| UAV Object Detection | YOLOv3 / TensorRT | ≈83 FPS† | mAP ↑ (3.5–50% gains) |
| Thermal Object Detection | YOLOv5s / TensorRT | ≈60 FPS | mAP ≈70–72% |
| Monocular Depth Estimation | GuideDepth | 99.6–144.5 FPS | RMSE 0.501 (NYU) |
| Multi-Task Self-Driving | MobileNetV2 + ONNX | 10–22 FPS | Outputs: detection, drivable area, lane segmentation |
| Visual-Inertial Odometry (Stereo) | VINS-Fusion, Kimera | Real-time | Lower ATE RMSE than monocular |
| GPU-Accelerated SLAM (LiDAR) | FeatSense (TSDF) | 10 Hz scan rate | Trajectory RMSE 0.19–0.28 |
| Edge Agricultural Video Detection | RetinaNet / TF-TRT | ≈10–15 FPS* | F1 ≈0.7, mAP ≈0.6 |

*Estimated, based on extrapolations in the source paper.
†Reported on Jetson Xavier (see Section 2).

In conclusion, the Jetson Xavier NX board is a versatile and efficient platform for edge AI across a spectrum of robotics, perception, and IoT workloads. It supports deployment of quantized and fused models, efficient parallelism via CUDA, and systematic energy/performance profiling. This makes it well suited to scenarios where performance, size, weight, and power must be carefully balanced. Empirical energy and runtime models developed for the board, together with a broad ecosystem of supported deep learning frameworks, enable both principled design and practical deployment of sophisticated edge intelligence systems.
