
Resolution-Invariant NIR-to-RGB Colorization

Updated 10 January 2026
  • The paper introduces scalable deep learning models that use SPADE and multi-scale feature fusion for accurate NIR-to-RGB translation across varying resolutions.
  • Key methodologies include multi-branch encoders, patch-based overlapping inference, and combined loss functions that ensure color, texture, and perceptual consistency.
  • Practical insights highlight robust integration in lidar and multi-modality imaging, enabling reliable feature extraction and stable performance despite sensor resolution variations.

Resolution-invariant NIR-to-RGB colorization refers to a set of deep learning strategies and network architectures designed to translate near-infrared (NIR) imagery into high-fidelity, naturalistic RGB renderings while maintaining stable performance across diverse image resolutions. This objective is motivated by practical challenges in multi-modality imaging—including lidar and NIR camera systems—where input resolutions may vary due to sensor design, application domain, or downstream computational constraints. The key principle is to ensure that both the chromatic accuracy and fine structural details of synthesized RGB outputs are robust to the spatial scale of the NIR input, while also enabling efficient adaptation to high-resolution imagery without loss of texture or color consistency (Attri et al., 3 Jan 2026, Zhai et al., 2024, Ha et al., 4 May 2025).

1. Core Approaches and Model Architectures

Resolution-invariant NIR-to-RGB colorization models are characterized by architectural features and training paradigms that break fixed-size dependencies and enable scale-agnostic processing. Notable recent advances include:

  • Multi-branch Encoders with Local-Global Feature Interactions: The HAQAGen model employs a Mamba-based encoder–decoder core with two decoding branches: one predicts an HSV prior field, the other reconstructs the RGB image, injecting local hue-saturation cues at every decoder stage via SPADE conditioning. This enables localized chromatic guidance even at large resolutions or spatially varying image content (Attri et al., 3 Jan 2026).
  • Multi-scale Feature Embedding and Fusion: MCFNet formalizes a multi-scale design, explicitly extracting and injecting HSV color feature maps and high-frequency texture signals (via Laplacian operations) at the corresponding U-Net decoder levels. SPADE-based modulation (illustrated in the sketch after this list) ensures that color priors are adaptively utilized at each resolution, decoupling the input scale from both texture and color propagation (Zhai et al., 2024).
  • Resolution-adaptive Inference Engines: HAQAGen’s adaptive-resolution inference leverages a sliding-window approach with Hanning-feathered overlap blending, permitting inference on arbitrarily large images by applying the model locally to overlapping patches and aggregating the results into a seam-free output that preserves structure and texture (Attri et al., 3 Jan 2026).
  • Super-resolution Integration: In lidar-focused domains, architectures may first super-resolve NIR or reflectivity images (e.g., 2× upsampling by CARN with pixel-shuffle) before or after colorization. This normalizes resolution and improves keypoint extraction for subsequent tasks such as odometry (Ha et al., 4 May 2025).
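
The SPADE conditioning referenced above can be made concrete with a short sketch. The PyTorch block below is an illustrative, assumed implementation of spatially adaptive normalization driven by an HSV/color prior map; the layer widths, the instance-normalization choice, and the class name are assumptions, not the exact configuration of HAQAGen or MCFNet. Because it consists only of convolutions and interpolation, it behaves identically at any input resolution, which is the property the resolution-invariant designs rely on.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """Spatially adaptive normalization conditioned on a color-prior map."""
    def __init__(self, feat_channels, prior_channels=3, hidden=64):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(prior_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.gamma = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)

    def forward(self, feat, prior):
        # Resize the color prior (e.g., a predicted HSV field) to the decoder
        # feature's spatial size, then predict per-pixel scale and shift.
        prior = F.interpolate(prior, size=feat.shape[-2:], mode="nearest")
        h = self.shared(prior)
        return self.norm(feat) * (1.0 + self.gamma(h)) + self.beta(h)

# Example: modulate a 128-channel decoder feature with an HSV prior of any size.
# feat = torch.randn(1, 128, 64, 64); hsv_prior = torch.rand(1, 3, 512, 512)
# out = SPADE(128)(feat, hsv_prior)   # -> (1, 128, 64, 64)
```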

2. Loss Functions and Training Objectives

Resolution-invariant NIR-to-RGB colorization is typically formulated as a multi-component optimization with losses designed to encourage both pixelwise accuracy and perceptual/structural consistency:

  • Adversarial Losses: Most models utilize PatchGAN or Hinge GAN objectives for RGB and/or HSV outputs. The discriminator operates at local scales, which is inherently compatible with variable-resolution inputs (Attri et al., 3 Jan 2026, Zhai et al., 2024).
  • Reconstruction and Perceptual Losses: These comprise ℓ1 or MSE losses on both RGB and HSV outputs for direct color regression; VGG-based perceptual losses on feature activations to maintain high-level texture/semantics; and sometimes additional cosine similarity or feature-based objectives (Attri et al., 3 Jan 2026, Zhai et al., 2024, Ha et al., 4 May 2025).
  • Texture and Edge-Preserving Terms: Texture-aware components may leverage frozen autoencoders or direct Laplacian (edge) losses to align high-frequency content. MCFNet includes explicit edge loss components from image gradients (Zhai et al., 2024).
  • Global Color Statistics: HAQAGen introduces a differentiable histogram (CDF) matching term to globally regularize output color distributions, helping to prevent spatially coherent but globally unrealistic chromaticity (Attri et al., 3 Jan 2026); this term, together with the pixelwise and edge losses, is illustrated in the sketch after this list.
  • Scale Consistency (Optional): Some frameworks suggest introducing scale-consistency terms, e.g., enforcing that outputs downsampled to coarser scales match independent colorizations of those scales. Proposed but not implemented in MCFNet, such a loss would further regularize prediction consistency across resolutions (Zhai et al., 2024).
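
As a concrete illustration of how these terms combine, the sketch below (PyTorch) assembles an ℓ1 color term, a Laplacian edge term, a differentiable soft-histogram CDF-matching term, and an optional scale-consistency term. The weights, bin count, and function names are assumptions for illustration; adversarial and VGG perceptual terms are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def laplacian(img):
    """Depthwise 3x3 Laplacian: a cheap per-channel high-frequency / edge map."""
    k = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                     device=img.device, dtype=img.dtype).view(1, 1, 3, 3)
    k = k.repeat(img.shape[1], 1, 1, 1)
    return F.conv2d(img, k, padding=1, groups=img.shape[1])

def soft_cdf(img, bins=64):
    """Differentiable per-channel histogram CDF via soft (triangular) binning."""
    b, c, h, w = img.shape
    x = img.reshape(b, c, -1, 1)                                    # (B, C, N, 1)
    centers = torch.linspace(0., 1., bins, device=img.device).view(1, 1, 1, bins)
    width = 1.0 / (bins - 1)
    weights = torch.clamp(1.0 - (x - centers).abs() / width, min=0.0)
    hist = weights.sum(dim=2) / (h * w)                             # (B, C, bins)
    return torch.cumsum(hist, dim=-1)

def colorization_loss(pred_rgb, gt_rgb, w_pix=1.0, w_edge=0.5, w_hist=0.1):
    """Pixelwise L1 + Laplacian edge alignment + global CDF matching."""
    pix = F.l1_loss(pred_rgb, gt_rgb)
    edge = F.l1_loss(laplacian(pred_rgb), laplacian(gt_rgb))
    hist = F.l1_loss(soft_cdf(pred_rgb), soft_cdf(gt_rgb))
    return w_pix * pix + w_edge * edge + w_hist * hist

def scale_consistency(model, nir, pred_rgb, factor=0.5):
    """Optional term: the downsampled prediction should agree with a fresh
    colorization of the downsampled NIR input (proposed, not implemented,
    in MCFNet)."""
    small_nir = F.interpolate(nir, scale_factor=factor, mode="bilinear",
                              align_corners=False)
    small_pred = F.interpolate(pred_rgb, scale_factor=factor, mode="bilinear",
                               align_corners=False)
    return F.l1_loss(small_pred, model(small_nir))
```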

3. Adaptive-Resolution Strategies and Inference

Rather than resizing inputs to a canonical spatial size, state-of-the-art resolution-invariant models process images at their native or arbitrary resolution. Central approaches include:

  • Patching with Overlap and Blending: HAQAGen divides high-resolution NIR images into overlapping patches (e.g., 256×256 with ~30 px overlap), processes each patch independently, then aggregates the results with a Hanning-blended mask to avoid seams and maintain texture continuity across patch borders (Attri et al., 3 Jan 2026); see the sliding-window sketch after this list.
  • Fully Convolutional and SPADE-Conditioned Networks: The absence of fully-connected layers and fixed positional embeddings, together with reliance on spatially adaptive normalization, allows the core architectures to generalize across input sizes. All convolutions and SPADE modulations operate locally, preserving network behavior irrespective of spatial extent (Attri et al., 3 Jan 2026, Zhai et al., 2024).
  • Super-Resolution as a Preprocessing Step: In lidar imagery, all input images are upsampled to at least double their native resolution using networks such as CARN. Colorization is then performed at this common high-resolution scale, ensuring that subsequent keypoint and odometry operations are resolution-independent (Ha et al., 4 May 2025); a simplified pixel-shuffle upsampler is also sketched below.
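
The sliding-window strategy can be sketched as below. The patch size, overlap, and the Hanning-window blending are illustrative values rather than the exact HAQAGen settings; the only assumption about `model` is that it maps a single-channel NIR patch to a same-sized 3-channel RGB patch.

```python
import torch

def hann2d(h, w, device):
    """2-D Hanning weight mask; clamped so border pixels never get zero weight."""
    wy = torch.hann_window(h, periodic=False, device=device)
    wx = torch.hann_window(w, periodic=False, device=device)
    return torch.clamp(wy[:, None] * wx[None, :], min=1e-3)

@torch.no_grad()
def tiled_colorize(model, nir, patch=256, overlap=32):
    """nir: (1, 1, H, W) tensor; returns a seam-free (1, 3, H, W) RGB image."""
    _, _, H, W = nir.shape
    stride = patch - overlap
    out = torch.zeros(1, 3, H, W, device=nir.device)
    weight = torch.zeros(1, 1, H, W, device=nir.device)
    ys = list(range(0, max(H - patch, 0) + 1, stride))
    xs = list(range(0, max(W - patch, 0) + 1, stride))
    # Ensure the final row/column of patches reaches the image border.
    if ys[-1] + patch < H:
        ys.append(H - patch)
    if xs[-1] + patch < W:
        xs.append(W - patch)
    for y in ys:
        for x in xs:
            tile = nir[:, :, y:y + patch, x:x + patch]
            pred = model(tile)                                   # (1, 3, h, w)
            win = hann2d(tile.shape[-2], tile.shape[-1], nir.device)
            out[:, :, y:y + patch, x:x + patch] += pred * win
            weight[:, :, y:y + patch, x:x + patch] += win
    return out / weight
```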
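
For the super-resolution preprocessing route, the cited lidar pipeline uses CARN; the module below is not CARN but a deliberately simplified sketch that only illustrates the 2× pixel-shuffle upsampling mechanism mentioned above.

```python
import torch.nn as nn

class PixelShuffleUpsampler(nn.Module):
    """Toy 2x upsampler: convolutions expand channels, PixelShuffle rearranges
    them into a higher-resolution grid."""
    def __init__(self, in_channels=1, feats=32, scale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, feats, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feats, in_channels * scale * scale, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x):          # x: (B, 1, H, W) NIR / reflectivity image
        return self.body(x)        # -> (B, 1, 2H, 2W) for scale=2
```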

4. Evaluation Metrics and Benchmarks

Resolution-invariant NIR-to-RGB colorization is assessed through a battery of perceptual and distortion-based measures, usually reported at multiple scales and across diverse datasets:

  • Quantitative Metrics:
    • PSNR (Peak Signal-to-Noise Ratio)
    • SSIM (Structural Similarity Index) [Hore & Ziou 2010]
    • Angular Error (AE) in RGB space (lower is better)
    • LPIPS (Learned Perceptual Image Patch Similarity) (Attri et al., 3 Jan 2026, Zhai et al., 2024)
  • Comparative Results:
Method       PSNR↑   SSIM↑   AE↓    LPIPS↓
ColorMamba   24.56   0.71    2.81   0.212
HAQAGen      24.96   0.71    2.96   0.18
MCFNet       20.34   0.61    3.79   0.208

On VCIP2020, HAQAGen attains higher PSNR and lower LPIPS than the alternatives, and its sliding-window inference preserves texture in cross-dataset tests (Attri et al., 3 Jan 2026). MCFNet demonstrates strong performance at 256×256, with a hypothesized minor metric drop (<0.5 dB PSNR) under explicit resolution variation (Zhai et al., 2024). Lidar-domain colorization keeps PSNR, SSIM, and ΔE stable once inputs are super-resolved by at least 2× (Ha et al., 4 May 2025).
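
For reference, the PSNR and RGB angular-error computations can be written as in the sketch below, assuming predictions and ground truth are tensors scaled to [0, 1]; SSIM and LPIPS are usually taken from existing implementations and are omitted here.

```python
import torch

def psnr(pred, gt, eps=1e-12):
    """Peak signal-to-noise ratio in dB for images in [0, 1]."""
    mse = torch.mean((pred - gt) ** 2)
    return 10.0 * torch.log10(1.0 / (mse + eps))

def angular_error_deg(pred, gt, eps=1e-8):
    """Mean angle (degrees) between predicted and reference RGB vectors,
    computed per pixel over (B, 3, H, W) tensors."""
    dot = (pred * gt).sum(dim=1)
    norms = pred.norm(dim=1) * gt.norm(dim=1)
    cos = torch.clamp(dot / (norms + eps), -1.0, 1.0)
    return torch.rad2deg(torch.acos(cos)).mean()
```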

5. Practical Considerations and Limitations

  • Paired Data Requirement: All leading models require paired NIR–RGB data for supervised training. This remains a bottleneck for deployment in domains where paired sensor data are scarce (Attri et al., 3 Jan 2026, Zhai et al., 2024).
  • Patch-based Inference Overhead: Sliding-window strategies incur increased computation compared to single forward passes, though they are necessary to preserve texture at large resolutions (Attri et al., 3 Jan 2026).
  • Resolution Invariance Mechanisms: While HAQAGen and MCFNet are natively resolution-agnostic thanks to their fully convolutional design and SPADE-based conditioning, direct multi-scale supervision (e.g., with explicit scale-consistency losses) could further reinforce scale robustness (Zhai et al., 2024). In lidar frameworks, practical invariance emerges from upsampling and standardizing the resolution before colorization (Ha et al., 4 May 2025).
  • Downstream Generalization: Empirically, invariance to input resolution correlates with stable downstream performance in feature extraction (e.g., ALIKE, SuperPoint) and odometry, provided minimal upscaling thresholds are met (Ha et al., 4 May 2025).

6. Contributions, Extensions, and Future Directions

Recent literature delineates several key innovations and potential areas for extension:

  • Unified Losses for Color, Texture, and Perceptual Quality: Unified objectives combining differentiable histogram matching, perceptual, texture-aware, and direct pixelwise losses establish a balance between global chromatic fidelity and high-frequency structure (Attri et al., 3 Jan 2026).
  • SPADE-based Local Chromatic Conditioning: Both HAQAGen and MCFNet exploit SPADE to inject local hue and saturation priors, stabilizing reconstructions especially under spatially ambiguous or feature-poor NIR regions (Attri et al., 3 Jan 2026, Zhai et al., 2024).
  • Multi-scale Fusion and Feature Injection: MCFNet’s integration of color features at multiple scales supports rich detail recovery and transferability to variable test resolutions (Zhai et al., 2024).
  • Open Issues and Future Work: Extensions discussed include (i) data-efficient or unpaired training paradigms, (ii) optimized, real-time inference engines via model distillation or progressive growing, and (iii) end-to-end training jointly with downstream perception modules such as segmentation or tracking (Attri et al., 3 Jan 2026, Zhai et al., 2024).

A plausible implication is that, as research advances, explicit multi-scale consistency supervision and emergent, self-supervised colorization will further enhance both the resolution-invariance and the transferability of NIR-to-RGB translation networks to diverse, real-world imaging contexts.
