HVI: A New Color Space for Low-light Image Enhancement
(2502.20272v2)
Published 27 Feb 2025 in cs.CV, cs.AI, and cs.LG
Abstract: Low-Light Image Enhancement (LLIE) is a crucial computer vision task that aims to restore detailed visual information from corrupted low-light images. Many existing LLIE methods are based on the standard RGB (sRGB) color space, which often produces color bias and brightness artifacts due to its inherent high color sensitivity. While converting images to the Hue, Saturation and Value (HSV) color space helps resolve the brightness issue, it introduces significant red and black noise artifacts. To address this issue, we propose a new color space for LLIE, namely Horizontal/Vertical-Intensity (HVI), defined by polarized HS maps and learnable intensity. The former enforces small distances for red coordinates to remove the red artifacts, while the latter compresses the low-light regions to remove the black artifacts. To fully leverage the chromatic and intensity information, a novel Color and Intensity Decoupling Network (CIDNet) is further introduced to learn an accurate photometric mapping function under different lighting conditions in the HVI space. Comprehensive benchmark and ablation experiments show that the proposed HVI color space with CIDNet outperforms state-of-the-art methods on 10 datasets. The code is available at https://github.com/Fediory/HVI-CIDNet.
Summary
The paper introduces the HVI color space and CIDNet, a dual-branch network leveraging HVI for robust low-light image enhancement by decoupling chromaticity and intensity.
The HVI color space minimizes noise by reducing Euclidean distances between similar colors using polarized HS maps and a trainable intensity collapse function.
Experiments show that the CIDNet method using HVI outperforms state-of-the-art methods across multiple datasets, demonstrating significant improvements in metrics like PSNR and SSIM.
The paper introduces the Horizontal/Vertical-Intensity (HVI) color space designed to address issues in Low-Light Image Enhancement (LLIE) related to color bias and artifacts when using standard RGB (sRGB) or Hue, Saturation, and Value (HSV) color spaces. HVI employs polarized HS maps and learnable intensity to minimize color space noise by reducing Euclidean distances between similar colors. A Color and Intensity Decoupling Network (CIDNet) is introduced to leverage the chromatic and intensity information in the HVI space for photometric mapping under different lighting conditions.
The paper's contributions include:
The HVI color space defined by polarized HS and trainable intensity for eliminating color space noise.
The CIDNet network for modeling intensity and chromaticity of low-light images in the HVI space.
Experimental results demonstrating improved performance over state-of-the-art methods across 10 datasets.
The HVI color space aims to minimize color space noise by reducing Euclidean distances between similar colors. For black-plane noise, the paper introduces a trainable darkness-density parameter $k$ and an adaptive intensity collapse function $C_k$:
$$C_k(x) = \sqrt[k]{\sin\left(\frac{\pi I_{\max}(x)}{2}\right) + \varepsilon}$$
where:
$k \in \mathbb{Q}^+$ is a trainable parameter controlling the density of dark color points
$\varepsilon = 1 \times 10^{-8}$ avoids gradient explosion at zero intensity
$I_{\max}(x)$ is the intensity map of the image (the per-pixel maximum over the RGB channels)
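As a quick numeric illustration of the collapse behavior (a minimal sketch of the reconstructed formula above; the sampled intensities and k values are arbitrary examples, not values from the paper):

```python
import math

def collapse(i_max: float, k: float = 1.0, eps: float = 1e-8) -> float:
    """Adaptive intensity collapse: C_k(x) = (sin(pi * I_max / 2) + eps)^(1/k)."""
    return (math.sin(math.pi * i_max / 2.0) + eps) ** (1.0 / k)

# Dark intensities collapse toward zero, pulling dark colors of every hue
# close to the origin of the HV plane; a larger k weakens this collapse.
for i_max in (0.01, 0.1, 0.5, 1.0):
    print(f"I_max={i_max:.2f}  k=1: {collapse(i_max, 1.0):.4f}  k=4: {collapse(i_max, 4.0):.4f}")
```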
The Horizontal ($\hat{H}$) and Vertical ($\hat{V}$) maps are then formalized as:
$$\hat{H} = C_k \odot S \odot h, \qquad \hat{V} = C_k \odot S \odot v$$
where:
$h = \cos\left(\frac{\pi H}{3}\right)$ and $v = \sin\left(\frac{\pi H}{3}\right)$ are the polarized hue coordinates, with $H \in [0, 6)$ the hue map and $S$ the saturation map
$\odot$ denotes element-wise multiplication
The HVI image is formed by concatenating $\hat{H}$, $\hat{V}$, and $I_{\max}$.
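Putting the pieces together, the forward transform can be sketched in PyTorch as below. This is a minimal sketch under the reconstructed formulation: the function name `rgb_to_hvi` is ours, the hue/saturation computation is the standard RGB-to-HSV one, and `k` is a fixed scalar here even though CIDNet learns it.

```python
import torch

def rgb_to_hvi(rgb: torch.Tensor, k: float = 1.0, eps: float = 1e-8) -> torch.Tensor:
    """Sketch of the RGB -> HVI transform; rgb is (B, 3, H, W) in [0, 1]."""
    r, g, b = rgb.unbind(dim=1)
    i_max, _ = rgb.max(dim=1)                 # HSV Value: max over R, G, B
    delta = i_max - rgb.min(dim=1).values
    sat = torch.where(i_max > 0, delta / (i_max + eps), torch.zeros_like(i_max))

    # Hue on [0, 6), following the standard RGB -> HSV piecewise definition.
    hue = torch.zeros_like(i_max)
    m = (i_max == r) & (delta > 0)
    hue[m] = ((g - b)[m] / delta[m]) % 6
    m = (i_max == g) & (delta > 0)
    hue[m] = (b - r)[m] / delta[m] + 2
    m = (i_max == b) & (delta > 0)
    hue[m] = (r - g)[m] / delta[m] + 4

    # Polarized hue coordinates and the intensity collapse C_k.
    h_pol = torch.cos(torch.pi * hue / 3)
    v_pol = torch.sin(torch.pi * hue / 3)
    c_k = (torch.sin(torch.pi * i_max / 2) + eps) ** (1.0 / k)

    h_hat = c_k * sat * h_pol
    v_hat = c_k * sat * v_pol
    return torch.stack([h_hat, v_hat, i_max], dim=1)
```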
The Color and Intensity Decoupling Network (CIDNet) models the HV-plane and I-axis information in the HVI space using a dual-branch architecture. It comprises an HVI transformation, a dual-branch enhancement network, and a perceptual-inverse HVI transformation. The dual-branch network, built upon the UNet architecture, consists of an encoder and decoder with Lighten Cross-Attention (LCA) modules.
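The dual-branch idea can be pictured with the skeleton below. It is a heavily simplified, single-scale sketch: the class names, channel widths, and the use of `nn.MultiheadAttention` are our illustrative assumptions, whereas the real CIDNet builds multi-scale UNet branches with its own LCA design.

```python
import torch
import torch.nn as nn

class LCASketch(nn.Module):
    """Illustrative cross-attention: intensity features query chromatic features."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat_i: torch.Tensor, feat_hv: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat_i.shape
        q = feat_i.flatten(2).transpose(1, 2)      # (B, HW, C) queries from I branch
        kv = feat_hv.flatten(2).transpose(1, 2)    # keys/values from HV branch
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).view(b, c, h, w)

class DualBranchSketch(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.enc_i = nn.Conv2d(1, dim, 3, padding=1)     # I-axis branch
        self.enc_hv = nn.Conv2d(2, dim, 3, padding=1)    # HV-plane branch
        self.lca = LCASketch(dim)
        self.head = nn.Conv2d(2 * dim, 3, 3, padding=1)  # fuse back to an HVI map

    def forward(self, hvi: torch.Tensor) -> torch.Tensor:   # hvi: (B, 3, H, W)
        feat_hv = self.enc_hv(hvi[:, :2])
        feat_i = self.enc_i(hvi[:, 2:])
        feat_i = feat_i + self.lca(feat_i, feat_hv)          # branch interaction
        return self.head(torch.cat([feat_i, feat_hv], dim=1))
```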
The loss function guides the enhancement jointly from the sRGB space and the HVI map.
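The summary does not reproduce the exact loss terms, so the sketch below only illustrates the dual-space idea: plain L1 terms in each space combined with a hypothetical weight `lambda_hvi`, reusing the `rgb_to_hvi` sketch above.

```python
import torch
import torch.nn.functional as F

def dual_space_loss(pred_rgb: torch.Tensor, gt_rgb: torch.Tensor,
                    lambda_hvi: float = 1.0) -> torch.Tensor:
    """Hypothetical dual-space supervision: L1 in sRGB plus weighted L1 in HVI."""
    loss_srgb = F.l1_loss(pred_rgb, gt_rgb)
    loss_hvi = F.l1_loss(rgb_to_hvi(pred_rgb), rgb_to_hvi(gt_rgb))
    return loss_srgb + lambda_hvi * loss_hvi
```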
The paper evaluates the HVI color space and key modules in CIDNet through quantitative and qualitative analysis. Experiments were conducted on the LOLv2-Real dataset. Results showed that the HVI color space effectively preserves the decoupling of brightness and color while minimizing artifacts. Ablation studies validate the contribution of the dual-branch structure and the cross-attention mechanism.
The paper presents quantitative results on several datasets: LOLv1, LOLv2 (Real and Synthetic), SICE (Mix and Grad), Sony-Total-Dark, and unpaired datasets (DICM, LIME, MEF, NPE, VV). Evaluation metrics include PSNR, SSIM, Learned Perceptual Image Patch Similarity (LPIPS), BRISQUE, and NIQE.
On the LOL datasets, CIDNet achieves the best PSNR, SSIM, and LPIPS scores with only 1.88M parameters and 7.57 GFLOPs.
On SICE and Sony-Total-Dark, CIDNet delivers strong PSNR and SSIM; on Sony-Total-Dark in particular, it surpasses the second-best method by 6.678 dB in PSNR.
The HVI transformation, when used as a plug-and-play module, improves PSNR, SSIM, and LPIPS metrics across various state-of-the-art methods in the sRGB color space. The GSAD method shows the most significant improvement, with a PSNR increase of 3.562 dB.
The paper also discusses using the parameter $k$ to adjust the gradient of $C_k$ with respect to intensity. The density parameter $k$ is treated as a hyper-parameter that mediates the conflict between noise suppression and detail preservation across different datasets.
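Differentiating the collapse function reconstructed above makes the role of $k$ explicit (our derivation from that formula, not an equation quoted from the paper):

$$\frac{\partial C_k}{\partial I_{\max}} = \frac{\pi}{2k}\left(\sin\frac{\pi I_{\max}}{2} + \varepsilon\right)^{\frac{1}{k}-1}\cos\frac{\pi I_{\max}}{2}$$

For $k > 1$ the exponent $\frac{1}{k} - 1$ is negative, so the gradient is steepest in dark regions (small $I_{\max}$), while $\varepsilon$ keeps it finite at $I_{\max} = 0$; tuning $k$ therefore controls how aggressively dark pixels are redistributed.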
The paper extends the HVI color space by applying a piecewise-linear hue mapping $P_\gamma$ and a saturation mapping function $T(x)$ to adapt to the varying sensitivity of different cameras' RGB channels and to different low-light scenes. Specifically, $P_\gamma$ is defined as a piecewise-linear map of the hue in HSV color space:
$$P_\gamma = \begin{cases} \frac{1}{2}\gamma_G h, & \text{if } 0 \le h < 2 \\ \frac{1}{2}(\gamma_B - \gamma_G)(h - 2) + \gamma_G, & \text{if } 2 \le h < 4 \\ \frac{1}{2}(6 - \gamma_B)(h - 6) + 6, & \text{if } 4 \le h \le 6 \end{cases}$$
where $\gamma_G, \gamma_B \in (0, 6)$ and $h \in [0, 6]$ denotes the hue value.
A saturation mapping relationship between different scenes can then be defined as
$$D_T = T\!\left(\frac{P_\gamma}{6}\right)$$
where $T(\cdot)$ satisfies $T(0) = T(1)$ and $T\!\left(\frac{P_\gamma}{6}\right) \ge 0$.
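A direct implementation of the reconstructed piecewise mapping can look as follows (a minimal sketch; the function name and the example $\gamma$ values are ours, and $T(\cdot)$ is left abstract since the paper only constrains it):

```python
def p_gamma(h: float, gamma_g: float, gamma_b: float) -> float:
    """Piecewise-linear hue remapping P_gamma for h in [0, 6]."""
    assert 0.0 <= h <= 6.0 and 0.0 < gamma_g < 6.0 and 0.0 < gamma_b < 6.0
    if h < 2:
        return 0.5 * gamma_g * h
    if h < 4:
        return 0.5 * (gamma_b - gamma_g) * (h - 2) + gamma_g
    return 0.5 * (6.0 - gamma_b) * (h - 6.0) + 6.0

# gamma_g=2, gamma_b=4 recovers the identity map; other values re-weight
# the green/blue hue ranges to match a camera's channel sensitivity.
print(p_gamma(3.0, gamma_g=2.0, gamma_b=4.0))   # -> 3.0
```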
Experiments on generalization ability show that on LOLv2-Syn, CIDNet outperforms LLFlow, RetinexFormer, and GSAD in PSNR, SSIM, and LPIPS. CIDNet also generalizes better than the unsupervised methods RUAS, PairLIE, and EnlightenGAN.
Additional experiments were performed on the LOL-Blur dataset for joint low-light image deblurring and enhancement, and the SIDD dataset for image denoising.