Norm-Preserved Feature Map (NP-Map)
- NP-Map is a feature normalization paradigm that computes per-position channel statistics to preserve spatial structure for enhanced image translation and generative tasks.
- It integrates a forward-inverse process via PONO and Moment Shortcut, stabilizing training by re-injecting preserved statistical cues into later network layers.
- Empirical results reveal significant improvements in FID and LPIPS on benchmarks like CycleGAN and Pix2pix, underscoring its potential in encoder-decoder architectures.
The Norm-Preserved Feature Map (NP-Map) is a feature normalization paradigm in deep neural networks that uniquely computes and preserves per-position statistical information across the channel dimension, as implemented in Positional Normalization (PONO). Unlike conventional normalization schemes such as BatchNorm, InstanceNorm, or LayerNorm, which typically aggregate statistics across spatial dimensions and subsequently discard them, NP-Map leverages the spatial distribution of feature moments to explicitly transfer structural information within the network. This mechanism both stabilizes training and enhances the propagation of crucial structure, particularly in encoder–decoder and generative models (Li et al., 2019).
1. Mathematical Definition and Forward–Inverse Process
Given an activation tensor $X \in \mathbb{R}^{B \times C \times H \times W}$ (with $C$ channels and spatial dimensions $H \times W$), NP-Map at each spatial coordinate $(h, w)$ computes the channelwise mean and standard deviation as follows:

$$\mu_{b,h,w} = \frac{1}{C} \sum_{c=1}^{C} x_{b,c,h,w}, \qquad \sigma_{b,h,w} = \sqrt{\frac{1}{C} \sum_{c=1}^{C} \left( x_{b,c,h,w} - \mu_{b,h,w} \right)^2 + \epsilon}$$

where $\epsilon$ is a small constant (typically $10^{-5}$) for numerical stability. The normalized activation at each channel $c$ is then:

$$\hat{x}_{b,c,h,w} = \frac{x_{b,c,h,w} - \mu_{b,h,w}}{\sigma_{b,h,w}}$$

NP-Map introduces a denormalization, or "re-injection," stage (called Moment Shortcut, MS): the decoder or a subsequent network stage uses the preserved $\mu$ and $\sigma$ to reconstruct the original scale and position:

$$x'_{b,c,h,w} = \sigma_{b,h,w} \, \hat{x}_{b,c,h,w} + \mu_{b,h,w}$$

In PONO, these per-position statistics are "shortcut" to later layers, directly supplying the decoder with spatial structural cues.
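These forward and inverse steps can be checked numerically: across channels at every spatial position, the normalized feature should have (approximately) zero mean and unit standard deviation, and the Moment Shortcut should exactly invert the normalization. A minimal NumPy sketch, with illustrative tensor sizes:

```python
import numpy as np

def pono(x, eps=1e-5):
    # per-position statistics across the channel axis (axis=1)
    mu = x.mean(axis=1, keepdims=True)                    # [B, 1, H, W]
    sigma = np.sqrt(x.var(axis=1, keepdims=True) + eps)   # [B, 1, H, W]
    return (x - mu) / sigma, mu, sigma

def ms(x, mu, sigma):
    # Moment Shortcut: re-inject the preserved moments
    return x * sigma + mu

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 16, 4, 4))
x_hat, mu, sigma = pono(x)

print(np.allclose(x_hat.mean(axis=1), 0, atol=1e-6))   # zero mean per position
print(np.allclose(x_hat.std(axis=1), 1, atol=1e-3))    # ~unit std (eps makes it slightly < 1)
print(np.allclose(ms(x_hat, mu, sigma), x, atol=1e-5)) # MS inverts PONO
```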
2. Structural Information Preservation
NP-Map's per-position computation of $\mu$ and $\sigma$ yields two "statistic maps" that encapsulate the spatial structure of activations. Visualization of these maps in pretrained architectures (e.g., VGG-19, ResNet, DenseNet) demonstrates that object boundaries, facial features, and image silhouettes are explicitly traced out by these moments. This structural information, critical for image generation and translation, is preserved and exploited by NP-Map, as these statistics are not discarded but rather provided as explicit guidance for reconstruction in the decoder. In contrast, other normalizations aggregate and then discard spatial cues, forcing the network to relearn spatial structure in the decoding process (Li et al., 2019).
3. Implementation Details and Pseudocode
The minimalist implementation of PONO and its Moment Shortcut is as follows:
```python
import torch

def PONO(x, eps=1e-5):
    # x: [B, C, H, W]
    mu = x.mean(dim=1, keepdim=True)                    # [B, 1, H, W]
    var = x.var(dim=1, keepdim=True, unbiased=False)    # [B, 1, H, W]
    sigma = torch.sqrt(var + eps)                       # [B, 1, H, W]
    x_norm = (x - mu) / sigma                           # normalized feature
    return x_norm, mu, sigma

def MS(x, mu, sigma):
    # x: normalized decoder feature [B, C, H, W]
    # mu, sigma: from encoder [B, 1, H, W]
    return x * sigma + mu
```
- The encoder output is normalized with PONO to yield $\hat{X}$, $\mu$, and $\sigma$.
- The decoder feature is then denormalized with MS using the preserved $\mu$ and $\sigma$.
The hyperparameter $\epsilon$ is typically set to $10^{-5}$ for stability (Li et al., 2019).
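To make the placement of these operations concrete, the following is a minimal, hypothetical encoder-decoder sketch (the `ToyTranslator` module and its layer sizes are illustrative, not from the paper): the encoder feature is normalized with PONO, intermediate layers operate on the normalized tensor, and MS re-injects the preserved $\mu$ and $\sigma$ before decoding.

```python
import torch
import torch.nn as nn

def PONO(x, eps=1e-5):
    mu = x.mean(dim=1, keepdim=True)
    sigma = torch.sqrt(x.var(dim=1, keepdim=True, unbiased=False) + eps)
    return (x - mu) / sigma, mu, sigma

def MS(x, mu, sigma):
    return x * sigma + mu

class ToyTranslator(nn.Module):
    # hypothetical one-level encoder-decoder; sizes are illustrative only
    def __init__(self, ch=3, hidden=16):
        super().__init__()
        self.enc = nn.Conv2d(ch, hidden, 3, padding=1)
        self.mid = nn.Conv2d(hidden, hidden, 3, padding=1)
        self.dec = nn.Conv2d(hidden, ch, 3, padding=1)

    def forward(self, x):
        h, mu, sigma = PONO(torch.relu(self.enc(x)))  # strip moments from encoder feature
        h = torch.relu(self.mid(h))                   # later layers see normalized features
        h = MS(h, mu, sigma)                          # Moment Shortcut re-injects moments
        return self.dec(h)

y = ToyTranslator()(torch.randn(1, 3, 8, 8))
print(y.shape)  # torch.Size([1, 3, 8, 8])
```

Because $\mu$ and $\sigma$ have shape `[B, 1, H, W]`, they broadcast over any channel width, so the shortcut works even when encoder and decoder channel counts differ at the same resolution.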
4. Empirical Performance and Key Findings
NP-Map, via PONO and its Moment Shortcut mechanism, delivers substantial improvements on diverse image-to-image translation benchmarks, including CycleGAN, Pix2pix, MUNIT, and DRIT. Specifically:
- FID (Fréchet Inception Distance) reductions of $10$–$20\%$ are reported on datasets such as Map↔Photo, Horse↔Zebra, Cityscapes, and Day↔Night.
- Perceptual similarity, as measured by LPIPS, consistently improves.
- Training stability is enhanced, with fewer mode collapses and faster convergence.
- Examples: On CycleGAN Map→Photo, FID drops from approximately $58.0$ to $53.0$ with PONO-MS; on Pix2pix Cityscapes label→photo, FID reduces from roughly $71.2$ to $64.8$.
- On large-scale classification (ResNet-18 on ImageNet), PONO accelerates training loss reduction and slightly improves top-1 error (from $30.09$ to $30.01$) (Li et al., 2019).
5. Comparison with Other Normalization Schemes
NP-Map/PONO diverges fundamentally from existing normalization techniques in both normalization target and statistic handling, as summarized below:
| Method | Normalization Axis | Statistics Retained | Statistic Usage |
|---|---|---|---|
| BatchNorm (BN) | Over $(B, H, W)$ per channel | Discarded | Uses affine only |
| InstanceNorm (IN) | Over $(H, W)$ per channel, per sample | Discarded | Uses affine only |
| LayerNorm (LN) | Over $(C, H, W)$ per example | Discarded | Uses affine only |
| GroupNorm (GN) | Over $(H, W)$ and channel groups | Discarded | Uses affine only |
| NP-Map (PONO) | Over $C$ at each $(h, w)$ | Retained ($\mu$, $\sigma$) | Re-injected (MS) |
Unlike these methods, NP-Map (PONO) computes $\mu$ and $\sigma$ strictly over channels at each spatial position $(h, w)$, with no spatial pooling, and retains them for explicit re-injection in later layers. This mechanism is particularly effective when introduced into generative models, style transfer, image translation, and domains requiring preservation of spatial cues, such as segmentation, inpainting, super-resolution, and video (Li et al., 2019).
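The axis difference can be made concrete with tensor shapes: InstanceNorm pools each channel's spatial map down to a single scalar, while NP-Map/PONO keeps a full spatial map of per-position statistics. A minimal NumPy sketch with illustrative sizes:

```python
import numpy as np

B, C, H, W = 2, 8, 4, 4
x = np.random.randn(B, C, H, W)

# InstanceNorm-style statistics: pooled over (H, W), one scalar per channel
mu_in = x.mean(axis=(2, 3), keepdims=True)   # shape [B, C, 1, 1] -- spatial layout lost

# NP-Map/PONO statistics: pooled over C, one value per spatial position
mu_pono = x.mean(axis=1, keepdims=True)      # shape [B, 1, H, W] -- a spatial "statistic map"

print(mu_in.shape, mu_pono.shape)  # (2, 8, 1, 1) (2, 1, 4, 4)
```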
6. Applications and Implications
NP-Map is especially beneficial in encoder-decoder, generative, and multimodal translation architectures, offering:
- Direct propagation of structure from encoder to decoder stages.
- Enhanced content shape and structure preservation during style transfer and image translation.
- Potential utility for spatially sensitive computer vision tasks where explicit structural statistics can be leveraged.
A plausible implication is that carrying forward explicit spatial moments relieves downstream layers from the burden of reconstructing lost structure, thereby encouraging more robust and efficient learning dynamics in deep architectures. In summary, NP-Map as realized by PONO constitutes a lightweight, per-position normalization system that complements rather than replaces standard normalizers, directly integrating structural information into the information flow of deep networks (Li et al., 2019).