- The paper introduces Flickerformer, which integrates periodicity and directionality priors into a transformer to robustly remove flicker artifacts from burst imaging.
- It employs FFT-based phase fusion and autocorrelation modules alongside wavelet-based directional attention to precisely localize and suppress structured flicker.
- Quantitative evaluations on the BurstDeflicker benchmark reveal significant PSNR, SSIM, and LPIPS improvements, confirming its practical impact on image quality.
Short-exposure photography under AC-powered illumination is fundamentally challenged by spatially and temporally structured flicker artifacts arising from periodic light oscillations and rolling-shutter sensor inconsistencies. Such artifacts degrade image perceptual quality and undermine downstream vision pipelines, especially in dynamic scenarios or burst imaging. Existing restoration frameworks treat flicker as generic noise, thus failing to leverage its inherent physical structure (specifically, periodicity and directionality), resulting in inferior suppression and ghosting.
This paper introduces the Flickerformer architecture, which explicitly integrates periodicity and directionality priors into a transformer-based burst restoration pipeline, yielding systematic improvements in flicker localization and removal.
Methodological Innovations
Periodicity Modeling
The periodic nature of flicker stems from both lighting modulation and sensor exposure mechanisms. Flickerformer operationalizes periodicity via two modules:
- Phase-Based Fusion Module (PFM): Inter-frame phase correlation is deployed in the frequency domain to enable robust multi-frame feature aggregation. Fast Fourier Transform (FFT) is applied per frame to extract amplitude and phase spectra, with phase similarity computed by element-wise comparison and used as adaptive frequency-domain weighting for reference frames. The fusion leverages these weights, enhancing flicker localization while avoiding ghosting.
- Autocorrelation Feed-Forward Network (AFFN): Intra-frame periodicity is reinforced by calculating spatial autocorrelation via the Wiener-Khinchin theorem (squared magnitude in the frequency domain followed by an inverse FFT), which amplifies recurring flicker structures and suppresses uncorrelated noise. AFFN refines fused features through dual-domain processing and depth-wise gated feed-forward layers, further encoding periodic cues.
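The phase-weighted fusion idea behind PFM can be illustrated with a minimal sketch. This is not the paper's implementation; the cosine-based phase-similarity weighting and the function `phase_fusion` are illustrative assumptions standing in for the learned frequency-domain weighting described above.

```python
import numpy as np

def phase_fusion(ref, frames):
    """Hypothetical sketch of phase-based multi-frame fusion:
    weight each frame's frequency content by how well its phase
    spectrum agrees with the reference frame's phase spectrum."""
    F_ref = np.fft.fft2(ref)
    phase_ref = np.angle(F_ref)
    fused = np.zeros_like(F_ref)
    total_w = np.zeros(ref.shape)
    for frame in frames:
        F = np.fft.fft2(frame)
        # Phase similarity in [0, 1]: 1 when phases align exactly.
        w = 0.5 * (1.0 + np.cos(np.angle(F) - phase_ref))
        fused += w * F
        total_w += w
    fused /= np.maximum(total_w, 1e-8)  # normalized weighted average
    return np.fft.ifft2(fused).real
```

Frequency bins whose phase disagrees with the reference (e.g., flicker-corrupted content) receive small weights, which is the intuition behind using phase correlation to avoid ghosting during aggregation.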
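The Wiener-Khinchin computation inside AFFN can likewise be sketched in a few lines; the function name `spatial_autocorrelation` and the zero-lag centering via `fftshift` are illustrative choices, not details from the paper.

```python
import numpy as np

def spatial_autocorrelation(x):
    """Autocorrelation via the Wiener-Khinchin theorem:
    the inverse FFT of the power spectrum |FFT(x)|^2."""
    F = np.fft.fft2(x)
    power = np.abs(F) ** 2      # squared magnitude in frequency domain
    ac = np.fft.ifft2(power).real
    return np.fft.fftshift(ac)  # move the zero-lag peak to the center
```

For a feature map containing periodic flicker stripes, this circular autocorrelation exhibits secondary peaks at lags matching the stripe period, while uncorrelated noise contributes only a central peak, which is why it amplifies recurring structure.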
Directionality Exploitation
Rolling-shutter sensors induce strong directional structuring in flicker, manifesting as horizontally or vertically aligned luminance stripes. Flickerformer leverages this via:
- Wavelet-Based Directional Attention Module (WDAM): Haar wavelet decomposition splits the input feature into low-frequency and orientation-specific high-frequency subbands. Directional weights are synthesized from high-frequency horizontal and vertical components via convolution and sigmoid activation. These weights modulate window-based multi-head attention applied only to low-frequency subband, enabling precise identification and restoration of flicker-affected regions, with substantial reduction in computational overhead.
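The Haar decomposition and directional gating in WDAM can be sketched as follows. This is a simplified single-channel illustration under stated assumptions: the subband arithmetic is the standard single-level 2D Haar transform, while combining detail magnitudes through a sigmoid (`directional_weights`) is a plausible stand-in for the convolution-plus-sigmoid weighting the paper describes.

```python
import numpy as np

def haar_subbands(x):
    """Single-level 2D Haar decomposition into LL, LH, HL, HH.
    Assumes even height and width."""
    a = x[0::2, 0::2]
    b = x[0::2, 1::2]
    c = x[1::2, 0::2]
    d = x[1::2, 1::2]
    ll = (a + b + c + d) / 4  # low-frequency approximation
    lh = (a + b - c - d) / 4  # variation between rows (horizontal stripes)
    hl = (a - b + c - d) / 4  # variation between columns (vertical stripes)
    hh = (a - b - c + d) / 4  # diagonal detail
    return ll, lh, hl, hh

def directional_weights(lh, hl):
    """Hypothetical directional gating: squash the combined
    horizontal/vertical detail magnitude through a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(np.abs(lh) + np.abs(hl))))
```

Because rolling-shutter flicker forms aligned stripes, the LH or HL subband lights up wherever stripes are present, and the resulting weights can bias window attention (applied to the LL subband, at a quarter of the spatial resolution) toward flicker-affected regions.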
Architecture Overview
Flickerformer integrates PFM, AFFN, and WDAM within a U-shaped encoder-decoder transformer backbone. A burst of three input frames is processed; after initial convolutional feature extraction, PFM fuses across frames, followed by hierarchical encoding and feature refinement (AFFN), with WDAM deployed during decoding. The final flicker-free output is generated via residual learning.
Quantitative and Qualitative Evaluation
Extensive evaluation on the BurstDeflicker benchmark demonstrates Flickerformer's superiority. Numerical results show consistent outperformance across PSNR, SSIM, and LPIPS metrics:
- PSNR: Flickerformer achieves 31.226 dB, a +0.580 dB gain over the second-best method (AST [76]), while maintaining a low parameter count (3.92M) and FLOP budget.
- SSIM and LPIPS: Flickerformer attains the best scores on both metrics, indicating gains in structural fidelity and perceptual quality.
Visual comparisons highlight Flickerformer's capacity to thoroughly suppress flicker without color deviations or motion ghosting, especially in challenging regions (e.g., screens, extreme light extinction).
Ablation Studies and Module Effectiveness
Ablations confirm substantial gains from each design element:
- AFFN improves PSNR by +0.265 dB over FRFN alternatives at equivalent complexity.
- WDAM confers a +0.229 dB PSNR increment over the best sparse attention module, with reduced computational cost.
- PFM, AFFN, and WDAM individually yield notable performance increments compared to baseline architectures, validating the periodicity and directionality priors.
Limitations and Practical Implications
Flickerformer's restoration relies on the presence of clean regions across the burst. Complete recovery in scenarios where all burst frames are severely degraded remains problematic. This limitation suggests that future architectures should explore hallucination or global priors to address flicker-induced information gaps.
Practically, Flickerformer sets a new reference for flicker removal in burst imaging, with deployment potential across HDR, slow-motion, and surveillance pipelines.
Theoretical Implications and Future Directions
The approach demonstrates the efficacy of embedding explicit physical priors into deep restoration models, illustrating that structured degradations demand principled architecture design beyond generic restoration. The periodicity-directionality duality may extend to other artifact domains (e.g., moiré, banding) where physical source characteristics are partially known.
Future directions include:
- Generalizing phase and wavelet-based priors for broader classes of structured artifacts.
- Enhancing burst restoration models with cross-frame attention and global context aggregation to compensate for extreme flicker.
- Investigating joint flicker removal and other restoration tasks (e.g., denoising, deblurring) under unified frameworks for compound degradations.
Conclusion
Flickerformer achieves state-of-the-art flicker suppression by coupling frequency-domain periodicity modeling and spatial-directional attention within a transformer framework. Its principled integration of signal-processing techniques with modern attention mechanisms underscores the value of physics-aware deep restoration models in challenging burst imaging scenarios. Limitations under severe degradation warrant future exploration of global priors and hybrid restoration paradigms.