- The paper proposes a method to recover authentic images from AI-hallucinated camera outputs using implicit neural representations.
- It efficiently fine-tunes a lightweight MLP with modality-specific encoders to predict pixel residuals in 3–10 seconds per image.
- Experimental results show higher PSNR than the baselines, supporting image authenticity for digital forensics and regulatory compliance.
Addressing the Authenticity of Camera Images in the Era of Generative AI
Motivation and Problem Statement
Modern smartphone image signal processors (ISPs) increasingly embed generative AI (GenAI) modules for operations such as digital zoom and low-light enhancement. These GenAI-based processes can hallucinate scene content, in some cases introducing semantic changes that impact the perceived meaning or trustworthiness of the image. For example, text or facial details may be subtly or overtly altered, with significant implications for forensics, evidence integrity, and user trust (Figure 1).
Figure 1: Example of AI-based super-resolution hallucinating altered license plate characters, giving the user a false sense of image authenticity.
Beyond the well-recognized issue of deepfakes and post-hoc image manipulation, the paper "Addressing Image Authenticity When Cameras Use Generative AI" (2604.21879) raises the critical issue that even camera-native outputs can no longer be assumed trustworthy. The work proposes both a theoretical and practical framework for users to recover an "unhallucinated" or authentic version of a scene image, even when the original camera pipeline has irreversibly altered the data using GenAI modules.
Traditional digital forensics targets post-capture manipulation and is largely ineffective against ISP-induced AI hallucinations. While active methods involving digital signatures and watermarking offer some promise, their capabilities are fundamentally limited to detection or localization of tampered pixels. The only related approach, Punnappurath et al. [access], proposed a binary mask to flag possibly hallucinated pixels but could not reconstruct the unhallucinated image.
The existing literature on neural ISPs demonstrates that modules trained with generative or perceptual losses (VGG, LPIPS, GAN) are widely deployed in cameras and routinely generate fictitious content (Figure 2). These changes are usually subtle, raising the risk that end-users do not realize their camera output has been altered at the semantic level.
Figure 2: ISP-induced GenAI hallucinations can alter facial attributes or render information (e.g., QR codes) unreadable, motivating the need for a trusted authentic image recovery path.
Technical Approach
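The proposed solution is based on implicit neural representations, combining a coordinate-based multi-layer perceptron (MLP) with image- or modality-specific encoders. At capture time, the lightweight MLP is fine-tuned to predict pixel residuals, which takes 3–10 seconds per image, and its parameters are stored as compact metadata alongside the image so that the authentic version can be recovered later.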
This approach operates independently of the ISP and GenAI modules, requiring no black-box or white-box access to the vendor processing chain.
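To make the residual-prediction idea concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes grayscale images in [0, 1], a Fourier positional encoding, a single hidden layer, and plain gradient descent; the helper names (`encode`, `pixel_grid`, `fit_residual_mlp`, `recover`) and all hyperparameters are hypothetical. The paper's modality-specific encoders are replaced here by simply appending the hallucinated pixel value to the input.

```python
import numpy as np

def encode(coords, hallucinated_vals, n_freqs=4):
    """Fourier-encode (N, 2) coordinates in [0, 1] and append the hallucinated pixel value."""
    freqs = (2.0 ** np.arange(n_freqs)) * np.pi
    angles = coords[:, :, None] * freqs                      # (N, 2, n_freqs)
    pos = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return np.concatenate([pos.reshape(len(coords), -1), hallucinated_vals[:, None]], axis=1)

def pixel_grid(h, w):
    """Normalized (x, y) coordinates for every pixel; assumes h > 1 and w > 1."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.stack([xs.ravel() / (w - 1), ys.ravel() / (h - 1)], axis=1)

def fit_residual_mlp(hallucinated, authentic, hidden=32, steps=400, lr=5e-3, seed=0):
    """Fit a one-hidden-layer ReLU MLP to the residual (authentic - hallucinated)."""
    rng = np.random.default_rng(seed)
    h, w = hallucinated.shape
    x = encode(pixel_grid(h, w), hallucinated.ravel())
    target = (authentic - hallucinated).reshape(-1, 1)
    w1 = rng.normal(0.0, np.sqrt(2.0 / x.shape[1]), (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = np.zeros((hidden, 1))   # zero init: the initial prediction is "no residual"
    b2 = np.zeros(1)
    for _ in range(steps):
        z = x @ w1 + b1
        a = np.maximum(z, 0.0)
        grad_out = 2.0 * (a @ w2 + b2 - target) / len(target)   # d(MSE)/d(prediction)
        grad_z = (grad_out @ w2.T) * (z > 0)                    # backprop through ReLU
        w2 -= lr * (a.T @ grad_out)
        b2 -= lr * grad_out.sum(axis=0)
        w1 -= lr * (x.T @ grad_z)
        b1 -= lr * grad_z.sum(axis=0)
    return w1, b1, w2, b2

def recover(hallucinated, params):
    """Add the predicted residual back onto the hallucinated image."""
    w1, b1, w2, b2 = params
    h, w = hallucinated.shape
    x = encode(pixel_grid(h, w), hallucinated.ravel())
    residual = np.maximum(x @ w1 + b1, 0.0) @ w2 + b2
    return hallucinated + residual.reshape(h, w)

# Synthetic stand-in: a smooth "authentic" image and a locally altered "hallucinated" one.
h = w = 16
ys, xs = np.mgrid[0:h, 0:w]
authentic = (xs + ys) / (h + w - 2)
hallucinated = authentic.copy()
hallucinated[4:10, 5:11] += 0.3
params = fit_residual_mlp(hallucinated, authentic)
recovered = recover(hallucinated, params)
err_before = np.mean((hallucinated - authentic) ** 2)
err_after = np.mean((recovered - authentic) ** 2)
```

In this toy setup only `params` would need to travel with the image; `recover` uses nothing but the stored parameters and the hallucinated output, which mirrors the independence from the vendor processing chain described above.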
Experimental Results
Quantitative Evaluation
Extensive experiments evaluate recovery accuracy for natural image super-resolution (DIV2K), text super-resolution (MARCONet), and low-light enhancement (LOL dataset), tasks chosen to represent typical hallucination-prone operations in commercial smartphone cameras.
The method outperforms established implicit neural representation baselines, including SIREN [siren], NeRF [nerf], and hashgrid [hashgrid], as well as blind deep image-to-image networks (e.g., NAFNet [nafnet]). When fine-tuned within a constrained capture-time budget, the proposed approach achieves superior PSNR and visual consistency, particularly in ambiguous cases such as text or low-light scenes. Key results (PSNR, dB):
| Method | DIV2K (Nat. Img SR) | MARCONet (Text SR) | LOL (Low-Light) |
|---|---|---|---|
| SIREN | 28.75 | 27.56 | 34.87 |
| Hashgrid | 29.20 | 30.32 | 35.65 |
| NAFNet | 32.25 | 27.22 | 23.04 |
| Ours | 32.96 | 31.26 | 36.34 |
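For readers who want to interpret the dB figures, here is a short sketch of the standard PSNR formula, assuming images scaled to [0, 1] (the `peak` parameter); how the paper computes PSNR in detail, for example per channel or per image, is not stated here. The last lines convert the DIV2K gap between the proposed method and NAFNet into a mean-squared-error ratio.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. about 20 dB.
example_db = psnr(np.zeros((8, 8)), np.full((8, 8), 0.1))

# A PSNR gap of d dB corresponds to an MSE ratio of 10^(-d / 10).
gap_db = 32.96 - 32.25                 # Ours vs. NAFNet on DIV2K, from the table
mse_ratio = 10 ** (-gap_db / 10)       # roughly 0.85, i.e. about 15% lower MSE
```

Because the scale is logarithmic, gaps of under 1 dB, such as those between the proposed method and the strongest baseline in each column, still correspond to a reduction in squared error of roughly 15 to 20 percent.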
The method is particularly robust where hallucinations can change semantics, for example digits in license plates or characters in QR codes.
Efficiency and Storage
Unlike alternative schemes (e.g., residual JPEG compression, binary masks), the metadata overhead for neural parameters is both lower and invariant to image resolution. This supports practical deployment at scale in systems where authenticity metadata must travel with the image.
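The resolution-invariance claim can be illustrated with simple arithmetic. The sketch below compares the byte size of a small MLP against an uncompressed one-bit-per-pixel mask. The layer sizes, the 16-bit parameter storage, and the example resolutions are assumptions chosen for illustration, not values from the paper, and a real mask would usually be compressed.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases for a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def mask_bytes(height, width):
    """Uncompressed binary mask at one bit per pixel, rounded up to whole bytes."""
    return (height * width + 7) // 8

# Hypothetical network: 17 inputs, two hidden layers of 32 units, 3 output channels.
params = mlp_param_count([17, 32, 32, 3])   # 1731 parameters
mlp_bytes = params * 2                      # 3462 bytes if stored as 16-bit floats

for height, width in [(1080, 1920), (3000, 4000)]:
    # The MLP size is the same for both resolutions; the mask grows with pixel count.
    print(f"{width}x{height}: mlp={mlp_bytes} B, mask={mask_bytes(height, width)} B")
```

The point of the comparison is the scaling behavior rather than the absolute numbers: the network's size is fixed by its architecture, while any per-pixel representation grows with the image.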
Figure 4: Capture-time training/fine-tuning speed versus recovery accuracy (PSNR). Proposed approach offers a favorable trade-off compared to baseline implicit neural representations.
Qualitative Evaluation
In diverse scenario demonstrations (face super-resolution, text reconstruction, and low-light recovery), the method reverses subtle and overt hallucinations induced by GenAI. For example, altered eye color and misshapen facial features introduced by GAN-based upsampling are corrected, and QR codes rendered unreadable by the ISP's low-light enhancer are restored to machine-scannable fidelity.
Figure 5: Qualitative visual recovery examples for natural image super-resolution (DIV2K). Hallucinated semantic changes (eye color) are reversed by the proposed method.
Figure 6: Low-light enhancement recovery on LOL dataset. The authentic image textures and characters (e.g., ‘i’ vs ‘l’) are accurately reconstructed, unlike blind and baseline approaches.
Figure 7: AI-based text super-resolution can hallucinate license plate and Chinese characters. The proposed recovery closely matches the authentic content.
Ablation and Design Analysis
The architecture and training ablations highlight several design choices:
- A modality-specific encoder outperforms a generic universal encoder by up to 1.5 dB PSNR, emphasizing the domain specificity of ISP module effects.
- Increasing MLP or embedding size marginally improves fidelity but with diminishing returns given storage and runtime efficiency constraints.
- Metadata sampling optimizations (e.g., error-based weighting, sampling rate) do not improve aggregate recovery PSNR beyond simple random sampling.
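The last ablation compares ways of choosing which pixels supervise the fit. As a rough illustration of the two strategies, the sketch below draws pixel indices either uniformly at random or with probability proportional to a per-pixel error map. The function name `sample_pixels`, the error map, and the sample count are hypothetical, and the paper's exact sampling schemes may differ.

```python
import numpy as np

def sample_pixels(error_map, k, weighted=False, seed=0):
    """Return k distinct flat pixel indices, drawn uniformly or proportionally to |error|."""
    rng = np.random.default_rng(seed)
    n = error_map.size
    if not weighted:
        return rng.choice(n, size=k, replace=False)
    weights = np.abs(error_map).ravel() + 1e-8   # epsilon keeps every pixel selectable
    return rng.choice(n, size=k, replace=False, p=weights / weights.sum())

# Toy error map: the ISP output differs from the authentic image only in one patch.
err = np.zeros((16, 16))
err[4:10, 5:11] = 0.3
uniform_idx = sample_pixels(err, 20)
weighted_idx = sample_pixels(err, 20, weighted=True)
frac_in_patch = np.mean(err.ravel()[weighted_idx] > 0)
```

Error-weighted sampling concentrates the budget on altered regions, which looks attractive, yet the ablation above reports no aggregate PSNR gain over simple random sampling.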
Implications and Future Directions
The integration of GenAI into photon-to-photo pipelines invalidates foundational trust assumptions in digital forensics and legal frameworks. This work proposes a scalable, compact, and effective mechanism for storing 'reversible provenance' in image files themselves, independent of manufacturer involvement post-capture.
The practical implication is that, as regulatory standards codify authenticity and transparency requirements for AI-augmented images (e.g., EU AI Act, platform-level provenance), mechanisms such as neural metadata-based reversible pipelines may become a baseline compliance feature.
Theoretically, this study opens avenues for:
- More granular, region-specific, or multimodal authenticity recovery leveraging emerging compact neural representations.
- Robustness under adversarial conditions, ensuring security of metadata against tampering or loss.
- Generalization to more complex ISP chains (e.g., multi-stage neural and conventional hybrid ISPs).
- Potential extension toward active authentication, watermarking, or legal chain-of-custody integrations in camera hardware and software stacks.
Conclusion
This work presents a comprehensive method for recovering authentic, unhallucinated versions of images processed by GenAI-enhanced camera pipelines through a compact neural metadata representation. Extensive experiments demonstrate efficacy outstripping both blind and metadata-assisted baselines, with low storage requirements and high practical viability. The proposal constitutes a significant step toward restoring trust in camera-captured images as generative AI becomes pervasive in consumer and professional imaging devices.
The adoption of such approaches is poised to underwrite next-generation standards for image authenticity, provenance, and legal evidence, as the intersection of computer vision, AI, and digital forensics continues to evolve.