Using Set Covering to Generate Databases for Holistic Steganalysis (2211.03447v2)
Abstract: Within an operational framework, covers used by a steganographer are likely to come from different sensors and different processing pipelines than the ones used by researchers for training their steganalysis models. Thus, a performance gap is unavoidable when it comes to out-of-distribution covers, an extremely frequent scenario called Cover Source Mismatch (CSM). Here, we explore a grid of processing pipelines to study the origins of CSM, to better understand it, and to better tackle it. A greedy set-covering algorithm is used to select representative pipelines that minimize the maximum regret between each representative and the pipelines within its set. Our main contribution is a methodology for generating relevant bases able to tackle operational CSM. Experimental validation highlights that, for a given number of training samples, our set-covering selection is a better strategy than selecting random pipelines or using all the available pipelines. Our analysis also shows that parameters such as denoising, sharpening, and downsampling are very important to foster diversity. Finally, different benchmarks for classical and wild databases show the good generalization property of the extracted databases. Additional resources are available at github.com/RonyAbecidan/HolisticSteganalysisWithSetCovering.
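The greedy set-covering selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the regret matrix, the threshold `tau`, and the function name `greedy_set_cover` are assumptions made for illustration. Here `regret[i, j]` is assumed to be the performance loss when a detector trained on pipeline `i` is tested on pipeline `j`, and pipeline `i` is said to cover `j` when that regret is at most `tau`.

```python
import numpy as np


def greedy_set_cover(regret, tau):
    """Pick representative pipelines with a greedy set-cover heuristic.

    regret[i, j]: hypothetical regret when training on pipeline i and
    testing on pipeline j. Pipeline i "covers" j if regret[i, j] <= tau.
    Returns the indices of the selected representatives, in pick order.
    """
    regret = np.asarray(regret, dtype=float)
    n = regret.shape[0]
    covers = regret <= tau              # covers[i, j]: i covers j
    uncovered = np.ones(n, dtype=bool)  # pipelines not yet covered
    selected = []
    while uncovered.any():
        # Number of still-uncovered pipelines each candidate would cover.
        gains = (covers & uncovered).sum(axis=1)
        best = int(gains.argmax())
        if gains[best] == 0:
            raise ValueError("some pipelines cannot be covered at this threshold")
        selected.append(best)
        uncovered &= ~covers[best]
    return selected


# Toy regret matrix (made-up numbers): pipelines 0-1 and 2-3 form two groups.
regret = np.array([
    [0.00, 0.02, 0.30, 0.25],
    [0.03, 0.00, 0.28, 0.31],
    [0.27, 0.33, 0.00, 0.04],
    [0.29, 0.26, 0.05, 0.00],
])
selected = greedy_set_cover(regret, tau=0.05)

# Worst regret each pipeline suffers under its best selected representative.
max_regret = regret[selected].min(axis=0).max()
```

In this toy example, every pipeline ends up within `tau` of at least one selected representative, so the maximum regret over the grid is bounded by the threshold. Lowering `tau` tightens that bound at the cost of selecting more representatives, which is the trade-off a set-covering formulation makes explicit.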