
Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually (2401.11035v1)

Published 19 Jan 2024 in cs.CV

Abstract: Social media platforms are being increasingly used by malicious actors to share unsafe content, such as images depicting sexual activity, cyberbullying, and self-harm. Consequently, major platforms use AI and human moderation to obfuscate such images to make them safer. Two critical needs for obfuscating unsafe images are that an accurate rationale for obfuscating image regions must be provided, and that the sensitive regions should be obfuscated (e.g., blurred) for users' safety. This process involves addressing two key problems: (1) the reason for obfuscating unsafe images demands that the platform provide an accurate rationale grounded in unsafe image-specific attributes, and (2) the unsafe regions in the image must be minimally obfuscated while still depicting the safe regions. In this work, we address these key issues by first designing a visual reasoning model (VLM) conditioned on pre-trained unsafe image classifiers to provide an accurate rationale grounded in unsafe image attributes, and then proposing a counterfactual explanation algorithm that minimally identifies and obfuscates unsafe regions for safe viewing: it first utilizes an unsafe image classifier attribution matrix to guide segmentation toward a more optimal subregion segmentation, and then performs an informed greedy search to determine the minimum number of subregions, ranked by attribution score, required to flip the classifier's output. Extensive experiments on uncurated data from social networks emphasize the efficacy of our proposed method. We make our code available at: https://github.com/SecureAIAutonomyLab/ConditionalVLM

Overview

The field of social media content moderation has seen a significant development with the introduction of an advanced visual reasoning framework comprising a Conditional Vision Language Model (ConditionalVLM) and a Counterfactual Subobject Explanation (CSE) method. This framework addresses the dual problem of providing clear rationales for obfuscating unsafe images and accurately pinpointing the segments that require obfuscation.

Visual Reasoning with ConditionalVLM

The ConditionalVLM is designed to generate comprehensive, category-specific rationales for obfuscating images depicting sexual activity, cyberbullying, and self-harm. It conditions a vision language model on pre-trained unsafe image classifiers, so the reasoning factors in attributes particular to each category, such as explicit gestures in cyberbullying or distinguishing marks on skin in sexually explicit content. This enables the model to generate explanations that are not only pertinent but also preserve the contextual integrity needed for future investigations or evidence collection. A minimal sketch of the conditioning idea follows.
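
The sketch below illustrates one simple way to realize classifier conditioning, at the prompt level, in PyTorch. It is an assumption-laden toy, not the authors' implementation: the paper conditions the vision language model itself on frozen classifier outputs, and `UnsafeImageClassifier`, its category list, and `conditioned_prompt` are all illustrative stand-ins.

```python
# Minimal sketch of classifier-conditioned visual reasoning (hypothetical names).
import torch
import torch.nn as nn

class UnsafeImageClassifier(nn.Module):
    """Stand-in for a frozen, pre-trained unsafe-image classifier."""
    CATEGORIES = ["safe", "sexually_explicit", "cyberbullying", "self_harm"]

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, len(self.CATEGORIES))

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(image))      # category logits

def conditioned_prompt(logits: torch.Tensor, question: str) -> str:
    """Fold the frozen classifier's prediction into the instruction given to
    the vision language model, so the generated rationale is grounded in the
    predicted unsafe category rather than in free-form captioning."""
    category = UnsafeImageClassifier.CATEGORIES[int(logits.argmax())]
    return (f"The attached image was flagged as '{category}'. "
            f"Referring only to attributes typical of {category} content, "
            f"{question}")

classifier = UnsafeImageClassifier().eval()
image = torch.rand(1, 3, 224, 224)                  # dummy image batch
with torch.no_grad():
    prompt = conditioned_prompt(classifier(image), "explain why it is unsafe.")
print(prompt)  # this prompt, plus the image, would go to the frozen VLM
```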

Counterfactual Subobject Explanations for Obfuscation

The paper then introduces a counterfactual explanation algorithm that identifies and obfuscates only the unsafe portions of an image. A FullGrad-based attribution matrix guides Bayesian superpixel segmentation, yielding subregions aligned with the classifier's evidence. An informed greedy search then finds the minimum set of subregions whose obfuscation flips the classifier's decision, ensuring maximum retention of the safe parts of the image, as sketched below.
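
To make the greedy step concrete, here is a short sketch that assumes the superpixel label map and a FullGrad-style attribution map are already computed; `greedy_minimal_obfuscation`, `is_unsafe`, and `blur` are hypothetical names, and the toy demo at the end stands in for a real classifier and blur operator.

```python
# Attribution-guided greedy search for a minimal obfuscation (illustrative).
import numpy as np

def greedy_minimal_obfuscation(image, labels, attribution, is_unsafe, blur):
    """Blur superpixels in decreasing order of attribution mass until the
    unsafe-image classifier's decision flips; the blurred subregions form
    the (greedy) minimal counterfactual edit."""
    blurred = blur(image)                      # precompute the obfuscated image
    region_ids = np.unique(labels)
    # Rank subregions by total attribution: most-responsible regions first.
    score = {r: float(attribution[labels == r].sum()) for r in region_ids}
    ranked = sorted(region_ids, key=lambda r: score[r], reverse=True)

    edited = image.copy()
    for r in ranked:
        mask = labels == r
        edited[mask] = blurred[mask]           # obfuscate one more subregion
        if not is_unsafe(edited):              # classifier decision flipped?
            return edited                      # minimal set found; safe parts intact
    return edited                              # fallback: every region obfuscated

# Toy demo: a 4x4 "image" whose bright right half drives the "classifier".
img = np.full((4, 4), 0.2); img[:, 2:] = 1.0        # right half is "unsafe"
lbl = np.zeros((4, 4), dtype=int); lbl[:, 2:] = 1   # two superpixels
out = greedy_minimal_obfuscation(
    img, lbl, attribution=img.copy(),
    is_unsafe=lambda x: x.mean() > 0.25,            # stand-in classifier
    blur=lambda x: np.zeros_like(x))                # stand-in obfuscation
print(out)  # right half zeroed (obfuscated); left half untouched
```

In the paper's pipeline, the attribution matrix also guides the segmentation itself, so the subregions being searched already concentrate the classifier's evidence; the greedy search then needs to modify only a few of them.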

Experimental Efficacy

The paper evaluates both components, ConditionalVLM and CSE, on uncurated datasets from social networks, demonstrating their efficacy. ConditionalVLM surpasses other state-of-the-art models in providing accurate rationales for content obfuscation. Concurrently, the CSE method shows a marked improvement in identifying the smallest number of subregions to modify while maintaining high accuracy in segmenting only the unsafe portions for obfuscation.

Contributions and Implications

The combined ConditionalVLM and CSE approach makes several notable contributions: accurate rationale generation for image obfuscation grounded in unsafe attributes, minimal obfuscation that preserves evidence for investigations, and a substantial reduction in moderators' and law-enforcement agents' exposure to harmful content. The codebase is publicly available, encouraging further research and development in the domain. The implications of this research are significant, safeguarding those on the frontlines of content moderation while protecting vulnerable users from harmful exposure.

References (54)
  1. 2021. Facebook moderator: ‘Every day was a nightmare’. https://www.bbc.com/news/technology-57088382.
  2. 2021. Judge OKs $85 mln settlement of Facebook moderators’ PTSD claims. https://www.reuters.com/legal/transactional/judge-oks-85-mln-settlement-facebook-moderators-ptsd-claims-2021-07-23/.
  3. 2023. Vicuna. https://github.com/lm-sys/FastChat.
  4. SLIC superpixels. Technical report.
  5. “When a Tornado Hits Your Life:” Exploring Cyber Sexual Abuse Survivors’ Perspectives on Recovery. Journal of Counseling Sexology & Sexual Wellness: Research, Practice, and Education, 4(1): 1–8.
  6. Flamingo: a visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35: 23716–23736.
  7. Are, C. 2020. How Instagram’s algorithm is censoring women and vulnerable users but helping online abusers. Feminist media studies, 20(5): 741–744.
  8. Young people, peer-to-peer grooming and sexual offending: Understanding and responding to harmful sexual behaviour within a social media society. Probation Journal, 62(4): 374–388.
  9. Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language. arXiv preprint arXiv:2306.16410.
  10. Towards targeted obfuscation of adversarial unsafe images using reconstruction and counterfactual super region attribution explainability. In 32nd USENIX Security Symposium (USENIX Security 23), 643–660.
  11. Perrigo, B. 2019. Facebook Says It’s Removing More Hate Speech Than Ever Before. But There’s a Catch.
  12. Binder, M. 2019. Facebook claims its new AI technology can automatically detect revenge porn. https://mashable.com/article/facebook-ai-tool-revenge-porn.
  13. Bronstein, C. 2021. Deplatforming sexual speech in the age of FOSTA/SESTA. Porn Studies, 8(4): 367–380.
  14. The EU digital markets act: a report from a panel of economic experts. Cabral, L., Haucap, J., Parker, G., Petropoulos, G., Valletti, T., and Van Alstyne, M., The EU Digital Markets Act, Publications Office of the European Union, Luxembourg.
  15. A combinatorial approach to explaining image classifiers. In 2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 35–43. IEEE.
  16. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE winter conference on applications of computer vision (WACV), 839–847. IEEE.
  17. Minority report: Cyberbullying prediction on Instagram. In Proceedings of the 10th ACM conference on web science, 37–45.
  18. InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. arXiv:2305.06500.
  19. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, 248–255. IEEE.
  20. Exon, J. 1996. The Communications Decency Act. Federal Communications Law Journal, 49(1): 4.
  21. Eva: Exploring the limits of masked visual representation learning at scale. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19358–19369.
  22. Efficient graph-based image segmentation. International journal of computer vision, 59(2): 167–181.
  23. Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs. In BMVC.
  24. Gesley, J. 2021. Germany: Network Enforcement Act Amended to Better Fight Online Hate Speech. Library of Congress. https://www.loc.gov/item/global-legal-monitor/2021-07-06/germany-network-enforcement-act-amended-to-better-fight-online-hate-speech/.
  25. PyTorch library for CAM methods. https://github.com/jacobgil/pytorch-grad-cam.
  26. Harm and offence in media content: A review of the evidence.
  27. Hendricks, T. 2021. Cyberbullying increased 70% during the pandemic; Arizona schools are taking action. https://www.12news.com/article/news/crime/cyberbullying-increased-70-during-the-pandemic-arizona-schools-are-taking-action/75-fadf8d2c-cf11-43f0-b074-5de485a3247d.
  28. Self-harm, suicidal behaviours, and cyberbullying in children and young people: Systematic review. Journal of Medical Internet Research, 20(4).
  29. Kim, A. 2021. NSFW Data Scraper. https://github.com/alex000kim/nsfw_data_scraper.
  30. Segment anything. arXiv preprint arXiv:2304.02643.
  31. Krause, M. 2009. Identifying and managing stress in child pornography and child exploitation investigators. Journal of Police and Criminal Psychology, 24(1): 22–29.
  32. Nonconsensual image sharing: one in 25 Americans has been a victim of “revenge porn”.
  33. mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 7241–7259.
  34. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597.
  35. Effectiveness and users’ experience of obfuscation as a privacy-enhancing technology for sharing photos. Proceedings of the ACM on Human-Computer Interaction, 1(CSCW): 1–24.
  36. Meta. 2022. Appealed Content. https://transparency.fb.com/policies/improving/appealed-content-metric/.
  37. Compact watershed and preemptive slic: On improving trade-offs of superpixel segmentation algorithms. In 2014 22nd international conference on pattern recognition, 996–1001. IEEE.
  38. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Wallach, H.; Larochelle, H.; Beygelzimer, A.; d'Alché-Buc, F.; Fox, E.; and Garnett, R., eds., Advances in Neural Information Processing Systems 32, 8024–8035. Curran Associates, Inc.
  39. Learning transferable visual models from natural language supervision. In International conference on machine learning, 8748–8763. PMLR.
  40. Ramaswamy, H. G.; et al. 2020. Ablation-cam: Visual explanations for deep convolutional network via gradient-free localization. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 983–991.
  41. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125.
  42. A practitioner survey exploring the value of forensic tools, AI, filtering, & safer presentation for investigating child sexual abuse material (CSAM). Digital Investigation, 29.
  43. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, 618–626.
  44. Full-gradient representation for neural network visualization. Advances in neural information processing systems, 32.
  45. The psychological well-being of content moderators: the emotional labor of commercial moderation and avenues for improving support. In Proceedings of the 2021 CHI conference on human factors in computing systems, 1–14.
  46. Tenbarge, K. 2023. Instagram’s sex censorship sweeps up educators, adult stars and sex workers.
  47. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
  48. Bayesian adaptive superpixel segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 8470–8479.
  49. scikit-image: image processing in Python. PeerJ, 2: e453.
  50. Explainable image classification with evidence counterfactual. Pattern Analysis and Applications, 25(2): 315–335.
  51. Towards Understanding and Detecting Cyberbullying in Real-world Images. In 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA).
  52. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech., 31: 841.
  53. Ofa: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework. In International Conference on Machine Learning, 23318–23340. PMLR.
  54. Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19175–19186.
Authors (4)
  1. Mazal Bethany (8 papers)
  2. Brandon Wherry (3 papers)
  3. Nishant Vishwamitra (13 papers)
  4. Peyman Najafirad (33 papers)
Citations (3)