Ultra Low-Cost Two-Stage Multimodal System for Non-Normative Behavior Detection (2403.16151v1)
Abstract: Online communities are increasingly inundated by harmful comments. In response, we introduce a two-stage, ultra-low-cost multimodal method for detecting harmful comments and images with high precision and recall. In the first stage, we use the CLIP-ViT model to transform tweets and images into embeddings that capture semantic meaning and subtle contextual clues in text and images. In the second stage, the system feeds these embeddings into a conventional machine learning classifier such as an SVM or logistic regression, so that it can be trained rapidly and can perform inference at ultra-low cost. Trained this way, our system detects harmful text with near-perfect performance, achieving precision and recall above 99%, and, thanks to its multimodal embedding input, it can also classify harmful images zero-shot, without additional training. This capability lets the system identify unseen harmful images without requiring extensive and costly image datasets. The system also adapts quickly to new harmful content: when a new harmful pattern is identified, we can fine-tune the classifier on the embeddings of the corresponding tweets to update the system promptly. This makes it well suited to the ever-evolving nature of online harmfulness, giving online communities a robust, generalizable, and cost-effective safeguard.
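The two-stage design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: stage one is replaced by a hypothetical `fake_embedding` function that returns synthetic vectors instead of calling CLIP-ViT, the embedding size `DIM` is a toy value, and stage two is a small hand-written logistic regression rather than a library classifier. All function names and hyperparameters here are assumptions chosen for illustration.

```python
import math
import random

DIM = 8  # toy embedding size (assumption); real CLIP-ViT embeddings are much larger


def fake_embedding(label, rng):
    """Hypothetical stand-in for stage one (a CLIP-ViT encoder).

    Returns a synthetic vector clustered around +1 (harmful, label 1)
    or -1 (benign, label 0). No real model is called.
    """
    center = 1.0 if label == 1 else -1.0
    return [center + rng.gauss(0.0, 0.5) for _ in range(DIM)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    # Linear decision score for one embedding.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_logreg(X, y, w=None, b=0.0, lr=0.1, epochs=200):
    """Stage two: batch gradient descent on the logistic loss.

    Passing an existing (w, b) continues training from those weights,
    one simple way to update the classifier on newly labeled embeddings.
    """
    w = list(w) if w is not None else [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(X, y):
            err = sigmoid(score(w, b, x)) - t
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # 1 = harmful, 0 = benign.
    return 1 if score(w, b, x) > 0 else 0


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train_y = [i % 2 for i in range(200)]
train_X = [fake_embedding(t, rng) for t in train_y]
test_y = [i % 2 for i in range(100)]
test_X = [fake_embedding(t, rng) for t in test_y]

w, b = train_logreg(train_X, train_y)
accuracy = sum(predict(w, b, x) == t for x, t in zip(test_X, test_y)) / len(test_y)
print(f"held-out accuracy on synthetic embeddings: {accuracy:.2f}")
```

Because the classifier only ever sees fixed-length vectors, the same `predict` call would apply unchanged to any input that the encoder maps into the same embedding space, which is the property the abstract relies on for zero-shot image detection. In a real system, `fake_embedding` would be replaced by actual CLIP-ViT text or image embeddings, and the trainer by an off-the-shelf SVM or logistic regression.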
Authors: Albert Lu, Stephen Cranefield