When the Few Outweigh the Many: Illicit Content Recognition with Few-Shot Learning (2311.17026v1)
Abstract: The anonymity and untraceability of the dark web account for its exponentially increasing popularity and, to date, have made it a suitable breeding ground for many illicit activities. Hence, in collaboration with cybersecurity and law enforcement agencies, researchers have proposed approaches for recognizing and classifying illicit activities, most of which rely on the textual content of dark web markets; few use images originating from dark web content. This paper investigates that alternative: recognizing illegal activities from images. In particular, we investigate label-agnostic learning techniques such as One-Shot and Few-Shot learning, featuring the use of Siamese neural networks, a state-of-the-art approach in the field. Our solution handles small-scale datasets with promising accuracy: Siamese neural networks reach 90.9% accuracy in 20-shot experiments over a 10-class dataset. This leads us to conclude that such models are a promising and cheaper alternative for defining automated law-enforcement machinery over the dark web.
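The core mechanism behind Siamese few-shot classification is assigning a query image to the class whose few labelled examples it most resembles in a shared embedding space. The sketch below illustrates that mechanism under stated assumptions; it is not the paper's implementation.

The assumptions are:

- `embed` is a hypothetical stand-in for the trained shared encoder. It is an identity function on toy 2-D feature vectors here.
- Distances are Euclidean.
- Each class is scored by the mean distance from the query to its K support examples.
- The training objective shown is the margin-based contrastive loss, one common choice for Siamese networks. The paper may use a different loss.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(d: float, same_class: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one pair at embedding distance d.

    Same-class pairs are pulled together (loss d^2); different-class pairs
    are pushed apart until they are at least `margin` away
    (loss max(0, margin - d)^2).
    """
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def classify_few_shot(
    query: Vector,
    support: Dict[str, List[Vector]],
    embed: Callable[[Vector], Vector],
) -> str:
    """N-way K-shot prediction: return the label whose K support examples
    are, on average, closest to the query in the shared embedding space."""
    q = embed(query)  # both Siamese branches share the same `embed` weights
    best_label, best_dist = "", math.inf
    for label, shots in support.items():
        mean_dist = sum(euclidean(q, embed(s)) for s in shots) / len(shots)
        if mean_dist < best_dist:
            best_label, best_dist = label, mean_dist
    return best_label


if __name__ == "__main__":
    identity = lambda v: v  # hypothetical stand-in for a trained CNN encoder
    support = {
        "class_a": [[0.0, 0.1], [0.1, 0.0]],  # 2-way, 2-shot toy support set
        "class_b": [[1.0, 0.9], [0.9, 1.0]],
    }
    print(classify_few_shot([0.95, 0.95], support, identity))  # class_b
```

In the abstract's 20-shot, 10-class setting, `support` would hold 10 labels with 20 embedded images each. The model is never retrained per class, which is why only a few labelled examples per category are needed.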