Sound-skwatter (Did You Mean: Sound-squatter?) AI-powered Generator for Phishing Prevention (2310.07005v1)
Abstract: Sound-squatting is a phishing attack that tricks users into visiting malicious resources by exploiting similarities in the pronunciation of words. Proactive defense against sound-squatting candidates is complex, and existing solutions rely on manually curated lists of homophones. Here we introduce Sound-skwatter, a multi-language AI-based system that generates sound-squatting candidates for proactive defense. Sound-skwatter relies on an innovative multi-modal combination of Transformer networks and acoustic models to learn sound similarities. We show that Sound-skwatter can automatically list known homophones and thousands of high-quality candidates. In addition, it covers cross-language sound-squatting, i.e., when the reader and the listener speak different languages, and it supports any combination of languages. We apply Sound-skwatter to network-centric phishing via squatted domain names. We find that ~10% of the generated domains exist in the wild, the vast majority of them unknown to protection solutions. Next, we show attacks on the PyPI package manager, where ~17% of the popular packages have at least one existing candidate. We believe Sound-skwatter is a crucial asset for proactively mitigating the sound-squatting phenomenon on the Internet. To increase its impact, we publish an online demo and release our models and code as open source.
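To make the notion of a sound-squatting candidate concrete, the following minimal sketch illustrates the baseline that the abstract contrasts against: substitution from a manually curated homophone list. It is not the Sound-skwatter model, which learns sound similarities with Transformer networks and acoustic models. The table `HOMOPHONES` and the function `sound_squat_candidates` are hypothetical names introduced here for illustration only, and the table entries are assumptions rather than data from the paper.

```python
# Hypothetical, hand-curated sound-alike table (illustrative assumption,
# not taken from the paper or from any real homophone database).
HOMOPHONES = {
    "squ": ["skw"],
    "ph": ["f"],
    "c": ["k"],
    "for": ["four", "4"],
}


def sound_squat_candidates(name, table=HOMOPHONES):
    """Return sorted names built by replacing exactly one occurrence
    of one listed substring with one of its sound-alikes."""
    candidates = set()
    for src, alts in table.items():
        start = name.find(src)
        while start != -1:
            for alt in alts:
                # Splice the sound-alike in place of this occurrence only.
                candidates.add(name[:start] + alt + name[start + len(src):])
            start = name.find(src, start + 1)
    candidates.discard(name)  # never report the original name itself
    return sorted(candidates)


if __name__ == "__main__":
    print(sound_squat_candidates("squatter"))
    print(sound_squat_candidates("forum"))
```

A list-based generator of this kind can only produce what its curator anticipated, which is the limitation that motivates learning sound similarities automatically.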