PagPassGPT: Pattern Guided Password Guessing via Generative Pretrained Transformer (2404.04886v2)
Abstract: Amid the surge in deep learning-based password guessing models, two challenges persist: generating high-quality passwords and reducing duplicate passwords. To address them, we present PagPassGPT, a password guessing model built on the Generative Pretrained Transformer (GPT). It performs pattern guided guessing by incorporating pattern structure information as background knowledge, which significantly increases the hit rate. Furthermore, we propose D&C-GEN, which follows a divide-and-conquer approach to reduce the repeat rate of generated passwords. The primary task of guessing passwords is recursively divided into non-overlapping subtasks. Each subtask inherits the knowledge from its parent task and predicts the succeeding tokens. Compared with the state-of-the-art model, our scheme correctly guesses 12% more passwords while producing 25% fewer duplicates.
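The divide-and-conquer idea in the abstract can be illustrated with a minimal sketch. This is not the authors' D&C-GEN implementation: the toy next-character distribution, the function names `next_token_probs` and `dc_generate`, the fixed password length, and the rule of dropping subtasks whose share of the budget falls below one guess are all assumptions made for illustration. The sketch only shows why splitting a guessing budget across distinct prefixes yields non-overlapping subtasks, and therefore no duplicate guesses.

```python
from typing import Dict, List

# Hypothetical toy "model": a fixed next-character distribution.
# A real system would query a trained language model here.
TOY_PROBS: Dict[str, float] = {"a": 0.5, "b": 0.3, "1": 0.2}


def next_token_probs(prefix: str) -> Dict[str, float]:
    """Stand-in for a model's next-token distribution (assumption).

    This toy version ignores the prefix entirely.
    """
    return TOY_PROBS


def dc_generate(prefix: str, budget: float, length: int) -> List[str]:
    """Recursively split a guessing budget across disjoint prefix extensions.

    Each subtask inherits its parent's prefix and receives a share of the
    parent's budget proportional to the probability of its next token.
    """
    if budget < 1.0:
        # Too little budget left to justify even one guess (assumed rule).
        return []
    if len(prefix) == length:
        # A complete candidate: emit it exactly once.
        return [prefix]
    guesses: List[str] = []
    for token, prob in next_token_probs(prefix).items():
        # Subtasks differ in their prefix, so their outputs cannot overlap.
        guesses.extend(dc_generate(prefix + token, budget * prob, length))
    return guesses


guesses = dc_generate("", budget=10, length=2)
print(guesses)
```

Because every subtask owns a unique prefix, the resulting list contains no repeats by construction, and the number of guesses never exceeds the initial budget.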