A Matrix Factorization Based Network Embedding Method for DNS Analysis (2401.07410v2)
Published 15 Jan 2024 in cs.SI and cs.CR
Abstract: In this paper, I explore the potential of network embedding (a.k.a. graph representation learning) to characterize DNS entities in passive network traffic logs. I propose an MF-DNS-E (Matrix-Factorization-based DNS Embedding) method to represent DNS entities (e.g., domain names and IP addresses), where a random-walk-based matrix factorization objective is applied to learn the corresponding low-dimensional embeddings.
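To make the idea of a random-walk-based matrix factorization objective concrete, the following is a minimal sketch, not the paper's actual formulation, which the abstract does not spell out. It assumes the NetMF-style closed form of DeepWalk (Qiu et al., WSDM 2018), a truncated SVD as the factorization step, and a small undirected domain-IP graph with no isolated nodes. The function names `deepwalk_matrix` and `factorize`, the toy edge list, and the parameters `window` and `neg` are all hypothetical.

```python
import numpy as np


def deepwalk_matrix(adj, window=3, neg=1.0):
    """NetMF-style DeepWalk matrix (an assumed stand-in for the paper's objective).

    Computes log(max(1, vol / (neg * window) * sum_{r=1..window} P^r D^-1)),
    where P = D^-1 A is the random-walk transition matrix.
    Assumes every node has at least one edge (no zero degrees).
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    vol = deg.sum()
    d_inv = np.diag(1.0 / deg)
    trans = d_inv @ adj

    power = np.eye(adj.shape[0])
    walk_sum = np.zeros_like(adj)
    for _ in range(window):
        power = power @ trans  # P^r for r = 1..window
        walk_sum += power

    m = (vol / (neg * window)) * walk_sum @ d_inv
    return np.log(np.maximum(m, 1.0))


def factorize(matrix, dim):
    """Low-dimensional embeddings from a truncated SVD: U_d * sqrt(S_d)."""
    u, s, _ = np.linalg.svd(matrix)
    return u[:, :dim] * np.sqrt(s[:dim])


# Hypothetical toy graph: nodes 0-2 are domain names, nodes 3-4 are IP addresses;
# an edge means the domain was observed resolving to that IP.
edges = [(0, 3), (1, 3), (1, 4), (2, 4)]
n = 5
adj = np.zeros((n, n))
for dom, ip in edges:
    adj[dom, ip] = adj[ip, dom] = 1.0

emb = factorize(deepwalk_matrix(adj, window=3), dim=2)
print(emb.shape)  # one 2-dimensional vector per DNS entity
```

Each row of `emb` is the embedding of one DNS entity. The dense matrix powers used here only suit small graphs; real passive DNS logs would need a sparse or approximate variant.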