A Matrix Factorization Based Network Embedding Method for DNS Analysis (2401.07410v2)

Published 15 Jan 2024 in cs.SI and cs.CR

Abstract: In this paper, I explore the potential of network embedding (a.k.a. graph representation learning) to characterize DNS entities in passive network traffic logs. I propose MF-DNS-E (Matrix-Factorization-based DNS Embedding), a method that represents DNS entities (e.g., domain names and IP addresses) by applying a random-walk-based matrix factorization objective to learn their low-dimensional embeddings.
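The abstract does not give the exact objective. The sketch below assumes it is similar to NetMF (Qiu et al., WSDM 2018), which shows that DeepWalk-style random-walk embedding implicitly factorizes log(max(M, 1)) with M = vol(G)/(bT) · (P + P^2 + ... + P^T) · D^{-1}, where A is the adjacency matrix, D the degree matrix, P = D^{-1}A the random-walk transition matrix, T the window size, and b the number of negative samples. The input format (a list of (domain, IP) resolution pairs), the function names build_domain_ip_graph and netmf_embed, and the parameter defaults are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def build_domain_ip_graph(records):
    """Build a symmetric domain-IP adjacency matrix.

    Hypothetical input: a list of (domain, ip) pairs taken from DNS responses.
    Edge weights count how often each pair was observed (an assumption).
    """
    domains = sorted({d for d, _ in records})
    ips = sorted({ip for _, ip in records})
    d_idx = {d: i for i, d in enumerate(domains)}
    ip_idx = {ip: len(domains) + j for j, ip in enumerate(ips)}
    n = len(domains) + len(ips)
    A = np.zeros((n, n))
    for d, ip in records:
        i, j = d_idx[d], ip_idx[ip]
        A[i, j] += 1.0  # domain -> IP
        A[j, i] += 1.0  # IP -> domain (undirected graph)
    return domains + ips, A


def netmf_embed(A, dim=2, window=3, neg=1.0):
    """NetMF-style embedding for a small window T (dense, toy scale only)."""
    vol = A.sum()                   # vol(G): sum of all edge weights
    deg = A.sum(axis=1)             # node degrees (every node has degree >= 1 here)
    D_inv = np.diag(1.0 / deg)
    P = D_inv @ A                   # random-walk transition matrix D^{-1} A
    S = np.zeros_like(A)
    P_power = np.eye(A.shape[0])
    for _ in range(window):         # S = P + P^2 + ... + P^T
        P_power = P_power @ P
        S += P_power
    M = (vol / (neg * window)) * S @ D_inv
    log_M = np.log(np.maximum(M, 1.0))  # truncated elementwise log
    U, sigma, _ = np.linalg.svd(log_M)
    # Embedding = U_d * sqrt(Sigma_d), one row per node
    return U[:, :dim] * np.sqrt(sigma[:dim])
```

For example, with records = [("a.example", "10.0.0.1"), ("b.example", "10.0.0.1"), ("c.example", "10.0.0.2"), ("d.example", "10.0.0.2")], the two domains that resolve to the same IP receive nearly identical embeddings, while domains on different IPs end up roughly orthogonal. Dense matrices keep the sketch short; logs at realistic traffic scale would call for sparse matrices and a truncated solver.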
