AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation (2310.02655v1)
Abstract: Cyber Threat Intelligence (CTI) reporting is pivotal in contemporary risk management strategies. As the volume of CTI reports continues to surge, the demand for automated tools to streamline report generation becomes increasingly apparent. While Natural Language Processing techniques have shown potential in handling text data, they often struggle to address the complexity of diverse data sources and their intricate interrelationships. Moreover, established paradigms like STIX have emerged as de facto standards within the CTI community, emphasizing the formal categorization of entities and relations to facilitate consistent data sharing. In this paper, we introduce AGIR (Automatic Generation of Intelligence Reports), a Natural Language Generation tool specifically designed to address these challenges in CTI reporting. AGIR's primary objective is to empower security analysts by automating the labor-intensive task of generating comprehensive intelligence reports from formal representations of entity graphs. AGIR uses a two-stage pipeline that combines the advantages of template-based approaches with the capabilities of LLMs such as ChatGPT. We evaluate AGIR's report generation capabilities both quantitatively and qualitatively. The generated reports accurately convey the information expressed in the formal language, achieving a high recall value (0.99) without introducing hallucinations. Furthermore, we compare the fluency and utility of the reports with state-of-the-art approaches, showing that AGIR achieves higher Syntactic Log-Odds Ratio (SLOR) scores and higher questionnaire ratings. We estimate that using our tool reduces report writing time by more than 40%, thereby streamlining an organization's CTI production and contributing to the automation of several CTI tasks.
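The fluency metric named in the abstract, SLOR, is commonly defined as the sentence log-probability under a language model minus the sum of unigram log-probabilities of its tokens, normalized by sentence length. The following is a minimal sketch of that definition, not the authors' evaluation code. The function names (`unigram_logprobs`, `slor`), the toy corpus, and the assumption that per-token language-model log-probabilities are already available are all illustrative.

```python
import math
from collections import Counter


def unigram_logprobs(corpus_tokens):
    """Estimate unigram log-probabilities from a reference corpus (hypothetical helper)."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {tok: math.log(c / total) for tok, c in counts.items()}


def slor(tokens, token_logprobs, unigram_lp):
    """SLOR = (log p_LM(S) - sum of unigram log p(t)) / |S|.

    tokens:         tokens of the sentence S
    token_logprobs: log-probability of each token under some language model
                    (assumed to be computed elsewhere)
    unigram_lp:     mapping token -> unigram log-probability; this sketch
                    assumes every token appears in it (no smoothing)
    """
    if not tokens or len(tokens) != len(token_logprobs):
        raise ValueError("tokens and token_logprobs must be non-empty and aligned")
    lm_logprob = sum(token_logprobs)
    unigram_logprob = sum(unigram_lp[t] for t in tokens)
    return (lm_logprob - unigram_logprob) / len(tokens)


# Toy usage: a four-token corpus and a two-token sentence.
toy_unigrams = unigram_logprobs(["a", "a", "b", "c"])
toy_score = slor(["a", "b"], [math.log(0.5), math.log(0.5)], toy_unigrams)
```

Subtracting the unigram term keeps sentences from being penalized merely for containing rare words (such as malware or threat-actor names), which is one reason a length- and frequency-normalized score is a natural fit for CTI text.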