AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation (2310.02655v1)

Published 4 Oct 2023 in cs.CR and cs.CL

Abstract: Cyber Threat Intelligence (CTI) reporting is pivotal in contemporary risk management strategies. As the volume of CTI reports continues to surge, the demand for automated tools to streamline report generation becomes increasingly apparent. While Natural Language Processing techniques have shown potential in handling text data, they often struggle to address the complexity of diverse data sources and their intricate interrelationships. Moreover, established paradigms like STIX have emerged as de facto standards within the CTI community, emphasizing the formal categorization of entities and relations to facilitate consistent data sharing. In this paper, we introduce AGIR (Automatic Generation of Intelligence Reports), a transformative Natural Language Generation tool specifically designed to address the pressing challenges in the realm of CTI reporting. AGIR's primary objective is to empower security analysts by automating the labor-intensive task of generating comprehensive intelligence reports from formal representations of entity graphs. AGIR utilizes a two-stage pipeline by combining the advantages of template-based approaches and the capabilities of LLMs such as ChatGPT. We evaluate AGIR's report generation capabilities both quantitatively and qualitatively. The generated reports accurately convey information expressed through formal language, achieving a high recall value (0.99) without introducing hallucination. Furthermore, we compare the fluency and utility of the reports with state-of-the-art approaches, showing how AGIR achieves higher scores in terms of Syntactic Log-Odds Ratio (SLOR) and through questionnaires. By using our tool, we estimate that the report writing time is reduced by more than 40%, therefore streamlining the CTI production of any organization and contributing to the automation of several CTI tasks.

Automating Cyber Threat Intelligence Reporting with AGIR

Introduction to AGIR

In the rapidly evolving domain of Cyber Threat Intelligence (CTI), the need for efficient, automated solutions for generating comprehensive reports is increasingly critical. The paper introduces AGIR (Automatic Generation of Intelligence Reports), a novel tool in the cybersecurity field designed to streamline the complex process of producing CTI reports. By leveraging a combination of template-based approaches and the capabilities of LLMs such as ChatGPT, AGIR aims to significantly reduce the time and effort required to generate detailed intelligence reports from structured data.

Challenges in CTI Reporting

CTI plays a pivotal role in contemporary risk management strategies, yet the sheer volume and complexity of data involved pose significant challenges. Traditional manual report generation is not only time-consuming but also prone to inconsistencies due to the varied nature of the data sources involved. Moreover, the adoption of standards like STIX for formal data representation highlights the need for tools that can interpret and transform structured data into insightful, fluent reports.

AGIR's Approach

AGIR addresses these challenges with a two-stage pipeline: the first stage uses a template-based module to generate an initial report from the structured input, and the second uses an LLM to improve the report's fluency and utility. The pipeline automatically produces four types of reports (Overview, Subject, Timeline, and Vulnerability), each catering to specific analytical needs. AGIR thus combines the precision and structured-input handling of template-based methods with the generative capabilities of LLMs, aiming for reports that are both accurate and readable.
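The two-stage idea can be illustrated with a minimal Python sketch. This is not AGIR's implementation: the `Relation` class, the `TEMPLATES` table, the prompt wording, and the `rewrite` callable are all hypothetical names chosen for illustration, and the input is a simplified stand-in for STIX relationships rather than real STIX objects. The second stage is passed in as a function so that any LLM client could be plugged in; none is called here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Relation:
    """Simplified stand-in for a STIX relationship (hypothetical, not real STIX)."""
    source: str
    relationship_type: str
    target: str


# Hypothetical sentence templates keyed by relationship type.
TEMPLATES = {
    "uses": "{source} uses {target}.",
    "targets": "{source} targets {target}.",
    "exploits": "{source} exploits {target}.",
}
FALLBACK_TEMPLATE = "{source} is related to {target} ({relationship_type})."


def render_draft(relations: List[Relation]) -> str:
    """Stage 1: deterministic template-based draft, one sentence per relation."""
    sentences = []
    for rel in relations:
        template = TEMPLATES.get(rel.relationship_type, FALLBACK_TEMPLATE)
        sentences.append(
            template.format(
                source=rel.source,
                relationship_type=rel.relationship_type,
                target=rel.target,
            )
        )
    return " ".join(sentences)


def build_rewrite_prompt(draft: str) -> str:
    """Stage 2 input: ask the model to improve fluency without changing facts."""
    instruction = (
        "Rewrite the following threat intelligence draft as fluent prose. "
        "Do not add, remove, or change any facts."
    )
    return instruction + "\n\n" + draft


def generate_report(relations: List[Relation], rewrite: Callable[[str], str]) -> str:
    """Run both stages; `rewrite` is any function that maps a prompt to text."""
    draft = render_draft(relations)
    return rewrite(build_rewrite_prompt(draft))


if __name__ == "__main__":
    example = [
        Relation("APT-X", "uses", "ExampleRAT"),
        Relation("APT-X", "targets", "Energy sector"),
    ]
    # Placeholder for an LLM call: returns the draft part of the prompt unchanged.
    echo_rewrite = lambda prompt: prompt.split("\n\n", 1)[1]
    print(generate_report(example, echo_rewrite))
```

Keeping the first stage deterministic means every input relation yields exactly one sentence in the draft, so the LLM only has to rephrase text that already contains all the facts.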

Evaluation of AGIR

AGIR is evaluated both quantitatively and qualitatively. The system achieves a recall of 0.99, meaning that almost all information in the input data is conveyed in the output reports, without introducing hallucination. AGIR's reports also score higher in fluency and utility than state-of-the-art approaches, as measured by Syntactic Log-Odds Ratio (SLOR) scores and by questionnaire feedback from experienced cyber threat analysts. The authors estimate that the tool reduces report writing time by more than 40%.
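For readers unfamiliar with the two quantitative metrics, the sketch below shows how they are commonly computed. SLOR normalizes a sentence's language-model log-probability by its unigram log-probability and its length: SLOR(S) = (ln p_M(S) - ln p_u(S)) / |S|. The function names and inputs are assumptions for illustration: the per-token log-probabilities are taken as given rather than obtained from any particular model, and `entity_recall` is a simple string-matching proxy, not necessarily the procedure used in the paper.

```python
from typing import Iterable, Sequence


def slor(
    model_token_logprobs: Sequence[float],
    unigram_token_logprobs: Sequence[float],
) -> float:
    """SLOR = (ln p_M(S) - ln p_u(S)) / |S|.

    Both inputs hold one natural-log probability per token of the same sentence:
    the first from a language model, the second from unigram frequencies.
    """
    if not model_token_logprobs:
        raise ValueError("sentence must contain at least one token")
    if len(model_token_logprobs) != len(unigram_token_logprobs):
        raise ValueError("both sequences must cover the same tokens")
    log_p_model = sum(model_token_logprobs)
    log_p_unigram = sum(unigram_token_logprobs)
    return (log_p_model - log_p_unigram) / len(model_token_logprobs)


def entity_recall(expected_entities: Iterable[str], report: str) -> float:
    """Fraction of expected entity names that appear in the report text.

    Case-insensitive substring matching; a simplified proxy for recall.
    """
    expected = {entity.lower() for entity in expected_entities}
    if not expected:
        raise ValueError("expected_entities must not be empty")
    text = report.lower()
    found = sum(1 for entity in expected if entity in text)
    return found / len(expected)


if __name__ == "__main__":
    # Made-up log-probabilities for a three-token sentence.
    print(slor([-2.0, -1.0, -3.0], [-4.0, -3.0, -5.0]))
    print(entity_recall(["APT-X", "ExampleRAT"], "APT-X uses ExampleRAT."))
```

Subtracting the unigram term keeps sentences that contain rare words, such as malware or threat-actor names, from being penalized as disfluent purely because of their vocabulary.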

Theoretical and Practical Implications

From a theoretical standpoint, AGIR contributes to the ongoing discourse on the application of NLG in cybersecurity, particularly in the area of CTI reporting. Practically, its ability to automate the generation of detailed, fluent reports from structured data presents a significant advancement for security analysts. By reducing the manual effort required in report creation, AGIR allows analysts to dedicate more time to strategic analysis and response activities, thereby enhancing the overall effectiveness of CTI practices.

Future Prospects

Looking ahead, AGIR opens several avenues for future research. Extending the pipeline to support additional report types and formats would broaden its applicability. Moreover, training a dedicated LLM on an extensive database could address the limitations of relying on third-party LLMs, including cost and privacy concerns.

In conclusion, AGIR is a substantial step toward automating CTI reporting and offers a practical solution to the challenges security analysts face. It combines the strengths of template-based approaches with the capabilities of LLMs to generate cybersecurity intelligence reports more efficiently.

Authors
  1. Filippo Perrina
  2. Francesco Marchiori
  3. Mauro Conti
  4. Nino Vincenzo Verde