Generating fake Cyber Threat Intelligence (CTI) with transformer-based models is a critical research area because artificially created CTI data can corrupt automated cyber-defense systems. The core idea is to use large language models (LLMs) such as GPT-2 to produce believable yet false CTI text that mimics authentic threat reports.
In the paper "Generating Fake Cyber Threat Intelligence Using Transformer-Based Models" (Ranade et al., 2021 ), the authors illustrate how a public LLM like GPT-2, when fine-tuned, can generate plausible CTI text from initial prompts. This fake CTI can be used to strategically perform data poisoning attacks on cyber defense systems like Cybersecurity Knowledge Graphs (CKGs) and cybersecurity corpora. The generated counterfeit data has shown to induce several adverse effects including incorrect reasoning outputs, representation poisoning, and corruption of dependent AI-based cyber defense systems.
The authors evaluated the generated text using traditional metrics and a human evaluation study involving cybersecurity professionals and threat hunters. Notably, professional threat hunters were, at times, unable to distinguish between true and fake CTI generated by the model, highlighting the sophistication and believability of the synthetic data.
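One common automatic measure of text fluency is perplexity under a language model; the sketch below is an assumption about what such a "traditional metric" check could look like, not the paper's exact evaluation code, and the sample sentences are made up for illustration.

```python
# Minimal sketch: score text fluency via perplexity under GPT-2.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the model (lower means more fluent)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels equal to the inputs, the model returns the mean
        # cross-entropy loss over the predicted tokens.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

real_cti = "The malware establishes persistence via a scheduled task."   # illustrative
fake_cti = "The implant exfiltrates credentials over an encrypted DNS channel."  # illustrative
print(perplexity(real_cti), perplexity(fake_cti))
```

Automatic scores like this capture fluency but not factual accuracy, which is why the human study with threat hunters is the more telling result.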
Furthermore, this research underlines significant implications for cybersecurity frameworks that rely on the automated ingestion of Open-Source Intelligence (OSINT) to populate CTI repositories. Adversaries could exploit this vulnerability by injecting falsified intelligence to subvert the learning mechanisms of cyber-defense systems, impairing their ability to accurately detect and respond to threats.
While transformer-based models are a powerful tool for automating natural language processing tasks, their potential misuse in generating fake CTI signals the need for enhanced verification mechanisms and robust security measures within these systems. Such measures help ensure the integrity of intelligence ingested from external sources and mitigate the risks associated with adversarial data poisoning.
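As a minimal sketch of what such verification might look like, the hypothetical pipeline below gates knowledge-graph ingestion on a source trust check. The `trusted_sources` allow-list, `CtiReport` type, and `extract_triples` helper are illustrative assumptions, not part of the paper.

```python
# Minimal sketch: a hypothetical OSINT ingestion pipeline that checks source
# provenance before adding extracted triples to a cybersecurity knowledge graph.
from dataclasses import dataclass

@dataclass
class CtiReport:
    source: str  # e.g., vendor advisory, CERT bulletin, paste site
    text: str

trusted_sources = {"vendor-advisory", "cert-bulletin"}  # hypothetical allow-list

def extract_triples(text: str):
    """Placeholder for an entity/relation extractor feeding the CKG."""
    # A real pipeline would run NER and relation extraction over the report text.
    return [("ExampleMalware", "uses", "command and scripting interpreter")]

def ingest(report: CtiReport, graph: list):
    # Gate ingestion on source trust; unverified OSINT is quarantined for
    # review instead of silently poisoning the knowledge graph.
    if report.source not in trusted_sources:
        print(f"Quarantined report from untrusted source: {report.source}")
        return
    graph.extend(extract_triples(report.text))

ckg = []
ingest(CtiReport(source="paste-site", text="Possibly fake CTI text..."), ckg)
ingest(CtiReport(source="vendor-advisory", text="Genuine advisory text..."), ckg)
print(ckg)
```

A simple allow-list like this is only a first line of defense; given how convincing generated CTI can be, provenance tracking and cross-source corroboration are likely needed alongside content-level checks.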