
Recognizing and Extracting Cybersecurity-relevant Entities from Text (2208.01693v1)

Published 2 Aug 2022 in cs.CL, cs.AI, and cs.CR

Abstract: Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources, which we are using to train and test cybersecurity entity models with the spaCy framework, and we are exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools and create methods for continuous integration of new information extracted from text.

Authors (5)
  1. Casey Hanks (1 paper)
  2. Michael Maiden (1 paper)
  3. Priyanka Ranade (6 papers)
  4. Tim Finin (25 papers)
  5. Anupam Joshi (23 papers)
Citations (3)