Papers
Topics
Authors
Recent
2000 character limit reached

Label Unification for Cross-Dataset Generalization in Cybersecurity NER (2507.13870v1)

Published 18 Jul 2025 in cs.CL

Abstract: The field of cybersecurity NER lacks standardized labels, making it challenging to combine datasets. We investigate label unification across four cybersecurity datasets to increase data resource usability. We perform a coarse-grained label unification and conduct pairwise cross-dataset evaluations using BiLSTM models. Qualitative analysis of predictions reveals errors, limitations, and dataset differences. To address unification limitations, we propose alternative architectures including a multihead model and a graph-based transfer model. Results show that models trained on unified datasets generalize poorly across datasets. The multihead model with weight sharing provides only marginal improvements over unified training, while our graph-based transfer model built on BERT-base-NER shows no significant performance gains compared BERT-base-NER.

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.