Towards a better labeling process for network security datasets

Published 2 May 2023 in cs.CR | (2305.01337v1)

Abstract: Most network security datasets lack comprehensive label assignment criteria, which hinders evaluating the datasets, training models, assessing the results obtained, comparing with other methods, and evaluating in real-life scenarios. There is neither a labeling ontology nor tooling to help assign labels, so most of the analyzed datasets encode labels in file or directory names. This paper works toward a better labeling process by (i) reviewing the needs of dataset stakeholders, from creators to model users, (ii) presenting a new ontology of label assignment, (iii) presenting a new tool that assigns structured labels to Zeek network flows based on the ontology, and (iv) studying the differences between generating labels and consuming labels in real-life scenarios. We conclude that a process for structured label assignment is paramount for advancing research in network security and that the new ontology-based label assignment rules should be published as an artifact of every dataset.
