Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
88 tokens/sec
Gemini 2.5 Pro Premium
45 tokens/sec
GPT-5 Medium
37 tokens/sec
GPT-5 High Premium
24 tokens/sec
GPT-4o
91 tokens/sec
DeepSeek R1 via Azure Premium
91 tokens/sec
GPT OSS 120B via Groq Premium
466 tokens/sec
Kimi K2 via Groq Premium
103 tokens/sec
2000 character limit reached

Malware Classification Leveraging NLP & Machine Learning for Enhanced Accuracy (2506.16224v1)

Published 19 Jun 2025 in cs.CR and cs.LG

Abstract: This paper investigates the application of NLP-based n-gram analysis and machine learning techniques to enhance malware classification. We explore how NLP can be used to extract and analyze textual features from malware samples through n-grams, contiguous string or API call sequences. This approach effectively captures distinctive linguistic patterns among malware and benign families, enabling finer-grained classification. We delve into n-gram size selection, feature representation, and classification algorithms. While evaluating our proposed method on real-world malware samples, we observe significantly improved accuracy compared to the traditional methods. By implementing our n-gram approach, we achieved an accuracy of 99.02% across various machine learning algorithms by using hybrid feature selection technique to address high dimensionality. Hybrid feature selection technique reduces the feature set to only 1.6% of the original features.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.