
Unveiling A Hidden Risk: Exposing Educational but Malicious Repositories in GitHub

Published 7 Mar 2024 in cs.SE and cs.CR | (2403.04419v1)

Abstract: Are malicious repositories hiding under the educational label in GitHub? Recent studies have identified collections of GitHub repositories hosting malware source code, with notable collaboration among the developers. Because GitHub's open-source nature provides easy access to malicious software code and artifacts, its repositories deserve close analysis. Here we leverage the capabilities of ChatGPT in a qualitative study to annotate educational GitHub repositories based on the maliciousness of their metadata contents. Our contribution is twofold. First, we demonstrate how ChatGPT can be employed to understand and annotate the content published in software repositories. Second, we provide evidence of a hidden risk in educational repositories, which create opportunities for potential threats and malicious intent. We carry out a systematic study on a collection of 35.2K GitHub repositories claimed to be created for educational purposes only. First, our study finds an increasing trend in the number of such repositories published every year. Second, ChatGPT labels 9294 of them as malicious, and further categorization of the malicious ones detects 14 different malware families, including DDoS, keylogger, and ransomware. Overall, this exploratory study serves as a wake-up call for the community to better understand and analyze software platforms.
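The abstract does not specify the selection criteria, the prompt, or the label schema used with ChatGPT, so the following is only a hypothetical Python sketch of what such a metadata-annotation step could look like. Everything in it is an assumption for illustration: the "educational purposes only" regex (`EDU_PATTERN`), the `RepoMetadata` fields, the JSON reply format expected by `parse_label`, and the 2000-character README cutoff. No model API is called; a canned reply stands in for the model's output.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical disclaimer pattern; the paper's actual filter is not given in the abstract.
EDU_PATTERN = re.compile(r"educational purposes? only", re.IGNORECASE)


@dataclass
class RepoMetadata:
    """Hypothetical subset of repository metadata passed to the annotator."""
    name: str
    description: str
    readme: str


def claims_educational(repo: RepoMetadata) -> bool:
    """Return True if the description or README carries an educational-only disclaimer."""
    return bool(EDU_PATTERN.search(repo.description) or EDU_PATTERN.search(repo.readme))


def build_prompt(repo: RepoMetadata) -> str:
    """Assemble an illustrative annotation prompt; the README is truncated to 2000 chars."""
    return (
        "Based on the following GitHub repository metadata, answer with a JSON object "
        '{"label": "malicious" or "benign", "family": <malware family or null>}.\n'
        f"Name: {repo.name}\nDescription: {repo.description}\nREADME:\n{repo.readme[:2000]}"
    )


def parse_label(response: str) -> Tuple[str, Optional[str]]:
    """Parse the model's JSON reply; return ('unknown', None) for malformed output."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return "unknown", None
    if not isinstance(data, dict):
        return "unknown", None
    label = data.get("label")
    if label not in ("malicious", "benign"):
        return "unknown", None
    return label, data.get("family")


if __name__ == "__main__":
    repo = RepoMetadata(
        name="example-keylogger",
        description="Keylogger written in Python. For educational purposes only!",
        readme="Do not misuse.",
    )
    if claims_educational(repo):
        prompt = build_prompt(repo)
        # Send `prompt` to a chat model of your choice; a canned reply stands in here.
        reply = '{"label": "malicious", "family": "keylogger"}'
        print(parse_label(reply))  # ('malicious', 'keylogger')
```

Asking for a small JSON object and validating it defensively keeps free-form model output from silently producing labels outside the expected set; unparseable replies are counted as `unknown` rather than guessed.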

