An Empirical Study of C++ Vulnerabilities in Crowd-Sourced Code Examples (1910.01321v2)

Published 3 Oct 2019 in cs.SE

Abstract: Software developers share programming solutions in Q&A sites like Stack Overflow. The reuse of crowd-sourced code snippets can facilitate rapid prototyping. However, recent research shows that the shared code snippets may be of low quality and can even contain vulnerabilities. This paper aims to understand the nature and the prevalence of security vulnerabilities in crowd-sourced code examples. To achieve this goal, we investigate security vulnerabilities in the C++ code snippets shared on Stack Overflow over a period of 10 years. In collaborative sessions involving multiple human coders, we manually assessed each code snippet for security vulnerabilities following CWE (Common Weakness Enumeration) guidelines. From the 72,483 reviewed code snippets used in at least one project hosted on GitHub, we found a total of 69 vulnerable code snippets categorized into 29 types. Many of the investigated code snippets are still not corrected on Stack Overflow. The 69 vulnerable code snippets found in Stack Overflow were reused in a total of 2859 GitHub projects. To help improve the quality of code snippets shared on Stack Overflow, we developed a browser extension that allows Stack Overflow users to check for vulnerabilities in code snippets when they upload them to the platform.



Summary

The paper reports an empirical study of 72,483 C++ code snippets from Stack Overflow that were reused in GitHub projects. It examines the security vulnerabilities these snippets contain and how those vulnerabilities propagate.
The study identified 99 vulnerable snippets across 31 CWE types. The most common issues were bad coding practices, improper checks for exceptional conditions, and improper input validation.
These vulnerabilities pose security risks to software projects reusing Stack Overflow code, highlighting the need for tools like the proposed browser extension to notify developers.


The paper, "An Empirical Study of C++ Vulnerabilities in Crowd-Sourced Code Examples," investigates how common vulnerabilities are in C++ code snippets shared on Stack Overflow over a decade, and how those vulnerabilities spread. The authors manually reviewed 72,483 C++ code snippets, each of which was reused in at least one GitHub project.

Key Findings

The analysis identified 99 vulnerable code snippets spanning 31 distinct vulnerability types in the Common Weakness Enumeration (CWE) classification. Many of these snippets remain uncorrected on Stack Overflow and have been reused in 2,859 GitHub projects. The most frequent types are CWE-1006 (Bad Coding Practices), CWE-754 (Improper Check for Unusual or Exceptional Conditions), and CWE-20 (Improper Input Validation). Together they point to missing validation of inputs and return values and to a disregard of secure coding guidelines.
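To make the CWE-20 and CWE-754 categories concrete, here is a hypothetical C++ illustration. It is not a snippet taken from the study. A pattern often seen in Q&A answers converts user input with `std::atoi` and uses the result as an index without checking it. `std::atoi` returns 0 for non-numeric input and has undefined behavior when the value is out of range, so the caller cannot tell a valid "0" from a failed conversion. The sketch below assumes C++17 and a hypothetical helper, `parse_index`. It validates both the conversion and the range before the value is used.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Unsafe pattern (CWE-20 / CWE-754), shown for contrast only:
//   int idx = std::atoi(user_input);  // 0 on garbage, UB on overflow
//   buffer[idx] = value;              // no bounds check
//
// Hypothetical safer helper: returns an index only if `text` is entirely a
// non-negative decimal number that is strictly less than `size`.
std::optional<std::size_t> parse_index(std::string_view text, std::size_t size) {
    std::size_t value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;  // not a number, overflow, or trailing characters
    }
    if (value >= size) {
        return std::nullopt;  // would be out of bounds for the target buffer
    }
    return value;
}
```

For example, `parse_index("4x", 10)` and `parse_index("-1", 10)` both yield an empty optional. With `std::atoi`, the first would silently become 4 and the second would become -1.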

Methodology

The methodology comprised multiple stages of manual inspection by reviewers with expertise in software security. The study used the SOTorrent dataset to trace the migration of C++ code snippets from Stack Overflow to GitHub repositories. It employed Syntaxnet to identify valid C++ code snippets and SourcererCC to detect duplicated code. The clone detection targeted Type-1 clones: fragments that match exactly apart from whitespace, layout, and comments.
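A Type-1 clone check can be approximated by comparing normalized text. This is not SourcererCC's algorithm, which indexes token bags and can also find near-miss clones. It is a simpler substitute built on hypothetical helpers, `normalize_snippet` and `is_type1_clone`, that drop comments and all whitespace before an exact comparison. The approach has two known simplifications. String and character literals are not treated specially. Removing all whitespace can also merge tokens, so `y - -z` and `y--z` would collide. A production tool would compare token streams instead.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: canonical form for Type-1 matching.
// Removes // line comments, /* block */ comments, and every whitespace character.
std::string normalize_snippet(const std::string& src) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src.compare(i, 2, "//") == 0) {
            // Line comment: skip to the newline (the newline itself is whitespace).
            while (i < src.size() && src[i] != '\n') ++i;
        } else if (src.compare(i, 2, "/*") == 0) {
            // Block comment: skip past the closing "*/", or to the end if unterminated.
            std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? src.size() : end + 2;
        } else {
            unsigned char c = static_cast<unsigned char>(src[i]);
            if (!std::isspace(c)) out.push_back(src[i]);
            ++i;
        }
    }
    return out;
}

// Hypothetical helper: two snippets count as Type-1 clones when their
// normalized forms are identical.
bool is_type1_clone(const std::string& a, const std::string& b) {
    return normalize_snippet(a) == normalize_snippet(b);
}
```

Under this sketch, `"int x = 1; // set x\n"` and `"int  x=1;"` are reported as clones. `"int x = 1;"` and `"int x = 2;"` are not.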

Implications

These findings matter in practice because reused snippets are adopted across many software projects. A vulnerability in a code example shared on Stack Overflow becomes a security risk in every real-world project that incorporates it. This calls for tools and practices that make the adoption of crowd-sourced code safer.

The authors propose a browser extension to notify developers of potential vulnerabilities directly when visiting Stack Overflow. This tool is designed to operate across multiple programming languages, leveraging a REST-based architecture to enable real-time security alerts based on an actively updated database of known vulnerabilities.
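The lookup side of such a REST-based design might resemble the sketch below. Everything in it is an assumption for illustration and not the authors' implementation. That includes the `Finding` record, the `VulnerabilityDb` class, and the `GET /posts/{id}/findings` route. The HTTP layer is omitted. Only the handler logic is shown: it maps a Stack Overflow post ID to a JSON array of known CWE findings.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one known weakness in a Stack Overflow post.
struct Finding {
    int cwe_id;        // numeric CWE identifier, e.g. 20 for CWE-20
    std::string note;  // short explanation; assumed to contain no quotes or backslashes
};

// Hypothetical server-side store: post ID -> known findings. In a real
// service this would be a database kept up to date by reviewers.
class VulnerabilityDb {
public:
    void add(std::int64_t post_id, Finding f) {
        db_[post_id].push_back(std::move(f));
    }

    // Body of a hypothetical GET /posts/{id}/findings handler. Returns a JSON
    // array, empty ("[]") when the post has no known issues. No JSON escaping
    // is done, per the assumption on Finding::note.
    std::string findings_json(std::int64_t post_id) const {
        std::string body = "[";
        auto it = db_.find(post_id);
        if (it != db_.end()) {
            for (std::size_t i = 0; i < it->second.size(); ++i) {
                const Finding& f = it->second[i];
                if (i > 0) body += ",";
                body += "{\"cwe\":\"CWE-" + std::to_string(f.cwe_id) +
                        "\",\"note\":\"" + f.note + "\"}";
            }
        }
        return body + "]";
    }

private:
    std::map<std::int64_t, std::vector<Finding>> db_;
};
```

In this design, the extension would request the endpoint for each post displayed on a page and show a warning whenever the returned array is non-empty.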

Research Contributions

The paper provides empirical evidence that vulnerabilities migrate between two online platforms widely used by developers. Unlike prior studies, it extends this kind of analysis to C++, a language prominent in mission-critical and resource-constrained domains. It also takes a longitudinal view, examining code snippets posted over a decade and their reuse in GitHub repositories.

Speculation on Future Developments

This work could serve as a foundation for automated tools that detect vulnerabilities in shared code. Future studies could replicate the methodology for other programming languages and collaborative platforms. Doing so would build a more complete picture of how vulnerabilities propagate and of the systemic risks of crowd-sourced code.

Overall, the paper contributes to the discussion of code security in collaborative software development, where software engineering practice meets secure coding standards. Its insights can inform both further research and practical security improvements in development processes.
