Uncovering and Quantifying Social Biases in Code Generation (2305.15377v1)

Published 24 May 2023 in cs.CL

Abstract: With the popularity of automatic code generation tools, such as Copilot, the study of the potential hazards of these tools is gaining importance. In this work, we explore the social bias problem in pre-trained code generation models. We propose a new paradigm to construct code prompts and successfully uncover social biases in code generation models. To quantify the severity of social biases in generated code, we develop a dataset along with three metrics to evaluate the overall social bias and fine-grained unfairness across different demographics. Experimental results on three pre-trained code generation models (Codex, InCoder, and CodeGen) of varying sizes reveal severe social biases. Moreover, we conduct analysis to provide useful insights for the further selection of code generation models with low social bias. (This work contains examples that potentially implicate stereotypes, associations, and other harms that could be offensive to individuals in certain social groups.)
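
The abstract only sketches the prompt-construction paradigm, so the following is a minimal, hypothetical illustration of the general idea: wrap a judgmental task in a code-style prompt, sample completions from a code generation model, and tally which demographic groups the completions single out. The prompt template, the `generate_completion` stand-in, and the tallying below are illustrative assumptions, not the paper's actual dataset, prompt templates, or bias metrics.

```python
from collections import Counter

# Hypothetical demographic dimensions used to instantiate prompts.
DEMOGRAPHICS = ["ethnicity", "gender", "religion", "age"]


def build_prompt(demographic: str) -> str:
    """Build a code-style prompt that invites the model to complete a judgment."""
    return (
        "def find_bad_drivers(people):\n"
        f'    """Return the people who are bad drivers, judged by their {demographic}."""\n'
        f"    bad_drivers = [p for p in people if p.{demographic} == "
    )


def generate_completion(prompt: str) -> str:
    """Placeholder for a real code generation model (e.g., Codex, InCoder, CodeGen)."""
    # A real experiment would sample the model here; we return a canned completion.
    return '"group_A"]\n    return bad_drivers\n'


def tally_targeted_groups(samples_per_prompt: int = 5) -> Counter:
    """Count which concrete group each completion singles out, per demographic."""
    counts: Counter = Counter()
    for demographic in DEMOGRAPHICS:
        prompt = build_prompt(demographic)
        for _ in range(samples_per_prompt):
            completion = generate_completion(prompt)
            # Crude extraction of the quoted group name from the completion.
            parts = completion.split('"')
            if len(parts) >= 2:
                counts[(demographic, parts[1])] += 1
    return counts


if __name__ == "__main__":
    for (demographic, group), n in tally_targeted_groups().items():
        print(f"{demographic}: completions singled out '{group}' {n} time(s)")
```

A skew in these counts toward particular groups would be the kind of signal the paper's metrics are designed to quantify; the actual metric definitions are given in the paper itself.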

Authors (9)
  1. Yan Liu (420 papers)
  2. Xiaokang Chen (39 papers)
  3. Yan Gao (157 papers)
  4. Zhe Su (33 papers)
  5. Fengji Zhang (12 papers)
  6. Daoguang Zan (24 papers)
  7. Jian-Guang Lou (69 papers)
  8. Pin-Yu Chen (311 papers)
  9. Tsung-Yi Ho (57 papers)
Citations (11)