Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
51 tokens/sec
GPT-4o
60 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
8 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Adding guardrails to advanced chatbots (2306.07500v1)

Published 13 Jun 2023 in cs.CY, cs.AI, and cs.CL

Abstract: Generative AI models continue to become more powerful. The launch of ChatGPT in November 2022 has ushered in a new era of AI. ChatGPT and other similar chatbots have a range of capabilities, from answering student homework questions to creating music and art. There are already concerns that humans may be replaced by chatbots for a variety of jobs. Because of the wide spectrum of data chatbots are built on, we know that they will have human errors and human biases built into them. These biases may cause significant harm and/or inequity toward different subpopulations. To understand the strengths and weakness of chatbot responses, we present a position paper that explores different use cases of ChatGPT to determine the types of questions that are answered fairly and the types that still need improvement. We find that ChatGPT is a fair search engine for the tasks we tested; however, it has biases on both text generation and code generation. We find that ChatGPT is very sensitive to changes in the prompt, where small changes lead to different levels of fairness. This suggests that we need to immediately implement "corrections" or mitigation strategies in order to improve fairness of these systems. We suggest different strategies to improve chatbots and also advocate for an impartial review panel that has access to the model parameters to measure the levels of different types of biases and then recommends safeguards that move toward responses that are less discriminatory and more accurate.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (46)
  1. N. Grant and C. Metz, “A new chat bot is a ’code red’ for google’s search business,” December 2022. [Online]. Available: https://www.nytimes.com/2022/12/21/technology/ai-chatgpt-google-search.html
  2. J. Briggs and D. Kodnani, “Generative ai could raise global gdp by 7%,” April 2023.
  3. F. of Life Institute, “Pause giant ai experiments: An open letter,” March 2023.
  4. S. Schechner, “Chatgpt and advanced ai face new regulatory push in europe,” April 2023. [Online]. Available: https://www.wsj.com/articles/chatgpt-and-advanced-ai-face-new-regulatory-push-in-europe-16e3282c
  5. D. Shepardson and D. Bartz, “Us begins study of possible rules to regulate ai like chatgpt,” April 2023. [Online]. Available: https://www.reuters.com/technology/us-begins-study-possible-rules-regulate-ai-like-chatgpt-2023-04-11/
  6. T. Strohman, D. Metzler, H. Turtle, and W. B. Croft, “Indri: A language model-based search engine for complex queries,” in Proceedings of the international conference on intelligent analysis, vol. 2, no. 6.   Washington, DC., 2005, pp. 2–6.
  7. D. Metzler and W. B. Croft, “Combining the language model and inference network approaches to retrieval,” Information processing & management, vol. 40, no. 5, pp. 735–750, 2004.
  8. E. Clark, A. S. Ross, C. Tan, Y. Ji, and N. A. Smith, “Creative writing with a machine in the loop: Case studies on slogans and stories,” in 23rd International Conference on Intelligent User Interfaces, 2018, pp. 329–340.
  9. K. Elkins and J. Chun, “Can gpt-3 pass a writer’s turing test?” Journal of Cultural Analytics, vol. 5, no. 2, 2020.
  10. N. Akoury, S. Wang, J. Whiting, S. Hood, N. Peng, and M. Iyyer, “Storium: A dataset and evaluation platform for machine-in-the-loop story generation,” arXiv preprint arXiv:2010.01717, 2020.
  11. S. Toshniwal, A. Kannan, C.-C. Chiu, Y. Wu, T. N. Sainath, and K. Livescu, “A comparison of techniques for language model integration in encoder-decoder speech recognition,” in 2018 IEEE spoken language technology workshop (SLT).   IEEE, 2018, pp. 369–375.
  12. T. Nakatani, “Improving transformer-based end-to-end speech recognition with connectionist temporal classification and language model integration,” in Proc. Interspeech, vol. 2019, 2019.
  13. C. Shan, C. Weng, G. Wang, D. Su, M. Luo, D. Yu, and L. Xie, “Component fusion: Learning replaceable language model component for end-to-end speech recognition system,” in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).   IEEE, 2019, pp. 5361–5635.
  14. T. Brants, A. C. Popat, P. Xu, F. J. Och, and J. Dean, “Large language models in machine translation,” 2007.
  15. D. He, Y. Xia, T. Qin, L. Wang, N. Yu, T.-Y. Liu, and W.-Y. Ma, “Dual learning for machine translation,” Advances in neural information processing systems, vol. 29, 2016.
  16. OpenAI and A. Pilipiszyn, “Gpt-3 powers the next generation of apps,” March 2021. [Online]. Available: https://openai.com/blog/gpt-3-apps
  17. A. Abid, M. Farooqi, and J. Zou, “Persistent anti-muslim bias in large language models,” in Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 2021, pp. 298–306.
  18. L. Lucy and D. Bamman, “Gender and representation bias in gpt-3 generated stories,” in Proceedings of the Third Workshop on Narrative Understanding, 2021, pp. 48–55.
  19. U. B. O. L. STATISTICS, “Occupational outlook handbook,” September 2022. [Online]. Available: https://www.bls.gov/ooh/
  20. Glassdoor, “Discover careers,” April 2023, visited on 2023-04-29. [Online]. Available: https://www.glassdoor.com/Career/index.htm
  21. ——, “Software developer overview united states,” April 2023, visited on 2023-04-29. [Online]. Available: https://www.glassdoor.com/Career/software-developer-career_KO0,18.htm
  22. U. B. O. L. STATISTICS, “Occupational outlook handbook: Software developers, quality assurance analysts, and testers,” February 2023. [Online]. Available: https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm
  23. ——, “Labor force statistics from the current population survey,” January 2023. [Online]. Available: https://www.bls.gov/cps/cpsaat11.htm
  24. ——, “Occupational outlook handbook: Delivery truck drivers and driver/sales workers,” February 2023. [Online]. Available: https://www.bls.gov/ooh/transportation-and-material-moving/delivery-truck-drivers-and-driver-sales-workers.htm
  25. S. Bhaimiya, “Chatgpt could hypothetically get hired as an entry level coder if it interviewed at google, internal document reportedly says,” February 2023. [Online]. Available: https://www.businessinsider.com/chatgpt-google-could-get-hired-as-coder-internal-document-report-2023-2
  26. S. Delouya, “Chatgpt creator openai might be training its ai technology to replace some software engineers, report says,” January 2023. [Online]. Available: https://www.businessinsider.com/openai-chatgpt-contractors-train-ai-software-engineering-autonomous-vehicles-report-2023-1
  27. A. Mok and J. Zinkula, “Chatgpt may be coming for our jobs. here are the 10 roles that ai is most likely to replace,” April 2023. [Online]. Available: https://www.businessinsider.com/chatgpt-jobs-at-risk-replacement-artificial-intelligence-ai-labor-trends-2023-02
  28. J. Elias, “Google is asking employees to test potential chatgpt competitors, including a chatbot called ‘apprentice bard’,” January 2023. [Online]. Available: https://www.cnbc.com/2023/01/31/google-testing-chatgpt-like-chatbot-apprentice-bard-with-employees.html
  29. T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020.
  30. D. Shah, H. A. Schwartz, and D. Hovy, “Predictive biases in natural language processing models: A conceptual framework and overview,” arXiv preprint arXiv:1912.11078, 2019.
  31. M. Sap, S. Gabriel, L. Qin, D. Jurafsky, N. A. Smith, and Y. Choi, “Social bias frames: Reasoning about social and power implications of language,” arXiv preprint arXiv:1911.03891, 2019.
  32. M. R. Costa-jussà, “An analysis of gender bias studies in natural language processing,” Nature Machine Intelligence, vol. 1, no. 11, pp. 495–496, 2019.
  33. S. L. Blodgett and B. O’Connor, “Racial disparity in natural language processing: A case study of social media african-american english,” arXiv preprint arXiv:1707.00061, 2017.
  34. F. Gilardi, M. Alizadeh, and M. Kubli, “Chatgpt outperforms crowd-workers for text-annotation tasks,” arXiv preprint arXiv:2303.15056, 2023.
  35. S. Meyer, D. Elsweiler, B. Ludwig, M. Fernandez-Pichel, and D. E. Losada, “Do we still need human assessors? prompt-based gpt-3 user simulation in conversational ai,” in Proceedings of the 4th Conference on Conversational User Interfaces, 2022, pp. 1–6.
  36. S. Wang, Y. Liu, Y. Xu, C. Zhu, and M. Zeng, “Want to reduce labeling cost? gpt-3 can help,” arXiv preprint arXiv:2108.13487, 2021.
  37. B. Chintagunta, N. Katariya, X. Amatriain, and A. Kannan, “Medically aware gpt-3 as a data generator for medical dialogue summarization,” in Machine Learning for Healthcare Conference.   PMLR, 2021, pp. 354–372.
  38. P. Hämäläinen, M. Tavast, and A. Kunnari, “Evaluating large language models in generating synthetic hci research data: a case study,” in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–19.
  39. S. G. Mayson, “Bias in, bias out,” The Yale Law Journal, vol. 128, no. 8, pp. 2218–2300, 2019.
  40. S. Palminteri, G. Lefebvre, E. J. Kilford, and S.-J. Blakemore, “Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing,” PLoS computational biology, vol. 13, no. 8, p. e1005684, 2017.
  41. T. Tarantola, T. Folke, A. Boldt, O. D. Pérez, and B. D. Martino, “Confirmation bias optimizes reward learning,” BioRxiv, pp. 2021–02, 2021.
  42. G. Lefebvre, C. Summerfield, and R. Bogacz, “A normative account of confirmation bias during reinforcement learning,” Neural Computation, vol. 34, no. 2, pp. 307–337, 2022.
  43. M. Yoshida, K. Matsumoto, M. Yoshida, and K. Kita, “A system to correct toxic expression with bert,” 2022.
  44. J. A. Leite, D. F. Silva, K. Bontcheva, and C. Scarton, “Toxic language detection in social media for brazilian portuguese: New dataset and multilingual analysis,” arXiv preprint arXiv:2010.04543, 2020.
  45. B. Van Aken, J. Risch, R. Krestel, and A. Löser, “Challenges for toxic comment classification: An in-depth error analysis,” arXiv preprint arXiv:1809.07572, 2018.
  46. A. Lindqvist, E. A. Renström, and M. Gustafsson Sendén, “Reducing a male bias in language? establishing the efficiency of three different gender-fair language strategies,” Sex Roles, vol. 81, no. 1, pp. 109–117, 2019.
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (2)
  1. Yanchen Wang (8 papers)
  2. Lisa Singh (5 papers)
Citations (6)