Counteracts: Testing Stereotypical Representation in Pre-trained Language Models

Published 11 Jan 2023 in cs.CL | (2301.04347v3)

Abstract: Recently, LLMs have demonstrated strong performance on various natural language understanding tasks. LLMs trained on large human-generated corpora encode not only a significant amount of human knowledge but also human stereotypes. As more downstream tasks integrate LLMs into their pipelines, it is necessary to understand these internal stereotypical representations in order to design methods that mitigate their negative effects. In this paper, we use counterexamples to examine the internal stereotypical knowledge in pre-trained language models (PLMs) that can lead to stereotypical preferences. We focus mainly on gender stereotypes, but the method can be extended to other types of stereotype. We evaluate 7 PLMs on 9 types of cloze-style prompts with different information and base knowledge. The results indicate that PLMs show a certain amount of robustness against unrelated information and a preference for shallow linguistic cues, such as word position and syntactic structure, but lack the ability to interpret information by its meaning. These findings shed light on how to interact with PLMs in a neutral way for both finetuning and evaluation.
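
To make the idea of probing with counterexamples concrete, here is a minimal sketch in Python of how a cloze-style preference might be measured before and after a counter-stereotypical sentence is prepended. Everything in it is an assumption for illustration: the prompt wording, the `[MASK]` placeholder, the candidate words, the function names, and especially `toy_scorer`, which is a hard-coded stand-in for a real PLM. It is not the paper's templates, models, or metric.

```python
from typing import Callable, Dict

# Hypothetical scorer signature: (cloze prompt, candidate word) -> probability
# that the candidate fills the masked slot. A real PLM would be wrapped here.
Scorer = Callable[[str, str], float]


def preference_gap(score: Scorer, prompt: str, a: str = "he", b: str = "she") -> float:
    """Return score(a) - score(b); a positive value means `a` is preferred."""
    return score(prompt, a) - score(prompt, b)


def counterexample_shift(
    score: Scorer,
    base_prompt: str,
    counter_context: str,
    a: str = "he",
    b: str = "she",
) -> Dict[str, float]:
    """Compare the preference gap with and without a counterexample prefix."""
    before = preference_gap(score, base_prompt, a, b)
    after = preference_gap(score, counter_context + " " + base_prompt, a, b)
    return {"before": before, "after": after, "shift": after - before}


def toy_scorer(prompt: str, candidate: str) -> float:
    """Stand-in for a model (assumption): fixed numbers, not real predictions."""
    probs = {"he": 0.7, "she": 0.3}
    # Pretend the counterexample sentence removes the preference entirely.
    if "women" in prompt.lower():
        probs = {"he": 0.5, "she": 0.5}
    return probs.get(candidate, 0.0)


result = counterexample_shift(
    toy_scorer,
    base_prompt="The engineer said that [MASK] was late.",
    counter_context="Most engineers are women.",
)
print(result)
```

A gap that stays the same after the prefix is added would suggest the scorer ignores the added information, while a gap that moves only when the word order or syntax changes would point to reliance on surface cues rather than meaning. Swapping `toy_scorer` for a wrapper around an actual masked language model is left open here, since the source does not specify the models or tooling.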
