- The paper reveals that LLMs exhibit a significant upward monotonicity bias when assessed through in-context concept learning tasks.
- It employs quantifiers like 'more than p' and 'less than p' in structured prompts to evaluate semantic biases.
- The findings suggest that LLMs may favor simpler, upward monotone concepts, mirroring preferences observed in human concept learning.
Uncovering Implicit Bias in LLMs with Concept Learning Dataset
Abstract
The paper introduces a dataset designed to investigate implicit biases in LLMs. The focus is on biases related to the upward monotonicity of quantifiers, revealed through in-context concept learning tasks. The findings demonstrate a bias toward upward monotonicity under concept learning that is not as apparent under direct prompt-based evaluation, showing that concept learning is crucial for surfacing these hidden biases.
Introduction
LLMs are critical components in various NLP systems, yet they harbor potential biases that pose challenges for deployment and trustworthiness. Traditional methods for detecting biases in LLMs can fail to uncover implicit biases that remain hidden within the models. To address this issue, the paper proposes a novel approach inspired by human concept learning theories. Utilizing in-context concept learning tasks, the paper investigates how LLMs respond to concepts with varying semantic monotonicity properties, aiming to detect biases related to upward and downward monotonicity in quantifiers.
Figure 1: In-context concept learning helps identify monotonicity biases that are not easily detectable through standard evaluation methods.
Methodology
Concept Selection
The paper uses the quantifiers "more than p" and "less than p" to create concept learning tasks that test upward and downward monotone semantics, respectively. The values of p are fractions ranging from 1/2 down to 1/6.
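As a rough illustration of how such quantifier concepts can be operationalized (the encoding below is a sketch of my own, not the paper's implementation), each concept can be treated as a predicate over a satisfying count and a total:

```python
from fractions import Fraction
from typing import Callable

# Hypothetical encoding: a quantifier concept decides whether "Q of the items
# satisfy the property", given the satisfying count and the total count.
Concept = Callable[[int, int], bool]

def more_than(p: Fraction) -> Concept:
    # Upward monotone: raising the count (total fixed) never flips True to False.
    return lambda count, total: count > p * total

def less_than(p: Fraction) -> Concept:
    # Downward monotone: lowering the count never flips True to False.
    return lambda count, total: count < p * total

# The p values described above: fractions from 1/2 down to 1/6.
P_VALUES = [Fraction(1, n) for n in range(2, 7)]
```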
Prompt Generation
Each prompt is composed of 20 labeled examples, split evenly between positive and negative labels. Prompts follow a fixed template whose linguistic items are sampled from frequently occurring nouns in a training corpus. The model then classifies a new instance by the positive or negative response it appends.
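The sketch below shows one way such prompts might be assembled; the sentence template, noun pool, and function names are illustrative assumptions rather than the paper's actual template.

```python
import random
from fractions import Fraction

# Hypothetical noun pool; the paper samples items from frequent nouns in a training corpus.
NOUNS = ["apples", "chairs", "coins", "birds", "books", "cups"]

def build_prompt(concept, n_examples: int = 20, seed: int = 0) -> str:
    """Assemble an in-context prompt with equal numbers of positive and negative examples.

    `concept` is a function (count, total) -> bool, as in the earlier sketch.
    """
    rng = random.Random(seed)
    pos, neg = [], []
    while len(pos) < n_examples // 2 or len(neg) < n_examples // 2:
        noun = rng.choice(NOUNS)
        total = rng.randint(4, 12)
        count = rng.randint(0, total)
        label = concept(count, total)
        sentence = f"{count} of the {total} {noun} are blue. Answer: {'Yes' if label else 'No'}"
        bucket = pos if label else neg
        if len(bucket) < n_examples // 2:
            bucket.append(sentence)
    examples = pos + neg
    rng.shuffle(examples)
    return "\n".join(examples)

# Example: a prompt for the upward monotone concept "more than 1/2".
print(build_prompt(lambda count, total: count > Fraction(1, 2) * total))
```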
Evaluation
Accuracy is determined by comparing the probability the model assigns to a "Yes" response with the probability it assigns to a "No" response. Each model is evaluated over a large number of prompt variations, allowing a comprehensive assessment of its conceptual biases toward monotonicity.
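A minimal scoring sketch, assuming a Hugging Face causal language model (the model id is a placeholder for runnability; the paper evaluates models such as OLMo-2, and this is not the paper's evaluation code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM, e.g. an OLMo-2 checkpoint, could be used

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def predicts_yes(prompt: str) -> bool:
    """Return True if the model prefers ' Yes' over ' No' as the next token."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits after the prompt
    yes_id = tokenizer(" Yes", add_special_tokens=False).input_ids[0]
    no_id = tokenizer(" No", add_special_tokens=False).input_ids[0]
    return bool(logits[yes_id] > logits[no_id])

def accuracy(items) -> float:
    """items: iterable of (prompt, gold_label) pairs."""
    items = list(items)
    return sum(predicts_yes(prompt) == gold for prompt, gold in items) / len(items)
```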
Results
The experiments reveal that certain LLMs, such as the OLMo-2 models, exhibit a bias toward upward monotonicity in concept learning tasks, in contrast with their performance on explicit semantic evaluation. This bias offers insight not only into the nature of the training data but also into the cognitive tendencies the models have internalized.
Discussion
The paper hypothesizes that downward monotone quantifiers intrinsically require more complex logical manipulation because they involve an implicit negation. The observed biases suggest that certain LLMs may reflect human-like preferences for simpler concepts in learning tasks, echoing findings on the extra cognitive cost of downward monotone reasoning in humans.
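To make this hypothesis concrete (an illustration of my own, not an analysis from the paper): "less than p" can be expressed as the negation of the upward monotone "at least p", so evaluating it plausibly involves one extra negation step. The small check below verifies both the equivalence and the monotonicity direction on small domains, under that assumed encoding.

```python
from fractions import Fraction

p = Fraction(1, 2)

def at_least(count: int, total: int) -> bool:   # upward monotone
    return count >= p * total

def less_than(count: int, total: int) -> bool:  # downward monotone
    return count < p * total

# "less than p" always agrees with NOT "at least p", and truth of the upward
# monotone quantifier is preserved when the satisfying count grows.
for total in range(1, 20):
    for count in range(total + 1):
        assert less_than(count, total) == (not at_least(count, total))
        if at_least(count, total) and count < total:
            assert at_least(count + 1, total)
```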
Conclusion
The paper presents a comprehensive methodology for revealing implicit biases through in-context concept learning tasks. It underscores the importance of examining monotonicity bias in LLMs, proposes hypotheses about the cognitive-complexity factors contributing to these biases, and encourages further exploration of systematic bias detection in artificial systems. Future research may extend the methodology to a wider variety of concepts and models, fostering a deeper understanding of implicit biases in computational learning systems.