Explicit–Implicit Bias Divergence in Smaller or Older LLMs
Determine whether the divergence between explicit and implicit bias expression across evaluation tasks, observed so far only in frontier-scale 2026 models, also appears in smaller-scale or older large language models.
References
Our model set, while spanning commercial and open-weight families, consists entirely of frontier-scale 2026 models. Whether the explicit-implicit divergence manifests similarly in smaller or older models remains an open question.
— Redirected, Not Removed: Task-Dependent Stereotyping Reveals the Limits of LLM Alignments
(2604.02669 - Kumar et al., 3 Apr 2026) in Discussion, Limitations