Datasets and probing methods for Aboutness in LLMs

Develop culturally grounded datasets and robust black-box or white-box probing methods to evaluate the Aboutness dimension of linguistic–culture interaction in large language models (LLMs). Here, Aboutness refers to culturally contingent topic prioritization and domain relevance. Specifically, devise concrete procedures and benchmarks that operationalize Aboutness for assessing LLM behavior across different cultural contexts.

Background

The survey organizes how culture is studied in LLMs via proxies and interaction axes, including Aboutness, Common Ground, and Objectives and Values. While prior work has largely focused on values and norms, the Aboutness axis, which concerns the topics and domains that are prioritized or deemed relevant within different cultures, has not been explicitly addressed by current benchmarks or methods.

The authors note that Aboutness is both important and subtle, and that creating explicit datasets for it may be difficult. They emphasize that the problem goes beyond neglect in the literature: it remains unclear how to construct datasets and probing methodologies that would reliably measure Aboutness in LLMs across diverse cultural contexts.

References

Similarly, Aboutness remains completely unexplored and it is unclear even how to create datasets and methods for probing LLMs for Aboutness.

Towards Measuring and Modeling "Culture" in LLMs: A Survey (2403.15412 - Adilazuarda et al., 5 Mar 2024) in Limited Exploration, Section 5 (Gaps and Recommendations)