Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture (2409.01556v2)
Abstract: This study introduces a comprehensive benchmark designed to evaluate the performance of LLMs in understanding and processing cultural knowledge, with a specific focus on Hakka culture as a case study. Leveraging Bloom's Taxonomy, the study develops a multi-dimensional framework that systematically assesses LLMs across six cognitive domains: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. This benchmark extends beyond traditional single-dimensional evaluations by providing a deeper analysis of LLMs' abilities to handle culturally specific content, ranging from basic recall of facts to higher-order cognitive tasks such as creative synthesis. Additionally, the study integrates Retrieval-Augmented Generation (RAG) technology to address the challenges of minority cultural knowledge representation in LLMs, demonstrating how RAG enhances the models' performance by dynamically incorporating relevant external information. The results highlight the effectiveness of RAG in improving accuracy across all cognitive domains, particularly in tasks requiring precise retrieval and application of cultural knowledge. However, the findings also reveal the limitations of RAG in creative tasks, underscoring the need for further optimization. This benchmark provides a robust tool for evaluating and comparing LLMs in culturally diverse contexts, offering valuable insights for future research and development in AI-driven cultural knowledge preservation and dissemination.