iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers (2405.16129v1)
Abstract: This paper describes our approach to SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task comprises multiple-choice question answering designed to evaluate models' lateral thinking capabilities. It consists of Sentence Puzzle and Word Puzzle subtasks that require models to defy default commonsense associations and exhibit unconventional thinking. We propose a unique strategy to improve the performance of pre-trained large language models, notably the Gemini 1.0 Pro model, on both subtasks. We employ static and dynamic few-shot prompting techniques and introduce a model-generated reasoning strategy that leverages the LLM's own reasoning capabilities. Our approach outperformed the baseline models by a considerable margin but fell short of human annotator performance, highlighting the efficacy of the proposed strategies while leaving room for improvement.
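To illustrate the distinction between static and dynamic few-shot prompting mentioned above, the following is a minimal sketch: static prompting prepends a fixed set of exemplars, while dynamic prompting re-ranks an exemplar pool by similarity to the incoming question. The exemplar pool, the token-overlap similarity measure, and the prompt wording are illustrative assumptions, not the authors' implementation (which could, for instance, use embedding-based retrieval instead).

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets; a simple stand-in
    for a learned similarity measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def build_prompt(question: str, exemplars: list[tuple[str, str]],
                 k: int = 2, dynamic: bool = True) -> str:
    """Static prompting takes the first k exemplars in fixed order;
    dynamic prompting picks the k exemplars most similar to `question`."""
    pool = (sorted(exemplars,
                   key=lambda ex: token_overlap(question, ex[0]),
                   reverse=True)
            if dynamic else list(exemplars))
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in pool[:k])
    return f"{shots}\n\nQ: {question}\nA:"

# Hypothetical brain-teaser exemplar pool for demonstration.
exemplars = [
    ("What gets wetter the more it dries?", "A towel."),
    ("What has keys but cannot open locks?", "A piano."),
    ("What can you catch but not throw?", "A cold."),
]
prompt = build_prompt("What has many keys but opens no doors?", exemplars)
```

With `dynamic=True`, the "piano" exemplar is selected because it shares the most words with the query, whereas `dynamic=False` would always use the first two exemplars regardless of the question.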