
iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers

Published 25 May 2024 in cs.CL (arXiv:2405.16129v1)

Abstract: This paper describes our approach to SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task comprises multiple-choice question answering designed to evaluate models' lateral thinking capabilities. It consists of Sentence Puzzle and Word Puzzle subtasks that require models to defy default commonsense associations and exhibit unconventional thinking. We propose a unique strategy to improve the performance of pre-trained LLMs, notably the Gemini 1.0 Pro model, on both subtasks. We employ static and dynamic few-shot prompting techniques and introduce a model-generated reasoning strategy that leverages the LLM's own reasoning capabilities. Our approach yielded significant improvements, outperforming the baseline models by a considerable margin while still falling short of human annotator performance, which highlights both the efficacy of the proposed strategies and the remaining gap to human-level lateral thinking.
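To make the distinction between the static and dynamic few-shot prompting techniques mentioned in the abstract concrete, here is a minimal sketch of how such prompts can be assembled. The example pool, the token-overlap relevance score, and the prompt template are all illustrative assumptions on our part, not the authors' actual retrieval method or template; the key idea shown is that static prompting reuses a fixed set of demonstrations, while dynamic prompting re-selects demonstrations per query.

```python
# Sketch of static vs. dynamic few-shot prompt construction for a
# multiple-choice brain-teaser task. Pool contents, scoring, and the
# template are illustrative assumptions, not the paper's exact setup.

def overlap_score(a: str, b: str) -> int:
    """Crude relevance score: count of shared lowercase word tokens."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def select_examples(question: str, pool: list[dict], k: int, dynamic: bool) -> list[dict]:
    """Static mode uses a fixed prefix of the pool; dynamic mode
    re-ranks the pool per question by similarity to the query."""
    if not dynamic:
        return pool[:k]
    return sorted(pool, key=lambda ex: overlap_score(question, ex["question"]),
                  reverse=True)[:k]

def build_prompt(question: str, choices: list[str], pool: list[dict],
                 k: int = 2, dynamic: bool = True) -> str:
    """Assemble an instruction, k demonstrations, and the target question."""
    parts = ["Answer the brain teaser by thinking laterally.\n"]
    for ex in select_examples(question, pool, k, dynamic):
        parts.append(f"Q: {ex['question']}\nA: {ex['answer']}\n")
    options = "\n".join(f"({i}) {c}" for i, c in enumerate(choices))
    parts.append(f"Q: {question}\n{options}\nA:")
    return "\n".join(parts)

pool = [
    {"question": "What has keys but can't open locks?", "answer": "A piano."},
    {"question": "A man shaves daily yet keeps his beard. How?", "answer": "He is a barber."},
    {"question": "What gets wetter as it dries?", "answer": "A towel."},
]
prompt = build_prompt("What has hands but cannot clap?", ["A clock", "A glove"], pool)
```

The resulting string would then be sent to the LLM (Gemini 1.0 Pro in the paper's case); with `dynamic=True`, the two demonstrations sharing the most words with the query are chosen, whereas `dynamic=False` would always reuse the first two pool entries.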
