iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers (2405.16129v1)

Published 25 May 2024 in cs.CL

Abstract: This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task comprises multiple-choice question answering designed to evaluate models' lateral-thinking capabilities. It consists of Sentence Puzzle and Word Puzzle subtasks that require models to defy default common-sense associations and exhibit unconventional thinking. We propose a unique strategy to improve the performance of pre-trained LLMs, notably the Gemini 1.0 Pro model, on both subtasks. We employ static and dynamic few-shot prompting techniques and introduce a model-generated reasoning strategy that leverages the LLM's own reasoning capabilities to improve performance. Our approach demonstrated significant improvements, outperforming the baseline models by a considerable margin while still falling short of human annotators, highlighting the efficacy of the proposed strategies.
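The abstract names three prompting strategies without spelling out how they are assembled. The sketch below illustrates one plausible reading of each, assuming a generic `llm_generate(prompt)` completion callable and similarity-based exemplar retrieval via the sentence-transformers library; the exemplar pool, prompt wording, and all function names are hypothetical illustrations, not the authors' implementation.

```python
# A minimal sketch of the three prompting strategies named in the abstract:
# static few-shot, dynamic few-shot, and model-generated reasoning.
# Assumptions (not from the paper): the exemplar pool, the prompt wording,
# and the `llm_generate` callable are hypothetical.

from sentence_transformers import SentenceTransformer, util

# Hypothetical pool of solved brain teasers (question, choices, answer).
EXEMPLAR_POOL = [
    {"question": "A man shaves several times a day, yet still has a beard. How?",
     "choices": ["He is a barber.", "He dislikes shaving.", "None of the above."],
     "answer": "He is a barber."},
    # ... more solved puzzles ...
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
pool_embeddings = embedder.encode([ex["question"] for ex in EXEMPLAR_POOL])


def format_shot(ex, reasoning=None):
    """Render one exemplar as a prompt block, optionally with a rationale."""
    block = f"Question: {ex['question']}\nChoices: {ex['choices']}\n"
    if reasoning:
        block += f"Reasoning: {reasoning}\n"
    return block + f"Answer: {ex['answer']}\n"


def static_few_shot_prompt(target, k=3):
    """Static prompting: the same fixed exemplars precede every target question."""
    shots = "\n".join(format_shot(ex) for ex in EXEMPLAR_POOL[:k])
    return f"{shots}\nQuestion: {target['question']}\nChoices: {target['choices']}\nAnswer:"


def dynamic_few_shot_prompt(target, k=3):
    """Dynamic prompting: retrieve the k exemplars most similar to the target."""
    query = embedder.encode(target["question"])
    scores = util.cos_sim(query, pool_embeddings)[0]   # cosine similarities
    top_k = scores.argsort(descending=True)[:k]
    shots = "\n".join(format_shot(EXEMPLAR_POOL[int(i)]) for i in top_k)
    return f"{shots}\nQuestion: {target['question']}\nChoices: {target['choices']}\nAnswer:"


def reasoning_augmented_prompt(target, llm_generate, k=3):
    """Model-generated reasoning: have the LLM write a rationale for each
    exemplar, then prompt with (question, reasoning, answer) triples so the
    model imitates that reasoning pattern on the unseen puzzle."""
    shots = []
    for ex in EXEMPLAR_POOL[:k]:
        rationale = llm_generate(
            f"Explain step by step why '{ex['answer']}' solves this "
            f"lateral-thinking puzzle: {ex['question']}"
        )
        shots.append(format_shot(ex, reasoning=rationale))
    return ("\n".join(shots)
            + f"\nQuestion: {target['question']}\nChoices: {target['choices']}\nReasoning:")
```

Dynamic selection trades the stability of a fixed prompt for exemplars that better match the target puzzle, which is the usual motivation for retrieval-based few-shot prompting; the reasoning-augmented variant additionally elicits rationales from the model itself rather than relying on hand-written chains of thought.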
