
DaVinci at SemEval-2024 Task 9: Few-shot prompting GPT-3.5 for Unconventional Reasoning (2405.11559v1)

Published 19 May 2024 in cs.CL and cs.AI

Abstract: While significant work in NLP has addressed vertical thinking, which primarily involves logical reasoning, little has been done on lateral thinking, which involves looking at problems from an unconventional perspective and defying existing conceptions and notions. In this direction, SemEval-2024 introduces the BRAINTEASER task, comprising two types of questions -- Sentence Puzzles and Word Puzzles -- that defy conventional common-sense reasoning and constraints. In this paper, we tackle both types of questions using few-shot prompting on GPT-3.5 and gain insights into how the two types differ in nature. Our prompting strategy placed us 26th on the leaderboard for the Sentence Puzzle task and 15th for the Word Puzzle task.
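
The abstract does not reproduce the paper's actual prompts, so the snippet below is only a minimal sketch of what few-shot multiple-choice prompting of GPT-3.5 might look like via the OpenAI chat API. The demonstration puzzle, option wording, number of shots, and system instruction are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch of few-shot multiple-choice prompting for a brainteaser-style
# question with the OpenAI chat API. The demonstration puzzle, option wording,
# and system instruction are illustrative assumptions, not the authors' prompts.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Hypothetical in-context demonstrations: question, lettered options, gold letter.
FEW_SHOT_EXAMPLES = [
    {
        "question": "A man shaves several times a day, yet he still has a beard. How?",
        "options": ["He is a barber.", "He forgot to shave.",
                    "He uses a blunt razor.", "None of the above."],
        "answer": "A",
    },
]


def format_example(question, options, answer=None):
    """Render one puzzle as a lettered multiple-choice block."""
    letters = "ABCD"
    lines = [f"Question: {question}"]
    lines += [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append(f"Answer: {answer}" if answer else "Answer:")
    return "\n".join(lines)


def answer_puzzle(question, options):
    """Prepend the few-shot demonstrations, then ask GPT-3.5 for a letter."""
    shots = "\n\n".join(
        format_example(ex["question"], ex["options"], ex["answer"])
        for ex in FEW_SHOT_EXAMPLES
    )
    prompt = shots + "\n\n" + format_example(question, options)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer each lateral-thinking puzzle with only the letter of the best option."},
            {"role": "user", "content": prompt},
        ],
        temperature=0,  # deterministic letter choice for evaluation
    )
    return response.choices[0].message.content.strip()
```

Temperature 0 keeps the letter choice repeatable across runs; in a setup like this, the number of shots and whether the demonstrations match the target puzzle type (sentence vs. word) would be the main knobs to tune.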

Authors (3)
  1. Suyash Vardhan Mathur
  2. Akshett Rai Jindal
  3. Manish Shrivastava
Citations (1)
