uTeBC-NLP at SemEval-2024 Task 9: Can LLMs be Lateral Thinkers? (2404.02474v1)

Published 3 Apr 2024 in cs.CL, cs.AI, cs.IR, and cs.LG

Abstract: Inspired by human cognition, Jiang et al. (2023c) create a benchmark for assessing LLMs' lateral thinking (thinking outside the box). Building upon this benchmark, we investigate how different prompting methods enhance LLMs' performance on this task to reveal their inherent power for outside-the-box thinking ability. Through participating in SemEval-2024 Task 9, Sentence Puzzle sub-task, we explore prompt engineering methods: chain of thoughts (CoT) and direct prompting, enhancing with informative descriptions, and employing contextualizing prompts using a retrieval augmented generation (RAG) pipeline. Our experiments involve three LLMs including GPT-3.5, GPT-4, and Zephyr-7B-beta. We generate a dataset of thinking paths between riddles and options using GPT-4, validated by humans for quality. Findings indicate that compressed informative prompts enhance performance. Dynamic in-context learning enhances model performance significantly. Furthermore, fine-tuning Zephyr on our dataset enhances performance across other commonsense datasets, underscoring the value of innovative thinking.

References (23)
  1. LM-CPPF: Paraphrasing-guided data augmentation for contrastive prompt-based few-shot fine-tuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 670–681, Toronto, Canada. Association for Computational Linguistics.
  2. Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.
  3. Active prompting with chain-of-thought for large language models. arXiv preprint arXiv:2302.12246.
  4. Promptbreeder: Self-referential self-improvement via prompt evolution. arXiv preprint arXiv:2309.16797.
  5. Large language models are reasoning teachers. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14852–14882, Toronto, Canada. Association for Computational Linguistics.
  6. Mistral 7B. arXiv preprint arXiv:2310.06825.
  7. LLMLingua: Compressing prompts for accelerated inference of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 13358–13376, Singapore. Association for Computational Linguistics.
  8. SemEval-2024 Task 9: BRAINTEASER: A novel task defying common sense. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1996–2010, Mexico City, Mexico. Association for Computational Linguistics.
  9. BRAINTEASER: Lateral thinking puzzles for large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 14317–14332, Singapore. Association for Computational Linguistics.
  10. Large language models are zero-shot reasoners. In Advances in Neural Information Processing Systems, volume 35, pages 22199–22213. Curran Associates, Inc.
  11. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, volume 33, pages 9459–9474. Curran Associates, Inc.
  12. Zackary Rackauckas. 2024. RAG-Fusion: A new take on retrieval-augmented generation. arXiv preprint arXiv:2402.03367.
  13. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36.
  14. CommonsenseQA: A question answering challenge targeting commonsense knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4149–4158, Minneapolis, Minnesota. Association for Computational Linguistics.
  15. Zephyr: Direct distillation of LM alignment. arXiv preprint arXiv:2310.16944.
  16. Shlomo Waks. 1997. Lateral thinking and technology education. Journal of Science Education and Technology, 6:245–255.
  17. Albert Webson and Ellie Pavlick. 2022. Do prompt-based models really understand the meaning of their prompts? In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2300–2344, Seattle, United States. Association for Computational Linguistics.
  18. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, volume 35, pages 24824–24837. Curran Associates, Inc.
  19. C-Pack: Packaged resources to advance general Chinese embedding.
  20. Large language models as analogical reasoners. In The Twelfth International Conference on Learning Representations.
  21. SWAG: A large-scale adversarial dataset for grounded commonsense inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 93–104, Brussels, Belgium. Association for Computational Linguistics.
  22. Automatic chain of thought prompting in large language models. arXiv preprint arXiv:2210.03493.
  23. Meta-CoT: Generalizable chain-of-thought prompting in mixed-task scenarios with large language models. arXiv preprint arXiv:2310.06692.
Authors (3)
  1. Pouya Sadeghi (6 papers)
  2. Amirhossein Abaskohi (14 papers)
  3. Yadollah Yaghoobzadeh (34 papers)
Citations (1)

Summary

  • The paper demonstrates that enhanced prompting and targeted fine-tuning significantly boost LLMs’ lateral thinking and commonsense reasoning abilities.
  • It applies Chain-of-Thought prompting and a Retrieval-Augmented Generation pipeline that dynamically selects in-context examples tailored to each puzzle, supporting creative problem-solving.
  • Fine-tuning on lateral thinking datasets yields transferable improvements on standard commonsense tasks, suggesting broad practical applications.

Enhancing LLMs' Lateral Thinking Through Prompt Engineering and Fine-tuning

Introduction

The field of natural language processing has advanced considerably with the advent of LLMs capable of understanding and generating human-like text. This paper shifts the focus toward evaluating LLMs' lateral thinking abilities, a cognitive process that diverges from traditional vertical thinking by emphasizing creativity and problem-solving from novel perspectives. Through a series of experiments, the research explores prompting methods and fine-tuning techniques aimed at bolstering LLMs' lateral thinking, employing GPT-3.5, GPT-4, and Zephyr-7B-β.

Methodological Framework

Dataset Construction and Utilization

At the heart of this paper lies the BrainTeaser dataset, designed specifically to challenge and assess the lateral thinking capabilities of LLMs. The dataset comprises puzzles that demand beyond-the-norm reasoning, a departure from standard logical reasoning benchmarks. The work focuses on the Sentence Puzzle sub-task, whose question and answer sets require models to engage in creative and unconventional thought processes; an illustrative record is sketched below. Additional datasets, SWAG and CommonsenseQA, were integrated to evaluate how well fine-tuned models generalize to conventional commonsense reasoning tasks.
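
To make the task concrete, the sketch below shows what a sentence-puzzle record might look like. The field names and the sample puzzle are illustrative stand-ins, not necessarily the released BrainTeaser schema.

```python
# Hypothetical sentence-puzzle record; field names and content are illustrative,
# not necessarily the released BrainTeaser schema.
puzzle = {
    "question": ("A woman pushes her car up to a hotel and immediately knows "
                 "she is bankrupt. Why?"),
    "choice_list": [
        "She is playing Monopoly.",               # the correct, lateral reading
        "Her car broke down outside the hotel.",
        "She spent all her savings on the trip.",
        "None of the above.",
    ],
    "label": 0,  # index of the correct option in choice_list
}
```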

Prompting Strategies and CoT Paradigms

The paper introduces prompting strategies designed to probe LLMs' potential for lateral thinking. A salient feature is Chain-of-Thought (CoT) prompting, bifurcated into internal and external variants, which guides models through intricate reasoning pathways. The authors also explore enriching prompts with detailed task descriptions and compressing them to retain only essential information, underscoring the need for brevity and clarity within prompts to improve model performance.
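
As a rough illustration of the contrast discussed above, the templates below sketch a direct prompt and an internally guided CoT prompt for a sentence puzzle. The wordings are assumptions made for exposition, not the paper's exact prompts.

```python
# Hypothetical prompt templates; the paper's exact wording is not reproduced here.
DIRECT_TEMPLATE = (
    "You are solving a lateral-thinking sentence puzzle.\n"
    "Riddle: {riddle}\nOptions:\n{options}\n"
    "Reply with the single best option."
)

COT_TEMPLATE = (
    "You are solving a lateral-thinking sentence puzzle. The obvious reading is "
    "often misleading, so reason step by step about each option before answering.\n"
    "Riddle: {riddle}\nOptions:\n{options}\n"
    "Write your reasoning first, then state the best option on its own line."
)

def format_prompt(template: str, riddle: str, options: list[str]) -> str:
    """Fill a template with the riddle and a numbered option list."""
    numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    return template.format(riddle=riddle, options=numbered)
```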

Retrieval-Augmented Generation for Dynamic In-context Learning

A notable methodology employed is the Retrieval-Augmented Generation (RAG) pipeline, facilitating dynamic in-context learning through the provision of examples tailored to the question’s context. This approach sought to overcome the limitations of static few-shot examples, leveraging the RAG mechanism to dynamically select and incorporate relevant samples into the prompting process. The investigation encompassed various RAG configurations, including ordinary, ranked, and fusion methods, to ascertain the most effective strategy for enhancing model performance in lateral thinking tasks.
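
A minimal sketch of dynamic in-context example selection in this spirit: embed the incoming riddle, retrieve the most similar solved puzzles from a pool, and prepend them as few-shot demonstrations. The embedding model (a BGE-style sentence encoder, cf. reference 19), the example pool, and the prompt wording are assumptions; the paper's ordinary, ranked, and fusion RAG variants may score and merge candidates differently.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # assumed BGE-style encoder

# Hypothetical pool of solved training puzzles to draw demonstrations from.
example_pool = [
    {"riddle": "A man shaves several times a day, yet keeps his beard. How?",
     "answer": "He is a barber."},
    {"riddle": "What can you hold without ever touching it?",
     "answer": "A conversation."},
]

# Pre-compute unit-normalized embeddings for every pool riddle.
pool_vecs = embedder.encode([ex["riddle"] for ex in example_pool],
                            normalize_embeddings=True)

def retrieve_examples(question: str, k: int = 2):
    """Return the k pool puzzles most similar to the incoming question."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = pool_vecs @ q_vec          # cosine similarity (vectors are unit-normalized)
    top = np.argsort(-scores)[:k]
    return [example_pool[i] for i in top]

def build_prompt(question: str, options: list[str]) -> str:
    """Assemble a few-shot prompt with dynamically retrieved demonstrations."""
    shots = retrieve_examples(question)
    demos = "\n\n".join(f"Riddle: {ex['riddle']}\nAnswer: {ex['answer']}" for ex in shots)
    numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    return (f"{demos}\n\nRiddle: {question}\nOptions:\n{numbered}\n"
            "Pick the option that best resolves the riddle.")
```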

Experimental Insights and Outcomes

Evaluation of Prompting Methods

Experimental results show that different prompting strategies have a discernible impact on the models' ability to engage in lateral thinking. The findings favor simple, internally guided CoT over more complex arrangements, indicating that prompt length and information density significantly influence performance.

In-context Learning Enhancements

The dynamic selection of context through the RAG pipeline demonstrated notable improvements in model performance, emphasizing the utility of customizing examples based on the question's content. Interestingly, omitting explicit explanations from prompts and relying on the model's innate ability to infer relationships underscored the potential for LLMs to navigate lateral reasoning tasks independently.

Fine-tuning for Lateral Thinking

A pivotal aspect of the research involved fine-tuning Zephyr-7B-β on a dataset curated for lateral thinking to ascertain its influence on general commonsense reasoning capabilities. The experiment revealed that models fine-tuned on a lateral thinking dataset exhibited improved performance across other commonsense datasets, suggesting a beneficial transfer of lateral reasoning abilities.
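
A minimal supervised fine-tuning sketch under stated assumptions: the GPT-4-generated, human-validated thinking paths are packed into plain training strings and used for standard causal-LM fine-tuning of Zephyr-7B-β with the Hugging Face Trainer. The record fields, prompt template, and hyperparameters are hypothetical, and the paper's actual training setup may differ.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Zephyr's tokenizer ships without a pad token
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Hypothetical thinking-path records: riddle, options, a GPT-4-generated
# reasoning path, and the correct option (human-validated, per the paper).
records = [
    {"riddle": "A man shaves several times a day, yet still has a beard. Who is he?",
     "options": "1. A barber  2. A werewolf  3. A sculptor",
     "path": "He shaves other people, not himself, so he can shave all day and keep his beard.",
     "answer": "A barber"},
]

def to_text(rec):
    # Pack one record into a single training string; the template is an assumption.
    return {"text": (f"Riddle: {rec['riddle']}\nOptions: {rec['options']}\n"
                     f"Reasoning: {rec['path']}\nAnswer: {rec['answer']}")}

def tokenize(rec):
    return tokenizer(rec["text"], truncation=True, max_length=1024)

dataset = (Dataset.from_list(records)
           .map(to_text)
           .map(tokenize, remove_columns=["riddle", "options", "path", "answer", "text"]))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="zephyr-lateral-sft",
                           per_device_train_batch_size=1,
                           num_train_epochs=3,
                           learning_rate=2e-5),
    train_dataset=dataset,
    # mlm=False yields the standard next-token (causal LM) objective with padded batches.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```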

Theoretical and Practical Implications

The paper’s exploration into the lateral thinking capabilities of LLMs, through strategic prompting and fine-tuning, introduces a nuanced perspective on enhancing model performance beyond traditional reasoning tasks. Theoretically, it challenges existing paradigms by integrating creative reasoning into the repertoire of LLMs, suggesting a blend of vertical and lateral thinking for comprehensive language understanding. Practically, the approaches and methodologies articulated offer actionable insights for leveraging LLMs in applications requiring innovative problem-solving and creativity.

Future Directions in AI

Looking forward, the research posits intriguing prospects for integrating lateral thinking more cohesively into the development and training of LLMs. Further exploration into prompt engineering, coupled with advanced fine-tuning techniques, holds the potential to unlock new dimensions of cognitive capabilities in AI, paving the way for models that more profoundly mirror the intricacies of human thought processes.

In summation, this paper represents a significant stride towards endowing LLMs with the capability to think outside the box, underscoring the intricate balance between creativity and logic essential for the next generation of AI systems.
