uTeBC-NLP at SemEval-2024 Task 9: Can LLMs be Lateral Thinkers? (2404.02474v1)

Published 3 Apr 2024 in cs.CL, cs.AI, cs.IR, and cs.LG

Abstract: Inspired by human cognition, Jiang et al. (2023c) create a benchmark for assessing LLMs' lateral thinking (thinking outside the box). Building upon this benchmark, we investigate how different prompting methods enhance LLMs' performance on this task to reveal their inherent power for outside-the-box thinking ability. Through participating in SemEval-2024 Task 9, Sentence Puzzle sub-task, we explore prompt engineering methods: chain of thoughts (CoT) and direct prompting, enhancing with informative descriptions, and employing contextualizing prompts using a retrieval augmented generation (RAG) pipeline. Our experiments involve three LLMs including GPT-3.5, GPT-4, and Zephyr-7B-beta. We generate a dataset of thinking paths between riddles and options using GPT-4, validated by humans for quality. Findings indicate that compressed informative prompts enhance performance. Dynamic in-context learning enhances model performance significantly. Furthermore, fine-tuning Zephyr on our dataset enhances performance across other commonsense datasets, underscoring the value of innovative thinking.

References (23)
  1. LM-CPPF: Paraphrasing-guided data augmentation for contrastive prompt-based few-shot fine-tuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 670–681, Toronto, Canada. Association for Computational Linguistics.
  2. Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.
  3. Active prompting with chain-of-thought for large language models. arXiv preprint arXiv:2302.12246.
  4. Promptbreeder: Self-referential self-improvement via prompt evolution. arXiv preprint arXiv:2309.16797.
  5. Large language models are reasoning teachers. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14852–14882, Toronto, Canada. Association for Computational Linguistics.
  6. Mistral 7B. arXiv preprint arXiv:2310.06825.
  7. LLMLingua: Compressing prompts for accelerated inference of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 13358–13376, Singapore. Association for Computational Linguistics.
  8. SemEval-2024 Task 9: BRAINTEASER: A novel task defying common sense. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1996–2010, Mexico City, Mexico. Association for Computational Linguistics.
  9. BRAINTEASER: Lateral thinking puzzles for large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 14317–14332, Singapore. Association for Computational Linguistics.
  10. Large language models are zero-shot reasoners. In Advances in Neural Information Processing Systems, volume 35, pages 22199–22213. Curran Associates, Inc.
  11. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, volume 33, pages 9459–9474. Curran Associates, Inc.
  12. Zackary Rackauckas. 2024. RAG-Fusion: A new take on retrieval-augmented generation. arXiv preprint arXiv:2402.03367.
  13. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36.
  14. CommonsenseQA: A question answering challenge targeting commonsense knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4149–4158, Minneapolis, Minnesota. Association for Computational Linguistics.
  15. Zephyr: Direct distillation of LM alignment. arXiv preprint arXiv:2310.16944.
  16. Shlomo Waks. 1997. Lateral thinking and technology education. Journal of Science Education and Technology, 6:245–255.
  17. Albert Webson and Ellie Pavlick. 2022. Do prompt-based models really understand the meaning of their prompts? In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2300–2344, Seattle, United States. Association for Computational Linguistics.
  18. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, volume 35, pages 24824–24837. Curran Associates, Inc.
  19. C-Pack: Packaged resources to advance general Chinese embedding.
  20. Large language models as analogical reasoners. In The Twelfth International Conference on Learning Representations.
  21. SWAG: A large-scale adversarial dataset for grounded commonsense inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 93–104, Brussels, Belgium. Association for Computational Linguistics.
  22. Automatic chain of thought prompting in large language models. arXiv preprint arXiv:2210.03493.
  23. Meta-CoT: Generalizable chain-of-thought prompting in mixed-task scenarios with large language models. arXiv preprint arXiv:2310.06692.
Authors (3)
  1. Pouya Sadeghi (6 papers)
  2. Amirhossein Abaskohi (14 papers)
  3. Yadollah Yaghoobzadeh (34 papers)
Citations (1)

Summary

  • The paper demonstrates that enhanced prompting and targeted fine-tuning significantly boost LLMs’ lateral thinking and commonsense reasoning abilities.
  • It applies Chain-of-Thought prompting and a Retrieval-Augmented Generation pipeline that dynamically selects in-context examples tailored to each puzzle, supporting creative problem-solving.
  • Fine-tuning on lateral thinking datasets yields transferable improvements on standard commonsense tasks, suggesting broad practical applications.

Enhancing LLMs' Lateral Thinking Through Prompt Engineering and Fine-tuning

Introduction

The field of natural language processing has advanced considerably with the advent of LLMs capable of understanding and generating human-like text. This paper shifts the focus toward evaluating LLMs' lateral thinking abilities, a cognitive process that diverges from traditional vertical thinking by emphasizing creativity and problem-solving from novel perspectives. Through a series of experiments, the research explores prompting methods and fine-tuning techniques aimed at bolstering LLMs' lateral thinking, employing GPT-3.5, GPT-4, and Zephyr-7B-β.

Methodological Framework

Dataset Construction and Utilization

At the heart of this paper lies the BrainTeaser dataset, designed specifically to challenge and assess the lateral thinking capabilities of LLMs. The dataset comprises puzzles that demand beyond-the-norm reasoning, a departure from standard logical reasoning benchmarks. The work focuses on the Sentence Puzzle sub-task, whose question and answer sets require models to engage in creative and unconventional thought processes; an illustrative record is sketched below. Additional datasets, SWAG and CommonsenseQA, were integrated to evaluate how well fine-tuned models generalize to conventional commonsense reasoning tasks.
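
To make the task concrete, the sketch below shows what a sentence-puzzle record might look like. The field names and the sample puzzle are illustrative stand-ins, not necessarily the released BrainTeaser schema.

```python
# Hypothetical sentence-puzzle record; field names and content are illustrative,
# not necessarily the released BrainTeaser schema.
puzzle = {
    "question": ("A woman pushes her car up to a hotel and immediately knows "
                 "she is bankrupt. Why?"),
    "choice_list": [
        "She is playing Monopoly.",               # the correct, lateral reading
        "Her car broke down outside the hotel.",
        "She spent all her savings on the trip.",
        "None of the above.",
    ],
    "label": 0,  # index of the correct option in choice_list
}
```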

Prompting Strategies and CoT Paradigms

The paper introduces prompting strategies designed to probe LLMs' potential for lateral thinking. A salient feature is Chain-of-Thought (CoT) prompting, bifurcated into internal and external variants, which guides models through intricate reasoning pathways. The authors also explore enriching prompts with detailed task descriptions and compressing them to retain only essential information, underscoring the need for brevity and clarity within prompts to improve model performance.
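
As a rough illustration of the contrast discussed above, the templates below sketch a direct prompt and an internally guided CoT prompt for a sentence puzzle. The wordings are assumptions made for exposition, not the paper's exact prompts.

```python
# Hypothetical prompt templates; the paper's exact wording is not reproduced here.
DIRECT_TEMPLATE = (
    "You are solving a lateral-thinking sentence puzzle.\n"
    "Riddle: {riddle}\nOptions:\n{options}\n"
    "Reply with the single best option."
)

COT_TEMPLATE = (
    "You are solving a lateral-thinking sentence puzzle. The obvious reading is "
    "often misleading, so reason step by step about each option before answering.\n"
    "Riddle: {riddle}\nOptions:\n{options}\n"
    "Write your reasoning first, then state the best option on its own line."
)

def format_prompt(template: str, riddle: str, options: list[str]) -> str:
    """Fill a template with the riddle and a numbered option list."""
    numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    return template.format(riddle=riddle, options=numbered)
```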

Retrieval-Augmented Generation for Dynamic In-context Learning

A notable methodology employed is the Retrieval-Augmented Generation (RAG) pipeline, facilitating dynamic in-context learning through the provision of examples tailored to the question’s context. This approach sought to overcome the limitations of static few-shot examples, leveraging the RAG mechanism to dynamically select and incorporate relevant samples into the prompting process. The investigation encompassed various RAG configurations, including ordinary, ranked, and fusion methods, to ascertain the most effective strategy for enhancing model performance in lateral thinking tasks.
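
A minimal sketch of dynamic in-context example selection in this spirit: embed the incoming riddle, retrieve the most similar solved puzzles from a pool, and prepend them as few-shot demonstrations. The embedding model (a BGE-style sentence encoder, cf. reference 19), the example pool, and the prompt wording are assumptions; the paper's ordinary, ranked, and fusion RAG variants may score and merge candidates differently.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # assumed BGE-style encoder

# Hypothetical pool of solved training puzzles to draw demonstrations from.
example_pool = [
    {"riddle": "A man shaves several times a day, yet keeps his beard. How?",
     "answer": "He is a barber."},
    {"riddle": "What can you hold without ever touching it?",
     "answer": "A conversation."},
]

# Pre-compute unit-normalized embeddings for every pool riddle.
pool_vecs = embedder.encode([ex["riddle"] for ex in example_pool],
                            normalize_embeddings=True)

def retrieve_examples(question: str, k: int = 2):
    """Return the k pool puzzles most similar to the incoming question."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = pool_vecs @ q_vec          # cosine similarity (vectors are unit-normalized)
    top = np.argsort(-scores)[:k]
    return [example_pool[i] for i in top]

def build_prompt(question: str, options: list[str]) -> str:
    """Assemble a few-shot prompt with dynamically retrieved demonstrations."""
    shots = retrieve_examples(question)
    demos = "\n\n".join(f"Riddle: {ex['riddle']}\nAnswer: {ex['answer']}" for ex in shots)
    numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    return (f"{demos}\n\nRiddle: {question}\nOptions:\n{numbered}\n"
            "Pick the option that best resolves the riddle.")
```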

Experimental Insights and Outcomes

Evaluation of Prompting Methods

Experimental results show that different prompting strategies have a discernible impact on the models' ability to engage in lateral thinking. The findings favor simple, internally guided CoT over more complex arrangements, indicating that prompt length and information density significantly influence performance.

In-context Learning Enhancements

The dynamic selection of context through the RAG pipeline demonstrated notable improvements in model performance, emphasizing the utility of customizing examples based on the question's content. Interestingly, omitting explicit explanations from prompts and relying on the model's innate ability to infer relationships underscored the potential for LLMs to navigate lateral reasoning tasks independently.

Fine-tuning for Lateral Thinking

A pivotal aspect of the research involved fine-tuning Zephyr-7B-β on a dataset curated for lateral thinking to ascertain its influence on general commonsense reasoning capabilities. The experiment revealed that models fine-tuned on a lateral thinking dataset exhibited improved performance across other commonsense datasets, suggesting a beneficial transfer of lateral reasoning abilities.
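
A minimal supervised fine-tuning sketch under stated assumptions: the GPT-4-generated, human-validated thinking paths are packed into plain training strings and used for standard causal-LM fine-tuning of Zephyr-7B-β with the Hugging Face Trainer. The record fields, prompt template, and hyperparameters are hypothetical, and the paper's actual training setup may differ.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Zephyr's tokenizer ships without a pad token
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Hypothetical thinking-path records: riddle, options, a GPT-4-generated
# reasoning path, and the correct option (human-validated, per the paper).
records = [
    {"riddle": "A man shaves several times a day, yet still has a beard. Who is he?",
     "options": "1. A barber  2. A werewolf  3. A sculptor",
     "path": "He shaves other people, not himself, so he can shave all day and keep his beard.",
     "answer": "A barber"},
]

def to_text(rec):
    # Pack one record into a single training string; the template is an assumption.
    return {"text": (f"Riddle: {rec['riddle']}\nOptions: {rec['options']}\n"
                     f"Reasoning: {rec['path']}\nAnswer: {rec['answer']}")}

def tokenize(rec):
    return tokenizer(rec["text"], truncation=True, max_length=1024)

dataset = (Dataset.from_list(records)
           .map(to_text)
           .map(tokenize, remove_columns=["riddle", "options", "path", "answer", "text"]))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="zephyr-lateral-sft",
                           per_device_train_batch_size=1,
                           num_train_epochs=3,
                           learning_rate=2e-5),
    train_dataset=dataset,
    # mlm=False yields the standard next-token (causal LM) objective with padded batches.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```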

Theoretical and Practical Implications

The paper’s exploration into the lateral thinking capabilities of LLMs, through strategic prompting and fine-tuning, introduces a nuanced perspective on enhancing model performance beyond traditional reasoning tasks. Theoretically, it challenges existing paradigms by integrating creative reasoning into the repertoire of LLMs, suggesting a blend of vertical and lateral thinking for comprehensive language understanding. Practically, the approaches and methodologies articulated offer actionable insights for leveraging LLMs in applications requiring innovative problem-solving and creativity.

Future Directions in AI

Looking forward, the research posits intriguing prospects for integrating lateral thinking more cohesively into the development and training of LLMs. Further exploration into prompt engineering, coupled with advanced fine-tuning techniques, holds the potential to unlock new dimensions of cognitive capabilities in AI, paving the way for models that more profoundly mirror the intricacies of human thought processes.

In summation, this paper represents a significant stride towards endowing LLMs with the capability to think outside the box, underscoring the intricate balance between creativity and logic essential for the next generation of AI systems.
