
AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning (2405.10385v2)

Published 16 May 2024 in cs.CL, cs.AI, cs.IR, and cs.LG

Abstract: The SemEval 2024 BRAINTEASER task represents a pioneering venture in NLP by focusing on lateral thinking, a dimension of cognitive reasoning that is often overlooked in traditional linguistic analyses. The challenge comprises Sentence Puzzle and Word Puzzle subtasks and aims to test LLMs' capacity for divergent thinking. In this paper, we present our approach to the BRAINTEASER task. We employ a holistic strategy by leveraging cutting-edge pre-trained models in a multiple-choice architecture and diversifying the training data with the Sentence and Word Puzzle datasets. For further improvement, we fine-tuned the model on a synthetic humor/jokes dataset and on the RiddleSense dataset, which helped augment the model's lateral thinking abilities. Empirical results show that our approach achieves 92.5% accuracy on the Sentence Puzzle subtask and 80.2% accuracy on the Word Puzzle subtask.
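The multiple-choice architecture mentioned in the abstract can be sketched as follows: each question is paired with every candidate answer, a scorer assigns a logit to each pair, and the highest-scoring candidate is selected. This is a minimal illustration under stated assumptions, not the authors' code; the `score_pair` function below is a hypothetical stand-in for a fine-tuned encoder (e.g. DeBERTa-v3 wrapped in a multiple-choice head).

```python
# Minimal sketch of multiple-choice answer selection. The scorer is a
# placeholder: in the paper's setup it would be a fine-tuned pre-trained
# encoder that maps each (question, choice) pair to a logit.
from typing import Callable, List


def predict_choice(question: str,
                   choices: List[str],
                   score_pair: Callable[[str, str], float]) -> int:
    """Return the index of the highest-scoring (question, choice) pair."""
    logits = [score_pair(question, choice) for choice in choices]
    return max(range(len(choices)), key=lambda i: logits[i])


# Toy stand-in scorer: word overlap between question and candidate.
# A real system would replace this with model logits.
def overlap_score(question: str, choice: str) -> float:
    q_words = set(question.lower().split())
    return float(len(q_words & set(choice.lower().split())))


question = "What has keys but cannot open locks?"
options = ["a piano", "a locksmith", "a door"]
best = predict_choice(question, options, overlap_score)
```

The same selection loop applies to both the Sentence Puzzle and Word Puzzle subtasks; only the scorer (the fine-tuned model) changes.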

Authors (2)
  1. Mina Ghashami (11 papers)
  2. Soumya Smruti Mishra (3 papers)
Citations (1)
