
Solving and Generating NPR Sunday Puzzles with Large Language Models (2306.12255v1)

Published 21 Jun 2023 in cs.CL

Abstract: We explore the ability of LLMs to solve and generate puzzles from the NPR Sunday Puzzle game show using PUZZLEQA, a dataset comprising 15 years of on-air puzzles. We evaluate four LLMs using PUZZLEQA, in both multiple choice and free response formats, and explore two prompt engineering techniques to improve free response performance: chain-of-thought reasoning and prompt summarization. We find that state-of-the-art LLMs can solve many PUZZLEQA puzzles: the best model, GPT-3.5, achieves 50.2% loose accuracy. However, in our few-shot puzzle generation experiment, we find no evidence that models can generate puzzles: GPT-3.5 generates puzzles with answers that do not conform to the generated rules. Puzzle generation remains a challenging task for future work.
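The abstract reports results under a "loose accuracy" metric but does not define it here. A plausible reading is that a free-response answer counts as correct if the gold answer appears in the model's output after normalization. The sketch below illustrates that idea; the `normalize` and `loose_match` helpers and the example responses are assumptions for illustration, not the paper's actual scoring code.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation for lenient string matching."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def loose_match(model_answer: str, gold_answer: str) -> bool:
    """Count a response as correct if the normalized gold answer
    appears anywhere in the normalized model output."""
    return normalize(gold_answer) in normalize(model_answer)

# Hypothetical responses: the first pads the answer with extra words,
# the second is simply wrong.
responses = [
    ("The answer is 'Sunday puzzle'.", "sunday puzzle"),
    ("I think it might be Monday.", "sunday puzzle"),
]
loose_accuracy = sum(loose_match(r, g) for r, g in responses) / len(responses)
print(loose_accuracy)  # → 0.5
```

Under this reading, loose accuracy is more forgiving than exact match: verbose but correct answers still score, which matters for chat-style models that rarely emit a bare answer string.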

References (8)
Authors (2)
  1. Jingmiao Zhao (2 papers)
  2. Carolyn Jane Anderson (15 papers)
Citations (3)