
Thinking Tokens for Language Modeling (2405.08644v1)

Published 14 May 2024 in cs.CL and cs.AI

Abstract: How much is 56 times 37? LLMs often make mistakes in these types of difficult calculations. This is usually explained by their inability to perform complex reasoning. Since LLMs rely on large training sets and great memorization capability, naturally they are not equipped to run complex calculations. However, one can argue that humans also cannot perform this calculation immediately and require a considerable amount of time to construct the solution. In order to enhance the generalization capability of LLMs, and as a parallel to human behavior, we propose to use special 'thinking tokens' which allow the model to perform much more calculations whenever a complex problem is encountered.

Summary

  • The paper demonstrates that inserting 'thinking tokens' reduces perplexity in complex tasks, enhancing language model reasoning.
  • It employs a methodology that dynamically allocates extra computation via <T> tokens to tackle intricate numerical and linguistic queries.
  • Experimental results show improved accuracy in mathematical problems, paving the way for more self-adjusting, human-like language models.

Improving LLMs with 'Thinking Tokens'

Understanding the Problem

Language models (LMs) have been advancing significantly, but performing complex calculations, like multiplying two large numbers, remains an Achilles' heel. Think about being asked to solve 56 times 37: it's not something most people can do instantly. LMs share a similar struggle; they can't always handle intricate reasoning or heavy calculation effectively.

Humans use thinking time to work through such calculations, and this observation motivates the central idea of the paper: employing 'thinking tokens' to let LMs tackle complex problems more effectively.
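As a concrete illustration of the staged computation a human performs (standard long multiplication, not specific to the paper), the solution decomposes into intermediate steps:

56 × 37 = 56 × (30 + 7) = 1680 + 392 = 2072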

The Core Concept: 'Thinking Tokens'

What exactly are these 'thinking tokens'? Imagine you're reading a complex paragraph, and every difficult word or concept you encounter grants you a few extra moments to think. In the language modeling world, these moments are represented by special tokens, denoted <T>, inserted after each word in a sentence that requires deep reasoning. These tokens grant the model extra computational steps to think more thoroughly before committing to an answer; a minimal code sketch of the insertion appears after the list below.

Here's the core idea broken down:

  • 'Thinking tokens' are inserted into sentences where complex calculations are needed.
  • Each token gives the LM additional computational time, essentially enabling it to run more in-depth calculations.
  • The goal is not just to improve accuracy in solving complex tasks but to allow the model to self-adjust the number of thinking tokens based on the problem's complexity.
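To make the mechanism concrete, here is a minimal sketch of the token-insertion step in Python. This is not the authors' implementation: the helper name and the fixed-count insertion policy are assumptions chosen for clarity, whereas the paper aims at models that decide where thinking helps.

```python
# Minimal sketch (hypothetical helper, not the paper's code): interleave a
# special <T> token after every input token, giving the model extra forward
# passes of computation before it must predict the next real token.
THINK = "<T>"

def insert_thinking_tokens(tokens, n_think=1):
    """Return a new token list with n_think <T> tokens after each token."""
    out = []
    for tok in tokens:
        out.append(tok)
        out.extend([THINK] * n_think)
    return out

print(insert_thinking_tokens(["What", "is", "56", "*", "37", "?"]))
# ['What', '<T>', 'is', '<T>', '56', '<T>', '*', '<T>', '37', '<T>', '?', '<T>']
```

A learned variant would replace the fixed n_think with a model-driven choice of how many <T> tokens to emit at each position.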

Experiments and Results

The researchers tested the insertion of these thinking tokens on several datasets, focusing primarily on sentences that require non-trivial reasoning, such as those drawn from mathematical problems. The results were promising.

Notable Findings:

  • The introduction of 'thinking tokens' significantly reduced the perplexity (a measure of how well a model predicts a sample) in sentences requiring complex reasoning.
  • For instance, "What is the remainder when 8922293 is divided by 263?" saw an improvement in perplexity from 16.8 to 13.1 when 'thinking tokens' were used.

Here's a snapshot from the results showing the effectiveness:

| Dataset | Sentence | Answer | Perplexity (original) | Perplexity (with <T>) |
|---|---|---|---|---|
| maths | What is the remainder when 8922293 is divided by 263? | 18 | 16.8 | 13.1 |
| maths | Convert -464 (base 9) to base 6. | -1434 | 24.3 | 19.8 |

The lower perplexity values imply that the model's ability to predict and understand the sentence improved with the use of thinking tokens.
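For readers who want the metric pinned down, here is a small sketch of how such a comparison could be scored. This is an assumption about the evaluation, not code from the paper: perplexity is the exponential of the average negative log-probability of the real tokens, with <T> positions excluded so inserting them cannot trivially lower the score.

```python
import math

def perplexity(logprobs, is_thinking):
    """exp of mean negative log-probability over non-<T> positions.

    logprobs: per-position log p(token | context) from the model.
    is_thinking: True at positions holding a <T> token (assumed
    excluded from scoring so inserted tokens can't game the metric).
    """
    scored = [lp for lp, t in zip(logprobs, is_thinking) if not t]
    return math.exp(-sum(scored) / len(scored))

# Toy usage: three real tokens and one <T> position.
print(perplexity([-2.1, -0.7, -5.0, -1.3], [False, False, True, False]))
```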

Implications and Future Directions

This concept has both practical and theoretical implications. Practically, it can make LMs more competent in handling tasks requiring intricate reasoning, such as complex queries in natural language processing applications or aiding in theorem proving.

Theoretical implications:

  • Enhances our understanding of how to simulate human-like thinking in LMs.
  • Opens up avenues for developing more self-adjusting models that can decide strategically how much "thinking time" is needed for different tasks.

Speculation for the Future:

  • The next step could involve creating adaptive models that not only utilize thinking tokens but also determine the optimal number of tokens needed dynamically.
  • Further research could explore how thinking tokens impact other challenging tasks beyond mathematical reasoning, such as abstract problem-solving or understanding nuanced contexts in text.

Could thinking tokens be the next incremental step in making LMs more "thoughtful" and capable of handling tasks close to human cognition levels? Only time and further research will tell, but the preliminary results are certainly encouraging.

This wraps up the explanation of this novel approach to improving LLMs with 'thinking tokens'. It's a fascinating area of research that bridges human cognitive behavior and machine learning, pushing the boundaries of how we can make LMs more efficient and intelligent.
