Inverse-Entropy Weighted Voting for LLMs
- Inverse-entropy weighted voting is a method that quantifies token-level uncertainty using average Shannon entropy to weight and aggregate LLM reasoning chains.
- The aggregation rule assigns chains weights inversely proportional to their entropy, enabling statistically sound answer selection over standard majority voting.
- Empirical evaluations demonstrate consistent accuracy gains across diverse models and benchmarks, with minimal computational overhead in both parallel and sequential settings.
Inverse-entropy weighted voting (IEW) is a training-free aggregation method for reasoning chains produced by LLMs that leverages token-level uncertainty to improve the selection of final answers. By assigning greater influence to chains with low internal entropy, IEW provides a statistically grounded mechanism for integrating multiple chain-of-thought (CoT) outputs, outperforming standard majority voting ("self-consistency") in both parallel and sequential test-time inference schemes. IEW operates without the need for additional model queries or tuning, and offers consistent empirical gains across diverse open source models and reasoning benchmarks.
1. Formal Definition of Chain Entropy
IEW quantifies uncertainty in each LLM reasoning chain using the average Shannon entropy over the chain’s token-level probability distributions. Specifically, for a set of $N$ generated chains $\{c_1, \dots, c_N\}$, each chain $c_i$ generates a sequence of tokens, with per-token probability vectors $p_{i,t} \in \mathbb{R}^{K}$, where $K$ is the vocabulary size considered (typically the top-$K$ tokens) and $T_i$ is the chain length. The average entropy for chain $c_i$ is

$$H_i = -\frac{1}{T_i} \sum_{t=1}^{T_i} \sum_{v=1}^{K} p_{i,t}(v) \log p_{i,t}(v),$$

where $p_{i,t}(v)$ is the normalized probability of token $v$ at position $t$.
The entropy concretely measures the “spread” of the model’s belief at each step: low entropy implies peaked, confident distributions, while high entropy indicates uncertainty or ambiguity during the next-token prediction process.
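A minimal sketch of this computation is shown below; it assumes each chain is available as a list of per-position top-$K$ probability vectors, and the function name `chain_entropy` is illustrative rather than taken from the source.

```python
import math
from typing import List

def chain_entropy(token_probs: List[List[float]]) -> float:
    """Average Shannon entropy of one chain.

    token_probs[t] holds the normalized probabilities of the top-K candidate
    tokens at position t of the chain.
    """
    if not token_probs:
        return float("inf")  # an empty chain carries no confidence signal
    total = 0.0
    for probs in token_probs:
        # Per-position entropy: -sum(p * log p), skipping zero-probability entries.
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(token_probs)
```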
2. Inverse-Entropy Weighted Aggregation Rule
IEW assigns each chain a weight inversely proportional to its entropy:

$$w_i = \frac{1}{H_i + \varepsilon},$$

where $\varepsilon$ is a small constant to avoid division by zero.

To derive a probability distribution over the chains' contributions, weights are normalized:

$$\tilde{w}_i = \frac{w_i}{\sum_{j=1}^{N} w_j}.$$

Aggregated answer selection is performed by summing normalized weights for chains producing each answer $a$:

$$S(a) = \sum_{i \,:\, a_i = a} \tilde{w}_i,$$

and returning

$$\hat{a} = \arg\max_{a \in \mathcal{A}} S(a),$$

where $a_i$ is the answer produced by chain $c_i$, and $\mathcal{A}$ is the set of all possible answers.
These steps yield a confidence-weighted vote in which more self-consistent (low-entropy) reasoning trajectories have a larger impact on the ensemble prediction.
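A compact sketch of this aggregation rule follows, assuming one entropy value per chain (for example from a helper like `chain_entropy` above); the function name `iew_vote` is illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def iew_vote(entropies: List[float], answers: List[str],
             eps: float = 1e-6) -> Tuple[str, Dict[str, float]]:
    """Inverse-entropy weighted voting over a set of reasoning chains."""
    # w_i = 1 / (H_i + eps): low-entropy chains receive large weights.
    weights = [1.0 / (h + eps) for h in entropies]
    total = sum(weights)
    # Normalize so the weights form a probability distribution over chains.
    norm_weights = [w / total for w in weights]
    # S(a): sum of normalized weights of the chains that produced answer a.
    scores: Dict[str, float] = defaultdict(float)
    for answer, w in zip(answers, norm_weights):
        scores[answer] += w
    # Return the answer with the highest weighted score.
    best = max(scores, key=scores.get)
    return best, dict(scores)
```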
3. Comparison to Majority Voting: Algorithmic Protocol
IEW can be contrasted with unweighted majority voting (self-consistency) using the following protocols:
| Step | Inverse-Entropy Weighted Voting (IEW) | Standard Majority Voting (Self-Consistency) |
|---|---|---|
| 1 | For each chain $c_i$, extract token probabilities, compute entropy $H_i$, and assign weight $w_i = 1/(H_i + \varepsilon)$. | For each chain, extract answer $a_i$. |
| 2 | Normalize weights: $\tilde{w}_i = w_i / \sum_j w_j$. | Count the number of votes for each answer $a$. |
| 3 | For each distinct answer $a$, sum $\tilde{w}_i$ for chains $i$ with $a_i = a$. | Select the answer with the most votes: $\hat{a} = \arg\max_a \sum_i \mathbf{1}[a_i = a]$. |
| 4 | Return $\hat{a}$, the answer maximizing the weighted sum $S(a)$. | Return the majority answer. |
IEW thus considers not only the frequency but also the confidence embedded in the token prediction process.
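For comparison, the unweighted baseline reduces to a frequency count; the sketch below (building on the `iew_vote` sketch above) contrasts the two on a toy set of chains with illustrative entropy values.

```python
from collections import Counter
from typing import List

def majority_vote(answers: List[str]) -> str:
    """Unweighted self-consistency: return the most frequent answer."""
    return Counter(answers).most_common(1)[0][0]

# Toy comparison: two fairly confident chains answer "A", one uncertain chain answers "B".
entropies = [0.4, 1.5, 0.5]
answers = ["A", "B", "A"]
print(majority_vote(answers))            # -> "A"
print(iew_vote(entropies, answers)[0])   # -> "A", here agreeing with the majority
```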
4. Theoretical Motivation
IEW's weighting rationale is rooted in several principles:
- Confidence Quantification: A chain with low entropy is more likely to encode a coherent, certain chain of reasoning, as the model concentrates most of its probability mass on a single next-token candidate at each step.
- Noise Attenuation: High-entropy chains reflect uncertainty or indecision, so their down-weighting acts as a filter against error-prone or misguided reasoning sequences.
- Information-Theoretic Justification: As Shannon entropy is a fundamental quantitative measure of uncertainty, its inverse serves as a natural proxy for confidence in probabilistic outputs.
This mechanism does not require additional model calls or training, and leverages intrinsic properties of the LLM’s output distributions.
5. Empirical Performance and Benchmarks
IEW exhibits consistent improvements over self-consistency across both parallel and sequential reasoning paradigms. In parallel self-consistency (6 chains), IEW achieved gains of 0.5–3.4% over majority voting across diverse open-source LLMs and benchmarks. In the sequential refinement regime, IEW was best in 29 out of 30 configurations (97%) evaluated. Maximum observed gains reach 6.7% in some settings.
Example Benchmark Results
| Configuration | Majority | Entropy-Weighted |
|---|---|---|
| GPT-OSS-20B, AIME | 50.0% | 53.3% |
| GPT-OSS-120B, AIME | 53.3% | 56.7% |
| Qwen3-235B, GPQA | 67.7% | 68.2% |
| Kimi-K2, GPQA-Diamond (sequential) | 73.7% | 74.8% |
These results indicate that entropy-based voting delivers consistent, if sometimes modest, improvements in test-time accuracy under matched compute budgets.
6. Computational Properties and Implementation Guidance
IEW incurs negligible additional computational cost compared to standard majority voting. The primary overhead is the calculation of $H_i$ for each chain, which is $O(T_i \cdot K)$, with $K$ denoting the number of top tokens considered for entropy estimation (a modest $K$ suffices for stable results). Since chain lengths are typically bounded and entropy is computed post-hoc from already available outputs, the extra wall-time cost is typically on the order of milliseconds, even in sequential settings where the overall latency is dominated by chain generation itself.
IEW requires only basic arithmetic operations and does not necessitate any further model evaluation or gradient-based optimization.
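To make the post-hoc nature of the cost concrete, the sketch below computes a chain's entropy directly from stored top-$K$ log-probabilities, renormalizing the truncated mass at each position; the per-chain cost is $O(T_i \cdot K)$, and the data layout is an assumption rather than a specific API schema.

```python
import math
from typing import List

def entropy_from_top_logprobs(top_logprobs_per_token: List[List[float]]) -> float:
    """Average entropy of one chain from its stored top-K log-probabilities.

    top_logprobs_per_token[t] holds the log-probabilities of the top-K
    candidate tokens at position t, as returned alongside the generation.
    """
    if not top_logprobs_per_token:
        return float("inf")
    total = 0.0
    for logprobs in top_logprobs_per_token:
        probs = [math.exp(lp) for lp in logprobs]
        z = sum(probs)
        probs = [p / z for p in probs]  # renormalize the truncated top-K mass
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(top_logprobs_per_token)
```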
7. Application Example and Practical Recommendations
A worked example demonstrates the voting mechanism. Consider three chains with entropies $H_1 = 0.30$, $H_2 = 0.625$, and $H_3 = 0.20$, and corresponding answers A, B, and A.

Weights are assigned (with $\varepsilon$ negligibly small):

$$w_1 = \frac{1}{0.30} \approx 3.333, \quad w_2 = \frac{1}{0.625} = 1.600, \quad w_3 = \frac{1}{0.20} = 5.000.$$

Normalizing, the weights become:

$$\tilde{w}_1 \approx 0.336, \quad \tilde{w}_2 \approx 0.161, \quad \tilde{w}_3 \approx 0.503.$$

The total for answer A is $0.336 + 0.503 = 0.839$; for B, $0.161$. The aggregated answer is A, consistent with the majority vote, but IEW would allow a low-entropy minority answer to overturn the result if warranted by confidence considerations.
Recommended best practices include requesting a top_logprobs value during generation large enough for stable entropy estimates, employing a small positive $\varepsilon$ to avoid division errors, and setting the chain count to six under matched compute constraints, as this has proven empirically effective. If a chain fails to return log-probabilities, it should be excluded from the weighted vote; if fewer than two valid chains remain, defaulting to majority voting is advisable. A guarded voting routine along these lines is sketched below.
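The sketch reuses the `iew_vote` and `majority_vote` helpers defined earlier; the chain record format (an "answer" field plus an optional precomputed "entropy") is an assumption for illustration.

```python
from typing import Dict, List

def robust_vote(chains: List[Dict]) -> str:
    """Vote over chains, falling back to majority voting when log-probabilities are missing.

    Each chain record is assumed to hold an "answer" and, when log-probabilities
    were returned, a precomputed "entropy" value.
    """
    valid = [c for c in chains if c.get("entropy") is not None]
    if len(valid) < 2:
        # Too few chains with usable log-probabilities: fall back to majority voting.
        return majority_vote([c["answer"] for c in chains])
    answer, _scores = iew_vote(
        [c["entropy"] for c in valid],
        [c["answer"] for c in valid],
    )
    return answer
```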
---
Inverse-entropy weighted voting augments CoT ensemble prediction by weighting answers according to internal model uncertainty, producing consistent performance improvements with negligible computational overhead and without additional model queries. Its theoretical foundation and empirical success suggest it is a robust aggregation mechanism for LLM reasoning outputs at test time, especially within sequential refinement paradigms.