Random Forest-of-Thoughts (RFoT) Overview
- Random Forest-of-Thoughts (RFoT) is a prompting-based ensemble methodology that generates multi-level candidate thoughts and applies Shapley value scoring to improve uncertainty-aware analysis.
- It uses iterative chain-of-thought generation and bootstrap sampling to construct a forest of randomized thought-trees, capturing diverse reasoning paths in complex survey contexts.
- Experimental evaluations show that RFoT outperforms standard CoT and ToT approaches, achieving significant gains in accuracy, weighted-F1, and robustness in computational social science tasks.
Random Forest-of-Thoughts (RFoT) is a prompting-based ensemble reasoning methodology developed for LLMs to address uncertainty-aware reasoning within computational social science, particularly complex survey analysis with combinatorial branching and respondent-specific decision paths. RFoT overcomes inherent limitations in standard Chain-of-Thought (CoT) and Tree-of-Thoughts (ToT) prompting by generating diverse candidate intermediate thoughts, selecting among them with model-agnostic Shapley values, and constructing a forest of random thought-trees via bootstrap sampling, thereby extending both exploration and predictive reliability in large, uncertain reasoning spaces (Wu et al., 26 Feb 2025).
1. Motivation and Limitations of Prior Approaches
Conventional CoT prompting compels LLMs to produce a single, left-to-right chain of reasoning without recourse to alternative intermediate steps once “commitment” is made. This regime fails to adequately address tasks where the true “thought space” is highly branching, often determined by respondent-dependent “skip logic” and context-sensitive sub-questionnaires pervasive in social-survey research. While ToT introduces branching, it typically generates only one tree per instance, commonly pruned greedily (e.g., via highest-value branches), and does not sufficiently model the variability in survey paths resulting from latent respondent heterogeneity. These deficiencies limit the coverage of plausible reasoning paths and reduce robustness in the analysis of multi-turn, interdependent social survey data (Wu et al., 26 Feb 2025).
2. Formal Structure of RFoT
Given a dataset $\mathcal{D}$ of multi-turn question–answer pairs, RFoT formalizes a candidate thought set $\mathcal{T}$ for each input and utilizes a predictive head $f$ mapping thought-augmented inputs to target mental-state labels $y$.
Thought Space Generation via Iterative CoT
Each input is decomposed into multi-level intermediate steps, specifically:
- Aspect-level thoughts,
- Keyword-level thoughts,
- Response-level thoughts.
These multi-level, theory-informed “thoughts” comprise the candidate pool for downstream consideration.
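To make the three abstraction levels concrete, a leveled candidate pool for one QA pair might look like the following sketch (all thought strings and names here are hypothetical illustrations, not taken from the paper):

```python
# Hypothetical leveled candidate-thought pool for one survey QA pair,
# organized by the three abstraction levels RFoT generates via iterative CoT.
candidate_thoughts = {
    "aspect": [
        "financial security",
        "family relationships",
    ],
    "keyword": [
        "income satisfaction",
        "marital status",
    ],
    "response": [
        "respondent reports stable income and close family ties",
    ],
}

def flatten_pool(pool):
    """Flatten the leveled pool into (level, thought) candidates for scoring."""
    return [(level, t) for level, thoughts in pool.items() for t in thoughts]

pool = flatten_pool(candidate_thoughts)
```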
Shapley Value Scoring
Informative selection among candidate thoughts is achieved via the model-agnostic Shapley value of each candidate $t_i$:

$$\phi_i = \sum_{S \subseteq \mathcal{T} \setminus \{t_i\}} \frac{|S|!\,(|\mathcal{T}|-|S|-1)!}{|\mathcal{T}|!}\,\bigl[v(S \cup \{t_i\}) - v(S)\bigr],$$

where $\mathcal{T}$ is the candidate set and $v(S)$ is the predictive score obtained for subset $S$. The top-$k$ thoughts, i.e., those with the largest Shapley values, are retained.
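For small candidate pools the Shapley values can be computed exactly by enumerating subsets. In the sketch below, `score` is a stand-in for the predictive utility $v(S)$ of a thought subset, which in RFoT would come from the LLM's predictive head (a hypothetical interface):

```python
from itertools import combinations
from math import factorial

def shapley_values(candidates, score):
    """Exact Shapley values for a small candidate-thought pool.

    score(subset) returns the predictive utility v(S) of conditioning the
    predictor on the subset of thoughts S (hypothetical stand-in for the
    LLM predictive head).
    """
    n = len(candidates)
    values = {}
    for i, c in enumerate(candidates):
        rest = [x for j, x in enumerate(candidates) if j != i]
        phi = 0.0
        for r in range(len(rest) + 1):
            # Standard Shapley coalition weight |S|! (n - |S| - 1)! / n!
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for S in combinations(rest, r):
                phi += weight * (score(set(S) | {c}) - score(set(S)))
        values[c] = phi
    return values
```

For an additive utility the Shapley value of each thought recovers its individual weight, and the values always sum to $v(\mathcal{T}) - v(\emptyset)$ (the efficiency property), which gives a quick sanity check on the implementation.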
Bootstrap Sampling and Forest Construction
RFoT builds randomized thought-trees by:
- Bootstrap sampling from $\mathcal{D}$ with replacement (emulating respondent variability).
- Sampling thoughts from the retained top-$k$ pool, with probability proportional to their Shapley values.
- Root selection and tree growth via DFS, splitting on thoughts that yield maximal information gain; tree depth is limited (set to 4 in practice).
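The two sampling steps can be sketched as follows; the `shapley` mapping and the clipping of negative scores to zero before normalization are assumptions of this sketch, not details from the paper:

```python
import random

def sample_tree_inputs(data, thoughts, shapley, m, rng):
    """Inputs for one randomized tree: a bootstrap data sample drawn with
    replacement (emulating respondent variability) plus m thoughts drawn
    with probability proportional to their Shapley values.

    shapley maps each thought to a score; negative values are clipped to
    zero before normalization (an assumption of this sketch).
    """
    boot = [rng.choice(data) for _ in data]  # bootstrap sample, same size as data
    weights = [max(shapley[t], 0.0) for t in thoughts]
    total = sum(weights) or 1.0
    probs = [w / total for w in weights]
    chosen = rng.choices(thoughts, weights=probs, k=m)  # Shapley-proportional draw
    return boot, chosen
```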
Aggregation
Predictions from the $T$ trees yield outputs $\hat{y}_1, \dots, \hat{y}_T$; the final prediction is the average across all trees: $\hat{y} = \tfrac{1}{T} \sum_{j=1}^{T} \hat{y}_j$.
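Assuming each tree emits a class-probability vector (an assumption of this sketch; the paper states only that tree outputs are averaged), the aggregation step reduces to:

```python
def aggregate(tree_probs):
    """Average per-class probability vectors across trees and take the
    argmax as the forest-level prediction."""
    n_trees = len(tree_probs)
    n_classes = len(tree_probs[0])
    mean = [sum(p[c] for p in tree_probs) / n_trees for c in range(n_classes)]
    pred = max(range(n_classes), key=lambda c: mean[c])
    return mean, pred
```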
3. Algorithmic Workflow
Algorithm RFoT-DFS executes the above methodology, sequentially constructing randomized trees from iterative thought-generation (ICoT), Shapley-based scoring, and DFS-based tree growth per bootstrap-sampled data split. A practical instantiation grows a forest of several randomized trees, retains a fixed number of top-ranked thoughts, and limits tree depth to 4 (Wu et al., 26 Feb 2025).
| Step | Description |
|---|---|
| Thought Generation (ICoT) | Decompose each QA pair into multi-level thoughts |
| Shapley Scoring | Compute marginal utility for each candidate thought |
| Top-$k$ Thought Selection | Retain the $k$ most informative thoughts |
| Bootstrap + Forest Growth | Build trees via data and thought sampling; grow via DFS |
| Ensemble Aggregation | Average predictions over all trees |
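The workflow steps above can be composed into one compact sketch. All interfaces here are hypothetical stand-ins for the LLM-backed components: `utility(S)` plays the role of the predictive head used during Shapley scoring, and `predict(boot, chosen)` abbreviates an entire depth-limited DFS thought-tree to a single call returning a class-probability list:

```python
import random
from itertools import combinations
from math import factorial

def rfot(data, thoughts, utility, predict, n_trees=10, k=3, seed=0):
    """Simplified end-to-end sketch of the RFoT-DFS workflow."""
    rng = random.Random(seed)
    # Steps 1-2: exact Shapley scoring of each candidate thought (small pools only)
    n = len(thoughts)
    phi = {}
    for t in thoughts:
        rest = [x for x in thoughts if x != t]
        v = 0.0
        for r in range(len(rest) + 1):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for S in combinations(rest, r):
                v += weight * (utility(set(S) | {t}) - utility(set(S)))
        phi[t] = v
    # Step 3: retain the top-k thoughts by Shapley value
    top = sorted(thoughts, key=lambda t: phi[t], reverse=True)[:k]
    w = [max(phi[t], 0.0) for t in top]
    if sum(w) == 0:
        w = [1.0] * len(top)  # fall back to uniform sampling
    # Step 4: one randomized tree per bootstrap sample plus
    # Shapley-proportional thought draws
    probs_per_tree = []
    for _ in range(n_trees):
        boot = [rng.choice(data) for _ in data]
        chosen = rng.choices(top, weights=w, k=len(top))
        probs_per_tree.append(predict(boot, chosen))
    # Step 5: ensemble-average the per-tree class distributions
    m = len(probs_per_tree[0])
    mean = [sum(p[c] for p in probs_per_tree) / n_trees for c in range(m)]
    return max(range(m), key=lambda c: mean[c])
```

The reduction of tree growth to a single `predict` call elides the information-gain splitting; the sketch is meant only to show how the five workflow stages chain together.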
4. Experimental Evaluation
RFoT was evaluated on two representative social survey analysis tasks:
- Chinese General Social Survey (CGSS): 3,300 respondents, 124 question turns, 5-point happiness scale.
- European Social Survey (ESS): 15,000 respondents, 102 question turns, 5-point scale.
Comparisons included zero-shot I/O prompting, LoRA fine-tuning, CoT, Self-Consistency CoT (SC-CoT), and ToT. Metrics included Success Rate (exact class match), Weighted-F1, Consistency, and per-sample runtime. All methods used Llama3-8B; 100 random test samples were averaged per setting (Wu et al., 26 Feb 2025).
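The two accuracy metrics can be made concrete with a minimal pure-Python sketch (these are the standard definitions, not the paper's evaluation code):

```python
from collections import Counter

def success_rate(y_true, y_pred):
    """Exact-match success rate over predicted classes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Weighted-F1: per-class F1 averaged with class-support weights."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    total = 0.0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += support[c] * f1
    return total / len(y_true)
```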
5. Quantitative Results
RFoT demonstrated substantial gains in reasoning accuracy and robustness across both datasets and all metrics. The table below reports results on CGSS.
| Method | Success (%) | Weighted-F1 (%) | Time (s) | Consistency (%) |
|---|---|---|---|---|
| I/O Prompt | 22.2 | 22.1 | 3.4 | 100 |
| Fine-Tuning (LoRA) | 41.4 | 41.1 | 0.3 | 100 |
| CoT | 52.9 | 55.2 | 3.1 | 98 |
| SC-CoT | 64.0 | 69.9 | 13.0 | 100 |
| ToT | 66.6 | 68.0 | 12.1 | 100 |
| RFoT | 78.4 | 80.5 | 15.5 | 100 |
On CGSS, RFoT achieved 78.4% success and 80.5% weighted-F1, outperforming ToT (66.6% / 68.0%). On ESS, RFoT gave 68.6%/68.5% compared to 62.1%/60.0% for ToT. Similar 5–15 point gains were observed when using Qwen2.5-7B. This improvement is attributed to RFoT's capacity to navigate combinatorially richer reasoning spaces and to aggregate over uncertainty inherent in subquestionnaire branching (Wu et al., 26 Feb 2025).
6. Component Analysis
Ablations showed that:
- Increasing the number of trees from 1 to 10 yielded monotonic improvements (success rate increased from ~66% to ~78% on CGSS), with diminishing returns at larger forest sizes.
- Replacing Shapley-based sampling with uniform thought selection reduced weighted-F1 by 6 points, confirming the necessity of explanation-based feature attribution.
- Increasing tree depth from 4 to 5 produced only minor gains (<1%) but significantly increased runtime, suggesting a practical depth limit of 4.
- Deterministic single-tree (ToT) approaches underperformed ensembles: substituting RFoT’s randomized forest construction with a single beam-searched tree led to an 8-point decrease in success rate.
7. Significance and Implications
RFoT operationalizes a principled, uncertainty-aware ensemble reasoning framework for LLMs that is particularly well-suited to computational social science tasks characterized by combinatorial branching and respondent-dependent logical paths. By generating and evaluating theory-informed candidate thoughts at multiple abstraction levels, scoring their marginal predictive value, and constructing a forest of thought-trees via randomized sampling, RFoT produces outputs that are more accurate and robust than traditional CoT, SC-CoT, and ToT prompting paradigms. A plausible implication is that similar random-forest–style ensemble metareasoning approaches may be fruitfully adapted to other domains with rich intermediate variable structure and model-uncertainty requirements (Wu et al., 26 Feb 2025).