Random Forest-of-Thoughts (RFoT) Overview
- Random Forest-of-Thoughts (RFoT) is a prompting-based ensemble methodology that generates multi-level candidate thoughts and applies Shapley value scoring to improve uncertainty-aware analysis.
- It uses iterative chain-of-thought generation and bootstrap sampling to construct a forest of randomized thought-trees, capturing diverse reasoning paths in complex survey contexts.
- Experimental evaluations show that RFoT outperforms standard CoT and ToT approaches, achieving significant gains in accuracy, weighted-F1, and robustness in computational social science tasks.
Random Forest-of-Thoughts (RFoT) is a prompting-based ensemble reasoning methodology developed for LLMs to address uncertainty-aware reasoning within computational social science, particularly complex survey analysis with combinatorial branching and respondent-specific decision paths. RFoT overcomes inherent limitations in standard Chain-of-Thought (CoT) and Tree-of-Thoughts (ToT) prompting by generating diverse candidate intermediate thoughts, selecting among them with model-agnostic Shapley values, and constructing a forest of random thought-trees via bootstrap sampling, thereby extending both exploration and predictive reliability in large, uncertain reasoning spaces (Wu et al., 26 Feb 2025).
1. Motivation and Limitations of Prior Approaches
Conventional CoT prompting compels LLMs to produce a single, left-to-right chain of reasoning without recourse to alternative intermediate steps once “commitment” is made. This regime fails to adequately address tasks where the true “thought space” is highly branching, often determined by respondent-dependent “skip logic” and context-sensitive sub-questionnaires pervasive in social-survey research. While ToT introduces branching, it typically generates only one tree per instance, commonly pruned greedily (e.g., via highest-value branches), and does not sufficiently model the variability in survey paths resulting from latent respondent heterogeneity. These deficiencies limit the coverage of plausible reasoning paths and reduce robustness in the analysis of multi-turn, interdependent social survey data (Wu et al., 26 Feb 2025).
2. Formal Structure of RFoT
Given a dataset $\mathcal{D}$ of multi-turn question–answer pairs, RFoT formalizes a candidate thought set $\mathcal{T}$ for each input and utilizes a predictive head $f$ mapping thought-augmented inputs to target mental-state labels $y$.
Thought Space Generation via Iterative CoT
Each input is decomposed into multi-level intermediate steps, specifically:
- Aspect-level thoughts,
- Keyword-level thoughts,
- Response-level thoughts.
These multi-level, theory-informed “thoughts” comprise the candidate pool for downstream consideration.
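To make the three abstraction levels concrete, a leveled candidate pool for one QA pair might look like the following sketch (all thought strings and names here are hypothetical illustrations, not taken from the paper):

```python
# Hypothetical leveled candidate-thought pool for one survey QA pair,
# organized by the three abstraction levels RFoT generates via iterative CoT.
candidate_thoughts = {
    "aspect": [
        "financial security",
        "family relationships",
    ],
    "keyword": [
        "income satisfaction",
        "marital status",
    ],
    "response": [
        "respondent reports stable income and close family ties",
    ],
}

def flatten_pool(pool):
    """Flatten the leveled pool into (level, thought) candidates for scoring."""
    return [(level, t) for level, thoughts in pool.items() for t in thoughts]

pool = flatten_pool(candidate_thoughts)
```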
Shapley Value Scoring
Informative selection among candidate thoughts is achieved via the model-agnostic Shapley value of each candidate $t_i$:

$$\phi_i = \sum_{S \subseteq \mathcal{T} \setminus \{t_i\}} \frac{|S|!\,(|\mathcal{T}|-|S|-1)!}{|\mathcal{T}|!}\,\bigl[v(S \cup \{t_i\}) - v(S)\bigr],$$

where $\mathcal{T}$ is the candidate set and $v(S)$ is the predictive score obtained for subset $S$. The top-$k$ thoughts, i.e., those with the largest Shapley values, are retained.
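For small candidate pools the Shapley values can be computed exactly by enumerating subsets. In the sketch below, `score` is a stand-in for the predictive utility $v(S)$ of a thought subset, which in RFoT would come from the LLM's predictive head (a hypothetical interface):

```python
from itertools import combinations
from math import factorial

def shapley_values(candidates, score):
    """Exact Shapley values for a small candidate-thought pool.

    score(subset) returns the predictive utility v(S) of conditioning the
    predictor on the subset of thoughts S (hypothetical stand-in for the
    LLM predictive head).
    """
    n = len(candidates)
    values = {}
    for i, c in enumerate(candidates):
        rest = [x for j, x in enumerate(candidates) if j != i]
        phi = 0.0
        for r in range(len(rest) + 1):
            # Standard Shapley coalition weight |S|! (n - |S| - 1)! / n!
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for S in combinations(rest, r):
                phi += weight * (score(set(S) | {c}) - score(set(S)))
        values[c] = phi
    return values
```

For an additive utility the Shapley value of each thought recovers its individual weight, and the values always sum to $v(\mathcal{T}) - v(\emptyset)$ (the efficiency property), which gives a quick sanity check on the implementation.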
Bootstrap Sampling and Forest Construction
RFoT builds randomized thought-trees by:
- Bootstrap sampling from $\mathcal{D}$ with replacement (emulating respondent variability).
- Sampling thoughts from the retained top-$k$ pool, with probability proportional to their Shapley values.
- Root selection and tree growth via DFS, splitting on thoughts that yield maximal information gain; tree depth is limited (set to 4 in practice).
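The two sampling steps can be sketched as follows; the `shapley` mapping and the clipping of negative scores to zero before normalization are assumptions of this sketch, not details from the paper:

```python
import random

def sample_tree_inputs(data, thoughts, shapley, m, rng):
    """Inputs for one randomized tree: a bootstrap data sample drawn with
    replacement (emulating respondent variability) plus m thoughts drawn
    with probability proportional to their Shapley values.

    shapley maps each thought to a score; negative values are clipped to
    zero before normalization (an assumption of this sketch).
    """
    boot = [rng.choice(data) for _ in data]  # bootstrap sample, same size as data
    weights = [max(shapley[t], 0.0) for t in thoughts]
    total = sum(weights) or 1.0
    probs = [w / total for w in weights]
    chosen = rng.choices(thoughts, weights=probs, k=m)  # Shapley-proportional draw
    return boot, chosen
```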
Aggregation
Predictions from the $T$ trees yield outputs $\hat{y}_1, \dots, \hat{y}_T$; the final prediction is the average across all trees: $\hat{y} = \tfrac{1}{T} \sum_{j=1}^{T} \hat{y}_j$.
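Assuming each tree emits a class-probability vector (an assumption of this sketch; the paper states only that tree outputs are averaged), the aggregation step reduces to:

```python
def aggregate(tree_probs):
    """Average per-class probability vectors across trees and take the
    argmax as the forest-level prediction."""
    n_trees = len(tree_probs)
    n_classes = len(tree_probs[0])
    mean = [sum(p[c] for p in tree_probs) / n_trees for c in range(n_classes)]
    pred = max(range(n_classes), key=lambda c: mean[c])
    return mean, pred
```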
3. Algorithmic Workflow
Algorithm RFoT-DFS executes the above methodology, sequentially constructing randomized trees from iterative thought-generation (ICoT), Shapley-based scoring, and DFS-based tree growth per bootstrap-sampled data split. A practical instantiation grows a forest of several randomized trees, retains a fixed number of top-ranked thoughts, and limits tree depth to 4 (Wu et al., 26 Feb 2025).
| Step | Description |
|---|---|
| Thought Generation (ICoT) | Decompose each QA pair into multi-level thoughts |
| Shapley Scoring | Compute marginal utility for each candidate thought |
| Top-$k$ Thought Selection | Retain the $k$ most informative thoughts |
| Bootstrap + Forest Growth | Build trees via data and thought sampling; grow via DFS |
| Ensemble Aggregation | Average predictions over all trees |
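The workflow steps above can be composed into one compact sketch. All interfaces here are hypothetical stand-ins for the LLM-backed components: `utility(S)` plays the role of the predictive head used during Shapley scoring, and `predict(boot, chosen)` abbreviates an entire depth-limited DFS thought-tree to a single call returning a class-probability list:

```python
import random
from itertools import combinations
from math import factorial

def rfot(data, thoughts, utility, predict, n_trees=10, k=3, seed=0):
    """Simplified end-to-end sketch of the RFoT-DFS workflow."""
    rng = random.Random(seed)
    # Steps 1-2: exact Shapley scoring of each candidate thought (small pools only)
    n = len(thoughts)
    phi = {}
    for t in thoughts:
        rest = [x for x in thoughts if x != t]
        v = 0.0
        for r in range(len(rest) + 1):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for S in combinations(rest, r):
                v += weight * (utility(set(S) | {t}) - utility(set(S)))
        phi[t] = v
    # Step 3: retain the top-k thoughts by Shapley value
    top = sorted(thoughts, key=lambda t: phi[t], reverse=True)[:k]
    w = [max(phi[t], 0.0) for t in top]
    if sum(w) == 0:
        w = [1.0] * len(top)  # fall back to uniform sampling
    # Step 4: one randomized tree per bootstrap sample plus
    # Shapley-proportional thought draws
    probs_per_tree = []
    for _ in range(n_trees):
        boot = [rng.choice(data) for _ in data]
        chosen = rng.choices(top, weights=w, k=len(top))
        probs_per_tree.append(predict(boot, chosen))
    # Step 5: ensemble-average the per-tree class distributions
    m = len(probs_per_tree[0])
    mean = [sum(p[c] for p in probs_per_tree) / n_trees for c in range(m)]
    return max(range(m), key=lambda c: mean[c])
```

The reduction of tree growth to a single `predict` call elides the information-gain splitting; the sketch is meant only to show how the five workflow stages chain together.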
4. Experimental Evaluation
RFoT was evaluated on two representative social survey analysis tasks:
- Chinese General Social Survey (CGSS): 3,300 respondents, 124 question turns, 5-point happiness scale.
- European Social Survey (ESS): 15,000 respondents, 102 question turns, 5-point scale.
Comparisons included zero-shot I/O prompting, LoRA fine-tuning, CoT, Self-Consistency CoT (SC-CoT), and ToT. Metrics included Success Rate (exact class match), Weighted-F1, Consistency, and per-sample runtime. All methods used Llama3-8B; 100 random test samples were averaged per setting (Wu et al., 26 Feb 2025).
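The two accuracy metrics can be made concrete with a minimal pure-Python sketch (these are the standard definitions, not the paper's evaluation code):

```python
from collections import Counter

def success_rate(y_true, y_pred):
    """Exact-match success rate over predicted classes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Weighted-F1: per-class F1 averaged with class-support weights."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    total = 0.0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += support[c] * f1
    return total / len(y_true)
```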
5. Quantitative Results
RFoT demonstrated substantial gains in reasoning accuracy and robustness across both datasets and all metrics. The table below reports results on CGSS.
| Method | Success (%) | Weighted-F1 (%) | Time (s) | Consistency (%) |
|---|---|---|---|---|
| I/O Prompt | 22.2 | 22.1 | 3.4 | 100 |
| Fine-Tuning (LoRA) | 41.4 | 41.1 | 0.3 | 100 |
| CoT | 52.9 | 55.2 | 3.1 | 98 |
| SC-CoT | 64.0 | 69.9 | 13.0 | 100 |
| ToT | 66.6 | 68.0 | 12.1 | 100 |
| RFoT | 78.4 | 80.5 | 15.5 | 100 |
On CGSS, RFoT achieved 78.4% success and 80.5% weighted-F1, outperforming ToT (66.6% / 68.0%). On ESS, RFoT gave 68.6%/68.5% compared to 62.1%/60.0% for ToT. Similar 5–15 point gains were observed when using Qwen2.5-7B. This improvement is attributed to RFoT's capacity to navigate combinatorially richer reasoning spaces and to aggregate over uncertainty inherent in subquestionnaire branching (Wu et al., 26 Feb 2025).
6. Component Analysis
Ablations showed that:
- Increasing the number of trees from 1 to 10 yielded monotonic improvements (success rate increased from ~66% to ~78% on CGSS), with diminishing returns at larger forest sizes.
- Replacing Shapley-based sampling with uniform thought selection reduced weighted-F1 by 6 points, confirming the necessity of explanation-based feature attribution.
- Increasing tree depth from 4 to 5 produced only minor gains (<1%) but significantly increased runtime, suggesting a practical depth limit of 4.
- Deterministic single-tree (ToT) approaches underperformed ensembles: substituting RFoT’s randomized forest construction with a single beam-searched tree led to an 8-point decrease in success rate.
7. Significance and Implications
RFoT operationalizes a principled, uncertainty-aware ensemble reasoning framework for LLMs that is particularly well-suited to computational social science tasks characterized by combinatorial branching and respondent-dependent logical paths. By generating and evaluating theory-informed candidate thoughts at multiple abstraction levels, scoring their marginal predictive value, and constructing a forest of thought-trees via randomized sampling, RFoT produces outputs that are more accurate and robust than traditional CoT, SC-CoT, and ToT prompting paradigms. A plausible implication is that similar random-forest–style ensemble metareasoning approaches may be fruitfully adapted to other domains with rich intermediate variable structure and model-uncertainty requirements (Wu et al., 26 Feb 2025).