FANToM: Multi-Domain Frameworks and Benchmarks
- FANToM is a collection of independently developed frameworks and benchmarks spanning distributed consensus, theory of mind evaluations, causal discovery in time series, neural topic modeling, and cosmological scalar field theory.
- It employs varied methodologies including leaderless DAG consensus, Bayesian regime-switching models, VAE-driven topic alignment, and few-shot sentiment classification to address domain-specific challenges.
- Empirical findings highlight sub-second ledger confirmation, persistent limitations in LLM theory of mind, improved topic purity and interpretability, and a predominantly non-predictive, low-hype sentiment profile, guiding future research.
FANToM refers to multiple, independently developed frameworks and benchmarks spanning consensus protocols, theory of mind evaluation, neural topic modeling, causal discovery in time series, and even cosmological scalar field theory. Each instance is domain-specific and characterized by distinct core ideas but shares a technical, research-driven motivation. This article surveys the major frameworks and benchmarks named FANToM, detailing their motivations, formalisms, empirical properties, and positions in their respective research landscapes.
1. Leaderless DAG-based Distributed Ledger: Fantom and the Lachesis Protocol
The Fantom platform is an asynchronous, leaderless Byzantine Fault Tolerant (BFT) framework for distributed ledgers, built on the Lachesis protocol. Unlike classical blockchains, Fantom employs a directed acyclic graph (DAG) called the OPERA chain rather than a linear chain of blocks. Each node originates signed events that reference both its own prior event and the latest events of peers, propagated through peer-to-peer gossip. All known events (including transactions and references) are maintained in each node's local DAG view.
Consensus is achieved through the following hierarchy:
- Roots: An event becomes a root when it references enough unique roots from the previous frame (more than $2n/3$ in a network of $n$ nodes).
- Clotho: A root is promoted to Clotho when enough subsequent roots "know" it by transitive gossip.
- Atropos: A Clotho is finalized as Atropos upon reaching a further threshold of confirmations; its timestamp determines the main chain order.
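The root-promotion threshold above can be sketched as a simple quorum check (a minimal illustration; `becomes_root` and its arguments are hypothetical helpers, not Lachesis source):

```python
def becomes_root(prev_frame_roots_seen: set, n: int) -> bool:
    """Quorum check: an event is promoted to root once it references more
    than 2n/3 unique roots of the previous frame (hypothetical helper)."""
    return 3 * len(prev_frame_roots_seen) > 2 * n

# In a 4-node network the threshold is 2*4/3 ≈ 2.67 referenced roots:
assert becomes_root({"r1", "r2", "r3"}, n=4)   # 3 roots clear the quorum
assert not becomes_root({"r1", "r2"}, n=4)     # 2 roots do not
```

The same threshold pattern (strict supermajority via integer arithmetic, avoiding floating-point division) recurs in the Clotho and Atropos confirmation stages.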
Formal properties are established through flag tables, dominance relationships, and consistent partial orders. The protocol leverages Lamport timestamps and per-event metadata to guarantee a total order. Deterministic consensus and finality are achieved through per-event gossip work plus root-set computations. In practice, Fantom demonstrates sub-second confirmation latency, high throughput (thousands of transactions per second in controlled settings), and energy efficiency. Proof-of-Stake (in Fantom Opera) is integrated by assigning validating power proportional to token holdings, which affects peer selection and quorum thresholds.
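The resulting total order can be illustrated as a lexicographic sort over per-event metadata (field names below are illustrative assumptions, not the actual Lachesis data structures):

```python
import hashlib

def total_order(events):
    """Deterministic total order over finalized events: primary key is the
    Atropos consensus timestamp, ties broken by Lamport timestamp, then by
    an event-hash tiebreaker so every node sorts identically."""
    return sorted(
        events,
        key=lambda e: (e["atropos_time"], e["lamport_time"],
                       hashlib.sha256(e["id"].encode()).hexdigest()),
    )

evs = [
    {"id": "a", "atropos_time": 2, "lamport_time": 5},
    {"id": "b", "atropos_time": 1, "lamport_time": 9},
    {"id": "c", "atropos_time": 2, "lamport_time": 3},
]
assert [e["id"] for e in total_order(evs)] == ["b", "c", "a"]
```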
Consensus safety and liveness are formally proven for Byzantine nodes, with fork detection, root-set majority, and domination chains enforcing exclusivity. The DAG architecture allows for modular pruning (epochs) and dynamic participation, supporting full validator and observer nodes, with EVM compatibility for smart contracts. Fantom's Lachesis contrasts with classical PoW or PBFT systems by combining leaderlessness, permissionless node creation, fully asynchronous operation, and deterministic finality in a succinct DAG-centric framework (Choi et al., 2018, Nguyen et al., 2021, Devarajan et al., 2023).
2. FANToM: Benchmarks for Machine Theory of Mind
FANToM also denotes a suite of evaluations and datasets for measuring the theory of mind (ToM) capabilities of LLMs. The core FANToM benchmark provides multi-party, text-only conversational scenarios in which agents enter, exit, and share or miss information, creating dynamic information asymmetry and necessitating belief-tracking.
The primary task suite probes models with multiple question formats:
- FactQ: Basic recall of facts—tests omniscient vs. agent-limited knowledge.
- BeliefQ: Both free-response and multiple-choice queries about what a given agent believes.
- AnswerabilityQ and InfoAccessQ: List-type and binary questions assessing which agents could obtain certain facts.
Dataset construction employs InstructGPT/MTurk pipelines for dialogue generation, information control, and validation. Metrics include accuracy for binary/multichoice items, token-level F1 (SQuAD-style), and a "consistency" ToM score requiring unified correctness across all six concurrent ToM probes for a scenario. Human baselines achieve ≈ 90%; state-of-the-art LLMs score at most ≈ 27% in challenging settings, with surface-form errors and illusory ToM (success on individual items but inconsistency across formats) predominating. Fine-tuning on partial datasets improves some metrics but not cross-format consistency (Kim et al., 2023, Jung et al., 2024, Xu et al., 5 Mar 2025).
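The all-or-nothing consistency score can be sketched as follows (the data layout is hypothetical; see Kim et al., 2023 for the exact metric definition):

```python
def consistency_tom_score(scenarios):
    """All-or-nothing 'consistency' ToM score: a scenario counts as solved
    only if the model answers every one of its concurrent ToM probes
    correctly (illustrative structure, not the benchmark's code)."""
    solved = sum(all(probe_correct) for probe_correct in scenarios)
    return solved / len(scenarios)

# Three scenarios with six probes each: only the first is fully consistent,
# so the consistency score is 1/3 even though many individual probes pass.
results = [
    [True] * 6,
    [True, True, True, True, True, False],
    [False] * 6,
]
assert abs(consistency_tom_score(results) - 1 / 3) < 1e-9
```

This strictness is precisely what exposes "illusory ToM": per-question accuracy can look respectable while the consistency score collapses.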
Extensions include:
- Percept-FANToM: Enriches the benchmark with explicit utterance-by-agent perception annotations, enabling separate evaluation of perception inference and perception-to-belief inference. LLMs infer perception reliably but struggle to convert it into coherent false-belief reasoning.
- Reinforcement Learning with Verifiable Rewards (RLVR): Training on binary false-belief tasks with verifiable reward signals improves in-distribution accuracy but fails to generalize to list-type or format-variant queries, revealing a vulnerability to overfitting and reward hacking (Sarangi et al., 21 Jul 2025).
- Neuro-symbolic ToM (EnigmaToM): Integrates neural knowledge bases for entity-state tracking, iteratively masking context to simulate each agent's perspective, and shows accuracy improvements across ToM orders.
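The verifiable reward used in the RLVR setting can be sketched as a simple exact-match check (a hypothetical minimal form, not the paper's implementation):

```python
def verifiable_reward(model_answer: str, gold_label: str) -> float:
    """Binary verifiable reward for a false-belief question: 1.0 on exact
    (whitespace/case-normalized) match with the gold label, else 0.0.
    As noted above, optimizing such rewards invites overfitting to the
    trained answer format."""
    norm = lambda s: s.strip().lower()
    return 1.0 if norm(model_answer) == norm(gold_label) else 0.0

assert verifiable_reward(" True ", "true") == 1.0
assert verifiable_reward("the basket", "the box") == 0.0
```

Because the reward only verifies one surface format, a policy can maximize it without acquiring format-invariant belief tracking, which is the generalization failure the RLVR experiments report.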
In summary, the FANToM ToM suite diagnoses the limitations of current LLMs in belief attribution, resistance to distractors, and perspective consistency, providing a fertile ground for evaluating emerging reasoning architectures.
3. FANToM for Causal Discovery in Regime-Switching Time Series
FANTOM (Flow-based Non-stationary Temporal regime causal discovery) is a Bayesian framework for causal structure learning in multivariate time series with nonstationarity and regime shifts. The method assumes that observed data are segmented into (unknown) temporal regimes, each governed by its own parent structure and statistical mechanisms, including non-Gaussian and fully heteroscedastic noise.
Key model components:
- Latent regime sequence $z_{1:T}$, with a softmax-parameterized temporal prior $p(z_t \mid z_{t-1})$ enforcing persistence and smoothing regime transitions.
- Joint causal model: at time $t$ in regime $k$, each variable follows $x_t^i = f_i^{(k)}\big(\mathrm{Pa}^{(k)}(x_t^i)\big) + g_i^{(k)}\big(\mathrm{Pa}^{(k)}(x_t^i)\big)\,\epsilon_t^i$, where $f_i^{(k)}$ and $g_i^{(k)}$ are differentiable and $g_i^{(k)}$ introduces heteroscedasticity.
- Learning: The evidence lower bound is optimized via Bayesian EM, employing conditional normalizing flows for non-Gaussian, heteroscedastic noise modeling, variational inference over the DAG structure and parameter priors, and alternating updates for regime segmentation.
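The persistence-enforcing softmax transition prior can be illustrated with a diagonal logit bonus (a simplified sketch; the paper's exact parameterization may differ):

```python
import numpy as np

def sticky_transition_prior(K: int, kappa: float) -> np.ndarray:
    """Softmax over regime-transition logits with a self-transition bonus
    `kappa` on the diagonal, so each regime prefers to persist
    (simplified sketch of a sticky temporal prior)."""
    logits = kappa * np.eye(K)                        # bonus only on self-loops
    expl = np.exp(logits - logits.max(axis=1, keepdims=True))
    return expl / expl.sum(axis=1, keepdims=True)     # row-normalize to probabilities

P = sticky_transition_prior(K=3, kappa=3.0)
assert np.allclose(P.sum(axis=1), 1.0)                # rows are distributions
assert (np.argmax(P, axis=1) == np.arange(3)).all()   # self-transitions dominate
```

Larger `kappa` concentrates mass on the diagonal, which is what discourages rapid regime flipping during segmentation.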
The framework is theoretically identifiable under Markov, minimality, and persistence assumptions, recovering regimes, change points, and per-regime DAGs. FANTOM achieves empirical improvements on both synthetic and real-world, non-stationary time series (Rahmani et al., 20 Jun 2025).
4. FANToM: Aligning Neural Topic Models with Labels and Authors
FANToM (Framework for Aligning Neural Topic Models) is a modular mechanism for aligning variational autoencoded neural topic models (NTMs) with document-level label and authorship metadata. VAE-based topic models suffer from label-topic misalignment, topic redundancy, and poor interpretability when labels are ignored.
FANToM introduces a per-document, label-informed Dirichlet prior and an author-reconstruction decoder within the NTM objective, schematically
$$\mathcal{L} = \mathbb{E}_{q(\theta \mid d)}\big[\log p(d \mid \theta)\big] - \mathrm{KL}\big(q(\theta \mid d)\,\|\,\mathrm{Dir}(\alpha_y)\big) + \mathbb{E}_{q(\theta \mid d)}\big[\log p_{\psi}(a \mid \theta)\big],$$
where $\theta$ is the document-topic proportions, $\mathrm{Dir}(\alpha_y)$ is the Dirichlet prior parameterized by the label assignment and the topic-label map, $a$ is the multi-hot author vector, and $\psi$ are the parameters of the author decoder. This approach constrains the posterior latent space to reflect label and author structure, resulting in improved topic purity, normalized mutual information (NMI), topic quality (the product of coherence and diversity), and enhanced recovery of author-topic relations. The method generalizes across corpora and to both supervised and semi-supervised regimes (Nagda et al., 2024).
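The label-informed Dirichlet prior can be sketched as boosting the concentration of label-aligned topics (function name, constants, and the topic-label map layout here are illustrative assumptions):

```python
import numpy as np

def label_informed_alpha(y, topic_label_map, base=0.1, boost=1.0):
    """Per-document Dirichlet concentration: topics mapped to the document's
    label y get a boosted concentration, steering the document-topic
    proportions toward label-aligned topics (schematic sketch)."""
    alpha = np.full(topic_label_map.shape[0], base)   # weak base prior everywhere
    alpha[topic_label_map[:, y] > 0] += boost         # strengthen label-aligned topics
    return alpha

# 4 topics, 2 labels: topics 0-1 belong to label 0, topics 2-3 to label 1.
M = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
assert np.allclose(label_informed_alpha(0, M), [1.1, 1.1, 0.1, 0.1])
```

The KL term in the objective then penalizes posteriors that spread mass onto topics foreign to the document's label, which is the mechanism behind the improved topic purity.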
5. FANToM Scalar Field Theory in Relativistic Plasma Cosmology
In the cosmological context, "FANToM" refers not to an acronym but to a "phantom" scalar field, i.e., a real scalar field with an inverted (negative) mass term in its potential,
$$V(\phi) = -\tfrac{1}{2}m^2\phi^2 .$$
Here, the inverted sign of the mass term yields the phantom (tachyonic) mass. Coupled to relativistic plasma via macroscopic kinetic theory, the scalar mediates energy exchange and modifies the Einstein and Boltzmann equations, producing a coupled cosmological system in spatially flat FRW spacetimes; schematically, the field equation takes the sourced form
$$\ddot\phi + 3H\dot\phi - m^2\phi = -\sigma ,$$
where $\sigma$ is the plasma scalar charge density and $w_\phi$ (the field's equation-of-state parameter) dips below $-1$ for suitable choices of coupling and mass, enabling "phantom crossing" and late-time cosmic acceleration. Stability and observational implications depend on the detailed coupling structure and the avoidance of ghost-like instabilities (Ignatyev et al., 2014).
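For a minimally coupled scalar (standard definitions, assumed here since the source's coupling terms are not reproduced), the equation-of-state parameter follows from
$$\varepsilon_\phi = \tfrac{1}{2}\dot\phi^2 + V(\phi), \qquad p_\phi = \tfrac{1}{2}\dot\phi^2 - V(\phi), \qquad w_\phi = \frac{p_\phi}{\varepsilon_\phi},$$
so with the inverted-mass potential $V(\phi) = -\tfrac{1}{2}m^2\phi^2$, $w_\phi$ leaves the quintessence range $w \ge -1$ precisely when the (negative) potential term dominates the kinetic term. This occurs alongside $\varepsilon_\phi < 0$ for the field in isolation, which is why the plasma coupling and the ghost-instability caveat matter for viability.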
6. FANToM in Cryptocurrency Sentiment Analysis
A further domain-specific application is the use of GPT-4o for few-shot classification of sentiment and predictive behaviors in social media discussions regarding the Fantom cryptocurrency (Tash et al., 2024). The method employs a classification scheme comprising Predictive Incremental, Predictive Decremental, Predictive Neutral, and Non-Predictive categories, as well as hope/regret detection. The analysis of 1,000 Fantom tweets finds:
- Non-predictive comments dominate (94.4%), with modest optimism (PI: 3.6%) outpacing negative outlooks (PD: 1.9%).
- Hopeful sentiment is rare (3% realistic, 15.8% unrealistic), with regret nearly absent.
- Inter-annotator agreement with GPT-4o predictions is substantial for the prediction categories and moderate for hope and regret (measured by Cohen's κ).
- The Fantom social-media discourse is characterized as "low-noise," "cautious," and resistant to speculative hype.
This empirical profile provides quantitative context for market sentiment analysis and informs algorithmic trading and future market strategy development.
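The Cohen's κ agreement statistic reported above can be computed as follows (a minimal sketch over hypothetical label sequences):

```python
def cohens_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(a) == len(b)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
    labels = set(a) | set(b)
    p_e = sum((a.count(l) / n) * (b.count(l) / n)          # chance agreement
              for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels over the scheme's categories (NP/PI/PD):
ann1 = ["NP", "NP", "PI", "PD", "NP", "NP"]
ann2 = ["NP", "NP", "PI", "NP", "NP", "NP"]
k = cohens_kappa(ann1, ann2)
assert abs(k - 0.6) < 1e-9   # "substantial" agreement on the usual scale
```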
References: (Choi et al., 2018, Nguyen et al., 2021, Devarajan et al., 2023, Kim et al., 2023, Jung et al., 2024, Xu et al., 5 Mar 2025, Nagda et al., 2024, Ignatyev et al., 2014, Rahmani et al., 20 Jun 2025, Tash et al., 2024, Sarangi et al., 21 Jul 2025)