LLM-Driven Synthetic Social Networks
- LLM-driven synthetic social networks are computational frameworks where language model-powered agents simulate user behavior, opinion evolution, and social interaction.
- They utilize empirical data for agent initialization, personality embedding, and prompt-driven updates to replicate realistic opinion dynamics and network formation.
- Emergent properties such as community clustering, preferential attachment, and polarization are observed, highlighting both scalability and challenges in fine-grained behavioral realism.
LLM-driven synthetic social networks are computational environments in which simulated agents—each parameterized and operated via an LLM—generate, consume, and transmit natural language content, initiate and dissolve social ties, and evolve opinions or behaviors over time. These platforms are poised to transform social simulation by enabling high-fidelity modeling of agent cognition, interaction, decision-making, and emergent social phenomena. LLM-driven agents can be calibrated to empirical user data and executed at scale; yet the realism and heterogeneity of the synthetic networks they produce remain active research frontiers.
1. Agent Initialization, Profile Calibration, and Prompting
LLM-driven synthetic agents are instantiated by mapping empirical user distributions and behavioral features onto model inputs. In Composta et al. (Composta et al., 23 Sep 2025), agent initialization draws directly from the ITA-ELECTION-22 Twitter dataset, matching platform-level age (18–60), gender, and per-user activity via a quantile-normalized activity score in which post counts are normalized by $p_{99.5}$, the 99.5th percentile post count. Political orientation (among four Italian coalitions) and per-topic opinion stances (a numeric stance value plus a textual justification) are sampled from empirical retweets and platform user analysis.
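A minimal sketch of this quantile-normalized activity calibration, assuming post counts are capped at and divided by the 99.5th percentile (the exact functional form and variable names here are illustrative, not taken verbatim from the paper):

```python
import numpy as np

def quantile_normalized_activity(post_counts, q=99.5):
    """Map raw per-user post counts to [0, 1] activity levels,
    capping heavy-tailed outliers at the q-th percentile."""
    cap = np.percentile(post_counts, q)        # p_{99.5}: 99.5th percentile post count
    return np.minimum(post_counts, cap) / cap  # normalized activity in [0, 1]

# Example: per-user post counts drawn from an empirical dataset
counts = np.array([1, 3, 12, 40, 500, 10_000])
activity = quantile_normalized_activity(counts)
```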
Personality embedding uses the “Big Five” traits in prompts. Role-play templates for Llama 2-70B/3.2-3B integrate demographics, personality, party principles, topic definitions, and prior opinions, while action-specific few-shot prompts (posting, following, opinion updates) enforce character and format constraints.
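An illustrative role-play template of the kind described, combining demographics, Big Five traits, party principles, and prior opinions into a system prompt (a sketch; the wording and all field names are hypothetical):

```python
PERSONA_TEMPLATE = """You are {name}, a {age}-year-old {gender} social media user from Italy.
Personality (Big Five): openness={openness:.1f}, conscientiousness={conscientiousness:.1f}, \
extraversion={extraversion:.1f}, agreeableness={agreeableness:.1f}, neuroticism={neuroticism:.1f}.
You support the {coalition} coalition and its principles: {principles}.
Your current stance on "{topic}" is {stance} because: {justification}.
Stay in character. When asked to post, reply with one tweet under 280 characters."""

system_prompt = PERSONA_TEMPLATE.format(
    name="Agent_042", age=34, gender="female",
    openness=0.7, conscientiousness=0.4, extraversion=0.8,
    agreeableness=0.3, neuroticism=0.5,
    coalition="Centre-Left", principles="social welfare, EU integration",
    topic="civil rights", stance="SUPPORTIVE",
    justification="equal protection should extend to all residents",
)
```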
Personality-driven architectures such as that of Rende et al. (Rende et al., 13 Jul 2025) explicitly assign each agent a vector of Big Five facets (adjectival pairs: easygoing vs. easily-angered, assertive vs. passive, etc.), systematically varying "positive" vs. "negative" distributions to study their macro-social effects.
Emergent behavioral attributes—such as task-level drives (social interaction, information seeking, etc.) and relationship memories—are embedded in the agent profile in persona-rich frameworks (Schneider et al., 22 Oct 2025).
2. Opinion Dynamics and Update Mechanisms
Opinion formation in LLM-driven social networks typically fuses classical equations with prompt-based, semantically grounded reasoning. Composta et al. (Composta et al., 23 Sep 2025) augment the traditional Friedkin–Johnsen model,

$$x_i(t+1) = \lambda_i \sum_{j} w_{ij}\, x_j(t) + (1 - \lambda_i)\, x_i(0),$$

where $w_{ij}$ is the influence weight of agent $j$ on agent $i$ and $\lambda_i \in [0,1]$ is agent $i$'s susceptibility to social influence, with an end-of-day LLM-based "Opinion Update": agents receive a social-summary prompt and must output a categorical stance (e.g., STRONGLY SUPPORTIVE, SUPPORTIVE) plus a textual rationale. LLMs display "step-like" (categorical) shifts rather than the smooth convergence of the mathematical model.
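A minimal sketch of this hybrid update, with the actual LLM call stubbed out (`llm_stance`, the stance vocabulary, and its numeric mapping are assumptions for illustration):

```python
import numpy as np

STANCES = {"STRONGLY OPPOSED": -1.0, "OPPOSED": -0.5, "NEUTRAL": 0.0,
           "SUPPORTIVE": 0.5, "STRONGLY SUPPORTIVE": 1.0}

def friedkin_johnsen_step(x, x0, W, lam):
    """One synchronous Friedkin-Johnsen update:
    x_i <- lam_i * sum_j W_ij x_j + (1 - lam_i) * x0_i."""
    return lam * (W @ x) + (1.0 - lam) * x0

def llm_stance(agent_id, social_summary):
    """Placeholder for the end-of-day LLM 'Opinion Update' call,
    which returns a categorical stance plus a textual rationale."""
    return "SUPPORTIVE", "peers shared convincing arguments"

def end_of_day_update(x, x0, W, lam, summaries):
    x = friedkin_johnsen_step(x, x0, W, lam)    # smooth numeric drift
    for i, summary in enumerate(summaries):
        stance, _rationale = llm_stance(i, summary)
        x[i] = STANCES[stance]                  # step-like categorical snap
    return x
```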
Echo chamber and polarization simulations replace fixed edge weights $w_{ij}$ with LLM-inferred compatibility or influence functions, evaluating the recent post histories of simulated peers, and updating belief states via DeGroot-style or backfire/confirmation-biased dynamics (Gu et al., 25 Feb 2025, Donkers et al., 3 Feb 2025). These prompt-driven mechanisms allow context-sensitive, topic-specific opinion shifts driven by observed conversational evidence.
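A compact sketch of a confirmation-biased update with a backfire regime, under assumed threshold parameters (the functional form is illustrative, not taken from the cited papers):

```python
def biased_update(belief, source_belief, mu=0.3, tolerance=0.6):
    """Shift `belief` toward `source_belief` when disagreement is within
    `tolerance` (confirmation), and away from it otherwise (backfire).
    Beliefs are clipped to [-1, 1]."""
    gap = source_belief - belief
    if abs(gap) <= tolerance:
        belief += mu * gap    # assimilation toward the source
    else:
        belief -= mu * gap    # backfire: move away from the source
    return max(-1.0, min(1.0, belief))
```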
3. Network Topology: Formation, Coevolution, and Structural Metrics
LLM-driven networks are constructed either from scratch by sequential or iterative persona-based prompting (Chang et al., 29 Aug 2024, Papachristou et al., 16 Feb 2024), or by calibration to empirical social graphs with LLM-generated agent-level rewiring (Shirani et al., 30 Oct 2025, Composta et al., 23 Sep 2025). "Local" methods, in which the LLM forms links for each persona considering only the pool of potential alters and their attributes, yield networks that closely match real-world distributions for density, clustering, largest-connected-component size, modularity, and degree distribution. Preferential attachment, triadic closure, and homophily emerge spontaneously in these frameworks (Papachristou et al., 16 Feb 2024). Assortativity coefficients and modularity for party, gender, and other attributes show that LLM networks overestimate political homophily relative to empirical ones (empirical Twitter reference value: $1.40$) (Chang et al., 29 Aug 2024, Composta et al., 23 Sep 2025).
Tie evolution—i.e., follower/unfollower dynamics—uses LLM-driven decision functions that combine activity level, personality, content alignment, and coalition priors (Composta et al., 23 Sep 2025). Edge formation probabilities can be probed for context sensitivity to attributes, content similarity, or observed history, enabling deep analysis of emergent mesoscale structures (homophily, community formation, echo chambers).
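A schematic follow/unfollow decision of the kind described, with the LLM call stubbed out (`ask_llm` and the prompt wording are hypothetical):

```python
def follow_decision(agent, candidate, recent_posts, ask_llm):
    """Ask the agent's LLM whether to follow `candidate`, conditioning on
    activity, personality, content alignment, and coalition priors."""
    prompt = (
        f"You are {agent['name']} ({agent['coalition']}, "
        f"extraversion={agent['extraversion']}).\n"
        f"User {candidate['name']} ({candidate['coalition']}) recently posted:\n"
        + "\n".join(f"- {p}" for p in recent_posts)
        + "\nAnswer FOLLOW or IGNORE, then give a one-line reason."
    )
    reply = ask_llm(prompt)
    return reply.strip().upper().startswith("FOLLOW")
```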
A summary of core structural metrics found in LLM-driven networks is organized below:
| Metric | Typical LLM-driven Value | Empirical Reference Value |
|---|---|---|
| Degree distribution | Right-skewed / power law | Twitter: power law, few influencers |
| Clustering coefficient | ~0.1–0.2 | Political Twitter subnetworks: 0.1–0.2 |
| Modularity | 0.3–0.5 | Twitter: 0.3–0.6 |
| Assortativity (party) | 0.3–0.5 | 0.2–0.4 |
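These metrics can be computed directly on a simulated graph with networkx; a brief sketch (the node attribute name "party" is an assumption):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def structural_metrics(G):
    """Compute the table's structural metrics for a simulated (undirected) network."""
    communities = greedy_modularity_communities(G)
    return {
        "clustering": nx.average_clustering(G),
        "modularity": nx.algorithms.community.modularity(G, communities),
        "assortativity_party": nx.attribute_assortativity_coefficient(G, "party"),
        "lcc_fraction": len(max(nx.connected_components(G), key=len)) / G.number_of_nodes(),
    }
```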
4. Content Generation, Heterogeneity, and Behavioral Fidelity
LLMs generate coherent, on-topic, and persona-consistent language outputs. However, simulated posts in these networks display reduced variance in tone and toxicity compared to real conversations (Composta et al., 23 Sep 2025). Toxicity levels and their dispersion (both mean and standard deviation of toxicity scores) are systematically lower and less heterogeneous, and per-coalition and in-group/out-group toxicity gaps are diminished.
Salient deficiencies include muted emotional expression, over-clean language, and narrow stylistic diversity relative to in-the-wild postings or those of highly active human users (e.g., near-absence of abusive or expletive content, reduced use of mentions/URLs, overuse of hashtags) (Ng et al., 1 Aug 2025). Personality and emotional models based on an explicit "emotional state" (e.g., PAD/OCC) or Big Five facets support richer, albeit computationally expensive, behavioral simulations (Rende et al., 13 Jul 2025, Koley, 14 May 2025). Theoretical bounds on personality drift have been established in temporally extended systems (Koley, 14 May 2025).
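A brief sketch of the distributional comparison behind such findings, assuming per-post toxicity scores have already been produced by an external classifier:

```python
import numpy as np

def toxicity_gap(real_scores, sim_scores):
    """Compare mean and dispersion of toxicity between real and
    simulated post populations."""
    real, sim = np.asarray(real_scores), np.asarray(sim_scores)
    return {
        "mean_real": real.mean(), "std_real": real.std(),
        "mean_sim": sim.mean(), "std_sim": sim.std(),
        "mean_gap": real.mean() - sim.mean(),  # typically positive: LLM posts are "cleaner"
    }
```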
5. Experimental Setups, Evaluation, and Sensitivity
Studies simulate networks with tens to hundreds of agents over periods of 2–30 days, commonly repeating simulations for each configuration (LLM type, network initialization, recommender system). Metrics include in-group/out-group interaction fidelity (e.g., Pearson correlation with real reply rates), modularity, clustering, engagement rates, and content-level comparison with real-world data (Composta et al., 23 Sep 2025, Rende et al., 13 Jul 2025). Sensitivity analyses reveal that varying LLM model size, network seeding, or recommender algorithm yields only modest changes in outcomes, indicating that increased heterogeneity at the cognitive or personality level is required for realism (Composta et al., 23 Sep 2025).
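For instance, in-group/out-group fidelity can be scored as the correlation between simulated and empirical reply rates across group pairs (a sketch; the group-pair matrix layout is an assumption):

```python
import numpy as np
from scipy.stats import pearsonr

def interaction_fidelity(sim_rates, real_rates):
    """Pearson correlation between simulated and empirical reply rates,
    flattened over all (source group, target group) pairs."""
    r, p = pearsonr(np.ravel(sim_rates), np.ravel(real_rates))
    return r, p
```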
Reward-based frameworks in which agents optimize compositional objectives (interaction, information seeking, self-presentation, coordination, emotional support) have emerged as a method to enforce alignment with empirically observed online motivations (Schneider et al., 22 Oct 2025). In-context learning with a "coaching" signal accelerates behavioral adaptation, augments social tie formation, and supports the emergence of robust, realistic mesoscale network properties (density 0.1–0.3, clustering 0.1–0.4) (Schneider et al., 22 Oct 2025).
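A schematic of such a compositional objective as a weighted sum of per-drive reward terms (the weights and term names are illustrative):

```python
def compositional_reward(signals, weights=None):
    """Weighted sum of per-drive reward components, e.g. interaction,
    information seeking, self-presentation, coordination, emotional support."""
    weights = weights or {k: 1.0 for k in signals}
    return sum(weights[k] * v for k, v in signals.items())

reward = compositional_reward({
    "interaction": 0.8, "information_seeking": 0.4,
    "self_presentation": 0.6, "coordination": 0.2, "emotional_support": 0.5,
})
```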
6. Comparison to Traditional and Hybrid Simulation Frameworks
Traditional ABMs employ rule-based or equation-driven update rules (e.g. DeGroot, Friedkin–Johnsen) and lack semantic natural language reasoning. LLM-based frameworks, by contrast, yield richer agent cognition and contextually grounded decisions, but at dramatically higher computational cost and with challenges in scaling (Composta et al., 23 Sep 2025, Koley, 14 May 2025, Li et al., 18 Oct 2025). Hybrid models integrate LLM-powered "core users" with diffusion models (hypergraph-based neural encoders), achieving both high accuracy in cascade prediction and tractable scaling to thousands of agents (Li et al., 18 Oct 2025).
At longer horizons, attention-based memory and hierarchical prompting architectures have enabled successful modeling of stable personalities, sublinear memory growth, and high behavioral fidelity (micro-, meso-, and macro-level correlations with real-world benchmarks) (Koley, 14 May 2025). Systems such as S³ (Gao et al., 2023) formally tie agent emotion/attitude/action transitions to statistical profiles extracted from real-world data, resulting in plausible emergent attitudes, emotion waves, and diffusion dynamics.
7. Limitations, Biases, and Future Directions
LLM-driven synthetic social networks consistently underrepresent the heterogeneity, emotion, toxicity, and fine-grained dynamism of real online communities. Current agents lack long-term explicit memory, nuanced trust metrics, and the event-driven cognition (e.g., trust decay, explicit social reinforcement) that drives human network evolution (Composta et al., 23 Sep 2025). Simulations are typically short-term; unfollowing, memory decay, and large-scale echo-chamber emergence are incompletely captured.
Notably, LLMs exhibit systematic overestimation of political homophily (party alignment dominating all other dimensions) (Chang et al., 29 Aug 2024) and susceptibility to echo-chamber formation, presumably reflecting biases in pretraining corpora. Simple prompt engineering is insufficient to correct this; future work is likely to focus on more varied and calibrated personality/interest assignment, explicit memory pools, simulation of exogenous shocks, and longer simulation horizons. Hybridizing LLM agents with diffusion or macro-level ABMs is proposed as a path to combining fidelity with scalability (Li et al., 18 Oct 2025).
Expanded model personalization, richer emotional and trust frameworks, and more sophisticated recommendation and network formation modules are central to closing the realism gap. Evaluation best practices include offline validation against empirical datasets, micro/meso/macro behavioral fit, and longitudinal stress-testing under intervention scenarios (Composta et al., 23 Sep 2025, Schneider et al., 22 Oct 2025, Koley, 14 May 2025).
LLM-driven synthetic social networks offer powerful, flexible laboratories for the investigation of online phenomena, bridging ABM, language modeling, opinion dynamics, and causal inference. Their empirical realism, scalability, and capacity for capturing true human heterogeneity will depend critically on advances not only in LLM architectures but also in agent initialization, cognitive modeling, memory systems, and methodological rigor (Composta et al., 23 Sep 2025, Rende et al., 13 Jul 2025, Schneider et al., 22 Oct 2025, Chang et al., 29 Aug 2024, Li et al., 18 Oct 2025).