- The paper introduces a formal framework showing how AI aggregation alters consensus dynamics via a modified DeGroot model.
- It uses analytical and numerical methods, including perturbation theory on Markov chains, to quantify the learning gap under various network conditions.
- The results highlight a tradeoff: rapid AI updates can lead to model collapse, while local, topic-specific aggregators improve fairness and robustness in knowledge synthesis.
AI Aggregation and Its Influence on Social Learning
Overview
"How AI Aggregation Affects Knowledge" (2604.04906) examines the effects of AI-mediated aggregation on the dynamics and efficiency of social learning in networked populations. The authors model the introduction of generative AI systems as central information aggregators within the classical DeGroot framework, formalizing the recursive feedback interactions between population beliefs and AI-generated outputs that are incorporated into subsequent rounds of training. The paper rigorously analyzes the conditions under which AI aggregation can improve or degrade collective learning, quantifies the resulting inefficiencies, and characterizes how group structure, data representation, and aggregation architecture interact to determine epistemic outcomes.
Technical Framework
The central setting is an extension of the DeGroot model: a directed graph encodes the influence structure among agents, while a single AI aggregator is introduced as a node that forms a weighted average of agents’ beliefs and injects this output back into the network. The aggregator's training weights reflect data composition—overrepresentation or underrepresentation of groups, content visibility, and platform-level decisions. AI adoption by agents is parameterized by reliance weights, governing the strength with which each agent incorporates the aggregator’s output relative to peer beliefs.
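One round of these dynamics can be sketched as follows. This is a minimal illustration of the general setup described above, not the paper's exact equations: the variable names (`train_w`, `reliance`) and the synchronous update order are assumptions.

```python
import numpy as np

def degroot_with_aggregator(W, beliefs, train_w, reliance, ai_output):
    """One synchronous round of DeGroot updating with an AI aggregator node.

    W         : (n, n) row-stochastic peer-influence matrix
    beliefs   : (n,) current agent beliefs
    train_w   : (n,) aggregator training weights (nonnegative, sum to 1)
    reliance  : (n,) each agent's weight on the aggregator's output
    ai_output : the aggregator's current published output (scalar)
    """
    new_ai = train_w @ beliefs        # aggregator retrains: weighted average of beliefs
    peer = W @ beliefs                # ordinary DeGroot peer averaging
    # each agent mixes peer influence with the aggregator's (lagged) output
    new_beliefs = (1 - reliance) * peer + reliance * ai_output
    return new_beliefs, new_ai
```

With `reliance = 0` this reduces to the classical DeGroot update, which makes the baseline easy to recover for comparison.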
The dynamics are then described by an augmented Markov process, where population beliefs and the aggregator’s outputs co-evolve through recursive updates. The learning gap is defined as the deviation of long-run consensus beliefs from an efficient benchmark (the posterior mean under frictionless signal aggregation), providing a quantitative measure of mislearning induced by network structure and AI-mediated feedback.
The authors derive a closed-form representation for the limiting consensus, exploiting perturbation theory for Markov chains to express the effect of AI aggregation as a low-rank modification of baseline DeGroot dynamics. This result generalizes to arbitrary networks and enables precise tracking of how influence weights, feedback speed, and data imbalance alter final beliefs.
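The limiting consensus can be computed directly from the stationary distribution of the augmented chain: the paper's closed form is derived via perturbation theory, but numerically the same object is the left Perron eigenvector of the (n+1)-node transition matrix. The block layout below (agents plus one aggregator row) is an assumed formalization consistent with the setup above.

```python
import numpy as np

def influence_weights(W, train_w, reliance, memory):
    """Long-run influence weights in the augmented agent+aggregator chain.

    Builds the (n+1) x (n+1) row-stochastic transition matrix and returns
    its stationary distribution: entry i is agent i's weight in the limiting
    consensus, entry n is the aggregator node's weight.
    """
    n = len(train_w)
    T = np.zeros((n + 1, n + 1))
    T[:n, :n] = (1 - reliance)[:, None] * W   # discounted peer influence
    T[:n, n] = reliance                       # agents' reliance on the AI node
    T[n, :n] = (1 - memory) * train_w         # aggregator retrains on agents
    T[n, n] = memory                          # aggregator anchors in its past
    # stationary distribution = left eigenvector for eigenvalue 1
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()
```

Changing `train_w` or `reliance` perturbs a single row and a single column of this matrix, which is what makes the low-rank perturbation analysis tractable.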
Effects of Aggregation Speed and Robustness
A key finding is the existence of a critical threshold in the speed of AI updating. When the aggregator retrains and refreshes its output too rapidly (i.e., low memory parameter), the feedback loop between population beliefs and AI-generated training data amplifies endogenous distortions: beliefs shaped by the network structure are recycled into new training data, leading to recursive reinforcement. In this regime, the scope for robust improvement collapses—there does not exist any positive-measure set of training weights that can reliably improve learning across a broad class of plausible environments. This mathematically formalizes the "model collapse" phenomenon, where excessive feedback between AI-generated outputs and training leads to degeneration in learning quality, even in large populations and with abundant data.
Conversely, slow updating (high memory parameter) allows the aggregator to anchor its outputs in a smoother history, dampening feedback and enlarging the set of beneficial training weights. This tradeoff between feedback speed and robustness demonstrates that design choices in AI aggregation must carefully balance responsiveness against recursive bias amplification.
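The smoothing role of the memory parameter can be illustrated with a stylized simulation in which each retraining round ingests a noisy snapshot of the population's beliefs (here held fixed, to isolate the feedback channel). This is an illustrative sketch, not the paper's model: a low memory parameter passes more of each round's noise straight into the aggregator's output.

```python
import numpy as np

def aggregator_path(memory, noise_sd=0.5, T=5000, seed=0):
    """Aggregator output over time when training data is a noisy snapshot.

    Each round: ai <- memory * ai + (1 - memory) * (target + noise),
    i.e. an AR(1) process whose innovation shrinks as memory grows.
    """
    rng = np.random.default_rng(seed)
    target = 1.0                         # the fixed population average
    out = np.empty(T)
    ai = target
    for t in range(T):
        snapshot = target + rng.normal(0.0, noise_sd)  # noisy training data
        ai = memory * ai + (1 - memory) * snapshot
        out[t] = ai
    return out

fast = aggregator_path(memory=0.1)   # rapid retraining, little anchoring
slow = aggregator_path(memory=0.9)   # heavy anchoring in history
```

The fast path fluctuates far more around the target, mirroring the tradeoff above: responsiveness buys speed at the cost of recycling transient distortions into the published output.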
Group Structure, Fairness, and Data Imbalance
The interplay between network segregation and AI training regime is examined in stylized two-island models: majority and minority groups with homophilic connections. Majority-weighted training exacerbates inefficiency: the learning gap increases monotonically as segregation rises, with the aggregator amplifying within-group reinforcement and underweighting minority signals. Minority-weighted training yields non-monotone effects, initially counteracting majority dominance at moderate segregation but risking overcorrection and learning deterioration at higher segregation, due to reduced cross-group information propagation.
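A two-island influence matrix of this kind can be built in a few lines. The parameterization below (each agent places total weight `h` on its own group, spread uniformly, and `1 - h` on the other group) is one simple homophily convention, assumed for illustration.

```python
import numpy as np

def two_island_W(n_maj, n_min, h):
    """Row-stochastic influence matrix for a stylized two-island network.

    n_maj, n_min : sizes of the majority and minority groups
    h            : homophily level in [0, 1]; h -> 1 means full segregation
    """
    n = n_maj + n_min
    grp = np.array([0] * n_maj + [1] * n_min)
    W = np.zeros((n, n))
    for i in range(n):
        same = grp == grp[i]
        W[i, same] = h / same.sum()            # uniform weight within own island
        W[i, ~same] = (1 - h) / (~same).sum()  # residual weight across islands
    return W
```

Sweeping `h` toward 1 in such a matrix, combined with majority- or minority-weighted `train_w`, reproduces the qualitative comparative statics described above.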
Crucially, the presence of AI aggregation induces endogenous shifts in epistemic influence: even absent explicit discrimination, the training process reallocates whose information drives consensus, tightly coupling robust aggregation and fairness. Unlike error-based fairness notions, here unfairness stems from structural reweighting and recursive feedback.
Local vs Global Aggregators: Architectural Implications
The paper extends the model to environments with multiple topics and localized aggregators. Specialized aggregators are trained on topic-relevant, community-specific data. Agents rely more heavily on their local aggregator (reflecting domain expertise and informational advantage). This architecture compartmentalizes feedback—errors and distortions remain local, enhancing robustness and preserving informational diversity.
Strikingly, local aggregators uniformly improve learning relative to both decentralized social learning and global aggregation, across all topics. In contrast, replacing specialized local aggregators with a single global architecture worsens learning in at least one dimension, since global aggregation cannot simultaneously match group-specific informational advantages. This impossibility result is rooted in conflicting objectives: optimizing for topic 1 requires emphasizing group 1, while optimizing for topic 2 requires emphasizing group 2, and a shared global design cannot do both without tradeoffs.
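The conflicting-objectives intuition can be made concrete in a deliberately stripped-down calculation, with network dynamics removed so only the architectural point remains. All numbers below are illustrative, not from the paper: group g holds the more precise signal on topic g, local aggregators weight groups in proportion to topic precision, and the global aggregator must use one weighting for both topics.

```python
import numpy as np

def topic_gap(agg_w, signals, precisions):
    """Distance between an aggregator's output and the efficient
    (precision-weighted) benchmark for a single topic."""
    benchmark = precisions @ signals / precisions.sum()
    return abs(agg_w @ signals - benchmark)

# Two groups, two topics: group g is the expert on topic g (illustrative).
prec = np.array([[4.0, 1.0],    # topic-0 signal precisions of groups 0, 1
                 [1.0, 4.0]])   # topic-1 signal precisions
sig = np.array([[0.9, 0.3],     # topic-0 signals reported by groups 0, 1
                [0.2, 1.1]])    # topic-1 signals

local_w = np.array([[0.8, 0.2],   # topic-specific aggregators, weights
                    [0.2, 0.8]])  # proportional to topic precision
global_w = np.array([0.5, 0.5])   # one shared aggregator for both topics

gaps_local = [topic_gap(local_w[t], sig[t], prec[t]) for t in range(2)]
gaps_global = [topic_gap(global_w, sig[t], prec[t]) for t in range(2)]
```

Because each local aggregator's weights match its topic's precisions exactly, its gap is zero on both topics, whereas any single `global_w` that helps topic 0 by upweighting group 0 necessarily hurts topic 1, and vice versa.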
Numerical and Analytical Results
The paper provides explicit formulas for consensus beliefs and learning gaps under a range of network structures, group sizes, training weights, and aggregation speeds. The threshold phenomenon is substantiated analytically through sharp inequalities: when segregation and rapid feedback align, robust improvement vanishes. Sensitivity analysis reveals the parameter regions where beneficial aggregation is possible and quantifies the fragility of designs when network parameters are uncertain. The closed-form computations allow rigorous evaluation of different architectures, training regimes, and adoption patterns.
Implications and Future Directions
The findings delineate precise conditions under which AI aggregation can either facilitate or undermine effective social learning, grounding concerns about recursive bias and model collapse in formal structure. Practically, the results imply that platform and model designers must carefully calibrate the speed and scope of aggregation, avoid indiscriminate global pooling, and strive for localized and topic-specific architectures that anchor feedback in grounded, diverse information sources. Tradeoffs between robustness, fairness, and centralization are intrinsic—no single global aggregator can robustly improve learning or guarantee distributional neutrality.
Theoretically, the work suggests avenues for generalizations: multiple aggregators, endogenized adoption and reliance weights, hybrid architectures, and dynamic adaptation to changing network structure. Experimental investigation of the predicted effects, as well as integration into empirical studies of AI-mediated knowledge generation, constitute promising next steps.
Conclusion
"How AI Aggregation Affects Knowledge" rigorously characterizes the feedback-driven dynamics introduced by AI aggregation in social learning networks. Centralized global aggregation, particularly with rapid updating, is shown to be fragile, susceptible to feedback-based amplification of endogenous network distortions and group imbalance, and cannot robustly deliver improvement across heterogeneous environments. Local and topic-specific architectures provide structurally superior outcomes, compartmentalizing feedback and preserving diversity. The implications for AI system design, social information mediation, and epistemic fairness are profound, inviting deeper theoretical and empirical exploration of aggregation architectures and feedback regimes in the age of generative AI.