Subgraph Frequencies: Mapping the Empirical and Extremal Geography of Large Graph Collections (1304.1548v2)

Published 4 Apr 2013 in cs.SI and physics.soc-ph

Abstract: A growing set of on-line applications are generating data that can be viewed as very large collections of small, dense social graphs -- these range from sets of social groups, events, or collaboration projects to the vast collection of graph neighborhoods in large social networks. A natural question is how to usefully define a domain-independent coordinate system for such a collection of graphs, so that the set of possible structures can be compactly represented and understood within a common space. In this work, we draw on the theory of graph homomorphisms to formulate and analyze such a representation, based on computing the frequencies of small induced subgraphs within each graph. We find that the space of subgraph frequencies is governed both by its combinatorial properties, based on extremal results that constrain all graphs, as well as by its empirical properties, manifested in the way that real social graphs appear to lie near a simple one-dimensional curve through this space. We develop flexible frameworks for studying each of these aspects. For capturing empirical properties, we characterize a simple stochastic generative model, a single-parameter extension of Erdos-Renyi random graphs, whose stationary distribution over subgraphs closely tracks the concentration of the real social graph families. For the extremal properties, we develop a tractable linear program for bounding the feasible space of subgraph frequencies by harnessing a toolkit of known extremal graph theory. Together, these two complementary frameworks shed light on a fundamental question pertaining to social graphs: what properties of social graphs are 'social' properties and what properties are 'graph' properties? We conclude with a brief demonstration of how the coordinate system we examine can also be used to perform classification tasks, distinguishing between social graphs of different origins.

Summary

The paper maps the empirical and extremal geography of subgraph frequencies in large graph collections, particularly social networks.
It combines empirical data from Facebook with extremal graph theory and a stochastic model incorporating triadic closure.
Subgraph frequencies serve as a coordinate system for classifying social graphs of different origins, improving classification performance over basic graph metrics.

This paper examines, both empirically and theoretically, the small, dense social graphs found in the large datasets generated by online social platforms such as Facebook. Its central aim is to establish a domain-independent coordinate system that compactly represents large collections of such graphs. The approach is grounded in the theory of graph homomorphisms and uses the frequencies of small induced subgraphs as coordinates, mapping both the empirical distribution of real graphs and the extremal bounds that constrain all graphs.
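
As a minimal sketch of what an induced-subgraph-frequency coordinate looks like, the snippet below computes the four triad frequencies of a small undirected graph by brute force. The function name `triad_frequencies` and the toy graph are illustrative assumptions, not taken from the paper; on three nodes the number of edges determines the induced subgraph (empty, single edge, two-edge path, triangle), so counting edges per triple is enough.

```python
from itertools import combinations


def triad_frequencies(nodes, edges):
    """Return [f0, f1, f2, f3]: the fraction of 3-node subsets
    that induce exactly 0, 1, 2, or 3 edges (hypothetical helper)."""
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for triple in combinations(nodes, 3):
        # Count how many of the three possible pairs are edges.
        k = sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        counts[k] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least three nodes")
    return [c / total for c in counts]


# Toy example: the 4-cycle a-b-c-d-a. Every triple induces a two-edge path.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(triad_frequencies(nodes, edges))  # [0.0, 0.0, 1.0, 0.0]
```

Brute-force enumeration costs time cubic in the number of nodes, which is acceptable for the small, dense graphs this kind of analysis targets; larger subgraph sizes would need the same idea with isomorphism classes in place of edge counts.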

Methodology and Key Findings

The authors study the distribution of subgraph frequencies within two main frameworks: an empirical characterization of real social graphs and extremal bounds derived from graph theory.

Empirical Framework:
- Using Facebook data, the paper measures subgraph frequencies in user neighborhoods, groups, and events.
- It observes that real social graphs lie near a simple one-dimensional curve through the space of subgraph frequencies, and introduces a stochastic generative model that extends the Erdős–Rényi model with a single additional parameter capturing triadic closure; this model tracks the observed social graph data more closely.
- This "Edge Formation Random Walk with Triadic Closure" model accounts for the elevated frequency of triangles observed in empirical social graphs, consistent with the sociological theory of triadic closure.
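
The idea of adding one triadic-closure parameter to an Erdős–Rényi-style process can be illustrated with a simplified discrete-time chain. This is a sketch under stated assumptions, not the paper's definition of its model: the function names, the parameters `form_prob`, `delete_prob`, and `closure_boost`, and the rule that formation probability grows linearly with the number of common neighbors are all hypothetical choices made for illustration. With `closure_boost=0` each pair evolves independently, so the chain settles on an Erdős–Rényi graph.

```python
import random
from itertools import combinations


def simulate_closure_walk(n, form_prob, delete_prob, closure_boost, steps, seed=0):
    """Toy edge-flip chain on n nodes (hypothetical, simplified).
    Returns an adjacency dict mapping each node to a set of neighbors."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    pairs = list(combinations(range(n), 2))
    for _ in range(steps):
        u, v = rng.choice(pairs)
        if v in adj[u]:
            # Existing edge: delete it with a fixed probability.
            if rng.random() < delete_prob:
                adj[u].remove(v)
                adj[v].remove(u)
        else:
            # Missing edge: each common neighbor raises the formation probability.
            common = len(adj[u] & adj[v])
            if rng.random() < min(1.0, form_prob + closure_boost * common):
                adj[u].add(v)
                adj[v].add(u)
    return adj


def edge_density(adj):
    """Fraction of node pairs that are connected."""
    n = len(adj)
    return sum(len(nbrs) for nbrs in adj.values()) / (n * (n - 1))


def triangle_fraction(adj):
    """Fraction of 3-node subsets that induce a triangle."""
    triples = list(combinations(sorted(adj), 3))
    closed = sum(1 for a, b, c in triples
                 if b in adj[a] and c in adj[a] and c in adj[b])
    return closed / len(triples)


for boost in (0.0, 0.05):
    g = simulate_closure_walk(n=12, form_prob=0.02, delete_prob=0.1,
                              closure_boost=boost, steps=20000, seed=1)
    d = edge_density(g)
    print(f"boost={boost}: density={d:.3f}, "
          f"triangles={triangle_fraction(g):.3f}, density^3={d ** 3:.3f}")
```

Because a positive boost also raises overall edge density, the informative comparison is between the triangle fraction and the cube of the density, which is roughly what independent edges at the same density would give.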

Theoretical Framework:
- The authors derive extremal bounds using results from graph theory, including those of Lovász and others, to delineate the feasible space of subgraph frequencies.
- They use a linear program to bound the feasible subgraph frequencies, and find that most real social graphs fall within a concentrated band of this feasible space.
- Notably, certain structures, such as particular triads, are observed far less often, owing both to combinatorial constraints and to sociological effects.
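
To make the linear-programming idea concrete, the block below writes out a few linear relations that triad frequencies must satisfy and a small program built from them. The choice of constraints is an illustrative assumption and is far smaller than the toolkit of extremal results the paper draws on; the symbols $x_i$ and $\rho$ are introduced here and are not the paper's notation.

```latex
% x_i: fraction of 3-node subsets of G that induce exactly i edges (i = 0, 1, 2, 3)
\begin{align*}
  & x_0 + x_1 + x_2 + x_3 = 1, \qquad x_i \ge 0, \\
  & \rho(G) = \tfrac{1}{3}\,(x_1 + 2x_2 + 3x_3)
      && \text{(edge density, exact for every } G\text{)}, \\
  & x_0 + x_3 \ge \tfrac{1}{4} - o(1)
      && \text{(Goodman's bound, as } n \to \infty\text{)}.
\end{align*}
% Illustrative program: for a fixed edge density \rho,
\[
  \max_{x}\ \text{or}\ \min_{x}\ x_3
  \quad \text{subject to the three relations above.}
\]
```

Every graph satisfies these relations, so the program yields an outer bound on the feasible region. At $\rho = 1/2$, for instance, the minimum is $x_3 = 0$ with $x_0 = 1/4$ and $x_2 = 3/4$, which a balanced complete bipartite graph attains asymptotically. In general such a small set of constraints leaves the region loose, and tightening it requires further extremal inequalities.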

Classification Tasks:
- The coordinate system of subgraph frequencies is used to distinguish different types of social graphs. Using subgraph frequency distributions, the authors classify social graphs into categories such as Facebook user neighborhoods, groups, and events, which show consistent structural differences.
- Incorporating each graph's deviation from the modeled backbone (captured by the additional parameter in the stochastic model) improved classification accuracy, substantially surpassing basic graph metrics.

Implications and Future Developments

From both practical and theoretical standpoints, this paper offers insight into the structure of social graphs. Practically, it improves the understanding and classification of user group behavior on social media platforms, which can inform privacy settings and targeted content delivery. Theoretically, the work motivates further use of graph homomorphisms and extremal graph theory as foundational tools in network analysis.

The research suggests several avenues for future work: refining the stochastic model to incorporate additional sociological principles, applying the framework to directed graphs that include reciprocal and non-reciprocal relationships, and extending classification to multilayer networks such as those arising from cross-platform social interactions.

In summary, this paper takes a dual approach, combining empirical observations with theoretical limits, to understand small, dense social graphs within vast social networks, illuminating both the "social" and the "graph" properties that define these structures.
