Dense Subgraph Maintenance under Streaming Edge Weight Updates for Real-time Story Identification (1203.0060v1)

Published 1 Mar 2012 in cs.DB

Abstract: Recent years have witnessed an unprecedented proliferation of social media. People around the globe author, every day, millions of blog posts, social network status updates, etc. This rich stream of information can be used to identify, on an ongoing basis, emerging stories, and events that capture popular attention. Stories can be identified via groups of tightly-coupled real-world entities, namely the people, locations, products, etc., that are involved in the story. The sheer scale, and rapid evolution of the data involved necessitate highly efficient techniques for identifying important stories at every point of time. The main challenge in real-time story identification is the maintenance of dense subgraphs (corresponding to groups of tightly-coupled entities) under streaming edge weight updates (resulting from a stream of user-generated content). This is the first work to study the efficient maintenance of dense subgraphs under such streaming edge weight updates. For a wide range of definitions of density, we derive theoretical results regarding the magnitude of change that a single edge weight update can cause. Based on these, we propose a novel algorithm, DYNDENS, which outperforms adaptations of existing techniques to this setting, and yields meaningful results. Our approach is validated by a thorough experimental evaluation on large-scale real and synthetic datasets.

Citations (196)

Summary

  • The paper introduces DYNDENS, an efficient algorithm designed to incrementally maintain dense subgraphs under streaming edge weight updates.
  • DYNDENS employs a novel approach to quantify the magnitude of change caused by edge weight updates, enabling efficient incremental computation enhanced by heuristics.
  • Empirical evaluation demonstrates DYNDENS's effectiveness in real-time story identification from large datasets, showing significant potential for social media monitoring.

Dense Subgraph Maintenance for Story Identification

The paper, "Dense Subgraph Maintenance under Streaming Edge Weight Updates for Real-time Story Identification," addresses the problem of identifying emerging stories in real time by maintaining dense subgraphs as edge weights change. The work is motivated by the rapid growth of user-generated content on social media platforms, which supplies a continuous stream of data from which stories can be identified.

The primary contribution of this research is an efficient algorithm, DYNDENS, which incrementally maintains dense subgraphs without recomputing them from scratch as edge weights change in real time. DYNDENS builds on theoretical results that bound the magnitude of change a single edge weight update can cause in the density of subgraphs. These bounds make efficient incremental updating possible, and according to the paper's evaluation DYNDENS outperforms adaptations of existing techniques to this setting. The algorithm's performance is further enhanced by heuristics that reduce processing time and memory usage.
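The core idea of incremental maintenance can be illustrated with a minimal Python sketch. It is not the DYNDENS algorithm itself: it only shows that when the weight of an edge (u, v) changes by some delta, the total edge weight of exactly those tracked subgraphs that contain both u and v changes by the same delta, so nothing else needs to be recomputed. The class name `TrackedSubgraphs`, its methods, and the per-vertex index are hypothetical names chosen for illustration, and the sketch omits the exploration step that discovers newly dense subgraphs.

```python
from collections import defaultdict


class TrackedSubgraphs:
    """Hypothetical store of subgraphs and their total edge weight (score)."""

    def __init__(self):
        self.score = {}                     # frozenset of vertices -> total edge weight
        self.by_vertex = defaultdict(set)   # vertex -> subgraphs containing it

    def track(self, vertices, score):
        """Start tracking a subgraph with a known total edge weight."""
        key = frozenset(vertices)
        self.score[key] = score
        for v in key:
            self.by_vertex[v].add(key)

    def apply_update(self, u, v, delta):
        """Apply a weight change of `delta` on edge (u, v).

        Only subgraphs containing both endpoints are touched; their
        scores shift by exactly `delta`. Returns the affected subgraphs.
        """
        affected = self.by_vertex[u] & self.by_vertex[v]
        for key in affected:
            self.score[key] += delta
        return affected


tracked = TrackedSubgraphs()
tracked.track({"a", "b", "c"}, 2.4)
tracked.track({"a", "d"}, 0.9)
changed = tracked.apply_update("a", "b", 0.5)
```

In this example only the subgraph {a, b, c} is updated, because {a, d} does not contain both endpoints of the changed edge.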

DYNDENS operates on a theoretical basis that includes a formalized definition of subgraph density and a parameterized threshold for density, enabling it to handle a broad range of definitions of graph density. The paper presents a thorough theoretical analysis underpinning DYNDENS, including proofs that guarantee all dense subgraphs can be identified through a limited number of exploration iterations following an edge weight update. The index structure used by DYNDENS minimizes redundant information storage, thus enhancing performance. Additionally, advanced techniques such as implicit representation of overly dense subgraphs (IMPLICITTOODENSE) further reduce computational costs, showing significant performance benefits in practice.
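To make the notion of a parameterized density definition concrete, the following Python sketch treats density as the total edge weight of a vertex set divided by a size-dependent normalizer, and calls a set dense when that value reaches a threshold. This form, the two normalizers, and the names `score`, `is_dense`, `pairs_normalizer`, `size_normalizer`, and `max_size` are assumptions made for illustration; the paper's exact definitions may differ.

```python
from itertools import combinations


def score(weights, vertices):
    """Total weight of edges inside `vertices`; missing edges count as 0."""
    return sum(
        weights.get(frozenset(pair), 0.0)
        for pair in combinations(sorted(vertices), 2)
    )


def pairs_normalizer(n):
    """Number of vertex pairs: density becomes the average edge weight."""
    return n * (n - 1) / 2


def size_normalizer(n):
    """Number of vertices: density becomes total weight per vertex."""
    return float(n)


def is_dense(weights, vertices, threshold, normalizer=pairs_normalizer, max_size=5):
    """Check an assumed density definition against a threshold.

    `max_size` is an assumed cap on subgraph cardinality.
    """
    n = len(vertices)
    if n < 2 or n > max_size:
        return False
    return score(weights, vertices) / normalizer(n) >= threshold


# Hypothetical edge weights between entities.
weights = {
    frozenset({"a", "b"}): 0.9,
    frozenset({"b", "c"}): 0.8,
    frozenset({"a", "c"}): 0.7,
}

triangle_dense = is_dense(weights, {"a", "b", "c"}, threshold=0.75)
loose_by_pairs = is_dense(weights, {"a", "b", "c", "d"}, threshold=0.5)
loose_by_size = is_dense(
    weights, {"a", "b", "c", "d"}, threshold=0.5, normalizer=size_normalizer
)
```

Swapping the normalizer changes which vertex sets qualify: adding the isolated vertex d drops the average edge weight below 0.5, while the weight-per-vertex measure still clears the same threshold.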

The empirical evaluation shows that DYNDENS processes large datasets efficiently across a range of operating parameters, validating its applicability to the problem the paper calls ENGAGEMENT (dENse subGrAph maintenance for edGE-weight update streaMs under sizE constraiNTs). Comparisons with other methods, such as STIX for maximal clique maintenance and GRASP for quasi-clique identification, indicate that DYNDENS offers high recall and efficiency in maintaining dense subgraphs for real-time story identification, whereas those methods were designed for different goals.

From a qualitative perspective, DYNDENS demonstrates effectiveness in identifying real-time stories from social media data, capturing diverse and significant events such as the U.S. military strike resulting in Osama Bin Laden's death. This evidence shows DYNDENS's potential impact on real-time monitoring and analysis of social media, with implications extending to fields such as community detection and dynamic network analysis.

Future directions for the work could include adapting the DYNDENS algorithm for directed graphs, which are common in social networks, and exploring dynamic adjustments to the density threshold to better accommodate real-time changes in data patterns. Additionally, integrating DYNDENS into a framework for observing online communities could further enhance our understanding of social dynamics and influence detection.

In conclusion, the paper advances real-time story identification by showing how to maintain dense subgraphs under streaming edge weight updates, combining theoretical insight with the practical efficiency needed to keep pace with rapidly evolving social media data.