SocioXplorer: Computational Social Science Tool

Updated 1 July 2025
  • SocioXplorer is an interactive computational social science tool performing AI-enhanced topic and network analysis on large-scale Twitter and YouTube data.
  • Integrating AI, NLP, and SNA, SocioXplorer offers modules for sentiment, topic, and network analysis with interlinked visualizations on Twitter and YouTube data.
  • SocioXplorer supports live/batch data updates for longitudinal analysis, scales to millions of records, and is open-source, facilitating research on evolving online dynamics and communities.

SocioXplorer is an interactive computational social science tool designed for topic and network analysis on large-scale social data from Twitter (X) and YouTube. It advances prior systems by integrating artificial intelligence, natural language processing, and social network analysis, supporting both archival and live datasets with batch processing and rich, interlinked analytics. SocioXplorer is released under the Apache 2.0 license and is suitable for multi-modal exploration of evolving online conversations and communities, with demonstrable scalability to millions of social media records.

1. Platform Integration and Data Handling

SocioXplorer natively ingests and processes datasets from both Twitter (X) and YouTube, supporting unified workflows for these platforms. On Twitter, analyses are centered on tweet objects and user interactions. On YouTube, the unit of analysis is the comment and its reply structure—modeled analogously to the tweet/retweet paradigm. Unlike its predecessor TwiXplorer, SocioXplorer introduces robust batch processing functionality: it can incrementally incorporate live data updates, thus enabling dynamic, near-real-time analysis of social phenomena as they unfold.

The tool can scale to large datasets, handling up to 45 million tweets and over 4 million YouTube comments. GPU acceleration is employed for intensive AI/NLP tasks, supporting timely preprocessing and feature extraction.

2. Analytical Modules and User Interface

SocioXplorer provides a suite of analytical modules accessible via an interactive user interface:

  • Timeline Analysis: Visualization of data volume and user activity over time.
  • Sentiment Analysis: Automated affect classification of posts/comments using NLP methodology.
  • Language and Geo-Analytics: Detection and filtering by language; geographic mapping is available for Twitter data where location is accessible.
  • Top Content Extraction: Identification and display of salient posts, URLs, and images.
  • Word Cloud Visualization: Frequency-based summary of key tokens.
  • Topic Discovery and Semantic Clustering: Posts are embedded into a semantic space (using techniques consistent with transformer-based embeddings, though not explicitly detailed) and clustered into conversation topics. The system also visualizes major detected claims within topical clusters.
  • Social Network Analysis: Construction and visualization of user interaction networks (retweet/reply on Twitter; comment/reply/mention on YouTube), community detection, and layout optimization.

All analytical modules are interlinked, supporting cross-dimension filtering (by topic, community, geography, language, or sentiment). The user interface supports labeling of communities and topics to facilitate collaborative annotation and interpretability.
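As a rough illustration of cross-dimension filtering, the sketch below selects posts that match several dimensions at once. The record fields (`topic`, `community`, `language`, `sentiment`) and the `filter_posts` helper are hypothetical names chosen for this example; they are not taken from the SocioXplorer codebase.

```python
def filter_posts(posts, **criteria):
    """Keep only posts whose fields equal every given criterion."""
    return [
        post for post in posts
        if all(post.get(field) == value for field, value in criteria.items())
    ]


# Hypothetical records; the field names are assumptions for this sketch.
posts = [
    {"id": 1, "topic": "results", "community": "A", "language": "en", "sentiment": "positive"},
    {"id": 2, "topic": "results", "community": "B", "language": "en", "sentiment": "negative"},
    {"id": 3, "topic": "tickets", "community": "A", "language": "en", "sentiment": "negative"},
    {"id": 4, "topic": "tickets", "community": "A", "language": "ja", "sentiment": "positive"},
]

# Combine two dimensions: community and language.
selected = filter_posts(posts, community="A", language="en")
selected_ids = [post["id"] for post in selected]
```

Expressing every dimension as a field on the same record is one simple way to let a selection in one view narrow all the others.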

3. Algorithms and Mathematical Foundations

Sentiment and Topic Modeling

Sentiment analysis is ML-based and can be expressed as:

$$\text{Sentiment}(x_i) = \text{MLClassify}(x_i)$$

where $x_i$ is a post or comment.

For topic discovery, posts are embedded:

$$\vec{v}_i = \text{Embed}(x_i)$$

Clustering assigns each post to a topic by minimizing the total squared distance between post embeddings and their cluster centroids:

$$\text{minimize} \sum_{i=1}^n \left\| \vec{v}_i - \vec{\mu}_{z_i} \right\|^2$$

where $\vec{\mu}_{z_i}$ is the centroid of topic cluster $z_i$ for post $i$.
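The objective above is the standard k-means objective. The following minimal sketch evaluates it and reduces it with Lloyd's algorithm, which is one common optimizer for this objective; the source does not state which clustering procedure SocioXplorer actually uses, so the function names, the toy 2-D vectors, and the choice of Lloyd's algorithm are assumptions for illustration only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans_objective(vectors, assignments, centroids):
    """Sum over posts of the squared distance to the assigned centroid."""
    return sum(
        squared_distance(v, centroids[z]) for v, z in zip(vectors, assignments)
    )


def lloyd_kmeans(vectors, k, iterations=20, seed=0):
    """Alternate assignment and centroid-update steps (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, z in zip(vectors, assignments) if z == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return assignments, centroids


# Toy stand-ins for post embeddings (real embeddings are high-dimensional).
vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
assignments, centroids = lloyd_kmeans(vectors, k=2)
objective = kmeans_objective(vectors, assignments, centroids)
```

On this toy input the two nearby pairs end up in separate clusters, and the objective equals the within-pair spread.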

Social Network Analysis

SocioXplorer constructs directed graphs where nodes are users/channels and edges encode relationships (e.g., retweet, reply, or mention). On YouTube, edges are drawn from the author of a top-level comment to the channel, and from the author of a reply to the author of the comment being replied to (or to the mentioned user).
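The YouTube edge rules can be sketched as follows. The comment fields (`id`, `author`, `parent_id`, `mentioned_user`) and the function name are hypothetical, and two choices here are assumptions not stated in the source: a mentioned user takes precedence over the parent comment's author, and self-loops are dropped.

```python
from collections import Counter


def build_youtube_edges(comments, channel_id):
    """Return a Counter mapping (source_user, target_user) to edge weight."""
    author_of = {c["id"]: c["author"] for c in comments}
    edges = Counter()
    for c in comments:
        parent_id = c.get("parent_id")
        if parent_id is None:
            # Top-level comment: edge points at the channel.
            target = channel_id
        else:
            # Reply: assumed precedence is mentioned user, then parent author.
            target = c.get("mentioned_user") or author_of.get(parent_id)
        # Assumption: skip unresolved targets and self-loops.
        if target is not None and target != c["author"]:
            edges[(c["author"], target)] += 1
    return edges


comments = [
    {"id": "c1", "author": "alice", "parent_id": None},
    {"id": "c2", "author": "bob", "parent_id": "c1"},
    {"id": "c3", "author": "carol", "parent_id": "c1", "mentioned_user": "bob"},
]
edges = build_youtube_edges(comments, channel_id="channel_x")
```

Storing edges in a `Counter` keeps repeated interactions between the same pair as an edge weight rather than as duplicate edges.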

Community detection is via the Louvain algorithm, maximizing modularity:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

where $A_{ij}$ is the adjacency matrix, $k_i$ the degree of node $i$, $c_i$ the community of node $i$, $m$ the number of edges, and $\delta$ the Kronecker delta.
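To make the modularity formula concrete, the sketch below evaluates $Q$ for a given partition. It does not run the Louvain algorithm itself, and it treats the graph as undirected and unweighted with no self-loops, matching the formula as written rather than the directed graphs described above. It uses the equivalent per-community form $Q = \sum_c \left[ L_c/m - (d_c/2m)^2 \right]$, where $L_c$ is the number of edges inside community $c$ and $d_c$ the sum of degrees of its nodes. The function name and toy graph are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in one community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    degree_sum = defaultdict(int)  # total degree per community
    for node, k in degree.items():
        degree_sum[community[node]] += k
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(edges, two_groups)   # 5/14, approximately 0.357
q_merged = modularity(edges, one_group)   # 0.0
```

Splitting the graph at the bridge scores higher than lumping all nodes together, which is the kind of partition a modularity-maximizing method favors.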

Network visualization uses the ForceAtlas2 layout, with batch processing preserving user positions for longitudinal studies. After each update, new communities are mapped to prior ones by overlap.

4. Batch Processing and Longitudinal Analysis

SocioXplorer enables incremental batch updates: as new data streams in (e.g., additional YouTube comments on an ongoing event), user positions and prior structure are maintained, with new communities and users integrated via re-optimization. Community matching relies on overlap with prior batch communities, supporting the tracking of community growth, splits, and user switching over time.
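One way to picture overlap-based community matching is the sketch below, which maps each community in the new batch to the previous community it shares the most members with. The source says only that matching relies on overlap; the use of Jaccard similarity as the overlap score, the function name, and the data layout are assumptions for illustration.

```python
def match_communities(previous, current):
    """Map each current community label to its best-overlapping prior label.

    previous, current: dicts mapping a community label to a set of user ids.
    Returns a dict: current label -> previous label, or None if no overlap.
    """
    mapping = {}
    for new_label, new_members in current.items():
        best_label, best_score = None, 0.0
        for old_label, old_members in previous.items():
            union = new_members | old_members
            # Assumed overlap measure: Jaccard similarity.
            score = len(new_members & old_members) / len(union) if union else 0.0
            if score > best_score:
                best_label, best_score = old_label, score
        mapping[new_label] = best_label
    return mapping


previous = {"A": {1, 2, 3}, "B": {4, 5}}
current = {"x": {1, 2, 3, 6}, "y": {4, 5, 7}, "z": {8, 9}}
mapping = match_communities(previous, current)
# "x" continues "A", "y" continues "B", and "z" is a new community.
```

Because several new communities may map to the same prior one, a split shows up as two current labels sharing a previous label, while a `None` entry marks a newly emerged community.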

This temporal updating framework equips researchers to study dynamic phenomena such as evolving debates, the spread of activism, or the emergence of new influencer groups.

5. Applications and Case Studies

SocioXplorer has been utilized in diverse social media research contexts:

  • Event Analysis: For the 2022 FIFA World Cup, analysts used SocioXplorer to map topic and network structure in over 1 million tweets, revealing distinct fan communities and geographical patterns (e.g., K-pop fans in Japan, UK football clusters).
  • YouTube Network Evolution: In climate communication research, SocioXplorer handled 4 million YouTube comments on climate change, illustrating community growth, switching, and splitting during the progression of online campaigns.
  • Misinformation and Activism Studies: The YouTube analytics enable examination of radicalization mechanisms, activist coordination, and the longitudinal impact of campaign events on community structure.
  • Live Incident Monitoring: With its ability to process live data updates, SocioXplorer allows researchers to monitor and analyze information dynamics, emergent communities, and topic shifts as real-world events unfold.

6. Comparative Improvements and Technical Significance

Relative to TwiXplorer, SocioXplorer introduces several new features:

  • Cross-Platform Analytics: Inclusion of YouTube addresses the need for multi-platform analysis—critical as social debates and misinformation now cross platform boundaries.
  • Live and Batch Updates: Supports continuously updating analyses, necessary for tracking rapidly evolving phenomena.
  • Interlinked Multi-Modal Visualizations: Enables a holistic view, connecting social topology to content, sentiment, geography, and language.
  • User-Driven Labeling and Annotation: Facilitates mixed-methods research workflows, combining computational pattern discovery with context-rich human annotation.

SocioXplorer’s architecture and methodological design directly support comprehensive, scalable, and iterative social media exploration and are intended to underpin both quantitative and qualitative social research at web scale.

7. Availability and Open Science

SocioXplorer is released under the Apache 2.0 license with source code accessible at https://github.com/smash-edin/socioxplorer. This ensures the tool can be freely adopted, extended, and reproduced in computational social science workflows.


SocioXplorer represents a methodological and practical advance for the computational social science community. By combining cross-platform analytics, live and batch data updates, AI-driven topic and sentiment modeling, and network analysis within an open, interactive environment, it supports research on the structure, dynamics, and content of large-scale social data and on how online conversations and communities evolve.